Note: If the document URL does not begin with https://randu.org/tutorials/perl/ then you are viewing a copy. Please direct your browser to the correct location for the most recent version.
    while (<>) {
        if (/abc/) {
            print $_;
        }
    }

The /abc/ is the regular expression, or pattern, we wish to match. This code block loops over standard input (or any files named on the command line) and prints every line that contains the pattern, for example lines containing abc, abcabc, or abcabcabc. A similar grep command would be:

    grep abc file.txt
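To try the same test without piping a file in, here is a minimal sketch that runs the pattern over a hard-coded list. The @lines array and its contents are made up for illustration and stand in for the lines that <> would read.

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for lines read from <>.
my @lines = ("abc", "xxabcabcxx", "ab c", "cab");

# Keep only the lines in which the pattern abc occurs somewhere.
my @matched = grep { /abc/ } @lines;

print "$_\n" for @matched;
```

Only the first two entries are printed: the pattern may occur anywhere in the line, but the three letters must be adjacent and in order.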
    /[aeiouAEIOU]/

This would match any word with a vowel in it (which I believe covers nearly all words in the English language, apart from words such as "rhythm" or "my" that rely on "y"). Notice that character classes begin and end with brackets, [ ].
    /[a-zA-Z0-9]/

This would match any string that contains at least one alphanumeric character. If you want to match a literal - inside a character class, escape it with the backslash, \, or place it first or last in the class.
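    /[.\n]/

This would match only a literal period or a newline, because inside a character class the period loses its special meaning. To match ANY character, newline included, use /[\s\S]/ or add the s modifier, as in /./s. You can also negate a character class by placing a circumflex as the first character inside it: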
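    /[^0-9]/

This would match any string that contains at least one character that is not a numeric digit. To require that a string contain no digits at all, anchor the pattern: /^[^0-9]*$/.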
    \d   # equivalent to [0-9]
    \D   # equivalent to [^0-9]
    \w   # equivalent to [a-zA-Z0-9_]
    \W   # equivalent to [^a-zA-Z0-9_]
    \s   # equivalent to [ \r\t\n\f] (whitespace)
    \S   # equivalent to [^ \r\t\n\f]
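The chart can be spot-checked with a short sketch that compares each shorthand with its bracketed class on a handful of characters. The sample characters are an arbitrary ASCII selection chosen for illustration. On non-ASCII input the shorthands may match more than the bracketed classes, depending on the Perl version and Unicode settings, so treat the equivalences as an ASCII approximation.

```perl
use strict;
use warnings;

# Hypothetical sample characters, all plain ASCII.
my @chars = ("a", "Z", "7", "_", " ", "\t", "-", "!");

# Pair each shorthand with the bracketed class the chart says it equals.
my @pairs = (
    [ '\d', qr/\d/, qr/[0-9]/        ],
    [ '\w', qr/\w/, qr/[a-zA-Z0-9_]/ ],
    [ '\s', qr/\s/, qr/[ \r\t\n\f]/  ],
);

my $disagreements = 0;
my %matched;    # shorthand name => sample characters it matched
for my $pair (@pairs) {
    my ($name, $short, $class) = @$pair;
    $matched{$name} = [];
    for my $c (@chars) {
        my $by_short = ($c =~ $short) ? 1 : 0;
        my $by_class = ($c =~ $class) ? 1 : 0;
        $disagreements++ if $by_short != $by_class;
        push @{ $matched{$name} }, $c if $by_short;
    }
    printf "%s matched %d of %d sample characters\n",
        $name, scalar @{ $matched{$name} }, scalar @chars;
}
print "disagreements: $disagreements\n";
```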
    /abc*/
    /a+bc/
    /ab?c/

The first regex matches an a, then a b, then zero or more c's. The second regex matches one or more a's followed by a b and a c. The last regex matches an a, then zero or one b's, and then a c. You can even specify the number of matches:
    /a.{2,4}c/
    /xyz{3}/
    /j{2,}kl/

The braces in the first one specify that you must match a minimum of 2 and a maximum of 4 of any character in between an a and a c. The second one says that the string must contain an x, a y, and then exactly 3 z's. The last one specifies 2 or more j's followed by a k and an l. So here's an equivalence chart:
    .*   # same as .{0,}
    .+   # same as .{1,}
    .?   # same as .{0,1}
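The quantifier examples above can be checked with a small table-driven sketch. The sample strings are made up for illustration: for each pattern there is one string that should match and one that should not.

```perl
use strict;
use warnings;

# Each entry: pattern, a string expected to match, a string expected not to.
my @cases = (
    [ qr/abc*/,     "ab",    "ac"   ],  # c is optional and repeatable
    [ qr/a+bc/,     "aaabc", "bc"   ],  # at least one a is required
    [ qr/ab?c/,     "ac",    "abbc" ],  # zero or one b
    [ qr/a.{2,4}c/, "axxc",  "axc"  ],  # 2 to 4 characters between a and c
    [ qr/xyz{3}/,   "xyzzz", "xyzz" ],  # three z's after xy
    [ qr/j{2,}kl/,  "jjjkl", "jkl"  ],  # two or more j's
);

my $failures = 0;
for my $case (@cases) {
    my ($re, $should_match, $should_not) = @$case;
    my $ok = ($should_match =~ $re) && ($should_not !~ $re);
    $failures++ unless $ok;
    printf "%-20s %s\n", $re, ($ok ? "behaves as described" : "unexpected result");
}
print "failures: $failures\n";
```

Note that the quantifier applies only to the item immediately before it, so /xyz{3}/ repeats the z, not the whole xyz.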
    /a(.*)b\1c/

This matches an a followed by zero or more non-newline characters, then a b, then \1, which stands for whatever text the parenthesized (.*) actually matched (the same text, not merely the same pattern), followed by a c. So this would match, for example, azzzzbzzzzc, but not azzzzbyyyyc. You can have more than one parenthesized part in a regex; refer to the others with \2, \3 and so on, numbered by opening parenthesis from the left.

    /abc|jkl|xyz/

This would match "abc", "jkl", or "xyz".
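As a quick check of the backreference and alternation examples, the following sketch runs them against a few strings. The extra inputs ("abc", "xxjklxx", "ab") are made up for illustration.

```perl
use strict;
use warnings;

my $backref = qr/a(.*)b\1c/;

# Record what group 1 captured for each string, or undef when there is no match.
my %captured;
for my $s ("azzzzbzzzzc", "azzzzbyyyyc", "abc") {
    if ($s =~ $backref) {
        $captured{$s} = $1;
        print "$s matches, group 1 captured '$1'\n";
    } else {
        $captured{$s} = undef;
        print "$s does not match\n";
    }
}

# Alternation: keep the strings that contain any one of the three alternatives.
my @hits = grep { /abc|jkl|xyz/ } ("xxjklxx", "ab", "xyz");
print "alternation hits: @hits\n";
```

The string "abc" matches the first pattern because (.*) is allowed to capture the empty string, and \1 then matches the empty string as well.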
    /\bmo/   # matches mo at the start of a word
    /^mo/    # matches mo only at the start of the string
    /mo\b/   # matches mo at the end of a word
    /mo$/    # matches mo only at the end of the string (or just before a trailing newline)
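A word boundary and a string anchor behave differently when mo sits in the middle of the string. Here is a minimal sketch, using a made-up sample sentence in which one word ends in mo and another begins with mo, but the string itself neither starts nor ends with mo.

```perl
use strict;
use warnings;

# Hypothetical sample text chosen so the four patterns give different answers.
my $text = "a demo of the moon";

my %hit = (
    'word start'   => ($text =~ /\bmo/ ? 1 : 0),  # "moon" begins a word
    'string start' => ($text =~ /^mo/  ? 1 : 0),  # the string begins with "a"
    'word end'     => ($text =~ /mo\b/ ? 1 : 0),  # "demo" ends a word
    'string end'   => ($text =~ /mo$/  ? 1 : 0),  # the string ends with "moon"
);

for my $name ('word start', 'string start', 'word end', 'string end') {
    printf "%-12s %s\n", $name, $hit{$name} ? "matches" : "no match";
}
```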
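\B is the opposite of \b; it matches wherever there is no word boundary.

(?=...), (?!...), (?<=...), and (?<!...) are lookaround assertions: positive lookahead, negative lookahead, positive lookbehind, and negative lookbehind, respectively. See Wall pg. 203-204 for more information.

To match against something other than $_, you can change the target of the regex by using the =~ operator: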
    $a = "hello world";
    $a =~ /^he/;   # true
    if (<STDIN> =~ /^y/i) {
        # line begins with y, let's do something with it
    }
    $match = "this";
    $target = "This sentence contains this word.";
    if ($target =~ /$match/) {
        # do stuff
    }
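The interpolation example can be extended into a short sketch that also shows the effect of the i modifier and of quoting the variable. The $literal value and the "abc" test string are made up for illustration; \Q...\E is Perl's standard way to treat interpolated text literally.

```perl
use strict;
use warnings;

my $target = "This sentence contains this word.";
my $match  = "this";

# Count matches by assigning the list of matches to an empty list.
my $case_sensitive   = () = $target =~ /$match/g;    # only the lowercase "this"
my $case_insensitive = () = $target =~ /$match/gi;   # "This" and "this"

# Metacharacters inside an interpolated variable stay special
# unless they are quoted with \Q...\E.
my $literal  = "a.c";                                # hypothetical search text
my $as_regex = ("abc" =~ /$literal/)     ? 1 : 0;    # the . matches the b
my $quoted   = ("abc" =~ /\Q$literal\E/) ? 1 : 0;    # needs a real period

print "case-sensitive: $case_sensitive, case-insensitive: $case_insensitive\n";
print "unquoted: $as_regex, quoted: $quoted\n";
```

Quoting matters whenever the variable holds user input or any other text that was not written as a pattern.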
    $_ = "foot fool buffoon";
    s/foo/bar/;

Now $_ contains "bart fool buffoon". What if we wanted to replace ALL instances of foo? Append "g" to the substitution operator:
    s/foo/bar/g;

would do the trick. Now $_ contains "bart barl bufbarn".
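The two substitutions can be compared side by side in a small sketch. It works on copies of the string so that the original stays unchanged; the variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $text = "foot fool buffoon";

# Copy first, then substitute on the copy, so $text stays unchanged.
(my $first_only = $text) =~ s/foo/bar/;
(my $every_one  = $text) =~ s/foo/bar/g;

# s///g returns the number of substitutions it made.
my $copy  = $text;
my $count = ($copy =~ s/foo/bar/g);

print "$first_only\n$every_one\n$count substitutions\n";
```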
    $line = "user:600:100:/home/user:/usr/bin/perl";
    @fields = split(/:/, $line);

So now we have a list called @fields that contains each part of $line. The /:/ denotes that we will use : as the delimiter. Note that the /:/ is a regex!
    $newline = join(":", @fields);

Now $newline is a string with the fields list delimited by :.
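Put together, split and join make a round trip. The sketch below uses the same passwd-style line, plus a second, made-up string to show that the delimiter really is a regex; the field variable names are chosen for illustration.

```perl
use strict;
use warnings;

my $line   = "user:600:100:/home/user:/usr/bin/perl";
my @fields = split(/:/, $line);

# Unpack the fields into named variables (names chosen for illustration).
my ($name, $uid, $gid, $home, $shell) = @fields;

# Joining with the same delimiter rebuilds the original string.
my $newline = join(":", @fields);

# Because the delimiter is a regex, it can absorb optional whitespace too.
my @items = split(/\s*,\s*/, "a , b,c");    # hypothetical input

print scalar(@fields), " fields, shell is $shell\n";
print "round trip ok\n" if $newline eq $line;
print "items: @items\n";
```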
    $_ = "fred and barney";
    tr/fb/bf/;

$_ now contains "bred and farney". You can even append "d" to the end of the tr operator to delete any matched characters that have no corresponding replacement. (This could be useful to remove extraneous characters, for example the ^M carriage returns in DOS text files.)
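Here is a short sketch of the tr operator in three common roles: swapping characters, deleting them with the d modifier, and counting them. The DOS-style sample line and the vowel count are made-up illustrations.

```perl
use strict;
use warnings;

my $text = "fred and barney";

# Swap every f with b and every b with f, working on a copy.
(my $swapped = $text) =~ tr/fb/bf/;

# With d and an empty replacement list, the listed characters are deleted.
my $dos = "line one\r\n";                  # hypothetical DOS-style line
(my $unix = $dos) =~ tr/\r//d;

# With an empty replacement list and no d, tr only counts the characters.
my $vowels = ($text =~ tr/aeiou//);

print "$swapped\n";
print "carriage returns removed\n" if $unix eq "line one\n";
print "$vowels vowels\n";
```

Unlike s///, tr works on individual characters rather than patterns, so no regex metacharacters are involved.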
    sub by_number {
        if ($a < $b) {
            return -1;
        } elsif ($a == $b) {
            return 0;
        } elsif ($a > $b) {
            return 1;
        }
    }
    @sortedlist = sort by_number @list;

Seems pretty simple. But Perl provides an even easier sorting method. Because this three-way comparison comes up regularly in routines like sorting, Perl has a built-in operator for it, <=>, also known as the spaceship operator. So we can rewrite this as:
    @sortedlist = sort { $a <=> $b } @list;

Very simple! There is a comparable operator for string scalars, called cmp instead of <=> (but sort with no comparison block already performs an ASCII-based string sort by default).
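The difference between the default string sort and a numeric sort is easiest to see on numbers of different lengths. Here is a minimal sketch with made-up sample data.

```perl
use strict;
use warnings;

my @list = (10, 9, 100, 1);    # hypothetical sample data

my @default    = sort @list;                  # compared as strings
my @numeric    = sort { $a <=> $b } @list;    # compared as numbers
my @descending = sort { $b <=> $a } @list;    # swap $a and $b to reverse

# cmp is the string counterpart; lc makes the comparison ignore case.
my @words = sort { lc($a) cmp lc($b) } ("pear", "Apple", "banana");

print "default:    @default\n";
print "numeric:    @numeric\n";
print "descending: @descending\n";
print "words:      @words\n";
```

With the default sort, 100 comes before 9 because the strings are compared character by character, which is why numeric data needs the <=> block.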