Note: If the document URL does not begin with http://randu.org/tutorials/perl/ then you are viewing a copy. Please direct your browser to the correct location for the most recent version. |
while (<>) {
if (/abc/) {
print $_;
}
}
The /abc/ is the regular expression, or pattern, we wish to match. This code block loops through standard input and prints every line that contains abc anywhere in it, such as: abc, abcabc, abcabcabc. A similar grep command would be: grep abc file.txt.

/[aeiouAEIOU]/
Would match any string with a vowel in it (nearly every English word; the exceptions, such as "rhythm", rely on "y" instead). Notice that character classes begin and end with square brackets, [ ].

/[a-zA-Z0-9]/
Would match any string containing at least one alphanumeric character. Inside a character class, - marks a range, so if you want to match a literal -, you must escape it with a backslash, \- (or put it first or last in the class).

/[.\n]/
Would match either a literal period or a newline, not every character: inside a character class the period loses its special meaning. To match any character at all, including newline, use /[\s\S]/ or the /s modifier, as in /./s.

You can also negate a character class by placing a circumflex (^) as the first character inside the brackets:

/[^0-9]/
Would match any string containing at least one character that is not a digit.
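To see these classes side by side, here is a small script that runs each pattern above against a few sample strings. The strings and the helper name classify are made up for illustration; treat it as a sketch rather than part of the original tutorial.

```perl
use strict;
use warnings;

# Hypothetical helper: report which of the character classes above match $s.
sub classify {
    my ($s) = @_;
    my @hits;
    push @hits, 'vowel'          if $s =~ /[aeiouAEIOU]/;  # contains a vowel
    push @hits, 'alnum'          if $s =~ /[a-zA-Z0-9]/;   # contains a letter or digit
    push @hits, 'dot-or-newline' if $s =~ /[.\n]/;         # a literal "." or a newline
    push @hits, 'a-or-dash'      if $s =~ /[a\-z]/;        # 'a', '-' or 'z', not a range
    push @hits, 'non-digit'      if $s =~ /[^0-9]/;        # at least one non-digit
    return @hits;
}

# Made-up sample strings.
for my $s ('rhythm', 'apple', '-', '42', '3.14') {
    printf "%-7s %s\n", $s, join(', ', classify($s));
}
```

Note that "rhythm" gets no dot-or-newline hit, which confirms that /[.\n]/ does not match every character.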
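Perl also provides shorthand escapes for the most common character classes:
\d # equivalent to [0-9]
\D # equivalent to [^0-9]
\w # equivalent to [a-zA-Z0-9_]
\W # equivalent to [^a-zA-Z0-9_]
\s # equivalent to [ \r\t\n\f] (whitespace)
\S # equivalent to [^ \r\t\n\f]
These equivalences hold for plain ASCII text. On Unicode strings, \d, \w and \s also match non-ASCII digits, letters and whitespace, and since Perl 5.18 \s also matches the vertical tab.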
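A quick sketch of the shorthand escapes on a made-up line of text. A /g match in list context returns every run that matches, which makes the difference between \d, \w and \s easy to see. The sample string is an assumption, and it is pure ASCII, so the chart's equivalences apply.

```perl
use strict;
use warnings;

# Made-up sample line (plain ASCII).
my $text = "user_1: 42 apples, 7 pears";

my @digit_runs = $text =~ /\d+/g;      # runs of [0-9]
my @word_runs  = $text =~ /\w+/g;      # runs of [a-zA-Z0-9_]
my @tokens     = split /\s+/, $text;   # pieces between runs of whitespace

print "digits: @digit_runs\n";   # digits: 1 42 7
print "words:  @word_runs\n";    # words:  user_1 42 apples 7 pears
print "tokens: @tokens\n";       # tokens: user_1: 42 apples, 7 pears
```

The underscore in user_1 stays inside one \w run, while the colon and comma break \w runs but stay attached to the whitespace-separated tokens.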
/abc*/
/a+bc/
/ab?c/
The first regex matches an a, then a b, then zero or more c's. The second regex matches one or more a's followed by a b and a c. The last regex matches an a, then zero or one b, and then a c. You can even specify the number of matches:
/a.{2,4}c/
/xyz{3}/
/j{2,}kl/
The braces on the first one specify that you must match a minimum
of 2 and a maximum of 4 of any (non-newline) character in between an
a and a c. The second one says that the string must contain an x, a y,
and then three z's in a row; the {3} applies only to the z, so it matches
xyzzz but not xyzxyzxyz. The last one specifies 2 or more j's followed
by a k and an l.
So here's an equivalence chart:
.* # same as .{0,}
.+ # same as .{1,}
.? # same as .{0,1}
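A sketch that tries the three counted-quantifier patterns above on a few made-up strings. The helper matches and the test strings are assumptions for illustration; the pattern strings are interpolated into /$pattern/ at run time.

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s matches the pattern string $pattern.
sub matches {
    my ($pattern, $s) = @_;
    return $s =~ /$pattern/ ? 1 : 0;
}

# The patterns from the text, each with a few made-up test strings.
my @cases = (
    [ 'a.{2,4}c', 'abbc', 'axyzwc', 'abc', 'abbbbbc' ],
    [ 'xyz{3}',   'xyzzz', 'xyz', 'xyzxyzxyz' ],
    [ 'j{2,}kl',  'jjkl', 'jjjjkl', 'jkl' ],
);

for my $case (@cases) {
    my ($pattern, @strings) = @$case;
    for my $s (@strings) {
        printf "/%s/ on %-10s %s\n", $pattern, $s,
            matches($pattern, $s) ? 'match' : 'no match';
    }
}
```

abc fails /a.{2,4}c/ because only one character sits between the a and the c, and abbbbbc fails because five do.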
/a(.*)b\1c/
Matches an a, followed by zero or more non-newline characters (captured
by the parentheses), then a b, then \1, which matches the exact text that
(.*) captured (not merely the same pattern), followed by a c. So this
would match, for example, azzzzbzzzzc, but not azzzzbyyyyc. You can have
more than one parenthesized part in a regex; refer to the others with
\2, \3 and so on, numbered by their opening parentheses from the left.

/abc|jkl|xyz/
Would match "abc", "jkl", or "xyz".
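A short sketch of backreferences and alternation. The helper repeated_part, the palindrome check and the sample strings are made up for illustration; inside the pattern the captured text is \1, and after a successful match Perl exposes the same text as $1.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what (.*) captured, or undef if no match.
sub repeated_part {
    my ($s) = @_;
    return $s =~ /a(.*)b\1c/ ? $1 : undef;
}

for my $s ('azzzzbzzzzc', 'azzzzbyyyyc') {
    my $part = repeated_part($s);
    print defined $part ? "$s: match, \\1 was '$part'\n" : "$s: no match\n";
}

# Two groups: \2 refers to the second pair of parentheses (shape XYYX).
print "abba has the shape XYYX\n" if 'abba' =~ /(\w)(\w)\2\1/;

# Alternation: keep the strings that contain abc, jkl or xyz.
my @hits = grep { /abc|jkl|xyz/ } qw(abc jklm xy xyz ab);
print "@hits\n";   # abc jklm xyz
```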
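/\bmo/ # matches mo at the start of a word, anywhere in the string
/^mo/ # matches mo only at the start of the string
/mo\b/ # matches mo at the end of a word, anywhere in the string
/mo$/ # matches mo only at the end of the string (or just before a final newline)
So \b and ^ are not the same thing: "the moon" matches /\bmo/ but not /^mo/.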
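A sketch that runs the four patterns above against a few made-up strings, to show where word boundaries and string anchors differ. The helper anchor_hits is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: list which of the four patterns match $s.
sub anchor_hits {
    my ($s) = @_;
    my @hits;
    push @hits, '\bmo' if $s =~ /\bmo/;   # mo at the start of any word
    push @hits, '^mo'  if $s =~ /^mo/;    # mo at the start of the string
    push @hits, 'mo\b' if $s =~ /mo\b/;   # mo at the end of any word
    push @hits, 'mo$'  if $s =~ /mo$/;    # mo at the end of the string
    return join ' ', @hits;
}

for my $s ('moon', 'the moon', 'demo', 'demo mode') {
    printf "%-10s %s\n", $s, anchor_hits($s);
}
```

In "demo mode" both word-boundary patterns match, yet neither string anchor does, since the string neither starts nor ends with mo.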
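\B is the opposite of \b: it matches where there is no word boundary.
(?=...) and (?!...) are positive and negative lookaheads, and (?<=...)
and (?<!...) are positive and negative lookbehinds; together they are
called lookarounds. See Wall pg. 203-204 for more information.

By default a match is performed against $_; you can change the target
of the regex by using the =~ operator: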
$a = "hello world"; $a =~ /^he/; # true
if (<STDIN> =~ /^y/i) {
    # the input line begins with y or Y (/i ignores case); do something with it
}
$match = "this";
$target = "This sentence contains this word.";
if ($target =~ /$match/) {
# do stuff: $match is interpolated, so this acts like /this/
}
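A runnable sketch of the =~ examples above. Fixed strings stand in for reading a line from standard input, so it runs without input; the variable names besides $match and $target are assumptions. The count idiom () = ... simply counts how many matches a /g search finds.

```perl
use strict;
use warnings;

# Fixed strings stand in for real input so the sketch runs on its own.
my $greeting = "hello world";
print "starts with he\n" if $greeting =~ /^he/;

my $answer = "Yes, please";
print "answer begins with y or Y\n" if $answer =~ /^y/i;   # /i ignores case

# The variable is interpolated into the pattern, so /$match/ acts like /this/.
my $match  = "this";
my $target = "This sentence contains this word.";
my $count   = () = $target =~ /$match/g;    # only the lowercase "this"
my $count_i = () = $target =~ /$match/gi;   # with /i, "This" counts too
print "case-sensitive: $count, case-insensitive: $count_i\n";
```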
To substitute text, use the s/// operator, which replaces the first match of the pattern:
$_ = "foot fool buffoon";
s/foo/bar/;
Now $_ contains "bart fool buffoon". What if we wanted to replace ALL instances of foo? Append "g" (global) to the substitution operator:
s/foo/bar/g;
would do the trick. Now $_ contains "bart barl bufbarn".
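A sketch comparing s/// with and without /g on the same string, using named variables instead of $_. The (my $copy = $orig) =~ s/// form copies the string first so the original stays intact; in scalar context s///g returns the number of replacements it made.

```perl
use strict;
use warnings;

my $once = "foot fool buffoon";
(my $all = $once) =~ s/foo/bar/g;   # copy first, then replace every "foo"
$once =~ s/foo/bar/;                # replace only the first "foo"

print "$once\n";   # bart fool buffoon
print "$all\n";    # bart barl bufbarn

# In scalar context s///g returns how many replacements were made.
my $text = "foo foo foo";
my $n = ($text =~ s/foo/bar/g);
print "$n replacements: $text\n";   # 3 replacements: bar bar bar
```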
$line = "user:600:100:/home/user:/usr/bin/perl";
@fields = split(/:/, $line);
So now we have an array called @fields that contains each part of $line. The /:/ denotes that we will use : as the delimiter. Note that the /:/ is a regex!
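A sketch of split on the record above. The names given to each field ($name, $uid and so on) are only labels for illustration. Because the first argument is a regex, a character class can split on several delimiters at once.

```perl
use strict;
use warnings;

my $line   = "user:600:100:/home/user:/usr/bin/perl";
my @fields = split(/:/, $line);

# Hypothetical labels for the positions in this record.
my ($name, $uid, $gid, $home, $shell) = @fields;
print scalar(@fields), " fields; home=$home shell=$shell\n";

# The delimiter is a regex, so a character class splits on / or : alike.
my @parts = split(/[\/:]/, "a:b/c");
print "@parts\n";   # a b c
```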
$newline = join(":", @fields);
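Now $newline is a string containing the elements of @fields joined by :, identical to the original $line.

The tr operator translates characters one for one (it works on lists of characters, not regular expressions):
$_ = "fred and barney";
tr/fb/bf/;
$_ now contains "bred and farney". You can also append "d" to the operator to delete any characters in the search list that have no counterpart in the replacement list. This is useful for removing extraneous characters, for example the ^M carriage returns in DOS text files: tr/\r//d;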
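A sketch tying together join and tr from the paragraphs above. The DOS-style string is a made-up example of a line ending in a carriage return.

```perl
use strict;
use warnings;

# join undoes split here, so the record comes back unchanged.
my $line    = "user:600:100:/home/user:/usr/bin/perl";
my $newline = join(":", split(/:/, $line));
print $newline eq $line ? "round trip ok\n" : "round trip changed\n";

# tr swaps characters one for one: every f becomes b and every b becomes f.
(my $swapped = "fred and barney") =~ tr/fb/bf/;
print "$swapped\n";   # bred and farney

# With /d, search-list characters that have no partner in the replacement
# list are deleted; here that strips a DOS carriage return (^M).
(my $unix = "line one\r\n") =~ tr/\r//d;
print length($unix), "\n";   # 9: "line one" plus "\n"
```

One caveat: split drops trailing empty fields, so a record ending in : would not survive the split/join round trip unchanged.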
sub by_number {
if ($a < $b) {
return -1;
} elsif ($a == $b) {
return 0;
} elsif ($a > $b) {
return 1;
}
}
@sortedlist = sort by_number @list;
Seems pretty simple. But Perl provides an even easier sorting
method. Because this three-way comparison comes up so often in
routines like sorting, Perl has a built-in operator for it, <=>,
known as the spaceship operator. So we can rewrite this as:
@sortedlist = sort { $a <=> $b } @list;
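Very simple! There is a comparable operator for strings called cmp,
which compares its operands as strings rather than numbers. Note that
sort with no comparison block already performs this string comparison
(character by character, by character code, which for plain ASCII text
means ASCII order), so sort @list is equivalent to sort { $a cmp $b } @list.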
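A sketch contrasting <=>, cmp and the default sort on the same made-up list. Swapping $a and $b in the block reverses the order, and the numeric list sorted as strings shows why <=> matters for numbers.

```perl
use strict;
use warnings;

my @list = (10, 9, 100, 2);   # made-up data

my @numeric    = sort { $a <=> $b } @list;   # numeric, ascending
my @descending = sort { $b <=> $a } @list;   # swapping $a and $b reverses the order
my @strings    = sort { $a cmp $b } @list;   # compared as strings
my @default    = sort @list;                 # no block: same as cmp

print "numeric:    @numeric\n";      # 2 9 10 100
print "descending: @descending\n";   # 100 10 9 2
print "cmp:        @strings\n";      # 10 100 2 9
print "default:    @default\n";      # 10 100 2 9

# cmp orders by character code, so uppercase letters sort before lowercase.
my @names = sort { $a cmp $b } qw(pear Apple banana);
print "@names\n";                    # Apple banana pear
```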
