Thursday, July 14, 2011

I Love Regex

Regex is awesome. It's powerful, it's fun, and it can make you extremely efficient. There are 2 places where I need regex the most.

1) Fixing code.
You've seen maybe 3 videos on my YouTube channel about using regex with TextWrangler.  I don't know about you, but I have to fix code sometimes.  And sometimes, it requires replacing some text a thousand times.  Or say you have a long list of stuff, one item per line, and you wanna make it into a comma-separated list.  That regex is simply:

Find:
\r
Replace with:
,

So, it's quite often that I need to replace a bunch of stuff in some HTML or PHP, or even CSS, and knowing regex makes this task very simple.  If you didn't know regex, you would either have to pay some kid to do the same task 1000 times, or you would have to waste your own time doing it.  Or, if you didn't know how to use regex in your text editor, you'd have to write a script to do the replacement for you.  So get to know your text editor. Some editors, TextWrangler included, call this feature "grep".
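If you do end up writing the script instead, here is a minimal sketch of the same find and replace in Python. The function name and the sample list are made up for illustration, and it accepts \n and \r\n line breaks as well as \r, since not every file uses the same one.

```python
import re

def lines_to_comma_list(text):
    # Hypothetical helper: swap every line break for a comma.
    # strip() drops a trailing line break so the list doesn't end with ",".
    return re.sub(r"\r\n|\r|\n", ",", text.strip())

print(lines_to_comma_list("apples\rbananas\rcherries"))
```

That is the whole script, which is why doing it right in the editor is usually faster.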

2) Pattern Matching
This is pretty generic.  It can mean anything from page scraping, to DOM crawling, to email syntax checking, to curse word filtering.  It's all pattern matching.
There are 2 extremely useful regexes you should know about.

1) Match until character.  This is great for DOM crawling.  But it's also great for email addresses.  The regex is [^c]+  where "c" is the character you are matching until.  So to match the start of every div tag on the page, you would do div[^>]+  That matches div and everything after it up to, but not including, the next >, once for each opening tag that has attributes. If you want the closing > you just add it on.  div[^>]+>  Ta da.  (A bare <div> with no attributes has nothing between div and >, so use <div[^>]*> if you need those too.)
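Here is a quick sketch of "match until character" in Python, in case you want to try it outside your text editor. The HTML snippet and the email address are made-up samples, not from a real page.

```python
import re

html = '<div class="hello"><p>hi</p><div id="x">there</div></div>'

# "div", then one or more characters that are not ">".
tag_starts = re.findall(r"div[^>]+", html)

# Same thing with the closing ">" added on.
full_tags = re.findall(r"div[^>]+>", html)

# Email: everything before the "@" is the local part.
local_part = re.match(r"[^@]+", "sam@example.com").group()

# A bare <div> has no characters between "div" and ">", so use * instead of +.
bare_tags = re.findall(r"<div[^>]*>", "<div>plain</div>")

print(tag_starts)
print(full_tags)
print(local_part)
print(bare_tags)
```

Notice that the closing </div> tags never show up in the first two lists, because div is followed directly by > there and the + needs at least 1 character.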

2) Match until string.  This is extremely useful, and a lot of people never learn about it. Let's say you have a div of class "hello" and inside it, maybe 20 divs down, is a class "world".  You can't use the "match until character" to find "world", because it can only stop at a single character.  You would use this awesome regex.

hello(?:(?!world).)+

Complicated, right?  Yeah, well, just copy and paste it when you need to use it.  Let's break it down.

  1. We start with hello. Pretty simple.

  2. Then we use a negative lookahead followed by a dot: (?!world).  The lookahead (?!world) checks that the text at the current position does NOT start with "world", and it doesn't consume anything.  If that check passes, the . matches 1 character.  So (?!world). means "match 1 character, as long as "world" doesn't begin right here."

  3. Then, we need to not just match 1 time. That would just give us 1 character. We need to match everything until we get there. Just like the "match until character" had the +, we need a + here as well.

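  4. But you can't just add a + to the end.  The + would only apply to the . and the lookahead would only get checked once.  And you can't wrap the whole thing in square brackets, because square brackets only hold single characters, not a lookahead.  So we need to wrap it in parentheses so the + repeats the lookahead and the dot together.  ((?!world).)+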

  5. BUT, we have a problem: we just used parentheses, which are going to capture part of the result.  So we add ?: right after the opening parenthesis to make the group NON CAPTURING, so it stays out of the way.
    (?:(?!world).)+

And that is the "match until string".
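The steps above can be tried out with a short Python sketch. The sample text is made up, and one assumption is worth calling out: in Python the dot does not match a line break unless you pass re.DOTALL, so a "world" that sits several lines down needs that flag. Your text editor may have its own setting for this.

```python
import re

page = 'class="hello">\n<div>\n<span class="world">'

# Non-capturing group, with DOTALL so the dot also crosses line breaks.
until_world = re.compile(r"hello(?:(?!world).)+", re.DOTALL)
match = until_world.search(page).group()

# Without DOTALL the dot stops at the first newline.
one_line = re.search(r"hello(?:(?!world).)+", page).group()

# Step 5: a capturing group makes findall return only the group,
# which holds the last character it matched.
capturing = re.findall(r"hello((?!world).)+", "hello big world")
non_capturing = re.findall(r"hello(?:(?!world).)+", "hello big world")

print(repr(match))
print(repr(one_line))
print(capturing)
print(non_capturing)
```

One thing to watch for: if "world" never shows up, the lookahead never fails, so the match just runs on to the end of the text (or the end of the line without DOTALL).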

You could also use another regex instead of a plain string inside the lookahead.  Say we wanted to match until either "world" or "planet", whichever comes first.
(?:(?!world|planet).)+
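A small Python sketch of the two-word version, with hello put back on the front. The sample sentences are made up for illustration.

```python
import re

# Stop at whichever of "world" or "planet" comes first.
until_either = re.compile(r"hello(?:(?!world|planet).)+")

first = until_either.search("hello there world").group()
second = until_either.search("hello blue planet").group()

print(repr(first))
print(repr(second))
```

Both matches end with the space right before the stop word, since the stop word itself is never part of the match.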

Here's a video explaining this if I didn't do a good job writing it.
