Saturday, September 27, 2008

expression tricks2

RegEx is exactly the same idea, but it has been developed to handle just about any kind of text search you can imagine. Andrew Watt wrote an 800-page book that he called Beginning Regular Expressions. I wonder how many pages it would take to write the "Advanced" version.

The big problem with RegExes is that they are sort of a "write-only" language. A RegEx that does something meaningful, such as this example that matches US telephone numbers (credit to author Jesse Sweetland) ...

^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$

... can be hard for anyone to figure out after it's written. Don't be fooled because this is just one line of code. It might take even an experienced programmer quite a while to craft something like this. (I'll use this RegEx in some VB code in a few paragraphs.)
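To see what that one line is really doing, here's a rough sketch in Python (not the VB code, just a quick illustration). It shows the pattern twice: once exactly as written, and once spread out with comments using Python's re.VERBOSE flag, which tells the engine to ignore whitespace and "#" comments inside the pattern. The function name is_us_phone and the sample numbers are made up for this example, and the labels in the comments (area code, exchange, line number) are my reading of the pattern, not the original author's.

```python
import re

# The one-line pattern from the article, copied verbatim.
PHONE = re.compile(r"^1?\s*-?\s*(\d{3}|\(\s*\d{3}\s*\))\s*-?\s*\d{3}\s*-?\s*\d{4}$")

# The same pattern, piece by piece. With re.VERBOSE the regex engine
# skips the whitespace and the "#" comments inside the pattern.
PHONE_VERBOSE = re.compile(r"""
    ^                                 # start of the string
    1? \s* -? \s*                     # optional leading 1, optional hyphen
    ( \d{3} | \( \s* \d{3} \s* \) )   # area code, bare or in parentheses
    \s* -? \s*                        # optional hyphen and spaces
    \d{3}                             # exchange (three digits)
    \s* -? \s*                        # optional hyphen and spaces
    \d{4}                             # line number (four digits)
    $                                 # end of the string
""", re.VERBOSE)


def is_us_phone(text):
    """Return True if text matches the article's phone pattern (hypothetical helper)."""
    return PHONE.match(text) is not None


# Made-up sample inputs, just to exercise the pattern.
samples = ["1-800-555-1234", "(800) 555-1234", "8005551234",
           "555-1234", "800-555-12345"]

for s in samples:
    print(s, is_us_phone(s), PHONE_VERBOSE.match(s) is not None)
```

The first three samples match and the last two don't: "555-1234" has no area code, and "800-555-12345" has one digit too many for the closing $ to accept. Both versions of the pattern agree on every sample, which is the whole point: the commented form is the same RegEx, just one you can read a month later.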

For this reason, there are a lot of utilities that you can download that help you figure out what a regex is doing. Some examples:

  • RegexBuddy from JGsoft
  • RegexDesigner .NET from Chris Sells

Historical Sidetrack ...

A lot of people (myself included) get confused right away because we don't even understand why these things are called 'regular expressions'. Just to clear this out of the way so we can get to the important stuff, the term was first used by the American mathematician Stephen Kleene. For him, it was a branch of mathematics and he figured out math rules that make it work. For programmers, it's just a name, so call them "widgets" or "thingies" if it will help you understand better.

But if you want to get geek points for knowing obscure stuff, the '*' character that we usually call a "wildcard" (in a RegEx it actually means "zero or more of whatever comes just before it") is sometimes called a Kleene star in academic circles and - here's a good one - Kleene pronounced his last name klay'nee. His son, Ken Kleene, wrote: "As far as I am aware this pronunciation is incorrect in all known languages. I believe that this novel pronunciation was invented by my father." Dr. "Clay Knee" must have been a geek of the first water, to be sure!



