Regex for Dummies: Day 3

We’re going to start reviewing real-world examples, beginning with the correct syntax for matching an email address. On day four, we’ll take things a step further and use this pattern in a PHP and JavaScript app.

Day 3: Validating Email Addresses

Be sure to click on the “Full Screen Toggle”.

Quiz

1: When validating email addresses, why is it better to use the range {2,4} to match the extension (.com, .org, etc.), rather than using the simpler plus (+) symbol?

2: How would I match a block of text that is a minimum of five characters and a maximum of ten? Think about this one; there’s more to it than just the range.

3: How do we turn the ? symbol from greedy to lazy? What are the differences between the two?




Comments
  • Ragnar Þór Valgeirsson says:

    Great series! But I’m a little confused… where do you use regular expressions?

  • Paperboy says:

    There are also addresses like me@wherever.co.uk, with an extra dot that needs to be taken into consideration, no?

  • Thanks for this great series. I’ve always sidestepped actually having to understand those colourful strings of letters and symbols.

    The only problem I can see with the above is that it doesn’t account for double-barrelled extensions such as .co.uk or .ac.uk.

  • Allan says:

    there are domains with numbers as well (subdomains ;) )

  • Elijah Manor says:

    You’ve been tweeted (a good thing) – Tweetback from @elijahmanor

    http://twitter.com/elijahmanor/status/1426146307

  • dreake says:

    Hi Jeff!
    Thanks for the great video series.
    You forgot about numbers in e-mails.
    For example: dan67e5@nettuts.com

  • Another great tutorial :D Do you know any easy way to take a textarea input from a webpage then add tags with regular expressions? :P

  • Andrew says:

    Great video; to match a few extra characters, I came up with this:

    \b[\w-\.\+]+@[\w-\.]+\.[A-Za-z_-]{2,4}\b

    Which matches:

    whatever@andrewburgess.otherinbox.com

    a06p18b@gmail.com

    a06p18b+regex@gmail.com

    3x4mp1e@4ddr355.com.com.com

  • Meshach says:

    Hey Jeffrey, thanks for this awesome screencast!

  • Willabee says:

    @Paperboy – use:

    \b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b

    to test common two letter DNS addresses like my-name@someDomain.co.au

    Notice the parentheses around the extension test, followed by the + symbol, allowing for one or more DNS extensions.
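For anyone who wants to try Willabee's point outside the video, here is a minimal Python sketch. The single-extension pattern and the address user@example.com are assumptions added for comparison, and Python's `re` module is only a stand-in for whichever engine you actually use.

```python
import re

# Assumed baseline for comparison: exactly one dot-separated extension.
SINGLE_EXTENSION = r"\b[\w-]+@[\w-]+\.[A-Za-z_-]{2,4}\b"

# Willabee's pattern: the parenthesised extension may repeat one or more times.
REPEATED_EXTENSION = r"\b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b"


def accepts(pattern: str, address: str) -> bool:
    """Return True only when the pattern matches the entire address."""
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    for address in ("my-name@someDomain.co.au", "user@example.com"):
        print(address,
              accepts(SINGLE_EXTENSION, address),
              accepts(REPEATED_EXTENSION, address))
```

The check uses `re.fullmatch` so that a partial hit inside the string does not count as acceptance; with a plain search, both patterns would find something in the .co.au address.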

  • Jeff Adams says:

    Thanks Jeff!

    I must admit, these still freak me out a bit, being historically a non-programmer, but it really helps to understand what is possible with PHP, not to mention JavaScript or jQuery integration.

    I’m gonna kick ass with some of my smaller projects. I really recommend others try to either follow along or pick a sample project to utilise these; it helps for me.

    Jeff – I wondered if you could do a tutorial on regex in relation to limiting uploaded file extensions. I know you’ve done this in another screencast, but it’d be a neat addition and much needed, I reckon!

  • If you need to match a domain like .com.au or .co.uk, this works for me

    \b[\w-]+@[\w-]+\.[a-zA-Z]{2,4}[\.a-zA-Z]*\b

  • Michael says:

    I have a fair amount of experience with regular expressions, so I feel obligated to point out to new regular expression users that you can’t validate an email address with 100% accuracy. Just about any email pattern is going to be flawed, many by a fairly large degree. A pattern that matches an email mailbox according to the specification (RFC 2822) with a high degree of accuracy is several thousand characters long and not practical to use.

    So don’t rely on any pattern of this type to be rock solid. Most are probably 60% accurate at best.

    I don’t want to hijack this tutorial, so I won’t go into details since it’s a bit advanced. But if someone really wants more details, I can provide links to more detailed information.

    Also, Jeff, I don’t know if you mentioned it, but another thing for new users to be aware of is that regular expression syntax and features don’t port across all languages that support them, so if you are working in a different language, things may work differently or not at all.

  • Thomas says:

    Michael, you are right. I faced the same problem once and I found this hilariously gigantic regular expression for matching email addresses:

    http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html

    However, I think you can get a better rate than 60% with a simple regular expression. It depends on what you actually want to achieve. In most cases you know that an existing, valid email address is placed somewhere within a string, so you don’t really have to care about strict RFC 822 conformance when applying your regex.

    @Jeffrey: there are indeed top-level domains with more than 4 characters, such as .museum:
    http://en.wikipedia.org/wiki/Template:Generic_top-level_domains
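Thomas's point about longer top-level domains is easy to check with a small Python sketch. Both patterns below are simplified, hypothetical stand-ins (not the exact pattern from the video), and the sample addresses are made up for illustration.

```python
import re

# Hypothetical simplified pattern with the 2-to-4 letter extension limit.
SHORT_TLD_ONLY = r"[\w-]+@[\w-]+\.[A-Za-z]{2,4}"

# The same pattern with the upper bound removed (one illustrative alternative).
ANY_LENGTH_TLD = r"[\w-]+@[\w-]+\.[A-Za-z]{2,}"


def is_match(pattern: str, address: str) -> bool:
    """Return True only when the whole address matches the pattern."""
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    for address in ("user@example.com", "curator@example.museum"):
        print(address,
              is_match(SHORT_TLD_ONLY, address),
              is_match(ANY_LENGTH_TLD, address))
```

Dropping the upper bound is a trade-off rather than a fix: it lets .museum through, but it also accepts arbitrarily long strings of letters as an extension.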

  • Yoosuf says:

    I have no comments; it’s a good series.

  • Mike says:

    Thanks, it helps me a lot. I’m already using it in my small project.

  • Mike says:

    Is the Diving into PHP series all over?

  • Michael says:

    @Thomas
    I wasn’t saying you couldn’t do better than 60%, which is a total guesstimate on my part; I was saying that most, not all, of the patterns I’ve seen don’t do a much better job. In practice you may encounter a high percentage of matches that is more dumb luck than correctness. Even without strict compliance (ignoring things like whitespace and comments), many email-matching patterns leave out allowable characters and will reject perfectly valid addresses. Plus, many don’t allow for allowable syntax. Just glancing at the patterns on this page, only Andrew’s would accept John.Q.Public@example.com, which is valid, but that same pattern would also accept J...n@example.com, which isn’t. It looks like all the other patterns here would reject both, not just the bad one.

    I’m not suggesting anyone should try to write a pattern to match the RFC (in fact, having done it, I recommend that you don’t), but just be aware there will be holes in your pattern. The simpler the pattern, the more holes there will be. I’ve often recommended not using a regular expression to match email unless you are OK with a noticeable potential failure rate of at least 20%.
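Michael's comparison can be reproduced with a short Python sketch. Two assumptions are worth flagging: Andrew's character classes are reordered (hyphen moved to the end) because Python's `re` module rejects `[\w-\.]` as a bad range, and the invalid address is taken to contain three consecutive dots. Willabee's pattern stands in for "the other patterns" here.

```python
import re

# Andrew's pattern, with each character class reordered so the hyphen comes
# last. The set of allowed characters is unchanged; only the order differs.
ANDREW = r"\b[\w.+-]+@[\w.-]+\.[A-Za-z_-]{2,4}\b"

# Willabee's pattern, used as one representative of the other patterns.
WILLABEE = r"\b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b"

VALID = "John.Q.Public@example.com"
INVALID = "J...n@example.com"  # assumed: three consecutive dots in the local part


def accepts(pattern: str, address: str) -> bool:
    """Return True only when the pattern matches the entire address."""
    return re.fullmatch(pattern, address) is not None


if __name__ == "__main__":
    for name, pattern in (("Andrew", ANDREW), ("Willabee", WILLABEE)):
        print(name, accepts(pattern, VALID), accepts(pattern, INVALID))
```

Whole-string matching is used on purpose. With a substring search, Willabee's pattern would still find Public@example.com inside the valid address, which is not the same as accepting it.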

  • Kevin says:

    Great idea for a series, just great.

    There are some government domains in Ontario that even have triple-nested extensions, primarily the Government of Ontario, Canada, whose domain names follow the schema ‘http://WEBSITE.gov.on.ca’, and so do Ontario government email addresses, e.g. politician@health.gov.on.ca

    Knowing this, I suppose I could just add a secondary regex to check for gov.on.ca after the first regex fails.

    Regards,

    Kevin
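Kevin's fallback idea might look roughly like the Python sketch below. The primary pattern is a hypothetical single-extension check (the comment does not say which "first regex" is meant), and the function name `looks_valid` is made up for this example.

```python
import re

# Hypothetical primary pattern: a single two-to-four letter extension.
PRIMARY = re.compile(r"[\w-]+@[\w-]+\.[A-Za-z]{2,4}")

# Secondary pattern, consulted only when the primary one fails:
# any single-label subdomain of gov.on.ca.
GOV_ON_CA = re.compile(r"[\w-]+@[\w-]+\.gov\.on\.ca", re.IGNORECASE)


def looks_valid(address: str) -> bool:
    """Try the primary pattern first, then fall back to the gov.on.ca check."""
    if PRIMARY.fullmatch(address):
        return True
    return GOV_ON_CA.fullmatch(address) is not None


if __name__ == "__main__":
    for address in ("user@example.com",
                    "politician@health.gov.on.ca",
                    "politician@health.gov.on.uk"):
        print(address, looks_valid(address))
```

Special-casing one domain keeps the primary pattern simple, but every additional nested extension needs its own fallback, so this only scales to a handful of known domains.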

  • damon medic says:

    Fantastic series, Jeff!

    Damon Medic

  • Hey Jeff, great series, thanks.

    I have used this regex and it’s working great:

    ^[A-Za-z0-9-_]+@[a-zA-Z0-9-_]+\.[A-Za-z]{2,4}(\.[A-Za-z]{2,4})?$

  • Yosy says:

    Great series, Jeffrey!
    I have a question: in a regular expression, can I do something like an if statement?
    For example:

    if we have a-zA-Z after the word something then
    find *.!
    else
    find $%^

    Is this possible?

  • David Singer says:

    Email: /^[a-zA-Z0-9\._%\-]+@[a-zA-Z0-9\.\-]+\.[a-zA-Z]{2,6}$/
    URL: /^((http(s?):\/\/)?)[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix

  • Saber says:

    For Email:
    /^\w+[\+\.\w-]*@([\w-]+\.)*\w+[\w-]*\.([a-z]{2,4}|\d+)$/i

  • TR says:

    Regular expression examples for numeric input

    Positive Integers — ^\d+$
    Negative Integers — ^-\d+$
    Integer — ^-{0,1}\d+$
    Positive Number — ^\d*\.{0,1}\d+$
    Negative Number — ^-\d*\.{0,1}\d+$
    Positive Number or Negative Number – ^-{0,1}\d*\.{0,1}\d+$
    Phone number — ^\+?[\d\s]{3,}$
    Phone with code — ^\+?[\d\s]+\(?[\d\s]{10,}$
    Year 1900-2099 — ^(19|20)[\d]{2,2}$
    Date (dd mm yyyy, d/m/yyyy, etc.) — ^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$
    IP v4 — ^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]){3}$

    Regular expression examples for Alphabetic input

    Personal Name — ^[\w\.\']{2,}([\s][\w\.\']{2,})+$
    Username — ^[\w\d\_\.]{4,}$
    Password at least 6 symbols — ^.{6,}$
    Password or empty input — ^.{6,}$|^$
    email — ^[\_]*([a-z0-9]+(\.|\_*)?)+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
    domain — ^([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$

    Other regular expressions
    Match no input — ^$
    Match blank input — ^\s[\t]*$
    Match New line — [\r\n]|$
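One pattern in TR's list is worth testing before reuse. As posted, the IPv4 pattern has its dot outside the repeated group, so it expects one octet, one dot, then three octets with no dots between them. The Python sketch below checks that reading and tries one possible repair; the repaired pattern and the `OCTET` helper name are assumptions about the intent, not something TR posted.

```python
import re

# One octet, 0 to 255, using the same alternatives as the posted pattern.
OCTET = r"(?:\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"

# The IPv4 pattern exactly as it appears in the comment.
AS_POSTED = (r"^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])"
             r"\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5]){3}$")

# Possible repair (an assumption): repeat "dot followed by octet" three times.
REPAIRED = rf"^{OCTET}(?:\.{OCTET}){{3}}$"


def matches(pattern: str, text: str) -> bool:
    """Return True when the anchored pattern matches the text."""
    return re.search(pattern, text) is not None


if __name__ == "__main__":
    for text in ("192.168.1.1", "1.234", "256.1.1.1"):
        print(text, matches(AS_POSTED, text), matches(REPAIRED, text))
```

The same habit applies to the other patterns in the list: run each one against a few addresses or numbers you know should pass and a few you know should fail before relying on it.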

  • nathan says:

    The day 3 email does not seem to be loading.

  • nathan says:

    I meant the day 3 video

  • Hugo says:

    For everyone looking for the best way to validate an email address in PHP:
    http://www.linuxjournal.com/article/9585

    This is the best way I’ve seen so far.

    But that was off-topic. Nice tut, Jeffrey.

  • Ahmad Alfy says:

    @Willabee
    Thanks, your approach is better than mine :)
    [\w-_.]+@+[\w-]+\.+([a-zA-z]{2,4}+\.[a-zA-z]{2,3}|[a-zA-z]{2,4})
    to match domains with country extensions!
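A side note on the `[a-zA-z]` class in the pattern above: the range `A-z` (lowercase z) runs through the ASCII characters that sit between the uppercase and lowercase letters, so it matches more than letters. The Python sketch below lists the extras; the helper name `extra_characters` is made up for this example.

```python
import re

# As written in the comment: A-z spans ASCII 65 to 122, which includes the
# six punctuation characters between "Z" and "a".
WIDE_CLASS = re.compile(r"[a-zA-z]")

# Letters only, for comparison.
LETTERS_ONLY = re.compile(r"[a-zA-Z]")


def extra_characters() -> list:
    """Return the ASCII characters matched by [a-zA-z] but not by [a-zA-Z]."""
    return [chr(code) for code in range(128)
            if WIDE_CLASS.fullmatch(chr(code))
            and not LETTERS_ONLY.fullmatch(chr(code))]


if __name__ == "__main__":
    print(extra_characters())
```

If only letters are intended in the extension, writing `A-Z` with an uppercase Z closes that gap.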