Regex for Dummies: Day 3
We’re going to start reviewing real-world examples. We’ll begin by learning the correct syntax for matching an email address. On day four, we’ll take things a step further and implement this code in a PHP and JavaScript app.
Day 3: Validating Email Addresses
Quiz
1: When validating email addresses, why is it better to use the range {2,4} to match the extension (.com, .org, etc.) rather than the simpler plus (+) symbol?
2: How would I match a block of text that is a minimum of five characters and a maximum of ten? Think about this one; there’s more to it than just the range.
3: How do we turn the ? symbol from greedy to lazy? What are the differences between the two?

Great series! But I’m a little confused… where do you use regular expressions?
There are addresses like me@wherever.co.uk too, with an extra dot that needs to be taken into consideration, nah?
Thanks for this great series. I’ve always sidestepped actually having to understand those colourful strings of letters and symbols.
The only problem I can see with the above is that it doesn’t account for double-barreled extensions such as .co.uk or .ac.uk.
There are domains with numbers as well (subdomains).
Hi Jeff!
Thanks for the great video series.
You forgot about numbers in e-mails.
For example: dan67e5@nettuts.com
Another great tutorial
Do you know any easy way to take a textarea input from a webpage then add tags with regular expressions?
Great video; to match a few extra characters, I came up with this:
\b[\w-\.\+]+@[\w-\.]+\.[A-Za-z_-]{2,4}\b
Which matches:
whatever@andrewburgess.otherinbox.com
a06p18b@gmail.com
a06p18b+regex@gmail.com
3x4mp1e@4ddr355.com.com.com
Hey Jeffrey, thanks for this awesome screencast!
@Paperboy – use:
\b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b
to test common two letter DNS addresses like my-name@someDomain.co.au
Notice the parentheses around the extension test, followed by the + symbol, which allows for one or more DNS extensions.
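As a rough illustration of the grouped extension described above, here is a minimal JavaScript sketch. The pattern is copied from the comment; the helper name `acceptsWhole` and the sample addresses are assumptions made up for this example, not part of the screencast.

```javascript
// Pattern copied from the comment above; the test addresses are illustrative.
const groupedExtension = /\b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b/;

// Hypothetical helper: wraps a pattern in ^...$ so the whole string must match,
// which is what validation (as opposed to searching) needs.
function acceptsWhole(pattern, text) {
  return new RegExp("^(?:" + pattern.source + ")$", pattern.flags).test(text);
}

console.log(acceptsWhole(groupedExtension, "my-name@someDomain.co.au")); // true
console.log(acceptsWhole(groupedExtension, "me@wherever.co.uk"));        // true
console.log(acceptsWhole(groupedExtension, "me@wherever"));              // false
```

Because the group is followed by +, at least one extension is required, and each further ".xx" part is matched by another pass through the same group.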
Thanks Jeff!
I must admit, these still freak me out a bit, being historically a non-programmer, but it really helps to understand what is possible with PHP, not to mention JavaScript or jQuery integration.
I’m gonna kick ass with some of my smaller projects. I really recommend others try to either follow along or pick a sample project to utilise these; it helps for me.
Jeff – I wondered if you could do a tutorial on regex in relation to limiting uploaded file extensions. I know you’ve done this in another screencast, but it’d be a neat addition and much needed, I reckon!
If you need to match a domain like .com.au or .co.uk, this works for me:
\b[\w-]+@[\w-]+\.[a-zA-Z]{2,4}[\.a-zA-Z]*\b
I have a fair amount of experience with regular expressions, so I feel obligated to point out to new regular expression users that you can’t validate an email address with 100% accuracy. Just about any email pattern is going to be flawed, many by a fairly large degree. A pattern that matches an email mailbox according to the specification (RFC 2822) with a high degree of accuracy is several thousand characters long and not practical to use.
So don’t rely on any pattern of this type to be rock solid. Most are probably 60% accurate at best.
I don’t want to hijack this tutorial, so I won’t go into details since it’s a bit advanced. But if someone really wants more details, I can provide links to more detailed information.
Also, Jeff, I don’t know if you mentioned it, but another thing for new users to be aware of is that regular expression syntax and features don’t port across all languages that support them, so if you are working in a different language, things may work differently or not at all.
Michael, you are right. I faced the same problem once and I found this hilariously gigantic regular expression for matching email addresses:
http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html
However, I think you can get a better rate than 60% with a simple regular expression. It depends on what you actually want to achieve. In most cases you know that an existing, valid email address is placed somewhere within a string, so you don’t really have to care about strict RFC 822 conformance when applying your regex.
@Jeffrey: there are indeed top-level domains with more than 4 characters, such as .museum:
http://en.wikipedia.org/wiki/Template:Generic_top-level_domains
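To make the .museum point concrete, here is a small JavaScript sketch. Both patterns are simplified stand-ins written for this example (they are assumptions, not the exact pattern from the screencast); they differ only in the quantifier on the extension.

```javascript
// Hypothetical simplified patterns: identical except for the extension length.
const shortExtension = /^[\w-]+@[\w-]+\.[A-Za-z]{2,4}$/; // 2 to 4 letters
const openExtension = /^[\w-]+@[\w-]+\.[A-Za-z]{2,}$/;   // 2 or more letters

// Illustrative address on a six-letter top-level domain.
const address = "curator@example.museum";

console.log(shortExtension.test(address));          // false
console.log(openExtension.test(address));           // true
console.log(shortExtension.test("me@example.com")); // true
```

The trade-off is that an open-ended quantifier also lets through long strings that are not real extensions, so neither choice is strictly correct.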
I have no comments; it’s a good series.
Thanks, it helps me a lot, and I’m already using it in my small project.
Is the Diving into PHP series all over?
@Thomas
I wasn’t saying you couldn’t do better than 60%, which is a total guesstimation on my part; I was saying that most, not all, of the patterns I’ve seen don’t do a much better job. In practice you may encounter a high percentage of matches that is more dumb luck than correctness. Even when not following strict compliance, by ignoring things like whitespace and comments, many email-matching patterns leave out allowable characters and will reject perfectly valid addresses. Plus, many don’t allow for allowable syntax. Just glancing at the patterns on this page, only Andrew’s would accept John.Q.Public@example.com, which is valid, but that same pattern would also accept J...n@example.com, which isn’t. It looks like all the other patterns here would reject both, not just the bad one.
I’m not suggesting anyone should try to write a pattern to match the RFC; in fact, having done it, I recommend that you don’t. Just be aware there will be holes in your pattern. The simpler the pattern, the more holes there will be. I’ve often recommended not using a regular expression to match email unless you are OK with a noticeable potential failure rate of at least 20%.
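The holes described above can be checked with a short JavaScript sketch. The three patterns are copied from earlier comments in this thread; the helper `acceptsWhole` and the variable names are assumptions introduced for this example. The second address uses three consecutive dots in the local part, which is not valid.

```javascript
// Patterns copied from earlier comments in this thread.
const dotsAllowed = /\b[\w-\.\+]+@[\w-\.]+\.[A-Za-z_-]{2,4}\b/;
const groupedExtension = /\b[\w-]+@[\w-]+(\.[A-Za-z_-]{2,4})+\b/;
const trailingExtension = /\b[\w-]+@[\w-]+\.[a-zA-Z]{2,4}[\.a-zA-Z]*\b/;

// Hypothetical helper: the whole string must match, as in form validation.
function acceptsWhole(pattern, text) {
  return new RegExp("^(?:" + pattern.source + ")$", pattern.flags).test(text);
}

const valid = "John.Q.Public@example.com";
const invalid = "J...n@example.com"; // consecutive dots are not allowed

console.log(acceptsWhole(dotsAllowed, valid));       // true
console.log(acceptsWhole(dotsAllowed, invalid));     // true  (a false positive)
console.log(acceptsWhole(groupedExtension, valid));  // false (a false negative)
console.log(acceptsWhole(trailingExtension, valid)); // false (a false negative)
```

Note that without the added anchors, the last two patterns would still find "Public@example.com" as a partial match inside the valid address, which is a different kind of wrong.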
Great idea for a series, just great.
There are some government domains in Ontario that even have triple-nested extensions, primarily those of the Government of Ontario, Canada, whose domain names follow the schema ‘http://WEBSITE.gov.on.ca’, and so do Ontario government email addresses, e.g. politician@health.gov.on.ca
Knowing this, I suppose I could just add a secondary regex to check for the gov.on.ca after the first regex fails.
Regards,
Kevin
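The fallback idea in the comment above might look roughly like this in JavaScript. Both patterns and the function name `isAccepted` are assumptions written for this sketch; `primary` is only a simplified stand-in for whatever first pattern is in use.

```javascript
// Hypothetical stand-in for the first, general-purpose pattern.
const primary = /^[\w-]+@[\w-]+\.[A-Za-z]{2,4}$/;
// Hypothetical secondary pattern that only accepts the gov.on.ca suffix.
const ontario = /^[\w-]+@[\w-]+\.gov\.on\.ca$/i;

// Try the general pattern first; fall back to the special case if it fails.
function isAccepted(address) {
  return primary.test(address) || ontario.test(address);
}

console.log(isAccepted("me@example.com"));              // true
console.log(isAccepted("politician@health.gov.on.ca")); // true
console.log(isAccepted("politician@health.gov.on"));    // false
```

Each new special case needs its own pattern, so this approach gets unwieldy quickly compared with a repeated extension group.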
Fantastic series, Jeff!
Damon Medic
Hey Jeff, great series, thanks.
I have used this regex and it’s working great:
^[A-Za-z0-9-_]+@[a-zA-Z0-9-_]+\.[A-Za-z]{2,4}(\.[A-Za-z]{2,4})?$
Great series, Jeffrey!
I have a question: in a regular expression, can I do something like an if statement?
For example:
if we have a-zA-Z after the word something then
find *.!
else
find $%^
Is this possible?
Email: /^[a-zA-Z0-9\._%\-]+@[a-zA-Z0-9\.\-]+\.[a-zA-Z]{2,6}$/
URL: /^((http(s?):\/\/)?)[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(([0-9]{1,5})?\/.*)?$/ix
For Email:
/^\w+[\+\.\w-]*@([\w-]+\.)*\w+[\w-]*\.([a-z]{2,4}|\d+)$/i
Regular expression examples for decimals input
Positive Integers — ^\d+$
Negative Integers — ^-\d+$
Integer — ^-{0,1}\d+$
Positive Number — ^\d*\.{0,1}\d+$
Negative Number — ^-\d*\.{0,1}\d+$
Positive Number or Negative Number – ^-{0,1}\d*\.{0,1}\d+$
Phone number — ^\+?[\d\s]{3,}$
Phone with code — ^\+?[\d\s]+\(?[\d\s]{10,}$
Year 1900-2099 — ^(19|20)[\d]{2,2}$
Date (dd mm yyyy, d/m/yyyy, etc.) — ^([1-9]|0[1-9]|[12][0-9]|3[01])\D([1-9]|0[1-9]|1[012])\D(19[0-9][0-9]|20[0-9][0-9])$
IP v4 — ^(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])(\.(\d|[1-9]\d|1\d\d|2[0-4]\d|25[0-5])){3}$
Regular expression examples for Alphabetic input
Personal Name — ^[\w\.\']{2,}([\s][\w\.\']{2,})+$
Username — ^[\w\d\_\.]{4,}$
Password at least 6 symbols — ^.{6,}$
Password or empty input — ^.{6,}$|^$
email — ^[\_]*([a-z0-9]+(\.|\_*)?)+@([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
domain — ^([a-z][a-z0-9\-]+(\.|\-*\.))+[a-z]{2,6}$
Other regular expressions
Match no input — ^$
Match blank input — ^\s[\t]*$
Match New line — [\r\n]|$
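A few of the patterns listed above can be tried out with a short JavaScript sketch. The patterns are copied from the list; the variable names and sample inputs are assumptions chosen for this example.

```javascript
// Patterns copied from the list above.
const integer = /^-{0,1}\d+$/;
const year = /^(19|20)[\d]{2,2}$/;
const positiveNumber = /^\d*\.{0,1}\d+$/;

console.log(integer.test("42"));   // true
console.log(integer.test("-7"));   // true
console.log(integer.test("4.2"));  // false

console.log(year.test("1999"));    // true
console.log(year.test("2100"));    // false

console.log(positiveNumber.test("3.14")); // true
console.log(positiveNumber.test(".5"));   // true
console.log(positiveNumber.test("5."));   // false (digits must follow the dot)
```

The ^ and $ anchors are what turn these into validators; without them the same patterns would match a fragment inside a longer string.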
The day 3 email does not seem to be loading.
I meant the day 3 video
For everyone looking for the best way to validate an email address in PHP:
http://www.linuxjournal.com/article/9585
This is the best way I’ve seen so far.
But that was off-topic. Nice tut, Jeffrey.
@Willabee
Thanks, your approach is better than mine
[\w.-]+@[\w-]+\.([a-zA-Z]{2,4}\.[a-zA-Z]{2,3}|[a-zA-Z]{2,4})
to match domains with country extensions!