One of the basic things you want to do when building a site where people can sign up is validate an email address. This is not as straightforward as it sounds.
There are various ways to validate an address. The first line of defense is the composition of the address. Obviously, an address must have an @ and at least one dot (.). But we can be more restrictive. There are so many regexes out there, each more complex than the last. I don't care though.
My method only checks for basic validity (*@*.*) and excludes certain characters which are practically never part of a real-world email address but could be used for email header injection (%\;). The \ is used to insert escape characters, the % is used to insert encoded escape characters, and the ; is used to enter multiple addresses. For the excluded characters we end up with this:
^.*[%\\;]+.*$
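To see what that exclusion pattern catches, here is a minimal sketch. The helper name hasBlockedChars() is made up for illustration, and the pattern is a simplified, unanchored form of the one above. Note how the backslash has to be written in PHP source so that the regex engine really receives an escaped backslash inside the character class.

```php
<?php
// Sketch only: hasBlockedChars() is a hypothetical helper, not from the post.
function hasBlockedChars(string $input): bool
{
    // Single-quoted PHP string: '\\\\' reaches the regex engine as \\,
    // which inside a character class means one literal backslash.
    return preg_match('/[%\\\\;]/', $input) === 1;
}

var_dump(hasBlockedChars('user@example.com'));             // bool(false)
var_dump(hasBlockedChars('a@example.com;b@example.com'));  // bool(true)
var_dump(hasBlockedChars('user%0A@example.com'));          // bool(true)
var_dump(hasBlockedChars('user\\n@example.com'));          // bool(true), literal backslash
```

Writing only two backslashes in the PHP string would hand the regex engine `[%\;]`, which matches % and ; but lets a backslash through.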
When the regex tells us the structure is ok, we take the domain part of the email and look it up in DNS. If a record is found, we assume it's ok. For safety reasons you should always validate an email address by sending an activation link to that address, but this check can spare you a lot of bounces in the first place...
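The domain step can be sketched on its own like this. Both function names are hypothetical, and the fallback from MX to A records is my assumption about what counts as "found", not something the check strictly needs. The lookup itself requires a working resolver, so only the domain extraction is deterministic.

```php
<?php
// Sketch only: emailDomain() and domainLooksDeliverable() are hypothetical names.
function emailDomain(string $email): ?string
{
    $at = strrpos($email, '@');  // position of the last "@"
    if ($at === false || $at === strlen($email) - 1) {
        return null;             // no "@" at all, or nothing after it
    }
    return substr($email, $at + 1);
}

function domainLooksDeliverable(string $domain): bool
{
    // Assumption: accept a domain that has MX records, or fall back to an
    // A record, since mail can go to the host itself when no MX exists.
    return checkdnsrr($domain, 'MX') || checkdnsrr($domain, 'A');
}
```

Taking everything after the last @ (rather than the first) keeps the domain intact even if the part before it contains an @ of its own.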
It's _that_ simple. So how does this all work? Like this:
// 0 = ok, 1 = did not pass regex, 2 = invalid/unresolved domain
function isInvalidEmail($strEmail) {
    // basically allows any address in the form of user@(sub.)domain.tld,
    // but user/sub/domain/tld are not checked explicitly
    $regex1 = "^.+@(.+\.)+.{2,}$";
    // these are commonly abused and are not part of any "valid email";
    // four backslashes in PHP give the regex one literal backslash to match
    $regex2 = "^.*[%\\\\;]+.*$";
    // check composition of email
    // (D modifier: $ only matches at the very end, not before a trailing newline)
    if (!preg_match("/$regex1/iD", $strEmail) || preg_match("/$regex2/i", $strEmail)) {
        // invalid composition
        return 1;
    }
    // check if domain exists by looking at its dns
    // (checkdnsrr() looks for MX records by default)
    // alternatively, you can use getmxrr()
    // strrpos(): the domain is whatever follows the last "@"
    if (!checkdnsrr(substr($strEmail, strrpos($strEmail, "@") + 1))) {
        // invalid domain (or unable to resolve...)
        return 2;
    }
    // ok!
    return 0;
}
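Since the function returns a code instead of a boolean, you will probably want to turn that code into something readable. A small sketch, where describeEmailCheck() and its messages are made up for illustration:

```php
<?php
// Sketch only: describeEmailCheck() is a hypothetical helper that turns
// the 0/1/2 result of isInvalidEmail() into a message.
function describeEmailCheck(int $code): string
{
    switch ($code) {
        case 0:
            return 'ok';
        case 1:
            return 'address is malformed or contains blocked characters';
        case 2:
            return 'domain could not be resolved';
        default:
            return 'unknown result';
    }
}

// Usage, assuming isInvalidEmail() from above is loaded:
// echo describeEmailCheck(isInvalidEmail('user@example.com'));
```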
Hope it helps ya :)