In this tutorial, I will dissect the components of a regular expression that checks whether a user-input string is a valid email address. My understanding of a regular expression is that it is a search pattern that can be used to find, replace, or exclude characters in any given string.
What Is a Regex?
A regex, which is short for regular expression, is a sequence of characters that defines a specific search pattern. When included in code or search algorithms, regular expressions can be used to find certain patterns of characters within a string, or to find and replace a character or sequence of characters within a string. They are also frequently used to validate input.
According to the definition of Regular Expressions on MDN Web Docs:
Regular expressions are patterns used to match character combinations in strings. In JavaScript, regular expressions are also objects. These patterns are used with the exec() and test() methods of RegExp, and with the match(), matchAll(), replace(), replaceAll(), search(), and split() methods of String.
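To make the quote concrete, here is a minimal JavaScript sketch that calls `test()` and `exec()` on a RegExp and `match()` and `replace()` on a String. The pattern `/\d+/` and the sample sentence are made up for illustration.

```javascript
// Hypothetical sample text used only for this demo.
const text = "Order 66 shipped in 3 boxes";
const digits = /\d+/g; // one or more digits; g = find every match

// RegExp.prototype.test() answers "is there a match?" with true/false.
const hasDigits = /\d/.test(text);

// String.prototype.match() with the g flag returns an array of all matches.
const allNumbers = text.match(digits);

// RegExp.prototype.exec() (no g flag) returns the first match and its index.
const first = /\d+/.exec(text);

// String.prototype.replace() with the g flag substitutes every match.
const masked = text.replace(digits, "#");

console.log(hasDigits);             // true
console.log(allNumbers);            // [ '66', '3' ]
console.log(first[0], first.index); // 66 6
console.log(masked);                // Order # shipped in # boxes
```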
The following regular expression can be used to verify that user input is a valid email address:
/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/
Each component of this regex has its own responsibility in making sure that the user enters an email address that begins with one or more USERNAME characters before the @ symbol, is followed by a DOMAIN, and ends with a valid EXTENSION. This tutorial covers the following regex components:
- Anchors
- Quantifiers
- OR Operator
- Character Classes
- Flags
- Grouping and Capturing
- Bracket Expressions
- Greedy and Lazy Match
- Boundaries
- Back-references
- Look-ahead and Look-behind
Let's take a deeper dive into each of the components that make up the email validation regex:
/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/
The regex has three components: 1) USERNAME, 2) DOMAIN, and 3) EXTENSION.
- USERNAME
([a-z0-9_.-]+)
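The first component validates the username part of the email address (e.g., Min_kk, john.Doe, james-007). Because [a-z] only covers lowercase letters, usernames with capitals such as Min_kk or john.Doe match only if the input is lowercased first or the i (case-insensitive) flag is added.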
- DOMAIN
@([\da-z.-]+)
The second component validates the domain of the email server (e.g., gmail, yahoo, protonmail, nike, adidas).
- EXTENSION
([a-z.]{2,6})
The third component validates the extension of the email address (e.g., .com, .org, .edu, .gov, .co.uk). The literal dot " \. " in front of this group is what separates the extension from the domain.
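Putting the three components together, a minimal sketch of the check in JavaScript could look like the following. It assumes the dot before EXTENSION is escaped as `\.` and uses `RegExp.prototype.test()`. The sample addresses are hypothetical.

```javascript
// Email validation regex from this tutorial (dot before EXTENSION escaped as \.).
const emailRegex = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;

// Hypothetical sample inputs.
const samples = [
  "james-007@gmail.com",  // valid
  "john.doe@yahoo.co.uk", // valid: DOMAIN may itself contain dots
  "john.Doe@yahoo.com",   // invalid: uppercase D is outside [a-z]
  "min_kk@protonmail",    // invalid: no dot and EXTENSION
  "min kk@nike.com",      // invalid: a space is not allowed in USERNAME
];

// test() returns true only when the whole string fits the anchored pattern.
const results = samples.map((input) => emailRegex.test(input));
samples.forEach((input, i) => console.log(input, results[i]));
```

In a real form, this check would typically run before submission. The regex only verifies the format of the address; it does not confirm that the mailbox exists.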
Anchors
/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/
There are two anchors in the email validation regex: " ^ " at the start of the pattern and " $ " at the end:
/^
The " ^ " anchor matches the position at the very start of the string, so the first component, USERNAME, must begin at the first character of the input. Any leading character that does not satisfy the first set of parentheses " ([a-z0-9_.-]+) " (for example, a space) makes the match fail, so the input can be rejected as invalid.
$/
The " $ " anchor matches the position at the very end of the string, so the last component, EXTENSION, must end at the last character of the input. Together with " ^ ", this means the whole input, not just a substring of it, must consist of USERNAME, " @ ", DOMAIN, a dot, and EXTENSION. If the input does not meet the requirements of the second and third sets of parentheses " @([\da-z.-]+) " and " ([a-z.]{2,6}) ", or if extra characters follow the extension, the match fails and the input can be rejected.
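To see why both anchors matter, the sketch below runs the same pattern with and without `^` and `$` against a hypothetical input that has extra text around an otherwise valid address.

```javascript
const anchored = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;
// Same pattern with the ^ and $ anchors removed.
const unanchored = /([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})/;

// Hypothetical input: a valid address surrounded by other characters.
const input = "Email me: min_kk@gmail.com!";

console.log(unanchored.test(input)); // true: a matching substring is enough
console.log(anchored.test(input));   // false: the whole string must match
```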
Quantifiers
Quantifiers specify how many times the token before them must occur. In this regex, each quantifier is attached to a bracket expression inside one of the sets of parentheses: [...]+ and [...]{2,6}.
/^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/
Two quantifiers are used to validate the email string: " + " (in USERNAME and DOMAIN) and " {2,6} " (in EXTENSION).
[a-z0-9_.-]+
The " + " quantifier matches one or more characters from the bracket expression in the USERNAME group: lowercase letters a-z, digits 0-9, and the special characters _ . -.
[a-z.]{2,6}
The " {2,6} " quantifier requires the preceding bracket expression in the EXTENSION group to match at least 2 and at most 6 characters.
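A quick way to see both quantifiers at work is to test the bracket expressions on their own. The sketch below anchors each piece with `^` and `$` so it must match the entire test string. The sample strings are hypothetical.

```javascript
const username = /^[a-z0-9_.-]+$/; // + : one or more characters
const extension = /^[a-z.]{2,6}$/; // {2,6} : between 2 and 6 characters

console.log(username.test(""));            // false: + needs at least one character
console.log(username.test("j"));           // true
console.log(extension.test("c"));          // false: fewer than 2
console.log(extension.test("com"));        // true
console.log(extension.test("museum"));     // true: exactly 6
console.log(extension.test("technology")); // false: more than 6
```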
OR Operator
Our email validation regex does not use the " | " OR operator, but in general, " | " matches either the expression before it or the expression after it.
According to JavaScript.info:
Alternation is the term in regular expression that is actually a simple “OR”. In a regular expression it is denoted with a vertical line character |. For instance, we need to find programming languages: HTML, PHP, Java or JavaScript. The corresponding regexp: html|php|java(script)?.
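Here is a short sketch of the JavaScript.info pattern in action. The `g` (all matches) and `i` (ignore case) flags are added here as an assumption so the sample sentence, which is made up, can use normal capitalization.

```javascript
// html OR php OR java, optionally followed by "script".
const languages = /html|php|java(script)?/gi;

const found = "I write HTML, PHP and JavaScript, but not Java applets".match(languages);
console.log(found); // [ 'HTML', 'PHP', 'JavaScript', 'Java' ]
```

Because `(script)?` is optional and greedy, "JavaScript" is matched whole, while plain "Java" still matches through the first alternative's shorter form.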
Character Classes
@([\da-z.-]+)\.
In our email validation regex, the character class " \d " appears inside the second set of parentheses, DOMAIN. " \d " matches a single digit character (0-9), while the uppercase " \D " matches any single character that is not a digit. The escaped " \. " right after DOMAIN matches a literal period. Without the backslash, " . " is itself a character class that matches any single character except a line terminator.
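The sketch below contrasts `\d` with `\D`. It also shows why the dot before EXTENSION should be escaped: with a bare `.`, a hypothetical malformed address with no dot at all still passes.

```javascript
const digit = /^\d$/;    // exactly one digit (0-9)
const nonDigit = /^\D$/; // exactly one character that is NOT a digit

console.log(digit.test("7"), digit.test("a"));       // true false
console.log(nonDigit.test("a"), nonDigit.test("7")); // true false

// Escaped vs unescaped dot between DOMAIN and EXTENSION.
const escapedDot = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;
const unescapedDot = /^([a-z0-9_.-]+)@([\da-z.-]+).([a-z.]{2,6})$/;

// Hypothetical malformed input: there is no dot after "gmail".
console.log(escapedDot.test("min_kk@gmailxcom"));   // false
console.log(unescapedDot.test("min_kk@gmailxcom")); // true: bare . matches any character
```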
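Flags
Flags are optional settings, written after the closing slash of a regex literal (or passed as the second argument to the RegExp constructor), that change how the whole pattern is searched. For example, i makes matching case-insensitive, g finds all matches instead of stopping at the first, and m turns on multiline mode. Our email validation regex does not use any flags, and the anchors " ^ " and " $ " are not flags themselves. The m flag does, however, change how they behave: in multiline mode, " ^ " and " $ " also match at the start and end of each line, instead of only at the start and end of the whole input string.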
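As a sketch of what the `i` and `m` flags would change if they were added to our regex (the tutorial's pattern itself uses neither), consider the following. The two-line input is hypothetical.

```javascript
const emailRegex = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;
// Same pattern with the i flag: [a-z] now also accepts uppercase letters.
const emailRegexI = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/i;

console.log(emailRegex.test("john.Doe@yahoo.com"));  // false
console.log(emailRegexI.test("john.Doe@yahoo.com")); // true

// Hypothetical input with one address per line.
const twoLines = "min_kk@gmail.com\njames-007@yahoo.com";
console.log(emailRegex.test(twoLines)); // false: ^...$ must span the whole string

// With g and m, ^ and $ match at each line, so both addresses are found.
const emailRegexGM = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/gm;
console.log(twoLines.match(emailRegexGM)); // [ 'min_kk@gmail.com', 'james-007@yahoo.com' ]
```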
Grouping and Capturing
As described in the component breakdown above, our email validation regex contains three groups:
([a-z0-9_.-]+)
the first set of parentheses groups USERNAME,
([\da-z.-]+)
the second set of parentheses groups DOMAIN,
([a-z.]{2,6})
and the third set of parentheses groups EXTENSION.
Parentheses treat the pattern inside them as a single unit. Because these are capturing groups, the regex engine also remembers the text each one matched, so it can be read back after a successful match. According to MDN Web Docs:
Groups group multiple patterns as a whole, and capturing groups provide extra submatch information when using a regular expression pattern to match against a string. Backreferences refer to a previously captured group in the same regular expression.
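Because all three groups capture, a successful `exec()` call hands back each component separately. Here is a minimal sketch using a hypothetical address.

```javascript
const emailRegex = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;

// exec() returns null on failure, or an array on success:
// index 0 is the whole match, indexes 1-3 are the capturing groups.
const match = emailRegex.exec("james-007@gmail.com");
if (match !== null) {
  const [, username, domain, extension] = match;
  console.log(username);  // james-007
  console.log(domain);    // gmail
  console.log(extension); // com
}
```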
Bracket Expressions
As the name implies, bracket expressions are wrapped in an opening " [ " and a closing " ] " bracket and define the set or range of characters that may match at that position. Our email validation regex contains three bracket expressions:
| [a-z0-9_.-] | Matches |
|---|---|
| a-z | any lowercase letter from a to z |
| 0-9 | any digit from 0 to 9 |
| _ | the special character _ |
| . | the special character . |
| - | the special character - |
| [\da-z.-] | Matches |
|---|---|
| \d | any single digit from 0 to 9 |
| a-z | any lowercase letter from a to z |
| . | the special character . |
| - | the special character - |
| [a-z.] | Matches |
|---|---|
| a-z | any lowercase letter from a to z |
| . | the special character . |
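The tables can be checked one character at a time. In this sketch, each bracket expression is anchored so it must match exactly one character. Note that inside brackets `.` is a literal period, and a `-` placed last is a literal hyphen rather than a range. The test characters are arbitrary.

```javascript
const usernameChar = /^[a-z0-9_.-]$/;
const domainChar = /^[\da-z.-]$/;
const extensionChar = /^[a-z.]$/;

const userResults = ["a", "7", "_", ".", "-", "+"].map((c) => usernameChar.test(c));
const domainResults = ["7", "_", "-"].map((c) => domainChar.test(c));
const extResults = ["a", ".", "-"].map((c) => extensionChar.test(c));

console.log(userResults);   // [ true, true, true, true, true, false ]
console.log(domainResults); // [ true, false, true ]  (no underscore in DOMAIN)
console.log(extResults);    // [ true, true, false ]  (no hyphen in EXTENSION)
```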
Greedy and Lazy Match
According to JavaScript.info:
In the greedy mode (by default) a quantified character is repeated as many times as possible. The regexp engine adds to the match as many characters as it can for .+, and then shortens that one by one, if the rest of the pattern doesn’t match.
Both kinds of quantifier in our regex are therefore greedy:
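- As described in the quote above, the " + " quantifier matches the preceding token one or more times, taking as many characters as possible and giving them back one at a time only when the rest of the pattern fails. In DOMAIN, for example, [\da-z.-]+ first consumes the rest of the input, including the final dot and extension, and then backs off until " \. " and EXTENSION can match.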
- The " {2,6} " quantifier is also greedy: it matches the preceding token at least 2 and at most 6 times, trying the longest possible run first.
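Appending `?` to a quantifier (for example `+?`) makes it lazy, so it matches as few characters as possible. Our regex does not use lazy quantifiers. The sketch below swaps in a hypothetical lazy DOMAIN group to show how that would shift where DOMAIN ends and EXTENSION begins for a multi-part address.

```javascript
const greedyEmail = /^([a-z0-9_.-]+)@([\da-z.-]+)\.([a-z.]{2,6})$/;
// Hypothetical variant: DOMAIN made lazy with +?
const lazyEmail = /^([a-z0-9_.-]+)@([\da-z.-]+?)\.([a-z.]{2,6})$/;

const address = "john@mail.co.uk";
const greedyMatch = greedyEmail.exec(address);
const lazyMatch = lazyEmail.exec(address);

console.log(greedyMatch[2], greedyMatch[3]); // mail.co uk
console.log(lazyMatch[2], lazyMatch[3]);     // mail co.uk
```

Both versions accept the same address. Only the split between the DOMAIN and EXTENSION groups changes, which matters if the captured groups are used later.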
Boundaries
Our regex does not need boundary assertions such as the word boundary " \b ", although the " @ " character acts as a separator between USERNAME and DOMAIN.
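For reference, here is a small sketch of the word boundary `\b`, which matches the position between a word character (letters, digits, underscore) and a non-word character or the edge of the string. The pattern and inputs are illustrative only.

```javascript
// Match "com" only as a whole word, not as part of a longer word.
const wholeWord = /\bcom\b/;

const inDomain = wholeWord.test("gmail.com"); // "." before, end of string after
const inWord = wholeWord.test("computer");    // "p" follows "com", so no boundary

console.log(inDomain); // true
console.log(inWord);   // false
```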
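Back-references
Likewise, our email validation regex does not need back-references. A back-reference such as " \1 " matches the exact text that an earlier capturing group already matched, which is useful for tasks like finding a repeated word or pairing an HTML opening tag with its closing tag.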
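The two uses mentioned above can be sketched as follows. Both patterns and inputs are illustrative and not part of the email regex.

```javascript
// (\w+) captures a word; \1 must then match that same word again.
const repeatedWord = /\b(\w+) \1\b/;
const dup = repeatedWord.exec("please check the the email address");
console.log(dup[0]); // the the

// ([a-z]+) captures a tag name; <\/\1> requires the same name in the closing tag.
const pairedTag = /<([a-z]+)>.*?<\/\1>/;
console.log(pairedTag.test("<b>bold</b>")); // true
console.log(pairedTag.test("<b>bold</i>")); // false
```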
Look-ahead and Look-behind
According to JavaScript.info:
Lookahead and lookbehind (commonly referred to as “lookaround”) are useful when we’d like to match something depending on the context before/after it. For simple regexps we can do the similar thing manually. That is: match everything, in any context, and then filter by context in the loop. The syntax is: Positive lookbehind: (?<=Y)X, matches X, but only if there’s Y before it. Negative lookbehind: (?<!Y)X, matches X, but only if there’s no Y before it.
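The quote spells out only the lookbehind syntax. Lookahead is written `X(?=Y)` (positive) and `X(?!Y)` (negative). The sketch below applies both directions to a hypothetical address. The "no leading dot" rule is an illustrative extra check, not part of this tutorial's regex.

```javascript
const email = "min_kk@gmail.com";

// Lookahead X(?=Y): USERNAME characters, but only if "@" comes next.
const beforeAt = email.match(/[a-z0-9_.-]+(?=@)/)[0];

// Lookbehind (?<=Y)X: DOMAIN characters, but only if "@" comes before them.
const afterAt = email.match(/(?<=@)[\da-z.-]+/)[0];

// Hypothetical extra rule with a negative lookahead (?!Y):
// the address must not start with a dot.
const noLeadingDot = /^(?!\.)[a-z0-9_.-]+@([\da-z.-]+)\.([a-z.]{2,6})$/;

console.log(beforeAt); // min_kk
console.log(afterAt);  // gmail.com
console.log(noLeadingDot.test("min_kk@gmail.com"));  // true
console.log(noLeadingDot.test(".min_kk@gmail.com")); // false
```

The lookarounds check their context without consuming it, which is why "@" does not appear in either extracted string.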
Author
GitHub Username: min-hinthar
GitHub URL: https://github.com/min-hinthar