- In computing, regular expressions (regex) are a powerful search tool that helps you find, replace, or match text in a document based on defined patterns or sequences.
- Think of regex as a Swiss Army knife for working with text, most often used for pattern matching and flexible searches over text and documents.
- It looks a bit like an alien language, but there are only about 10-20 "words" or concepts you need to understand to be effective. Check out the Table of Elements for most of the language.
Phone number regex
/\d{3}-\d{3}-\d{4}/
The above is a quick example that matches phone numbers in the XXX-XXX-XXXX format.
Reading from left to right...
- The `/ /` slashes surround regexes as a convention.
- `\d` matches a digit.
- `{3}` indicates the preceding token (a digit in this case) must be repeated exactly 3 times.
- `-` matches the literal dash character.
- Then the number blocks are repeated for another 3 and 4 digits.
- This pattern matches every phone number in that format, anywhere in the input document. All other text is ignored.
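As a rough illustration, the phone number pattern can be tried out in Python. This is only a sketch: the sample text and variable names are made up for the example, and because Python has no `/ /` literal syntax, the pattern is written as a raw string instead.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Call 555-123-4567 or 555-987-6543 before noon; extension 42 is unused."

# Same pattern as above, without the surrounding slashes.
phone_pattern = re.compile(r"\d{3}-\d{3}-\d{4}")

# findall returns every non-overlapping match; the rest of the text is ignored.
phone_numbers = phone_pattern.findall(text)
print(phone_numbers)  # ['555-123-4567', '555-987-6543']
```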
Match "gray" or "grey"
/gr[ae]y/
- `[ae]` matches any one of the characters in the brackets, exactly once.
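A small Python sketch of the same character class, using an invented sentence. Only the two listed vowels are accepted in the third position, so a misspelling such as "groy" is skipped.

```python
import re

# Hypothetical sample sentence for illustration.
sentence = "The gray cat sat next to the grey dog, ignoring the groy typo."

# [ae] accepts exactly one character: either "a" or "e".
colors = re.findall(r"gr[ae]y", sentence)
print(colors)  # ['gray', 'grey']
```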
Match "earth", "wind", OR "fire"
/(earth|wind|fire)/
- The `|` bars inside the parentheses `( | | )` separate the alternative strings.
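The alternation can be checked the same way in Python. This is a minimal sketch with a made-up input string; because the pattern contains one capture group, `re.findall` returns the text captured by that group for each match.

```python
import re

# Hypothetical input string for illustration.
words = "earth, wind, and fire; water is not on the list"

# Each alternative between the bars is tried at every position.
elements = re.findall(r"(earth|wind|fire)", words)
print(elements)  # ['earth', 'wind', 'fire']
```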
Match everything up to "The End."
/^.*The End\./
- `^` anchors the match at the start of the line.
- `.*` matches any characters (except newlines, by default), any number of times.
- The match continues until the last instance of `The End` is reached.
- `\.` distinguishes the literal dot (`.`) from the "match anything" dot, so that the pattern doesn't match "The End!" with an exclamation point, for example.
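To see the "last instance" behavior concretely, here is a short Python sketch. The sample lines are assumptions made up for the demonstration; the pattern is the one shown above.

```python
import re

pattern = re.compile(r"^.*The End\.")

# Hypothetical line containing "The End." twice.
line = "Act one. The End. Encore! The End. Credits roll"

# The greedy .* runs as far as it can, so the match stops at the LAST "The End."
matched = pattern.search(line).group(0)
print(matched)  # Act one. The End. Encore! The End.

# The escaped dot requires a literal ".", so an exclamation point does not match.
no_match = pattern.search("Is this The End!")
print(no_match)  # None
```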
Tip | Name | Details |
---|---|---|
Use Regex101 for testing & development. | Regex Testing | Use a tool like Regex101 that allows for immediate feedback while learning and experimenting with regex patterns. |
Test iteratively | Iterative Development | Build and validate your regex incrementally, testing each section before adding more complexity. |

Element | Description | Example |
---|---|---|
Characters | | |
`\d` | Digit (0-9) | `\d{2}` matches "12" |
`\w` | Word character (letter, digit, or underscore) | `\w+` matches "Hello" |
`\s` | Whitespace (space, tab, newline) | `\s` matches " " |
`[abc]` | Match a, b, or c | `[AaBb]+` matches "Baba" |
`[A-Z]` | Uppercase characters | `[A-Z]+` matches "HELLO" |
`[^A-Z]` | Anything but uppercase | `[^A-Z]+` matches "hello" |
`.` | Any character | `.` matches "a" |
Quantifiers | | |
`.*` | Match 0 or more times (anything, unlimited) | `.*` matches "Hello 123" |
`.?` | Match 0-1 times | `a?` matches "a" or "" |
`.+` | Match 1+ times | `a+` matches "aaa" |
`.{3}` | Match exactly 3 times | `a.{3}` matches "abbb" |
`.{2,4}` | Match between 2-4 times | `a.{2,4}` matches "azzzz" |
`.{2,}` | Match 2 or more times | `a.{2,}` matches "azxzxz" |
Capture Groups | | |
`(\w+)` | Save text into numbered capture group | `(\w+)` matches "word" and captures "word" |
`(?<name>\w+)` | Save text into "name" capture group | `(?<name>\w+)` captures "word" as "name" |
`(a\|b)` | Match either a or b (and capture) | `(abc\|def)` matches "abc" or "def" and captures it |
`(?:...)` | Group without saving into a capture group | `(?:a\|b)` matches "a" or "b" but doesn't capture |
Lookarounds | | |
`(?=...)` | Positive lookahead | `x(?=y)` matches "x" only if "x" is followed by "y" |
`(?!...)` | Negative lookahead | `x(?!y)` matches "x" only if "x" is not followed by "y" |
`(?<=...)` | Positive lookbehind | `(?<=x)y` matches "y" only if "y" is preceded by "x" |
`(?<!...)` | Negative lookbehind | `(?<!x)y` matches "y" only if "y" is not preceded by "x" |
Predefined Classes | | |
`\D` | Not a digit | `\D+` matches "abc" |
`\W` | Not a word character | `\W+` matches "@" |
`\S` | Not whitespace | `\S+` matches "hello" |
Special Characters | | |
`\t` | Tab | `\t` matches a tab character |
`\r` | Carriage return | `\r` matches a carriage return |
`\n` | Newline | `\n` matches a newline character |
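
A few of the table entries, sketched in Python with invented sample strings. One assumption to flag: Python's `re` module spells named groups `(?P<name>...)` rather than the `(?<name>...)` form listed in the table, so the named-group line below uses the Python spelling.

```python
import re

# Capture groups: one named (Python spelling) and one numbered.
m = re.search(r"(?P<user>\w+)@(\w+)", "contact: alice@example")
user = m.group("user")   # 'alice'
domain = m.group(2)      # 'example' (the named group is also group 1)

# Positive lookahead: digits only when followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", "10 USD, 20 EUR, 30 USD")
print(usd_amounts)  # ['10', '30']

# Positive lookbehind: word characters only when preceded by "#".
tags = re.findall(r"(?<=#)\w+", "#regex and #python")
print(tags)  # ['regex', 'python']
```

The lookarounds check the surrounding text without including it in the match, which is why " USD" and "#" are absent from the results.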

Tip | Name | Details |
---|---|---|
`.*?` instead of `.*` | Non-Greedy Matching | Add `?` after `+`, `*`, or `{}` to make the quantifier non-greedy, so it stops at the first possible match rather than the last. |
`\` (before `.`, `(`, `)`, `?`, etc.) | Escape Special Characters | Use `\` before special characters when you want to match them literally, for more precise matches. |
`^` and `$` | Anchoring | Use `^` to match the start and `$` to match the end of a line (or of the whole input when multiline mode is off), preventing unexpected matches elsewhere. |
`(?#comment)` | Comments | Add inline comments within your regex to explain complex sections and improve readability. Supported in PCRE and Python, but not in JavaScript. |
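
The tips above can be compared side by side in Python. This is a sketch with made-up inputs, not a definitive recipe; each pair shows the same input with and without the tip applied.

```python
import re

# Hypothetical markup snippet for illustration.
snippet = "<b>bold</b> and <i>italic</i>"

# Greedy: .* runs to the last ">" on the line, giving one long match.
greedy = re.findall(r"<.*>", snippet)
# Non-greedy: .*? stops at the first ">" each time.
lazy = re.findall(r"<.*?>", snippet)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Escaping: an unescaped dot matches any character, an escaped one only ".".
unescaped_hit = re.search(r"3.14", "3x14") is not None   # True
escaped_hit = re.search(r"3\.14", "3x14") is not None    # False

# Anchoring: ^ and $ force the whole input to be exactly three digits.
unanchored_hit = re.search(r"\d{3}", "a123b") is not None   # True
anchored_hit = re.search(r"^\d{3}$", "a123b") is not None   # False

# Inline comments are ignored by the engine (Python supports this syntax).
commented_hit = re.search(r"\d{3}(?#prefix)-\d{4}(?#line number)", "555-1234") is not None
```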
- Email Matching: `/^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/`
- URL Matching: `/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/`
- Date in YYYY-MM-DD (years 1900-2019 only): `/\b(19[0-9]{2}|200[0-9]|201[0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])\b/`
- Extracting Hashtags: `/#\w+/`
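
As a hedged sketch, three of these patterns can be exercised in Python as follows. The constant names and sample inputs are hypothetical, chosen only for this example. The patterns are copied from the list above without the surrounding slashes. They are simple approximations rather than full validators; the date pattern, for instance, only accepts years 1900 through 2019.

```python
import re

# Hypothetical constant names; the patterns themselves come from the list above.
EMAIL = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
DATE = re.compile(r"\b(19[0-9]{2}|200[0-9]|201[0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])\b")
HASHTAG = re.compile(r"#\w+")

email_ok = EMAIL.match("jane.doe@example.com") is not None   # True
email_bad = EMAIL.match("not an email") is not None          # False

# The three capture groups hold year, month, and day.
date_parts = DATE.search("Released on 2015-07-04.").groups()
print(date_parts)  # ('2015', '07', '04')

# Years after 2019 fall outside the pattern's alternatives.
date_out_of_range = DATE.search("2024-01-01")
print(date_out_of_range)  # None

hashtags = HASHTAG.findall("Learning #regex with #python3!")
print(hashtags)  # ['#regex', '#python3']
```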
- RegExr - Another powerful tool for learning and testing regex.
- Regular-Expressions.info - Comprehensive resource for learning regex.
- Mozilla Developer Network (MDN) - For a solid understanding of regex in JavaScript.