@mattvr
Created December 15, 2023 17:32

Regex

  • In computing, regular expressions (regex) are a powerful search tool that helps you find, replace, or match text in a document based on defined patterns or sequences.
  • Think of regex as a Swiss Army knife for working with text, most often used for pattern matching and flexible searches over text and documents.
  • It looks a bit like an alien language, but there are only about 10-20 "words" or concepts you need to understand to be effective. See the Table of Elements below for most of the language.

Examples

Phone number regex

/\d{3}-\d{3}-\d{4}/

This quick example matches phone numbers in the XXX-XXX-XXXX format.

Reading from left to right...

  • The / / slashes delimit the regex. They are a convention (used in JavaScript, Perl, and others) and not part of the pattern itself.
  • \d matches a digit.
  • {3} means the preceding element (a digit in this case) must occur exactly 3 times.
  • - matches the literal dash character -
  • Then the digit blocks repeat: another 3 digits, a dash, and 4 digits.
  • Because the pattern is not anchored, it finds phone numbers anywhere in the input. Text that doesn't fit the pattern is skipped.
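
As a rough sketch of the phone number pattern in use, here it is with Python's re module. Python is chosen only for illustration, the sample strings are made up, and the \b variant is an optional tightening that is not part of the original pattern.

```python
import re

# The pattern from the example, written as a Python raw string (no / / delimiters).
PHONE = re.compile(r"\d{3}-\d{3}-\d{4}")

text = "Call 555-123-4567 or 555-987-6543 before 5pm."  # made-up sample text
phones = PHONE.findall(text)
print(phones)  # ['555-123-4567', '555-987-6543']

# Caveat: without boundaries, the pattern also matches inside longer digit runs.
noisy = "Order 91555-123-456789"
loose_hits = PHONE.findall(noisy)
print(loose_hits)  # ['555-123-4567']

# Assumption: adding \b (word boundary) on both sides is one way to reject that.
STRICT_PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")
strict_hits = STRICT_PHONE.findall(noisy)
print(strict_hits)  # []
```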

Match "gray" or "grey"

/gr[ae]y/

  • The [ae] matches exactly one character from the set in the brackets: either a or e.
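
A small sketch of how the character class behaves, again using Python's re for illustration. The candidate words are made up, and fullmatch is used so that the whole word has to fit the pattern.

```python
import re

GRAY = re.compile(r"gr[ae]y")

# Made-up candidates: only the first two have exactly one of a/e between "gr" and "y".
candidates = ["gray", "grey", "groy", "graey", "gry"]
accepted = [word for word in candidates if GRAY.fullmatch(word)]
print(accepted)  # ['gray', 'grey']
```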

Match "earth", "wind", OR "fire"

/(earth|wind|fire)/

  • The | bars inside the parentheses separate the alternatives. The pattern matches any one of them.
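
The alternation can be tried out like this (a sketch in Python with made-up sample text). Because the alternatives sit inside a capture group, findall returns the captured word for each match.

```python
import re

ELEMENT = re.compile(r"(earth|wind|fire)")

text = "earth, wind, water, and fire"  # made-up sample text
found = ELEMENT.findall(text)
print(found)  # ['earth', 'wind', 'fire']

# The pattern is unanchored, so it also matches inside longer words.
inside = ELEMENT.search("a wildfire spread")
print(inside.group(1))  # fire
```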

Match everything up to "The End."

/^.*The End\./

  • The ^ anchors the match to the start of the line.
  • The .* matches any character (except a newline), zero or more times.
  • Because .* is greedy, the match extends to the last instance of The End. on the line.
  • The \. matches a literal dot (.) rather than the "match any character" dot, so the pattern doesn't match "The End!" for example.
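
To illustrate the greedy behaviour and the escaped dot, here is a minimal Python sketch. The sample line is invented, and the unescaped variant is shown only for contrast.

```python
import re

PATTERN = re.compile(r"^.*The End\.")

line = "Chapter one. The End. Or is it? The End. Credits roll"  # made-up sample
match = PATTERN.search(line)
# Greedy .* runs to the LAST "The End." on the line.
print(match.group(0))  # Chapter one. The End. Or is it? The End.

# Contrast: an unescaped dot accepts any character, including "!".
unescaped = re.search(r"^.*The End.", "The End!")
escaped = re.search(r"^.*The End\.", "The End!")
print(unescaped is not None, escaped is None)  # True True
```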

Tips

Tip                                     Name                   Details
Use Regex101 for testing & development  Regex Testing          Use a tool like Regex101 that allows for immediate feedback while learning and experimenting with regex patterns.
Test iteratively                        Iterative Development  Build and validate your regex incrementally, testing each section before adding more complexity.
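
One possible way to work iteratively is to grow the pattern piece by piece and check each stage against a known sample. The sketch below builds a simple date pattern in three steps. The step names and sample string are hypothetical, and plain assert statements stand in for whatever test setup you prefer.

```python
import re

sample = "2015-07-04"  # made-up sample input

# Step 1: just the year.
step1 = r"\d{4}"
assert re.match(step1, sample).group(0) == "2015"

# Step 2: add the month once step 1 behaves.
step2 = step1 + r"-\d{2}"
assert re.match(step2, sample).group(0) == "2015-07"

# Step 3: add the day and require the whole string to fit.
step3 = step2 + r"-\d{2}"
assert re.fullmatch(step3, sample) is not None
assert re.fullmatch(step3, "2015-7-4") is None  # single-digit fields are rejected

final_pattern = step3
print(final_pattern)
```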

Table of Elements

Element       Description                                 Example

Characters
\d            Digit (0-9)                                 \d{2} matches "12"
\w            Word character (letter, digit, underscore)  \w+ matches "Hello"
\s            Whitespace (space, tab, newline)            \s matches " "
[abc]         Match a, b, or c                            [AaBb]+ matches "Baba"
[A-Z]         Uppercase letters                           [A-Z]+ matches "HELLO"
[^A-Z]        Anything but uppercase letters              [^A-Z]+ matches "hello"
.             Any character (except newline, by default)  . matches "a"

Quantifiers (shown here after . or a)
.*            Match 0 or more times                       .* matches "Hello 123"
.?            Match 0 or 1 times                          a? matches "a" or ""
.+            Match 1 or more times                       a+ matches "aaa"
.{3}          Match exactly 3 times                       a.{3} matches "abbb"
.{2,4}        Match 2 to 4 times                          a.{2,4} matches "azzzz"
.{2,}         Match 2 or more times                       a.{2,} matches "azxzxz"

Capture Groups
(\w+)         Save text into numbered capture group       (\w+) matches "word" and captures "word"
(?<name>\w+)  Save text into "name" capture group         (?<name>\w+) captures "word" as "name"
(a|b)         Match either a or b (and capture)           (abc|def) matches "abc" or "def" and captures it
(?:...)       Group without saving into a capture group   (?:a|b) matches "a" or "b" but doesn't capture

Lookarounds
(?=...)       Positive lookahead                          x(?=y) matches "x" only if "x" is followed by "y"
(?!...)       Negative lookahead                          x(?!y) matches "x" only if "x" is not followed by "y"
(?<=...)      Positive lookbehind                         (?<=x)y matches "y" only if "y" is preceded by "x"
(?<!...)      Negative lookbehind                         (?<!x)y matches "y" only if "y" is not preceded by "x"

Predefined Classes
\D            Not a digit                                 \D+ matches "abc"
\W            Not a word character                        \W+ matches "@"
\S            Not whitespace                              \S+ matches "hello"

Special Characters
\t            Tab                                         \t matches a tab character
\r            Carriage return                             \r matches a carriage return
\n            Newline                                     \n matches a newline character
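
The capture group and lookaround rows can be tried out with a short Python sketch. All sample strings are made up. One assumption to flag: Python's re module spells named groups (?P<name>...), whereas the (?<name>...) form in the table is the spelling used by JavaScript, PCRE, and .NET.

```python
import re

# Numbered capture groups: group(1), group(2), ...
m = re.search(r"(\w+)@(\w+)", "contact: alice@example")
user, host = m.group(1), m.group(2)
print(user, host)  # alice example

# Named capture group (Python spelling: (?P<name>...)).
named = re.search(r"(?P<year>\d{4})-(?P<month>\d{2})", "released 2023-12")
print(named.group("year"), named.group("month"))  # 2023 12

# Capturing vs non-capturing: findall returns the group if there is one.
capturing = re.findall(r"(abc|def)\d", "abc1 def2 ghi3")
non_capturing = re.findall(r"(?:abc|def)\d", "abc1 def2 ghi3")
print(capturing, non_capturing)  # ['abc', 'def'] ['abc1', 'def2']

prices = "10 USD, 20 EUR, 30 USD"
# Positive lookahead: numbers followed by " USD" (the " USD" is not consumed).
usd = re.findall(r"\d+(?= USD)", prices)
# Negative lookahead: whole numbers NOT followed by " USD".
not_usd = re.findall(r"\b\d+\b(?! USD)", prices)
print(usd, not_usd)  # ['10', '30'] ['20']

order = "cost $40, qty 3"
# Positive lookbehind: digits preceded by "$".
dollars = re.findall(r"(?<=\$)\d+", order)
# Negative lookbehind: whole numbers NOT preceded by "$".
plain = re.findall(r"(?<!\$)\b\d+", order)
print(dollars, plain)  # ['40'] ['3']
```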

More

More tips

Tip                         Name                       Details
.*? instead of .*           Non-Greedy Matching        Add ? after +, *, or {} to make the quantifier non-greedy, so it stops at the first possible match rather than the last.
\ (before ., ( ), ?, etc.)  Escape Special Characters  Put \ before a special character when you want to match it literally, for more precise matches.
^ and $                     Anchoring                  Use ^ to match the start and $ to match the end of the input (or of each line in multiline mode), preventing unexpected matches elsewhere.
(?#comment)                 Comments                   Add inline comments within your regex to explain complex sections. Supported in PCRE and Python, but not in JavaScript.
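
Each of these tips can be checked with a few lines of Python. This is a sketch with made-up inputs. re.escape is a Python standard library helper and is shown here as one convenient way to escape literal text.

```python
import re

html = "<b>bold</b> and <i>italic</i>"  # made-up sample

# Greedy vs non-greedy: .* grabs as much as possible, .*? as little as possible.
greedy = re.findall(r"<.*>", html)
lazy = re.findall(r"<.*?>", html)
print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']

# Escaping: an unescaped dot matches any character, an escaped one only ".".
dot_any = re.search(r"3.14", "3x14") is not None
dot_literal = re.search(r"3\.14", "3x14") is not None
escaped_text = re.escape("3.14")  # lets the library add the backslash
print(dot_any, dot_literal, escaped_text)  # True False 3\.14

# Anchoring: ^...$ requires the whole string to be digits.
anchored = re.search(r"^\d+$", "order 12345")
unanchored = re.search(r"\d+", "order 12345")
print(anchored, unanchored.group(0))  # None 12345

# Inline comments are ignored by the engine (works in Python's re).
commented = re.fullmatch(r"\d{3}(?#prefix)-\d{4}(?#line number)", "555-1234")
print(commented is not None)  # True
```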

More examples

  1. Email Matching:

    /^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$/
    
  2. URL Matching:

    /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b([-a-zA-Z0-9@:%_\+.~#?&//=]*)/
    
  3. Date in YYYY-MM-DD (years 1900-2019 only):

    /\b(19[0-9]{2}|200[0-9]|201[0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])\b/
    
  4. Extracting Hashtags:

    /#\w+/
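
    As a quick sanity check, the four patterns above can be run through Python's re module. This is a sketch: the sample inputs are made up, and the patterns are copied unchanged (minus the / / delimiters). They are practical approximations, not full validators of the email or URL specifications.

```python
import re

EMAIL = re.compile(r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$")
URL = re.compile(
    r"https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{2,256}\.[a-z]{2,6}\b"
    r"([-a-zA-Z0-9@:%_\+.~#?&//=]*)"
)
DATE = re.compile(
    r"\b(19[0-9]{2}|200[0-9]|201[0-9])-(0[1-9]|1[0-2])-(0[1-9]|[12][0-9]|3[01])\b"
)
HASHTAG = re.compile(r"#\w+")

# Email: anchored with ^ and $, so the whole string must be an address.
email_ok = EMAIL.match("jane.doe+news@example.co.uk") is not None
email_bad = EMAIL.match("not-an-email") is not None
print(email_ok, email_bad)  # True False

# URL: unanchored, so it can be pulled out of surrounding text.
url = URL.search("See https://www.example.com/path?x=1 for details").group(0)
print(url)  # https://www.example.com/path?x=1

# Date: the year alternatives only cover 1900-2019, and month 13 is rejected.
date_parts = DATE.search("Signed on 2015-07-04.").groups()
date_2023 = DATE.search("2023-01-15")
date_month13 = DATE.search("2015-13-01")
print(date_parts, date_2023, date_month13)  # ('2015', '07', '04') None None

# Hashtags: every "#" followed by word characters.
tags = HASHTAG.findall("Loving #regex and #python3!")
print(tags)  # ['#regex', '#python3']
```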
    

Resources

  1. RegExr - Another powerful tool for learning and testing regex.
  2. Regular-Expressions.info - Comprehensive resource for learning regex.
  3. Mozilla Developer Network (MDN) - For a solid understanding of regex in JavaScript.