Getting Comfortable with Regular Expressions

Learning basic Regex


Regular Expressions, often shortened to “Regex” or “RegExp”, are patterns that can be used to match, search or replace text.

This was one of the first things I learnt in my 1 year of self-taught Web Development (MERN), and I’d like to make it as interesting for you as it was for me. I’ll add some links for further learning at the end.

A regex is written between 2 forward slashes, with the pattern placed between them. We then call a method (a.k.a. function) on it. In the example below, the .test() method checks whether the pattern we created is present in the "words" variable and returns a boolean (true or false).

Example:

let words = "Lets get started with Regex"

let regex = /r/

console.log(regex.test(words)) // true

That brief example matched the single letter “r” in “started” from our string “Lets get started with Regex”.

We can confirm this by using another method to create an array that has the value of the string we matched.


let words = "Lets get started with Regex"

let regex = /r/

console.log(words.match(regex)) // [ 'r', index: 12, input: 'Lets get started with Regex', groups: undefined ]

The .match() method “matches” the regex and returns what was matched as an array. The array also carries the index at which “r” was matched and the input string (the output shown is from running the code in RunJS).

These examples using the .match() and .test() methods are examples of a "literal match", which is to say we are matching only the exact characters in our regex, nothing else.

Let’s examine what the .test() method does:

// The .test() method is called on the regex, and the string you want
// to test is passed to it as an argument

let string = "documentary"

let regex = /documentary/

regex.test(string) // true

// or

/documentary/.test(string) // true

// The second form calls .test() directly on the regex literal

So the .test() returns true for strings that match the regex.

Then we can take a look at the .match() method:

// The .match() method accepts a regex as an argument and the string is
// prepended to it

let string = "foxes"

let regex = /foxes/

string.match(regex) // [ 'foxes', index: 0, input: 'foxes', groups: undefined ]

A way to think about what we get from using the .match() method is that we converted the string into an array with the string inside it.

// This is basically what the regex prints out when we use .match()

["foxes"] // foxes will be found on index 0

A regex can also be created with the RegExp constructor. This is most useful when you don’t know the pattern beforehand, for example when it comes from a variable.

let name = "David Bayode" // variable holding a string

let davRegex = new RegExp(name, "i") 
// pass name variable and case-insensitive flag "i"

console.log(davRegex) // /David Bayode/i

This is nice to know if you’re looping over a list of words and want to build a regex from each one, but we’ll stick to writing regex literals by hand for this article.
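One thing to watch for when building a regex from a variable: characters like “.” or “+” have a special meaning in a regex, so a string that contains them won’t be matched literally unless they are escaped. Here is a minimal sketch of that idea. The escapeRegExp helper is my own hypothetical function, not something from the examples above, and it uses the .replace() method, which this article doesn’t cover.

```javascript
// Hypothetical helper (an assumption for this sketch): puts a backslash
// in front of every character that has a special meaning in a regex
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&")
}

const price = "1.50"

const looseRegex = new RegExp(price)                // here "." matches any character
const strictRegex = new RegExp(escapeRegExp(price)) // here "." matches only a dot

console.log(looseRegex.test("1x50"))  // true
console.log(strictRegex.test("1x50")) // false
console.log(strictRegex.test("1.50")) // true
```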

Matching Capitalized and Upper Case Letters

Let’s try a few more examples to see how regex handles capital letters in our strings.


let alphabetG = "G" // A capital G

let named = "David Bayode" // My name in title case

let favoriteFlavor = "Chocolate" // Chocolate in title case

console.log(/g/.test(alphabetG)) // false

console.log(/david bayode/.test(named)) // false

console.log(/chocolate/.test(favoriteFlavor)) // false

console.log(alphabetG.match(/g/)) // null

console.log(named.match(/david bayode/)) // null

console.log(favoriteFlavor.match(/chocolate/)) // null

We get false for the strings we’re testing with .test() and null for the strings we try to match with .match() in the above examples. Why?

Regular Expressions are Case Sensitive

Yup, you heard it here first!(Jokingly I might add)

Remember what I wrote earlier about “literal matches”: they match the exact characters we write. That means a regex with the characters “chocolate” won’t match the string “Chocolate”, which is what happened in the example above. Another thing to keep in mind about regexes is:

Regular Expressions match one thing at a time

Let’s see what we get for the following string:

let sentence = "The quick fox jumped over the lazy dog"

let regex = /qui/

regex.test(sentence) // true

// matched "qui" from "quick"

It matches exactly what we put into it, so we can match whole words, parts of words, or individual characters.

Also keep in mind that we can match symbols and numbers, so long as they’re inside strings.

let num = "4"

let numRegex = /4/

numRegex.test(num) // true

// And symbols too

let symbol = "@"

let symbolRegex = /@/ // a separate name, since "let" can't declare the same variable twice

symbolRegex.test(symbol) // true

Nice, so now you should be familiar with matching upper case and lower case letters in strings.

Matching Uppercase and Lowercase Characters

We can match characters in any case using flags:

Flags

Flags are a way to extend our regex beyond the literal text we put inside it. We add a flag by placing a character after the closing slash of the regex, in this case an “i”.

The Case-Insensitive Flag “i”

let string = "Fifty"

let regex = /fifty/i // appended an "i" for ignore-casing or 
// case insensitive

console.log(regex.test(string)) // true

The flag “i” makes the regex match both upper and lower case letters. Take note, this also means mixed casing matches:

let string = "ChEEse"

let regex = /cheese/i

console.log(regex.test(string)) // true

Remember, we are ignoring the casing when we add the “i” flag, and we are still matching the characters one after the other, so we get true for the mixed-case text in the string variable.

Let’s match a repeating string.

Repeating Strings


What if we wanted to know how many times a certain word repeats itself in a string? We use the Global flag “g”.

The Global Flag “g”

The global flag “g” finds every match of the regex in the string instead of stopping at the first one, so we can use it to find a pattern that repeats.

let string = "this is a big misstep"

let regex = /is/g

regex.test(string) // true
// "is" appears three times in the string variable:
// first the "is" in "this", then the word "is" and finally
// the "is" in "misstep"

console.log(string.match(regex)) // [ 'is', 'is', 'is' ]
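A small gotcha worth knowing here: a regex with the “g” flag remembers where its last match ended (in its lastIndex property), so calling .test() repeatedly on the same regex can give different answers. The sketch below uses my own variable names to show this:

```javascript
const globalRegex = /is/g
const word = "is"

const first = globalRegex.test(word)  // true, lastIndex is now 2
const second = globalRegex.test(word) // false, the search started at index 2, lastIndex resets to 0
const third = globalRegex.test(word)  // true again

console.log(first, second, third) // true false true

// Without the "g" flag the regex keeps no position between calls
const plainRegex = /is/
console.log(plainRegex.test(word), plainRegex.test(word)) // true true
```

If all you need is a true or false answer, leaving the “g” flag off avoids this surprise.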

We can even use this to find the number of times a string repeats.

let repeatingString = "This is an island"

let repeatingRegex = /is/g

console.log(repeatingString.match(repeatingRegex).length) // 3

We get back 3 for the number of times “is” repeats in the variable “repeatingString”.
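One caveat: .match() returns null when nothing matches, so reading .length directly would throw an error in that case. A minimal sketch of a safer count is below. The countMatches helper is a hypothetical name of my own, and it assumes the regex passed in uses the “g” flag.

```javascript
// Hypothetical helper: counts matches and treats "no match" (null) as 0.
// Assumes the regex has the "g" flag, so the array holds one entry per match.
function countMatches(text, regex) {
  const matches = text.match(regex)
  return matches === null ? 0 : matches.length
}

console.log(countMatches("This is an island", /is/g)) // 3
console.log(countMatches("No match here", /xyz/g))    // 0
```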

Let’s give a brief list of flags and what they do:

  1. i - Case-Insensitive Matching: /abc/i matches "ABC", "abc", "AbC", and so on, regardless of letter case.

  2. g - Global Matching: /abc/g matches all occurrences of "abc" in a string, not just the first one.

  3. m - Multi-Line Matching: /^abc/m matches "abc" when it appears at the start of a line within a multi-line string.

  4. s - Dot-all Mode (ES2018 and later): /a.b/s lets the dot (.) metacharacter match any character, including line terminators (newline characters).

  5. u - Unicode Matching: /\p{Script=Hiragana}/u matches any character in the Hiragana script in a Unicode-aware way.

  6. y - Sticky Matching (ES2015 and later): /abc/y matches "abc" only at the start of the input string or at the index specified by the lastIndex property.

  7. d - Match Indices (ES2022 and later): /abc/d adds an indices property to the match result, holding the start and end index of the match and of each capture group.

  8. v - Unicode Sets (ES2024 and later): an upgraded version of the u flag that adds extra features inside character classes.

Note: JavaScript has no w, A, U, c or n flags. \d and \w are shorthand character classes rather than flags, lazy matching is written with a ? after a quantifier (as in /a.*?b/), and whole-word matching uses \b (as in /\babc\b/). Using a flag that doesn’t exist throws a SyntaxError.

There’s no need to memorize most of these. I’ll cover the ones you need to know to be productive.
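To make a few of the less familiar flags concrete, here is a short sketch of the m, s and y flags. The strings and variable names are made up for illustration.

```javascript
const lines = "xyz\nabc"

console.log(/^abc/.test(lines))  // false, "^" only matches at the start of the string
console.log(/^abc/m.test(lines)) // true, with "m" the "^" also matches after a line break

console.log(/a.b/.test("a\nb"))  // false, "." does not match a newline
console.log(/a.b/s.test("a\nb")) // true, with "s" the "." matches a newline too

const sticky = /abc/y
console.log(sticky.test("xyzabc")) // false, there is no "abc" exactly at index 0
sticky.lastIndex = 3
console.log(sticky.test("xyzabc")) // true, "abc" starts exactly at index 3
```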

Matching Repeating Letters

Alright, so we’ve been able to match repeating words in strings and also words in any casing. What about when we want to match a word that repeats a certain letter?

Words like “goooooooaaal!” or “woooohooo”, how are we going to match the strings using regular expressions?

The + Character

The + Character is placed after a character in our regex to match one or more of that character, for example:

let repeatingString = "gooooooaaal"

let repeatRegex = /go+a+l/

repeatRegex.test(repeatingString) // true

repeatingString.match(repeatRegex) // ["gooooooaaal"]

As you see above, we match all repetitions of the characters “o” and “a” by adding a “+” character to the letters in our regex.

Note: The “+” character matches a letter that appears at least once. Let’s take a few examples to understand this:

let string = "Helllooooo!" // String we're trying to match

let regex = /hel+o/i // Regex with one or more "l" followed by a single "o"

regex.test(string) // true, we matched "Helllo"

We can observe two things in the above example: the string we want to test has the repeating letters “l” and “o”, while our regex uses the case-insensitive flag to match any capitals, in this case the letter “H” in our string.

The regex matches the letters “Helllo” in our “Helllooooo!” because we are looking for a case-insensitive match for “h”, then “e”, then one or more “l” and finally a single “o”. The match stops at the first “o” since nothing in the regex asks for more, and .test() returns true because that pattern is present in our string variable.

Another example:

let string = "hoooorrray!"

let regex = /ho+r+a+y!/ // Regex that matches all characters in the string

regex.test(string) // true

We match all the letters in the string variable in this case, even the “!” character. If you’re wondering whether we have to keep listing every individual character, the answer is no. We’ll cover ways to match words without having to know all the characters.

The * Character

The * Character matches letters that appear zero or more times, as opposed to the + Character, which requires the letter to appear at least once.

For instance:

let string = "fify" // Incorrectly spelled string

let regex = /fift*y/ // Regex that checks for 0 or more of "t"

regex.test(string) // true, we get the match

The string is “fify” not “fifty” and we were able to match the incorrectly spelled string by using the “*” Character to indicate the letter “t” may not be present in the string we’re testing for.

We can try out more cases where a letter may or may not be present.

Let’s check for multiple words with the global flag “g” and case insensitive flag “i”:

let string = "Favor Favour" // American and British spellings of favour

let regex = /favou*r/gi // Regex to look for case-insensitive favour with u appearing 0 or more times.

regex.test(string) // true

In the above example, we matched the string for both cases because we used the * Character to check if there are 0 or more appearances of the letter “u” in the string. We can test this by getting a match on this:

let string = "Favor Favour" // American and British spellings of favour

let regex = /favou*r/gi // global flag for multiple cases and case-insensitive for upper or lower case

string.match(regex) // [ 'Favor', 'Favour' ]

We get back the matches for both words!

What if we only want to know whether the letter appears or not? That is, 0 or 1 times?

The ? Character

The ? Character matches letters that appear 0 or 1 times. This is fairly straightforward. Example:

let string = "Jon" // Shortened form

let regex = /joh?n/i // Regex that checks if there is 0 or 1 character h

regex.test(string) // true, matches as there is 0 letter "h"

We would get a match because we used the ? Character to indicate the letter “h” may not be present.
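To see the difference between “?”, “*” and “+” side by side, here is a small sketch that runs the three quantifiers against the same list of spellings. The word list is my own made-up example.

```javascript
const spellings = ["color", "colour", "colouur"]

// "?" allows 0 or 1 "u"
const optional = spellings.map(word => /colou?r/.test(word))

// "*" allows 0 or more "u"
const zeroOrMore = spellings.map(word => /colou*r/.test(word))

// "+" requires at least 1 "u"
const oneOrMore = spellings.map(word => /colou+r/.test(word))

console.log(optional)   // [ true, true, false ]
console.log(zeroOrMore) // [ true, true, true ]
console.log(oneOrMore)  // [ false, true, true ]
```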

You can use this character to match a word in both its full and shortened form, or in both its British and American spelling.

The . Character

The . Character stands for an unknown character. Let’s say we want to find a word but don’t know one of its letters: we would put the . Character in that position.

Note: A “.” in your regex matches exactly one character (anything except a line break), so some character must be present at that position. This is unlike a letter followed by the * sign, which means the letter might not be there at all.

An Example:

let string = "John" // John

let regex = /jo.n/i // Regex to match the word jo and an unknown letter, then n.

regex.test(string) // true

In the example above we get back true, as we matched all characters and an unknown character using “.” which the Regex matched as the “h” in the string “John”.

We could also combine our Characters from before.

let string = "Johhhhhn" // "Johhhhhn"

let regex = /jo.+n/i // Regex to match "jo", then one or more unknown characters, then n

regex.test(string) // true

In the example above, we use a case-insensitive regex (with the “i” flag) to look for the characters “jo”, then one or more unknown characters (“.+”), then finally an “n”, which matches the string “Johhhhhn”. Note that “.+” does not require the same character to repeat. Any run of characters will do.
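A quick sketch to show what the “.” will and won’t accept (the test strings here are my own):

```javascript
console.log(/jo.n/i.test("John")) // true, "." matched the "h"
console.log(/jo.n/i.test("Jon"))  // false, there is no character between "o" and "n"

// ".+" accepts any run of characters, not just a repeated letter
const longMatch = "Jo and Ann".match(/jo.+n/i)[0]
console.log(longMatch) // Jo and Ann
```

The last line shows that “.+” takes as many characters as it can, spaces included, before settling on the final “n”.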

Matching multiple words with Character Classes

Here, we’ll be matching multiple words with Character Classes, first off, what are Character Classes?

Character Classes allow you to define a group of characters you want your Regex to look for in the string. Example:

let string = "hit hat hot" // Three strings beginning and ending with the same letter

let regex = /h[aio]t/g // Regex that matches all three

regex.test(string) // true

string.match(regex) // [ 'hit', 'hat', 'hot' ]

We get true and get back all the matched strings when we use the Character Class “[aio]” in the regex “/h[aio]t/g”. This is because we first match the letter “h”, then the Character Class says the second letter (index 1) must be either an “a”, “i” or “o”, and then we end the regex with a “t”. We also specify the global flag to return multiple matches, otherwise we would only get the first match, “hit”.

Let’s see another example:

let string = "fit fat faith" // 2 three letter words and a third 5 letter word

let regex = /f[aioth]t/g // Regex that you might think would match all.

regex.test(string) // true

string.match(regex) // only matches [ 'fit', 'fat' ]

We might think we can use a Character Class regex of “/f[aioth]t/g” to match “fit”, “fat” and “faith”, but a Character Class stands for exactly one character. The regex is really looking for 3 letter strings: an “f”, one letter from the class, then a “t”. Since “faith” has more than one letter between the “f” and the “t”, it doesn’t match. We even get false if we test “faith” on its own with a similar regex that ends in “h”:

let string = "faith"

let regex = /f[aioth]h/g

regex.test(string) // false

string.match(regex) // null

We return false from trying to test it with the .test() method and there are no matches when we use the .match() method. This tells us we’re actually looking for the strings “fah”, “fih”, “foh”, “fth” and “fhh” not any combination of the letters in the Character Class itself.
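If we do want the Character Class to cover more than one letter, we can combine it with the “+” quantifier from earlier. A minimal sketch (the variable names are mine):

```javascript
const words = "fit fat faith"

// "f" followed by one or more letters taken from the class [aith]
const allThree = words.match(/f[aith]+/g)

console.log(allThree) // [ 'fit', 'fat', 'faith' ]
```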

Negating Character Classes

What if we wanted to match everything except some letters? We use the ^ sign inside the Character Class.

Example:

let string = "fought taught" // 2 words that are almost spelled the same

let regex = /.[^a]ught/ // Our regex

regex.test(string) // true

string.match(regex) // ['fought']

Our regex “/.[^a]ught/” first uses the “.” to match the first letter (“f” in “fought” or “t” in “taught”), then uses a negated Character Class “[^a]” to say the second letter can be anything but the letter “a”, then continues with “ught”. As a result, only “fought” matches.

Matching a wide range of letters with “-”

What if we have a situation where we want to match a whole range of letters, let’s say from “a” to “o”?

Let’s tweak the previous example:

let string = "fought taught" // Words that are spelt almost alike

let regex = /.[a-o]ught/g // Regex that checks for the range of letters from "a" to "o" in the string

regex.test(string) // true

string.match(regex) // [ 'fought', 'taught' ]

We match both strings “fought” and “taught” because we are looking for any character (“.”) first, followed by a Character Class of the letters “a” to “o”, then the literal letters “ught”, with a global flag to return every string that passes the regex.

Matching all alphanumeric characters

We can match all alphanumeric characters by using the “-” character that we just learnt in the following examples:

let string = "The quick brown fox jumped over the lazy dog!"

let regex = /[a-zA-Z0-9]/g // Regex that matches every letter and digit

regex.test(string) // true

string.match(regex) // [ 'T', 'h', 'e', 'q', 'u', 'i', 'c', 'k', 'b', 'r', 'o', 'w', 'n', 'f', 'o', 'x', 'j', 'u', 'm', 'p', 'e', 'd', 'o', 'v', 'e', 'r', 't', 'h', 'e', 'l', 'a', 'z', 'y', 'd', 'o', 'g' ]

We would match all the letters in the string except the character “!” and except the spaces or whitespace between the words.

Note: We are able to match all characters because we used the global flag “g” to match multiple cases, so that’s why we return each individual letter in the word.

Shorthand to match all alphanumeric characters

We can use a shorthand to match all alphanumeric characters, plus the underscore, using the /\w/ regex.

Let’s see an example:

let string = "A boy and his dog_" // A phrase with the character "_" at the end

let regex = /\w+/g // Shorthand regex for all alphanumeric characters and the underscore _ at the end

regex.test(string) // true

string.match(regex) // [ 'A', 'boy', 'and', 'his', 'dog_' ]

In the above example, we match all the words rather than individual letters, because the “+” quantifier lets /\w/ match a run of characters. The underscore (“_”) is matched too, as it counts as a word character.

Let’s see an example with numbers:

let string = "200 bags of rice"

let regex = /\w+/g // Regex to match all alphanumeric characters including an underscore(_)

regex.test(string) // true

string.match(regex) // [ '200', 'bags', 'of', 'rice' ]

Note: We are actually using the “+” quantifier and the global flag “g” to match whole words. If we were to use the word character shorthand by itself (/\w/), we would only match a single letter or number (or underscore):

let string = "200 bags of rice"

let regex = /\w/ // single alphanumeric character regex

regex.test(string) // true

string.match(regex) // ['2']

But if we wanted all letters or numbers of the first word or character:

let string = "200 bags of rice"

let regex = /\w+/ // regex to match a run of one or more alphanumeric characters

regex.test(string) // true

string.match(regex) // ['200']

In the above we match with one or more alphanumeric characters, this ends at the first word because we didn’t use the global flag “g” to check for multiple cases of this.

If we do, we get:

let string = "200 bags of rice"

let regex = /\w+/g // Regex for multiple alphanumeric characters

regex.test(string) // true

string.match(regex) // [ '200', 'bags', 'of', 'rice' ]

Shorthand to match all numeric characters

We also have a shorthand for matching only numeric characters, let’s see an example:

let numberString = "1245443"

let regex = /\d/

regex.test(numberString) // true

numberString.match(regex) // ['1']

As you saw previously, without the “+” quantifier and global flag “g”, we would only get back an individual number as we only checked for a single number.

Let’s add the “+” quantifier:

let numberString = "1245443"

let regex = /\d+/ // Regex for multiple numbers(one or more numbers with no spaces)

regex.test(numberString) // true

numberString.match(regex) // ['1245443']

What if we mixed some letters into the string?

let numberString = "12d4d5f44f3"

let regex = /\d+/

regex.test(numberString) // true

numberString.match(regex) // ['12']

Here, the letter “d” disrupts the numbers in the string and our regex is only looking for a single or repeating set of numbers using the “+” quantifier.

Let’s add the global flag:

let numberString = "12d4d5f44f3"

let regex = /\d+/g

regex.test(numberString) // true

numberString.match(regex) // [ '12', '4', '5', '44', '3' ]

Now that we use the global flag “g”, we keep matching every run of digits that fits our regular expression and skip the letters in between.
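The matches we get back are still strings. If we need real numbers, we can convert them afterwards. This is a rough sketch using the same string, with variable names of my own and a fallback to an empty array in case nothing matches:

```javascript
const mixed = "12d4d5f44f3"

// .match() returns null when there is no match, so fall back to []
const numbers = (mixed.match(/\d+/g) || []).map(Number)

console.log(numbers) // [ 12, 4, 5, 44, 3 ]

const total = numbers.reduce((sum, n) => sum + n, 0)

console.log(total) // 68
```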

Matching non-alphanumeric characters

We use the /\W/ regex to match all non-alphanumeric characters:

let charString = "!@#$%^&*()_-+=~`` "

let regex = /\W+/g // Regex for all non-alphanumeric characters

regex.test(charString) // true

charString.match(regex) // [ '!@#$%^&*()', '-+=~`` ' ]

We get back only symbols and whitespace. With this shorthand regex we don’t get back letters, numbers or the underscore (remember that the regular /\w/ matches an underscore).

We can get back the same results with:

let charString = "!@#$%^&*()_-+=~`` "

let regex = /[^a-zA-Z0-9_]+/g

regex.test(charString) // true

charString.match(regex) // [ '!@#$%^&*()', '-+=~`` ' ]

I also added an underscore to the regex to get the same behavior as the non-alphanumeric regex(/\W+/g).

Matching non-numeric characters

We can match all non-numeric characters using /\D/:

let numberString = "1dfdf24d5f44dfdfff3" // mixed numbers and letters

let regex = /\D+/g // Regex for multiple non-numeric characters

regex.test(numberString) // true

numberString.match(regex) // [ 'dfdf', 'd', 'f', 'dfdfff' ]

We get back all the non-numeric characters, remember in the example we use the global flag “g” and the “+” quantifier, we can also match symbols.

let numberString = "1d!@$%$^%^fdf24d5f44dfdfff3"

let regex = /\D+/g

regex.test(numberString) // true

numberString.match(regex) // [ 'd!@$%$^%^fdf', 'd', 'f', 'dfdfff' ]

Find the beginning or ending of a string

We can find the beginning and end of a string with the “^” anchor and the “$” anchor. Unlike quantifiers, anchors don’t match a character. They match a position.

Let’s match the beginning of a string

let string = "Over the moon"

let regex = /^o/i 

regex.test(string) // true

string.match(regex) // [ 'O' ]

We get back the first letter of the string using the “^” anchor.

This example doesn’t show how useful the “^” anchor is, so let’s try another one.

let string = "Over the moon"

let regex = /^\w+/i 

regex.test(string) // true

string.match(regex) // ['Over']

This time we get back the entire first word. This is because we used the shorthand for alphanumeric characters with the “+” quantifier, which matches one or more word characters (letters, numbers and the underscore, but not spaces) starting from the beginning of the string.

Let’s see how the “$” anchor works.

Example:

let string = "I will solve 5 leet code problems a day"

let regex = /day$/g // Regex to match "day" only when it is at the very end of the string

regex.test(string) // true

string.match(regex) // [ 'day' ]

In the example above, the regex matches only if the string ends with the letters “day”.

We can test this statement by simply adding a space after “day” at the end of the string.

let string = "I will solve 5 leet code problems a day "

let regex = /day$/g // Regex to match "day" at the very end of the string

regex.test(string) // false

string.match(regex) // null

The above regex no longer matches, because the string now ends with a space rather than “day”. If we include the space in the regex:

let string = "I will solve 5 leet code problems a day "

let regex = /day $/g // Regex to match "day " (with a trailing space) at the end of the string

regex.test(string) // true

string.match(regex) // [ 'day ' ]

We’d get a match.
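Using both anchors together is a handy way to check that an entire string fits a pattern, not just a part of it. A minimal sketch, with a regex and inputs I made up for illustration:

```javascript
// "^" and "$" together: the whole string must be one or more digits
const onlyDigits = /^\d+$/

console.log(onlyDigits.test("12345"))  // true
console.log(onlyDigits.test("123a5"))  // false, the "a" breaks the run of digits
console.log(onlyDigits.test(" 12345")) // false, the string starts with a space

// Without anchors, finding digits anywhere in the string is enough
console.log(/\d+/.test("123a5")) // true
```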

Matching spaces using the /\s/ shorthand

We can match spaces using a shorthand too. Let’s try to match the last space from the previous example:

let string = "I will solve 5 leet code problems a day "

let regex = /\s$/g // Regex to match a whitespace character at the end of the string

regex.test(string) // true

string.match(regex) // [' ']

We get back an array containing a single space, because we matched the trailing space with both the whitespace shorthand and the “$” anchor, which pins the match to the end of the string.
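Putting the whitespace shorthand together with the “*” quantifier, we can write one regex that accepts “day” at the end of the string whether or not spaces follow it. This is a sketch with my own variable name:

```javascript
// "day", then zero or more whitespace characters, then the end of the string
const endsWithDay = /day\s*$/

console.log(endsWithDay.test("problems a day"))     // true
console.log(endsWithDay.test("problems a day "))    // true
console.log(endsWithDay.test("problems a day off")) // false
```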

Matching non-whitespace characters with a shorthand

There’s also a regex for matching non-whitespace characters(/\S/):

let string = "Just a random set of strings with white-spaces"

let regex = /\S+/g // Regex to match all non-whitespace characters

regex.test(string) // true

string.match(regex) // [ 'Just', 'a', 'random', 'set', 'of', 'strings', 'with', 'white-spaces' ]

We get back all the non-whitespace characters.

Matching an exact number of characters

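There is a way to write a regex for a specific number of characters, using the {n} quantifier, n being a number. {n} matches exactly n times, {n,} matches n or more times, and {n,m} matches between n and m times.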

Example:

let string = "The weather is so ccccoooollld!"

let regex = /c{1,4}o{1,}l+d/g // Regex for a "c" appearing 1 to 4 times, an "o" appearing at least once, one or more "l" and a "d"

regex.test(string) // true

string.match(regex) // ['ccccoooollld']

We used previous quantifiers to create the regex above, but also demonstrated matching a set number of characters: the letter c has the quantifier {1,4}, which means we want 1 to 4 c’s, and the letter o has the quantifier {1,}, which means at least one o (the same as “+”).
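Here is a short sketch of the three forms of the curly brace quantifier. I’ve added the “^” and “$” anchors so that the whole string has to fit the count:

```javascript
// {3}: exactly three
console.log(/^a{3}$/.test("aaa"))  // true
console.log(/^a{3}$/.test("aaaa")) // false

// {2,}: two or more
console.log(/^a{2,}$/.test("aaaa")) // true
console.log(/^a{2,}$/.test("a"))    // false

// {1,3}: between one and three
console.log(/^a{1,3}$/.test("aa"))   // true
console.log(/^a{1,3}$/.test("aaaa")) // false
```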

Matching one or another set of characters

What if we wanted to treat a whole sequence of characters as a single unit, which a Character Class cannot do since it only stands for one character?

We’d use a capture group “()”

let string = "You could say Caroline or Carolin"

let regex = /c(arolin)e?/gi // Regex for both: "arolin" is grouped, and the "e" is made optional with "?"

regex.test(string) // true

string.match(regex) // [ 'Caroline', 'Carolin' ]

We get both spellings back.
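Capture groups do two more things worth seeing. Without the “g” flag, .match() also returns what each group captured, and a quantifier placed after a group applies to the whole group. The sketch below uses example strings of my own, and I’ve wrapped the optional “e” in a second group so it shows up in the result:

```javascript
// Without "g", the array holds the full match followed by each group
const result = "You could say Caroline".match(/c(arolin)(e?)/i)

console.log(result[0]) // Caroline
console.log(result[1]) // arolin
console.log(result[2]) // e

// A quantifier after a group repeats the whole group
const laugh = "hahaha!".match(/(ha)+/)[0]

console.log(laugh) // hahaha
```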

Using a “|” to look for one or another

We can use the “|” character to look for different strings.

Example:

let string = "yes or no"

let regex = /(yes|no)/g // Regex with global flag "g" to match more than one set of characters in the capture group

regex.test(string) // true

string.match(regex) // [ 'yes', 'no' ]
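The parentheses matter once we combine “|” with other parts of a regex, such as the anchors from earlier. A minimal sketch of the difference, with made-up inputs:

```javascript
// With the group: the whole string must be "yes" or "no"
const answer = /^(yes|no)$/i

console.log(answer.test("yes"))   // true
console.log(answer.test("No"))    // true
console.log(answer.test("yesno")) // false
console.log(answer.test("maybe")) // false

// Without the group this reads as: starts with "yes" OR ends with "no"
console.log(/^yes|no$/.test("yesterday")) // true
console.log(answer.test("yesterday"))     // false
```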

Congratulations

Congratulations, you should now be somewhat familiar with basic Regex. There is more to learn, so here are some references to learn more:

Regexr

JavaScript.info

Javapoint

freeCodeCamp