# Regular Expressions in Notion Formulas

Learn how to use regular expressions in Notion's test(), replace(), and replaceAll() functions.
Notion supports the use of regular expressions in three functions:
• test - tests whether a string contains a match for a regular expression. Outputs a Boolean true or false value.
• replace - matches a single instance of a regular expression within a string and replaces it with a specified replacement string.
• replaceAll - matches all instances of a regular expression and replaces them with a specified replacement string.
By using regular expressions within these functions, it is possible to do many kinds of string manipulation within Notion formulas.

## What is a Regular Expression?

A regular expression (often called a regex) is simply a set of instructions that tells a regular expression engine how to search through an input string in order to find one or more matches.
Regular expressions can often look very complex:
replace("Bruce Thomas Wayne", "^[-\\w]+\\b\\s?(.*)\\b\\s[-\\w]+\$", "\$1")
// Output: Thomas
However, they can also be very simple. This is also a regular expression:
replace("My cat is cute", "cat", "dog")
// Output: My dog is cute
This expression tells the regex engine to search the input string for a substring that matches “cat”. The replace function then replaces it with “dog”.
A simple match like this could also be found using the contains function:
contains("My cat is cute","cat")
// Output: true
But what if you need to be more flexible with your search criteria?
Take this problem, for instance: Which of these strings contains the word “cat”?
• I have six cats.
• My dog catches fish.
• Cat food is expensive.
Here, `contains()` runs into trouble (see this in an example database):
contains("I have six cats.","cat") // true

contains("My dog catches fish.","cat") // true, should be false

contains("Cat food is expensive","cat") // false, should be true
`contains()` gets the first one right, but fails on the other two.
On the second string, it sees “cat” inside the word “catches”. On the third string, it fails to see “Cat” because `contains()` is case-sensitive.
This is where a regular expression can help us!
Regular expressions let us define character groups, make characters optional, check for word boundaries, and so much more.
Here’s how you could check all three of these strings correctly using the test function:
test("I have six cats.", "\\b[Cc]ats?\\b") // true

test("My dog catches fish.", "\\b[Cc]ats?\\b") // false

test("Cat food is expensive", "\\b[Cc]ats?\\b") // true
The regular expression we’re checking for here is `\\b[Cc]ats?\\b`. Let’s break it down:
• `\\b` is a special character that translates to “word boundary”. It’s not a space character; it’s the boundary between a word character and a non-word character. In Notion, word characters include `a-z`, `A-Z`, `0-9`, and `_`.
• `[Cc]` is a character class. The brackets `[]` define a group of characters (`C` and `c`), and the regex engine will try to match any one of them. This allows us to check for both “Cat” and “cat”.
• `?` denotes that the preceding character is optional. It can appear zero or one times in the match. Since the `s` precedes it, this allows us to include the plural “cats” as well as the singular “cat”.
Breaking all this down to plain English, our regular expression is essentially saying:
Match any of the words “Cat”, “cat”, “Cats”, or “cats”.
Doing this with `contains()` would be really inefficient. You’d need to string together many, many `contains()` instances using or clauses in order to account for the many variables – not just plurality and capitalization, but word boundaries as well!
By giving us special characters to work with, regular expressions essentially give us a new language that we can use to define exactly what we’re looking for in the input string.
Once we’ve got our match (or matches), we can use Notion’s test, replace, and replaceAll functions to do incredibly useful things with them.
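If you want to experiment with this pattern outside of Notion, here is a minimal TypeScript sketch of the same three checks. It assumes that Notion's regex engine behaves like JavaScript's `RegExp`, which this page does not confirm, and the names `catPattern`, `catInputs`, and `catResults` are made up for the illustration. A TypeScript string literal needs the same doubled backslash (`\\b`) that a Notion formula does.

```typescript
// Hypothetical pattern string, typed the same way it appears in the Notion formula.
const catPattern: string = "\\b[Cc]ats?\\b";

// The three example strings from above.
const catInputs: string[] = [
  "I have six cats.",
  "My dog catches fish.",
  "Cat food is expensive",
];

// Assumption: Notion's test() behaves like RegExp.prototype.test.
const catResults: boolean[] = catInputs.map(function (input: string): boolean {
  return new RegExp(catPattern).test(input);
});
// catResults is [true, false, true]
```

The second string fails because the `c` that follows “cat” in “catches” is a word character, so there is no word boundary for `\\b` to match.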

## Learn Regular Expressions

This page isn’t intended to be a full tutorial on writing regular expressions. It’s a reference on how to use them in Notion formulas, and on what particular regex characters are supported in Notion.
If you’re interested in learning regular expressions, here are a few resources I recommend:
When using Regex101, note that Notion requires double backslashes `\\` to escape characters, while Regex101 (and most regex engines) only require a single backslash `\`. The expression at https://regex101.com/r/WffTEp/1 would need to be written as `^[-\\w]+\\b\\s?(.*)\\b\\s[-\\w]+\$` in Notion.
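The backslash doubling described above is mechanical, so it can be sketched in a few lines of TypeScript. The helper name `toNotionPattern` and the sample pattern are hypothetical; this is only an illustration of the conversion, not a tool that Notion or Regex101 provides.

```typescript
// Hypothetical helper: doubles every backslash so that a pattern copied from
// Regex101 can be pasted into a Notion formula string.
function toNotionPattern(pattern: string): string {
  return pattern.replace(/\\/g, "\\\\");
}

// TypeScript string literals also write one backslash as "\\", so this
// literal holds the Regex101 text \w+\s\d{4} (single backslashes).
const regex101Text: string = "\\w+\\s\\d{4}";

// After conversion the text reads \\w+\\s\\d{4}, the form Notion expects.
const notionText: string = toNotionPattern(regex101Text);
```

Each of the three backslashes in the sample gains one extra backslash, so the converted text is three characters longer than the original.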

## Supported Regular Expression Characters

This section includes one or more examples for every regular expression feature supported within Notion’s formula editor.

### Character Escapes


#### `\\u0000` - escaped Unicode reference

Note: Only works in the regular expression argument. Unicode characters can be typed elsewhere with a single backslash `\` (e.g. `\u0041`), but they will be automatically converted to their corresponding character in the formula editor upon exiting it.
test("A","\\u0041") // Output: true

#### `\\000` - octal character reference

Note: Only works in the regular expression argument. Doesn’t work in the input string or replacement string arguments.
test("A","\\101") // Output: true

#### `\\x00` - hexadecimal character reference

Note: Only works in the regular expression argument. Doesn’t work in the input string or replacement string arguments.
test("A","\\x41") // Output: true

#### `\\n` - new line

Note that `\n` is used in the input string and replacement arguments, but `\\n` must be used in the regular expression.
replaceAll("Apple\nBanana\nOrange", "\\n", "\n\n")
/* Output:
Apple

Banana

Orange
*/
There are also several characters that must be escaped with double backslashes (`\\`) in order to be matched literally. If they are not escaped, they act as special characters within regular expressions.

| Character | Escape |
| --- | --- |
| Period - `.` | `\\.` |
| Question mark - `?` | `\\?` |
| Dollar sign - `\$` | `\\\$` |
| Asterisk - `*` | `\\*` |
| Plus sign - `+` | `\\+` |
| Caret - `^` | `\\^` |
| Left parenthesis - `(` | `\\(` |
| Right parenthesis - `)` | `\\)` |
| Left bracket - `[` | `\\[` |
| Right bracket - `]` | `\\]` |
| Left curly brace - `{` | `\\{` |
| Right curly brace - `}` | `\\}` |
| Pipe - `\|` | `\\\|` |
| Forward slash - `/` | `\\/` |
| Backslash - `\` | `\\\\` |

Single quotation marks (`'`) and double quotation marks (`"`) must also be escaped, but they can only be escaped by using their Unicode numbers. See the section below on Unicode Numbers in Regular Expressions.
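If you need to match a longer piece of literal text, the escapes listed above can be applied mechanically. The TypeScript below is a rough sketch of that idea; `escapeForNotion` is a hypothetical helper name, and the output is the pattern text you would then paste into a Notion formula. Quotation marks are left alone here, since they need the Unicode approach instead.

```typescript
// Hypothetical helper: rewrites each special character listed above so that
// it is matched literally when the result is used as a Notion regex string.
function escapeForNotion(literal: string): string {
  return literal.replace(/[.?$*+^()[\]{}|\/\\]/g, function (ch: string): string {
    // A backslash becomes four backslashes; every other special character
    // simply gets two backslashes placed in front of it.
    return ch === "\\" ? "\\\\\\\\" : "\\\\" + ch;
  });
}

// The text 3.50 (USD) becomes 3\\.50 \\(USD\\)
const escapedPrice: string = escapeForNotion("3.50 (USD)");
```

A replacer function is used instead of a replacement string so that the dollar sign never gets interpreted as a substitution.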

### Character Classes


#### `\\w` - alphanumeric character

Notion considers non-spacing marks to be non-alphanumeric characters. Other regex engines (.NET, for example) do the opposite.
Alphanumeric characters include:
• Lowercase letters `a-z`
• Uppercase letters `A-Z`
• Numbers `0-9`
• Punctuation, connector symbols (Notion only supports `_`)
replaceAll("CAPS_nocap 12345", "\\w", "*")
// Output: ********** *****

#### `\\W` - non-alphanumeric character

replaceAll("correct horse battery staple", "\\W", "")
// Output: correcthorsebatterystaple

#### `\\d` - digit character (0-9)

replaceAll(id(), "\\d", "")
// Output: ccadaecddeabaeeb (where ID is ccad6aec4dd34e5a942334bae3e9b728)

#### `\\D` - non-digit character

replaceAll(id(), "\\D", "")
// Output: 6434594233439728 (where ID is ccad6aec4dd34e5a942334bae3e9b728)

#### `\\s` - whitespace character

replaceAll("charmander man bun", "\\sman\\s", " ")
// Output: charmander bun

#### `\\S` - non-whitespace character

replaceAll("charmander man bun", "\\Sman\\S", "ndl")
// Output: chandler man bun

#### `.` - wildcard; matches any single character except newline (\\n)

replaceAll("And blimey, if it ain't mutton again today!", ".", "😡")
// Output: 😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡

// Include newlines using (.|\\n)
// Assume prop "TwoLines" contains:
// And blimey,
// if it ain't
// mutton again today!
replaceAll(prop("TwoLines"),"(.|\\n)","😡")
// Output: 😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡😡

#### `[]` - character class (matches any single character included in the group)

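Character classes support ranges such as `A-Z` (all uppercase letters), `a-z` (all lowercase letters), and `0-9` (all digits). The range `A-z` covers all upper and lowercase letters, but it also matches the six characters that sit between `Z` and `a` in the character table: `[`, `\`, `]`, `^`, `_`, and the backtick. Use `A-Za-z` to match letters only.
Commas may also be used to visually separate ranges and characters in your expression, but they are not needed. Be aware that a comma inside a character class is itself a member of the class, so the class will also match any commas in the input string.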
replaceAll("gold fold bold","[gfb]","t")
// Output: told told told

replaceAll("27 dresses","[a-z]","👗")
// Output: 27 👗👗👗👗👗👗👗

replaceAll("abcdefghijklmnopqrstuvwxyz123456789", "[a-ev-z1357-9]", "🙄")
// With commas:
replaceAll("abcdefghijklmnopqrstuvwxyz123456789", "[a-e,v-z,1,3,5,7-9]", "🙄")
// Output: 🙄🙄🙄🙄🙄fghijklmnopqrstu🙄🙄🙄🙄🙄🙄2🙄4🙄6🙄🙄🙄
Character class subtraction is not supported.
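Two details of character classes are easy to check in any JavaScript-style engine: a comma inside a class is matched like any other member, and the range `A-z` reaches past the letters. The TypeScript sketch below shows both, assuming Notion's character classes behave like JavaScript's. The inputs and variable names are hypothetical.

```typescript
// A comma inside a class is an ordinary member, so commas get replaced too.
const withCommaInClass: string = "a,b,c".replace(/[a-c,]/g, "*");
// withCommaInClass is "*****"

// Without the comma in the class, the commas in the input survive.
const withoutCommaInClass: string = "a,b,c".replace(/[a-c]/g, "*");
// withoutCommaInClass is "*,*,*"

// The range A-z also spans the characters between Z and a in the character
// table, so an underscore matches it even though it is not a letter.
const underscoreInWideRange: boolean = /[A-z]/.test("_");
// underscoreInWideRange is true

// Listing the two letter ranges separately avoids that.
const underscoreInLetterRanges: boolean = /[A-Za-z]/.test("_");
// underscoreInLetterRanges is false
```

In the comma example earlier in this section, the input contains no commas, which is why both versions of the class produce the same output.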

#### `[^]` - negated character class (matches any single character not included in the group)

replaceAll("123456789abcdefghijklmnopqrstuvwxyz", "[^a-z]", "")
// Output: abcdefghijklmnopqrstuvwxyz

replaceAll("abcdefghijklmnopqrstuvwxyz123456789", "[^cow]", "")
// Output: cow ("c", "o", and "w" already appear in that order in the input. The character class doesn't specify order.)

### Quantifiers


#### `*` - match zero or more of the preceding element

replaceAll("Trs Tres Trees Treeeeees", "Tre*s", "🌳")
// Output: 🌳 🌳 🌳 🌳

replaceAll("Trs Tres Trees Treees Treeees", "Tr(ee)*s", "🌳")
// Output: 🌳 Tres 🌳 Treees 🌳

#### `+` - match one or more of the preceding element

replaceAll("Trs Tres Trees Treeeeees", "Tre+s", "🌳")
// Output: Trs 🌳 🌳 🌳

replaceAll("Trs Tres Trees Treees Treeees", "Tr(ee)+s", "🌳")
// Output: Trs Tres 🌳 Treees 🌳

#### `?` - preceding element is optional; match it zero or one times

replaceAll("Trs Tres Trees Treeeeees", "Tre?s", "🌳")
// Output: 🌳 🌳 Trees Treeeeees

replaceAll("Trs Tres Trees Treees Treeees","Tr(ee)?s","🌳")
// Output: 🌳 Tres 🌳 Treees Treeees

#### `??` - match preceding element zero or one times (as few times as possible)

replaceAll("Trs Tres Trees Treees Treeees", "Tre??s", "🌳")
// Output: 🌳 🌳 Trees Treees Treeees

replaceAll("Trs Tres Trees Treees Treeees", "Tr(ee)??s", "🌳")
// Output: 🌳 Tres 🌳 Treees Treeees

#### `+?` - match preceding element one or more times (as few times as possible)

replace("Tree", "Tre+?", "🌳")
// Output: 🌳e

#### `*?` - match preceding element zero or more times (as few times as possible)

replace("Heeeeeeelp", "H.*?", "*")
// Output: *eeeeeeelp

replace("Heeeeeeelp", "H.*?l", "*")
// Output: *p

#### `{n}` - match the preceding element n times

replace("Heeeeeeelp", "e{7}", "*")
// Output: H*lp

#### `{n,}` - match the preceding element n or more times

replace("Heeeeeeelp", "e{1,}", "*")
// Output: H*lp

#### `{n,m}` - match the preceding element between n and m times (inclusive)

replace("Heeeeeeelp", "e{1,6}", "*")
// Output: H*elp

#### `{n}?` - match the preceding element n times (no difference from `{n}`)

replace("Heeeeeeelp", "e{7}?", "*")
// Output: H*lp

#### `{n,}?` - match the preceding element at least n times, but as few times as possible

replace("Heeeeeeelp", "e{1,}?", "*")
// Output: H*eeeeeelp

#### `{n,m}?` - match the preceding element at least n times, no more than m times, and as few times as possible

replace("Heeeeeeelp", "e{1,6}?", "*")
// Output: H*eeeeeelp

### Anchors


#### `^` - start of line

replace("dogs dogs dogs", "^dogs", "cats")
// Output: cats dogs dogs

#### `\$` - end of line

replace("dogs dogs dogs", "dogs\$", "cats")
// Output: dogs dogs cats

#### `\\b` - word boundary

This is not a space (that’s `\\s`). This is the boundary between a word character and a non-word character (including punctuation such as `,`, `.`, `;`, etc.). It is a zero-length match.
replaceAll("martini art artist", "\\bart", "!!!")
// Output: martini !!! !!!ist

replaceAll("martini art artist", "\\bart\\b", "!!!")
// Output: martini !!! artist

#### `\\B` - not on a word boundary

replaceAll("martini art artist", "\\Bart", "!!!")
// Output: m!!!ini art artist

### Character Grouping


#### `()` - capture group (is automatically assigned a sequential reference number)

replace("Dog Blog", "(Dog) Blog", "\$1")
// Output: Dog

replace("Dog Blog", "(Dog) (Blog)", "\$2")
// Output: Blog

replaceAll("Jack Sparrow", "^(\\w+)\\b.*", "\$1")
// Output: Jack

#### `(?<name>expression)` - named capture group

replace("Jack plays poker", "(?<sup>J).*(oker)", "\$<sup>\$2")
// Output: Joker

// Named capture groups are still assigned their sequential number
// Note the output when I reference "\$<sup>\$1" instead of "\$<sup>\$2"
replace("Jack plays poker", "(?<sup>J).*(oker)", "\$<sup>\$1")
// Output: JJ

#### `(?:)` - non-capturing group

replace("Jack", "(?:Jack)", "\$1")
// Output: \$1

replace("123", "(\\w)(?:\\w)(\\w)", "\$2")
// Output: 3

### Substitutions


#### `\$n` - capture group numbers

replace("Dog Blog", "(Dog) Blog", "\$1")
// Output: Dog

replace("Dog Blog", "(Dog) (Blog)", "\$2")
// Output: Blog

#### `\$&` - copy of the whole match

replaceAll("Hello", ".*", "\$& \$& \$&")
// Output: Hello Hello Hello

replaceAll("I sell pan and pan accessories", "pan", "pro\$&e")
// Output: I sell propane and propane accessories


#### `\$'` - copy of the entire input string after the match

// Output: boyboy
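Here is a small TypeScript sketch of this substitution, assuming Notion follows JavaScript's replacement patterns. The input string "oh boy" is hypothetical; I picked it only because it produces the output shown above.

```typescript
// Hypothetical input. The match is "oh " and the text after the match is "boy".
// The $' token swaps the match for that trailing text.
const afterMatchCopy: string = "oh boy".replace(/oh /, "$'");
// afterMatchCopy is "boyboy": "boy" inserted by $' plus the untouched "boy"
```

Only the matched part of the string is replaced, which is why the original “boy” is still there after the copy.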

### Backreferences


#### `\\n` - e.g. `\\1` - backreference. Must match an existing capture group

Note that the backreference looks for matches of the content of its capture group. It is not an alias for the expression within the capture group.
replace("12-12-12", "([0-9]+)-\\1-\\1", "Success")
// Output: Success

replace("12-34-56", "([0-9]+)-\\1-\\1", "Success")
// Output: 12-34-56

replace("I have 56 apples, 35 bananas, and 35 grapes.", ".*(56).*(35).*\\2.*", "Success")
// Output: Success

#### `(?<name>\\w) \\k<name>` - named backreference

replace("I have 56 apples, 35 bananas, and 35 grapes.", ".*(56).*(?<two>35).*\\k<two>.*", "Success")
// Output: Success

// Named backreferences can still be called with their sequential number
replace("I have 56 apples, 35 bananas, and 35 grapes.", ".*(56).*(?<two>35).*\\2.*", "Success")
// Output: Success

### Alternation


#### `|` - either/or

replaceAll("jpg, jpeg, png, gif, wav", "jpg|jpeg|png|gif", "picture")
// Output: picture, picture, picture, picture, wav

// Order matters!

replace("mould", "ou|o", "😀")
// Output: m😀ld

replace("mould", "o|ou", "😀")
// Output: m😀uld

// Alternation can also be done inside groups:

replaceAll("My name is Bruce Wayne", "(Bruce|Wayne)", "*****")
// Output: My name is ***** *****

### Unsupported Features

The following features are currently not supported in Notion's flavor of regex:
• `\\A`
• `\\z`
• `\\Z`
• `\\p{name}`
• `\\P{name}`
• `\$+`
• `\$_`
• `(?>subexpression)`
• `(?(expression) yes | no)`
• Lookarounds are not fully supported because they are not supported in every version of Safari. Using them in your formulas is not recommended.
• Flags/modifiers are not supported in Notion at all (which often makes case-insensitive matching very tedious).
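Since there is no case-insensitive flag, one workaround is to expand each letter into a two-letter character class, the same way `[Cc]` was used earlier on this page. The TypeScript below is only a sketch of that idea. `caseInsensitive` is a hypothetical helper, it should only be given literal text (an escape such as `\\b` would be mangled), and the check at the end assumes Notion's `test` matches like JavaScript's `RegExp`.

```typescript
// Hypothetical helper: expands each ASCII letter into a two-letter class,
// for example "cat" -> "[Cc][Aa][Tt]", as a stand-in for the missing flag.
function caseInsensitive(word: string): string {
  return word.replace(/[A-Za-z]/g, function (ch: string): string {
    return "[" + ch.toUpperCase() + ch.toLowerCase() + "]";
  });
}

// Word boundaries are added around the expanded text. In a Notion formula
// this pattern would be typed as "\\b[Cc][Aa][Tt]\\b".
const anyCaseCat: string = "\\b" + caseInsensitive("cat") + "\\b";

// Assumption: Notion's test() matches like RegExp.prototype.test.
const matchesShouting: boolean = new RegExp(anyCaseCat).test("My CAT sleeps");
// matchesShouting is true
const matchesInsideWord: boolean = new RegExp(anyCaseCat).test("concatenate");
// matchesInsideWord is false
```

The generated pattern gets long quickly, so this is most practical for short words.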

## Unicode Numbers in Regular Expressions

When writing regular expressions in Notion formulas, it is possible to “hard code” Unicode numbers into your expression. The regex engine will then parse these as their actual Unicode characters. (Thanks to Ben Borowski for pointing this out to me.)
To do so, use double-backslashes `\\` to escape the Unicode reference:
// \\u0027 escapes apostrophes.
// \\u2018 and \\u2019 handle left and right single quotes.
replaceAll("Mike 'Iron Mike' Tyson", "[\\u0027\\u2018\\u2019]","🥊")
// Output: Mike 🥊Iron Mike🥊 Tyson
It is also possible to use octal or hex codes here. For example:
test("A", "\\x41") // Output: true

test("A", "\\101") // Output: true
These `\\` Unicode references don’t work in the input string argument or in the replacement argument of replace and replaceAll. They’re only interpreted correctly within the regular expression argument.
If you type a Unicode character’s code anywhere within a Notion formula (besides a regular expression argument) using only a single backslash `\`, it’ll automatically be transformed into that character – within the formula itself (this does not work for hex and octal codes).
"\u0041" is automatically turned into "A"

### Escaping Double Quotations

You can usually escape a double quotation `"` in a Notion formula using a single backslash `\`:
"Mike \"Iron Mike\" Tyson"
// Output: Mike "Iron Mike" Tyson
This also works in the input-string and replacement arguments within Notion’s regular expression functions:
replace("Mike \"Iron Mike\" Tyson", "^(\\w+)\\b","\"\$1\"")
// Output: "Mike" "Iron Mike" Tyson
However, this does not work inside of regular expressions – i.e., the second argument of the test, replace, and replaceAll functions.
Fortunately, you can get around this by hard-coding their Unicode character codes into your expression:
• `\\u0022` for a normal quotation mark
• `\\u201c` for a left double quotation mark
• `\\u201d` for a right double quotation mark
For example, here’s how you could extract `"Iron Mike"` from `Mike "Iron Mike" Tyson`:
replace("Mike \"Iron Mike\" Tyson", ".*([\\u0022\\u201c\\u201d][^\\u0022\\u201c\\u201d]+[\\u0022\\u201c\\u201d]).*", "\$1")
// Output: "Iron Mike"
It’s best to ensure your character class includes all three common quotation marks: `[\\u0022\\u201c\\u201d]`
When you type a quotation mark directly into the formula editor, you’ll get a normal quotation mark `"` – however, when you type within text fields inside a Notion database, Notion intelligently uses left `“` and right `”` quotation marks to wrap your text.
In the example above, I hard-coded `Mike \"Iron Mike\" Tyson` within the formula editor. However, if that string had been pulled in via another property (e.g. `prop("Name")`), then it would likely be using left `“` and right `”` quotation marks.
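To see why the three-mark character class matters, here is a TypeScript sketch that runs the same extraction on a straight-quoted string and a curly-quoted one. It assumes Notion's `replace` behaves like JavaScript's `String.prototype.replace`, and the variable names are made up for the illustration.

```typescript
// Any of the three double quotation marks: straight, left curly, right curly.
const anyQuote: string = "[\\u0022\\u201c\\u201d]";
const notAQuote: string = "[^\\u0022\\u201c\\u201d]";

// Same shape as the pattern above: keep only the quoted portion.
const quotedPart: RegExp = new RegExp(
  ".*(" + anyQuote + notAQuote + "+" + anyQuote + ").*"
);

// Straight quotes, as typed directly into the formula editor.
const fromEditor: string = 'Mike "Iron Mike" Tyson'.replace(quotedPart, "$1");

// Curly quotes, as they would likely arrive from a text property.
const fromProperty: string =
  "Mike \u201cIron Mike\u201d Tyson".replace(quotedPart, "$1");
```

Both calls return the nickname with whichever quotation marks the input used. A class containing only `\\u0022` would leave the curly-quoted string unchanged.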