A regular expression (regex) is a way to match complex patterns that a plain string match cannot find. The platform supports regex patterns in on-demand filtering, as well as in patterns set up for analytics with Keyword Intent & Content Types.
This article provides a general understanding of what a regular expression (regex) pattern is, how it works, and basic functionality such as how to include and exclude terms. Regex patterns are written to find a string or set of strings in text. To find exactly what you are looking for in a large amount of text, some elements need to be included in a pattern and others left out. Each pattern is limited to 250 characters; if a pattern is longer, the platform truncates it to the first 250 characters. Anything beyond the limit can then be added as additional filters.
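When a long list of keywords would push a pattern past the 250-character limit, one approach is to split the list into several shorter alternation patterns (terms joined with ‘|’), each within the limit, and enter each as its own filter. The sketch below is a hypothetical helper, not a platform feature. It assumes the terms are plain keywords without regex metacharacters; whether separate filters are combined with OR or AND may differ by feature, so check that before splitting an alternation across filters.

```python
MAX_PATTERN_LENGTH = 250  # per-pattern character limit described above


def split_into_patterns(terms, limit=MAX_PATTERN_LENGTH):
    """Join terms with '|' into as few patterns as possible, each <= limit chars.

    Hypothetical helper: assumes every term is a plain keyword (no regex
    metacharacters) and that each returned pattern is entered as its own filter.
    """
    patterns = []
    current = ""
    for term in terms:
        if len(term) > limit:
            raise ValueError(f"term is longer than {limit} characters: {term[:20]}...")
        candidate = f"{current}|{term}" if current else term
        if len(candidate) <= limit:
            current = candidate  # the term still fits in the current pattern
        else:
            patterns.append(current)  # current pattern is full; start a new one
            current = term
    if current:
        patterns.append(current)
    return patterns
```

For example, `split_into_patterns(["shoes", "boots", "sandals"])` returns a single pattern, `'shoes|boots|sandals'`, while a list of several hundred characters' worth of keywords comes back as multiple patterns, none longer than 250 characters.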
Filtering with Regex
Regex syntax differs depending on the underlying engine. For regex filtering in the platform, this article focuses on RE2 syntax, as outlined here: https://github.com/google/re2/wiki/Syntax. Much of this syntax overlaps with other flavors, such as Perl. Lookbehind assertions are not currently supported. Many features accept regex patterns when filtering, including:
- Rank Intelligence: Keyword and Ranking URL
- Research Grid: Keyword and URL
- Answer Box Opportunity: Keyword and URL in Answer Box
- Content Ideas: Ideas
- Search Analytics: Keyword and Ranking URL
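Because RE2 rejects lookbehind (and also lookahead) assertions, a pattern copied from another flavor may need rewriting. For a filter that only asks whether a row matches, a positive lookbehind such as ‘(?<=/blog/)regex’ can be replaced by matching the preceding text directly, ‘/blog/regex’, since both find the same rows. The sketch uses Python's `re` module, which, unlike RE2, does accept lookbehind, so the two forms can be compared side by side; the URLs are made-up examples. This substitution applies to positive lookbehind only; a negative lookbehind has no one-to-one replacement.

```python
import re

urls = [
    "https://www.example.com/blog/regex-guide",
    "https://www.example.com/docs/regex-guide",
    "https://www.example.com/blog/keyword-tips",
]

# Lookbehind form: accepted by Python's re, rejected by RE2.
lookbehind = re.compile(r"(?<=/blog/)regex")
# RE2-compatible form: include the preceding text in the match itself.
re2_friendly = re.compile(r"/blog/regex")

matched_lookbehind = [u for u in urls if lookbehind.search(u)]
matched_re2_friendly = [u for u in urls if re2_friendly.search(u)]

print(matched_lookbehind)    # ['https://www.example.com/blog/regex-guide']
print(matched_re2_friendly)  # ['https://www.example.com/blog/regex-guide']
```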
Keyword Intent & Content Types
Regex patterns can be used to set up Keyword Intent & Content Type filters. These filters are available in many features within the platform.
Regex Pattern Elements
- Is
- Isn't
- Starts With
- Ends With
- Contains
1. Is
First, you can apply a regex pattern that matches a term exactly by using a colon (:) and a dollar sign ($). Placing a ‘:’ at the front of the pattern and a ‘$’ at the end limits the results to what you have entered. For example, to find the term seoClarity, enter ‘:seoClarity$’. If you are looking for several terms, put a colon in front of each one.
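A minimal sketch of how the anchors behave, using Python's `re` module, which for single-line text such as keywords and URLs treats ‘^’ and ‘$’ the same way RE2 does. The helper `platform_match` is hypothetical: it assumes the leading ‘:’ only marks the entry as a regex and strips it before searching. Under that assumption, ‘$’ anchors only the end, so ‘seoClarity$’ also matches ‘about seoClarity’, while anchoring both ends, as in ‘:^seoClarity$’, restricts the match to the exact text. If the platform's results differ, its ‘:’ prefix may add anchoring of its own.

```python
import re


def platform_match(entry, text):
    """Hypothetical helper: treat a leading ':' as a regex marker and search the rest."""
    regex = entry[1:] if entry.startswith(":") else entry
    # re.search finds a match anywhere in the text (an unanchored search,
    # like RE2's PartialMatch); only ^ and $ pin it to the start or end.
    return re.search(regex, text) is not None


exact_hit = platform_match(":^seoClarity$", "seoClarity")          # True
exact_miss = platform_match(":^seoClarity$", "about seoClarity")   # False
end_only = platform_match(":seoClarity$", "about seoClarity")      # True: only the end is anchored
```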
2. Isn't
You can also exclude strings by using an exclamation mark (!). Placing a ‘!’ at the beginning of the pattern excludes results that contain what you have entered. For example, ‘!seoclarity’ excludes every term that includes “seoclarity”, so only other terms appear. If you are excluding several terms, put an exclamation mark in front of each one.
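The exclusion works like a filter that keeps only the rows with no match. The helper below is a hypothetical illustration of that behavior, not the platform's code, and the keyword list is made up.

```python
import re


def exclude(entry, rows):
    """Hypothetical helper: a leading '!' drops every row matching the rest of the entry."""
    if not entry.startswith("!"):
        raise ValueError("exclusion entries start with '!'")
    regex = re.compile(entry[1:])
    # Keep only rows where the pattern is found nowhere in the text.
    return [row for row in rows if regex.search(row) is None]


keywords = ["seoclarity pricing", "rank tracking tools", "what is seoclarity", "keyword research"]
remaining = exclude("!seoclarity", keywords)
print(remaining)  # ['rank tracking tools', 'keyword research']
```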
3. Starts With
To find a line or sentence that begins with a certain letter or word, start the pattern with a colon followed by a caret (:^). For example, ‘:^seoclarity’ returns any line or sentence that begins with ‘seoclarity’. When a starts-with pattern is applied to a URL, include the protocol and domain, because every URL begins with them.
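The point about URLs matters because ‘^’ anchors the very first character of the URL, which is part of the protocol. In the sketch below (Python `re`, made-up example.com URLs, platform ‘:’ prefix left out), a pattern that starts at the path finds nothing, while one that includes the protocol and domain finds the blog page. The dots are escaped (‘\.’) because an unescaped ‘.’ matches any character.

```python
import re

urls = [
    "https://www.example.com/blog/regex-guide",
    "https://www.example.com/products/shoes",
    "https://shop.example.com/blog/sale",
]


def starts_with(pattern, rows):
    """Return the rows matched by a pattern that begins with '^'."""
    regex = re.compile(pattern)
    return [row for row in rows if regex.search(row)]


path_only = starts_with(r"^/blog/", urls)
full_prefix = starts_with(r"^https://www\.example\.com/blog/", urls)

print(path_only)    # []  (every URL begins with 'https://')
print(full_prefix)  # ['https://www.example.com/blog/regex-guide']
```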
4. Ends With
Sometimes you want results in which nothing follows a certain word. In that case, end the pattern with a dollar sign ($), which marks the end of the text. A pattern ending in ‘$’ matches only when the text ends with the characters immediately before the ‘$’. For example, if someone at seoClarity wanted any sentence that ends with ‘seoClarity’, they would use the pattern ‘seoClarity$’, which matches a line such as ‘here at seoClarity’. Any trailing character counts, so ‘here at seoClarity.’ (with a period) would not match.
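Because ‘$’ requires the text to end immediately after the pattern, trailing punctuation changes the result. The sketch below (Python `re`, made-up sentences) shows ‘seoClarity$’ matching only the line without a period, and one way to tolerate an optional final punctuation mark with ‘[.!?]?’.

```python
import re

lines = ["here at seoClarity", "here at seoClarity.", "seoClarity is a platform"]

strict = [line for line in lines if re.search(r"seoClarity$", line)]
# '[.!?]?' allows one optional period, exclamation mark, or question mark before the end.
lenient = [line for line in lines if re.search(r"seoClarity[.!?]?$", line)]

print(strict)   # ['here at seoClarity']
print(lenient)  # ['here at seoClarity', 'here at seoClarity.']
```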
5. Contains
To find text that contains a specific character or word, enter it without anchors: ‘Q’ returns any text that contains a Q. To match a character repeated several times, add a quantifier after it. An asterisk (*) means zero or more, so ‘Q*’ on its own matches every string, including text with no Q at all. A plus sign (+) means one or more, so ‘QQ+’ matches runs of two or more Qs, such as ‘QQQQ’.
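A quick check of the difference between a plain character, ‘*’, and ‘+’, using Python's `re` (which treats these quantifiers the same way RE2 does) and made-up sample text. It assumes the filter keeps a row whenever the pattern is found somewhere in it.

```python
import re

texts = ["QQQQ", "Quick question", "no match here"]

contains_q = [t for t in texts if re.search(r"Q", t)]   # at least one Q anywhere
star = [t for t in texts if re.search(r"Q*", t)]        # Q* also matches the empty string
run_of_qs = [t for t in texts if re.search(r"QQ+", t)]  # two or more Qs in a row

print(contains_q)  # ['QQQQ', 'Quick question']
print(star)        # ['QQQQ', 'Quick question', 'no match here']
print(run_of_qs)   # ['QQQQ']
```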