The Site Audit Projects List gives you a high-level view of the different crawls that have been set up for the domain. Watch the video below: "How to Create a Clarity Audit Project"
Some sites require a bot to be added to an allow list prior to crawling. You can choose between seoClarity's Desktop or Mobile user agent or Google's Desktop and Smartphone user agent. Other user agents can also be specified when setting up the crawl.
Project Options
Project Name: Selecting the project name will navigate to the Site Audit page for that project.
Pencil (Edit): This allows you to rename the project.
Trash (Delete): This will remove the crawl project and all crawls within the project.
Gear (Settings): This will display the starting URL, depth, speed and exclusions for the project.
How To Set Up A New Site Audit Project
The New Site Audit button opens a popup where you can set up a new project and crawl, or run a crawl within an existing project.
Basic Settings tab
This tab contains the essential information needed to initiate a new Site Audit.
Project Type: Select Existing Project to re-use a previously set up project, including its custom settings, or New Project to set up a fresh crawl with no inherited settings.
Project Name: Selecting an existing project displays that project's name; to create a new project, specify a name in the text field.
Choose what to crawl: Crawls can be based on a specific URL, sitemap(s), an RSS feed, or an uploaded CSV list.
Starting URL: Select the protocol (http or https) and input the URL where the crawl should begin. Subdomains are allowed in the Starting URL field when the Broad Match match type is selected under Domain Settings, Ranking Configuration. A crawl can be started from any of a domain's subdomains as long as the root domain is the same.
Sitemap(s): Select the protocol (http or https) and input the URL where the sitemap is located.
RSS: Select the protocol (http or https) and input the URL where the RSS feed is located.
Upload CSV: If you already know which URLs you want to crawl, list them in a single column in a .csv file and upload it (see the example below).
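For example, an uploaded CSV might contain one URL per row in a single column (the URLs below are hypothetical placeholders):
    https://www.example.com/
    https://www.example.com/products/widgets
    https://www.example.com/blog/launch-announcement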
Crawler Type: The Standard Crawl is the most common crawler and functions like most traditional crawlers. The JavaScript Enabled crawl renders JavaScript while crawling, similar to how a browser would.

Advanced: Limit the number of pages crawled per day
Crawl Depth: Custom limits the crawl to a set number of links (levels) away from the starting URL (see the example below). Full Site Crawl will crawl all URLs found for that domain (depending on configuration, this could take a significant amount of time). Crawl only pages uploaded/found will crawl just the URLs that are specified.
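As a hypothetical illustration of depth, a Custom crawl depth of 2 starting from https://www.example.com/ would cover:
    Level 0: https://www.example.com/ (the starting URL)
    Level 1: pages linked directly from the starting URL, e.g. https://www.example.com/products
    Level 2: pages linked from Level 1 pages, e.g. https://www.example.com/products/widget-a
Pages reachable only three or more links away from the starting URL would not be crawled in this example.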
Description: This optional text field allows for any additional notes to be entered related to the crawl project.
Advanced Settings tab
Configure commonly used advanced crawl options in this tab.
User Agent: A custom user agent can be set here. By default, or if left blank, 'ClarityBot' will be used. Some domains may require the bot to be added to an allow list so that it is not blocked from crawling the site.
User Agent Options:
Google Desktop - Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
Google Mobile - Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
ClarityBot - Mozilla/5.0 (compatible; ClarityBot/9.0; +https://www.seoclarity.net/bot.html)
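If your site controls crawler access through robots.txt rather than a firewall allow list, a rule similar to the following hypothetical sketch (assuming rules are matched against the ClarityBot product token) would permit the default user agent; adjust the paths to your own policy:
    User-agent: ClarityBot
    Allow: /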
Obey Robots.txt: If No is selected, the crawl bypasses the settings in the robots.txt file of the site being crawled. By default, the crawler obeys the robots protocol.
Store Blocked Links: If Yes is selected, the crawler stores the links blocked by robots.txt. By default, this is set to No.
Enable Cookies: Enabling this option tells the crawler to keep track of cookies sent by web servers, and then send them back on subsequent requests. This is typically used to crawl sites that redirect based on persisting cookies.
Select Region: Optional setting to crawl from a location closer to where the site is hosted.
Link Parameter Handling:
Enter URL parameter(s) to remove: Enter URL parameters to remove automatically when crawling, in a comma-separated format. Enter a * (asterisk) to remove all URL parameters on discovered URLs before attempting to crawl them. URL parameters that don't change the content of your pages can interfere with efficient site crawls, because they make the same page content available via multiple unique URLs (see the example below).
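As a hypothetical illustration, entering sessionid, sort in this field would reduce URLs such as the following to a single crawlable URL:
    https://www.example.com/shoes?sessionid=abc123
    https://www.example.com/shoes?sort=price&sessionid=xyz789
    Both would be crawled as: https://www.example.com/shoes
Entering * instead would strip every parameter before crawling.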
Internal Links Analysis: Enabling this checkbox option will crawl the Internal Links found on the page.
HREFLang Crawl: By default, all hreflang annotations found while crawling are captured and displayed in the Hreflang Audit tab of Site Health. Enabling this checkbox option will crawl rel="alternate" hreflang URLs. These URLs can also be crawled if the Validate option is enabled.
Canonical Crawl: By default, all canonicals found while crawling are captured and displayed in the Canonical Audit tab of Site Health. Enabling this checkbox option will crawl all canonical URLs.
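For reference, these two options target standard tags in the page source such as the following (the example.com URLs are placeholders):
    <link rel="alternate" hreflang="en-us" href="https://www.example.com/en-us/page" />
    <link rel="alternate" hreflang="de-de" href="https://www.example.com/de-de/page" />
    <link rel="canonical" href="https://www.example.com/en-us/page" />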
Crawling Rules tab
Customize what pages are crawled and how query parameters found in URLs are handled in this tab.
Domain Crawling Rules: Enter one string match pattern per line to allow or deny domains from being crawled.
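For example (hypothetical domains), patterns entered one per line might look like:
    blog.example.com
    staging.example.com
with each added to the allow or deny rules as appropriate.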
Link Crawling Rules:
URL pattern(s) to allow: Enter a regex pattern here; URLs matching it will be included in the list of URLs to be crawled and followed further. The URL pattern(s) to disallow take precedence over any patterns entered here.
URL pattern(s) to disallow: Enter a regex pattern here; URLs matching it will be excluded from being crawled and followed further. These patterns take precedence over any URL patterns specified in the allow field above.
URLs to crawl but not index: If URLs match the regex pattern specified here, the URLs will be crawled and new links discovered from them, but the content of the URL itself will not be indexed.
URLs to index but not crawl: If URLs match the regex pattern specified here, the content of the URL will be indexed, but none of the links found on the page will be followed or added to the crawl list.
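As a hypothetical illustration (the patterns below are examples, not recommendations), the allow and disallow fields accept regex such as:
    URL pattern(s) to allow:     ^https://www\.example\.com/blog/
    URL pattern(s) to disallow:  /tag/|/author/|\?replytocom=
With both entered, a URL like https://www.example.com/blog/tag/news would be excluded, because disallow patterns take precedence over allow patterns.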
Link Discovery: Enter one string match pattern per line to restrict the region from where links should be found and crawled.
Restrict to Xpath: Specify an XPath (or list of XPaths) that defines regions inside the response from which links should be extracted. If given, only the content selected by those XPaths will be scanned for links.
Restrict to CSS: Specify a CSS selector (or list of selectors) that defines regions inside the URL being crawled from which links should be extracted. This behaves the same as Restrict to Xpath (see the examples below).
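For example, hypothetical restrictions that limit link discovery to a page's main content area might look like:
    Restrict to Xpath:  //div[@id='main-content']
    Restrict to CSS:    div#main-content
Only links found inside the selected region would be followed; the id shown is a placeholder.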
Configure any additional content or custom search for a page to be crawled and stored for analysis.
Content extraction: If there is additional content that should be crawled beyond the standard HTML elements (Title, Meta Description, H1, H2), input it here. Additional content elements can be specified via the Content Extraction button. If the same custom content element exists multiple times on a page, only the first instance will be retrieved.
XPATH: Enter the specific XPath found on pages that you would like the crawler to retrieve and analyze. Make sure to specify an XPath that uniquely identifies the content you want to retrieve.
CSS: Enter the specific CSS selector found on pages that you would like the crawler to retrieve and analyze. Make sure to specify a CSS selector that uniquely identifies the content you want to retrieve.
DIV_ID: Enter the specific Div ID found on pages that you would like the crawler to retrieve and analyze. Make sure to specify a Div ID that uniquely identifies the content you want to retrieve.
DIV_CLASS: Enter the specific Div class found on pages that you would like the crawler to retrieve and analyze. Make sure to specify a Div class that uniquely identifies the content you want to retrieve.
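As hypothetical examples (each selector below is a placeholder for an element on your own pages), the extraction types might be entered as:
    XPATH:      //span[@itemprop='price']
    CSS:        span.product-price
    DIV_ID:     breadcrumbs
    DIV_CLASS:  author-bio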
Content Match: This section allows you to capture and display pages in the Custom Search tab in Site Health based on the input entered here. There are 3 options (examples follow below):
Contains: This captures the pages that match the string entered, along with the count of occurrences per page.
Does Not Contain: This returns the pages that do not contain the string entered.
Regex: This returns the pages and count of occurrences based on the regex pattern entered here.
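As a hypothetical illustration, to track references to an old script across the site you might enter:
    Contains:          analytics-old.js
    Does Not Contain:  analytics-new.js
    Regex:             analytics-v[0-9]+\.js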
Start Audit tab
Choose when to start the Audit.
Frequency: Choose to run a one-time crawl or a Weekly Recurring, Bi-Weekly Recurring, or Monthly Recurring crawl. Choosing a recurring crawl allows you to select how many months you want to schedule the recurring crawls for.
Launch Crawl: Select when the crawl should launch.
Start Now: The crawl queues up and begins crawling shortly after Start Site Audit is selected.
Start Later: Use this option to schedule a crawl to start at a later date and time.
Start Date: Select the date and time to schedule the crawl.
Schedule Interval: Scheduling allows full control over the hours of the day and the days of the week that the crawler should run. This can be used to ensure that crawl activity takes place only during off-peak hours or during times of low server load. This takes precedence over the Launch Crawl settings.
Pausing Crawls: Crawls can be temporarily paused. After 7 days, paused crawls are automatically stopped.