How do recurring crawls work?

The Recurring Crawl mechanism is designed to build a trend of crawl data so you can run and compare crawls within a project. Site Audits lets you schedule crawls to run Weekly, Bi-Weekly, or Monthly, for a duration of 1 to 6 months. A Recurring Crawl can be set up by updating the Frequency while creating a new crawl via the Start Audit tab, or for any existing project by updating its crawl settings.
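Purely as an illustration (the field names below are hypothetical, not the platform's API), the scheduling options above can be thought of as a small configuration record:

```python
# Hypothetical illustration only; these keys are not a seoClarity API,
# they just mirror the scheduling options described above.
recurring_crawl_settings = {
    "frequency": "weekly",                   # "weekly", "bi-weekly", or "monthly"
    "duration_months": 3,                    # the recurrence can run for 1 to 6 months
    "discovery": "discover_and_crawl_new",   # see the Discovery options below
}
```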

Discovery
When setting up a crawl that starts from a "Starting URL", the discovery options are as follows:
- "Discover and crawl new pages with every recurring crawl" - This tells the crawler to crawl the 200 OK pages from the project's last successful crawl along with any new pages it finds.
- "Crawl only pages discovered in the last successful crawl" - This tells the crawler to crawl ONLY the 200 OK pages of the last successful crawl.
- "Run a fresh crawl with every recurring crawl" - As the option suggests, this tells the crawler to start a fresh crawl with every recurring crawl in the project.

For Sitemap or CSV crawls, the fresh crawl option is replaced with "Crawl Sitemap(s)/RSS/Uploaded CSV with every recurring crawl". This option makes every crawl within the project re-crawl the CSV/sitemap that was originally uploaded. The differences between these discovery modes are sketched below.
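The sketch below is not the platform's implementation; the mode names and data structures are assumptions, used only to show how the seed set for a recurring crawl could be chosen under each option:

```python
# Illustrative sketch of how the seed set for a recurring crawl could be
# chosen per discovery mode. The mode names and data structures are
# hypothetical, not seoClarity internals.
from typing import Iterable, List


def select_seeds(mode: str,
                 starting_url: str,
                 last_crawl_ok_pages: Iterable[str],
                 uploaded_list: Iterable[str]) -> List[str]:
    """Return the URLs the next recurring crawl starts from."""
    if mode == "discover_and_crawl_new":
        # Re-crawl every 200 OK page of the last successful crawl; new
        # pages found on them are added to the crawl as it runs.
        return list(last_crawl_ok_pages)
    if mode == "only_last_successful_crawl":
        # Same seed set, but ONLY these pages are fetched; link discovery
        # does not add new URLs to this crawl.
        return list(last_crawl_ok_pages)
    if mode == "fresh_crawl":
        # Ignore earlier crawls and start over from the Starting URL.
        return [starting_url]
    if mode == "recrawl_uploaded_list":
        # Sitemap/RSS/CSV projects: re-crawl the originally uploaded list.
        return list(uploaded_list)
    raise ValueError(f"Unknown discovery mode: {mode}")
```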

Depth Calculation for Crawls from a Starting URL:
The first crawl within a project starts from the specified URL, traverses the site, and stores the depth of each page as it is found. The page specified as the starting URL is depth 0, pages found on the depth 0 page are depth 1, and so on. For recurring crawls that depend on the 200 OK pages of the last successful crawl, the stored depth of each page is used as the starting point for the depth calculation of the next crawl. For example, if Page X has a depth of 5, then any pages found on Page X will have a depth of 6. If a page is found at multiple depths during a crawl, the lowest (min) depth is stored as the depth of the page.
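As a rough sketch (assuming a simple breadth-first style traversal; this is not the crawler's actual code), the depth bookkeeping described above could look like this:

```python
# Minimal sketch of the depth bookkeeping described above. Not the actual
# crawler; get_links() is a hypothetical helper returning the links found
# on a page.
from collections import deque
from typing import Callable, Dict, Iterable


def compute_depths(seeds: Dict[str, int],
                   get_links: Callable[[str], Iterable[str]]) -> Dict[str, int]:
    """Return the minimum depth at which each page is reached.

    `seeds` maps starting URLs to their initial depth: {starting_url: 0}
    for a first crawl, or the stored depths of the last successful crawl
    when a recurring crawl reuses its 200 OK pages.
    """
    depths = dict(seeds)
    queue = deque(seeds.items())
    while queue:
        url, depth = queue.popleft()
        for link in get_links(url):              # pages found on this page
            if link not in depths or depth + 1 < depths[link]:
                depths[link] = depth + 1         # keep the lowest (min) depth
                queue.append((link, depth + 1))
    return depths
```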
Warning
It is recommended to choose the "Run a fresh crawl with every recurring crawl" discovery option if your site's architecture changes frequently or if you change the crawl settings between successive crawls within a project.

Depth Calculation for Crawls set up to start and run from a CSV/Sitemap:
All pages uploaded in a CSV or found in the sitemap are stored as depth 0. Any pages found while crawling the pages from the sitemap/CSV are assigned depth 1, and so on.
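Under the same hypothetical compute_depths sketch above, a Sitemap/CSV crawl would simply seed every uploaded URL at depth 0:

```python
# All uploaded URLs start at depth 0 (example URLs are hypothetical;
# reuses compute_depths and get_links from the sketch above).
uploaded_urls = ["https://example.com/a", "https://example.com/b"]
depths = compute_depths({url: 0 for url in uploaded_urls}, get_links)
# Pages linked from the uploaded URLs come back with depth 1, and so on.
```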

Still have questions? 
We are happy to help. Please reach out to support@seoclarity.net

Related Articles
    • Customize Resources Crawled in JavaScript Crawls
    • How do I request and schedule crawls?
    • Site Audit Projects
    • Site Health: View Details of all Resource Urls rendered in Javascript crawls
    • What type of crawl should I set up, Standard or Javascript?