Crawling your site is a necessity when trying to improve your on-site SEO efforts, and as with everything worthwhile, challenges arise. These challenges are not debilitating, but they do need to be thought through and worked out.
Crawl depth is one of the challenges to consider when planning to crawl a site, starting with an understanding of what depth means. The meaning can differ from person to person and company to company. The depth of a large website is another challenge when a site is crawled. Each website has a different product, a different idea, and a different purpose, which leads to different ways of navigating the website as well as different site sizes.
Breadth Instead of Depth
There are two strategies when it comes to crawling a site: crawl depth and crawl breadth.
A depth-first crawl gathers data from one section until it hits the end. Once it hits the end of the section it is crawling, it goes back to the beginning and starts on another section. Think of a vertical image: depth runs from the surface, or top, down to the bottom. If a site is crawled by depth, it is crawled from the home page to a category, to any sub-category, to any page below that sub-category. Once the crawler hits the bottom, it returns to the top of the site and starts on the next category, all the way to the bottom again.

Breadth is the width across which the crawl data is gathered. Where depth goes from top to bottom, breadth goes from left to right; the website is crawled the way a book is read, left to right, top to bottom. If the site is crawled by breadth, all categories are crawled first, then all sub-categories, and so on.
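To make the two strategies concrete, here is a minimal sketch in Python. The site structure, page names, and crawl function are made up for illustration; the only difference between the two strategies is whether the frontier of unvisited pages behaves as a stack (depth) or a queue (breadth):

```python
from collections import deque

# A made-up site structure: each page maps to the pages it links to.
SITE = {
    "/": ["/shoes", "/shirts"],
    "/shoes": ["/shoes/running", "/shoes/hiking"],
    "/shirts": ["/shirts/casual"],
    "/shoes/running": [],
    "/shoes/hiking": [],
    "/shirts/casual": [],
}

def crawl(start, depth_first=False):
    """Return the order in which pages are visited."""
    frontier = deque([start])
    seen, order = {start}, []
    while frontier:
        # A stack (pop) dives down one section at a time;
        # a queue (popleft) sweeps across each level first.
        page = frontier.pop() if depth_first else frontier.popleft()
        order.append(page)
        for link in SITE[page]:
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return order

print(crawl("/", depth_first=True))   # one category to the bottom, then the next
print(crawl("/", depth_first=False))  # all categories, then all sub-categories
```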

Permutations and Combinations
If the site being crawled has dynamic navigation, there can be many permutations depending on how it is crawled. A permutation is one of several possible ways in which a set or number of things can be ordered or arranged (by breadth or by depth).
With a dynamic website, millions of different combinations are possible, and the larger the website, the greater the potential for more of them. All of these combinations slow the crawler down to a speed that doesn't produce results as quickly as expected. This is a common issue with ecommerce websites that carry a wide range of products, each with a variety of options such as different colors and sizes.
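To see how quickly those combinations add up, here is a back-of-the-envelope sketch; the facet counts are hypothetical, not taken from any real catalog:

```python
# Hypothetical facet counts for an ecommerce catalog.
products = 5_000
colors = 12
sizes = 8
sort_orders = 4

# If each facet value becomes a URL parameter, every combination is a
# distinct crawlable URL, even though the underlying product is the same.
urls = products * colors * sizes * sort_orders
print(f"{urls:,} potential URLs")  # 1,920,000 potential URLs
```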
Crawl Speed & Time
A variety of factors affect the speed of a crawl. Time is always an issue when trying to learn something, and in this case it is necessary to learn the structure and health of a website. As a website becomes larger and more intricate, so does the crawl, and on top of that, the crawl takes more time.
To start, a crawl has three steps that are necessary to complete the entire crawl. First, it has to find the page and identify what it is and what is on it. Once the webpage is on the list, the next step is the Pipeline Stage. The last step is taking each webpage and indexing it into a database.
After the three steps are complete, the crawl has ended and the data becomes available in your seoClarity account.
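As a rough illustration of those three steps, here is a hedged Python sketch. Only the step names come from the description above; the function bodies are placeholders, not seoClarity's actual internals:

```python
# Step names from the article; bodies are illustrative placeholders only.
def discover(url):
    """Step 1: find the page and identify what it is and what is on it."""
    return {"url": url}

def pipeline_stage(page):
    """Step 2: process the discovered page (internals assumed)."""
    page["processed"] = True
    return page

def index(page, database):
    """Step 3: store each processed page in a database."""
    database[page["url"]] = page

database = {}
for url in ["/", "/shoes", "/shirts"]:
    index(pipeline_stage(discover(url)), database)
print(len(database), "pages indexed")  # 3 pages indexed
```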
Exclusions
For the site to be crawled faster, exclusions are necessary. Right away you might think, "excluding page data will not give me the information that I need," but as a matter of fact, it will. Many websites use the same template for a specific type of page. Take a product page, for example: it might be a different product on each page, but the template of each page is the same. With that in mind, is it really necessary to crawl each product with every variation that is available?
There are other kinds of data to exclude to make a crawl faster. Leaving internal traffic and paid options out of crawls is one way to do that. Not only will the crawl be faster than before the exclusions were added, it will also provide the data needed to see how organic search is doing.
Excluding these elements from specific crawls lets you see only the necessary data, in a shorter amount of time than before any exclusions were added to the crawl request.
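Here is one way an exclusion filter might look, as a Python sketch; the rules and URLs are hypothetical examples of variant, internal, and paid pages:

```python
import re

# Hypothetical exclusion rules: skip URL-parameter variants
# and internal or paid paths.
EXCLUDE = [
    re.compile(r"[?&](color|size)="),   # product variants share a template
    re.compile(r"^/internal/"),         # internal traffic
    re.compile(r"[?&]utm_medium=cpc"),  # paid landing-page tagging
]

def should_crawl(url):
    """Return False for any URL matching an exclusion rule."""
    return not any(rule.search(url) for rule in EXCLUDE)

urls = [
    "/shoes/trail-runner",
    "/shoes/trail-runner?color=red&size=10",
    "/internal/reports",
    "/shoes/trail-runner?utm_medium=cpc",
]
print([u for u in urls if should_crawl(u)])  # only the canonical URL survives
```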
Overcoming the Challenges
Understanding the challenges of a crawl is the first step to overcoming them. A large part of that is understanding how long a crawl will take to complete: a site with about a million pages can take a month! Waiting for a crawl to finish can be frustrating, but the frustration dissipates as you learn more about how crawls work.
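The month-long estimate is easy to sanity-check with some rough arithmetic; the crawl rate below is an assumption for illustration, not a measured figure:

```python
# Rough arithmetic behind "a million pages takes about a month",
# assuming a polite crawl rate of one page every 2.5 seconds.
pages = 1_000_000
seconds_per_page = 2.5  # an assumption, not a measured rate

days = pages * seconds_per_page / 86_400  # 86,400 seconds in a day
print(f"{days:.0f} days")  # ~29 days
```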
Some of that knowledge covers breadth and depth, permutations and combinations, and how all of it adds up in time. The next step in overcoming the time challenge is to understand the available solutions. Learning how to exclude parts of the website to shorten the crawl while still gathering the data needed to make informed decisions is one of the best of them.