Site Health: Indexability - How to Check the Indexability of your site

Overview

Search engines find information about your site by crawling it. According to Google: "The web is like an ever-growing library with billions of books and no central filing system. We use software known as web crawlers to discover publicly available webpages. Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers."

But not everything is crawled. Some pages on your site might be blocked from crawling or indexing, which makes them non-indexable. A page can be non-indexable for any of the following reasons:

Blocked due to status: If a page returns a non-2xx response to a search engine, it cannot be crawled and indexed. The response may be 3xx (redirect), 4xx (client error), or 5xx (server error).
Blocked via Robots.txt, Robots Meta, or X-Robots Header: Google's Search Console gives site owners granular choices about how Google crawls their site: they can provide detailed instructions about how to process pages on their sites, can request a re-crawl, or can opt out of crawling altogether using a file called “robots.txt”. Google also obeys instructions not to index a page when the Robots Meta tag is set to noindex or if the X-Robots-Tag in the HTTP response header for the page is set to noindex.
Canonical on the page is pointing to another page: This tells Google that it should not index this page and should index the canonical URL instead.
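
The reasons above can be expressed as a simple set of checks. The following Python sketch is only an illustration of that logic, not how seoClarity evaluates pages: the function name, its parameters, and the reason labels are all hypothetical, and the canonical comparison (ignoring only a trailing slash) is a simplifying assumption.

```python
from typing import List, Optional


def non_indexable_reasons(
    url: str,
    status_code: int,
    allowed_by_robots_txt: bool = True,
    meta_robots: str = "",
    x_robots_tag: str = "",
    canonical_url: Optional[str] = None,
) -> List[str]:
    """Return the reasons a page is non-indexable (hypothetical labels).

    An empty list means none of the checks described above applied.
    """
    reasons = []
    # Any non-2xx response (3xx, 4xx, 5xx) blocks indexing.
    if not 200 <= status_code < 300:
        reasons.append(f"status {status_code // 100}xx")
    # The URL is disallowed by robots.txt.
    if not allowed_by_robots_txt:
        reasons.append("robots.txt")
    # The robots meta tag contains noindex.
    if "noindex" in meta_robots.lower():
        reasons.append("robots meta")
    # The X-Robots-Tag response header contains noindex.
    if "noindex" in x_robots_tag.lower():
        reasons.append("x-robots header")
    # The canonical points to a different URL (assumption: only a
    # trailing slash is ignored when comparing).
    if canonical_url and canonical_url.rstrip("/") != url.rstrip("/"):
        reasons.append("canonical")
    return reasons
```

For example, a page that returns 200 but carries a noindex meta tag and a canonical pointing elsewhere would be reported with two reasons, "robots meta" and "canonical".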

Checking Indexability in seoClarity

The Indexability tab in seoClarity Site Health is designed to help you audit the indexability of your site in a single view. The summary boxes show the count of pages found to be Indexable and Non-Indexable. For non-indexable pages, the summary boxes also show how many pages are blocked for each reason: response status, Robots.txt, Robots Meta, Robots Header, or a canonical that points to another page. Select any of these numbers to filter the non-indexable pages in the table below by that specific blocked reason for more granular analysis.
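
If you export crawl data and want to reproduce this kind of summary yourself, the counting and filtering can be sketched in a few lines of Python. This is an assumption-laden illustration only: the data layout (a mapping from URL to a list of blocked reasons), the reason labels, and the function names are hypothetical and do not reflect a seoClarity export format.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical crawl result: URL -> blocked reasons (empty list = indexable).
SAMPLE_CRAWL: Dict[str, List[str]] = {
    "https://example.com/": [],
    "https://example.com/old": ["status 3xx"],
    "https://example.com/private": ["robots.txt", "robots meta"],
    "https://example.com/dup": ["canonical"],
}


def summarize(crawl: Dict[str, List[str]]) -> Counter:
    """Count indexable and non-indexable pages, plus pages per reason."""
    counts: Counter = Counter()
    for reasons in crawl.values():
        counts["indexable" if not reasons else "non_indexable"] += 1
        # A page blocked for several reasons is counted once per reason.
        counts.update(set(reasons))
    return counts


def filter_by_reason(crawl: Dict[str, List[str]], reason: str) -> List[str]:
    """Return the URLs blocked for one reason, like selecting a summary number."""
    return sorted(url for url, reasons in crawl.items() if reason in reasons)
```

Note that a page blocked for more than one reason appears under each of them, so the per-reason counts can add up to more than the non-indexable total.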

Indexability Summary Box

Indexable Pages: Displays a count of URLs that are indexable by search engines. Clicking on this takes you to the details tab filtered by the indexable pages found in a crawl.

Non Indexable Pages: Displays a count of URLs that are not indexable by search engines. Clicking on this filters Site Health by the non-indexable pages found in a crawl.

Error Reasons: This displays the count of pages that returned a non-2xx status during the crawl, grouped by status class. 3xx means redirection, 4xx means client error, and 5xx means server error.

Blocked Reasons: This displays the count of blocked reasons found during the crawl.

By Robots.txt: This indicates the count of pages that are disallowed by Robots.txt.

By Robots Meta Tag: This indicates the count of pages that are blocked by the Robots Meta Tag on the page.

By X-Robots Header: This indicates the count of pages that are blocked by an X-Robots header in the HTTP response for the page.

Canonical: This indicates the count of pages that are not indexable because of a canonical on the page pointing to another page.
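
To spot-check a single page against the three robots mechanisms listed above, a minimal Python sketch using only the standard library might look like this. The function and class names are hypothetical, the checks look only for the noindex directive, and robots.txt matching follows the behavior of Python's urllib.robotparser, which may differ in edge cases from how a given search engine interprets the file.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.robotparser import RobotFileParser


def disallowed_by_robots_txt(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """True if the robots.txt content disallows the URL for the user agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives += [d.strip().lower() for d in content.split(",") if d.strip()]


def blocked_by_robots_meta(html: str) -> bool:
    """True if the page's robots meta tag contains noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def blocked_by_x_robots_header(headers: Dict[str, str]) -> bool:
    """True if the X-Robots-Tag response header contains noindex."""
    value = next((v for k, v in headers.items() if k.lower() == "x-robots-tag"), "")
    return "noindex" in value.lower()
```

The header lookup is case-insensitive because HTTP header names are not case-sensitive; the robots.txt text, HTML, and headers are passed in as plain values so the sketch does not depend on any particular HTTP client.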

Indexability by Depth Summary Box

This bar chart shows the number of pages found to be Indexable or Blocked at each depth of the crawl. Since a search engine's crawler traverses a site by following links, the chart is a good way to see how quickly pages can be found on your site. Pages that can only be reached at greater depths may be missed by the search engine crawler.
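
As an illustration of what depth means here, the sketch below computes the depth of each page with a breadth-first traversal of a link graph, assuming depth is the smallest number of link hops from the start page. That definition, the function name, and the sample link graph are assumptions for illustration; the article does not specify how seoClarity calculates depth.

```python
from collections import deque
from typing import Dict, List


def crawl_depths(links: Dict[str, List[str]], start: str) -> Dict[str, int]:
    """Return the minimum number of link hops from start to each reachable page."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            # The first time a page is seen is via its shortest path.
            if target not in depths:
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths


# Hypothetical internal link graph: page -> pages it links to.
SAMPLE_LINKS = {
    "/": ["/products", "/about"],
    "/products": ["/products/widget"],
    "/products/widget": ["/products/widget/specs", "/"],
    "/orphan": ["/"],
}
```

In this sample, "/products/widget/specs" is three hops from the home page, and "/orphan" never appears in the result because no crawled page links to it.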

Non Indexable Pages Table

The table on the Indexability tab contains the details of all Non Indexable pages found in the crawl. It gives a detailed, URL-level view of why each page is blocked from crawling or indexing. The columns of the table are described below.

Title/URL: Displays the URL and Title of the page.

Status Code: Displays the status code found for that page on the date of the crawl.

Blocked by Robots Meta Tag: Displays Yes if the robots meta tag found on the page is set to noindex.

Blocked by Robots.txt: Displays Yes if the URL is disallowed by Robots.txt.

Blocked by X-Robots header: Displays Yes if the X-Robots header directive for the URL is noindex.

Robots Meta Tag Value: Displays the value of the robots meta tag where available.

Canonical Type: Displays the canonical URL for the page.

