Site Health: Indexability - How to Check the Indexability of your site

Overview


Search engines find information about your site by crawling it. According to Google: "The web is like an ever-growing library with billions of books and no central filing system. We use software known as web crawlers to discover publicly available webpages. Crawlers look at webpages and follow links on those pages, much like you would if you were browsing content on the web. They go from link to link and bring data about those webpages back to Google’s servers."

But not everything is crawled. Some pages on your site might be blocked from crawling, which makes them non-indexable. Why might a page be non-indexable?

  1. Blocked due to status: If a page returns a non-2xx response to a search engine, it cannot be crawled and indexed. The response may be 3xx (redirect), 4xx (client error), or 5xx (server error).
  2. Blocked via Robots.txt, Robots Meta, or X-Robots Header: Google's Search Console gives site owners granular choices about how Google crawls their site: they can provide detailed instructions about how to process pages on their sites, can request a re-crawl, or can opt out of crawling altogether using a file called “robots.txt”. Google also obeys instructions not to crawl and index a page when the Robots Meta tag is set to noindex or when the X-Robots-Tag in the HTTP response headers for the page is set to noindex.
  3. Canonical on the page is pointing to another page: This tells Google that it shouldn't index the page but should instead index the canonical URL.
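The three checks above can be sketched in code. This is a minimal illustration, not seoClarity's actual logic; the function name and input fields are assumptions, and the inputs represent data already gathered during a crawl.

```python
# Classify a crawled page's indexability from its HTTP status, response
# headers, robots meta value, and canonical URL. Illustrative sketch only.

def classify_indexability(url, status, headers, meta_robots=None, canonical=None):
    """Return the reason a page is non-indexable, or 'indexable'."""
    # Reason 1: non-2xx responses cannot be indexed.
    if not (200 <= status < 300):
        if 300 <= status < 400:
            return "blocked: 3xx redirect"
        if 400 <= status < 500:
            return "blocked: 4xx client error"
        return "blocked: 5xx server error"
    # Reason 2: noindex via X-Robots-Tag header or robots meta tag.
    if "noindex" in headers.get("X-Robots-Tag", "").lower():
        return "blocked: X-Robots-Tag noindex"
    if meta_robots and "noindex" in meta_robots.lower():
        return "blocked: robots meta noindex"
    # Reason 3: canonical pointing at a different URL.
    if canonical and canonical.rstrip("/") != url.rstrip("/"):
        return "canonicalized to another page"
    return "indexable"
```

For example, a page returning 200 with `X-Robots-Tag: noindex` would be classified as blocked by the header, while a 200 page whose canonical points elsewhere would be reported as canonicalized.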

Checking Indexability in seoClarity


The Indexability tab in seoClarity Site Health is designed to help you audit the indexability of your site in a single view. The summary boxes show the count of pages found to be Indexable and Non-Indexable. For non-indexable pages, the summary boxes break down how many were blocked due to response status, via Robots.txt, Robots Meta, or X-Robots Header, or because the page is canonicalized to another page. Each of these numbers can be selected to filter the table of non-indexable pages below by a specific blocked reason for more granular analysis.

Indexability Summary Box


Indexable Pages: Displays a count of URLs that are indexable by search engines. Clicking on this takes you to the details tab filtered by the indexable pages found in a crawl.

Non Indexable Pages: Displays a count of URLs that are not indexable by search engines. Clicking on this filters Site Health by the non-indexable pages found in a crawl.

Error Reasons: This displays the count of error reasons found during the crawl. 3xx indicates a redirection, 4xx a client error, and 5xx a server error.

Blocked Reasons: This displays the count of blocked reasons found during the crawl.
      By Robots.txt: This indicates the count of pages that are disallowed by Robots.txt
      By Robots Meta Tag: This indicates the count of pages that are blocked by the Robots Meta Tag on the page.
      By X-Robots Header: This indicates the count of pages that are blocked by an X-Robots Header on the page.
Canonical: This indicates the count of pages that are not indexable because of a canonical on the page pointing to another page. 
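The "By Robots.txt" reason covers pages disallowed by the site's robots.txt rules. Python's standard library includes a robots.txt parser, which can be used to sketch how such a check works; the rules and URLs below are made-up examples.

```python
# Check whether URLs are disallowed by robots.txt rules using the
# stdlib parser. Rules and URLs here are illustrative examples.

from urllib.robotparser import RobotFileParser

rp = RobotFileParser()
rp.parse([
    "User-agent: *",
    "Disallow: /private/",
    "Disallow: /tmp/",
])

# can_fetch() returns False for paths matched by a Disallow rule.
allowed = rp.can_fetch("*", "https://example.com/products/")   # True
blocked = rp.can_fetch("*", "https://example.com/private/x")   # False
```

A crawler would flag any URL for which `can_fetch()` returns False under the "By Robots.txt" blocked reason.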

Indexability by Depth Summary Box



This bar chart shows the pages found to be Indexable or Blocked, broken down by the depth at which they were discovered. Since a search engine's crawler traverses a site by following links, this view shows how quickly pages can be found on your site. Pages that sit at higher depths and are difficult to reach may be missed by the search engine crawler entirely.
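Crawl depth here means the minimum number of clicks from the starting page. A hedged sketch of how it can be computed is a breadth-first traversal over the site's link graph; the graph below is a made-up example, not real crawl data.

```python
# Compute each page's crawl depth via breadth-first search from the
# homepage. The link graph is an illustrative example.

from collections import deque

def crawl_depths(links, start):
    """Map each reachable page to its minimum click depth from `start`."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in depths:          # first visit = shortest path
                depths[target] = depths[page] + 1
                queue.append(target)
    return depths

links = {
    "/": ["/category", "/about"],
    "/category": ["/product-1", "/product-2"],
    "/product-2": ["/deep-page"],
}
depths = crawl_depths(links, "/")  # "/deep-page" ends up at depth 3
```

In this toy graph, "/deep-page" is three clicks from the homepage; in a real audit, pages at that depth or deeper are the ones most at risk of being missed.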
 
Non Indexable Pages Table 



The table on the Indexability tab contains the details of all non-indexable pages found in the crawl. This is useful for getting a detailed, URL-level view of why a page is blocked from crawling. Below is what each column of the table contains.


Title/URL: Displays the URL and Title of the page. 

Status Code: Displays the status code found for that page on the date of the crawl.

Blocked by Robots Meta Tag: Displays Yes if the robots directive found on the URL is noindex. 

Blocked by Robots.txt: Displays Yes if the URL is disallowed by the site's Robots.txt file.

Blocked by X-Robots header: Displays Yes if the X-Robots header directive for the URL is noindex. 

Robots Meta Tag Value: Displays the value of the robots meta tag where available.  

Canonical Type: Displays the canonical URL for the page.
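The "Robots Meta Tag Value" and canonical columns come from the page's HTML. A sketch of how those values can be extracted with Python's standard-library HTML parser is shown below; the class name and the sample HTML are illustrative assumptions, not seoClarity's implementation.

```python
# Extract the robots meta tag value and canonical URL from a page's
# HTML head. Toy example using the stdlib parser.

from html.parser import HTMLParser

class RobotsCanonicalParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.robots = None      # content of <meta name="robots">
        self.canonical = None   # href of <link rel="canonical">

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("name", "").lower() == "robots":
            self.robots = a.get("content")
        if tag == "link" and a.get("rel", "").lower() == "canonical":
            self.canonical = a.get("href")

html = ('<head><meta name="robots" content="noindex,follow">'
        '<link rel="canonical" href="https://example.com/main"></head>')
p = RobotsCanonicalParser()
p.feed(html)
# p.robots  -> "noindex,follow"
# p.canonical -> "https://example.com/main"
```

For this sample page, the table would show "Blocked by Robots Meta Tag: Yes" with a Robots Meta Tag Value of "noindex,follow", and the canonical column would point at the other URL.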


    • Related Articles

    • Site Audit Details
    • Setting up a Site Audit
    • Site Audit Projects
    • Site Health: Hreflang Audit
    • Site Audit Settings