Introduction
Status codes are responses issued by a server to a client's request. A status code is a three-digit number, the first digit of which defines the class of the response.
Standard Status Codes
The HTTP response status codes can be broken down into the following categories:
1XX - Informational - Request was received and is still being processed.
2XX - Success - Request was successful. Client's request was received, understood and accepted by the server.
3XX - Redirection - Request was received by the server but further steps are needed from the client's end to successfully process the request.
4XX - Client Error - Request was received by the server but cannot be processed. Typically means that the request from the client was incorrect, had bad syntax, or cannot be fulfilled.
5XX - Server Error - Request from the client was valid but the server failed to fulfill it.
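As a minimal sketch of the classification above, the first digit of a code can be mapped to its category. The names `STANDARD_CLASSES` and `classify_status` are illustrative assumptions and are not part of any crawler API.

```python
# Hypothetical helper: maps a standard HTTP status code to its category.
# Category names are taken from the list above.
STANDARD_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def classify_status(code: int) -> str:
    """Return the category of a three-digit status code from its first digit."""
    if not 100 <= code <= 999:
        raise ValueError(f"not a three-digit status code: {code}")
    # Integer division by 100 isolates the first digit, e.g. 404 // 100 == 4.
    # Any first digit outside 1-5 is not one of the standard classes.
    return STANDARD_CLASSES.get(code // 100, "Unknown")


if __name__ == "__main__":
    for sample in (200, 301, 404, 503):
        print(sample, classify_status(sample))
```

Keeping the categories in a dictionary keyed by the first digit means a new class can be added in one place without touching the lookup logic.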
Custom Status Codes
There is, however, one more class of status code you may encounter when our crawler traverses your site: errors received from the site that do not fall within the standard responses. These are captured and reported using a custom status code, which we classify under the 9XX category.
9XX - Custom - Shows a custom response received when crawling the client site.
The following custom status codes may be shown as a crawl status:
900 - Max Page Size Exceeded - Request failed because the size of the HTML was above 8 MB.
901 - Unsupported Content Type - Crawler received a Content-Type other than text/html. For example, a content type of application/pdf would produce this status.
940 - Bad Request Error - The request was invalid due to malformed input.
980 - System Timeout Error - The system exceeded the allotted time for processing the request.
981 - Dependency Failure - An upstream or downstream service dependency failed.
982 - Site Unreachable - The target site could not be reached.
983 - Malformed Site Response - The response from the target site was not well-formed or parseable.
984 - Navigation Failure - Navigation blocked by client site.
985 - Resource Limit Exceeded - The resource quota or limits were exceeded.
996 - Blocked by robots.txt - The page could not be crawled due to settings in the robots.txt file. The client needs to review the robots.txt settings.
998 - JavaScript Crawl Timeout - Page did not finish loading within its assigned JavaScript timeout.
999 - Http Fetch Failed - Page could not be fetched or the request timed out.
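The custom codes listed above can be kept in a lookup table so that crawl reports show a readable label next to each code. This is only a sketch: the names `CUSTOM_STATUS_CODES` and `describe_crawl_status` are hypothetical, the labels are copied from the list above, and the fallback messages are assumptions.

```python
# Hypothetical lookup table; codes and labels are copied from the list above.
CUSTOM_STATUS_CODES = {
    900: "Max Page Size Exceeded",
    901: "Unsupported Content Type",
    940: "Bad Request Error",
    980: "System Timeout Error",
    981: "Dependency Failure",
    982: "Site Unreachable",
    983: "Malformed Site Response",
    984: "Navigation Failure",
    985: "Resource Limit Exceeded",
    996: "Blocked by robots.txt",
    998: "JavaScript Crawl Timeout",
    999: "Http Fetch Failed",
}


def describe_crawl_status(code: int) -> str:
    """Return a readable label for a crawl status code."""
    if code in CUSTOM_STATUS_CODES:
        return f"{code} - {CUSTOM_STATUS_CODES[code]}"
    if 900 <= code <= 999:
        # In the 9XX range but not in the documented list (assumed fallback).
        return f"{code} - Unrecognized custom status"
    # Anything else is treated as a standard HTTP status (assumed fallback).
    return f"{code} - Standard HTTP status"


if __name__ == "__main__":
    for sample in (900, 996, 950, 404):
        print(describe_crawl_status(sample))
```

A report can call `describe_crawl_status` on each crawled URL's status to separate crawler-specific failures (9XX) from responses returned by the site itself.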