Bot Server Log Integration
Bot Server Log Integration Overview
Bot Clarity and AI Discovery work by processing the log files from your servers, which we then summarize and display in the platform. You can integrate bot data or log files from the server you have set up to monitor bot log activity. A number of effective tools can export bot logs for this purpose, including but not limited to Akamai, IIS, Logentries, Logsearch, Logz, and Splunk. This document outlines the items that need to be included in the file for integration, regardless of the export tool.
Background & Requirements for Bot Server Log Integration
Server logs that can be provided on a recurring basis are needed for bot log integration. The Bot Clarity feature will be activated for profiles at the time of integration.
Log files will need to be provided on a daily basis. Personally identifiable user information (PII) should be removed from this file prior to delivery to seoClarity. The data that is necessary for integration can be seen below. Extraneous data and PII should be excluded as they aren't necessary for integration. Attached is an example file with several rows of mock data.
Required Data: The date, URL, response code, and user agent/bot name should be included in the file.
File Format: We accept most file formats (as long as they include the required data outlined below), including but not limited to W3C (Microsoft IIS and Amazon CloudFront), Apache (NGINX), and ELB (Amazon). We support a single daily file integration, as merging data opens up the possibility of issues related to custom logic for handling duplicate rows and debugging discrepancies. If you use a load-balanced setup with multiple servers, the data needs to be aggregated and combined into one file prior to sending it to the SFTP.
File Name: The file name should include the domain, the date of export, and a discernible term such as "bot" or "log" for identification purposes where possible.
File Size: There is no maximum file size; however, the data we process and store will not exceed 1 GB of uncompressed data per day. If the data exceeds that amount, each additional GB of data will be charged at the rate of $100 per GB. *Monthly Bot Log cost is based on the subscription package level, not the individual domain profile. 0-30 GB is included with Pro Packages, and 31-60 GB would be an additional charge of $100 for the month.
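The File Format and File Name requirements above can be sketched as follows for a load-balanced setup. This is only an illustration, not seoClarity tooling: the function names, the `.log` extension, and the `<domain>_bot_log_<date>` pattern are assumptions, and any name containing the domain, export date, and a term such as "bot" or "log" should serve equally well.

```python
from datetime import date
from pathlib import Path


def daily_file_name(domain: str, export_date: date) -> str:
    """Build a name containing the domain, the export date, and 'bot'/'log'."""
    # Hypothetical naming pattern chosen for this example only.
    return f"{domain}_bot_log_{export_date:%Y-%m-%d}.log"


def combine_logs(server_logs: list[Path], out_dir: Path,
                 domain: str, export_date: date) -> Path:
    """Concatenate per-server log files into one daily file."""
    out_path = out_dir / daily_file_name(domain, export_date)
    with out_path.open("w", encoding="utf-8") as out:
        for log in sorted(server_logs):
            with log.open("r", encoding="utf-8", errors="replace") as src:
                for line in src:
                    # Make sure the last line of one file does not fuse
                    # with the first line of the next.
                    out.write(line if line.endswith("\n") else line + "\n")
    return out_path
```

The sketch copies lines as they are. Formats that repeat header lines per file (for example W3C `#Fields` directives) and the removal of PII would need extra handling before delivery.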
Bot Server Log Delivery Method
Cloud Provider | Details |
S3 | seoClarity is capable of retrieving Bot Logs from your S3. Your S3 bucket can be shared with the below user: arn:aws:iam::397485469449:user/seoclarity_dev |
Microsoft Azure | Send logs to an Azure Blob Storage container and provide the following details: Storage Account URL, SAS, SAS URL, Container Name. Send these details to support@seoclarity.net and we will follow up with next steps in the integration process. |
Bot Server Log Data Requirements
Required Server Log Data
- Date: The file name should include this.
- URL: In the event your bot logs do not contain a fully qualified URL, Bot Clarity can append the domain name to your logs. Please note that this will affect all entries, including those that already have a fully qualified URL. Additionally, only one domain name can be appended per server log file. If the domain name cannot be provided and multiple profiles are being integrated within Bot Clarity, we recommend separating your bot logs by domain profile.
- URL Protocol: This can be included in the URL and does not have to be separate.
- Response Code: The server response code returned to the bot at the time of the crawl.
- User Agent/Bot Name: The specific user agent or bot name.
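As an illustration of the required fields listed above, the following sketch pulls the date, URL, response code, and user agent out of a single log line. It assumes the Apache/NGINX "combined" log format; other formats (W3C, ELB) would need a different pattern. The function name, the output keys, and the `base_url` default are hypothetical. The client IP and referer are dropped here, in line with the guidance to exclude extraneous data.

```python
import re
from datetime import datetime
from typing import Optional

# Apache/NGINX "combined" format (an assumption about the input):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)(?: [^"]*)?" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def extract_required_fields(
    line: str, base_url: str = "https://www.example.com"
) -> Optional[dict]:
    """Return only the fields needed for integration, or None if unparsable."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    path = m["path"]
    # Prefix the protocol and domain only when the entry is not already
    # a fully qualified URL (base_url is a placeholder).
    url = path if path.startswith(("http://", "https://")) else base_url + path
    return {
        "date": ts.date().isoformat(),
        "url": url,
        "response_code": int(m["status"]),
        "user_agent": m["agent"],
    }
```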
Optional Server Log Data
- IP Address for search engine bots only (Optional)
Example URL Patterns to Exclude
These URL strings are typically filtered out during the integration process.
- Does Not Contain /api/
- Does Not Contain graphql
- Does Not Contain .js
- Does Not Contain trvl-px
- Does Not Contain .php
- Does Not Contain /botOrNot/
- Does Not Contain /cgp/simple/
- Does Not Contain /gdpr
- RegExp Not Match \/v\d\/
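The exclusion rules above could be applied before export with a small filter like the one below. This is a sketch under assumptions rather than the platform's own logic: matching is treated as case-sensitive, and the names `EXCLUDED_SUBSTRINGS`, `EXCLUDED_PATTERN`, and `keep_url` are made up for the example.

```python
import re

# "Does Not Contain" rules from the list above.
EXCLUDED_SUBSTRINGS = (
    "/api/", "graphql", ".js", "trvl-px", ".php",
    "/botOrNot/", "/cgp/simple/", "/gdpr",
)
# "RegExp Not Match" rule: path segments such as /v1/ or /v2/.
EXCLUDED_PATTERN = re.compile(r"/v\d/")


def keep_url(url: str) -> bool:
    """Return True when the URL passes every exclusion rule."""
    # A plain substring test means ".js" also excludes ".json" and ".jsp".
    if any(token in url for token in EXCLUDED_SUBSTRINGS):
        return False
    return EXCLUDED_PATTERN.search(url) is None
```

For example, `keep_url("https://www.example.com/api/items")` returns `False`, while a regular page URL such as `https://www.example.com/products/widget` is kept.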
Search and AI Engine Bots in Bot Clarity
The user-agent strings listed below are taken from the bot owner’s documentation. Because version numbers can change over time, we recommend filtering using a “contains” match rather than an exact string.
- Googlebot
- bingbot
- Baiduspider
- GPTBot
- PerplexityBot
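The recommended "contains" match can be sketched as below. The token list mirrors the bots named above. The function name `match_bot` and the case-insensitive comparison are assumptions made for this example.

```python
from typing import Optional

# Tokens taken from the list above; version numbers are deliberately omitted.
BOT_TOKENS = ("Googlebot", "bingbot", "Baiduspider", "GPTBot", "PerplexityBot")


def match_bot(user_agent: str) -> Optional[str]:
    """Return the first bot token contained in the user agent, else None."""
    ua = user_agent.lower()
    for token in BOT_TOKENS:
        # Substring match, so "Googlebot/2.1" and later versions still match.
        if token.lower() in ua:
            return token
    return None
```

With this approach a user agent such as `Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)` resolves to `Googlebot` regardless of the version number it carries.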