
BotSplit - resolve Bot vs. Human visitors

BotSplit first collapses the number of visitors by automatically recognizing domain variations (e.g. crawl-123.msdn.com and crawl-456.msdn.com are treated as a single domain) and mapping them to a single IP address. A series of rules is then applied to distinguish humans from robots. In a typical run, a better than 4:1 reduction in the number of visitors was achieved. Your results may vary. The results would be even more dramatic if we compared the reduction in the number of sessions (IP + time interval) and not just visitors (IP only).
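As a rough illustration of the domain-collapsing idea, here is a minimal Python sketch. It is not BotSplit's actual algorithm: the function name canonical_host, the rule of replacing digit runs with "N", and the assumption that the last two labels form the site's own domain are all hypothetical choices made for this example.

```python
import re

_DIGITS = re.compile(r"\d+")


def canonical_host(host: str) -> str:
    """Collapse numeric variations in a resolved hostname (hypothetical rule)."""
    host = host.strip().lower().rstrip(".")
    labels = host.split(".")
    # Leave raw IPv4 addresses alone; this sketch only handles hostnames.
    if all(label.isdigit() for label in labels):
        return host
    # Assumption: the last two labels are the domain itself and stay untouched.
    head, tail = labels[:-2], labels[-2:]
    head = [_DIGITS.sub("N", label) for label in head]
    return ".".join(head + tail)


def count_visitors(hosts) -> int:
    """Count distinct visitors after collapsing hostname variations."""
    return len({canonical_host(h) for h in hosts})
```

With this rule, crawl-123.msdn.com and crawl-456.msdn.com both become crawl-N.msdn.com, so count_visitors counts them as one visitor.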


BotSplit Concept
BotSplit is a new type of log analysis software that reads log files and emits log files. It is a log file pipe. You can run BotSplit ahead of your existing log analysis software, which then reads the files BotSplit emits.

BotSplit reads a single log file and creates two output files, one containing records of robotic visitors and the other containing records of human visitors. Optionally, BotSplit can rewrite IP addresses into a canonical form so that subsequent analysis is more useful.
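The one-in, two-out pipe described here can be sketched in a few lines of Python. This is an illustration under stated assumptions, not BotSplit's implementation: split_log and its parameters are hypothetical names, the visitor is assumed to be the first space-separated field of each record (as in the common log format), and the set of robot visitors is assumed to have been worked out beforehand.

```python
def split_log(lines, robot_visitors, rewrite=None):
    """Split log records into (robots, humans) by visitor address.

    lines          -- iterable of log records, visitor address first
    robot_visitors -- set of addresses already classified as robots
    rewrite        -- optional function mapping an address to a canonical form
    """
    robots, humans = [], []
    for line in lines:
        if not line.strip():
            continue  # skip blank records
        visitor, sep, rest = line.partition(" ")
        if rewrite is not None:
            # Optional canonical rewrite of the address in the emitted record.
            visitor = rewrite(visitor)
            line = visitor + sep + rest
        target = robots if visitor in robot_visitors else humans
        target.append(line)
    return robots, humans
```

Each returned list can be written to its own file and handed to whatever log analyzer is already in use.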

BotSplit Tests
BotSplit is not a black box. It has a number of adjustable or selectable options. Here are the tests available:

  • compares requested files against a list you can edit such that any access to a listed file marks the visitor as a robot (e.g. robots.txt)
  • compares requested files against a list you can edit such that a visitor whose access is confined solely to that list is a robot (e.g. a file linked from some other site)
  • compares file extensions such that a visitor must access at least one of the listed extensions to qualify as human (e.g. a graphics image)
  • visitor uses version 1.0 of the HTTP protocol
  • IP address morphs (e.g. 111.111.111.xxx where xxx varies)
  • referrer field missing from all requests
  • browser field missing from all requests
Perhaps you can imagine outlier cases where such rules will classify a real human visitor as a robot. Our experience is that these rules are very good at ensuring that whatever remains after applying them is really human. You can deselect any of these rules.
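To make the rules concrete, here is a minimal Python sketch of how most of them could be evaluated for one visitor. Everything in it is an assumption for illustration: the function name robot_reasons, the record keys (path, protocol, referrer, agent), the sample list contents, and the reading of the HTTP 1.0 rule as "any request uses HTTP/1.0". The IP address morph rule is left out because it concerns grouping addresses rather than inspecting requests.

```python
import os

# Hypothetical, user-editable lists.
ALWAYS_ROBOT_FILES = {"/robots.txt"}
ROBOT_ONLY_FILES = {"/linked-from-elsewhere.html"}
HUMAN_EXTENSIONS = {".gif", ".jpg", ".png"}


def robot_reasons(requests, enabled=None):
    """Return the names of the rules that flag this visitor as a robot.

    requests -- list of dicts with keys path, protocol, referrer, agent
    enabled  -- optional set of rule names; None means all rules are selected
    """
    if not requests:
        return []
    paths = [r["path"] for r in requests]
    extensions = {os.path.splitext(p)[1].lower() for p in paths}
    checks = {
        "always_robot_file": any(p in ALWAYS_ROBOT_FILES for p in paths),
        "confined_to_list": all(p in ROBOT_ONLY_FILES for p in paths),
        "no_human_extension": not (extensions & HUMAN_EXTENSIONS),
        "http_1_0": any(r["protocol"] == "HTTP/1.0" for r in requests),
        "no_referrer": all(r["referrer"] in ("", "-") for r in requests),
        "no_agent": all(r["agent"] in ("", "-") for r in requests),
    }
    return [
        name
        for name, hit in checks.items()
        if hit and (enabled is None or name in enabled)
    ]
```

A visitor is treated as human only when the returned list is empty. Passing a smaller enabled set corresponds to deselecting rules.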

Notice that none of these rules relies on how visitors identify themselves in the browser (user agent) field of the log record. The behavioral rules we use correlate well with robot self-identification but do not rely on goodwill.

Multi-site file management
BotSplit also includes functions for managing multi-sites, that is, sites that have multiple domain names in a single ZIP file. LogMap allows you to select a single site, all sites, or any subset of sites from all ZIP files or a subset of ZIP files in a directory. These are then consolidated into a single analysis stream. Batch allows you to integrate and manage ordinary Windows batch files for functions such as file rename, copy, and delete.

Once set up, you may run any task from the command line without needing to invoke the interactive interface.



DAIR Computer Systems
3440 Kenneth Drive
Palo Alto, CA 94303 USA
  tele:  1-650-494-7081
DennisR@dair.com
Since 1999
Copyright © 2010 by
DAIR Computer Systems