Analog web stat configuration information
I'd been putting off getting my Analog stats into better working order. But with Googlebot's latest switch to a Mozilla-style user agent, I found that the user agents falling under Mozilla Compatible had risen to a significant percentage of my overall stats. That category now includes almost all Googlebot visits and all Yahoo! Slurp visits, along with a host of others.

Happily, after googling for a while, I found a very nice page, statistics. I tested some of the settings this person had listed (simply copy and paste the relevant portion into your existing Analog 5 configuration file, then run it), and it works great.

Note: there is no need to remove the line breaks from the example:

:: Code ::
## Make the actual agent names display separately in the Browser Summary; previously most of them were counted as Netscape (compatible):

BROWALIAS "Mozilla/5.0 (compatible; Yahoo! Slurp;\"\
 "Slurp/Yahoo! (compatible; Yahoo! Slurp;\"
BROWREPALIAS "Slurp/Yahoo! (compatible; Yahoo! Slurp;\"\
 "Mozilla/5.0 (compatible; Yahoo! Slurp;\"
# and so on down the lists of different configuration options you'll find on that page

Just copy the data directly into your config file and it will work.

This page also had a useful link to Web Log Customization Files, which provides the following files and explains how to point your config file at them:

:: Quote ::
* SearchEngines.txt
is the latest listing of search engines. While this was built as a search engine list for Analog, it is quite easy to modify for other log analysis programs.
* RefSpam.txt
is a list of domains who are known to spam referral lists. Again configured for analog (REFEXCLUDE commands), this can serve as the basis of other blacklists or lists to prevent referrer spam.
* TypeAlias.txt
is a list of file type aliases, providing a description of many common filetypes.
* RobotInclude.txt
is a list of known search engine robots and other robots.

To use all of these files, simply save them in your analog directory, and add the following to your analog.cfg file:

CONFIGFILE SearchEngines.txt
CONFIGFILE RobotInclude.txt
CONFIGFILE TypeAlias.txt
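To give a rough idea of what the entries in those files look like, here is a hand-written sample in the same Analog command syntax, one line per file type. These lines are illustrative assumptions, not copied from the downloaded files, and the spam domain is a made-up placeholder.

:: Code ::
# Illustrative entries only (not taken from the actual downloaded files)
# SearchEngines.txt style: search engine URL pattern, then its query parameter
SEARCHENGINE http://*google.*/* q
# RobotInclude.txt style: browser names to count as robots
ROBOTINCLUDE Googlebot*
# TypeAlias.txt style: file extension, then a description
TYPEALIAS .html ".html [Hypertext Markup Language]"
# RefSpam.txt style: referrers to drop (placeholder domain)
REFEXCLUDE http://*.spam-domain.example/*

Multiply that by a few thousand lines per file and it becomes clear why the full set is heavy to process.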

Be warned, though: these lists are VERY complete and run to thousands of entries, so you probably don't want to use them routinely, since they will eat up your server resources. They are useful to run once in a while.
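One way to act on that warning, sketched under a couple of assumptions: keep the big lists out of your everyday analog.cfg and put the CONFIGFILE lines in a separate file (called full.cfg here; that name is hypothetical) that you only pull in for the occasional full run. Analog's +g command-line option reads an additional configuration file on top of the default one.

:: Code ::
# full.cfg (hypothetical name): only read for the occasional full run
# Everyday runs use analog.cfg alone and never load these big lists
CONFIGFILE SearchEngines.txt
CONFIGFILE RobotInclude.txt
CONFIGFILE TypeAlias.txt

Routine runs stay as a plain `analog`, and the once-in-a-while full run becomes `analog +gfull.cfg`.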

But what was even more useful to me was realizing that, with CONFIGFILE, I could share common data between all my sites' log config files without having to update each and every one of them every time. Very good stuff to know. I hadn't spent much time searching for solutions before, and the Analog configuration readme is not as clear or as full of examples as it could be.
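Here is a minimal sketch of that sharing setup. The file names (common.cfg, site1.cfg, site2.cfg), log paths, output paths and hostnames are all hypothetical placeholders. The idea is that each per-site config holds only what differs between sites and pulls everything else in with a single CONFIGFILE line.

:: Code ::
# common.cfg (hypothetical): settings shared by every site.
# Paste the BROWALIAS / BROWREPALIAS lines from the statistics page here,
# plus any other aliases or exclusions you want on all sites, e.g.:
REFEXCLUDE http://*.spam-domain.example/*

# site1.cfg (hypothetical): only what is specific to this site
HOSTNAME "Site One"
LOGFILE /var/log/apache/site1-access.log
OUTFILE /var/www/stats/site1.html
CONFIGFILE common.cfg

# site2.cfg (hypothetical): same shared file, different log and output
HOSTNAME "Site Two"
LOGFILE /var/log/apache/site2-access.log
OUTFILE /var/www/stats/site2.html
CONFIGFILE common.cfg

Each site is then run with something like `analog +gsite1.cfg`, and an edit to common.cfg shows up in every site's next report without touching the per-site files.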

Luckily, some people out there have already done the hard part for us. Thanks to those two sites, and to any others I come across.