What Are Log-Based Statistics Programs and How Do They Work?

Web servers, whether Apache (Linux/Unix), IIS (Windows 2003/NT), or another product, are computer programs that simply process streams of data. Each of these transactions can be, and routinely is, logged for the technical staff who build and maintain the site. This information is essential for construction, editing, and maintenance. In economic terms, this is the "MICRO" view: the day-to-day management and administration of the web server. Website managers, webmasters, and others are responsible for this micro level, and also for the "MACRO" side: distribution, marketing, understanding the audience, and planning.

Because these systems already log everything they do, the original method for tracking website activity was simply to read these log files and parse them into something useful. This approach is still the "industry standard" for tracking activity on a website.

Log files are not pretty. To the untrained eye they look like the lines below, with one line per hit on the server (so a million hits produce a million lines).
LINE 1 - - [01/Sep/2006:00:05:11 -0500] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot/0.11 (+
LINE 2 - - [01/Sep/2006:00:06:11 -0500] "GET /bd/faq.php HTTP/1.0" 200 5882 "-" "msnbot/0.11 (+
LINE 3 - - [01/Sep/2006:01:43:10 -0500] "GET /robots.txt HTTP/1.0" 200 53 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp;
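To show what "parsing" a log line involves, here is a minimal Python sketch that splits one line into its fields. It assumes the lines follow Apache's "combined" log format (host, identity, user, timestamp, request, status, bytes, referrer, user agent), which the sample lines above appear to use; the regular expression and the `parse_line` helper are illustrative and not taken from any particular statistics program.

```python
import re
from datetime import datetime

# Apache "combined" log format (assumed):
# host ident authuser [timestamp] "request" status bytes "referer" "user-agent"
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Split one log line into a dict of fields, or return None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    return {
        "host": fields["host"],
        # e.g. 01/Sep/2006:00:05:11 -0500 (month names assume an English/C locale)
        "time": datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": fields["method"],
        "path": fields["path"],
        "status": int(fields["status"]),
        # Apache writes "-" when no body bytes were sent
        "bytes": 0 if fields["size"] == "-" else int(fields["size"]),
        "referer": fields["referer"],
        "agent": fields["agent"],
    }
```

For example, `parse_line('192.0.2.10 - - [01/Sep/2006:00:05:11 -0500] "GET /robots.txt HTTP/1.0" 200 53 "-" "msnbot/0.11"')` returns a dict whose `status` is `200` and whose `path` is `/robots.txt`. Here 192.0.2.10 is a reserved documentation address standing in for the visitor's IP, and the user agent is shortened.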
When you loaded this web page, your browser downloaded (or loaded from its cache) more than 15 graphics to display it correctly. Counting the page itself, you account for at least 16 hits, along with 1 page view, 2 files (downloaded as Java), 1 user session (visit), and 1 unique IP for the one site you viewed. Here is a breakdown of what each of these terms means from a technical standpoint.
Hit: Each line of the log file that records a successful transaction, whether it is for a picture, a page, or a file.
Page View: Counts the number of hits with .htm, .html, .php, .cfm, .asp, or other similar file extensions.
File: Counts the number of hits with .mp3, .wav, .wma, .wmv, .avi, or other similar file extensions.
Users: Counts the number of sessions served by the server. A user session (sometimes referred to as a visit) is the presence of a user with a specific IP address who has not visited the site recently (typically, not within the past 30 minutes). The number of user sessions per day is one measure of how much traffic a website has. A user who visits a site at noon and then again at 3:30 pm counts as two user visits.
Unique Users: A count of the number of unique IP addresses served by the server. This number is easily misread: 2 separate computers behind one corporate firewall will count as 1 unique user. Also, ISPs that offer dial-up service routinely share a handful of IP addresses among many users, further reducing the accuracy of this statistic.
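The definitions above can be turned into counts. The sketch below is a minimal Python illustration, not the method of any specific statistics package. It assumes each log entry has already been reduced to a hypothetical (IP address, timestamp, path, status) tuple, treats only 2xx responses as successful hits, uses the extension lists given above, and starts a new user session when an IP address has been idle for more than 30 minutes. The names `summarize` and `extension` are hypothetical.

```python
import os
from datetime import timedelta

# Extension lists taken from the definitions above; real tools use longer lists.
PAGE_EXTENSIONS = {".htm", ".html", ".php", ".cfm", ".asp"}
FILE_EXTENSIONS = {".mp3", ".wav", ".wma", ".wmv", ".avi"}
# Assumed idle period after which the same IP starts a new session.
SESSION_TIMEOUT = timedelta(minutes=30)


def extension(path):
    """Lower-cased file extension of a request path, ignoring any query string."""
    return os.path.splitext(path.split("?", 1)[0])[1].lower()


def summarize(entries):
    """Count hits, page views, files, user sessions and unique users.

    entries: iterable of (ip, timestamp, path, status) tuples (hypothetical shape).
    """
    hits = page_views = files = sessions = 0
    last_seen = {}  # ip -> time of that IP's most recent successful hit
    for ip, ts, path, status in sorted(entries, key=lambda e: e[1]):
        if not 200 <= status <= 299:
            continue  # only successful transactions count as hits
        hits += 1
        ext = extension(path)
        if ext in PAGE_EXTENSIONS:
            page_views += 1
        elif ext in FILE_EXTENSIONS:
            files += 1
        previous = last_seen.get(ip)
        if previous is None or ts - previous > SESSION_TIMEOUT:
            sessions += 1  # first hit from this IP, or idle longer than the timeout
        last_seen[ip] = ts
    return {
        "hits": hits,
        "page_views": page_views,
        "files": files,
        "user_sessions": sessions,
        "unique_users": len(last_seen),
    }
```

Feeding it a visit at noon and another visit from the same IP address at 3:30 pm yields two user sessions but only one unique user, matching the example in the Users definition. Note that a graphic such as a .gif still counts as a hit even though it is neither a page view nor a file.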
The server records a status "CODE" in the log file describing how it responded to each request. These codes can be broken down into the following categories.
100-199 SRCs provide confirmation that a request was received and is being processed. (silent)
200-299 SRCs report that requests were performed successfully. (silent)
300-399 Request was not performed; a redirection is occurring. (usually silent)
400-499 Request is incomplete for some reason.
500-599 Errors have occurred in the server itself.
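The status-code ranges above correspond to the five classes named in the HTTP specification (informational, success, redirection, client error, server error). A stats program can bucket each logged response by the hundreds digit of its code, as in this minimal Python sketch; `status_class` and `tally_by_class` are hypothetical helper names.

```python
from collections import Counter

# Standard class names for each hundred-range of HTTP status codes.
STATUS_CLASSES = {
    1: "informational",  # 100-199: request received, still being processed
    2: "success",        # 200-299: request performed successfully
    3: "redirection",    # 300-399: not performed here; client is redirected
    4: "client error",   # 400-499: request incomplete or invalid
    5: "server error",   # 500-599: error inside the server itself
}


def status_class(code):
    """Return the category name for an HTTP status code."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]


def tally_by_class(codes):
    """Count how many logged responses fall into each category."""
    return Counter(status_class(code) for code in codes)
```

For example, `tally_by_class([200, 200, 304, 404])` counts two successes, one redirection, and one client error.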

Notice in the log file example above that all 3 lines have a 200 status code (all is well) directly after the quoted GET request. The stats program reads every line of the log, then groups those with positive results (such as 200) to build activity graphs and spreadsheets.
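The grouping step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular package works: the hypothetical `daily_successful_hits` function treats any 2xx status as a "positive result", groups by the calendar date written in the log (ignoring the time-zone offset), and skips lines it cannot read.

```python
import re
from collections import Counter
from datetime import datetime

# Captures the day from "[01/Sep/2006:00:05:11 -0500]" and the status code
# that follows the quoted request, e.g. "GET /robots.txt HTTP/1.0" 200
DAY_AND_STATUS = re.compile(r'\[(\d{2}/\w{3}/\d{4}):[^\]]*\] "[^"]*" (\d{3})')


def daily_successful_hits(lines):
    """Return {date: number of 2xx responses} for an iterable of raw log lines."""
    counts = Counter()
    for line in lines:
        match = DAY_AND_STATUS.search(line)
        if match is None:
            continue  # skip lines that do not look like log entries
        day, status = match.groups()
        if 200 <= int(status) <= 299:
            # Uses the date as written in the log, ignoring the time-zone offset
            counts[datetime.strptime(day, "%d/%b/%Y").date()] += 1
    return dict(sorted(counts.items()))
```

Each date's count could then be drawn as one bar of a daily activity graph or written as one row of a spreadsheet.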

On this page, we have assumed that the information generated by a server log file is perfect, and have explained it as if no outside forces influence the numbers in any way. WE KNOW THIS IS ABSOLUTELY NOT THE CASE!

2007-02-04 19:33:32
