Mastering skills: how to extract useful information from web server logs for analysis

Whether you're sitting at a desktop computer, reading the news on a tablet, or running a website on a server, many different processes are happening in the background of these devices. If an error occurs, or if you simply want to learn more about what a given operating system or program is doing, log files can help. They are written automatically by nearly every application, server, and database system.

In general, log files are rarely read and evaluated; think of them as a virtual black box that is checked only in the most urgent cases. Because of the way they capture data, log files are an excellent source of information about program and system errors, and they are especially good for gathering information about user behavior. This ability to learn more about users makes the technology particularly interesting to website operators, who can obtain useful data from the log files on their web servers.

Web server log analysis

What are log files?

Log files, sometimes called event logs, are usually plain text files. They contain information about all processes that the respective programmers have defined as relevant. A database's log file, for example, records all changes made by correctly executed transactions. If part of the database is lost, for example during a system crash, the log file serves as the basis for restoring the dataset to its correct state.
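
To make the recovery idea concrete, here is a minimal Python sketch of how a transaction log can be replayed to restore a dataset. The key-value "database" and the file name transactions.log are illustrative assumptions; real database systems use far more sophisticated write-ahead logging.

import json

LOG_PATH = "transactions.log"  # hypothetical log file name

def record(transaction):
    # Append the transaction to the log before applying it (write-ahead).
    with open(LOG_PATH, "a") as log:
        log.write(json.dumps(transaction) + "\n")

def apply(transaction, state):
    # Apply a single logged transaction to the in-memory dataset.
    if transaction["op"] == "set":
        state[transaction["key"]] = transaction["value"]
    elif transaction["op"] == "delete":
        state.pop(transaction["key"], None)

def recover():
    # Rebuild the dataset from scratch by replaying the transaction log.
    state = {}
    try:
        with open(LOG_PATH) as log:
            for line in log:
                apply(json.loads(line), state)
    except FileNotFoundError:
        pass  # no log yet, so start with an empty dataset
    return state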

Log files are generated automatically according to how they are programmed; if you are familiar enough with the technical details, you can also create your own. Typically, a line in a log file contains the following information:

Timestamp of the recorded event (e.g. program start), assigning a date and time to the event

Usually, the time is displayed first so that events appear in chronological order.
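
As an illustration, here is a minimal Python sketch of how a program writes such timestamped lines using the standard logging module; the file name app.log and the exact line format are assumptions for the example.

import logging

# Write log lines that begin with a timestamp, followed by a severity
# level and the event message itself.
logging.basicConfig(
    filename="app.log",  # hypothetical log file name
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)

logging.info("Program start")
# Resulting line in app.log, for example:
# 2021-03-18 08:04:22,123 INFO Program start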

Typical use of log files

Operating systems often create multiple log files by assigning different types of processes to fixed categories. Windows, for example, logs information about application events, system events, security-related events, setup events, and forwarded events. This gives administrators a way to find the appropriate log information when troubleshooting; Windows log files also show which users have logged on and off the system (a small reading example follows the list below). Besides the operating system, the following programs and systems collect quite different data:

Background programs, such as email, database, or proxy servers, generate log files primarily to record error messages, events, and other notifications. Among other things, this helps protect data and recover it in the event of a crash.

Installed software, such as office programs, games, instant messengers, firewalls, or virus scanners, saves many different types of data in log files, ranging from configuration changes to chat histories. Records of program crashes are also compiled and used to speed up troubleshooting.

Servers, especially web servers, log relevant network activity; this information contains useful data about users and their behavior within the network. Authorized administrators can also see which users launch applications or request files, when and how often they do so, and which operating system they use. Web server log analysis is one of the oldest methods of web analytics and one of the best examples of the many uses of log files.
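
Returning to the Windows event logs mentioned above, the following sketch shows one way to read the System log from Python. It assumes the third-party pywin32 package is installed and, of course, a Windows machine.

import win32evtlog  # third-party pywin32 package, Windows only

# Open the "System" event log on the local machine.
handle = win32evtlog.OpenEventLog("localhost", "System")
flags = win32evtlog.EVENTLOG_BACKWARDS_READ | win32evtlog.EVENTLOG_SEQUENTIAL_READ

# Read a batch of records, newest first, and print basic details.
for event in win32evtlog.ReadEventLog(handle, flags, 0):
    print(event.TimeGenerated, event.SourceName, event.EventID)

win32evtlog.CloseEventLog(handle)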

Web server log files: a textbook example of the potential of log files

Originally, the log files of a web server (such as Apache or Microsoft IIS) served mainly to record and repair processing errors. However, it quickly became apparent that they contained far more valuable data: information about the availability and popularity of the websites hosted on the server, as well as user data such as:

Time of the page view
Number of page views
Session duration
IP address and hostname of the user
Information about the requesting client (typically a browser)
Search engine used, including the search query
Operating system used

A typical entry in a web server log file looks like this:

85.111.123.12 - - [18/Mar/2021:08:04:22 +0200] "GET /images/logo.jpg HTTP/1.1" 200 512 "http://www.xxxxx.org/" "Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"

Detailed overview of the individual parameters:

The parameter explanations below are provided for reference and analysis; consult your actual website logs for the details.

Meaning | Example value | Explanation
Who? (IP address) | 85.111.123.12 | The IP address of the requesting host
Who? (identity) | - | The RFC 1413 identity of the client, usually unknown
Who? (user) | - | The username, if HTTP authentication has been used; otherwise, as in this example, it remains empty
When? | [18/Mar/2021:08:04:22 +0200] | Timestamp consisting of date, time, and time zone offset
What? | "GET /images/logo.jpg HTTP/1.1" | The event that occurred, in this case an image request over HTTP
Access status? | 200 | The HTTP status code; 200 confirms that the request was successful
How much? | 512 | The amount of data transferred, in bytes (if applicable)
From where? | http://www.xxxxx.org/ | The referrer, i.e. the URL of the page from which the file was requested
By what means? | "Mozilla/5.0 (X11; U; Linux i686; de-DE; rv:1.7.5)" | Technical information about the client: browser, operating system, language, and version
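
To pick these fields out of a raw log line programmatically, a regular expression mirroring the table above is enough for simple cases. The following Python sketch parses the example entry; it is a simplification, not a complete parser for every log configuration.

import re

# One named group per field described in the table above.
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

line = ('85.111.123.12 - - [18/Mar/2021:08:04:22 +0200] '
        '"GET /images/logo.jpg HTTP/1.1" 200 512 '
        '"http://www.xxxxx.org/" "Mozilla/5.0 (X11; U; Linux i686; de-DE;rv:1.7.5)"')

match = LOG_PATTERN.match(line)
if match:
    print(match["ip"])        # 85.111.123.12
    print(match["request"])   # GET /images/logo.jpg HTTP/1.1
    print(match["status"])    # 200
    print(match["referrer"])  # http://www.xxxxx.org/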

To evaluate these floods of information effectively, tools like Webalizer have been developed. They convert the collected data into statistics, tables, and graphs, which can be used to determine website trends, the user-friendliness of individual pages, or relevant keywords and topics.
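
As a rough idea of the kind of aggregation such tools perform, the following self-contained Python sketch counts status codes and the most requested files from a log in the format shown above; the file name access.log is an assumption, and this is nowhere near the actual output of Webalizer.

from collections import Counter

def summarize(log_path):
    statuses, paths = Counter(), Counter()
    with open(log_path) as log:
        for line in log:
            # Splitting on '"' separates the quoted request line and leaves
            # the status code and byte count in the fragment that follows it.
            parts = line.split('"')
            if len(parts) < 3:
                continue  # skip lines that do not match the expected format
            request = parts[1].split()      # e.g. ['GET', '/images/logo.jpg', 'HTTP/1.1']
            status_size = parts[2].split()  # e.g. ['200', '512']
            if len(request) >= 2 and status_size:
                paths[request[1]] += 1
                statuses[status_size[0]] += 1
    return statuses, paths

statuses, paths = summarize("access.log")  # hypothetical log file path
print(statuses.most_common())              # e.g. [('200', 9500), ('404', 120)]
print(paths.most_common(5))                # the five most requested files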

Even though web server log files are still analyzed today, this tried-and-true method has lost some of its former luster due to the growing popularity of web analysis methods such as cookies and page tagging. Factors driving this trend include the error-prone nature of log file analysis when assigning sessions, and the fact that website operators often have no access to the web server's log files. Log file analysis still has its advantages, however: all error reports are registered immediately, and the collected data remains directly within the company.
