[Big Data] Methods for Collecting and Analyzing Website User Behavior Data

Improving a website's usability generally relies on usability engineering methods, whose core is user-centered design (UCD). This article reviews current research, in China and abroad, on methods for collecting and analyzing user behavior data. It describes the characteristics of each method and introduces examples of tools built on them. The aim is to make websites fit users' needs better and keep communication between users and the website smooth.

  As the Internet has grown, websites have multiplied rapidly and competition among commercial websites has become increasingly intense. Along with this growth, many problems in website construction have inevitably appeared. A recent statistical analysis of 15 large foreign websites showed that users found the information they were looking for only 42% of the time, so most of the time they could not find it. As a result, users often feel frustrated while browsing, which seriously undermines their interest in the site and their trust in it. As Jakob Nielsen pointed out: "If you try to find information on a website, it is usually hard to find, and even when you do find it, you get there only after a series of setbacks. Past experience shows that unless the project team specifically considers usability throughout the website design process, the results are often disappointing." Given these characteristics of websites, researchers in China and abroad have proposed many methods that rely on computer-aided, automatic collection and analysis of user behavior data. The rest of this article focuses on two approaches: collecting and analyzing user behavior data from server logs, and collecting and analyzing it on the client. It also introduces tools that have been developed for each approach.

  1 Server-log-based methods for collecting and analyzing user behavior data

  One of the most popular ways to obtain website user behavior data automatically is the server-log-based method, which extracts useful data from the log files generated by the web server. A server log file records the web server's activity. It gives a detailed record of the interaction between clients and the server, including client requests and the server's responses. The format of the data in a log file depends on the type of web server, because different web servers record different information.
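  As an illustration, many web servers can write the NCSA Common Log Format, which is the "common" format in Apache httpd. The following is a minimal sketch that assumes this format and turns one log line into named fields. `LogEntry` and `parse_clf_line` are illustrative names and are not part of any tool mentioned in this article.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# NCSA Common Log Format (assumed here):
# host ident authuser [day/month/year:hh:mm:ss zone] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+)'
)


class LogEntry(NamedTuple):
    host: str
    time: datetime
    method: str
    path: str
    status: int
    size: int


def parse_clf_line(line: str) -> Optional[LogEntry]:
    """Parse one Common Log Format line; return None if it does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    parts = m.group("request").split()
    if len(parts) < 2:
        return None  # malformed or empty request line
    size = 0 if m.group("size") == "-" else int(m.group("size"))
    return LogEntry(
        host=m.group("host"),
        time=datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=parts[0],
        path=parts[1],
        status=int(m.group("status")),
        size=size,
    )


line = '192.0.2.10 - - [10/Oct/2000:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326'
entry = parse_clf_line(line)
print(entry.host, entry.path, entry.status)  # 192.0.2.10 /index.html 200
```

  The record contains the client address, the time the request was received, the requested URL and the response status. It has no information about what happened on the client after the response was sent.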

  1.1 Advantages of server-log-based methods

  Log files can provide valuable data about how a website is used.

  ① Log files are generated automatically by the web server, so collecting them costs relatively little.

  ② Compared with the artificial environment of a usability lab, data from log files better reflects how users behave in their real environment.

  ③ Compared with data gathered from a few test users over a few hours, log files capture the behavior of a large number of users over a long period. This is very favorable for analyzing user behavior, and the data can also be analyzed with data mining techniques.

  ④ Building data analysis tools on top of log files is relatively easy and inexpensive.
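  Advantage ④ can be shown with a minimal sketch. Assuming Common Log Format lines, a few lines of Python are enough to rank pages by the number of successful requests. `top_pages` is a hypothetical helper, and the sample lines are invented for illustration.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Captures the requested path and the status code from a Common Log Format line,
# e.g. ... "GET /index.html HTTP/1.1" 200 512
REQUEST_RE = re.compile(r'"(?:GET|POST|HEAD) (\S+) [^"]*" (\d{3}) ')


def top_pages(lines: Iterable[str], n: int = 3) -> List[Tuple[str, int]]:
    """Count successful (2xx) requests per path and return the n most frequent."""
    counts: Counter = Counter()
    for line in lines:
        m = REQUEST_RE.search(line)
        if m and m.group(2).startswith("2"):
            counts[m.group(1)] += 1
    return counts.most_common(n)


sample = [
    '192.0.2.1 - - [01/May/2019:10:00:00 +0800] "GET /index.html HTTP/1.1" 200 512',
    '192.0.2.2 - - [01/May/2019:10:00:05 +0800] "GET /products.html HTTP/1.1" 200 1024',
    '192.0.2.1 - - [01/May/2019:10:00:09 +0800] "GET /products.html HTTP/1.1" 200 1024',
    '192.0.2.3 - - [01/May/2019:10:00:12 +0800] "GET /missing.html HTTP/1.1" 404 0',
]
print(top_pages(sample))  # [('/products.html', 2), ('/index.html', 1)]
```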

  1.2 Shortcomings of server-log-based methods

  Log-based methods still have many shortcomings for studying website usability. Log files were designed to produce site-level performance statistics, so the data they provide inevitably falls short of the large amount of data that usability analysis needs. For research into potential usability problems, they provide only a small amount of data, and some of it may even be misleading. The reason is that once the web server has sent a requested page to the user, it records nothing about what happens between the user and that page until the user makes another request. The following are examples of data that log files either lack or report in a misleading way.

  ① Who is visiting the site. To know who is accessing the site, the log file must contain a personal ID or a login recorded by the server. However, most sites do not require users to log in. In most cases the only client information in the log file is the client's IP address, and in many cases this is a dynamic address assigned by the Internet service provider. Users also sometimes reach the Internet through a proxy server (for example, on a school's campus network), so it is impossible to tell exactly which user accessed the site.
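  The following sketch uses invented sample records to show why counting visitors by IP address is unreliable. It assumes that the log also records the User-Agent header, as the Combined Log Format does. It compares two common estimates: one visitor per IP address, and one visitor per (IP, User-Agent) pair. Both function names are hypothetical.

```python
from typing import List, Tuple

# Each record: (client_ip, user_agent, path). The User-Agent field is assumed to be
# available, as in the Combined Log Format; the plain Common Log Format lacks it.
Record = Tuple[str, str, str]


def visitors_by_ip(records: List[Record]) -> int:
    """Naive estimate: one visitor per distinct IP address."""
    return len({ip for ip, _, _ in records})


def visitors_by_ip_and_agent(records: List[Record]) -> int:
    """Heuristic estimate: one visitor per distinct (IP, User-Agent) pair."""
    return len({(ip, ua) for ip, ua, _ in records})


# Invented example: three students on a campus network share one proxy IP,
# and two of them happen to use the same browser.
records = [
    ("203.0.113.5", "Mozilla/5.0 (Windows NT 10.0) Firefox/66.0", "/index.html"),
    ("203.0.113.5", "Mozilla/5.0 (Macintosh) Safari/605.1", "/index.html"),
    ("203.0.113.5", "Mozilla/5.0 (Windows NT 10.0) Firefox/66.0", "/news.html"),
]
print(visitors_by_ip(records))            # 1
print(visitors_by_ip_and_agent(records))  # 2 (the true number is 3)
```

  Both estimates undercount here. A dynamic IP address can cause the opposite error, where one user who appears under several addresses is counted more than once.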

  ② The user's path through the site. If the log file recorded every page that was viewed, the user's access path would be clear. However, when the browser is set to use its cache (usually the default), the web server never records some of the pages the user views. For example, pages revisited with the Back button are not recorded. Also, if one page offers several links to the same target page, the log file makes it hard to tell which link the user actually followed, even though this information is very important for improving usability. If the link is an image map, the web server may record the coordinates the user clicked and so reveal exactly which link was used. Without this technique, the information is hard to capture. Finally, when the user reaches a page by typing its URL or through a bookmark, the web server cannot record how the user arrived.
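  The following is a minimal sketch of how a path is usually reconstructed from the log and what is lost in the process. It assumes that log hits have already been parsed into (IP, timestamp, path) tuples. `paths_from_log` is a hypothetical helper. Because it groups hits by IP address, it also inherits the identification problem described in ①.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (client_ip, unix_timestamp_seconds, path), assumed already parsed from the log.
Hit = Tuple[str, int, str]


def paths_from_log(hits: List[Hit]) -> Dict[str, List[str]]:
    """Reconstruct each client's page sequence by ordering its requests by time."""
    by_client: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for ip, ts, path in hits:
        by_client[ip].append((ts, path))
    return {ip: [path for _, path in sorted(reqs)] for ip, reqs in by_client.items()}


# The user actually viewed /A, then /B, went Back to /A (served from the browser
# cache), then followed a link to /C. The cached revisit of /A never reaches the server.
hits = [("192.0.2.7", 100, "/A"), ("192.0.2.7", 130, "/B"), ("192.0.2.7", 170, "/C")]
print(paths_from_log(hits))  # {'192.0.2.7': ['/A', '/B', '/C']}
```

  Read naively, the reconstructed path suggests that /C was reached from /B, when the user actually reached it from /A.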

  ③ Time spent on each page. The log file records the time when data transfer started, not when it finished. It also does not reveal when, during the page download, the user actually began reading the page, or whether the user left to do something else while the page was displayed. By comparing the time of one request with the time of the next, one can roughly estimate how long the user stayed on a page (the earlier request time subtracted from the later one). For pages retrieved from the cache, however, this estimate can be badly off.
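  The estimate described above can be sketched as follows. The sketch assumes that one session's hits are available as (timestamp in seconds, path) pairs. `estimate_dwell_times` is a hypothetical helper.

```python
from typing import List, Optional, Tuple


def estimate_dwell_times(session: List[Tuple[int, str]]) -> List[Tuple[str, Optional[int]]]:
    """Estimate seconds spent on each page as (next request time - this request time).

    The last page has no following request, so its dwell time is unknown (None).
    """
    ordered = sorted(session)
    result: List[Tuple[str, Optional[int]]] = []
    for i, (ts, path) in enumerate(ordered):
        if i + 1 < len(ordered):
            result.append((path, ordered[i + 1][0] - ts))
        else:
            result.append((path, None))
    return result


print(estimate_dwell_times([(100, "/A"), (130, "/B"), (170, "/C")]))
# [('/A', 30), ('/B', 40), ('/C', None)]
```

  Suppose the user went Back to /A from the cache between /B and /C. In that case, the 40 seconds credited to /B were actually split between /B and /A, which is the kind of deviation described above.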

  ④ Where the user leaves the site. The log file records the last page sent during a user session, but this may not be the last page the user actually saw, for two reasons. First, the user may have retrieved the last page from the cache. Second, the user may have been away for a long time, longer than the session timeout defined by the web server.
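  The following sketch shows how a session timeout affects the recorded exit page. The 30-minute inactivity timeout is an assumption (a common convention, not a value from the text), and both helpers are hypothetical.

```python
from typing import List, Tuple

SESSION_TIMEOUT = 30 * 60  # assumed inactivity timeout in seconds; tools differ

Hit = Tuple[int, str]  # (unix_timestamp_seconds, path) for one client


def split_sessions(hits: List[Hit], timeout: int = SESSION_TIMEOUT) -> List[List[Hit]]:
    """Split one client's hits into sessions wherever the gap exceeds timeout."""
    sessions: List[List[Hit]] = []
    for ts, path in sorted(hits):
        if sessions and ts - sessions[-1][-1][0] <= timeout:
            sessions[-1].append((ts, path))
        else:
            sessions.append([(ts, path)])
    return sessions


def exit_pages(hits: List[Hit], timeout: int = SESSION_TIMEOUT) -> List[str]:
    """Last *logged* page of each session: only an estimate of where the user left."""
    return [session[-1][1] for session in split_sessions(hits, timeout)]


hits = [(0, "/A"), (60, "/B"), (60 + 45 * 60, "/C"), (60 + 46 * 60, "/D")]
print(exit_pages(hits))  # ['/B', '/D']
```

  Here a user who stepped away for 45 minutes after viewing /B is split into two sessions. As a result, /B is reported as an exit page even though the user later continued browsing.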

  ⑤ Whether the user succeeded in doing what they wanted to do. This is the most fundamental usability question, but log file statistics alone can hardly answer it. Questions such as "Did the user complete the transaction?" or "Did the user successfully download the file?" are easy to answer. By contrast, a question such as "Did the user find the information they needed?" is difficult to answer from the log file alone.
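  Questions of the first kind come down to checking whether a session contains a particular successful request. The sketch assumes that sessions are lists of (path, HTTP status) pairs. `CONFIRMATION_PATH` is a hypothetical URL of a page that is served only after an order is placed, and both functions are illustrative.

```python
from typing import Iterable, Tuple

# Hypothetical URL of a page that is served only after an order has been placed.
CONFIRMATION_PATH = "/checkout/confirmation"

# A session is assumed to be a sequence of (path, http_status) pairs from the log.
Session = Iterable[Tuple[str, int]]


def completed_purchase(session: Session) -> bool:
    """True if the session successfully (HTTP 200) fetched the confirmation page."""
    return any(path == CONFIRMATION_PATH and status == 200 for path, status in session)


def downloaded(session: Session, file_path: str) -> bool:
    """True if the session received an HTTP 200 response for file_path."""
    return any(path == file_path and status == 200 for path, status in session)


session = [("/products/42", 200), ("/cart", 200), ("/checkout/confirmation", 200)]
print(completed_purchase(session))               # True
print(downloaded(session, "/files/manual.pdf"))  # False
```

  No comparable request marks "the user found what they needed", which is why the log alone cannot answer that question.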

  1.3 Tools for analyzing data obtained from server log files

  Click Traces Analyzer is a powerful set of tools for analyzing the behavior of website users. To give a deeper understanding of how users browse, it presents large amounts of complex data in a very simple form. This makes the site's usability problems clear at a glance during analysis of user behavior.

  2 Client-side methods for collecting and analyzing user behavior data

  Information obtained from log files can be distorted. In addition, much data that is very important for studying website usability is difficult to obtain from log files at all. To gather more valuable usability data and uncover more usability problems, many techniques have therefore been developed to capture the interaction between users and websites directly on the client (page) side. Because this data comes directly from the client, it includes much user behavior that is hard to observe from the server. This greatly helps further analysis of how users browse the site and can help improve its usability.

  2.1 Advantages of collecting user behavior data on the client

  ① Users operate in their real environment (e.g., at home or in the office), so there are fewer artificial disturbances and the data obtained is more realistic.

  ② Data collected on the client is more accurate than data from log-file-based methods and overcomes many of the problems described above:

  ● Not affected by dynamically assigned IP addresses or proxy servers. This is achieved with client-side tracking techniques. For example, the web server assigns each client that visits the site an ID and stores it in a cookie on the client. On every later visit, the web server reads the client's cookie to find out whether this client has visited the site before.

  ● Correct browsing path. Because user behavior is recorded on the client, client-side code can automatically track the user's browsing path, whether pages are served from the local cache or through a proxy server. For example, if the user's actual path is A → B, then Back to A, then A → C, the path recovered from the log file is A → B → C.

  ● Time spent on each page. As an example, Table 1 (where the dynamic page documents are generated by CGI scripts) compares data collected with a clickstream-based data collection tool against the log file generated by the server. The comparison shows that the log file is missing a lot of important data.
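  The cookie-based identification described in the first bullet can be sketched on the server side with Python's standard `http.cookies` and `uuid` modules. The cookie name `visitor_id`, the one-year lifetime and the function `identify_visitor` are assumptions for illustration. A real deployment would also have to handle users who block or clear cookies.

```python
import uuid
from http.cookies import SimpleCookie
from typing import Optional, Tuple

COOKIE_NAME = "visitor_id"    # hypothetical cookie name
ONE_YEAR = 365 * 24 * 3600    # assumed cookie lifetime in seconds


def identify_visitor(cookie_header: Optional[str]) -> Tuple[str, bool, Optional[str]]:
    """Return (visitor_id, is_returning, set_cookie_value).

    If the request's Cookie header already carries an ID, reuse it. Otherwise
    mint a random ID and return a Set-Cookie header value for the response.
    """
    cookies: SimpleCookie = SimpleCookie()
    if cookie_header:
        cookies.load(cookie_header)
    if COOKIE_NAME in cookies:
        return cookies[COOKIE_NAME].value, True, None
    new_id = uuid.uuid4().hex
    out: SimpleCookie = SimpleCookie()
    out[COOKIE_NAME] = new_id
    out[COOKIE_NAME]["max-age"] = ONE_YEAR
    out[COOKIE_NAME]["path"] = "/"
    return new_id, False, out[COOKIE_NAME].OutputString()


# First visit: no Cookie header, so a new ID is minted.
vid, returning, set_cookie = identify_visitor(None)
print(returning)  # False
# Later visit from the same browser, even if it now arrives via a different proxy IP.
vid2, returning2, _ = identify_visitor(f"{COOKIE_NAME}={vid}")
print(vid2 == vid, returning2)  # True True
```

  The ID travels with the browser rather than with the network address, so users behind a shared proxy and users with dynamic addresses are still told apart.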


Source: blog.csdn.net/ningjiebing/article/details/90601005