Part 3: Reading the paper

  • Paper title: Ma Ruimin, Li Xiangyun. Research on data preprocessing techniques for Web log mining [J]. Computer Engineering and Design, 2007 (10): 2358-2360.
  • Subject of study
    Data preprocessing techniques for Web log mining. Unlike the two papers covered in the earlier blog posts, this paper proposes an algorithm that simplifies preprocessing, so that the user transaction data can be obtained in only four stages.
  • Research motivation
    To improve on the traditional five-stage Web log preprocessing. The proposed algorithm relies only on the website topology, does not need path completion, and generates the transaction data for mining directly from the user access sequence.
  • Literature Review
    Web log mining process (framework)


Web log mining preprocessing process


Data cleaning
User identification
Session identification
Algorithm for obtaining user access transaction sequences
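The first three stages listed above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the record layout `(ip, user_agent, timestamp, url, status)`, the suffix filter `STATIC_SUFFIXES`, the 30-minute `SESSION_TIMEOUT`, and the function names are all assumptions made for the example.

```python
from datetime import datetime, timedelta

# Assumed filter list and timeout; the source does not specify either.
STATIC_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js", ".ico")
SESSION_TIMEOUT = timedelta(minutes=30)

def clean(records):
    """Data cleaning: drop failed requests and embedded resources."""
    return [r for r in records
            if 200 <= r[4] < 300 and not r[3].lower().endswith(STATIC_SUFFIXES)]

def identify_users(records):
    """User identification: group requests by (IP, user agent)."""
    users = {}
    for ip, agent, ts, url, _status in records:
        users.setdefault((ip, agent), []).append((ts, url))
    return users

def identify_sessions(requests):
    """Session identification: split one user's requests on long gaps."""
    sessions, current, last_ts = [], [], None
    for ts, url in sorted(requests):
        if last_ts is not None and ts - last_ts > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append(url)
        last_ts = ts
    if current:
        sessions.append(current)
    return sessions

# Hypothetical, already-parsed log records for one visitor.
t0 = datetime(2007, 1, 1, 9, 0)
log = [
    ("192.0.2.1", "UA", t0, "/index.html", 200),
    ("192.0.2.1", "UA", t0 + timedelta(seconds=1), "/logo.gif", 200),
    ("192.0.2.1", "UA", t0 + timedelta(minutes=2), "/news.html", 200),
    ("192.0.2.1", "UA", t0 + timedelta(minutes=3), "/missing.html", 404),
    ("192.0.2.1", "UA", t0 + timedelta(hours=2), "/index.html", 200),
]
users = identify_users(clean(log))
sessions = identify_sessions(users[("192.0.2.1", "UA")])
```

Here the image request and the 404 are removed during cleaning, and the two-hour gap splits the remaining requests into two sessions.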
Transaction identification finds the meaningful page access sequences, i.e. the user access transactions, within a user's session sequence. In earlier preprocessing approaches, path completion is applied before transaction identification: requests that were not recorded in the log are filled in to obtain the user's complete access path, so that the meaningful access paths can be identified correctly, and transactions are then defined by maximal forward references.
The algorithm for obtaining transactions from the user access sequence (STT) relies on the website topology instead. It does not need to add the backtracking paths, yet it still obtains and identifies the user's access paths and ultimately produces the access transaction data.
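As an illustration of the traditional approach, the sketch below splits one session into maximal forward reference paths. It assumes the session has already been path-completed, so that every backward move appears in the sequence; the function name and the sample session are chosen for this example.

```python
def maximal_forward_references(session):
    """Split one path-completed session into maximal forward reference paths."""
    transactions = []
    path = []
    moving_forward = False
    for page in session:
        if page in path:
            # Backward reference: the path built so far is maximal
            # only if the previous step was a forward move.
            if moving_forward:
                transactions.append(list(path))
            del path[path.index(page) + 1:]   # fall back to the revisited page
            moving_forward = False
        else:
            path.append(page)
            moving_forward = True
    if moving_forward and path:
        transactions.append(list(path))
    return transactions

# Sample session in which each letter is one page.
session = list("ABCDCBEGHGWAOUOV")
result = maximal_forward_references(session)
```

For this session the transactions are ABCD, ABEGH, ABEGW, AOU and AOV: each one ends at the page from which the user turned back.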
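The idea of replacing path completion with the site topology can be illustrated on a plain site tree. The sketch below is not the STT procedure itself, only one possible reading of the idea: `parent` is an assumed mapping from each page to its parent in the site tree, and the function name is hypothetical. When a requested page hangs off an earlier page of the current path, the unlogged backward move is inferred from the topology instead of being inserted into the log.

```python
def transactions_from_topology(session, parent):
    """Derive forward paths from a session using only the site tree.

    `parent` maps each page to its parent page; the root maps to None.
    """
    transactions, path, extended = [], [], False
    for page in session:
        if page in path:
            target = path.index(page)                # logged return to an earlier page
        elif parent.get(page) in path:
            target = path.index(parent[page])        # unlogged return, implied by topology
        else:
            target = -1                              # unrelated page: start a new path
        if target != len(path) - 1 and extended:
            transactions.append(list(path))          # the path ends here, so save it
            extended = False
        del path[target + 1:]
        if page not in path:
            path.append(page)
            extended = True
    if extended:
        transactions.append(path)
    return transactions

# Assumed site tree: A is the root, B and O are under A, C and E are under B.
parent = {"A": None, "B": "A", "C": "B", "E": "B", "O": "A"}
# The log shows no request for the moves back from C to B or from E to A.
paths = transactions_from_topology(["A", "B", "C", "E", "O"], parent)
```

With this input the sketch yields A-B-C, A-B-E and A-O, the same paths that maximal forward references would give on the path-completed sequence A B C B E B A O.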
STT algorithm
The STT algorithm first converts the site's tree topology into a binary tree, then derives the transaction sequences on that binary tree from the user's session sequence.
The maximal forward reference path algorithm is as follows:
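The source does not say how the site tree is converted into a binary tree. A common choice, assumed here, is the left-child/right-sibling representation: `lchild` points to a page's first child and `rchild` to its next sibling. The field names follow the pseudocode in this post; the `children` mapping and the sample site are made up for the example.

```python
class Node:
    def __init__(self, data):
        self.data = data
        self.lchild = None   # first child in the site tree
        self.rchild = None   # next sibling in the site tree

def to_binary_tree(page, children):
    """Convert the subtree rooted at `page` to left-child/right-sibling form."""
    node = Node(page)
    previous = None
    for child in children.get(page, []):
        child_node = to_binary_tree(child, children)
        if previous is None:
            node.lchild = child_node       # the first child hangs on the left
        else:
            previous.rchild = child_node   # later children chain to the right
        previous = child_node
    return node

# Assumed site tree: A links to B and O, B links to C and E.
site = {"A": ["B", "O"], "B": ["C", "E"]}
root = to_binary_tree("A", site)
```

In the resulting binary tree, following `lchild` moves one level deeper into the site, while following `rchild` stays on the same level.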

InitStack(St);            // initialize the stack
P = T;                    // P points to the root of the binary tree
Flag = 0;
while (S != NULL)         // loop until the user access sequence is finished
{
    if (Flag == 0) {
        if (P) {
            // if the current node is the same as the current item of the
            // user access sequence, add it to Path
            if (P->data == S) {
                add P to Path;
                S++;
                if (Flag == 0) Flag = 1;
            }
            Push(St, P);      // push the current node onto the stack
            P = P->lchild;    // move P to the left child
        } else {
            Pop(St, P);       // pop the top element and assign it to P
            P = P->rchild;    // move P to the right child
        }
    } else if (P) {
        if (P->data == S) {
            add P to Path;
            S++;
            Push(St, P);
            P = P->lchild;
        } else {
            Push(St, P);
            P = P->rchild;
        }
    } else {
        if (the node before P is a left child) save the current path Path;
        Pop(St, P);
        if (P) delete P from Path;
        P = P->rchild;
    }
    // if the stack is empty but the access sequence is not finished,
    // point P back to the root of the tree and reset Flag to 0
    if (StackEmpty(St)) { P = T; Flag = 0; }
}

  • Study Design
  • Data sets used
  • Conclusions
    Using the STT algorithm simplifies the preprocessing process and improves the efficiency of log mining.
  • Learning experience

Origin www.cnblogs.com/zaw-315/p/11228218.html