Big Data Technology Summer Internship, Part 8: Building a User Portrait (Tagging with SQL Statements)

1. Introduction to user portraits

The core of user portrait work is tagging users. An important purpose of tagging is to make users easy for people to understand and convenient for computers to process. Tags support statistical classification, for example: how many users like the iPhone? Among the people who like the iPhone, what is the ratio of men to women? Tags also support data mining, for example using a clustering algorithm to analyze the age distribution of the people who like the iPhone.

2. Building a user portrait

  2.1 Tag naming

    Tag topic: characterizes which category a tag belongs to, such as user attributes, user behavior, user consumption, or risk control. Letters such as A, B, C, and D can each stand for one tag topic;

    Tag type: tags fall into two types, categorical and statistical. Categorical tags describe which class a user belongs to, such as male or female, whether the user is a member, or whether the user has churned. Statistical tags record how many times a user performed a certain behavior, such as the number of favorites or the number of purchases in the last 30 days; such tags need a count (weight) of the corresponding behavior stored for each user;

    Development approach: tags are developed in one of two ways, statistical or algorithmic. Statistical tags are derived directly from the topic tables produced during data warehouse modeling; algorithmic tags require processing the data with a machine learning algorithm to obtain the corresponding tag;

    Mutual exclusivity: within the same category (for example, the same first-level or second-level tag), tags are either mutually exclusive or non-exclusive. For example, the male and female tags are mutually exclusive: a user tagged as male cannot also be tagged as female. The high-activity, medium-activity, and low-activity tags are likewise mutually exclusive;

    User dimension: indicates whether the tag is attached to the user's unique identifier (userid), to the device the user uses (cookieid), or to some other unique identifier. The letters U and C can denote the userid and cookieid dimensions, respectively.

    Example: for the gender tag (male or female), the tag topic is user attributes, the tag type is categorical, the development approach is statistical, the tag is mutually exclusive, and the user dimension is userid. Male users are therefore tagged "A111U001_001" and female users "A111U001_002". Here "A111U" follows the naming scheme described above and "001" is the tag ID; other tags under user attributes can take "002", "003", and so on. The "001" and "002" after the "_" identify the specific value of the tag. If users are divided into high, medium, and low activity, the values under that tag can be "001", "002", and "003".

   Note: in this article the tag topics are user attributes and user behavior, the development approach is statistical, and users are uniquely identified by userid.
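The naming scheme can be sketched as a string concatenation. This is only an illustration: the article does not spell out what each digit in "A111U" stands for, so the mapping of the three digits to tag type, development approach, and exclusivity (in that order) is an assumption, as are the literal codes used here.

```sql
-- Assumed layout: topic letter + type digit + development digit
-- + exclusivity digit + dimension letter + 3-digit tag id + '_' + 3-digit value id.
select concat(
           'A',               -- topic: user attributes
           '1',               -- tag type: categorical (assumed code)
           '1',               -- development approach: statistical (assumed code)
           '1',               -- mutually exclusive (assumed code)
           'U',               -- user dimension: userid
           lpad('1', 3, '0'), -- tag id, zero-padded to 001 (gender)
           '_',
           lpad('2', 3, '0')  -- value id, zero-padded to 002 (female)
       ) as tag_id;
```

With these inputs the expression yields the female gender tag "A111U001_002" from the example above.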

  2.2 SQL statements

create table profile_tag_user_gender
(
    user_id  string comment 'user code',
    tag_id   string comment 'tag ID',
    tag_name string comment 'gender',
    tag_type string comment 'user attributes'
)
comment 'user gender tag table';

insert into profile_tag_user_gender (user_id, tag_id, tag_name, tag_type)
select user_id,
       case when gender = 'M' then 'A111U001_001'
            else 'A111U001_002'
       end,
       gender,
       'gender'
from user_info;
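Because the gender tags are mutually exclusive, one possible sanity check after loading is to confirm that no user holds more than one gender tag. A minimal sketch, assuming profile_tag_user_gender has been populated by the statement above:

```sql
-- Users carrying more than one distinct gender tag.
-- For a mutually exclusive tag this should return no rows.
select user_id,
       count(distinct tag_id) as tag_count
from profile_tag_user_gender
group by user_id
having count(distinct tag_id) > 1;
```

Any rows returned point to duplicate or conflicting records for the same user_id in user_info.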
 ---------------- 
create table profile_tag_user_age_region
(
    user_id  string comment 'user code',
    tag_id   string comment 'tag ID',
    tag_name string comment 'age range',
    tag_type string comment 'user attributes'
)
comment 'user age-range tag table';

insert into profile_tag_user_age_region (user_id, tag_id, tag_name, tag_type)
select user_id,
       case when age_region = 1 then 'A111U002_001'
            when age_region = 2 then 'A111U002_002'
            when age_region = 3 then 'A111U002_003'
            when age_region = 4 then 'A111U002_004'
            when age_region = 5 then 'A111U002_005'
            when age_region = 6 then 'A111U002_006'
            else 'A111U002_008'
       end,
       age_region_alias,
       'age range'
from user_info;
 ------------------ 
create table profile_tag_user_grade
(
    user_id  string comment 'user code',
    tag_id   string comment 'tag ID',
    tag_name string comment 'membership level',
    tag_type string comment 'user attributes'
)
comment 'user membership level tag table';

-- The user_grade literals below must match the values stored in user_info.
insert into profile_tag_user_grade (user_id, tag_id, tag_name, tag_type)
select user_id,
       case when user_grade = 'Bronze Member'  then 'A111U003_001'
            when user_grade = 'Silver Member'  then 'A111U003_002'
            when user_grade = 'Gold Member'    then 'A111U003_003'
            when user_grade = 'Diamond Member' then 'A111U003_004'
            when user_grade = 'PLUS Member'    then 'A111U003_005'
       end,
       user_grade,  -- tag name is the membership level itself, not age_region_alias
       'user level'
from user_info;
 ----------------
create table person_user_tag_action
(
    user_id      string comment 'user code',
    tag_id       string comment 'tag ID',
    tag_name     string comment 'tag name',
    tag_type     string comment 'user behavior',
    action_count int    comment 'number of actions'
)
comment 'user behavior tag table';

insert into person_user_tag_action (user_id, tag_id, tag_name, tag_type, action_count)
select user_id,
       case when user_action = '0' then 'A111U004_001'
            when user_action = '1' then 'A111U004_002'
            when user_action = '2' then 'A111U004_003'
            when user_action = '3' then 'A111U004_004'
       end,
       case when user_action = '0' then 'add to cart'
            when user_action = '1' then 'click'
            when user_action = '2' then 'buy'
            when user_action = '3' then 'delete'
       end,
       'user behavior',
       count(*)
from action
group by user_id, user_action;
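Once the tag tables are filled, the kind of statistical question raised in the introduction can be answered by joining them. The query below is a sketch rather than part of the original workflow: it counts, by gender, the users who carry the "buy" behavior tag ('A111U004_003'), which gives the male-to-female split among buyers.

```sql
-- Gender split of users who have the "buy" behavior tag.
select g.tag_name                as gender,
       count(distinct g.user_id) as user_count
from profile_tag_user_gender g
join person_user_tag_action a
  on g.user_id = a.user_id
where a.tag_id = 'A111U004_003'   -- "buy" behavior tag
group by g.tag_name;
```

Swapping the tag_id in the where clause gives the same breakdown for any other behavior tag.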
 ----------------

  2.3 Results

 

Origin www.cnblogs.com/wjwjs/p/11504455.html