User Profile: ID-Mapping

Before explaining the ID-Mapping algorithm, here are a few key concepts:

 

MAC (Media Access Control) address: works like an identity card for a network device, uniquely identifying it.

IMEI (International Mobile Equipment Identity), commonly called the phone's "serial number", identifies each individual mobile phone or other mobile communication device on a mobile telephone network. It is a 15-digit number. The first six digits (TAC) are the type approval code, representing the phone model. The next two (FAC) are the final assembly code, representing the place of manufacture. The following six (SNR) are the serial number, representing the production sequence. The last digit is used as a check digit (as a spare digit, SP, it is generally 0). Newer IMEIs fold the FAC into an 8-digit TAC.
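As a small illustration of the layout just described, the sketch below splits a 15-digit IMEI into TAC, FAC, SNR and the final digit. The function names are hypothetical, and the 6/2/6/1 split is the layout described in this article. The check digit of an IMEI as printed on a device is commonly computed with the Luhn algorithm, so a Luhn helper is included as well; the sample number is a widely used documentation example, not a real device.

```python
def split_imei(imei: str) -> dict:
    """Split a 15-digit IMEI using the 6/2/6/1 layout described in the text."""
    if len(imei) != 15 or not imei.isdigit():
        raise ValueError("IMEI must be exactly 15 digits")
    return {
        "tac": imei[0:6],    # type approval code
        "fac": imei[6:8],    # final assembly code
        "snr": imei[8:14],   # serial number
        "check": imei[14],   # last digit
    }


def luhn_check_digit(first14: str) -> int:
    """Compute the Luhn check digit over the first 14 digits."""
    total = 0
    for i, ch in enumerate(first14):
        d = int(ch)
        if i % 2 == 1:       # double every second digit, counting from the left
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


parts = split_imei("490154203237518")
is_consistent = luhn_check_digit("49015420323751") == int(parts["check"])
```

In an ID-Mapping pipeline a check like this could be used to discard malformed IMEI values before they are used as merge keys.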

IMSI (International Mobile Subscriber Identification Number) is stored on the SIM card and is the information that distinguishes one mobile subscriber from another. Its total length is at most 15 digits, using only the digits 0-9. The MCC is the mobile country code of the subscriber, 3 digits long; China's MCC is 460. The MNC is the mobile network code, two digits long (three in some countries), identifying the mobile network the subscriber belongs to. The MSIN is the mobile subscriber identification number, identifying the subscriber within that mobile network.
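The IMSI structure can be sketched the same way. The helper name `split_imsi` and the `mnc_len` parameter are assumptions for illustration, and the sample IMSI is made up; only the MCC value 460 for China comes from the text.

```python
def split_imsi(imsi: str, mnc_len: int = 2) -> dict:
    """Split an IMSI into MCC (3 digits), MNC (mnc_len digits) and MSIN (the rest)."""
    if not imsi.isdigit() or len(imsi) > 15 or len(imsi) <= 3 + mnc_len:
        raise ValueError("IMSI must be all digits, at most 15 long, with a non-empty MSIN")
    return {
        "mcc": imsi[:3],
        "mnc": imsi[3:3 + mnc_len],
        "msin": imsi[3 + mnc_len:],
    }


# Made-up IMSI with the Chinese country code 460.
fields = split_imsi("460001234567890")
```

The MNC length is left as a parameter because it cannot be read off the number itself; it depends on the country's numbering plan.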

Android ID: a randomly generated 64-bit number, expressed as a hexadecimal string, that identifies a device for its whole lifetime (the value may change after a factory reset or after the device is reflashed).

UDID (Unique Device Identifier): the unique identification code of an Apple iOS device, consisting of 40 alphanumeric characters. To protect user privacy, Apple has prohibited apps from reading it.

UUID (Universally Unique IDentifier): on iOS, an identifier scoped to a single application on a device. As long as the user does not completely remove the application, the UUID stays the same every time the user uses it. If the user deletes the application and then reinstalls it, the UUID changes. The disadvantage is that the data the user produced before deleting the application can essentially no longer be associated with the new UUID.

OpenUDID: not an official Apple identifier, but a third-party alternative to UDID. The disadvantage is that if every app packaged with the OpenUDID SDK is removed (for example by restoring the system), OpenUDID is regenerated and the new value differs from the previous one.

IDFA (Identifier for Advertisers): a compromise Apple introduced after disabling UDID, providing a separate, hardware-independent identifier that businesses can use to measure advertising effectiveness. The user can reset this string in the phone settings, so a business cannot rely on it to track user behavior over the long term.

telephone (phone number): a phone number identifies a single user, because two people cannot hold the same mobile number at the same time.

Each of the identifiers above can uniquely identify a user, much like a user's national ID number.

 

Suppose a user, Joe Smith, uses Baidu Maps on his first phone, watches iQIYI videos on an iPad, uses the Baidu app on a second phone, and uses Baidu search on a PC. How can the information this one user leaves on all these different clients be aggregated together?

ID-Mapping mainly solves this problem: it associates the different pieces of ID information with one another.

 

Algorithm idea

User ID information is collected on each client. Assume the input is the following two log lines:

line1: < mac1,mac2> < imei1> < tel1> 

line2: < mac1> < imei2> < tel1,tel2> 

The two lines are two user behavior logs. Both contain mac1, so the two records should belong to the same user.

The records are aggregated over multiple rounds of map-reduce: map partitions the data by key, and reduce merges the records that share a key.

Round 1: map and reduce with mac1 and mac2 as the key fields.

Map output: 

mac1 line1 < mac1,mac2 > < imei1> < tel1> 

mac2 line1 < mac1,mac2> < imei1> < tel1> 

mac1 line2 < mac1> < imei2> < tel1,tel2> 

Reduce Output: 

line1 < mac1,mac2> < imei1,imei2> < tel1,tel2> 

line1 < mac1,mac2> < imei1> < tel1> 

line2 < mac1,mac2> < imei2,imei1> < tel1,tel2> 

Round 2: map and reduce with line1 and line2 as the key fields.

Map output: 

line1 < mac1,mac2> < imei1,imei2> < tel1,tel2> 

line1 < mac1,mac2> < imei1> < tel1> 

line2 < mac1,mac2> < imei2,imei1> < tel1,tel2> 

Reduce Output: 

line1 < mac1,mac2> < imei1,imei2> < tel1,tel2> 

line2 < mac1,mac2> < imei1,imei2> < tel1,tel2> 

Round 3: map and reduce with <mac1,mac2> as the key field.

Map output: 

< mac1,mac2> < imei1,imei2> < tel1,tel2> 

< mac1,mac2> < imei1,imei2> < tel1,tel2> 

Reduce Output: 

< mac1, mac2> < imei1,imei2> < tel1,tel2>

 

Repeat the process above with each <id> type in turn as the key, until no further merges occur.
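The three rounds can be simulated in memory with a short sketch. This is not a real map-reduce job: plain dictionaries stand in for the shuffle, and the names `one_pass`, `id_mapping` and `ID_TYPES` are hypothetical. Each record is assumed to map an ID type to the set of values seen for it.

```python
from collections import defaultdict

ID_TYPES = ("mac", "imei", "tel")  # assumed ID types, taken from the example logs


def merge_records(group):
    """Reduce step: union the ID sets of all records in a group."""
    merged = {t: set() for t in ID_TYPES}
    for rec in group:
        for t in ID_TYPES:
            merged[t] |= set(rec.get(t, ()))
    return merged


def one_pass(records, key_type):
    # Round 1: key by every ID value of key_type, merge lines sharing a value.
    by_value = defaultdict(list)
    for line_no, rec in enumerate(records):
        for value in rec[key_type]:
            by_value[value].append(line_no)
    versions = defaultdict(list)
    for line_no, rec in enumerate(records):
        versions[line_no].append(rec)  # lines without this ID type pass through
    for line_nos in by_value.values():
        merged = merge_records(records[n] for n in line_nos)
        for n in line_nos:
            versions[n].append(merged)
    # Round 2: key by line number, merge the versions of each line.
    lines = [merge_records(versions[n]) for n in range(len(records))]
    # Round 3: key by the merged ID set of key_type, collapse identical keys.
    groups = defaultdict(list)
    for n, rec in enumerate(lines):
        key = frozenset(rec[key_type]) or ("no-key", n)
        groups[key].append(rec)
    return [merge_records(g) for g in groups.values()]


def canonical(records):
    return {tuple(frozenset(r[t]) for t in ID_TYPES) for r in records}


def id_mapping(records):
    """Repeat the three rounds for every ID type until nothing changes."""
    records = [merge_records([r]) for r in records]
    while True:
        before = canonical(records)
        for key_type in ID_TYPES:
            records = one_pass(records, key_type)
        if canonical(records) == before:
            return records


line1 = {"mac": {"mac1", "mac2"}, "imei": {"imei1"}, "tel": {"tel1"}}
line2 = {"mac": {"mac1"}, "imei": {"imei2"}, "tel": {"tel1", "tel2"}}
users = id_mapping([line1, line2])
```

For the two example log lines, `users` holds a single record containing mac1, mac2, imei1, imei2, tel1 and tel2. The outer loop is needed because one sweep may only partly merge a long chain of records linked through different IDs.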

 

Data and index design

The data table uses global_id as its primary key (playing a role similar to a national ID number). Each of the other fields can hold multiple values (map<string,int>), representing the multiple identifiers of one user.

 

//data table

global_id               string,

imei                    map<string,int>                             

mac                     map<string,int>                             

imsi                    map<string,int>                             

phone_number            map<string,int>                             

idfa                    map<string,int>                             

openudid                map<string,int>                             

uid                     map<string,int>                             

did                     map<string,int> 


For example, four records that can be seen to belong to the same user are stored as a single user, with global_id as the key.

This yields the mapping global_id <=> imei, mac, imsi, phone_number, idfa, openudid, uid, did.
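A data table row can be sketched as a plain dictionary. The article does not say what the int in map<string,int> holds or how global_id is generated, so both choices below are assumptions: the int is taken to be the activity parameter discussed later, and global_id is a random UUID. The helper name `make_row` is hypothetical.

```python
import uuid

FIELDS = ("imei", "mac", "imsi", "phone_number", "idfa", "openudid", "uid", "did")


def make_row(ids_by_type, activity=0):
    """Build one data table row from the merged IDs of a single user."""
    row = {"global_id": uuid.uuid4().hex}  # assumption: random global_id
    for field in FIELDS:
        # map<string,int>: id value -> activity (assumed meaning of the int)
        row[field] = {value: activity for value in sorted(ids_by_type.get(field, ()))}
    return row


row = make_row({
    "mac": {"mac1", "mac2"},
    "imei": {"imei1", "imei2"},
    "phone_number": {"tel1", "tel2"},
})
```

Fields for which the user has no identifier are kept as empty maps, so every row has the same columns.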

 

 

//index table

id               string                                      

global_id        string                                      


For an online query, suppose an ID of type mac (mac1) is received. The global_id is looked up in the mac index table, and the user's imei, phone_number and other ID information is then fetched from the data table by global_id.
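The two-step lookup can be sketched with in-memory dictionaries standing in for the index tables and the data table. The table contents, the global_id value "g-001" and the function names are made up for illustration.

```python
def build_index_tables(data_table):
    """Build one index table per ID type: id value -> global_id."""
    index_tables = {}
    for global_id, row in data_table.items():
        for id_type, values in row.items():
            for id_value in values:
                index_tables.setdefault(id_type, {})[id_value] = global_id
    return index_tables


def lookup(index_tables, data_table, id_type, id_value):
    """Step 1: index table gives global_id. Step 2: data table gives all IDs."""
    global_id = index_tables.get(id_type, {}).get(id_value)
    if global_id is None:
        return None  # unknown ID
    return global_id, data_table[global_id]


# Hypothetical data table with a single user.
data_table = {
    "g-001": {
        "mac": {"mac1": 0, "mac2": 0},
        "imei": {"imei1": 0, "imei2": 0},
        "phone_number": {"tel1": 0, "tel2": 0},
    },
}
index_tables = build_index_tables(data_table)
result = lookup(index_tables, data_table, "mac", "mac1")
```

Keeping one index table per ID type avoids collisions between values of different types that happen to look the same.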

 

ID expiration

For zombie users, or users who have been inactive for a long time, keeping the data serves no purpose and wastes resources, and data that has not been updated for a long time may no longer be accurate.

An activity parameter can be added to each ID. On the one hand it indicates how active the user is; on the other it can be used to control how long the ID is stored.

 

User behavior data: indicates that the user is active; when this data is written to the table, the activity parameter is set to 0.

ID-Mapping historical data: updated weekly and representing the user's data as of the previous week; at each iteration, the activity parameter is increased by 1.

Full user information: represents a full import of user data; the activity parameter is set to a reasonable value (e.g. 60).
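The three rules can be sketched as small update functions over one map<string,int> field. The article does not state when an ID is actually dropped, so the `EXPIRE_AFTER` threshold below is purely hypothetical, as are the function names; only the values 0, +1 and 60 come from the text.

```python
FULL_IMPORT_ACTIVITY = 60  # example value from the text
EXPIRE_AFTER = 90          # hypothetical expiry threshold, not from the article


def on_behavior(ids, id_value):
    """User behavior data: the ID was just seen, so its activity is reset to 0."""
    ids[id_value] = 0


def on_full_import(ids, id_value):
    """Full user information: IDs not yet known start at a default activity."""
    ids.setdefault(id_value, FULL_IMPORT_ACTIVITY)


def weekly_iteration(ids, expire_after=EXPIRE_AFTER):
    """Historical data: add 1 to every activity value, then drop expired IDs."""
    aged = {value: activity + 1 for value, activity in ids.items()}
    return {value: activity for value, activity in aged.items() if activity <= expire_after}


macs = {}
on_full_import(macs, "mac1")    # activity 60
on_behavior(macs, "mac2")       # activity 0
after_one_week = weekly_iteration(macs)
```

With this convention a smaller value means a more recently active ID, and an ID that is never seen again eventually crosses the threshold and is removed.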

 


Origin www.cnblogs.com/java9188/p/11945983.html