Wanxiang Blockchain Industry Research: What if we look at data privacy from the perspective of user portrait implementation?

Foreword:

Dr. Xiao Feng, chairman and general manager of Wanxiang Blockchain, once said that the metaverse, blockchain, and Web3.0 are in essence the same thing: the trend toward decentralization of human society in the digital age. Whether in the metaverse or the Web3.0 era, personal data becomes a personal asset, and personal data privacy becomes a core issue of this "new world".

This article discusses data privacy from the perspective of "user portraits", hoping to offer a way of thinking about privacy and security protection in the metaverse or Web3.0.

Author: Wang Puyu, Chief Economist Office of Wanxiang Blockchain

Reviewer: Zou Chuanwei, Chief Economist of Wanxiang Blockchain


What is a user portrait?

The user portrait was first proposed by Alan Cooper, the father of interaction design. It labels a user's information around four elements: person, time, place, and event (as shown in Figure 1). Data on dimensions such as social attributes, consumption habits, and preference characteristics are then collected, analyzed, and aggregated to mine potentially valuable information and abstract a complete picture of the user.

Figure 1: Four elements of personal information

User portraits are a double-edged sword: they make users' lives more convenient, but they also intrude on personal privacy. For example, when a user scans a QR code with Alipay to pay for a transaction, big data analysis tools capture behavioral data such as the amount spent, the location, and the time. Linking the four elements produces a complete user portrait, which the platform can use to analyze user behavior and carry out precision marketing.

 

How are user portrait tags designed?

1 Label framework

There are currently four labeling frameworks in the market:

(1) A user labeling system based on marketing touchpoints. It identifies the stage of the user's payment process and willingness to pay, giving marketing a clear point of entry. For example, Alibaba's AIPL labeling framework divides a user's relationship with a brand into four stages: awareness, interest, purchase, and loyalty. Segment labels are then developed according to the marketing actions for each stage.

(2) The AARRR model, based on the growth funnel. Also known as the pirate model, it was proposed by Dave McClure, founder of a well-known American venture capital firm. The name comes from the initials of Acquisition, Activation, Retention, Revenue, and Referral. The model identifies each user's growth state so that different growth strategies can be applied at different points in the user life cycle. With it, advertisers can apply differentiated marketing strategies to the customers in each stage and raise conversion rates.

(3) Tiered models based on user value, such as the RFM model, the ARGO model, and user loyalty and user life-cycle models. The RFM model is widely used in traditional sales. RFM stands for Recency (time since the last purchase), Frequency (purchase frequency), and Monetary (purchase amount). Each dimension is divided into high, medium, and low, and four quadrants are constructed from eight segments: four "important" segments (value, development, maintenance, win-back) and four "general" segments (value, development, maintenance, win-back). Users are thus sorted intuitively into 8 tiers. Identifying user value in this way supports user stratification, with a different operating strategy for each tier.

(4) Models based on user preferences. Based on how users differ in their preferences for product features or goods, these models give marketers the information needed to provide personalized services, for example the purpose of a home purchase, the preferred district, and the price range in real estate.
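The RFM stratification in item (3) can be illustrated with a short sketch. It is a simplification under stated assumptions: each dimension is split into only high and low (which is what yields exactly eight segments), the cut-off values are passed in by the caller, and the segment names and their mapping to the high/low flags follow one common convention rather than anything fixed by this article.

```python
from dataclasses import dataclass


@dataclass
class RfmRecord:
    user_id: str
    recency_days: int  # days since the last purchase (smaller is better)
    frequency: int     # number of purchases in the period
    monetary: float    # total amount spent in the period


# Assumed mapping from (recent, frequent, high_spend) flags to segment names.
SEGMENTS = {
    (True, True, True): "important value",
    (True, False, True): "important development",
    (False, True, True): "important maintenance",
    (False, False, True): "important win-back",
    (True, True, False): "general value",
    (True, False, False): "general development",
    (False, True, False): "general maintenance",
    (False, False, False): "general win-back",
}


def rfm_segment(record: RfmRecord, r_cut: int, f_cut: int, m_cut: float) -> str:
    """Assign one of eight segments using a high/low split per dimension."""
    recent = record.recency_days <= r_cut
    frequent = record.frequency >= f_cut
    high_spend = record.monetary >= m_cut
    return SEGMENTS[(recent, frequent, high_spend)]


if __name__ == "__main__":
    user = RfmRecord("u1", recency_days=3, frequency=10, monetary=500.0)
    print(rfm_segment(user, r_cut=30, f_cut=5, m_cut=200.0))
```

In practice the cut-offs would come from the distribution of the business's own data, for example the median of each dimension.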

 

2 Design and application of labels

Taking a typical e-commerce business as an example, label design involves three steps: mapping the business process, determining the business goal the labels serve, and designing the labels.

(1) Business process

The business process funnel runs from launching the app through registration and login, active browsing, deep behavior, payment, and repeat payment, to eventual silent churn. As shown in Figure 2, the dimensions to examine at each step are sorted out along the business process, and the user's preference labels are then built from the user's behavior at each step.

Figure 2: User portraits based on business processes

 

(2) Business goals

Companies construct labels along different dimensions for different business purposes, but most share the same goal: fine-grained operation of the overall transaction amount. To that end, enterprises break the business process down. For example, the transaction amount can be split into transactions from new users and from existing users, and the path to a transaction can be split into new activation, registration, browsing details, deep behavior, and finally payment. A different strategy is applied to each link to raise the transaction amount, as shown in Figure 3.

Figure 3: User portrait business goal realization method
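Breaking the path to a transaction into links makes it possible to measure each link separately. The following is a minimal sketch of that idea; the step names and the user counts in the usage example are hypothetical, not data from this article.

```python
# Steps on the path to a transaction, in funnel order (names are assumed).
FUNNEL_STEPS = ["activation", "registration", "detail_view", "deep_behavior", "payment"]


def step_conversion(counts: dict) -> dict:
    """Return the conversion rate between each pair of adjacent funnel steps."""
    rates = {}
    for prev, cur in zip(FUNNEL_STEPS, FUNNEL_STEPS[1:]):
        # Guard against an empty upstream step to avoid division by zero.
        rates[f"{prev}->{cur}"] = counts[cur] / counts[prev] if counts[prev] else 0.0
    return rates


if __name__ == "__main__":
    example_counts = {
        "activation": 1000,
        "registration": 400,
        "detail_view": 200,
        "deep_behavior": 100,
        "payment": 20,
    }
    for link, rate in step_conversion(example_counts).items():
        print(f"{link}: {rate:.0%}")
```

The link with the lowest rate is the natural place to apply a targeted strategy first.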

 

(3) Label Design

In data calculation, the final result falls within an expected range only when the input values satisfy certain constraints. In business transactions, the expected results are the different targets at the bottom of Figure 3, and the labels are the input values. Enterprises use big data analysis to find a reasonable range for these inputs so as to obtain the expected results, which gives rise to the model shown in Figure 4.

Figure 4: User portrait label design based on business value

 

Labels fall into three types according to how they are produced: statistical labels, rule labels, and machine learning algorithm labels.

For example, Xiao Zhang's profile on a social app lists him as male, yet when meeting people online he describes himself as having "thick eyebrows, big eyes, and a square face, wearing a dress with a sense of design." A promotion gives active female users 10% off. How, then, should Xiao Zhang's gender be judged?

① Statistical label

Xiao Zhang entered "male" in the social app, so we take him to be male. A label determined from exact data like this is called a statistical label. For a user, fields such as gender, age, city, zodiac sign, active time in the past 7 days, number of active days in the past 7 days, and number of active sessions in the past 7 days can be computed from registration, access, and consumption data. These labels form the basis of a user portrait.

② Rule label

Xiao Zhang is wearing a dress with a sense of design. By habitual thinking, Xiao Zhang is a woman. This judgment rests on a rule people have set: anyone wearing a skirt is a woman. A label of this kind is called a rule label, that is, a label produced by a rule defined over user behavior. In practice, because operations staff know the business better and data staff know the structure, distribution, and characteristics of the data better, the rules are agreed on jointly by both. The rules are adjusted from time to time according to how accurately the resulting labels describe users.

③ Machine learning algorithm label

A camera, combined with algorithms that weigh various features, estimates the probability that Xiao Zhang is a woman. Because Xiao Zhang looks very masculine, the algorithm determines that he is male, so his face-scan payment does not receive the discount for active female users. A label of this kind is generated by machine learning and is used to predict certain attributes or behaviors of users.
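The three label types can be contrasted in a small sketch built on the Xiao Zhang example. Everything here is illustrative: the `UserSignals` fields, the skirt rule, and especially `model_label`, which uses a hand-set logistic function as a stand-in for a trained model, are assumptions rather than a description of any real system.

```python
import math
from dataclasses import dataclass


@dataclass
class UserSignals:
    registered_gender: str   # value the user entered at registration
    wears_dress: bool        # hypothetical behavioral signal
    face_masculinity: float  # hypothetical feature score in [0, 1]


def statistical_label(user: UserSignals) -> str:
    """Statistical label: read directly from exact, recorded data."""
    return user.registered_gender


def rule_label(user: UserSignals) -> str:
    """Rule label: a human-defined rule applied to observed behavior."""
    return "female" if user.wears_dress else "male"


def model_label(user: UserSignals, weight: float = -6.0, bias: float = 3.0):
    """Algorithm label: a probability from a model (here a stand-in logistic)."""
    p_female = 1.0 / (1.0 + math.exp(-(weight * user.face_masculinity + bias)))
    return ("female" if p_female >= 0.5 else "male", p_female)


if __name__ == "__main__":
    xiao_zhang = UserSignals("male", wears_dress=True, face_masculinity=0.9)
    print(statistical_label(xiao_zhang))
    print(rule_label(xiao_zhang))
    print(model_label(xiao_zhang))
```

The same user receives different labels depending on the method, which is why rules and models are checked against each other and adjusted over time.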

For target groups with clear behavioral data, companies collect data based on user preference labels. For groups with little behavioral data, such as new users and silent users, companies generally start from life-cycle labels and apply conversion-promotion and recall strategies.

 

Data Sources

1 Ways to obtain data

Building a user portrait is a complex process that includes data collection, data processing, data classification, and data storage. Figure 5 shows the overall architecture; here we analyze in detail how the user data at the bottom layer is collected.

Figure 5: User Profile Data Warehouse Architecture

As Figure 5 shows, the underlying data for user portraits comes from two sources: internal system data and external data. Internal system data includes business data, log data, and buried-point (event-tracking) data.

 

(1) Internal data

 

① Business data

Business data includes the user information table, product order table, product comment table, search log table, user favorites table, and shopping cart information table.

The user information table includes user code, user name, user status (unregistered, registered, logged out), email code, user birthday, gender (natural gender, shopping gender), phone number, whether there is an image, creation time, registration date, province, city, address, etc.

The product order table includes the order source identifier (App, Web, H5, etc.), user code, user name, order number, product code, product name, order generation time, order date, order remarks, order status (to be paid, completed, cancelled, refunded, payment failed, etc.), order status time, order amount, payment account, payment method, etc.

The product comment table stores users' comments on products. The main fields include user id, user name, comment content, comment image, comment status (to be reviewed, reviewed, blocked), order id, creation time, creation date, the commenting user's IP, update time, etc.

The user favorites table records the items a user has favorited on the platform. The fields mainly include user id, favorite date, favorite time, item id, item name, favorite status (favorited, unfavorited), modification date, modification time, etc.

The shopping cart information table records the products a user adds to the shopping cart. The main fields include user id, product id, product name, product quantity, creation date, creation time, book status, modification date, modification time, etc.

② Log data

The access log table stores information about the user's access to the app and location-based service (LBS) positioning data, parsed from log data produced by tracking points buried in the client. The main fields include device login name, user id, device id, access time, reporting time (the time at which the terminal records the user's click), the user's province and city, the URL of the previous page, the URL of the current page, operating system, login date, longitude, latitude, etc.

The search log table stores log data about the user's searches in the app. The main fields include device login name, user id, device id, search id, search date, search time, keywords searched by the user, label content, a random number for each visit, etc.

③ Buried point data

The buried-point log table stores the records left when a user visits the app or a web page and clicks with the mouse or touch screen. Tracking points in the client are used for page statistics and for monitoring user operations; the main fields are the same as those of the log data.

Through buried points, an enterprise collects, as completely as possible, the behavioral data that reflects users' usage scenarios and real needs. This also revolves around the four elements in Figure 1, but the data framework is usually 4W (who, when, where, what) + 1H (how), corresponding to person (who), time (when), place (where), and event (what + how).

  • who

Identifies who completed the action, using a unique user ID to associate the action with the user. Commonly used data include user id, mobile phone number, ID card number, and device or application identification code.

  • where

Locates where the user completed the action. Commonly used data include IP (web, mobile phone), GPS (mobile phone), and locations the user enters manually (Dianping, Ele.me, Meituan takeout, etc.).

  • when

Locates when the user completed the action. Common data are the timestamp and the local time.

  • what

Identifies what the user is currently doing. To support more refined management, the recorded information is becoming more and more detailed. The specific indicators include the business data described under internal system data, which can also be obtained through buried points.

  • how

Captures the surrounding environment, means, and equipment at the moment the user acts, restoring the user's environment in the digital world as far as possible. Common data include operating system, device model, device version (the version number of the device in use), network environment (WIFI, 5G), browser, parent page, etc.
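The 4W+1H framework maps naturally onto a single event record. The sketch below shows one possible shape for such a record, together with a buffer that reports events in a batch once a size limit is reached. The field names, the `EventBuffer` class, and the `reported` list standing in for a backend are all hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class TrackingEvent:
    user_id: str        # who
    timestamp: str      # when (ISO 8601 string)
    ip: str             # where
    event: str          # what
    properties: dict = field(default_factory=dict)  # what, in more detail
    os: str = ""        # how
    device_model: str = ""
    network: str = ""


class EventBuffer:
    """Collects events and reports them in one batch when a limit is reached."""

    def __init__(self, limit: int):
        self.limit = limit
        self.pending = []
        self.reported = []  # stands in for the analytics backend

    def track(self, event: TrackingEvent) -> None:
        self.pending.append(asdict(event))
        if len(self.pending) >= self.limit:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.reported.append(json.dumps(self.pending))
            self.pending = []


if __name__ == "__main__":
    buffer = EventBuffer(limit=2)
    buffer.track(TrackingEvent("u42", "2022-01-01T08:00:00Z", "203.0.113.7",
                               "view_item", {"item_id": "sku-1"}, "Android", "X1", "WIFI"))
    buffer.track(TrackingEvent("u42", "2022-01-01T08:00:05Z", "203.0.113.7",
                               "add_to_cart", {"item_id": "sku-1"}, "Android", "X1", "WIFI"))
    print(len(buffer.reported), "batch(es) reported")
```

A time-based schedule (hourly or daily) would call `flush()` from a timer instead of from `track()`.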

When a user's behavior triggers a buried point, the related 4W+1H data is sent to the backend for analysis, reported on a fixed schedule (daily or hourly) or once a data limit is reached. Some companies collect only the portrait label data related to their own business, but most collect excessively, gathering large amounts of data unrelated to their business. For example, when a user uploads a picture in photo management software, the software collects device and user information. If the picture is a selfie, the user's actual appearance is bound to the portrait as well, and buildings, house numbers, and store names in the photo may expose the user's identity and location. This information helps companies infer the user's financial status, living habits, and more.

(2) External data

External data comes in many forms and is mainly used to make up for insufficient internal labels or data volume; combining it with internal data gives a more complete user portrait. The main external channels include public Internet data, paid data (data providers), web-crawled data, data obtained through personal connections, and monitoring data such as Baidu Index and webmaster tools. A few of the main channels are described below:

① Internet public data

Public data consists mainly of macro-level statistics at the global, national, local, and corporate levels. It has no direct effect on user portraits but provides a reference. For example, the National Bureau of Statistics of China (http://data.stats.gov.cn/index.htm) publishes data on China's economy and people's livelihood; CEIC (www.ceicdata.com/zh-hans) has economic data for more than 128 countries, with in-depth data such as GDP, CPI, imports, exports, foreign direct investment, retail sales, sales, and international interest rates. Other sources include Wind, Sou.com, China Statistical Information Network, Amazon Public Datasets, figshare, and github.

② Paid data

  • Big Data Trading Center

Construction of big data trading centers began around the country in 2015. As of the end of 2019, there were 30 big data exchanges (centers). Big data trading in China falls roughly into four models: government-led or government-endorsed exchanges (centers), industry data trading led by industry institutions, data trading platforms led by large Internet companies and IT vendors, and market-oriented data trading led by vertical data service providers.

  • Data Sharing Between Enterprises

Companies such as credit-reporting firms find it difficult to complete user portraits with their own data alone, so they usually share data with industry partners.

  • Other

Cyber attackers exploit various vulnerabilities to deploy SDKs, obtain the data they want, and sell it on the underground market, forming a complete black-market chain of hackers, multi-level suppliers (data intermediaries), and buyers. The chain has four levels. At level 1, hackers or company employees steal users' personal data. At level 2, the stolen user information is sold to suppliers. At level 3, suppliers recruit agents and resell the data. At level 4 are the end users of the information, who use the data to supplement user portraits, conduct telemarketing, or commit telecom fraud. For example, one data supplier told reporters: "For ordinary personal information such as phone, WeChat, and QQ numbers, the average purchase cost is 0.4 yuan per record and the selling price is 0.7-0.8 yuan per record. Monthly turnover reaches 400,000-500,000 yuan. Finance, education, medical beauty, and other industries all buy, and market demand is huge."

2 Data acquisition technology

In the Internet era, advertisers have developed many convenient and mature tracking technologies to track, analyze, and persuade consumers. Online advertising follows every user who browses the web. The advertising industry uses technologies such as cookies, Flash cookies, beacons, and browser fingerprints to track user behavior.

① Cookies

Cookies are small files that a website server stores in the user's memory or on the hard drive to record the addresses of pages the user visits, the time spent on them, the user name and password entered, and the user's browsing habits. A cookie is not generated by the local machine; it is usually a small packet of data sent by the website being visited to observe what the user is doing. Cookies can track user behavior, and they also remember the sites a user has visited, sparing the user from re-entering URLs or logging in again with a user name and password. The biggest problem with this technology is that it tracks and records user behavior without the user's knowledge, and the data is often accessed by third parties such as behavioral advertisers. After collecting cookie data, advertisers use behavioral marketing to deliver advertisements that may interest the user. At present, the main countermeasures are to use the browser's incognito mode or to clear the browser's cookies periodically to reduce data leakage.
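The mechanism can be sketched with Python's standard `http.cookies` module: the site issues an identifier on the first visit, the browser sends it back on later requests, and the site links those requests together. The cookie name `visitor_id`, the 30-day lifetime, and the in-memory `visit_log` are assumptions made for illustration.

```python
import uuid
from http.cookies import SimpleCookie


def issue_cookie() -> str:
    """Build the Set-Cookie header a site could send on a first visit."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = uuid.uuid4().hex           # hypothetical cookie name
    cookie["visitor_id"]["max-age"] = 30 * 24 * 3600  # assumed 30-day lifetime
    cookie["visitor_id"]["path"] = "/"
    return cookie.output(header="Set-Cookie:")


def read_visitor_id(cookie_header: str):
    """Extract the identifier from the Cookie header the browser sends back."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("visitor_id")
    return morsel.value if morsel is not None else None


def record_visit(visit_log: dict, cookie_header: str, url: str) -> None:
    """Append the visited URL to the history kept for this identifier."""
    visitor = read_visitor_id(cookie_header)
    if visitor is not None:
        visit_log.setdefault(visitor, []).append(url)


if __name__ == "__main__":
    print(issue_cookie())
    log = {}
    record_visit(log, "visitor_id=abc123; theme=dark", "/shoes")
    record_visit(log, "visitor_id=abc123; theme=dark", "/cart")
    print(log)
```

Clearing cookies or browsing in incognito mode breaks the link because the browser no longer returns the old identifier.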

② Flash cookies

As technology developed, developers found a more persistent method: Flash cookies. Traditional HTTP cookies are unstable, because users may clear them in the browser or disable them in the browser options to avoid data collection. Flash cookies can rewrite the HTTP cookies a user has deleted, bringing them back to life so that the previously saved data is again available to the analyst. Disabling or clearing cookies in the browser therefore cannot stop websites from rewriting cookies and continuing to track and record the user's browsing history.

③ Web Beacons

Web beacons, also known as web bugs, are transparent GIF or PNG images 1 pixel in size that can be hidden in any web page element or email. When the user opens the page or email, the image is requested from the server, which can log the visit and write this data to cookies. Unlike cookies, which browser users can accept or disable, web beacons appear only as Graphics Interchange Format (GIF) or other file objects and can be discovered only with detection tools. The technique initially had legitimate uses, such as tracking websites that infringe copyright.

The Beacon API is an upgraded version of web beacons that achieves the same purpose without invisible images or similar means. It is designed to let web developers send information (such as analytics or diagnostic data) back to the web server. With the Beacon API, this tracking happens without interfering with or affecting website navigation and is invisible to the end user. The technology was introduced in the Mozilla Firefox and Google Chrome browsers from 2014 onward. In 2021, however, Google announced that, to protect user privacy, it would abandon tracking of individuals' web browsing records.
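The classic image-based web beacon can be sketched as two small functions: one that builds the invisible image tag embedded in a page or email, and one that recovers the tracking parameters on the server when the image is requested. The host `tracker.example` and the parameter names are hypothetical.

```python
import html
from urllib.parse import parse_qs, urlencode, urlparse


def beacon_tag(base_url: str, **params: str) -> str:
    """Build a 1x1 image tag whose URL carries the tracking parameters."""
    url = f"{base_url}?{urlencode(params)}"
    # Escape the URL so that '&' is valid inside the HTML attribute.
    return f'<img src="{html.escape(url, quote=True)}" width="1" height="1" alt="">'


def parse_beacon_request(request_url: str) -> dict:
    """Recover the tracking parameters from the URL the browser requested."""
    query = parse_qs(urlparse(request_url).query)
    return {key: values[0] for key, values in query.items()}


if __name__ == "__main__":
    print(beacon_tag("https://tracker.example/pixel.gif", campaign="spring", recipient="u42"))
    print(parse_beacon_request("https://tracker.example/pixel.gif?campaign=spring&recipient=u42"))
```

Because the request is made automatically when images load, the recipient sees nothing, which is why many mail clients block remote images by default.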

④ Browser fingerprint

Browsers differ from user to user. A website can detect the browser version, operating system type, installed plug-ins, screen resolution, time zone, downloaded fonts, and other information. Tracking a web browser through the configuration and settings visible to websites is called "browser fingerprinting"; like a human fingerprint, the combination is distinctive enough to identify an individual. To avoid fingerprinting, users need to disable JavaScript and Adobe Flash on websites, and even computer experts find it difficult to protect their privacy against fingerprint tracking. The first browser fingerprints were stateful and required users to log in to their accounts before valid information could be obtained. Upgraded fingerprints made users more distinguishable by continually adding browser feature values. The latest approach builds feature values, and even models, from a person's behavior and habits, so that a user can be tied to a specific identity across different devices, without logging in, purely from web browsing habits. This technology is still under study. At present, fingerprint tracking is difficult to block: as long as a user browses the Internet, the user's online whereabouts are effectively public.
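A basic, stateless fingerprint amounts to hashing the attributes a site can observe. The sketch below is a simplification under stated assumptions: the attribute names and values are hypothetical, and real fingerprinting scripts gather many more signals than shown here.

```python
import hashlib
import json


def browser_fingerprint(attributes: dict) -> str:
    """Hash observable browser attributes into a short, stable identifier."""
    # Sort keys so the same attributes always serialize identically.
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    visitor = {
        "browser": "Firefox 96",
        "os": "Windows 10",
        "plugins": ["pdf-viewer"],
        "screen": "1920x1080",
        "timezone": "UTC+8",
        "fonts": ["Arial", "SimSun"],
    }
    print(browser_fingerprint(visitor))
```

No cookie or login is involved: two visits with the same configuration produce the same identifier, and the more attributes are included, the fewer users share one.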

⑤ SDK

To monitor user behavior on a website or in software, some code is usually added to the website or software. When the user triggers the corresponding behavior, data is reported; this is what "burying points" in code means. On a website such code is called monitoring code, and in an app it is called an SDK (Software Development Kit). Related tools currently on the market include GrowingIO, GA, and Shence.

User portrait data problems and analysis

From a marketing perspective, user portrait technology helps suppliers target customers accurately and provide personalized services, which improves the efficiency of market transactions. User portrait technology has social value. However, having reviewed in the first two parts the label framework, label design and application, data sources, and data collection technologies, we find many data security problems in how enterprises build user portraits, including compliance of data transaction channels, illegal data collection technologies, excessive collection of user data, and the lack of mechanisms to protect users' personal data privacy.

1 Compliance issues of external data acquisition channels

Under normal circumstances, users provide personal data and the platform provides personalized services, forming a closed commercial loop. As the earlier analysis shows, however, a company's own data cannot meet the volume of label data that user portraits demand, so companies usually need to obtain some data externally. Some self-organized gray markets have emerged in data trading. As shown in Figure 6, the platform or its agents sell users' personal data to third-party institutions at marked prices, creating a second commercial loop centered on those institutions. The third-party institutions analyze the user information and provide users with "personalized services", and the resulting stream of advertising intrudes on users' lives. Because data management is lacking, part of the data also flows to illegal organizations, which use it to market fake products and defraud users.

Figure 6: Closed-loop diagram of enterprise data transactions

At present, the market has few compliant data transaction channels. In 2015, big data trading centers were built around the country to promote lawful data trading and circulation in service of the market economy. Data from recent years, however, shows that they have fallen short of market expectations and well short of the original vision. The main problem is that the marketization of data as a factor of production still has many gaps, in data rights confirmation, data pricing, data transactions, and the design of circulation mechanisms, so it is easy to cross legal red lines. According to Article 42 of the "Cybersecurity Law": "Network operators shall not disclose, tamper with, or damage the personal information they have collected, and shall not provide personal information to others without the consent of the person from whom it was collected. Information that has been processed so that it cannot identify a specific individual and cannot be restored is excepted." Yet the earlier analysis shows that identifying the individual is the premise of a user portrait; otherwise an individual's portrait cannot technically be built. Beyond the anonymization of personal data mentioned in the Cybersecurity Law, user authorization and consent must also be obtained when data is traded or shared, which greatly increases the cost of corporate data compliance.

Therefore, promoting compliance with external data acquisition channels requires addressing the following issues:

  • Anonymize personal data (rather than merely de-identifying it) to cut off the "person" element from the associated data;

  • Complete user portraits while personal information remains anonymized (usable but not visible), for example with federated learning, secure multi-party computation, or differential privacy;

  • A clear scheme for confirming data rights;

  • A low-cost way for enterprises to obtain authorization to use data;

  • A sound mechanism for data pricing and benefit distribution.
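Of the techniques listed, differential privacy is the simplest to show in a few lines. The sketch below applies the Laplace mechanism to a counting query, so that an aggregate label statistic can be released without revealing whether any single user is in the data. The function names and the choice of epsilon are assumptions for illustration, not a production design.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(flags, epsilon: float, rng: random.Random) -> float:
    """Release a count with epsilon-differential privacy.

    Adding or removing one user changes a count by at most 1, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for flag in flags if flag)
    return true_count + laplace_noise(1.0 / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    # Hypothetical data: whether each of 100 users carries a given label.
    has_label = [True] * 60 + [False] * 40
    print(private_count(has_label, epsilon=1.0, rng=rng))
```

A smaller epsilon means more noise and stronger privacy; the analyst still obtains a usable estimate of the aggregate while individual records stay hidden.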

 

2 Prevent illegal data collection methods and excessive data collection

Early user data analysis revolved around business data, building customer consumption profiles from past purchase records. Business data can reveal customers' preferences for brands, colors, and styles and the prices they can afford, but it is not enough to tap further consumption potential. Platforms usually need more behavioral data to catch customers' impulse demand while it is still timely. To that end, platforms use the technologies mentioned earlier, such as cookies, Flash cookies, beacons, browser fingerprints, and SDKs, to collect behavioral data and analyze it for user profiling and precision marketing. The collection of behavioral data is shown in Figure 7:

Figure 7: Application provider data acquisition method

During registration, the application provider obtains the user's basic data and then, through authorization against the device's unique IMEI (International Mobile Equipment Identity; on a local area network the MAC address identifies the device), binds the user to that basic data, so the provider can tell which user any data comes from. Then, by obtaining more permissions, such as the camera, photos, address book, location, and application list, the provider reads the user's real-time behavioral data, collects it, and runs word-cloud analysis to profile the user's personality, hobbies, and lifestyle preferences. As data accumulates, a digital persona mirroring the physical person takes shape in the digital world. Whoever controls this digital persona may one day use simulation to predict its next action and steer the real-world user toward the outcome the data holder wants, which would be dangerous for all users.

In recent years, some mobile device makers have introduced a new device data protection technology, OAID (Open Anonymous Device Identifier), which replaces the device's original IMEI with a virtual ID as the device identifier. By providing a random anonymous identity, OAID is used for device binding in applications, so that the device operates normally while the application provider cannot identify the specific user. This approach still has the following problems:

① The data security problem has not been fundamentally solved

OAID effectively addresses unauthorized collection of user data, in that it prevents application providers from mapping the real device identifier to specific user behavior. It cannot fully solve data security, however, because application providers can still identify a specific user from the personal information left when registering an account in the application. Protecting registration information is currently cumbersome: users typically register with virtual mobile numbers or temporary email addresses and frequently create new accounts to confuse application providers.

② It is unavoidable that the terminal provider collects data

The OAID virtual identity is issued by a centralized organization. Although this prevents application providers from using various techniques to collect device data, the terminal provider can map the OAID back to the IMEI. Control is in effect transferred from the application provider to the terminal equipment provider, and the risk of data breaches remains.
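The two problems above follow from how a virtual identifier has to be managed. The following sketch is a hypothetical model, not the actual OAID implementation: a `DeviceIdService` run by the device vendor hands applications a random ID instead of the IMEI, while keeping the mapping that only the vendor can resolve.

```python
import uuid


class DeviceIdService:
    """Hypothetical vendor-side service issuing anonymous device identifiers."""

    def __init__(self):
        self._imei_to_id = {}
        self._id_to_imei = {}

    def get_anonymous_id(self, imei: str) -> str:
        """Return the device's current anonymous ID, creating one if needed."""
        if imei not in self._imei_to_id:
            anonymous_id = uuid.uuid4().hex  # random, carries no IMEI content
            self._imei_to_id[imei] = anonymous_id
            self._id_to_imei[anonymous_id] = imei
        return self._imei_to_id[imei]

    def reset_anonymous_id(self, imei: str) -> str:
        """Discard the old ID and issue a fresh one for the same device."""
        old_id = self._imei_to_id.pop(imei, None)
        if old_id is not None:
            del self._id_to_imei[old_id]
        return self.get_anonymous_id(imei)

    def resolve(self, anonymous_id: str):
        """Vendor-only lookup from anonymous ID back to the IMEI."""
        return self._id_to_imei.get(anonymous_id)


if __name__ == "__main__":
    service = DeviceIdService()
    app_visible_id = service.get_anonymous_id("000000000000000")
    print(app_visible_id)                   # what an application sees
    print(service.resolve(app_visible_id))  # what the vendor can recover
```

The application never sees the IMEI, but the `resolve` step shows why control has moved to the vendor rather than disappeared.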

Users are generally averse to excessive data collection, and terminal providers use technologies such as OAID to stop applications from collecting user information. But as Internet technology develops further, we will enter a digital world that mirrors the physical one, and it is unavoidable that more data will be mapped from the physical world into the digital world. Since this is the general trend, what we need to do is not prevent data from being collected but pay more attention to protecting the data once it is collected: each piece of data collected from a user should be used only in the same scenario, to serve that user. In other words, the loop between the platform and the user in Figure 6 should stay closed, so that data does not flow to third-party organizations, illegal organizations, and the like.

3 User privacy protection

Although user portraits improve transaction efficiency and reduce supply costs, they also leave users' privacy in the hands of other institutions or organizations, with several risks of leakage: first, enterprises sell user data through third parties; second, employees steal data and sell it through illegal channels; third, attackers obtain user data from systems through technical vulnerabilities or by stealing employees' identities. At present, companies commit to data protection mainly at the ethical level, yet a well-known public figure once said publicly that Chinese people are willing to trade privacy for convenience. CCTV commented on this: what people fear most is not that he misspoke, but that he blurted out a truth, namely that technology giants turn a blind eye to users' core interests.

In the digital age, data is crude oil: it drives economic development and fuels breakthroughs in information technology. If we focus only on protecting data, we will gradually lose the convenience and the boundless business opportunities that have penetrated every corner of our lives; we should not throw the baby out with the bathwater. Privacy protection and economic development are not opposites. Current solutions include blockchain technology, data anonymization, differential privacy, secure multi-party computation, matrix transformation, and other data desensitization techniques, all of which can protect user data privacy. But deploying them requires the platform to foot the bill and also cuts into the platform's existing core interests, so the market has been very slow to adopt them. This is gradually changing. For example, the recent suspensions of Didi Taxi, Yunmanman, and other Internet platforms over non-compliant data collection have served as a strong warning to the market.
