Big Data, Privacy, and AI

1. Big Data

The term "big data" has been proposed since the end of 1990.
Big data describes the activities of collecting, accumulating, and analyzing large amounts of heterogeneous data (text, audio, video, images, metadata, etc.).

With the development of the Internet, the available data has increased exponentially.
The main sources of big data are:

  • Internet searches
  • Social networks
  • Applications
  • Voice assistants
  • Smartphones
  • Physical sensors and other Internet of Things devices

The phenomenon of big data is one of the main driving forces of the economy, society, and politics in the 21st century, and it has driven the development of big data analysis technologies such as artificial intelligence.

1.1 Big data monopoly

Google and Facebook hold the largest share, leaving only breadcrumbs for everyone else.

More than 92% of web searches are conducted through Google, which processes more than 40,000 queries per second, roughly 3.5 billion per day. The rest of the market is divided among Bing, Yahoo, Baidu, Yandex, and DuckDuckGo.

Facebook and Google alone control more than 84% of targeted-advertising campaigns worldwide (excluding China), and 98.5% of Facebook's revenue comes from targeted advertising.

This monopoly has distorted a collective dream, creating a machine that enriches a very small number of people with the data created every day by billions of others, and that shapes reality for their own use and consumption. Most people are unaware of all this.

The phenomenon of big data has given birth to so-called surveillance capitalism:
surveillance capitalism is a new economic order that treats human experience as raw material for generating products and services that can be sold, and for predicting future human behavior.

The concept is very simple: any human behavior or experience produces raw data. This data can be acquired and processed in real time to extract information useful for predicting our future actions, thoughts, and feelings.

The person is thus both raw material and consumer. In this new production process, the person whose own data is turned into products is caught in a strange endless loop.

Surveillance capitalism and big data are just concepts; the real engine of the new economy is the algorithms and mathematical models built on big data.

1.2 Problems with big data analysis algorithms

In her book "Weapons of Math Destruction", Cathy O'Neil describes an algorithm as an opinion embedded in a mathematical model. We humans usually call these stereotypes. These mathematical stereotypes are usually hidden behind "personalized" or "optimized" services.

Today, every human domain is full of automated algorithmic decisions that affect the reality of millions of people every day: the information available on the Internet, job hunting, politics, education, justice, and finance.

These automated decision-making processes can have negative effects (negative externalities) on communities and individuals, yet the organizations that adopt them, whether public or private, receive no feedback other than profit: an algorithm that increases profit is, by that measure, an effective algorithm. A few examples:

  • The Cambridge Analytica scandal and its serious impact on the 2016 U.S. election and the Brexit referendum showed the world the dark side of big data and the power of psychological profiling.
  • In August 2020, many English students were discriminated against by an algorithm adopted by the government to determine their high school grades based on data that had nothing to do with the students' actual abilities.
  • More recently, an Italian judge ruled against Deliveroo for using an algorithm that downgraded riders based on their absences, regardless of the reason.

2. Privacy

In 2014, the EU's Article 29 Working Party on data protection issued a statement on the impact of big data.
The working party had high hopes for big data, expecting innovative ways to protect personal data while delivering many collective and individual benefits.

The General Data Protection Regulation ("GDPR"), adopted by the European Parliament on April 14, 2016, came into force in EU member states on May 25, 2018. Its scope is extremely broad: any organization that collects, transmits, retains, or processes the personal data of people in EU member states is bound by it.

So far, Google still claims that it does not sell personal data. Indeed, its business is not selling the data itself, but processing personal data to obtain (and sell) intelligence. Personal data is the ideal raw material for this business.

Amid information overload and incredibly complex personal-data processing operations, all hope of transparency is lost. Most of the time, people do not understand what is going on, and do not even ask.

Lack of understanding is what causes people to lose interest in privacy.

3. Artificial Intelligence (AI)

Artificial intelligence is increasingly used in activities that seriously threaten human rights, such as:

  • Border control and immigration
  • Biometric mass surveillance
  • Social scoring and predictive policing

The European Union has the opportunity to intervene and show the world that technological innovation is possible while respecting people's fundamental rights.

3.1 Federated learning

Federated machine learning, also known as federated learning, collaborative learning, or alliance learning, is a machine learning framework that lets multiple institutions use data and build machine learning models jointly while meeting the requirements of user privacy protection, data security, and government regulation.

In most industries, data exists in isolated islands because of industry competition, privacy and security concerns, and complicated administrative procedures. Even integrating data centrally across departments of the same company faces numerous obstacles. In practice, integrating data scattered across places and organizations is almost impossible, or prohibitively expensive.

In response to the dilemma of data islands and data privacy, many institutions and scholars have proposed solutions. To address the privacy issues of mobile terminals and multi-party data, Google and WeBank each proposed "Federated Learning" algorithm frameworks: Google proposed a federated learning framework based on personal terminal devices, and AAAI Fellow Professor Yang Qiang and WeBank subsequently proposed a systematic general solution based on federated learning that can handle joint modeling both between individuals (2C) and between companies (2B). On the premise of meeting data privacy, security, and regulatory requirements, such a framework lets artificial intelligence systems use each party's data more efficiently and accurately.

For example, suppose two companies A and B hold different data: enterprise A has user-feature data, while enterprise B has product-feature data and label data. Under the GDPR described above, the two companies cannot simply merge their data, because the original providers of the data, their respective users, may not have agreed to that. Assume each party wants to build its own task model (classification or prediction), and that these tasks were consented to by the users when the data was collected; the question then becomes how to build high-quality models on each side. Because the data is incomplete (enterprise A lacks label data, enterprise B lacks user-feature data) or insufficient (too little data to build a good model), the model at each end may be impossible to build, or may perform poorly.

Federated learning is designed to solve exactly this problem. It ensures that each company's own data never leaves its local environment, while the federated system exchanges model parameters under an encryption mechanism, that is, it builds a virtual shared model without violating data privacy regulations. This virtual model behaves like the optimal model that would result from aggregating everyone's data in one place, yet while it is being built the data itself never moves, leaks no privacy, and stays compliant. The resulting models serve only local objectives in their respective domains. Under such a federated mechanism, all participants have equal identity and status, and the federated system helps everyone achieve "common prosperity". A minimal sketch of the parameter-exchange idea follows.
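The sketch below is illustrative only: it uses plain NumPy, hypothetical parties A and B, a simple linear-regression objective, and FedAvg-style parameter averaging in the horizontal setting (shared features, different users), whereas the A/B scenario above is a vertical, feature-split setting. The encryption or secure-aggregation layer that a real federated system would wrap around the exchanged parameters is omitted.

```python
import numpy as np

# Minimal FedAvg-style sketch. Party names, data shapes, the linear model and
# the single local gradient step are illustrative assumptions; a real federated
# system would also encrypt or securely aggregate the exchanged parameters.

rng = np.random.default_rng(0)

def local_update(w, X, y, lr=0.1):
    """One local gradient step for linear regression; X and y never leave the party."""
    grad = 2.0 * X.T @ (X @ w - y) / len(y)
    return w - lr * grad

# Each party holds its own users (rows) described by the same features (columns).
X_a, y_a = rng.normal(size=(100, 3)), rng.normal(size=100)  # party A's private data
X_b, y_b = rng.normal(size=(80, 3)), rng.normal(size=80)    # party B's private data

w_global = np.zeros(3)  # shared model parameters held by the coordinator

for _ in range(10):
    # Each party trains locally and sends back only its updated parameters.
    w_a = local_update(w_global, X_a, y_a)
    w_b = local_update(w_global, X_b, y_b)
    # The coordinator averages the parameters, weighted by local data size.
    w_global = (len(y_a) * w_a + len(y_b) * w_b) / (len(y_a) + len(y_b))

print("federated model parameters:", w_global)
```

The property the sketch preserves is the essential one: only model parameters cross the boundary between the parties and the coordinator; the raw arrays X_a, y_a, X_b, y_b never leave their owners.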

According to the distribution characteristics of the isolated data, federated learning is divided into three categories (see the sketch after this list):

  • The data features (X1, X2, ...) of the two data sets overlap heavily, while the users (U1, U2, ...) overlap little [horizontal federated learning]

  • The users (U1, U2, ...) of the two data sets overlap heavily, while the data features (X1, X2, ...) overlap little [vertical federated learning]

  • Both the users (U1, U2, ...) and the data features (X1, X2, ...) of the two data sets overlap little [federated transfer learning]
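To make the categorization concrete, here is a toy sketch: the user IDs, feature names, and the 0.5 overlap threshold are made-up assumptions chosen only to show how user overlap and feature overlap determine which flavor of federated learning applies.

```python
# Toy illustration of the three categories above. User IDs, feature names,
# and the 0.5 threshold are assumptions for demonstration only.

users_a    = {"u1", "u2", "u3", "u4"}      # party A's users
features_a = {"age", "income", "clicks"}   # party A's features

users_b    = {"u3", "u4", "u5", "u6"}      # party B's users
features_b = {"age", "income", "clicks"}   # party B's features

def jaccard(s, t):
    """Overlap of two sets measured as |intersection| / |union|."""
    return len(s & t) / len(s | t)

user_overlap = jaccard(users_a, users_b)
feature_overlap = jaccard(features_a, features_b)

if feature_overlap > 0.5 >= user_overlap:
    kind = "horizontal federated learning (shared features, different users)"
elif user_overlap > 0.5 >= feature_overlap:
    kind = "vertical federated learning (shared users, different features)"
else:
    kind = "federated transfer learning (little overlap in users and features)"

print(f"user overlap = {user_overlap:.2f}, "
      f"feature overlap = {feature_overlap:.2f} -> {kind}")
```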

