Decentralized data transaction method and platform based on Ethereum + IPFS

This is the author's own thesis.

Summary:

Data transactions suffer from unclear data ownership and weak data security. This paper develops a decentralized data transaction method and platform based on Ethereum + IPFS. The method (1) uses natural language processing to compute text similarity, which supports data rights confirmation, and (2) uses smart contracts to build a transaction-centered encryption and decryption process that prevents malicious transactions and keeps data secure during trading. The platform consists of three parts: a client front end, a back end, and a database, where the database combines Ethereum with the InterPlanetary File System (IPFS). Experiments show that the platform can address unclear data ownership and data security in data transactions, improve transaction efficiency, reduce transaction costs, and provide buyers and sellers with a safe and reliable data trading service.

Keywords: data transaction; blockchain; smart contract; data rights confirmation; transaction subject

01 Introduction

The era of big data has arrived, and data is becoming, or has already become, an asset. In recent years, smart mobile devices and Internet of Things devices have been widely adopted thanks to their small size and portability, and the volume of data they generate has grown exponentially. The nearly 400 million members of Taobao.com generate about 20 TB of commodity transaction data every day, and Facebook's roughly 1 billion users generate more than 300 TB of log data every day. In terms of market size, the global big data industry is expanding rapidly. The market research firm IDC forecast that the global big data and business analytics market would reach US$189 billion in 2020 and US$274 billion by 2024, a compound annual growth rate of 9.2%. The domestic big data industry is also growing quickly. According to the China Academy of Information and Communications Technology, China's big data industry reached 838.3 billion yuan in 2019, up 15.9% year on year, and 1.57 trillion yuan in 2022, up 18% year on year; it has become an important force driving the digital economy [1-2].

Data transactions have developed gradually. The central government issued the "Opinions of the Central Committee of the Communist Party of China and the State Council on Building a More Complete System and Mechanism for Factor Market Allocation" to accelerate the cultivation of a data factor market. With the encouragement of national policy and the maturing of machine learning, deep learning, neural networks, and data mining, the role of data in these technologies is becoming increasingly prominent, and how to make rational use of big data to advance the new generation of information technology has become a current research hotspot [3-4]. Other countries, especially the United States, also attach great importance to data. Since 2009, the US federal government has opened a large number of databases and published them on its central data portal, the Data.gov website, for public use. In 2014, the IRS created a shared database called "Get Transcript". In 2012, the US government and the power industry jointly launched the "Green Button" program, which provides energy usage information to households and businesses; it has so far served 59 million households and businesses and helped them save energy. In addition, the US government regards ensuring data security as the most serious challenge in applying big data and keeps revising the relevant laws and regulations, for example through data breach legislation, extending privacy protection to non-Americans, regulating the collection and use of student data, and amending the Electronic Communications Privacy Act. It is therefore necessary to build a secure platform for data trading and sharing.

Big data has become a resource. Data is the basic resource of the digital economy and an important factor of production for economic development in the post-pandemic era. However, theoretical understanding lags far behind practical application. On the one hand, data science and its application technologies are developing rapidly, and people must keep learning to keep pace. On the other hand, collecting, processing, analyzing, and applying data raises many complex issues, such as data quality, data privacy, and data security, which require in-depth research. As a result, data cannot realize its full value, and there are serious problems with data verification and storage security. Many problems in data transactions still urgently need solving [5]. Data transactions differ greatly from traditional commodity transactions. Because copying data costs almost nothing, data can be duplicated with a single click, which makes data rights hard to confirm; data also grows quickly, and its value is hard to estimate. If the central server crashes or the data center is damaged by force majeure, the data and the transaction records are lost, so data security is hard to guarantee, and the central server is also exposed to hacker attacks. For example, in 2018 Facebook's user data leak came to light, in which data on 87 million users had been improperly shared with the political consulting firm Cambridge Analytica. In 2016, Yahoo announced that information from 500 million user accounts had been stolen, including sensitive data such as usernames, email addresses, and passwords. In 2018, a breach of Under Armour's MyFitnessPal app exposed the personal information of about 150 million users. Secure data storage is therefore particularly important [6].

Blockchain is essentially a distributed database, and as a decentralized platform it supports building decentralized systems. With blockchain technology, data transaction records are traceable and tamper-resistant, data can be encrypted, smart contracts can control the transaction process, and data can be stored in a distributed manner [7-8], which addresses the problems above. We build a decentralized transaction system on the blockchain and check new data for similarity against existing data before it is uploaded. Because transaction records are traceable and tamper-resistant, data ownership can be determined; data encryption keeps the data secure; smart contracts control the transaction process so that neither buyer nor seller can repudiate a transaction; and distributed storage prevents single points of failure and hacker attacks on a central server.
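
As a minimal illustration of why chained records are traceable and tamper-evident, the Python sketch below links each transaction record to the previous one through a SHA-256 hash, so editing any earlier record breaks verification. It is not Ethereum's actual data structure and omits consensus, signatures, and Merkle trees; the record fields (`seller`, `buyer`, `price`) and the functions `append_record` and `verify_chain` are hypothetical names chosen for the example.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def record_hash(record: dict, prev_hash: str) -> str:
    """Hash a record together with its predecessor's hash."""
    payload = json.dumps(record, sort_keys=True) + prev_hash
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_record(chain: list, record: dict) -> None:
    """Append a record whose hash commits to the entire history before it."""
    prev = chain[-1]["hash"] if chain else GENESIS
    chain.append({"record": record, "prev": prev, "hash": record_hash(record, prev)})


def verify_chain(chain: list) -> bool:
    """Recompute every link; an edited record or a broken link returns False."""
    prev = GENESIS
    for entry in chain:
        if entry["prev"] != prev or entry["hash"] != record_hash(entry["record"], prev):
            return False
        prev = entry["hash"]
    return True


ledger = []
append_record(ledger, {"seller": "S", "buyer": "B", "price": 10})
append_record(ledger, {"seller": "S", "buyer": "C", "price": 12})
print(verify_chain(ledger))        # True
ledger[0]["record"]["price"] = 1   # tamper with an earlier transaction
print(verify_chain(ledger))        # False
```

A central database could be edited silently; here, rewriting history would require recomputing every later hash, and in a real blockchain also persuading the other nodes to accept the rewritten chain.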

02 Blockchain-based data transaction system

This chapter first describes the data transaction process, which involves data rights confirmation and encrypted data upload; it then presents the key problems solved in data transactions; finally, it introduces the Ethereum-based technical architecture for data transactions.

2.1 Description of data transaction process

A data transaction means that the buyer (Buyer, B) searches the data trading platform (Data trading platform, DTP) for a dataset that meets his needs, and the data then flows from the seller (Seller, S) to buyer B. First, seller S sends a data upload request to the DTP, which confirms the data rights after receiving the request. Once the rights are confirmed, the data is encrypted and uploaded: the key information about the data is stored on Ethereum, and the dataset itself is stored in the InterPlanetary File System (IPFS). After buyer B finds a suitable dataset, B sends a transaction request to the DTP, seller S responds to the request, and the transaction is completed on the DTP. Finally, the data is delivered on the DTP and buyer B can download the dataset. The key operations in this process are data rights confirmation, encrypted data upload, data transaction, and data delivery. The current DTP is implemented on Windows, mainly to run simulation experiments; in the future, all user operations will be moved into smart contracts to achieve complete decentralization. The data transaction process is shown in Figure 1.
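
The ordering of the steps above can be sketched as a small state machine. The Python code below is a toy model under stated assumptions, not the platform's implementation: a dictionary stands in for IPFS, a list stands in for the Ethereum ledger, a SHA-256 hex digest stands in for an IPFS content identifier, and the seller is assumed to have encrypted the data before upload. The names `Stage`, `Platform`, and `Trade` are hypothetical. Each method may only move the trade to the next stage, which mirrors how a smart contract can refuse out-of-order calls.

```python
import hashlib
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Stage(Enum):
    UPLOAD_REQUESTED = 1  # seller S asks the DTP to upload a dataset
    RIGHTS_CONFIRMED = 2  # DTP has accepted the data rights check
    LISTED = 3            # ciphertext stored off-chain, key info recorded on-chain
    TRADE_REQUESTED = 4   # buyer B has requested the dataset
    TRADE_ACCEPTED = 5    # seller S has responded to the request
    DELIVERED = 6         # buyer B may download the dataset


@dataclass
class Platform:
    """Toy DTP: a dict stands in for IPFS and a list for the Ethereum ledger."""
    off_chain: dict = field(default_factory=dict)  # content hash -> ciphertext
    on_chain: list = field(default_factory=list)   # append-only key records


@dataclass
class Trade:
    seller: str
    stage: Stage = Stage.UPLOAD_REQUESTED
    buyer: Optional[str] = None
    content_hash: Optional[str] = None

    def _move_to(self, target: Stage) -> None:
        # Only the immediately following stage is allowed: no skipping or replaying steps
        if target.value != self.stage.value + 1:
            raise ValueError(f"cannot go from {self.stage.name} to {target.name}")
        self.stage = target

    def confirm_rights(self, is_original: bool) -> None:
        if not is_original:
            raise ValueError("similar data already exists; upload rejected")
        self._move_to(Stage.RIGHTS_CONFIRMED)

    def list_dataset(self, platform: Platform, ciphertext: bytes, keywords: list) -> None:
        # ciphertext is assumed to have been encrypted by the seller already
        self._move_to(Stage.LISTED)
        self.content_hash = hashlib.sha256(ciphertext).hexdigest()  # CID stand-in
        platform.off_chain[self.content_hash] = ciphertext
        platform.on_chain.append(
            {"seller": self.seller, "hash": self.content_hash, "keywords": keywords})

    def request(self, buyer: str) -> None:
        self._move_to(Stage.TRADE_REQUESTED)
        self.buyer = buyer

    def accept(self) -> None:
        self._move_to(Stage.TRADE_ACCEPTED)

    def deliver(self, platform: Platform) -> bytes:
        self._move_to(Stage.DELIVERED)
        platform.on_chain.append(
            {"buyer": self.buyer, "hash": self.content_hash, "event": "delivered"})
        return platform.off_chain[self.content_hash]


dtp = Platform()
trade = Trade(seller="S")
trade.confirm_rights(is_original=True)
trade.list_dataset(dtp, b"<encrypted dataset>", ["weather", "2023"])
trade.request(buyer="B")
trade.accept()
print(trade.deliver(dtp))  # b'<encrypted dataset>'
```

Calling `request` on a trade that has not been listed raises `ValueError`; this is the property a contract-controlled process relies on to stop either party from skipping steps.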

Figure 1 Data transaction process diagram

2.1.1 Data rights confirmation

We use natural language processing (Natural Language Processing, NLP) technology [9-10] to compute text similarity, which prevents users from making trivial modifications to existing data and re-uploading it to the system. This work handles text data only. When users submit data, they must provide dataset keywords, which help the platform and other users retrieve the dataset. There are many models for detecting similar texts; for example, short text classification based on weighted similarity of category subject words [11] recognizes similar texts by weighting subject words. However, the text data targeted here are large text datasets, for which such keyword-weighting models are unsuitable. When users upload data to the DTP, they must confirm the dataset keywords; the platform searches with the keywords submitted by the uploader and, if similar data exists, compares the new data against it. Data rights confirmation takes six steps: word segmentation, hash calculation, weighting, merging, dimensionality reduction, and Hamming distance calculation, as shown in Figure 2.
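
The six steps listed above correspond to the SimHash fingerprinting scheme. The Python sketch below is one possible rendering under assumptions not stated in the source: a regex word split replaces a real word segmenter (Chinese text would need a dedicated segmenter, since it has no spaces between words), term frequency is used as the weight, each word is hashed to 64 bits by taking the first 8 bytes of its MD5 digest, and the near-duplicate threshold of 3 bits is a common choice for 64-bit fingerprints rather than the paper's value. The function names are hypothetical.

```python
import hashlib
import re
from collections import Counter

FINGERPRINT_BITS = 64  # assumption: 64-bit fingerprints


def tokenize(text: str) -> list[str]:
    """Step 1, word segmentation: naive lowercase word split (stand-in for a segmenter)."""
    return re.findall(r"\w+", text.lower())


def token_hash(token: str) -> int:
    """Step 2, hashing: map a token to a 64-bit integer (first 8 bytes of MD5)."""
    digest = hashlib.md5(token.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def simhash(text: str) -> int:
    weights = Counter(tokenize(text))  # Step 3, weighting: term frequency (assumption)
    totals = [0] * FINGERPRINT_BITS
    for token, weight in weights.items():
        h = token_hash(token)
        for bit in range(FINGERPRINT_BITS):
            # Step 4, merging: add the weight where the bit is 1, subtract where it is 0
            totals[bit] += weight if (h >> bit) & 1 else -weight
    # Step 5, dimensionality reduction: keep only the sign of each column as one bit
    fingerprint = 0
    for bit, total in enumerate(totals):
        if total > 0:
            fingerprint |= 1 << bit
    return fingerprint


def hamming_distance(a: int, b: int) -> int:
    """Step 6: count the bits in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(a: str, b: str, threshold: int = 3) -> bool:
    # threshold=3 is a common choice for 64-bit SimHash, not a value from the paper
    return hamming_distance(simhash(a), simhash(b)) <= threshold


original = "The platform stores encrypted datasets on IPFS and key information on Ethereum."
edited = "The platform stores encrypted data sets on IPFS and key info on Ethereum."
print(hamming_distance(simhash(original), simhash(edited)))
```

Identical texts, or texts with the same words in a different order, get the same fingerprint (distance 0). A lightly edited copy usually lands within a few bits, while unrelated texts typically differ in about half of the 64 bits, which is what makes a small Hamming-distance threshold usable for rejecting re-uploads.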

Origin blog.csdn.net/qq_38998213/article/details/132016814