[Paper Notes] "FLchain: Federated Learning via MEC-enabled Blockchain Network" Intensive Reading Notes

Information of the paper:

DOI: 10.23919/APNOMS.2019.8892848

 

Table of Contents

1. Abstract

2. Preliminaries and Definitions

2.1 Channel

2.2 Global Model State Trie

3. System Model

4. The Operation of FLchain

4.1 FLchain process

4.2 Transaction Pool

4.3 Global Model Update

4.4 Consensus Protocol

4.5 Analysis

5. Evaluation

6. Thinking


 

 

1. Abstract

1) Proposes "FLchain", a federated learning (FL) scheme built on a blockchain network.

2) Introduces the "Global Model State Trie".

 

2. Preliminaries and Definitions

2.1 Channel

Peers are separated into small "regions", and the peers in one region belong to the same channel. Only peers in the same channel have the right to read, submit, and verify transactions in that channel. Each channel has its own ledger and its own consensus. In FLchain, a new channel with a genesis block (the first block of the channel's ledger) is created for each global model. The genesis block stores the initial weights of the global model, the dimension of the weights, the hyperparameters, the activation function, and the bias.
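To make the contents of the genesis block concrete, here is a minimal sketch of how a channel and its genesis block could be represented. The class name `GenesisBlock`, the function `create_channel`, and all field names are my own assumptions for illustration; the paper does not specify a data layout.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class GenesisBlock:
    """Hypothetical genesis block of one FLchain channel."""
    channel_id: int
    initial_weights: List[float]        # initial global model weights
    weight_dim: int                     # dimension of the weight vector
    hyperparameters: Dict[str, float]   # e.g. learning rate
    activation: str                     # name of the activation function
    bias: float


def create_channel(channel_id, initial_weights, hyperparameters, activation, bias):
    """Create a new channel: a genesis block plus a ledger that starts with it."""
    genesis = GenesisBlock(
        channel_id=channel_id,
        initial_weights=list(initial_weights),
        weight_dim=len(initial_weights),
        hyperparameters=dict(hyperparameters),
        activation=activation,
        bias=bias,
    )
    ledger = [genesis]  # every channel keeps its own separate ledger
    return ledger


ledger = create_channel(1, [0.0, 0.0, 0.0], {"learning_rate": 0.1}, "linear", 0.0)
```

Each global model would get its own call to `create_channel`, so ledgers of different models never mix.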

2.2 Global Model State Trie

Similar to the "Account State Trie", which tracks the status of accounts, the "Global Model State Trie" proposed by the authors tracks the weights of the global model in FLchain. Each channel has its own trie in the form of a Merkle Patricia tree. The Global Model State Trie stores weights as key-value pairs, where the key is the weight position (the index of the weight) and the value is the weight coefficient. After consensus is reached, the trie provides the updated weight coefficients for the global learning model.
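The key-value idea can be sketched as follows. Note that this is NOT a real Merkle Patricia tree: as a simplifying assumption I use a plain binary Merkle tree over the (index, weight) pairs, which is enough to show how a single root hash commits to all weights. The function name `state_root` is hypothetical.

```python
import hashlib
from typing import Dict


def _h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def state_root(weights: Dict[int, float]) -> str:
    """Merkle root over (index, weight) pairs, sorted by index.

    Simplified stand-in for the root of the global model state trie.
    """
    # Each leaf commits to both the key (weight position) and the value.
    level = [_h(f"{k}:{v!r}".encode()) for k, v in sorted(weights.items())]
    if not level:
        return _h(b"").hex()
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])  # duplicate the last node on odd levels
        level = [_h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
    return level[0].hex()


w = {0: 0.5, 1: -1.2, 2: 3.0}
root = state_root(w)
```

Changing any single weight changes the root, which is what lets peers detect a tampered global model by comparing roots only.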

 

3. System Model

The author combines multi-access edge computing (MEC) with a blockchain network to propose a system model suitable for FL. FLchain consists of mobile devices and edge devices (see Fig. 1). Each mobile device uses its on-device data samples to compute a local model update. Edge devices have two functions: ① provide network resources to resource-constrained mobile devices; ② serve as nodes in FLchain's blockchain network.

 

Each global model is trained on a separate channel.

In FLchain, the blockchain network is composed of edge devices. For each channel, it stores the local model updates from the devices on a separate blockchain in the form of blocks. The blockchain network also computes the global model update of each channel and stores it securely on the ledger.

Fig. 2 shows the simplified structure of the blockchain for a specific FL channel.

FLchain's underlying blockchain platform should be custom developed, combining characteristics of Hyperledger Fabric (a distributed ledger technology) and Ethereum.

 

4. The Operation of FLchain

4.1 FLchain process

Algorithm 1 in the paper describes the operation of FLchain. The bracketed labels below, such as [L3~L7], refer to line numbers of that algorithm.

[L2] Initialization.

t represents the number of global iterations; w_{j} represents the global model weight of channel j; w_{i,j} represents the local model weight of device i in channel j; D_{j} represents the number of devices on channel j.

The experimental part of this article is basically the same as that of "Blockchained On-Device Federated Learning": both solve a linear regression problem with essentially the same loss function, so I won't repeat it here.

[L3~L7] Allocate channels for each device.

When a device wants to join a channel, it executes the "channel inquiry" operation, and then the blockchain network sends a channel list to the device.

The device selects a channel j and records the choice as c_{i} = j. It then registers with the selected channel and receives a public/private key pair, which it uses to upload its local model weights to the channel.

[L8~L20] Model update within a channel, which consists of the following steps:

[L9~L17] Operations performed by all devices in the channel.

[L10~L11] Download the global model and synchronize it locally. (The latest global model parameters are downloaded from the blockchain through the edge node in the channel. Question: How is the edge node selected? What if the edge node crashes?)

[L12~L14] The local device performs V rounds of local computation (using formula (6)).

The method in this article also uses SVRG (stochastic variance reduced gradient).

[L15~L16] Upload the local model weights to the edge node of the channel, where a block is generated and forwarded to the blockchain network. The device then waits for the channel's response.

[L19] Use formula (7) to calculate the global weights.

[L20] Update the global model state trie, generate the block, and run consensus.
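For the local computation step [L12~L14], these notes do not reproduce formula (6), so the following is only a generic SVRG sketch for linear regression with the squared loss 0.5 * (x·w - y)^2, using a single snapshot taken at the downloaded global weights. The function names, the step size, and the single-snapshot choice are my assumptions, not the paper's exact update rule.

```python
import random


def grad_sample(w, x, y):
    """Gradient of 0.5 * (x.w - y)^2 with respect to w for one sample."""
    err = sum(wi * xi for wi, xi in zip(w, x)) - y
    return [err * xi for xi in x]


def full_grad(w, X, Y):
    """Average gradient over all local samples."""
    n = len(X)
    g = [0.0] * len(w)
    for x, y in zip(X, Y):
        for k, gk in enumerate(grad_sample(w, x, y)):
            g[k] += gk / n
    return g


def svrg_local_update(w_global, X, Y, V, step, rng):
    """Run V SVRG steps starting from the downloaded global weights."""
    w_snap = list(w_global)        # snapshot point
    mu = full_grad(w_snap, X, Y)   # full gradient at the snapshot
    w = list(w_global)
    for _ in range(V):
        n = rng.randrange(len(X))  # pick one local sample at random
        g_cur = grad_sample(w, X[n], Y[n])
        g_snap = grad_sample(w_snap, X[n], Y[n])
        # Variance-reduced gradient: g_cur - g_snap + mu
        w = [wk - step * (gc - gs + m)
             for wk, gc, gs, m in zip(w, g_cur, g_snap, mu)]
    return w


X = [[1.0], [2.0], [3.0]]
Y = [2.0, 4.0, 6.0]   # generated by y = 2x
w_local = svrg_local_update([0.0], X, Y, V=5, step=0.05, rng=random.Random(0))
```

The returned `w_local` is what the device would upload to the edge node in [L15~L16].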

 

The following is a timing diagram.

 

4.2 Transaction Pool

This article introduces the concept of a "Transaction Pool". Transactions submitted by devices accumulate in the transaction pool (in practice a mempool) until the waiting time T_{wait,j} of the channel is reached; T_{wait,j} is the time during which transactions are accumulated in the mempool in each global model iteration. Once T_{wait,j} has elapsed, the edge node forwards the accumulated transactions to the blockchain. Each node of the blockchain network has its own channel-specific mempool. Some transactions arrive "late" because of network delay or other reasons. These late transactions are discarded and are not used in the next global model update.
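The waiting-time rule can be sketched as a small channel-specific mempool. The class names `Transaction` and `ChannelMempool`, and the convention that arrival time is measured in seconds from the start of the iteration, are my assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Transaction:
    device_id: int
    weights: List[float]
    arrival_time: float   # seconds since the start of the current iteration


@dataclass
class ChannelMempool:
    t_wait: float                                          # T_{wait,j} of this channel
    pending: List[Transaction] = field(default_factory=list)
    discarded: List[Transaction] = field(default_factory=list)

    def submit(self, tx: Transaction) -> bool:
        """Accept the transaction if it arrives within T_wait, else drop it."""
        if tx.arrival_time <= self.t_wait:
            self.pending.append(tx)
            return True
        self.discarded.append(tx)   # "late" transactions are not used
        return False

    def flush(self) -> List[Transaction]:
        """Hand the accumulated transactions over for block generation."""
        batch, self.pending = self.pending, []
        return batch


pool = ChannelMempool(t_wait=5.0)
pool.submit(Transaction(device_id=1, weights=[0.1], arrival_time=2.0))   # on time
pool.submit(Transaction(device_id=2, weights=[0.2], arrival_time=7.5))   # late
batch = pool.flush()
```

Here only the update of device 1 ends up in `batch`; the update of device 2 is dropped for this iteration.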

4.3 Global Model Update

When T_{wait,j} is reached, the edge node in channel j generates a block that stores the transactions of the t-th iteration held in the channel's mempool. The global model state trie saves the global model parameters w_{j}(t), and the root of the trie is added to the block header. The global model weights are updated with DANE (Distributed Approximate Newton method).
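A rough sketch of how such a block might look is given below. The layout of the header, the names `Block`, `make_block`, and `digest`, and the use of a flat SHA-256 hash in place of a real trie root are all my assumptions; the paper only states that the trie root goes into the block header.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Dict, List


def digest(obj) -> str:
    """SHA-256 of a JSON-serialisable object with a stable key order."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Block:
    iteration: int             # global iteration t
    prev_hash: str             # hash of the previous block header
    state_root: str            # stand-in for the trie root of w_j(t)
    transactions: List[Dict]   # local model updates taken from the mempool

    def header_hash(self) -> str:
        return digest({
            "iteration": self.iteration,
            "prev_hash": self.prev_hash,
            "state_root": self.state_root,
            "tx_root": digest(self.transactions),
        })


def make_block(t, prev_hash, global_weights, transactions):
    # Keys are weight positions, values are weight coefficients, as in the trie.
    state = {str(i): w for i, w in enumerate(global_weights)}
    return Block(t, prev_hash, digest(state), list(transactions))


b1 = make_block(1, "0" * 64, [0.5, 1.0], [{"device": 1, "weights": [0.4, 0.9]}])
b2 = make_block(2, b1.header_hash(), [0.6, 1.1], [])
```

Because `b2.prev_hash` is the header hash of `b1`, and that header includes the state root, the global weights of every iteration stay traceable along the chain.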

4.4 Consensus Protocol

Some devices may not be able to report their local models to the winning miner within the specified time, so a protocol is needed to handle these devices.

After the miner broadcasts the latest block, the peers in the blockchain must verify the transactions in the block and check the updated global model state trie. Each peer recomputes the global model state trie locally and compares its root with the root in the broadcast block. If the block is found to be valid, the blockchain network reaches consensus on it, and the winning miner's block is appended to the channel-specific distributed ledger. If the block is invalid, it is rejected. Since every blockchain node has to calculate, verify, and confirm the global model state trie, blockchain-based FL is more reliable and robust than traditional FL. The basic consensus protocol can be replaced by PBFT (Practical Byzantine Fault Tolerance) or PoW (Proof of Work).
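The verification step can be sketched like this. Two things here are placeholders of mine and not the paper's method: `aggregate` uses plain averaging instead of formula (7), and `network_accepts` requires every peer to agree, which is far simpler than PBFT or PoW. The root is again a flat hash standing in for the trie root.

```python
import hashlib
from typing import List


def root_of(weights: List[float]) -> str:
    """Stand-in for the root of the global model state trie."""
    payload = ",".join(f"{i}:{w!r}" for i, w in enumerate(weights))
    return hashlib.sha256(payload.encode()).hexdigest()


def aggregate(local_updates: List[List[float]]) -> List[float]:
    """Plain averaging, used only as a placeholder for formula (7)."""
    n = len(local_updates)
    return [sum(col) / n for col in zip(*local_updates)]


def validate_block(claimed_root: str, local_updates: List[List[float]]) -> bool:
    """A peer recomputes the global weights and compares the roots."""
    return root_of(aggregate(local_updates)) == claimed_root


def network_accepts(claimed_root: str, updates_seen_by_peers) -> bool:
    """Accept the block only if every peer validates it (simplified consensus)."""
    return all(validate_block(claimed_root, u) for u in updates_seen_by_peers)


updates = [[1.0, 2.0], [3.0, 4.0]]
honest_root = root_of(aggregate(updates))
forged_root = root_of([9.0, 9.0])
```

With these values, `network_accepts(honest_root, [updates, updates])` holds, while a block carrying `forged_root` fails validation at every peer.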

4.5 Analysis

After each iteration of the global model, a judgment will be made to check whether FL has reached the desired result.

The condition for ending the iteration is \left\| w_{j}(T) - w_{j}(T-1) \right\|_{2} \leq \varepsilon_{threshold,j}, where \varepsilon_{threshold,j} is a predefined constant greater than zero.
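As a small worked example, the stopping test is just an L2 distance between two consecutive global weight vectors. The function name `has_converged` is hypothetical.

```python
import math
from typing import Sequence


def has_converged(w_curr: Sequence[float], w_prev: Sequence[float],
                  threshold: float) -> bool:
    """Check ||w_j(T) - w_j(T-1)||_2 <= epsilon_{threshold,j}."""
    if threshold <= 0:
        raise ValueError("threshold must be greater than zero")
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(w_curr, w_prev)))
    return dist <= threshold


# The distance between (3, 4) and (0, 0) is 5, so a threshold of 4.9 is not met.
done = has_converged([3.0, 4.0], [0.0, 0.0], threshold=4.9)
```

In a training loop this check would run once after each global iteration of a channel.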

 

5. Evaluation

The main advantages of this article:

  • FLchain provides a separate channel for learning each global model, with its own consensus and its own ledger for storing the local model updates that belong to the channel.
  • The global model state trie is also maintained per channel. It stores the global model weights securely in a Merkle Patricia tree and keeps them traceable.
  • In FLchain, the global model update is computed, verified, and stored by a blockchain network instead of a central server, which makes it more robust than traditional FL.

 

6. Thinking

6.1 What is the essential function of channels?

Current understanding: Devices are partitioned by channel. When a device uploads its local model, the model is temporarily stored in the mempool of the edge node in the channel, and when T_{wait,j} is reached, the edge node forwards the contents of the mempool to the blockchain. Likewise, when a device downloads the global model, it downloads it from the blockchain through the edge node of its channel.

 

6.2 Is the transaction pool only a temporary store for the local models uploaded by devices? Are all uploads from the devices of a channel held there until they are processed together and finally written to the blockchain as one batch?

So far it looks like this.

 

6.3 All updates that arrive after T_{wait,j} are lost. Isn't this a flaw? (In the previous article, what happens if the timeout is exceeded? Is the upload simply assumed to succeed?)

The author did not explain the situation. The previous article did not consider the issue of being late.

This article does offer a treatment of the "late" problem: late updates are simply thrown away so that they do not affect the update of the global model.

Origin blog.csdn.net/Aibiabcheng/article/details/108957134