Restoring the Substrate chain: unexpected epoch changes

Problem Description

On a custom blockchain built with Substrate, the chain stopped working after all the nodes had been stopped for a while.

Substrate: polkadot-v0.9.25
Consensus Protocol: BABE + GRANDPA
System: macOS Big Sur (11.3)
cargo: cargo 1.63.0-nightly (a4c1cd0eb 2022-05-18)

The BABE protocol requires at least one block to be produced in every epoch (session). When either of the following conditions occurs and blocks cannot be produced normally, the chain becomes bricked and stops working.

  • Fewer GRANDPA nodes are online than Byzantine fault tolerance requires, so blocks cannot be finalized, and as a result no block is produced within one epoch
  • All validator nodes are offline
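The first condition can be made concrete with a small sketch. It assumes equally weighted voters and the usual BFT rule that finality needs more than two thirds of them online; the function names are hypothetical and are not part of Substrate.

```rust
/// Smallest number of online voters that still allows finality,
/// assuming equal weights: n - floor((n - 1) / 3).
fn finality_threshold(total: u32) -> u32 {
    total - total.saturating_sub(1) / 3
}

/// True if enough voters are online to finalize blocks.
fn can_finalize(online: u32, total: u32) -> bool {
    total > 0 && online >= finality_threshold(total)
}

fn main() {
    // Hypothetical validator sets: (online, total).
    for (online, total) in [(3, 4), (2, 4), (7, 10), (6, 10)] {
        println!(
            "{online}/{total} online -> can finalize: {}",
            can_finalize(online, total)
        );
    }
}
```

With 4 validators, for example, losing 2 of them is already enough to stop finality under this rule.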

Solution

When this problem occurs, it can be solved in two ways:

Hard Spoon

The principle of this method is to obtain the state data at the last normal block height through the RPC interface (or the state database), generate a new genesis block from it, and run the chain from that new genesis block. The result is essentially a completely new chain that only inherits the state data.

If you don't care about the history of the block and only need to keep the current state data, you can use this method.

Refer to fork-off-substrate

I haven't had any success with this approach.
The dependent polkadot-js version is too low. After the upgrade, a new genesis block file can be generated and run with the new genesis block, but an error is still reported.

Time Warp

BABE stalls because no block was produced in the epoch following the last block. So if all validator nodes set their system time back to the time of the last block, the chain can in theory resume work. Under normal circumstances chain time and external time agree; the idea here is to artificially return the chain to a historical time (the time of the last block) and then let it catch up with the outside world by running time at an accelerated rate.
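How long the catch-up takes follows from simple arithmetic: if the chain clock starts some gap behind the wall clock and runs `factor` times faster, it gains `factor - 1` seconds per real second. The sketch below illustrates this; the function name and the 10-hour gap are hypothetical values chosen for the example.

```rust
use std::time::Duration;

/// Wall-clock time needed for the warped chain clock to catch up,
/// given how far behind it starts (`gap`) and the speed-up `factor`.
/// Returns None for factor <= 1, where the gap never closes.
fn catch_up_duration(gap: Duration, factor: u32) -> Option<Duration> {
    if factor <= 1 {
        return None;
    }
    // The chain clock gains (factor - 1) seconds per wall-clock second.
    Some(gap / (factor - 1))
}

fn main() {
    // Hypothetical: the chain clock starts 10 hours behind real time.
    let gap = Duration::from_secs(10 * 3600);
    for factor in [1, 2, 3, 6] {
        println!("factor {factor}: {:?}", catch_up_duration(gap, factor));
    }
}
```

At double speed the catch-up therefore takes as long as the gap itself; a larger factor shortens it at the cost of a much shorter effective block time during the warp.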

Modify the code

My chain depends on the Substrate crates through git dependencies.

The two crates to be modified therefore have to be replaced with local copies.

Modify the root Cargo.toml as in the excerpt below, setting each path to the directory of the crate you modified:

[patch."https://github.com/paritytech/substrate"]
sc-consensus-slots = { path = "/path/to/custom/sc-consensus-slots" }
sp-timestamp = { path = "/path/to/custom/sp-timestamp" }

For the modifications to the two crates, refer to commit d0decd9.

Here we modify it to run at double speed.

sc-consensus-slots
sc-consensus-slots/src/slots.rs

/// Returns the duration until the next slot from now.
pub fn time_until_next_slot(slot_duration: Duration) -> Duration {
	// HOTFIX: poll the slot 2 times as often since we might be in a time warp.
	let slot_duration = slot_duration / 2;
	...
}
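The rest of the function is elided above. As a rough model of why halving `slot_duration` makes the node poll twice as often, here is a standalone sketch. It is my assumption about how the elided part behaves, not a copy of the Substrate source, and it takes the current time as a parameter (the hypothetical `time_until_next_slot_at`) so that the result is deterministic.

```rust
use std::time::Duration;

/// Time remaining until the next slot boundary. `now` is the time since
/// the Unix epoch; `slot_duration` must be non-zero.
fn time_until_next_slot_at(now: Duration, slot_duration: Duration) -> Duration {
    let now_ms = now.as_millis();
    let slot_ms = slot_duration.as_millis();
    // Index of the next slot, then the distance to its start.
    let next_slot = (now_ms + slot_ms) / slot_ms;
    Duration::from_millis((next_slot * slot_ms - now_ms) as u64)
}

fn main() {
    let now = Duration::from_millis(1_000);
    let slot = Duration::from_secs(6); // hypothetical 6 s slot duration
    println!("normal: {:?}", time_until_next_slot_at(now, slot));
    println!("hotfix: {:?}", time_until_next_slot_at(now, slot / 2));
}
```

With the halved duration the boundaries fall every 3 s instead of every 6 s, which matches slots advancing at twice the wall-clock rate during the warp.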

sp-timestamp
sp-timestamp/src/lib.rs

WARP_FACTOR: the time-warp multiplier, i.e. the factor by which the block time is compressed; choose a value that is reasonable for your chain.
FORK_TIMESTAMP: the fork moment, which is the base time of the warp; here it is the time of the block that should have followed the last finalized block.
REVIVE_TIMESTAMP: the planned time at which the validator nodes start running again. It does not need to be very precise; it only has to be earlier than the time at which the validators receive the modified binary and run it.

impl InherentDataProvider {
	/// Create `Self` while using the system time to get the timestamp.
	pub fn from_system_time() -> Self {
		let timestamp = current_timestamp().as_millis() as u64;

		// HOTFIX: mutate timestamp to make it revert back in time and have slots
		// happen at 2x their speed from then until we have caught up with the present time.
		// Replace both timestamps (milliseconds since the Unix epoch) with your chain's values.
		const REVIVE_TIMESTAMP: u64 = 1642105111666; // 2022-01-13T20:18:31.666Z
		const FORK_TIMESTAMP: u64 =   1642006674001; // 2022-01-12T16:57:54.001Z
		const WARP_FACTOR: u64 = 2;

		let time_since_revival = timestamp.saturating_sub(REVIVE_TIMESTAMP);
		let warped_timestamp = FORK_TIMESTAMP + WARP_FACTOR * time_since_revival;

		// we want to ensure our timestamp is such that slots run monotonically with blocks
		// at 1/2 of the slot_duration from this slot onwards until we catch up to the
		// wall-clock time.
		let timestamp = timestamp.min(warped_timestamp);

		Self {
			max_drift: std::time::Duration::from_secs(60).into(),
			timestamp: timestamp.into(),
		}
	}
	...
}
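To check the arithmetic without building a node, the same calculation can be pulled out into a pure function. This is only a sketch: it reuses the constants from the listing, while `warped_timestamp` as a free function and `catch_up_instant` are hypothetical helpers introduced for illustration.

```rust
const REVIVE_TIMESTAMP: u64 = 1642105111666;
const FORK_TIMESTAMP: u64 = 1642006674001;
const WARP_FACTOR: u64 = 2;

/// Same arithmetic as the hotfix, with the wall-clock time (ms) as a parameter.
fn warped_timestamp(now_ms: u64) -> u64 {
    let time_since_revival = now_ms.saturating_sub(REVIVE_TIMESTAMP);
    let warped = FORK_TIMESTAMP + WARP_FACTOR * time_since_revival;
    now_ms.min(warped)
}

/// Wall-clock instant (ms) at which the warped clock meets real time,
/// obtained by solving FORK + W * (t - REVIVE) = t for t.
fn catch_up_instant() -> u64 {
    (WARP_FACTOR * REVIVE_TIMESTAMP - FORK_TIMESTAMP) / (WARP_FACTOR - 1)
}

fn main() {
    let t = catch_up_instant();
    for now in [REVIVE_TIMESTAMP, REVIVE_TIMESTAMP + 1_000, t, t + 1_000] {
        println!("wall {now} -> chain {}", warped_timestamp(now));
    }
}
```

At the revival instant the chain sees the fork time, each real second then advances the chain clock by two seconds, and from the catch-up instant onwards `min` returns the real time again, so the patched binary behaves like the original one.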

Perform recovery

  1. Back up the local data
    for example: cp -r node1 node1.bak
  2. Roll back the unfinalized blocks
    for example: ./node revert --chain ./specRaw.json --base-path ./node1
  3. Start the chain with the time-warped binary, pointing it at the original data
  4. Watch the logs or the management interface to check that blocks are being produced normally
  5. Wait for the chain to catch up with the external system time
  6. Switch back to the original binary and keep running

Reference links

  1. https://github.com/paritytech/substrate/issues/4464
  2. https://github.com/paritytech/substrate/issues/11673
  3. https://substrate.stackexchange.com/questions/168/how-to-unbrick-a-substrate-chain-revert/175#175
  4. https://medium.com/polkadot-network/kusamas-first-adventure-2cd4f439a7a4
  5. https://github.com/maxsam4/fork-off-substrate

Origin blog.csdn.net/DAOSHUXINDAN/article/details/126541054