The Science of the Blockchain: Notes (Part 1)

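I. Distributed Systems

1. What Is a Distributed System

A distributed system is a software system built on top of a network. Today's computing and information systems are inherently distributed: a mobile phone, for example, shares data with the cloud and itself contains multiple processors and storage units.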

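2. Why Distribute

(1) Geography: large companies operate in many locations, and each location has many computers handling computation and other tasks.
(2) Parallelism: multi-core processors and computing clusters are used to speed up computation.
(3) Reliability: data is replicated on different machines, which protects against data loss.
(4) Availability: replicated data can be accessed quickly.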

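3. Advantages and Disadvantages of Distribution

Distributed systems have many advantages: more storage, more computing power, and the ability to connect spatially separated locations. They also have a drawback: consistency problems, which arise frequently. A single machine may fail only once every few years, but in a distributed system with millions of nodes, a failure may occur every minute.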
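The "one failure per minute" estimate can be checked with a short calculation. The figures below (two million nodes, one failure per node every three years) are illustrative assumptions, not numbers from the text.

```python
# Illustrative figures only; neither number comes from the text.
NODES = 2_000_000            # assumed number of nodes in the system
YEARS_BETWEEN_FAILURES = 3   # assumed mean time between failures of one node

MINUTES_PER_YEAR = 365 * 24 * 60


def expected_failures_per_minute(nodes: int, years_between_failures: float) -> float:
    """Expected number of node failures per minute across the whole system."""
    per_node_rate = 1 / (years_between_failures * MINUTES_PER_YEAR)
    return nodes * per_node_rate


rate = expected_failures_per_minute(NODES, YEARS_BETWEEN_FAILURES)
print(f"{rate:.2f} failures per minute")  # 1.27 failures per minute
```

Under these assumptions the system sees roughly one failure per minute even though each individual node is quite reliable.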

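II. Fault Tolerance & Paxos

We cannot physically change the fact that distributed systems fail frequently, but we can aim for a system that tolerates some failures and keeps working. So how do we build a distributed system with fault tolerance?

1. A Naive Client-Server Algorithm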

Algorithm 1. Naive Client-Server Algorithm
1: Client sends commands one at a time to server

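A distributed system consists of many nodes. Each node can perform local computation and can send messages to other nodes (message passing).

Algorithm 1 implements message passing between a client and a server, but it has two problems:
(1) Message corruption: a message is received, but its content is corrupted. Adding extra information to the message, such as a checksum, solves this problem.
(2) Message loss: a message fails to reach the receiver. In this case Algorithm 1 does not work correctly, so it needs to be improved.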
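The checksum idea from item (1) can be sketched as follows. This is a minimal illustration that assumes CRC-32 from Python's standard `zlib` module as the checksum; the helper names `attach_checksum` and `verify` are hypothetical, and the text does not prescribe a particular checksum.

```python
import zlib
from typing import Optional


def attach_checksum(payload: bytes) -> bytes:
    """Prefix the payload with its 4-byte CRC-32 checksum."""
    checksum = zlib.crc32(payload).to_bytes(4, "big")
    return checksum + payload


def verify(message: bytes) -> Optional[bytes]:
    """Return the payload if the checksum matches, otherwise None."""
    checksum, payload = message[:4], message[4:]
    if zlib.crc32(payload).to_bytes(4, "big") != checksum:
        return None
    return payload


message = attach_checksum(b"x = x + 1")
corrupted = message[:-1] + b"2"   # the last byte was changed in transit

print(verify(message))     # b'x = x + 1'
print(verify(corrupted))   # None
```

A receiver that gets `None` can simply drop the message, which turns corruption into message loss, the second problem in the list.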

2. Client-Server Algorithm with Acknowledgments

Algorithm 2. Client-Server Algorithm with Acknowledgments
1: Client sends commands one at a time to server
2: Server acknowledges every command
3: If the client does not receive an acknowledgment within a reasonable time, the client resends the command

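In Algorithm 2, the client sends the next command only after it has received the acknowledgment for the previous one (Line 1). Acknowledgments can be lost as well, so the client resends the command when no acknowledgment arrives in time (Line 3). This algorithm is the basis of many reliable protocols, such as TCP.
The algorithm extends easily to multiple servers: the client sends each command to all servers, and the command is considered successfully executed once the client has received an acknowledgment from every server.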
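The acknowledge-and-resend loop of Algorithm 2 can be sketched as a small simulation. Everything here is an assumption made for illustration: the class `LossyServer`, the function `send_reliably`, the loss probability, and the use of sequence numbers so that a resent command is not stored twice. A missing acknowledgment stands in for the timeout.

```python
import random


class LossyServer:
    """Server behind a channel that may drop commands or acknowledgments."""

    def __init__(self, loss_probability: float, seed: int = 0):
        self.rng = random.Random(seed)
        self.loss_probability = loss_probability
        self.received = {}  # sequence number -> command; duplicates are ignored

    def _lost(self) -> bool:
        return self.rng.random() < self.loss_probability

    def deliver(self, seq: int, command: str) -> bool:
        """Return True if the client gets an acknowledgment back."""
        if self._lost():               # the command is lost on the way
            return False
        self.received.setdefault(seq, command)
        return not self._lost()        # the acknowledgment may be lost too


def send_reliably(server, commands, max_attempts=50):
    """Send commands one at a time, resending until each is acknowledged."""
    attempts = 0
    for seq, command in enumerate(commands):
        for _ in range(max_attempts):
            attempts += 1
            if server.deliver(seq, command):
                break                  # acknowledged: move on to the next command
        else:
            raise TimeoutError(f"command {seq} was never acknowledged")
    return attempts


server = LossyServer(loss_probability=0.3)
commands = ["x = x + 1", "x = 2 * x", "x = x - 3"]
attempts = send_reliably(server, commands)
print([server.received[i] for i in range(len(commands))])
```

Because a lost acknowledgment makes the client resend a command the server already has, the server in this sketch keys commands by sequence number and ignores duplicates.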
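But what about multiple clients? Here variable message delay becomes a problem: even between the same two nodes, messages can take different amounts of time to arrive, so servers may execute commands in different orders and end up in inconsistent states. For example, suppose Algorithm 2 runs in a system with two clients u1 and u2 and two servers s1 and s2. Both clients issue commands that update a variable x on the servers, and initially x = 0. Client u1 sends the command x = x + 1 and client u2 sends x = 2 * x. Because of variable message delay, s1 may receive u1's message first while s2 receives u2's message first. Then s1 computes x = (0 + 1) * 2 = 2 and s2 computes x = 2 * 0 + 1 = 1, so the two servers are in inconsistent states.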
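The example can be replayed directly. The sketch below models each command as a function and each server as a replica that applies the commands in the order in which they happen to arrive; the helper names are hypothetical.

```python
def add_one(x):
    return x + 1      # the command x = x + 1


def double(x):
    return 2 * x      # the command x = 2 * x


def apply_in_order(commands, x=0):
    """Apply the commands to x in the given arrival order."""
    for command in commands:
        x = command(x)
    return x


s1 = apply_in_order([add_one, double])   # (0 + 1) * 2
s2 = apply_in_order([double, add_one])   # 2 * 0 + 1
print(s1, s2)  # 2 1
```

Both servers received the same two commands, yet they disagree on x because the commands do not commute.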

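3. State Replication

To solve the inconsistency caused by variable message delay, we introduce state replication: a set of nodes achieves state replication if all nodes execute the commands c1, c2, c3, … in the same order. Two algorithms can be used to implement state replication:
(1) State replication with a serializer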

Algorithm 3. State Replication with a Serializer
1: Clients send commands one at a time to the serializer
2: Serializer forwards commands one at a time to all other servers
3: Once the serializer received all acknowledgments, it notifies the client about the success

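State replication is trivial for a single server, so we can designate one server as the serializer. Distributing all commands through it orders them automatically and thereby achieves state replication. This is what Algorithm 3 does.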
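A minimal in-process sketch of Algorithm 3 follows. It assumes reliable, synchronous calls in place of real messages, and the class names `Server` and `Serializer` are hypothetical.

```python
def add_one(x):
    return x + 1


def double(x):
    return 2 * x


class Server:
    """Replica that executes commands on a single variable x."""

    def __init__(self):
        self.x = 0
        self.log = []   # names of executed commands, in order

    def execute(self, command) -> bool:
        self.log.append(command.__name__)
        self.x = command(self.x)
        return True     # acknowledgment


class Serializer(Server):
    """Server that fixes one command order and forwards it to the others."""

    def __init__(self, others):
        super().__init__()
        self.others = others

    def submit(self, command) -> bool:
        self.execute(command)
        acks = [server.execute(command) for server in self.others]
        return all(acks)   # success is reported once every server has acknowledged


replicas = [Server(), Server()]
serializer = Serializer(replicas)

# Two clients submit commands; the serializer decides on one order for everyone.
serializer.submit(add_one)   # from client u1
serializer.submit(double)    # from client u2

print([s.x for s in [serializer, *replicas]])   # [2, 2, 2]
```

Every replica sees the commands in the order chosen by the serializer, so all copies of x agree.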
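The serializer only forwards commands, but what happens if it fails? Because the whole system depends on it, a failure of the serializer paralyzes the entire system! Is there a more distributed way to achieve state replication?
(2) Two-phase protocol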

Algorithm 4. Two-Phase Protocol
     Phase 1
1: Client asks all servers for the lock
     Phase 2
2: if client receives lock from every server then
3:     Client sends command reliably to each server, and gives the lock back
4: else
5:     Client gives the received locks back
6:     Client waits, and then starts with Phase 1 again
7: end if

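Algorithm 4 uses mutual-exclusion locks to achieve state replication, but it does not handle node failures. In fact, it is worse than the serializer-based Algorithm 3: Algorithm 3 only requires one node, the serializer, to respond, whereas Algorithm 4 requires every server to respond!
What if Phase 1 is changed so that the client only tries to acquire a majority of the locks? New problems arise: if two or more clients try to acquire a majority of the locks at the same time, one or more of them must give up all the locks they already hold to avoid a deadlock. If such a client crashes before releasing its locks, the system ends up deadlocked anyway. Do we need a different concept to solve these problems?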

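4. A Naive Ticket Protocol

Definition of a ticket: a ticket is a weaker form of a lock, with the following properties:
(1) Reissuable: a server can issue a ticket even if previously issued tickets have not been returned.
(2) Ticket expiration: a server accepts a ticket t only if t is the most recently issued ticket.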

Algorithm 5. Naive Ticket Protocol

     Phase 1
1: Client asks all servers for a ticket
     Phase 2
2: if a majority of the servers replied then
3:     Client sends command together with ticket to each server
4:     Server stores command only if ticket is still valid, and replies to client
5: else
6:    Client waits, and then starts with Phase 1 again
7: end if
     Phase 3
8: if client hears a positive answer from a majority of the servers then
9:     Client tells servers to execute the stored command
10: else
11:    Client waits, and then starts with Phase 1 again
12: end if

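This algorithm has the following problem. Suppose there are two clients u1 and u2 and three servers s1, s2, and s3. ① u1 successfully stores command c1 on s1, s2, and s3. ② Before u1 reaches Phase 3, u2 obtains newer tickets and successfully stores command c2 on s2 and s3. ③ Both u1 and u2 tell the servers to execute. Now some servers execute c1 and others execute c2, leaving the system in an inconsistent state.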
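The failure scenario can be replayed step by step. The sketch below is a simplified in-process model of Algorithm 5: the class `TicketServer` and its methods are hypothetical names, tickets are plain counters, and u2 is assumed to hear back only from s2 and s3, which is already a majority.

```python
class TicketServer:
    """Server for the naive ticket protocol (simplified, no networking)."""

    def __init__(self):
        self.latest_ticket = 0
        self.stored = None
        self.executed = None

    def issue_ticket(self) -> int:
        self.latest_ticket += 1          # a new ticket invalidates older ones
        return self.latest_ticket

    def store(self, ticket: int, command: str) -> bool:
        if ticket != self.latest_ticket:
            return False                 # the ticket has expired
        self.stored = command
        return True

    def execute(self) -> None:
        if self.executed is None:        # execute the stored command once
            self.executed = self.stored


servers = [TicketServer() for _ in range(3)]
s1, s2, s3 = servers

# Step 1: u1 gets tickets from all servers and stores c1 on all of them.
tickets_u1 = [s.issue_ticket() for s in servers]
stored_u1 = [s.store(t, "c1") for s, t in zip(servers, tickets_u1)]

# Step 2: before u1 reaches Phase 3, u2 gets newer tickets from s2 and s3
# and overwrites the stored command there with c2.
quorum = [s2, s3]
tickets_u2 = [s.issue_ticket() for s in quorum]
stored_u2 = [s.store(t, "c2") for s, t in zip(quorum, tickets_u2)]

# Step 3: both clients heard a positive answer from a majority,
# so both tell the servers to execute the stored command.
for s in servers:
    s.execute()

print([s.executed for s in servers])   # ['c1', 'c2', 'c2']
```

Each client saw a majority of positive answers, yet s1 executes c1 while s2 and s3 execute c2.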

5. Paxos

Algorithm 6. Paxos

      Client (Proposer)                                                 Server (Acceptor)
      Initialization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
      c          ~ command to execute                              Tmax = 0     ~ largest issued ticket
      t = 0     ~ ticket number to try
                                                                                    C = ⊥          ~ stored command
                                                                                    Tstore = 0     ~ ticket used to store C
      Phase 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  1: t = t + 1
  2: Ask all servers for ticket t
                                                                                    3: if t > Tmax then
                                                                                    4:     Tmax = t
                                                                                    5:     Answer with ok(Tstore, C)
                                                                                    6: end if
      Phase 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
  7: if a majority answers ok then
  8:     Pick (Tstore, C) with largest Tstore
  9:     if Tstore > 0 then
10:         c = C
11:     end if
12:     Send propose(t, c) to same majority
13: end if
                                                                                    14: if t = Tmax then
                                                                                    15:     C = c
                                                                                    16:     Tstore = t
                                                                                    17:     Answer success
                                                                                    18: end if
      Phase 3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
19: if a majority answers success then
20:     Send execute(c) to every server
21: end if

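Paxos improves on the ticket protocol: in Phase 1, a server no longer just hands out a ticket but also tells the client which command it currently has stored. If u2 learns that u1 has already successfully stored c1, it no longer tries to store c2 and instead supports u1 by proposing c1 itself. All clients then propose the same command, so the order in which servers receive these messages can no longer cause inconsistency.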
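The way u2 adopts c1 can be seen in a small single-process sketch of Algorithm 6. It is a simplification under several assumptions: messages are synchronous method calls, the caller passes the ticket number t instead of the proposer incrementing it, and the `crash_before_phase3` flag is a hypothetical device for modeling a client that stalls after Phase 2.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Acceptor:
    t_max: int = 0                   # largest issued ticket (Tmax)
    command: Optional[str] = None    # stored command (C)
    t_store: int = 0                 # ticket used to store C (Tstore)
    executed: Optional[str] = None

    def request_ticket(self, t: int):
        """Phase 1: issue ticket t if it is newer than any issued so far."""
        if t > self.t_max:
            self.t_max = t
            return (self.t_store, self.command)   # ok(Tstore, C)
        return None

    def propose(self, t: int, c: str) -> bool:
        """Phase 2: store c only if ticket t is still the newest."""
        if t == self.t_max:
            self.command, self.t_store = c, t
            return True
        return False

    def execute(self, c: str) -> None:
        """Phase 3: execute the chosen command."""
        self.executed = c


def run_proposer(servers, c, t, crash_before_phase3=False):
    """Run one attempt with ticket t; return the executed command or None."""
    majority = len(servers) // 2 + 1

    # Phase 1: ask all servers for ticket t.
    oks = [(s, answer) for s in servers
           if (answer := s.request_ticket(t)) is not None]
    if len(oks) < majority:
        return None

    # Phase 2: adopt the stored command with the largest Tstore, if any.
    t_store, stored = max((answer for _, answer in oks), key=lambda a: a[0])
    if t_store > 0:
        c = stored
    successes = [s for s, _ in oks if s.propose(t, c)]
    if len(successes) < majority or crash_before_phase3:
        return None

    # Phase 3: tell every server to execute c.
    for s in servers:
        s.execute(c)
    return c


servers = [Acceptor() for _ in range(3)]
run_proposer(servers, "c1", t=1, crash_before_phase3=True)   # u1 stalls after storing c1
result = run_proposer(servers, "c2", t=2)                     # u2 wanted c2

print(result, [s.executed for s in servers])   # c1 ['c1', 'c1', 'c1']
```

Although u2 set out to propose c2, Phase 1 tells it that c1 is already stored, so it proposes and executes c1 instead.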
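Paxos does require that fewer than half of the servers crash. If half (or more) of the servers crash, Paxos cannot make progress, because clients can no longer reach a majority.
In the end we have a reasonably good algorithm, Paxos, which achieves state replication and reaches a consistent state even when a minority of the nodes crash.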