PDB95 notes

 

Parallel Database Architecture

https://csruiliu.github.io/blog/20170323-parallel-db-arch/

Shared memory

Shared memory means that two different processes, A and B, share memory: the same physical memory is mapped into the address spaces of both A and B. As a result, B can see changes that A makes to the shared memory right away, and vice versa.
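Here is a minimal sketch of the idea. It assumes a Unix system because it relies on `os.fork()`. An anonymous `mmap` region is shared between a parent process (A) and a child process (B), so B's write is visible to A without any data being copied or sent between them. The function name `fork_and_share` is hypothetical.

```python
import mmap
import os
import struct

def fork_and_share(value: int) -> int:
    """Child process B writes `value` into shared memory; parent A reads it.

    Unix-only sketch (uses os.fork()). An anonymous mmap defaults to
    MAP_SHARED on Unix, so after fork both processes map the same pages.
    """
    buf = mmap.mmap(-1, 8)  # 8 bytes of anonymous shared memory
    pid = os.fork()
    if pid == 0:
        # Process B: write straight into the shared page; nothing is sent to A.
        struct.pack_into("q", buf, 0, value)
        os._exit(0)
    os.waitpid(pid, 0)  # Process A: wait until B has written
    (seen,) = struct.unpack_from("q", buf, 0)  # A reads B's change directly
    buf.close()
    return seen

print(fork_and_share(42))  # 42
```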

Advantages: sharing data between processes is very convenient, and the programming interface is relatively simple.

Data is not transferred between processes. Each process accesses the memory directly, which improves efficiency.

Drawbacks: scalability is poor, because all nodes compete for the same memory.

Fault tolerance is poor: if the memory crashes or an error occurs in it, the whole system is affected.

Shared Disk

Multiple nodes share the disks, but each node has its own memory.

Fault tolerance improves: because each node has its own memory, the parallel database system as a whole can keep running even if a few nodes go down.

Scalability also improves, but the bottleneck moves from memory to disk, because multiple nodes now compete for the shared disks.

Shared Nothing

Machines share nothing and are connected only through a network.

"Shared-nothing" architecture means that each computer system has its own dedicated memory and its own dedicated disk.

Each CPU has its own disk and memory.

It is much cheaper to build than the first two architectures.

Communication between nodes is expensive, and coordinating many nodes adds a lot of overhead.

The system has to do its own scheduling and decide for itself how to make full use of each node's resources.

(A better approach is to partition the data onto each node's disk, let every node work on its own partition, and then merge the results. This ensures that every node can operate.)
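A minimal sketch of partition-then-merge follows. It assumes each "node" is simulated by a plain Python list standing in for its local disk. The rows are split into disjoint partitions, each node sorts only its own partition, and the sorted runs are merged at the end with `heapq.merge`. The helper names `partition` and `parallel_sort` are hypothetical, and the local sorts run one after another here rather than truly in parallel.

```python
import heapq
from typing import List

def partition(rows: List[int], n_nodes: int) -> List[List[int]]:
    """Split rows into n_nodes disjoint, contiguous chunks (one per node's disk)."""
    size = -(-len(rows) // n_nodes)  # ceiling division
    return [rows[i * size:(i + 1) * size] for i in range(n_nodes)]

def parallel_sort(rows: List[int], n_nodes: int) -> List[int]:
    parts = partition(rows, n_nodes)
    # Each node sorts only its own partition and shares no state with the
    # others, so in a real system these sorts could run concurrently.
    local_runs = [sorted(p) for p in parts]
    # Coordinator step: merge the sorted runs into one ordered result.
    return list(heapq.merge(*local_runs))

print(parallel_sort([9, 3, 7, 1, 8, 2, 6, 4, 5], 3))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]
```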

 

Speedup: the data size stays the same, and adding more computers completes the task in less time.

Scaleup: the data size grows and the number of computers grows with it, so a larger task is completed in the same time.

Transaction scaleup: as the number of users (transactions) grows, hardware is added so that response time stays constant.

Example: one server serves more and more clients.
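Both metrics can be written as ratios of elapsed times, following the usual textbook definitions. For speedup, the same task runs on the original system and on an N-times larger one. For scaleup, an N-times larger task runs on an N-times larger system. Transaction scaleup is the same ratio measured on response time while users and hardware grow together. The timings below are made up for illustration.

```python
def speedup(t_small: float, t_large: float) -> float:
    """Same task: t_small on the original system, t_large on an N-times
    larger one. Linear speedup means the result equals N."""
    return t_small / t_large

def scaleup(t_small: float, t_large: float) -> float:
    """t_small: original task on the original system.
    t_large: N-times larger task on an N-times larger system.
    Linear scaleup means the result stays at 1.0."""
    return t_small / t_large

# Hypothetical timings in seconds for a system grown 4x
print(speedup(100.0, 25.0))   # 4.0 -> linear speedup
print(scaleup(100.0, 125.0))  # 0.8 -> sublinear scaleup
```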

3. Near-linear speedup and scaleup

Parallelism can improve system performance almost linearly.

Two people work nearly twice as fast as one person.

After the two people finish their subtasks, the combined result is checked against the total that one person would get by doing the whole task without parallelism.

4.

 

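In the second figure, adding hardware still does not improve performance. This indicates either a problem with the algorithm or coordination overhead that is too high.

In the third figure, the curve bends at the end. Each task has been split into pieces that are too small, so coordination costs become too high and performance drops somewhat.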
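To see why the curve can bend, consider a toy cost model. It is an assumption and is not taken from the figures. The work is split evenly over n nodes, and the coordination overhead grows linearly with n, so elapsed time is work/n + coord·n. Speedup rises at first, peaks, and then falls once the pieces become too small to pay for their coordination. The names and constants are hypothetical.

```python
def elapsed(n: int, work: float = 100.0, coord: float = 1.0) -> float:
    """Hypothetical cost model: work split over n nodes plus coordination
    overhead that grows linearly with the number of nodes."""
    return work / n + coord * n

def speedup_curve(max_nodes: int) -> list:
    base = elapsed(1)
    return [round(base / elapsed(n), 2) for n in range(1, max_nodes + 1)]

curve = speedup_curve(20)
best = curve.index(max(curve)) + 1
print(best)  # 10 -> adding nodes beyond this makes the job slower
```

With these constants the peak sits at n = sqrt(work / coord) = 10.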

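Time sharing: each machine serves many users at the same time. Users with pending requests wait in a queue, and each time slice goes to the next user with a request.

Time slices rotate round-robin, so one full pass through the queue takes (time slice length) × (queue length).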
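A minimal simulation of round-robin time slicing follows. It ignores context-switch cost, and the job names and lengths are hypothetical. Each job runs for at most one quantum and then goes to the back of the queue if it still has work left. With three jobs and a quantum of 2, the first full pass through the queue ends at time 6, which is 2 × 3.

```python
from collections import deque
from typing import Dict, List, Tuple

def round_robin(jobs: Dict[str, int], quantum: int) -> List[Tuple[str, int]]:
    """Simulate time sharing: each job in the queue gets at most `quantum`
    time units, then is requeued at the back if it still has work.
    Returns (job, finish_time) pairs in completion order."""
    queue = deque(jobs.items())
    clock = 0
    finished: List[Tuple[str, int]] = []
    while queue:
        name, remaining = queue.popleft()
        run = min(quantum, remaining)
        clock += run
        if remaining > run:
            queue.append((name, remaining - run))  # not done: back of the queue
        else:
            finished.append((name, clock))
    return finished

print(round_robin({"A": 3, "B": 5, "C": 2}, quantum=2))
# [('C', 6), ('A', 7), ('B', 10)]
```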

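Switching from one user to another causes a context switch.

Context switch: the timer raises an interrupt and the CPU enters kernel mode.

The state of the process currently running on the CPU is saved.

The next process is loaded onto the CPU, and rip (the x86-64 instruction pointer) is set to the next instruction the new process will execute.

Finally, the CPU switches from kernel mode back to user mode.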
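The steps above can be mirrored in a toy model. This is only an illustration: real kernels do this in assembly and save many more registers. The classes `Process` and `CPU`, the function `context_switch`, and the register values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class Process:
    pid: int
    saved_regs: Dict[str, int] = field(default_factory=dict)  # plays the role of a PCB

@dataclass
class CPU:
    regs: Dict[str, int] = field(default_factory=dict)
    mode: str = "user"

def context_switch(cpu: CPU, current: Process, nxt: Process) -> None:
    """Toy model of the four steps: interrupt, save, load, return."""
    cpu.mode = "kernel"                  # 1. timer interrupt -> kernel mode
    current.saved_regs = dict(cpu.regs)  # 2. save the running process's CPU state
    cpu.regs = dict(nxt.saved_regs)      # 3. load the next process's state,
                                         #    including rip (its next instruction)
    cpu.mode = "user"                    # 4. back to user mode

cpu = CPU(regs={"rip": 0x400100, "rax": 7})
a = Process(pid=1)
b = Process(pid=2, saved_regs={"rip": 0x400800, "rax": 0})
context_switch(cpu, a, b)
print(hex(cpu.regs["rip"]), a.saved_regs["rax"], cpu.mode)  # 0x400800 7 user
```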

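When the time slice becomes close to the context-switch time, the machine spends most of its time context switching.

The extra overhead cancels out the efficiency gained from parallelism, and the performance of the parallel system drops.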
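A simple way to quantify this is to assume exactly one context switch per time slice. Under that assumption, the fraction of CPU time spent on real work is slice / (slice + switch cost). The numbers are in hypothetical time units. Once the slice shrinks to the size of the switch cost, half the CPU time is lost.

```python
def useful_fraction(time_slice: float, switch_cost: float) -> float:
    """Fraction of CPU time spent on real work when every time slice
    is followed by one context switch."""
    return time_slice / (time_slice + switch_cost)

for q in (100.0, 10.0, 1.0):
    print(q, round(useful_fraction(q, switch_cost=1.0), 3))
# 100.0 0.99
# 10.0 0.909
# 1.0 0.5
```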

 

Scan: read one record at a time, increment a count, and finally aggregate the counts.
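A minimal sketch of a parallel scan-and-count follows. Threads from `concurrent.futures` stand in for nodes here. In CPython the GIL means this gives no real CPU parallelism; the sketch only shows the structure. Each worker scans its own partition record by record, and the partial counts are summed at the end. The record layout and the function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def count_partition(rows: List[dict], pred: Callable[[dict], bool]) -> int:
    """Scan one partition, one record at a time, counting matches."""
    count = 0
    for row in rows:
        if pred(row):
            count += 1
    return count

def parallel_count(partitions: List[List[dict]], pred: Callable[[dict], bool]) -> int:
    # Each worker scans only its own partition.
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partial_counts = list(pool.map(lambda p: count_partition(p, pred), partitions))
    return sum(partial_counts)  # final aggregation step

parts = [[{"dept": "CS"}, {"dept": "EE"}], [{"dept": "CS"}], [{"dept": "CS"}, {"dept": "ME"}]]
print(parallel_count(parts, lambda r: r["dept"] == "CS"))  # 3
```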

 

 

Range partitioning maps data to partitions based on the ranges of partition-key values you establish for each partition.
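A minimal sketch of range partitioning with the standard `bisect` module follows. The partition boundaries are hypothetical. Each bound is the exclusive upper limit of one partition, and keys at or above the last bound go to a final overflow partition. Consecutive keys land in the same partition, which makes range lookups cheap.

```python
import bisect
from typing import List

def range_partition(key: int, upper_bounds: List[int]) -> int:
    """Return the partition index for `key`.
    upper_bounds must be sorted; upper_bounds[i] is the exclusive upper
    bound of partition i, and keys >= the last bound go to the last partition."""
    return bisect.bisect_right(upper_bounds, key)

bounds = [1000, 2000, 3000]  # hypothetical: 4 partitions
print([range_partition(k, bounds) for k in (5, 1000, 2500, 9999)])  # [0, 1, 2, 3]
print({range_partition(k, bounds) for k in range(1001, 1006)})      # {1}
```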

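Hash partitioning maps data to partitions by applying a hash algorithm to the partition key you identify. The hash algorithm distributes rows evenly across partitions, so the partitions are roughly the same size. Hash partitioning is an ideal way to distribute data evenly across devices.

Advantage: space is used evenly. Taking the key modulo a large prime reduces collisions (collisions make the mapping uneven).

Disadvantage: it does not suit consecutive values. Consecutive student IDs are scattered to different places, which makes them cumbersome to look up.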
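A minimal sketch using the simplest possible hash, h(k) = k mod m, follows. Real systems use stronger hash functions; this one is chosen only to make both effects visible. Small moduli (8 and 7) stand in for "a large prime". When the keys share a factor with a non-prime modulus, they pile into a few buckets, while a prime modulus spreads them over all buckets. Consecutive IDs, meanwhile, end up in every partition. The key sets are hypothetical.

```python
from collections import Counter
from typing import Iterable

def bucket_sizes(keys: Iterable[int], modulus: int) -> Counter:
    """Hash-partition keys with h(k) = k mod modulus and count keys per bucket."""
    return Counter(k % modulus for k in keys)

keys = range(0, 400, 4)              # 100 keys that all happen to be multiples of 4
print(len(bucket_sizes(keys, 8)))    # 2: only buckets 0 and 4 are ever used
print(len(bucket_sizes(keys, 7)))    # 7: with a prime modulus every bucket is used

ids = range(1001, 1011)              # 10 consecutive (hypothetical) student IDs
print(sorted({k % 7 for k in ids}))  # [0, 1, 2, 3, 4, 5, 6]: a range lookup hits every partition
```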

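Round-robin partitioning: rows are handed to the partitions in turn, so row k goes to partition k mod n.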

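Assume a set of servers S = {S0, S1, …, Sn-1}, an indicator variable i that records the server selected last time, and W(Si), the weight of server Si. The variable i is initialized to n-1, where n > 0. Each call returns the next server after Si whose weight is positive, or NULL if there is none:

j = i;
do {
    j = (j + 1) mod n;
    if (W(Sj) > 0) {
        i = j;
        return Si;
    }
} while (j != i);
return NULL;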
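The following is a runnable Python version of the loop above. It assumes, as the pseudocode implies, that a server with weight 0 is skipped. The class name `RoundRobin` and its method `select` are hypothetical. The second half reuses the same rotation for round-robin partitioning by handing each incoming row to the next partition in turn.

```python
from typing import List, Optional

class RoundRobin:
    """Round-robin selection; weights[j] == 0 means server j is skipped."""

    def __init__(self, weights: List[int]) -> None:
        if not weights:
            raise ValueError("need at least one server (n > 0)")
        self.weights = weights
        self.i = len(weights) - 1  # i starts at n-1, so the first pick is S0

    def select(self) -> Optional[int]:
        n = len(self.weights)
        j = self.i
        while True:
            j = (j + 1) % n
            if self.weights[j] > 0:
                self.i = j
                return j
            if j == self.i:  # went all the way round: no usable server
                return None

rr = RoundRobin([1, 0, 1, 1])           # S1 has weight 0 and is skipped
print([rr.select() for _ in range(6)])  # [0, 2, 3, 0, 2, 3]

# Round-robin partitioning: each incoming row goes to the next partition in turn
parts: List[List[str]] = [[] for _ in range(3)]
chooser = RoundRobin([1, 1, 1])
for row in ["r0", "r1", "r2", "r3", "r4"]:
    parts[chooser.select()].append(row)
print(parts)  # [['r0', 'r3'], ['r1', 'r4'], ['r2']]
```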

Source: www.cnblogs.com/wwqdata/p/12099424.html