Big Data Basics: A Must-Read Article for Learning Big Data

Disclaimer: This is an original article by the blogger, licensed under the CC 4.0 BY-SA copyright agreement. When reproducing it, please include a link to the original source and this statement.
Link to this article: https://blog.csdn.net/Mr_Yang888/article/details/102749645

Big data project workflow

  1. Data production
  2. Data collection
  3. Data storage
  4. Requirements analysis
  5. Data preprocessing
  6. Data computation
  7. Result storage
  8. Result presentation

Big Data Basics

Table of Contents

  • What is a server?
  • What is RAID?
  • What is a cluster?
  • What is a network?
  • What are a switch and a local area network?
  • What are network topology and racks?
  • Network introduction (Ethernet and InfiniBand)
  • Why is high-speed rail fast?

What is a server?

Goal: understand what a server is.
A server is a high-performance computer that provides computing services.
A server's components (processor, hard drive, memory, system bus, and so on) are similar to those of a general-purpose computer.
Because a server must provide highly reliable service, it has demanding requirements for processing power, stability, reliability, security, scalability, and manageability.
A server performs the same functions as an ordinary computer, so it can be called a server computer, but it offers higher stability, security, and data-processing capacity. For example, a website we browse can be reached 24 hours a day. Why? Because the server behind the site cannot be shut down: it must run stably for long periods and handle many simultaneous visitors.

Server types

By application level:
servers fall into four categories: entry-level servers, workgroup servers, departmental servers, and enterprise servers.
By end use:
servers fall into two categories: general-purpose servers and dedicated servers.

By chassis structure:
Tower server
Blade server
Rack server (1U, 2U, 4U; 1U = 1.75 inches = 4.445 cm)
Cabinet server
Tower server
The tower server is the easiest server type to understand, because its shape and structure are much like the upright PCs we use every day. Because a server motherboard is more expandable and has more slots, it is larger than an ordinary motherboard, so the tower server's chassis is also larger than a standard case. It usually leaves enough internal space for adding hard drives and redundant power supplies later.
[Figure: internal structure of a tower server]
Blade server
A blade server houses multiple server units ("blades") in a chassis of standard height inside a rack cabinet, providing high availability and high density. Each blade is in effect a system motherboard that can boot its own operating system, such as Windows NT/2000 or Linux, from its onboard hard drive.
Rack server
A rack server does not look like a computer; it looks more like a switch. Rack servers come in sizes such as 1U (1U = 1.75 inches = 4.445 cm), 2U, and 4U, and are installed in a standard 19-inch rack. This is a multipurpose server form factor.
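The rack-unit arithmetic above can be checked with a few lines of code. This is a minimal sketch: the function names are hypothetical, and the 42U rack in the usage line is an assumed example value, not a figure from this article.

```python
# 1U is defined as 1.75 inches; 1 inch is exactly 2.54 cm.
INCHES_PER_U = 1.75
CM_PER_INCH = 2.54


def rack_units_to_cm(units: int) -> float:
    """Height in centimetres of equipment that occupies `units` rack units."""
    return units * INCHES_PER_U * CM_PER_INCH


def servers_per_rack(rack_units: int, server_units: int) -> int:
    """How many servers of height `server_units` fit into `rack_units` of mounting space."""
    return rack_units // server_units


if __name__ == "__main__":
    for u in (1, 2, 4):
        print(f"{u}U = {u * INCHES_PER_U} in = {rack_units_to_cm(u):.3f} cm")
    # Hypothetical example: a rack with 42U of usable mounting space holding 2U servers.
    print(servers_per_rack(42, 2))  # 21
```

Working in whole rack units keeps capacity planning simple: a rack's usable height divided by a server's height gives how many servers fit, before accounting for switches, power units, or cabling space.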
Cabinet server
Some high-end enterprise servers have a complex internal structure with many internal devices, and some place many different devices, or several servers, in one cabinet; this is a cabinet server. A cabinet server typically combines rack-mounted servers and blade servers with other equipment.
Storage disks
Goal: understand the types of hard drives and how they differ.
Hard drives are divided into mechanical hard disk drives (HDD), solid-state drives (SSD), and solid-state hybrid drives (SSHD).
Mechanical hard drive:
A traditional mechanical hard disk is the ordinary hard disk. It consists mainly of the platters, the read/write heads, the platter spindle motor, the head controller, the data converter, the interface, the cache (buffer), and other components.

Solid-state drive (SSD):
An SSD is built from an array of solid-state electronic storage chips and uses flash memory chips to store data. It consists of a controller unit and a storage unit (flash chips and a DRAM chip). SSDs follow the same interface specifications, definitions, functions, and usage as ordinary hard drives, and have the same shape and size.

Hybrid hard disk (SSHD):
A hybrid drive combines a mechanical hard disk with a small amount of flash memory.

SSD vs. mechanical hard drive:
1. Shock and drop resistance: a mechanical hard drive stores data in sectors on its platters. An SSD is made from flash memory chips, so it has no internal mechanical parts, which minimizes the chance of data loss from impact or vibration. Here the SSD has a clear advantage over the mechanical hard drive.
2. Data read/write speed: according to test data from the PConline evaluation lab, SSD performance is more than 2 times that of a mechanical hard drive.
3. Power consumption: SSDs consume less power than mechanical hard drives.
4. Weight: SSDs are lighter; compared with a conventional 1.8-inch drive, they weigh 20-30 grams less.
5. Price: as of 2018/11/20, a brand-name 128 GB SSD cost about 150 yuan, while a 1 TB mechanical hard drive cost around 280 yuan. Per unit of capacity, SSDs are much more expensive than mechanical hard drives.
6. Lifespan: mechanical hard drives have a long lifespan; SSDs have a shorter one.
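The price comparison in point 5 is easier to see as cost per gigabyte. This sketch uses only the two prices quoted above; it assumes 1 TB = 1000 GB (the decimal convention drive makers use on labels), and the helper name is hypothetical.

```python
# Prices quoted in the article (as of 2018/11/20), in yuan.
SSD_PRICE_YUAN, SSD_CAPACITY_GB = 150, 128
# Assumption: 1 TB counted as 1000 GB (decimal, as printed on drive labels).
HDD_PRICE_YUAN, HDD_CAPACITY_GB = 280, 1000


def price_per_gb(price_yuan: float, capacity_gb: float) -> float:
    """Cost of one gigabyte of capacity."""
    return price_yuan / capacity_gb


ssd = price_per_gb(SSD_PRICE_YUAN, SSD_CAPACITY_GB)
hdd = price_per_gb(HDD_PRICE_YUAN, HDD_CAPACITY_GB)

if __name__ == "__main__":
    # SSD: 1.172 yuan/GB, HDD: 0.280 yuan/GB, ratio: 4.2x
    print(f"SSD: {ssd:.3f} yuan/GB, HDD: {hdd:.3f} yuan/GB, ratio: {ssd / hdd:.1f}x")
```

At those prices, each gigabyte on the SSD cost roughly four times as much as on the mechanical drive, even though the SSD itself was the cheaper purchase.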

What is RAID?

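Goal: understand what RAID is, what its characteristics are, and what types exist.
RAID (Redundant Array of Independent Disks) is usually just called a disk array. Simply put, RAID is a disk subsystem made up of multiple independent high-performance disk drives, providing higher storage performance and data redundancy than a single disk. RAID is a family of multi-disk management techniques that gives the host environment moderately priced, highly reliable, high-performance storage.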

RAID characteristics
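(1) Large capacity
  RAID expands disk capacity: a RAID system made of multiple disks offers a huge amount of storage space. A single disk can now exceed 10 TB, so the storage capacity of a RAID can reach the PB level.
(2) High performance (distributed storage)
  RAID's high performance comes from data striping. A single disk's I/O performance is limited by its interface, bandwidth, and other technical constraints; it is often modest and easily becomes the system's performance bottleneck. Through striping, RAID spreads data I/O across all member disks, achieving aggregate I/O performance that is a multiple of a single disk's.
(3) Reliability (safer, prevents data loss)
  Availability and reliability are another important RAID characteristic. In theory, a RAID system made of multiple disks should be less reliable than a single disk. The implicit assumption is that the failure of a single disk makes the whole RAID unavailable. RAID breaks this assumption with data redundancy techniques such as mirroring and parity. Mirroring is the most basic redundancy technique: it copies all the data on one set of disk drives to another set, ensuring that a copy of the data is always available.
(4) Manageability
  RAID is a virtualization technology that presents multiple physical disk drives as one large logical drive. To an external host system, the RAID looks like a single, fast, reliable, large-capacity disk drive, on which users can organize and store application data. From the application's point of view, this makes the storage system simple to use and easy to manage.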
RAID levels
RAID0-RAID7, RAID00, RAID10, RAID01, RAID100, RAID30, RAID50, RAID60
The commonly used RAID levels are RAID0, RAID1, RAID10, RAID01, and RAID5.

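RAID0
RAID0 is a simple data-striping technique without parity. Strictly speaking it is not a true RAID, because it provides no redundancy of any kind. RAID0 stripes its member disks into one large storage space and distributes data across all the disks, so that multiple disks can be read in parallel through independent access. Because I/O operations run concurrently, bus bandwidth is fully used, and because no parity has to be computed, RAID0 has the highest performance of all RAID levels.
  RAID0 offers low cost, high read/write performance, and 100% storage utilization, but it provides no redundancy: once data is damaged, it cannot be recovered. RAID0 is therefore suited to applications with strict performance requirements but modest data-safety and reliability requirements, such as video and audio storage and temporary data caches.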
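Striping can be sketched in a few lines: the data is cut into fixed-size stripe units that are handed out to the disks in round-robin order. This is a minimal in-memory model under stated assumptions (disks are plain byte arrays, the stripe size is arbitrary, and the function names are hypothetical); real controllers work on fixed-size sectors and handle partial stripes differently.

```python
from typing import List


def raid0_write(data: bytes, num_disks: int, stripe_size: int) -> List[bytearray]:
    """Split `data` into stripe units and place them round-robin across the disks."""
    disks = [bytearray() for _ in range(num_disks)]
    for offset in range(0, len(data), stripe_size):
        unit_index = offset // stripe_size
        disks[unit_index % num_disks] += data[offset:offset + stripe_size]
    return disks


def raid0_read(disks: List[bytearray], stripe_size: int, length: int) -> bytes:
    """Reassemble `length` bytes by reading stripe units back in round-robin order."""
    out = bytearray()
    positions = [0] * len(disks)  # next unread byte on each disk
    unit_index = 0
    while len(out) < length:
        d = unit_index % len(disks)
        out += disks[d][positions[d]:positions[d] + stripe_size]
        positions[d] += stripe_size
        unit_index += 1
    return bytes(out)


if __name__ == "__main__":
    disks = raid0_write(b"ABCDEFGHIJKL", num_disks=3, stripe_size=2)
    print([bytes(d) for d in disks])  # [b'ABGH', b'CDIJ', b'EFKL']
    print(raid0_read(disks, stripe_size=2, length=12))  # b'ABCDEFGHIJKL'
```

Every byte is stored exactly once, which is why utilization is 100%. Consecutive stripe units land on different disks, so they can be read concurrently. Lose any one disk, though, and every third unit is gone with no way to rebuild it.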
  
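RAID1
RAID1, called mirroring, writes identical data to a working disk and a mirror disk, so its disk-space utilization is 50%. Writes take somewhat longer to complete, but reads are unaffected. RAID1 provides the best data protection: if the working disk fails, the system automatically reads from the mirror disk without disrupting users.
  RAID1 is the opposite of RAID0: to strengthen data safety, it keeps two disks as exact mirrors of each other, which gives good safety, simple technology, and easy management. RAID1 is fully fault tolerant, but expensive to implement. It is used for applications that need high sequential read/write performance and place great importance on data protection, such as protecting mail-system data.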
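The mirroring behaviour described above (write to both disks, read from whichever is healthy) fits in a small class. This is a hypothetical in-memory model for illustration only: the class name, the block-list representation, and the `failed` flags are assumptions, not how any particular controller is implemented.

```python
from typing import List, Optional


class Raid1:
    """Toy two-disk mirror: each disk is a list of blocks."""

    def __init__(self, num_blocks: int) -> None:
        self.disks: List[List[Optional[bytes]]] = [[None] * num_blocks, [None] * num_blocks]
        self.failed = [False, False]  # simulated disk failures

    def write(self, block: int, value: bytes) -> None:
        # Every write goes to both the working disk and the mirror disk.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                disk[block] = value

    def read(self, block: int) -> Optional[bytes]:
        # Read from the first healthy disk; the mirror takes over if the working disk fails.
        for i, disk in enumerate(self.disks):
            if not self.failed[i]:
                return disk[block]
        raise OSError("both disks have failed")

    def utilization(self) -> float:
        # Two disks hold one copy's worth of data.
        return 1 / len(self.disks)


if __name__ == "__main__":
    array = Raid1(num_blocks=4)
    array.write(0, b"mail")
    array.failed[0] = True  # working disk dies
    print(array.read(0))  # b'mail'
    print(array.utilization())  # 0.5
```

The extra write on every update is the cost mentioned above. Reads need only one disk, so they are not slowed down.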

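RAID5
   RAID5 is data striping with parity. It organizes data in blocks and distributes both the data and the parity information across all the disks in the array.
It is probably the most common RAID level today. Because data and parity are spread over every disk, writes of data and of parity can happen simultaneously on entirely different disks. RAID5 also scales well: as the number of disks in the array grows, its capacity for parallel operations grows with it.
  RAID5 balances storage performance, data safety, and storage cost. It can be understood as a compromise between RAID0 and RAID1, and it is currently the data-protection solution with the best overall performance. RAID5 meets the needs of most storage applications, and most data centers use it to protect application data.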
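RAID5's parity is a byte-wise XOR of the data blocks in a stripe, so any single lost block can be rebuilt by XOR-ing the survivors. The sketch below shows that idea together with a simple rotating parity position. It is illustrative only: the function names are hypothetical, and real controllers use their own specific block layouts rather than this exact rotation.

```python
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    result = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            result[i] ^= b
    return bytes(result)


def raid5_stripe(data_blocks: List[bytes], stripe_no: int) -> List[bytes]:
    """Lay out one stripe: n-1 data blocks plus one parity block on n disks.

    The parity position shifts by one disk per stripe, so parity ends up
    spread over all disks instead of sitting on a dedicated parity disk.
    """
    n = len(data_blocks) + 1
    parity_disk = (n - 1 - stripe_no) % n
    stripe = list(data_blocks)
    stripe.insert(parity_disk, xor_blocks(data_blocks))
    return stripe


def rebuild(stripe: List[Optional[bytes]]) -> bytes:
    """Recover the single missing block (None) by XOR-ing all surviving blocks."""
    return xor_blocks([b for b in stripe if b is not None])


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\x0f\x0f"]
    stripe: List[Optional[bytes]] = list(raid5_stripe(data, stripe_no=0))
    stripe[1] = None  # simulate losing disk 1
    print(rebuild(stripe))  # b'\x10 ' (the bytes 0x10 0x20)
```

With n disks, one disk's worth of space goes to parity, so usable capacity is (n-1)/n. That is why RAID5 sits between RAID0 (100%) and RAID1 (50%): it survives one disk failure without keeping full copies.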

What is a cluster?

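Goal: understand what a cluster, a network, a switch, and a local area network are;
learn about network topology, the types of networks and their pros and cons, and IDC data centers.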

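A cluster is a group of independent computers interconnected by a high-speed network. Together they form a group that is managed as a single system. When a client interacts with the cluster, the cluster appears to be a single server.
A computer cluster (cluster for short) is a computer system that links a set of loosely integrated computer software and/or hardware so that they cooperate closely to complete computing work. In a sense, they can be regarded as one computer. The individual computers in a cluster are usually called nodes and are usually connected by a local area network, though other connection methods are possible. Clusters are typically used to improve on the computing speed and/or reliability of a single computer. In general, a cluster offers much better price/performance than a single computer such as a workstation or a supercomputer.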

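What is a computer network?
A computer network is a computer system that connects multiple geographically separate computers with independent functions, together with their peripherals, through communication lines, and achieves resource sharing and information exchange under the management and coordination of a network operating system, network management software, and network communication protocols.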

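What is a switch?
A switch is a network device that forwards electrical (or optical) signals. It can provide a dedicated signal path between any two network nodes connected to it. The most common switch is the Ethernet switch; other common types include telephone voice switches and fiber-optic switches.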

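What is a local area network?
A local area network (LAN) is a group of computers interconnected within a certain area, generally within a radius of a few kilometers. A LAN can provide file management, application software sharing, printer sharing, workgroup scheduling, e-mail, and fax services. A LAN is closed: it can consist of two computers in an office or of thousands of computers in a company.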

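What is network topology?
Network topology is the physical layout of devices interconnected by transmission media, that is, the specific physical (real) or logical (virtual) arrangement of the members that make up a network.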

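Ethernet
Advantages:
It is the current de facto standard for LANs: easy to configure, plug and play, with rich software support.
Cheap and available everywhere.
Disadvantages:
Both its latency and its throughput are worse than those of some specialized networks.
Uses:
It is the most convenient way to build a LAN.
It is now widely used in the large-scale data-processing clusters of cloud computing.
Common bandwidths: 1 Gbps and 10 Gbps.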

InfiniBand network
Advantages:
Extremely low latency (under 400 nanoseconds) and high throughput (up to 40 Gbps).
Advanced architecture (offload engine, zero copy).
Disadvantages:
Expensive, with less software support.
Less widely adopted, and not compatible with traditional Ethernet.
Uses:
Used for high-performance computing.
Common bandwidths: 10 Gbps, 20 Gbps, and 40 Gbps.
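To put the bandwidth figures for Ethernet and InfiniBand in perspective, the ideal time to move a dataset is its size in bits divided by the link rate. This sketch ignores protocol overhead, latency, and disk speed. It assumes a decimal terabyte (10^12 bytes) as the example size, and the helper name is hypothetical.

```python
def transfer_seconds(size_bytes: float, bandwidth_gbps: float) -> float:
    """Ideal transfer time: bits to send divided by the link rate (no overhead)."""
    bits = size_bytes * 8
    return bits / (bandwidth_gbps * 1e9)


ONE_TB = 1e12  # assumption: decimal terabyte

if __name__ == "__main__":
    links = [("Ethernet", 1), ("Ethernet", 10), ("InfiniBand", 20), ("InfiniBand", 40)]
    for name, gbps in links:
        print(f"{name} {gbps} Gbps: {transfer_seconds(ONE_TB, gbps):.0f} s")
```

Under these ideal assumptions, 1 TB takes 8000 s (over two hours) at 1 Gbps, but only 800 s at 10 Gbps and 200 s at 40 Gbps. This is one reason data-processing clusters care so much about the interconnect.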

What is a rack?
A rack (full name: server rack) is a cabinet used to mount telecommunications patch panels and equipment housings. It is typically 19 inches wide and 7 feet tall. In the IT industry, it can simply be understood as a cabinet for storing servers.
A cabinet, generally made of cold-rolled steel or alloy, holds computers and related control equipment. It protects the stored equipment, shields it from electromagnetic interference, keeps devices orderly and tidy, and makes later maintenance easier. Cabinets are generally divided into server cabinets, network cabinets, and console cabinets.

IDC data center
An Internet Data Center (IDC) is a standardized, carrier-grade equipment-room environment that the telecommunications sector builds on its existing Internet communication lines and bandwidth resources. It provides businesses and government agencies with a full range of services, including server hosting, server rental, and related value-added services.


Why are the Harmony and Fuxing high-speed trains so fast? (The core reason big data is faster than traditional databases)

Goal: learn why old-style trains and high-speed trains differ in speed.
Because of distributed power.
On an old-style train, all the power is concentrated in the locomotive at the front.
On the Harmony and Fuxing trains, the power is distributed across multiple cars, not just the front one.
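The "distributed power" idea maps onto big data processing as split, compute in parallel, combine. Each worker ("car") handles its own slice of the data, and the partial results are merged at the end. The sketch below is hypothetical: the workload (a sum of squares) and the function names are made up. It uses threads only to show the structure. In CPython, threads do not speed up CPU-bound work like this because of the GIL; in a real cluster each chunk would run on a separate node.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split(data: Sequence[int], parts: int) -> List[Sequence[int]]:
    """Cut the data into roughly equal chunks, one per worker."""
    size = max(1, -(-len(data) // parts))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def partial_sum_of_squares(chunk: Sequence[int]) -> int:
    """The work each worker does on its own chunk."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data: Sequence[int], workers: int) -> int:
    """Split the data, process the chunks concurrently, then combine partial results."""
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum_of_squares, chunks))


if __name__ == "__main__":
    print(distributed_sum_of_squares(list(range(10)), workers=3))  # 285
```

The result matches a single-worker computation. Only how the work is divided changes, which is the same trade the high-speed train makes by spreading its motors along the whole train.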
