Do you really know what "big data" is? Nine key points in five minutes

Overview: In recent years big data has been a fashionable concept, mentioned constantly and closely watched.

In this article, let's take five minutes to understand what big data really is.

When many people first hear the term "big data", they naturally take it literally: big data means a large amount of data, and big data technology means technology for storing a large amount of data.

But that is not the case.

Big data is more complicated than that. It is not just a data storage technology but a whole family of techniques for extracting, integrating, managing, analyzing, and interpreting massive amounts of data. It is a huge framework.

Beyond that, big data is also a new way of thinking and a new business model.

 


01 The definition of big data

First of all, we need to take another look at the definition of big data.

The industry has many definitions of big data, some broad and some narrow.

The broad definition has a philosophical flavor: big data is the mapping and distillation of the physical world into the digital world. Discovering the patterns in that data makes decisions more efficient.

The narrow definition is the one a technical engineer would give: big data is a new technical architecture that mines value from large-capacity data by acquiring, storing, and analyzing it.

Personally, I still prefer the technical definition, ha ha.

Note the keywords in that definition:

  • What is done? Acquire data, store data, analyze data
  • To what? Large-capacity data
  • For what purpose? To mine its value

 

Acquiring, storing, and analyzing data is nothing new. We do it every day when we use a computer.

For example, at the start of each month the attendance administrator collects every employee's attendance records, enters them into an Excel spreadsheet, has the computer count who was late or absent, and then docks their pay accordingly.

However, the same approach stops working once the body of data gets very large. In other words, data at a scale that a traditional PC running conventional software cannot cope with is what we call "big data".
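To make the three actions concrete, here is a minimal Python sketch of the attendance example at small scale. The employee names, dates, and status values are made up for illustration, and a list held in memory stands in for the spreadsheet.

```python
import csv
import io
from collections import Counter

# Hypothetical attendance log; every name and status here is invented.
RAW_CSV = """employee,date,status
Li,2019-06-03,on_time
Li,2019-06-04,late
Wang,2019-06-03,absent
Wang,2019-06-04,late
Zhao,2019-06-03,on_time
"""


def acquire(text):
    """Acquire: parse the raw CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def analyze(rows):
    """Analyze: count the late or absent records for each employee."""
    counts = Counter()
    for row in rows:
        if row["status"] in ("late", "absent"):
            counts[row["employee"]] += 1
    return dict(counts)


# Store: the parsed rows are simply kept in memory in this sketch.
records = acquire(RAW_CSV)
print(analyze(records))
```

On a single PC this finishes instantly. The trouble described above starts when the records no longer fit on one machine.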

 


02 How big is big data?

The data we handle on a traditional personal computer is at the GB/TB level. Hard drives today, for example, usually come in 1 TB, 2 TB, or 4 TB capacities.

The relationships among TB, GB, MB, and KB should be familiar:

1 KB = 1024 B (KB - kilobyte)
1 MB = 1024 KB (MB - megabyte)
1 GB = 1024 MB (GB - gigabyte)
1 TB = 1024 GB (TB - terabyte)

So what level is big data? The PB/EB level.

 


Most people have never heard of these units. They simply keep multiplying by 1024:

1 PB = 1024 TB (PB - petabyte)
1 EB = 1024 PB (EB - exabyte)

The letters alone are not very intuitive, so let me give some examples.

1 TB fits on a single hard drive. That is roughly 200,000 photos, or 200,000 MP3 songs, or 671 copies of the novel "Dream of the Red Chamber".

 


▲ An ordinary hard drive

 

1 PB takes about two storage cabinets. That is about 200 million photos or 200 million MP3 songs. Listening to that much music nonstop would take one person about 1,900 years...

 


▲ Two cabinets
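As a rough sanity check on the 1 TB and 1 PB examples, the short Python sketch below works out the average file size implied by the songs-per-TB figure and the listening time for 1 PB of music. The five-minute song length is an assumption, not a number from the article.

```python
TB_IN_BYTES = 1024 ** 4
SONGS_PER_TB = 200_000          # figure quoted in the text
SONGS_PER_PB = 200_000_000      # figure quoted in the text
MINUTES_PER_SONG = 5            # assumption: average song length
MINUTES_PER_YEAR = 60 * 24 * 365


def implied_song_size_mb(songs_per_tb=SONGS_PER_TB):
    """Average file size in MB implied by the songs-per-TB figure."""
    return TB_IN_BYTES / songs_per_tb / 1024 ** 2


def listening_years(songs, minutes_per_song=MINUTES_PER_SONG):
    """Years of nonstop listening needed to play every song once."""
    return songs * minutes_per_song / MINUTES_PER_YEAR


print(f"{implied_song_size_mb():.2f} MB per song")
print(f"{listening_years(SONGS_PER_PB):.0f} years of nonstop listening")
```

Under that assumption the result lands close to the 1,900 years quoted for 1 PB.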

 

1 EB takes about 2,000 storage cabinets. Lined up side by side, they would stretch 1.2 kilometers. Placed in a data center, they would need floor space equal to 21 standard basketball courts.

 


▲ 21 basketball courts

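Internet giants such as Alibaba, Baidu, and Tencent are said to hold data volumes approaching the EB level.

Even EB is not the largest unit. The total data volume of all humanity is currently at the ZB level.

1 ZB = 1024 EB (ZB - zettabyte)

In 2011, the total amount of data created and replicated worldwide was 1.8 ZB.

By 2020, the data stored on electronic devices worldwide is expected to reach 35 ZB. A data center built to hold all of it would cover more area than 42 Bird's Nest stadiums.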

 


Data is not only large in volume but also growing fast: by 50% a year. In other words, it roughly doubles every two years.
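The doubling claim can be checked with compound-growth arithmetic. The sketch below assumes the 50% rate compounds once a year, which the article does not spell out.

```python
import math

ANNUAL_GROWTH = 0.50  # growth rate quoted in the text


def growth_factor(years, rate=ANNUAL_GROWTH):
    """How many times larger the data volume is after the given years."""
    return (1 + rate) ** years


def doubling_time(rate=ANNUAL_GROWTH):
    """Years needed for the volume to double at a constant annual rate."""
    return math.log(2) / math.log(1 + rate)


print(growth_factor(2))            # 2.25
print(round(doubling_time(), 2))   # 1.71
```

At 50% a year the volume grows 2.25 times in two years, so "doubles every two years" is a slightly conservative rounding.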

Today's big data applications have not yet reached the ZB level. They are concentrated mainly at the PB/EB level.

Where big data sits on the scale:

1 KB = 1024 B (KB - kilobyte)
1 MB = 1024 KB (MB - megabyte)
1 GB = 1024 MB (GB - gigabyte)
1 TB = 1024 GB (TB - terabyte)
1 PB = 1024 TB (PB - petabyte)
1 EB = 1024 PB (EB - exabyte)
1 ZB = 1024 EB (ZB - zettabyte)
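The ladder of units above is easy to turn into code. This is a small Python sketch that uses the same factor of 1024 per step. The function names are my own.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB", "EB", "ZB"]


def to_bytes(value, unit):
    """Convert a value in the given unit to bytes, 1024 per step."""
    return value * 1024 ** UNITS.index(unit)


def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value >= 1."""
    value = float(num_bytes)
    for unit in UNITS:
        # Stop at the first unit where the value drops below 1024,
        # or at ZB, the largest unit in the table.
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024


print(to_bytes(1, "PB"))               # 1125899906842624
print(humanize(to_bytes(1.8, "ZB")))   # 1.80 ZB
```

Seen this way, 1 PB is already more than a quadrillion bytes, which is why a single PC and ordinary software are not enough.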

 

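03 Where the data comes from

Why is data growing so fast?

To answer that, we need to look back at the major stages of data generation in human society.

Broadly speaking, there are three.

The first stage came after the invention of the computer. The invention of the database in particular greatly reduced the complexity of managing data. Every industry began to generate data, which was recorded in databases. Data in this stage was mainly structured data (I will explain what "structured data" means shortly), and it was generated passively.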

 


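▲ ENIAC, the world's first general-purpose computer

The second stage arrived with the Web 2.0 era. The defining feature of Web 2.0 is user-generated content. As the Internet and mobile devices spread, people began using blogs and social networks such as Facebook and YouTube, and so actively generated large amounts of data.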

 


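The third stage is the stage of sensing systems. With the growth of the Internet of Things, all kinds of sensing-layer nodes, such as the sensors and cameras spread across every corner of the world, began generating large amounts of data automatically.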

 


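These three stages, passive, then active, then automatic, ultimately led to the explosive growth of humanity's total data volume.

04 The 4 Vs of big data

The industry sums up the characteristics of big data as four Vs. The enormous volume discussed above is Volume. The other three are Variety, Velocity, and Value.

Let's go through them one by one.

1. Variety (diversity)

Data comes in many forms: numbers (prices, transaction data, body weight, head counts, and so on), text (emails, web pages, and so on), images, audio, video, and location information (latitude and longitude, altitude, and so on). All of it is data.

Data is also divided into structured data and unstructured data.

As the name suggests, structured data is data that can be expressed with a predefined data model, or that can be stored in a relational database.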

 


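▲ Structured data

For example, the ages of everyone in a class, or the prices of every product in a supermarket, are structured data.

Web articles, email bodies, images, audio, and video are all unstructured data.

On the Internet, unstructured data already accounts for more than 80% of all data.

Big data fits this pattern: the data takes many forms, and a high proportion of it is unstructured.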
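The difference between the two kinds of data shows up as soon as you try to query them. Below is a minimal Python sketch that uses the standard-library sqlite3 module for the structured case. The table, names, and email text are all invented for illustration.

```python
import sqlite3

# Structured: every row fits a schema that was defined in advance,
# so a relational database can answer questions about it directly.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?)",
    [("Li", 15), ("Wang", 16), ("Zhao", 15)],
)
average_age = conn.execute("SELECT AVG(age) FROM students").fetchone()[0]
conn.close()

# Unstructured: free text has no schema. Without further processing,
# about all we can do is treat it as a sequence of words.
email_body = "Hi team, the meeting moved to Friday. Zhao turns 16 next week!"
word_count = len(email_body.split())

print(f"average age: {average_age:.2f}")
print(f"words in email: {word_count}")
```

The age mentioned inside the email is invisible to the SQL query. Getting at it would take text analysis, which is part of why unstructured data is harder to handle.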

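2. Velocity (timeliness)

Another characteristic of big data is timeliness. The window between the moment data is generated and the moment it is consumed is very small. Both the rate at which data changes and the speed at which it is processed keep increasing. Data that used to change daily now changes by the second or even the millisecond.

Let the numbers speak. What happened in the data world in the minute that just passed?

  • Email: 204 million emails sent
  • Google: 2 million search requests submitted
  • YouTube: 2,880 minutes of video uploaded
  • Facebook: 695,000 status updates posted
  • Twitter: 98,000 tweets sent
  • 12306: 1,840 train tickets sold
  • ...

How about that? Everything changes in the blink of an eye, doesn't it?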
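To get a feel for what those per-minute figures mean for a processing system, this Python sketch converts them to events per second. The numbers are the ones listed above. The dictionary labels are my own.

```python
# Per-minute counts as quoted in the list above.
PER_MINUTE = {
    "emails sent": 204_000_000,
    "Google searches": 2_000_000,
    "Facebook status updates": 695_000,
    "tweets": 98_000,
    "12306 tickets sold": 1_840,
}


def per_second(events_per_minute):
    """Average number of events per second for a per-minute count."""
    return events_per_minute / 60


for name, count in PER_MINUTE.items():
    print(f"{name}: {per_second(count):,.0f} per second")
```

Email alone works out to an average of 3.4 million messages every second.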

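3. Value (value density)

The last characteristic is value density.

Big data is huge in volume, but the price of that volume is very low value density: only a small fraction of the data is truly valuable.

Take searching surveillance video for a suspect's face. Out of several TB of video files, perhaps only a few seconds are actually useful.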

 


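▲ After the 2013 Boston Marathon bombing in the United States, investigators pulled 10 TB of surveillance data (including call records from mobile base stations, security footage from nearby shops, gas stations, and newsstands, and images provided by volunteers) and ultimately found one photo of the suspect

05 The value of big data

Value density brings us to the core essence of big data, which is value.

The main reason people proposed and study big data is to mine the value inside it.

So what value does big data actually have?

As early as 1980, the noted futurist Alvin Toffler stated in his book "The Third Wave" that "data is wealth", and called big data "the cadenza of the third wave".

  • First wave: the agricultural stage, beginning about 10,000 years ago
  • Second wave: the industrial stage, beginning at the end of the 17th century
  • Third wave: the information stage, beginning in the late 1950s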

 

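In the 21st century, with the second and third stages described above, the rise of the mobile Internet, and leaps in storage and cloud computing capacity, big data began to be put into practice and drew more and more attention.

The World Economic Forum stated in 2012: "Data has become a new class of economic asset, just like currency and gold." That pushed the value of big data to an unprecedented height.

Today big data applications are entering our daily lives and affecting how we dress, eat, live, and travel. One example you have probably heard about recently is "big data price discrimination", where a platform uses its data to charge loyal customers more.

Big data has developed this quickly because more and more industries and companies have recognized its value and started trying to mine it.

In summary, the value of big data comes mainly from two things:

1. Helping companies understand their users

Through correlation analysis, big data links customers to products and services and pinpoints user preferences, so that a company can offer more precise, better-targeted products and services and raise its sales.

E-commerce is the classic example.

E-commerce platforms such as Alibaba's Taobao have accumulated huge amounts of user purchase data. In the early days this data was a burden, because storing it required a lot of expensive hardware. Now it is Alibaba's most valuable asset.

With this data, a platform can analyze user behavior and pinpoint the spending habits, brand preferences, and geographic distribution of target customer groups, and use that to guide merchants' operations, brand positioning, and marketing.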
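As a toy illustration of linking customers to products, the Python sketch below counts which products are bought together. This simple co-occurrence count is a stand-in for real correlation analysis, not the method any particular platform uses, and the orders are invented.

```python
from collections import Counter
from itertools import combinations

# Hypothetical orders; each set is the basket of one purchase.
ORDERS = [
    {"phone", "phone case", "charger"},
    {"phone", "phone case"},
    {"charger", "cable"},
    {"phone", "charger"},
]


def pair_counts(orders):
    """Count how often each pair of products appears in the same order."""
    counts = Counter()
    for basket in orders:
        # Sort so that each pair is always recorded in the same order.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts


counts = pair_counts(ORDERS)
for pair, n in counts.most_common():
    print(pair, n)
```

Pairs that show up together often, such as a phone and a phone case here, are natural candidates for a "customers also bought" recommendation.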

 


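Big data can have a direct impact on business results. Its efficiency and accuracy far exceed those of traditional user research.

Beyond e-commerce, energy, film and television, securities, finance, agriculture, manufacturing, transportation, and public utilities are all fields where big data can prove its worth.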

 


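▲ Big data can even help win a presidential election

2. Helping companies understand themselves

Besides helping a company understand its users, big data can also help it understand itself.

Running a business takes a lot of resources. Big data can analyze and pin down the state of those resources, such as how reserves are distributed and where demand is heading. Visualizing them helps managers see more directly how the business is running, spot problems sooner, adjust operating strategy in time, and reduce business risk.

In short, "know yourself and know your enemy, and you will win every battle." Big data exists to serve decision-making.

06 Big data and cloud computing

At this point we should answer a question many people have: what exactly is the relationship between big data and cloud computing?

One way to put it: data itself is an asset, and cloud computing provides the right tools for mining the value of that asset.

Technically, big data depends on cloud computing. The massive data storage, massive data management, and distributed computing models found in cloud computing are the foundation of big data technology.

Cloud computing is the excavator and big data is the mine. Without cloud computing, the value of big data cannot be realized.

Conversely, the processing demands of big data have spurred the development and adoption of cloud computing technologies.

In other words, without the mine that is big data, many of the powerful capabilities of the cloud computing excavator would never have been developed.

To borrow an old saying, cloud computing and big data complement each other.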

 


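07 Big data and the Internet of Things (5G)

The second question: what is the relationship between big data and the Internet of Things?

I think everyone can work this one out quickly, and it was already touched on earlier.

The Internet of Things is "an Internet in which things connect to things". Its sensing layer produces massive amounts of data, which will greatly advance big data.

Likewise, big data applications bring out the value of the Internet of Things and in turn stimulate demand for it. As more companies find that they can extract value from IoT big data, they become willing to invest in building out the Internet of Things.

This question can also be extended to "the relationship between big data and 5G".

The coming 5G raises connection speeds, which improves the experience of the "Internet of people" and encourages people to actively create more data.

On the other hand, 5G is designed even more to serve the Internet of Things. Low latency and massive numbers of connected devices are both requirements of IoT scenarios.

5G stimulates the growth of the Internet of Things, and the Internet of Things stimulates the growth of big data. Every improvement in communications infrastructure paves the way for the rise of big data.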

 


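08 The big data industry chain

Next, let's talk about the big data industry chain.

The industry chain is closely tied to the way big data is processed. Put simply, it consists of producing data, aggregating data, analyzing data, and consuming data.

Each stage has its own players, as shown in the figure below: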

 


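As things stand, foreign vendors hold a large share of the big data industry. The upstream segment in particular is made up almost entirely of foreign companies, and domestic (Chinese) IT companies lag well behind by comparison.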

 


▲ Key big data fields and companies (technologies)

09 The challenges of big data

All this praise does not mean that big data is perfect.

Big data also faces many challenges.

Beyond the technical difficulty of managing the data, the biggest challenge is security.

Data is an asset, but it is also private. Nobody wants their privacy exposed, so people are paying more and more attention to protecting it. Governments, too, keep strengthening the protection of citizens' privacy and have introduced a number of laws.

 


▲ In 2018 the EU's GDPR (General Data Protection Regulation), described as the strictest ever, took effect and raised the protection of online data to an unprecedented level

In this environment, anyone obtaining user data has to consider carefully whether doing so is ethical and legal. Breaking the law carries a very heavy price.

Moreover, even a company that obtained its data legally has to worry about malicious attacks and theft. That risk cannot be ignored.

In addition to security, big data also faces the problem of energy consumption.

In other words, big data that you cannot properly protect and use is a hot potato, and you might be better off without it.

Summary

Well, I have rambled on long enough, so that is all for today's introduction.

The main purpose of this article is to help you build a basic understanding of big data.


Origin blog.csdn.net/yimenglin/article/details/92764018