"Balance sheet" and "status quo" DataOps Shu series of data



Author: Chen Cheng, CEO of DataPipeline

Geoffrey Moore, author of "Crossing the Chasm", once said that running a company without data is like being deaf and blind while driving a car at high speed. Enterprises have never taken the value of data as seriously as they do now. IDC estimated that by 2020 there would be 44 trillion gigabytes of data worldwide, and every Fortune 500 CEO and every founder of a startup or unicorn is thinking about, and experimenting with, how data can support transformation and innovative services to win new growth.

More and more people agree that data is an extremely important asset. Yet managing and using data across its full life cycle is complex, and the data management methodologies of the past, although correct and comprehensive, tend to bog down in long, slow, high-investment cycles once they are put into practice. The return on investment is unclear and most projects die halfway, so data management has become an unspoken pain for the majority of enterprises. Unfortunately, most companies have still not found a mature, effective philosophy and methodology to organize, drive, and guide the realization of data value.





In "Data Driven: Profiting from Your Most Important Business Asset", a book recommended by Harvard Business Review, Dr. Thomas Redman wrote, "Where there is data smoke, there is business fire," a vivid way of pointing out how severely stale, poor-quality data affects business development. Against this background, this article discusses DataOps, a concept that helps large companies use culture, processes, and tools to realize the value of their data and complete their digital transformation.
 

1. The "balance sheet" of data


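Today, data volumes are growing far faster than expected, which can easily create the illusion that simply having the data means owning data assets.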

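We believe this oversimplifies the facts. Merely storing a huge volume of data already costs an enterprise a great deal. For example, keeping 100 PB of data on Amazon Web Services (AWS) S3 for one year costs 25 million US dollars. To make that data deliver value, the resource and labor costs of collecting and moving it, processing and computing on it, monitoring its quality, and serving it rise even faster.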
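
As a rough sanity check on the storage figure above, the arithmetic can be sketched as follows. The per-GB price is an assumption (a flat rate of 0.021 US dollars per GB per month, roughly what a high-volume standard storage tier has cost; real pricing varies by region, tier, and storage class), and the function name is hypothetical.

```python
# Back-of-the-envelope estimate of yearly object-storage cost.
# ASSUMPTION: a flat price of 0.021 USD per GB per month; real cloud
# pricing is tiered and changes over time, so treat this as illustrative.

GB_PER_PB = 1_000_000  # decimal units: 1 PB = 10^6 GB


def yearly_storage_cost_usd(petabytes: float, usd_per_gb_month: float = 0.021) -> float:
    """Return the estimated cost of keeping `petabytes` of data stored for 12 months."""
    gigabytes = petabytes * GB_PER_PB
    return gigabytes * usd_per_gb_month * 12


if __name__ == "__main__":
    cost = yearly_storage_cost_usd(100)
    print(f"100 PB for one year: about {cost / 1e6:.1f} million USD")
```

Under that assumed price the estimate lands close to the 25 million US dollars quoted above. It covers storage only; request, transfer, compute, and staffing costs come on top.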
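In that case, if we drew up a "data balance sheet" for an enterprise, how much of its data would truly count as an asset? And how can the enterprise increase its data assets?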

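The questions look simple, but few enterprises can give a rigorous answer even after careful thought. As a result, the way data is used today is often "too much, too messy, too slow, and too poor", which seriously reduces both the value data delivers and the efficiency with which it delivers it. Only by running data as carefully as one runs a company can data turn from a liability into an asset.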



2. The current state of data


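Anyone who wants to run data in this fine-grained way has to face a series of hard problems. More and more organizations now have hundreds or even thousands of internal and external data sources, covering business, process, and customer data in structured, semi-structured, and unstructured forms. Add the application-level impact that 5G and blockchain will bring, and the pain becomes even harder to put into words.
 Against this complex, heterogeneous background, enterprises lack efficient methods and tools for integrating data, and they lack even more the talent and culture needed to find patterns in that data and uncover its value. Together, these gaps make understanding and integrating data harder still. Yet this is precisely the starting point for every use of data: without a mature, efficient way to handle it, data-driven business remains empty talk.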



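Beyond the growing number and variety of data sources, the business itself keeps evolving and adjusting, so the structure and metadata of the data it produces change as well. Each such change sets off a chain reaction along the data pipelines that depend on it.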

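Unfortunately, the metadata architecture that many enterprises define is static. It captures the structure and meaning of the data as it is today, but it cannot stay consistent with business semantics as the business iterates quickly. Over time it loses its ability to help data analysts understand the business, leads to problems such as inconsistent metric definitions in analysis, and causes confusion when the enterprise makes major decisions.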
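
One way to notice when documented metadata has fallen behind the business is to compare the documented schema with what a source actually delivers. The following is a minimal sketch under stated assumptions: the column names are hypothetical, plain dictionaries stand in for a metadata catalog, and it is not a description of any particular product.

```python
from typing import Dict, List


def diff_schema(documented: Dict[str, str], observed: Dict[str, str]) -> List[str]:
    """Compare two {column: type} mappings and describe every difference."""
    changes = []
    # Columns the documentation still lists but the source no longer sends.
    for col in sorted(documented.keys() - observed.keys()):
        changes.append(f"removed: {col}")
    # Columns the source sends that the documentation does not know about.
    for col in sorted(observed.keys() - documented.keys()):
        changes.append(f"added: {col} ({observed[col]})")
    # Columns present on both sides whose type no longer matches.
    for col in sorted(documented.keys() & observed.keys()):
        if documented[col] != observed[col]:
            changes.append(f"type changed: {col} {documented[col]} -> {observed[col]}")
    return changes


if __name__ == "__main__":
    # Hypothetical example: what the catalog says vs. what arrived today.
    documented = {"order_id": "int", "amount": "float", "channel": "str"}
    observed = {"order_id": "int", "amount": "str", "region": "str"}
    for change in diff_schema(documented, observed):
        print(change)
```

Running a comparison like this on a schedule turns a silent semantic drift into an explicit list of changes that someone can review before downstream reports are affected.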
 



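When a business unit wants to use data to support decisions or to create a new business model, it usually has two timeliness requirements: how quickly its data request is fulfilled, and how low the latency of the data itself is. The key to business innovation is meeting market demand quickly. That means not only using data to size the market fast, but also delivering the corresponding products and services while the window of opportunity is open, so as to capture the market. This process depends more and more on how fast and how fresh the data supply is. Real-time risk control in banking, real-time marketing in retail, and digital twins in industry are all typical cases where business innovation demands highly real-time data. Most enterprise data departments struggle to meet either requirement.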





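Getting data used is only the first step of a long march; poor data quality follows close behind. Today, business operators and executives either do not know that their data has quality problems or bury their heads in the sand to avoid and hide them. Gartner's data quality market survey shows that poor data quality costs organizations an average of 15 million US dollars a year. Although every enterprise agrees that data quality matters, Gartner finds that 84% of enterprises are still at an "immature" stage of data quality. Lost money is only one side of the coin: how many companies have missed the chance to generate substantial revenue because poor data quality left them unable to trust their data?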
Finally, data security and privacy are an urgent concern. Every year, many companies suffer both reputational and financial losses from data breaches. The EU has established the GDPR and has issued fines in the hundreds of millions to many technology giants, including Google, while our country is also making progress on the relevant laws and regulations. Realizing the full value of data within the bounds of security and compliance is one of the key points of DataOps, and it is not just a technical problem: the goal is to give people inside the organization as much flexibility as possible to use data while staying secure and compliant. (Follow-up articles on our official account will cover this in detail; please stay tuned.)

3. Reflections on the status quo


Most of the time, data problems are not just the data department's problems; they are more often problems of organizational structure and collaboration. Rather than placing the responsibility on particular tools, we should think about the role that culture plays. It is therefore time to think deeply about the "meta-problems" behind these problems.




What ideas and methods should we use to face these "meta-problems"? We might start with DataOps.

In the coming articles, DataPipeline will focus on how DataOps can strengthen enterprise data management. This article has described the current state of data management and the background against which DataOps arose.

Later articles will look at DataOps from several angles, including "DataOps philosophy and design principles", "DataOps organizational structure and challenges", and "DataOps technical considerations".

If these topics interest you, please keep following DataPipeline. If you have ideas or comments, we would be glad to discuss them with you.



Origin blog.51cto.com/13905119/2465582