New Development Prospects of Big Data Science: Four Trends You Must Know

Since 2012, almost everyone (at least in the Internet world) has been talking about big data, as if it were embarrassing to join a conversation without being involved in it. Since 2016, big data systems have gradually entered the deployment stage in enterprises. The hype has faded, followed by a period of vigorous application development. Iconic IPOs of companies representing mature technologies keep appearing in domestic and foreign capital markets. In the blink of an eye, the bubble that big data went through a few years ago has unmistakably moved to artificial intelligence. In the past year, the "big bang" of public attention around AI has arguably exceeded what big data saw in its day. More recently, the spotlight has shifted to blockchain, which to some extent has become a source of anxiety for industry insiders.

But however the hot topics change, one thing is visible: as the industry settles down to real-world deployment, the big data ecosystem is becoming more and more specialized. Today, let me talk about some new changes and trends in the field of big data.

 

1. Data Governance & Security

Among the current trends, this one deserves first place.

Data has been accumulating rapidly in the enterprise over the years. The Internet of Things (IoT) is accelerating the generation of data.

For many enterprises, the answer to big data is to use technologies such as open source Apache Hadoop as the foundation for a data lake, that is, a data management platform for the entire enterprise that stores all of its data in native format. A data lake is meant to eliminate information silos by providing a single data repository that the whole organization can use for business analytics, data mining, and more. Once a data lake exists, people tend to treat it as an all-purpose big data store: click stream data, IoT data, log data, and so on are all required to flow into the lake, while the hard problems with this data are ignored.

However, unless you know exactly what is in the data lake and can reach the right data for analysis, a large data lake is of little use. In the end, everyone realizes that many data lakes are underperforming resources: people don't know what is stored in them, how to access it, or how to gain insights from that data.

Letting people easily find what they are looking for while permissions are still enforced is not simple, though. Beyond data lakes, another theme of governance is giving anyone easy access to reliable data in a secure, auditable manner.

Therefore, to manage and use the company's data assets well, data governance should be treated like a top-level company policy and charter: it needs to be valued and implemented through concrete strategies and processes. The ultimate goal of implementing data governance is to improve data management, ensure data quality, and enable open sharing. Data governance is also a system that combines decision rights, functions, and operational processes, in which people are held accountable for the data assets.
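
The governance goals described in this section (data that can be found, access that is checked against permissions, and requests that leave an audit trail) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the DataCatalog and DatasetEntry classes, the role names, and the dataset names are hypothetical and do not come from any particular governance product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DatasetEntry:
    """Hypothetical metadata record describing one dataset in the lake."""
    name: str
    description: str
    owner: str
    allowed_roles: set


@dataclass
class DataCatalog:
    """Toy in-memory catalog: searchable metadata plus an audit trail."""
    entries: dict = field(default_factory=dict)
    audit_log: list = field(default_factory=list)

    def register(self, entry):
        # Record what is in the lake so it can be found later.
        self.entries[entry.name] = entry

    def search(self, keyword):
        # Case-insensitive match on dataset name or description.
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self.entries.values()
            if kw in e.name.lower() or kw in e.description.lower()
        )

    def request_access(self, user, role, dataset):
        # Grant access only if the dataset exists and the role is allowed.
        entry = self.entries.get(dataset)
        granted = entry is not None and role in entry.allowed_roles
        # Every request is logged, whether or not it was granted.
        self.audit_log.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "role": role,
            "dataset": dataset,
            "granted": granted,
        })
        return granted


catalog = DataCatalog()
catalog.register(DatasetEntry(
    "clickstream_raw", "Raw click stream events from the website",
    "web-team", {"data_engineer"}))
catalog.register(DatasetEntry(
    "iot_sensor_daily", "Daily aggregates of IoT sensor readings",
    "iot-team", {"data_engineer", "data_analyst"}))

found = catalog.search("iot")
analyst_ok = catalog.request_access("alice", "data_analyst", "iot_sensor_daily")
analyst_denied = catalog.request_access("alice", "data_analyst", "clickstream_raw")
print(found, analyst_ok, analyst_denied, len(catalog.audit_log))
```

A real deployment would keep the metadata and the audit log in durable storage and tie roles to the company's identity system; the sketch only shows how search, permission checks, and auditing fit together.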

2. Development of Collaborative Data Workbenches

In most large enterprises, big data adoption starts with a handful of independent projects: a small Hadoop cluster here, an analytics tool there, a simple business model to run, and the realization that some new positions (data scientist, chief data officer) need to be created.

Now, business scenarios are getting richer and more heterogeneous, and a variety of tools are used across the enterprise. Within a company's organization, centralized "data science departments" are gradually giving way to more decentralized organizations, as centralized departments are increasingly bottlenecked and more prone to resource drain.

Data scientists, data engineers, and data analysts are increasingly embedded in different business units. The need for a platform is therefore obvious: to make everything work together, because success with big data depends on an assembly line of technology, people, and processes.

As a result, new types of collaboration platforms (such as Jupyter) are emerging at an accelerating pace, leading the development of the so-called DataOps space (named by analogy with DevOps).

 

3. Data Science Automation

Data scientists are still in high demand in the job market. But they are hard to find, and even Fortune 1000 companies struggle to hire enough of them. In some organizations, the data science department is turning from an enabler into a bottleneck.

At the same time, the democratization of AI and the proliferation of self-service tools have made it easier for data engineers with limited data science skills, or even data analysts, to perform basic operations that until recently were the domain of data scientists. With the help of automated tools, a lot of big data work, especially the simple and tedious parts, will be handled by data engineers and data analysts instead of data scientists with deep technical skills. Even so, data scientists have no need to be fearful just yet.

For the foreseeable future, self-service tools and automated models will "augment" data scientists rather than eliminate them, freeing them to focus on tasks that require judgment, creativity, social skills, or vertical industry knowledge. Such tasks better live up to the title of scientist.

 

4. The Rise of Big Data Administrators

The abbreviations for big data administrator (BDA) and database administrator (DBA) differ only in the order of two letters, but the roles are quite different. A very clear trend is the demand for a new role, that of the big data administrator. Everyone is familiar with the DBA, but the data administrator of the big data era does a very different job.

Big data administrators, acting as data stewards, sit between data consumers and data engineers. To succeed, they must understand the meaning of the data and master some of the techniques applied to it, in addition to maintaining the big data system.

Data stewards need to understand the type of data analysis that needs to be performed across the organization, which datasets are well suited for the job, and how to transform the data from its raw state into the shape and form that data consumers need to perform the job. Data stewards should use systems like self-service data platforms to expedite the end-to-end process of data consumers accessing essential datasets without making countless copies of the data.
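
As a rough illustration of reshaping raw data for consumers without making copies of it, the sketch below uses Python generators as a stand-in for the "virtual" views a self-service data platform might offer. The sample records, field names, and function names are all hypothetical, and a real platform would do this at the storage or query layer rather than in application code.

```python
# Hypothetical raw IoT records as they might land in the lake:
# every value is a string and some readings are malformed.
RAW_EVENTS = [
    {"device": "sensor-1", "temp_c": "21.5", "ts": "2018-05-01T10:00:00"},
    {"device": "sensor-2", "temp_c": "bad", "ts": "2018-05-01T10:00:05"},
    {"device": "sensor-1", "temp_c": "22.0", "ts": "2018-05-01T10:01:00"},
]


def clean_readings(raw):
    """Lazily yield typed readings; the raw data is never copied or modified."""
    for row in raw:
        try:
            temp = float(row["temp_c"])
        except ValueError:
            continue  # skip malformed readings instead of failing
        yield {"device": row["device"], "temp_c": temp}


def average_by_device(readings):
    """Aggregate cleaned readings into the shape an analyst would consume."""
    totals = {}
    for r in readings:
        total, count = totals.get(r["device"], (0.0, 0))
        totals[r["device"]] = (total + r["temp_c"], count + 1)
    return {device: total / count for device, (total, count) in totals.items()}


averages = average_by_device(clean_readings(RAW_EVENTS))
print(averages)
```

Because clean_readings is a generator, the cleaned records exist only while they are being consumed, so several consumers can each define their own view over the same raw dataset.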

 

Epilogue

The four aspects above are new requirements that practice has placed on data science. Whoever delivers good results in these areas will take the lead in the big data era.
