Big data from scratch: a veteran foodie helps you understand big data

Xiaozhi is often asked the question: what is big data?

This question seems simple, but it is not. Ask a thousand people and you may get a thousand answers. Everyone has their own understanding of big data, just as when Xiaozhi asks friends, "What is the best food in Chongqing?", hundreds of answers come back within minutes.

Today, Xiaozhi will try to explain what big data is from a foodie's point of view.

 

1. What is big data? How should we understand it?

If you compare data to the water on Earth, personal data (the documents, songs, movies, programs, and so on stored on your computer) is like a small drop of water: at most, it can quench your thirst when you are tired. Enterprise data is a little different. Depending on scale, some of it is a puddle and some a pond, where a few small fish and shrimp can already be raised for an occasional treat. The data of some enterprises (Facebook, for example, was already processing 500 TB of data per day in 2012) is a large lake, which supports large-scale fishing and breeding. Beyond the lake, however, there is a wider world, which means there is still more data to be discovered.

For example, foreigners often complain that Chinese recipes are not "precise" enough: many ingredients are described roughly as "a little", "an appropriate amount", or "enough", which makes the essence hard to learn in practice. With big data, the main ingredients, the quantities and proportions of the seasonings, and the amounts of oil, salt, soy sauce, and vinegar can all be recorded accurately. Even which region's pork goes best with green peppers and watercress can be recorded. This data, which used to be neither valued nor collected, is the hidden "water droplets", "ponds", and "lakes" of the big data field. The large amount of existing data, together with data that has not yet been discovered and recorded, forms the foundation of the big data era.

As more and more water droplets, ponds, and lakes are found, they can converge into oceans. The water (data) in the big data ocean is too vast to measure, and its products and resources (the value generated by big data) are just as abundant. In the lake we used to raise the "four major domestic fish": black carp, grass carp, silver carp, and bighead carp. With the data ocean, we can easily get oysters, cod, tuna, and more.

So, do you understand big data now? It means bringing a great deal of data and information together and then "catching big fish" in it.

 

2. Big data is said to have "4V" characteristics. What does that mean?

The 4Vs of big data are Volume (large volume), Variety (diversity), Value (high value), and Velocity (high speed). These too can be compared to the ocean and the food in it:

A. Large volume: About 70% of the Earth's surface is ocean. Think about how many water droplets that is, and how much delicious food is in there. In the era of big data, every person, every ingredient, and even the second-by-second changes in flavor and taste can form a stream of data that is updated at any time. The scale of data is unprecedented, and the value hidden in it is far beyond most people's expectations.

B. Variety: The contents of the ocean are very diverse, including both resources and debris. There are small, delicate seafoods such as sea urchins, oysters, and geoducks, as well as large fish such as yellow croaker, cod, and tuna... The structure of big data is as complex as the ocean. Taking file types alone, there are pictures, text, sound, video, and so on, plus all kinds of unstructured data. Before these resources can be used, they have to be sorted, classified, and processed; only then can we enjoy the fruits.

C. High value: It goes without saying for eel, lobster, salmon... Yellow-lipped fish often costs 30,000 to 40,000 yuan per 100 grams, and saury can even save a life at a critical moment. (A few years ago there was a news story about a young Japanese man who gave up the idea of ending his life after eating charcoal-grilled saury. Would Xiaozhi make that up?) In practical applications, big data can help enterprises optimize and improve management efficiency, discover new business opportunities, and analyze and predict how things will develop. How much business value you get depends on how you use it.

D. High speed: The first to arrive eat the meat; those who come later only get the soup. Everyone understands this principle. The data ocean is huge. If you want to find the delicious food one step ahead of others, you must be fast, which means being able to scan, filter, and process the whole data ocean quickly. If all you have is two small fishing boats, then even with the entire Pacific Ocean at your disposal you may not fish your way to a comfortable life.

PS: The analogy for the fourth V is a bit far-fetched, but that does not spoil Xiaozhi's performance. As they say, "if you want to eat meat, you need a thick skin"...

 

3. How is big data processed? Take catching fish at sea as an example:

Using technical means to discover the rich products hidden in the seawater is data mining (searching, by means of algorithms, for information hidden in large amounts of data).

Working out which of the things found are useful and which are weeds and gravel, and then discarding whatever is wrong, unsuitable, or worthless, is data cleaning (finding and correcting identifiable errors in data files).

In the "sea areas" that have passed this preliminary screening, scanning further to determine which finds are minerals and which are fishery products, which fish the fishery products contain, what types they are, what their economic value is, and how many there are... is data analysis (analyzing the large amounts of data collected, extracting useful information, and forming conclusions).

Taking the ugly raw seafood (assorted numbers and tables), processing it, turning it into a beautiful dish, and serving it to the user with color, aroma, and flavor all present (polished, intuitive charts) is what we call data visualization.
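For readers who would like to see the fishing trip as code, here is a minimal sketch of the cleaning, analysis, and visualization steps in plain Python. The catch records, the field layout, and the function names (clean, analyze, visualize) are all made up for illustration; real projects would normally use dedicated libraries and far more data.

```python
from collections import defaultdict

# Hypothetical raw "catch": each record is (species, weight_kg, price_per_kg).
raw_catch = [
    ("tuna", 120.0, 30.0),
    ("cod", 15.5, 8.0),
    ("seaweed", None, 0.0),   # debris: no weight recorded
    ("tuna", -5.0, 30.0),     # obviously wrong: negative weight
    ("cod", 20.0, 8.0),
    ("oyster", 2.0, 12.0),
]


def clean(records):
    """Data cleaning: drop records with missing or impossible values."""
    return [
        (species, weight, price)
        for species, weight, price in records
        if weight is not None and weight > 0 and price > 0
    ]


def analyze(records):
    """Data analysis: total weight and total value per species."""
    summary = defaultdict(lambda: {"weight_kg": 0.0, "value": 0.0})
    for species, weight, price in records:
        summary[species]["weight_kg"] += weight
        summary[species]["value"] += weight * price
    return dict(summary)


def visualize(summary):
    """Data visualization: a crude text bar chart, most valuable species first."""
    lines = []
    ranked = sorted(summary.items(), key=lambda item: -item[1]["value"])
    for species, stats in ranked:
        bar = "#" * max(1, round(stats["value"] / 200))
        lines.append(f"{species:<8}{bar} {stats['value']:.0f}")
    return "\n".join(lines)


cleaned = clean(raw_catch)
summary = analyze(cleaned)
print(visualize(summary))
```

The order matters: cleaning comes before analysis, so the weeds and gravel never reach the dinner table.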

 

4. What are IaaS, PaaS, and SaaS?

IaaS is Infrastructure as a Service. The provider offers the whole computing infrastructure, including basic computing resources such as CPU, memory, storage, and network, and users can deploy and run arbitrary software on it, including operating systems and applications. It is like being given a dock equipped with all kinds of hardware. You have the opportunity and the capability, but you still need your own platform and tools to get resources out of the ocean.

PaaS is Platform as a Service. The customer deploys applications that they have developed or acquired, built with the development languages and tools the provider supports (e.g. Java, Python, .NET), onto the provider's cloud computing infrastructure. In addition to the dock, you are given a ship, along with a captain, first mate, and sailors. With this system in place, you can go straight out to the ocean's resources. However, how to catch the fish and which tools to use is still up to you.

SaaS is Software as a Service. The customer is given an application that the operator runs on the cloud computing infrastructure, and accesses it through a client interface, such as a browser, on various devices. This time everything is provided, right down to the specific tools: the fishing plan, the nets, and the sailing route are all ready. You only need to give the order: which sea area to go to and which fish to catch.
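The difference between the three models comes down to who looks after which layer. The small sketch below captures that split under a simplifying assumption: the stack is reduced to four layers, and the layer names and the who_manages helper are invented for illustration rather than taken from any provider's documentation.

```python
# Simplified stack, from the dock (bottom) up to the fishing plan (top).
LAYERS = ["infrastructure", "operating system", "runtime and tools", "application"]

# How many layers, counted from the bottom, the provider takes care of.
PROVIDER_MANAGES = {"IaaS": 1, "PaaS": 3, "SaaS": 4}


def who_manages(model):
    """Return a mapping of layer -> 'provider' or 'customer' for a service model."""
    provided = PROVIDER_MANAGES[model]
    return {
        layer: ("provider" if index < provided else "customer")
        for index, layer in enumerate(LAYERS)
    }


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, who_manages(model))
```

With IaaS you only get the dock, with PaaS you also get the ship and crew, and with SaaS even the fishing plan is ready.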

 

5. A few years ago, any talk of big data had to mention Hadoop. Later came Spark. What are they?

Suppose my family has fished for a living for generations. We used to live together on one island and go out fishing in one big boat. No matter how fast the boat is or how much it can hold (storage capacity), there is only one of it, so the sea area it can search is quite limited.

Now we have changed strategy. If one ship is not enough, we bring in N more ships. The whole family is scattered across the world's oceans, sharing its ships with other families. When necessary, hundreds of boats can team up to fish together. Because the area covered is wide enough and the holds are large enough, fishing capacity multiplies accordingly.

Hadoop is the basic framework for this kind of distributed system. By managing files in a distributed way (split into pieces and spread across many machines), it makes full use of collective power for high-speed computing and storage.
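To give a feel for the "many boats" idea, here is a toy sketch in plain Python of the split, map, and reduce pattern that Hadoop is known for. It does not use the Hadoop API at all: the catch log, the block size, and the helper names are assumptions made up for this example, and a thread pool on one machine stands in for a fleet of separate computers.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_blocks(lines, block_size):
    """Slice the data into blocks, as a distributed file system would."""
    return [lines[i:i + block_size] for i in range(0, len(lines), block_size)]


def map_block(block):
    """Each 'boat' counts the species in its own block only."""
    counts = Counter()
    for line in block:
        counts.update(line.split())
    return counts


def reduce_counts(partials):
    """Back in port, merge the per-boat counts into one total."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


# Hypothetical catch log: one line per haul.
catch_log = ["tuna cod tuna", "oyster cod", "tuna", "cod oyster cod"]

blocks = split_into_blocks(catch_log, block_size=2)
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_block, blocks))

totals = reduce_counts(partials)
print(totals)
```

No single boat ever sees the whole log, yet the merged result is the same as if one boat had counted everything.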

As for Spark: the ship used to carry a speedboat that was originally meant for escape, and now it is also used as a main fishing tool. (Spark is an open-source cluster computing environment similar to Hadoop. It keeps distributed datasets in memory and reads data directly from memory, which can make computing up to about 10 times faster than reading the data from hard disk.)
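Why does keeping data in memory help? The toy sketch below counts how many times a dataset is read from "disk" when two queries are run against it, with and without a memory cache. It is only an illustration of the idea, not Spark code: the DiskDataset and CachedDataset classes and the sample weights are invented for this example.

```python
class DiskDataset:
    """Stand-in for data stored on disk; counts how often it is read."""

    def __init__(self, rows):
        self._rows = rows
        self.disk_reads = 0

    def read(self):
        self.disk_reads += 1
        return list(self._rows)


class CachedDataset:
    """Loads from disk once, then serves every later query from memory."""

    def __init__(self, source):
        self._source = source
        self._cache = None

    def read(self):
        if self._cache is None:
            self._cache = self._source.read()
        return self._cache


def run_queries(dataset):
    """Two separate queries, each of which asks for the data again."""
    total = sum(dataset.read())
    biggest = max(dataset.read())
    return total, biggest


weights = [120.0, 15.5, 20.0, 2.0]

plain = DiskDataset(weights)
run_queries(plain)

backing = DiskDataset(weights)
cached = CachedDataset(backing)
result = run_queries(cached)

print("disk reads without cache:", plain.disk_reads)
print("disk reads with cache:", backing.disk_reads)
```

The more queries you run over the same data, the more the cached version saves, which is why repeated analysis of one dataset is where in-memory computing shines.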

 

6. What is big data used for?

Big data has many application scenarios. One is precision marketing: through years of experience at sea, fishermen know which sea areas have more seafood that will sell at a good price. Another is public opinion analysis, which is like tsunami warning: massive amounts of information are analyzed and compared to find the areas where a tsunami might strike... Of course, the biggest use is prediction. For example, by analyzing years of ocean current movements, you can work out where the fish you missed today at the Cape of Good Hope will turn up next month. What? You say fish do not interest you? What if the forecast is about the future rise and fall of stocks? What if it is about the future of an industry?
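Prediction can be as simple as fitting a trend to past records and extending it one step forward. Here is a minimal sketch using an ordinary least-squares straight line in plain Python. The monthly catch figures are made up, and a straight line is only an assumption for illustration; real forecasts of fish, stocks, or industries need far richer models and data.

```python
def fit_line(values):
    """Least-squares fit of y = slope * x + intercept, with x = 0, 1, 2, ..."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical catch in tonnes for the last four months.
monthly_catch = [10.0, 12.5, 13.5, 16.0]

slope, intercept = fit_line(monthly_catch)
forecast = slope * len(monthly_catch) + intercept  # next month is x = 4
print(f"forecast for next month: {forecast:.2f} tonnes")
```

The more history you have, and the more factors you can record alongside it, the better such a forecast can become, which is exactly why the size of the data ocean matters.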

 

7. What services do big data companies provide?

The first category is cloud platform service providers, such as Amazon and Alibaba Cloud. They are like countries that manage their own waters. You can fish in their waters, you can hand your own waters over to them to manage, or you can buy their finished catch directly.

The second category is data transaction intermediaries. They provide some data themselves, but more importantly they build a trading platform that matches data providers with data users, so that data can be exchanged and its value realized. This is a bit like buying and selling lakes and oceans. Once buyers obtain the data, they can merge it into their own "ocean", making it bigger and richer in products.

The third category is big data solution providers. They send fishing fleets to every corner of the data ocean and provide a whole series of services: ocean development, resource scanning, mining and fishing, processing, and sales. Whatever you want to do in the great age of big data voyages, they can handle it for you.

Just between us, Xiaozhi's company, Wisdom Steer, is an enterprise that provides big data solutions, including data correlation analysis, deep data mining, and customized big data solutions. Hahaha, after all that, do you understand big data a little better? If so, please remember to invite Xiaozhi to dinner next time you come to Chongqing!

Wa hahaha……

 

END

Reprinted in: https://my.oschina.net/u/3407515/blog/873398
