What exactly is a "real-time data warehouse"?

1. How are real-time data warehouses, databases, and lake-warehouse integration related?

From a beginner's perspective, these technologies can indeed sound confusing. How are they related to one another? Let me briefly sort them out.

1. Big data platform
Let's start with the most basic one, the big data platform. A big data platform is a technical platform that uses big data technology to solve data problems. In other words, it is a collection of basic technical capabilities, and it does not necessarily solve any one specific business problem. A data warehouse, by contrast, is a product that solves a specific business problem in the use of data. Because a big data platform is a general-purpose product, it can address many kinds of data needs, and we can flexibly assemble its components into a specific solution for a particular business scenario. Put simply, the big data platform is a generalized technology platform. In the data warehouse field, the typical and widely used example in the past was Hive on Hadoop. Today there are many more big data platforms that can solve certain data analysis and computation problems. Frankly speaking, though, a big data platform is still a technology platform.

2. Data middle platform
The second concept is the data middle platform. It has been very popular over the past three to four years, and a number of start-up vendors in China are building one. What is a data middle platform? We want to emphasize that it is not a technical platform but a business platform. That is to say, the data middle platform reorganizes the data services within an enterprise along business lines and supports the front-end business systems. It exists mainly to help the business side solve business problems, so its bottom layer relies on some underlying technology. For example, we can build a data middle platform on top of a big data platform, or on some other combination of technologies. But as a concept, the middle platform is not there to solve technical problems. It is there to solve the enterprise's data business problems.

3. Data lake
The third is the data lake. The concept of the data lake appeared later than that of the data warehouse. What problem does it solve? In the past, we organized data in a highly standardized form and modeled it carefully in advance. Now we also see more and more data that is not strictly modeled: it is fragmented, scattered everywhere, multi-modal, and stored in many different forms. In the past there was no good way to organize and manage such data, and data lake technology emerged to meet that need. As the name suggests, a data lake is like a lake: it gathers the scattered data of an enterprise in one place and then provides certain capabilities for computing on and processing that data.

The data lake and the data warehouse therefore correspond to each other. The data warehouse emphasizes modeling: its data is relatively regular and is prepared in advance in strict accordance with a specification. Data of this kind is usually stored in the data warehouse, where computation and query efficiency are higher. The data lake, in contrast, also holds data that has not been standardized, so the two are complementary. From the enterprise's perspective, however, there are now two different data carriers. The standardized, pre-modeled data is in the data warehouse, while other, very scattered data sits in the data lake. Is there a unified view of all this data? This is where the technology of lake-warehouse integration comes in.

4. Lake-warehouse integration
Lake-warehouse integration fuses data lake and data warehouse technologies into a unified solution, so that we can look at the data within the enterprise from a higher level. In other words, the integration of lake and warehouse provides a more global view of our data.

5. Real-time data warehouse
The real-time data warehouse we are talking about today is a technical platform that has been specially strengthened for the real-time side of the data warehouse. It is the technical term for the specialized technology used in fields where data collection, computation, and processing have real-time requirements.

2. What exactly is a real-time data warehouse?

As we all know, the data warehouse is a very old technology. It has been developing for thirty to forty years, from the 1980s to the present. In the past, data warehouses were mainly used to solve offline problems. Now we see more and more enterprises using data warehouses in real-time scenarios, and so the concept of the real-time data warehouse has emerged. Why is there such demand? It comes mainly from enterprises' requirements for the timeliness of data, which have gradually become more important and in some cases even outweigh the value of the data analysis itself. This is what makes the real-time data warehouse significant. Many technologies now support the data warehouse, including familiar concepts such as the data lake, lake-warehouse integration, and the cloud-native data warehouse. How do they relate to the real-time data warehouse? Today we will take this occasion to discuss what differences and what value data warehouse technology and the real-time data warehouse can bring to an enterprise.

What are the important development stages of the real-time data warehouse? From the perspective of the underlying architecture, what is the most fundamental difference between the real-time data warehouse and the offline data warehouse? What are the key technologies that turn the real-time data warehouse from a dream into reality?

3. Let's first look at the first question: the development of the data warehouse

As I mentioned before, over its 30 to 40 years of development from the 1980s to the present, the data warehouse has gone through several stages.
The early offline data warehouse solved the basic problem of enterprise data analysis, which the original transactional databases had difficulty meeting. It provided data analysis capability at a certain data scale. Now, as our requirements for real-time data have risen, several branches of real-time data warehouse technology have emerged. For example, the Lambda and Kappa architectures that everyone is familiar with appeared mainly to meet the demand for real-time processing and real-time query, and they can be seen as prototypes of the real-time data warehouse. The data warehouse is now used in more fields, and both the Lambda architecture and the Kappa architecture solve real-time problems to a good extent. Of course, we now also have some better technologies for building a real-time data warehouse.
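As a rough illustration of the two architectures named above, the following Python sketch contrasts them on a toy event log. All names here (`Event`, `batch_view`, `speed_view`, `lambda_query`, `kappa_view`) and the cutoff-based split are assumptions made for illustration, not the API of any particular product. The idea is that Lambda keeps a batch layer and a speed layer and merges them at query time, while Kappa keeps a single stream-processing path and rebuilds results by replaying the log.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    ts: int        # event time, arbitrary units
    user: str
    amount: int


def aggregate(events):
    """Sum amounts per user; reused by both Lambda layers in this sketch."""
    totals = Counter()
    for e in events:
        totals[e.user] += e.amount
    return totals


def batch_view(log, cutoff):
    """Lambda batch layer: periodic recomputation over events before the cutoff."""
    return aggregate(e for e in log if e.ts < cutoff)


def speed_view(log, cutoff):
    """Lambda speed layer: covers only events the last batch run has not seen."""
    return aggregate(e for e in log if e.ts >= cutoff)


def lambda_query(log, cutoff):
    """Lambda serving layer: merge the batch and speed views at query time."""
    return batch_view(log, cutoff) + speed_view(log, cutoff)


def kappa_view(log):
    """Kappa: one streaming path; state is (re)built by replaying the whole log."""
    totals = Counter()
    for e in log:          # each event is processed once, in arrival order
        totals[e.user] += e.amount
    return totals


# Hypothetical event log used only for this illustration.
log = [
    Event(1, "alice", 10),
    Event(2, "bob", 5),
    Event(3, "alice", 7),
    Event(4, "bob", 1),
]

lambda_result = lambda_query(log, cutoff=3)
kappa_result = kappa_view(log)
print(dict(lambda_result))
print(dict(kappa_result))
```

Both paths arrive at the same per-user totals here. The practical difference is that Lambda maintains two code paths (batch and speed) that must be kept consistent, whereas Kappa maintains one and pays for reprocessing by replaying the log.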

What is the essential difference between the real-time data warehouse and the offline data warehouse? The names make it easy to tell: one is offline and the other is real-time, and that is the most essential difference. In an offline data warehouse, the whole process of data acquisition, processing, and computation happens offline, that is, not online. Conversely, the real-time data warehouse provides an online, real-time capability. This is the biggest difference between the two, and it is precisely this difference that lets the real-time data warehouse serve many business scenarios. Scenarios with strict real-time requirements that offline data warehouses could not meet in the past can now be handled with a real-time data warehouse.
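To make the offline versus real-time distinction concrete, here is a minimal Python sketch over a hypothetical list of order amounts. The names `offline_total` and `RealTimeTotal` are invented for this example. The offline function scans the stored data in one scheduled pass, while the real-time class updates its state on every arriving record, so a current answer can be queried at any moment.

```python
class RealTimeTotal:
    """Keeps a running total that is current after every ingested record."""

    def __init__(self):
        self.total = 0
        self.count = 0

    def ingest(self, amount):
        # Update state as each record arrives; no waiting for a batch window.
        self.total += amount
        self.count += 1

    def query(self):
        return {"total": self.total, "count": self.count}


def offline_total(stored_amounts):
    """Offline style: scan the full stored data set in one scheduled job."""
    return {"total": sum(stored_amounts), "count": len(stored_amounts)}


orders = [120, 80, 45, 300]   # hypothetical order amounts, in arrival order

rt = RealTimeTotal()
snapshots = []
for amount in orders:
    rt.ingest(amount)
    snapshots.append(rt.query()["total"])   # a result is available mid-stream

batch_result = offline_total(orders)        # only available after the full load
print(snapshots)
print(batch_result)
```

The final numbers agree. What differs is when an answer becomes available: after every record in the real-time case, and only after the scheduled job in the offline case.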

What technologies exist today to solve such problems? They include the Lambda and Kappa architectures mentioned earlier, behind which lies the development of stream processing architectures. They also include infrastructure such as cloud data warehouses, which provide a good platform through the cloud, as well as the combination of AI with the data warehouse and work in some other fields. The birth of these technologies has paved the way for the real-time data warehouse.

Of course, traditional data warehouse technology still strongly supports today's real-time data warehouses. The typical MPP-style architecture, for example, remains the mainstream implementation technology for real-time data warehouses. It is precisely these emerging technologies, together with the existing technical foundations of the offline data warehouse, that build a solid technical base for the real-time data warehouse, and they will continue to help its future development.

4. For today's traditional enterprises, especially financial enterprises, how is the construction of real-time data warehouses going? What is the demand for real-time data analysis?

The real-time data warehouse is a cross-domain, cross-industry basic technology that can be applied and developed in many different fields. Traditional enterprises, represented by the financial industry, have higher requirements for data. I used to work in the financial industry myself. The industry is known as a highland for data applications, and its requirements for the rigor and timeliness of data are very strict. Real-time data warehouse technology is therefore especially significant for finance. In the past, many financial scenarios were limited by the underlying technology and could not be implemented well. Now that there is a real-time data warehouse, the financial industry can achieve some new business breakthroughs. Common examples include real-time risk control, anti-fraud, real-time marketing, and online analysis. The emergence of the real-time data warehouse lays a good foundation for the financial industry to meet the demand for new forms of business. Beyond finance, industries such as logistics, manufacturing, games, and e-commerce each have their own particular requirements for real-time data, and the real-time data warehouse opens up new possibilities for business development in these industries as well.
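To give a feel for what real-time risk control means in practice, here is a minimal Python sketch of one possible rule: flag an account when it makes more than a set number of transactions within a sliding time window. The class name `SlidingWindowRiskRule`, the thresholds, and the sample events are all hypothetical. A production system would typically evaluate rules like this inside a stream processing engine rather than in a single process. The sketch also assumes that events for each account arrive in timestamp order.

```python
from collections import defaultdict, deque


class SlidingWindowRiskRule:
    """Flags an account once it exceeds max_txns within window_seconds."""

    def __init__(self, max_txns, window_seconds):
        self.max_txns = max_txns
        self.window_seconds = window_seconds
        self._recent = defaultdict(deque)   # account -> timestamps in window

    def check(self, account, ts):
        """Record a transaction at time ts; return True if it looks risky."""
        window = self._recent[account]
        window.append(ts)
        # Evict timestamps that have fallen out of the sliding window.
        while window and window[0] <= ts - self.window_seconds:
            window.popleft()
        return len(window) > self.max_txns


rule = SlidingWindowRiskRule(max_txns=3, window_seconds=60)

# Hypothetical (account, timestamp-in-seconds) events in arrival order.
events = [("acct-1", 0), ("acct-1", 10), ("acct-2", 15),
          ("acct-1", 20), ("acct-1", 30), ("acct-1", 200)]

flags = [rule.check(account, ts) for account, ts in events]
print(flags)
```

The decision is made as each transaction arrives, which is the property an offline warehouse cannot offer: the fourth rapid transaction on `acct-1` is flagged immediately, and the later one at 200 seconds is not, because the earlier activity has aged out of the window.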

So, as I said, real-time data warehouses should develop relatively well across industries. Of course, because industries are at different stages of development, the progress of real-time data warehouses also differs from one industry to another.

Origin blog.csdn.net/java_cjkl/article/details/129702321