How Redis Achieves 2.2 Million Ops in Big Data Analysis

In the era of big data, analyzing massive data sets has become as routine for us as eating. To better support the company's operational decisions, all kinds of clever and even whimsical ideas keep coming! The business keeps changing, which means the system has to be modified almost daily and the data rerun, and that demands extremely fast reads and writes of massive data!

The company adds hundreds of millions of lines of business log data every day, from which we need to build business profiles along various dimensions. After a long period of exploration, we chose Redis as the cache for reading and writing this data.

 

1. On our development platform, C#/.NET, we write Windows services that capture the raw log data, merge and compress it, and write it to the Redis cluster.

2. Each business system traverses the cached data in Redis along the time dimension, analyzes and processes it line by line, and writes the intermediate and final results back to Redis.

3. Another set of Windows services picks up the result data in Redis and saves it back to the database. Redis plays a role somewhat like a message queue (MQ) here.

In fact, step 1 is handled by a single system, which serves as the data foundation. For steps 2 and 3, each subsystem generally has its own pair of services. System A may even directly read the result data that system B placed in Redis.

The overall coupling does look a bit high, but this architecture achieves extremely high speed: a single subsystem instance can process 10,000 to 100,000 orders per second! Many subsystems work at the same time, and for business reasons no single subsystem consumes all of Redis's performance. In a dedicated stress test against a single Redis server, the peak speed was 2.22 million ops. The test used a relatively simple piece of business logic: counting the total number of orders that match a certain business rule.
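The "merge and compress" idea in step 1 can be sketched roughly as follows. This is only an illustration in Python, not our actual C# services: the per-minute key format, the function names, and the use of zlib are all assumptions, and a plain dict stands in for the Redis cluster so the sketch runs on its own.

```python
import zlib
from collections import defaultdict

def bucket_key(timestamp: str) -> str:
    """Map 'YYYY-MM-DD HH:MM:SS' to a per-minute key (hypothetical key format)."""
    return "log:" + timestamp[:16]

def merge_and_compress(lines):
    """Group (timestamp, text) log lines by minute and compress each group."""
    groups = defaultdict(list)
    for timestamp, text in lines:
        groups[bucket_key(timestamp)].append(text)
    return {
        key: zlib.compress("\n".join(texts).encode("utf-8"))
        for key, texts in groups.items()
    }

def read_bucket(store, key):
    """Decompress one bucket into its log lines for line-by-line analysis."""
    blob = store.get(key)
    if blob is None:
        return []
    return zlib.decompress(blob).decode("utf-8").split("\n")

# A dict stands in for Redis here (assumption, to keep the sketch self-contained).
store = {}
store.update(merge_and_compress([
    ("2017-11-11 10:05:01", "order=1 status=signed"),
    ("2017-11-11 10:05:40", "order=2 status=sent"),
    ("2017-11-11 10:06:02", "order=3 status=signed"),
]))

for line in read_bucket(store, "log:2017-11-11 10:05"):
    print(line)
```

Keying the merged blocks by time is what lets the business systems in step 2 walk the cache along the time dimension: one read returns a whole minute of log lines instead of one line per request.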

 

Why do we need such high speed?

Whenever the business rules change and the program is modified, we usually have to rerun the historical data for the most recent week or month. And how many times a day does that happen? During the Double Eleven peak season, if processing is too slow it cannot even keep up with the real-time data.

 

How Redis achieves 2.2 million ops

1. Redis uses a single-threaded model, so we install 32 instances on a 32-core server.

2. Data is sharded: keys are hashed and distributed across dozens of instances.

3. Persistence is turned off; operations and Linux are relied on for reliability.

4. Control the packet size. High-performance network communication should avoid sending and receiving large numbers of small packets. Ideally, keep each packet around 1400 bytes; at the very least, use pipelining.

5. Other small tricks that are easy to find on the Internet.
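Items 2 and 4 above (hashing keys across instances, and grouping small writes into packets of roughly 1400 bytes) might look something like this minimal Python sketch. The CRC32 hash, the function names, and the greedy batching rule are assumptions for illustration, not what our client library actually does. The byte counts here cover payloads only; the real size on the wire also includes protocol overhead.

```python
import zlib

def shard_for_key(key: str, instance_count: int) -> int:
    """Pick an instance index from a stable hash of the key (CRC32, an assumption)."""
    return zlib.crc32(key.encode("utf-8")) % instance_count

def batch_by_size(payloads, max_bytes=1400):
    """Greedily group byte payloads so each batch totals at most max_bytes.

    A single payload larger than max_bytes gets a batch of its own.
    """
    batches, current, size = [], [], 0
    for payload in payloads:
        if current and size + len(payload) > max_bytes:
            batches.append(current)
            current, size = [], 0
        current.append(payload)
        size += len(payload)
    if current:
        batches.append(current)
    return batches

# Route keys to one of 32 instances, then batch the values bound for each one.
keys = ["order:1001", "order:1002", "order:1003"]
by_instance = {}
for key in keys:
    by_instance.setdefault(shard_for_key(key, 32), []).append(key)

print(by_instance)
print([len(batch) for batch in batch_by_size([b"a" * 600] * 5)])
```

Each batch would then be sent to its instance in one pipelined round trip. The 1400-byte target is presumably chosen so that a batch fits in a single Ethernet frame (the typical MTU is 1500 bytes), though that reasoning is my reading rather than something measured here.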

 

Why not use a database?

After extensive verification on the same 32-core servers, the major database products generally reach a query speed of about 20,000 qps and a write speed of nearly 10,000 tps. That is for a single table with millions of rows and two indexes. If the data grows to tens or hundreds of millions of rows, with two more indexes and reads and writes happening at the same time, the speed drops to less than a quarter of that. In a word: miserable!

In big data analysis, much of the data is temporary and needs to be merged, accumulated, deduplicated, and so on. Its life cycle is short, usually 24 or 48 hours, and often only two or three hours. The catch is that the volume is still very large: tens of millions of records a day is common. Writing this kind of data to a database is very inappropriate.

With Redis, a single machine with 32 cores and 512 GB of memory can hold a month of compressed historical data, billions of records, with resource usage at around 50%.
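To put those numbers next to the rerun problem, here is a rough back-of-the-envelope calculation in Python. The inputs are assumptions: 100 million lines per day (the low end of "hundreds of millions"), one operation per line, and one week of history. Only the 20,000 qps and 2.2 million ops figures come from the tests described in this article.

```python
def rerun_seconds(lines_per_day, days, ops_per_second):
    """Seconds needed to reprocess `days` of history at a given throughput."""
    return lines_per_day * days / ops_per_second

LINES_PER_DAY = 100_000_000  # assumption: low end of "hundreds of millions"
DAYS = 7                     # one week of history

for label, ops in [("database at 20,000 qps", 20_000),
                   ("Redis at 2,200,000 ops", 2_200_000)]:
    seconds = rerun_seconds(LINES_PER_DAY, DAYS, ops)
    print(f"{label}: {seconds:,.0f} s ({seconds / 3600:.2f} h)")
```

Under these assumptions, a one-week rerun takes close to ten hours at database speed and a little over five minutes at the Redis stress-test speed. Real jobs need more than one operation per line, so treat this as a ratio rather than a prediction.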
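For short-lived data like this, a cache can simply let each key expire on its own. Below is a minimal Python sketch, assuming a client whose `set` method accepts an `ex` expiry in seconds (the redis-py client's `set` has this parameter). The helper name, the key, and the `RecordingClient` stand-in are hypothetical and exist only so the sketch runs without a server.

```python
class RecordingClient:
    """Stand-in for a Redis client (assumption): records calls instead of sending them."""

    def __init__(self):
        self.calls = []

    def set(self, name, value, ex=None):
        self.calls.append((name, value, ex))
        return True

def save_temporary(client, key, value, lifetime_hours=24):
    """Store an intermediate result that the server should discard after its lifetime."""
    ttl_seconds = int(lifetime_hours * 3600)
    client.set(key, value, ex=ttl_seconds)
    return ttl_seconds

client = RecordingClient()
save_temporary(client, "tmp:dedupe:2017-11-11", b"merged-block")           # 24 hours
save_temporary(client, "tmp:partial:10:05", b"sum=42", lifetime_hours=2)   # 2 hours
print(client.calls)
```

With expiry set at write time, there is no cleanup job to run and nothing to delete by hand, which is part of why this kind of data is a poor fit for a database table.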

 

I am Big Rock, at this since 1999, a code farmer with 18 years of experience. I currently work on data analysis architecture in the logistics industry. Welcome to C# Big Data.

