On the Method of Transaction Processing in a Big Data Warehouse

Li Wanhong 2020-10-25

 

 

This is the era of big data. Almost every good product depends on big data, especially in artificial intelligence. So what is hard about building a data warehouse and using the data in it? One difficulty is transactions. Big data systems are generally built for OLAP rather than OLTP and usually have no transaction management. In practice, however, there are scenarios in which data is updated in real time while it is being queried, so some form of transaction control is essential. This article presents a simple solution.

1. Locking principle and method

Redis's RedLock (used here through the Redisson Java client) provides distributed locks, which can solve the transaction problem in a big data warehouse simply and cleverly. For a big data warehouse built on Hive and ClickHouse, where data is read and written at the same time, RedLock can be used for transaction control: each operation takes a distributed lock, and the SQL statement is executed only while that lock is held. While data is being written, no one reads it, and vice versa.
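The lock itself rests on a small primitive. The following sketch mimics, in plain Java, the single-instance pattern described in the Redis documentation: acquire with "SET key token NX PX ttl" (set only if absent, with an expiry) and release only if the key still holds your own token, so one client can never delete another client's lock. It is an assumption-laden stand-in, not Redis or Redisson: the class LockPrimitiveSketch, its methods and the key dw_lock are hypothetical, the map emulates Redis in memory, and the current time is passed in explicitly so the behaviour is deterministic. Redisson's RLock implements its own variant of this with Lua scripts, and RedLock repeats the acquisition on several independent Redis masters.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical in-memory stand-in for a single-key Redis lock:
 * acquire = SET key token NX PX ttl, release = delete only if the value is still our token.
 * This is NOT Redis or Redisson; it only mimics the semantics for illustration.
 */
public class LockPrimitiveSketch {

    /** Value stored under a lock key: the owner's token and when its lease ends. */
    record Entry(String token, long expiresAtMillis) {}

    private final Map<String, Entry> store = new ConcurrentHashMap<>();

    /** Returns a fresh owner token on success, or null if someone else holds an unexpired lease. */
    public String tryAcquire(String key, long ttlMillis, long nowMillis) {
        String token = UUID.randomUUID().toString();
        Entry mine = new Entry(token, nowMillis + ttlMillis);
        // compute() is atomic per key in ConcurrentHashMap, playing the role of SET ... NX
        Entry result = store.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis() <= nowMillis) ? mine : current);
        return result == mine ? token : null;
    }

    /** Compare-and-delete: removes the key only if it is still owned by this token. */
    public boolean release(String key, String token) {
        Entry current = store.get(key);
        return current != null && current.token().equals(token) && store.remove(key, current);
    }

    public static void main(String[] args) {
        LockPrimitiveSketch redis = new LockPrimitiveSketch();
        long now = 0;
        String writer = redis.tryAcquire("dw_lock", 30_000, now);
        String reader = redis.tryAcquire("dw_lock", 30_000, now); // refused while the writer holds it
        System.out.println("writer got lock: " + (writer != null)); // true
        System.out.println("reader got lock: " + (reader != null)); // false
        System.out.println("writer released: " + redis.release("dw_lock", writer)); // true
        System.out.println("reader retry: " + (redis.tryAcquire("dw_lock", 30_000, now) != null)); // true
    }
}
```

The expiry matters: if a writer crashes while holding the lock, the lease runs out and other programs can proceed instead of waiting forever.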

For scenarios with modest transaction requirements, this locking approach is sufficient. After all, writes to a big data warehouse are infrequent and most operations are reads.

For example, after real-time ETL based on the database log, the data warehouse (DW) and the data mart (DM) need to be updated, while business applications are reading data from DM or DW at the same time. To handle this, the writing program adds a lock before writing to DW or DM and deletes the lock when it finishes. A reading program first tries to take the lock: if it gets it, it reads the data; otherwise it waits until the writing program deletes the lock, then takes the lock, completes the read, and deletes the lock. Writing works the same way. The writer must acquire the lock before writing; if a reader holds it, the writer waits until the read completes and the lock is deleted. After writing, the writer deletes the lock so that other programs can read or write.
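The read/write protocol in this example can be sketched as a helper that takes the lock, waits while another program holds it, runs the read or write, and always deletes the lock afterwards, even if the operation fails. To keep the sketch runnable without a Redis server, a ConcurrentHashMap stands in for the Redis lock key; the class GuardedWarehouseAccess, the method runLocked, the key dm_lock and the 10 ms polling interval are all hypothetical. In the Redisson code of the next section, tryLock does the waiting instead.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

public class GuardedWarehouseAccess {

    // Hypothetical in-process stand-in for the Redis lock keys (key -> owner token)
    private final ConcurrentMap<String, String> locks = new ConcurrentHashMap<>();

    /** Take the lock (waiting up to maxWaitMillis), run the read or write, then delete the lock. */
    public <T> T runLocked(String lockKey, long maxWaitMillis, Supplier<T> action)
            throws InterruptedException {
        String token = UUID.randomUUID().toString();
        long deadline = System.nanoTime() + maxWaitMillis * 1_000_000L;
        // Someone else is reading or writing: wait until they delete their lock, or give up
        while (locks.putIfAbsent(lockKey, token) != null) {
            if (System.nanoTime() >= deadline) {
                throw new IllegalStateException("timed out waiting for " + lockKey);
            }
            Thread.sleep(10);
        }
        try {
            return action.get();          // read or write DW/DM while holding the lock
        } finally {
            locks.remove(lockKey, token); // delete the lock only if we still own it
        }
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) throws Exception {
        GuardedWarehouseAccess access = new GuardedWarehouseAccess();
        AtomicReference<String> dm = new AtomicReference<>("v1"); // stands in for a DM table
        CountDownLatch writerHoldsLock = new CountDownLatch(1);

        Thread etlWriter = new Thread(() -> {
            try {
                access.runLocked("dm_lock", 1_000, () -> {
                    writerHoldsLock.countDown();
                    pause(100);           // simulate a slow ETL write
                    dm.set("v2");
                    return null;
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        etlWriter.start();
        writerHoldsLock.await();

        // The reader waits until the writer deletes the lock, so it never sees a half-finished write
        String seen = access.runLocked("dm_lock", 1_000, dm::get);
        System.out.println("reader saw " + seen); // prints "reader saw v2"
        etlWriter.join();
    }
}
```

Unlike a real Redis lock, this stand-in has no expiry, so a crashed holder would block everyone else; the lease time on a Redis lock exists precisely to avoid that.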

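2. Key code

The example uses the Redisson Java client. Strictly speaking, redisson.getLock on a single Redis server gives the single-instance Redis lock; the RedLock algorithm proper takes the same lock on a majority of independent Redis masters.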

import java.util.concurrent.TimeUnit;

import lombok.extern.slf4j.Slf4j;
import org.redisson.Redisson;
import org.redisson.api.RLock;
import org.redisson.api.RedissonClient;
import org.redisson.config.Config;

@Slf4j
public class RedLockDemo {

    public static void main(String[] args) {
        // Connect to Redis (single server)
        Config config = new Config();
        config.useSingleServer().setAddress("redis://127.0.0.1:6379");
        RedissonClient redisson = Redisson.create(config);
        log.info("Connected to Redis");

        // 1. Define the lock; every reader and writer of DW/DM must use the same lock name
        RLock lock = redisson.getLock("myTest001");

        try {
            // Maximum time to wait for the lock, in milliseconds
            long waitTime = 300L;
            // Lease time in milliseconds: the lock is released automatically after this,
            // so it must be longer than the read or write it protects
            long leaseTime = 30_000L;

            // 2. Acquire the lock
            if (lock.tryLock(waitTime, leaseTime, TimeUnit.MILLISECONDS)) {
                // 2.1 Lock acquired
                log.info("Lock acquired");
                // ... read or write DW / DM here
                log.info("Work done");
            } else {
                // 2.2 Lock not acquired within waitTime
                log.info("Failed to acquire the lock");
                log.info("Other processing");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            log.error("Interrupted while acquiring the distributed lock", e);
        } finally {
            // 3. Release the lock, but only if this thread still holds it
            if (lock.isHeldByCurrentThread()) {
                lock.unlock();
                log.info("Lock released");
            }
            // Close the connection
            redisson.shutdown();
            log.info("Redis connection closed");
        }
    }
}

3. When no lock is needed

When there is only reading and no writing, no lock is needed; the same holds when there is only writing and no reading. To take advantage of this, add a flag value, Flag, and store it in Redis. Before reading or writing, a program first reads Flag. A reader needs no lock if Flag = 1 (only reads are taking place); if Flag = 2, it must take the lock. A writer works the other way round: if Flag = 2 (only writes are taking place), no lock is needed; if Flag = 1, it must take the lock.

This can reduce the use of locks and improve efficiency.
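The Flag rule can be written as a small decision function. This sketch assumes, as described above, that Flag = 1 means only reads are happening and Flag = 2 means only writes are happening; the class LockDecision and its names are hypothetical, and how Flag is stored and updated in Redis is left out.

```java
/**
 * Hypothetical sketch of the Flag rule:
 *   Flag = 1 -> the warehouse is in a read-only phase
 *   Flag = 2 -> the warehouse is in a write-only phase
 */
public class LockDecision {

    enum Operation { READ, WRITE }

    static final int READ_ONLY_PHASE = 1;
    static final int WRITE_ONLY_PHASE = 2;

    /** Returns true if the operation must take the distributed lock for the given Flag value. */
    static boolean needsLock(Operation op, int flag) {
        return switch (op) {
            case READ -> flag != READ_ONLY_PHASE;   // reading while only reads happen: no lock
            case WRITE -> flag != WRITE_ONLY_PHASE; // writing while only writes happen: no lock
        };
    }

    public static void main(String[] args) {
        System.out.println(needsLock(Operation.READ, 1));  // false
        System.out.println(needsLock(Operation.READ, 2));  // true
        System.out.println(needsLock(Operation.WRITE, 2)); // false
        System.out.println(needsLock(Operation.WRITE, 1)); // true
    }
}
```

A caller would check needsLock first and only go through the lock when it returns true, which is where the saving in lock traffic comes from.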

 

In short, transactions in a big data warehouse cannot be ignored, and adding RedLock is a simple and clever solution that is worth trying.


Origin blog.csdn.net/qq_34231800/article/details/109273442