JMS Learning, Part 8 (ActiveMQ Message Persistence)

ActiveMQ supports several message persistence mechanisms: JDBC, AMQ, KahaDB, and LevelDB. It also offers in-memory storage. However, memory is not persistence, and if all you need is an in-memory queue, a more suitable product such as ZeroMQ is worth considering. Memory storage is therefore outside the scope of this article.

Whichever persistence method is used, the logic for storing messages is the same.

Messages are sent either to a Queue or to a Topic. A Queue provides point-to-point consumption: each message the sender sends is consumed by exactly one consumer.

A Topic provides publish/subscribe consumption: one message can be consumed by many subscribers. Subscribers are either durable (persistent) or non-durable. With a durable subscription, messages that the sender delivers to the Broker after the subscription was created are kept even while the subscriber is offline, and they are delivered when it comes back online, so no messages are lost. A non-durable subscriber receives messages only while it is online. Messages the Broker receives while it is offline are not delivered to it when it reconnects.
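To make the difference concrete, here is a toy in-memory model in Python. It is not ActiveMQ code, and the class and method names are hypothetical. A queue hands each message to exactly one receiver, a durable topic subscriber gets the messages it missed while offline, and a non-durable one does not.

```python
from collections import deque


class MiniBroker:
    """Toy model of Queue vs. Topic delivery (illustration only, not ActiveMQ code)."""

    def __init__(self):
        self.queue = deque()   # point-to-point: each message is taken by one consumer
        self.durable = {}      # durable subscriber -> messages kept while it is offline
        self.online = set()    # subscribers currently connected
        self.received = {}     # subscriber -> messages actually delivered to it

    # ---- Queue ----
    def send_to_queue(self, msg):
        self.queue.append(msg)

    def receive_from_queue(self):
        # popleft() removes the message, so no other consumer can ever get it
        return self.queue.popleft() if self.queue else None

    # ---- Topic ----
    def subscribe(self, name, durable):
        self.online.add(name)
        self.received.setdefault(name, [])
        if durable:
            self.durable.setdefault(name, deque())

    def disconnect(self, name):
        self.online.discard(name)

    def reconnect(self, name):
        self.online.add(name)
        pending = self.durable.get(name)   # None for non-durable subscribers
        while pending:
            self.received[name].append(pending.popleft())

    def publish(self, msg):
        for name in self.received:
            if name in self.online:
                self.received[name].append(msg)
            elif name in self.durable:
                self.durable[name].append(msg)   # stored until the subscriber returns
            # offline and non-durable: the message is simply missed
```

If both kinds of subscriber go offline while a message is published, only the durable one sees it after reconnecting.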

With persistence enabled, Queue messages are stored in the Broker and deleted as soon as one consumer has consumed them. For Topic messages, the Broker may not delete a message immediately even after all subscribers have consumed it. Instead it keeps the delivery history and removes messages that are no longer needed asynchronously. The Broker records, for each subscriber, the offset of the last message it consumed, so that messages are not delivered to it again next time. Because messages are consumed in order (first in, first out), recording the position of the last consumed message is enough.
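A minimal sketch of the offset idea, assuming messages get increasing integer IDs: each subscriber stores only the ID of the last message it acknowledged, and a separate cleanup pass (asynchronous in a real Broker) drops the messages every subscriber has already passed. TopicStore and its methods are made up for illustration.

```python
class TopicStore:
    """Toy topic store: one offset (last acknowledged ID) per durable subscriber."""

    def __init__(self):
        self.messages = {}     # message ID -> body
        self.last_acked = {}   # subscriber -> last acknowledged message ID
        self.next_id = 1

    def add_subscriber(self, name):
        # a new subscriber starts after the newest existing message
        self.last_acked[name] = self.next_id - 1

    def publish(self, body):
        self.messages[self.next_id] = body
        self.next_id += 1

    def pending(self, name):
        # FIFO order means "everything after the offset" is still unconsumed
        start = self.last_acked[name]
        return [body for i, body in sorted(self.messages.items()) if i > start]

    def ack(self, name, msg_id):
        self.last_acked[name] = max(self.last_acked[name], msg_id)

    def cleanup(self):
        """Drop messages every subscriber has acknowledged; return how many were removed."""
        if not self.last_acked:
            return 0
        low_water = min(self.last_acked.values())
        done = [i for i in self.messages if i <= low_water]
        for i in done:
            del self.messages[i]
        return len(done)
```

Storing a single number per subscriber, rather than a per-message flag, is what keeps the bookkeeping cheap.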

To configure persistence, edit the %ACTIVEMQ_HOME%/conf/activemq.xml file.

The following describes the characteristics of several persistence methods:

JDBC: Many enterprise applications prefer this storage method. Most companies have dedicated DBAs, so using a database as the storage medium puts teams with that expertise at ease. With database storage you can also see exactly how messages are stored: you can query the consumption status of messages with SQL and inspect message content, which the other persistence methods do not offer. Another advantage is that the database supports strongly consistent transactions as well as distributed transactions with two-phase commit. The disadvantage is performance: database persistence is the slowest of all the methods.

The database approach is introduced first because its table structure makes it easy to understand how ActiveMQ stores and consumes messages.

ActiveMQ creates three tables in the database: activemq_msgs, activemq_acks and activemq_lock.

activemq_msgs stores the messages. Both Queue and Topic messages go into this table.

The main database fields are described below:

ID: auto-incrementing database primary key

CONTAINER: Destination of the message

MSGID_PROD: the ID of the producer (sending client) that sent the message

MSGID_SEQ: the sequence number of the message within its producer, i.e. the order in which the producer sent it; MSGID_PROD + MSGID_SEQ together form the JMS MessageID

EXPIRATION: the expiration time of the message, stored as milliseconds since 1970-01-01 (the Unix epoch)

MSG: the serialized binary data of the message itself

PRIORITY: Priority, from 0-9, the higher the value, the higher the priority
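To see how these columns work together, here is a sketch using Python's built-in sqlite3 module with a simplified, hypothetical version of the table. The column types are my assumptions, and the sequence column is named MSGID_SEQ as in ActiveMQ's DDL, as far as I know. It stores the expiration as epoch milliseconds (0 meaning the message never expires, as in JMS), builds the MessageID from the producer ID plus the sequence number, and fetches unexpired messages by priority, using the JMS default priority of 4.

```python
import sqlite3
import time

# Simplified, hypothetical activemq_msgs table (column types are guesses).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE activemq_msgs (
    ID INTEGER PRIMARY KEY,  -- increasing ID, also gives send order
    CONTAINER TEXT,          -- destination, e.g. 'queue://orders'
    MSGID_PROD TEXT,         -- producer ID
    MSGID_SEQ INTEGER,       -- producer's sequence number
    EXPIRATION INTEGER,      -- epoch milliseconds, 0 = never expires
    MSG BLOB,                -- serialized message
    PRIORITY INTEGER)""")


def send(container, producer, seq, body, ttl_ms=0, priority=4, now_ms=None):
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    expiration = now_ms + ttl_ms if ttl_ms > 0 else 0
    conn.execute(
        "INSERT INTO activemq_msgs (CONTAINER, MSGID_PROD, MSGID_SEQ, EXPIRATION, MSG, PRIORITY) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (container, producer, seq, expiration, body, priority))


def deliverable(container, now_ms):
    """(MessageID, body) of unexpired messages: highest priority first, then send order."""
    rows = conn.execute(
        "SELECT MSGID_PROD || ':' || MSGID_SEQ, MSG FROM activemq_msgs "
        "WHERE CONTAINER = ? AND (EXPIRATION = 0 OR EXPIRATION > ?) "
        "ORDER BY PRIORITY DESC, ID",
        (container, now_ms))
    return rows.fetchall()
```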

activemq_acks stores subscription relationships: for durable Topic subscriptions, the relationship between each subscriber and the Broker is stored in this table.

 

The main database fields are as follows:

CONTAINER: Destination of the message

SUB_DEST: If you are using a Static cluster, this field will contain information about other systems in the cluster

CLIENT_ID: the client ID; each subscriber must have a unique one so that subscribers can be told apart

SUB_NAME: Subscriber name

SELECTOR: a message selector, so that the subscriber consumes only messages that match a condition. Conditions are written against custom message properties and can combine several properties with AND and OR

LAST_ACKED_ID: the ID of the last message the subscriber has acknowledged.
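Because LAST_ACKED_ID is just a message ID, the question "how many messages has each durable subscriber not consumed yet?" becomes a simple join. The sketch below uses sqlite3 and stripped-down, hypothetical versions of both tables. It is meant to show the idea, not to be run against a real ActiveMQ schema.

```python
import sqlite3

# Hypothetical, simplified tables with just enough columns for the query.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE activemq_msgs (ID INTEGER PRIMARY KEY, CONTAINER TEXT, MSG BLOB);
CREATE TABLE activemq_acks (
    CONTAINER TEXT, CLIENT_ID TEXT, SUB_NAME TEXT,
    SELECTOR TEXT, LAST_ACKED_ID INTEGER,
    PRIMARY KEY (CONTAINER, CLIENT_ID, SUB_NAME));
""")


def publish(topic, body):
    conn.execute("INSERT INTO activemq_msgs (CONTAINER, MSG) VALUES (?, ?)", (topic, body))


def subscribe(topic, client_id, sub_name):
    # Start after the newest message already stored for this topic.
    conn.execute(
        "INSERT INTO activemq_acks (CONTAINER, CLIENT_ID, SUB_NAME, LAST_ACKED_ID) "
        "SELECT ?, ?, ?, COALESCE(MAX(ID), 0) FROM activemq_msgs WHERE CONTAINER = ?",
        (topic, client_id, sub_name, topic))


def ack(topic, client_id, sub_name, msg_id):
    conn.execute(
        "UPDATE activemq_acks SET LAST_ACKED_ID = ? "
        "WHERE CONTAINER = ? AND CLIENT_ID = ? AND SUB_NAME = ?",
        (msg_id, topic, client_id, sub_name))


def backlog():
    """(CLIENT_ID, SUB_NAME, unconsumed message count) for every durable subscription."""
    return conn.execute("""
        SELECT a.CLIENT_ID, a.SUB_NAME, COUNT(m.ID)
        FROM activemq_acks a
        LEFT JOIN activemq_msgs m
          ON m.CONTAINER = a.CONTAINER AND m.ID > a.LAST_ACKED_ID
        GROUP BY a.CLIENT_ID, a.SUB_NAME
        ORDER BY a.CLIENT_ID, a.SUB_NAME""").fetchall()
```

The LEFT JOIN keeps subscriptions with nothing pending, so they show up with a count of 0.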

The activemq_lock table is only used in a cluster (master/slave) environment. Only one Broker, called the Master Broker, can take messages. The others act as backups and wait until the Master Broker becomes unavailable, at which point one of them becomes the next Master Broker. This table records which Broker is currently the Master Broker.
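One way to picture how a single table can decide which Broker is master is a lease. Each Broker periodically runs one atomic UPDATE that succeeds only if the lock is free, already its own, or expired. The sketch below uses sqlite3. The LEASE_UNTIL column, the lease length and the function are hypothetical, and this illustrates the idea rather than ActiveMQ's locker implementation.

```python
import sqlite3

# Hypothetical lock table: a single row naming the master and when its lease ends.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE activemq_lock (ID INTEGER PRIMARY KEY, BROKER_NAME TEXT, LEASE_UNTIL INTEGER)")
conn.execute("INSERT INTO activemq_lock VALUES (1, NULL, 0)")

LEASE_MS = 10_000  # assumed lease length


def try_become_master(broker, now_ms):
    """Take or renew the lock in one atomic UPDATE; True means this Broker is master."""
    cur = conn.execute(
        "UPDATE activemq_lock SET BROKER_NAME = ?, LEASE_UNTIL = ? "
        "WHERE ID = 1 AND (BROKER_NAME IS NULL OR BROKER_NAME = ? OR LEASE_UNTIL < ?)",
        (broker, now_ms + LEASE_MS, broker, now_ms))
    conn.commit()
    return cur.rowcount == 1
```

Each Broker would call this on a timer. The master keeps renewing its lease, and a backup only wins once the master has stopped renewing long enough for the lease to run out.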

The configuration is as follows:

<beans>  
    <broker brokerName="test-broker" persistent="true" xmlns="http://activemq.apache.org/schema/core">  
        <persistenceAdapter>  
            <jdbcPersistenceAdapter dataSource="#mysql-ds"/>   
        </persistenceAdapter>  
    </broker>  
      
    <bean id="mysql-ds" class="org.apache.commons.dbcp.BasicDataSource" destroy-method="close">  
        <property name="driverClassName" value="com.mysql.jdbc.Driver"/>   
        <property name="url" value="jdbc:mysql://localhost/activemq?relaxAutoCommit=true"/>   
        <property name="username" value="activemq"/>   
        <property name="password" value="activemq"/>   
        <property name="maxActive" value="200"/>   
        <property name="poolPreparedStatements" value="true"/>   
    </bean>  
</beans>  

First define a MySQL data source named mysql-ds, then configure jdbcPersistenceAdapter inside the persistenceAdapter node and reference the data source just defined (the # prefix in "#mysql-ds" refers to the bean id).

AMQ: Faster than JDBC. Messages are written to a journal (log) file, and because the writes are sequential, performance is very high. To improve performance further, AMQ builds an index on the message primary key and provides a caching mechanism. Each log file has a size limit (32 MB by default, configurable). When a file exceeds this size, a new file is created. When all messages in a file have been consumed, the file is deleted or archived, depending on configuration. The main disadvantage is that the AMQ Message Store creates an index for each Destination, so if a large number of Queues are used, the index files take up a lot of disk space. And because the index is so large, rebuilding it after a Broker crash is very slow.
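The journal idea (append-only files with a size cap, plus an index from message ID to file position) can be sketched in a few lines of Python. This is a simplified illustration, not the AMQ store's file format. The data-<n>.log file names and the class are hypothetical.

```python
import os
import tempfile


class RollingJournal:
    """Append-only journal split into size-limited files, with an in-memory
    index from message ID to (file number, offset, length)."""

    def __init__(self, directory=None, max_file_length=32 * 1024 * 1024):
        # 32 MB mirrors the default file size mentioned above
        self.directory = directory or tempfile.mkdtemp(prefix="journal-")
        self.max_file_length = max_file_length
        self.file_no = 1
        self.index = {}

    def _path(self, n):
        return os.path.join(self.directory, f"data-{n}.log")

    def append(self, msg_id, payload: bytes):
        path = self._path(self.file_no)
        size = os.path.getsize(path) if os.path.exists(path) else 0
        if size > 0 and size + len(payload) > self.max_file_length:
            self.file_no += 1                  # roll over to a new file
            path = self._path(self.file_no)
            size = 0
        with open(path, "ab") as f:            # sequential append only
            f.write(payload)
        self.index[msg_id] = (self.file_no, size, len(payload))

    def read(self, msg_id):
        n, offset, length = self.index[msg_id]  # the index avoids scanning the files
        with open(self._path(n), "rb") as f:
            f.seek(offset)
            return f.read(length)
```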

The configuration snippet is as follows:

<persistenceAdapter>  
     <amqPersistenceAdapter directory="${activemq.data}/activemq-data" maxFileLength="32mb"/>  
</persistenceAdapter>  

Although AMQ performs slightly better than KahaDB, it is no longer recommended, because rebuilding the index takes too long and the index files take up too much disk space. I will not cover the data structures of AMQ persistence in detail here. AMQ has been removed from newer versions of ActiveMQ.

KahaDB: The default persistence plugin since ActiveMQ 5.4. KahaDB recovers much faster than its predecessor AMQ and uses fewer data files, so it can fully replace AMQ. Its persistence mechanism is very similar to AMQ's: it is also based on log files, indexes and caching. Unlike AMQ, all KahaDB Destinations share a single index file. "ActiveMQ in Action" states that it can support 10,000 connections, each with its own Queue, which is enough for most application scenarios.

 

Data logs store the message journal, and the full content of every message is kept in them. As with AMQ, when a data log file exceeds the configured maximum size, a new file is created. Writes always append to the end of the file, so write performance is very high. The data logs keep a reference count of the messages in each file, so when no message in a file is needed any longer, the system automatically deletes the file or moves it to an archive folder.

The cache holds messages for online consumers. If consumers are consuming quickly, these messages do not need to be written to disk at all.

The BTree index is built on MessageID so that messages can be found quickly. It also maintains the relationship between durable subscribers and Destinations, as well as a pointer to the last message each consumer has consumed.

The redo log is used to rebuild the BTree index after a crash. Because of the redo log, rebuilding the index does not require reading all of the data logs, which makes recovery much faster.
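The reference-counting rule for deleting data log files can be modelled like this. It is a toy sketch with hypothetical names: each file counts how many stored messages still live in it, and a file is deleted (or archived) once the count reaches zero. Keeping the file currently being appended to is my own assumption for the sketch.

```python
from collections import defaultdict


class DataLogGC:
    """Tracks, per data log file, how many stored messages still need it."""

    def __init__(self, archive=False):
        self.refs = defaultdict(int)   # file number -> live message count
        self.location = {}             # message ID -> file number
        self.current_file = 1          # file currently being appended to
        self.archive = archive

    def store(self, msg_id, file_no):
        self.location[msg_id] = file_no
        self.refs[file_no] += 1
        self.current_file = max(self.current_file, file_no)

    def consumed(self, msg_id):
        self.refs[self.location.pop(msg_id)] -= 1

    def collect(self):
        """Delete or archive unreferenced files; return what was removed in this pass."""
        removed = []
        for file_no in sorted(self.refs):   # sorted() copies the keys, so deleting is safe
            if self.refs[file_no] == 0 and file_no != self.current_file:
                del self.refs[file_no]
                removed.append((file_no, "archived" if self.archive else "deleted"))
        return removed
```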
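A toy model of this cache behaviour, assuming writes are queued and flushed to disk later: if an online consumer acknowledges a message before the flush, the write is simply dropped. The names are hypothetical and the "disk" is a dict.

```python
class CachedStore:
    """Write-behind cache: a message acked before the flush never reaches the disk."""

    def __init__(self):
        self.pending_writes = {}   # message ID -> body, waiting to be written
        self.disk = {}             # stands in for the data logs

    def send(self, msg_id, body):
        # the message can be dispatched to online consumers straight from here
        self.pending_writes[msg_id] = body

    def ack(self, msg_id):
        if msg_id in self.pending_writes:
            del self.pending_writes[msg_id]   # consumed before being written: skip the write
        else:
            self.disk.pop(msg_id, None)       # already on disk: remove it there

    def flush(self):
        """Run periodically (asynchronously in a real broker); returns messages written."""
        written = len(self.pending_writes)
        self.disk.update(self.pending_writes)
        self.pending_writes.clear()
        return written
```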
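Python has no built-in BTree, so the sketch below uses a sorted list with the bisect module as a stand-in. Keys are (destination, sequence) pairs, so one index can serve every destination, which mirrors the single shared index. Each subscription keeps a cursor to the last message it consumed. The class is hypothetical.

```python
import bisect


class SequenceIndex:
    """Sorted index from (destination, sequence) to a data-log location,
    plus one cursor per durable subscription."""

    def __init__(self):
        self.keys = []        # sorted (destination, sequence) pairs
        self.locations = {}   # key -> (file number, offset)
        self.cursors = {}     # subscription -> (destination, last consumed sequence)

    def add(self, destination, seq, location):
        key = (destination, seq)
        bisect.insort(self.keys, key)
        self.locations[key] = location

    def subscribe(self, subscription, destination, last_seq=0):
        self.cursors[subscription] = (destination, last_seq)

    def next_for(self, subscription):
        """(sequence, location) of the next message for this subscription, or None."""
        destination, last = self.cursors[subscription]
        i = bisect.bisect_right(self.keys, (destination, last))   # binary search, no scan
        if i == len(self.keys) or self.keys[i][0] != destination:
            return None
        key = self.keys[i]
        self.cursors[subscription] = key   # advance the consumer pointer
        return key[1], self.locations[key]
```

bisect.insort is O(n) per insert because the list has to shift. A real BTree keeps both inserts and lookups logarithmic, which is the point of using one.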
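The recovery idea can be sketched as a checkpointed index plus a log of index changes made since the checkpoint. After a crash, load the checkpoint and replay the redo entries, without touching the data logs. This is a simplified model of the description above with hypothetical names, not KahaDB's actual file format. JSON stands in for the on-disk snapshot, so locations are [file, offset] lists.

```python
import json


class CheckpointedIndex:
    """In-memory index + durable snapshot + redo log of changes since the snapshot."""

    def __init__(self):
        self.index = {}          # live in-memory index (lost on crash)
        self.checkpoint = "{}"   # durable snapshot of the index
        self.redo = []           # durable list of index changes since the snapshot

    def put(self, key, location):
        self.redo.append(("put", key, location))   # record the change first
        self.index[key] = location

    def remove(self, key):
        self.redo.append(("remove", key, None))
        self.index.pop(key, None)

    def take_checkpoint(self):
        self.checkpoint = json.dumps(self.index)
        self.redo.clear()        # older entries are now covered by the snapshot

    def recover(self):
        """Rebuild the index from the last snapshot plus the redo entries."""
        index = json.loads(self.checkpoint)
        for op, key, location in self.redo:
            if op == "put":
                index[key] = location
            else:
                index.pop(key, None)
        self.index = index
        return index
```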

Directory structure of KahaDB:

 

The data log files are named db-<Number>.log. The archive directory holds archived data logs. db.data and db.redo are the BTree index and the redo log, respectively.

Because KahaDB is ActiveMQ's default persistence mechanism, it works without changing the configuration file, but here is the configuration snippet anyway:

<persistenceAdapter>   
    <kahaDB directory="${activemq.data}/activemq-data" journalMaxFileLength="16mb"/>   
</persistenceAdapter>

 

LevelDB: ActiveMQ 5.6 introduced the LevelDB persistence engine, which persists messages faster than KahaDB. Although the default persistence method is still KahaDB, LevelDB is the future trend. In addition, ActiveMQ 5.9 provides data replication based on LevelDB and ZooKeeper, which is the preferred replication scheme for master/slave deployments. LevelDB uses its own index structure instead of the commonly used BTree index.

 

LevelDB consists of six main parts: MemTable and ImmutableMemTable in memory, and the log file, manifest file, current file and SSTable files on disk. There are some other auxiliary files, which are not covered here.

Every write goes to the log file and to the MemTable, so it costs only one sequential disk write and one memory write. The log file is written first and the MemTable second, so if the system crashes, the data can be recovered from the log file and nothing is lost.

When the MemTable reaches its memory threshold, LevelDB creates a new MemTable and a new log file. The old MemTable becomes an ImmutableMemTable, whose content is read-only, and the system then writes its data to a new SSTable file asynchronously in the background.

SSTable files hold the same kind of data as MemTable and ImmutableMemTable: key/value pairs sorted by key.

The manifest file records the smallest and largest key of each SSTable, which is somewhat like a B-tree index. New manifest files are created over time and old ones are no longer used. The current file specifies which manifest file is currently in use.
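The write path (log first, then MemTable, replay the log after a crash) in a minimal Python sketch. Assumptions: a dict plays the MemTable (LevelDB itself uses a skip list), keys and values contain no tabs or newlines, and the log file name is illustrative.

```python
import os
import tempfile


class MiniLSMWriter:
    """Log-then-MemTable write path with crash recovery by log replay (toy model)."""

    def __init__(self, directory=None):
        self.directory = directory or tempfile.mkdtemp(prefix="lsm-")
        self.log_path = os.path.join(self.directory, "000001.log")
        self.memtable = {}

    def put(self, key, value):
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(f"{key}\t{value}\n")   # 1) one sequential disk write
            log.flush()
            os.fsync(log.fileno())          # make the log entry durable before returning
        self.memtable[key] = value          # 2) one memory write

    def get(self, key):
        return self.memtable.get(key)

    def recover(self):
        """Rebuild the MemTable after a crash by replaying the log in order."""
        self.memtable = {}
        if os.path.exists(self.log_path):
            with open(self.log_path, encoding="utf-8") as log:
                for line in log:
                    key, value = line.rstrip("\n").split("\t", 1)
                    self.memtable[key] = value   # later writes overwrite earlier ones
        return self.memtable
```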
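The flush and lookup logic can be shown with an in-memory sketch. A full MemTable is frozen and written out as a key-sorted SSTable, and its smallest and largest keys go into the manifest. Reads check the MemTable, then the ImmutableMemTable, then the SSTables from newest to oldest, skipping any whose key range cannot contain the key. Compaction between levels is not modelled, and the class is hypothetical.

```python
import bisect


class MiniLSM:
    """MemTable -> ImmutableMemTable -> SSTable, with a manifest of key ranges."""

    def __init__(self, memtable_limit=2):
        self.memtable_limit = memtable_limit
        self.memtable = {}
        self.immutable = None
        self.sstables = []   # each one a key-sorted list of (key, value); newest last
        self.manifest = []   # (smallest key, largest key) per SSTable, same order

    def put(self, key, value):
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self.immutable, self.memtable = self.memtable, {}   # freeze, start a new one
            self.flush()   # LevelDB does this in the background

    def flush(self):
        table = sorted(self.immutable.items())   # sorted by key
        self.sstables.append(table)
        self.manifest.append((table[0][0], table[-1][0]))
        self.immutable = None

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        if self.immutable is not None and key in self.immutable:
            return self.immutable[key]
        # newest first; the manifest ranges let us skip tables that cannot hold the key
        for table, (lo, hi) in zip(reversed(self.sstables), reversed(self.manifest)):
            if lo <= key <= hi:
                keys = [k for k, _ in table]
                i = bisect.bisect_left(keys, key)
                if i < len(keys) and keys[i] == key:
                    return table[i][1]
        return None
```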

For more specific implementation principles, please refer to: http://www.cnblogs.com/haippy/archive/2011/12/04/2276064.html

The configuration snippet is as follows:

<persistenceAdapter>  
    <levelDB directory="${activemq.data}/activemq-data"/>  
</persistenceAdapter>  

In ActiveMQ 5.10, the current version at the time of writing, using LevelDB directly makes the service fail to start with java.io.IOException: com.google.common.base.Objects.firstNonNull(Ljava/lang/Object;Ljava/lang/Object;)Ljava/lang/Object;

The cause is a version conflict between two copies of Guava on the classpath. The fix is:

1. Delete pax-url-aether-1.5.2.jar from %ACTIVEMQ_HOME%/lib

2. Comment out the following lines in %ACTIVEMQ_HOME%/conf/activemq.xml:

<bean id="logQuery" class="org.fusesource.insight.log.log4j.Log4jLogQuery"  
lazy-init="false" scope="singleton"  
init-method="start" destroy-method="stop">  
</bean>  

The bug is tracked at https://issues.apache.org/jira/browse/AMQ-5225; hopefully it will be fixed in the next release.
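To check which JARs actually carry the class named in the error, you can scan the lib directory. The sketch below is a hypothetical helper using only the Python standard library. It lists every JAR that contains com/google/common/base/Objects.class, and more than one hit means more than one copy of that class on the classpath.

```python
import os
import zipfile


def jars_containing(lib_dir, class_entry="com/google/common/base/Objects.class"):
    """Names (relative to lib_dir) of all JARs under lib_dir that contain class_entry."""
    hits = []
    for root, _dirs, files in os.walk(lib_dir):
        for name in files:
            if not name.endswith(".jar"):
                continue
            path = os.path.join(root, name)
            try:
                with zipfile.ZipFile(path) as jar:
                    if class_entry in jar.namelist():
                        hits.append(os.path.relpath(path, lib_dir))
            except zipfile.BadZipFile:
                pass   # not a valid archive; ignore it
    return sorted(hits)
```

For example, jars_containing(os.path.join(os.environ["ACTIVEMQ_HOME"], "lib")) lists the candidates before you decide which JAR to remove.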

 

Below are the results of a performance test on my machine. The absolute numbers mean little, because every environment is configured differently, but they do show how the persistence methods compare with each other.

Store        Send 1,000 (ms)  Send 10,000 (ms)  Consume 1,000 (ms)  Consume 10,000 (ms)
JDBC-MySQL   43009            369802            610                 509338
KahaDB       34227            360493            208                 2224
LevelDB      34032            347712            220                 2877

The table shows that LevelDB sends messages fastest, KahaDB is slightly slower, and JDBC is the slowest, though not dramatically so: all three are within the same order of magnitude. For consumption, KahaDB is the fastest, LevelDB is slightly slower, and JDBC is unbearably slow: consuming 10,000 messages takes more than two orders of magnitude longer than with KahaDB. LevelDB shows no significant speed advantage over KahaDB. However, since LevelDB supports highly available data replication, it is definitely the first choice.
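These comparisons are easy to check with a few lines of arithmetic. The sketch below divides each time by the fastest time in its column, using only the numbers in the table; the dictionary keys are my own labels.

```python
# Timings from the performance table, in milliseconds.
results = {
    "JDBC-MySQL": {"send_1k": 43009, "send_10k": 369802, "consume_1k": 610, "consume_10k": 509338},
    "KahaDB":     {"send_1k": 34227, "send_10k": 360493, "consume_1k": 208, "consume_10k": 2224},
    "LevelDB":    {"send_1k": 34032, "send_10k": 347712, "consume_1k": 220, "consume_10k": 2877},
}


def slowdown(metric):
    """Each store's time divided by the fastest store's time for one metric."""
    best = min(r[metric] for r in results.values())
    return {name: round(r[metric] / best, 2) for name, r in results.items()}


for metric in ["send_1k", "send_10k", "consume_1k", "consume_10k"]:
    print(metric, slowdown(metric))
```

Sending 10,000 messages, JDBC comes out only about 1.06 times slower than LevelDB, while consuming 10,000 messages it is about 229 times slower than KahaDB.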

 

The persistence schemes above have been covered in detail. Next, let's look at one more.

JDBC Message Store with ActiveMQ Journal

1. This method overcomes the shortcomings of the JDBC store by using a fast journal (cached write) technique, which greatly improves performance. The configuration is as follows:

<persistenceFactory>
    <journalPersistenceAdapterFactory journalLogFiles="2" journalLogFileSize="16" useJournal="true" useQuickJournal="true" dataSource="#mysql-ds" dataDirectory="${activemq.data}/data"/>
</persistenceFactory>

Everything else is the same as for the JDBC store.
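My understanding is that the journal sits in front of the database: writes go to fast journal files first, and a periodic checkpoint copies them to the JDBC store in batches, so a message that is consumed before the checkpoint may never reach the database at all. The sketch below illustrates that write-behind idea with sqlite3. The class and the count-based checkpoint trigger are hypothetical, not the adapter's real logic.

```python
import sqlite3


class JournaledJdbcStore:
    """Sends/acks are appended to a journal; a checkpoint applies them to the DB in one batch."""

    def __init__(self, checkpoint_every=3):
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE activemq_msgs (ID INTEGER PRIMARY KEY, MSG BLOB)")
        self.journal = []                 # ("add", id, body) or ("ack", id, None)
        self.checkpoint_every = checkpoint_every

    def _record(self, op, msg_id, body=None):
        self.journal.append((op, msg_id, body))   # cheap sequential append
        if len(self.journal) >= self.checkpoint_every:
            self.checkpoint()

    def send(self, msg_id, body):
        self._record("add", msg_id, body)

    def ack(self, msg_id):
        self._record("ack", msg_id)

    def checkpoint(self):
        acked = {i for op, i, _ in self.journal if op == "ack"}
        # messages added and acked within the same batch never touch the database
        adds = [(i, b) for op, i, b in self.journal if op == "add" and i not in acked]
        with self.db:                     # one transaction for the whole batch
            self.db.executemany("INSERT INTO activemq_msgs (ID, MSG) VALUES (?, ?)", adds)
            self.db.executemany("DELETE FROM activemq_msgs WHERE ID = ?", [(i,) for i in acked])
        self.journal.clear()

    def rows_in_db(self):
        return [r[0] for r in self.db.execute("SELECT ID FROM activemq_msgs ORDER BY ID")]
```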

 

Advantage: writes are faster than with the plain JDBC store

Disadvantage: cannot be used in master/slave mode

 

Note: If you use a database persistence scheme, remember to put the JDBC driver JAR for your database in ActiveMQ's lib folder!

Original address: https://my.oschina.net/u/719192/blog/287434
