Using Java serialization to implement fast file-based indexing

        Preface: This article introduces a fast file-based index built on Java serialization and deserialization. A project may face the following scenario: the business needs to quickly retrieve the record that meets some condition from tens of thousands of records, and those records change constantly. A packet capture tool, for example, receives different packets all the time. Storing such temporary, dynamic data in a database is not cost-effective, and introducing a caching tool such as Memcached feels like overkill. It is therefore worth considering storing the records directly in a local temporary file and adopting a storage protocol that speeds up indexing.

This article focuses on the following issues:

  • Define a data storage protocol to speed up indexing
  • Java uses serialization and deserialization to read and write objects
  • Convert between objects, byte arrays, and files
  • An integrated demo

1. Data storage protocol in files

       In object-oriented languages, data is commonly held in the form of objects. Serialization lets objects be stored directly in a file or database, and deserialization restores them, so the program does not have to parse the intermediate format itself. However, serialized output also carries class information, so the storage overhead is large. It is therefore important to build a fast index while avoiding unnecessary overhead.

       First, define the storage format of each record: index (4 bytes) + record length (4 bytes) + data (variable length).

  • The index is the unique identifier of the record. If the index is hit, the data part is returned; if it is not hit, the search continues with the next record. (The size of the index can be customized.)
  • The record length is the storage space occupied by the data part. Using this field, the program can skip over the data part of a record without reading it. (The size of the record length can be customized.)
  • The data part stores the data itself; different objects may occupy different amounts of space.
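The record layout and lookup described above can be sketched as follows. This is only a minimal illustration, not the article's own implementation: the class name RecordFileSketch and the methods append and find are hypothetical, the index and record length are assumed to be 4-byte big-endian integers (what DataOutputStream.writeInt produces), and the data part is treated as an opaque byte array.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;

// Hypothetical helper for the layout: index (4 bytes) + record length (4 bytes) + data.
final class RecordFileSketch {

    // Appends one record to the end of the file.
    static void append(File file, int index, byte[] data) throws IOException {
        try (DataOutputStream out = new DataOutputStream(new FileOutputStream(file, true))) {
            out.writeInt(index);        // 4-byte index
            out.writeInt(data.length);  // 4-byte length of the data part
            out.write(data);            // variable-length data part
        }
    }

    // Scans the records in order and returns the data part of the first record
    // whose index matches, or null if no record matches.
    static byte[] find(File file, int wanted) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            while (true) {
                int index;
                try {
                    index = in.readInt();
                } catch (EOFException endOfFile) {
                    return null; // no more records
                }
                int length = in.readInt();
                if (index == wanted) {
                    byte[] data = new byte[length];
                    in.readFully(data); // hit: read and return the data part
                    return data;
                }
                // Miss: use the record length to skip the data part.
                int remaining = length;
                while (remaining > 0) {
                    int skipped = in.skipBytes(remaining);
                    if (skipped <= 0) {
                        throw new EOFException("could not skip the rest of a record");
                    }
                    remaining -= skipped;
                }
            }
        }
    }
}
```

Because a miss costs only two integer reads and a skip, the data part of a non-matching record is never deserialized. The scan is still linear in the number of records.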
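     Algorithm: scan the file record by record. Read the index; on a hit, read the record length and return that many bytes of data; on a miss, read the record length, skip that many bytes, and move on to the next record.

2. Java uses serialization and deserialization to read and write objects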
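
As a minimal sketch of reading and writing objects through serialization, the code below converts an object to a byte array and back using the standard ObjectOutputStream and ObjectInputStream classes. The names SerializationSketch, toBytes, fromBytes, and the Packet record type are hypothetical and chosen only for illustration; the resulting byte array is the kind of value that could be stored as the data part of a record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper: converts between serializable objects and byte arrays.
final class SerializationSketch {

    // Serializes an object into a byte array.
    static byte[] toBytes(Serializable object) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(object);
        }
        return buffer.toByteArray();
    }

    // Deserializes a byte array produced by toBytes back into an object.
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }
}

// Hypothetical record type (an assumption, not taken from the article).
final class Packet implements Serializable {
    private static final long serialVersionUID = 1L;

    final int id;
    final String payload;

    Packet(int id, String payload) {
        this.id = id;
        this.payload = payload;
    }
}
```

A caller would cast the result of fromBytes to the expected type, for example (Packet) SerializationSketch.fromBytes(bytes). Deserializing data from an untrusted source is unsafe, so this sketch assumes the file is written and read only by the same program.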
