31. Batch import of local JSON files into Elasticsearch: a Java implementation (ES file synchronization)

Preface

For product development, we need to store data collected from the Internet in ES so that it can be searched with full-text retrieval.

Data collected from the Internet is often messy and needs to be cleaned first.

ES stores documents as JSON, so JSON-formatted data is relatively convenient to import.

This article mainly describes how to batch insert formatted JSON files into ES.

1. Work to be done in advance

1) Design the index and Mapping;

The purpose of the Mapping is mainly to set the field names, the field types, which fields need full-text retrieval, and so on.

2) Define a class in the Java program whose fields correspond one-to-one to the fields set in the Mapping.
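
As a minimal sketch of item 2), the class below mirrors a Mapping field by field. The field names (`id`, `title`, `url`) are hypothetical placeholders, since the article does not list the real fields; only the class name `ImageInfo` comes from the code later in this article. A plain bean with getters and setters like this is the shape that fastjson populates by default.

```java
// Hypothetical bean: the three fields are placeholders, not the article's real Mapping.
class ImageInfo {
    private String id;     // would correspond to an exact-match field in the Mapping
    private String title;  // would correspond to a full-text field in the Mapping
    private String url;    // would correspond to a stored, non-analyzed field

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }

    public String getUrl() { return url; }
    public void setUrl(String url) { this.url = url; }
}
```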

2. Breakdown of the batch import steps

Step 1: Format the local files and unify them as JSON.
Each data string to be imported is saved as a JSON file.

Step 2: Place it under the unified ./data path. The directory structure is as follows:

     ./data
                a_01.json
                a_02.json
                a_03.json
                ...
                a_100.json

Step 3: Loop over the ./data directory to obtain the full name (including the absolute path) of each file, and store the names in a LinkedList.

Step 4: Traverse each path in the LinkedList and read the JSON content.

Step 5: Use fastjson to parse the JSON into the matching fields of the class designed earlier.

Step 6: Use the bulk API to import the local files.
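
Steps 3 and 4 can be sketched with the standard library alone. The helper below is an assumption for illustration: the article's own `FileProcess` class is not shown, so `JsonFileLister`, `listJsonFiles`, and `readJson` are hypothetical names, and the sketch only looks at files directly under the directory (no recursion).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedList;

// Hypothetical helper; it stands in for the article's FileProcess, which is not shown.
class JsonFileLister {

    // Step 3: collect the absolute paths of all *.json files directly under dir.
    static LinkedList<String> listJsonFiles(String dir) throws IOException {
        LinkedList<String> paths = new LinkedList<>();
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(Paths.get(dir), "*.json")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    paths.add(p.toAbsolutePath().toString());
                }
            }
        }
        // Directory iteration order is unspecified, so sort for repeatable runs.
        Collections.sort(paths);
        return paths;
    }

    // Step 4: read the whole content of one file as a UTF-8 string.
    static String readJson(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }
}
```

The returned string is what Step 5 would hand to fastjson for parsing.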

3. Core interface implementation

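// Imports used by the two methods in this section.
import java.util.Arrays;
import java.util.LinkedList;

import io.searchbox.client.JestClient;
import io.searchbox.client.JestResult;
import io.searchbox.core.Bulk;
import io.searchbox.core.Index;

/*
**@brief: Traverse the JSON files and insert them into ES
**@param: none
**@return: none
*/
private static void insertBulkIndex() throws Exception {
    // Directory where the JSON files are stored
    final String JSONFILEINPUT = ESConfig.es_json_path;
    logger.info("path = " + JSONFILEINPUT);
    LinkedList<String> curJsonList = FileProcess.getJsonFilePath(JSONFILEINPUT);
    logger.info("size = " + curJsonList.size());

    // for-each avoids the O(n) cost of LinkedList.get(i) on every iteration
    for (String curJsonPath : curJsonList) {
        //System.out.println(" path = " + curJsonPath);
        ImageInfo curImageInfo = JsonParse.GetImageJson(curJsonPath);
        //JsonParse.PrintImageJson(curImageInfo);
        // Skip files for which no object was returned
        if (curImageInfo == null) {
            continue;
        }
        // Insert this record
        InsertIndex(curImageInfo);
    }
}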

/*
**@brief: Insert a single JSON record into ES (using the API wrapped by Jest)
**@param: imageInfo - the parsed record to index
**@return: none
*/
private static void InsertIndex(ImageInfo imageInfo) throws Exception {
    JestClient jestClient = JestExa.getJestClient();
    JsonParse.PrintImageJson(imageInfo);

    // A Bulk request that carries a single Index action for this record
    Bulk bulk = new Bulk.Builder()
            .defaultIndex("age_index")
            .defaultType("age_type")
            .addAction(Arrays.asList(
                    new Index.Builder(imageInfo).build()
            )).build();
    JestResult result = jestClient.execute(bulk);
    if (result.isSucceeded()) {
        System.out.println("insert success!");
    } else {
        System.out.println("insert failed");
    }
}
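
The listing above wraps each record in its own Bulk request, so it still costs one round trip per file. A common refinement is to group records and send one request per group. The sketch below shows only the grouping step, using the standard library; `BatchSplitter`, `partition`, and the batch size are illustrative names and values, not part of the article's code or of Jest.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for grouping records before sending them in bulk.
class BatchSplitter {

    // Split items into consecutive batches of at most batchSize elements each.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch could then be mapped to a list of `Index` actions and passed to a single `addAction(...)` call, the same call the listing above makes with a one-element list.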

The end result:

java -jar bulk_insert.jar ./data/

This loops over all the JSON files under ./data and imports them into ES.

4. Techniques used

1) File traversal
2) JSON parsing
3) ES batch insert operation
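
For item 3), it may help to see what a bulk request looks like on the wire. The `_bulk` endpoint of ES takes newline-delimited JSON: an action line followed by the document source, each terminated by a newline. The sketch below builds such a body by hand for documents that are already serialized; `BulkBody` is a hypothetical helper and the `{"age":...}` documents are made-up samples. In the article's code, Jest's `Bulk` builder does this work. The `_type` entry matches the type-based ES versions the article's code targets; newer ES versions drop types.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper that assembles a _bulk request body by hand.
class BulkBody {

    // One "index" action line plus one source line per document.
    // Each docJson must be a single-line JSON object, since newlines delimit entries.
    static String build(String index, String type, List<String> docJsons) {
        StringBuilder sb = new StringBuilder();
        for (String doc : docJsons) {
            sb.append("{\"index\":{\"_index\":\"").append(index)
              .append("\",\"_type\":\"").append(type).append("\"}}\n");
            // The bulk API requires every line, including the last, to end with a newline.
            sb.append(doc).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Sample documents are placeholders; index and type names come from the article.
        System.out.print(build("age_index", "age_type",
                Arrays.asList("{\"age\":30}", "{\"age\":41}")));
    }
}
```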

5. Pitfalls encountered

When exporting the program as a jar package, an error was reported while the jar was being generated.
The project was based on Jest's source code project, which is built with Maven.
Generating the jar package always produced:

"java.lang.ClassNotFoundException"

Initial diagnosis: the problem was caused by Maven; pom.xml was then modified, but the error remained.
Final solution: rebuild the project and re-import the code and the dependent jar packages.

Postscript

If you have any questions, feel free to ask and discuss!

