Preface
For product development, we need to store data collected from the Internet in Elasticsearch (ES) so that it can be searched with full-text retrieval.
Data collected from the Internet is often messy and needs to be cleaned first.
ES stores documents as JSON, so data in JSON format is relatively convenient to import.
This article explains how to batch-insert formatted JSON files into ES.
1. Preparation
1) Design the index and Mapping;
The main purpose of the Mapping is to define the field names, the field types, which fields need full-text retrieval, and so on.
2) Define a class in the Java program whose fields correspond one-to-one to the fields defined in the Mapping.
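As an illustration of item 2), a document class might look like the sketch below. The class name `ImageInfoSketch` and all three fields are hypothetical, since the actual Mapping is not shown in this article; the point is only that each field in the class mirrors one field of the Mapping.

```java
// Hypothetical document class: the class name and every field below are
// assumptions made for illustration, not the author's actual Mapping.
class ImageInfoSketch {
    private String title;    // would correspond to a field used for full-text retrieval
    private String url;      // would correspond to a field stored as-is
    private long timestamp;  // would correspond to a numeric or date field

    // A no-argument constructor plus getters/setters keeps the class a plain
    // JavaBean, which JSON libraries generally expect.
    public ImageInfoSketch() {
    }

    public String getTitle() {
        return title;
    }

    public void setTitle(String title) {
        this.title = title;
    }

    public String getUrl() {
        return url;
    }

    public void setUrl(String url) {
        this.url = url;
    }

    public long getTimestamp() {
        return timestamp;
    }

    public void setTimestamp(long timestamp) {
        this.timestamp = timestamp;
    }
}
```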
2. Batch import steps
Step 1: Format the local files and convert them all to JSON.
Each data record to be imported is saved as one JSON file.
Step 2: Place it under the unified ./data path. The directory structure is as follows:
./data
    a_01.json
    a_02.json
    a_03.json
    ...
    a_100.json
Step 3: Loop through the ./data directory, obtain the absolute path of each file, and store the paths in a LinkedList.
Step 4: Traverse each path in the LinkedList and read its JSON content.
Step 5: Use fastjson to parse the JSON into the matching fields of the class designed earlier.
Step 6: Import the local files with the help of the bulk (batch operation) API.
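Steps 3 and 4 can be sketched with the standard library alone. This is a minimal sketch, not the `FileProcess` helper used later in this article: the class name `JsonFileScanner` and its two methods are hypothetical, and it assumes the JSON files sit directly under the directory (no subdirectories) and are UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Collections;
import java.util.LinkedList;

// Hypothetical helper, for illustration only.
class JsonFileScanner {

    // Step 3: collect the absolute path of every *.json file directly under dir.
    static LinkedList<String> listJsonFiles(String dir) throws IOException {
        LinkedList<String> paths = new LinkedList<>();
        try (DirectoryStream<Path> stream =
                     Files.newDirectoryStream(Paths.get(dir), "*.json")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    paths.add(p.toAbsolutePath().normalize().toString());
                }
            }
        }
        // Directory iteration order is unspecified, so sort for repeatable runs.
        Collections.sort(paths);
        return paths;
    }

    // Step 4: read the whole content of one file as a UTF-8 string.
    static String readJson(String path) throws IOException {
        return new String(Files.readAllBytes(Paths.get(path)), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        String dir = args.length > 0 ? args[0] : "./data";
        for (String path : listJsonFiles(dir)) {
            System.out.println(path + " -> " + readJson(path).length() + " chars");
        }
    }
}
```

The string returned by `readJson` is what would then be handed to fastjson in Step 5.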
3. Core interface implementation
/*
 * @brief: Traverse the JSON files and batch-insert them into ES
 * @param: none
 * @return: none
 */
private static void insertBulkIndex() throws Exception {
    // Directory where the JSON files are stored
    final String JSONFILEINPUT = ESConfig.es_json_path;
    logger.info("path = " + JSONFILEINPUT);
    LinkedList<String> curJsonList = FileProcess.getJsonFilePath(JSONFILEINPUT);
    logger.info("size = " + curJsonList.size());
    for (int i = 0; i < curJsonList.size(); ++i) {
        //System.out.println(" i = " + i + " " + curJsonList.get(i));
        String curJsonPath = curJsonList.get(i);
        ImageInfo curImageInfo = JsonParse.GetImageJson(curJsonPath);
        //JsonParse.printImageJson(curImageInfo);
        // Skip files that could not be parsed
        if (curImageInfo == null) {
            continue;
        }
        // Insert the current document
        InsertIndex(curImageInfo);
    }
}
/*
 * @brief: Insert a single JSON document into ES (using the API wrapped by Jest)
 * @param: imageInfo, the document object to index
 * @return: none
 */
private static void InsertIndex(ImageInfo imageInfo) throws Exception {
    JestClient jestClient = JestExa.getJestClient();
    JsonParse.PrintImageJson(imageInfo);
    Bulk bulk = new Bulk.Builder()
            .defaultIndex("age_index")
            .defaultType("age_type")
            .addAction(Arrays.asList(
                    new Index.Builder(imageInfo).build()
            )).build();
    JestResult result = jestClient.execute(bulk);
    if (result.isSucceeded()) {
        System.out.println("insert success!");
    } else {
        System.out.println("insert failed");
    }
}
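For orientation, the Elasticsearch `_bulk` REST endpoint accepts a newline-delimited JSON body: one action line followed by one document line per document, with a trailing newline at the end. The sketch below builds such a body with plain Java so the format is visible. It is not what the code above does internally. The class `BulkBodyBuilder` is hypothetical, the index and type names are simply reused from the code above, and it assumes an ES version that still uses mapping types and that each document is already a single-line JSON string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class BulkBodyBuilder {

    // Builds the newline-delimited body for "index" actions.
    // Assumption: each entry of jsonDocs is valid JSON on a single line.
    static String build(String index, String type, List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            // Action line: tells ES which index and type the next line goes to.
            body.append("{\"index\":{\"_index\":\"").append(index)
                .append("\",\"_type\":\"").append(type).append("\"}}\n");
            // Source line: the document itself.
            body.append(doc).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList("{\"a\":1}", "{\"a\":2}");
        System.out.print(build("age_index", "age_type", docs));
    }
}
```

Putting several documents into one body like this is what makes a bulk request cheaper than sending the documents one by one.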
The final usage is:
java -jar bulk_insert.jar ./data/
This loops over all the JSON files under ./data and imports them into ES.
4. Techniques used
1) File traversal
2) JSON parsing
3) ES batch insert operation
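For the batch insert technique, documents are commonly grouped so that one request carries many of them. The sketch below shows one way to split a list into fixed-size batches. The class `BatchSplitter` is hypothetical and the batch size is an arbitrary value chosen by the caller, not a recommendation from this article.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, for illustration only.
class BatchSplitter {

    // Splits items into consecutive batches of at most batchSize elements.
    static <T> List<List<T>> split(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list so each batch is independent of the input list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("a_01.json", "a_02.json", "a_03.json");
        for (List<String> batch : split(files, 2)) {
            System.out.println(batch);
        }
    }
}
```

Each batch could then be turned into a single bulk request instead of issuing one request per file.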
5. Pitfalls encountered
An error was reported when exporting the program as a jar package.
The project was based on Jest's source code project, which is built with Maven.
When generating the jar package, it kept reporting:
"java.lang.ClassNotFoundException"
Preliminary diagnosis: the problem was caused by Maven. I then modified pom.xml, but the error remained.
Final solution: rebuild the project and re-import the code and the dependent jar packages.
Postscript
If you have any questions, feel free to ask and discuss!