Graph database batch import (HugeGraph-Loader)

Foreword

HugeGraph-Loader is the data import component of HugeGraph, which can convert data from various data sources into graph vertices and edges and import them into the graph database in batches.
Currently supported data sources include:

  1. Local disk files or directories; supports files in TEXT, CSV, and JSON formats, as well as compressed files
  2. HDFS files or directories; supports compressed files
  3. Mainstream relational databases, such as MySQL, PostgreSQL, Oracle, and SQL Server

Local disk files and HDFS files support resuming an interrupted import.

1. Loader execution process

The basic process of using HugeGraph-Loader is divided into the following steps:

  1. Write a graph model (schema; a minimal sketch follows this list)
  2. Prepare data files
  3. Write the input source map file
  4. Execute command import
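
For step 1, the graph model (schema) used by the examples below can be created ahead of time with the hugegraph-client Java API. The following is only a minimal sketch: the package names (org.apache.hugegraph.driver in recent client versions, com.baidu.hugegraph.driver in older ones), the customized string ID strategy, and the property types are assumptions; the schema can also be supplied to the loader as a groovy file via the -s option.

import org.apache.hugegraph.driver.HugeClient;
import org.apache.hugegraph.driver.SchemaManager;

public class CreateSchema {
    public static void main(String[] args) {
        // Connect to the HugeGraph server and the graph used in the examples below
        HugeClient client = HugeClient.builder("http://xxx.xx.xx.xx:18081", "Oakcsys1").build();
        SchemaManager schema = client.schema();

        // Property keys referenced by the "ry" vertex label (nl = age, xb = sex)
        schema.propertyKey("nl").asInt().ifNotExist().create();
        schema.propertyKey("xb").asText().ifNotExist().create();

        // Vertex label "ry"; the 姓名 (name) column is used directly as the vertex id
        schema.vertexLabel("ry")
              .useCustomizeStringId()
              .properties("nl", "xb")
              .nullableKeys("nl", "xb")
              .ifNotExist()
              .create();

        client.close();
    }
}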

2. CSV file import

1. Data mapping file

The data mapping file is shown below. If the CSV file has a header row, the header field under input should not be assigned a value; if it is assigned, the first line of the file will be parsed as data. A sample CSV file is shown after the mapping file.

{
  "version": "2.0",
  "structs": [{
    "id": "1",
    "skip": false,
    "input": {
      "type": "FILE",
      "path": "/mnt/parastor/aimind/kg-resources/Oakcsys1/d2r/job-63c3b6727701166100cd7426/file-mapping-7f19ceeea95a417495bc33bd54fa1bf9/人员列表1.csv",
      "file_filter": {
        "extensions": ["*"]
      },
      "format": "CSV",
      "delimiter": ",",
      "date_format": "yyyy-MM-dd HH:mm:ss",
      "time_zone": "GMT+8",
      "skipped_line": {
        "regex": "(^#|^//).*|"
      },
      "compression": "NONE",
      "batch_size": 500,
      "header": null,
      "charset": "GBK",
      "list_format": {
        "start_symbol": "",
        "end_symbol": "",
        "elem_delimiter": "|",
        "ignored_elems": [""]
      }
    },
    "vertices": [{
      "label": "ry",
      "skip": false,
      "id": "姓名",
      "unfold": true,
      "field_mapping": {
        "年龄": "nl",
        "性别": "xb"
      },
      "value_mapping": {},
      "selected": ["姓名", "年龄", "性别"],
      "ignored": [],
      "null_values": ["Null"],
      "update_strategies": {},
      "field_formats": []
    }],
    "edges": []
  }]
}
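
For reference, the 人员列表1.csv referenced above would carry a header row with the three selected columns 姓名, 年龄, 性别; the data rows below are hypothetical samples for illustration only:

姓名,年龄,性别
张三,25,男
李四,30,女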

3. JSON file import

1. Data mapping file

The data mapping file is shown below. Since a JSON file has no header row, the header field must be assigned a value. The JSON format is one JSON object per line.
For example:
{"name": "marko", "sex": "male", "age": "12", "weight": "0.4"}
{"name": "josh", "sex": "female", "age": "16", "weight": "0.4"}

{
  "version": "2.0",
  "structs": [{
    "id": "1",
    "skip": false,
    "input": {
      "type": "FILE",
      "path": "C:\\Users\\kmliu\\Desktop\\上传文件2\\t_user3.json",
      "file_filter": {
        "extensions": ["*"]
      },
      "format": "JSON",
      "delimiter": ",",
      "date_format": "yyyy-MM-dd HH:mm:ss",
      "time_zone": "GMT+8",
      "skipped_line": {
        "regex": "(^#|^//).*|"
      },
      "compression": "NONE",
      "batch_size": 500,
      "header": ["sex", "name", "weight", "age"],
      "charset": "UTF-8",
      "list_format": {
        "start_symbol": "",
        "end_symbol": "",
        "elem_delimiter": "|",
        "ignored_elems": [""]
      }
    },
    "vertices": [{
      "label": "ry2",
      "skip": false,
      "id": "name",
      "unfold": true,
      "field_mapping": {
        "sex": "sex",
        "age": "age"
      },
      "value_mapping": {},
      "selected": ["sex", "name", "age"],
      "ignored": [],
      "null_values": ["Null"],
      "update_strategies": {},
      "field_formats": []
    }],
    "edges": []
  }]
}

4. MySQL data import

1. Data mapping file

The data mapping file is as follows:

{
  "version": "2.0",
  "structs": [{
    "id": "1",
    "skip": false,
    "input": {
      "type": "JDBC",
      "vendor": "MYSQL",
      "header": ["id", "name", "age", "sex"],
      "charset": "UTF-8",
      "list_format": {
        "start_symbol": "",
        "end_symbol": "",
        "elem_delimiter": "|",
        "ignored_elems": [""]
      },
      "driver": "com.mysql.cj.jdbc.Driver",
      "url": "jdbc:mysql://xxx.xxx.xxx.xxx:3306",
      "database": "baseName",
      "schema": null,
      "table": "user3",
      "username": "root",
      "password": "root",
      "batch_size": 500,
      "primary_key": "name"
    },
    "vertices": [{
      "label": "ry",
      "skip": false,
      "id": "name",
      "unfold": true,
      "field_mapping": {
        "age": "nl",
        "sex": "xb"
      },
      "value_mapping": {},
      "selected": ["sex", "name", "age"],
      "ignored": [],
      "null_values": ["Null"],
      "update_strategies": {},
      "field_formats": []
    }],
    "edges": []
  }]
}
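
The mapping above reads from the user3 table of the baseName database. The sketch below shows one possible layout of that source table, created through plain JDBC; the column types are assumptions, and the connection details simply mirror the url, database, username, and password fields of the mapping (the com.mysql.cj.jdbc.Driver class must be on the classpath):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class CreateUser3Table {
    public static void main(String[] args) throws Exception {
        // Connection details mirror the JDBC input section of the mapping file
        String url = "jdbc:mysql://xxx.xxx.xxx.xxx:3306/baseName";
        try (Connection conn = DriverManager.getConnection(url, "root", "root");
             Statement stmt = conn.createStatement()) {
            // Columns match the "header" list of the mapping: id, name, age, sex
            stmt.executeUpdate("CREATE TABLE IF NOT EXISTS user3 ("
                    + "id INT, name VARCHAR(64), age INT, sex VARCHAR(8))");
        }
    }
}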

5. Calling HugeGraph-Loader

1. Entry point

The parameter Oakcsys1 is the graph name in the graph database, and json1.json is the data mapping file, whose generation rules were described above. The values "xxx.xx.xx.xx" and "18081" (passed with -h and -p) are the host address and port of the graph database.

public static void main(String[] args) {
    // -g {GRAPH_NAME} -f ${INPUT_DESC_FILE} -s ${SCHEMA_FILE} -h {HOST} -p {PORT}
    if (args.length == 0) {
        args = new String[]{
                "-g", "Oakcsys1",
                "-f", "C:\\Users\\kmliu\\Desktop\\上传文件2\\json1.json",
                "-h", "xxx.xx.xx.xx", "-p", "18081"
        };
    }
    HugeGraphLoader loader;
    try {
        loader = new HugeGraphLoader(args);
    } catch (Throwable e) {
        Printer.printError("Failed to start loading", e);
        return;
    }
    loader.load();
}
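
Alternatively, instead of embedding HugeGraphLoader in a Java program, the standalone loader distribution ships a command-line script that accepts the same options (the file paths here are placeholders, and a schema file may additionally be passed with -s as shown in the comment above):

sh bin/hugegraph-loader.sh -g Oakcsys1 -f json1.json -h xxx.xx.xx.xx -p 18081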

6. Import data log

1. The log is as follows

: ----- Mapping task running - log output -----
: --------------------------------------------------
: detail metrics
: input-struct '1'
:     read success                  : 4                   
:     read failure                  : 0                   
: vertex 'ry'
:     parse success                 : 4                   
:     parse failure                 : 0                   
:     insert success                : 4                   
:     insert failure                : 0                   
: --------------------------------------------------
: count metrics
:     input read success            : 4                   
:     input read failure            : 0                   
:     vertex parse success          : 4                   
:     vertex parse failure          : 0                   
:     vertex insert success         : 4                   
:     vertex insert failure         : 0                   
:     edge parse success            : 0                   
:     edge parse failure            : 0                   
:     edge insert success           : 0                   
:     edge insert failure           : 0                   
: --------------------------------------------------
: meter metrics
:     total time                    : 5.549s              
:     vertex load rate(vertices/s)  : 0                   
:     edge load rate(edges/s)       : 0                   
: ----- Mapping task running - end of log output -----

Summary

The above are the basic steps for using HugeGraph-Loader. It is mainly used to import data sets into a graph database, supports importing data from CSV, JSON, TXT, MySQL, Hive, and other sources, and imports quickly.

Origin blog.csdn.net/Oaklkm/article/details/128671373