Data import and export cases between DataX and MongoDB



0. Preface

  • Version and environment information:

MySQL and DataX are installed on machine node01, while MongoDB is installed on machine node02.

DataX version: DataX 3.0 (open-source version)

MongoDB version: MongoDB-5.0.2 (installed on Linux)

Linux version: CentOS 7.5

1. MongoDB pre-knowledge

1.1 Detailed explanation of basic concepts

1.1.1 Database

Multiple databases can be created in a single MongoDB instance. MongoDB's default database is "test", and its data files are stored in the data directory. One MongoDB instance can host multiple independent databases, each with its own collections and permissions, and different databases are stored in separate files.

Common operations are as follows:

1) Show all databases

> show dbs     
admin   0.000GB 
config  0.000GB 
local   0.000GB 

These databases are explained as follows:

  • admin: From a permissions perspective, this is the "root" database. A user added to this database automatically inherits permissions on all databases. Certain server-wide commands, such as listing all databases or shutting down the server, can also only be run from this database.

  • local: This database is never replicated and can be used to store collections that are limited to a single local server.

  • config: When MongoDB is used in a sharded setup, the config database is used internally to store sharding metadata.

2) Display the currently used database

> db
test

3) Switch database

> use local
switched to db local
> db
local

1.1.2 Collections

A collection is a group of MongoDB documents, similar to a table in MySQL.

Collections live inside a database and have no fixed schema, which means you can insert documents of different formats and types into the same collection; in practice, though, the documents inserted into a collection usually share some structure.

The createCollection() method is used in MongoDB to create collections. Let's take a look at how to create a collection:

Syntax:

db.createCollection(name, options)                                        

Parameter Description:

  • name: the collection name to create

  • options: optional parameters that specify options such as size limits and indexing:

field        type     description
capped       Boolean  (Optional) If true, creates a capped collection: a fixed-size collection that automatically overwrites its oldest documents when the size limit is reached. When true, the size parameter must also be specified.
autoIndexId  Boolean  (Optional) If true, automatically creates an index on the _id field. (Deprecated; since MongoDB 4.0 the _id index is always created and this option can no longer disable it.)
size         number   (Optional) The maximum size in bytes for a capped collection. Required when capped is true.
max          number   (Optional) The maximum number of documents allowed in a capped collection.

Case 1: Create a collection of whybigdata in the test library

> use test
switched to db test
> db.createCollection("whybigdata")
{ "ok" : 1 }
> show collections
whybigdata

// Insert data
> db.whybigdata.insert({"name":"whybigdata","url":"www.whybigdata.com"})
WriteResult({ "nInserted" : 1 })
// View data
> db.whybigdata.find()
{ "_id" : ObjectId("5d0314ceecb77ee2fb2d7566"), "name" : "whybigdata", "url" : "www.whybigdata.com" }

Explanation:

ObjectId is similar to a unique primary key: it can be generated quickly and sorts roughly by creation time. It consists of 12 bytes, displayed as a 24-character hexadecimal string (two hex digits per byte), laid out as follows:

  • The first 4 bytes are the creation time as a Unix timestamp

  • The next 3 bytes are a machine identifier

  • The next 2 bytes are the process id (PID)

  • The last 3 bytes are a counter starting from a random value
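
Because the timestamp lives in the first 4 bytes, the mongo shell can recover a document's creation time directly from its _id via getTimestamp(). For the ObjectId from the example above, the leading bytes 0x5d0314ce decode to Unix time 1560483022, so the shell should print:

> ObjectId("5d0314ceecb77ee2fb2d7566").getTimestamp()
ISODate("2019-06-14T03:30:22Z")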

Case 2: Create a fixed collection mycol

> db.createCollection("mycol", { capped : true, autoIndexId : true, size : 6142800, max : 1000 })
> show tables;
whybigdata
mycol
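
To confirm that mycol was indeed created as a capped collection, the shell helper isCapped() provides a quick check (not part of the original walkthrough):

> db.mycol.isCapped()
true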

Case 3: Automatically create collections

In MongoDB, you don't need to create a collection explicitly: MongoDB creates it automatically when you insert the first document into it.

> db.mycol2.insert({"name":"whybigdata"})
WriteResult({ "nInserted" : 1 })
> show collections 
whybigdata
mycol 
mycol2

Case 4: Delete Collection

> db.mycol2.drop()
true
> show tables;
whybigdata
mycol

1.1.3 Documents

A document is a set of key-value pairs. Documents in the same collection do not need to have the same fields, and the same field does not need to hold the same data type across documents. This is very different from relational databases and is one of MongoDB's most prominent features.

A simple example:

{"name":"whybigdata"}

Note:

  • The key/value pairs in a document are ordered.

  • MongoDB is type and case sensitive.

  • MongoDB documents cannot have duplicate keys.

  • Document keys are strings. With few exceptions, keys can use arbitrary UTF-8 characters.
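
To illustrate this flexibility, documents with different fields and value types can live in the same collection. A small sketch using a throwaway collection named demo (a hypothetical collection, not used elsewhere in this article):

> db.demo.insert({"name":"whybigdata"})
WriteResult({ "nInserted" : 1 })
> db.demo.insert({"name":123,"tags":["datax","mongodb"]})
WriteResult({ "nInserted" : 1 })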

2. DataX import and export case

2.1 Read MongoDB data and import it to HDFS

2.1.1 Writing configuration files

[whybigdata@node01 datax]$ vim job/mongdb2hdfs.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": ["node02:27017"],
                        "collectionName": "whybigdata",
                        "column": [
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "url",
                                "type": "string"
                            }
                        ],
                        "dbName": "test"
                    }
                },
                "writer": {
                    "name": "hdfswriter",
                    "parameter": {
                        "column": [
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "url",
                                "type": "string"
                            }
                        ],
                        "defaultFS": "hdfs://node01:8020",
                        "fieldDelimiter": "\t",
                        "fileName": "mongo.txt",
                        "fileType": "text",
                        "path": "/datax-out",
                        "writeMode": "append"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
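
Note: hdfswriter expects the target path to already exist on HDFS, so create it before running the job if necessary (assuming a standard Hadoop client is available on node01):

[whybigdata@node01 datax]$ hdfs dfs -mkdir -p /datax-out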

mongodbreader parameter description:

  • address: the MongoDB address information. Because MongoDB may be deployed as a cluster, the ip:port pairs must be given as a JSON array. [Required]

  • userName: MongoDB user name. [Optional]

  • userPassword: MongoDB password. [Optional]

  • collectionName: the MongoDB collection name. [Required]

  • column: the MongoDB document fields to read. [Required]

  • name: the name of the column. [Required]

  • type: the type of the column. [Optional]

  • splitter: MongoDB supports array types but the DataX framework does not, so array values read from MongoDB are joined into a single string using this separator (see the sketch after this list). [Optional]
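
For example, if the documents contained an array field, its column entry might look like the following sketch (tags is a hypothetical field, shown only to illustrate splitter usage):

{
    "name": "tags",
    "type": "array",
    "splitter": ","
}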

2.1.2 Execution

[whybigdata@node01 datax]$ bin/datax.py job/mongdb2hdfs.json

2.1.3 View Results

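The original screenshot is omitted here; the imported data can be checked from the command line instead (hdfswriter appends a random suffix to the configured fileName, hence the wildcard):

[whybigdata@node01 datax]$ hdfs dfs -cat /datax-out/mongo.txt*
whybigdata	www.whybigdata.com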

2.2 Read MongoDB data into MySQL

2.2.1 Create a table in MySQL

mysql> use datax;
mysql> create table whybigdata(name varchar(20), url varchar(20));

2.2.2 Writing DataX configuration files

[whybigdata@node01 datax]$ vim job/mongodb2mysql.json
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mongodbreader",
                    "parameter": {
                        "address": ["node02:27017"],
                        "collectionName": "whybigdata",
                        "column": [
                            {
                                "name": "name",
                                "type": "string"
                            },
                            {
                                "name": "url",
                                "type": "string"
                            }
                        ],
                        "dbName": "test"
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://node01:3306/datax",
                                "table": ["whybigdata"]
                            }
                        ],
                        "password": "123456",
                        "username": "root",
                        "writeMode": "insert"
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": "1"
            }
        }
    }
}
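
A note on the writer: "column": ["*"] maps the reader's fields onto all columns of the target table in order, which works here because the MongoDB fields and the MySQL columns line up one-to-one. When they do not, listing the columns explicitly is safer, e.g.:

"column": ["name", "url"]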

2.2.3 Execution

[whybigdata@node01 datax]$ bin/datax.py job/mongodb2mysql.json   

2.2.4 View Results

mysql> select * from whybigdata;
+------------+--------------------+
| name       | url                |
+------------+--------------------+
| whybigdata | www.whybigdata.com |
+------------+--------------------+

That's all for this article!
