Data import and export cases between DataX and MongoDB
0. write in front
Description of version information:
- The MySQL database and DataX are installed on machine node01, while MongoDB is installed on machine node02
- DataX version: DataX 3.0 (open source version)
- MongoDB version: MongoDB 5.0.2 (installed in a Linux environment)
- Linux version: CentOS 7.5
1. MongoDB pre-knowledge
1.1 Detailed explanation of basic concepts
1.1.1 Database
Multiple databases can be created in one MongoDB instance. The default database in the mongo shell is "test", and data files are stored in the data directory. A single MongoDB instance can host multiple independent databases, each with its own collections and permissions, and different databases are stored in different files.
Common operations are as follows
1) Show all databases
> show dbs
admin 0.000GB
config 0.000GB
local 0.000GB
Parsed as follows:
- admin: From a permissions perspective, this is the "root" database. A user added to this database automatically inherits permissions on all databases. Some server-level commands, such as listing all databases or shutting down the server, can also only be run from this database.
- local: This database is never replicated and can be used to store any collections that should stay on a single local server.
- config: When Mongo is used with sharding, the config database is used internally to store sharding metadata.
2) Display the currently used database
> db
test
3) Switch database
> use local
switched to db local
> db
local
1.1.2 Collections
A collection is a MongoDB document group, similar to a table in MySQL.
A collection exists within a database and has no fixed schema, which means you can insert documents of different formats and types into the same collection; usually, though, the documents we insert into a collection are related to one another.
The createCollection() method is used in MongoDB to create collections. Let's take a look at how to create a collection:
Syntax:
db.createCollection(name, options)
Parameter Description:
- name: the name of the collection to create
- options: optional parameters specifying memory size and indexing options, with the following fields:

| field | type | description |
|---|---|---|
| capped | Boolean | (Optional) If true, creates a capped collection: a fixed-size collection that automatically overwrites its oldest documents when it reaches its maximum size. When this value is true, the size parameter must also be specified. |
| autoIndexId | Boolean | (Optional) If true, automatically creates an index on the _id field. Note: this option is deprecated and was removed in MongoDB 4.0, so it cannot be used with the MongoDB 5.x installed here. |
| size | number | (Optional) The maximum size, in bytes, of a capped collection. Required when capped is true. |
| max | number | (Optional) The maximum number of documents allowed in a capped collection. |
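The overwrite behavior of a capped collection can be sketched with a fixed-length queue. This is illustrative only: it models eviction by document count (the max option), ignoring the size-in-bytes limit.

```python
from collections import deque

# A capped collection keeps at most `max` documents; once full,
# inserting a new document evicts the oldest one. A deque with
# maxlen models that eviction behavior.
capped = deque(maxlen=3)

for i in range(5):
    capped.append({"_id": i})

# Only the 3 most recent documents remain; 0 and 1 were overwritten.
print(list(capped))  # [{'_id': 2}, {'_id': 3}, {'_id': 4}]
```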
Case 1: Create a collection named whybigdata in the test database
> use test
switched to db test
> db.createCollection("whybigdata")
{ "ok" : 1 }
> show collections
whybigdata
// Insert data
> db.whybigdata.insert({ "name":"whybigdata", "url":"www.whybigdata.com" })
WriteResult({ "nInserted" : 1 })
// Query data
> db.whybigdata.find()
{ "_id" : ObjectId("5d0314ceecb77ee2fb2d7566"), "name" : "whybigdata", "url" : "www.whybigdata.com" }
Parsing instructions:
ObjectId is similar to a unique primary key; it can be generated quickly and sorts roughly by creation time. It is 12 bytes long, rendered as a string of 24 hexadecimal digits (each byte stores two hex digits), laid out as follows:
- The first 4 bytes hold the creation time as a Unix timestamp
- The next 3 bytes are a machine identifier
- The next 2 bytes are taken from the process id (PID)
- The last 3 bytes are a counter, seeded with a random value
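As a sketch, the 4/3/2/3-byte layout above can be decoded in plain Python. `parse_object_id` is a hypothetical helper written for this article, not part of any MongoDB driver.

```python
from datetime import datetime, timezone

def parse_object_id(hex_id: str) -> dict:
    """Split a 24-hex-digit ObjectId into the classic 4/3/2/3-byte fields."""
    raw = bytes.fromhex(hex_id)
    assert len(raw) == 12, "an ObjectId is always 12 bytes"
    return {
        # first 4 bytes: creation time as a Unix timestamp
        "timestamp": datetime.fromtimestamp(int.from_bytes(raw[0:4], "big"),
                                            tz=timezone.utc),
        # next 3 bytes: machine identifier
        "machine": raw[4:7].hex(),
        # next 2 bytes: process id
        "pid": int.from_bytes(raw[7:9], "big"),
        # last 3 bytes: counter (seeded with a random value)
        "counter": int.from_bytes(raw[9:12], "big"),
    }

# The _id generated in Case 1 above
info = parse_object_id("5d0314ceecb77ee2fb2d7566")
print(info["timestamp"])  # 2019-06-14 03:30:22+00:00, the document's creation time
```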
Case 2: Create a capped collection mycol
> db.createCollection("mycol", { capped : true, size : 6142800, max : 1000 })
{ "ok" : 1 }
> show tables;
mycol
whybigdata
(The autoIndexId option that appears in older tutorials is omitted here, since it was removed in newer MongoDB versions and is rejected by MongoDB 5.x.)
Case 3: Automatically create collections
In MongoDB, you don't need to create a collection explicitly: MongoDB creates it automatically when the first document is inserted into it.
> db.mycol2.insert({ "name":"whybigdata" })
WriteResult({ "nInserted" : 1 })
> show collections
mycol
mycol2
whybigdata
Case 4: Delete Collection
> db.mycol2.drop()
true
> show tables;
mycol
whybigdata
1.1.3 Documents
A document is a set of key-value pairs. Documents in the same collection do not need to have the same fields, and the same field does not need to hold the same data type across documents. This is very different from relational databases and is one of MongoDB's most prominent features.
A simple example:
{ "name": "whybigdata" }
Notice:
- The key/value pairs in a document are ordered.
- MongoDB is type- and case-sensitive.
- A MongoDB document cannot contain duplicate keys.
- Document keys are strings. With few exceptions, keys can use arbitrary UTF-8 characters.
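These properties can be illustrated with a plain Python dict, which behaves like a BSON document in the relevant respects: insertion order is preserved (since Python 3.7), keys are case-sensitive, and duplicate keys cannot coexist.

```python
# Keys are case-sensitive: "name" and "Name" are two distinct fields.
doc = {"name": "whybigdata", "Name": "WHY"}
print(len(doc))  # 2

# Duplicate keys cannot coexist: a later assignment overwrites the earlier value.
doc["name"] = "updated"
print(doc["name"])  # updated

# Key order is preserved, mirroring the ordered key/value pairs of a document.
print(list(doc))  # ['name', 'Name']
```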
2. DataX import and export case
2.1 Read MongoDB data and import it to HDFS
2.1.1 Writing configuration files
[whybigdata@node01 datax]$ vim job/mongdb2hdfs.json
{
"job": {
"content": [
{
"reader": {
"name": "mongodbreader",
"parameter": {
"address": ["node02:27017"],
"collectionName": "whybigdata",
"column": [
{
"name":"name",
"type":"string"
},
{
"name":"url",
"type":"string"
}
],
"dbName": "test",
}
},
"writer": {
"name": "hdfswriter",
"parameter": {
"column": [
{
"name":"name",
"type":"string"
},
{
"name":"url",
"type":"string"
}
],
"defaultFS": "hdfs://node01:8020",
"fieldDelimiter": "\t",
"fileName": "mongo.txt",
"fileType": "text",
"path": "/datax-out",
"writeMode": "append"
}
}
}
],
"setting": {
"speed": {
"channel": "1"
}
}
}
}
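With fileType "text" and fieldDelimiter "\t", hdfswriter emits one line per record with the column values joined by the delimiter. A minimal sketch of that layout (the sample record and column names mirror the job above; this is not DataX code):

```python
# Illustrative only: mimics the text-file layout produced by hdfswriter
# with fileType "text" and fieldDelimiter "\t".
records = [{"name": "whybigdata", "url": "www.whybigdata.com"}]
columns = ["name", "url"]       # column order from the writer config
field_delimiter = "\t"

# One output line per record, fields joined by the delimiter.
lines = [field_delimiter.join(str(r[c]) for c in columns) for r in records]
print("\n".join(lines))  # whybigdata<TAB>www.whybigdata.com
```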
Mongodbreader parameter analysis
- address: MongoDB address information; because MongoDB may be deployed as a cluster, the ip:port pairs must be given as a JSON array. [Required]
- userName: MongoDB username. [Optional]
- userPassword: MongoDB password. [Optional]
- collectionName: the MongoDB collection name. [Required]
- column: the MongoDB document column definitions. [Required]
- name: the name of the column. [Required]
- type: the type of the column. [Optional]
- splitter: MongoDB supports array types but the DataX framework itself does not, so array values read from MongoDB are joined into a single string using this separator. [Optional]
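A sketch of what the splitter does, assuming a hypothetical array field named tags and a comma separator (neither appears in the job above):

```python
# Illustrative only: how an array field read from MongoDB is flattened
# into a single delimited string before the record enters the framework.
doc = {"name": "whybigdata", "tags": ["hadoop", "datax", "mongodb"]}
splitter = ","

value = doc["tags"]
# Join array values with the splitter; scalar values pass through unchanged.
flattened = splitter.join(value) if isinstance(value, list) else value
print(flattened)  # hadoop,datax,mongodb
```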
2.1.2 Execution
[whybigdata@node01 datax]$ bin/datax.py job/mongdb2hdfs.json
2.1.3 View Results
2.2 Read MongoDB data into MySQL
2.2.1 Create a table in MySQL
mysql> create table whybigdata(name varchar(20),url varchar(20));
2.2.2 Writing DataX configuration files
[whybigdata@node01 datax]$ vim job/mongodb2mysql.json
{
"job": {
"content": [
{
"reader": {
"name": "mongodbreader",
"parameter": {
"address": ["node02:27017"],
"collectionName": "whybigdata",
"column": [
{
"name":"name",
"type":"string"
},
{
"name":"url",
"type":"string"
}
],
"dbName": "test",
}
},
"writer": {
"name": "mysqlwriter",
"parameter": {
"column": ["*"],
"connection": [
{
"jdbcUrl": "jdbc:mysql://node01:3306/datax",
"table": ["mongo"]
}
],
"password": "123456",
"username": "root",
"writeMode": "insert"
}
}
}
],
"setting": {
"speed": {
"channel": "1"
}
}
}
}
2.2.3 Execution
[whybigdata@node01 datax]$ bin/datax.py job/mongodb2mysql.json
2.2.4 View Results
mysql> select * from whybigdata;
+------------+--------------------+
| name       | url                |
+------------+--------------------+
| whybigdata | www.whybigdata.com |
+------------+--------------------+
That's the end of this article!