29. Real-time synchronization between MongoDB and Elasticsearch with mongo-connector (syncing ES with a non-relational database)

Introduction:

Testing shows that mongo-connector synchronizes insert, update, and delete operations from MongoDB to ES in real time.
Historical data (documents that already existed before the connector was started) was not synchronized to ES in this test. The preliminary conclusion is that the tool itself does not support this, or that the scenario was not exercised correctly; this is still to be investigated and will be updated after further research.

1. mongo-connector address:

https://github.com/mongodb-labs/mongo-connector

2. Introduction to the mongo-connector tool

The mongo-connector tool creates a pipeline from a MongoDB cluster to one or more target systems: Solr, Elasticsearch, or another MongoDB cluster.
It first synchronizes the data in MongoDB to the target system, then tails MongoDB's oplog so that the target keeps up with operations in MongoDB in real time.
The tool has been verified under Python 2.6, 2.7, and 3.3+.
mongo-connector is a real-time synchronization service written in Python. It requires MongoDB to run in replica-set mode (this is what provides the oplog), and it needs elastic2_doc_manager to write data to ES.
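
To make the oplog idea concrete, here is a minimal sketch of replaying oplog entries against a target store. It is a simplified illustration, not mongo-connector's actual code: the function name apply_oplog_entry and the in-memory dict standing in for ES are hypothetical, and only inserts, $set updates, and deletes are handled. The entry fields (op, ns, o, o2) follow the layout of MongoDB oplog documents.

```python
# Simplified illustration of oplog replay; NOT mongo-connector's actual code.
# `target` is an in-memory stand-in for the ES side: {namespace: {_id: doc}}.

def apply_oplog_entry(target, entry):
    """Apply one oplog-style entry to the target store and return the store."""
    docs = target.setdefault(entry["ns"], {})
    op = entry["op"]
    if op == "i":
        # Insert: "o" carries the full document.
        doc = dict(entry["o"])
        docs[doc["_id"]] = doc
    elif op == "u":
        # Update: "o2" identifies the document, "o" describes the change.
        doc_id = entry["o2"]["_id"]
        doc = docs.setdefault(doc_id, {"_id": doc_id})
        doc.update(entry["o"].get("$set", {}))
    elif op == "d":
        # Delete: "o" carries the _id of the removed document.
        docs.pop(entry["o"]["_id"], None)
    return target


oplog = [
    {"op": "i", "ns": "zhang_index.col_02", "o": {"_id": 1, "name": "laoluo"}},
    {"op": "u", "ns": "zhang_index.col_02", "o2": {"_id": 1},
     "o": {"$set": {"name": "luoyonghao"}}},
    {"op": "i", "ns": "zhang_index.col_02", "o": {"_id": 2, "name": "renzhengfei"}},
    {"op": "d", "ns": "zhang_index.col_02", "o": {"_id": 2}},
]

target = {}
for entry in oplog:
    apply_oplog_entry(target, entry)

print(target)
```

After the four entries are replayed, only the updated document with _id 1 remains, which mirrors the insert, update, and delete checks carried out later in this article.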

3. Introduction to elastic2-doc-manager tool

elastic2-doc-manager is the document manager for Elasticsearch 2.x. For Elasticsearch 1.x, use elastic-doc-manager instead.

4. ES and MongoDB synchronization steps:

(1) Install mongo-connector.

pip install mongo-connector

(2) Install elastic2-doc-manager.

pip install elastic2-doc-manager

Note:
If you skip step (2) and go straight to steps (3) and (4), mongo-connector fails with an error like this:

[root@5b9dbaaa148a bin]# mongo-connector -m 10.8.5.99:27017 -t 10.8.5.101:9200 -d elastic2_doc_manager
Logging to mongo-connector.log.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/usr/lib64/python2.6/threading.py", line 532, in __bootstrap_inner
  self.run()
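
Before starting the connector, a quick pre-flight check can confirm that both packages are visible to the Python interpreter. The sketch below is only an assumption-laden helper: is_importable is a hypothetical name, and the module path mongo_connector.doc_managers.elastic2_doc_manager is assumed from the way the doc manager is referenced on the command line, so adjust it if your installation differs.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package is missing or has no usable spec.
        return False


# Assumed module paths; verify them against your installed packages.
REQUIRED = [
    "mongo_connector",
    "mongo_connector.doc_managers.elastic2_doc_manager",
]

for name in REQUIRED:
    status = "ok" if is_importable(name) else "MISSING"
    print("{:<55} {}".format(name, status))
```

A MISSING line for the doc manager suggests that step (2) was skipped or that pip installed into a different Python environment than the one running mongo-connector.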

(3) Start the MongoDB side

MongoDB must run as a replica set. If that is already the case, skip this step:

1) Set the replica set name with --replSet.

[root@b48eafd69929 bin]# ./mongod --replSet "rs0"

2) Connect to the replica set member with the mongo shell

[root@b48eafd69929 bin]# ./mongo
MongoDB shell version: 3.2.4
connecting to: test
Server has startup warnings:
2016-07-05T09:49:01.330+0100 I CONTROL [initandlisten] ** WARNING: You are running this process as the root user, which is not recommended.
2016-07-05T09:49:01.330+0100 I CONTROL [initandlisten]
2016-07-05T09:49:01.331+0100 I CONTROL [initandlisten]
2016-07-05T09:49:01.331+0100 I CONTROL [initandlisten] ** WARNING: You are running on a NUMA machine.
2016-07-05T09:49:01.331+0100 I CONTROL [initandlisten] ** We suggest launching mongod like this to avoid performance problems:
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten] ** numactl --interleave=all mongod [other options]
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten]
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/enabled is 'always'.
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten]
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten] ** WARNING: /sys/kernel/mm/transparent_hugepage/defrag is 'always'.
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten] ** We suggest setting it to 'never'
2016-07-05T09:49:01.332+0100 I CONTROL [initandlisten]

3) Initialize the replica set

> rs.initiate()
{
  "info2" : "no configuration specified. Using a default configuration for the set",
  "me" : "b48eafd69929:27017",
  "ok" : 1
}

4) [Verify] View the replica set configuration

rs0:SECONDARY> rs.conf()
{
  "_id" : "rs0",
  "version" : 1,
  "protocolVersion" : NumberLong(1),
  "members" : [
  {
  "_id" : 0,
  "host" : "b48eafd69929:27017",
  "arbiterOnly" : false,
  "buildIndexes" : true,
  "hidden" : false,
  "priority" : 1,
  "tags" : {

  },
  "slaveDelay" : NumberLong(0),
  "votes" : 1
  }
  ],
  "settings" : {
  "chainingAllowed" : true,
  "heartbeatIntervalMillis" : 2000,
  "heartbeatTimeoutSecs" : 10,
  "electionTimeoutMillis" : 10000,
  "getLastErrorModes" : {

  },
  "getLastErrorDefaults" : {
  "w" : 1,
  "wtimeout" : 0
  },
  "replicaSetId" : ObjectId("577b74bd0ba41a313110ad62")
  }
}

5) [Verify] Check the replica set status.

rs0:PRIMARY> rs.status()
{
  "set" : "rs0",
  "date" : ISODate("2016-07-05T08:50:55.272Z"),
  "myState" : 1,
  "term" : NumberLong(1),
  "heartbeatIntervalMillis" : NumberLong(2000),
  "members" : [
  {
  "_id" : 0,
  "name" : "b48eafd69929:27017",
  "health" : 1,
  "state" : 1,
  "stateStr" : "PRIMARY",
  "uptime" : 115,
  "optime" : {
  "ts" : Timestamp(1467708606, 1),
  "t" : NumberLong(1)
  },
  "optimeDate" : ISODate("2016-07-05T08:50:06Z"),
  "infoMessage" : "could not find member to sync from",
  "electionTime" : Timestamp(1467708605, 2),
  "electionDate" : ISODate("2016-07-05T08:50:05Z"),
  "configVersion" : 1,
  "self" : true
  }
  ],
  "ok" : 1
}
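
The status check above can also be automated. The following sketch assumes the status has already been obtained as a Python dict (for example through a MongoDB driver); the shell output itself is not valid JSON because of values such as ISODate(...) and Timestamp(...). The helper name find_primary is hypothetical, and the sample dict is a trimmed copy of the output shown above.

```python
def find_primary(status):
    """Return the name of the healthy PRIMARY member, or None if there is none."""
    if status.get("ok") != 1:
        return None
    for member in status.get("members", []):
        if member.get("stateStr") == "PRIMARY" and member.get("health") == 1:
            return member.get("name")
    return None


# Trimmed copy of the rs.status() output above, as a plain dict.
status = {
    "set": "rs0",
    "myState": 1,
    "members": [
        {"_id": 0, "name": "b48eafd69929:27017", "health": 1,
         "state": 1, "stateStr": "PRIMARY", "self": True},
    ],
    "ok": 1,
}

print(find_primary(status))
```

If the function returns None, the replica set has no elected primary yet, and it is probably too early to start the connector.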

(4) Start synchronization to ES

[root@5b9dbaaa148a bin]# mongo-connector -m 10.8.5.99:27017 -t 10.8.5.101:9200 -d elastic2_doc_manager
Logging to mongo-connector.log.

Parameter meanings:
-m: The address and port of MongoDB; the default port is 27017.
-t: The address and port of ES; the default port is 9200.
-d: The name of the doc manager. For ES 2.x it is elastic2_doc_manager, with underscores as in the command above (the pip package is named elastic2-doc-manager).
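
If the connector is launched from a script rather than by hand, the same command line can be assembled in Python. This is a minimal sketch: the helper names build_connector_command and start_connector are hypothetical, and only the -m, -t, and -d flags shown above are used.

```python
import subprocess


def build_connector_command(mongo_addr, es_addr, doc_manager="elastic2_doc_manager"):
    """Build the argv list for the mongo-connector invocation shown above."""
    return ["mongo-connector", "-m", mongo_addr, "-t", es_addr, "-d", doc_manager]


def start_connector(mongo_addr, es_addr):
    """Start mongo-connector as a child process; the caller must stop it later."""
    return subprocess.Popen(build_connector_command(mongo_addr, es_addr))


if __name__ == "__main__":
    # Only print the command here; call start_connector(...) to actually run it.
    print(" ".join(build_connector_command("10.8.5.99:27017", "10.8.5.101:9200")))
```

Passing the arguments as a list avoids shell quoting problems, and the default value makes it harder to mix up the module name (underscores) with the pip package name (hyphens).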

5. Verifying insert synchronization between MongoDB and ES

(1) Insert data on the MongoDB side:

# Create the database in Mongo (corresponds to an ES index)
rs0:PRIMARY> use zhang_index
switched to db zhang_index

# Insert data into Mongo (the collection col_02 corresponds to an ES type)
rs0:PRIMARY> db.col_02.insert({name:"laoluo", birth:"1964-03-21", sex:"man", company:"chuizi"});
WriteResult({ "nInserted" : 1 })
rs0:PRIMARY> db.col_02.insert({name:"renzhengfei", birth:"1954-03-21", sex:"man", company:"huawei"});

(2) Verify by searching on the ES side

[root@5b9dbaaa148a test_log]# curl -XGET http://10.8.5.101:9200/zhang_index/col_02/_search?pretty
{
  "took" : 4,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 2,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "zhang_index",
  "_type" : "col_02",
  "_id" : "577b7d8ceb8e3dc2d1db12a9",
  "_score" : 1.0,
  "_source" : {
  "company" : "huawei",
  "name" : "renzhengfei",
  "birth" : "1954-03-21",
  "sex" : "man"
  }
  }, {
  "_index" : "zhang_index",
  "_type" : "col_02",
  "_id" : "577b7d4aeb8e3dc2d1db12a7",
  "_score" : 1.0,
  "_source" : {
  "company" : "chuizi",
  "name" : "laoluo",
  "birth" : "1964-03-21",
  "sex" : "man"
  }
  } ]
  }
}
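
The same check can be scripted with the Python standard library. The sketch below assumes the mapping used in this article (MongoDB database to ES index, collection to ES type) and the 2.x response layout shown above; the helper names search_url, hits_by_id, and fetch_documents are hypothetical. Note that _search returns only the first 10 hits by default, so this is suitable for small test collections only.

```python
import json
from urllib.request import urlopen


def search_url(es_addr, database, collection):
    """Build the _search URL: database maps to the index, collection to the type."""
    return "http://{}/{}/{}/_search".format(es_addr, database, collection)


def hits_by_id(response):
    """Reduce an ES search response to {_id: _source}."""
    return {hit["_id"]: hit["_source"] for hit in response["hits"]["hits"]}


def fetch_documents(es_addr, database, collection):
    """Query ES over HTTP and return the matching documents keyed by _id."""
    with urlopen(search_url(es_addr, database, collection)) as resp:
        return hits_by_id(json.loads(resp.read().decode("utf-8")))


# Offline demonstration using a trimmed copy of the response above.
sample = {
    "hits": {
        "total": 2,
        "hits": [
            {"_id": "577b7d8ceb8e3dc2d1db12a9", "_source": {"name": "renzhengfei"}},
            {"_id": "577b7d4aeb8e3dc2d1db12a7", "_source": {"name": "laoluo"}},
        ],
    }
}
docs = hits_by_id(sample)
print(sorted(doc["name"] for doc in docs.values()))
```

Against the test environment in this article, fetch_documents("10.8.5.101:9200", "zhang_index", "col_02") would issue the same request as the curl command above.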

6. Verifying update synchronization between MongoDB and ES

(1) Update operation on the MongoDB side

rs0:PRIMARY> db.col_02.update({'name':'laoluo'}, {$set:{'name':'luoyonghao'}})
WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
rs0:PRIMARY>
rs0:PRIMARY> db.col_02.find().pretty()
{
  "_id" : ObjectId("577b7d4aeb8e3dc2d1db12a7"),
  "name" : "luoyonghao",
  "birth" : "1964-03-21",
  "sex" : "man",
  "company" : "chuizi"
}
{
  "_id" : ObjectId("577b7d8ceb8e3dc2d1db12a9"),
  "name" : "renzhengfei",
  "birth" : "1954-03-21",
  "sex" : "man",
  "company" : "huawei"
}

(2) Search on the ES side to confirm the update

[root@5b9dbaaa148a test_log]# curl -XGET http://10.8.5.101:9200/zhang_index/col_02/_search?pretty
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 2,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "zhang_index",
  "_type" : "col_02",
  "_id" : "577b7d8ceb8e3dc2d1db12a9",
  "_score" : 1.0,
  "_source" : {
  "company" : "huawei",
  "name" : "renzhengfei",
  "birth" : "1954-03-21",
  "sex" : "man"
  }
  }, {
  "_index" : "zhang_index",
  "_type" : "col_02",
  "_id" : "577b7d4aeb8e3dc2d1db12a7",
  "_score" : 1.0,
  "_source" : {
  "company" : "chuizi",
  "name" : "luoyonghao",
  "birth" : "1964-03-21",
  "sex" : "man"
  }
  } ]
  }
}
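
When this check is automated, a query sent immediately after the MongoDB write may still return the old document, because the change has to pass through the oplog and ES only makes new data searchable after an index refresh. A small polling helper avoids such false failures. This is a generic sketch; wait_for and the fake check function are hypothetical and stand in for a real ES query.

```python
import time


def wait_for(check, timeout=10.0, interval=0.5):
    """Call `check` repeatedly until it returns True or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        if check():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for "does ES already show name == 'luoyonghao'?".
# A real check would query ES; this fake one succeeds on the third attempt.
attempts = []


def fake_es_shows_update():
    attempts.append(1)
    return len(attempts) >= 3


synced = wait_for(fake_es_shows_update, timeout=2.0, interval=0.01)
print(synced, len(attempts))
```

The timeout and interval values here are arbitrary; pick them according to how long synchronization takes in your environment.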

7. Verifying delete synchronization between MongoDB and ES

(1) Delete operation on the MongoDB side

rs0:PRIMARY> db.col_02.remove({'name':'renzhengfei'})
WriteResult({ "nRemoved" : 1 })
rs0:PRIMARY> db.col_02.find()
{ "_id" : ObjectId("577b7d4aeb8e3dc2d1db12a7"), "name" : "luoyonghao", "birth" : "1964-03-21", "sex" : "man", "company" : "chuizi" }
rs0:PRIMARY> db.col_02.find().pretty()
{
  "_id" : ObjectId("577b7d4aeb8e3dc2d1db12a7"),
  "name" : "luoyonghao",
  "birth" : "1964-03-21",
  "sex" : "man",
  "company" : "chuizi"
}

(2) Search on the ES side after the deletion

The result shows that the document deleted in MongoDB has also been deleted on the ES side.

[root@5b9dbaaa148a test_log]# curl -XGET http://10.8.5.101:9200/zhang_index/col_02/_search?pretty
{
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
  "total" : 8,
  "successful" : 8,
  "failed" : 0
  },
  "hits" : {
  "total" : 1,
  "max_score" : 1.0,
  "hits" : [ {
  "_index" : "zhang_index",
  "_type" : "col_02",
  "_id" : "577b7d4aeb8e3dc2d1db12a7",
  "_score" : 1.0,
  "_source" : {
  "company" : "chuizi",
  "name" : "luoyonghao",
  "birth" : "1964-03-21",
  "sex" : "man"
  }
  } ]
  }
}
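
For larger test sets, comparing the _id values on both sides is a simple way to spot documents that were deleted in MongoDB but are still present in ES. The sketch below is a hypothetical helper (diff_ids is not part of any tool mentioned here) and assumes that ObjectId values have been converted to their hex strings, which is how they appear as _id in the ES results above.

```python
def diff_ids(mongo_ids, es_ids):
    """Compare two collections of _id strings and report the differences."""
    mongo_ids, es_ids = set(mongo_ids), set(es_ids)
    return {
        "missing_in_es": sorted(mongo_ids - es_ids),  # not yet synchronized
        "stale_in_es": sorted(es_ids - mongo_ids),    # deleted in MongoDB only
    }


# State right after the delete in MongoDB, before ES has caught up.
mongo_ids = ["577b7d4aeb8e3dc2d1db12a7"]
es_ids = ["577b7d4aeb8e3dc2d1db12a7", "577b7d8ceb8e3dc2d1db12a9"]
print(diff_ids(mongo_ids, es_ids))

# State after synchronization: both sides agree.
print(diff_ids(mongo_ids, ["577b7d4aeb8e3dc2d1db12a7"]))
```

Two empty lists mean the two sides hold the same set of documents; this does not compare field values, so it complements rather than replaces the content checks in sections 5 and 6.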


For details on deploying a replica set, see:

https://docs.mongodb.com/manual/tutorial/deploy-replica-set/

Five ways to sync MongoDB with ES:

https://www.linkedin.com/pulse/5-way-sync-data-from-mongodb-es-kai-hao

Common problems:

How do I set up a MongoDB replica set for the connector?
https://docs.mongodb.com/manual/tutorial/deploy-replica-set/
