Migrating one hundred million MongoDB records

1. Prepare a waybill-number pool in advance, then pull and process the data by waybill number.

The state of each record in the waybill-number table defaults to 1.

01 Use findAndModify to update the waybill-number table state to 2, reading 100 waybill numbers per cycle (see the sketch after this list).

02 Batch-query the Aladin_WayBillStatus table by waybill number to fetch the data.

03 Assemble the new SQL statements.

04 Submit the batch to HBase.

05 Batch-update the waybill-number table states to 3.
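
To make step 01 concrete, below is a minimal sketch of the claim step in Java, using the legacy MongoDB driver (the same driver as the code later in this post). The collection name waybill_pool and the fields state and waybillNo are illustrative assumptions, not the author's actual schema; the point is that each findAndModify flips one document from state 1 to state 2 atomically, so concurrent workers never claim the same number twice.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCollection;
    import com.mongodb.DBObject;
    import java.util.ArrayList;
    import java.util.List;

    public class WaybillClaimer {
        // Claim up to batchSize waybill numbers by atomically flipping state 1 -> 2.
        static List<String> claimBatch(DB db, int batchSize) {
            DBCollection pool = db.getCollection("waybill_pool"); // hypothetical collection name
            List<String> claimed = new ArrayList<>();
            for (int i = 0; i < batchSize; i++) {
                DBObject doc = pool.findAndModify(
                        new BasicDBObject("state", 1),                             // only unclaimed numbers
                        new BasicDBObject("$set", new BasicDBObject("state", 2))); // mark as in-progress
                if (doc == null) break;                     // pool exhausted
                claimed.add((String) doc.get("waybillNo")); // hypothetical field name
            }
            return claimed;
        }
    }

Because each claim is a single atomic document update, running this loop on N nodes simply partitions the pool among them with no extra coordination.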

Advantages:

Simple and crude: development is easy, no more than 200 lines of code, and because findAndModify is atomic, N nodes can be deployed.

Disadvantages:

  Efficiency is not high, and there is almost no room for optimization; using multiple threads to fetch waybill numbers actually takes even more time.

  Throughput depends on how quickly the source data table can be queried.

  It puts query pressure on the existing database.

2. Prepare a time-period table in advance, then brush the data period by period.

01 Use findAndModify to claim a random time period (see the sketch after this list).

02 Pull a batch of data for that time period.

03 Assemble the new SQL statements.

04 Submit the batch to HBase.

05 Batch-update the time-period table states to 3.
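
A minimal sketch of steps 01 and 02, under the same assumptions about driver and naming (the time_periods collection and the start, end, and createTime fields are illustrative, not from the source): one findAndModify claims an unprocessed period atomically, and a range query then pulls the waybill data that falls inside it.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DB;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;

    public class PeriodWorker {
        // Atomically claim one unprocessed time period (state 1 -> 2); null means none left.
        static DBObject claimPeriod(DB db) {
            return db.getCollection("time_periods").findAndModify( // hypothetical collection name
                    new BasicDBObject("state", 1),
                    new BasicDBObject("$set", new BasicDBObject("state", 2)));
        }

        // Pull the batch of waybill data whose timestamps fall inside the claimed period.
        static DBCursor pullByPeriod(DB db, DBObject period) {
            DBObject range = new BasicDBObject("$gte", period.get("start"))
                    .append("$lt", period.get("end"));
            return db.getCollection("Aladin_WayBillStatus")
                     .find(new BasicDBObject("createTime", range)); // createTime is an assumed field
        }
    }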

Advantages:

Efficiency improves a great deal compared with approach 1.

Since findAndModify is atomic, multi-node deployment is still possible.

Disadvantages:

  The amount of data fetched each time is not controllable: periods at business peaks can contain a huge amount of data while off-peak periods contain very little, so the rules for generating the time periods are very troublesome to get right.

  It puts query pressure on the existing database.

3. Scan the data with a MongoDB query cursor.

By default, find() returns documents starting from the oldest data.

Since _id values are ordered, the query can resume with $gt on _id.

    import com.mongodb.BasicDBObject;
    import com.mongodb.DBCursor;
    import com.mongodb.DBObject;
    import org.apache.rocketmq.common.message.Message;
    import org.apache.rocketmq.remoting.common.RemotingHelper;
    import org.bson.types.ObjectId;

    // Assumed context: mt is a com.mongodb.DB handle and mq is a started
    // RocketMQ DefaultMQProducer, both fields of the enclosing class.
    public void test_2(ObjectId o) {
        DBCursor s;
        if (o == null) {
            // First call: scan the whole collection, oldest documents first.
            s = mt.getCollection("orderid").find();
        } else {
            // Resume: _id values are ordered, so $gt skips everything already read.
            DBObject query = new BasicDBObject();
            query.put("_id", new BasicDBObject("$gt", o));
            s = mt.getCollection("orderid").find(query);
        }
        try {
            while (s.hasNext()) {
                DBObject item = s.next();
                o = (ObjectId) item.get("_id"); // remember the last _id read
                String json = ((BasicDBObject) item).toJson();
                // Produce each document to the "mgtomq" topic for downstream processing.
                mq.send(new Message("mgtomq", json.getBytes(RemotingHelper.DEFAULT_CHARSET)));
                System.out.println(o);
            }
        } catch (Exception e) {
            // If the cursor dies mid-scan, restart from the last _id seen.
            test_2(o);
        }
    }

    Advantages:

      It does not put much pressure on the database.

      Data is read from oldest to newest.

    Disadvantages:

      It cannot be deployed on multiple nodes, and coupling data fetching with data processing keeps efficiency low.

      Solution: decouple with message middleware. Reading the data produces messages, and processing consumes them, 100 messages at a time (see the consumer sketch below).
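
As a sketch of that decoupling, a RocketMQ push consumer can subscribe to the mgtomq topic that the cursor scan above produces to, asking for up to 100 messages per hand-off. The consumer group name and name-server address here are placeholder assumptions; setConsumeMessageBatchMaxSize(100) is what delivers the messages to the listener in batches of up to 100.

    import java.util.List;
    import org.apache.rocketmq.client.consumer.DefaultMQPushConsumer;
    import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyContext;
    import org.apache.rocketmq.client.consumer.listener.ConsumeConcurrentlyStatus;
    import org.apache.rocketmq.client.consumer.listener.MessageListenerConcurrently;
    import org.apache.rocketmq.common.message.MessageExt;

    public class MigrationConsumer {
        public static void main(String[] args) throws Exception {
            DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("mg_migration_group"); // placeholder group
            consumer.setNamesrvAddr("127.0.0.1:9876");   // placeholder name server address
            consumer.subscribe("mgtomq", "*");           // topic written by the cursor scan
            consumer.setConsumeMessageBatchMaxSize(100); // hand up to 100 messages to the listener at once
            consumer.registerMessageListener(new MessageListenerConcurrently() {
                @Override
                public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs,
                                                                ConsumeConcurrentlyContext context) {
                    // Assemble the batch here (e.g. the new SQL / HBase writes) and submit it.
                    for (MessageExt msg : msgs) {
                        System.out.println(new String(msg.getBody()));
                    }
                    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
                }
            });
            consumer.start();
        }
    }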

    Reading the data this way ultimately fell through, so DataX was adopted instead, but the idea is the same.

 

If the data needs no processing at all, DataX can be used directly.


Source: www.cnblogs.com/atliwen/p/11457195.html