MongoDB design skills you should know: a 50% efficiency boost

Normalized or denormalized design?

Consider this scenario: our products and orders look like this.

Product:
{
  "_id": productId,
  "name": name,
  "price": price,
}

Order:
{
  "_id": orderId,
  "user": userId,
  "items": [
    productId1,
    productId2,
    productId3
  ]
}

To query the contents of an order, we first look up the order by orderId, then look up each product by the productIds listed in the order. With this design, a single query cannot return the complete order.
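Concretely, fetching a full order takes two round trips. A minimal sketch with Spring Data MongoDB (Order, Product, and the getItems() accessor are hypothetical mapped classes for illustration):

// Two queries under the normalized design: first the order, then its products.
Order order = mongoTemplate.findById(orderId, Order.class);
List<Product> products = mongoTemplate.find(
        new Query(Criteria.where("_id").in(order.getItems())), Product.class);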

The result of the normalized design is slower reads, but consistency is guaranteed for all orders: product information lives in exactly one place, and every order sees the same data.

Now take a look at the denormalized design.

Order:
{
  "_id": orderId,
  "user": userId,
  "items": [
   {
    "_id": productId1,
    "name": name,
    "price": price,
   },
   {
    "_id": productId2,
    "name": name,
    "price": price,
   },
  ]
}

Here the product information is embedded directly in the order document, so displaying an order takes only a single query.

Denormalization means faster reads but weaker consistency: a change to product information cannot be applied atomically across multiple order documents.

So which one should we usually use? When designing, consider the following questions.

  1. What is the read-to-write ratio?

Product information might be read ten thousand times for every time it is modified. Is a slightly faster write, or guaranteed consistency, worth paying for ten thousand extra reads? Also ask how often the referenced data actually changes: the fewer the updates, the better the candidate for denormalization. Some data changes so rarely that referencing it is hardly worth it at all, for example name, gender, or address.

  2. Is consistency important to you?

If yes, you should normalize.

  3. Do reads need to be fast?
    If you want reads to be as fast as possible, denormalize: there are no references to chase, so that cost simply disappears. Real-time applications should denormalize as much as possible.

Order documents are ideal for denormalization, because product information does not change frequently, and even when it does, old orders do not have to be updated. Normalization offers no real advantage here.

So in this case, we denormalize the order.

Embed point-in-time data

When a product gets a discount or a new picture, the information in the original order should not change. Data like this, tied to a specific point in time, should likewise be embedded.

In the order scenario we discussed above, the address is point-in-time data: if a user updates their personal information, there is no need to change the details of their previous orders.

Do not embed ever-growing data

MongoDB's storage mechanism makes continually appending to an array inefficient. Arrays and embedded objects should stay roughly fixed in size during normal use.

Embedding 20, 100, or even 100,000 subdocuments is not a problem in itself; the key is to embed them up front and leave the document essentially unchanged afterwards. Otherwise, ever-growing documents will slow the system down more than you can stand.

Ever-growing content, such as comments, is better handled as separate documents, as sketched below.
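For example, comments could live in their own collection, each document pointing back at its parent (a sketch; the field names are illustrative):

{
  "_id": commentId,
  "postId": postId,
  "user": userId,
  "content": content,
  "createdAt": date
}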

Pre-allocate space

If you know a document starts out small but will grow to a predictable size, you can optimize with this technique: when first inserting the document, pad it to its eventual size with garbage data, for example a garbage field holding a string as large as the final document, then immediately unset that field.

db.collection.insert({"_id" : 1, /* other fields */ "garbage" : longString});
db.collection.update({"_id" : 1}, {"$unset" : {"garbage" : 1}});

This way, MongoDB will allocate enough space for the document's future growth.

MongoDB stores each document with some reserved padding so that it can grow in place. But once a document grows past its originally allocated space, it has to be moved on disk, which is expensive.

Use arrays for anonymously accessed embedded data

A common question is whether embedded information should be stored as subdocuments or as an array. If you always know exactly which entry you are querying for, use subdocuments; if you will not know the entry's key in advance, use an array. Arrays suit anonymous access: queries that know what the entries look like, but not which key they sit under.

Suppose we want to record the items a game character owns. We could model them as subdocuments:

{
  "_id": 1,
  "items" : {

    "slingshot": {
      "type" : "weapon",
      "damage" : 30,
      "ranged" : true
    },

    "jar" : {
      "type": "container",
      "contains": "fairy"
    }

  }
}

Suppose you want to find all weapons with damage greater than 20. Subdocuments do not support this kind of query; you can only target an item whose key you already know, such as {"items.slingshot.damage" : {"$gt" : 20}}.
If you do not need the key as an identifier, use an array instead:

{
  "_id": 1,
  "items" : [

    {
      "id" : "slingshot"
      "type" : "weapon",
      "damage" : 30,
      "ranged" : true
    },

    {
      "id" : "jar",
      "type": "container",
      "contains": "fairy"
    }

  ]
}

Now {"items.damage" : {"$gt" : 20}} does the job. If you need to match multiple conditions within the same array element, use $elemMatch.
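With Spring Data MongoDB, the same queries might be sketched like this (an injected mongoTemplate is assumed):

// Match damage within any array element.
Query damageQuery = new Query(Criteria.where("items.damage").gt(20));

// Match several conditions within the SAME array element ($elemMatch).
Query weaponQuery = new Query(Criteria.where("items").elemMatch(
        Criteria.where("type").is("weapon").and("damage").gt(20)));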

How to use an auto-increment id instead of ObjectId

Sometimes, due to business requirements or other constraints, you may not want to use ObjectId and would prefer an automatically incremented id instead. MongoDB does not provide this out of the box, so how can we implement it?

You can create a collection that stores the auto-increment counters:

{
    "_id" : ObjectId("59ed8d3df772d09a67eb25f6"),
    "fieldName" : "user",
    "seq" : NumberLong(100064)
}

Here fieldName identifies the collection the sequence belongs to; the next time an id is needed, atomically increment seq and take the new value. The code is shown below.

 public Long getNextSequence(String fieldName, long gap) {
    try {
        Query query = new Query();
        query.addCriteria(Criteria.where("fieldName").is(fieldName));

        Update update = new Update();
        update.inc("seq", gap);
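        // findAndModify applies this $inc atomically; with upsert=true the
        // counter document is created on first use, and returnNew=true hands
        // back the post-increment value, so concurrent callers never collide.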

        FindAndModifyOptions options = FindAndModifyOptions.options();
        options.upsert(true);
        options.returnNew(true);

        Counter counter = mongoTemplate.findAndModify(query, update, options, Counter.class);

        if (counter != null) {
            return counter.getSeq();
        }
    } catch (Throwable t) {
        log.error("Exception when getNextSequence from mongodb", t);
    }
    return gap;
}
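For completeness, here is a minimal sketch of the Counter entity the code above assumes (the collection name "counter" is a guess; match it to your own schema):

@Document(collection = "counter")
public class Counter {

    @Id
    private ObjectId id;

    private String fieldName;

    private Long seq;

    public Long getSeq() {
        return seq;
    }
}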

Do not use indexes everywhere

Indexes are powerful, but a reminder: not every query benefits from one. For example, if a query returns 90% of a collection rather than a few records, it should not use an index.

If such a query uses an index anyway, the result is nearly a full traversal of the index tree: say 40GB of index is loaded into memory, and then, following the index pointers, the documents themselves are loaded from the 200GB collection. The total is 200GB + 40GB = 240GB, which is more work than skipping the index entirely.

So indexes are generally useful when the query returns only a small fraction of the data. As a rule of thumb, once a query returns a large portion of the collection, do not use an index.

If a field is indexed but you do not want a large query to use the index (because using it could be slower), you can force natural order to disable the index. Natural order means "return data in the order it is stored on disk", so MongoDB will not use the index:

db.students.find().sort({"$natural" : 1});

If a query has no index to use, MongoDB does a full collection scan.

Index-covered queries

If you only need a few fields returned and all of them can live in an index, MongoDB can perform an index-covered query: it never follows the pointer from the index to the document, but returns results directly from the index data. Consider the following index:

db.students.ensureIndex({"x" : 1, "y" : 1, "z" : 1});

Now, if a query filters only on indexed fields and asks for only those fields back, MongoDB has no need to load the full document:

db.students.find({"x" : "xxx", "y" : "xxx"}, {"x" : 1, "y" : 1, "z" : 1, "_id" : 0});

Note that _id is returned by default, but it is not part of this index, so MongoDB would have to fetch the document just to supply _id. Exclude it, and results can come from the index alone.

If a query only ever returns the values of a handful of fields, consider placing them all in the index. Even fields that never appear in the filter, like z in the example above, keep the query covered.
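In Spring Data MongoDB the same covered projection might be sketched through the fields() API (the Student class is an assumption):

Query covered = new Query(Criteria.where("x").is("xxx").and("y").is("xxx"));
covered.fields().include("x").include("y").include("z").exclude("_id");
List<Student> rows = mongoTemplate.find(covered, Student.class);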

Tips for AND queries

Suppose a query requires documents to satisfy conditions A, B, and C, where 40,000 documents satisfy A, 9,000 satisfy B, and only 200 satisfy C. If MongoDB checks the conditions in that order, efficiency is poor.

If C is placed first, then B, then A, the B and A checks only ever run over at most 200 documents.

The workload drops sharply. If you know one condition is more selective, put it at the front.

Tips for OR queries

OR queries are the opposite of AND: put the clause that matches the most documents first, because for each clause MongoDB must examine only the documents not already in the result set.
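A sketch of both orderings using Spring Data's Criteria API (field names and match counts are hypothetical):

// AND: most selective condition first (~200 matches), then ~9,000, then ~40,000.
Query andQuery = new Query(new Criteria().andOperator(
        Criteria.where("conditionC").is("c"),
        Criteria.where("conditionB").is("b"),
        Criteria.where("conditionA").is("a")));

// OR: broadest condition first, so later clauses inspect fewer new documents.
Query orQuery = new Query(new Criteria().orOperator(
        Criteria.where("conditionA").is("a"),
        Criteria.where("conditionB").is("b"),
        Criteria.where("conditionC").is("c")));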

Use Repository for single-collection queries

In development, I generally use MongoRepository for simple queries and reach for MongoTemplate when a complex combination is needed; note that the two can be mixed freely, as in the sketch below.
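A minimal sketch of the mix (the User entity and its fields are assumptions for illustration):

// Simple single-collection lookups: a derived repository method is enough.
public interface UserRepository extends MongoRepository<User, String> {
    List<User> findByStatusAndAgeGreaterThan(Integer status, Integer age);
}

// Complex combinations: drop down to MongoTemplate.
List<User> users = mongoTemplate.find(
        new Query(Criteria.where("status").is(1).and("age").gt(18))
                .with(Sort.by(Sort.Direction.DESC, "createTime"))
                .limit(10),
        User.class);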

Converter suggestions

When we write an entity class, some special field types (enums, for example) need converters, and most of the time in both directions: database to object and object to database. If only one field type needs converting, we can write a converter that targets just that property, as in the following example:


@WritingConverter
@Component
public class UserStatusToIntConverter implements Converter<UserStatus, Integer> {

    @Override
    public Integer convert(UserStatus userStatus) {
        return userStatus.getStatus();
    }
}


@ReadingConverter
@Component
public class UserStatusFromIntConverter implements Converter<Integer, UserStatus> {

    @Override
    public UserStatus convert(Integer source) {
        return UserStatus.findStatus(source);
    }
}
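Note that the @Component annotation alone is usually not enough; the converters still have to be registered with Spring Data, for example through a MongoCustomConversions bean (a sketch for Spring Data MongoDB 2.x; earlier versions use CustomConversions):

@Bean
public MongoCustomConversions mongoCustomConversions(
        UserStatusToIntConverter writingConverter,
        UserStatusFromIntConverter readingConverter) {
    return new MongoCustomConversions(Arrays.asList(writingConverter, readingConverter));
}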

One field is fine, but when many fields in a class need conversion, you would end up writing a pile of converters. In that case, write a single class-level converter (AbstractReadingConverter here is a project-specific base class, not part of Spring Data):

@ReadingConverter
@Component
public class OperateLogFromDbConverter extends AbstractReadingConverter<Document, OperateLog> {
  @Override
  public OperateLog convert(Document source) {

      OperateLog opLog = convertBasicField(source);

      if (source.containsKey("_id")) {
          opLog.setId(source.getLong("_id"));
      }

      if (source.containsKey("module")) {

          opLog.setModule(ModuleEnum.findModule(source.getInteger("module")));
      }

      if (source.containsKey("opType")) {
          opLog.setOpType(OpTypeEnum.findOpType(source.getInteger("opType")));
      }

      if (source.containsKey("level")) {
          opLog.setLevel(OpLevelEnum.findOpLevel(source.getInteger("level")));
      }

      return opLog;
  }

  private OperateLog convertBasicField(Document source) {
      Gson gson = new Gson();
      return gson.fromJson(source.toJson(), OperateLog.class);
  }
}

In the code above, Gson handles the conversion of the common fields; without it, every field would have to be checked and copied by hand.

Like and follow so you don't get lost!

Origin: blog.csdn.net/zy353003874/article/details/104628828