CouchDB learning - Maintenance

Based on the official documentation

1 Compaction


Compaction reduces disk usage by removing unused sections of a database or view index file, along with old document revisions. It is a simple operation, similar to what other database management systems (SQLite, etc.) offer.
During compaction, CouchDB creates a new file with a .compact extension and transfers only the live data into it. For this reason, CouchDB first checks the available disk space: it should be at least twice the size of the compacted data file.
When all of the live data has been successfully transferred to the compacted file, CouchDB replaces the target file with the compacted one.

1.1 Database compaction

Database compaction shrinks a database file by removing the unused sections created during updates. Old document revisions are replaced with small pieces of metadata called tombstones, which are used for conflict resolution during replication. The number of stored revisions (and their tombstones) can be configured with the _revs_limit URL endpoint.
Compaction is triggered manually per database and runs as a background task. To start it for a particular database, send an HTTP POST request to the /{db}/_compact sub-resource of the target database:

curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact

On success, an HTTP status code 202 Accepted is returned immediately:

HTTP/1.1 202 Accepted
Cache-Control: must-revalidate
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Wed, 19 Jun 2013 09:43:52 GMT
Server: CouchDB (Erlang/OTP)
{"ok":true}

Although the request does not use a body, the Content-Type header must still be set to application/json. Otherwise, you will get an HTTP 415 Unsupported Media Type response:

HTTP/1.1 415 Unsupported Media Type
Cache-Control: must-revalidate 
Content-Length: 78
Content-Type: application/json 
Date: Wed, 19 Jun 2013 09:43:44 GMT 
Server: CouchDB (Erlang/OTP)
{"error":"bad_content_type","reason":"Content-Type must be application/json"}

While compaction is running, you can get information about it from the database information resource:

curl http://localhost:5984/my_db
HTTP/1.1 200 OK
Cache-Control: must-revalidate 
Content-Length: 246
Content-Type: application/json 
Date: Wed, 19 Jun 2013 16:51:20 GMT 
Server: CouchDB (Erlang/OTP)
{
    "committed_update_seq": 76215, 
    "compact_running": true, 
    "data_size": 3787996, 
    "db_name": "my_db", 
    "disk_format_version": 6, 
    "disk_size": 17703025, 
    "doc_count": 5091, 
    "doc_del_count": 0, 
    "instance_start_time": "0", 
    "purge_seq": 0,
    "update_seq": 76215
}

Note that the compact_running field is true, indicating that compaction is in progress. To track the progress of the compaction, you can query the _active_tasks resource:

curl http://localhost:5984/_active_tasks
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 175
Content-Type: application/json 
Date: Wed, 19 Jun 2013 16:27:23 GMT 
Server: CouchDB (Erlang/OTP)
[
    {
        "changes_done": 44461, 
        "database": "my_db",
        "pid": "<0.218.0>", 
        "progress": 58,
        "started_on": 1371659228, 
        "total_changes": 76215, 
        "type": "database_compaction", 
        "updated_on": 1371659241
    }
]

1.2 View compaction

Views need to be compacted just like databases, but unlike database compaction, view compaction is performed per design document. To start it, send an HTTP POST /{db}/_compact/{ddoc} request:

curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_compact/designname
{"ok":true}

This compacts the view index for the current version of the specified design document. The HTTP response code is 202 Accepted (as with database compaction), and a compaction background task will be created.

1.2.1 View cleanup

View indexes on disk are named after the MD5 hash of the view definition. When you change a view, the old index remains on disk. To remove all outdated view indexes (files named after MD5 hashes of view definitions that no longer exist), trigger a view cleanup:

curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_view_cleanup
{"ok":true}

1.3 Automatic compaction

Although database and view compaction must be triggered manually, they can also be configured to run automatically when various conditions are met. Automatic compaction is configured in CouchDB's configuration files.
The daemons/compaction_daemon is responsible for triggering compaction; it is enabled by default and starts automatically. The trigger conditions are configured in the compactions section of the configuration.
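
A sketch of such a configuration, following the documented format of the compactions section (the fragmentation thresholds and the time window below are example values, not recommendations):

[compactions]
; compact databases and views whose fragmentation exceeds the thresholds,
; but only between 23:00 and 04:00
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}, {from, "23:00"}, {to, "04:00"}]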

2 Performance


With only tens of thousands of documents, you will usually find that CouchDB performs well no matter how you write your code. But once you get into the millions of documents, you need to be a lot more careful.

2.1 Disk I/O

2.1.1 File Size

The smaller the file size, the fewer I/O operations there will be, the more of the file the operating system can cache, the faster replication runs, and the faster backups complete. You should therefore look carefully at the data you are storing. For example, it would be silly to use keys that are hundreds of characters long, but your program would be hard to maintain if you used only single-character keys. Also consider how much data you are duplicating by emitting it into views.

2.1.2 Disk and file system performance

Using faster disks, striped RAID arrays, and modern file systems can all speed up your CouchDB deployment. However, when disk performance is the bottleneck, there is one more way to improve the responsiveness of the CouchDB server. From the Erlang documentation for the file module:
On operating systems with thread support, file operations can be performed in threads of their own, allowing other Erlang processes to continue executing in parallel with the file operations. See the command-line flag +A in erl(1).
Setting this flag to a number greater than zero can keep your CouchDB installation responsive even during periods of very high disk utilization. The easiest way to set this option is through the ERL_FLAGS environment variable. For example, to give Erlang four threads for performing I/O operations, add the following to (prefix)/etc/defaults/couchdb (or its equivalent):

export ERL_FLAGS="+A 4"

2.2 system resource limits

One of the problems that administrators run into as their deployments grow larger are the resource limits imposed by the system and by the application configuration. Raising these limits can allow your deployment to grow beyond what the default configuration supports.

2.2.1 CouchDB configuration options

delayed_commits

Delayed commits allow better write performance on some workloads while sacrificing a small amount of durability. The setting causes CouchDB to wait up to a full second before committing new data after an update. If the server crashes before the header is written, any writes since the last commit are lost. Enable this option at your own risk.
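
A minimal sketch of the setting in local.ini:

[couchdb]
; trade up to one second of durability for better write throughput
delayed_commits = true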

max_dbs_open

In your configuration file (local.ini or similar), set couchdb/max_dbs_open:

[couchdb]
max_dbs_open = 100

This option places an upper bound on the number of databases that can be open at one time. CouchDB reference-counts database accesses internally and closes idle databases as needed. Sometimes it is necessary to raise this limit above the default, for example in deployments where many databases are continuously replicated.

Erlang

Even if you increase the maximum number of connections CouchDB allows, the Erlang runtime system will not permit more than 1024 connections by default. Adding the following directive to (prefix)/etc/default/couchdb (or the equivalent file) will increase this limit (in this case, to 4096):

export ERL_MAX_PORTS=4096

CouchDB versions up to and including 1.1.x create an Erlang Term Storage (ETS) table for each replication. If you are using a CouchDB version earlier than 1.2 and must support many replications, you should also set the ERL_MAX_ETS_TABLES variable. The default is approximately 1400 tables.
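
For example, alongside ERL_MAX_PORTS in the same file (the value here is only an illustration):

export ERL_MAX_ETS_TABLES=10000
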
Note that on Mac OS X, Erlang will not actually raise the file descriptor limit past 1024 (i.e. the value of FD_SETSIZE defined in the system headers).

Maximum open file descriptors (ulimit)

Most *nix operating systems impose various per-process resource limits. How these limits are raised depends on the particular OS version and on how the system is initialized. The default on many operating systems is 1024 or 4096. On a system with many databases or views, CouchDB can reach this limit very quickly.
If your system is set up to use the Pluggable Authentication Modules (PAM) system (as is the case on almost all modern Linux distributions), raising this limit is straightforward. For example, creating a file named /etc/security/limits.d/100-couchdb.conf with the following contents will ensure that CouchDB can open up to 10,000 file descriptors at once:

#<domain>  <type>    <item>  <value>
couchdb    hard      nofile  10000 
couchdb    soft      nofile  10000

If you are using the Debian/Ubuntu sysvinit script (/etc/init.d/couchdb), you also need to raise the limits for the root user:

#<domain>    <type>   <item>  <value>
root         hard    nofile   10000
root         soft    nofile   10000

You may also need to edit the /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive files to add the line:

session required pam_limits.so

if it is not already present.
On systemd-based Linux systems (such as CentOS/RHEL 7, Ubuntu 16.04+, or Debian 8 and later), assuming you start CouchDB via systemd, you must also create the file /etc/systemd/system/<servicename>.d/override.conf containing the following:

[Service]
LimitNOFILE=#######

replacing ####### with the maximum number of file descriptors CouchDB should be allowed to open at once.
If your system does not use PAM, you can usually start CouchDB from a custom script that uses the ulimit command to raise the resource limits. Typical syntax would be something like ulimit -n 10000.
In general, modern UNIX-like systems can handle a very large number of file handles per process (e.g. 100000) without any problem. Do not be afraid to increase your system limits.
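
A minimal wrapper script sketch, assuming the couchdb binary lives at /usr/bin/couchdb (adjust the path for your installation):

#!/bin/sh
# Raise the open file descriptor limit for this process and its children,
# then replace the shell with the CouchDB server.
ulimit -n 10000
exec /usr/bin/couchdb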

2.3 Network

There is latency overhead in generating and receiving each request/response, so as a general rule you should batch your requests. Most APIs have some kind of bulk mechanism, usually by supplying a list of documents or keys in the request body. Be careful about the batch size you pick: a larger batch takes the client more time to encode into JSON and more time to decode from the response. Run some benchmarks against your own configuration and typical data to find the sweet spot; it is likely to lie somewhere between 1 and 10,000 documents.
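
For example, CouchDB's _bulk_docs endpoint accepts a whole batch of documents in a single request (the database name my_db and the document contents here are placeholders):

curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_bulk_docs \
     -d '{"docs": [{"_id": "doc1", "value": 1}, {"_id": "doc2", "value": 2}]}'
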
If you have a fast I/O subsystem, you can also use concurrency: keep several requests/responses in flight at the same time. This hides the latency involved in assembling JSON, the network round trip, and decoding JSON.
Starting with CouchDB 1.1.0, users often report lower document write performance than with older releases. The main reason is that this release ships a newer version of the bundled HTTP server library, MochiWeb, which by default sets the TCP socket option SO_NODELAY to false. This means that small pieces of data sent over a TCP socket, such as the reply to a document write request (or the response when reading a very small document), are not sent to the network immediately: TCP buffers them for a while, hoping it will be asked to send more data through the same socket, and then sends everything at once for better performance. You can disable this TCP buffering behavior via httpd/socket_options:

[httpd]
socket_options = [{nodelay, true}]

2.3.1 Connection Limit

MochiWeb handles CouchDB's requests. The default maximum number of connections is 2048. To change this limit, use the server_options configuration variable; max indicates the maximum number of connections:

[chttpd]
server_options = [{backlog, 128}, {acceptor_pool_size, 16}, {max, 4096}]

2.4 CouchDB

2.4.1 Deleting documents

When you delete a document, the database creates a new revision containing the _id and _rev fields plus a _deleted flag. This revision remains even after database compaction, so that the deletion can be replicated. Deleted documents, like non-deleted documents, can affect view build times, PUT and DELETE request times, and the size of the database, since they increase the size of the B+Tree. You can see the number of deleted documents in the database information (doc_del_count). If your use case creates many deleted documents (for example, if you store log entries, message-queue items, or other short-lived data), you might want to periodically switch to a new database and delete the old one once its entries have expired.

2.4.2 Document IDs

The size of your database files depends not only on the sizes of your documents and views, but also on a multiple of the size of your _ids. The _id does not just appear in the document itself: it, and parts of its content, are repeated throughout the B-tree that CouchDB uses to navigate the file to find the document in the first place. As a real-world example, one user switched from 16-byte IDs to 4-byte IDs, and their database went from 21 GB to 4 GB for 10 million documents (with the raw JSON text going from 2.5 GB to 2 GB).
Inserting IDs in sequential (or at least sorted) order is faster than inserting random IDs. You should therefore consider generating your own IDs, allocating them sequentially, and using an encoding scheme that consumes fewer bytes. For example, base-62 digits (10 numerals, 26 lowercase letters, 26 uppercase letters) can express in 11 characters what would take 16 hexadecimal digits.

2.5 Views

2.5.1 View generation

Generating views through the JavaScript query server is extremely slow whenever the number of documents to process is non-trivial. The generation process will not even saturate a single CPU core, let alone your I/O. The cause is the latency involved between the CouchDB server and the separate couchjs query server, which again demonstrates how important it is to take latency out of your implementation.
You can let view access be "stale", but it is impractical to know when that will give you a fast response versus when updating the view will take a long time. (A database of 10 million documents takes about 10 minutes to load into CouchDB, but generating the view can take about 4 hours.)
In a cluster, "stale" requests are served by a fixed set of shards, in order to present consistent results between requests. This comes with an availability trade-off: the fixed set of shards may not be the most responsive or available within the cluster. If you do not need this kind of consistency (e.g. for a relatively static index), you can specify stable=false&update=false instead of stale=ok, or stable=false&update=lazy instead of stale=update_after.
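
For example, assuming a database my_db with a design document ddoc and a view by_date (all placeholder names):

# old style: accept a possibly outdated index in exchange for a fast response
curl 'http://localhost:5984/my_db/_design/ddoc/_view/by_date?stale=ok'

# new style: no fixed shard set, and do not update the index before responding
curl 'http://localhost:5984/my_db/_design/ddoc/_view/by_date?stable=false&update=false'
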
View information is not replicated; it is rebuilt on each database, so you cannot do view generation on a separate server.

2.5.2 Built-in reduce functions

If you are using a very simple view function that only performs a sum or a count reduction, you can replace the function declaration with the literal string _sum or _count to call the native Erlang implementation instead. This can speed things up enormously, because it cuts out the I/O between CouchDB and the JavaScript query server. For example, for one mailing-list index of approximately 78,000 items, the time to produce the (indexed and cached) view output dropped from 60 seconds to 4 seconds.
Before:

{
    "_id": "_design/foo",
    "views": {
        "bar": {
            "map": "function (doc) { emit(doc.author, 1); }",
            "reduce": "function (keys, values, rereduce) { return sum(values); }"
        }   
    }
}

After:

{
    "_id": "_design/foo",
    "views": {
        "bar": {
            "map": "function (doc) { emit(doc.author, 1); }",
            "reduce": "_sum"
        }
    }
}

3 Backing up CouchDB


CouchDB can create three different types of files at run time:

  • Database files (including secondary indexes)
  • Configuration files (*.ini)
  • Log files (if configured to log to disk)

Below are strategies for ensuring consistent backups of all of these files.

3.1 Database backups

The simplest and easiest approach is to use CouchDB replication to copy your data to another CouchDB installation. You can choose between normal (one-shot) or continuous replication, depending on your needs.
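
For example, a one-shot replication to a backup server via the _replicate endpoint might look like this (hostnames and database names are placeholders):

curl -H "Content-Type: application/json" -X POST http://localhost:5984/_replicate \
     -d '{"source": "my_db", "target": "http://backup-host:5984/my_db_backup", "create_target": true}'
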
However, you can also copy the actual .couch files from the CouchDB data directory (by default, data/) at any time without a problem. CouchDB's append-only storage format for both databases and secondary indexes ensures that this works without issue.
To ensure reliable backups, it is recommended that you back up the secondary indexes (stored under data/.shards) before backing up the main database files (stored under data/shards, as well as the system-level databases in the parent data/ directory). This is because CouchDB automatically handles views/secondary indexes that are slightly out of date on the next read access, but a view/secondary index that is newer than its associated database triggers a full rebuild of the index. This can be a very costly and time-consuming operation, and it can affect your ability to recover quickly in a disaster.
On supported operating systems and storage environments, you can also make use of storage snapshots. These have the advantage of being nearly instantaneous when working with block storage systems (e.g. ZFS, LVM, or Amazon EBS). When using snapshots at the block-storage level, be sure to quiesce the file system with an OS-level utility such as Linux's fsfreeze, if necessary. If in doubt, check your operating system's or cloud provider's documentation for more detail.
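
As an illustrative sketch for LVM on Linux (the mount point and volume names are assumptions; adapt them to your environment):

# quiesce the file system holding the CouchDB data directory,
# take a block-level snapshot, then unfreeze
fsfreeze -f /var/lib/couchdb
lvcreate --size 10G --snapshot --name couchdb-snap /dev/vg0/couchdb
fsfreeze -u /var/lib/couchdb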

3.2 Configuration backups

CouchDB's configuration data is stored in .ini files in the configuration directory (by default, etc/). If the configuration is changed at runtime, the last file in the configuration chain is updated with the changes.
Simply back up the entire etc/ directory to ensure a consistent configuration after restoring from backup.
If no changes are made to the configuration at runtime through the HTTP API, and all configuration files are managed by a configuration management system (such as Ansible or Chef), there is no need to back up the configuration directory.

3.3 Log backups

If CouchDB is configured to log to files, you may want to back up the log files it writes. Any backup solution works for these files.
On UNIX-like systems, if you use log rotation software, a "copy, then truncate" approach is required: the software copies the log file and then truncates the original to zero length. CouchDB does not recognize any signal telling it to close its log file and open a new one. Because of this, and because of differences in file handling semantics, there is no simple log rotation solution under Microsoft Windows other than periodically restarting the CouchDB process.
