1 Compaction
The compaction operation reduces disk space usage by removing unused sections and old data from database and view index files. It is very similar to the vacuum operation found in other database management systems (SQLite, etc.).
During compaction, CouchDB creates a new file with a .compact extension and transfers only the actual data into it. Because of this, CouchDB first checks the available disk space: it should be at least twice the size of the compacted data.
When all the actual data has been successfully transferred to the .compact file, CouchDB replaces the target file with the compacted one.
1.1 Database Compaction
Database compaction compresses the database file by removing unused file sections created during updates. Old document revisions are replaced with small amounts of metadata called tombstones, which are used for conflict resolution during replication. The number of stored revisions (and their tombstones) can be configured using the _revs_limit URL endpoint.
Compaction is a per-database operation that is triggered manually and runs as a background task. To start it for a particular database, send an HTTP POST request to the /{db}/_compact sub-resource of the target database:
curl -H "Content-Type: application/json" -X POST http://localhost:5984/my_db/_compact
On success, an HTTP status 202 Accepted is returned immediately:
HTTP/1.1 202 Accepted
Cache-Control: must-revalidate
Content-Length: 12
Content-Type: text/plain; charset=utf-8
Date: Wed, 19 Jun 2013 09:43:52 GMT
Server: CouchDB (Erlang/OTP)
{"ok":true}
Although the request body is not used, you must still send the request with the Content-Type header set to application/json. Otherwise, you will receive an HTTP 415 Unsupported Media Type response:
HTTP/1.1 415 Unsupported Media Type
Cache-Control: must-revalidate
Content-Length: 78
Content-Type: application/json
Date: Wed, 19 Jun 2013 09:43:44 GMT
Server: CouchDB (Erlang/OTP)
{"error":"bad_content_type","reason":"Content-Type must be application/json"}
Once compaction is up and running, you can get information about it from the database information resource:
curl http://localhost:5984/my_db
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 246
Content-Type: application/json
Date: Wed, 19 Jun 2013 16:51:20 GMT
Server: CouchDB (Erlang/OTP)
{
"committed_update_seq": 76215,
"compact_running": true,
"data_size": 3787996,
"db_name": "my_db",
"disk_format_version": 6,
"disk_size": 17703025,
"doc_count": 5091,
"doc_del_count": 0,
"instance_start_time": "0",
"purge_seq": 0,
"update_seq": 76215
}
Note that the compact_running field is true, indicating that compaction is currently running. To track the progress of the compaction, query the _active_tasks resource:
curl http://localhost:5984/_active_tasks
HTTP/1.1 200 OK
Cache-Control: must-revalidate
Content-Length: 175
Content-Type: application/json
Date: Wed, 19 Jun 2013 16:27:23 GMT
Server: CouchDB (Erlang/OTP)
[
{
"changes_done": 44461,
"database": "my_db",
"pid": "<0.218.0>",
"progress": 58,
"started_on": 1371659228,
"total_changes": 76215,
"type": "database_compaction",
"updated_on": 1371659241
}
]
1.2 View Compaction
Views need to be compacted just like databases, but unlike databases they are compacted per design document rather than for the database as a whole. To start view compaction, send an HTTP POST /{db}/_compact/{ddoc} request:
curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_compact/designname
{"ok":true}
This compacts the view index from the current version of the specified design document. The HTTP response code is 202 Accepted (as with database compaction), and a compaction background task will be created.
1.2.1 View Cleanup
View indexes on disk are named after the MD5 hash of the view definition. When you change a view, the old index remains on disk. To clean up all outdated view indexes (files named after MD5 hashes of view definitions that no longer exist), you can trigger a view cleanup:
curl -H "Content-Type: application/json" -X POST http://localhost:5984/dbname/_view_cleanup
{"ok":true}
1.3 Automatic Compaction
Although database and view compaction must be triggered manually, compaction can also be configured to run automatically, so that databases and views are compacted when various conditions are met. Automatic compaction is configured in CouchDB's configuration files.
The daemons/compaction_daemon setting is responsible for enabling the daemon that triggers compaction; it is started automatically by default. The conditions that trigger compaction are configured in the compactions section.
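As an illustrative sketch (the section names follow the CouchDB 1.2+ configuration layout, and the thresholds shown are examples rather than recommendations), an automatic compaction setup might look like this:

```ini
[compaction_daemon]
; How often (in seconds) to check which databases and views need compacting.
check_interval = 300
; Files smaller than this (in bytes) are never compacted.
min_file_size = 131072

[compactions]
; Compact any database or view whose fragmentation exceeds these thresholds.
_default = [{db_fragmentation, "70%"}, {view_fragmentation, "60%"}]
```

Fragmentation here measures the proportion of the file occupied by old revisions and unused sections, so these rules trigger compaction once most of a file is dead weight.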
2 Performance
No matter how you write your code, CouchDB usually performs well even with tens of thousands of documents. But once you start working with millions of documents, you need to be more careful.
2.1 Disk I/O
2.1.1 File Size
The smaller the file size, the fewer I/O operations there will be, the more of the file the operating system can cache, and the faster replication and backups will run. You should therefore carefully examine the data you are storing. For example, it would be unwise to use keys that are hundreds of characters long, but your program would be hard to maintain if you used only single-character keys. Also carefully consider data that is duplicated by putting it into views.
2.1.2 Disk and File-System Performance
Using faster disks, striped RAID arrays, and modern file systems can all speed up your CouchDB deployment. However, there is one option that can increase the responsiveness of your CouchDB server when disk performance is a bottleneck. From the Erlang documentation for the file module:
On operating systems with thread support, it is possible to let file operations be performed in threads of their own, allowing other Erlang processes to continue executing in parallel with the file operations. See the command-line flag +A in erl(1).
Setting this flag to a number greater than zero can keep your CouchDB installation responsive even during periods of very heavy disk utilization. The easiest way to set this option is through the ERL_FLAGS environment variable. For example, to give Erlang four threads with which to perform I/O operations, add the following to (prefix)/etc/defaults/couchdb (or equivalent):
export ERL_FLAGS="+A 4"
2.2 System Resource Limits
One of the problems administrators run into as their deployments become larger are the resource limits imposed by the system and by the application configuration. Raising these limits can allow your deployment to grow beyond what the default configuration supports.
2.2.1 CouchDB Configuration Options
delayed_commits
Delayed commits allow better write performance for some workloads while sacrificing a small amount of durability. This setting causes CouchDB to wait up to a full second before committing new data after an update. If the server crashes before the header is written, any writes since the last commit are lost. Keep this option enabled at your own risk.
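For reference, this is a sketch of how the option appears in local.ini (the default value has varied between CouchDB releases, so check your own default.ini):

```ini
[couchdb]
; true  = wait up to 1s before committing (faster writes, less durable)
; false = commit every update immediately
delayed_commits = false
```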
max_dbs_open
In your configuration (local.ini or similar), adjust couchdb/max_dbs_open:
[couchdb]
max_dbs_open = 100
This option places an upper bound on the number of databases that can be open at once. CouchDB reference-counts database accesses internally and closes idle databases as needed. Sometimes it is necessary to keep more than the default number open at once, for example in deployments where many databases will be continuously replicated.
Erlang
Even if you increase the maximum number of connections that CouchDB allows, the Erlang runtime system will not, by default, allow more than 1024 connections. Adding the following directive to (prefix)/etc/default/couchdb (or equivalent) will increase this limit (in this case, to 4096):
export ERL_MAX_PORTS=4096
CouchDB versions up to 1.1.x also create an Erlang Term Storage (ETS) table for each replication. If you are using a CouchDB version earlier than 1.2 and must support many replications, you should also set the ERL_MAX_ETS_TABLES variable. The default is approximately 1400 tables.
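Following the same pattern as ERL_MAX_PORTS above, the variable can be raised in (prefix)/etc/default/couchdb (or equivalent); the value 10000 here is only an illustrative choice, not an official recommendation:

```shell
# Raise the ETS table limit for pre-1.2 CouchDB nodes running many
# concurrent replications (10000 is an example value).
export ERL_MAX_ETS_TABLES=10000
```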
Note that on Mac OS X, Erlang will not actually increase the file-descriptor limit past 1024 (i.e. the value of FD_SETSIZE defined in the system headers).
Maximum open file descriptors (ulimit)
Most *nix operating systems impose various resource limits on every process. The methods for increasing these limits depend on the particular OS and version; the default value on many systems is 1024 or 4096. On a system with many databases or views, CouchDB can reach this limit very quickly.
If your system is set up to use the Pluggable Authentication Modules (PAM) system (as is the case on almost all modern Linuxes), increasing this limit is straightforward. For example, creating a file named /etc/security/limits.d/100-couchdb.conf with the following contents will ensure that CouchDB can open up to 10000 file descriptors at once:
#<domain> <type> <item> <value>
couchdb hard nofile 10000
couchdb soft nofile 10000
If you are using the Debian/Ubuntu sysvinit script (/etc/init.d/couchdb), you also need to raise the limits for the root user:
#<domain> <type> <item> <value>
root hard nofile 10000
root soft nofile 10000
You may also need to edit the /etc/pam.d/common-session and /etc/pam.d/common-session-noninteractive files and add the line:
session required pam_limits.so
if it is not already present.
For systemd-based Linux distributions (such as CentOS/RHEL 7, Ubuntu 16.04+, or Debian 8 and later), assuming you start CouchDB via systemd, you must also create a file /etc/systemd/system/<servicename>.d/override.conf containing the following:
[Service]
LimitNOFILE=#######
with ####### replaced by the maximum number of file descriptors CouchDB should be allowed to have open at once.
If your system does not use PAM, a ulimit command is usually available for use in a custom script to launch CouchDB with increased resource limits. Typical syntax would be something like ulimit -n 10000.
In general, modern UNIX-like systems can handle very large numbers of file handles per process (e.g. 100000) without problem. Don't be afraid to increase this limit on your system.
2.3 Network
There is latency overhead in generating and receiving each request and response. In general, you should perform requests in batches. Most APIs have some sort of batch mechanism, usually by supplying lists of documents or keys in the request body. Be careful about what batch size you choose: a larger batch requires more time for the client to encode the items into JSON, and more time to decode the corresponding number of responses. Do some benchmarking with your own configuration and some typical data to find the sweet spot; it is likely to be somewhere between one and ten thousand documents.
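As a hypothetical illustration of batching (the database name my_db and the document bodies are invented for this example), the _bulk_docs endpoint accepts a whole list of documents in a single request:

```shell
# Two documents in one round-trip instead of two separate PUT requests.
PAYLOAD='{"docs": [{"_id": "doc-0001", "value": 1}, {"_id": "doc-0002", "value": 2}]}'
curl -H "Content-Type: application/json" -X POST \
     http://localhost:5984/my_db/_bulk_docs -d "$PAYLOAD"
```

The response is a JSON array with one status object per document, so a failed item can be detected and retried individually.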
If you have a fast I/O system, you can also use concurrency: have multiple requests/responses in flight at the same time. This mitigates the latency involved in assembling JSON, making the network connection, and decoding JSON.
As of CouchDB 1.1.0, users often report lower write performance for documents compared with older releases. The main reason is that this release ships with a newer version of the HTTP server library MochiWeb, which by default sets the TCP socket option SO_NODELAY to false. This means that small data sent over the TCP socket, such as the reply to a document-write request (or the response when reading a very small document), will not be sent to the network immediately: TCP will buffer it for a while in the hope that it will be asked to send more data through the same socket, and will then send all the data at once for better performance. You can disable this TCP buffering behavior via httpd/socket_options:
[httpd]
socket_options = [{nodelay, true}]
2.3.1 Connection Limit
MochiWeb handles CouchDB requests. The default maximum number of connections is 2048. To change this limit, use the server_options configuration variable; max indicates the maximum number of connections:
[chttpd]
server_options = [{backlog, 128}, {acceptor_pool_size, 16}, {max, 4096}]
2.4 CouchDB
2.4.1 DELETE Operation
When you delete a document, the database creates a new revision containing the _id and _rev fields plus the _deleted flag. This revision remains even after database compaction, so that the delete can be replicated. Deleted documents, like non-deleted documents, can affect view build times, PUT and DELETE request times, and the size of the database, since they increase the size of the B+Tree. You can see the number of deleted documents in the database information. If your use case creates many deleted documents (for example, if you are storing short-term data such as log entries or message queues), you might want to periodically switch to a new database and delete the old one once its entries have all expired.
2.4.2 Document IDs
The size of the database file is derived from the sizes of your documents and views, but also from the multitude of your _id values. An _id does not only exist in the document itself: it and some of its content are duplicated in the structures CouchDB uses to navigate the file to find the document in the first place. As a real-world example, one user switched from 16-byte IDs to 4-byte IDs, and a database of 10 million documents shrank from 21 GB to 4 GB (with the raw JSON text going from 2.5 GB to 2 GB).
Inserting with sequential (or at least sorted) IDs is faster than with random IDs. You should therefore consider generating IDs yourself, allocating them sequentially, and using an encoding scheme that consumes fewer bytes. For example, something that would take 16 hexadecimal digits to represent can be done in 11 base-62 digits (10 numerals, 26 lowercase letters, 26 uppercase letters).
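As a quick sanity check of the base-62 arithmetic, the number of base-62 digits needed to cover the key space of 16 hexadecimal digits is ceil(16 · log 16 / log 62), which a one-liner can confirm:

```shell
# ceil(16 * log(16) / log(62)): the number of base-62 digits needed to
# represent any 16-hex-digit value (the ratio is not an exact integer here,
# so int(...)+1 acts as a ceiling).
awk 'BEGIN { printf "%d\n", int(16 * log(16) / log(62)) + 1 }'
```

This prints 11 for the 16-hex-digit case.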
2.5 Views
2.5.1 View Generation
Views generated with the JavaScript query server are extremely slow when there is a non-trivial number of documents to process. The generation process won't even saturate a single CPU, let alone your I/O. The cause is the latency involved in the round-trips between the CouchDB server and the separate couchjs query server, which illustrates how important it is to take latency out of your implementation.
You can allow view access to be "stale", but you need to be sure this is acceptable for your application when it happens, and it is impractical when updating the view takes a long time. (Loading a database of 10 million documents into CouchDB takes about 10 minutes; generating its view can take about 4 hours.)
In a cluster, "stale" requests are serviced by a fixed set of shards, in order to present users with consistent results between requests. This comes with an availability trade-off: the fixed set of shards may not be the most responsive or available within the cluster. If you don't need this kind of consistency (e.g. for a relatively static index), you can specify stable=false&update=false in place of stale=ok, or stable=false&update=lazy in place of stale=update_after.
View information is not replicated; it is rebuilt on each database, so you cannot do the view generation on a separate server.
2.5.2 Built-In Reduce Functions
If you are using a very simple view function that performs only a sum or a count reduction, you can call the native Erlang implementations by simply writing _sum or _count in place of the function declaration. This speeds things up dramatically, since it cuts out the I/O between CouchDB and the JavaScript query server. For example, in one real-world case (generating a view of approximately 78,000 mailing-list items, indexed and cached), the time dropped from 60 seconds to 4 seconds.
Before:
{
"_id": "_design/foo",
"views": {
"bar": {
"map": "function (doc) { emit(doc.author, 1); }",
"reduce": "function (keys, values, rereduce) { return sum(values); }"
}
}
}
After:
{
"_id": "_design/foo",
"views": {
"bar": {
"map": "function (doc) { emit(doc.author, 1); }",
"reduce": "_sum"
}
}
}
3 Backing Up CouchDB
CouchDB creates three different kinds of files at runtime:
- Database files (including secondary indexes)
- Configuration files (*.ini)
- Log files (if configured to log to disk)
Below are strategies for ensuring consistent backups of all of these files.
3.1 Database Backups
The simplest and easiest approach for CouchDB backups is to use CouchDB replication to another CouchDB installation. You can choose between normal (one-shot) or continuous replication, depending on your need.
However, you can also copy the actual .couch files from CouchDB's data directory (by default, data/) at any time, without problem. CouchDB's append-only storage format for both databases and secondary indexes ensures that this will work without issue.
To ensure the reliability of backups, it is recommended that you back up secondary indexes (stored under data/.shards) prior to backing up the main database files (stored under data/shards, as well as the node-local databases at the parent data/ level). This is because CouchDB will automatically handle views/secondary indexes that are slightly out of date by updating them on the next read access, but a view or secondary index that is newer than its associated database will trigger a full rebuild of the index. This can be a very costly and time-consuming operation, and can affect your ability to recover quickly in a disaster situation.
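A minimal sketch of that ordering (the paths assume the default data/ layout described above, BACKUP_DIR is an invented destination, and a real backup would handle errors rather than ignore them):

```shell
BACKUP_DIR=/tmp/couchdb-backup
mkdir -p "$BACKUP_DIR"
# 1. Secondary (view) indexes first...
cp -rp data/.shards "$BACKUP_DIR"/ 2>/dev/null || true
# 2. ...then the main database shard files...
cp -rp data/shards "$BACKUP_DIR"/ 2>/dev/null || true
# 3. ...and finally the node-local databases at the top level of data/.
cp -p data/*.couch "$BACKUP_DIR"/ 2>/dev/null || true
```

Backing up in this order guarantees that a restored index is never newer than its database, which avoids a full index rebuild on first read.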
On supported operating systems and storage environments, you can also make use of storage snapshots. These have the advantage of being near-instantaneous when working with block storage systems such as ZFS, LVM, or Amazon EBS. When using snapshots at the block-storage level, be sure to quiesce the file system with an OS-level utility such as Linux's fsfreeze if necessary. If in doubt, consult your operating system's or cloud provider's documentation for more detail.
3.2 Configuration Backups
CouchDB's configuration system stores data in .ini files under the configuration directory (by default, etc/). If changes are made to the configuration at runtime, the last file in the configuration chain will be updated with the changes.
Simply backing up the entire etc/ directory will ensure a consistent configuration after restoring from backup.
If no changes to the configuration are made at runtime through the HTTP API, and all configuration files are managed by a configuration-management system (such as Ansible or Chef), there is no need to back up the configuration directory.
3.3 Log Backups
If configured to log to a file, you may wish to back up the log files written by CouchDB. Any standard backup solution can be used for these files.
On UNIX-like systems, if you are using log rotation software, a "copy-then-truncate" approach is required: the log file is copied first, and the original is then truncated to zero bytes. CouchDB does not recognize any signal telling it to close its log file and create a new one. Because of this, and because of differences in file handling, there is no simple log-rotation solution under Microsoft Windows other than periodically restarting the CouchDB process.
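With logrotate, for example, the copy-then-truncate method is selected with the copytruncate directive. This is a sketch of such a rule; the log path below is an assumption and should match your own CouchDB log-file setting:

```
/var/log/couchdb/couch.log {
    weekly
    rotate 4
    copytruncate
    compress
    missingok
}
```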