Cloud computing fuels the rise of non-relational databases (reprint)

Abstract: Non-relational databases are attracting attention because they are willing to ignore many of the rules that database administrators learned through long, hard experience. Every Web application designer dreams of building an application that runs on more than one machine and stores all the data for all of its users. To do that, some of the old rules have to be avoided, or even broken.

 

In the old days, when you needed to store data, the solution was simple: install a proper database, enter the data you needed to keep, and let the system organize and manage it for you. The only real decision was which database vendor to choose. That is no longer the case. A flood of new database tools has arrived, giving the word "database" new meanings and breaking with the traditional relational model. Experienced database administrators dismiss them as "toys," yet these toys pose a very serious threat, and the threat is growing. Some of the brash newcomers find the new databases useful, fast, and well suited to the job at hand, and they are happy to brush the risks aside.

Non-relational databases are attracting attention because they are willing to ignore many of the rules that database administrators learned the hard way. The problem is that these rules have now become constraints that make it difficult to build a truly robust database system that runs across more than one computer. And since every Web application designer dreams of building an application that runs on more than one machine and stores all the data for all of its users, some of the old rules have to be avoided, or even broken.

The first casualty is the old JOIN operation. Students are taught, as a strict homework requirement, to normalize their data by splitting one table into many parts. Back when disks were very expensive, normalization seemed especially important. But when the data in question is spread across different machines, JOIN operations become painfully slow. Disk space is cheap now, and many data models gain little from normalization, so JOIN can easily be abandoned.
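To make the trade-off concrete, here is a minimal Python sketch using made-up `users` and `comments` records (hypothetical data, not from the article). The normalized layout needs a join at read time; the denormalized layout copies the author's name into each comment, so a read touches only one collection, at the cost of extra storage and duplicated data that must be kept in sync.

```python
# Normalized layout: two "tables" linked by user_id (illustrative data).
users = {1: {"name": "alice"}, 2: {"name": "bob"}}
comments = [
    {"id": 10, "user_id": 1, "text": "First!"},
    {"id": 11, "user_id": 2, "text": "Nice post"},
]

def comments_with_authors_joined():
    """Read path for the normalized layout: join comments to users."""
    return [
        {"id": c["id"], "text": c["text"], "author": users[c["user_id"]]["name"]}
        for c in comments
    ]

# Denormalized layout: the author's name is copied into every comment,
# so a read touches only this one collection.
denormalized_comments = [
    {"id": 10, "text": "First!", "author": "alice"},
    {"id": 11, "text": "Nice post", "author": "bob"},
]

def comments_with_authors_denormalized():
    """Read path for the denormalized layout: a plain scan, no join."""
    return list(denormalized_comments)
```

Both functions return the same rows. The difference is that renaming a user now means updating every copy of the name, which is the price paid for dropping the JOIN.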

Whether you need immediate consistency or can live with eventual consistency depends on how important the data is. The people who reach for their heart medicine when they hear about the new databases are usually bank programmers, who want to be sure the books balance at the end of every day. After all, a bank's management cannot tolerate account errors caused by a failed database transaction.
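The difference between immediate and eventual consistency can be shown with a toy model of one primary copy and one replica that is synchronized later. This is a hypothetical, in-process simulation for illustration, not the replication scheme of any particular database.

```python
class ReplicatedStore:
    """Toy primary/replica pair with asynchronous (eventual) replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # writes accepted by the primary, not yet copied

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self.pending.append((key, value))

    def read_primary(self, key):
        # Immediately consistent read: always sees the latest write.
        return self.primary.get(key)

    def read_replica(self, key):
        # Eventually consistent read: may be stale until sync() runs.
        return self.replica.get(key)

    def sync(self):
        # Background replication catches the replica up, in write order.
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()
```

A balance read from the replica right after a deposit can be stale, which is exactly what worries the bank programmers; a comment count that is briefly out of date is usually harmless.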

But many modern Web sites will not fall over because a single transaction fails to run. I often see glitches on Facebook, where a comment simply fails to go through, yet nothing important is lost. These sites do not need the meticulous account settlement that banks demand, and they do not need every feature of a relational database. (Some people joke that banks should take the money they spend on Oracle licenses and set up a fund to compensate people who lose money when a transaction fails.)

To better understand this expanding layer of non-relational databases, I picked a few and built several test applications with them. Most of them offer no more than three commands: insert, update, and delete. Some offer clustering while others run on a single server; some exaggerate a little and claim to take over the entire server stack; and some work better with AJAX tools than others. None of them, however, is fit for use in a bank.
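Most of these stores boil down to the three commands named above. A minimal in-memory sketch of that interface (the class and method names are hypothetical, not any product's API) shows how little it takes:

```python
class KeyValueStore:
    """Minimal store exposing only insert, update, and delete (plus get)."""

    def __init__(self):
        self._data = {}

    def insert(self, key, value):
        # Refuse to overwrite: inserting an existing key is an error.
        if key in self._data:
            raise KeyError(f"{key!r} already exists")
        self._data[key] = value

    def update(self, key, value):
        # Only existing keys can be updated.
        if key not in self._data:
            raise KeyError(f"{key!r} does not exist")
        self._data[key] = value

    def delete(self, key):
        # Deleting a missing key is a no-op in this sketch.
        self._data.pop(key, None)

    def get(self, key, default=None):
        return self._data.get(key, default)
```

There are no joins, no multi-key transactions, and no query language here; anything more has to be built by the application.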

In this article I do not cover several other interesting databases, first because of space limitations and second because they differ little from the ones described below. For example, Sun is building a relational database called Derby, designed to be used with the Java virtual machine. Oracle also has its own embedded database, Berkeley DB, which is now called Oracle Embedded Database. Some programmers have even created low-cost libraries that write objects directly to disk. These products also stretch the meaning of the word "database," but I will not discuss them here.

Amazon SimpleDB database

Amazon SimpleDB is one of the most advanced and most cloud-like components of Amazon's cloud computing services. Once you sign up for Amazon's Web services and obtain your access credentials, you can load key-value pairs into SimpleDB through XML-based Web service calls, and as long as you keep paying, it will keep storing the data for you. You do not need to install any software or think about backups; Amazon's Web service hides all of this work behind its wall.

SimpleDB has a two-level structure. The top level is the "domain," and the second level is the "item." Once you have chosen a domain and an item, you write key-value pairs. SimpleDB has a fairly rich API: it can sort data and can even count the items that match a query. You can even write queries that look for values that do not start with a particular string. It is still very different from the SQL we use with Oracle, but it offers more than some of the other low-rent databases, a few of which cannot even sort a result set.
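The domain, item, and attribute layering can be modeled locally with nested dictionaries. The sketch below is a hypothetical stand-in for illustration only: it does not call Amazon's API, and the helper names `put`, `select`, and `count` are invented. It mirrors the three query capabilities mentioned above: sorting, counting matches, and excluding values that start with a given prefix.

```python
from collections import defaultdict

# domain name -> item name -> {attribute name: value}
domains = defaultdict(dict)

def put(domain, item, **attributes):
    """Write key-value pairs into an item, creating it if necessary."""
    domains[domain].setdefault(item, {}).update(attributes)

def select(domain, attr, not_prefix=None, sort=False):
    """Return names of items having `attr`, optionally excluding a prefix."""
    matches = [
        name
        for name, attrs in domains[domain].items()
        if attr in attrs
        and (not_prefix is None or not attrs[attr].startswith(not_prefix))
    ]
    if sort:
        # String comparison, so the order is lexicographic.
        matches.sort(key=lambda name: domains[domain][name][attr])
    return matches

def count(domain, attr, not_prefix=None):
    """Count matching items, analogous to a count query."""
    return len(select(domain, attr, not_prefix))
```

In the real service, values are stored as strings and compared lexicographically, which is why numbers are usually zero-padded before being written.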

SimpleDB is designed to be used together with Amazon's Simple Storage Service (S3), because each value is limited to 1,024 bytes. That is plenty for many strings, but not enough for a lot of content. So you store the content in S3 and keep a pointer to it in SimpleDB.
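The pointer pattern can be sketched as follows. Values that fit within the 1,024-byte limit are stored inline; larger ones go to a blob store and only a reference is kept. Both stores here are plain dictionaries standing in for SimpleDB and S3, and the `s3://example-bucket/` reference format is a hypothetical choice.

```python
import hashlib

MAX_VALUE_BYTES = 1024  # per-value limit described in the text

attribute_store = {}  # stands in for a SimpleDB item's attributes
blob_store = {}       # stands in for an S3 bucket
POINTER_PREFIX = "s3://example-bucket/"  # hypothetical reference format

def store_value(name, value: str):
    """Store small values inline; move large ones to the blob store."""
    encoded = value.encode("utf-8")  # the limit is in bytes, not characters
    if len(encoded) <= MAX_VALUE_BYTES:
        attribute_store[name] = value
    else:
        key = hashlib.sha256(encoded).hexdigest()  # content-addressed key
        blob_store[key] = encoded
        attribute_store[name] = POINTER_PREFIX + key  # keep only the pointer

def load_value(name) -> str:
    """Follow the pointer if the stored value is a reference."""
    value = attribute_store[name]
    if value.startswith(POINTER_PREFIX):
        return blob_store[value[len(POINTER_PREFIX):]].decode("utf-8")
    return value
```

A real design would mark references explicitly rather than rely on a string prefix, since an ordinary short value could happen to start with the same text.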

Operations such as JOIN run into restrictions and require multiple calls. Each query can run for only 5 seconds. A result set holds only 250 items, and each item is likewise limited to 250. Many common operations are restricted like this, and some people have started to wonder whether SimpleDB brings convenience or trouble into their lives.
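Because each query returns only a limited page of results (250 items, per the text), a client has to keep calling with a continuation token until the results run out. The `query_page` function below is a hypothetical local stand-in for the service, not Amazon's API; it only illustrates the loop.

```python
PAGE_SIZE = 250  # maximum items per result set, per the text

ALL_ITEMS = [f"item{i:04d}" for i in range(600)]  # fake data for the demo

def query_page(next_token=None):
    """Hypothetical service call: returns (items, next_token or None)."""
    start = next_token or 0
    page = ALL_ITEMS[start:start + PAGE_SIZE]
    end = start + len(page)
    return page, (end if end < len(ALL_ITEMS) else None)

def query_all():
    """Issue as many calls as needed to collect every matching item."""
    results, token, calls = [], None, 0
    while True:
        page, token = query_page(token)
        results.extend(page)
        calls += 1
        if token is None:
            return results, calls
```

With 600 matching items this takes three round trips. A client-side JOIN multiplies the cost: one such loop per "table," followed by matching in application code.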

Amazon has begun reworking the API in an attempt to provide more and better authentication. As of September 2009, all calls run over SSL, providing security and authentication. Amazon has also tightened its security mechanisms, using more sophisticated hashing algorithms to sign requests. These are only modest improvements, however.
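Signing a request means computing a keyed hash over a canonical form of it, so that the server, which shares the secret, can detect tampering. The sketch below follows the general shape of AWS Signature Version 2 (HMAC-SHA256 over the method, host, path, and a sorted, percent-encoded query string). Treat the exact details as an assumption; the key, host, and parameters in the usage are made-up examples.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote

def canonical_query(params: dict) -> str:
    """Sort parameters by name and percent-encode them (RFC 3986 style)."""
    return "&".join(
        f"{quote(k, safe='-_.~')}={quote(str(v), safe='-_.~')}"
        for k, v in sorted(params.items())
    )

def sign_request(secret_key: str, method: str, host: str, path: str,
                 params: dict) -> str:
    """Return a base64-encoded HMAC-SHA256 signature of the request."""
    string_to_sign = "\n".join(
        [method.upper(), host.lower(), path, canonical_query(params)]
    )
    digest = hmac.new(secret_key.encode("utf-8"),
                      string_to_sign.encode("utf-8"),
                      hashlib.sha256).digest()
    return base64.b64encode(digest).decode("ascii")
```

Sorting the parameters matters: client and server must build byte-for-byte the same string, or the signatures will not match even for an honest request.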

The company has also built more libraries to make the service easier to use. There are packages for the mainstream languages as well as for some less common ones. The documentation is extensive and easy to find. You can usually get started quickly, and the time it takes before you begin storing data has been shortened.

Prices are now quite reasonable as well. Amazon recently cut its storage price from $1.50 to 25 cents per gigabyte. The company is transparent about its charges, which is meant to encourage users to plan their budgets.

Amazon has a long set of terms to cover real-life issues. Many provisions deal with problems you may encounter, and a few of them caught my attention. For example, Amazon states, "We may delete, without liability of any kind, content stored in SimpleDB that has not been accessed in the previous six months." That is easy to accept for people who are just testing the system. Judging from the wording, the aim is to keep Amazon's data center running smoothly.

There are other issues. For example, the terms of use include a long list of prohibited data, such as data that "encourages illegal activity" or that discriminates on the basis of "race, sex, religion, nationality, disability, sexual orientation, or age." That is a problem. Imagine running a website for a church that campaigns against gay marriage. That sounds like discrimination based on sexual orientation. But if you run a campaign for gay marriage that opposes these churches, could that be called religious discrimination?

I feel sorry for the lawyers who will have to analyze and handle these complaints, but at least they can rest easy knowing that the data can be deleted for any reason or for no reason at all. If you use only the free service, Amazon may delete your data without giving you any notice; paying customers are promised 60 days' notice, during which they can deal with their data.

 

Google App Engine

Google App Engine is not, strictly speaking, a database. It is a cloud platform for distributed Python applications, and it works with its own database hidden somewhere inside. At first there was no way to reach the database except through the application layer. But it is not difficult to wrap database commands and data formats in requests, so you can say that App Engine is a database, one that comes attached to embedded programs written in Python.

This extra layer of customization is very useful. Many people complain that the other "toy" databases lack certain operations, so they cannot get the results they want. If you want to add a database feature here, you can develop it yourself in Python. If you want a JOIN operation, you can write one in Python and even customize the memory cache. This is especially useful for Web applications in which users store their own data. If you need to add access controls that limit each user to seeing only what they should see, you can implement those in Python as well.
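Because the application layer is ordinary Python, a missing JOIN or a per-user access rule is just a function. The sketch below works on plain lists of dictionaries rather than App Engine's datastore API, and the `owner` field and record shapes are hypothetical.

```python
def hash_join(left, right, left_key, right_key):
    """Join two lists of dicts on matching key values (an in-memory JOIN)."""
    # Index one side first so each lookup is a dictionary hit.
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [
        {**l, **r}
        for l in left
        for r in index.get(l[left_key], [])
    ]

def visible_to(user, rows):
    """Access control: each user sees only the rows they own."""
    return [row for row in rows if row.get("owner") == user]
```

Building the index on one side first keeps the join roughly linear in the total number of rows, instead of comparing every pair.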

The App Engine datastore is more structured than Amazon's SimpleDB, largely because its structure comes from the Python object model. You do not store key-value pairs but Python objects, which are defined much like a SQL schema. You set a data type for each column and can index the columns you need. Transactions are also tied closely to Python, because each transaction is actually a Python function. That is something of an oversimplification, because these functions come with a series of restrictions (for example, on which data items can be updated). The good news is that Google is creating special transaction methods for data items that abstract some common operations (such as "create" or "update" a row).
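The idea of a schema defined by a Python class, with a transaction expressed as a function, can be illustrated without the App Engine SDK. The sketch below is a plain-Python analogy, loosely in the spirit of the older `db.Model` and `db.run_in_transaction` style but not that API: typed fields are checked on assignment, and a transaction runs a function against a copy of the data that is committed only if the function finishes without raising. `Account` and `transfer` are hypothetical names.

```python
import copy

class TypedField:
    """Descriptor enforcing a column's declared type."""

    def __init__(self, kind):
        self.kind = kind

    def __set_name__(self, owner, name):
        self.name = "_" + name

    def __get__(self, obj, objtype=None):
        return getattr(obj, self.name) if obj is not None else self

    def __set__(self, obj, value):
        if not isinstance(value, self.kind):
            raise TypeError(f"{self.name[1:]} must be {self.kind.__name__}")
        setattr(obj, self.name, value)

class Account:
    """A 'model' whose fields are declared much like a SQL schema."""
    owner = TypedField(str)
    balance = TypedField(int)

    def __init__(self, owner, balance):
        self.owner = owner
        self.balance = balance

def run_in_transaction(store, func, *args):
    """Run func on a copy of store; commit only if func does not raise."""
    working = copy.deepcopy(store)
    result = func(working, *args)  # an exception here leaves store untouched
    store.clear()
    store.update(working)
    return result

def transfer(accounts, src, dst, amount):
    """The transaction body: either both balances change or neither does."""
    accounts[src].balance -= amount
    if accounts[src].balance < 0:
        raise ValueError("insufficient funds")
    accounts[dst].balance += amount
```

This only shows atomicity; it does nothing about concurrent writers, which is where a real datastore's restrictions come from.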

Queries were deliberately made to resemble SQL retrieval; in fact, Google provides its own SQL-like language, GQL. GQL queries are parsed when they are used. App Engine also bundles a set of Python methods for building and processing queries, so you can skip the query-parsing step altogether.
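Building a query from method calls, instead of parsing a query string, can be sketched with a small chainable class. This is a hypothetical illustration of the style (similar in spirit to App Engine's `Model.all().filter(...).order(...)`), operating on in-memory dictionaries rather than the real datastore.

```python
import operator

OPS = {"=": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}

class Query:
    """Chainable query over a list of dicts; no query string to parse."""

    def __init__(self, rows):
        self._rows = rows
        self._filters = []
        self._order = None

    def filter(self, prop_op, value):
        # "age >" -> property "age", operator ">"
        prop, op = prop_op.split()
        self._filters.append((prop, OPS[op], value))
        return self

    def order(self, prop):
        # A leading "-" means descending order.
        self._order = prop
        return self

    def fetch(self, limit=None):
        rows = [r for r in self._rows
                if all(fn(r[p], v) for p, fn, v in self._filters)]
        if self._order:
            desc = self._order.startswith("-")
            key = self._order.lstrip("-")
            rows.sort(key=lambda r: r[key], reverse=desc)
        return rows if limit is None else rows[:limit]
```

For example, `Query(people).filter("age >", 18).order("-age").fetch()` expresses what a GQL string such as `SELECT * FROM Person WHERE age > 18 ORDER BY age DESC` would, without a parser in between.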

It is worth mentioning that the Python stack includes some of the best features that databases do not offer. There is a library for manipulating image files, including cropping and Google's own "I'm Feeling Lucky" photo-enhancement function. You can also store data as Google documents, spreadsheets, and calendar items. App Engine may look at first like just a database, but you can easily work with data from the rest of the Google stack.

Until a few weeks ago, App Engine was still in beta and free to use. It is still free as long as your usage stays within the basic limits. Beyond that, Google's pricing is very similar to Amazon's. Storage is cheaper than Amazon's (12 cents per gigabyte per month), and bandwidth costs the same (10 cents per gigabyte).
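A quick back-of-the-envelope comparison uses the prices quoted in this article: Amazon storage at $0.25 per GB, Google storage at $0.12 per GB per month, and bandwidth at $0.10 per GB on both. The sketch assumes both storage prices are monthly and ignores free quotas and any per-request or CPU charges, so treat it as illustrative only.

```python
from decimal import Decimal

# Prices quoted in the article, in US dollars per gigabyte.
PRICES = {
    "amazon": {"storage": Decimal("0.25"), "bandwidth": Decimal("0.10")},
    "google": {"storage": Decimal("0.12"), "bandwidth": Decimal("0.10")},
}

def monthly_cost(provider: str, stored_gb, transferred_gb) -> Decimal:
    """Storage plus bandwidth cost for one month (quotas ignored)."""
    p = PRICES[provider]
    return (p["storage"] * Decimal(str(stored_gb))
            + p["bandwidth"] * Decimal(str(transferred_gb)))
```

With 10 GB stored and 50 GB transferred, this gives $7.50 for Amazon and $6.20 for Google; the whole gap comes from the storage price.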

Google's terms assign responsibility differently from Amazon's. You must provide a privacy policy that protects your users' data. If your users violate copyright, you must respond to notices under the DMCA (Digital Millennium Copyright Act); if you do not, Google will do it for you. Google reserves the right to remove content at any time for any reason: "You agree that Google has no responsibility or liability for the deletion or failure to store any Content and other communications maintained or transmitted through use of the Service."

These provisions deserve more attention. Google now commits to a 90-day period before your data is removed from its servers when it decides to terminate an account. Other provisions concern the DMCA, and they leave many people puzzled.

Then there is the question of what to do if you decide to leave Google, or Google decides to let you go. Google has released a good development kit that makes it easy to test your application on a local machine. There is no technical obstacle to running your application on your own machine with these tools, except that you lose the cloud-like features. The local data store, for example, does not replicate itself, but the other functions work on your local machine. As before, there are some legal issues, because "the sole purpose of this license is to enable you to use and enjoy the benefit of the Service."

Apache CouchDB database

There is no reason you must use the cloud to enjoy these new services. CouchDB is one of many open source projects building databases that store key-value pairs. It is written in Erlang and supported by the Apache Software Foundation. You can download the source, install it on any machine, compile it, and run it. It costs nothing to use, apart from the money you spend on a server.

CouchDB is similar to Amazon's tool, but it has some special features. You still store key-value pairs in rows, but the values can be any standard JSON (JavaScript Object Notation) data type, such as Booleans and numbers. Values are not limited to 1,024-byte strings; much longer values are allowed. All requests and responses are formatted in JavaScript notation. There are no XML-based Web services, only JSON.

The biggest difference is in how queries are written. In CouchDB, you write map and reduce functions, in JavaScript alone. A simple query may be just a map function with an "if" clause that tests whether a value is larger or smaller than some threshold. Reduce functions come into play only when you want to compute statistics from the output of the map function. Counting the matching rows is easy, but some other cool features may be missing because the functions can only be written in JavaScript. Apart from counting matches, I have yet to find other non-academic uses for reduce. The documentation includes a rather impressive reduce function for merging statistics, but I am not sure CouchDB is the right tool for this sort of job. If you need more sophisticated statistics, it is better to stick with a traditional database and its reporting tools.
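CouchDB's views are written in JavaScript (a map function calls `emit`), so the following is only a Python re-creation of the idea, not CouchDB code. A map function with an `if` test emits a key and a value for each matching document, and a reduce function sums the emitted values to count matches. Because summing also works on partial sums, the reduce can be applied in stages, which is what lets a view engine combine intermediate results. The document fields `type` and `price` are hypothetical.

```python
def map_fn(doc):
    """Emit (key, 1) for every document whose price exceeds 10."""
    if doc.get("price", 0) > 10:
        yield doc["type"], 1

def reduce_fn(values):
    """Count matches; also valid for combining partial counts."""
    return sum(values)

def run_view(docs, group=True):
    """Apply map to every doc, then reduce per key (or over everything)."""
    emitted = [pair for doc in docs for pair in map_fn(doc)]
    if not group:
        return reduce_fn([v for _, v in emitted])
    grouped = {}
    for key, value in emitted:
        grouped.setdefault(key, []).append(value)
    return {key: reduce_fn(values) for key, values in grouped.items()}
```

Counting is the easy case the text describes; a reduce that merges richer statistics must keep this same property of being safe to apply to its own output.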

 

The project has some limitations. Its home page calls it "a distributed, fault-tolerant and schema-free document-oriented database," but you will not get distribution or fault tolerance without some manual intervention. CouchDB has a good-looking AJAX user interface that includes a form for replicating a database. But replication is not automatic.
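Replication in CouchDB has to be triggered explicitly. As a sketch, the request below posts a source and a target to CouchDB's `_replicate` endpoint. The host (CouchDB's default port 5984 on localhost) and the database names are assumptions, and the code only builds the request object without sending it.

```python
import json
from urllib.request import Request

def replication_request(source, target, host="http://localhost:5984"):
    """Build (but do not send) a one-off replication request."""
    body = json.dumps({"source": source, "target": target}).encode("utf-8")
    return Request(
        f"{host}/_replicate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Hypothetical database names; sending would be urllib.request.urlopen(req).
req = replication_request("recipes", "http://backup.example.com:5984/recipes")
```

Something still has to send this request, whether a person at the admin form, a scheduled job, or application code, and that is the manual step the text refers to.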

The CouchDB project plans to add access control and a security model, but neither appears yet in the documentation or in the API. The designers intend to use pure JavaScript instead of SQL or any other language, which is a good idea: to decide whether someone may read a document, you write a JavaScript function that returns true or false.

Using pure JavaScript is not a bad thing. While working with these databases, I soon realized that someone could build a security layer on the client using good encryption. Enforcing security on the client reduces the server's workload; I have written about this in an article on "translucent databases."
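One way to push security to the client, in the spirit of the client-side security idea above, is to store only a keyed one-way hash of an identifying field. The server can still match records by the hash but never sees the original value. This sketch uses HMAC-SHA256 from Python's standard library; the key handling and the field names are hypothetical, and a real system would also encrypt any fields that must be read back.

```python
import hashlib
import hmac

CLIENT_KEY = b"kept-on-the-client-only"  # hypothetical secret, never sent

def blind(value: str) -> str:
    """Keyed one-way hash: same input gives same token; not reversible."""
    return hmac.new(CLIENT_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()

server_rows = []  # what the server stores: no plaintext names

def save_record(name: str, code: str):
    server_rows.append({"name_token": blind(name), "code": code})

def find_records(name: str):
    """The client blinds the query; the server compares tokens only."""
    token = blind(name)
    return [row for row in server_rows if row["name_token"] == token]
```

Using a keyed hash rather than a plain hash matters: without the client's key, the server cannot simply hash a list of likely names and compare.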

This capability is being pushed by some enthusiastic users who treat CouchDB as an entire server stack. J. Chris Anderson, one of the project's leads, wrote an article arguing that CouchDB is all you need for an application server. Business logic and display code are written in JavaScript and downloaded from CouchDB as JSON.

In Anderson's view, when every function can be implemented in JavaScript, there is little point in using Ruby, Python, Java, or PHP on the server. This view may be a bit extreme, because there will always be cases where the client machine cannot be trusted to carry out a function correctly, and we know less about our clients than we would like. Still, lightweight tools like CouchDB get people thinking about how much code is really needed to get a job done.

Persevere database

At first glance, Persevere looks like most of the other databases: you put in keys and values, and it stores them. But that is only the beginning. Persevere offers a complete object hierarchy, letting users add more structure to the database than the tables of earlier, traditional databases allow. Persevere is positioned mainly as a back-end store for JavaScript objects, such as those created by AJAX toolkits like Dojo.

Persevere prides itself on being "schema-free," but what sets it apart from the other databases is that it lets you add a schema whenever you like. The top level of the hierarchy is not called a domain (as in SimpleDB) or a document (as in CouchDB); Persevere calls it an object, and it even lets you create subclasses of objects. If you want to break the schema-free rule, you can insist that certain fields use certain types, but you do not have to. Schema rules are optional.
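Persevere's mix of schema-free objects, optional type constraints, and subclassing can be imitated in a few lines. The sketch below is a hypothetical Python analogy, not Persevere's API: a class may declare types for some fields, subclasses inherit those declarations, and any undeclared field is accepted as-is.

```python
class SchemaObject:
    """Objects accept any fields; declared ones are type-checked."""
    schema = {}  # field name -> required type (empty means schema-free)

    def __init__(self, **fields):
        # Check the declarations of every class in the hierarchy.
        for cls in reversed(type(self).__mro__):
            for name, kind in vars(cls).get("schema", {}).items():
                if name in fields and not isinstance(fields[name], kind):
                    raise TypeError(f"{name} must be {kind.__name__}")
        self.__dict__.update(fields)

class Product(SchemaObject):
    schema = {"price": float}  # opt-in constraint on one field

class Book(Product):           # subclass: inherits the price rule
    schema = {"pages": int}
```

A `Book` can carry arbitrary extra fields, but a non-float price is rejected because the rule declared on `Product` still applies to its subclasses.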

Because Persevere is closely tied to Dojo, it offers plenty of connectivity. You can create a grid or tree widget and link it directly to a JsonRestStore, and the widget then lets you edit the data. You may be able to access a remote database with about 20 lines of JavaScript.

I ran into many small snags, which may have been due to my inexperience rather than to actual bugs. Once I figured out exactly how to do certain operations, things started working correctly. Persevere itself does not demand much special knowledge, but you face the AJAX framework directly. Dojo's documentation is better than that of most AJAX frameworks, but you still have to spend some time learning Dojo to master the complexity hidden beneath Persevere's surface.

Cloud and Cluster

After trying these databases, I can see why some people call them "toys." Their functionality is limited, and even the new features they offer come with constraints that restrict your choices. Many times I realized how much easier life is with the standard features of the SQL world. Many standard SQL-based tools, such as reporting engines, cannot connect to these new databases, yet MySQL and Oracle rely on such tools for many important functions.

Still, this does not mean I won't use these new databases in future projects. They are solid data stores, and their close integration with AJAX makes development easier. Moreover, most Web sites do not need all the features of MySQL or Oracle, and a JOIN-free design still works for many common data structures, including one-to-many and one-to-one relationships, and even foreign keys.

Another question is whether to use a cloud service or build your own cluster. Google and Amazon promise multi-machine service that CouchDB and Persevere cannot match, although the Persevere team says it will add scaling in the future. But it is hard to predict how well Google and Amazon will keep their promises. What happens if Amazon or Google loses a hard disk? What if they lose a whole rack? They have not made very clear commitments or accepted long-term responsibility.

For example, Amazon's terms state, in several places: "We are not responsible for any unauthorized access to, alteration of, or the deletion, destruction, damage, loss or failure to store any of your content, applications, or other data you submit or use in connection with your account or the services."

I'm not blaming Amazon or Google, because no one knows who should ultimately be responsible for lost transactions. It could be any one of the programmers, and it is hard to determine who actually caused the damage. Still, we know that more information is better. Is SimpleDB data stored on RAID disks? Are backups kept in other regions in case an earthquake, hurricane, or fire hits one area? The online backup community is ready to provide details like these, but the cloud providers have no plans to do the same.

All of these concerns make it clear that these are still toy databases. They break the rules of traditional databases and suit applications that can tolerate losing data. But they are fun, fast, and very reasonably priced, and they let you stop worrying about which database vendor to pick and focus on solving your problem instead of on how to perform a JOIN.

 

 

 

Reproduced from: https://www.cnblogs.com/licheng/archive/2010/09/09/1822091.html

Original post: blog.csdn.net/weixin_34346099/article/details/92626975