NoSQL in the Enterprise (reprint)

Source Address: http://www.infoq.com/cn/articles/nosql-in-the-enterprise

English original address: http://www.infoq.com/articles/nosql-in-the-enterprise

 

Introduction

As an enterprise architect, one of my habits is to keep exploring promising new concepts and ideas to see whether they could deliver value to our enterprise customers across industries. In that spirit I have been following the NoSQL space for some time, since before the term (or misnomer?) was even coined. Google lit the first spark by publishing its Bigtable architecture paper, which questioned the widespread belief that the relational database is a silver bullet, and Amazon's Dynamo paper followed. Over the past year we have seen NoSQL gain strong momentum: there are now as many as 25 products / solutions in this space, and NoSQL has reached into every corner of the industry. Against that background I recently took a deeper look at the area to assess exactly how my clients could benefit from the NoSQL movement. Beyond that, I wanted to explore whether this is the right time for enterprises to seriously consider adopting NoSQL.


What is NoSQL - a quick review

Like many others who follow this space, I do not like the anti-SQL connotation of the term NoSQL. I am also not fully satisfied with the existing interpretation of the term as "Not Only SQL". To me, what is under discussion here is not whether to use SQL. (On the contrary, one can still choose a SQL-like query interface, for example one without join support, to interact with these databases, and so reuse existing resources and development skills to manage scalability and maintainability.) The movement is about finding other efficient ways to store and retrieve data, rather than blindly treating the relational database as a one-size-fits-all solution. So I think "Non Relational Database" expresses the idea better.

Whichever name is adopted, "non-relational database" has a catch-all scope, which makes the concept vague (and arguably just as much of a stereotype). This in turn leaves people, especially decision makers in the enterprise, confused about what falls within this scope and what does not and, more importantly, about what it ultimately means for them.

To answer these questions, I try to characterize the inherent nature of "non-relational databases" through the following features.

A so-called "non-relational database" is one that:

  1. Models data on a loosely coupled, extensible logical data model (map, column, document, graph and so on), rather than as tuples in a fixed relational schema.
  2. Is designed for horizontal scaling by distributing data across multiple nodes in line with the CAP theorem (any two of consistency, availability and partition tolerance can be guaranteed). This implies the necessary support for multiple data centers and for dynamic provisioning (transparently adding nodes to and removing nodes from a production cluster), that is, elasticity.
  3. Can persist data on disk, in memory, or both, sometimes with pluggable custom storage.
  4. Supports various "non-SQL" interfaces (typically more than one) for data access.
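
As a rough illustration of the first characteristic, the same customer record can be shaped for three of the logical models named above. Plain Python dictionaries stand in for the stores here. The field names, the `customer:42` key and the `add_field` helper are made up for the example and do not come from any particular product.

```python
import json

# Key-value: the store only sees an opaque value behind a single key.
key_value_store = {
    "customer:42": json.dumps({"name": "Ada", "city": "Pune"}),
}

# Document: the value is a nested structure the store can look inside.
document_store = {
    "customer:42": {
        "name": "Ada",
        "addresses": [{"type": "home", "city": "Pune"}],
    },
}

# Column family: each row key maps to families of sparse columns.
column_family_store = {
    "customer:42": {
        "profile": {"name": "Ada"},
        "address": {"home_city": "Pune"},
    },
}


def add_field(doc_store, key, field, value):
    """Add a new attribute to one record without touching any other record."""
    doc_store[key][field] = value


# A new attribute appears on one record only; no schema migration is involved.
add_field(document_store, "customer:42", "loyalty_tier", "gold")
```

The point of the sketch is only that a new attribute can be attached to a single record, which is what "extensible" means in contrast to a fixed relational schema.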

Image1.JPG

The variations of "non-relational databases" around the four characteristics shown in the figure (data persistence, logical data model, data distribution model and interface) have been described in detail in several recent articles that circulate widely on the Internet. So rather than repeat them at length, I summarize the key directions below with some examples for quick reference:

Interface - REST (HBase, CouchDB, Riak etc.), MapReduce (HBase, CouchDB, MongoDB, Hypertable etc.), Get/Put (Voldemort, Scalaris etc.), Thrift (HBase, Hypertable, Cassandra etc.), language-specific API (MongoDB).

Logical data model - key-value (Voldemort, Dynomite etc.), column family (BigTable, HBase, Hypertable etc.), document-oriented (CouchDB, MongoDB etc.), graph-oriented (Neo4j, Infogrid etc.).

Data distribution model - consistency and availability (HBase, Hypertable, MongoDB etc.), availability and partition tolerance (Cassandra etc.). The combination of consistency and partition tolerance means that availability is lost when some nodes are down and a quorum cannot be formed. Interestingly, no "non-relational database" currently supports this combination.

Data persistence - memory-based (e.g. Redis, Scalaris, Terrastore), disk-based (e.g. MongoDB, Riak etc.), or a combination of memory and disk (e.g. HBase, Hypertable, Cassandra). The type of storage helps us identify what kind of use a solution is suited for. In most cases, however, a combination-based solution turns out to be the best option: it delivers high performance through the in-memory data store and ensures durability by writing to disk once enough data has accumulated.

How it fits with enterprise IT

In today's enterprises, not every use case lends itself intuitively to a relational database or requires strict ACID properties (especially consistency and isolation). In the 1980s and 1990s, most of the data stored in a company's databases consisted of structured business transaction "records" that had to be created and accessed in a controlled manner, but those days are gone. Undeniably, data of that type still exists, and it will continue to be modeled, stored and accessed through relational databases. But what about the uncontrolled explosion of unstructured data and information over the past 15 years, driven by the rise of the Web, e-commerce and social computing? Enterprises do not need a relational database to manage such data, because the characteristics of relational databases do not match the nature and usage of this data.

Image2.jpg

The figure summarizes the emerging patterns of information management in today's web-centric enterprises. "Non-relational databases" are the best choice for dealing with these trends (compared with relational databases): they support unstructured data, scale horizontally through partitioning, support high availability, and so on.

The following practical scenarios support this view:

Log mining - the nodes of a cluster produce server logs, application logs and user activity logs. For troubleshooting in production, a log mining tool that can access log records across servers, correlate them and analyze them is very useful. Building such a custom solution on a "non-relational database" is quite easy.

Social computing insight - many enterprises now provide their users (internal users, customers and partners) with social computing capabilities such as message forums and blogs. Mining this unstructured data is vital for understanding users' preferences and trends and for improving services further. A "non-relational database" can serve this need well.

External data feed aggregation - in many cases enterprises need to consume data from partners. Evidently, even after several rounds of discussion and negotiation, an enterprise has little say over the format of the data its partners send. At the same time, formats change frequently as partners' businesses change. A customized ETL solution built on a "non-relational database" can solve this problem very successfully.

High-volume EAI systems - most enterprises have high-volume message flows passing through their EAI systems (whether product-based or custom-built). For reliability and auditing purposes, the messages flowing through an EAI system usually need to be persisted. For this scenario a "non-relational database" again proves well suited as the underlying data store, given the varying structure of the data from source and target systems and the volume involved.

Front-end order processing systems - with the growth of e-commerce, the volume of orders, applications and service requests flowing in through different channels to retailers, banks, insurance providers, entertainment service providers, logistics providers and so on is enormous. At the same time, because of the behavior patterns associated with each channel, the information structures used differ in each case and call for different types of rules. On top of that, most of this data does not need to be processed and reconciled with the back end in real time. What is needed is that whenever an end user pushes data from anywhere, the request is captured without interruption. A reconciliation system then typically updates the authoritative back-end source system and updates the order status for the end user. This is another scenario where a "non-relational database" can be applied, as the initial store for end-user input. It is an excellent example of an application with high volume, heterogeneous input data types and reconciliation based on eventual consistency.
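
The capture-then-reconcile flow in the order processing scenario can be sketched as follows. This is a minimal illustration under stated assumptions, not a reference design: in-memory dictionaries stand in for the non-relational intake store and for the back-end system of record, and the names `capture_order`, `reconcile`, `intake_store` and `system_of_record` are hypothetical.

```python
import uuid
from collections import deque

intake_store = {}        # stand-in for the non-relational intake store
system_of_record = {}    # stand-in for the back-end source system
pending = deque()        # order ids captured but not yet reconciled


def capture_order(channel, payload):
    """Accept an order of any shape immediately and mark it as received."""
    order_id = str(uuid.uuid4())
    intake_store[order_id] = {
        "channel": channel,
        "payload": payload,      # structure may differ per channel
        "status": "RECEIVED",
    }
    pending.append(order_id)
    return order_id


def reconcile():
    """Later, push captured orders to the back end and update their status."""
    processed = 0
    while pending:
        order_id = pending.popleft()
        document = intake_store[order_id]
        system_of_record[order_id] = document["payload"]
        document["status"] = "CONFIRMED"
        processed += 1
    return processed
```

Between `capture_order` and `reconcile` the two stores disagree, and that window is what the scenario accepts under eventual consistency.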

Enterprise content management services - content management is now used widely across the enterprise for a variety of purposes, spanning functional groups such as sales, marketing, retail and human resources. The challenge enterprises face most of the time is bringing the requirements of different groups, whose metadata differ from one another, together on a common content management service platform. This is another place where a "non-relational database" comes into play.

Mergers and acquisitions - enterprises face huge challenges during mergers and acquisitions, because they need to consolidate systems that serve the same function. A "non-relational database" can help here, either by quickly assembling a temporary common data store or as the future data store architecture that reconciles the existing application schemas of the merged companies.

But how exactly can we describe the benefits that "non-relational databases" bring to an enterprise compared with traditional relational databases? The following key benefits derive from the core features of non-relational databases (discussed in the previous section) and map to the core parameters that any enterprise IT decision is measured against: lower cost, better turnaround time and better quality.

Image3.JPG

Business agility - faster turnaround time

"Non-relational databases" can bring business agility in two basic ways.

  • The schema-free logical data model makes for faster turnaround when adapting to any business change, with minimal impact on existing applications and functions. In most cases the migration effort caused by a change is almost zero.
  • Horizontal scalability provides solid protection when load grows with the number of users, when load varies cyclically, or when application usage patterns change suddenly. A horizontally scalable architecture is also the first step toward SLA-based constructs (such as the cloud), which ensure business continuity as usage conditions change.
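
One common way to spread keys over nodes so that a node can be added without reshuffling everything is consistent hashing. The article does not say which technique any given product uses, so the sketch below is an illustration of the idea only. The class name `HashRing`, the node names and the `order:` keys are made up.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.vnodes = vnodes     # virtual points per node, for an even spread
        self._points = []        # sorted (position, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for i in range(self.vnodes):
            self._points.append((_hash(f"{node}#{i}"), node))
        self._points.sort()

    def remove_node(self, node):
        self._points = [p for p in self._points if p[1] != node]

    def node_for(self, key):
        """Return the node owning the first ring point at or after the key."""
        positions = [pos for pos, _ in self._points]
        idx = bisect.bisect(positions, _hash(key)) % len(self._points)
        return self._points[idx][1]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"order:{i}" for i in range(1000)]
before = {k: ring.node_for(k) for k in keys}

ring.add_node("node-d")          # scale out by one node
after = {k: ring.node_for(k) for k in keys}
moved = sum(1 for k in keys if before[k] != after[k])
```

In this sketch, every key counted in `moved` ends up on `node-d`, and all other keys stay where they were. That is the property that makes it practical to add nodes to a running cluster.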

Better end-user experience - superior quality

In today's enterprise IT, the quality of an application is determined mainly by end-user satisfaction. "Non-relational databases" can help achieve it by addressing the following end-user concerns, which are also the ones that arise most often and are hardest to deal with.

  • "Non-relational databases" offer great opportunities to improve application performance. The core concept of distributing data is to ensure that disk I/O (seek rate) never becomes the bottleneck for application performance, so that performance is determined more by transfer rate. On top of this, most of the solutions support a variety of newer high-speed computing techniques such as MapReduce, sorted columns, Bloom filters, append-only B-trees and memtables.
  • Another important aspect of user satisfaction today is reliability. End users want to be able to reach an application whenever they want to and to carry out their tasks at any time. Application unavailability therefore has to be avoided at any cost. Many modern "non-relational databases" can accommodate and support such stringent availability requirements through eventual consistency.
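
Of the techniques listed in the first point above, the Bloom filter is the easiest to show in a few lines. It answers "might this key be present?" from a small in-memory bit array, so a store can skip a disk read when the answer is no. The sketch below is a generic textbook version with arbitrary default sizes, not the implementation used by any of the products mentioned.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, key):
        """Derive num_hashes bit positions for a key from seeded SHA-256."""
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        """False means definitely absent; True means possibly present."""
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )
```

A `False` result is always correct, while a `True` result can be a false positive, so the caller still has to check the underlying storage in that case.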

Lower total cost of ownership

In today's competitive market, enterprise IT spending is under constant scrutiny, and getting reasonable quality at reasonable cost is what earns praise. In this respect "non-relational databases" have an edge over traditional databases to a certain extent, especially when the volume of data to be stored and processed is large.

  • The basic premise of horizontal scalability ensures that they can run on low-cost machines. This reduces not only hardware cost but also operating costs such as power and maintenance. It also lays the foundation for using next-generation low-cost infrastructure such as the cloud and virtual data centers.
  • In the long run, lower maintenance brings further operating cost advantages. With relational databases this is a definite concern wherever large volumes of data must be stored: tuning a database for high data volumes takes great skill, which means higher cost. In contrast, "non-relational databases" remain available and fast to respond even when data grows substantially. The same goes for indexing and caching. Developers do not have to worry much about hardware, disks, re-indexing and file layout, and can put more of their energy into developing the application.

Challenges in enterprise adoption

Despite all these long-term benefits, enterprises of course have to get past a number of challenges before they embrace "non-relational databases".

Image4.JPG

Leaving aside the resistance from the top that stems from existing mindsets and lack of confidence, I think the most important tactical challenges at present are:

Identifying the right applications / usage scenarios for "non-relational databases"

Although it is theoretically easy to argue that not all enterprise data needs a relational, ACID-based system, the years of coupling between relational databases and enterprise data make it hard to decide to move any of that data onto a non-relational solution. Many IT managers (and others at every level who bear core bottom-line responsibility for applications) do not understand what they would lose, and this worry works against moving away from relational databases. An enterprise IT organization's most valuable asset is its data. So deciding to manage that data with a solution that is less well understood or not yet widely adopted requires not only a change of mindset but also strong support (and a push) from the top.

How do we choose the product / solution that suits us best

Another major challenge is finding the right product / tool to serve as the "non-relational database". As mentioned earlier, the industry today has more than 25 different products and solutions, which differ along the four dimensions described above. Because each product has different characteristics in these four areas, choosing one product to address all needs is particularly difficult. An enterprise may end up using several kinds of non-relational database in different parts of the organization, which may eventually lead to their being ruled out entirely in favor of relational databases for the sake of standardization.

How to get economies of scale

This problem is essentially an offshoot of the previous one. If an organization needs to use multiple non-relational database solutions (because no single one fits every application), then ensuring economies of scale in skills (developers, administrators, support staff), infrastructure (hardware, software licensing, support and consulting costs) and artifacts (common components and services) becomes a big problem. The issue is more severe than with traditional relational database solutions, because most of the time an organization's data stores are run on a shared services model.

How do we ensure the portability of solutions

Given how "non-relational databases" are evolving, it is reasonable to assume that the space will see many changes in the next few years, such as vendor consolidation, feature improvements and standardization. So a better strategy for enterprises is not to bet on one particular product / solution, so that they can switch later to a more flexible and better proven one. Since today's non-relational products / solutions are mostly proprietary, IT decision makers have to take portability seriously before trying out a "non-relational database". This is purely to protect existing investments.

How do we get the right kind of product support

Today very few "non-relational databases" come with support offerings from external organizations. Even where such offerings exist, they cannot be compared with what Oracle, IBM or Microsoft provide. This remains a big concern for enterprise decision makers, particularly for data recovery, backup and restoring specific data, since many "non-relational databases" do not provide robust and easy-to-use mechanisms in these areas.

How do we estimate the overall cost

Compared with the heavyweight relational databases, "non-relational databases" usually publish less data on performance and scalability. I have not found TPC-style benchmarks or comparable figures for this space. This leaves enterprise decision makers without a sense of direction, because they cannot tell how much they will need to spend on hardware, software licensing, infrastructure management, support and so on. The lack of data on which to base a budget estimate becomes a major obstacle. As a result, at the project start-up phase decision makers will in most cases choose a familiar relational database solution.

Sometimes, even when such numbers are available, they are not sufficient to build a TCO model and compare the overall cost (Capex + Opex) of a relational data store against a non-relational one. Horizontal scaling usually requires a large number of machines (along with software licensing and support costs), which at first glance can look alarming next to vertical scaling, unless a comprehensive TCO-based comparison shows that the resulting benefits are sustainable.

Some thoughts on how to adopt NoSQL

Does this mean that enterprises should, for now, take a wait-and-see attitude toward the NoSQL movement? Not at all. Admittedly, "non-relational databases" are not yet mature enough for widespread adoption. But their potential as a future backbone of the enterprise cannot be ignored, especially since in the near future more enterprises will be handling large volumes of semi-structured / unstructured, eventually consistent data rather than relatively small volumes of strictly structured ACID data. So what matters now is at least to work on changing the mindset of the key decision makers in the enterprise, so that they understand where enterprise data processing calls for a "non-relational database". As part of that, it is worthwhile to take some incremental steps to apply "non-relational databases" to key aspects of enterprise IT (technology, people and process). That way, the whole range of issues summarized above can be addressed in a slow and steady manner.

Image5.JPG

Adopt one product / solution

The choice on the market today is very diverse, and products differ according to which aspects of "non-relational databases" they focus on. At the same time, an enterprise's application scenarios may call for different sets of characteristics. However, using different solutions for different applications / usage scenarios is not appropriate for an enterprise from an economies-of-scale point of view. So it is best to settle on one specific product / solution based on the needs of the target applications. Note that most solutions involve some compromise on features: some features may be available in other products, and some may only have a tentative place on the roadmap. Because most products will keep maturing in the near future, they may well come to offer different configurations for different needs. So as long as an existing solution fits most of the current needs, it can be adopted as a starting point.

Rules of thumb for selecting a product / solution

  • Support for the logical data model that is needed should be given a higher weight. It essentially determines how flexibly the solution can adapt to different business requirements, now and in the future.
  • Investigate whether the physical data model supported by the product is appropriate, and on that basis assess the solution's horizontal scalability, reliability, consistency and partitioning against what is needed. This also indicates what backup and recovery mechanisms are possible.
  • Interface support needs to be aligned with the enterprise's standard operating environment. Since these products support a variety of interfaces, this is usually easy to satisfy.
  • As long as the product supports horizontal scalability, the choice of persistence model matters less.

Here is a table of "non-relational databases" that is a good starting point for enterprises that are now seriously considering adoption. To stay close to the reality of the enterprise, the key criteria I used to select this subset from the 25+ available products are:

  1. First and foremost, enterprise applications need support for data structures of some complexity. Otherwise the burden of managing that complexity in the application becomes very large. I think a model somewhere between pure key-value and the relational model is the reasonable middle ground. On this basis I excluded products such as Voldemort and Tokyo Cabinet from my list.
  2. The second criterion is support for high data volumes through low-cost horizontal sharding / partitioning. Without such support a solution is no different from any relational database. So products such as Neo4j (despite its rich graph-based model), Redis and CouchDB are filtered out of the list for the moment.
  3. As the last criterion, I would want some level of enterprise-grade commercial support to be available before considering a product. Otherwise, whom do I turn to when a problem occurs in production? On this basis I have excluded some current star products such as Cassandra (although it is quite likely that Rackspace or Cloudera will offer support for it in the near future, since it is already used in production at companies such as Twitter, Digg and Facebook).

With these filtering criteria I can narrow the list down to the products that currently suit the enterprise: MongoDB (the next version will provide sharding support), Riak, Hypertable and HBase. The following table summarizes the main characteristics of these four products. An enterprise can choose among them based on its own situation, picking the one whose characteristics best suit its needs.

| Characteristic | MongoDB | Riak | Hypertable | HBase |
|---|---|---|---|---|
| Logical data model | Rich document, with support for embedded documents | Rich document | Column family | Column family |
| CAP support | CA | AP | CA | CA |
| Dynamic addition and removal of nodes | Supported (coming shortly, in the next release) | Supported | Supported | Supported |
| Multi-DC support | Supported | Not supported | Supported | Supported |
| Interface | A variety of language-specific APIs (Java, Python, Perl, C# etc.) | JSON over HTTP | REST, Thrift, Java | C++, Thrift |
| Persistence model | Disk | Disk | Memory plus disk (tunable) | Memory plus disk (tunable) |
| Relative performance | Better (written in C++) | Best (written in Erlang) | Better (written in C++) | Good (written in Java) |
| Commercial support | 10gen.com | Basho Technologies | Hypertable Inc | Cloudera |

Data access abstraction

It is essential to create a separate abstraction layer for data access to the "non-relational database". This brings several benefits. First, application developers are completely insulated from the details of the underlying solution, which helps on the skills side. It also makes it easy to change the underlying solution in the future. And it is a way to meet the standardization needs of multiple applications (for example, a SQL dialect stripped of complex features such as Join and Group by).
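
One possible shape for such a layer is sketched below. The interface `RecordStore`, the in-memory backend and the `OrderRepository` class are hypothetical names chosen for illustration. A real backend would wrap the client library of whichever product is selected.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class RecordStore(ABC):
    """The only data access contract application code is allowed to see."""

    @abstractmethod
    def put(self, key: str, record: Dict[str, Any]) -> None: ...

    @abstractmethod
    def get(self, key: str) -> Optional[Dict[str, Any]]: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...


class InMemoryStore(RecordStore):
    """Stand-in backend; a product-specific adapter would replace it."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, Any]] = {}

    def put(self, key, record):
        self._data[key] = dict(record)      # copy to avoid shared state

    def get(self, key):
        record = self._data.get(key)
        return dict(record) if record is not None else None

    def delete(self, key):
        self._data.pop(key, None)


class OrderRepository:
    """Application-facing code depends on RecordStore, not on a product."""

    def __init__(self, store: RecordStore) -> None:
        self._store = store

    def save(self, order_id: str, order: Dict[str, Any]) -> None:
        self._store.put(f"order:{order_id}", order)

    def find(self, order_id: str) -> Optional[Dict[str, Any]]:
        return self._store.get(f"order:{order_id}")
```

Swapping the underlying product then means writing one new `RecordStore` subclass and leaving `OrderRepository` and its callers unchanged.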

Create models for performance and scalability

Regardless of the solution chosen, modeling performance and scalability with standard techniques (such as queuing network models and layered queuing networks) is highly recommended. It provides the basic data needed for server sizing, topology, overall software license costs, operations management and so on. This essentially becomes the main reference data for all budget planning and helps with decision making.
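
As a very small example of what such a model produces, the sketch below uses the single-server M/M/1 queue, where the mean response time is 1 / (service rate - arrival rate). It is far simpler than the queuing networks mentioned above. Treating each node as an independent M/M/1 queue with evenly split load is an assumption made only to keep the example short, and the function names are made up.

```python
import math


def mm1_response_time(arrival_rate: float, service_rate: float) -> float:
    """Mean response time (seconds) of an M/M/1 queue, rates in requests/s."""
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate >= service rate")
    return 1.0 / (service_rate - arrival_rate)


def nodes_needed(total_arrival_rate: float,
                 service_rate: float,
                 target_response_time: float) -> int:
    """Smallest node count that meets the target when load is split evenly.

    Requires 1 / (mu - lambda / n) <= T, i.e. n >= lambda / (mu - 1 / T).
    """
    headroom = service_rate - 1.0 / target_response_time
    if headroom <= 0:
        raise ValueError("target is unreachable at this service rate")
    return max(1, math.ceil(total_arrival_rate / headroom))
```

For instance, `nodes_needed(5000.0, 1000.0, 0.005)` estimates the nodes required to serve 5000 requests per second at a 5 ms mean response time when one node can serve 1000 requests per second. A node count of this kind then feeds the hardware and licensing estimates.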

Build explicit redundancy

There is no way to prevent data loss other than replicating data to a backup server. Although many non-relational databases provide automatic replication, there is still the risk of the master node being a single point of failure. So it is best to use a secondary backup node and to have data recovery procedures and automated recovery scripts ready. To this end, one should fully understand the physical data model of the target solution, identify the possible recovery mechanisms, and evaluate these options against overall requirements and business practice.

Build a common data services platform

Just as with shared services for relational databases, one can build common data services on non-relational databases to promote economies of scale in infrastructure and support. This also helps with further evolution and future change. It can be the ultimate goal on the wish list, reached through medium- or long-term effort as maturity grows. Establishing the vision at the outset, however, helps in making the right decisions along the way.

Grow technical expertise within the enterprise

Every organization has some people who are enthusiastic about learning new and unconventional things. Form a group of such people (full-time or part-time) to follow developments in this space closely, understand the issues and challenges, think ahead, and provide direction and assistance to projects that use these technologies. The group can also help decision makers cut through the hype and clear up doubts by providing a perspective grounded in real data.

Build a relationship with the product community

Once a product has been selected, building a good relationship with its community greatly benefits both sides. Many non-relational databases currently have very active communities that are very willing to help. Good cooperation between the enterprise and the community is a win-win. If the enterprise knows about problems and solutions in advance, it can make decisions about particular features or versions with confidence. In turn, the enterprise can influence the product roadmap in ways that benefit both itself and the community. The community, for its part, gets feedback from real-world problems with which to enrich and improve the product. Success stories from large enterprises also help it stay ahead.

Move forward iteratively

Given the current level of maturity of non-relational databases, the lowest-risk adoption strategy is to follow an iterative development methodology. Building a common data services platform and standardizing a data access abstraction cannot be done overnight. Rather, these goals are better achieved through iteration and refactoring. When adopting less mature technology, changing solutions midway should not come as too much of a surprise. At the same time, an agile way of looking at things helps foster an open attitude that keeps both management and implementers on board.

An important part of iterating, however, is defining a decision matrix with clear criteria: for example operating guidelines (with examples) for deciding whether an application's object model is better suited to a relational or a non-relational store, guidance for infrastructure planning, a list of mandatory test cases, and so on.

Conclusion

The biggest challenge in enterprise adoption of non-relational databases is changing the mindset of decision makers, convincing them that not all data / objects are suited to relational databases. The best way to prove this is to pick appropriate use cases in which to try a non-relational database, and then to confirm that in the proper context it is a more effective solution than a relational database. Find some projects that are "not business-critical" (but with immediate visibility) and well suited to non-relational databases. The success of these projects (or even their failure) helps change the mindset. It also helps the organization learn how to keep adopting non-relational databases differently and better. Such baby steps are worth the effort and investment if an enterprise wants to use "non-relational databases" to reshape its information management systems in the future.

About the Author

Sourav Mazumder is currently a chief technology architect at Infosys Technologies. He has over 14 years of experience in information technology. As a key member of the Infosys technology advisory group, Sourav has provided services to major Infosys customers in the US, Europe, Australia and Japan across insurance, telecommunications, banking, retail, security, transportation, and architecture, engineering and construction. He has been involved in technical architecture and roadmap definition for Web projects, SOA strategy, internationalization strategy definition, UI componentization, performance modeling, scalability analysis and unstructured data management. Working on Finacle, Infosys's core banking product, has also given him rich product development experience. At Infosys, Sourav was also involved in developing a reusable J2EE framework and in defining software engineering methods for the architecture and development of custom applications. His experience also includes work on governance and on compliance frameworks for development projects.

Sourav is an iCMG certified software architect and a TOGAF 8 certified practitioner. He recently spoke on globalization at the LISA conference in Berkeley. His latest white paper on SOA has been very popular in the community.

Sourav's current interests are NoSQL, Web 2.0, governance, performance engineering and globalization.

 

Reproduced from: https://www.cnblogs.com/licheng/archive/2010/09/09/1822068.html

Origin blog.csdn.net/weixin_33779515/article/details/92626933