Comparison of Elasticsearch, MongoDB and Hadoop

Elasticsearch application scenarios:
1. If you have millions of documents that need to be located by keyword, Elasticsearch is clearly the best choice. And if your documents are JSON, you can even use Elasticsearch as a kind of lightweight "NoSQL database". Elasticsearch is not a full database engine, however, and it is not particularly strong at complex queries and aggregations, although the statistical facet can provide some summary statistics for a given query; facets in Elasticsearch exist mainly to support faceted browsing. (Elasticsearch has since added an aggregations framework that supersedes facets; a minimal query sketch follows this list.)
2. If you are looking up a small collection of documents matching a keyword query, and you want faceted navigation over those results, Elasticsearch is again the best choice. If you need to perform more complex calculations, run server-side scripts against the data, or easily run MapReduce jobs, then MongoDB or Hadoop come into play.
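For illustration, here is a minimal sketch of the keyword-query-plus-aggregation pattern described above, using the official Python client (elasticsearch-py, 8.x-style API). The index name ("articles") and field names ("body", "title", "category") are hypothetical and would have to match your own mapping.

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

resp = es.search(
    index="articles",
    query={"match": {"body": "keyword to find"}},  # full-text keyword search
    aggs={  # aggregation: count matching documents per category (faceted-style breakdown)
        "by_category": {"terms": {"field": "category.keyword"}}
    },
    size=10,
)

# Top hits by relevance score
for hit in resp["hits"]["hits"]:
    print(hit["_score"], hit["_source"].get("title"))

# Per-category counts for faceted navigation
for bucket in resp["aggregations"]["by_category"]["buckets"]:
    print(bucket["key"], bucket["doc_count"])
```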
MongoDB application scenarios:
MongoDB is a NoSQL database designed to be highly scalable, with automatic sharding and a number of additional performance optimizations built in. It is a document-oriented database that stores data as JSON (strictly speaking, BSON, which extends JSON with additional native data types such as dates and binary values). MongoDB provides a text index type to support full-text search, so the line between Elasticsearch and MongoDB does blur when it comes to basic keyword search over a collection of documents.
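As a minimal sketch of that text-search capability, the following uses PyMongo to create a text index and run a keyword query against it; the database, collection, and field names are hypothetical.

```python
from pymongo import MongoClient, TEXT

client = MongoClient("mongodb://localhost:27017")
articles = client["demo"]["articles"]

# Create a text index on the "body" field (a collection can have one text index).
articles.create_index([("body", TEXT)])

# Basic keyword search, sorted by the built-in text relevance score.
cursor = articles.find(
    {"$text": {"$search": "keyword to find"}},
    {"score": {"$meta": "textScore"}, "title": 1},
).sort([("score", {"$meta": "textScore"})])

for doc in cursor:
    print(doc.get("title"), doc["score"])
```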
1. Where MongoDB surpasses Elasticsearch is in its support for server-side JavaScript, aggregation pipelines, MapReduce, and capped collections. With aggregation pipelines, you process the documents in a collection through a sequence of pipeline stages, each stage able to produce entirely new documents or to remove documents from the final result. This is a powerful way to filter, process, and transform data as it is retrieved. MongoDB can also run map/reduce jobs over a collection, using custom JavaScript functions for the map and reduce phases, which gives it great flexibility to perform almost any computation or transformation on the selected data.
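Here is a minimal PyMongo sketch of an aggregation pipeline in that spirit; the collection and field names ("orders", "status", "amount", "customer") are hypothetical.

```python
from pymongo import MongoClient

orders = MongoClient("mongodb://localhost:27017")["demo"]["orders"]

pipeline = [
    {"$match": {"status": "shipped"}},                 # filter documents
    {"$group": {"_id": "$customer",                    # transform: one new document per customer
                "total": {"$sum": "$amount"},
                "orders": {"$sum": 1}}},
    {"$sort": {"total": -1}},                          # order the resulting documents
    {"$limit": 5},                                     # keep only the top five
]

for row in orders.aggregate(pipeline):
    print(row["_id"], row["total"], row["orders"])
```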
Another extremely powerful MongoDB feature is the capped collection. With it, you define a maximum size for a collection; you can then write to it blindly, and older entries roll over automatically, which makes it convenient for capturing logs and other streaming data for later analysis.
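A minimal sketch of a capped collection used for log-style data, again with hypothetical names and an arbitrary 1 MB cap:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["demo"]

# Fixed-size collection: once "events" reaches ~1 MB, the oldest entries roll over.
if "events" not in db.list_collection_names():
    db.create_collection("events", capped=True, size=1024 * 1024)

db["events"].insert_one({"level": "info", "msg": "service started"})

# Documents come back in insertion order, which is convenient for tailing a log stream.
for event in db["events"].find():
    print(event["msg"])
```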

Hadoop application scenarios:
MapReduce is already supported in-place by MongoDB, so is there a scenario that truly calls for Hadoop, where MongoDB is merely adequate?
There is. Hadoop is the original home of MapReduce, and it provides the most flexible and powerful environment for processing large amounts of data. Without a doubt, it can handle scenarios that neither Elasticsearch nor MongoDB can.
To see this more clearly, consider how Hadoop separates storage from computation by abstracting storage through HDFS. With data in HDFS, any job can operate on it, either through the core MapReduce API or, using Hadoop Streaming, in essentially any programming language. With Hadoop 2 and YARN, even the core programming model is abstracted away: you are no longer tied to MapReduce, and you can, for example, run MPI-based jobs on Hadoop via YARN.
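As a minimal sketch of Hadoop Streaming, the classic word count can be written as two small Python scripts that read from standard input and emit tab-separated key/value pairs; they would be wired into a job with the hadoop-streaming jar's -input, -output, -mapper, and -reducer options (the script and path names here are hypothetical).

```python
#!/usr/bin/env python3
# wordcount_mapper.py -- Hadoop Streaming mapper: emit (word, 1) for every word seen.
import sys

for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```

```python
#!/usr/bin/env python3
# wordcount_reducer.py -- Hadoop Streaming reducer: sum the counts for each word.
# Streaming sorts mapper output by key, so lines for the same word arrive contiguously.
import sys

current, count = None, 0
for line in sys.stdin:
    word, value = line.rstrip("\n").split("\t", 1)
    if word != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = word, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```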
Additionally, the Hadoop ecosystem provides an interlocking set of tools, built on top of HDFS and core MapReduce, for querying, analyzing, and processing data. Hive provides a SQL-like language so that business analysts can query the data with a familiar syntax. HBase provides a column-oriented database on top of Hadoop. Pig and Sizzle provide two further, distinct programming models for querying Hadoop data. For data stored in HDFS, you can fold Mahout's machine learning capabilities into your toolset, and with RHadoop you can use the R statistical language directly to run advanced statistical analyses over Hadoop data.
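To give a flavour of Hive's SQL-like interface, here is a minimal sketch that queries HiveServer2 from Python via PyHive; the host, port, and the table/column names are assumptions.

```python
from pyhive import hive

conn = hive.connect(host="localhost", port=10000)
cursor = conn.cursor()

# HiveQL: familiar SQL syntax compiled into jobs over data stored in HDFS.
cursor.execute(
    "SELECT category, COUNT(*) AS docs "
    "FROM articles GROUP BY category ORDER BY docs DESC LIMIT 10"
)

for category, docs in cursor.fetchall():
    print(category, docs)
```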
So, although Hadoop and MongoDB have partially overlapping use cases and share some useful features (such as seamless horizontal scaling), each still has its own specific scenarios. If you just need keyword search and simple analysis, Elasticsearch will do the job; if you need to query documents and run more complex analysis over them, MongoDB is a good fit; if you have huge volumes of data that require many kinds of complex processing and analysis, Hadoop provides the widest range of tools and the most flexibility.
An eternal truth is to choose the tool that best fits the task at hand. In the big data space, technologies emerge in an endless stream and the boundaries between them are blurred, which makes choosing difficult. As we have seen, each specific scenario has a most suitable technology, and the differences matter. The good news is that you are not limited to a single tool or technique: depending on the scenario you face, you can build an integrated system. For example, Elasticsearch and Hadoop are known to work well together, with Elasticsearch handling fast keyword queries and Hadoop jobs handling the heavier analytics.
Ultimately, thorough research and careful analysis are what identify the most suitable options. When choosing any technology or platform, verify it carefully to understand which scenarios it suits, where it can be optimized, and what trade-offs it demands. Start with a small proof-of-concept project, and once it is validated, roll the technology out to the real platform and scale it up step by step.
By following these tips, you can successfully navigate big data technologies and be rewarded accordingly.
