Elastic has changed its open source license: what should users do?

Author: Tomoko AI
On January 15, Shay Banon, CEO of Elastic, a leading company in search and real-time data processing, unexpectedly published an announcement: the open source licenses of Elasticsearch and Kibana are about to change.
According to reports, the license change does not affect most community users who use the default distribution for free; the main restriction falls on cloud service providers.
This is not the first time Elastic has changed its open source license. According to experts, it also did so once in 2018.
Others in the open source field did likewise: the database vendors MongoDB and Redis Labs, the graph database Neo4j, and others modified their open source licenses in 2018 to escape the "blood-sucking" dilemma they face when commercializing their databases.
Elastic's license change will inevitably have a significant impact on the production systems of the many users who run Elasticsearch and Kibana, especially public cloud users. How to avoid that impact and keep applications running stably and continuously is a major problem for a large number of Chinese users.
More broadly, once open source software is widely adopted, its users have to face the considerable risks that a license change brings.

1. China has its own comprehensive big data search engine: Transwarp New Search

Elasticsearch is well known as one of the world's most popular engines for data search and real-time analysis. In 2018 alone it was downloaded more than 225 million times, and many companies around the world use it in some form.
The first version of Elasticsearch was released in 2010. After more than ten years of updates and iteration, the Elastic Stack ecosystem has steadily matured, it has more and more users in China, and the number of developers building on Elasticsearch keeps growing.
Developers use the Elastic Stack to build flexible software for scenarios such as search, logging, security protection, operations metrics monitoring, and database acceleration, in the Internet and software industry, the financial industry, and others.
In recent years, cloud service providers have taken open source products, modified the code, and built managed (paid) service offerings on top of them. The modified code, however, is not made available as open source.
At the same time, the commercial activity of cloud service providers hinders the commercialization efforts of open source software companies. How to become profitable and develop healthily under an open source license has become the biggest challenge for these companies.
For this reason, Elastic has made a major change to the licenses of Elasticsearch and Kibana, moving from the open source Apache 2.0 license to a dual license under the Server Side Public License (SSPL) and the Elastic License.
MongoDB changed its license as early as 2018, adopting the SSPL to protect its open source code from being used by cloud service providers to build their own SaaS/DBaaS products.
There is no doubt that Elastic's license change will have a large impact on users, especially on applications hosted in the cloud. Many countries have listed license changes to open source software as a major risk to the development of the software industry, and the change has caused concern among affected users worldwide.
Fortunately, with strong support from national policy, China's IT application innovation (Xinchuang) industry continues to grow. For comprehensive big data search, China now has domestically developed, controllable products of its own.
Transwarp, a leading Chinese vendor of basic software for big data and AI, has launched Transwarp New Search, a comprehensive big data search engine that it positions as a complete replacement for Elasticsearch. Transwarp describes it as a world-leading engine that fuses large-scale statistics and search: it handles full-text search, precise relational queries, and analysis, and it also performs better in semi-structured data retrieval, spatio-temporal data retrieval, semantic retrieval, fuzzy retrieval, and similar tasks.
Backed by national policies promoting independent and controllable technology and by its own continued technical breakthroughs, Transwarp has completed fully independent research and development of its big data basic software. Going forward, its products will no longer include the open source Hadoop software. Its big data basic software products have begun to replace foreign software such as Oracle and IBM in various fields.
More than 2,000 users across industries have chosen Transwarp's self-developed platform to build their underlying big data infrastructure and to provide business capabilities in finance, government, energy, manufacturing, transportation, education, and other sectors.

2. New Search: the successor that surpasses its predecessor

Transwarp New Search, developed by Transwarp Technology, is used to build big data search engines inside the enterprise. New Search supports the storage and retrieval of unstructured data such as Word, Excel, PDF, and CSV files, Internet data, pictures, audio, and video. Searches over PB-scale data return within seconds.
In terms of development interface, New Search provides complete SQL syntax along with SQL extensions for search. Combined with the optimizer of Inceptor, Transwarp's analytical database, this lets developers build efficient search applications without understanding the underlying architecture.
Compared with the open source big data search engine Elasticsearch (ES), Transwarp's self-developed New Search has the following advantages:
New Search provides a distributed computing engine that handles scenarios such as multi-table joins and complex aggregation analysis. It addresses the problem of inaccurate aggregation results in open source Elasticsearch by providing exact aggregation.
New Search supports standard SQL, SQL extensions with search semantics, and the Oracle and DB2 dialects. It ships with Guardian, Transwarp's own security management platform, and Manager, its big data management platform, to simplify security management and operations.
For full-text retrieval, New Search supports storing and searching documents in common formats such as PDF, Word, and Excel, and provides tokenizers for Chinese, Uyghur, Tibetan, English, French, Japanese, Korean, German, Spanish, Portuguese, and other languages. It also provides natural language processing functions such as article similarity matching, keyword extraction, and summary extraction.
The New Search spatio-temporal database module supports the standard geometry types defined by OGC, including point, line, polygon, and collection types; supports tile services based on the WMTS protocol; and supports spatio-temporal algorithms such as companion analysis and trajectory similarity matching.
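The article does not say which aggregation inaccuracy it has in mind. One well-known source in distributed search engines is top-N term aggregation, where each shard returns only its local top candidates and the coordinator merges them. The following is a minimal Python sketch of that effect, not Elasticsearch or New Search code; the function names, the `shard_size` parameter, and the sample data are illustrative assumptions.

```python
from collections import Counter


def shard_top_k(docs, k):
    """Return the k most frequent terms on one shard with their local counts."""
    return Counter(docs).most_common(k)


def approximate_top_k(shards, k, shard_size):
    """Scatter-gather: merge only each shard's local top `shard_size` terms."""
    merged = Counter()
    for docs in shards:
        for term, count in shard_top_k(docs, shard_size):
            merged[term] += count
    return merged.most_common(k)


def exact_top_k(shards, k):
    """Count every term on every shard before ranking."""
    merged = Counter()
    for docs in shards:
        merged.update(docs)
    return merged.most_common(k)


# Hypothetical data: three shards, each holding a list of term occurrences.
shards = [
    ["a"] * 5 + ["b"] * 4 + ["c"] * 3,
    ["a"] * 5 + ["c"] * 4 + ["b"] * 3,
    ["b"] * 5 + ["c"] * 4 + ["a"] * 1,
]

print(approximate_top_k(shards, k=1, shard_size=1))  # picks "a" with count 10
print(exact_top_k(shards, k=1))                      # true winner is "b" with 12
```

With only one candidate per shard, the merge misses the occurrences of "b" on the first two shards and ranks "a" first with an undercounted total. Returning more candidates per shard narrows the gap, and counting everything removes it at a higher cost.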
In addition, New Search performs well with large data volumes and large clusters, a significant improvement over open source Elasticsearch (ES):
When the hardware budget is limited, users want to maximize resource utilization, so the amount of data a single instance on a single node can hold is crucial. A single New Search instance on one node supports 50 TB, far more than the 10 TB of a single open source ES instance.
When total user data reaches the range of 100 TB to PBs, open source ES runs into stability problems such as lost connections once the cluster exceeds 100 nodes. Transwarp's New Search is specifically optimized for large clusters, which greatly alleviates node disconnection problems. The latest generation of New Search remains stable even with more than 200 nodes or instances.
Big data search places high demands on availability. The system should guarantee a service level agreement (SLA) of more than 99%, recover automatically and quickly when a node fails, and provide tools for quick diagnosis and repair of the cluster when manual intervention is needed. With open source ES, restarting a node holding TBs of data usually takes several hours, while with Transwarp's New Search it takes only a few minutes. The following figure shows the restart time of NS under different storage data sizes. The difference between cold and hot is whether the operating system page cache is excluded.

[Figure: NS restart time under different stored data sizes, cold vs. hot]

For businesses with high-dimensional data, a single query result may need fields drawn from two or more tables. Open source ES does not directly support multi-table join operations; Transwarp's New Search does, which meets this need for multi-table join queries.
For short, fast queries, open source ES supports a maximum concurrency of 700-800, and because it is affected by GC, query latency shows spikes. In the new generation of New Search the query path is optimized: one RPC is eliminated, response time is reduced by 30%, and heap usage is reduced through off-heap storage, automerge, cooling, and other techniques, so queries are less affected by GC.
The latest generation of New Search also optimizes the thread pool and Lucene. Memory usage drops by a further one third and GC fluctuation is smaller. It meets the second-level latency requirement for short, fast queries as well as the high query concurrency of peak periods. As shown in the figure below, when a single machine stores 4.5 TB of data, New Search greatly reduces heap memory usage and significantly reduces GC pressure through efficient use of off-heap memory.

[Figure: heap memory usage and GC pressure with 4.5 TB of data stored on a single machine]

User queries mostly target data from the last N days, and older data is queried relatively rarely. According to Transwarp, open source ES applies no special handling to hot and cold data, while New Search optimizes for hot and cold data to improve query performance.
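The article does not describe how New Search separates hot and cold data. As a rough illustration of the general idea only, the Python sketch below routes a time-range query to a "hot" tier, a "cold" tier, or both, so that queries over recent data never touch old data. The function name, the tier labels, and the 7-day window are assumptions made for this example.

```python
from datetime import date, timedelta


def select_tiers(query_start, query_end, today, hot_days=7):
    """Return the storage tiers a query over [query_start, query_end] must scan.

    Data newer than `hot_days` is assumed to live in the hot tier
    (for example, fast disks with a warm cache); everything older is cold.
    """
    hot_cutoff = today - timedelta(days=hot_days)
    tiers = []
    if query_end >= hot_cutoff:
        tiers.append("hot")
    if query_start < hot_cutoff:
        tiers.append("cold")
    return tiers


today = date(2021, 1, 15)
# A query over the last five days touches only the hot tier.
print(select_tiers(date(2021, 1, 10), date(2021, 1, 15), today))
# A query over last December touches only the cold tier.
print(select_tiers(date(2020, 12, 1), date(2020, 12, 31), today))
```

Because most queries fall entirely inside the recent window, the common case scans only the small hot tier, which is where the performance gain of such a scheme would come from.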
When the cluster is large and there are many table shards, the peer-to-peer (P2P) architecture and balancing strategy of open source ES lead to very high latency for DDL operations, which causes noticeable lag in use and high cluster load. The centralized architecture of New Search significantly improves performance in this area. The test comparison is shown in the figure below.

[Figure: DDL operation latency, New Search vs. open source ES]
As the daily data increment grows, users place special demands on write performance. How can ingestion performance be guaranteed? The write performance of open source ES gradually degrades as the data volume grows. Transwarp's New Search optimizes the storage format and improves write performance by 10%-20%. The second-generation product adds a Bloom filter index to minimize the degradation as data volume grows, improving write performance by a further 30%-70%. In addition, a bulk load function is supported, so massive data sets can be imported quickly through BulkLoad.
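The article does not explain how the Bloom filter index is used. A common pattern, assumed here, is to check the filter before an expensive "does this key already exist?" lookup on the write path: a negative answer is always correct, so the lookup can be skipped for new keys. The following is a minimal, self-contained Python sketch of a Bloom filter; the class, its parameters, and the key names are illustrative and are not New Search internals.

```python
import hashlib


class BloomFilter:
    """Probabilistic set: no false negatives, a small rate of false positives."""

    def __init__(self, num_bits=8192, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8 + 1)

    def _positions(self, key):
        # Derive num_hashes bit positions by salting SHA-256 with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


bf = BloomFilter()
for i in range(100):
    bf.add(f"doc-{i}")

# Write path: only keys the filter reports as possibly present
# would need the expensive existence lookup before writing.
needs_lookup = bf.might_contain("doc-42")
print(needs_lookup)  # True, because "doc-42" was added
```

The filter's size stays fixed as data grows, which fits the article's claim that the index limits write slowdown at larger data volumes.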
The figure below compares the batch write performance of New Search and ES on TPC-DS standard data, in MB/sec per node.

[Figure: batch write performance on TPC-DS data, New Search vs. ES, MB/sec per node]

Beyond the technology and performance of its comprehensive search product, Transwarp has other notable strengths in big data basic software. For example, it is considered the company with the richest product line in the big data field. It covers many application requirements and scenarios, has more successful cases in China, and can offer Chinese users better localized service and support.

As open source software continues to grow worldwide, its influence is expanding, and it has become a force in the software industry that cannot be ignored. Its global development shows that open source is not only a business model but also a model for research and development, promotion, and industrialization.
As many open source projects have been commercialized and many open source companies have gone public, the lure of large profits has led open source software companies to modify their licenses frequently, and the resulting risks are hard to predict. Open source software may also be affected by trade friction and trade sanctions. Therefore, with the support of national policies promoting independent and controllable technology, developing independently researched and controllable big data basic software to meet the big data needs of Chinese companies is the general trend.


Origin blog.51cto.com/15015752/2633935