[Observation] Transwarp Technology: Positioning for the industry large-model track and accelerating domestic substitution

Large models, represented by ChatGPT and GPT, have set off a "tsunami effect" in China, and almost every technology company is looking for a way onto the large-model track. The core driver is that the greatest value of large models lies in a general improvement in personal productivity. Companies in all industries are actively looking for opportunities to apply large models and generative AI, hoping to raise the productivity of all their employees.

Transwarp Technology, known as "the first listed company in domestic big data basic software", naturally will not miss this feast of large models. At the Xiangxingli Future Data Technology Summit (FDTC), held in late May, the company fully unveiled its strategic layout for industry large models. It not only released Sophon LLMOps, a full-stack software tool for building large-model applications, but also drew on its accumulated experience in industry applications to launch two industry large models: "Wuya", described as the industry's first financial large model, and SoLar "Quest", a large model for big data analysis.

At the same time, to follow the general trend of data processing toward "intelligence, multi-modality, and democratization", Transwarp Technology has continued to iterate on its big data basic software and officially released the following products: TDH 9.3, a multi-model big data basic platform that integrates "lake, warehouse, and mart"; ArgoDB 6.0, a distributed analytical database; KunDB 3.2, a distributed transactional database; TimeLyre 9.1, a high-performance time-series database for multiple scenarios; Sophon, a new-generation full-process intelligent analysis platform built around "Liuyi, Sancang, and Two Centers"; and Navier 3.1, a data element circulation product. These products can serve as domestic substitutes for the corresponding foreign products, helping more Chinese enterprises carry out their data and intelligence transformation and find new models, new business formats, and new services in the digital economy era.

As Sun Yuanhao, founder and CEO of Transwarp Technology, said: "In the future, everyone will be a data scientist, and the way humans interact with data will change significantly. Natural interaction methods such as language and video will gradually become widespread, and everyone will need a 'virtual business assistant'." For this reason, Transwarp Technology hopes to help users cope with the challenges of the large-model era through continuous technological innovation, while releasing more of the value held in massive data.

Strategic positioning on the industry large-model track

Large models (also known as pre-trained models or foundation models) are a typical product of combining "big data + large computing power + strong algorithms". They are also an "implicit knowledge base" that distills the essence of big data, and a general-purpose carrier for all kinds of artificial intelligence applications, which makes their importance plain.

After several years of technical accumulation, large models are now rapidly being integrated into enterprise applications, reshaping how people interact with data in those applications and generating ever greater commercial value. Representative examples include OpenAI's GPT-4 in natural language processing, which is used for tasks such as text generation, question answering, and language understanding, and Facebook's DETR model in computer vision, which is widely used in image recognition tasks such as object detection.

As a long-established big data basic software vendor, Transwarp Technology naturally needs to keep up with the huge business opportunities that large models bring. But how can it enter the large-model track quickly and well? Its answer is a "two-pronged" strategy covering two areas: the tool chain for building large models, and industry large models. Specifically:

On the one hand, for the large-model tool chain, Transwarp has launched Sophon LLMOps, a tool for the continuous improvement and development of large models that covers the training, launch, and iteration of domain large models. Sophon LLMOps mainly serves large-model developers. It helps enterprises quickly build their own industry large models and, through this large-model infrastructure, create artificial intelligence applications that feature "new human-computer interaction" and "agile, sustainable iteration".

In this regard, Sun Yuanhao said: "We found in practice that the gap between large models and applications is too wide. It can be said to have become a barrier to the development of the entire industry, and a large model itself also requires accumulated industry knowledge. Based on this, Transwarp's strategy is to give industry users and partners a tool that helps them build large models faster. Combined with their own industry knowledge, they can then develop large-model applications suited to each industry."

It is worth mentioning that Sophon LLMOps has been in research and development for more than six years. Compared with the earlier MLOps product, the newly released LLMOps greatly strengthens support for large models. For example, it has its own sample warehouse, which covers training data development, inference data development, and data maintenance, and which cleans, explores, augments, evaluates, and manages the raw data, sample data, and prompt data involved in large language models.

On this basis, Sophon LLMOps also provides model operation and maintenance management. In addition to the six "unifications" of traditional MLOps (unified management, unified operation and maintenance, unified application, unified monitoring, unified evaluation, and unified interpretation), it supports large language model fine-tuning, continuous improvement, evaluation, and alignment with scheduling and optimization that extend from computing frameworks and tools down to compute, storage, and communication.

In addition, Sophon LLMOps can orchestrate, schedule, and launch large language models and other tasks, and it provides Agent, Ops, and DAG capabilities. Combined with Transwarp's big data and database products, such as the distributed vector database Hippo and the graph database StellarDB, it can arrange different large language models, traditional machine learning, and other processes into tasks that meet users' actual domain and business needs, and serve them to customers.
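To make DAG-style orchestration concrete, here is a minimal Python sketch. It is not the Sophon LLMOps interface, which the article does not describe. The step names (`embed_query`, `vector_search`, `graph_lookup`, and so on) and the `run_pipeline` helper are hypothetical, and each step is a stub standing in for a real model or database call.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each key is a step, each value is the set of
# steps that must finish before it can run.
PIPELINE = {
    "embed_query": set(),
    "vector_search": {"embed_query"},
    "graph_lookup": {"embed_query"},
    "build_prompt": {"vector_search", "graph_lookup"},
    "call_llm": {"build_prompt"},
}

# Stub implementations; a real system would call models and databases here.
STEPS = {
    "embed_query": lambda inputs: [0.1, 0.9],
    "vector_search": lambda inputs: ["doc-7"],
    "graph_lookup": lambda inputs: ["entity-3"],
    "build_prompt": lambda inputs: "context: "
    + ", ".join(inputs["vector_search"] + inputs["graph_lookup"]),
    "call_llm": lambda inputs: "answer based on [" + inputs["build_prompt"] + "]",
}


def run_pipeline(dag, steps):
    """Run every step after its dependencies and collect the results."""
    results = {}
    for name in TopologicalSorter(dag).static_order():
        # Each step only sees the outputs of the steps it depends on.
        inputs = {dep: results[dep] for dep in dag[name]}
        results[name] = steps[name](inputs)
    return results


results = run_pipeline(PIPELINE, STEPS)
print(results["call_llm"])
```

Declaring dependencies as data rather than hard-coding call order is what lets a scheduler reorder, parallelize, or swap individual steps, for example replacing one language model with another, without touching the rest of the pipeline.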

On the other hand, for industry large models, unlike companies that focus mainly on general-purpose large models, Transwarp Technology focuses on large models for the financial industry. It has launched "Wuya", described as the industry's first financial large model, and SoLar "Quest", a large model for big data analysis.

It is understood that Transwarp Technology has long been deeply involved in the financial field and serves a large number of financial customers. It has accumulated a corpus of millions of entries in the professional financial field, along with an instruction set for training on financial events. Together, the two form a solid foundation for Transwarp to develop a large language model for finance.

Transwarp Wuya is a generative large language model with an ultra-large parameter count, aimed at quantitative finance. It uses millions of professional financial corpus entries, covering high-quality natural language texts such as research reports, announcements, policies, and news, as the secondary pre-training corpus for the base model. This gives Wuya accurate comprehension of general financial knowledge, including fundamentals, technical analysis, and news.

Transwarp Wuya has also built six types of basic factor sets for the large model: policy, public opinion, ESG, risk, volume and price, and industrial chain. It is good at handling a range of problems in quantitative finance, with strong comprehension and generation capabilities in policy and research report analysis, news interpretation, event summarization, and deductive reasoning. It can comprehensively review market events involving stocks, bonds, funds, commodities, and other assets, trace how they propagate, and deduce their consequences. It can also generate alternative strategy factor sets and build a three-dimensional attribution and explanation system. Through multi-modal perception, event-driven analysis, and deep graph computing, the Wuya large model broadens the perspective of investment research across time and space, depth and breadth, and realizes a new paradigm of intelligent quantitative investment research.

SoLar "Quest" is a large model for the big data field that targets scenarios across the whole life cycle of the big data industry, and from which many fine-tuned models for sub-fields and sub-tasks can be derived. According to the plan, "Quest" will offer the following capabilities: understanding of big data industry requirements; reasoning; code generation for various structured query languages (including multi-model ones) and OpenCypher; code generation for common data analysis languages such as Python and R; query rewriting; intent recognition; text generation; embedding vector generation; and knowledge reasoning. Users only need to use natural language, and the "Quest" large model returns the data analysis, visualizations, and reports they need.

Looking back, the popularity of the large-model track is driven by both market demand and technological progress. With digital transformation accelerating and demand for intelligence growing, AI technology is being applied more and more widely across industries. At the same time, AI large-model technology itself keeps innovating and breaking through, and is developing in increasingly diverse directions. From this point of view, the launch of the large-model application construction tool Sophon LLMOps, the financial large model "Wuya", and the big data analysis large model SoLar "Quest" marks Transwarp Technology's strategic move, "advancing with the times", onto the industry large-model track. Behind it is the company's many years of accumulation in the big data field, which it is now using to help the intelligent industry move forward.

Multimodal Data Exploration and Innovation

Gartner predicted in 2017 that multimodal data management would become the main development trend, and it has developed far faster than people imagined. Today, multimodal data management is gradually becoming the choice of mainstream databases. For this reason, besides the large models Transwarp Technology launched at this Future Data Technology Summit (FDTC), its exploration and innovation in the multimodal field was also a highlight.

First, vector databases. A common problem of today's large models is that their training data lacks richness and timeliness, which seriously hurts generalization and leads them to "talk nonsense with a straight face", limiting their practicality in vertical fields. Although reinforcement learning from human feedback (RLHF) lets a model adjust wrong outputs, it cannot completely solve this problem, and vector databases are expected to help. Using vector embeddings, authoritative and credible unstructured data is converted into vectors and stored in the database, which helps large models build a "long-term memory" and reduces the likelihood of errors in model-generated content.

To adapt to this shift, Transwarp Technology has launched Transwarp Hippo, a self-developed vector database that extends large language models in the dimensions of time and space. As an enterprise-grade, cloud-native distributed vector database, Transwarp Hippo supports storing, indexing, and managing massive vector data sets, and can efficiently handle problems such as vector similarity retrieval and high-density vector clustering.

Sun Yuanhao said that, unlike open source vector databases, Hippo offers high availability, high performance, and easy scalability. It supports multiple vector search indexes, data partitioning, data persistence, incremental data ingestion, filtering on scalar fields alongside vectors, and hybrid queries, and can meet enterprises' needs for real-time querying, retrieval, and recall over massive vector data.
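The idea of similarity retrieval combined with scalar filtering can be illustrated with a small Python sketch. This is not Hippo's API, which the article does not show. The `TinyVectorStore` class, its methods, the document ids, and the three-dimensional hand-written "embeddings" are all hypothetical, and the brute-force scan stands in for the approximate indexes a production vector database would use.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Brute-force in-memory store of (id, vector, scalar fields) rows."""

    def __init__(self):
        self.rows = []

    def insert(self, doc_id, vector, **fields):
        self.rows.append((doc_id, vector, fields))

    def search(self, query, k=2, where=None):
        """Return the top-k ids by cosine similarity among rows that pass
        the optional scalar filter `where`."""
        candidates = [r for r in self.rows if where is None or where(r[2])]
        ranked = sorted(candidates, key=lambda r: cosine(query, r[1]), reverse=True)
        return [r[0] for r in ranked[:k]]


store = TinyVectorStore()
store.insert("report-2022", [0.9, 0.1, 0.0], year=2022)
store.insert("report-2023", [0.8, 0.2, 0.0], year=2023)
store.insert("news-2023", [0.0, 0.1, 0.9], year=2023)

query = [1.0, 0.0, 0.0]
print(store.search(query, k=2))
# Hybrid query: same vector search, restricted by a scalar field.
print(store.search(query, k=2, where=lambda f: f["year"] == 2023))
```

In a retrieval setup for a large model, the returned documents would be added to the prompt as context, which is the "long-term memory" role described above.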

Second, graph databases. Driven by large-model applications and the continued development of graph neural network technology, there is now fertile ground for graph intelligence, as well as opportunities to explore the combination of graph databases and graph intelligence. On this basis, Transwarp also officially released StellarDB 5.0, an enterprise-grade distributed graph database for high-performance analysis, graph intelligence, and multi-model fusion.

StellarDB 5.0 optimizes the storage and computing engines, redesigns the underlying data storage structure, and optimizes the TEoC compiler. It also deeply optimizes the multi-scenario computing framework, improving performance across real-time scenarios, relationship analysis scenarios, and graph algorithm analysis scenarios, which greatly improves customers' business efficiency. According to the published figures, StellarDB 5.0 delivers a 5-fold improvement in real-time short-query scenarios, tens of thousands of QPS under high concurrency, an average 8-fold performance improvement across the nearly 50 graph algorithms it supports, and a 10-fold improvement in multi-degree relationship scenarios, addressing the problem of infinite expansion.

StellarDB 5.0 also implements a dynamic graph function that records the full history of changes to graph data and can query the historical state of a graph at a given point in time. By visualizing a dynamic graph along a time axis, users can analyze changes in graph data clearly, intuitively, and conveniently, and more easily discover the patterns behind the graph. For example, in financial anti-fraud applications, dynamic changes in the graph structure can represent changes in the membership and transaction relationships of fraud rings, helping business personnel analyze and predict more accurately and efficiently.
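A point-in-time query over a changing graph can be sketched in Python as follows. This is only one common way to model the idea, by attaching a validity interval to each edge, and it is an assumption rather than a description of how StellarDB stores dynamic graphs. The `TemporalEdge` type, the `snapshot` function, and the account names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TemporalEdge:
    src: str
    dst: str
    valid_from: int                 # time at which the edge was added
    valid_to: Optional[int] = None  # time at which it was removed; None = still present


def snapshot(edges, t):
    """Return the set of (src, dst) pairs that exist at time t."""
    return {
        (e.src, e.dst)
        for e in edges
        if e.valid_from <= t and (e.valid_to is None or t < e.valid_to)
    }


# Hypothetical transaction relationships between accounts over time.
history = [
    TemporalEdge("acct_A", "acct_B", valid_from=1),
    TemporalEdge("acct_B", "acct_C", valid_from=3),
    TemporalEdge("acct_A", "acct_C", valid_from=2, valid_to=4),
]

print(sorted(snapshot(history, 2)))
print(sorted(snapshot(history, 5)))
```

Comparing two snapshots, for instance with set difference, shows which relationships appeared or disappeared between two points in time, which is the kind of change the anti-fraud example relies on.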

In addition, StellarDB 5.0 can connect with ZenGraph, a deep graph framework developed by Transwarp Technology, to integrate graph database technology with deep graph technology. The graph database contributes fast reading and write-back of graph data and fast subgraph filtering, which improves the efficiency of the whole data analysis pipeline. The ZenGraph deep graph framework provides different deep graph models for different business scenarios. Compared with traditional graph algorithms, it can mine and learn more feature knowledge from graphs and make predictions more accurate.

In Sun Yuanhao's view: "As large models move into more application scenarios, combining vector databases and graph databases can help build better large models. But it must also be noted that vector databases and graph databases alone are far from enough, and the future direction must be multi-modal. Therefore, the core of Transwarp Technology's future technical route is to support more multi-modal data management on one platform, so that each database can better serve large-model applications."

Finally, time-series and spatio-temporal databases. Faced with large volumes of time-series data and demanding analysis requirements, open source systems have several problems: they do not support cluster deployment, the scale of data they can store and compute is limited, they do not support complex analysis, their services are unstable, and they lack security, reliability, and controllability. To solve these problems, Transwarp Technology launched TimeLyre 9.1, a high-performance distributed time-series database for multiple scenarios. It can achieve a compression ratio of 5x to 20x on common time-series data, far better than traditional databases. High compression means more usable capacity per single-node disk, which can greatly reduce costs.
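Why time-series data tends to compress well can be shown with a short Python sketch. The article does not say which compression techniques TimeLyre uses, so the scheme here, delta encoding followed by general-purpose zlib compression, is purely an illustrative assumption. The `encode` and `decode` helpers and the synthetic data are hypothetical, and the sizes printed for this idealized series say nothing about the 5x to 20x figure quoted above.

```python
import struct
import zlib


def encode(timestamps):
    """Delta-encode a non-empty list of integer timestamps, then compress
    the packed deltas with zlib."""
    deltas = [timestamps[0]] + [b - a for a, b in zip(timestamps, timestamps[1:])]
    return zlib.compress(struct.pack(f"<{len(deltas)}q", *deltas))


def decode(blob):
    """Invert encode(): decompress, unpack, and take a running sum."""
    raw = zlib.decompress(blob)
    deltas = struct.unpack(f"<{len(raw) // 8}q", raw)
    out, total = [], 0
    for d in deltas:
        total += d
        out.append(total)
    return out


# Synthetic series: one sample every 10 seconds.
timestamps = [1_700_000_000 + 10 * i for i in range(1000)]
raw_size = 8 * len(timestamps)  # 8 bytes per 64-bit timestamp
blob = encode(timestamps)

assert decode(blob) == timestamps  # the encoding is lossless
print(f"raw={raw_size} bytes, encoded={len(blob)} bytes")
```

Regularly sampled data turns into long runs of identical deltas, which a general-purpose compressor shrinks very effectively. Real sensor data is noisier, so actual ratios depend heavily on the workload.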

For spatio-temporal data, Transwarp also launched Spacture, a distributed spatio-temporal database "born for space and for change". With features such as support for multiple standards and compatibility with mainstream ecosystems, it can be applied to scenarios such as time-series remote sensing analysis, urban expansion change detection, meteorological business support, global high-temperature weather forecasting, trajectory analysis, ship trajectory range retrieval, ubiquitous spatial analysis, lake area statistics, and spatial aggregation statistics.

In fact, the vector, graph, time-series, and spatio-temporal databases that Transwarp Technology released this time are just a microcosm of its many years of sustained technological innovation in multi-modal databases. They also reflect Transwarp's willingness to venture into the "no-man's land" of databases, and its exploration and practice in multi-modal databases is valuable for advancing database innovation in China.

Accelerating the process of "domestic substitution"

At the beginning of this year, the state clearly pointed out that "it is necessary to do a good job in the localization of scientific and technological equipment, operating systems, and basic software; encourage scientific research institutions, universities, and enterprises to carry out joint research; raise the level and application scale of domestic substitution; and strive to use our country's independently developed research platforms, instruments, and equipment to solve major basic research problems as soon as possible."

In this regard, as a provider of big data basic software, Transwarp is committed to accelerating the localization of big data basic software and to making domestic big data basic software bigger and more solid. Sun Yuanhao told me: "Transwarp Technology's domestic big data technology stack is now relatively mature. We have been developing in this field for about ten years. Technically, Transwarp Technology can completely replace foreign big data products. In terms of functionality and performance, it is basically one generation ahead of overseas products, and Transwarp Technology has also accumulated many successful domestic substitution cases."

First, big data basic platforms. Transwarp Technology's self-developed big data basic platform TDH and the Transwarp Data Cloud platform TDC can replace CDH/HDP and CDP, while improving functionality, performance, stability, ease of use, scalability, reliability, security, and support for the domestic ecosystem. They support a variety of data models, can improve performance by 5 to 100 times, and come with stronger professional services from the original vendor.

Moreover, the newly released Transwarp TDH 9.3 and TDC 3.2 lead with technologies such as new-generation lakehouse storage, a unified multi-model architecture, across-the-board performance improvements, container-based resource management, and multi-tenancy. According to the published figures, Transwarp's basic software products use self-developed high-performance distributed computing and storage engines, with overall performance 5 to 25 times that of CDP and an overall price/performance ratio 20 times that of DB2 and 100 times that of TD. In terms of security, technologies such as container isolation, disaster recovery, access control, federated learning, privacy protection, and trusted computing guarantee all-round data security at the network, hardening, governance, and circulation layers.

Second, transactional databases. KunDB can replace Oracle/MySQL in transactional OLTP business scenarios and high-concurrency online data service scenarios, while improving storage and computing capacity, high availability, and cross-partition transaction capability, so that it can better support the smooth migration of key businesses.

In particular, the new KunDB 3.2 draws on Transwarp's years of database research and development experience and is built around the extreme stability that financial services require. It greatly improves high availability, Oracle compatibility, integration, intelligent operation and maintenance, and multi-scenario application support. Single-machine transaction performance reaches 188tpmC, and the horizontal scaling ratio exceeds 90. It can be used for domestic replacement upgrades and distributed architecture transformations in various industries, helping enterprises lay a solid foundation for digital development.

Third, analytical databases. The ArgoDB distributed analytical database can replace foreign products such as Oracle, DB2, and TD in batch processing, OLAP, ad hoc analysis, and other scenarios. It provides massive data analysis capabilities and improves mixed-workload handling and real-time data analysis. In customers' actual application scenarios, the combined cost performance of software and hardware has improved by 10 to 100 times.

Among them, the newly released ArgoDB 6.0 has industry-leading capabilities in real-time data processing, multi-model data processing, and data security. For example, in real-time data processing scenarios, ArgoDB 6.0 delivers 2 to 3 times the performance of open source products such as Greenplum and ClickHouse. In scenarios where it replaces TD, it can help enterprise users build a new-generation integrated lakehouse platform, achieve data integration and unified management, reduce operation and maintenance costs, and accelerate business innovation.

Fourth, other domestic big data software. Scope, the distributed search engine built by Transwarp Technology, can replace Elasticsearch and help enterprises build an independent and controllable search platform. For graph data, StellarDB, Transwarp's graph database, can replace Neo4j and provide a high-level solution for domestic graph data applications. For time-series data, Transwarp's TimeLyre can replace InfluxDB as a domestic time-series database. For data analysis, Sophon Base, Transwarp's intelligent analysis tool, can replace SAS/SPSS in scenarios such as visual modeling and analysis, improving functionality and performance while reducing costs.

Objectively speaking, the trend of "domestic substitution" in China's basic software industry is accelerating. Through continuous technological innovation, Transwarp Technology has made domestic big data basic software bigger and more solid. I believe this will not only empower the digital transformation of Chinese companies, but also promote and lead the transformation and innovation of the big data basic software industry in China and even globally.

In summary, at this year's Future Data Technology Summit (FDTC), Transwarp Technology released large-model building tools and industry large-model applications, strategically positioning itself on the new track of industry large models. The iteration and innovation of its multi-modal databases reflect its persistence in breaking into the "no-man's land" of big data. And by promoting "domestic substitution" in China's big data field in both word and deed, I believe it can also better serve the high-quality development of China's digital economy. It can be said that the value of Transwarp Technology's forward-looking layout and continuous innovation in big data is "not limited to the present, but also related to the future".

Shenyao's Science and Technology Observation was founded by Shenski, a veteran technology journalist with 20 years of experience in communicating enterprise technology content. He has long focused on observing and thinking about the industrial Internet, enterprise digitalization, ICT infrastructure, and automotive technology.

Origin blog.csdn.net/W5AeN4Hhx17EDo1/article/details/131015921