Key technology evolution from MLOps to LMOps

This article is compiled from the keynote of the same name delivered on September 3, 2023 at the "From MLOps to LMOps" sub-forum of the QCon Global Software Development Conference 2023 in Beijing.

This talk is organized as follows:

  • From MLOps to LMOps;

  • MLOps overview, challenges and solutions;

  • LMOps implementation challenges and key technologies (large model inference performance optimization, prompt construction and automatic optimization, context length expansion);

  • Future outlook.

1 From MLOps to LMOps

As we all know, the main technical means we currently use to achieve artificial intelligence is machine learning, especially deep learning based on deep neural networks. In essence, machine learning is the technology of modeling data with algorithms that are able to learn.

Deep learning uses large-scale computing power to overcome the bottleneck of manually engineered feature representations in machine learning, and has achieved major breakthroughs in effectiveness. This is why machine learning has become the mainstream technology of artificial intelligence today.

The relationship between deep learning and generative large models evolved roughly as follows. From around 2012 to 2016, classic deep learning models such as convolutional neural networks, generative adversarial networks, and ResNet had already brought significant improvements to computer vision, speech recognition, and natural language processing. These classic models include both discriminative and generative models. They are typically pre-trained on labeled datasets such as ImageNet and COCO, producing pre-trained weights that can be further fine-tuned.

In 2017, the Transformer architecture was first successfully applied in natural language processing. After that, generative large models built on the Transformer gradually became the mainstream technology in vision, natural language processing, and cross-modal understanding and generation.

This class of models typically uses the Transformer and attention mechanisms as building blocks, supports parallel self-supervised learning, and has parameter counts above one billion. When generative large-model technology is applied to language modeling, the result is called a "large language model". With further optimization, these models became well-known conversational, generative large language model applications such as ChatGPT and Wenxin Yiyan (ERNIE Bot).


DevOps is a methodology and set of best technical practices that runs through the traditional software life cycle, covering requirements, code development, testing, online operation and maintenance, and promotion and operation. Key technologies include requirements management, version control, continuous integration, componentization, containerization, microservices, continuous delivery, automated operation and maintenance, and so on. Today, DevOps platforms and tools have become the most effective way for most software companies to manage software R&D and operations.

MLOps is DevOps for the machine learning era. Its main role is to connect the model-building team with the business and to establish standardized best practices for model development, training, deployment, release, and monitoring, thereby improving quality and simplifying management. It automates the deployment of machine learning and deep learning models in large-scale production environments, so that models better match business needs and regulatory requirements.

Below is a brief overview of MLOps and DevOps and their commonalities and differences. Later, we will explain the concepts, challenges, and solutions of MLOps and LMOps in more detail.

What MLOps and DevOps have in common:

  • Simplified step process: MLOps and DevOps simplify the software development/model development process by establishing clear, sequential steps. MLOps focuses on reducing turnaround time in ML development.

  • Reduce communication costs: MLOps and DevOps reduce communication costs by establishing standardized process steps. MLOps amounts to a consensus among system administrators, data science teams, and other departments across the organization on how to develop and maintain production models.

The difference between MLOps and DevOps:

  • MLOps has more complex version control: for machine learning, code is not the only changing input. Data, parameters, metadata, logs, and finally models all need to be versioned.

  • Continuous monitoring and continuous training: the difference between monitoring in DevOps and MLOps is that software does not degrade over time, while machine learning models do. Data keeps changing as the business environment changes, which causes model degradation and makes continuous training necessary.


Compared with classic deep learning models, large models bring tremendous changes at both the technology and application level, for example at the following four levels:

  • Data: First of all, pre-training of large models usually requires TB to PB level data. This data scale and corresponding data processing technology are very different from classic deep learning models. At the same time, today's large models increasingly use multi-modal, instruction, and dialogue data as training or tuning inputs, and the data content is also different from before.

  • Training: today's large models with hundreds of billions of parameters often require distributed training on thousands or even tens of thousands of accelerator cards, and the scheduling, fault-tolerance, and communication technologies involved are very different from before. Tuning large models also raises many new problems that call for low-overhead, high-efficiency techniques.

  • Evaluation: Classic deep learning models are often based on manually labeled test data sets to calculate objective indicators and evaluate model effects. Because there is no standard answer to the massive content generated by large models, we cannot rely entirely on humans to judge the quality of the content. Therefore, the effect and performance of large models require the establishment of evaluation benchmarks and evaluation methods that are different from those in the past.

  • Inference: prompt engineering is used to steer the output of large models, a capability that previous classic deep learning models, whether in natural language processing or visual generation, did not have.


Large models have brought huge changes in technology and application models, and have also posed new challenges to enterprises, such as how to use and manage AI capabilities and how to implement large-scale models. These have put forward new requirements for data engineering, model tuning, model delivery, service operations, support capabilities and other aspects.

LMOps can help enterprises solve the above-mentioned new challenges brought by large models.

LMOps is an integrated solution for large model development and operation, allowing enterprises to quickly build and manage large models. By integrating various capabilities and standardization requirements for the entire model operation process, it helps enterprises realize the complete process from data to services, allowing large models to be quickly and efficiently deployed in the production environment, and improving the efficiency and quality of large model application implementation.

The development of artificial intelligence has brought MLOps and LMOps platforms and tools into view. Next, I will break down the challenges faced by MLOps and LMOps and the corresponding solutions.


2 MLOps Overview, Challenges and Solutions

In practice, machine learning R&D without MLOps runs into several problems. First, data and models lack unified management: the underlying infrastructure is fragmented and scattered across different algorithm development groups. This resembles the early days of the automobile industry, when a single worker assembled a car by hand from start to finish; the lack of collaboration wasted a great deal of time and labor.

This also exposes a second problem in machine learning R&D: the overall development, deployment, and iteration cycle of models is long.

The model monitoring system is incomplete: a model's behavior does not change in a laboratory environment, but real business data is different. As the data distribution and data structure shift, the model's effectiveness in production may decay, which requires continuous monitoring.

Role collaboration: developing a complete AI application system requires coordination among the business team, the operations team, and the algorithm engineering team, and this is often where seemingly insurmountable obstacles appear. To paraphrase Tolstoy, companies where things run smoothly are all alike; every company where things run badly runs badly in its own way. Who has which permissions, how many resources each role can access, and whether they conflict with or affect one another all require a platform or mechanism for unified coordination and management.


In concrete MLOps practice, you need to run through the entire machine learning life cycle, including data processing, model construction, model training, model deployment, prediction serving, and subsequent monitoring and optimization. Each stage has a series of problems to be solved and managed.

Therefore, building MLOps practice requires corresponding tools to automate and connect various processes, and establish a collaboration mechanism based on the pipeline.


To this end, Baidu Intelligent Cloud has built MLOps practices into its AI middle platform, allowing all steps of the machine learning system to run automatically and be monitored in real time.

Baidu AI middle platform relies on Baidu Brain's more than ten years of accumulation of AI technology and capabilities. Currently, it has provided intelligent middle platform solutions for finance, energy, Internet, education, operators, manufacturing, government and other industries to help enterprises build a unified AI infrastructure to realize the co-construction and sharing of AI assets and agile intelligent application development.

The functional architecture of the Baidu AI middle platform consists of the following components:

  • Sample Center: Mainly connects to the data center, obtains data, and performs feature processing or data annotation on the data.

  • AI development platform: after feature processing and annotation, the data enters the AI development platform for further development. The platform is aimed mainly at algorithm engineers, helping them use its capabilities to quickly develop and train models.

  • Model Center: Models that have completed training will enter the model center for unified management.

  • AI service operating platform: the final model deployment package produced by the model center is sent to the AI service operating platform for online deployment, and is finally integrated into customer applications by software engineers.

Under the monitoring system of model risk management, the entire process of the AI middle platform can be inspected and traced, reducing the risk for enterprises applying AI capabilities. The data in the entire process can also form corresponding AI assets, which can be shared across organizations and departments within the enterprise, breaking down departmental barriers and avoiding duplicated construction.

The large model platform is an integral part of the AI middle platform and serves as infrastructure for generative AI. The Baidu AI middle platform not only provides model capabilities directly to the outside world, but also supports enterprises in building their own capabilities independently and efficiently.

Currently, the Baidu AI middle platform covers the core aspects of the MLOps methodology and has obtained the flagship-level certification under the relevant standards of the China Academy of Information and Communications Technology (CAICT). It is also the only product or solution in China to have obtained this level of certification.


Below, we will briefly introduce the core technology of MLOps.

  • Automated data annotation: When the model is actually running, manual annotation takes up a lot of time and labor costs. MLOps performs data annotation through automated methods, removes noisy data, ensures the quality of model training data, and saves the time and cost of manual data annotation.

  • Experiment management + version control: Automatically collect experimental parameters and cooperate with version control systems such as Git to manage code, data, model files, etc. When the model needs to be tracked and compared, it can be traced back through the experimental parameters automatically collected in the early stage to continuously optimize the model effect.

  • AutoML + AutoDL: Use technologies such as AutoML to automatically search algorithms and adjust parameters, quickly find the best model and accelerate the experiment cycle.

  • Interpretability: Use interpretability technology to analyze model behavior, respond to the needs of large model supervision and security, and improve model transparency.

  • Drift monitoring: after a model goes online, the data changes and model quality degrades, so the model needs continuous monitoring and optimization. Drift monitoring collects logs from model training and inference, sets key indicators, monitors model performance, and triggers automatic retraining (a minimal drift-detection sketch follows this list).

  • Model adaptation: Continuously expand the hardware scope of model adaptation to facilitate automatic deployment in a wide range of environments.

  • Model compression: use techniques such as pruning and quantization to compress model size, reduce memory usage, improve running speed, and reduce deployment costs.

  • API-centric: the platform's main operations can be driven by code and, together with experiment version information, executed automatically.
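
To make the drift-monitoring item above more concrete, here is a minimal Python sketch (with made-up feature data) of one common drift signal, the Population Stability Index (PSI), compared against the rule-of-thumb threshold of 0.2; a real MLOps platform would compute such indicators per feature and per prediction score on a schedule.

```python
import numpy as np

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference sample and a live sample."""
    # Bin edges come from the reference (training-time) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    e_counts, _ = np.histogram(expected, bins=edges)
    a_counts, _ = np.histogram(actual, bins=edges)
    # Convert counts to proportions, clipping to avoid log(0).
    e_frac = np.clip(e_counts / e_counts.sum(), 1e-6, None)
    a_frac = np.clip(a_counts / a_counts.sum(), 1e-6, None)
    return float(np.sum((a_frac - e_frac) * np.log(a_frac / e_frac)))

# Hypothetical feature values at training time vs. the last week of live traffic.
train_scores = np.random.normal(0.0, 1.0, 10_000)
live_scores = np.random.normal(0.3, 1.1, 2_000)

drift = psi(train_scores, live_scores)
if drift > 0.2:   # commonly used rule-of-thumb threshold
    print(f"PSI={drift:.3f}: significant drift, trigger retraining pipeline")
else:
    print(f"PSI={drift:.3f}: distribution stable")
```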


3 LMOps implementation challenges and key technologies

Although LMOps has not been around for long, companies across the entire upstream and downstream have jointly built a prosperous ecosystem. In the recent investment boom triggered by large models, one-third of the funds have gone into LMOps-related tools or platforms.

Although there are many companies, including quite a few new faces, the ecosystem can still be divided into six main links: data, training, evaluation, inference, deployment, and security. Since today's time is limited, I have picked a few technical points characteristic of large models to share with you.


At present, three technical points with the characteristics of large models, including large model inference performance optimization, prompt engineering, and context length expansion, have been integrated into the Baidu Intelligent Cloud Qianfan large model platform.

The Qianfan large model platform is built on the intelligent cloud computing infrastructure and the mature capabilities of the AI middle platform, redefining the paradigm of AI application construction in the era of large models. It is compatible with dozens of mainstream large models on the market, covers the full LMOps life cycle, and automates the process.

Application developers do not need to master model details; through simple page interactions they can perform large model fine-tuning, evaluation, compression, deployment, prompt engineering, and other functions. The platform also supports a plug-in mechanism, so applications can extend their own large model scenarios through plug-ins.


3.1 Large model inference performance optimization

QAT is short for Quantization-Aware Training. It introduces fake quantization during model training to simulate the errors caused by quantization, so that the model adapts to the quantized numerical representation and the accuracy loss of the quantized model is reduced.
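
As a conceptual illustration only (not the platform's actual implementation), the following PyTorch sketch shows the fake-quantization idea with a symmetric int8 scheme and a straight-through estimator, so that the quantization error appears in the forward pass while gradients still flow to the full-precision weights.

```python
import torch

def fake_quantize(x: torch.Tensor, num_bits: int = 8) -> torch.Tensor:
    """Simulate integer quantization error during training (symmetric, per-tensor)."""
    qmax = 2 ** (num_bits - 1) - 1
    scale = x.detach().abs().max() / qmax + 1e-12
    q = torch.clamp(torch.round(x / scale), -qmax - 1, qmax)
    x_q = q * scale
    # Straight-through estimator: forward uses the quantized value, backward
    # passes gradients through as if quantization were the identity.
    return x + (x_q - x).detach()

# Usage inside a layer's forward pass (weights fake-quantized on the fly):
linear = torch.nn.Linear(16, 4)
inp = torch.randn(2, 16)
out = inp @ fake_quantize(linear.weight).t() + linear.bias
out.sum().backward()   # gradients still reach linear.weight
```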

The advantages of QAT are:

  • The quantization error can be taken into account during training, making the model more robust and avoiding the large accuracy loss caused by post-training quantization.

  • Higher precision gradients can be used to update weights to avoid quantization noise from interfering with the optimization process.

  • More flexible quantization strategies can be used, such as using different quantization bit widths for different layers, using different scaling factors for different channels, etc.

The disadvantages and limitations of QAT are:

  • It is necessary to modify the model training code and add pseudo-quantization operations and observers, which may affect the model structure and performance.

  • The model needs to be retrained, increasing training time and cost.

  • Appropriate quantization configurations and hyperparameters need to be selected, such as quantization bit width, observer type, calibration data set, etc., which may affect the quantization effect.


Baidu Intelligent Cloud provides four post-training quantization solutions for large models, covering quantization of weights, activation layers, and the k/v cache, achieving different compression effects for developers to choose from.


Per-channel means that each channel of the weights uses its own quantization parameters. This is finer-grained than per-tensor quantization and gives the quantization parameters more freedom, so accuracy is essentially lossless.

Per-group refers to grouping the weight parameters and then quantizing each group to int8/int4. Different grouping strategies can be used, for example one group per 128 weight parameters. Each group then has its own minimum/maximum range and scale, so overall accuracy is higher.
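
A minimal numpy sketch of the two weight-quantization granularities just described; the matrix shape, bit widths, and group size of 128 follow the example in the text and are otherwise illustrative.

```python
import numpy as np

def quantize_per_channel(w: np.ndarray, num_bits: int = 8):
    """One scale per output channel (row of the weight matrix)."""
    qmax = 2 ** (num_bits - 1) - 1
    scales = np.abs(w).max(axis=1, keepdims=True) / qmax + 1e-12   # (out, 1)
    q = np.clip(np.round(w / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

def quantize_per_group(w: np.ndarray, group_size: int = 128, num_bits: int = 4):
    """One scale per group of `group_size` weights along the input dimension."""
    qmax = 2 ** (num_bits - 1) - 1
    out_dim, in_dim = w.shape
    w_g = w.reshape(out_dim, in_dim // group_size, group_size)
    scales = np.abs(w_g).max(axis=2, keepdims=True) / qmax + 1e-12  # (out, groups, 1)
    q = np.clip(np.round(w_g / scales), -qmax - 1, qmax).astype(np.int8)
    return q, scales

w = np.random.randn(4096, 4096).astype(np.float32)
q8, s8 = quantize_per_channel(w)          # int8, one scale per row
q4, s4 = quantize_per_group(w, 128, 4)    # int4 codes (stored here in int8 containers)
w_hat = q8.astype(np.float32) * s8        # dequantization for the per-channel case
print("per-channel max abs error:", np.abs(w - w_hat).max())
```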


For large model quantization, a smoothing step can be added before quantization to smooth the weight distribution and address the problem of uneven weight distributions in large models.

Introducing a hyperparameter s balances the difference in difficulty between activation quantization and weight quantization, making the whole quantization process smoother and improving the generalization of the quantization scheme. With these two improvements, the accuracy loss caused by quantizing large models can be effectively reduced, bringing the quantized model much closer to the original full-precision model and enabling efficient large model deployment.
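
The following numpy sketch illustrates this smoothing idea in the spirit of SmoothQuant-style methods: a per-channel factor computed from calibration statistics, with a strength hyperparameter (called alpha here, playing the role of the balancing parameter s described above), migrates quantization difficulty between activations and weights. The calibration statistics and shapes are made up.

```python
import numpy as np

def smooth_scales(act_absmax, w_absmax, alpha: float = 0.5):
    """Per-input-channel factor s_j = max|X_j|^alpha / max|W_j|^(1 - alpha)."""
    return (act_absmax ** alpha) / (w_absmax ** (1 - alpha) + 1e-8)

# Hypothetical calibration statistics for one linear layer, y = x @ W.T
in_dim, out_dim = 4096, 4096
act_absmax = np.abs(np.random.randn(in_dim)) * 10          # activation outliers
W = np.random.randn(out_dim, in_dim).astype(np.float32)
w_absmax = np.abs(W).max(axis=0)                           # per input channel

s = smooth_scales(act_absmax, w_absmax, alpha=0.5)
# Equivalence-preserving transform: multiply weight columns by s and divide
# the activations by s at runtime (or fold 1/s into the previous layer).
W_smoothed = W * s[None, :]
# x_smoothed = x / s   # applied to activations before the matmul
```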

This solution is quite general: for models at the hundred-billion-parameter level it can halve the GPU memory or the number of cards required, with no loss of accuracy, while increasing speed by about 1.5x.


In the solution above, weights and activations are stored in int8, but the k/v cache, another runtime consumer of GPU memory in large models, is still stored in FP16.

For this reason we added int8 quantization of the k/v cache. While maintaining speed, it compresses GPU memory by a further 15%, saving runtime memory and achieving true end-to-end int8 quantization.


Model sparsification is a model compression technique whose purpose is to reduce the number of parameters in the model, thereby reducing its storage footprint and computational complexity. The principle is to use some strategy (such as weight pruning, feature selection, or matrix decomposition) to set some parameters to zero or remove them, making the model sparse; after sparsification, only a small number of parameters are non-zero.
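
As a baseline illustration of sparsification, here is a minimal sketch of magnitude-based weight pruning, the simplest of the strategies mentioned above; the 50% sparsity target and matrix size are arbitrary.

```python
import numpy as np

def magnitude_prune(w: np.ndarray, sparsity: float = 0.5) -> np.ndarray:
    """Zero out the smallest-magnitude weights until `sparsity` of them are zero."""
    k = int(w.size * sparsity)
    if k == 0:
        return w.copy()
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    mask = np.abs(w) > threshold          # keep only weights above the cutoff
    return w * mask

w = np.random.randn(1024, 1024).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.5)
print("non-zero fraction:", np.count_nonzero(w_sparse) / w.size)
```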

Static compression compresses the model after training, while dynamic compression compresses the model during the training process. Dynamic compression is used more often than static compression for the following reasons:

  • Continuous optimization: when unimportant parameters are identified during training, they can be compressed immediately, and the compression keeps improving as training progresses.

  • Flexible adjustment. Dynamic compression can dynamically adjust the compression rate according to resource conditions to adapt to different deployment needs.

  • Dynamic compression can better preserve important information. Dynamic compression can identify parameter importance during training and retain information that is more important to the model.


Baidu Intelligent Cloud's solution here mainly builds on the latest explorations in the industry; SparseGPT is one of the applied methods.

The SparseGPT algorithm, pioneered by Elias Frantar and Dan Alistarh, two researchers at the Institute of Science and Technology Austria (ISTA), enables for the first time an accurate single-shot pruning method for model sizes of 10 to 100 billion parameters.

The SparseGPT reconstruction algorithm works as follows: given a fixed pruning mask M, the weights in each column of the weight matrix W are pruned progressively using a sequence of Hessian inverses, and the remaining weights in those rows, located to the "right" of the column being processed, are updated. Specifically, the weights to the "right" of a pruned weight are updated to compensate for the pruning error, while unpruned weights generate no updates.

With this method, the performance of large models can be improved by up to 60%. In addition, SparseGPT is complementary to quantization methods, and the two can be applied in combination.


Another solution being used is WandA.

The traditional pruning idea is very straightforward: if the absolute value of a weight in the network is below a certain threshold, the weight is considered to contribute little and is set directly to zero.

The WandA approach proposes considering weights and activations at the same time: during processing, the weight magnitudes are first multiplied by the corresponding activation values, and parameters whose resulting scores fall below the threshold are set to zero.
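
A rough sketch of this weight-times-activation scoring idea (in the spirit of the WandA method described above, not its official implementation): each weight is scored by its magnitude multiplied by the norm of the corresponding input feature over a calibration batch, and the lowest-scoring weights in each output row are zeroed. Shapes and the 50% per-row sparsity are illustrative.

```python
import numpy as np

def wanda_prune(w: np.ndarray, x_calib: np.ndarray, sparsity: float = 0.5):
    """w: (out, in) weight matrix; x_calib: (tokens, in) calibration activations."""
    act_norm = np.linalg.norm(x_calib, axis=0)         # per input feature
    score = np.abs(w) * act_norm[None, :]              # |W_ij| * ||X_j||
    k = int(w.shape[1] * sparsity)                     # weights to drop per output row
    # Indices of the k lowest-scoring weights in each row.
    drop = np.argpartition(score, k, axis=1)[:, :k]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return w * mask

w = np.random.randn(512, 1024).astype(np.float32)
x_calib = np.random.randn(256, 1024).astype(np.float32)
w_pruned = wanda_prune(w, x_calib, sparsity=0.5)
print("non-zero fraction:", np.count_nonzero(w_pruned) / w.size)
```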

This solution is not superior to SparseGPT in accuracy, but it is very efficient: it is dozens of times faster.

Through these methods, the inference performance of some models on the Qianfan large model platform has been improved by more than 10 times since April this year.


3.2 Prompt construction and automatic optimization

Large models have powerful language generation capabilities because of their huge number of parameters, but their output is also extremely dependent on the quality of the input. Poor input is likely to produce a poor answer.

Therefore, how to provide appropriate input to large models has become a problem worth studying. The work of finding the best input methods is now called prompt engineering.

Prompt engineering involves researching different types of prompt formats to find the most effective way of expressing them for a specific task. At the same time, factors such as the input length and sentence structure need to be considered so that the prompt contains enough information without being too verbose.

A good prompt can clearly explain the task requirements and allow the model to focus on key information, thereby generating high-quality output.


It is often impractical for ordinary users to build complex prompts by themselves, because designing a high-quality prompt requires expertise and a lot of time. If users are directly allowed to provide questions or requests in natural language, and the system helps users automatically convert them into appropriate prompts, it will be more user-friendly. Ideally, users only need to express their needs in simple statements and do not need to worry about the underlying prompt format.

To achieve this goal, one method is to establish a prompt template library, match existing efficient templates according to the user's query intent, and then insert the key information of the query to automatically generate prompts. Another approach is to train a model that can directly convert natural language into prompt statements that fully express its intent. In addition, after the user queries, the feedback mechanism can be used to iteratively optimize the prompt multiple times until a satisfactory reply is generated.
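
A toy sketch of the first approach, a prompt template library matched by query intent; the intents, templates, and keyword-based `classify_intent` are all hypothetical placeholders (a real system would use a trained intent classifier).

```python
from typing import Dict

# Hypothetical template library, keyed by coarse intent.
TEMPLATES: Dict[str, str] = {
    "summarize": "You are a careful editor. Summarize the following text in 3 bullet points:\n{content}",
    "translate": "Translate the following text into {target_lang}, keeping technical terms unchanged:\n{content}",
    "qa":        "Answer the question using only the provided context.\nContext:\n{context}\nQuestion: {question}",
}

def classify_intent(user_query: str) -> str:
    """Naive keyword matching; stands in for a real intent classification model."""
    q = user_query.lower()
    if "summar" in q:
        return "summarize"
    if "translate" in q:
        return "translate"
    return "qa"

def build_prompt(user_query: str, **slots: str) -> str:
    """Pick a template by intent and fill in the key information from the query."""
    return TEMPLATES[classify_intent(user_query)].format(**slots)

prompt = build_prompt("Please summarize this report", content="(report text here)")
print(prompt)
```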


There are several classic prompt engineering methods:

  • Direct questioning is the method whose results are hardest to guarantee. When you ask directly, the quality of the answer depends on whether the large model has been adequately trained and well instruction-tuned; the burden falls mainly on the model itself.

  • Few-shot prompting: the user first gives the large model a few examples and then asks it to answer questions of the same type. This generally works better.

  • Chain-of-thought (CoT) prompting is a recently developed method that encourages large language models to explain their reasoning process. The main idea is to show the model a few examples that include explicit reasoning steps and to require it to show its reasoning when answering. Making the reasoning explicit tends to lead to more accurate results (a toy sketch combining few-shot and chain-of-thought prompting follows this list).

  • Generated-knowledge prompting lets the large model draw on its rich latent knowledge, first generating relevant background information by itself and then using it to produce an accurate answer.
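
A toy illustration of combining few-shot examples with chain-of-thought prompting, as referenced in the list above; the worked examples are invented and `call_llm` stands in for whatever model API is actually used.

```python
FEW_SHOT_COT_EXAMPLES = [
    {
        "question": "A store sells pens at 3 yuan each. How much do 4 pens cost?",
        "reasoning": "Each pen costs 3 yuan and there are 4 pens, so 3 * 4 = 12.",
        "answer": "12 yuan",
    },
    {
        "question": "A train travels 60 km per hour. How far does it go in 2.5 hours?",
        "reasoning": "Distance = speed * time = 60 * 2.5 = 150.",
        "answer": "150 km",
    },
]

def build_cot_prompt(question: str) -> str:
    """Prepend worked examples with explicit reasoning, then ask the new question."""
    parts = []
    for ex in FEW_SHOT_COT_EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\nLet's think step by step. {ex['reasoning']}\nA: {ex['answer']}\n"
        )
    parts.append(f"Q: {question}\nLet's think step by step.")
    return "\n".join(parts)

prompt = build_cot_prompt("A box holds 8 apples. How many apples are in 7 boxes?")
# answer = call_llm(prompt)   # placeholder for the actual model API
print(prompt)
```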


As mentioned above, manual prompt engineering has two problems. First, exploration takes a lot of time, and ordinary users will not bother to build suitable prompts. Second, different templates suit only a limited range of tasks and are not universal.

In terms of engineering implementation, there are currently two ways to further automate prompt engineering.

The first is a dedicated model: after the application receives a prompt, it first sends it to a classification model that decides whether the prompt can be optimized. If so, the prompt is sent to a model specially trained on a large number of instructions, which polishes and supplements the original prompt before it is sent to the LLM, yielding a better answer. This approach is simple and direct, but the overall inference overhead grows.

The other option is to let the large model generate a result first, then analyze the result itself and propose optimization suggestions; the model then uses these suggestions to generate several candidate prompts, which it continues to evaluate and refine until it arrives at the best prompt. This approach is more automated, but it has two limitations: it depends on the capability of the core model itself, and the inference overhead is larger, so it is best run as an offline task that automatically enriches the prompt template library.
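
A rough sketch of this self-critique loop, assuming two placeholder functions: `call_llm` for the serving endpoint and `score_answer` for an LLM-as-judge or human-feedback score. A production version would run this offline and write the winning prompts back into the template library.

```python
import random

def call_llm(prompt: str) -> str:
    """Placeholder for the real model API (e.g. an HTTP call to the serving endpoint)."""
    return f"[model output for: {prompt[:40]}...]"

def score_answer(answer: str) -> float:
    """Placeholder scorer; in practice an LLM-as-judge or human feedback signal."""
    return random.random()

def optimize_prompt(initial_prompt: str, rounds: int = 3, candidates: int = 3) -> str:
    """Let the model critique and rewrite its own prompt, keeping the best candidate."""
    best_prompt = initial_prompt
    best_score = score_answer(call_llm(best_prompt))
    for _ in range(rounds):
        advice = call_llm(
            "Critique this prompt and suggest concrete improvements:\n" + best_prompt
        )
        for _ in range(candidates):
            rewritten = call_llm(
                "Rewrite the prompt below following the advice. Return only the new prompt.\n"
                f"Advice: {advice}\nPrompt: {best_prompt}"
            )
            score = score_answer(call_llm(rewritten))
            if score > best_score:
                best_prompt, best_score = rewritten, score
    return best_prompt

print(optimize_prompt("Tell me about MLOps"))
```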


3.3 Context length extension

The input of many large models is only 2K to 3K tokens, which limits their applications. That is why every time a large model such as GPT-4 or Claude extends its context window, the market responds enthusiastically.

This common pain point has led academia and industry to propose a series of technical solutions, such as plug-in approaches, direct extrapolation, and interpolation, to quickly extend the input and output length of large models. Since there are too many to cover, this article describes a couple of simple ones in detail.


In order to solve the problem of insufficient context length for large models, we can take a direct approach, which is to segment the original input or background data and store it in a vector database.

Then match the user prompt in the vector database, and use the matched fragments in the vector database as prompt background knowledge to allow the large model to generate answers.

For example, you can use this method to ask questions about the content of a book.
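
A minimal sketch of this chunk-embed-retrieve flow, using a toy hashed bag-of-words embedding and brute-force cosine similarity in place of a real embedding model and vector database; the book chunks and question are invented.

```python
import numpy as np
from typing import List

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy hashed bag-of-words embedding; a real system would call an embedding model."""
    v = np.zeros(dim)
    for token in text.lower().split():
        v[hash(token) % dim] += 1.0
    n = np.linalg.norm(v)
    return v / n if n > 0 else v

def top_k(query: str, chunks: List[str], k: int = 2) -> List[str]:
    q = embed(query)
    sims = [float(q @ embed(c)) for c in chunks]            # cosine similarity
    order = sorted(range(len(chunks)), key=lambda i: -sims[i])
    return [chunks[i] for i in order[:k]]

# Hypothetical chunks of a book; a real pipeline would split by tokens or paragraphs
# and store the embeddings in a vector database instead of a Python list.
book_chunks = [
    "Chapter 1: the protagonist grows up in a small village ...",
    "Chapter 2: the protagonist meets a mentor and receives an old map ...",
    "Chapter 3: the final journey and the discovery of the treasure ...",
]

question = "Who does the protagonist meet in chapter 2?"
context = "\n---\n".join(top_k(question, book_chunks))
prompt = (
    "Answer the question using only the context below.\n"
    f"Context:\n{context}\nQuestion: {question}"
)
# answer = call_llm(prompt)   # hypothetical call to the large model
print(prompt)
```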

Vector databases provide fast query and retrieval, making the process more efficient, which is why they have suddenly become popular recently. For summarization tasks, you can also slice the text first and then summarize segment by segment and merge the summaries, or summarize sequentially and layer by layer.

However, this approach also has some limitations. For example, sharding may lead to loss and duplication of information, thus affecting the accuracy of the model.


In some special long-context scenarios, such as reading comprehension, a fully plug-and-play approach can handle inputs where the query plus the original text exceeds the length limit: cut the original text into several small fragments, run the query against each fragment, and decide which generated result is most relevant to the query. The assumption is that the question relates only to parts of the original text and that the answers derived from different fragments do not depend on each other. However, this simple method has strong scenario limitations and only average effectiveness.

This is where NBCE comes in. NBCE (Naive Bayes-based Context Extension) extends the context length an LLM can handle based on Naive Bayes reasoning. Its main applicable scenario is when the answer to be predicted can be divided into several fragments, each of which depends on only one context. It has the advantages of being plug-and-play, model-agnostic, requiring no fine-tuning, having linear efficiency, and being simple to implement.
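
Under the independence assumption just stated, the Naive Bayes combination of per-fragment predictions can be sketched as follows: the next-token log-probabilities conditioned on each fragment are summed and the query-only distribution is subtracted (n-1) times. The published NBCE additionally uses pooling variants and a tunable weighting, so treat this only as the core idea; `ctx` and `uncond` stand in for real model outputs.

```python
import numpy as np

def nbce_combine(ctx_logprobs: np.ndarray, uncond_logprobs: np.ndarray) -> np.ndarray:
    """Naive Bayes pooling of next-token distributions over n context fragments.

    ctx_logprobs:    (n, vocab) array of log p(token | fragment_i, query)
    uncond_logprobs: (vocab,)   array of log p(token | query only)
    Returns unnormalized combined log-probabilities over the vocabulary.
    """
    n = ctx_logprobs.shape[0]
    # log p(t | S_1..S_n, q)  is proportional to  sum_i log p(t | S_i, q) - (n - 1) * log p(t | q)
    combined = ctx_logprobs.sum(axis=0) - (n - 1) * uncond_logprobs
    return combined - combined.max()          # numerical stabilization

# Hypothetical next-token distributions from 3 fragments plus the query-only pass.
vocab_size = 1000
ctx = np.log(np.random.dirichlet(np.ones(vocab_size), size=3))
uncond = np.log(np.random.dirichlet(np.ones(vocab_size)))
next_token_id = int(np.argmax(nbce_combine(ctx, uncond)))
```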


So is there a more general solution for dealing with long contexts?

At present, the position interpolation method is the most widely used in industry. Recall that in the original Transformer, to make the embedding of the same token (the input x) differ across positions, absolute position encoding is applied to the input embedding vector: a sinusoidal offset determined by the absolute position is added to each dimension of the embedding.

However, directly extrapolating the context length beyond its original limit under this scheme causes generation quality to drop sharply. Some researchers therefore proposed applying a position-dependent trigonometric rotation to the query vector (the input multiplied by the weight matrix Q) and the key vector (the input multiplied by the matrix K), i.e. RoPE encoding, which is equivalent to rotating the same q and k vectors by different angles at different positions.

On top of this encoding, position interpolation further rescales the positions (equivalently, the per-dimension rotation angles) so that longer contexts map back into the range seen during training, forming the position interpolation method of context length extension. This method is more general and needs only a small amount of long-text data for tuning. Of course, long-context technology is still developing, and other methods that require no tuning and are even more general have emerged.
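
A minimal numpy sketch of RoPE and of linear position interpolation as described above: the rotation angle of each dimension pair depends on the position, and interpolation simply rescales positions by trained_length / target_length so the extended context falls back into the range seen during training. Shapes and lengths are illustrative.

```python
import numpy as np

def rope_angles(positions: np.ndarray, dim: int, base: float = 10000.0) -> np.ndarray:
    """Rotation angles theta[pos, i] = pos * base^(-2i/dim) for i in [0, dim/2)."""
    inv_freq = base ** (-np.arange(0, dim, 2) / dim)        # (dim/2,)
    return np.outer(positions, inv_freq)                    # (seq, dim/2)

def apply_rope(x: np.ndarray, angles: np.ndarray) -> np.ndarray:
    """Rotate consecutive dimension pairs of q/k vectors; x has shape (seq, dim)."""
    x1, x2 = x[:, 0::2], x[:, 1::2]
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

seq_len, dim = 8192, 128
trained_len = 2048
q = np.random.randn(seq_len, dim).astype(np.float32)

# Direct extrapolation: positions 0..8191 exceed the trained range and degrade quality.
angles_extrapolated = rope_angles(np.arange(seq_len), dim)

# Position interpolation: squeeze positions back into [0, trained_len).
scale = trained_len / seq_len
angles_interpolated = rope_angles(np.arange(seq_len) * scale, dim)

q_rot = apply_rope(q, angles_interpolated)
```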


4 Future prospects

In the past six months we have experienced a "war of a hundred models". New large models appear in the open source community every few days, and the related technology is becoming more standardized and homogeneous. Around the LLaMA series in particular, we have learned a whole family of camelid words: llama, alpaca, vicuna, and so on. Why are so many large language models named after camelids?

Because the abbreviation of Large Language Model is LLM, and Meta felt that two L's together were hard to pronounce, they chose the similar-sounding word "LLaMA", which means llama. Later, many open source models tuned on top of this open source model also gave themselves camelid names.

At the same time, we can see that in this round of large-model startups in Silicon Valley, excluding OpenAI, nearly 1/3 of the funds have been invested in the tool and platform direction of MLOps and LMOps.


More and more high-quality open source models will flood the market: the LLaMA series abroad and a series of home-grown open source models in China. This state will continue for some time. However, because of model homogeneity, models with modest parameter counts and average quality, such as Dolly 12B, will fall completely silent. Meanwhile, closed-source models will focus on multi-modality and more advanced intelligence.

Industry-specific large models will also be only a short-term boom. In the future, a new generation of ultra-capable general models will cover the capabilities of industry models and suppress their momentum. The landmark event is that GPT-4's capabilities in the financial domain surpassed the specially trained BloombergGPT. One explanation is that, backed by trillions of tokens of training corpus, general large models have already absorbed industry-wide knowledge but lack the right ways of eliciting it. This is, of course, only our basic judgment; the internal knowledge bases of industries, and especially of enterprises, still have value and are worth accumulating in depth.

Finally, the LMOps platform is still important. Because companies are concerned about costs, even if they no longer try to develop large models independently, using and operating the LMOps platform still has the cost advantages brought by intensive construction and large-scale operations.


The above is all the content shared today.


——END——

