As AIGC large models are deployed one after another, how can enterprises cut AI data costs and improve efficiency?

Editor | Song Hui

Produced | CSDN Cloud Computing

AIGC has kept booming since the beginning of the year. Large models have emerged one after another in China, and their parameter counts easily exceed 100 billion. The data behind these models takes many forms and is deployed in many ways, and the management effort and cost that such enormous volumes of data require should not be underestimated.

Cloudera, a hybrid data company, has released a series of data products and solutions this year, aiming to reduce the data costs that AI and large-model users face. Specifically, Cloudera recommends that users focus on improving data observability and optimizing hybrid cloud deployment costs. In addition, to meet the data requirements of AI scenarios, users can consider a hybrid data deployment approach that integrates data lakes and data warehouses (a lakehouse).

Improve data observability and optimize hybrid cloud costs

After more than a decade of IT transformation and cloud migration, the hybrid cloud deployment model has become increasingly popular and is now an important choice for enterprises. Cloudera has identified three areas that data teams working in hybrid cloud scenarios need to focus on. First, in a hybrid cloud made up of complex systems such as containers, schedulers, and services, the controllability and stability of the data platform matter most. Second, current management systems still offer limited visibility into cloud consumption and efficiency, which often leads to waste and overspending, so technical teams need to manage and control the resources underlying their data. Finally, troubleshooting data systems in day-to-day operations is cumbersome, and support staff often have to go back and forth repeatedly. The experience of operations staff and the quality of the platform tooling are therefore key to operational efficiency.

To address these issues, Cloudera designed Cloudera Observability, a one-stop observability application that covers the entire CDP platform. The solution is built around financial governance, monitoring and management, performance optimization, and automated analysis. Its main functions are as follows:

1. Financial governance

  • Avoid exceeding budget through cost management
  • Capacity forecasting to inform planning

2. Active system monitoring

  • Historical analysis reports on infrastructure, services, workloads and users
  • Monitoring and insights into the current system

3. Workload optimization

  • Performance tuning recommendations
  • Reconciliation rules that proactively invalidate and refresh

4. Service health monitoring

  • Identify bottlenecks through continuous service monitoring
  • Correlate events and logs with services

5. Self-service analysis

  • Automated operations covering all capabilities
  • Complete impact analysis and visibility

6. Faster problem solving

  • Ready-to-use root cause analyses (RCAs) and recommended remedies for faster support

According to Cloudera, Cloudera Observability currently supports several major CDP data engines, including Hive, Impala, and Spark, and is delivered as a SaaS service hosted by Cloudera. A version that can be deployed on premises will follow. Cloudera estimates that after adopting Cloudera Observability, cluster utilization can rise by more than 30%, SLA and SLO compliance rates can improve by 43%, and root cause analysis (RCA) and troubleshooting can be up to 50 times faster. If these figures hold, observability can significantly improve infrastructure return on investment and revenue while reducing operating expenses.

The data dilemma of enterprise large models: Cloudera's lakehouse offers a new answer

Beyond data observability, AI and large-model applications also confront data systems with new technical challenges that differ from those of traditional data analytics, including data provenance, accuracy, and security.

Taking large language models as an example, their data requirements differ from the data structure and performance requirements of systems such as Spark and Hive.

In detail:

1. Lack of contextual information

  • Models are not trained on the company's own data
  • Context about the company's customers is crucial

2. Relevance and accuracy of data

  • Incorrect responses can have serious consequences

3. Data trustworthiness and security

  • A new validation philosophy that checks intent rather than function

4. Data risk and compliance

  • Authorization, traceability, and governance audit trails

Cloudera has built a hybrid data platform that addresses the architectural requirements of data fabric, lakehouse, data mesh, and future data ecosystems. Deployed across hybrid cloud and multi-cloud environments, the platform handles data orchestration and then provides unified analytics and application products for AI, BI, and machine learning.

At the Cloudera customer conference in April this year, Cloudera also highlighted how its hybrid data platform, CDP, supports data science, AI, and machine learning. For example, Cloudera Machine Learning (CML) provides end-to-end workflow support across the machine learning lifecycle, along with collaborative, integrated business intelligence and enhanced capabilities for users ranging from data experts to data analysts.

Cloudera has also outlined its product and technology roadmap for the data needs of enterprises that train and use large models. It currently provides technical capabilities in three areas: trust and security, hybrid data applications, and scalability.

Specifically:

1. Trust, security, and governance. Cloudera SDX provides the security, governance, and provenance needed to build trusted AI on enterprise data wherever it resides.

2. Hybrid data applications: building enterprise AI applications on existing enterprise data. Cloudera makes enterprise data usable across public and private clouds and strengthens enterprise AI capabilities with contextual information about the business.

3. Scalability: providing a data foundation for ML/AI applications. Cloudera manages more than 25 million terabytes of data for cloud data management and analytics, a scale comparable to that of hyperscale cloud providers.

Data will become even more important in the AI era, and developers will play a key role in how it is stored, managed, analyzed, and applied. Cloudera's design ideas and recommendations for data systems deserve developers' attention. CSDN will continue to report on developments in data technology.

Origin blog.csdn.net/FL63Zv9Zou86950w/article/details/131784785