One picture to help you understand real data insights on the IBM Cloud

In a traditional transactional database system, each customer transaction generates corresponding transaction data in the business system. That data is stored in a relational database, where it forms the business transaction records. All kinds of business applications are built around relational databases.


Today, the reality everyone can see is this:

As more and more enterprises and institutions adopt mobile-oriented applications, much of their transaction data is generated as JSON documents and stored in NoSQL database systems.

Many enterprises and institutions have built data centers and use data warehouses as their main technology for data analysis. Data is extracted from the transactional databases, transformed, and loaded into the data warehouse so that it can be analyzed; this is the well-known ETL (extract, transform, load) process. However, this kind of analysis is only suited to answering "specified" business questions: users query information through pre-designed models that answer business questions within a fixed scope, and then generate reports. The biggest limitation is this need to "specify" in advance. When a new business question arises, the people who rely on the data cannot get the answers they need.
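To make the ETL flow concrete, here is a minimal sketch in Python using only the standard library. It assumes a hypothetical set of JSON order documents (the kind of mobile transaction data described above), flattens them, and loads them into an in-memory SQLite table that stands in for the data warehouse. The field names, the `fact_orders` table, and the report query are illustrative assumptions, not part of any IBM product; real ETL pipelines are far larger. The final query shows the limitation just described: it answers one pre-designed question and nothing else.

```python
import json
import sqlite3

# Hypothetical source records: order documents in JSON form, as a mobile
# application might write them to a NoSQL store.
RAW_DOCS = [
    '{"order_id": 1, "customer": "alice", "items": [{"sku": "A", "qty": 2, "price": 3.5}]}',
    '{"order_id": 2, "customer": "bob", "items": [{"sku": "B", "qty": 1, "price": 10.0},'
    ' {"sku": "A", "qty": 1, "price": 3.5}]}',
]


def extract(raw_docs):
    """Extract: parse each JSON document into a Python dict."""
    return [json.loads(doc) for doc in raw_docs]


def transform(docs):
    """Transform: flatten each order into one row with a computed total.

    Note that item-level detail (the SKU) is dropped here, so the warehouse
    can no longer answer product-level questions.
    """
    rows = []
    for doc in docs:
        total = sum(item["qty"] * item["price"] for item in doc["items"])
        rows.append((doc["order_id"], doc["customer"], round(total, 2)))
    return rows


def load(conn, rows):
    """Load: write the rows into a warehouse fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, total REAL)"
    )
    conn.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def sales_by_customer(conn):
    """A pre-designed report: it answers only this one 'specified' question."""
    return conn.execute(
        "SELECT customer, SUM(total) FROM fact_orders "
        "GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    warehouse = sqlite3.connect(":memory:")
    load(warehouse, transform(extract(RAW_DOCS)))
    print(sales_by_customer(warehouse))
```

Running it prints `[('alice', 7.0), ('bob', 13.5)]`. A new question such as "sales by product" cannot be answered from `fact_orders`, because the transform step discarded the SKU; answering it would require changing the pipeline and reloading the data, which is exactly the "specified" constraint.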


Over the past five years, with the spread of the Hadoop platform and data lake technology, many open source Hadoop vendors have emerged. They store large volumes of data of many types in Hadoop, run ETL processing there, and save the results back into Hadoop. By using open source software and cheap hardware, they aim to "fully" overcome the limitations of traditional data warehouse technology, retain a longer history of stored data where conditions allow, and scale well. There is only one goal: to be able to answer more "new questions".


An analytical system that can answer so many "new questions" must be enterprise-grade and cross-departmental, with strong security and information governance capabilities. These are precisely the two areas where the Hadoop platform is weak. Hadoop also lacks a good interactive way to ask and answer questions, and it is hard to use with most of the analysis tools on the market, requiring more complex development skills. All of this greatly restricts what users can do when performing analytical tasks on Hadoop.


However, if IT remains limited to traditional relational data and SQL technology, it is hard to solve the problems that data analysis now faces. For example, the huge volumes of raw data generated by Internet of Things (IoT) devices call for new analysis methods, more and more data is being generated in the cloud, and unstructured data hides great business value.


To meet these challenges, IBM offers a flexible and powerful analytics strategy and a set of solutions: the DataWorks cloud technology and services built on the IBM Bluemix platform. DataWorks goes beyond batch processing and uses stream processing to extract the required data from many data domains, enrich it with meaningful context, and turn it into insight that helps users find the answers they need. The following example illustrates this with weather data processed and analyzed on IBM's Bluemix platform.


1. Data collection: IBM DataWorks can collect many kinds of data through a wide range of methods: batch ETL or streaming ingestion, a real-time stream analytics engine, and a high-speed, high-volume acquisition engine based on an IoT data model (for which The Weather Company provides a standard).


2. Data storage: Once the data has been collected, IBM DataWorks offers a variety of storage options. For databases, these range from NoSQL formats (document, key-value, graph, and columnar) to relational (SQL-based) storage. Object storage is also supported, such as Swift on Bluemix and Amazon S3.


3. Analysis: IBM DataWorks provides analysis tools that let different kinds of users analyze each type of data: reports and dashboards for business analysts; application development platforms for programmers; and data pipelines, models, and statistical tools for data scientists.


4. Deployment and rollout: Once users have developed and used analytical applications, they can find answers to the questions they care about.

IBM DataWorks provides an easier way to deploy and roll out applications. The IBM Bluemix platform supports developers throughout application development, including lifecycle management and integration with web application servers and GitHub, while Cognos and Watson Analytics support the deployment of enterprise-grade reporting systems. DataWorks also provides an information governance model and deployment architecture. In providing services and support, IBM DataWorks takes a cloud-first approach, with the user's on-premises data center second. This powerful hybrid cloud model gives users broad room to perform analytical tasks in the cloud. IBM DataWorks not only lets users complete analytical tasks in a self-service manner but also provides data governance capabilities, including:


1. User permission control with different security levels, protecting sensitive data in compliance with regulations;


2. Data lineage information, so that you can better understand the long journey data takes from initial processing to final analysis;


3. Definitions of business terms and indicators, with a mapping between each business term or indicator and its technical definition, filling the gap in metadata governance for Hadoop data lakes.
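The lineage and business-glossary capabilities listed above can be illustrated with a small, self-contained Python sketch. This is not the DataWorks API, which the article does not describe; it is a generic illustration that assumes each processing step is recorded as (operation, input datasets, output dataset), and that the glossary is a plain mapping from business terms to `table.column` definitions. The class names, dataset names, and glossary entries are hypothetical; the datasets loosely follow the article's weather-data example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class LineageStep:
    """One recorded processing step: the datasets it read and the dataset it produced."""
    operation: str
    inputs: List[str]
    output: str


@dataclass
class LineageLog:
    """An append-only log of processing steps, forming a simple lineage graph."""
    steps: List[LineageStep] = field(default_factory=list)

    def record(self, operation: str, inputs: List[str], output: str) -> None:
        self.steps.append(LineageStep(operation, list(inputs), output))

    def upstream(self, dataset: str) -> Set[str]:
        """Return every dataset that `dataset` was derived from, directly or indirectly."""
        sources: Set[str] = set()
        frontier = [dataset]
        while frontier:
            current = frontier.pop()
            for step in self.steps:
                if step.output != current:
                    continue
                for src in step.inputs:
                    # The membership check also stops the walk on cyclic logs.
                    if src not in sources:
                        sources.add(src)
                        frontier.append(src)
        return sources


# Hypothetical business glossary: business term -> technical definition (table.column).
GLOSSARY: Dict[str, str] = {
    "Daily Rainfall": "daily_summary.precip_mm",
    "Weather Station": "station_master.station_id",
}


def resolve_term(term: str) -> Optional[str]:
    """Look up the technical definition behind a business term, if one is registered."""
    return GLOSSARY.get(term)


if __name__ == "__main__":
    log = LineageLog()
    log.record("ingest", ["weather_feed.json"], "raw_observations")
    log.record("clean", ["raw_observations"], "observations")
    log.record("aggregate", ["observations", "station_master"], "daily_summary")

    print(sorted(log.upstream("daily_summary")))
    print(resolve_term("Daily Rainfall"))
```

Running it prints `['observations', 'raw_observations', 'station_master', 'weather_feed.json']` followed by `daily_summary.precip_mm`: the full upstream history of the summary table, and the column behind a business term. The linear scan over all steps keeps the sketch short; a real metadata catalog would index steps by output dataset.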


IBM DataWorks uses Apache Spark as its underlying processing engine, which provides fast, flexible, and scalable data processing. IBM's support for, and outstanding contributions to, open source technology have shown the whole industry that IBM is creating a "new IBM" era.

