Hadoop big data applications in eight industries: review and outlook

Every new technology goes through a process, from first being learned about to finally being applied universally by the public. Big data technology is a new data processing technique, and after nearly a decade of development it has only just begun to be applied across industries. Yet in the eyes of the media and the public, big data technology is still shrouded in mystery, as if it had the magic to predict the future and mine wealth. Widely circulated big data stories include Target determining from a teenage girl's shopping history that she was pregnant, and credit card companies predicting a customer's next purchase from her shopping behavior at different times and places. Big data technology has painted beautiful visions such as the "smart city", "intelligent transportation" and "smart healthcare", and these descriptions have filled everyone with longing for, and high expectations of, big data technology.

From the big data applications of 2014, I have identified two important phenomena, or application trends.

The first phenomenon is that big data technology is being applied first to structured data processed with SQL, to cope with growing data volumes and the processing capacity they demand. This runs contrary to what many people have publicized, namely that big data technology is best suited to unstructured data and unsuitable for structured data. We found that companies face two challenges. On one hand, the volume of accumulated data keeps growing, from GB to TB (a few enterprise customers have reached the PB level). On the other hand, as applications grow in number and complexity, computing power can no longer keep up. Over the years, most businesses have built their own applications on traditional relational databases such as DB2 or Oracle to meet business needs, and both data volumes and the number of applications are growing rapidly. Running these applications on traditional databases takes longer and longer: even with only 1 TB of data, the complexity of the business logic means that statistical jobs which used to run daily on a traditional relational database can now only run weekly. This severely limits the timeliness of the business. As IT systems increasingly become the business itself, inefficient IT systems seriously hurt enterprise competitiveness. The data to be processed here is structured business data, and the existing applications are SQL-based. This is the objective reason behind the development of distributed SQL-on-Hadoop technology, and also the practical need driving Transwarp to improve SQL performance and achieve complete SQL support.

The second phenomenon is strongly growing demand for real-time processing of time-series data. With the spread of electronic devices such as sensors and monitoring equipment, more and more companies produce real-time data. The traditional approach is to collect the data generated by instruments and store it in a central database for analysis. As the volume of data and the number of devices grow, the delays of this traditional approach become more and more severe. Stream processing, which handles data in real time as it is produced, can greatly improve the speed and efficiency of enterprises. In 2014, Transwarp deployed more stream processing clusters to process data in real time, from data generated by users to data generated by sensors.
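As a rough illustration of processing data as it is produced, the following Python sketch (not Transwarp's implementation) keeps a per-sensor sliding-window mean and raises an alert as soon as the mean over the last `window` seconds exceeds a threshold. The `Reading` and `SlidingWindowMean` names, the alert rule, and the assumption that each sensor's readings arrive in timestamp order are illustrative choices.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Reading:
    sensor_id: str
    ts: float     # event time in seconds
    value: float  # e.g. a temperature or pressure sample


class SlidingWindowMean:
    """Per-sensor mean over the last `window` seconds, updated one event at a time.

    Assumes (hypothetically) that each sensor's readings arrive in time order.
    """

    def __init__(self, window: float, threshold: float) -> None:
        if window <= 0:
            raise ValueError("window must be positive")
        self.window = window
        self.threshold = threshold
        self.buffers: dict[str, deque] = {}
        self.sums: dict[str, float] = {}

    def process(self, r: Reading) -> Optional[float]:
        """Ingest one reading; return the window mean if it exceeds the threshold."""
        buf = self.buffers.setdefault(r.sensor_id, deque())
        buf.append(r)
        self.sums[r.sensor_id] = self.sums.get(r.sensor_id, 0.0) + r.value
        # Evict readings that have fallen out of the window ending at r.ts.
        while buf and buf[0].ts <= r.ts - self.window:
            self.sums[r.sensor_id] -= buf.popleft().value
        mean = self.sums[r.sensor_id] / len(buf)
        return mean if mean > self.threshold else None
```

Each reading is appended and evicted at most once, so the per-event cost stays constant on average; a real stream engine would also have to handle late events and fault-tolerant state, which this sketch leaves out.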

I think both application trends will intensify in 2015. Below is a brief summary of big data applications over the past year in several industries: telecom operators, finance, logistics, industry and commerce, transportation, energy, broadcasting and e-commerce.

Telecom operators

In the mobile Internet era, operators face many new challenges. WeChat and other mobile communication apps are eroding operators' voice and SMS revenue, which makes the data traffic business even more important. On the other hand, wireless network service is the core competitiveness of operators. In recent years, operators have invested heavily in building 4G networks. Poor 4G coverage, or low network quality that makes connections fall back from 4G to 3G or 2G, greatly reduces customer satisfaction.

After a year or two of exploration, operators have settled on two directions for building big data platforms: using big data technology to improve operational efficiency, and exploring new business models and new ways of operating on data. Over the past year, big data has proven itself in operational efficiency, while new business models are still being explored. Using Transwarp's in-memory computing technology, Guangdong Mobile computed more than 800 business data analysis indicators in four hours instead of the 30 hours the original Oracle system needed. Shanghai Mobile fully migrated its traffic management system from DB2 to Transwarp TDH, and operating efficiency is about five times that of the original cluster. Our complete SQL support is what made the application migration possible; our partners had earlier tried to migrate the applications to a well-known Hadoop distribution without success. We are also taking part in 4G network optimization projects for a provincial and a municipal mobile operator. In these projects, we and our partners use the higher-performance Transwarp TDH in place of a traditional MPP database to build and run network optimization models at high speed. On one hand, the models find network problems, such as signal drops, and help operators quickly pinpoint problem areas. On the other hand, by combining TDH's complete SQL support with its statistical and machine learning algorithms, they find optimal models and parameters, allowing precise, fine-grained network adjustments that improve signal quality and coverage.

Finance

In 2013 and 2014, some state-owned and joint-stock banks explored big data applications to varying degrees, but early applications were confined to simple storage and retrieval of transaction history and unstructured data, and had no impact on the banks' critical business. The prospects for big data in banking are broad: by integrating a bank's own structured transaction data with external Internet and government data, it can enable more refined customer management and reduce risks such as credit risk. These prospects did not become reality in 2014, and 2015 is expected to be the year such applications are explored. Still, we put a number of practical applications into production at banks in 2014. In these applications, TDH served as a supplement to the data warehouse and improved the efficiency of data analysis. Also thanks to our complete SQL support, a joint-stock bank began migrating some sophisticated credit risk control logic to run on the TDH Hadoop platform. The client had previously tried running these risk control models on several MPP databases and Hadoop distributions, but none met their performance or functional requirements. Technically, these analyses involve only a few TB of data, but the business analysis is extremely complex, involving hundreds of fact and dimension tables, some of them tens of thousands of bytes wide. This case shows that traditional relational databases and MPP databases are increasingly stretched by complex computation in big data scenarios, and that banks need more efficient data processing tools.

Express delivery

In the past, the volume of data carried by the express delivery industry, and the pressure it puts on IT systems, received little attention. In recent years, the courier industry has expanded rapidly along with e-commerce. Huge market demand poses unprecedented challenges for courier companies: every year, "Double 11" (the November 11 shopping festival) puts far more pressure on their processing capacity than usual. How to relieve overflowing warehouses during "Double 11" and keep express parcels from turning into "slow parcels" is a problem every courier company faces.

Analyzing courier logistics data to improve and optimize operations has become a problem worth studying, and an important means for the express delivery industry to become more competitive. Every step of express delivery generates large amounts of data. By monitoring this data, a company can track the carrying capacity of receiving and processing centers across the country and optimize and adjust its delivery schedules in real time, reducing costs. By analyzing the data to predict business trends, it can prepare for surges in demand. However, express production data is large in volume, highly concurrent and complex in type, while the applications on top of it have strict real-time requirements, so conventional databases struggle in this setting.

Together with Teamsun, we deployed a big data platform for China Post's EMS express delivery business. It processes data from delivery offices, processing centers and distribution centers nationwide (including statuses such as received, held, unloaded, not unloaded, delivered, not delivered, pickup couriers, addresses, sealed, dispatched and not dispatched). The platform streams real-time data from the ESB (enterprise production bus) into a stream processing cluster, which loads it dynamically into the database in real time and provides real-time statistics, indicator monitoring and real-time data queries. This deployment gives the customer an easy-to-use tool for monitoring every step of the business in real time, so that problems in the massive courier business, such as backlogged, lost or damaged mail, can be identified quickly and accurately, improving service quality. The platform smoothly handled the data processing load of "Double 11" in 2014. It also helps the courier company adjust and optimize its delivery plans based on the latest production data, reducing costs.
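The real-time statistics described above come down to maintaining per-status counts as scan events stream in. Below is a minimal Python sketch under assumed names (`ScanEvent`, `CourierMonitor`, the center code "BJ01" and the status strings are hypothetical, not EMS's schema): it tracks each item's latest scan, keeps live counts per (center, status), and lists items that have stayed in one status too long, a simple proxy for backlog detection.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class ScanEvent:
    item_id: str
    center: str   # hypothetical processing-center code
    status: str   # hypothetical statuses, e.g. "received", "sealed", "shipped"
    ts: float     # scan time in seconds


class CourierMonitor:
    """Keep each item's latest scan and live counts of items per (center, status)."""

    def __init__(self) -> None:
        self.latest: dict[str, ScanEvent] = {}
        self.counts: Counter = Counter()

    def process(self, e: ScanEvent) -> None:
        prev = self.latest.get(e.item_id)
        if prev is not None:
            if e.ts < prev.ts:
                return  # stale, out-of-order scan: keep the newer state
            self.counts[(prev.center, prev.status)] -= 1  # item left its old state
        self.latest[e.item_id] = e
        self.counts[(e.center, e.status)] += 1

    def backlog(self, now: float, max_wait: float, waiting: str = "received") -> list[str]:
        """Items still in `waiting` status for longer than `max_wait` seconds."""
        return sorted(
            item for item, e in self.latest.items()
            if e.status == waiting and now - e.ts > max_wait
        )
```

Keeping only the latest scan per item means the counts always describe where parcels are now, rather than how many scans have happened.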

Industry and commerce

In building the national "economic household registration" database, the industry and commerce administration has accumulated large amounts of data on market entities, annual inspections, law enforcement, 12315 consumer complaints and more. Statistical analysis of this data helps the administration understand the market and the economic situation.

A simple application of big data technology here is data quality management and statistical analysis. Because the data is entered manually, some errors are inevitable, even if the error rate is low. Meanwhile, basic enterprise and personal information is spread across dozens of relational tables, and the information in them is cross-correlated to some degree. Large-scale cross-referencing and statistics over the data can uncover hidden errors so they can be corrected promptly. Using Transwarp's in-memory computing technology, this application checks and computes statistics over the full data set in ten minutes, greatly improving work efficiency.
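The cross-referencing step can be pictured as a join between tables that record the same fact twice. This Python sketch assumes two hypothetical tables (enterprise records carrying a legal representative's ID and name, and a person table) and reports dangling IDs and name mismatches. In the deployment described here such checks would presumably run as SQL over the full data set, so treat this in-memory version only as an illustration of the logic.

```python
def cross_check(enterprises: list[dict], persons: list[dict]) -> list[tuple]:
    """Cross-reference enterprise records against the person table.

    Table and field names are hypothetical. Returns
    (reg_no, field, enterprise_value, person_value) for every mismatch.
    """
    by_id = {p["person_id"]: p for p in persons}
    issues = []
    for ent in enterprises:
        person = by_id.get(ent["legal_rep_id"])
        if person is None:
            # The referenced person does not exist: likely a mistyped ID.
            issues.append((ent["reg_no"], "legal_rep_id", ent["legal_rep_id"], None))
        elif person["name"] != ent["legal_rep_name"]:
            # Both tables record the legal representative, but they disagree.
            issues.append(
                (ent["reg_no"], "legal_rep_name", ent["legal_rep_name"], person["name"])
            )
    return issues
```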

Big data technology is also used in the market-entity information query system, which can handle concurrent queries from millions of users and return query or search results within a few hundred milliseconds. Querying historical snapshots of an enterprise lets users track changes to its information and follow the enterprise through its life cycle. Beyond solving storage and querying, we also help our customers use a computation engine to quickly find associations between enterprises and enterprise-related personnel. By scanning the full data set, the system identifies relationships between companies based on equity holdings and positions held, and builds an enterprise relationship repository.
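Finding companies linked through shared shareholders or officers is a connected-components problem. A minimal sketch, assuming the input is a list of hypothetical (company, person, role) records rather than the actual registry schema, uses a union-find structure to group companies that share any person:

```python
from collections import defaultdict


def related_groups(links):
    """Group companies that share any person (shareholder, director, ...).

    `links` is an iterable of hypothetical (company, person, role) tuples;
    returns each group of two or more linked companies as a sorted list.
    """
    parent: dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    first_company: dict[str, str] = {}  # person -> first company seen with them
    for company, person, _role in links:
        find(company)  # register companies even if they share nobody
        if person in first_company:
            union(first_company[person], company)
        else:
            first_company[person] = company

    groups: dict[str, list[str]] = defaultdict(list)
    for company in list(parent):
        groups[find(company)].append(company)
    return sorted(sorted(g) for g in groups.values() if len(g) > 1)
```

Union-find keeps a full scan close to linear in the number of records; weighting links by equity share or role could be layered on top with a graph engine.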

Electric power

With the comprehensive build-out of information systems in power enterprises and the rapid development of smart grids, power data is growing far faster than power enterprises expected. On the generation side, for example, as production becomes more automated, indicators such as pressure, flow and temperature are monitored with higher frequency and precision, which places higher demands on massive data acquisition and processing. On the consumption side, raising the collection frequency alone makes the data volume grow "exponentially". Power data volumes have far exceeded what the relational databases normally used in the power sector can process.

In 2014 we mainly helped the power sector process consumption-side data. Unexpectedly, we found that statistical analysis of power data involves very complex SQL. Technically, it makes extensive use of Oracle's PL/SQL extensions, including stored procedures, control flow, exception handling, insert/delete/update/select operations and transaction processing. From an application point of view, this SQL logic is mainly used for statistical analysis of historical consumption trends and for calculating line loss. We helped our clients analyze the data with machine learning methods and found that electricity consumption is correlated with macroeconomic trends and climate, and is also related to each industry and to each enterprise's business situation. Comparing an enterprise's power consumption with the consumption level of its industry reveals energy-saving enterprises, and analyzing historical consumption data reveals the effect of energy-saving measures or changes in production activity. A southern power supply bureau used statistics from the TDH platform to identify energy-saving enterprises and large consumers, and subsidized the energy savers to raise public awareness of energy conservation, guiding industries to shift from extensive, high-consumption development toward low-consumption, high-efficiency, green and harmonious development.
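Two of the calculations mentioned above can be sketched in a few lines of Python. Line loss rate is commonly defined as (energy supplied - energy sold) / energy supplied; the energy-saving check below compares each enterprise's consumption per unit of output with its industry average. The 0.85 cutoff, the field names and the use of a simple mean as the benchmark are assumptions for illustration, not the bureau's actual rule.

```python
from collections import defaultdict
from statistics import mean


def line_loss_rate(supplied_kwh: float, sold_kwh: float) -> float:
    """Share of supplied energy that is not billed: (supplied - sold) / supplied."""
    if supplied_kwh <= 0:
        raise ValueError("supplied energy must be positive")
    return (supplied_kwh - sold_kwh) / supplied_kwh


def energy_saving_candidates(records: list[dict], ratio: float = 0.85) -> list[str]:
    """Flag enterprises whose kWh per unit of output is at most `ratio` times
    their industry's average intensity (a hypothetical benchmarking rule)."""
    intensity = {r["name"]: r["kwh"] / r["output"] for r in records}
    by_industry: dict[str, list[float]] = defaultdict(list)
    for r in records:
        by_industry[r["industry"]].append(intensity[r["name"]])
    benchmark = {ind: mean(vals) for ind, vals in by_industry.items()}
    return sorted(
        r["name"] for r in records
        if intensity[r["name"]] <= ratio * benchmark[r["industry"]]
    )
```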

We also deployed a pilot fault-handling system for a power company. Together with partners, we built a unified distribution network topology model: a graph database stores the topology of the entire supply network from users to substations, a stream processing system raises real-time alerts, and real-time topology queries quickly determine where an outage has occurred and how far it extends. On this basis, the system can notify emergency repair teams of power failures so that power is restored in time. It can also proactively notify users and strengthen interaction with them, giving a comprehensive, intuitive view of the whole distribution network.
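The outage-location idea can be sketched on a tree-shaped topology. Assuming a hypothetical `parent` map from each node (meter, transformer, feeder) to its upstream node, and a set of meters currently reporting loss of power, the sketch below reports the most upstream nodes whose downstream meters are all dark; each such node is a likely fault point and its subtree is the affected scope. This is a simplified stand-in for the graph-database query described in the text, not that system's logic.

```python
from collections import defaultdict


def locate_outages(parent: dict[str, str], dark_meters: set[str]) -> list[str]:
    """Return the most upstream nodes whose downstream meters are all dark.

    `parent` maps each node to its upstream node (hypothetical topology);
    nodes without children are treated as meters.
    """
    children: dict[str, list[str]] = defaultdict(list)
    for child, up in parent.items():
        children[up].append(child)
    nodes = set(parent) | set(parent.values())

    total: dict[str, int] = {}  # meters below each node
    dark: dict[str, int] = {}   # dark meters below each node

    def count(node: str) -> None:
        if node in total:
            return
        kids = children.get(node, [])
        if not kids:  # a leaf is a meter
            total[node] = 1
            dark[node] = int(node in dark_meters)
            return
        for k in kids:
            count(k)
        total[node] = sum(total[k] for k in kids)
        dark[node] = sum(dark[k] for k in kids)

    for n in nodes:
        count(n)

    def all_dark(n: str) -> bool:
        return dark[n] == total[n]

    # A fault point is fully dark while its upstream node still powers some meter.
    return sorted(
        n for n in nodes if all_dark(n) and not (n in parent and all_dark(parent[n]))
    )
```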

Traffic

With rapid economic development and a growing number of motor vehicles, traffic congestion is becoming more and more serious nationwide, and using information technology to improve traffic management and road safety has become an important issue.

The most common method is to deploy digital surveillance devices at road checkpoints that capture and recognize images and video 24/7; a single province or municipality generates tens of millions of vehicle records per day. This data is mainly used to provide real-time traffic information to traffic management departments, which may also publish it as a travel reference for the public. It also supports traffic management through real-time analytic applications such as monitoring key commercial vehicles, identifying and tracking vehicles with violations, interval speed checks and cloned-plate analysis. With our partners, we deployed a province-wide traffic monitoring system for a provincial public security traffic management department. It uses a distributed message queue to collect vehicle records from checkpoints across the province in real time, and stream computing clusters to process each record, compute real-time statistics and monitoring, and run the real-time analyses behind the applications above. End-to-end processing latency stays within two seconds, improving the efficiency of traffic management.
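Cloned-plate analysis rests on a simple physical check: the same plate cannot appear at two checkpoints farther apart than a car could drive in the elapsed time. A minimal streaming sketch, assuming a hypothetical table of road distances between checkpoints and a 200 km/h plausibility limit (both illustrative, not the deployed system's parameters):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Capture:
    plate: str
    checkpoint: str
    ts: float  # capture time in seconds


class ClonedPlateDetector:
    """Flag a plate seen at two checkpoints faster than any real car could travel."""

    def __init__(self, distance_km: dict, max_kmh: float = 200.0) -> None:
        self.distance_km = distance_km  # hypothetical {(a, b): road km} table
        self.max_kmh = max_kmh          # assumed plausibility limit
        self.last: dict[str, Capture] = {}

    def _distance(self, a: str, b: str) -> Optional[float]:
        if a == b:
            return 0.0
        return self.distance_km.get((a, b), self.distance_km.get((b, a)))

    def process(self, c: Capture) -> Optional[tuple]:
        """Return (previous, current) captures if the pair implies impossible speed."""
        prev = self.last.get(c.plate)
        self.last[c.plate] = c
        if prev is None:
            return None
        km = self._distance(prev.checkpoint, c.checkpoint)
        if km is None or km == 0.0:
            return None  # unknown pair or same checkpoint: nothing to judge
        hours = abs(c.ts - prev.ts) / 3600.0
        if hours == 0.0 or km / hours > self.max_kmh:
            return (prev, c)
        return None
```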

Of course, big data applications in the transportation industry are still in their infancy: large-scale centralized data collection has only just begun or is about to be completed. With the powerful analysis and mining capabilities of big data technology, the industry can in the future greatly improve the transparency of real-time traffic information, improve traffic and congestion management, reduce accidents and provide a reference for urban planning.

Radio and TV

In China, the broadcasting system is being hit by the digital wave, and traditional radio and television operators face a major challenge from network-based film and television. In this context, China Media is keenly aware that, to survive and gain a competitive advantage in the future networked media landscape, it must now turn toward its users and build "precision" radio and television content and operations. China Media needed a data infrastructure that could store and manage massive, multi-source and diverse data, scale linearly on its hardware platforms, and provide fast real-time data analysis that quickly benefits the business. China Media chose us to deploy a big data platform and built a digital TV analysis system on it. The system provides real-time rankings based on the full data set: along dimensions such as time (hour/day/week) and user, it performs ranking, period-over-period and trend analyses of on-demand programs, live broadcasts, program categories, search keywords and so on. It can also analyze viewer counts, new viewers and viewers who watched to the end along dimensions such as time, channel, film type and drama. In addition, by collecting and analyzing user behavior data, China Media can build accurate customer portraits and use a smart recommendation engine to predict which programs viewers will seek out before they know it themselves, tailoring program recommendations to each user to improve product reach and customer loyalty. The system can also tag programs with actors, plot, tone and other metadata to analyze audience preferences, preparing for the development of future film and television content.
Thanks to the digital TV analysis system built on the big data platform, China Media has made the "magnificent turn" from transmitting content to creating content.
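Ranking with a period-over-period comparison, as in the program rankings above, can be sketched as follows. The view counts, the `rank_with_change` name and the choice to report no ratio when a program had no views in the previous period are illustrative assumptions.

```python
def rank_with_change(current: dict[str, int], previous: dict[str, int], top_n: int = 10):
    """Rank programs by views in the current period and attach the
    period-over-period change ratio (None when there is no prior baseline)."""
    rows = []
    for program, views in current.items():
        prev = previous.get(program, 0)
        change = (views - prev) / prev if prev else None
        rows.append((program, views, change))
    rows.sort(key=lambda row: (-row[1], row[0]))  # views descending, name breaks ties
    return rows[:top_n]
```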

E-commerce

In e-commerce, big data has become a key technical support and plays an important role in marketing, customer care and many other areas. We worked with Jinjiang's e-commerce business to build a product recommendation system on our big data platform. The platform is built around a customer tagging system. Drawing on the e-commerce site's large number of members and visitors, it performs in-depth learning and mining of customer behavior data and, based on the RFM model and customer information, turns consumer preferences, age, family status, even star signs and Chinese zodiac signs, purchase frequency, spending, travel mode and other information into customer tags. Cluster analysis of these tags then produces customer segments, giving an accurate customer base for precision marketing. At the same time, we helped the customer build a product tagging system: based on the features of hotel, tourism and other product types, we constructed and mined product tags and, after machine learning processing, matched customer tags with product tags, building an intelligent recommendation system on the analyzed weights of the various tags.

The recommendation system, which recommends products intelligently, is becoming an important part of the e-commerce company's member care system and a basic link in its precision service system.
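The RFM model mentioned above scores customers on Recency, Frequency and Monetary value. The Python sketch below is a minimal rank-based version, assuming a hypothetical `Order` record and three score bins; it is simpler than, and stands in for, the clustering step described in the text, since customers with the same score tuple already form a crude segment.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Order:
    customer: str
    day: date
    amount: float


def rfm_scores(orders: list[Order], today: date, bins: int = 3) -> dict[str, tuple[int, int, int]]:
    """Score each customer 1..bins on Recency, Frequency and Monetary value by rank."""
    stats: dict[str, tuple[date, int, float]] = {}
    for o in orders:
        last, freq, money = stats.get(o.customer, (o.day, 0, 0.0))
        stats[o.customer] = (max(last, o.day), freq + 1, money + o.amount)

    customers = sorted(stats)
    recency = {c: (today - stats[c][0]).days for c in customers}
    frequency = {c: stats[c][1] for c in customers}
    monetary = {c: stats[c][2] for c in customers}

    def score(values: dict, higher_is_better: bool) -> dict[str, int]:
        # Order from worst to best, then cut the ranking into `bins` equal slices.
        ordered = sorted(customers, key=values.__getitem__, reverse=not higher_is_better)
        n = len(ordered)
        return {c: 1 + (i * bins) // n for i, c in enumerate(ordered)}

    r = score(recency, higher_is_better=False)  # fewer days since last order is better
    f = score(frequency, higher_is_better=True)
    m = score(monetary, higher_is_better=True)
    return {c: (r[c], f[c], m[c]) for c in customers}
```

Ties are broken by customer name here, which is an arbitrary choice; a production system would more likely use quantile cut points over the whole customer base.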

Summary and Outlook

Looking back at Hadoop big data industry applications in 2014, some were simple applications we had not anticipated, and some were complex data analysis and mining applications. Big data technology is a new data processing and analysis technology whose processing capacity and depth of data mining exceed those of earlier technologies. However, its value has to be demonstrated through the applications built on top of it, and how to apply these capabilities to real problems is a question every industry is exploring. We expect a large number of innovative applications based on big data technology to emerge in 2015.

At the same time, over the past year big data technology has been shown to significantly improve operational efficiency. We expect that in the coming year, using SQL-on-Hadoop technology to solve enterprises' difficulty in computing statistics over large data sets will become a general application trend. As SQL support and performance keep improving, enterprises that apply big data technology to structured data processing will see immediate gains in operational efficiency and productivity.

2014 was the year big data technology began to land in production, and we have seen huge demand for big data technologies and products. We are very optimistic about the development of big data in 2015 and beyond. The rapid growth of big data will continue for a long time: much of the value in data has not yet been mined, and more and more enterprises, government agencies and public organizations will need big data solutions. Let us work together on outstanding, general-purpose big data products that help people solve their data processing problems!


Origin blog.csdn.net/chengxvsyu/article/details/92431022