Features and capabilities an IoT big data platform should have

The Internet of Things (IoT) is a very broad concept: it covers all kinds of devices and machines connected over a network, and both the Internet of Vehicles and the Industrial Internet fall under it. According to a Gartner report, there were more than 14.2 billion connected devices in 2019, a figure expected to reach 25 billion by 2021. There is no doubt that we need an IoT big data platform to handle the massive amounts of data these connected devices generate.

What features does an IoT big data platform need? Compared with a general-purpose big data platform, what sets it apart? Let's take a closer look.

>>>>

1. Efficient and distributed

It must be an efficient distributed system. The IoT generates an enormous amount of data. In China alone there are more than 500 million smart meters; if each meter reports a reading every 15 minutes (96 readings per day), the country's smart meters produce roughly 50 billion records a day. No single server can process that much data, so the system must be distributed and able to scale horizontally. To keep costs down, each node must also be efficient, supporting both fast writes and fast queries.
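As a rough illustration, the sketch below works through the smart-meter arithmetic for exactly 500 million meters and shows one simple way to spread devices across nodes. The `node_for` helper and the hash-based placement are assumptions made for illustration; the article does not say how any particular platform partitions data.

```python
import zlib

METERS = 500_000_000       # smart meters in China, the lower bound given in the text
INTERVAL_MINUTES = 15      # one reading per meter every 15 minutes

readings_per_meter_per_day = 24 * 60 // INTERVAL_MINUTES   # 96
records_per_day = METERS * readings_per_meter_per_day      # 48,000,000,000
records_per_second = records_per_day / 86_400              # sustained average write rate


def node_for(device_id: str, num_nodes: int) -> int:
    """Map a device to a node with a stable hash (hypothetical placement rule)."""
    return zlib.crc32(device_id.encode("utf-8")) % num_nodes


print(f"{records_per_day:,} records per day")
print(f"about {records_per_second:,.0f} records per second on average")
print("meter-000123 is stored on node", node_for("meter-000123", 8))
```

Because the hash is stable, all readings from one device land on the same node, and adding capacity means adding nodes rather than buying a bigger server.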

>>>>

2. Real-time processing

The system must process data in real time. In Internet big data processing, the familiar scenarios are user profiling, recommendation systems, public opinion analysis, and so on. These scenarios do not require real-time results; batch processing is enough. IoT scenarios, however, need real-time alerting and decision-making based on the collected data, with latency kept within seconds. Without real-time computation, the business value of the IoT is greatly reduced.

>>>>

3. High reliability

The system must provide highly reliable, carrier-grade service. An IoT system often connects directly to production and operations systems. If the data processing system goes down, production may stop, causing economic loss and leaving end consumers without normal service. With smart meters, for example, a system failure could directly leave thousands of households unable to use electricity normally. An IoT big data system must therefore be highly reliable: it must support real-time data backup, off-site disaster recovery, online upgrades of software and hardware, and online migration between IDC data centers. Otherwise, service may be interrupted.

>>>>

4. Efficient caching

The system needs an efficient caching capability. In most scenarios, applications need to quickly obtain the current state of devices or other information, whether for alerting, for display on large dashboards, or for other purposes. The system should provide an efficient mechanism that lets users fetch the latest state of all devices, or of the subset of devices that match a filter.
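A minimal sketch of such a cache follows, assuming an in-memory dictionary keyed by device ID. The `LatestStateCache` class and its method names are hypothetical and only illustrate the idea of "latest state of all devices, or of those matching a filter".

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


class LatestStateCache:
    """Keeps only the most recent reading per device (hypothetical sketch)."""

    def __init__(self) -> None:
        self._latest: Dict[str, State] = {}

    def update(self, device_id: str, ts: float, **fields: Any) -> None:
        current = self._latest.get(device_id)
        # Ignore out-of-order readings that are older than the cached one.
        if current is None or ts >= current["ts"]:
            self._latest[device_id] = {"ts": ts, **fields}

    def latest(
        self, predicate: Optional[Callable[[str, State], bool]] = None
    ) -> Dict[str, State]:
        """Return the latest state of every device, or only of matching devices."""
        return {
            device_id: state
            for device_id, state in self._latest.items()
            if predicate is None or predicate(device_id, state)
        }


cache = LatestStateCache()
cache.update("meter-1", 100.0, voltage=220.0)
cache.update("meter-2", 101.0, voltage=198.0)
cache.update("meter-1", 90.0, voltage=0.0)   # stale reading, ignored

low_voltage = cache.latest(lambda _id, state: state["voltage"] < 200)
print(low_voltage)   # only meter-2 matches the filter
```

A dashboard would call `latest()` with no filter, while an alerting job would pass a predicate like the one above.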

>>>>

5. Real-time stream computing

The system needs real-time stream computing. Real-time alerts and predictions are often not based simply on a threshold at a single point in time; they require real-time aggregation over the data streams generated by one or more devices, computed over a time window rather than a single point. Moreover, the computations required can be quite complex and vary from scenario to scenario, so the system should allow user-defined functions.
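The idea of aggregating over a time window with a user-defined function can be sketched as follows. The `SlidingWindow` class and the `spread` function are hypothetical names, and the sketch assumes that timestamps arrive in non-decreasing order.

```python
from collections import deque
from typing import Callable, Deque, List, Tuple


class SlidingWindow:
    """Applies a user-defined function to the points of the last `width_seconds`."""

    def __init__(self, width_seconds: float, func: Callable[[List[float]], float]) -> None:
        self.width = width_seconds
        self.func = func
        self._points: Deque[Tuple[float, float]] = deque()

    def push(self, ts: float, value: float) -> float:
        """Add a point and return the aggregate over the window (ts - width, ts]."""
        self._points.append((ts, value))
        # Evict points that have fallen out of the window.
        while self._points[0][0] <= ts - self.width:
            self._points.popleft()
        return self.func([v for _, v in self._points])


def spread(values: List[float]) -> float:
    """Example user-defined function: max minus min within the window."""
    return max(values) - min(values)


window = SlidingWindow(60.0, spread)
stream = [(0, 10.0), (30, 14.0), (59, 11.0), (90, 12.0)]
results = [window.push(ts, value) for ts, value in stream]
print(results)   # -> [0.0, 4.0, 4.0, 1.0]
```

An alert rule would then compare each returned aggregate, rather than each raw reading, against its threshold.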

>>>>

6. Data Subscriptions

The system needs to support data subscription. As with general-purpose big data platforms, the same data is often needed by many applications, so the system should provide a subscription capability: whenever new data arrives, subscribed applications should be notified in real time. Subscriptions should also be customizable, allowing an application to set filter conditions, for example subscribing only to the five-minute average of a particular physical quantity.
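A filtered subscription can be sketched as a broker that delivers each new record only to the subscribers whose condition matches. The `Broker` class below is a hypothetical in-process illustration, not the subscription API of any particular product.

```python
from typing import Any, Callable, Dict, List, Tuple

Record = Dict[str, Any]
Condition = Callable[[Record], bool]
Callback = Callable[[Record], None]


class Broker:
    """Pushes each new record to every subscriber whose filter accepts it."""

    def __init__(self) -> None:
        self._subscriptions: List[Tuple[Condition, Callback]] = []

    def subscribe(self, callback: Callback, condition: Condition = lambda r: True) -> None:
        self._subscriptions.append((condition, callback))

    def publish(self, record: Record) -> int:
        """Deliver the record and return how many subscribers received it."""
        delivered = 0
        for condition, callback in self._subscriptions:
            if condition(record):
                callback(record)
                delivered += 1
        return delivered


broker = Broker()
archive: List[Record] = []
hot_alerts: List[Record] = []

broker.subscribe(archive.append)   # receives everything
broker.subscribe(                  # receives only high temperatures
    hot_alerts.append,
    lambda r: r["metric"] == "temperature" and r["value"] > 80,
)

broker.publish({"device": "d1", "metric": "temperature", "value": 75})
broker.publish({"device": "d2", "metric": "temperature", "value": 91})
broker.publish({"device": "d2", "metric": "humidity", "value": 95})

print(len(archive), len(hot_alerts))   # -> 3 1
```

The same pattern extends to derived values: a subscriber interested in five-minute averages would subscribe to the output of a windowed aggregation instead of the raw stream.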

>>>>

7. Unified processing of real-time and historical data

Real-time and historical data processing should be unified. Real-time data lives in the cache and historical data in persistent storage, and depending on its age, historical data may be kept on different storage media. The system should hide the underlying storage and present users and applications with a single interface. Whether a query targets newly collected data or data from a decade ago, everything except the time range passed as input should be the same.
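The sketch below illustrates the "one interface, only the time range differs" idea with a hypothetical two-tier store. The `UnifiedStore` class is an assumption for illustration: a plain list stands in for persistent storage, and timestamps are assumed to arrive in increasing order.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Point = Tuple[float, float]   # (timestamp, value)


class UnifiedStore:
    """Recent points stay in a hot tier; older points move to a cold tier."""

    def __init__(self, hot_capacity: int) -> None:
        self.hot_capacity = hot_capacity
        self._hot: List[Point] = []    # stands in for the in-memory cache
        self._cold: List[Point] = []   # stands in for persistent storage

    def append(self, ts: float, value: float) -> None:
        self._hot.append((ts, value))
        # Age out the oldest points once the hot tier is full.
        while len(self._hot) > self.hot_capacity:
            self._cold.append(self._hot.pop(0))

    def query(self, start: float, end: float) -> List[Point]:
        """Return points with start <= ts <= end, whichever tier holds them."""
        result: List[Point] = []
        for tier in (self._cold, self._hot):
            lo = bisect_left(tier, (start, float("-inf")))
            hi = bisect_right(tier, (end, float("inf")))
            result.extend(tier[lo:hi])
        return result


store = UnifiedStore(hot_capacity=2)
for t in range(5):
    store.append(float(t), t * 10.0)

# The caller passes only a time range; the tiers are an internal detail.
print(store.query(1.0, 3.0))   # -> [(1.0, 10.0), (2.0, 20.0), (3.0, 30.0)]
```

The query above spans both tiers, yet the caller never learns which points came from memory and which from storage.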

>>>>

8. Stable data writes

The system must guarantee that data can be written continuously and stably. In an IoT system the incoming data flow tends to be steady, so the resources needed for writes can usually be estimated. What varies is querying and analysis: ad hoc queries in particular can consume a large, unpredictable amount of system resources. The system must therefore reserve enough resources to ensure that data is written without loss. More precisely, it must be a system that gives priority to writes.

>>>>

9. Multi-dimensional data analysis

The system needs to support flexible multi-dimensional analysis. Data generated by connected devices must be analyzed along many dimensions: by the region where the device is located, by device model or supplier, by the personnel who operate the device, and so on. Furthermore, these dimensions cannot all be designed in advance; they emerge during actual operation as the business develops. An IoT big data system therefore needs a flexible mechanism for adding a new analysis dimension.
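One way to picture this is to keep the dimensions as tags attached to each device, separate from the readings, so that a new dimension can be added later without touching collected data. The data, the tag names, and the `average_by` helper below are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Hypothetical sample data: (device_id, value) readings plus per-device tags.
readings: List[Tuple[str, float]] = [("m1", 10.0), ("m2", 20.0), ("m3", 30.0)]
tags: Dict[str, Dict[str, str]] = {
    "m1": {"region": "north", "model": "A"},
    "m2": {"region": "north", "model": "B"},
    "m3": {"region": "south", "model": "A"},
}


def average_by(dimension: str) -> Dict[str, float]:
    """Group readings by one tag and average each group."""
    groups: Dict[str, List[float]] = defaultdict(list)
    for device_id, value in readings:
        key = tags[device_id].get(dimension, "unknown")
        groups[key].append(value)
    return {key: mean(values) for key, values in groups.items()}


print(average_by("region"))   # -> {'north': 15.0, 'south': 30.0}

# A dimension nobody planned for is added later, without rewriting any readings.
tags["m1"]["supplier"] = "acme"
tags["m3"]["supplier"] = "acme"
print(average_by("supplier"))   # -> {'acme': 20.0, 'unknown': 20.0}
```

Devices that have not yet been tagged with the new dimension simply fall into an "unknown" group until their tags are filled in.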

>>>>

10. Support for data computation

The system needs to support downsampling, interpolation, special functions, and other computations. Raw data may be collected at a very high frequency, but analysis often does not need the raw samples and works on downsampled data instead, so the system must provide an efficient downsampling operation. Devices are hard to synchronize, and the sampling times of different devices rarely align, so analyzing the value at a specific point in time often requires interpolation. The system should provide several interpolation strategies, such as linear interpolation and filling with a fixed value. In the Industrial Internet, beyond ordinary statistical operations, special functions such as the time-weighted average are often needed as well.
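The three operations named above can be sketched in a few lines each. The function names are hypothetical, the input is assumed to be sorted by time, and the time-weighted average uses the trapezoidal rule, which is one common definition rather than the only one.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]   # (timestamp in seconds, value)


def downsample(points: List[Point], bucket_seconds: float) -> List[Point]:
    """Average the points into fixed-width buckets keyed by bucket start time."""
    buckets: Dict[float, List[float]] = {}
    for ts, value in points:
        start = ts // bucket_seconds * bucket_seconds
        buckets.setdefault(start, []).append(value)
    return [(start, sum(vs) / len(vs)) for start, vs in sorted(buckets.items())]


def interpolate(points: List[Point], ts: float) -> float:
    """Linearly interpolate the value at ts between its two neighboring samples."""
    for (t0, v0), (t1, v1) in zip(points, points[1:]):
        if t0 <= ts <= t1:
            if t1 == t0:
                return v0
            return v0 + (v1 - v0) * (ts - t0) / (t1 - t0)
    raise ValueError("ts is outside the sampled range")


def time_weighted_average(points: List[Point]) -> float:
    """Area under the piecewise-linear curve divided by the elapsed time."""
    if len(points) < 2 or points[-1][0] == points[0][0]:
        raise ValueError("need samples spanning a non-zero time range")
    area = sum(
        (t1 - t0) * (v0 + v1) / 2
        for (t0, v0), (t1, v1) in zip(points, points[1:])
    )
    return area / (points[-1][0] - points[0][0])


samples = [(0.0, 10.0), (10.0, 20.0), (40.0, 20.0)]
print(downsample(samples, 30.0))        # -> [(0.0, 15.0), (30.0, 20.0)]
print(interpolate(samples, 5.0))        # -> 15.0
print(time_weighted_average(samples))   # -> 18.75
```

The plain mean of the three sample values is about 16.67, while the time-weighted average is 18.75 because the signal spends most of the interval at 20. That difference is why irregularly sampled data often calls for the time-weighted form.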

>>>>

11. Ad hoc query and analysis

The system needs to support ad hoc queries and analysis. To make big data analysts more productive, the system should provide a command-line tool or other tools that let users run SQL queries directly, without necessarily going through a programming interface. Query results should be easy to export and turn into various charts.

>>>>

12. Flexible data management strategy

The system needs to provide flexible data management policies. A large system collects many kinds of data, and in addition to raw collected data there is a large amount of derived data. Each kind has its own characteristics: some is collected at high frequency, some must be retained for a long time, some needs multiple replicas for higher safety, and some needs fast access. An IoT big data platform must therefore offer a variety of policies that users can select and configure according to the characteristics of their data, and the different policies must be able to coexist.

>>>>

13. An open system

The system must be open. It needs to support industry-standard SQL and offer development interfaces for a variety of languages, including C/C++, Java, Go, and Python, as well as a RESTful interface. It also needs to support Spark, R, Matlab, and similar tools, so that machine learning and artificial intelligence algorithms or other applications can be integrated easily. This lets the big data platform keep expanding rather than become an island.

>>>>

14. Support for heterogeneous environments

The system must support heterogeneous environments. Building a big data platform is a long-term effort, and each batch of servers and storage devices purchased will differ from the last. The system must allow servers and storage devices of different grades and configurations to coexist.

>>>>

15. Edge-cloud collaboration

The system needs to support edge-cloud collaboration. There should be a flexible mechanism for uploading data from edge compute nodes to the cloud. Depending on the need, the edge can sync the raw data, processed or computed data, or only the data that matches a filter, and the policy can be canceled or changed at any time.
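The choice between raw data, computed data, and filtered data can be modeled as a small policy object evaluated on the edge node. The `SyncPolicy` class, its `mode` values, and `select_for_upload` are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SyncPolicy:
    """What an edge node uploads: 'raw' values, one 'aggregate', or nothing ('off')."""

    mode: str = "raw"
    condition: Optional[Callable[[float], bool]] = None   # optional filter


def select_for_upload(values: List[float], policy: SyncPolicy) -> List[float]:
    """Apply the current policy to a batch of locally collected values."""
    if policy.mode == "off":
        return []
    kept = [v for v in values if policy.condition is None or policy.condition(v)]
    if policy.mode == "raw":
        return kept
    if policy.mode == "aggregate":
        # Upload a single computed value (here, the mean) instead of the raw batch.
        return [sum(kept) / len(kept)] if kept else []
    raise ValueError(f"unknown sync mode: {policy.mode}")


batch = [1.0, 5.0, 9.0]
print(select_for_upload(batch, SyncPolicy("raw")))                       # -> [1.0, 5.0, 9.0]
print(select_for_upload(batch, SyncPolicy("aggregate")))                 # -> [5.0]
print(select_for_upload(batch, SyncPolicy("raw", lambda v: v > 4.0)))    # -> [5.0, 9.0]
print(select_for_upload(batch, SyncPolicy("off")))                       # -> []
```

Because the policy is plain data, changing or canceling it at runtime amounts to replacing one object on the edge node.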

>>>>

16. Single management backend

The system needs a single management backend that makes it easy to view system status and to manage clusters, users, and system resources. It should also integrate seamlessly with third-party IT operations and monitoring platforms.

>>>>

17. Private deployment

The system should be easy to deploy privately. For security and various other reasons, many companies want a private, on-premises deployment. Traditional enterprises often lack a strong IT operations team, so installation and deployment must be simple and fast, and the system must be easy to maintain.

The above summarizes the main features and capabilities of an IoT big data platform. IoT big data platforms are still evolving, but the overall goals will not change: efficient, scalable, real-time, reliable, flexible, open, simple, and easy to maintain.


Origin blog.csdn.net/taos_data/article/details/97863221