Basic questions about big data

1. What is big data?
Big data refers to a collection of data whose content cannot be captured, managed, and processed with conventional software tools within a tolerable time frame.

2. What is big data technology? Which technologies are suitable for big data?
Big data technology refers to the ability to quickly extract valuable information from data of many different types. Technologies applicable to big data include massively parallel processing (MPP) databases, data mining grids, distributed file systems, distributed databases, cloud computing platforms, the Internet, and scalable storage systems.
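As a toy illustration of the map-reduce pattern that underlies many massively parallel processing systems and distributed file systems, the sketch below counts words across data partitions in parallel. The partitions, data, and use of Python's standard `multiprocessing` module are illustrative assumptions, not part of any particular product:

```python
from collections import Counter
from multiprocessing import Pool

def map_count(chunk):
    """Map step: count words in one partition of the data."""
    return Counter(chunk.split())

def reduce_counts(partials):
    """Reduce step: merge the partial counts from every partition."""
    total = Counter()
    for partial in partials:
        total += partial
    return total

if __name__ == "__main__":
    # These strings stand in for blocks of a distributed file system.
    partitions = ["big data big value", "data mining data"]
    with Pool(2) as pool:           # one worker per partition
        partials = pool.map(map_count, partitions)
    print(reduce_counts(partials))
```

Real MPP databases and frameworks apply the same split-process-merge idea across machines rather than local processes.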

3. What are the characteristics of big data?
(1) Huge data volume (Volume).
(2) Diverse data types (Variety).
(3) Fast processing speed (Velocity).
(4) Low value density (Value).

4. What is the role of big data?
(1) The processing and analysis of big data has become the focal point of new-generation information technology applications.
(2) Big data is a new engine for the continuous rapid growth of the information industry.
(3) The use of big data will become a key factor in improving core competitiveness.
(4) The methods and means of scientific research in the era of big data will undergo major changes.

5. What are the methods and theories for big data analysis?
(1) Visual analysis.
(2) Data mining algorithms.
(3) Predictive analysis.
(4) Semantic engines.
(5) Data quality and data management.
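As a minimal sketch of predictive analysis, the code below fits a straight line by ordinary least squares and extrapolates the trend one step ahead. The monthly sales figures and variable names are made up purely for illustration:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    a = cov / var               # slope
    b = mean_y - a * mean_x     # intercept
    return a, b

# Hypothetical monthly sales figures.
months = [1, 2, 3, 4, 5]
sales  = [10, 12, 14, 16, 18]
a, b = fit_line(months, sales)
print(a * 6 + b)  # predicted sales for month 6 → 20.0
```

In practice a library such as scikit-learn or a statistics package would replace this hand-rolled fit, but the principle is the same.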

6. What does big data technology include?
(1) Data collection: ETL tools extract data from distributed, heterogeneous sources such as relational databases and flat data files, bring it into a temporary staging layer for cleaning, conversion, and integration, and finally load it into a data warehouse or data mart, where it becomes the basis for online analytical processing and data mining.
(2) Data access: relational databases, NoSQL stores, SQL interfaces, etc.
(3) Infrastructure : cloud storage, distributed file storage, etc.
(4) Data processing: Natural language processing (NLP) is the discipline that studies the language problems of human-computer interaction. The key is to let the computer "understand" natural language, so NLP is also called natural language understanding (NLU) or computational linguistics. It is, on the one hand, a branch of language information processing and, on the other, one of the core topics of artificial intelligence (AI).
(5) Statistical analysis: hypothesis testing, significance testing, difference analysis, correlation analysis, t-tests, analysis of variance, chi-square analysis, partial correlation analysis, distance analysis, regression analysis (simple, multiple, stepwise, regression prediction with residual analysis, ridge regression, logistic regression, curve estimation), factor analysis, cluster analysis (fast clustering and hierarchical clustering), principal component analysis, discriminant analysis, correspondence analysis, multiple correspondence analysis (optimal scaling), bootstrap techniques, etc.
(6) Data mining: classification, estimation, prediction, affinity grouping or association rules, clustering, description and visualization, and mining of complex data types (text, Web, graphics and images, video, audio, etc.).
(7) Model prediction: predictive models, machine learning, modeling and simulation.
(8) Results presentation: cloud computing, tag clouds, relationship diagrams, etc.
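The data collection step above can be sketched as a toy ETL pipeline. The two sources, the field names, and the in-memory "warehouse" are hypothetical stand-ins for real databases and files:

```python
def extract():
    """Extract: gather rows from heterogeneous sources."""
    # Stand-ins for a relational table and a flat CSV-like file.
    db_rows  = [{"id": 1, "amount": "100"}, {"id": 2, "amount": " 250 "}]
    csv_rows = [{"id": "3", "amount": "75"}]
    return db_rows + csv_rows

def transform(rows):
    """Transform: clean and convert to uniform types in a staging pass."""
    cleaned = []
    for row in rows:
        try:
            cleaned.append({"id": int(row["id"]),
                            "amount": float(str(row["amount"]).strip())})
        except ValueError:
            continue  # discard records that fail conversion
    return cleaned

def load(rows, warehouse):
    """Load: append the cleaned rows to the warehouse."""
    warehouse.extend(rows)

warehouse = []
load(transform(extract()), warehouse)
print(warehouse)  # three cleaned rows with int ids and float amounts
```

A production ETL tool adds scheduling, incremental loads, and error reporting on top of this extract/transform/load skeleton.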

7. What is the basic process of big data processing?
(1) Collection;
(2) Import/preprocessing;
(3) Statistics/analysis;
(4) Mining.
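The four steps above can be chained into a single toy pipeline. The log format, user names, and values are invented for illustration only:

```python
def collect():
    """Step 1, collection: raw log lines as they might arrive."""
    return ["alice,5", "bob,3", "alice,7", ",9", "bob,4"]

def preprocess(lines):
    """Step 2, import/preprocessing: parse and drop malformed records."""
    records = []
    for line in lines:
        user, _, value = line.partition(",")
        if user and value.isdigit():
            records.append((user, int(value)))
    return records

def analyze(records):
    """Step 3, statistics/analysis: total value per user."""
    totals = {}
    for user, value in records:
        totals[user] = totals.get(user, 0) + value
    return totals

def mine(totals):
    """Step 4, mining: surface the most active user."""
    return max(totals, key=totals.get)

totals = analyze(preprocess(collect()))
print(totals, mine(totals))  # {'alice': 12, 'bob': 7} alice
```

Note that the malformed record `",9"` is silently dropped during preprocessing, which is typically where data quality rules are enforced.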

8. What are the problems facing storage in the era of big data?
Capacity issues, latency issues, security issues, cost issues, etc.

9. What are some examples of big data applications?
The medical, energy, communications, and retail industries, among others.

Source: blog.csdn.net/qq_36294338/article/details/108726660