What is big data, and how do you learn it?

What is big data? The more formal definition describes it as a collection of data that cannot be captured, managed, and processed with conventional software tools within an acceptable time: a massive, fast-growing, and diverse information asset that requires new processing models to deliver stronger decision-making power, insight, and process optimization.

Simply put, big data is structured traditional data plus unstructured new data. So what are traditional data and new data? Traditional data is the data in IT business systems, such as customer information and financial data. It is structured, and its volume is not particularly large, generally measured in terabytes. Alongside it there is "new data", which comes from social networks, the Internet, and other channels and includes unstructured text, pictures, audio, and video. At present, more than 75% of the world's data is unstructured, and it keeps growing explosively.

What does a big data developer do?

There are two kinds of big data development: writing Hadoop and Spark applications, and developing the big data processing systems themselves. A big data development engineer is mainly responsible for developing and maintaining the company's big data platform, designing the architecture and products of related tool platforms, doing R&D on web log analysis, real-time and streaming computation, and data visualization, and building analytical models for network security business themes.

Skills required for big data development:

The languages currently used for big data application development include Java, Python, Scala, R, and others. You need to be familiar with the principles and usage of the Hadoop, HBase, Hive, Spark, Flink, Elasticsearch, Presto, Flume, and Kafka ecosystems, and to master the full workflow of data development and data mining.

Big data learning route and resources:

Getting Started: Getting Started with Linux → MySQL Database
Core Foundation: Hadoop
Data Warehouse Technology: Hive Data Warehouse Project
PB-Scale In-Memory Computing: Getting Started with Python → Advanced Python → PySpark Framework → Hive + Spark Project

Phase 1: Getting started with big data development

Pre-study guide: Start with traditional relational databases, master data migration tools, BI data visualization tools, and SQL, and lay a solid foundation for subsequent learning.

1. Big data development foundation: MySQL 8.0 from beginner to proficient

MySQL is a foundational course for all of IT, and SQL runs through an entire IT career. As the saying goes, write SQL well and finding a job is easy. This course takes MySQL 8.0 from zero to an advanced level; after finishing it, you will have the SQL skills required for basic development work.

2022 MySQL intensive lectures + practical MySQL cases: a complete tutorial series from zero-based MySQL beginner to advanced
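
To make the SQL side concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module purely so the snippet runs as-is; the table, columns, and data are invented for the example, and against a real MySQL 8.0 instance you would use a MySQL driver instead, but the SQL itself is the same kind taught in the course.

```python
# A minimal SQL sketch. sqlite3 ships with Python, so the example runs as-is;
# the schema and data below are invented purely for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# DDL: create a hypothetical customer table
cur.execute("""
    CREATE TABLE customer (
        id     INTEGER PRIMARY KEY,
        name   TEXT    NOT NULL,
        city   TEXT,
        spend  REAL    DEFAULT 0
    )
""")

# DML: insert a few sample rows
cur.executemany(
    "INSERT INTO customer (name, city, spend) VALUES (?, ?, ?)",
    [("Alice", "Beijing", 1200.0), ("Bob", "Shanghai", 300.5), ("Carol", "Beijing", 980.0)],
)

# Query: aggregate spend per city, the bread-and-butter SQL taught at this stage
cur.execute("""
    SELECT city, COUNT(*) AS customers, SUM(spend) AS total_spend
    FROM customer
    GROUP BY city
    ORDER BY total_spend DESC
""")
for row in cur.fetchall():
    print(row)

conn.close()
```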

Phase 2: The core foundations of big data

Pre-study guide: learn Linux, Hadoop, and Hive, and master the foundational technologies of big data.

2022 Big Data Hadoop Introductory Tutorial
Offline Hadoop is the core and cornerstone of the big data ecosystem, the entry point to all of big data development, and the course that lays the foundation for Spark and Flink later on. After mastering its three parts, Linux, Hadoop, and Hive, you can independently build visual reports for offline data analysis on top of a data warehouse.

2022 big data Hadoop introductory video tutorial: the most suitable big data Hadoop tutorial for zero-based self-study
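
To give a taste of what offline Hadoop work looks like, below is a minimal word-count sketch written for Hadoop Streaming, which lets MapReduce jobs be written as plain Python scripts reading stdin and writing stdout; the script name and paths are assumptions for illustration, not material from the course.

```python
#!/usr/bin/env python3
# Minimal word-count sketch for Hadoop Streaming (run with "mapper" or "reducer"
# as the first argument). Hadoop Streaming pipes lines through stdin/stdout and
# sorts the mapper output by key before it reaches the reducer.
import sys

def mapper():
    # Emit "word<TAB>1" for every word on every input line.
    for line in sys.stdin:
        for word in line.strip().split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by word, so counts for the same word are adjacent.
    current_word, count = None, 0
    for line in sys.stdin:
        word, value = line.rstrip("\n").split("\t", 1)
        if word == current_word:
            count += int(value)
        else:
            if current_word is not None:
                print(f"{current_word}\t{count}")
            current_word, count = word, int(value)
    if current_word is not None:
        print(f"{current_word}\t{count}")

if __name__ == "__main__":
    reducer() if sys.argv[1:] == ["reducer"] else mapper()
```

It could be submitted with something along the lines of `hadoop jar hadoop-streaming.jar -files wordcount.py -mapper "python3 wordcount.py mapper" -reducer "python3 wordcount.py reducer" -input /data/in -output /data/out`, where the jar name and HDFS paths depend on your installation.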

Phase 3: Data warehouse technology for hundreds of billions of records

Pre-study guide: this stage is driven by a real project and teaches offline data warehouse technology.

Enterprise-level offline data warehouse: online education project practice (the complete process of a Hive data warehouse project)
This course builds a group-level data warehouse that unifies the group data center and centralizes the storage and processing of scattered business data. It covers the complete project process, from requirements research, design, version control, R&D, and testing through to launch, and it mines and analyzes massive user behavior data, builds custom multi-dimensional data sets, and forms data marts for the various scenario themes that consume them.

Big data project practical tutorial: enterprise offline data warehouse, online education project practice (the complete process of a Hive data warehouse project)
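
To illustrate what layering a Hive warehouse actually means, here is a hedged sketch: the ODS/DWS table names and columns are hypothetical, and the HiveQL is submitted through PySpark's Hive support only so the snippet stays in Python; in practice the same statement could equally be run from the hive CLI or beeline.

```python
# Hypothetical Hive warehouse layering sketch: an ODS (raw) table is aggregated
# into a DWS (summary) table that a data mart or report can read directly.
# Requires a PySpark installation with Hive support configured.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("dw-layering-sketch")
    .enableHiveSupport()  # lets spark.sql() run HiveQL against the metastore
    .getOrCreate()
)

# DWS layer: daily active users and study hours per course, built from a
# hypothetical ODS behavior log ods_user_behavior(user_id, course_id, event_time, study_seconds).
spark.sql("""
    CREATE TABLE IF NOT EXISTS dws_course_study_daily AS
    SELECT
        course_id,
        to_date(event_time)          AS study_date,
        COUNT(DISTINCT user_id)      AS active_users,
        SUM(study_seconds) / 3600.0  AS study_hours
    FROM ods_user_behavior
    GROUP BY course_id, to_date(event_time)
""")

spark.stop()
```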

Phase 4: PB-scale in-memory computing

Pre-study guide: Spark's official site now presents Python as its first language, and the 3.2 release highlights the bundled, built-in pandas API; this stage therefore starts with Python and then moves on to Spark.

1. Python from beginner to mastery (19 days)

This Python basics course starts with setting up the environment and conditional statements, moves on to the basic data types, then covers functions, file operations, and a first object-oriented programming mindset, and finally leads students into the world of Python programming with a hands-on case.

A full set of Python tutorials: Python basics video tutorials, essential for zero-based beginners teaching themselves Python
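
The short sketch below strings those topics together: a conditional, a function, file I/O, and a small class; the file name and the sample data are made up for the demo.

```python
# A tiny end-to-end demo of the basics listed above: conditionals, functions,
# file operations, and a first taste of object-oriented programming.

def grade(score):
    # Conditional (judgment) statements
    if score >= 90:
        return "excellent"
    elif score >= 60:
        return "pass"
    return "fail"

class Student:
    # A minimal class to introduce the object-oriented mindset
    def __init__(self, name, score):
        self.name = name
        self.score = score

    def summary(self):
        return f"{self.name}: {self.score} ({grade(self.score)})"

students = [Student("Alice", 95), Student("Bob", 58)]

# File operations: write the summaries out, then read them back
with open("report.txt", "w", encoding="utf-8") as f:
    for s in students:
        f.write(s.summary() + "\n")

with open("report.txt", encoding="utf-8") as f:
    print(f.read())
```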

2. Advanced Python programming: from zero to building a website

After completing this course, you will master advanced Python syntax, multi-tasking programming, and network programming.

Advanced Python syntax tutorial: Python multitasking and network programming, a complete tutorial series for building a website from scratch
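
As a small illustration of multitasking plus network programming, the sketch below starts a TCP echo server in a background thread and talks to it from the main thread; everything stays on localhost and the port number is arbitrary.

```python
# Minimal multitasking + network programming demo: a threaded TCP echo server
# and a client in one script. Runs entirely on localhost; the port is arbitrary.
import socket
import threading

HOST, PORT = "127.0.0.1", 50007

def echo_server(ready):
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind((HOST, PORT))
        srv.listen(1)
        ready.set()                 # tell the client the server is listening
        conn, _ = srv.accept()
        with conn:
            data = conn.recv(1024)
            conn.sendall(data)      # echo the bytes straight back

ready = threading.Event()
threading.Thread(target=echo_server, args=(ready,), daemon=True).start()
ready.wait()

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
    cli.connect((HOST, PORT))
    cli.sendall(b"hello, big data")
    print(cli.recv(1024).decode())  # prints: hello, big data
```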

3. Spark 3.2 from basics to proficiency

Spark is the star product of the big data ecosystem: a high-performance, distributed, in-memory iterative computing framework that can process massive amounts of data. This course teaches Spark 3.2 using Python. It combines theory with practice and is efficient, fast, and easy to understand, so beginners can master it quickly while experienced engineers still gain something.

Spark full video tutorial series: big data Spark 3.2 from basics to proficiency, the first Python-based Spark tutorial series on the whole web
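
Below is a minimal PySpark sketch of the kind of DataFrame work the course builds toward, assuming pyspark (3.2 or later) is installed locally; the data is a tiny in-memory sample invented for the example.

```python
# Minimal PySpark DataFrame sketch: build a small DataFrame in memory,
# then aggregate it, both through the DataFrame API and through Spark SQL.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("pyspark-sketch").master("local[*]").getOrCreate()

orders = spark.createDataFrame(
    [("Beijing", "book", 35.0), ("Beijing", "pen", 5.5), ("Shanghai", "book", 42.0)],
    ["city", "item", "amount"],
)

# DataFrame API: total and average order amount per city
orders.groupBy("city").agg(
    F.sum("amount").alias("total_amount"),
    F.round(F.avg("amount"), 2).alias("avg_amount"),
).show()

# The same aggregation expressed in Spark SQL
orders.createOrReplaceTempView("orders")
spark.sql("SELECT city, SUM(amount) AS total_amount FROM orders GROUP BY city").show()

spark.stop()
```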

4. Big data Hive + Spark offline data warehouse: hands-on industrial project

Using a big data architecture, this project solves the data storage, analysis, visualization, and personalized recommendation problems of industrial IoT manufacturing. The one-stop manufacturing project stores the data behind its business indicators in Hive data warehouse layers and analyzes it with Spark SQL. The core business covers operators, call centers, work orders, gas stations, and warehouse materials.

A hands-on big data Spark offline data warehouse industrial project, disclosed for the first time on the whole web: building an enterprise-level big data platform with Hive + Spark
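
To give a flavor of the "Hive for storage, Spark SQL for analysis" pattern the project follows, here is a hedged sketch; the fact table, its columns, and the layer names are hypothetical stand-ins, not the project's actual schema.

```python
# Hedged sketch of "Hive stores the indicator data, Spark SQL analyses it".
# dwd_work_order(station_id, order_status, create_time, repair_hours) is a
# hypothetical work-order fact table, not the real project schema.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("workorder-analysis-sketch")
    .enableHiveSupport()
    .getOrCreate()
)

# Example indicator: monthly work-order volume and average repair time per gas station
monthly = spark.sql("""
    SELECT
        station_id,
        date_format(create_time, 'yyyy-MM')  AS month,
        COUNT(*)                             AS order_cnt,
        ROUND(AVG(repair_hours), 2)          AS avg_repair_hours
    FROM dwd_work_order
    WHERE order_status = 'closed'
    GROUP BY station_id, date_format(create_time, 'yyyy-MM')
""")

# Write the result back as an indicator table the BI / visualization layer can read
monthly.write.mode("overwrite").saveAsTable("dws_station_workorder_monthly")

spark.stop()
```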

Origin blog.csdn.net/weixin_51689029/article/details/129594918