What are Python, data analysis, and programming? An explanation in the plainest possible language

In this article, I try to use expressions that anyone can understand, even with zero background. Suppose you have a piece of land. There are many kinds of farm work you could do on it, such as loosening the soil, fertilizing, or sowing seeds. Say we have decided to sow seeds. First, we must choose a sowing tool: large machinery, small machinery, or manual labor. Second, we must work out a sowing strategy, for example whether to plant in a long snake pattern or in an elaborate eight-sided pattern.
These map onto data analysis (one of the farm tasks), Python (the sowing tool), and programming (the sowing strategy). Looking more closely, each of the three can be explained through its farming counterpart, as follows.

Data analysis (farm work)

There are many types of farm work, and data analysis is just one direction among others such as artificial intelligence, robotics, and software development. This time we have chosen data analysis, so we need to prepare the right tools and strategies for it. Python is a very suitable tool for data analysis.

Python (Tools)

Here, Python is the "tool" you use to sow seeds, just as you might choose large machinery, small machinery, or manual seeding. Python is a programming language, and many other languages can do the same job, such as C++, Java, and R. Python is chosen here because it is very convenient for data analysis (seeding), not because it is the only option. R is also a popular choice for data analysis, and each of the two has its own strengths, but for most people Python is easier to get started with.

Programming (Strategy)

Finally, programming is like the "seeding strategy" you develop. Based on the results of your analysis, you might decide to sow in a long snake pattern or in an elaborate eight-sided pattern. Programming means writing out the concrete steps that carry out this decision: you write code that specifies how the machinery moves, when to plant seeds, and how to optimize the process for the best result.
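To make "programming as strategy" a little more concrete, here is a toy sketch. The field grid and the function name `snake_path` are illustrative assumptions, not from the original. It writes the long snake sowing pattern down as explicit steps: it visits every cell of a field row by row and reverses direction on each new row.

```python
def snake_path(rows: int, cols: int) -> list[tuple[int, int]]:
    """Return the (row, col) cells of a rows x cols field in serpentine (snake) order."""
    path = []
    for r in range(rows):
        # Even rows run left to right; odd rows come back right to left.
        cols_in_order = range(cols) if r % 2 == 0 else range(cols - 1, -1, -1)
        for c in cols_in_order:
            path.append((r, c))
    return path


print(snake_path(2, 3))  # [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]
```

Changing the strategy, for example to a spiral, would mean changing only the steps inside this function. That is the sense in which a program encodes a strategy.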
In fact, this stage reflects the way of thinking behind data analysis. When you are given a piece of land, you first have to analyze its soil quality and work out which crops suit it. Data processing works the same way: when you receive data, you first check its quality, look for null (missing) values, decide which visualization methods suit it, and so on.
Each of these choices matters. Some data are collected as a time series, so the time attribute is essential. A line (curve) chart suits such data well because it shows changes over time at a glance. And again, Python is only one of several programming languages you could use for data analysis.
The most important thing is really "data analysis thinking", and farming illustrates this too. For a given piece of land, an experienced farmer can tell you step by step what suits the land and how to plant it, so that your crops grow well and you harvest well every year. But if you are given a different piece of land, or asked to grow completely different crops, your ability to think is what gets tested.
The same goes for data analysis. You will encounter all kinds of data and all kinds of requirements, so you need to adapt your thinking flexibly. Data analysis must ultimately produce a result, and that result must serve your purpose.
If these concepts still feel vague, I strongly recommend following a systematic course to understand them. We live in the era of big data, with data being generated all the time, so mastering this skill will undoubtedly help your future development.

To continue: in general, data analysis can be divided into two parts, data and analysis. The data part has three key steps: data collection, data cleaning & organization, and data storage. The analysis part also has three key steps: exploratory data analysis (also called statistical data analysis), in-depth analysis, and results interpretation & presentation. Each step has several sub-parts. Let's look at each one, starting with the data part:

Data collection

Source

This is about where to find the data. It might come from books in a library, from the Internet, or even from a questionnaire you run yourself. For example, you can write a crawler in Python to collect data from the Internet automatically.
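Below is a minimal sketch of the parsing half of a crawler. It uses only Python's standard library (`urllib.request` and `html.parser`). The class and function names, the sample HTML, and the assumption that the data sits in a plain `<td>`-based table are all hypothetical. Real projects often use third-party libraries such as requests and Beautiful Soup instead.

```python
from html.parser import HTMLParser
from urllib.request import urlopen


class TableCellParser(HTMLParser):
    """Collect the text of every <td> cell, grouped by table row."""

    def __init__(self):
        super().__init__()
        self.rows = []        # finished rows of cell text
        self._row = None      # row currently being read
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            if self._row:     # skip header rows that contain only <th> cells
                self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def fetch_html(url: str) -> str:
    # Network access: only use this on sites whose terms allow automated collection.
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def parse_table(html: str) -> list[list[str]]:
    parser = TableCellParser()
    parser.feed(html)
    return parser.rows


sample = ("<table><tr><th>city</th><th>temp</th></tr>"
          "<tr><td>Oslo</td><td>4</td></tr><tr><td>Rome</td><td>18</td></tr></table>")
print(parse_table(sample))  # [['Oslo', '4'], ['Rome', '18']]
```

`fetch_html` is defined but not called here, so the example runs offline. In practice you would pass its result to `parse_table`.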

Tools & Techniques

Today’s technology makes it easier for us to collect data. For example, there are some automated tools that can quickly grab information from the Internet.

Ethics and Compliance

When collecting data, you must make sure you do not infringe on anyone's privacy, that you comply with the law, and that you do not scrape or use other people's personal information without permission.
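One concrete compliance habit is to respect a site's robots.txt file. The sketch below uses the standard library's `urllib.robotparser`. The rules, crawler name, and URLs are made-up examples. Note that robots.txt covers only one part of compliance. It does not replace checking a site's terms of service and the privacy law that applies to you.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; for a real site you would call
# rp.set_url("https://<site>/robots.txt") and rp.read() instead of parse().
robots_txt = """
User-agent: *
Disallow: /private/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

print(rp.can_fetch("my-crawler", "https://example.com/public/page.html"))   # True
print(rp.can_fetch("my-crawler", "https://example.com/private/data.html"))  # False
```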

Data cleaning and organization

Data quality check

Imagine that the data you receive contains a lot of repeated records, or that some values are wrong. Either problem will distort your analysis results, so check and clean the data first.
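Here is a minimal pandas sketch of a first quality check on a small hypothetical orders table. It counts fully repeated rows and missing values per column. It also flags values that break a business rule, here the assumed rule that an order amount should never be negative.

```python
import pandas as pd

df = pd.DataFrame({
    "order_id": [1, 2, 2, 3, 4],
    "amount": [19.9, 5.0, 5.0, None, -3.0],
})

n_duplicates = int(df.duplicated().sum())   # rows identical to an earlier row
missing_per_column = df.isna().sum()        # count of nulls in each column
n_negative = int((df["amount"] < 0).sum())  # violations of the assumed "no negative amounts" rule

print(n_duplicates, n_negative)
print(missing_per_column.to_dict())
```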

Data conversion

Sometimes, the data collected is not immediately usable. It may be necessary to change its form or units to make it suitable for analysis.
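The sketch below shows typical conversions with pandas on hypothetical raw data. Text dates become datetime values, and weights stored as strings like "1,200 g" become numbers in kilograms. The column names and the unit format are assumptions made for illustration.

```python
import pandas as pd

raw = pd.DataFrame({
    "date": ["2023-09-01", "2023-09-02", "2023-09-03"],
    "weight": ["1,200 g", "850 g", "2,000 g"],
})

clean = pd.DataFrame({
    # Text dates become real datetime values that support sorting and resampling.
    "date": pd.to_datetime(raw["date"], format="%Y-%m-%d"),
    # Strip the unit and thousands separator, parse as numbers, then convert grams to kilograms.
    "weight_kg": pd.to_numeric(
        raw["weight"].str.replace(" g", "", regex=False).str.replace(",", "", regex=False)
    ) / 1000,
})
print(clean)
```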

Feature engineering

This step is to identify the parts of the collected data that are most useful for your analysis. It's like picking out the most important bits from a bunch of messy information.
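In practice, feature engineering often also means deriving new columns from raw records. The pandas sketch below is a hypothetical example that turns individual orders into per-customer features: order count, total spend, and average order value. Which features are "most useful" depends entirely on the question you are asking.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "amount": [10.0, 30.0, 5.0, 20.0, 20.0, 50.0],
})

# One row per customer, with named aggregate columns as candidate features.
features = orders.groupby("customer")["amount"].agg(
    n_orders="count",
    total_spent="sum",
    avg_order="mean",
)
print(features)
```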

Data storage

Database management

Collected data should be stored in a safe place for future use. Common options are database systems, such as relational (SQL) databases or NoSQL databases.
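Here is a minimal sketch using Python's built-in `sqlite3` module, which provides a lightweight SQL database. The table and its columns are hypothetical. The same SQL ideas carry over to larger server databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory database; pass a file path to keep the data
conn.execute("CREATE TABLE sales (day TEXT, region TEXT, amount REAL)")

rows = [
    ("2023-09-01", "north", 120.0),
    ("2023-09-01", "south", 80.0),
    ("2023-09-02", "north", 95.5),
]
# "?" placeholders let sqlite3 insert values safely (no string-built SQL).
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
conn.commit()

totals = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # [('north', 215.5), ('south', 80.0)]
```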

Data Security

Keeping your data safe is just as important as protecting your wallet. Make sure others cannot access or tamper with your data.
Within the data part, cleaning and organizing takes the most work. Collection and storage are largely automated: data is gathered by crawlers, pulled from a database, or fetched by calling an API. Crawled data does not have to meet high quality standards at collection time because it gets cleaned afterwards. Data taken from a database or an API is usually already well structured. As a result, both steps are relatively simple.
Cleaning and organizing is the most troublesome step because it involves many tedious but necessary tasks, such as removing duplicate, erroneous, and missing data.
Missing data deserves its own discussion. Some rows with missing values can simply be deleted. In other cases the data cannot be deleted, and the gaps have to be filled in with statistical or other methods.
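The pandas sketch below handles both cases on a hypothetical customer table, using assumed rules. Exact duplicates are removed first. A row missing its ID is then deleted, a missing age is filled with the median age, and a missing city gets an explicit "unknown" label. Median imputation is just one simple statistical choice, and the right method depends on why the data is missing.

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [101, 102, 102, None, 104],
    "age": [34, None, None, 29, 41],
    "city": ["Oslo", "Rome", "Rome", "Lima", None],
})

df = (
    df.drop_duplicates()               # exact repeated rows (NaNs compare as equal here)
      .dropna(subset=["customer_id"])  # a row without its key is unusable: delete it
)
df = df.assign(
    age=df["age"].fillna(df["age"].median()),  # numeric gap: impute with the median
    city=df["city"].fillna("unknown"),         # categorical gap: explicit placeholder
)
print(df)
```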
Some analyses circulating online of how time is spent in data analysis suggest that cleaning and organizing data take the largest share, as much as 60%. In general, most of the time goes into cleaning. Although this work is tedious, it plays a key role in the entire data analysis process.
As the most time-consuming stage, it has a decisive impact on the quality and efficiency of the whole analysis.
With properly cleansed and organized data, analysts can accurately uncover the insights behind the data to power decision-making. So even though this stage takes a long time, the time and effort invested is well worth it.
Once you have the basic theory, the most important thing is to gain an overall grasp of the field through a systematic course with hands-on practice, such as the data analysis course offered by Zhihu Zhixuetang. I highly recommend looking into it.

Analysis part

The analysis step takes its raw material from the data part. Only when the data is processed well can the analysis be fast, accurate, and valuable. This part generally moves from shallow to deep. It starts with simple exploratory data analysis, also called descriptive data analysis, which mainly computes familiar statistics such as the mean, median, maximum, and minimum, together with some simple visualizations. For time series, it also includes some trend analysis, such as looking at moving averages. Deeper analysis draws on machine learning and statistics, for example building machine learning models for analysis and prediction, or running statistical hypothesis tests. The final step is to output the results. In business analysis this is generally called decision-making: it produces business decisions, or provides analytical support for them. Let's look at each step in detail.
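The moving average mentioned above can be computed with pandas' `rolling`. The daily sales figures and the 3-day window in this sketch are hypothetical.

```python
import pandas as pd

daily_sales = pd.Series(
    [10, 12, 11, 15, 14, 18, 17],
    index=pd.date_range("2023-09-01", periods=7, freq="D"),
)

# 3-day moving average: each value is the mean of that day and the two days before it.
# The first two entries are NaN because a full 3-day window is not yet available.
trend = daily_sales.rolling(window=3).mean()
print(trend)
```

A longer window smooths out more of the day-to-day noise, but it also reacts more slowly to real changes in the trend.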

Exploratory Data Analysis (EDA)

In a business environment, we often have to study and understand data. Exploratory data analysis is a method that allows us to better understand the characteristics of data.

Statistical description

This part is like giving the data a physical exam. Statistics such as the mean, the median (the middle value when all the data are sorted), and the standard deviation (how much the data fluctuate around the mean) quickly tell us about the "health" of the data.
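The sketch below applies Python's standard `statistics` module to a hypothetical list of delivery times. `stdev` is the sample standard deviation and `pstdev` is the population version. For whole tables, pandas' `DataFrame.describe()` produces a similar summary.

```python
import statistics

delivery_days = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical delivery times in days

summary = {
    "mean": statistics.mean(delivery_days),
    "median": statistics.median(delivery_days),  # average of the two middle values here
    "stdev": statistics.stdev(delivery_days),    # sample standard deviation (divides by n - 1)
    "pstdev": statistics.pstdev(delivery_days),  # population standard deviation (divides by n)
    "min": min(delivery_days),
    "max": max(delivery_days),
}
print(summary)
```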

Data visualization

Sometimes, a picture is worth a thousand words. Through charts, we can intuitively see the distribution and changing trends of data or the relationship between different data.
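This minimal line-chart sketch uses matplotlib. It follows the earlier point that curve charts suit time-based data. The visitor counts and the output filename are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; omit this line in a notebook or desktop session
import matplotlib.pyplot as plt

days = list(range(1, 8))
visitors = [120, 135, 128, 160, 172, 150, 190]

fig, ax = plt.subplots(figsize=(6, 3))
ax.plot(days, visitors, marker="o")  # a line chart shows how values change over time
ax.set_xlabel("Day")
ax.set_ylabel("Visitors")
ax.set_title("Daily visitors")
fig.tight_layout()
fig.savefig("daily_visitors.png")
```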

Preliminary observations

Here we play detective, looking for possible trends and outliers in the data to see whether anything is unusual or particularly interesting.
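A common, simple way to flag outliers is Tukey's IQR rule. Under this rule, values more than 1.5 interquartile ranges below the first quartile, or above the third quartile, are treated as suspicious. This is one illustrative method, not the only one. The daily order counts and the function name `iqr_outliers` are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


daily_orders = [52, 48, 50, 55, 47, 51, 49, 53, 210, 50]
print(iqr_outliers(daily_orders))  # [210]
```

A flagged value is not automatically an error. It might be a data-entry mistake, or it might be a genuinely interesting event, such as a promotion day, that is worth investigating.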

In-depth analysis

Analysis is like digging for treasure; sometimes you may need to dig deeper to find truly valuable information.

Modeling and Forecasting

By building some models, we can predict possible future trends. Just like weather forecasts, although they may not be completely accurate, the general direction is usually right.

Hypothesis testing

This part can be seen as checking whether a particular claim holds up. For example, we might want to test whether a new sales strategy actually increases sales.
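One simple way to test the sales-strategy question is a permutation test, sketched below with the standard library only. This technique was chosen here for its transparency. A t-test, for example `scipy.stats.ttest_ind`, is another common choice. The daily sales figures and the function name `permutation_test` are hypothetical. The idea is this: if the strategy made no difference, shuffling the group labels should often produce a gap as large as the observed one. A small p-value means that rarely happens.

```python
import random


def permutation_test(a, b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means between groups a and b."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(sum(b) / len(b) - sum(a) / len(a))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the "old"/"new" labels
        perm_a, perm_b = pooled[:len(a)], pooled[len(a):]
        if abs(sum(perm_b) / len(perm_b) - sum(perm_a) / len(perm_a)) >= observed:
            count += 1
    # Adding 1 to numerator and denominator counts the observed split itself.
    return (count + 1) / (n_permutations + 1)


old_strategy = [100, 102, 98, 101, 99, 100, 103, 97]    # hypothetical daily sales
new_strategy = [108, 111, 105, 110, 107, 112, 109, 106]
p = permutation_test(old_strategy, new_strategy)
print(f"p = {p:.4f}")
```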

Pattern recognition

This is about finding hidden patterns and associations in data, such as purchase associations between different items that enable better product recommendations.

Results interpretation and presentation

This part is about turning our findings and analysis into information that can actually be used.

Data story

People love stories, and framing the results of your analysis into a compelling story can make complex data easier to understand.

Report writing

Reporting is a common communication method in business. Through professional reports, we can share our analysis results with colleagues, superiors and even customers.

Business strategy

The ultimate goal is to use these analytical results in business decisions and help companies develop better strategies and plans.

In short, data analysis work consists of two parts, data and analysis, which complement each other. In a data analysis project, both are essentially indispensable.

Origin blog.csdn.net/Everly_/article/details/133267034