Some common tools and techniques for data analysis:
- Excel: one of the most commonly used data analysis tools. It supports basic statistical analysis, charting, and data visualization.
- Python: a popular programming language that is widely used in data science. It has powerful data analysis libraries such as Pandas, NumPy, and SciPy.
- R: a programming language designed for statistical computing, with powerful data analysis and visualization capabilities.
- Tableau: a data visualization and business intelligence tool that presents data through intuitive charts and interactive dashboards.
- SQL: a language for querying and managing data in relational databases. It supports data queries, analysis, and aggregation.
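To make the Python entry above more concrete, here is a minimal sketch of how NumPy and Pandas are typically combined: NumPy for array arithmetic, Pandas for labelled, grouped aggregation. The `sales` table, its column names, and its values are invented for illustration, and SciPy is left out to keep the dependencies small.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["north", "south", "north", "south", "north"],
    "units": [10, 4, 6, 8, 2],
    "unit_price": [2.5, 3.0, 2.5, 3.0, 2.5],
})

# NumPy: element-wise arithmetic on the underlying arrays.
sales["revenue"] = np.multiply(
    sales["units"].to_numpy(), sales["unit_price"].to_numpy()
)

# Pandas: group rows by a label and aggregate each group.
revenue_by_region = sales.groupby("region")["revenue"].sum()

print(revenue_by_region)
```

The same grouping could be written in SQL with GROUP BY; which tool fits best usually depends on where the data already lives.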
Data analysis learning platforms and links:
- Kaggle: an online data science community that provides a wide range of data sets, competitions, and tutorial resources.
https://www.kaggle.com/
- Coursera: an online learning platform that offers many data analysis and data science courses.
https://www.coursera.org/learn/data-analysis
- DataCamp: an online data science learning platform that provides tutorials on data analysis tools such as Python, R, and SQL.
https://www.datacamp.com/
- Udemy: an online education platform that offers a large number of data analysis and data science courses.
https://www.udemy.com/topic/data-analysis/
- Data.gov: a public data repository provided by the U.S. government that contains various types of data sets that can be used for analysis and research.
https://www.data.gov/
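Data analysis code examples:
- Python (Pandas library):
import pandas as pd
# Read the CSV file into a DataFrame
df = pd.read_csv("data.csv")
# Show the first few rows
print(df.head())
# Show column types, non-null counts, and memory usage
df.info()
# Compute summary statistics for each numeric column
print(df.describe())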
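The Python snippet above needs a data.csv file on disk. As a rough sketch, the same steps can be tried without any file by reading CSV text from memory. The column names and values below are hypothetical, and the missing-value count at the end is an extra step that is not part of the original snippet.

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for data.csv (note the blank score).
csv_text = """name,age,score
Ana,34,88.5
Ben,29,
Cy,41,72.0
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head(2))        # first two rows
df.info()                # column types and non-null counts (prints directly)
summary = df.describe()  # count, mean, std, min, quartiles, max per numeric column
print(summary)

# Extra step (assumption, not in the original): count missing values per column.
missing = df.isna().sum()
print(missing)
```

Here describe() skips the text column `name` and ignores the missing score when computing its statistics.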
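- R:
# Read the CSV file into a data frame
df <- read.csv("data.csv")
# Show the first few rows
head(df)
# Show the structure of the data (column types and sample values)
str(df)
# Compute summary statistics for each column
summary(df)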
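- SQL:
-- Select the database to work with (USE is MySQL / SQL Server syntax)
USE dbname;
-- Query data
SELECT column1, column2, column3
FROM tablename
WHERE condition;
-- Compute summary statistics for a column
SELECT COUNT(column1), AVG(column1), MAX(column1), MIN(column1)
FROM tablename;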
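The SQL statements above use placeholder names, so they cannot be run as written. The following is a minimal sketch that runs the same two query shapes against an in-memory SQLite database through Python's standard sqlite3 module. The `orders` table, its columns, and its rows are hypothetical, and SQLite has no USE statement, so the database is chosen when connecting.

```python
import sqlite3

# An in-memory database; nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table and rows, invented for illustration.
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("ana", 20.0), ("ben", 35.0), ("ana", 5.0), ("cy", 60.0)],
)

# Filter rows with WHERE; the ? placeholder keeps the value out of the SQL text.
rows = conn.execute(
    "SELECT customer, amount FROM orders WHERE amount > ? ORDER BY amount",
    (10.0,),
).fetchall()

# Aggregate a single column, mirroring the COUNT/AVG/MAX/MIN query above.
stats = conn.execute(
    "SELECT COUNT(amount), AVG(amount), MAX(amount), MIN(amount) FROM orders"
).fetchone()

conn.close()

print(rows)
print(stats)
```

Passing values as parameters rather than formatting them into the query string is the usual way to avoid SQL injection.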