Conquering Statistics 01 | What Is a Statistical Distribution?

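I have forgotten most of the probability and statistics I learned as a student, even though I use them regularly at work. I had long wanted to study the subject systematically again but could not find good material. Recently I came across an entertaining statistics course, StatQuest!, so I have decided to stand on the shoulders of giants and work through the fundamentals of statistics from start to finish. I hope I make it to the end.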

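The StatQuest! series was created by Josh Starmer of the Department of Genetics at the University of North Carolina at Chapel Hill and is published on YouTube. (YouTube is not accessible from mainland China, but kind viewers have re-uploaded the videos to Bilibili.) The course covers the following parts: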

  • Statistics Fundamentals: These videos give you a general overview of statistics as well as a reference for statistical concepts. Topics include:
    • Histograms
    • What is a statistical distribution?
    • And many more!!!
  • Linear Regression and Linear Models: These videos teach the basics relating to one of statistics' most powerful tools. Linear Regression and Linear Models allow us to use continuous values, like weight or height, and categorical values, like favorite color or favorite movie, to predict a continuous value, like age.
  • Logistic Regression: These videos pick up where Linear Regression and Linear Models leave off. Now, instead of predicting something continuous, like age, we can predict something discrete, like whether or not someone will enjoy the 1990 theatrical bust Troll 2.
  • Machine Learning: Linear Models and Logistic Regression are just the tips of the machine learning iceberg. There’s tons more to learn, and this playlist will help you through it all, one step at a time.
  • NGS (High Throughput Sequence Analysis): If you do high-throughput sequence analysis, this playlist is for you!
  • Statistics in R: If you want to do any of this stuff in R, this playlist is for you, and you only. No one else is allowed to watch it.

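The picture below shows Josh Starmer himself. See the fellow behind him? That is his teaching prop. He often makes up his own background music during the lessons, which makes them great fun.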


This article explains what a statistical distribution is.

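Measure the heights of a group of people, then stack the measurements into intervals (bins) as in the figure below. The result is a histogram.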

We find that most heights fall between 5 and 6 feet, while heights below 5 feet or above 6 feet are comparatively rare.

In other words, if we measure the height of a randomly chosen person, it will very likely fall between 5 and 6 feet.
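The binning step can be sketched in a few lines of Python. This is only an illustration: the heights are simulated, and the mean of 5.5 ft, the standard deviation of 0.3 ft, and the helper name `histogram` are all assumptions that do not come from the course.

```python
import math
import random
from collections import Counter

random.seed(0)  # fixed seed so the sketch is reproducible

# Simulated heights in feet. The mean (5.5) and standard deviation (0.3)
# are made-up values for illustration, not real measurements.
heights = [random.gauss(5.5, 0.3) for _ in range(200)]

def histogram(values, bin_width):
    """Return a dict mapping each bin's left edge to the count of values in it."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

hist = histogram(heights, bin_width=1.0)
for left_edge, count in hist.items():
    print(f"{left_edge:.0f}-{left_edge + 1:.0f} ft: {count}")

# Share of people in the 5-6 ft bin: a rough estimate of the probability
# that a randomly measured person falls in that range.
share_5_to_6 = hist.get(5.0, 0) / len(heights)
print(f"estimated P(5 ft <= height < 6 ft): {share_5_to_6:.2f}")
```

The fraction of measurements in a bin serves as the estimated probability of landing in that bin, which is exactly how the histogram is read above.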

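When the bin width is cut in half, the picture becomes more precise: most heights fall between 5.25 and 5.75 feet, while heights below 4.5 feet or above 6.5 feet are comparatively rare.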

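By shrinking the bins further and measuring more people, we get an even more precise picture of how height is distributed in the population.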
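To see the effect of narrower bins combined with more measurements, here is a small sketch that compares a coarse histogram of a small sample with a fine histogram of a large one. The population parameters (mean 5.5 ft, standard deviation 0.3 ft), the sample sizes, and the helper `bin_counts` are assumptions chosen purely for illustration.

```python
import math
import random
from collections import Counter

random.seed(1)  # fixed seed so the sketch is reproducible

def bin_counts(values, bin_width):
    """Map each bin's left edge to the number of values inside it."""
    counts = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical population: mean 5.5 ft, standard deviation 0.3 ft.
small_sample = [random.gauss(5.5, 0.3) for _ in range(100)]
large_sample = [random.gauss(5.5, 0.3) for _ in range(10_000)]

coarse = bin_counts(small_sample, bin_width=1.0)
fine = bin_counts(large_sample, bin_width=0.25)

print("1 ft bins, 100 people:", len(coarse), "non-empty bins")
print("0.25 ft bins, 10,000 people:", len(fine), "non-empty bins")
for left_edge, count in fine.items():
    share = count / len(large_sample)
    print(f"{left_edge:.2f}-{left_edge + 0.25:.2f} ft: {share:.3f}")
```

Narrow bins only help when there is enough data to fill them, which is why the bin width and the sample size are changed together here.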

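A smooth curve can be used to approximate the shape of the histogram: heights are concentrated between 5 and 6 feet, and heights below 5 feet or above 6 feet are comparatively less probable.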

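The curve has an advantage over the histogram: it is not limited by the bin width. The probability that a height falls in any interval can be computed directly by integrating the curve over that interval (the area under the curve), rather than being approximated from the nearest bins.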
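As a sketch of what "the area under the curve" means in practice, the snippet below assumes the smooth curve is a normal distribution with mean 5.5 ft and standard deviation 0.3 ft (both values, and the choice of a normal curve, are assumptions). It computes the probability of an arbitrary interval in two ways: by numerically integrating the density, and by taking a difference of the cumulative distribution function from Python's standard `statistics.NormalDist`.

```python
from statistics import NormalDist

# Assumed curve: a normal distribution with mean 5.5 ft and
# standard deviation 0.3 ft (illustrative values only).
curve = NormalDist(mu=5.5, sigma=0.3)

def prob_between(dist, low, high, steps=10_000):
    """Approximate the area under dist.pdf between low and high (midpoint rule)."""
    width = (high - low) / steps
    return sum(dist.pdf(low + (i + 0.5) * width) for i in range(steps)) * width

# An interval that does not line up with any histogram bin.
area = prob_between(curve, 5.2, 5.9)
exact = curve.cdf(5.9) - curve.cdf(5.2)

print(f"numerical integral: {area:.4f}")
print(f"CDF difference:     {exact:.4f}")
```

Both numbers agree closely, and neither depends on where any bin boundaries happen to be.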

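In addition, when time and money are limited, we can use the mean and standard deviation of a limited amount of data to estimate an approximate curve for the heights of the population.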
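A minimal sketch of this idea, assuming heights are roughly normally distributed and that only 20 people could be measured. The sample is simulated and every number in it is hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev

random.seed(2)  # fixed seed so the sketch is reproducible

# Hypothetical small study: only 20 people measured (simulated here).
sample = [random.gauss(5.5, 0.3) for _ in range(20)]

# Estimate the curve's two parameters from the sample.
estimated_mean = mean(sample)
estimated_sd = stdev(sample)  # sample standard deviation (n - 1 denominator)

# Assumption: a normal curve is a reasonable model for heights.
estimated_curve = NormalDist(mu=estimated_mean, sigma=estimated_sd)

p = estimated_curve.cdf(6.0) - estimated_curve.cdf(5.0)
print(f"estimated mean: {estimated_mean:.2f} ft, sd: {estimated_sd:.2f} ft")
print(f"estimated P(5 ft <= height <= 6 ft): {p:.2f}")
```

With so few measurements the estimates will differ somewhat from the true population values, so the resulting curve is an approximation that improves as more data is collected.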

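Both the curve and the histogram are statistical distributions. They show how probable the measured values are: where the distribution is high, data is more likely to fall in that interval, and where it is low, data is less likely to fall there.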


Reposted from blog.csdn.net/qq_21478261/article/details/111410048