Computing the median
We have the following set of age data:
2 employees are aged 17
1 employee is aged 18
3 employees are aged 21
1 employee is aged 24
1 employee is aged 26
4 employees are aged 28
2 employees are aged 31
1 employee is aged 33
2 employees are aged 34
3 employees are aged 37
1 employee is aged 39
2 employees are aged 40
3 employees are aged 43
- Create a frequency array with one entry per age from 17 to 43, setting the count to 0 for ages that no employee has
>> f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
- Create an array holding the ages
>> bins = [17:43];
- Build the raw data array by repeating each age as many times as its frequency, starting from an empty array
>> raw = [];
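The repetition step can also be sketched without a loop. This is a minimal alternative, assuming a MATLAB or Octave version that provides `repelem` (MATLAB R2015a or later, Octave 5.1 or later); the names `raw_alt` and `n` are illustrative and not part of the original program.

```matlab
% Frequency table from the example: 27 counts for ages 17 to 43
f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
bins = 17:43;

% repelem repeats bins(i) exactly f_abs(i) times;
% ages with a count of 0 contribute no elements
raw_alt = repelem(bins, f_abs);

% The expanded array holds one entry per employee
n = numel(raw_alt);
```

The loop-based version does the same thing and works on older releases, so either form can be used.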
Complete program:
f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
bins = [17:43];
raw = [];
for i = 1:length(f_abs)
    if f_abs(i) > 0
        new = bins(i) * ones(1,f_abs(i)); % repeat age bins(i) f_abs(i) times
    else
        new = [];
    end
    raw = [raw,new];
end
ave = mean(raw) % compute the mean
md = median(raw) % compute the median
sigma = std(raw) % compute the standard deviation
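The same three statistics can be obtained directly from the frequency table, which serves as a cross-check on the expanded array. The sketch below assumes the 27-entry frequency table for ages 17 to 43; the names `ave_w`, `sigma_w`, `md_w`, `lo`, and `hi` are illustrative. The standard deviation uses the N-1 normalization, which is the default of `std`.

```matlab
f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
bins = 17:43;
n = sum(f_abs);                          % total number of employees

% Frequency-weighted mean
ave_w = sum(bins .* f_abs) / n;

% Sample standard deviation (N-1 normalization, as in std)
sigma_w = sqrt(sum(f_abs .* (bins - ave_w).^2) / (n - 1));

% Median: average of the two middle positions in the sorted data
% (both positions coincide when n is odd)
c = cumsum(f_abs);                       % cumulative counts per age
lo = bins(find(c >= floor((n + 1) / 2), 1));
hi = bins(find(c >= ceil((n + 1) / 2), 1));
md_w = (lo + hi) / 2;
```

With 26 employees the two middle positions are the 13th and 14th sorted ages, and both fall in the age-31 bin.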
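Plotting the frequency histogram
- Compute the total area, which is the total number of employees
area = sum(f_abs); % compute the area
- Divide each frequency by the area to get the relative frequency
scaled = f_abs/area; % compute the relative frequency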
Complete program:
f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
bins = [17:43];
raw = [];
for i = 1:length(f_abs)
    if f_abs(i) > 0
        new = bins(i) * ones(1,f_abs(i));
    else
        new = [];
    end
    raw = [raw,new];
end
area = sum(f_abs); % compute the area
scaled = f_abs/area; % compute the relative frequency
bar(bins,scaled),xlabel('Age'),ylabel('Relative frequency');
Figure: bar chart of relative frequency against age, for ages 17 to 43.
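A quick check of the scaling step: after dividing by the area, the relative frequencies should sum to 1, and because the ages are spaced one year apart the total area of the bars is also 1. The sketch below assumes the 27-entry frequency table; `width`, `total`, and `bar_area` are illustrative names.

```matlab
f_abs = [2 1 0 0 3 0 0 1 0 1 0 4 0 0 2 0 1 2 0 0 3 0 1 2 0 0 3];
bins = 17:43;

scaled = f_abs / sum(f_abs);     % relative frequency per age

width = bins(2) - bins(1);       % spacing between adjacent ages (1 year)
total = sum(scaled);             % should be 1 up to rounding error
bar_area = sum(scaled * width);  % area covered by bars of that width
```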
The ages in this example are widely scattered. Let's look at another example:
f_abs = [2 1 0 0 5 4 6 7 8 6 4 3 2 2 1 0 0 1];
bins = [17:34];
raw = [];
for i = 1:length(bins)
    if f_abs(i) > 0
        new = ones(1,f_abs(i))*bins(i); % repeat age bins(i) f_abs(i) times
    else
        new = [];
    end
    raw = [raw,new];
end
avr = mean(raw) % mean
media = median(raw) % median
sigma1 = std(raw) % standard deviation
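For reference, the mean and standard deviation of frequency data can be written out explicitly. The notation is an assumption made here for illustration: $x_i$ are the ages, $f_i$ their counts, and $N$ the total count. The $N-1$ denominator matches the default normalization of `std`.

```latex
N = \sum_i f_i, \qquad
\mu = \frac{1}{N}\sum_i f_i\,x_i, \qquad
\sigma = \sqrt{\frac{1}{N-1}\sum_i f_i\,(x_i-\mu)^2}

% Second example: N = 52 and \sum_i f_i x_i = 1282
\mu = \frac{1282}{52} \approx 24.6538, \qquad
\sigma \approx \sqrt{\frac{565.77}{51}} \approx 3.3307
```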
This time the ages range from 17 to 34 and the standard deviation is much smaller. We plot the relative frequency histogram:
f_abs = [2 1 0 0 5 4 6 7 8 6 4 3 2 2 1 0 0 1];
bins = [17:34];
raw = [];
for i = 1:length(bins)
    if f_abs(i) > 0
        new = ones(1,f_abs(i))*bins(i);
    else
        new = [];
    end
    raw = [raw,new];
end
avr = mean(raw);
media = median(raw);
sigma1 = std(raw);
scaled = f_abs/sum(f_abs); % relative frequency
bar(bins,scaled),xlabel('Age'),ylabel('Relative frequency')
This histogram roughly resembles a Gaussian distribution curve. When the data follow a Gaussian distribution, the standard deviation describes their spread and can be used to estimate the probability that a value falls within a given range.
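To judge the resemblance visually, a Gaussian density with the same mean and standard deviation can be drawn over the histogram. This is only a sketch: the density formula is the standard one, the plotting range 10 to 40 is chosen arbitrarily, and `x` and `g` are illustrative names. Because the ages are one year apart, the relative frequencies are directly comparable to the density values.

```matlab
f_abs = [2 1 0 0 5 4 6 7 8 6 4 3 2 2 1 0 0 1];
bins = 17:34;
n = sum(f_abs);
scaled = f_abs / n;                       % relative frequency per age

% Mean and sample standard deviation from the frequency table
avr = sum(bins .* f_abs) / n;
sigma1 = sqrt(sum(f_abs .* (bins - avr).^2) / (n - 1));

% Gaussian density with the same mean and standard deviation
x = linspace(10, 40, 301);
g = exp(-(x - avr).^2 / (2 * sigma1^2)) / (sigma1 * sqrt(2 * pi));

bar(bins, scaled); hold on;
plot(x, g, 'r', 'LineWidth', 1.5); hold off;
xlabel('Age'); ylabel('Relative frequency');
legend('Relative frequency', 'Gaussian density');
```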
The standard deviation is denoted by $\sigma$ and the mean by $\mu$. For Gaussian data, the probability that a value falls within $\mu \pm \sigma$ is about 68%, within $\mu \pm 2\sigma$ about 95%, and within $\mu \pm 3\sigma$ about 99.7%.
The mean and standard deviation in the previous example are:
avr =
24.6538
sigma1 =
3.3307
Approximately 68% of the ages fall within one standard deviation of the mean, that is, between avr - sigma1 and avr + sigma1. More broadly, about 95% of the ages fall within two standard deviations of the mean ...
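These percentages can be compared with the actual data of the second example. The sketch below counts the share of employees whose age lies within one and two standard deviations of the mean; `in1`, `in2`, `p1`, and `p2` are illustrative names.

```matlab
f_abs = [2 1 0 0 5 4 6 7 8 6 4 3 2 2 1 0 0 1];
bins = 17:34;
n = sum(f_abs);

% Mean and sample standard deviation from the frequency table
avr = sum(bins .* f_abs) / n;
sigma1 = sqrt(sum(f_abs .* (bins - avr).^2) / (n - 1));

% Ages within one and two standard deviations of the mean
in1 = abs(bins - avr) <= sigma1;
in2 = abs(bins - avr) <= 2 * sigma1;

% Share of employees in each range
p1 = sum(f_abs(in1)) / n;   % 35 of 52, about 0.67
p2 = sum(f_abs(in2)) / n;   % 49 of 52, about 0.94
```

The one-sigma range is roughly 21.3 to 28.0 and the two-sigma range roughly 18.0 to 31.3, so the observed shares are close to the Gaussian values even though the sample is small.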