Descriptive statistics

1. Numerical data analysis

Four aspects of numerical data

The analysis of numerical data has four main aspects:

  1. Center: measures of central tendency
  2. Spread: measures of dispersion
  3. Shape: the shape of the data's distribution
  4. Outliers: values that lie far from the rest of the data

Analysis of categorical data

Categorical data has fewer aspects to consider. It is generally analyzed by looking at the number or proportion of individual entities that fall into each group. For example, when looking at dog breeds, we care about the number of dogs of each breed, or the proportion of dogs that belong to each breed.
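As a minimal sketch of the count and proportion view described above, the snippet below tallies a small list of dog breeds. The breed names and the variable names (breeds, counts, proportions) are made up for illustration and do not come from the article.

```python
from collections import Counter

# Hypothetical sample of dog breeds (illustrative data only)
breeds = ["Labrador", "Poodle", "Labrador", "Beagle", "Poodle", "Labrador"]

# Number of dogs in each breed
counts = Counter(breeds)

# Proportion of dogs in each breed
total = sum(counts.values())
proportions = {breed: n / total for breed, n in counts.items()}

for breed, n in counts.most_common():
    print(f"{breed}: count={n}, proportion={proportions[breed]:.2f}")
```

Counts answer "how many dogs per breed", while proportions make groups comparable across samples of different sizes.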

1.1 Measures of central tendency

There are three measures of central tendency:

  1. Mean
  2. Median
  3. Mode

1.1.1 Mean

The mean is often called the average or, in mathematics, the expected value. We calculate it by adding up all the values in the data set and then dividing by the number of values.
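The calculation above can be sketched in a few lines of Python. The function name mean and the sample data are illustrative assumptions; Python's standard library also offers statistics.mean for the same purpose.

```python
def mean(values):
    """Arithmetic mean: the sum of the values divided by how many there are."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Illustrative data: the values add up to 25 and there are 5 of them
data = [2, 4, 4, 5, 10]
print(mean(data))  # 25 / 5 = 5.0
```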

1.1.2 Median

The median splits the data into two parts: half of the values lie below it and half lie above it. How we calculate the median depends on whether we have an even or an odd number of observations.

Median for an odd number of observations

If we have an odd number of observations, the median is simply the middle value. For example, if we have seven observations sorted in ascending order, the median is the fourth value. If we have nine observations, the median is the fifth value.

Median for an even number of observations

If we have an even number of observations, the median is the average of the two middle values. For example, if we have eight observations sorted in ascending order, the median is the average of the fourth and fifth values.

To calculate the median, we must first sort the values.
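The odd and even cases can be combined into one short function. This is a minimal sketch: the function name median and the two sample lists are illustrative assumptions, and Python's built-in statistics.median follows the same rule.

```python
def median(values):
    """Median: sort first, then take the middle value (odd count)
    or the average of the two middle values (even count)."""
    ordered = sorted(values)  # sorting is required before picking the middle
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]  # e.g. the 4th of 7 values
    return (ordered[mid - 1] + ordered[mid]) / 2  # e.g. average of 4th and 5th of 8


# Seven observations: sorted they are 1, 3, 5, 7, 9, 11, 13, so the 4th value is 7
odd_sample = [7, 1, 5, 3, 9, 11, 13]
print(median(odd_sample))  # 7

# Eight observations: sorted they are 2, 4, ..., 16, so the median is (8 + 10) / 2
even_sample = [8, 2, 6, 4, 10, 12, 14, 16]
print(median(even_sample))  # 9.0
```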

1.1.3 Mode

The mode is the value that appears most often in a data set.

A data set may have multiple modes, or it may have no mode at all.

No mode

If every value in the data set occurs with the same frequency, there is no mode. Suppose we have the data set:

1, 1, 2, 2, 3, 3, 4, 4

There is no mode, because every value occurs the same number of times.

Multiple modes

If two (or more) values are tied for the highest frequency, there are multiple modes. Suppose we have the data set:

1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9

This data set has two modes, 3 and 6: both values appear three times, the highest frequency, while every other value appears only once.
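The mode rules from this section can be sketched as follows. The function name modes is an illustrative assumption, and returning an empty list when every value is equally frequent reflects this article's convention. The standard library's statistics.multimode uses a different convention and would return all four values for the first data set.

```python
from collections import Counter


def modes(values):
    """Return all modes in ascending order, or [] when there is no mode."""
    counts = Counter(values)
    if not counts:
        return []
    highest = max(counts.values())
    # Article's convention: if several distinct values all occur equally
    # often, the data set has no mode.
    if len(counts) > 1 and all(c == highest for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([1, 1, 2, 2, 3, 3, 4, 4]))                    # []
print(modes([1, 2, 3, 3, 3, 4, 5, 6, 6, 6, 7, 8, 9]))     # [3, 6]
```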

 

Notation

Aggregation

Random variables and expressions

1.2 Measures of dispersion

Origin www.cnblogs.com/jason-zhou/p/12469654.html