Summary of Knowledge Points from Questions I Got Wrong on CDA Level 1 Mock Exams

SQL

truncate function

Format: TRUNCATE(number, decimals)
number: the number to be truncated
decimals: the number of decimal places to keep; 0 means no decimal places are kept. Any digits beyond that position are cut off, not rounded.
For example:
select truncate(2.83,0)
the result is 2
select truncate(2.83,1)
the result is 2.8
select truncate(2.83,2)
the result is 2.83
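To make the difference between truncating and rounding concrete, here is a minimal Python sketch that mimics a MySQL-style TRUNCATE(number, decimals) for non-negative `decimals`, using the standard `decimal` module. The helpers `truncate` and `round_half_up` are hypothetical names of my own, not part of any SQL library. Numbers are passed as strings so that binary floating-point error does not blur the result.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP


def truncate(number: str, decimals: int) -> Decimal:
    """Mimic a MySQL-style TRUNCATE(number, decimals) for decimals >= 0.

    Digits after the requested decimal place are dropped (toward zero), never rounded.
    """
    step = Decimal(1).scaleb(-decimals)  # decimals=0 -> 1, 1 -> 0.1, 2 -> 0.01
    return Decimal(number).quantize(step, rounding=ROUND_DOWN)


def round_half_up(number: str, decimals: int) -> Decimal:
    """Ordinary rounding (halves away from zero), for comparison with truncate()."""
    step = Decimal(1).scaleb(-decimals)
    return Decimal(number).quantize(step, rounding=ROUND_HALF_UP)


for d in (0, 1, 2):
    print(f"truncate(2.83, {d}) = {truncate('2.83', d)}")  # 2, then 2.8, then 2.83

print(truncate("2.86", 1), round_half_up("2.86", 1))  # 2.8 2.9
```

The last line shows the point worth remembering for the exam: truncation keeps 2.8 where rounding would give 2.9.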

SKUs and SPUs

concept

SPU = Standard Product Unit: the smallest unit at which product information is aggregated, i.e. a standardized product as a whole.
SKU = Stock Keeping Unit: the smallest unit of inventory control, i.e. one concrete, sellable variant of a product.

Example: if "Lenovo PC" is the SPU, then "Lenovo G50-16G" is a SKU.
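The one-to-many relationship between an SPU and its SKUs can be sketched as a small data structure. This is a minimal Python illustration, not a real e-commerce schema: the class layout, the second SKU "Lenovo G50-8G", and both stock counts are hypothetical values made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class SKU:
    code: str   # one concrete, sellable variant, e.g. "Lenovo G50-16G"
    stock: int  # inventory is counted at the SKU level


@dataclass
class SPU:
    name: str                                      # product-level info, e.g. "Lenovo PC"
    skus: list[SKU] = field(default_factory=list)  # one SPU -> many SKUs

    def total_stock(self) -> int:
        # An SPU holds no stock of its own; its total is the sum over its SKUs.
        return sum(s.stock for s in self.skus)


# Hypothetical data: the second SKU and both stock counts are made up.
lenovo_pc = SPU("Lenovo PC", [SKU("Lenovo G50-16G", 30), SKU("Lenovo G50-8G", 12)])
print(lenovo_pc.total_stock())  # 42
```

Keeping stock only on the SKU reflects the definitions above: the SPU describes the product, while inventory is tracked per variant.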

How to make a water drop chart using a clustered column chart

Bilibili video: "What on earth is a water drop chart, and can it be made with a clustered column chart?" https://www.bilibili.com/video/BV1na4y1g7Hg/?share_source=copy_web&vd_source=7bb833164ffff331416eb9ad96d824bd
The content is mainly based on this YouTube video:
https://www.youtube.com/watch?v=fhMLFQIl8Eg

Sankey diagram

[Figure: example Sankey diagram]

Contingency coefficient [unfinished]

Before performing a two-sample t test, you need to perform an F test to determine whether the variances of the two populations differ significantly. [Why?] [unfinished]
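A common reason for this step is that the F test result decides which form of the two-sample t test to use: the pooled (Student) t test assumes equal population variances, while Welch's t test does not. Below is a minimal standard-library Python sketch of that decision. The helper names are hypothetical, no p-values are computed, and `f_critical` is an assumption the caller must supply: for a two-sided test at level α with the larger sample variance on top, it is the upper α/2 critical value of F with (n_larger - 1, n_smaller - 1) degrees of freedom, read from an F table.

```python
import math
from statistics import mean, variance


def f_ratio(a, b):
    """F statistic for H0: equal population variances (larger sample variance on top)."""
    va, vb = variance(a), variance(b)  # sample variances, n - 1 in the denominator
    return max(va, vb) / min(va, vb)


def pooled_t(a, b):
    """Student's two-sample t statistic, assuming equal population variances."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)  # pooled variance
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))


def welch_t(a, b):
    """Welch's t statistic, which does not assume equal variances."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


def choose_t(a, b, f_critical):
    """Run the F test first, then return (test name, t statistic)."""
    if f_ratio(a, b) > f_critical:  # variances differ significantly
        return "welch", welch_t(a, b)
    return "pooled", pooled_t(a, b)  # no evidence against equal variances


a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(f_ratio(a, b))                   # 4.0
print(choose_t(a, b, f_critical=9.6))  # f_critical here is an assumed table value
```

With equal sample sizes, as here, the two t statistics coincide; the two tests still differ in their degrees of freedom, and the statistics themselves diverge when the sample sizes are unequal.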


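Boxplot (box-and-whisker plot)

Example:
Consider the data set 4, 4, 6, 7, 10, 11, 12, 14, 15.
The median is 10.
Each quartile is the median of the lower or upper half of the data, with the overall median excluded:
The upper quartile is (12+14)/2=13.
The lower quartile is (4+6)/2=5.
The corresponding boxplot looks like this:
[Figure: boxplot of the data set 4, 4, 6, 7, 10, 11, 12, 14, 15]
Now take another data set: 5, 5, 7, 8, 10, 11, 12, 14, 15. Its median and upper quartile are unchanged (10 and 13), and its lower quartile is (5+7)/2=6.
[Figure: boxplot of the second data set]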

It can be seen from this example:
1. In the first data set, the median (10) is closer to the upper quartile Q3 (13) than to the lower quartile Q1 (5). Looking at the data, the values above the median are more tightly packed than those below it.
2. The second data set is more concentrated at the low end (it starts from 5), and its box is shorter. How short the box is can be measured by the interquartile range, IQR = Q3 - Q1: 13 - 5 = 8 for the first set versus 13 - 6 = 7 for the second. The smaller the IQR, the more concentrated the data. Some textbooks put it this way: the interquartile range describes how concentrated the middle 50% of the data is.
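The quartile rule used in this example (each quartile is the median of the lower or upper half, with the overall median left out when the count is odd) can be checked with a short Python sketch. This is only one of several quartile conventions; tools such as Excel or NumPy use interpolation rules that can give slightly different values for the same data. The helper name `quartiles` is hypothetical.

```python
from statistics import median


def quartiles(data):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    For an odd number of values the overall median is left out of both halves.
    """
    xs = sorted(data)
    half = len(xs) // 2
    lower = xs[:half]
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(lower), median(xs), median(upper)


groups = {
    "first set": [4, 4, 6, 7, 10, 11, 12, 14, 15],
    "second set": [5, 5, 7, 8, 10, 11, 12, 14, 15],
}
for name, data in groups.items():
    q1, q2, q3 = quartiles(data)
    # Compare the median's distance to each quartile, and the box length (IQR).
    print(f"{name}: Q1={q1}, median={q2}, Q3={q3}, IQR={q3 - q1}")
# first set: Q1=5.0, median=10, Q3=13.0, IQR=8.0
# second set: Q1=6.0, median=10, Q3=13.0, IQR=7.0
```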

Why can the interquartile range be used to measure the dispersion of ordinal data?

Chapter 3 Database Application

subquery


Why is 1.5 × IQR used to identify outliers?
I found a good article that explains this in detail:
Why “1.5” in IQR Method of Outlier Detection?
https://towardsdatascience.com/why-1-5-in-iqr-method-of-outlier-detection-5d07fdc82097
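The rule itself is easy to apply: values below Q1 - 1.5 × IQR or above Q3 + 1.5 × IQR are flagged as outliers. Below is a minimal Python sketch that reuses the median-of-halves quartile rule from the boxplot example; the helper names and the extra value 40 appended to the first boxplot data set are hypothetical. The second part checks a commonly cited rationale for the constant: if the data are normally distributed, the fences sit at roughly ±2.7 standard deviations, so only about 0.7% of values get flagged.

```python
from statistics import NormalDist, median


def quartiles(data):
    """(Q1, Q3) by the median-of-halves rule; the overall median is excluded for odd n."""
    xs = sorted(data)
    half = len(xs) // 2
    upper = xs[half + 1:] if len(xs) % 2 else xs[half:]
    return median(xs[:half]), median(upper)


def iqr_outliers(data, k=1.5):
    """Return the values outside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in data if x < low or x > high]


# Hypothetical data: the first boxplot data set plus one made-up extreme value.
data = [4, 4, 6, 7, 10, 11, 12, 14, 15, 40]
print(iqr_outliers(data))  # [40]

# Where the fences fall for a standard normal distribution.
z = NormalDist()                               # mean 0, standard deviation 1
q3 = z.inv_cdf(0.75)                           # about 0.674; Q1 is -q3 by symmetry
upper_fence = q3 + 1.5 * (2 * q3)              # Q3 + 1.5 * IQR, about 2.698
share_flagged = 2 * (1 - z.cdf(upper_fence))   # both tails, about 0.007
print(round(upper_fence, 3), round(share_flagged, 4))
```

Changing `k` shows the trade-off: a larger constant flags fewer points, a smaller one flags more.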


Origin blog.csdn.net/u012076669/article/details/130778046