R Language Visualization Toolkit ggplot2 (Displaying Data Distributions)


Several geoms (geometric objects) can be used to show the distribution of data. Which one to use depends on the dimensionality of the distribution, on whether it is continuous or discrete, and on whether we are interested in the conditional or the joint distribution. For a one-dimensional continuous distribution, the most important geom is the histogram. The figures below show histograms of the depth variable in the diamonds data. To get an informative view, it is essential to experiment with the bin layout. For example, we can change the bin width (binwidth) or specify the exact positions of the bin edges (breaks).

The following code demonstrates this:

Part 1

Part 1.1

Adjusting the bin width

Never rely on the default parameters to give an informative view of a specific distribution (first figure).
The second figure zooms in on the x-axis with xlim = c(55, 70) and uses a smaller bin width, binwidth = 0.1, which reveals more detail than the first. We can see that the distribution is slightly skewed to the right.

Do not forget to state important parameters (such as the bin width) in the figure caption.

library(ggplot2)
# Default bins first, then a zoomed x-axis with a bin width of 0.1
qplot(depth, data = diamonds, geom = "histogram")
qplot(depth, data = diamonds, geom = "histogram", xlim = c(55, 70), binwidth = 0.1)
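The breaks option mentioned earlier is not exercised by the code above. A minimal sketch, assuming we want one bin edge every 0.5 units between 55 and 70 (an arbitrary choice made only for illustration), might look like this:

```r
library(ggplot2)

# Hypothetical bin edges: one every 0.5 units from 55 to 70.
depth_breaks <- seq(55, 70, by = 0.5)

# Pass the edges explicitly instead of a bin width.
p_breaks <- ggplot(diamonds, aes(depth)) +
  geom_histogram(breaks = depth_breaks)
p_breaks
```

Explicit breaks are mainly useful when the bins need to line up with meaningful values or have unequal widths; for equal-width bins, binwidth is usually simpler.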

Part 1.2

There are several ways to compare distributions across groups: draw multiple small histograms with facets = . ~ var; use frequency polygons with geom = "freqpoly"; or draw a conditional density plot with position = "fill".

Both the histogram and the frequency polygon geoms use the stat_bin statistical transformation. This stat produces two output variables, count and density. The count variable is the default because it is easier to interpret. The density variable is essentially the count divided by the total count (and by the bin width, so that the total area is 1). It is more useful when we want to compare the shapes of distributions rather than their absolute sizes. In particular, we often use it to compare the distributions of subsets of different sizes.
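To make the relationship between count and density concrete, the sketch below pulls the values computed by stat_bin out of a plot with layer_data() and recomputes density by hand. The variable names are illustrative only.

```r
library(ggplot2)

bin_width <- 0.1
p <- ggplot(diamonds, aes(depth)) + geom_histogram(binwidth = bin_width)

# layer_data() returns the values computed by stat_bin, one row per bin.
bins <- layer_data(p)

# Recompute density by hand: count / (total count * bin width).
manual_density <- bins$count / (sum(bins$count) * bin_width)

# Compare the hand-computed values with stat_bin's density column.
all.equal(manual_density, bins$density)
```

Because density is scaled by the bin width, the bar areas sum to 1 regardless of how many observations a subset contains, which is what makes subsets of different sizes comparable.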

Many of the geoms related to distributions come as geom/stat pairs. Most of these geoms are essentially aliases: a basic geom combined with a statistical transformation draws the desired graphic. The box plot appears to be an exception, but behind the scenes geom_boxplot is implemented as a combination of basic bars, lines and points.
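As an illustration of the alias idea, the sketch below builds the same histogram twice: once with geom_histogram, and once by asking stat_bin to draw its result with the basic bar geom. This is only meant to show the pairing; it does not cover every default that geom_histogram sets.

```r
library(ggplot2)

base <- ggplot(diamonds, aes(depth))

# The histogram geom ...
p_hist <- base + geom_histogram(binwidth = 0.1)

# ... and the same binning spelled out as a stat drawn with bars.
p_bar <- base + stat_bin(geom = "bar", binwidth = 0.1)

# Compare the bin counts computed for the two layers.
all.equal(layer_data(p_hist)$count, layer_data(p_bar)$count)
```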

Three views of the distribution of depth by cut for the diamonds data. From top to bottom: faceted histograms, a conditional density plot and frequency polygons.

Faceted histogram (density scale)

depth_dist <- ggplot(diamonds, aes(depth)) + xlim(58, 68)

All three views show an interesting pattern: as the quality of the diamond increases, the distribution shifts to the left and becomes more symmetric.

depth_dist + geom_histogram(aes(y = ..density..), binwidth = 0.1) + facet_grid(cut~.)

Conditional density plot

depth_dist + geom_histogram(aes(fill = cut), binwidth = 0.1, position = "fill")

Frequency polygon

depth_dist + geom_freqpoly(aes(y = ..density.., colour = cut), binwidth = 0.1)
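The ..density.. notation used above is the older way of referring to a variable computed by the stat. Assuming ggplot2 3.3.0 or later, the same mappings can be written with after_stat(). A sketch of the faceted histogram and the frequency polygons in that style:

```r
library(ggplot2)

depth_dist <- ggplot(diamonds, aes(depth)) + xlim(58, 68)

# Faceted histogram on the density scale, one panel per cut.
p_facet <- depth_dist +
  geom_histogram(aes(y = after_stat(density)), binwidth = 0.1) +
  facet_grid(cut ~ .)

# Frequency polygons on the density scale, one colour per cut.
p_poly <- depth_dist +
  geom_freqpoly(aes(y = after_stat(density), colour = cut), binwidth = 0.1)
```

Printing p_facet or p_poly draws the plot. Both emit a warning about removed rows, because xlim(58, 68) drops the observations outside that range.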

Part 2

Box plots can be used to look at the distribution of a continuous variable conditional on a categorical variable such as cut (first figure), or on a continuous variable such as carat (second figure).

library(plyr)  # provides round_any(), used for binning below

Boxplot

qplot(cut, depth, data = diamonds, geom = "boxplot")

Binning

For a continuous conditioning variable, you must set the group aesthetic to obtain multiple box plots.

Here group = round_any(carat, 0.1, floor) bins carat into intervals of width 0.1, giving one box plot per bin.

qplot(carat, depth, data = diamonds, geom = "boxplot", 
   group = round_any(carat,0.1, floor), xlim = c(0, 3))
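To see what the grouping expression does, the sketch below applies round_any to a few made-up carat values and compares the result with the equivalent base R arithmetic.

```r
library(plyr)

# Made-up carat values, for illustration only.
carat <- c(0.23, 0.31, 0.74, 1.09)

# round_any(x, accuracy, f) applies f to x / accuracy and scales back,
# so with floor each value drops to the lower edge of its 0.1-wide bin.
bin_lower <- round_any(carat, 0.1, floor)

# The same computation written out in base R.
bin_lower_base <- floor(carat / 0.1) * 0.1

all.equal(bin_lower, bin_lower_base)
```

All observations that share a lower bin edge fall into the same group and are summarised by one box. Because the division is done in floating point, a value that sits exactly on a bin edge (0.3, for example) can end up in the bin below.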

Jittered scatter plot

The jitter geom gives a crude way of looking at a two-dimensional distribution in which at least one variable is discrete.

In general, jittering works better for smaller datasets.

The figures use the mpg dataset: the first plots the discrete variable class against the continuous variable cty, and the second replaces cty with the discrete variable drv.

geom_jitter = position_jitter + geom_point: adds random noise to the point positions to avoid the overplotting that discrete data causes. This is a relatively crude approach. The following code shows an example.

qplot(class, cty, data = mpg, geom = "jitter")
qplot(class, drv, data = mpg, geom = "jitter")
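The decomposition geom_jitter = position_jitter + geom_point can also be written out explicitly, which gives control over how much noise is added in each direction. In the sketch below the width, height and seed values are arbitrary choices for illustration.

```r
library(ggplot2)

# geom_point with an explicit jitter position. height = 0 leaves the cty
# values untouched and spreads the points horizontally only; seed makes
# the random offsets reproducible.
p_jitter <- ggplot(mpg, aes(class, cty)) +
  geom_point(position = position_jitter(width = 0.3, height = 0, seed = 1))
p_jitter
```

Jittering only along the discrete axis keeps the continuous values readable from the plot, which the default two-directional jitter does not guarantee.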

Density plot

geom_density = stat_density + geom_area: a smoothed version of the frequency polygon, based on kernel smoothing. Use a density plot only when you know that the underlying density is smooth, continuous and unbounded. The adjust parameter controls how smooth the resulting density curve is. The following code shows an example.

A density plot is effectively a smoothed version of the histogram. It has desirable theoretical properties, but it is harder to relate back to the data.

The first figure shows the density of the depth variable. The second figure shows a version coloured by the value of cut.

qplot(depth, data = diamonds, geom = "density", xlim = c(54, 70))
qplot(depth, data = diamonds, geom = "density", xlim = c(54, 70), fill = cut,alpha = I(0.2))
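The adjust parameter described above is not used in the code shown. A minimal sketch of its effect, with 0.5 and 2 chosen arbitrarily, followed by the same idea in base R's density() function:

```r
library(ggplot2)

# adjust multiplies the default bandwidth: values below 1 give a rougher
# curve, values above 1 a smoother one.
p_rough <- ggplot(diamonds, aes(depth)) +
  geom_density(adjust = 0.5) + xlim(54, 70)
p_smooth <- ggplot(diamonds, aes(depth)) +
  geom_density(adjust = 2) + xlim(54, 70)

# The same scaling in base R: the bandwidth used is adjust times the default.
bw_default <- density(diamonds$depth)$bw
bw_smooth <- density(diamonds$depth, adjust = 2)$bw
all.equal(bw_smooth, 2 * bw_default)
```

Comparing p_rough and p_smooth side by side is a quick way to check whether a feature of the curve reflects the data or only the amount of smoothing.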

Origin blog.csdn.net/qq_44658157/article/details/105174893