A Comprehensive Guide to Data Visualization with R (Part 1)

This article is reprinted from: https://mp.weixin.qq.com/s?__biz=MjM5MTQzNzU2NA==&mid=208506453&idx=2&sn=1f0d2cde4f07877af863aeb50c7d66ce&scene=21#wechat_redirect

 

Let's take a quick look at this chart:

 

This visualization (originally created with Tableau Software) is a good example of how data visualization can help decision-makers. Imagine presenting the same information to investors as a table: how much longer do you think it would take to explain?

 

In today's world, as the amount of data keeps growing, it is difficult to convey everything the data contains without visualizing it. Although there are specialized tools such as Tableau, QlikView and d3.js, nothing replaces a modeling / statistics tool with good visualization capabilities. It is especially helpful for exploratory data analysis and feature engineering. This is where R offers incredible help.

 

R provides a satisfying set of built-in functions and libraries (e.g. ggplot2, leaflet, lattice) for building visualizations that present data. In this article, I walk through the steps for creating both common and advanced visualizations in R. Before getting to those, let's take a quick look at a brief history of data visualization. If you are not interested in the history, no problem: you can skip to the next section.

 A Brief History of Data Visualization

Historically, data visualization evolved through the work of well-known practitioners. William Playfair is the founder of graphical methods in statistics. He invented four types of graphs: the line graph, the bar chart of economic data, the pie chart, and the circle graph. Joseph Priestley created the first timeline charts, in which individual bars were used to show the life span of a person (1765). That's right: timelines were invented 250 years ago, and not by Facebook!

 

Among the most famous early data visualizations is Charles Minard's depiction of Napoleon's March (the French invasion of Russia). The graphic packs in extensive information on how temperature affected Napoleon's invasion of Russia over time. It is notable for representing six types of data in two dimensions: the number of Napoleon's troops, distance, temperature, latitude and longitude, direction of travel, and location relative to specific dates.

 

Florence Nightingale was also a pioneer in data visualization. Her coxcomb charts depicted the effect of disease on troop mortality (1858). John Snow (not the one from "Game of Thrones") pioneered the use of maps in charts and spatial analysis. His map of deaths from the 1854 cholera outbreak in London, plotted against the locations of public water pumps, helped pinpoint the source of the outbreak to a single pump.

 

Data Visualization in R

In this article, we will create the following visualizations:

 

Basic visualization

1. Histogram

2. Bar / line chart

3. Box plot

4. Scatter plot

 

Advanced visualization

1. Heat map

2. Mosaic map

3. Map visualization

4. 3D graphs

5. Correlogram

 

R tip:

The HistData package provides a collection of small data sets that are interesting and important in the history of statistics and data visualization.

 

 

Basic visualization

 

Notes:

1. Basic graphs are easy to create in R. The plot command is the one to remember.

2. Its parameters include the x-axis data, y-axis data, x-axis label, y-axis label, title, and color. To create a line graph, simply set the type parameter to "l".

3. If you want a box plot, use the boxplot function; for a bar chart, use the barplot function.
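As a rough sketch of the parameters listed in these notes, the call below passes x and y data, axis labels, a title, a color and type = "l" to plot. The built-in cars data set and all of the label text are my own choices for illustration, not something the original article uses.

```r
# Illustrative only: the cars data set and every label here are assumptions.
data(cars)
cars_sorted <- cars[order(cars$speed), ]  # sort so the line runs left to right

plot(x = cars_sorted$speed,                  # x-axis data
     y = cars_sorted$dist,                   # y-axis data
     xlab = "Speed (mph)",                   # x-axis label
     ylab = "Stopping distance (ft)",        # y-axis label
     main = "Stopping distance vs. speed",   # title
     col = "steelblue",                      # color
     type = "l")                             # "l" draws lines instead of points
```

Dropping type = "l" gives the default point plot, which is often the safer first look at raw data.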

 

1. Histogram

 

A histogram breaks the data into bins (or intervals) and shows their frequency distribution. You can change the breaks and see how that affects the readability of the visualization.

 

Here is an example.

Note: We use the par(mfrow = c(2, 3)) command to fit several graphs on the same page for clarity (see the code below).

library(RColorBrewer)  # provides the brewer.pal() color palettes

data(VADeaths)

par(mfrow = c(2, 3))  # 2 rows x 3 columns of plots on one page

hist(VADeaths, breaks = 10, col = brewer.pal(3, "Set3"), main = "Set3 3 colors")

hist(VADeaths, breaks = 3, col = brewer.pal(3, "Set2"), main = "Set2 3 colors")

hist(VADeaths, breaks = 7, col = brewer.pal(3, "Set1"), main = "Set1 3 colors")

hist(VADeaths, breaks = 2, col = brewer.pal(8, "Set3"), main = "Set3 8 colors")

hist(VADeaths, col = brewer.pal(8, "Greys"), main = "Greys 8 colors")

hist(VADeaths, col = brewer.pal(8, "Greens"), main = "Greens 8 colors")

 

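Note that if the number of breaks is smaller than the number of colors specified, the colors go to extreme values, as in the "Set3 8 colors" graph above. If the number of breaks is larger than the number of colors, the colors start repeating, as in the first row.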
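To check which of these two cases applies to a given call, it can help to count the bins R actually produces. The sketch below is my own addition: it uses plot = FALSE so that hist only returns the bin information, and the variable names are hypothetical.

```r
# Assumption: we only inspect the bins, so plot = FALSE suppresses drawing.
data(VADeaths)

h_few  <- hist(VADeaths, breaks = 3,  plot = FALSE)
h_many <- hist(VADeaths, breaks = 10, plot = FALSE)

# A single number passed as `breaks` is only a suggestion;
# R chooses "pretty" cut points close to that number.
n_bins_few  <- length(h_few$counts)
n_bins_many <- length(h_many$counts)
print(c(few = n_bins_few, many = n_bins_many))

# Each of the 20 values in the 5 x 4 VADeaths matrix lands in exactly one bin.
print(sum(h_few$counts))
```

Comparing the bin count with the length of the vector passed to col shows whether the colors will be cut short or recycled.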

 

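2. Bar / line chart

Line chart

The line chart below shows the growth in air passengers over a given period. Line charts are usually the first choice for analysing a trend over time. A line chart is also suitable when we need to compare relative changes in a quantity across some variable (such as time). Here is the code: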

plot(AirPassengers,type="l") #Simple Line Plot
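One way to make the long-run trend easier to compare year by year is to collapse the monthly series into yearly totals before plotting. This is a sketch of my own rather than part of the original article; the choice of yearly sums and the labels are assumptions.

```r
# Assumption: yearly totals are a reasonable summary for comparing growth.
data(AirPassengers)

# Collapse the monthly time series to one value per year.
yearly <- aggregate(AirPassengers, nfrequency = 1, FUN = sum)

plot(yearly, type = "o",   # "o" overplots points on the line
     xlab = "Year", ylab = "Passengers (thousands)",
     main = "Yearly air passengers")
```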

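Bar chart

Bar charts are suitable for comparing cumulative totals across several groups. Stacked plots are bar charts broken down by category. Here is the code: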

barplot(iris$Petal.Length) #Creating simple Bar Graph

barplot(iris$Sepal.Length,col = brewer.pal(3,"Set1"))

barplot(table(iris$Species,iris$Sepal.Length),col = brewer.pal(3,"Set1")) #Stacked Plot
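The first two calls above draw one bar per row of iris (150 bars). To compare groups directly, one option is to summarise each group first and plot the summary. The sketch below uses the mean petal length per species; the choice of mean (rather than a total) and the colors are my own assumptions.

```r
data(iris)

# Mean petal length for each species: one bar per group.
mean_petal <- tapply(iris$Petal.Length, iris$Species, mean)

barplot(mean_petal,
        col = c("tomato", "seagreen3", "steelblue"),  # colors are arbitrary
        ylab = "Mean petal length (cm)",
        main = "Mean petal length by species")
```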

 

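3. Box plot

A box plot shows five statistically significant numbers: the minimum, the first quartile, the median, the third quartile and the maximum. It is therefore very useful for visualizing the spread of the data and drawing inferences from it. Here is the basic code: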

boxplot(iris$Petal.Length~iris$Species) #Creating Box Plot between two variable
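The five numbers behind a box plot can also be printed directly. The sketch below uses base R's fivenum, which reports hinges that are close to, but not always identical to, the quartiles; the names given to the result are my own.

```r
data(iris)
x <- iris$Petal.Length

five <- fivenum(x)  # minimum, lower hinge, median, upper hinge, maximum
names(five) <- c("min", "lower_hinge", "median", "upper_hinge", "max")
print(five)

# By default boxplot() extends the whiskers to the most extreme points that lie
# within 1.5 * IQR of the box; anything farther out is drawn as a separate point.
boxplot(x, horizontal = TRUE, main = "Petal length")
```

So when the data contain outliers, the whisker ends in the plot can differ from the printed minimum and maximum.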

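Let's walk through the code below:

In the example below, I draw four graphs on one screen. By using the ~ sign, I can visualize how the spread (of sepal length) varies across categories (of species). In the last two graphs I demonstrate color palettes. A color palette is a group of colors used to make a graph more appealing and to help create clear visual distinctions in the data.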

data(iris)

par(mfrow=c(2,2))

boxplot(iris$Sepal.Length,col="red")

boxplot(iris$Sepal.Length~iris$Species,col="red")

boxplot(iris$Sepal.Length~iris$Species,col=heat.colors(3))

boxplot(iris$Sepal.Length~iris$Species,col=topo.colors(3))

 

To learn more about using color palettes in R, see http://decisionstats.com/2011/04/21/using-color-palettes-in-r/

 

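4. Scatter plot (including 3D and other features)

Scatter plots make it easy to visualize data and to do simple data inspection. Here is the code for a simple scatter plot and a multivariate scatter plot: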

plot(x=iris$Petal.Length) #Simple Scatter Plot

plot(x=iris$Petal.Length,y=iris$Species) #Multivariate Scatter Plot

A scatter plot matrix helps visualize multiple variables against each other.

plot(iris,col=brewer.pal(3,"Set1"))
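In the call above, the three-color vector is recycled row by row, so the colors do not line up with the species. If the aim is one color per species, a possible variant is to index the colors by the Species column, as sketched below. The color names and variable names are my own choices, and base colors are used so that no extra package is needed.

```r
data(iris)

# One color per species; the color names are arbitrary choices.
species_cols <- c(setosa = "red", versicolor = "green3", virginica = "blue")

# Look up each row's color by its species label.
point_cols <- species_cols[as.character(iris$Species)]

pairs(iris[, 1:4], col = point_cols, pch = 19,
      main = "Iris scatter plot matrix, colored by species")
```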

 

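You might be wondering why pie charts are not in my list of basic charts. That is not an oversight; I left them out on purpose. Data visualization professionals frown on using pie charts to represent data, because the human eye cannot judge circular distances as accurately as linear ones. Simply put, anything that can be shown in a pie chart is better shown as a line chart. However, if you like pie charts, use: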

pie(table(iris$Species))

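Here is the complete list of all the charts we have covered so far:

You may have noticed that the titles of some charts were truncated because I put too many charts on the same screen. To change that, you only need to change the 'mfrow' parameter of the par function.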
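A minimal sketch of changing and then restoring the layout, assuming a 1 x 2 grid is roomy enough for the titles; the two plots are only placeholders:

```r
# par() returns the previous settings, so they can be restored afterwards.
old_par <- par(mfrow = c(1, 2))   # 1 row, 2 columns: fewer, larger panels

hist(iris$Sepal.Length, main = "Sepal length", xlab = "cm")
boxplot(Sepal.Length ~ Species, data = iris, main = "Sepal length by species")

par(old_par)                      # back to the layout that was active before
```

Restoring the saved settings keeps later plots in the same session from inheriting the multi-panel layout.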

 

Original article: http://www.analyticsvidhya.com/blog/2015/07/guide-data-visualization-r/?utm_source=feedburner&utm_medium=email&utm_campaign=Feed%3A+AnalyticsVidhya+%28Analytics+Vidhya%29


Origin www.cnblogs.com/shujuxiong/p/11183213.html