How do you draw with R's base graphics system?

R's base graphics system was written by Ross Ihaka and is very powerful. It consists mainly of the graphics and grDevices packages, both of which are loaded automatically when R starts. The system provides two kinds of functions: high-level plotting functions and low-level plotting functions. A high-level plotting function generates a complete graph directly; examples include plot(), hist(), boxplot(), and pairs(). A low-level plotting function adds new graphics or elements to a graph already produced by a high-level function; examples include points(), lines(), text(), title(), legend(), and axis().

4.1.1 Function plot()

The function plot() is a generic function: it draws different graphs for different types of data. For example, for two numeric variables it draws a scatter plot; for a numeric variable grouped by a categorical variable (a factor) it draws box plots; and for some statistical model objects it draws a corresponding graph, such as a survival curve in survival analysis. As a result, plot() is used very frequently. Readers are encouraged to open its help file to review its common parameters.
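As a rough illustration of this dispatch behavior, the sketch below calls plot() on a few toy objects. The vectors x and g are made up for the example and are not part of the chapter's data.

```r
# Toy data (hypothetical, for illustration only)
x <- c(1.2, 2.5, 3.1, 4.8, 3.9, 2.2)
g <- factor(c("a", "b", "a", "b", "a", "b"))

class(x)           # "numeric"
class(g)           # "factor"
class(density(x))  # "density"

plot(x)            # numeric vector: values plotted against their index
plot(g)            # a single factor: bar plot of the level counts
plot(g, x)         # factor and numeric vector: one box plot per level
plot(density(x))   # "density" object: handled by the plot.density() method
```

The same call, plot(), produces a different graph each time because R picks the method that matches the class of the first argument.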

Let's create some example data showing a patient's response to two drugs (drugA and drugB) at five dose levels (dose).

> dose <- c(20, 30, 40, 45, 60)
> drugA <- c(16, 20, 27, 40, 60)
> drugB <- c(15, 18, 25, 31, 40)

Use the above data to plot the relationship between the dose and response of drug A:

> plot(dose, drugA)
> plot(dose, drugA, type = "b")

The commands above create two graphs. The parameter type of plot() defaults to "p" (points), so the first command produces the scatter plot in Figure 4-1(a). In the second command, type is set to "b" (both points and lines), so Figure 4-1(b) shows the points connected by line segments.

Figure 4-1 Scatter plot (a) and point-and-line plot (b) of the relationship between drug A dose and response
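Besides "p" and "b", the parameter type accepts several other values. The following sketch, which reuses the dose and drugA vectors, draws the same data with six of them side by side; the 2 x 3 panel layout is just a convenient choice for comparison.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)

# "p" points, "l" lines, "b" both, "o" both overplotted,
# "h" vertical lines, "s" stair steps
types <- c("p", "l", "b", "o", "h", "s")

old_par <- par(mfrow = c(2, 3))   # 2 rows x 3 columns of panels
for (tp in types) {
  plot(dose, drugA, type = tp, main = paste0('type = "', tp, '"'))
}
par(old_par)                      # restore the previous layout
```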

The function plot() creates a new graph. We can then use low-level plotting functions, such as lines() and legend(), to add new elements to the existing graph. For example:

> plot(dose, drugA, type = "b",
+      xlab = "Dosage", ylab = "Response",
+      lty = 1, pch = 15)
> lines(dose, drugB, type = "b", lty = 2, pch = 17)
> legend("topleft", title = "Drug Type",
+        legend = c("A", "B"),
+        lty = c(1, 2),
+        pch = c(15, 17))

As shown in Figure 4-2, to compare the responses to the two drugs at different doses, we draw both point-and-line series on one graph and distinguish them by line type (lty) and point symbol (pch). A legend is added to improve readability. Note that the point and line attributes passed to legend() must match those set in the earlier calls to plot() and lines().

Figure 4-2 Comparison of dose and response relationship between drug A and drug B
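One way to keep the legend consistent with the plotted series is to define the attributes once and reuse them. This is only a sketch: the helper vectors ltys and pchs are names introduced here, and the explicit ylim is an addition so that both series are guaranteed to fit in the plotting region.

```r
dose  <- c(20, 30, 40, 45, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(15, 18, 25, 31, 40)

# Define the appearance of each series in one place (hypothetical helpers)
ltys <- c(A = 1, B = 2)
pchs <- c(A = 15, B = 17)

plot(dose, drugA, type = "b",
     xlab = "Dosage", ylab = "Response",
     ylim = range(drugA, drugB),          # make room for both series
     lty = ltys["A"], pch = pchs["A"])
lines(dose, drugB, type = "b", lty = ltys["B"], pch = pchs["B"])

# The legend reads the same vectors, so it cannot drift out of sync
legend("topleft", title = "Drug Type",
       legend = names(ltys), lty = ltys, pch = pchs)
```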

4.1.2 Histogram and density curve

The histogram is the most commonly used tool for displaying the distribution of continuous variables. It is essentially an estimate of the density function. Histograms and density graphs are generally used to explore distributions and are rarely used to report results. The function hist() can be used to draw a histogram.

The data set anorexia is in the MASS package and comes from a study of weight change in young women with anorexia. It contains 72 observations and 3 variables. The variable Treat (treatment) is a factor with 3 levels. The variables Prewt and Postwt are numeric and give the weight before and after treatment (unit: lb). The following code draws a histogram of the variable Prewt:

> data(anorexia, package = "MASS")
> str(anorexia)
'data.frame':  72 obs. of  3 variables:
 $ Treat : Factor w/ 3 levels "CBT","Cont","FT": 2 2 2 2 2 2 2 2 2 2 ...
 $ Prewt : num  80.7 89.4 91.8 74 78.1 88.3 87.3 75.1 80.6 78.4 ...
 $ Postwt: num  80.2 80.1 86.4 86.3 76.1 78.1 75.1 86.7 73.5 84.6 ...
> attach(anorexia)
> hist(Prewt)

Figure 4-3(a) shows the frequency distribution of the variable Prewt. Since no other parameters are set in hist(), the figure uses the default bin width, axis labels, and title. Note that the shape of a histogram depends on the bin width, so we sometimes need to try different values of the parameter breaks to get a suitable graph. The value returned by hist() contains computed quantities that can be used for further plotting or analysis, such as the break points, the counts (or densities), and the midpoints of the bins.
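As a small sketch of how these return values can be inspected, the code below computes the histogram without drawing it and then fixes the bin boundaries explicitly. The bin width of 2.5 lb and the variable names h, h2, and brk are choices made for this example only.

```r
data(anorexia, package = "MASS")
x <- anorexia$Prewt

# plot = FALSE returns the computed histogram without drawing it
h <- hist(x, breaks = 10, plot = FALSE)

h$breaks    # bin boundaries
h$counts    # number of observations in each bin
h$density   # counts rescaled so that the bar areas sum to 1
h$mids      # bin midpoints

# A single number passed to breaks is only a suggestion;
# a vector of boundaries is used exactly as given
brk <- seq(floor(min(x) / 5) * 5, ceiling(max(x) / 5) * 5, by = 2.5)
h2  <- hist(x, breaks = brk, plot = FALSE)
h2$counts
```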

The density curve provides a smoother description of the data distribution. The method of drawing the density curve is:

> plot(density(Prewt))

Figure 4-3(b) shows that the distribution of the variable Prewt is unimodal and roughly symmetric. We can also add a density curve and a rug plot to a histogram. To do so, set the parameter freq to FALSE in hist() so that the vertical axis shows density instead of frequency; otherwise the density curve will be almost invisible. The parameter las is set to 1 to display the tick labels of the vertical axis horizontally.

> hist(Prewt, freq = FALSE, col = "red",
+      xlab = "Weight (lbs)",
+      main = "Histogram of weight distribution before treatment",
+      las = 1)
> lines(density(Prewt), col = "blue", lwd = 2)
> rug(Prewt)
> detach(anorexia)

Figure 4-3(c) fills the bars with red, uses a more informative axis label and title, and displays the tick labels of the vertical axis horizontally by setting las to 1. The function lines() then overlays a blue density curve at twice the default line width. Finally, the function rug() adds a rug plot along the horizontal axis to show where the observations are concentrated.

Figure 4-3 Example of histogram: (a) default histogram, (b) density curve, (c) histogram with density curve and rug plot
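The need for the density scale can be checked numerically. The sketch below compares the height of the tallest bar on the count scale and on the density scale with the peak of the kernel density estimate; the variable names h and d are introduced for this example.

```r
data(anorexia, package = "MASS")

h <- hist(anorexia$Prewt, plot = FALSE)
d <- density(anorexia$Prewt)

max(h$counts)    # tallest bar on the count scale
max(h$density)   # tallest bar on the density scale
max(d$y)         # peak of the kernel density estimate

# On the density scale the bar areas add up to 1,
# which puts the bars and the density curve on comparable scales
sum(h$density * diff(h$breaks))
```

On the count scale the bars are far taller than the density curve, which is why the curve would hug the horizontal axis if it were drawn over a frequency histogram.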

4.1.3 Bar graph

Bar charts are often used in medical research papers. They display the frequency distribution of a categorical variable with vertical or horizontal rectangles. The function barplot() draws bar charts. The following example uses the Arthritis data set in the vcd package to show how barplot() works. This data set comes from a double-blind controlled clinical trial of a new treatment for rheumatoid arthritis. The response variable Improved records the treatment outcome for each patient who received the drug (Treated, 41 cases) or a placebo (Placebo, 43 cases), in 3 levels (None, Some, Marked).

> library(vcd)
> data(Arthritis)
> attach(Arthritis)
> counts <- table(Improved)
> counts
Improved
  None   Some Marked 
    42     14     28 

The function table() generates a frequency table for a categorical variable. The output above shows that 28 patients improved markedly, 14 improved somewhat, and 42 did not improve. A bar chart can display this frequency distribution, as shown in Figure 4-4(a).

> barplot(counts, xlab = "Improvement", ylab = "Frequency", las = 1)
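If percentages are preferred to raw counts, prop.table() converts a frequency table into proportions. The sketch below does not require the vcd package: it rebuilds the variable from the three counts printed above (42, 14, 28), and the object name improved is introduced only for this example.

```r
# Rebuild the outcome variable from the printed counts (no vcd needed)
improved <- factor(rep(c("None", "Some", "Marked"), times = c(42, 14, 28)),
                   levels = c("None", "Some", "Marked"))

counts <- table(improved)
pct <- 100 * prop.table(counts)   # share of patients in each level
round(pct, 1)

barplot(pct, xlab = "Improvement", ylab = "Percentage (%)", las = 1)
```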

The function barplot() can also display the data in a two-way contingency table. The following code draws the grouped bar chart in Figure 4-4(b) and adds colors and a legend:

> counts <- table(Improved, Treatment)
> barplot(counts,
+         col = c("red", "yellow", "green"),
+         xlab = "Treatment", ylab = "Frequency",
+         beside = TRUE, las = 1)
> legend("top", legend = rownames(counts),
+        fill = c("red", "yellow", "green"))

Figure 4-4 Example of bar graph: (a) simple bar graph, (b) grouped bar graph

Bar graphs are sometimes used to display the mean, median, standard deviation, or confidence interval within each category. This can be done with functions in the base packages, but it takes several steps. The function aggregate.plot() in the epiDisplay package simplifies the process. The following code uses the data set anorexia to draw a bar graph of the mean weight after treatment for each treatment; the result is shown in Figure 4-5.

> library(epiDisplay)
> aggregate.plot(anorexia$Postwt, by = list(anorexia$Treat),
+                error = "sd", legend = FALSE,
+                bar.col = c("red", "yellow", "green"),
+                ylim = c(0, 100), las = 1,
+                main = "")

Figure 4-5 Example of a bar graph of mean and standard deviation

The error bar above represents the standard deviation. We can display the standard error or confidence interval by changing the parameter error setting in the function aggregate.plot( ).
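For comparison, here is one possible base-R version of the same kind of graph, using only functions from the packages loaded at startup. It is a sketch rather than a reproduction of aggregate.plot(): the object names means, sds, and mids and the axis label are choices made for this example.

```r
data(anorexia, package = "MASS")

# Step 1: summary statistics per treatment group
means <- tapply(anorexia$Postwt, anorexia$Treat, mean)
sds   <- tapply(anorexia$Postwt, anorexia$Treat, sd)

# Step 2: draw the bars; barplot() returns the x positions of the bar centers
mids <- barplot(means, col = c("red", "yellow", "green"),
                ylim = c(0, 100), las = 1,
                ylab = "Mean weight after treatment (lbs)")

# Step 3: add mean +/- sd error bars as arrows with flat heads at both ends
arrows(x0 = mids, y0 = means - sds, x1 = mids, y1 = means + sds,
       angle = 90, code = 3, length = 0.05)
```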

4.1.4 Pie Chart

A pie chart can show the proportions of the categories in categorical data. For example, the pie chart drawn by the code below (Figure 4-6) shows the distribution of diagnoses among patients admitted to a hospital's emergency department in one week.

> percent <- c(5.8, 27.0, 0.5, 20.8, 12.8, 33.1)
> disease <- c("upper respiratory infection", "stroke", "traumatic injury",
+              "fainting", "food poisoning", "other")
> lbs <- paste0(disease, " ", percent, "%")
> pie(percent, labels = lbs, col = rainbow(6))

Figure 4-6 Diagnosis distribution of emergency patients admitted to a hospital in a week

Most statisticians do not recommend pie charts. They suggest bar charts or dot charts instead, because people judge length more accurately than area. Perhaps for this reason, the base function pie() offers only limited options. Some contributed packages, such as plotrix, extend R's ability to draw pie charts. The function pie3D() in that package draws a three-dimensional pie chart, and the function fan.plot() draws a fan plot, which serves a similar purpose to a pie chart. Interested readers can install the package and consult its help documents.
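As an illustration of the recommended alternative, the sketch below shows the same emergency department percentages as a sorted dot chart. The sorting step and the axis label are choices made here, and the first category name is given in English.

```r
percent <- c(5.8, 27.0, 0.5, 20.8, 12.8, 33.1)
disease <- c("upper respiratory infection", "stroke", "traumatic injury",
             "fainting", "food poisoning", "other")

# Sort so that the largest category ends up at the top of the chart
ord <- order(percent)

dotchart(percent[ord], labels = disease[ord], pch = 19,
         xlab = "Share of admissions (%)")
```

Because every value sits on a common horizontal scale, small differences such as 20.8% versus 27.0% are easier to read than in a pie chart.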

4.1.5 Box plot and violin plot

Box plots, also known as box-and-whisker plots, are often used to show the approximate distribution of data and to look for outliers. The function boxplot() draws box plots. The following box plot shows the distribution of the change in body weight from before to after treatment in the data set anorexia.

> anorexia$wt.change <- anorexia$Postwt - anorexia$Prewt
> boxplot(anorexia$wt.change, ylab = "Weight change (lbs)", las = 1)

To help readers understand each part of the box plot, the author added manual annotations to Figure 4-7. If the data are symmetrically distributed, the median lies midway between the upper and lower quartiles, that is, the box is symmetric about the median line. Values beyond the ends of the whiskers, which by default extend at most 1.5 times the box height from the upper and lower hinges, are usually considered outliers.

Figure 4-7 Example of box plot with added label
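The numbers behind a box plot can be obtained with boxplot.stats(). The sketch below extracts them for the weight change and computes the limits beyond which points are flagged; the names box_height, lower_fence, and upper_fence are introduced for this example, and the factor 1.5 is the default value of the coef argument.

```r
data(anorexia, package = "MASS")
wt.change <- anorexia$Postwt - anorexia$Prewt

b <- boxplot.stats(wt.change)
b$stats   # lower whisker end, lower hinge, median, upper hinge, upper whisker end
b$out     # values drawn individually as potential outliers

# Points are flagged when they lie more than 1.5 box heights beyond a hinge
box_height  <- b$stats[4] - b$stats[2]
lower_fence <- b$stats[2] - 1.5 * box_height
upper_fence <- b$stats[4] + 1.5 * box_height
c(lower_fence, upper_fence)
```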

Parallel box plots can be used to compare the distribution of an indicator in each category of a categorical variable. For example, to compare weight changes under different treatments, you can use the following command:

> boxplot(wt.change ~ Treat, data = anorexia,
+         ylab = "Weight change (lbs)", las = 1)

Here the first argument of boxplot() is a formula. Formulas in R generally use the symbol "~" to connect variables. The left side of "~" can be regarded as the dependent variable and the right side as the independent variable. Figure 4-8(a) shows that the weight change in the "FT" (family treatment) group was larger than in the other two groups. Whether this difference is statistically significant has to be determined by a significance test, which we will discuss in detail in Chapter 5.
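A formula is an ordinary R object, so the same formula can be stored and passed to other functions. In the sketch below, the object name f and the choice of the median as the summary are assumptions made for illustration.

```r
data(anorexia, package = "MASS")
anorexia$wt.change <- anorexia$Postwt - anorexia$Prewt

f <- wt.change ~ Treat   # dependent variable ~ independent variable
class(f)                 # "formula"
all.vars(f)              # "wt.change" "Treat"

# The same formula drives a numeric summary by group ...
aggregate(f, data = anorexia, FUN = median)

# ... and the parallel box plots
boxplot(f, data = anorexia, ylab = "Weight change (lbs)", las = 1)
```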

A violin plot can be seen as a combination of a box plot and a density plot. The function vioplot() in the vioplot package draws violin plots; install and load the package before use. For example, the box plot in Figure 4-8(a) can be replaced with the violin plot shown in Figure 4-8(b).

> library(vioplot)
> vioplot(wt.change ~ Treat, data = anorexia,
+         ylab = "Weight change (lbs)",
+         col = "gold", las = 1)

Figure 4-8 Box plot (a) and violin plot (b) of weight change under different treatments

4.1.6 Cleveland dot chart

The Cleveland dot plot is essentially a scatter plot. It shows the size of each value by the position of a point, which makes it a way of plotting a large number of labeled values on a simple horizontal scale. Its purpose is similar to that of a bar chart, but it emphasizes the ordering of the data and the gaps between values. The function dotchart() draws Cleveland dot plots. The data set VADeaths in the datasets package contains the death rates (in ‰) of people in different age groups in the urban and rural areas of Virginia, USA, in 1940. A Cleveland dot plot (Figure 4-9) displays these data well; the code is as follows:

> dotchart(VADeaths)
> dotchart(t(VADeaths), pch = 19)

As Figure 4-9 shows, the mortality rate increases with age. Within the same age group, urban men have a higher mortality rate than rural men, while the urban-rural difference among women is smaller and varies in direction. Within the same age group and the same area, men have a higher mortality rate than women.

Figure 4-9 Example of Cleveland dot plot
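Patterns read from the dot plot can be cross-checked against the underlying matrix. The comparisons below are a sketch of such a check; VADeaths ships with R, and the column names are those of the data set.

```r
VADeaths   # rows are age groups, columns are area/sex combinations

# Does the rate rise from each age group to the next in every column?
apply(VADeaths, 2, function(rate) all(diff(rate) > 0))

# Men versus women within the same area, by age group
VADeaths[, "Rural Male"] > VADeaths[, "Rural Female"]
VADeaths[, "Urban Male"] > VADeaths[, "Urban Female"]

# Urban minus rural within the same sex, by age group
VADeaths[, "Urban Male"]   - VADeaths[, "Rural Male"]
VADeaths[, "Urban Female"] - VADeaths[, "Rural Female"]
```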

4.1.7 Export graphics

There are two ways to save a graph: through the graphical user interface or with code. In the "Plots" pane at the bottom right of RStudio, click "Export" and select "Save as Image" or "Save as PDF" to save the graph to a chosen folder. We can also choose "Copy to Clipboard" to paste the graph directly into a Word or PowerPoint document. Note that a graph saved this way depends on the size of the RStudio plot window, so windows of different sizes produce different results. To save graphs for reports or papers, the author recommends the code approach: place the plotting statements between the statement that opens the target graphics device and the statement that closes it. For example, the following code saves the graph in the current working directory under the name "mygraph.pdf":

> pdf("mygraph.pdf")
> boxplot(wt.change ~ Treat,
+         data = anorexia,
+         ylab = "Weight change (lbs)",
+         las = 1)
> dev.off()

In addition to pdf(), we can use the functions bmp(), png(), jpeg(), tiff(), and postscript() to save graphs in other formats. Files in bmp, png, and jpeg formats are raster (non-vector) images whose quality depends on resolution, but they are convenient for Word and PowerPoint documents. Files in ps format are vector graphics; they are independent of resolution and suitable for typesetting and printing. Files in tiff (or tif) format support many color systems, are independent of the operating system, and are widely used in publications. For example:

> tiff(filename = "mygraph.tiff",
+      width = 15, height = 12, units = "cm", res = 300)
> boxplot(wt.change ~ Treat, data = anorexia, ylab = "Weight change (lbs)")
> dev.off()

The commands above generate a file named "mygraph.tiff". The parameters width and height set the width and height of the graph, the parameter units sets the unit of measurement for both, and the parameter res sets the resolution. Here res is set to 300, the minimum required by most publications.
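The open-draw-close pattern can be wrapped in a small helper so that the device is always closed, even if the plotting code fails. The function save_pdf_cm() below is a hypothetical helper written for this example, not part of R; it relies on the fact that pdf() takes its width and height in inches, and it writes to a temporary directory so that nothing is left in the working directory.

```r
# Hypothetical helper: open a PDF device sized in centimeters, draw, close
save_pdf_cm <- function(file, width_cm, height_cm, draw) {
  pdf(file, width = width_cm / 2.54, height = height_cm / 2.54)  # inches
  on.exit(dev.off())   # close the device even if draw() stops with an error
  draw()
  invisible(file)
}

data(anorexia, package = "MASS")
anorexia$wt.change <- anorexia$Postwt - anorexia$Prewt

out <- save_pdf_cm(file.path(tempdir(), "mygraph.pdf"), 15, 12, function() {
  boxplot(wt.change ~ Treat, data = anorexia,
          ylab = "Weight change (lbs)", las = 1)
})
file.exists(out)
```

For raster devices the physical size and the resolution together determine the pixel dimensions: a 15 cm wide image at 300 pixels per inch is roughly 15 / 2.54 * 300, or about 1772 pixels wide.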

This article is excerpted from the book "R language analysis of actual medical data".

  • An introductory medical statistics book recommended by Professor Yu Songlin of Tongji Medical College, Huazhong University of Science and Technology
  • Emphasizes hands-on practice and application, highlighting the essence of each problem and the overall structure
  • Contains a large number of R program examples and graphics to give you a deeper understanding of data analysis

This book uses medical data as an example to explain how to use R for data analysis, combined with a large number of selected examples to introduce common analysis methods in a simple way, to help readers solve practical problems in medical data analysis.

The book is divided into 14 chapters. Chapters 1 to 3 introduce the basic usage of the R language; Chapter 4 introduces data visualization; Chapter 5 introduces basic statistical analysis methods; Chapters 6 to 8 introduce the three regression models most commonly used in medical research; Chapter 9 introduces the basic methods of survival analysis; Chapters 10 to 12 introduce several commonly used multivariate statistical analysis methods; Chapter 13 introduces the statistical evaluation indicators for clinical diagnostic tests and how to calculate them; Chapter 14 introduces the meta-analysis methods commonly used in medical research practice.

This book is suitable for undergraduates and postgraduates in clinical medicine, public health and other medical-related majors. It can also be used as a reference book for students and researchers in other majors to study data analysis. Reading this book, readers can not only master the method of using R and related packages to quickly solve practical problems, but also have a deeper understanding of data analysis.

 

Origin blog.csdn.net/epubit17/article/details/108403226