R data analysis: ggplot2 and ggsci, explained by drawing an APA-format figure

I previously wrote about basic plotting with plot, and I suspect that left readers wanting more. Since ggplot2 is what most people actually use, today I will walk through ggplot2 operations and figure color schemes by building a figure that follows the APA template.

About APA format

The official website of the American Psychological Association gives a detailed introduction to APA format:

It covers paper templates, citation rules, and more, so the content is very rich. For social science students who are unsure how to write a paper, this website is the gold standard, and I strongly recommend it.

What we focus on today, though, is its standard for figures. Click Tables and Figures on the homepage to go to the next page:

It contains both table templates and figure templates. The table templates, for example, show how to lay out a table for regression analysis, for factor analysis, and so on, all as standard templates. This is exactly what many students ask me about, and the American Psychological Association has already sorted it all out, so I recommend it. As for figure templates, there are quite a few:

They cover flow charts of sample inclusion and exclusion, path analysis diagrams, and figures for qualitative research, mixed-design research, and so on.

In this article I want to reproduce the sample bar chart from the APA website with ggplot2 and, along the way, go through common ggplot2 operations. (There is plenty of material online about the principles behind ggplot2 that you can look up yourself, so I skip it here.)

Practice

My sample data for the figure is as follows:

The data contains the score (framing_score) that sets the bar heights, the lower and upper values needed for the error bars, and the grouping variable reward.
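The screenshot of the data is not reproduced here, so the following is only a stand-in for experimenting. The column names come from the plotting code in this article; the level labels and every number are invented for illustration.

```r
# Hypothetical stand-in for viz_data_one: the column names match the article's
# plotting code, but all level labels and numbers below are made up.
viz_data_one <- data.frame(
  age_group     = rep(c("Adolescent", "Young adult", "Older adult"), each = 3),
  reward        = rep(c("No reward", "Small reward", "Large reward"), times = 3),
  framing_score = c(0.10, 0.15, 0.20, 0.12, 0.18, 0.25, 0.15, 0.22, 0.30)
)

# Keep the groups in the order written above rather than alphabetical order.
viz_data_one$age_group <- factor(viz_data_one$age_group,
                                 levels = unique(viz_data_one$age_group))
viz_data_one$reward <- factor(viz_data_one$reward,
                              levels = unique(viz_data_one$reward))

# Error-bar limits: here simply the score plus or minus an arbitrary 0.03.
viz_data_one$lower <- viz_data_one$framing_score - 0.03
viz_data_one$upper <- viz_data_one$framing_score + 0.03

str(viz_data_one)
```

Any data frame with these five columns should work with the code that follows.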

The first step is to set up the aesthetic mappings:

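library(ggplot2)
library(magrittr)  # provides the %>% pipe

# Declare the aesthetic mappings only; no geom is added yet
viz_data_one %>% 
  ggplot(aes(x = age_group,
             y = framing_score,
             fill = reward,
             ymin = lower,
             ymax = upper))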

In the code above, I only told ggplot which two variables go on the x and y axes, which variable maps to the fill color, and which variables give the lower and upper limits. I did not say which geom should represent the data, so ggplot has nothing to draw. After the code runs, the output shows only the mapped x and y axes, as follows:
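One way to confirm this for yourself is to inspect the plot object, which stores its layers in a list. This is a small sketch on a made-up two-row data set (only the column names follow this article):

```r
library(ggplot2)

# Tiny made-up data set; only the column names follow the article.
demo <- data.frame(
  age_group = c("A", "B"), reward = c("Low", "Low"),
  framing_score = c(0.1, 0.2), lower = c(0.05, 0.15), upper = c(0.15, 0.25)
)

p <- ggplot(demo, aes(x = age_group, y = framing_score, fill = reward,
                      ymin = lower, ymax = upper))

length(p$layers)        # no geom has been added yet
names(p$mapping)        # the aesthetics that later layers will inherit

p_bar <- p + geom_bar(stat = "identity")
length(p_bar$layers)    # one layer after adding a geom
```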

Now we continue from the code above. I need a bar chart, so the geom should be geom_bar. The variable I want to plot, the score, is already the bar height and needs no statistical transformation, so I pass stat = 'identity'. I need the 3 bars in each group placed side by side, so I set the spacing between them (position_dodge(.6)) and the width of each bar (width). The code at this point is:

geom_bar(stat = 'identity', color = 'black', position = position_dodge(.6), width = .5)

position_dodge in the code above places the bars side by side, and its first argument is the dodging width. width = .5 sets the width of each bar. After running the code, the result is as follows:

It feels pretty good.

Next we add error bars, which are another geom, so we add a layer called geom_errorbar. The error bars must be dodged side by side in the same way as the bars, and they should be narrower than the bars. So we set width = .1 and position = position_dodge(.6), and write:

geom_errorbar(width = .1, position = position_dodge(.6))

The output after running is as follows:

So far the figure has two geoms, bar and errorbar, and the data are now fully represented by geometry.
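For readers assembling the pieces themselves, this is roughly how the steps so far chain together with +. The data values are made up; only the column names and the layer settings come from this article.

```r
library(ggplot2)

# Made-up numbers; column names follow the article.
viz_data_one <- data.frame(
  age_group = rep(c("G1", "G2"), each = 3),
  reward = rep(c("R1", "R2", "R3"), times = 2),
  framing_score = c(0.10, 0.15, 0.20, 0.12, 0.18, 0.25)
)
viz_data_one$lower <- viz_data_one$framing_score - 0.03
viz_data_one$upper <- viz_data_one$framing_score + 0.03

p <- ggplot(viz_data_one, aes(x = age_group, y = framing_score, fill = reward,
                              ymin = lower, ymax = upper)) +
  # geom_col() would be shorthand for geom_bar(stat = "identity")
  geom_bar(stat = "identity", color = "black",
           position = position_dodge(.6), width = .5) +
  # same dodge width as the bars, so each error bar sits on its own bar
  geom_errorbar(width = .1, position = position_dodge(.6))

p
```

Using the same position_dodge width in both layers is what keeps each error bar centered on its bar.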

Next come the details, starting with color. We need to specify the colors the variable is mapped to. Because it is the fill color we are changing, we use the scale_fill_manual function.

So how do you choose colors, or quickly find ones you like?

First install the colourpicker package. RStudio will then offer a color-selection addin called Plot Colour Helper, with which you can easily pick the colors you want and get their codes:

Click Plot Colour Helper to get the following interface (I only took one screenshot; you can choose any colors):

With it I selected the following 3 colors and wrote this code:

scale_fill_manual(values = c("#FAFAFA", "#FFA500", "#6CA6CD"))
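If you are not working in RStudio, a rough alternative for checking what these three hex codes look like is base R alone. This sketch is not part of the original workflow; it only converts the codes to RGB and draws swatches.

```r
# The three fill colors chosen in the article
fill_colors <- c("#FAFAFA", "#FFA500", "#6CA6CD")

# col2rgb() returns one column per color with rows red, green and blue
rgb_values <- grDevices::col2rgb(fill_colors)
colnames(rgb_values) <- fill_colors
rgb_values

# Draw quick swatches to preview the palette
graphics::barplot(rep(1, 3), col = fill_colors, names.arg = fill_colors,
                  axes = FALSE, border = "black")
```

The colors are an off-white, an orange, and a muted blue.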

Re-running the code now gives the following output:

Next we deal with the non-data elements of the figure, such as the background and the various labels. First we change the x and y axis labels and the legend title. This uses the labs function, with code as follows:

  labs(x = "Age Group", y = "Framing Score", fill = NULL, title = "Low Risk")

In the code above, x = "Age Group" and y = "Framing Score" are straightforward. Because our legend belongs to the fill aesthetic, fill in labs is the legend title; setting fill = NULL removes that title, and title adds a title for the whole figure.

We also want the axes to meet at 0, so that the bars sit directly on the x-axis. This is a setting on the y-axis and needs scale_y_continuous. The code is as follows:

  scale_y_continuous(expand = expansion(0), limits = c(0, 0.4), breaks = seq(0, .4, .1))

In the code above, expand controls the padding added beyond the lower and upper limits of the y-axis, and expansion(0) means no padding at all. limits sets the range of the y-axis (strictly speaking, the range of data values the scale will accept), and breaks sets the tick positions on the y-axis.
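To see what these three arguments evaluate to, you can run the pieces on their own. This is just an inspection sketch; the name y_scale is mine and does not appear in the article.

```r
library(ggplot2)

# expansion(0) asks for no padding below or above the axis limits
expansion(0)

# the tick positions produced by the breaks argument
seq(0, .4, .1)

# the scale object itself can be stored and added to a plot later with +
y_scale <- scale_y_continuous(expand = expansion(0), limits = c(0, 0.4),
                              breaks = seq(0, .4, .1))
y_scale$limits
```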

After running, the following figure is obtained:

Many non-data elements still need adjusting, which calls for the theme function. The target figure has a white background, so we use the panel.background argument to set the panel background. I also want the title of the figure to be centered and bold.

These are called non-data elements of the plot, and each has a corresponding element function, such as element_line (changes the line properties of the element) and element_text (changes the text of the element, including size, weight, and so on):

Each element is associated with an element function, which describes the visual properties of the element. For example, element_text() sets the font size, colour and face of text elements like plot.title.

Space is limited, so I cannot go into detail here. The book "ggplot2: Elegant Graphics for Data Analysis" is freely available online, and interested readers can learn from it. Our code here is as follows:

  theme(
    plot.margin = unit(c(1, 1, 1, 1), "cm"),
    panel.background = element_blank(),
    plot.title = element_text(size = 22, face = "bold",
                              hjust = 0.5,
                              margin = margin(b = 15)),
    axis.line = element_line(color = "black"),
    axis.title = element_text(size = 22, color = "black",
                              face = "bold"),
    axis.text = element_text(size = 22, color = "black"),
    axis.text.x = element_text(margin = margin(t = 10)),
    axis.text.y = element_text(size = 17),
    axis.title.y = element_text(margin = margin(r = 10)),
    axis.ticks.x = element_blank(),
    legend.position = c(0.20, 0.8),
    legend.background = element_rect(color = "black"),
    legend.text = element_text(size = 15),
    legend.margin = margin(t = 5, l = 5, r = 5, b = 5),
    legend.key = element_rect(color = NA, fill = NA)
  )

After running the code above, we get the following figure:

It is essentially the same format as the example on the APA website, so the example is complete!
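One caveat about the theme code, as far as I am aware: from ggplot2 3.5.0 onward, giving coordinates directly to legend.position still works but produces a deprecation warning, and the newer form is legend.position = "inside" together with legend.position.inside. A sketch that handles both cases follows; the data and the name legend_inside are made up for illustration.

```r
library(ggplot2)

# Made-up data, just enough to produce a legend.
demo <- data.frame(group = c("A", "A", "B", "B"),
                   reward = c("R1", "R2", "R1", "R2"),
                   score = c(0.10, 0.20, 0.15, 0.25))

# ggplot2 3.5.0 introduced legend.position.inside; older versions take the
# coordinates directly in legend.position, as in the article's theme code.
legend_inside <- if (utils::packageVersion("ggplot2") >= "3.5.0") {
  theme(legend.position = "inside", legend.position.inside = c(0.20, 0.8))
} else {
  theme(legend.position = c(0.20, 0.8))
}

p <- ggplot(demo, aes(x = group, y = score, fill = reward)) +
  geom_bar(stat = "identity", position = position_dodge(.6), width = .5) +
  legend_inside
p
```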

Use of ggsci

Next, let me introduce ggsci, a package of color palettes modeled on those of leading journals:

ggsci offers a collection of high-quality color palettes inspired by colors used in scientific journals, data visualization libraries, science fiction movies, and TV shows.

Put simply: once your figure is drawn, if you want a color scheme in the style of a particular journal, just use this package. It is simple to use and includes palettes for Nature Publishing Group, the American Association for the Advancement of Science, The New England Journal of Medicine, and other top journals such as JAMA and The Lancet.

Using it is also simple: after making the figure, just swap in the corresponding scale. For example, the figure I just made defines its colors with scale_fill_manual. If I change that line to scale_fill_aaas, I get the Science color scheme, as follows:

Science color scheme

Changing it to scale_fill_npg gives the Nature color scheme:

Nature color scheme

Changing it to scale_fill_nejm gives the New England Journal of Medicine color scheme:

NEJM color scheme
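Since the three figures are not reproduced here, this is a small sketch of the swap on made-up data. It assumes the ggsci package is installed; base_plot and the data values are illustrative only.

```r
library(ggplot2)
library(ggsci)

# Made-up data with three fill groups, as in the article's figure.
demo <- data.frame(
  age_group = rep(c("G1", "G2"), each = 3),
  reward = rep(c("R1", "R2", "R3"), times = 2),
  framing_score = c(0.10, 0.15, 0.20, 0.12, 0.18, 0.25)
)

base_plot <- ggplot(demo, aes(x = age_group, y = framing_score, fill = reward)) +
  geom_bar(stat = "identity", color = "black",
           position = position_dodge(.6), width = .5)

p_aaas <- base_plot + scale_fill_aaas()  # Science (AAAS) palette
p_npg  <- base_plot + scale_fill_npg()   # Nature Publishing Group palette
p_nejm <- base_plot + scale_fill_nejm()  # NEJM palette

# The pal_*() functions return the underlying colors, here the first three
pal_aaas()(3)
```

Because only the fill scale changes, the rest of the plot code stays exactly as it was.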

I have to say, the color palettes of top journals really are pretty good.

After reading this, does publishing in Science feel within reach? If so, please bookmark this article and share it. If not, please bookmark it anyway; you may come back to it later, hahaha.

Wishing you all the best.

Summary

Today, using one worked example, I went through some ggplot2 plotting ideas and color-scheme operations. If you do not yet have a good grasp of ggplot2's principles, this may look difficult; resources on the basics are easy to find online.

Thank you for reading patiently. I write my articles in detail, and the important code is in the text itself. Share this article to Moments and send me a private message with the words "data link" to get all the data and learning materials I have collected. If you find it useful, bookmark it first, then like and share.

Your opinions and suggestions are welcome. If there is a statistical method you want to learn about, leave a comment under the article, and I may write a tutorial on it. If you have questions, feel free to message me privately. If you are interested in collaborating, contact me directly.


Origin blog.csdn.net/tm_ggplot2/article/details/125711056