R language learning 3.2

DuangDuangDuangDuang

I met Mr. Han this morning and I was in a great mood ٩(๑>◡<๑)۶

The most commonly used R packages:
I now know two ways to download packages. I prefer to type install.packages() in the editor. Before using a package you need to call library(); once that has run, you can use the functions inside it. Note!!! R is case sensitive.
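The install-then-load routine can be sketched like this. It is only a minimal example: ggplot2 is used as the package name for illustration, and the install line is commented out because it needs an internet connection.

```r
# Install once (needs an internet connection), so it is left commented out here.
# install.packages("ggplot2")

# Load a package before using its functions; "stats" ships with R.
library(stats)

# R is case sensitive: x and X are two different objects.
x <- 1
X <- 2
case_differs <- !identical(x, X)
print(case_differs)
```

For the same reason, Library(stats) with a capital L fails, because the function is named library.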

Data visualization

ggplot2 is the most commonly used plotting tool. Its plots look good and can convey a lot of complex information.
lattice is comparable to ggplot2. Although ggplot2 plots look good, it is harder to get started with. lattice is more suitable for novices, draws relatively quickly, and can produce 3D plots.
gridExtra can combine several plots into one figure, which ggplot2 cannot do on its own.
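A small sketch of the difference in style, assuming the built-in mtcars data set (my choice, not from the notes above). lattice ships with standard R installations; the ggplot2 part only runs if that package happens to be installed.

```r
library(lattice)  # ships with standard R installations

# 2D scatter plot of fuel economy against weight
p2d <- xyplot(mpg ~ wt, data = mtcars,
              xlab = "Weight", ylab = "Miles per gallon")

# 3D point cloud, the kind of plot lattice can produce
p3d <- cloud(mpg ~ wt * hp, data = mtcars)

print(p2d)
print(p3d)

# The same scatter plot in ggplot2, run only if the package is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  g <- ggplot(mtcars, aes(x = wt, y = mpg)) + geom_point()
  print(g)
}
```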

Statistical Analysis

forecast is suitable for analyzing time series data.
spatstat is suitable for spatial analysis, such as the geographic distribution of the incidence of a certain virus.
zoo complements forecast. forecast has many useful built-in functions, but sometimes you need simpler functions for computing moving averages and moving standard deviations, and here zoo is a very good supplement.
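As a rough illustration of a moving average and a moving standard deviation, here is a sketch on a made-up vector. The base R version always runs; the zoo calls (rollmean and rollapply) run only if zoo is installed.

```r
x <- c(2, 4, 6, 8, 10, 12)  # made-up example series

# Base R: centred 3-point moving average (NA at both ends)
ma_base <- as.numeric(stats::filter(x, rep(1 / 3, 3), sides = 2))
print(ma_base)

# With zoo, if installed: rolling mean and rolling standard deviation
if (requireNamespace("zoo", quietly = TRUE)) {
  ma_zoo <- zoo::rollmean(x, k = 3)
  sd_zoo <- zoo::rollapply(x, width = 3, FUN = sd)
  print(ma_zoo)
  print(sd_zoo)
}
```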
lme4 or nlme: use these when you have complex data where the experiment is divided into several layers, with many subgroups and sub-subgroups under those subgroups. For example, several kinds of cells are prepared, several indicators are measured, and each indicator has several observation time points... Either of these two packages lets you organize such complex data easily. (In a drug clinical trial there is generally one primary research goal, and a lot of data is collected around this central question. The data can also yield meaningful conclusions that are not the main research goal. Such secondary studies are called subgroup studies, and the data involved are called subgroups.)
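A minimal sketch of a model with subgroups nested inside groups, using nlme (a recommended package that ships with standard R installations). The Oats data set and the formula are my own choice for illustration, not something from the notes above: varieties are nested inside blocks.

```r
library(nlme)  # recommended package, ships with standard R installations

# Oats: yield measured for several varieties nested inside blocks.
# Fixed effect: nitrogen level. Random intercepts: Block, and Variety within Block.
fit <- lme(yield ~ nitro, random = ~ 1 | Block/Variety, data = Oats)

print(fixef(fit))  # estimated fixed effects
```

The Block/Variety part of the formula is what expresses "subgroups under groups"; lme4 writes the same idea as (1 | Block/Variety) inside the model formula.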

Data collation

dplyr can split and combine data at will. This package is harder to get started with, but it pays off once you are familiar with it.
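To give an idea of what "split and combine" looks like, here is a sketch that computes a group mean on the built-in mtcars data (an assumed example). The base R line always runs; the dplyr version runs only if dplyr is installed.

```r
# Base R: mean mpg for each cylinder count
by_cyl_base <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(by_cyl_base)

# The same summary written with dplyr, if it is installed
if (requireNamespace("dplyr", quietly = TRUE)) {
  library(dplyr)
  by_cyl <- mtcars %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg))
  print(by_cyl)
}
```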

Bioinformatics

Bioconductor is a good choice for genome or microarray analysis, visualization, gene flow, etc. It has a very active user community with timely feedback, is updated twice a year, and offers a wealth of learning resources, e.g. the series of tutorials at http://www.bioconductor.org/help/course-materials/

Presentation

Although data visualization is also a kind of presentation, the packages below present other content in richer ways.
knitr: besides the analysis results, the code, the run process, and the text description of the analysis can all be presented as web pages, slides, PDFs, etc. generated by knitr. Suitable for teaching, reporting, and similar occasions.
Shiny can make beautiful web pages just like knitr, and it can also quickly build dynamic, interactive web apps. Its biggest advantage is that it relies purely on R, so you do not need to learn other languages (CSS, JS, etc.).
stringr is a powerful string-processing tool. It plays a very useful supporting role in plotting, web page editing, and data cleaning, with operations such as concatenating, matching, and replacing strings. (Data cleaning is the final procedure for finding and correcting identifiable errors in data files, including checking data consistency and handling invalid and missing values. Unlike questionnaire review, data cleaning after entry is generally done by computer rather than manually.)
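The three string operations mentioned (concatenating, matching, replacing) can be sketched as follows. The strings are made up; the base R lines always run, and the stringr equivalents run only if stringr is installed.

```r
first  <- "data"
second <- "frame"

# Base R
joined    <- paste0(first, ".", second)     # concatenate
has_frame <- grepl("frame", joined)         # match
swapped   <- sub("frame", "table", joined)  # replace
print(c(joined, swapped))

# stringr equivalents, if the package is installed
if (requireNamespace("stringr", quietly = TRUE)) {
  library(stringr)
  joined_s  <- str_c(first, ".", second)
  has_s     <- str_detect(joined, "frame")
  swapped_s <- str_replace(joined, "frame", "table")
}
```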

Summary

The above are some of the more commonly used packages I have found, especially ggplot2, lattice, and dplyr. When writing code, you will sometimes need a function that is not in any package you have installed, and you will not know which package it belongs to. In that case, search in help. You can search there for functions from unknown packages, although it takes solid English. In the results, entries have the form "package::function", and the string of letters before the :: is the package name. Download and install that package to use the function.
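A sketch of such a search from the console, assuming the search phrase "standard deviation" and restricting it to the stats package so that it runs quickly (both are my choices for illustration). Note that help.search() only looks through packages already installed on the machine.

```r
# Search the help pages; drop the package argument to search everything installed
res <- help.search("standard deviation", package = "stats")

# Each match records the package the help topic belongs to
print(head(res$matches))

# Once the package is known, a function can be called as package::function
s <- stats::sd(c(1, 2, 3))
print(s)
```

In an interactive session, ??"standard deviation" is a shorthand for the same search.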

Representing data in R: the data frame

A data frame is a matrix-like data structure, but each column can hold a different type of data. Restrictions on the components of a data frame:

  • The components must be vectors (numeric, character, logical), factors, numeric matrices, lists, or other data frames.
  • Matrices, lists, and data frames contribute as many variables to the new data frame as they have columns, elements, or variables, respectively.
  • Numeric vectors, logical vectors, and factors keep their original format. Character vectors were converted into factors by default before R 4.0.0, with the levels being the unique values appearing in the vector; since R 4.0.0 they stay as characters unless stringsAsFactors=TRUE is set.
  • Vectors used as variables in the data frame must all have the same length, and matrices must all have the same number of rows.
e.g.:

Construct a data frame from x1 and x2:
X=data.frame(x1,x2)
You can also assign new column labels to the data frame:
X=data.frame('height'=x1,'weight'=x2)
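A runnable version of this example, with made-up values for x1 and x2 (the notes above do not give any) and a hypothetical character column sex added to show the factor conversion:

```r
x1 <- c(171, 175, 159, 155)  # made-up heights
x2 <- c(57, 64, 41, 38)      # made-up weights

# Build the data frame with column labels
X <- data.frame(height = x1, weight = x2)
str(X)

# A data frame can be a component of a new data frame;
# stringsAsFactors = TRUE converts the character column to a factor
sex <- c("M", "M", "F", "F")  # hypothetical extra column
Y <- data.frame(X, sex = sex, stringsAsFactors = TRUE)
print(levels(Y$sex))
```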

Reading multivariate data into R

  • Select the data block you need, copy it, and in R use dat<-read.table("clipboard",header=T)
  • Read a txt file named data, with the first line used as the header: X=read.table('data.txt',header=T)
  • Read a file in CSV format: X=read.csv('data.csv')
  • To read an Excel file, first download the "readxl" package, which reads Excel files, load it with library(readxl), and then read the file: X=read_excel("data.xls")
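To try the txt and CSV calls without having data files at hand, the sketch below first writes a small made-up table to temporary files and then reads it back. The file contents and variable names are assumptions for illustration.

```r
# Made-up data written to temporary files so the example is self-contained
demo    <- data.frame(height = c(171, 175, 159), weight = c(57, 64, 41))
tmp_txt <- tempfile(fileext = ".txt")
tmp_csv <- tempfile(fileext = ".csv")
write.table(demo, tmp_txt, row.names = FALSE, quote = FALSE)
write.csv(demo, tmp_csv, row.names = FALSE)

# header = TRUE: the first line holds the column names
X_txt <- read.table(tmp_txt, header = TRUE)

# read.csv assumes a header line by default
X_csv <- read.csv(tmp_csv)

print(X_txt)
print(X_csv)
```

Writing TRUE in full instead of T is slightly safer, because T is an ordinary variable that can be overwritten.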

Simple analysis of multivariate data in R

  • Histogram: hist()
  • Scatter plot: plot()
  • Display the first few rows of the data: head(X)
  • Attach the data frame: attach(X)
  • One-way contingency table: table(name of the data column you want to tabulate)
  • Bar plot: barplot(table(), col=1:7), where col=1:7 gives 7 colors; set it according to the number of categories
  • Pie chart: pie(table())
  • Bar plot of one indicator grouped by another: barplot(table(indicator 1, indicator 2), beside=T, col=1:number of levels of indicator 1)
  • Three-way frequency contingency table of indicator 3 arranged by indicators 1 and 2: ftable(indicator 1, indicator 2, indicator 3). Note the order inside the parentheses.
  • Detach the data frame: detach(X). When the data frame is no longer in use, be sure to detach it.
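The calls in the list above can be tried on the built-in mtcars data. The choice of data set and columns is mine, for illustration only; the columns are referenced as X$column rather than through attach(X), so there is nothing to detach afterwards.

```r
X <- mtcars[, c("mpg", "wt", "cyl", "gear", "am")]

head(X)               # first few rows
hist(X$mpg)           # histogram
plot(X$wt, X$mpg)     # scatter plot

tab1 <- table(X$cyl)  # one-way contingency table
barplot(tab1, col = 1:3)
pie(tab1)

# Bar plot of gear grouped by cyl: one colour per cyl level
tab2 <- table(X$cyl, X$gear)
barplot(tab2, beside = TRUE, col = 1:3)

# Three-way table: am counted within each cyl/gear combination
tab3 <- ftable(X$cyl, X$gear, X$am)
print(tab3)
```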


Origin blog.csdn.net/m0_46445293/article/details/104610185