R Data Analysis: Principles and Practice of Network Analysis, with a Reproduction of a Published Study

Network analysis (also known as psychological networks or network psychometrics) is a good alternative method for studying complex problems and behaviors, especially when many factors interact with one another. The method has only become popular in the last few years, so interested readers may find it worth learning. A well-drawn network diagram also tends to catch reviewers' attention. Today I will give a brief introduction to network analysis.

For example, how do physiological, psychological, social and environmental factors affect obesity? The picture is very complicated. Multiple regression? Structural equation modeling? Both lack a system-level perspective. Network analysis is recommended instead: it captures the relationships among all the factors influencing obesity as a whole and helps identify the most important intervention targets.

The same holds for health, behavior, psychology, cognitive function and similar areas: whenever the object of study is a complex system. If you are short of a research topic, network analysis is worth considering as a methodological angle.

From a network perspective, health behaviours and outcomes can be conceptualised as emergent phenomena from a system of reciprocal interactions: network analysis offers a powerful methodological approach to investigate the complex patterns of such relationships.

Take, for example, this article on suicide research:

Bloch-Elkouby, S., Gorman, B., Schuck, A., Barzilay, S., Calati, R., Cohen, L. J., Begum, F., & Galynker, I. (2020). The suicide crisis syndrome: A network analysis. Journal of Counseling Psychology, 67(5), 595–607.

Using network analysis, the authors answer three questions. First, how are the different symptoms of the suicide crisis syndrome related to one another? Second, are any of these symptoms particularly important? Third, do the symptoms form clusters? Each question is valuable in its own right, and a single network analysis answers all three. Interested readers can download the article and read it.

Another important point is that network analysis can help identify intervention targets in complex systems, which is of great clinical value.

Network analyses allow for the computation of centrality indices that provide information about the symptoms that are the most connected to the other symptoms included in the network and whose potential causal contribution to the other symptoms may thus deserve further investigation.

If you work in the humanities and social sciences, network analysis is also an important technique that can help you put forward original theoretical hypotheses, because it models the functional relationships within complex human systems. In short, it can help take research from zero to one, and it is well worth learning.

Basics of Network Analysis

The simplest network consists of a few points, called nodes, connected by lines, called edges.

In a network diagram, nodes represent variables and edges represent the relationships between them. The edges have no arrows, and their weights are usually partial correlation coefficients. Positive coefficients are drawn as green edges and negative coefficients as red edges, and the thickness of an edge reflects the strength of the relationship: the thicker the edge, the stronger the relationship. Such a diagram presents the complex relationships among many variables at a glance.
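To make "partial correlation as edge weight" concrete, here is a minimal base-R sketch on simulated data (the variables x, y, z and the simulation are hypothetical). It computes partial correlations from the inverse of the correlation matrix using the standard identity rho_ij = -P_ij / sqrt(P_ii * P_jj), and checks one entry against the residual-based definition of a partial correlation.

```r
set.seed(1)
n <- 1000
# Hypothetical chain: x -> y -> z, so x and z are related only through y
x <- rnorm(n)
y <- 0.6 * x + rnorm(n)
z <- 0.6 * y + rnorm(n)
dat <- cbind(x = x, y = y, z = z)

# Partial correlations from the precision (inverse correlation) matrix
P <- solve(cor(dat))
pcor <- -P / sqrt(diag(P) %o% diag(P))
diag(pcor) <- 0  # a network has no self-loops
round(pcor, 2)

# The same x--z quantity, computed as the correlation of residuals after removing y
r_xz <- cor(resid(lm(x ~ y)), resid(lm(z ~ y)))
r_xz
```

The x--y and y--z entries come out clearly positive, while the x--z entry is close to zero: once y is accounted for, x and z have (almost) nothing left to share, so that edge would be thin or absent in the diagram.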

Psychological networks consist of nodes representing observed variables, connected by edges representing statistical relationships. This methodology has gained substantial footing and has been used in various different fields of psychology, such as clinical psychology, psychiatry, personality research, social psychology, and quality of life research.

The basic workflow is as follows:

First, fit a statistical model to the data and use the model coefficients as edge weights; then draw the graph; finally, evaluate the model.

Specifically, many statistical models can serve as the basis: correlations, covariances, partial correlations, regression coefficients, odds ratios, or factor loadings. Partial correlation coefficients are the most common choice of edge weight. For a given set of nodes, however, the unregularized network is very dense, because almost every pair of nodes receives some small nonzero weight. To improve the interpretability, generalizability and stability of the network, some form of regularization is applied, usually the LASSO, which removes edges that carry little information so that the network diagram is sparser and easier to interpret. Once the graph is formed, the model is evaluated. The evaluation mainly consists of two parts, edge stability analysis and centrality indices, which are briefly introduced below.
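To see what the LASSO does to a dense partial-correlation network, the sketch below compares unregularized partial correlations with the network returned by qgraph's EBICglasso() using gamma = 0.5 (the same tuning value that appears in the bootnet call later). The 8-item data set is simulated and purely illustrative, so exact edge counts will vary.

```r
library(qgraph)

set.seed(42)
n <- 300
latent <- matrix(rnorm(n * 2), n, 2)
# Hypothetical data: items V1-V4 driven by one latent factor, V5-V8 by another
items <- latent[, rep(1:2, each = 4)] + matrix(rnorm(n * 8), n, 8)
colnames(items) <- paste0("V", 1:8)

S <- cor(items)

# Unregularized partial correlations: essentially every pair gets a nonzero edge
P <- solve(S)
pcor <- -P / sqrt(diag(P) %o% diag(P))
diag(pcor) <- 0

# Graphical LASSO with EBIC model selection; returns a partial-correlation network
glasso_net <- EBICglasso(S, n = n, gamma = 0.5)

# Count edges in the upper triangle (each undirected edge once)
n_edges <- function(W) sum(W[upper.tri(W)] != 0)
c(unregularized = n_edges(pcor), EBICglasso = n_edges(glasso_net))
```

The regularized network typically keeps the within-factor edges and sets many of the weak between-factor edges to exactly zero, which is what makes the final diagram readable.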

Edge stability analysis

Network analysis is relatively complex, and network estimates are more sensitive to sampling variability than many other analyses, while the logic of empirical research is to use a sample to draw conclusions about the population. If your network is unstable, can you claim that the network you found, for obesity or suicide say, is credible? Therefore, after estimating the network we must report its robustness. The procedure is to draw repeated bootstrap samples, re-estimate the model on each one, and compute confidence intervals (e.g., 95% CIs) for the edge weights from the bootstrap results; the variation across these re-estimated models indicates how robust the model is. Edge stability analysis thus yields a confidence interval for the weight of every edge in the network: the narrower the intervals, the more stable the network. Sample code for edge stability analysis:

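library(bootnet)
# Resample rows 1000 times, re-estimating the EBICglasso network (EBIC gamma = 0.5) each time
resboot1 <- bootnet(Data, default = "EBICglasso", tuning = 0.5, corMethod = "cor_auto",
                    nBoots = 1000, nCores = 8, type = "nonparametric")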

In papers, these results are usually also reported as a figure.
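The bootstrap logic can also be sketched in base R without bootnet. This is a simplified illustration, not what bootnet does internally: it resamples rows, recomputes one unregularized partial correlation per resample, and takes the 2.5% and 97.5% percentiles as a 95% interval. The simulated data and the choice of the x--y edge are hypothetical.

```r
set.seed(123)
n <- 400
x <- rnorm(n)
y <- 0.5 * x + rnorm(n)
z <- 0.5 * y + rnorm(n)
dat <- cbind(x, y, z)

# Partial correlation between columns i and j, given all other columns
partial_cor <- function(d, i, j) {
  P <- solve(cor(d))
  -P[i, j] / sqrt(P[i, i] * P[j, j])
}

estimate <- partial_cor(dat, 1, 2)  # edge x--y in the full sample

n_boot <- 1000
boot_est <- replicate(n_boot, {
  rows <- sample.int(n, n, replace = TRUE)  # nonparametric bootstrap: resample cases
  partial_cor(dat[rows, ], 1, 2)
})

# Percentile 95% interval for the edge weight
ci <- quantile(boot_est, c(0.025, 0.975))
c(estimate = estimate, ci)
```

A narrow interval that stays well away from zero, as here, is what a stable edge looks like; bootnet repeats this for every edge of the regularized network at once.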

Centrality indices

In network analysis, nodes differ in importance. Are some nodes more important than others? Node importance is assessed with centrality indices, of which three are commonly used: strength, closeness, and betweenness. Their meanings are as follows:

strength, which shows how well a node is directly connected to other nodes; closeness, which shows how well a node is indirectly connected to other nodes; and betweenness, which quantifies the number of times a node acts as a bridge along the shortest path between two other nodes.

The intuition is simple. If a node has more and stronger connections to the other nodes, it is more important, and it tends to be placed near the center of the graph. If a node is close to the other nodes through short indirect paths, it is more likely to be affected by changes elsewhere in the network. If a node often acts as a bridge on the shortest path between pairs of other nodes, it matters for the structure of the network as a whole.
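The three definitions can be computed by hand on a toy network. The sketch below uses a hypothetical chain A - B - C - D with equal weights. It treats 1/|weight| as the length of an edge, which as far as I know mirrors the convention in qgraph's centrality functions, and its betweenness simply counts node pairs whose shortest path runs through a node, assuming shortest paths are unique (true for this chain). For real analyses, use qgraph's centrality functions rather than this toy.

```r
nodes <- c("A", "B", "C", "D")
W <- matrix(0, 4, 4, dimnames = list(nodes, nodes))
# Hypothetical chain network A - B - C - D with equal edge weights of 0.5
W["A", "B"] <- W["B", "C"] <- W["C", "D"] <- 0.5
W <- W + t(W)  # undirected network: make the matrix symmetric

# Strength: sum of absolute edge weights attached to each node
strength <- rowSums(abs(W))

# Shortest-path distances with edge length 1/|weight| (Floyd-Warshall algorithm)
D <- ifelse(W != 0, 1 / abs(W), Inf)
diag(D) <- 0
for (k in seq_along(nodes)) {
  D <- pmin(D, outer(D[, k], D[k, ], "+"))
}

# Closeness: inverse of the summed distances to all other nodes
closeness <- 1 / rowSums(D)

# Betweenness: number of pairs (from, to) whose shortest path passes through v
betweenness <- setNames(numeric(length(nodes)), nodes)
for (from in 1:3) for (to in (from + 1):4) for (v in setdiff(1:4, c(from, to))) {
  if (abs(D[from, v] + D[v, to] - D[from, to]) < 1e-9) {
    betweenness[v] <- betweenness[v] + 1
  }
}

rbind(strength, closeness, betweenness)
```

B and C come out ahead on all three indices: each has two edges, is closer to everything else, and lies on the path between the two end nodes.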

It is usually sufficient to report only strength, because closeness and betweenness tend to be much less stable.
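One way to check the stability of centrality indices for your own data is bootnet's case-dropping bootstrap followed by corStability(), sketched below on a hypothetical simulated 10-item data set. nBoots = 100 only keeps the demo quick. A guideline from the bootnet authors is that the resulting CS-coefficient should not fall below 0.25 and preferably exceed 0.5; closeness and betweenness often fail this when strength passes.

```r
library(bootnet)

set.seed(7)
n <- 300
latent <- matrix(rnorm(n * 2), n, 2)
# Hypothetical data: 10 items loading on two latent factors
items <- as.data.frame(latent[, rep(1:2, each = 5)] + matrix(rnorm(n * 10), n, 10))

net <- estimateNetwork(items, default = "EBICglasso")

# Case-dropping bootstrap: re-estimate the network on subsamples with growing shares
# of cases removed; closeness and betweenness must be requested explicitly
case_boot <- bootnet(net, nBoots = 100, type = "case",
                     statistics = c("strength", "closeness", "betweenness"))

# CS-coefficient per index: the largest proportion of cases that can be dropped while
# the correlation with the full-sample centrality stays at or above 0.7
cs <- corStability(case_boot)
cs
```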

Practice

Today we reproduce a study published in 2022 in the American Journal of Public Health Research, titled "Partial Relationships between Health and Fitness Measures in Adults: A Network Analysis".

In the results, the authors report the correlation matrix of the variables and the edge weights of the network model in a table.

The network graph and the centrality indices of the model are presented as figures.

There is also a figure of the bootstrapped edge-weight estimates.

Let's now see how to reproduce the figures of this paper with our own data.

Suppose I have the following data: 2800 observations of 26 variables, where the last variable is gender and the remaining 25 variables are the items of 5 scales with 5 items each.

The first step is to fit the network model. The core function for this is estimateNetwork() from the bootnet package. Usually only the data and the default argument need to be set. To simplify the network with the LASSO, set default = "EBICglasso"; passing the fitted network object to plot() then draws the network diagram.

For example, to fit a network model to the male observations only, I can write the code as follows:

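library(dplyr)
library(bootnet)
# Fit an EBICglasso network to the male respondents only (gender column dropped)
network_male <- estimateNetwork(df %>%
                                  filter(gender == "Male") %>%
                                  select(-gender),
                                default = "EBICglasso",
                                corMethod = "spearman")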

After running this, pass the model object directly to plot() to draw the graph.
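A self-contained sketch of this step is below. Because the original data are not available, it simulates a hypothetical stand-in with the same shape (5 scales x 5 items, named with the letters A, C, E, N, O); with your own data, plot(network_male) is all you need.

```r
library(bootnet)

set.seed(2022)
n <- 500
latent <- matrix(rnorm(n * 5), n, 5)
# Hypothetical stand-in for the 25-item data: 5 scales x 5 items
items <- as.data.frame(latent[, rep(1:5, each = 5)] + matrix(rnorm(n * 25), n, 25))
colnames(items) <- paste0(rep(c("A", "C", "E", "N", "O"), each = 5), 1:5)

network <- estimateNetwork(items, default = "EBICglasso")

# plot() draws the estimated network via qgraph; the spring layout
# pulls strongly connected nodes close together
plot(network, layout = "spring")
```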

That is essentially it, but the figure is still a bit rough for publication. The letters in the variable names indicate which scale each item belongs to, so a better approach is to group the items of each scale together and add a legend, making the structure clear at a glance. So next we make some adjustments to the graph.

For example, to adjust the overall layout of the nodes and add a legend for each scale, I can add the groups argument to indicate which scale each node belongs to.
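A sketch of the groups argument, again on hypothetical simulated data: groups is passed through to qgraph and here is a named list giving the column indices of each scale, so the list names appear in the legend. The scale names are assumptions for illustration only.

```r
library(bootnet)

set.seed(2022)
n <- 500
latent <- matrix(rnorm(n * 5), n, 5)
# Hypothetical stand-in for the 25-item data: 5 scales x 5 items
items <- as.data.frame(latent[, rep(1:5, each = 5)] + matrix(rnorm(n * 25), n, 25))
colnames(items) <- paste0(rep(c("A", "C", "E", "N", "O"), each = 5), 1:5)

network <- estimateNetwork(items, default = "EBICglasso")

# Hypothetical scale names; each element lists the columns belonging to one scale
scales <- list(Agreeableness = 1:5, Conscientiousness = 6:10, Extraversion = 11:15,
               Neuroticism = 16:20, Openness = 21:25)

# Nodes are colored by scale and a legend is added; legend.cex sets the legend text size
plot(network, layout = "spring", groups = scales, legend.cex = 0.5)
```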

This looks much better. If you need a more detailed legend, for example one that explains what each node means, the nodeNames argument adds a description of each node to the legend.
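A sketch combining groups with nodeNames on the same kind of hypothetical simulated data. nodeNames takes one description per node, in column order; the descriptions below are placeholders, so replace them with your actual item wording.

```r
library(bootnet)

set.seed(2022)
n <- 500
latent <- matrix(rnorm(n * 5), n, 5)
# Hypothetical stand-in for the 25-item data: 5 scales x 5 items
items <- as.data.frame(latent[, rep(1:5, each = 5)] + matrix(rnorm(n * 25), n, 25))
colnames(items) <- paste0(rep(c("A", "C", "E", "N", "O"), each = 5), 1:5)

network <- estimateNetwork(items, default = "EBICglasso")

scales <- list(Agreeableness = 1:5, Conscientiousness = 6:10, Extraversion = 11:15,
               Neuroticism = 16:20, Openness = 21:25)

# Placeholder descriptions, one per node, e.g. "Agreeableness item 1"
item_text <- paste(rep(names(scales), each = 5), "item", 1:5)

# Short labels stay on the nodes; the long descriptions go into the legend
plot(network, layout = "spring", groups = scales, nodeNames = item_text,
     legend.cex = 0.4)
```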

That's even better.

Note, however, that the study in the American Journal of Public Health Research adds labels to the edges. To achieve this effect, we only need to set edge.labels = TRUE; with this many nodes, though, the labels become cluttered and the result is not very good.
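A sketch of edge labels on a smaller hypothetical network (two simulated scales, 10 nodes), where the printed weights remain legible. edge.labels and edge.label.cex are qgraph arguments passed through plot().

```r
library(bootnet)

set.seed(11)
n <- 400
latent <- matrix(rnorm(n * 2), n, 2)
# Hypothetical data: two scales with 5 items each
items <- as.data.frame(latent[, rep(1:2, each = 5)] + matrix(rnorm(n * 10), n, 10))
colnames(items) <- paste0(rep(c("A", "N"), each = 5), 1:5)

network <- estimateNetwork(items, default = "EBICglasso")

# Print each edge weight on its edge; edge.label.cex scales the label size
plot(network, layout = "spring", edge.labels = TRUE, edge.label.cex = 0.8)
```

With fewer nodes the labels stay readable; for the full 25-item network, reporting the weights in a table, as the original paper does, is usually clearer.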

Now for the authors' other two figures. The first shows the centrality indices; passing the model object to centralityPlot() from the qgraph package produces it.
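A sketch on hypothetical simulated data. The include argument selects which indices to draw, and scale = "z-scores" standardizes them; centralityTable() returns the same numbers as a long data frame, which can be handy for a table in the paper.

```r
library(bootnet)
library(qgraph)

set.seed(2022)
n <- 500
latent <- matrix(rnorm(n * 5), n, 5)
# Hypothetical stand-in for the 25-item data: 5 scales x 5 items
items <- as.data.frame(latent[, rep(1:5, each = 5)] + matrix(rnorm(n * 25), n, 25))
colnames(items) <- paste0(rep(c("A", "C", "E", "N", "O"), each = 5), 1:5)

network <- estimateNetwork(items, default = "EBICglasso")

# Plot strength, closeness and betweenness for every node as z-scores
centralityPlot(network, include = c("Strength", "Closeness", "Betweenness"),
               scale = "z-scores")

# The underlying values in long format
cent <- centralityTable(network)
head(cent)
```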

The second shows the bootstrapped edge-weight estimates: pass the model object to bootnet() and then plot the result.
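A sketch on hypothetical simulated data: bootnet() takes the fitted network and reuses its estimation settings, and plot() on the result draws the bootstrapped confidence intervals of the edge weights. nBoots = 100 only keeps the demo quick; the edge stability code earlier in this post uses nBoots = 1000.

```r
library(bootnet)

set.seed(5)
n <- 300
latent <- matrix(rnorm(n * 2), n, 2)
# Hypothetical data: two scales with 5 items each
items <- as.data.frame(latent[, rep(1:2, each = 5)] + matrix(rnorm(n * 10), n, 10))

network <- estimateNetwork(items, default = "EBICglasso")

# Nonparametric bootstrap: resample cases and re-estimate the network each time
boot_edges <- bootnet(network, nBoots = 100, type = "nonparametric")

# Bootstrapped edge-weight CIs, with edges ordered by their sample estimate
plot(boot_edges, labels = FALSE, order = "sample")
```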

Our figures arguably look even better than those in the original article.

With that, all of the results in the original article have been reproduced.

Covariate Control for Network Analysis

The input to a network analysis is essentially a correlation matrix. To control for covariates such as age, gender or ethnicity, a feasible approach is to regress each variable on the covariates, take the residuals, and use the correlation matrix of the residuals as the model input. The same idea works for structural equation models: for example, if you want to control for covariates in a cross-lagged model, you can use this method.
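The residual approach can be sketched in base R as follows. The covariates (age, gender) and the three outcome variables are simulated and hypothetical; in the simulation the variables are correlated only because they all depend on age.

```r
set.seed(99)
n <- 500
# Hypothetical covariates and three variables that all depend on age
age <- rnorm(n, mean = 40, sd = 10)
gender <- factor(sample(c("Male", "Female"), n, replace = TRUE))
vars <- sapply(1:3, function(k) 0.05 * age + rnorm(n))
colnames(vars) <- c("v1", "v2", "v3")

# Regress each variable on the covariates and keep the residuals
resid_mat <- apply(vars, 2, function(v) resid(lm(v ~ age + gender)))

# Correlations before and after removing the covariate effects
round(cor(vars), 2)
round(cor(resid_mat), 2)
```

The raw variables are positively correlated through their shared dependence on age, while the residual correlations are close to zero. resid_mat (or its correlation matrix) can then be used as the input to the network estimation in place of the raw data.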

Summary

Today I covered the practice of network analysis. There is one more topic: network comparison, which asks whether several networks differ from one another, or whether two particular edges in the same network differ. I will write about this later.


Source: blog.csdn.net/tm_ggplot2/article/details/127759414