Nature Communications: Are scale-free networks really universal? This paper uses state-of-the-art statistical methods to find out

Scale-free networks are rare

The rarity of strong scale-free networks in real networks

Authors:

Anna D. Broido, Aaron Clauset (University of Colorado Boulder)

010201.png

Preface

Real-world networks are often claimed to be scale-free, meaning that the fraction of nodes with degree k follows a power law k^-α. This pattern has broad implications for the structure and dynamics of complex systems. However, the universality of scale-free networks remains controversial.

In this paper, scale-free structure is defined at several levels of strength, these definitions are applied to nearly 1,000 social, biological, technological, transportation, and information networks, and the empirical prevalence of such structure is rigorously tested. The paper finds that strongly scale-free structure is empirically rare, and that for most networks a log-normal distribution fits the data as well as a power law, and sometimes better. Moreover, social networks are at best weakly scale-free, while only a handful of technological and biological networks are strongly scale-free.

Background

Networks are a powerful way to represent and study the structure of complex systems. Social interactions between individuals, interactions among proteins or genes in biological organisms, communication between digital computers, and various transportation systems are all examples of systems studied as networks.

Across the sciences, and in network science in particular, it is common to encounter the claim that most or all real-world networks are scale-free. Generally, a network is considered scale-free if its node degree k follows a power-law distribution k^-α with α>1. Some stricter versions add further requirements, such as 2<α<3 or that node degrees evolve through a preferential attachment mechanism. Scale-free networks are studied and applied widely in network science: many studies have investigated how scale-free structure affects dynamical processes running on a network, and scale-free networks are also widely used as the substrate for network-based numerical simulations and experiments.
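For reference, the power-law form above can be written as a normalized distribution over integer degrees. The lower cutoff k_min and the Hurwitz zeta normalization are not stated in this summary; they follow the standard discrete power-law form and are included here as assumptions for illustration:

```latex
\Pr(k) = \frac{k^{-\alpha}}{\zeta(\alpha, k_{\min})}, \qquad k \ge k_{\min}, \qquad
\zeta(\alpha, k_{\min}) = \sum_{i=0}^{\infty} (i + k_{\min})^{-\alpha}
```

The sum converges only for α > 1, which is why that condition appears in the basic definition.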

However, the universality of scale-free networks is still controversial. Many studies support it, but others challenge it on statistical or theoretical grounds. These conflicting views persist for several reasons: previous work typically relied on small, domain-specific data sets; it used less rigorous statistical methods; it adopted differing definitions of "scale-free" structure; and it was unclear what standard of evidence should establish that a network is scale-free. In addition, few studies have rigorously compared fitted power-law distributions with alternative non-scale-free distributions, such as the log-normal or the stretched exponential.

To resolve these conflicts, this paper rigorously tests the universality of scale-free networks by applying state-of-the-art statistical methods to a large and diverse corpus of real networks. To cover the variations in how previous studies defined scale-free networks, it formalizes a set of quantitative criteria that represent different strengths of evidence for scale-free structure in a given network. For each network data set in the corpus, we estimate the best-fitting power-law model, test its statistical plausibility, and compare it with alternative non-scale-free distributions. We then analyze these results, examine how the evidence for scale-free structure varies across domains, and quantitatively evaluate its robustness under several alternative criteria. Finally, we offer suggestions for future research and for the development of theories of network structure.

Experiment

• Preparation

The corpus used in this paper consists of 928 network data sets drawn from the Index of Complex Networks (ICON). They cover biological, information, social, technological, and transportation networks and range in size from hundreds to millions of nodes.

The figure below shows the mean degree of each data set as a function of its number of nodes n. To determine which degree distributions to analyze, the paper first applies a series of graph transformations that convert a given network data set into a set of simple graphs, each of which can be tested unambiguously for scale-free structure. Simple graphs that are too dense or too sparse, according to pre-specified thresholds, are discarded. Standard statistical methods are then applied to each simple graph to identify the best-fitting power law in the upper tail of its degree distribution, and a goodness-of-fit test evaluates the statistical plausibility of that fit. Finally, four alternative distributions are fitted to the same part of the upper tail and compared with the power law using likelihood-ratio tests.
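The tail-fitting step can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the common approximate maximum-likelihood estimator α ≈ 1 + n / Σ ln(k / (k_min − 1/2)) for integer data and chooses k_min by minimizing the Kolmogorov-Smirnov distance between the empirical and fitted tails. The function names and the `min_tail` parameter are hypothetical.

```python
import math
from collections import Counter


def fit_alpha(degrees, k_min):
    """Approximate MLE of alpha on the tail k >= k_min; returns (alpha, n_tail)."""
    tail = [k for k in degrees if k >= k_min]
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    return 1.0 + len(tail) / log_sum, len(tail)


def ks_distance(degrees, k_min, alpha):
    """Largest gap between the empirical and model CDFs on the tail."""
    tail = [k for k in degrees if k >= k_min]
    counts = Counter(tail)
    seen, worst = 0, 0.0
    for k in sorted(counts):
        seen += counts[k]
        empirical = seen / len(tail)
        # Model CDF under the same rounding approximation as fit_alpha.
        model = 1.0 - ((k + 0.5) / (k_min - 0.5)) ** (1.0 - alpha)
        worst = max(worst, abs(empirical - model))
    return worst


def fit_power_law_tail(degrees, min_tail=10):
    """Scan candidate k_min values and keep the fit with the smallest KS distance."""
    best = None
    for k_min in sorted(set(k for k in degrees if k >= 1)):
        alpha, n_tail = fit_alpha(degrees, k_min)
        if n_tail < min_tail:
            break  # tails only shrink as k_min grows
        ks = ks_distance(degrees, k_min, alpha)
        if best is None or ks < best["ks"]:
            best = {"k_min": k_min, "alpha": alpha, "n_tail": n_tail, "ks": ks}
    return best
```

Calling `fit_power_law_tail(degree_sequence)` returns a dictionary with the selected `k_min`, the estimated `alpha`, the tail size, and the KS distance, or `None` if no tail is large enough. The approximation is known to be less accurate for very small k_min, so an exact discrete likelihood would be preferable in serious use.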

010202.png

• Definition of scale-free network

Evidence that a degree distribution is scale-free usually takes one of two forms:

(i) The power-law distribution is not necessarily a good model of the degree distribution, but it is relatively better than the alternative distributions;

(ii) The power-law distribution is itself a good model of the degree distribution.

The first form of evidence defines one category:

1. Super-Weak: for at least 50% of the graphs, no alternative distribution is favored over the power law.

The second form defines the following four categories:

1. Weakest: for at least 50% of the graphs, the power law cannot be rejected (p ≥ 0.1);

2. Weak: the requirements of Weakest are met, and the power-law region contains at least 50 nodes;

3. Strong: the requirements of Weak and Super-Weak are met, and 2 < α̂ < 3 for at least 50% of the graphs;

4. Strongest: the requirements of Strong are met for at least 90% of the graphs, and the requirement of Super-Weak is met for at least 95% of the graphs.

Networks that fall into none of these categories are labeled:

1. Not Scale-Free: networks that are neither Super-Weak nor Weakest.
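The category definitions can be sketched as a small classifier over the per-graph fit results of one data set. This is only an illustration under stated assumptions: the `GraphFit` record and its field names are hypothetical, each percentage threshold is applied to its criterion independently across graphs, and Strong is taken to require the Super-Weak condition in addition to Weak. The paper's own code may combine the conditions differently.

```python
from dataclasses import dataclass


@dataclass
class GraphFit:
    """Per-graph fit summary (hypothetical field names)."""
    p_value: float             # goodness-of-fit p-value of the power law
    n_tail: int                # number of nodes in the fitted power-law region
    alpha: float               # estimated scaling exponent
    alternative_favored: bool  # True if any alternative beats the power law


def classify(fits):
    """Return the set of category labels for one data set's simple graphs."""
    def frac(pred):
        return sum(1 for f in fits if pred(f)) / len(fits)

    no_alt = frac(lambda f: not f.alternative_favored)
    plausible = frac(lambda f: f.p_value >= 0.1)
    weak = frac(lambda f: f.p_value >= 0.1 and f.n_tail >= 50)
    in_range = frac(lambda f: 2.0 < f.alpha < 3.0)

    labels = set()
    if no_alt >= 0.5:
        labels.add("Super-Weak")
    if plausible >= 0.5:
        labels.add("Weakest")
    if weak >= 0.5:
        labels.add("Weak")
    # Assumption: Strong needs the Weak and Super-Weak conditions plus the alpha range.
    if weak >= 0.5 and no_alt >= 0.5 and in_range >= 0.5:
        labels.add("Strong")
    if weak >= 0.9 and in_range >= 0.9 and no_alt >= 0.95:
        labels.add("Strongest")
    if not labels:
        labels.add("Not Scale-Free")
    return labels
```

A set is returned rather than a single label because Super-Weak and Weakest rest on different kinds of evidence and are not nested, so one data set can carry both.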

The following figure shows the scale-free division described above:

010203.png

• Scaling parameters

Across the whole corpus, the distribution of the median estimated scaling parameter α̂ is concentrated around α̂ = 2 but has a long right tail, so that 32% of the data sets have α̂ ≥ 3.

As the figure below shows, α ∈ (2, 3) is the range most commonly associated with scale-free networks. The distribution of median α̂ differs considerably across the five scale-free categories. For networks in the Super-Weak category, the distribution is about as broad as the overall distribution: the right tail is long and many networks have α̂ ≥ 3, which indicates that they are not particularly plausible scale-free networks. In the Weakest and Weak categories the median α̂ is still broadly distributed, but in the Strong and Strongest categories it is essentially concentrated in α̂ ∈ (2, 3).

010204.jpg

• Alternative distributions

This paper compares the power-law distribution with four alternative distributions using likelihood-ratio tests. The results are shown in the table below. The exponential distribution has a thin tail and relatively low variance, yet it is favored over the power law for 41% of the data sets, whereas the power law is favored for only 33%. This result is consistent with the broad distribution of scaling parameters, because a degree distribution with α ≥ 3 must have a relatively thin tail. The log-normal distribution is broad and heavy-tailed but still not scale-free. The table shows that it is favored (48%) more than three times as often as the power law (12%), with the comparison inconclusive for a large share of data sets (40%). In other words, the log-normal fits at least as well as the power law for the large majority of degree distributions (88%), which suggests that many networks previously identified as scale-free may in fact have log-normal degree distributions. The Weibull, or stretched exponential, distribution can produce either heavy or thin tails, and its results resemble those for the exponential distribution. Finally, for the power law with exponential cutoff, most networks (56%) favor the cutoff model, which indicates that finite-size effects are very common.
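A comparison of this kind can be sketched with a normalized log-likelihood ratio test in the style of Vuong's test for non-nested models. The sketch below is illustrative only: it treats the tail as continuous data, compares a power law with a shifted exponential, and uses a 0.1 significance threshold. The continuous approximation, the function names, and the threshold are assumptions rather than details taken from this summary.

```python
import math


def loglik_power_law(tail, x_min):
    """Pointwise log-likelihoods under a continuous power law fitted by MLE."""
    alpha = 1.0 + len(tail) / sum(math.log(x / x_min) for x in tail)
    return [math.log((alpha - 1.0) / x_min) - alpha * math.log(x / x_min) for x in tail]


def loglik_exponential(tail, x_min):
    """Pointwise log-likelihoods under a shifted exponential fitted by MLE."""
    rate = 1.0 / (sum(tail) / len(tail) - x_min)
    return [math.log(rate) - rate * (x - x_min) for x in tail]


def likelihood_ratio_test(ll_a, ll_b):
    """Return (R, p): R > 0 favors model A, R < 0 favors model B."""
    n = len(ll_a)
    diffs = [a - b for a, b in zip(ll_a, ll_b)]
    r = sum(diffs)
    mean = r / n
    sigma = math.sqrt(sum((d - mean) ** 2 for d in diffs) / n)
    # Two-sided p-value for the null hypothesis that the expected ratio is zero.
    p = math.erfc(abs(r) / (math.sqrt(2.0 * n) * sigma))
    return r, p


def verdict(r, p, threshold=0.1):
    """Map (R, p) to one of three outcomes; the threshold is an assumed default."""
    if p > threshold:
        return "inconclusive"
    return "power law" if r > 0 else "alternative"
```

The three outcomes correspond to the three columns reported for each alternative: power law favored, inconclusive, and alternative favored. A nested alternative such as the power law with exponential cutoff would need a different p-value calculation, which this sketch does not cover.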

010205.png

• Evaluating the scale-free hypothesis

Given the results of fitting, testing, and comparing power-law distributions across the corpus, we now classify each network data set into the six categories defined above. As the figure below shows, 49% of the networks are Not Scale-Free, 46% are Super-Weak, and only 10% and 4% of the network data sets fall into the Strong and Strongest categories, respectively. These results indicate that genuinely scale-free structure may be less common than previous work suggested, and that scale-free structure may not be an empirically universal pattern.

010206.png

These proportions vary across domains, as the figure below shows. The analysis focuses on networks from three domains: biological, social, and technological. Among biological networks, 63% are Not Scale-Free. Fungal networks make up a large part of this group, which also includes some protein interaction networks and some food webs. Among the remaining networks, 6% fall into the Strongest category, and these are mainly metabolic networks. Social networks behave differently: none of them fall into the Strong or Strongest categories, so social networks are at best weakly scale-free. Among technological networks, 90% fall into the Super-Weak category and 28% into the Strong category.

010207.png

• Robustness analysis

To assess how much these results depend on the evaluation scheme itself, we conducted a series of robustness tests:

(i) We consider only naturally simple data sets (unweighted, undirected, without multi-edges, and single-layer);

(ii) We remove the power law with exponential cutoff from the set of alternative distributions;

(iii) We lower the percentage thresholds of all categories so that a data set qualifies if any one of its simple graphs meets the requirements;

(iv) We analyze the scaling behavior of the ratio of the second moment to the squared first moment of the degree distribution.

The third test result is shown in the figure below:

010208.png

The figure shows the effect of this most permissive parameterization, in which the threshold of every category is lowered. Under this modification, the Strong and Strongest categories both account for 18% of the data sets. This indicates that the percentage requirements in the category definitions of the main evaluation scheme are not overly strict, and that our conclusions are robust to changes in these thresholds. The fourth test provides a model-independent assessment of a key prediction of the scale-free hypothesis. For a power-law degree distribution, the moment <k^m> is finite only for m<α-1 and all higher-order moments diverge, so for α ∈ (2, 3) the moment ratio <k^2>/<k>^2 diverges as the network size n increases.
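The moment ratio itself is simple to compute from a degree sequence. The helper below is a minimal sketch with a hypothetical function name:

```python
def moment_ratio(degrees):
    """Return <k^2> / <k>^2 for a degree sequence."""
    n = len(degrees)
    mean = sum(degrees) / n
    mean_sq = sum(k * k for k in degrees) / n
    return mean_sq / (mean * mean)
```

A regular graph, where every node has the same degree, gives a ratio of exactly 1, while a star graph with one hub and n-1 leaves gives n^2/(4(n-1)), which grows with n. For example, `moment_ratio([1, 1, 1, 1, 4])` evaluates to 1.5625. Plotting this ratio against n across many networks is one way to check whether it grows with network size, as the scale-free hypothesis predicts for α ∈ (2, 3).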

The result of the fourth test is shown in the figure below:

010209.jpg

As the figure shows, the ratios vary enormously across networks, domains, and scales. For example, among networks with 10^2≦n≦10^3 nodes, the ratios often differ by several orders of magnitude.

Discussion

By bringing statistical and classification ideas to the evaluation of the scale-free hypothesis, this paper provides a quantitative and rigorous scheme for assessing how strongly a given network exhibits scale-free structure.

By evaluating the degree distributions of nearly 1,000 real-world networks from a range of domains, we found that scale-free networks are not ubiquitous. Fewer than 36 networks (approximately 4%) show the strongest form of scale-free structure. For 88% of the networks, the log-normal distribution fits the degree distribution as well as or better than the power law. The proportion of scale-free structure also differs across domains, and these differences hint at where scale-free structure may actually occur.

Social networks, by contrast, are at best weakly scale-free: although a power-law distribution is a statistically plausible model for these networks, it is often not the best model. Note also that the statistical evaluation in this paper considers only the degree distribution of a network, and says relatively little about other structural patterns or about the underlying processes that shape any particular network.

The structural diversity of real networks revealed in this paper is both a problem and an opportunity. Because previous work focused so heavily on explaining and exploiting scale-free patterns, relatively little is known about the mechanisms that produce non-scale-free structural patterns. Developing novel mechanisms that generate more realistic network structure is therefore a major direction for future work.

(All figures in this article are screenshots from the paper.)


Author | Wang Jianjia (Shanghai University)

Typography | Academic Spinach

Proofreading | Felan

Responsible Editor | Academic slag and excellent academic



Origin blog.csdn.net/AMiner2006/article/details/103805207