Classic Paper | Lotka's Law: Frequency Distribution of Scientific Productivity

Frequency Distribution of Scientific Productivity

"The Frequency Distribution of Scientific Productivity" is a classic paper published by Alfred J. Lotka in 1926. The paper first revealed how unevenly productivity is distributed among scientists. It proposed what became the famous "Lotka's law" and gave that law a quantitative description.

In the paper, Lotka found that over a given period only a few scientists produced a large number of papers or results, while most scientists produced only one or a few. This pattern takes the form of a power law: the number of authors who produce $n$ papers is roughly proportional to $1/n^2$. Each step up in output is therefore reached by sharply fewer authors.

Specifically, Lotka's law states two things about authors who make different numbers of contributions. The number who make two is approximately $\dfrac{1}{4}$ of the number who make one, and the number who make three is approximately $\dfrac{1}{9}$, and so on.

The main contribution of this paper is that it revealed the imbalance in scientists' productivity and proposed a quantitative mathematical model to describe it. This matters for understanding scientists' research activity, the composition of research teams and the distribution of scientific output. The paper became one of the foundations of scientometrics and research evaluation.

  • Source: Journal of the Washington Academy of Sciences, Vol. 16, No. 12 (June 19, 1926), p. 317-3
  • Original link: https://www.jstor.org/stable/24529203

Full Text

It would be of interest to determine, if possible, the part which men of different calibre contribute to the progress of science.

Considering first simple volume of production, a count was made of the number of names, in the decennial index of Chemical Abstracts 1907-1916, against which appeared 1, 2, 3 . . . . entries. Names of firms (e.g. Aktiengesellschaft, etc.) were omitted from reckoning, since they represent the output, not of a single individual, but of an unknown number of persons. The letters A and B of the alphabet only were covered. These were treated both separately and in the aggregate, with the results shown in the table and in figures 1 and 2 below.

A similar process was also applied to the name index of Auerbach’s Geschichtstafeln der Physik (J. A. Barth, Leipzig, 1910) which cover the entire range of history up to and including the year 1900. In this case we obtain a measure not merely of volume of productivity, but account is taken, in some degree, also of quality, since only the outstanding contributions find a place in this little volume, with its 110 pages of tabular text. The figures and relations thus obtained are shown in the table and in figures 1 and 2.
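The counting step described above can be sketched in Python. This is a minimal illustration, not Lotka's actual procedure. It assumes the index is available as a list of author names with one item per index entry, which is a hypothetical input format. It also uses a crude, hypothetical keyword test to drop firm names.

```python
from collections import Counter

# Hypothetical markers for recognising firm names; Lotka excluded firms
# because their output belongs to an unknown number of people.
FIRM_MARKERS = ("Aktiengesellschaft", "G.m.b.H.", "& Co.")


def productivity_distribution(index_entries, initials=("A", "B")):
    """Return {n: number of authors with exactly n index entries}.

    index_entries: iterable of author names, one item per entry
    (hypothetical format). Only names whose first letter is in
    `initials` are kept, mirroring the restriction to letters A and B.
    """
    entries_per_author = Counter(
        name
        for name in index_entries
        if name[:1] in initials and not any(m in name for m in FIRM_MARKERS)
    )
    # Second pass: how many authors have 1, 2, 3, ... entries.
    return dict(sorted(Counter(entries_per_author.values()).items()))


# Toy example with invented names (not data from the paper).
sample = ["Abel", "Abel", "Baker", "Cole", "Bauer", "Bauer", "Bauer",
          "Badische Aktiengesellschaft", "Adams"]
print(productivity_distribution(sample))  # {1: 2, 2: 1, 3: 1}
```

Counting twice (entries per author, then authors per entry count) gives the frequency distribution that Table 1 tabulates.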

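Table 1. Frequency distribution of scientific productivity

| n (contributions per author) | Chem. Abstr. A: authors | Chem. Abstr. B: authors | Chem. Abstr. A+B: authors | Auerbach (all letters): authors | Chem. Abstr. A: % | Chem. Abstr. B: % | Chem. Abstr. A+B: % observed | Chem. Abstr. A+B: % calculated[1] | Auerbach: % observed | Auerbach: % calculated[2] |
|---|---|---|---|---|---|---|---|---|---|---|
| Total | 1543 | 5348 | 6891 | 1325 | | | | | | |
| 1 | 890 | 3101 | 3991 | 784 | 57.68 | 57.98 | 57.92 | 56.69 | 59.17 | 60.79 |
| 2 | 230 | 829 | 1059 | 204 | 14.91 | 15.5 | 15.37 | 15.32 | 15.4 | 15.2 |
| 3 | 111 | 382 | 493 | 127 | 7.19 | 7.14 | 7.15 | 7.12 | 9.58 | 6.75 |
| 4 | 58 | 229 | 287 | 50 | 3.76 | 4.28 | 4.16 | 4.14 | 3.77 | 3.8 |
| 5 | 41 | 143 | 184 | 33 | 2.66 | 2.67 | 2.67 | 2.72 | 2.49 | 2.43 |
| 6 | 42 | 89 | 131 | 28 | 2.72 | 1.66 | 1.9 | 1.92 | 2.11 | 1.69 |
| 7 | 20 | 93 | 113 | 19 | 1.3 | 1.74 | 1.64 | 1.44 | 1.43 | 1.24 |
| 8 | 24 | 61 | 85 | 19 | 1.56 | 1.14 | 1.23 | 1.12 | 1.43 | 0.95 |
| 9 | 21 | 43 | 64 | 6 | 1.36 | 0.8 | 0.93 | 0.9 | 0.45 | 0.75 |
| 10 | 15 | 50 | 65 | 7 | 0.97 | 0.93 | 0.94 | 0.73 | 0.53 | 0.61 |
| 11 | 9 | 32 | 41 | 6 | 0.58 | 0.6 | 0.59 | 0.61 | 0.45 | 0.5 |
| 12 | 11 | 36 | 47 | 7 | 0.71 | 0.67 | 0.68 | 0.52 | 0.53 | 0.42 |
| 13 | 6 | 26 | 32 | 4 | 0.39 | 0.49 | 0.46 | 0.45 | 0.3 | 0.36 |
| 14 | 7 | 21 | 28 | 4 | 0.45 | 0.39 | 0.41 | 0.39 | 0.3 | 0.31 |
| 15 | 3 | 18 | 21 | 5 | 0.19 | 0.34 | 0.3 | 0.34 | 0.38 | 0.27 |
| 16 | 4 | 20 | 24 | 3 | 0.26 | 0.37 | 0.35 | 0.3 | 0.23 | 0.24 |
| 17 | 4 | 14 | 18 | 3 | 0.26 | 0.26 | 0.26 | 0.27 | 0.23 | 0.21 |
| 18 | 5 | 14 | 19 | 1 | 0.32 | 0.26 | 0.28 | 0.24 | | |
| 19 | 3 | 14 | 17 | 0 | 0.19 | 0.26 | 0.25 | 0.22 | | |
| 20 | 6 | 8 | 14 | 0 | 0.39 | 0.15 | 0.2 | 0.2 | | |
| 21 | 0 | 9 | 9 | 1 | | 0.17 | 0.13 | 0.18 | | |
| 22 | 2 | 9 | 11 | 3 | 0.13 | 0.17 | 0.16 | 0.17 | | |
| 23 | 4 | 4 | 8 | 0 | 0.26 | 0.07 | 0.12 | 0.15 | | |
| 24 | 4 | 4 | 8 | 3 | 0.26 | 0.07 | 0.12 | 0.14 | | |
| 25 | 0 | 9 | 9 | 2 | | 0.17 | 0.13 | 0.13 | | |
| 26 | 3 | 6 | 9 | 0 | 0.19 | 0.11 | 0.13 | 0.12 | | |
| 27 | 1 | 7 | 8 | 1 | 0.06 | 0.13 | 0.12 | 0.11 | | |
| 28 | 2 | 8 | 10 | 0 | 0.13 | 0.15 | 0.15 | 0.11 | | |
| 29 | 2 | 6 | 8 | 0 | 0.13 | 0.11 | 0.12 | 0.1 | | |
| 30 | 2 | 5 | 7 | 1 | 0.13 | 0.09 | 0.1 | 0.09 | | |
| 31 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 32 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 33 | 3 | 3 | 6 | 0 | 0.19 | 0.06 | 0.09 | | | |
| 34 | 1 | 3 | 4 | 1 | 0.06 | 0.06 | 0.06 | | | |
| 35 | 0 | 0 | 0 | 0 | | | | | | |
| 36 | 0 | 1 | 1 | 0 | | 0.02 | 0.01 | | | |
| 37 | 0 | 1 | 1 | 1 | | 0.02 | 0.01 | | | |
| 38 | 1 | 3 | 4 | 0 | 0.06 | 0.06 | 0.06 | | | |
| 39 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 40 | 1 | 1 | 2 | 0 | 0 | 0.02 | 0.03 | | | |
| 41 | 0 | 1 | 1 | 0 | | 0.02 | 0.01 | | | |
| 42 | 0 | 2 | 2 | 0 | | 0.04 | 0.03 | | | |
| 43 | 0 | 0 | 0 | 0 | | | | | | |
| 44 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 45 | 0 | 4 | 4 | 0 | | 0.07 | 0.06 | | | |
| 46 | 1 | 1 | 2 | 0 | 0.06 | 0.02 | 0.03 | | | |
| 47 | 0 | 3 | 3 | 0 | | 0.06 | 0.04 | | | |
| 48 | 0 | 0 | 0 | 2 | | | | | | |
| 49 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 50 | 1 | 1 | 2 | | 0.06 | 0.02 | 0.03 | | | |
| 51 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 52 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 53 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 54 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 55 | 2 | 1 | 3 | | 0.13 | 0.02 | 0.04 | | | |
| 56 | 0 | 0 | 0 | | | | | | | |
| 57 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 58 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 59-60 | 0 | 0 | 0 | | | | | | | |
| 61 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 62-65 | 0 | 0 | 0 | | | | | | | |
| 66 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 67 | 0 | 0 | 0 | | | | | | | |
| 68 | 0 | 2 | 2 | | | 0.04 | 0.03 | | | |
| 69-72 | 0 | 0 | 0 | | | | | | | |
| 73 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 74-77 | 0 | 0 | 0 | | | | | | | |
| 78 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 79 | 0 | 0 | 0 | | | | | | | |
| 80 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |
| 81-83 | 0 | 0 | 0 | | | | | | | |
| 84 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 85-94 | 0 | 0 | 0 | | | | | | | |
| 95 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 96-106 | 0 | 0 | 0 | | | | | | | |
| 107 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |
| 108 | 0 | 0 | 0 | | | | | | | |
| 109 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 110-113 | 0 | 0 | 0 | | | | | | | |
| 114 | 0 | 1 | 1 | | | 0.02 | 0.01 | | | |
| 115-345 | 0 | 0 | 0 | | | | | | | |
| 346 | 1 | 0 | 1 | | 0.06 | | 0.01 | | | |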

[1]: Calculated from $f = \dfrac{56.69}{n^{1.888}}$.

[2]: Calculated from $f = \dfrac{600}{\pi^2 n^2}$.
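The two "calculated" columns of Table 1 can be reproduced from the formulas in notes [1] and [2]. This is a minimal sketch. The constant 56.69 is read from the table's first calculated Chemical Abstracts entry, and the exponent 1.888 is the slope quoted in the text. `lotka_percent` is a hypothetical helper name.

```python
import math


def lotka_percent(n, constant, exponent):
    """Expected percentage of authors with exactly n contributions."""
    return constant / n ** exponent


for n in range(1, 6):
    chem = lotka_percent(n, 56.69, 1.888)                # note [1], Chemical Abstracts A+B
    auerbach = lotka_percent(n, 600 / math.pi ** 2, 2)   # note [2], Auerbach
    print(f"{n:2d}  {chem:6.2f}  {auerbach:6.2f}")
```

For n = 1 to 5 the printed values should match the table's calculated entries: 56.69 and 60.79 at n = 1, 15.32 and 15.2 at n = 2, and so on.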

On plotting the frequencies of persons having made 1, 2, 3 . . . . contributions, against these numbers 1, 2, 3 . . . . of contributions, both variables on a logarithmic scale, it is found that in each case the points are rather closely scattered about an essentially straight line having a slope of approximately two to one. The approach to this ratio is particularly close in the case of the data taken from Auerbach’s tables. Determined by least squares, the slope of the curve to Auerbach’s data, as determined from the first 17 points¹, was found to be $2.021 \pm 0.017$. Similarly, the slope for the data in the Chemical Abstracts, letters A and B jointly, as determined from the first thirty points, came out as $1.888 \pm 0.007$. The general formula for the relation thus found to exist between the frequency $y$ of persons making $x$ contributions, and $x$, is

$$x^n y = \text{const} \tag{1}$$
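The least-squares fit behind these slopes can be sketched as an ordinary regression of log(frequency) on log(number of contributions). The paper says only that least squares was used. An unweighted fit on the raw counts of Table 1 is therefore an assumption here, as are the constant and function names.

```python
import math

# Auerbach, all letters: persons with 1..17 mentions (Table 1).
AUERBACH_FIRST_17 = [784, 204, 127, 50, 33, 28, 19, 19, 6, 7, 6, 7, 4, 4, 5, 3, 3]

# Chemical Abstracts, letters A and B jointly: persons with 1..30 entries (Table 1).
CHEM_AB_FIRST_30 = [3991, 1059, 493, 287, 184, 131, 113, 85, 64, 65,
                    41, 47, 32, 28, 21, 24, 18, 19, 17, 14,
                    9, 11, 8, 8, 9, 9, 8, 10, 8, 7]


def loglog_slope(counts):
    """Ordinary least-squares slope of log(count) against log(x), x = 1, 2, ..."""
    xs = [math.log(x) for x in range(1, len(counts) + 1)]
    ys = [math.log(y) for y in counts]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


# The exponent n in x^n y = const is the negated slope.
print(f"Auerbach:           n = {-loglog_slope(AUERBACH_FIRST_17):.3f}")
print(f"Chemical Abstracts: n = {-loglog_slope(CHEM_AB_FIRST_30):.3f}")
```

With these counts the fitted exponents come out at roughly 2.02 and 1.89, close to the 2.021 and 1.888 reported in the text. The sketch does not attempt the ± uncertainties.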

For the special case that $n = 2$ (inverse square law of scientific productivity) the value of the constant in (1) is found as follows:

$$
\begin{align*}
y_1 &= \frac{c}{1^2}\\
y_2 &= \frac{c}{2^2}\\
&\;\;\vdots\\
y_n &= \frac{c}{n^2}\\
\sum_{1}^{\infty} y &= c\left(\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \cdots\right) \tag{5}\\
&= c\sum_{1}^{\infty}\frac{1}{x^2} \tag{6}\\
&= c\,\frac{\pi^2}{6} \tag{7}\\
c &= \frac{6}{\pi^2}\sum_{1}^{\infty} y \tag{8}
\end{align*}
$$

But, since $y$ is a frequency, the summation $\sum_{1}^{\infty} y$ gives unity.

Then finally

$$
\begin{align*}
c &= \frac{6}{\pi^2} \tag{9}\\
&= \frac{6}{9.87} \tag{10}\\
&= 0.6079 \text{ or } 60.79 \text{ per cent} \tag{11}
\end{align*}
$$
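The same normalization works for any exponent $n > 1$. If the frequencies in (1) must sum to one, then $c = 1 / \sum_{1}^{\infty} x^{-n}$, and for $n = 2$ this is the $6/\pi^2$ derived above. Below is a minimal stdlib sketch. The Euler-Maclaurin tail correction is an implementation choice for summing the slowly converging series; it is not taken from the paper.

```python
import math


def zeta(s, terms=1000):
    """Approximate sum_{x=1}^inf x**-s for s > 1.

    Sums x = 1 .. terms-1 directly, then adds the leading Euler-Maclaurin
    terms for the remaining tail x >= terms.
    """
    head = sum(x ** -s for x in range(1, terms))
    n = terms
    tail = n ** (1 - s) / (s - 1) + n ** -s / 2 + s * n ** (-s - 1) / 12
    return head + tail


def lotka_constant(exponent):
    """Constant c in y = c / x**exponent when the y values sum to 1."""
    return 1 / zeta(exponent)


print(f"n = 2:     c = {lotka_constant(2):.4f}  (6/pi^2 = {6 / math.pi ** 2:.4f})")
print(f"n = 1.888: c = {lotka_constant(1.888):.4f}")
```

For $n = 2$ this gives $c = 0.6079$, as in (11). For $n = 1.888$ it gives about $0.5669$, which agrees with the 56.69 per cent listed in Table 1 as the calculated Chemical Abstracts value at $n = 1$. That the table's constant was obtained this way is an inference; the paper does not state it.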

Thus, according to the inverse square law, the proportion of all contributors who contribute a single item should be just over 60 per cent. In the cases here examined the actual proportion of this class to the whole was 59.2 per cent in Auerbach’s data (1325 contributors), 57.7 per cent in the Chemical Abstracts under initial A (1543 contributors), 57.98 under letter B (5348 contributors) and 57.9 under letters A and B jointly (6891 contributors).
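The single-contribution percentages quoted above can be recomputed from the counts in Table 1. This short check uses nothing but those counts and the predicted $6/\pi^2$; the names are illustrative.

```python
import math

# (authors with exactly one contribution, all authors), from Table 1.
SOURCES = {
    "Auerbach": (784, 1325),
    "Chemical Abstracts, A": (890, 1543),
    "Chemical Abstracts, B": (3101, 5348),
    "Chemical Abstracts, A+B": (3991, 6891),
}

predicted = 100 * 6 / math.pi ** 2  # inverse square law, in per cent


def single_share(single, total):
    """Per cent of authors who made exactly one contribution."""
    return 100 * single / total


for name, (single, total) in SOURCES.items():
    observed = single_share(single, total)
    print(f"{name:24s} {observed:6.2f}%  (predicted {predicted:.2f}%)")
```

All four observed shares (59.17, 57.68, 57.98 and 57.92 per cent) fall somewhat below the predicted 60.79 per cent.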

Fig. 1.—Frequency diagram showing per cent of authors mentioned once, twice, etc., in Auerbach’s Geschichtstafeln der Physik, entire alphabet, and in the decennial index of Chemical Abstracts 1907-1916, letters A and B. The dotted line indicates frequencies computed according to the inverse square law

Fig. 2.—Logarithmic frequency diagram showing number of authors mentioned once, twice, etc., in Auerbach’s tables (points indicated by crosses), and in Chemical Abstracts, letters A and B (points indicated by circles). The fully drawn line indicates points given by inverse square law, exponent = 2; the line of dashes corresponds to exponent 1.89.

Frequency distributions of the general type (1) have a wide range of applicability to a variety of phenomena,² and the mere form of such a distribution throws little or no light on the underlying physical relations.³ The fact that the exponent has, in the examples shown, approximately the value 2 enables us to state the result in the following simple form:

In the cases examined it is found that the number of persons making 2 contributions is about one-fourth of those making one; the number making 3 contributions is about one-ninth, etc.; the number making $n$ contributions is about $\dfrac{1}{n^2}$ of those making one;⁴ and the proportion, of all contributors, that make a single contribution, is about 60 per cent.
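This rule of thumb can be set against the Auerbach column of Table 1. The sketch below simply divides the single-contribution count by $n^2$. `inverse_square_estimate` is a hypothetical helper, and only the first six rows of the table are used.

```python
# Auerbach counts of persons with 1..6 mentions, from Table 1.
AUERBACH = {1: 784, 2: 204, 3: 127, 4: 50, 5: 33, 6: 28}


def inverse_square_estimate(single_count, n):
    """Rule of thumb: about 1/n**2 as many people make n contributions as make one."""
    return single_count / n ** 2


for n, observed in AUERBACH.items():
    expected = inverse_square_estimate(AUERBACH[1], n)
    print(f"n={n}: observed {observed:4d}, 1/n^2 rule {expected:6.1f}")
```

At $n = 2$ the rule gives 196 against 204 observed. At $n = 3$ it gives about 87 against 127, so the agreement is only approximate for individual rows.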

The fact that two such widely different sources as Chemical Abstracts (listing practically all current work in chemistry over a ten year period) and Auerbach’s tables (listing selected important contributions only, in physics, for all historical time) give very similar results, seems somewhat remarkable. It would be interesting to extend this study to such a work as Darmstaedter’s Handbuch der Geschichte der Naturwissenschaften und der Technik. Unfortunately the index of this work does not indicate multiple entries of the same year under one author’s name, but distinguishes only separately dated entries. It would therefore be necessary in each case to refer to the text. On the other hand the work could be abridged by restricting the inquiry to one or two letters of the alphabet, as was here done in the case of the Chemical Abstracts.


  1. Beyond this point fluctuations become excessive owing to the limited number of persons in the sample.

  2. Compare especially Corrado Gini, Biblioteca dell’Economista, ser. 5a, 20: Indici di concentrazione e di dipendenza. See also the Report of Commission of Housing and Regional Planning, State of New York, Jan. 11, 1926: 59-73; and Income in the United States, by W. I. King and others; 2: 344 et seq. 1922.

  3. C. J. Willis’ conclusions regarding the mechanism of evolution, inferred as they are from the occurrence of curves of this type in the relation between numbers of species and genera, seem for this reason to carry little conviction. See A. J. Lotka, Physical Biology: 311. 1925.

  4. Fortunately, however, there are somewhat more persons of very great productivity than would be expected under this simple law. The very high figures (e.g., Abderhalden, 346 contributions in ten years) should perhaps be considered separately, since they are not the product of one person unassisted. Joint contributions have in all cases been credited to the senior author on

Original post: blog.csdn.net/YuvalNoah/article/details/131797006