Computational Social Science: a translation of the "Science" article

By David Lazer, Alex Pentland, Lada Adamic, Sinan Aral, Albert-László Barabási, Devon Brewer, Nicholas Christakis, Noshir Contractor, James Fowler, Myron Gutmann, Tony Jebara, Gary King, Michael Macy, Deb Roy, Marshall Van Alstyne

Translation: Xu Xiaoke

We live our lives in networks. We check e-mail regularly, make mobile phone calls from almost anywhere, swipe transit cards to ride public transportation, and pay for goods with credit cards. In public places, cameras may record our movements, and in hospitals our medical records are kept in digital form. We may write blogs that anyone can read, or maintain friendships through online social networks. Each of these activities leaves a digital trace. Taken together, these traces form a comprehensive picture of individual and collective behavior, and they may change our understanding of our lives, organizations, and societies.

The ability to collect and analyze massive amounts of data has transformed fields such as biology and physics, but data-driven "computational social science" has emerged far more slowly. Leading journals in economics, sociology, and political science show little sign of the field. Yet computational social science is already being practiced in Internet companies such as Google and Yahoo, and in government agencies such as the U.S. National Security Agency. The field could become the exclusive domain of private companies and government agencies. Alternatively, a privileged group of academic researchers could publish papers based on private data that others can neither evaluate nor replicate. Neither scenario serves the public's long-term interest in the accumulation, verification, and dissemination of knowledge.

What value could a computational social science, based in an open academic environment, offer society by improving our understanding of individual and collective behavior? What obstacles stand in the way of its development?

So far, research on human interaction has relied mainly on one-time, self-reported data about relationships. New technologies, such as video surveillance (1), e-mail, and "smart" name badges, offer a moment-by-moment picture of interactions over long periods, with information about both the structure and the content of relationships. For example, interaction within groups can be studied using e-mail data, and questions about how communication changes over time can be investigated: do working groups settle into a stable pattern that rarely changes, or do their relationships change dramatically over time (2)? What interaction patterns predict highly productive groups and individuals (3)? Face-to-face group interaction can be assessed over time with "sociometers", electronic devices that people wear to capture physical proximity, location, movement, and other aspects of individual behavior and collective interaction. Such data raise many interesting questions, for example about patterns of proximity and communication within an organization, and about the information flow patterns associated with high individual and group performance (4).
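The question of whether a working group stabilizes or keeps changing can be made concrete with a small sketch. The code below is only an illustration, not the method used in reference (2): it assumes a hypothetical e-mail log of (day, sender, recipient) tuples, splits it into fixed time windows, and compares the set of communicating pairs in successive windows using Jaccard similarity. The function names and the seven-day window are assumptions made for this example.

```python
from collections import defaultdict


def edges_by_window(messages, window_days):
    """Group undirected sender-recipient pairs by time window index."""
    windows = defaultdict(set)
    for day, sender, recipient in messages:
        if sender != recipient:  # ignore self-addressed mail
            windows[day // window_days].add(frozenset((sender, recipient)))
    return windows


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(messages, window_days=7):
    """Similarity between each non-empty window and the next non-empty one."""
    windows = edges_by_window(messages, window_days)
    keys = sorted(windows)
    return [jaccard(windows[k1], windows[k2]) for k1, k2 in zip(keys, keys[1:])]


# Hypothetical log: (day, sender, recipient)
messages = [
    (0, "ann", "bob"), (1, "bob", "cat"), (3, "ann", "cat"),
    (8, "ann", "bob"), (9, "bob", "cat"), (10, "cat", "dan"),
    (15, "dan", "eve"), (16, "eve", "ann"),
]
scores = stability(messages)
print(scores)
```

Scores near 1 would suggest a group whose contact pattern has settled, while scores near 0 would suggest heavy turnover in who talks to whom. Real e-mail data would also need decisions about direction, message volume, and window length that this sketch leaves out.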

We can also learn what the "macro" social network of a whole society looks like (5) and how it evolves over time. Telephone companies hold records of call patterns among their customers that span several years, and e-commerce portals such as Google and Yahoo collect instant messaging data on global communication. Can these data paint a comprehensive picture of communication patterns at the level of a society? How do these interactions affect economic productivity or public health? It is also becoming increasingly easy to track people's movements (6). Mobile phones make it possible to trace, on a large scale, where people go and who they are physically close to over time (7). Such data may provide useful epidemiological insights, for example into how a pathogen such as influenza spreads among people through physical proximity.
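As a rough illustration of how proximity could be derived from mobile phone records, the sketch below counts how often two people were seen at the same cell tower in the same time slot. The record layout (person, time slot, tower) and all names are assumptions made for this example. Sharing a tower is only a coarse proxy for physical proximity, and this is not the analysis of reference (7).

```python
from collections import Counter, defaultdict
from itertools import combinations


def colocation_counts(sightings):
    """Count, per pair of people, the time slots in which they shared a tower."""
    present = defaultdict(set)
    for person, slot, tower in sightings:
        present[(slot, tower)].add(person)

    counts = Counter()
    for people in present.values():
        # Sort so each pair has a single canonical key, e.g. ("p1", "p2").
        for pair in combinations(sorted(people), 2):
            counts[pair] += 1
    return counts


# Hypothetical sightings: (person, time slot, tower)
sightings = [
    ("p1", 0, "A"), ("p2", 0, "A"), ("p3", 0, "B"),
    ("p1", 1, "A"), ("p2", 1, "A"), ("p3", 1, "A"),
    ("p1", 2, "C"), ("p3", 2, "C"),
]
counts = colocation_counts(sightings)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

Pairs with high counts are candidates for repeated physical contact, which is the kind of input a contact-based epidemic model would need.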

The Internet provides a completely different channel for understanding what people are saying and how they are connected (8). Consider the political season just past: by tracing the spread of arguments, rumors, and political positions through the blogosphere (9), together with the behavior of individuals "surfing" the Internet (10), the concerns of the electorate become visible in the searches people run. Virtual worlds, which by their nature record everyone's behavior completely, offer further opportunities for research, including experiments that would be impossible or unacceptable in the real world (11). Similarly, social networking sites offer a unique way to understand how a person's position in the network affects everything from their tastes to their moods and health (12). Natural language processing is steadily improving our capacity to organize and analyze the vast amounts of text on the Internet and in other sources (13).

In short, a computational social science is emerging that collects and analyzes data of unprecedented breadth, depth, and scale. However, substantial obstacles may limit its progress. Existing ways of thinking about human behavior were developed without access to terabytes of data describing the minute-by-minute interactions and locations of entire populations. For example, current social network theory is mostly built on one-time "snapshot" data that typically cover only dozens of people. What can it tell us about massive longitudinal data sets covering millions of people, including their locations, financial transactions, and daily communications? These vast data sets on how people interact surely offer qualitatively new perspectives on collective human behavior, but our current research paradigms may not be able to accommodate them.

Data from the blogosphere. The figure shows the link structure within a community of political blogs (from 2004). Red nodes are conservative blogs and blue nodes are liberal blogs; orange links go from liberal to conservative blogs, and purple links go from conservative to liberal blogs. The size of each blog reflects the number of other blogs that link to it. [Reproduced from (8) with permission of the Association for Computing Machinery]
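The encoding described in the caption can be sketched in a few lines: node size corresponds to in-degree, and link color corresponds to the political leaning of the source and target blogs. The blog names, leanings, and links below are invented for illustration and are not the data behind the figure.

```python
from collections import Counter


def summarize_links(links, leaning):
    """Return in-degree per blog and a count of links by (source -> target) leaning."""
    in_degree = Counter(target for _, target in links)
    kinds = Counter(f"{leaning[src]}->{leaning[dst]}" for src, dst in links)
    return in_degree, kinds


# Hypothetical blogs and directed links (source links to target)
leaning = {
    "lib1": "liberal", "lib2": "liberal", "lib3": "liberal",
    "con1": "conservative", "con2": "conservative",
}
links = [
    ("lib1", "lib2"), ("lib2", "lib1"), ("lib3", "lib1"),
    ("con1", "con2"), ("con2", "con1"),
    ("lib1", "con1"),  # liberal to conservative (orange in the figure)
    ("con2", "lib1"),  # conservative to liberal (purple in the figure)
]
in_degree, kinds = summarize_links(links, leaning)
print(in_degree.most_common())
print(dict(kinds))
```

In this toy example most links stay within one side, which is the kind of pattern such a figure makes visible at a glance.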

There are also enormous institutional obstacles to the advancement of computational social science. In terms of approach, the subjects studied in physics and biology pose different challenges for observation and intervention: quarks and cells neither mind when we uncover their secrets nor protest when we alter their environment in the process of discovery. In terms of infrastructure, the leap from social science to computational social science is much larger than the leap from biology to computational biology, mainly because computational social science requires distributed monitoring, permission seeking, and encryption. The resources available to the social sciences are much smaller, and even the physical and administrative distance between social science departments and engineering or computer science departments tends to be greater than for the other sciences.

Perhaps the thorniest challenge is how to provide access to data while protecting personal privacy. Much of the data is proprietary (for example, mobile phone records and financial transaction information). The debacle that followed AOL's release of "anonymized" search records for many of its customers highlights the potential risk to individuals and companies when private companies share personal data (14). Robust models of collaboration and data sharing between industry and academia are needed to facilitate research, protect consumer privacy, and provide liability protection for companies. More generally, managing privacy issues properly is essential. A recent report by the U.S. National Research Council on geographic information system data points out that it is often possible to extract individual profiles even from carefully anonymized data (15). Last year, the U.S. National Institutes of Health and the Wellcome Trust abruptly removed a number of genetic databases from online access (16). These databases appeared to be anonymized, reporting only the aggregate frequency of particular genetic markers. However, research showed that the sheer quantity of data collected from each individual in the database gives enough statistical power to make re-identification possible (17).
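One simple way to see why "anonymized" records can still identify people is to count how many records share the same combination of seemingly harmless attributes. The sketch below uses k-anonymity, a standard but much simpler idea than the statistical attack in reference (17): it reports the smallest group size k and the fraction of records that are unique on the chosen attributes. The field names and records are hypothetical.

```python
from collections import Counter


def k_anonymity(records, quasi_identifiers):
    """Return (k, fraction of records that are unique on the quasi-identifiers)."""
    if not records:
        raise ValueError("records must not be empty")
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    k = min(groups.values())  # size of the smallest indistinguishable group
    unique = sum(1 for size in groups.values() if size == 1)
    return k, unique / len(records)


# Hypothetical records with names already removed
records = [
    {"zip": "02138", "birth_year": 1970, "sex": "F"},
    {"zip": "02138", "birth_year": 1970, "sex": "F"},
    {"zip": "02138", "birth_year": 1982, "sex": "M"},
    {"zip": "02139", "birth_year": 1982, "sex": "M"},
    {"zip": "02139", "birth_year": 1982, "sex": "M"},
]
fine = k_anonymity(records, ["zip", "birth_year", "sex"])
coarse = k_anonymity(records, ["birth_year", "sex"])
print(fine, coarse)
```

With all three attributes, one record stands alone (k = 1), so anyone who knows that person's postal code, birth year, and sex can pick out the record. Dropping the postal code raises k to 2 at the cost of less detailed data, which is the trade-off between privacy and research value that the text describes.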

Because a single dramatic breach of privacy could lead to rules and laws that stifle the young field of computational social science, a self-regulatory regime of procedures, technologies, and rules is needed that reduces this risk while preserving the potential for research. As a cornerstone of such a regime, U.S. Institutional Review Boards (IRBs) must increase their technical knowledge in order to understand the potential for intrusion and harm to individuals, because the new possibilities do not fit their current paradigms of harm. Many IRBs are poorly equipped to assess whether complex data could be de-anonymized. Moreover, IRBs may need to oversee the creation of a secure, centralized data infrastructure. At present, existing data sets are scattered among many groups whose understanding of data security, skills, and protocols vary widely. Researchers themselves must develop technologies that protect personal privacy while preserving the data that research requires. These systems may in turn help industry manage customer privacy and data security (18).

Finally, computational social science shares with other emerging interdisciplinary fields (such as sustainability science) the need to develop a way of training new scholars. Tenure committees and editorial boards need to understand and reward efforts to publish across disciplines. Initially, computational social science will have to be the work of teams of social scientists and computer scientists. In the long run, the question is whether academia should train computational social scientists, or teams of computationally literate social scientists and socially literate computer scientists. The emergence of cognitive science offers a powerful model for the development of computational social science. Cognitive science spans fields from neurobiology to philosophy to computer science. It attracted substantial resources to forge a common field, and over the past generation it has made enormous progress for the public good. We believe that computational social science has similar potential and deserves similar investment.

References and Notes

1. D. Roy et al., “The Human Speechome Project,” Proceedings of the 28th Annual Conference of the Cognitive Science Society, Vancouver, BC, Canada, 26 to 29 July 2006.

2. J. P. Eckmann et al., Proc. Natl. Acad. Sci. U.S.A. 101, 14333 (2004).

3. S. Aral, M. Van Alstyne, “Network Structure & Information Advantage,” Proceedings of the Academy of Management Conference, Philadelphia, PA, 3 to 8 August 2007.

4. A. Pentland, Honest Signals: How They Shape Our World (MIT Press, Cambridge, MA, 2008).

5. J.-P. Onnela et al., Proc. Natl. Acad. Sci. U.S.A. 104, 7332 (2007).

6. T. Jebara, Y. Song, K. Thadani, “Spectral Clustering and Embedding with Hidden Markov Models,” Proceedings of the European Conference on Machine Learning, Philadelphia, PA, 3 to 6 December 2007.

7. M. C. González et al., Nature 453, 779 (2008).

8. D. Watts, Nature 445, 489 (2007).

9. L. Adamic, N. Glance, in Proceedings of the 3rd International Workshop on Link Discovery (LINKDD 2005), pp. 36–43; http://doi.acm.org/10.1145/1134271.1134277.

10. J. Teevan, ACM Trans. Inform. Syst. 26, 1 (2008).

11. W. S. Bainbridge, Science 317, 472 (2007).

12. K. Lewis et al., Social Networks 30, 330 (2008).

13. C. Cardie, J. Wilkerson, J. Inf. Technol. Polit. 5, 1 (2008).

14. M. Barbaro, T. Zeller Jr., “A face is exposed for AOL searcher No. 4417749,” New York Times, 9 August 2006, p. A1.

15. National Research Council, Putting People on the Map: Protecting Confidentiality with Linked Social-Spatial Data, M. P. Gutmann, P. Stern, Eds. (National Academy Press, Washington, DC, 2007).

16. J. Felch, “DNA databases blocked from the public,” Los Angeles Times, 29 August 2008, p. A31.

17. N. Homer, S. Szelinger, M. Redman, D. Duggan, W. Tembe, PLoS Genet. 4, e1000167 (2008).

18. M.V.A. has applied for a patent on an algorithm for protecting privacy of communication content.

19. Additional resources in computational social science can be found in the supporting online material.

Origin blog.csdn.net/chaishen10000/article/details/105285430