[Continuously updated] Multi-layer graph (multi-layer relationship network) datasets

Table

| Name | Nodes [1] | Edges | Layers [2] | Weights [3] |
| --- | --- | --- | --- | --- |
| ratings_data | 49290 + 139738 | 664823 | 5 | Unweighted |
| jester_dataset_1_1 | 24983 + 100 | 2498300 | -10.00 to +10.00 | Unweighted |
| jester_dataset_1_2 | 23500 + 100 | 2350000 | -10.00 to +10.00 | Unweighted |
| jester_dataset_1_3 | 24938 + 100 | 2493800 | -10.00 to +10.00 | Unweighted |
| jester_dataset_2 | 63978 + ~150 | 1761439 | -10.00 to +10.00 | Unweighted |
| jester_dataset_3 | 66336 + 151 | 10016736 | -10.00 to +10.00 | Unweighted |
| BX-CSV-Dump [4] | 278858(276271) + 271380 | >1048576 (Excel error) | 10 | |
| 201305.relationship [5] | ~200000 | 310150 | 4 | Unweighted |
| 201306.relationship | ~200000 | 310150 | 4 | Unweighted |
| 201307.relationship | ~200000 | 368048 | 4 | Unweighted |
| 201308.relationship | ~200000 | 368048 | 4 | Unweighted |
| 201309.relationship | ~200000 | 368048 | 4 | Unweighted |
| 201310.relationship | ~200000 | 366566 | 4 | Unweighted |
| 201311.relationship | ~200000 | 348878 | 4 | Unweighted |
| 201312.relationship | ~200000 | 396030 | 4 | Unweighted |
| Vickers-Chan-7thGraders_Multiplex_Social | 29 | 29*3 | 3 | |
| Padgett-Florence-Families_Multiplex_Social | 16 | 35 | 2 | |
| Lazega-Law-Firm_Multiplex_Social | 71 | 2571 | 3 | Unweighted |
| Krackhardt-High-Tech_Multiplex_Social | 21 | 312 | 3 | |
| Kapferer-Tailor-Shop_Multiplex_Social | 39 | 1018 | 4 | Unweighted |
| CKM-Physicians-Innovation_Multiplex_Social | 246 | 1551 | 3 | Unweighted |
| CS-Aarhus_Multiplex_Social | 61 | 620 | 5 | |
| EUAir_Multiplex_Transport | 249 | 3588 | 37 | |
| London_Multiplex_Transport | 323 | 441 | 3 | |
| NYClimateMarch2014_Multiplex_Social | 102439 | 353495 | 3 | Weighted |
| Cannes2013_Multiplex_Social | 438537 | 991854 | 3 | Weighted |
| MoscowAthletics2013_Multiplex_Social | 88804 | 210250 | 3 | Weighted |
| MLKing2013_Multiplex_Social | 327707 | 396671 | 3 | Weighted |
| ObamaInIsrael2013_Multiplex_Social | 2281259 | 4061960 | 3 | Weighted |
| Arabidopsis_Multiplex_Genetic | 6980 | 18654 | 7 | Unweighted |
| Homo_Multiplex_Genetic | 18222 | 170899 | 7 | Unweighted |
| AMiner-Coauthor | ~1712433 | 4258615 | 1 | Weighted |
| wikitree | 1382751 | 9192212 | | Unweighted |
| coauthor | 1629217 | | | |

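All descriptions below are excerpted from the original web pages. At the bottom of each description are the dataset's description page and download page (listed separately when they are not the same page).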

ratings_data

The dataset was collected by Paolo Massa in a 5-week crawl (November/December 2003) from the Epinions.com Web site.

The dataset contains

  • 49,290 users who rated a total of
  • 139,738 different items at least once, writing
  • 664,824 reviews and
  • 487,181 issued trust statements.

Users and items are represented by anonymized numeric identifiers.

The dataset consists of 2 files. The ratings file (ratings_data) contains the ratings given by users to items.

Every line has the following format:

user_id item_id rating_value

For example,

23 387 5

represents the fact “user 23 has rated item 387 as 5”

Ranges:

user_id is in [1,49290]

item_id is in [1,139738]

rating_value is in [1,5]
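The line format and ranges above can be checked with a small parser. This is a minimal sketch, not code from the dataset's authors; the names `Rating` and `parse_rating_line` are made up for illustration, and whitespace-separated fields are assumed from the example line.

```python
from typing import NamedTuple


class Rating(NamedTuple):
    """One user-item rating edge (hypothetical helper type)."""
    user_id: int
    item_id: int
    rating_value: int


def parse_rating_line(line: str) -> Rating:
    """Parse one 'user_id item_id rating_value' line and check the stated ranges."""
    user_id, item_id, rating_value = (int(field) for field in line.split())
    if not 1 <= user_id <= 49290:
        raise ValueError(f"user_id out of range: {user_id}")
    if not 1 <= item_id <= 139738:
        raise ValueError(f"item_id out of range: {item_id}")
    if not 1 <= rating_value <= 5:
        raise ValueError(f"rating_value out of range: {rating_value}")
    return Rating(user_id, item_id, rating_value)


# The example line from the description: user 23 has rated item 387 as 5.
print(parse_rating_line("23 387 5"))
```

Because user_id and item_id both start at 1, the two ID spaces overlap, so a bipartite graph built from these lines needs to tag each node with its side (user or item).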

http://www.trustlet.org/downloaded_epinions.html

http://www.trustlet.org/datasets/downloaded_epinions/

jester_dataset

Anonymous Ratings from the Jester Online Joke Recommender System

Dataset 1: Over 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users: collected between April 1999 - May 2003.

Dataset 2: Over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 59,132 users: collected between November 2006 - May 2009.

Dataset 2+: An updated version of Dataset 2 with over 500,000 new ratings from 79,681 total users: data collected from November 2006 - Nov 2012

Freely available for research use when acknowledged with the following reference:

Eigentaste: A Constant Time Collaborative Filtering Algorithm. Ken Goldberg, Theresa Roeder, Dhruv Gupta, and Chris Perkins. Information Retrieval, 4(2), 133-151. July 2001.

As a courtesy, if you use the data, I would appreciate knowing your name, what research group you are in, and the publications that may result.

Dataset 1

Over 4.1 million continuous ratings (-10.00 to +10.00) of 100 jokes from 73,421 users: collected between April 1999 - May 2003

Save to disk, then unzip to obtain Excel files:

  • jester_dataset_1_1.zip: (3.9MB) Data from 24,983 users who have rated 36 or more jokes, a matrix with dimensions 24983 X 101.
  • jester_dataset_1_2.zip: (3.6MB) Data from 23,500 users who have rated 36 or more jokes, a matrix with dimensions 23500 X 101.
  • jester_dataset_1_3.zip: (2.1MB) Data from 24,938 users who have rated between 15 and 35 jokes, a matrix with dimensions 24,938 X 101.

Format:

  1. 3 Data files contain anonymous ratings data from 73,421 users.
  2. Data files are in .zip format; when unzipped, they are in Excel (.xls) format.
  3. Ratings are real values ranging from -10.00 to +10.00 (the value “99” corresponds to “null” = “not rated”).
  4. One row per user
  5. The first column gives the number of jokes rated by that user. The next 100 columns give the ratings for jokes 01 - 100.
  6. The sub-matrix including only columns {5, 7, 8, 13, 15, 16, 17, 18, 19, 20} is dense. Almost all users have rated those jokes (see discussion of “universal queries” in the above paper).
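The row layout described above (a count column, then 100 rating columns, with 99 meaning "not rated") can be turned into bipartite user-joke edges roughly as follows. This is only a sketch under the assumption that a row has already been read from the Excel file into a list of numbers; `row_to_edges` is a hypothetical helper name, and the toy row below has 4 jokes instead of 100.

```python
NOT_RATED = 99  # the value 99 marks a joke the user did not rate


def row_to_edges(user_index, row):
    """Turn one matrix row into (user, joke_id, rating) triples.

    row[0] is the number of jokes rated by the user;
    row[1:] holds the ratings for jokes 1, 2, 3, ... in order.
    """
    return [
        (user_index, joke_id, rating)
        for joke_id, rating in enumerate(row[1:], start=1)
        if rating != NOT_RATED
    ]


# Toy row: the user rated 2 jokes (jokes 2 and 4) out of 4.
toy_row = [2, 99, -7.82, 99, 8.79]
edges = row_to_edges(0, toy_row)
print(edges)
print(len(edges) == toy_row[0])  # the first column should match the edge count
```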

The text of the jokes can be downloaded here: jester_dataset_1_joke_texts.zip (92KB)

Format:

  1. 100 files
  2. Each file has title init_.html, where _ is 1 to 100
  3. The titles correspond to the IDs of the jokes in the Excel files above

Dataset 2

Over 1.7 million continuous ratings (-10.00 to +10.00) of 150 jokes from 59,132 users: collected between November 2006 - May 2009
Save to disk, then unzip: jester_dataset_2.zip (7.7MB)

Format:

  • jester_ratings.dat: Each row is formatted as [User ID] [Item ID] [Rating]
  • jester_items.dat: Maps item IDs to jokes

Note that the ratings are real values ranging from -10.00 to +10.00. As of May 2009, the jokes {7, 8, 13, 15, 16, 17, 18, 19} are the “gauge set” (as discussed in the Eigentaste paper) and the jokes {1, 2, 3, 4, 5, 6, 9, 10, 11, 12, 14, 20, 27, 31, 43, 51, 52, 61, 73, 80, 100, 116} have been removed (i.e. they are never displayed or rated).

Dataset 2+

An updated version of Dataset 2 with over 500,000 new ratings from 79,681 total users: data collected from November 2006 - Nov 2012
Save to disk, then unzip: jester_dataset_2+.zip (5.1MB)

Format:

  • In this dataset we stripped out users that did not respond to the gauge set of questions. The data is formatted as an Excel file representing a 66336 x 151 matrix with rows as users and columns as jokes.
  • 10 of the jokes don’t have ratings; their IDs are: { 1, 2, 3, 4, 6, 9, 10, 11, 12, 14 }.
  • Each rating is from (-10.00 to +10.00) and 99 corresponds to a null rating (user did not rate that joke).


http://eigentaste.berkeley.edu/dataset/

BX-CSV-Dump

Book-Crossing Dataset … mined by Cai-Nicolas Ziegler, DBIS Freiburg

Collected by Cai-Nicolas Ziegler in a 4-week crawl (August / September 2004) from the Book-Crossing community with kind permission from Ron Hornbaker, CTO of Humankind Systems. Contains 278,858 users (anonymized but with demographic information) providing 1,149,780 ratings (explicit / implicit) about 271,379 books.

Freely available for research use when acknowledged with the following reference (further details on the dataset are given in this publication):
Improving Recommendation Lists Through Topic Diversification,
Cai-Nicolas Ziegler, Sean M. McNee, Joseph A. Konstan, Georg Lausen; Proceedings of the 14th International World Wide Web Conference (WWW '05), May 10-14, 2005, Chiba, Japan. To appear.

As a courtesy, if you use the data, I would appreciate knowing your name, what research group you are in, and the publications that may result.

Format
The Book-Crossing dataset comprises 3 tables.

  • BX-Users
    Contains the users. Note that user IDs (User-ID) have been anonymized and map to integers. Demographic data is provided (Location, Age) if available. Otherwise, these fields contain NULL-values.

  • BX-Books
    Books are identified by their respective ISBN. Invalid ISBNs have already been removed from the dataset. Moreover, some content-based information is given (Book-Title, Book-Author, Year-Of-Publication, Publisher), obtained from Amazon Web Services. Note that in case of several authors, only the first is provided. URLs linking to cover images are also given, appearing in three different flavours (Image-URL-S, Image-URL-M, Image-URL-L), i.e., small, medium, large. These URLs point to the Amazon web site.

  • BX-Book-Ratings
    Contains the book rating information. Ratings (Book-Rating) are either explicit, expressed on a scale from 1-10 (higher values denoting higher appreciation), or implicit, expressed by 0.
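The explicit/implicit rule for Book-Rating can be written down directly. The following is a small sketch; the function name `classify_book_rating` is an assumption for illustration and is not part of the dataset.

```python
def classify_book_rating(book_rating: int) -> str:
    """Classify a Book-Rating value: 0 is implicit, 1-10 is explicit."""
    if book_rating == 0:
        return "implicit"
    if 1 <= book_rating <= 10:
        return "explicit"
    raise ValueError(f"Book-Rating outside the documented 0-10 range: {book_rating}")


for value in (0, 1, 10):
    print(value, classify_book_rating(value))
```

Splitting edges this way may be useful when only explicit ratings should carry a weight in the user-book graph.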

http://www2.informatik.uni-freiburg.de/~cziegler/BX/

relationship (link no longer works)

Internet AS-level Topology Archive

Introduction

This site serves as an archive of the historical Internet AS-level topology data for academic research, providing the following features:

  • comprehensively cover publicly available long-term data sources
  • directly and plainly extract AS-to-AS links from BGP raw data (source code available)
  • continually take snapshots in two time scales: daily and monthly
  • clearly separate the topology of IPv6 networks from that of IPv4 networks

In addition to the pure topological data, this site also provides the essential semantic information of topology: AS relationship and prefix origin, powered by Cyclops.

Methodology

The historical AS-level topology data is derived from BGP data collected by Route Views, RIPE RIS, PCH, and Internet2. Here is the list of BGP data collectors. The BGP dataset comprises one RIB file per collector per day and all available updates. A daily (monthly) snapshot consists of AS-to-AS links appearing within that day (month), which is determined by the timestamp on the file name of raw BGP data. The topologies in IPv4 network and IPv6 network are extracted separately. A link is contained in IPv4 (IPv6) topology if the corresponding AS path is originated from a prefix in an IPv4 (IPv6) address format. In extracting links from raw data, we discard AS-SETs, private ASNs and loop paths. For details, see the comments of our Perl script, which reads “show ip bgp” output directly, reads MRT format data with a modified version of bgpdump, and outputs AS links.

AS relationship data and IPv4 prefix origin data are directly dumped from the database of Cyclops monthly. The method of AS relationship inference is described in our paper. Note that the date on the file name of those two types of data ONLY indicates the date when the dump operation is executed, and is NOT associated with the time when links or prefixes are observed. The links in AS relationship data should be ONLY used as the index of links in topology data.

Data Format

Topology data is represented by an undirected graph consisting of AS-to-AS links in a plain text format, where each line is a link ASN1 ASN2 with a convention that ASN1 < ASN2 numerically, \t as field separator, and \n as line separator. ASN is in the asplain format. In a monthly snapshot, the third field FREQ in each line (for each link) is the frequency of that link, that is, the number of days on which it was observed within that month.

  • IPv4/v6 daily : {ASN1}\t{ASN2}\n
  • IPv4/v6 monthly: {ASN1}\t{ASN2}\t{FREQ}\n

AS relationship data is represented by a bidirected graph, where each AS pair appears twice with its bilateral relationship. In IPv4 prefix origin data, each line denotes a prefix, in CIDR notation, with its origin AS.

  • AS relationship: {ASN1}\t{ASN2}\t{p2p|p2c|c2p|unknown}\n
  • IPv4 prefix origin: {PREFIX}\t{ASN}\n

The data files are compressed by gzip, and the paths to them have the following format:

  • IPv4 daily : ipv4/daily/YYYY.MM/YYYYMMDD.link.v4.gz
  • IPv4 monthly: ipv4/monthly/YYYYMM.link.v4.gz
  • IPv6 daily : ipv6/daily/YYYY.MM/YYYYMMDD.link.v6.gz
  • IPv6 monthly : ipv6/monthly/YYYYMM.link.v6.gz
  • AS relationship: YYYYMM.relationship.gz
  • IPv4 prefix origin: YYYYMM.origin.gz
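The line formats above could be parsed along these lines. This is a hedged sketch, not the archive's own Perl script; the names `Link`, `parse_link_line` and `parse_relationship_line` are hypothetical, and the ASNs in the usage lines are arbitrary sample values.

```python
from typing import NamedTuple, Optional, Tuple

RELATIONSHIPS = {"p2p", "p2c", "c2p", "unknown"}


class Link(NamedTuple):
    """One AS-to-AS link; freq is None for daily snapshots."""
    asn1: int
    asn2: int
    freq: Optional[int]


def parse_link_line(line: str) -> Link:
    """Parse '{ASN1}\\t{ASN2}' (daily) or '{ASN1}\\t{ASN2}\\t{FREQ}' (monthly)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) not in (2, 3):
        raise ValueError(f"unexpected number of fields: {line!r}")
    asn1, asn2 = int(fields[0]), int(fields[1])
    if not asn1 < asn2:
        raise ValueError("expected ASN1 < ASN2 in topology data")
    freq = int(fields[2]) if len(fields) == 3 else None
    return Link(asn1, asn2, freq)


def parse_relationship_line(line: str) -> Tuple[int, int, str]:
    """Parse '{ASN1}\\t{ASN2}\\t{p2p|p2c|c2p|unknown}'."""
    asn1, asn2, relationship = line.rstrip("\n").split("\t")
    if relationship not in RELATIONSHIPS:
        raise ValueError(f"unknown relationship label: {relationship}")
    return int(asn1), int(asn2), relationship


print(parse_link_line("174\t3356\t30\n"))
print(parse_relationship_line("3356\t174\tp2p\n"))
```

Since the files are gzip-compressed, they can be opened with Python's standard `gzip.open(path, "rt")` and fed line by line to these functions. Grouping relationship lines by their label gives four edge sets, which appears to be what the layer count of 4 in the table refers to.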

Caveats

Before drawing any conclusion from this dataset, please be aware that this dataset definitely suffers from the following issues:

  • Incompleteness: Quite a number of links are missing, mainly due to the limited scope of current BGP data collection.
  • Inaccuracy: Some short-life links may not exist actually, as they probably originate from unintentional misconfigurations or intentional trials.
  • Ambiguity: The differences between snapshots may not indicate the actual change of Internet topology, as BGP data collectors may be up and down, and routers peering with those collectors may come and go.

https://irl.cs.ucla.edu/topology/

https://irl.cs.ucla.edu/topology/ipv4/relationship/

Vickers-Chan-7thGraders_Multiplex_Social

The data were collected by Vickers from 29 seventh grade students in a school in Victoria, Australia. Students were asked to nominate their classmates on a number of relations including the following three (layers):

Who do you get on with in the class?
Who are your best friends in the class?
Who would you prefer to work with?

Students 1 through 12 are boys and 13 through 29 are girls.

Ref: M. Vickers and S. Chan - Representing Classroom Social Structure. Melbourne: Victoria Institute of Secondary Education. (1981)

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

Padgett-Florence-Families_Multiplex_Social

PADGETT FLORENTINE FAMILIES
The multiplex social network consists of 2 layers (marriage alliances and business relationships) describing Florentine families in the Renaissance.

Ref: JF Padgett, CK Ansell - “Robust Action and the Rise of the Medici, 1400-1434”. American journal of sociology, 1259-1319 (1993)

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

Lazega-Law-Firm_Multiplex_Social

The multiplex social network consists of 3 kinds of relationships (Co-work, Friendship and Advice) between partners and associates of a corporate law partnership.

Ref: Emmanuel Lazega - “The Collegial Phenomenon: The Social Mechanisms of Cooperation Among Peers in a Corporate Law Partnership”. Oxford University Press (2001)
Ref: Tom A.B. Snijders, Philippa E. Pattison, Garry L. Robins, and Mark S. Handcock - “New specifications for exponential random graph models”. Sociological Methodology (2006), 99-153.

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

Krackhardt-High-Tech_Multiplex_Social

The multiplex social network consists of 3 kinds of relationships (Advice, Friendship and “Reports to”) between managers of a high-tech company.

Ref: D. Krackhardt - “Cognitive social structures”. Social Networks (1987), 9, 104-134

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

Kapferer-Tailor-Shop_Multiplex_Social

Interactions in a tailor shop in Zambia (then Northern Rhodesia) over a period of ten months.
Layers represent two different types of interaction, recorded at two different times (seven months apart) over a period of one month. TI1 and TI2 record the “instrumental” (work- and assistance-related) interactions at the two times; TS1 and TS2 the “sociational” (friendship, socioemotional) interactions.
The data are particularly interesting since an abortive strike occurred after the first set of observations, and a successful strike took place after the second.

Ref: Kapferer B. (1972) - “Strategy and transaction in an African factory”.

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

CKM-Physicians-Innovation_Multiplex_Social

Data collected by Coleman, Katz and Menzel on medical innovation, considering physicians in four towns in Illinois: Peoria, Bloomington, Quincy and Galesburg.
They were concerned with the impact of network ties on the physicians' adoption of a new drug, tetracycline. Three sociometric matrices (layers) were generated, based on the following questions:

When you need information or advice about questions of therapy where do you usually turn?
And who are the three or four physicians with whom you most often find yourself discussing cases or therapy in the course of an ordinary week – last week for instance?
Would you tell me the first names of your three friends whom you see most often socially?

Ref: J. Coleman, E. Katz, and H. Menzel.- “The Diffusion of an Innovation Among Physicians”. Sociometry (1957) 20:253-270.

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

CS-Aarhus_Multiplex_Social

The multiplex social network consists of five kinds of online and offline relationships (Facebook, Leisure, Work, Co-authorship, Lunch) between the employees of the Computer Science department at Aarhus.

Ref: Matteo Magnani, Barbora Micenkova, Luca Rossi - Combinatorial Analysis of Multiple Networks. arXiv:1303.4986 (2013)

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

EUAir_Multiplex_Transport

The multilayer network is composed of thirty-seven different layers, each one corresponding to a different airline operating in Europe.

Ref: Alessio Cardillo, Jesús Gómez-Gardeñes, Massimiliano Zanin, Miguel Romance, David Papo, Francisco del Pozo and Stefano Boccaletti - Emergence of network features from multiplexity. Scientific Reports 3, Article number: 1344 doi:10.1038/srep01344

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

London_Multiplex_Transport

Data was collected in 2013 from the official website of Transport for London (https://www.tfl.gov.uk/) and manually cross-checked.
Nodes are train stations in London and edges encode existing routes between stations. Underground, Overground and DLR stations are considered (see https://www.tfl.gov.uk/ for further details). The multiplex network used in the paper makes use of three layers corresponding to:

The aggregation to a single weighted graph of the networks of stations corresponding to each underground line (e.g., District, Circle, etc)
The network of stations connected by Overground
The network of stations connected by DLR

Raw data and geographical coordinates of stations are provided. We also provide the multiplex networks after considering real disruptions occurring in London.

If you use this data you should cite the following paper:

Manlio De Domenico, Albert Solé-Ribalta, Sergio Gómez, and Alex Arenas, “Navigability of interconnected networks under random failures”. PNAS 111, 8351-8356 (2014)

For further details visit https://comunelab.fbk.eu/data.php

http://deim.urv.cat/~alephsys/data.php?tdsourcetag=s_pctim_aiomsg

NYClimateMarch2014_Multiplex_Social

We consider different types of social relationships among users, obtained from Twitter during an exceptional event. In this specific dataset we focused on the People’s Climate March in 2014.
The multiplex network used in the paper makes use of 3 layers, corresponding to retweet, mentions and replies observed between:
Start: 2014-09-19 00:46:19
End: 2014-09-22 06:56:25

Ref: E. Omodei, M. De Domenico, A. Arenas. - Characterizing interactions in online social networks during exceptional events… Front. Phys. 3, 59 (2015)

Files format: layerID nodeID nodeID weight

3 layers Multiplex

Nodes: 102439

Edges: 353495
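The "layerID nodeID nodeID weight" file format can be read into one edge list per layer. The sketch below assumes whitespace-separated fields and integer IDs, which the description does not state explicitly; `read_multiplex_edges` is a hypothetical helper name and the sample lines are made up.

```python
from collections import defaultdict


def read_multiplex_edges(lines):
    """Group 'layerID nodeID nodeID weight' lines into {layer: [(u, v, w), ...]}."""
    layers = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        layer_id, source, target, weight = line.split()
        layers[int(layer_id)].append((int(source), int(target), float(weight)))
    return dict(layers)


# Made-up sample lines, one edge in each of layers 1-3 plus a second edge in layer 1.
sample = ["1 1 2 1", "1 2 3 2", "2 1 3 1", "3 3 1 4"]
for layer, edges in read_multiplex_edges(sample).items():
    print(layer, edges)
```

The same reader should apply to the other datasets below that list the same file format.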

https://comunelab.fbk.eu/data.php

Cannes2013_Multiplex_Social

We consider different types of social relationships among users, obtained from Twitter during an exceptional event. In this specific dataset we focused on the Cannes Film Festival in 2013.
The multiplex network used in the paper makes use of 3 layers, corresponding to retweet, mentions and replies observed between:
Start: 2013-05-06 07:23:49
End: 2013-06-03 05:48:26

Ref: E. Omodei, M. De Domenico, A. Arenas. - Characterizing interactions in online social networks during exceptional events… Front. Phys. 3, 59 (2015)

Files format: layerID nodeID nodeID weight

3 layers Multiplex

Nodes: 438537

Edges: 991854

https://comunelab.fbk.eu/data.php

MoscowAthletics2013_Multiplex_Social

We consider different types of social relationships among users, obtained from Twitter during an exceptional event. In this specific dataset we focused on the 2013 World Championships in Athletics.
The multiplex network used in the paper makes use of 3 layers, corresponding to retweet, mentions and replies observed between:
Start: 2013-08-05 11:25:46
End: 2013-08-19 14:35:21

Ref: E. Omodei, M. De Domenico, A. Arenas. - Characterizing interactions in online social networks during exceptional events… Front. Phys. 3, 59 (2015)

Files format: layerID nodeID nodeID weight

3 layers Multiplex

Nodes: 88804

Edges: 210250

https://comunelab.fbk.eu/data.php

MLKing2013_Multiplex_Social

We consider different types of social relationships among users, obtained from Twitter during an exceptional event. In this specific dataset we focused on the 50th anniversary of Martin Luther King’s speech “I have a dream…” in 2013.
The multiplex network used in the paper makes use of 3 layers, corresponding to retweet, mentions and replies observed between:
Start: 2013-08-25 15:41:36
End: 2013-09-02 10:16:21

Ref: E. Omodei, M. De Domenico, A. Arenas. - Characterizing interactions in online social networks during exceptional events… Front. Phys. 3, 59 (2015)

Files format: layerID nodeID nodeID weight

3 layers Multiplex

Nodes: 327707

Edges: 396671

https://comunelab.fbk.eu/data.php

ObamaInIsrael2013_Multiplex_Social

We consider different types of social relationships among users, obtained from Twitter during an exceptional event. In this specific dataset we focused on a visit to Israel by US President Barack Obama in 2013.
The multiplex network used in the paper makes use of 3 layers, corresponding to retweet, mentions and replies observed between:
Start: 2013-03-19 16:56:29
End: 2013-04-03 23:24:34

Ref: E. Omodei, M. De Domenico, A. Arenas. - Characterizing interactions in online social networks during exceptional events… Front. Phys. 3, 59 (2015)

Files format: layerID nodeID nodeID weight

3 layers Multiplex

Nodes: 2281259

Edges: 4061960

https://comunelab.fbk.eu/data.php

Arabidopsis_Multiplex_Genetic

We consider different types of genetic interactions for organisms in the Biological General Repository for Interaction Datasets (BioGRID, thebiogrid.org), a public database that archives and disseminates genetic and protein interaction data from humans and model organisms. BioGRID currently includes more than 720,000 interactions that have been curated from both high-throughput data sets and individual focused studies using over 41,000 publications in the primary literature. We use BioGRID 3.2.108 (updated 1 Jan 2014). The present folder concerns Arabidopsis thaliana.
The multiplex network used in the paper makes use of the following layers:

Direct interaction
Physical association
Additive genetic interaction defined by inequality
Suppressive genetic interaction defined by inequality
Synthetic genetic interaction defined by inequality
Association
Colocalization

Ref: C. Stark, B. -J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers. - “Biogrid: a general repository for interaction datasets” - Nucleic Acids Research 2006 34 (1) D535–D539

M. De Domenico, V. Nicosia, A. Arenas, and V. Latora - “Structural reducibility of multilayer networks” - Nature Communications 2015 6, 6864

Files format: layerID nodeID nodeID weight
7 layers Multiplex

Nodes: 6980

Edges: 18654

https://comunelab.fbk.eu/data.php

Homo_Multiplex_Genetic

We consider different types of genetic interactions for organisms in the Biological General Repository for Interaction Datasets (BioGRID, thebiogrid.org), a public database that archives and disseminates genetic and protein interaction data from humans and model organisms. BioGRID currently includes more than 720,000 interactions that have been curated from both high-throughput data sets and individual focused studies using over 41,000 publications in the primary literature. We use BioGRID 3.2.108 (updated 1 Jan 2014). The present folder concerns Homo sapiens.
The multiplex network used in the paper makes use of the following layers:

Direct interaction
Physical association
Suppressive genetic interaction defined by inequality
Association
Colocalization
Additive genetic interaction defined by inequality
Synthetic genetic interaction defined by inequality

Ref: C. Stark, B. -J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers. - “Biogrid: a general repository for interaction datasets” - Nucleic Acids Research 2006 34 (1) D535–D539

Manlio De Domenico, Mason A. Porter, and Alex Arenas - “MuxViz: A Tool for Multilayer Analysis and Visualization of Networks” - Journal of Complex Networks 2015 3 (2) 159-176

Files format: layerID nodeID nodeID weight
7 layers Multiplex

Nodes: 18222

Edges: 170899

https://comunelab.fbk.eu/data.php

AMiner-Coauthor

This dataset is designed for research purposes only.

The content of this data includes paper information, paper citation, author information and author collaboration. 2,092,356 papers and 8,024,869 citations between them are saved in the file AMiner-Paper.rar; 1,712,433 authors are saved in the file AMiner-Author.zip and 4,258,615 collaboration relationships are saved in the file AMiner-Coauthor.zip.

| FileName | Node | Number | Size |
| --- | --- | --- | --- |
| AMiner-Paper.rar | Paper / Citation | 2,092,356 / 8,024,869 | 509 MB |
| AMiner-Author.zip | Author | 1,712,433 | 167 MB |
| AMiner-Coauthor.zip | Collaboration | 4,258,615 | 31.5 MB |

Supplement: The relationship between author id and paper id is given in AMiner-Author2Paper.zip. The 1st column is the index, the 2nd column is the author id, the 3rd column is the paper id, and the 4th column is the author’s position.

AMiner-Coauthor.zip saves the collaboration network among the authors in the second file (AMiner-Author.zip). The format is as follows:

#00 11 22 ---- 00 is the index id of one author, 11 is the index id of another author, and 22 is the number of collaborations between them

The following is an example:

#693708 1658058 2
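A line in this format can be split into a weighted edge as sketched below. The function name `parse_coauthor_line` is an assumption for illustration; the only input used is the example line above.

```python
def parse_coauthor_line(line: str):
    """Parse '#author_a author_b count' into (author_a, author_b, count)."""
    line = line.strip()
    if not line.startswith("#"):
        raise ValueError(f"expected a line starting with '#': {line!r}")
    author_a, author_b, count = line[1:].split()
    return int(author_a), int(author_b), int(count)


# Authors 693708 and 1658058 have collaborated 2 times.
print(parse_coauthor_line("#693708 1658058 2"))
```

The collaboration count can serve as the edge weight, which matches the "Weighted" entry for AMiner-Coauthor in the table.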
https://www.aminer.cn/billboard/aminernetwork

wikitree

WikiTree is a free, collaborative family-history website that contains more than 5 million user-contributed profiles of individuals who have lived in past centuries.

Please Cite:
Fire, M. and Elovici, Y. “Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population, 2013.”

Bibtex:
@article{
title={Data Mining of Online Genealogy Datasets for Revealing Lifespan Patterns in Human Population},
author={Fire, Michael and Elovici, Yuval },
journal={arXiv preprint arXiv:1311.427},
year={2013},
}

Number of nodes: 1,382,751
Number of edges: 9,192,212

http://proj.ise.bgu.ac.il/sns/wikitree.html

coauthor

Overview
We construct the evolving coauthor network from ArnetMiner. We collected 1,768,776 publications published from 1986 to 2012, with 1,629,217 authors involved. We regard each year as a time stamp, so there are 27 time stamps in total. At each time stamp, we create an edge between two authors if they have coauthored at least one paper in the most recent 3 years (including the current year). We convert the undirected coauthor network into a directed network by regarding each undirected edge as two symmetric directed edges.
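The construction described above (a 3-year window ending at the current year, with each undirected edge stored as two symmetric directed edges) might look like the following. This is a sketch of the stated rule, not the authors' code; the input layout (year, author list) and the name `coauthor_snapshot` are assumptions, and the papers are toy data.

```python
from itertools import combinations


def coauthor_snapshot(papers, year, window=3):
    """Return the directed coauthor edges at the given time stamp.

    papers: iterable of (publication_year, [author ids]).
    Two authors are linked if they share a paper published in the
    `window` most recent years, including `year` itself.
    """
    first_year = year - window + 1
    edges = set()
    for publication_year, authors in papers:
        if first_year <= publication_year <= year:
            for a, b in combinations(sorted(set(authors)), 2):
                edges.add((a, b))  # one undirected edge becomes
                edges.add((b, a))  # two symmetric directed edges
    return edges


toy_papers = [(2008, ["a", "b"]), (2010, ["b", "c"]), (2011, ["a", "c", "d"])]
print(sorted(coauthor_snapshot(toy_papers, 2010)))
print(sorted(coauthor_snapshot(toy_papers, 2012)))
```

In the toy data the a-b edge from 2008 is present at time stamp 2010 but has dropped out of the window by 2012.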

Description
Each file corresponds to one timestamp. Each line contains an edge denoting that person 1 coauthored with person 2 at the current timestamp.

Download
http://arnetminer.org/lab-datasets/dynamicinf/

Reference
Honglei Zhuang, Yihan Sun, Jie Tang, Jialin Zhang, and Xiaoming Sun. Influence Maximization in Dynamic Social Networks. In Proceedings of 2013 IEEE International Conference on Data Mining (ICDM’13). pp. 1313-1318.

https://www.aminer.cn/dynamic_coauthor


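  1. Node counts are approximate and may differ somewhat from the actual numbers. A structure such as 49290 + 139738 indicates a bipartite graph, with the node counts of the two sides on either side of the plus sign. The "~" symbol means the node count is not exact.

  2. Layer counts are approximate and may differ somewhat from the actual numbers. A structure such as -10.00 to +10.00 means that the layers of the dataset span a range of values.

  3. Most of the datasets listed include weights directly. "Unweighted" means that all weights in the dataset are identical; a blank entry means the dataset has not actually been inspected.

  4. This dataset is stored in Excel format and is too large for Excel to display in full.

  5. The relationship link no longer works, so the data cannot be downloaded for now.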

Reposted from blog.csdn.net/m0_43448982/article/details/103076309