Datasets

For the convenience of tnet users, I have collected a number of network datasets that were available on the Internet, and made them conform to the required standard. If you have a network that you would like to add to this page or if there are any mistakes or conflicts of interest, please contact me.

Note: Please do cite the mentioned reference if you use a dataset.

To make it easier for other researchers, it is possible to download the networks in their native form and in transformed versions. For example, the Facebook-like Social Network is available as a longitudinal one-mode network (native form) and as a static one-mode network. Two-mode networks are transformed to weighted one-mode networks as described on the projecting two-mode networks onto weighted one-mode networks-page.
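As a rough illustration of the two projection variants that appear in the file lists on this page (sum and Newman's method), the sketch below projects a made-up binary two-mode edgelist using base R only. It is not tnet's own implementation; the data and the column names (i for the first node set, p for the second, j for the other end of a projected tie) are assumptions chosen for the example. Newman's method is taken here to discount each shared second-mode node by one over its number of members minus one.

```r
# Toy binary two-mode edgelist (made-up data):
# i = person (first node set), p = event or paper (second node set)
tm <- data.frame(i = c(1, 2, 3, 1, 2, 3, 4),
                 p = c(1, 1, 1, 2, 2, 3, 3))

# Number of people attached to each second-mode node
n_p <- table(tm$p)

# Pair up all people who share a second-mode node (self-join on p)
pairs <- merge(tm, tm, by = "p")
names(pairs) <- c("p", "i", "j")
pairs <- pairs[pairs$i != pairs$j, ]   # drop pairings of a person with themselves

# Contribution of each shared node to the tie weight
pairs$count  <- 1                                                  # sum method
pairs$newman <- 1 / (as.vector(n_p[as.character(pairs$p)]) - 1)    # Newman's method

# Add up the contributions per ordered pair of people
proj <- aggregate(cbind(count, newman) ~ i + j, data = pairs, FUN = sum)
proj <- proj[order(proj$i, proj$j), ]
print(proj)
```

In this toy example, persons 1 and 2 share a three-person event and a two-person event, so their tie gets a weight of 2 under the sum method and 0.5 + 1 = 1.5 under Newman's method.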

For instructions on how to load the datasets in tnet and UCINET, see the end of this page.

Network 1: Facebook-like Social Network

The Facebook-like Social Network originates from an online community for students at the University of California, Irvine. The dataset includes the users that sent or received at least one message (1,899). A total of 59,835 online messages were sent over 20,296 directed ties among these users. This dataset was the main dataset used in my Ph.D. thesis. This network has also been described in Patterns and Dynamics of Users’ Behaviour and Interaction: Network Analysis of an Online Community and used in a number of articles including Prominence and control: The weighted rich-club effect and Clustering in weighted networks. Although this dataset contains many nodal attributes (e.g., gender, age, and course attended), these are not made available as it would be possible to reverse engineer the anonymisation procedure of users. Self-loops in the longitudinal edgelist signal the time that users registered on the site.

  • Weighted longitudinal one-mode network (weighted by number of characters): tnet-format
  • Binary longitudinal one-mode network: tnet-format
  • Weighted static one-mode network (weighted by number of characters): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of messages): tnet-format; UCINET-format

Opsahl, T., Panzarasa, P., 2009. Clustering in weighted networks. Social Networks 31 (2), 155-163, doi: 10.1016/j.socnet.2009.02.002
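Because self-loops in the longitudinal edgelist mark registrations rather than messages, it can be useful to split them off before analysing the message ties. The sketch below does this in base R on made-up rows; the four-column layout (time stamp, sender, receiver, weight), the column names, and the time stamps are assumptions for illustration and should be checked against the downloaded file.

```r
# Toy longitudinal edgelist (made-up rows; assumed columns: time, sender, receiver, weight)
long <- data.frame(
  t = c("2004-04-15 10:00:00", "2004-04-16 09:30:00",
        "2004-04-16 12:00:00", "2004-04-17 08:15:00"),
  i = c(1, 2, 1, 2),
  j = c(1, 2, 2, 1),
  w = c(1, 1, 1, 1),
  stringsAsFactors = FALSE
)

# Self-loops (sender equals receiver) record when a user registered
registrations <- long[long$i == long$j, c("t", "i")]

# All remaining rows are messages between two different users
messages <- long[long$i != long$j, ]

print(registrations)
print(messages)
```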

Network 2: Facebook-like Forum Network

The Facebook-like Forum Network was obtained from the same online community as the online social network; however, the focus in this network is not on the private messages exchanged among users, but on users’ activity in the forum. The forum represents an interesting two-mode network among 899 users and 522 topics in that a weight can be assigned to the ties based on the number of messages or characters that a user posted to a topic. When transforming this weighted two-mode network into a one-mode network, I have maintained the users as I believe these are directly responsible for the tie generation. The number of users in this network is smaller than in the online social network as not all users that sent or received private messages participated in the forum. Note that the identification numbers do not match those of the online social network. The two-mode networks are projected onto one-mode networks using the procedure outlined on the projecting two-mode networks onto weighted one-mode networks-page.

  • Weighted longitudinal two-mode network (weighted by number of characters): tnet-format
  • Binary longitudinal two-mode network: tnet-format
  • Weighted static two-mode network (weighted by number of messages): tnet-format
  • Weighted static two-mode network (weighted by number of characters): tnet-format
  • Weighted static one-mode network (weighted by number of messages; sum): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of characters; sum): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of messages; Newman’s method): tnet-format; UCINET-format
  • Weighted static one-mode network (weighted by number of characters; Newman’s method): tnet-format; UCINET-format

Opsahl, T. 2013. Triadic closure in two-mode networks: Redefining the global and local clustering coefficients. Social Networks 35 (2), 159-167, doi: 10.1016/j.socnet.2011.07.001.

Network 3-5: Freeman’s EIES dataset

This dataset is Freeman’s EIES networks (Freeman and Freeman, 1979), also used in Wasserman and Faust (1994). It was collected in 1978 and contains three networks of researchers working on social network analysis. The first network contains the personal relationships among 48 of the researchers at the beginning of the study (time 1). The second network contains the personal relationships at the end of the study (time 2). In these two networks, all ties have a weight between 0 and 4: 4 represents a close personal friend of the researcher’s; 3 represents a friend; 2 represents a person the researcher has met; 1 represents a person the researcher has heard of, but not met; and 0 represents a person unknown to the researcher. The third network is different. It is a matrix with the number of messages sent among 32 of the researchers that used an electronic communication tool (frequency matrix).

There are three pieces of information about each of the 32 researchers that were part of the third network (nodal attributes): their name, the main disciplinary affiliation (1: sociology; 2: anthropology; 3: mathematics or statistics; and 4: others), and the number of citations each researcher had in the Social Science Citation Index in 1978.

  • Weighted static one-mode network (personal relationships; time 1): tnet-format; UCINET-format
  • Weighted static one-mode network (personal relationships; time 2): tnet-format; UCINET-format
  • Weighted static one-mode network (messages): tnet-format; UCINET-format

Freeman, S.C., Freeman, L.C., 1979. The networkers network: A study of the impact of a new communications medium on sociometric structure. Social Science Research Reports 46. University of California, Irvine, CA.

Network 6: The Caenorhabditis elegans worm’s neural network

This dataset contains the neural network of the Caenorhabditis elegans worm (C. elegans). It was studied by Watts and Strogatz (1998). The network contains 306 nodes that represent neurons. Two neurons are connected if at least one synapse or gap junction exists between them. The weight is the number of synapses and gap junctions. This network was obtained from the Collective Dynamics Group’s website. Note: This network contained 14 duplicated ties (i.e., a tie was mentioned twice in the edgelist). In the files available here, the duplicated tie pairs are merged, and the weight is the sum of the two identical ties.
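The merging of duplicated ties described in the note can be sketched in base R as follows. The edgelist is made up, and the column names (i, j, w for sender, receiver, and weight) are labels chosen for the example; this is an illustration of the idea, not the script used to prepare the files.

```r
# Toy weighted edgelist in which the tie 1 -> 2 is listed twice (made-up values)
el <- data.frame(i = c(1, 1, 2, 1),
                 j = c(2, 3, 3, 2),
                 w = c(3, 1, 2, 4))

# Count the rows that repeat an earlier sender-receiver pair
n_duplicated <- sum(duplicated(el[, c("i", "j")]))

# Merge duplicated ties by summing their weights
merged <- aggregate(w ~ i + j, data = el, FUN = sum)
merged <- merged[order(merged$i, merged$j), ]

print(n_duplicated)
print(merged)
```

Here the two rows for the tie from node 1 to node 2 (weights 3 and 4) collapse into a single row with a weight of 7.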

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.

Network 7: Norwegian Interlocking Directorate (August 2009)

This is the interlocking directorate among 384 public limited companies in Norway (Allmennaksjeselskap or ASA). The list of companies is created by selecting all companies listed as public limited companies on the website of the Norwegian Business Register on August 5, 2009. For each company, we downloaded public announcements containing changes to the boards’ composition since November 1999. From these announcements, we extracted monthly affiliation (or two-mode) networks since May 2002 (see website for choice of cut-off). Corresponding one-mode projections are also available. We strive to keep the data updated by downloading new announcements around the middle of each month.

As we do not add new companies to the list, but remove companies that file a bankruptcy notice, the dataset is shrinking. This was also the case with the data used in the original paper (Seierstad and Opsahl, 2011). Although the paper is based on August 1, 2009, data, 17 companies had given a bankruptcy notice by this time. Thus, there were only 367 companies with 1,495 directors.

This dataset contains some nodal attributes. The directors’ and companies’ names are known. In addition, for the companies, the city and post code of their registered office are known, while for the directors, the gender is known.

The data files are available through www.boardsandgender.com along with a description of how the data is collected and directors’ gender determined.

Seierstad, C., Opsahl, T., 2011. For the few not the many? The effects of affirmative action on presence, prominence, and social capital of women directors in Norway. Scandinavian Journal of Management 27 (1), 44-54, doi: 10.1016/j.scaman.2010.10.002

Network 8-11: Intra-organisational networks

This dataset contains four intra-organisational networks. Two are from a consulting company (46 employees) and two are from a research team in a manufacturing company (77 employees). These networks were used by Cross and Parker (2004).

In the first network, the ties are differentiated on a scale from 0 to 5 in terms of frequency of information or advice requests (“Please indicate how often you have turned to this person for information or advice on work-related topics in the past three months”). 0: I Do Not Know This Person; 1: Never; 2: Seldom; 3: Sometimes; 4: Often; and 5: Very Often.

In the second network, ties are differentiated in terms of the value placed on the information or advice received (“For each person in the list below, please show how strongly you agree or disagree with the following statement: In general, this person has expertise in areas that are important in the kind of work I do.”). The weights in this network are also based on a scale from 0 to 5. 0: I Do Not Know This Person; 1: Strongly Disagree; 2: Disagree; 3: Neutral; 4: Agree; and 5: Strongly Agree.

In the third network, the ties among the researchers are differentiated in terms of advice (“Please indicate the extent to which the people listed below provide you with information you use to accomplish your work”). The weights are based on the following scale: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Very Infrequently; 2: Infrequently; 3: Somewhat Infrequently; 4: Somewhat Frequently; 5: Frequently; and 6: Very Frequently.

The fourth network is based on the employees’ awareness of each others’ knowledge and skills (“I understand this person’s knowledge and skills. This does not necessarily mean that I have these skills or am knowledgeable in these domains but that I understand what skills this person has and domains they are knowledgeable in”). The weight scale in this network is: 0: I Do Not Know This Person/I Have Never Met this Person; 1: Strongly Disagree; 2: Disagree; 3: Somewhat Disagree; 4: Somewhat Agree; 5: Agree; and 6: Strongly Agree.

In addition to the relational data, the dataset also contains information about the people (nodal attributes). The following attributes are known for the consultancy firm: the organisational level (1: Research Assistant; 2: Junior Consultant; 3: Senior Consultant; 4: Managing Consultant; 5: Partner), gender (1: male; 2: female), region (1: Europe; 2: USA), and location (1: Boston; 2: London; 3: Paris; 4: Rome; 5: Madrid; 6: Oslo; 7: Copenhagen).

For the researchers in the manufacturing company, the following attributes are known: location (1: Paris; 2: Frankfurt; 3: Warsaw; 4: Geneva), tenure (1: 1-12 months; 2: 13-36 months; 3: 37-60 months; 4: 61+ months) and the organisational level (1: Global Dept Manager; 2: Local Dept Manager; 3: Project Leader; 4: Researcher).

Cross, R., Parker, A., 2004. The Hidden Power of Social Networks. Harvard Business School Press, Boston, MA.

Network 12: Newman’s scientific collaboration network

This is the co-authorship network based on preprints posted to the Condensed Matter section of the arXiv E-Print Archive between 1995 and 1999. This dataset can be classified as a two-mode or affiliation network since there are two types of “nodes” (authors and papers) and connections exist only between different types of nodes. The two-mode network is projected onto one-mode networks using the procedure outlined on the projecting two-mode networks onto weighted one-mode networks-page. In addition to the network data, the names of the authors (369kb) are available.

  • Binary static two-mode network: tnet-format (659kb)
  • Binary static one-mode network: tnet-format (1.21mb)
  • Weighted static one-mode network (sum of joint papers): tnet-format (1.21mb)
  • Weighted static one-mode network (Newman’s projection method): tnet-format (1.98mb); UCINET-format (1.98mb)

This network was provided by Mark Newman.

Newman, M. E. J., 2001. The structure of scientific collaboration networks. PNAS 98, 404-409.

Network 13: Davis’ Southern Women Club

This dataset was collected by Davis and colleagues in the 1930s. It contains the observed attendance at 14 social events by 18 Southern women. For a more detailed description, see Davis et al. (1941) or Wasserman and Faust (1994). The first names of the women are also available (1kb).

  • Binary static two-mode network: tnet-format (1kb)
  • Binary static one-mode network: tnet-format (3kb)
  • Weighted static one-mode network (co-attended events): tnet-format (3kb)
  • Weighted static one-mode network (Newman’s projection method): tnet-format (7kb); UCINET-format (7kb)

Davis, A., Gardner, B. B., Gardner, M. R., 1941. Deep South. University of Chicago Press, Chicago, IL.

Network 14: The network of airports in the United States

There are three US airport networks. The first is the network of the 500 busiest commercial airports in the United States. This dataset was used in Colizza et al. (2007). A tie exists between two airports if a flight was scheduled between them in 2002. The weights correspond to the number of seats available on the scheduled flights. Even though this type of network is directed by nature, as a flight is scheduled from one airport to another, the networks are highly symmetric (Barrat et al., 2004). Therefore, this version of the network is undirected (i.e., the weight of the tie from one airport towards another is equal to the weight of the reciprocal tie). This network was obtained from the Complex Networks Collaboratory’s website.

Colizza, V., Pastor-Satorras, R., Vespignani, A., 2007. Reaction-diffusion processes and metapopulation models in heterogeneous networks. Nature Physics 3, 276-282.

The second dataset is the complete US airport network in 2010. This is the network used in the first part of the Why Anchorage is not (that) important: Binary ties and Sample selection-blog post. The data is downloaded from the Bureau of Transportation Statistics (BTS) Transtats site (Table T-100; id 292) with the following filters: Geography=all; Year=2010; Months=all; and columns: Passengers, Origin, Dest. Based on this table, the airport codes are converted into id numbers, and the weights of duplicated ties are summed up. Ties with a weight of 0 (cargo only) and self-loops are also removed.
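The cleaning steps described above can be sketched in base R as follows. The rows are made up and merely stand in for the downloaded table; the column names Passengers, Origin, and Dest follow the filter listed in the text, while the output names (i, j, w) and the alphabetical assignment of id numbers are assumptions for the example rather than the procedure actually used.

```r
# Toy stand-in for the downloaded table (made-up rows)
t100 <- data.frame(
  Origin     = c("ANC", "ANC", "SEA", "SEA", "JFK", "JFK"),
  Dest       = c("SEA", "SEA", "ANC", "JFK", "JFK", "ANC"),
  Passengers = c(100, 50, 120, 0, 10, 30),
  stringsAsFactors = FALSE
)

# Convert airport codes into id numbers (here: alphabetical order of the codes)
codes <- sort(unique(c(t100$Origin, t100$Dest)))
el <- data.frame(i = match(t100$Origin, codes),
                 j = match(t100$Dest, codes),
                 w = t100$Passengers)

# Sum the weights of duplicated ties
el <- aggregate(w ~ i + j, data = el, FUN = sum)

# Remove ties without passengers (cargo only) and self-loops
el <- el[el$w > 0, ]
el <- el[el$i != el$j, ]
el <- el[order(el$i, el$j), ]

print(codes)
print(el)
```

Keeping the codes vector makes it possible to translate the id numbers back into airport codes later, since position k in codes corresponds to id k.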

The third dataset was also used in the Why Anchorage is not (that) important: Binary ties and Sample selection-blog post. The data is downloaded from Openflights.org. Unlike the BTS data, this dataset contains ties between two non-US-based airports. As such, it gives a more complete picture and avoids the sample selection issue. The weights in this network refer to the number of routes between two airports. Airport attributes are available.

Network 15: The US power grid

This network is the high-voltage power grid in the Western States of the United States of America. The nodes are transformers, substations, and generators, and the ties are high-voltage transmission lines. This network was originally used in Watts and Strogatz (1998). Although the transmission lines can be directed and differentiated based on their capacity, this information is not available.

Watts, D. J., Strogatz, S. H., 1998. Collective dynamics of “small-world” networks. Nature 393, 440-442.

How to load datasets

tnet

To use tnet, you first need to download and install R and then download and install tnet within R (information from tnet’s website). You only need to do these steps once. Every time that you start R, you need to load tnet. You can do this by writing the following command:

library(tnet)

A dataset can be loaded by writing a command similar to:

net <- read.table("<link to dataset>")

where <link to dataset> is the link to the dataset in the above table, e.g., Freeman’s third EIES network can be loaded by the following command:

net <- read.table("http://opsahl.co.uk/tnet/datasets/Freemans_EIES-3_n32.txt")
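After loading a weighted one-mode network, it can help to label the columns and check the basic counts against the description of the dataset. The sketch below uses a made-up three-line edgelist read from a string instead of a download, so that it runs offline; the column names (i for sender, j for receiver, w for weight) are labels chosen for the example.

```r
# Made-up three-line edgelist standing in for a downloaded file
raw <- "1 2 4\n2 1 2\n1 3 1"
net <- read.table(text = raw)

# Label the columns: sender id, receiver id, tie weight
names(net) <- c("i", "j", "w")

# Quick checks: structure, number of nodes, and number of ties
str(net)
n_nodes <- length(unique(c(net$i, net$j)))
n_ties  <- nrow(net)
print(c(nodes = n_nodes, ties = n_ties))
```

For a real dataset, replacing the read.table(text = raw) call with the read.table("<link to dataset>") command shown above should give the same three-column layout for the weighted one-mode files.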

UCINET

To use UCINET, you need to download and install UCINET (information on UCINET’s website). This programme is not free, but there is a 30-day trial period.

To load a dataset, you must download and save the dl-file of the dataset you wish to study from the above table to your computer. The network can be imported into UCINET by using the DL import function. You can find this function through the menu: Data > Import > DL. When the function’s dialog box opens, you must select the downloaded file containing the dataset by clicking on “…” after “Input text file in DL format”. The second box can be set to default, but do remember, and change if you wish, the name that appears in the third box as this will be the name of the internal UCINET file.
