Comparative Networks Data

The data comprise 304 social and biological networks. We began by coding each network into one of (currently) nine broad categories: Association, Biological, Ecological, Exchange, Friendship, Kinship, Perception, Support, and Transportation. These categories are described in greater detail below, and they allow us to study how the properties of these networks vary across different types of nodes and ties. While we make no claim that these categories are definitive, they serve as a basis for making comparisons between networks, or for looking at particular types of networks.

  1. Association: This category primarily captures relationships of group co-membership including the number of movies actors have co-starred in, whether two students went to the same school, or the number of scenes two characters in a book shared.
  2. Biological: This category includes metabolic, protein, and gene interaction networks. This category of networks is distinguished from ecological networks by the nodes, which are not autonomous in this classification.
  3. Ecological: This category includes interactions, flows, and relationships among animals and ecosystems. Some examples include dominance relationships among cattle, hens, and female sheep, the count of interactions between kangaroos, a monkey-grooming network, and the carbon flow network.
  4. Exchange: This category includes trade relationships at the national and local levels. Examples include the volumes of raw materials exchanged between countries and the amount of taro exchanged among 22 households in a Papuan village, as well as a number of communication networks.
  5. Friendship: This category records friendship relations between people in a number of different contexts (both in person and online). Some examples include the self-assessed friendship networks of high school and college students, prison inmates, bank employees, and monks.
  6. Kinship: This category includes networks of familial relationships, often recorded over a long time period.
  7. Perception: This category includes networks that were collected by asking respondents to give their perception of the relationships (romantic, social, friendship, etc.) among a group of their peers or subordinates.
  8. Support: This category primarily includes networks of social support and advice giving. Some examples include advice-giving networks in several firms, a law office, and the Harry Potter books, as well as legislative co-sponsorship networks.
  9. Transportation: This category includes transportation links between cities and countries. For example, one of the networks in this category records whether there is a direct flight between two U.S. cities.

The table below provides descriptive statistics for the networks in each of these categories (as well as for the entire dataset). These include the minimum, median, and maximum number of nodes in networks assigned to that category, the average proportion of non-zero edges in networks assigned to that category, and the count of networks assigned to that category. Friendship and Support networks are the two largest categories, and currently make up over half of the dataset. One important aspect of the dataset is that it does not include any particularly large networks. This is a conscious choice designed to ensure that most forms of statistical analysis can be applied to these networks, and that the resulting aggregate file sizes are not prohibitively large. Readers interested in performing comparative studies using very large networks may wish to look at the data available through Stanford’s SNAP Lab. In addition to the practical concerns associated with storing and analyzing very large networks, we also believe that there are likely to be substantial differences in the way that network processes operate at the scale of millions of nodes as opposed to the scale of tens of nodes. This focus on smaller networks is reflected in a median network size of just 34 nodes in the dataset, with a maximum network size of under 2,000 nodes.

| Category | Min. # Nodes | Median # Nodes | Max. # Nodes | Prop. Non-Zero Edges | # Networks |
|---|---|---|---|---|---|
| Association | 14 | 34 | 410 | 0.41 | 29 |
| Biological | 212 | 453 | 1706 | 0.01 | 3 |
| Ecological | 16 | 28 | 62 | 0.28 | 11 |
| Exchange | 10 | 24 | 293 | 0.3 | 17 |
| Friendship | 14 | 31 | 336 | 0.25 | 123 |
| Kinship | 20 | 25 | 25 | 0.14 | 5 |
| Perception | 44 | 44 | 44 | 0.05 | 39 |
| Support | 12 | 22 | 1899 | 0.31 | 75 |
| Transportation | 1174 | 1374 | 1574 | 0.01 | 2 |
| All Networks | 10 | 34 | 1899 | 0.25 | 304 |
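For readers who want to compute comparable statistics on their own networks, the following is a minimal Python sketch of the per-category summary. It is not the code used to produce the table. It assumes each network is stored as a square adjacency matrix (a list of lists), and that the "proportion of non-zero edges" is the share of off-diagonal entries that are non-zero; the treatment of self-ties and of directed versus undirected networks is an assumption here. The function names `prop_nonzero_edges` and `summarize_by_category` are hypothetical.

```python
from statistics import mean, median


def prop_nonzero_edges(adj):
    """Share of off-diagonal entries of a square adjacency matrix that are non-zero.

    Assumption: self-ties (the diagonal) are ignored and every ordered pair
    (i, j) counts as a possible edge.
    """
    n = len(adj)
    if n < 2:
        return 0.0
    nonzero = sum(
        1 for i in range(n) for j in range(n) if i != j and adj[i][j] != 0
    )
    return nonzero / (n * (n - 1))


def summarize_by_category(networks):
    """Summarize (category, adjacency matrix) pairs by category.

    Returns a dict mapping each category to its minimum, median, and maximum
    number of nodes, its mean proportion of non-zero edges, and its count.
    """
    grouped = {}
    for category, adj in networks:
        grouped.setdefault(category, []).append(adj)

    summary = {}
    for category, mats in grouped.items():
        sizes = [len(m) for m in mats]
        summary[category] = {
            "min_nodes": min(sizes),
            "median_nodes": median(sizes),
            "max_nodes": max(sizes),
            "prop_nonzero": mean([prop_nonzero_edges(m) for m in mats]),
            "n_networks": len(mats),
        }
    return summary


# Toy data for illustration only; these are not networks from the dataset.
toy = [
    ("Friendship", [[0, 1, 0], [1, 0, 0], [0, 0, 0]]),  # 2 of 6 entries non-zero
    ("Friendship", [[0, 2], [0, 0]]),                   # 1 of 2 entries non-zero
]
summary = summarize_by_category(toy)
```

Valued ties (such as the weight of 2 in the second toy network) count once, because only whether an entry is non-zero matters for this statistic.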

The figure below plots the size of the network (on a log scale) against the proportion of non-zero edges for each of the 304 networks in our dataset, with points colored by category. As we can see, there is a great deal of heterogeneity in the proportion of non-zero edges across different categories and network sizes, with communication networks showing some of the highest variability across these dimensions.
