A multi-species repository of social networks | Scientific Data
Our validation process consisted of data-type and constraint validation, structural validation, and cross-reference and ecological validation. All data collection and validation steps were carried out by two co-authors (PS and JM).
Data-type and constraint validation
The first step involved quality checks to ensure that the original data contained enough information to reconstruct the social network(s). All datasets were acquired in electronic format in one of four network data structures: edgelists, adjacency matrices, adjacency lists or group membership dataframes. All data was classified as node, edge or attribute data. All node ids were verified to be of the same type (e.g. integer or string). All edges were verified to connect nodes in the node list; any edge endpoint missing from the node list was added to it. All attribute data was verified to correspond to an existing node or edge.
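The checks described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' actual pipeline: the function name `validate_edgelist`, its arguments and the toy data are hypothetical, and it assumes the data has already been parsed into a node list, an edge list and a node-attribute dictionary.

```python
def validate_edgelist(nodes, edges, node_attrs):
    """Run basic type and constraint checks (hypothetical helper).

    Returns the possibly extended node list and a list of problem messages.
    """
    problems = []
    nodes = list(nodes)
    known = set(nodes)

    # Edge endpoints that are missing from the node list are added to it.
    for u, v in edges:
        for endpoint in (u, v):
            if endpoint not in known:
                known.add(endpoint)
                nodes.append(endpoint)

    # All node ids should share a single type (e.g. all int or all str).
    id_types = {type(n).__name__ for n in nodes}
    if len(id_types) > 1:
        problems.append(f"mixed node id types: {sorted(id_types)}")

    # Every attribute record must refer to an existing node.
    for node_id in node_attrs:
        if node_id not in known:
            problems.append(f"attribute for unknown node: {node_id!r}")

    return nodes, problems


# Toy data: node "c" appears only in an edge, and "z" only in the attributes.
nodes, problems = validate_edgelist(
    nodes=["a", "b"],
    edges=[("a", "b"), ("b", "c")],
    node_attrs={"a": {"sex": "F"}, "z": {"sex": "M"}},
)
print(nodes)
print(problems)
```

In this sketch, problems are collected and reported rather than raised, so that a dataset can be reviewed as a whole before any decision is made about it.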
Structural validation
We next validated the structural integrity of the network described in the original data-source by removing all edges that connected any node to itself (i.e. self loops). Any duplicate edges were also removed. Individuals with no edges (i.e. isolated nodes) were not removed from the network.
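A minimal NetworkX sketch of these structural clean-up steps follows. It assumes an undirected network and uses invented toy data; the variable names are illustrative and not taken from the repository's code.

```python
import networkx as nx

# Toy input: one duplicate edge (b-a), one self loop (a-a), one isolate (d).
all_individuals = ["a", "b", "c", "d"]
raw_edges = [("a", "b"), ("b", "a"), ("a", "a"), ("b", "c")]

G = nx.Graph()
G.add_nodes_from(all_individuals)  # isolated individuals stay in the network
G.add_edges_from(raw_edges)        # nx.Graph keeps a single edge per node pair

# Remove every edge that connects a node to itself.
G.remove_edges_from(list(nx.selfloop_edges(G)))

print(sorted(G.edges()))
print(list(nx.isolates(G)))
```

Because `nx.Graph` stores at most one edge per node pair, duplicate edges collapse on insertion. A directed network would need `nx.DiGraph`, in which `("a", "b")` and `("b", "a")` are distinct edges.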
The ASNR currently only contains static networks. Thus, multiple associations or interactions reported between the same node pair at different time-points were replaced with weighted edges, with weights representing the association/interaction frequency.
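The aggregation of repeated observations into weighted edges can be illustrated as follows. This is a sketch under stated assumptions: the interactions are undirected, each record is a hypothetical `(node, node, time)` tuple, and the weight is a simple count of records per node pair.

```python
from collections import Counter

# Hypothetical time-stamped records: (individual, individual, time-point).
timed_interactions = [
    ("a", "b", 1),
    ("b", "a", 2),
    ("a", "c", 2),
    ("a", "b", 3),
]

# Count records per unordered node pair, ignoring the time-point.
weights = Counter(
    tuple(sorted((u, v))) for u, v, _time in timed_interactions if u != v
)

# Static weighted edgelist: (node, node, interaction frequency).
weighted_edges = [(u, v, w) for (u, v), w in sorted(weights.items())]
print(weighted_edges)
```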
Cross-reference and ecological validation
To detect errors in the data mining and GraphML conversion process, we calculated network summary statistics (e.g. number of nodes, number of edges, clustering coefficient) for each network and cross-checked them against the network description in the original publication. The structure of each converted network file was also cross-checked for consistency with the ecological context of data collection. For example, networks of the same group of individuals of a species collected during the mating versus the non-mating season are expected to differ in network density.
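One way to automate such a cross-check is sketched below. The helper names `summarize` and `mismatches`, the tolerance, and the "reported" values are all hypothetical. In practice the graph could be loaded from a converted file with `nx.read_graphml`, and the reported values would come from the original publication.

```python
import networkx as nx


def summarize(G):
    """Summary statistics for one network (hypothetical helper)."""
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "avg_clustering": nx.average_clustering(G),
    }


def mismatches(G, reported, tol=1e-2):
    """Return {statistic: (computed, reported)} where the two disagree."""
    computed = summarize(G)
    return {
        name: (computed[name], value)
        for name, value in reported.items()
        if abs(computed[name] - value) > tol
    }


# Toy network with 4 nodes and 4 edges.
G = nx.Graph([("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")])

# Invented "published" values; the edge count is deliberately wrong.
reported = {"nodes": 4, "edges": 5}
diff = mismatches(G, reported)
print(diff)
```

A non-empty result flags a network for manual review; it does not by itself say whether the conversion or the published description is at fault.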
Data characterization
In the sections below, we characterize the phylogenetic and geographical distribution, data collection methodology, and structural similarity of the networks included in the repository.
Phylogenetic and geographic distribution
The phylogenetic distribution of the taxonomic groups currently included in the repository is shown in Fig. 1. Although mammals are the most studied taxon, the repository also includes social networks from other taxa, including reptiles, birds, insects, and fish.
Fig. 1
Phylogenetic distribution of non-human species included in the Animal Social Network Repository (ASNR). The first color strip includes the species’ scientific name, and is color coded according to the taxonomic class. The second color strip is coded according to the social interaction quantified in the network, and the third color strip is coded according to the weighting criteria of the network edges. Datasets with multiple species or an unspecified species name were not included in the figure.
The geographical locations where data for each social network were collected are shown in Fig. 2. The United States contributes the largest number of studies, and the repository also contains data from Central and South America, Europe, Africa, Asia and Australia. Additionally, most studies are of free-ranging populations.
Fig. 2
Geographical distribution of the social networks included in ASNR. The points indicate the geographical location where data for each social network was collected. The point size is proportional to the number of social networks collected at each location. Point color denotes whether the monitored animal populations were captive, semi-ranging or free-ranging.
Behavioral types
The behavioral data span a range of social associations from direct physical contacts such as grooming and trophallaxis to indirect interactions such as spatial proximity and association (Fig. 1).
Additionally, contact intensity was distributed across six categories: unweighted (i.e., all edges have weight equal to one), contact frequency, contact duration, simple ratio index [18], twice weight index [19], and half weight index [18] (Fig. 1).
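For readers unfamiliar with the three association indices, the sketch below computes them using the definitions commonly given in the animal association literature. These formulas are stated here as an assumption; the cited sources [18, 19] remain the authoritative definitions. Here `x` is the number of sampling periods in which individuals A and B were observed together, `y_ab` the number in which both were observed but apart, and `y_a` and `y_b` the numbers in which only A or only B was observed. The function name and the counts are hypothetical.

```python
def association_indices(x, y_ab, y_a, y_b):
    """Common association indices for one pair of individuals.

    Assumes at least one sampling period, so the denominators are non-zero.
    """
    return {
        "simple_ratio": x / (x + y_ab + y_a + y_b),
        "half_weight": x / (x + y_ab + 0.5 * (y_a + y_b)),
        "twice_weight": x / (x + 2 * y_ab + y_a + y_b),
    }


# Invented counts: together 6 times, both seen apart twice,
# A alone once, B alone once.
indices = association_indices(x=6, y_ab=2, y_a=1, y_b=1)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Under these definitions the three indices differ only in how much weight they give to sampling periods in which the pair was not seen together, so the same observations can produce different edge weights depending on the index chosen.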
Data collection methodology
Figure 3 summarizes the methodology and data collection techniques described in original data sources that were used to collect the networks. We highlight that studies rely on a variety of data collection methodologies and timescales, reflecting empirical constraints and the disparate scientific purposes of each study. It is important that future comparative studies take these differences into account [11].
Fig. 3
Duration, time resolution and technique of data collection of social networks included in the repository. mn = manual, RFID = radio-frequency identification.
Assessing network structure
We used the Python NetworkX package [20] to examine the structural properties of the social networks associated with each species. We calculated the following structural properties for each social network in the repository: total nodes, total edges, network density, network average degree, degree heterogeneity, degree assortativity, average clustering coefficient (unweighted and weighted), transitivity, average betweenness centrality (unweighted and weighted), Newman modularity, maximum modularity, relative modularity, group cohesion, and network diameter. These network metrics are defined in Table 1.
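As an illustration, several of these properties can be computed with NetworkX as sketched below. This is not the authors' script: the helper name `structural_properties` and the toy network are hypothetical, degree heterogeneity is assumed here to be the coefficient of variation of degree (Table 1 gives the definition actually used), and the modularity-based metrics and group cohesion are omitted.

```python
from statistics import mean, pstdev

import networkx as nx


def structural_properties(G):
    """Compute a subset of the listed metrics for one undirected network."""
    degrees = [d for _node, d in G.degree()]
    avg_degree = mean(degrees)
    return {
        "nodes": G.number_of_nodes(),
        "edges": G.number_of_edges(),
        "density": nx.density(G),
        "average_degree": avg_degree,
        # Assumption: coefficient of variation of the degree distribution.
        "degree_heterogeneity": pstdev(degrees) / avg_degree if avg_degree else 0.0,
        "avg_clustering": nx.average_clustering(G),
        "avg_clustering_weighted": nx.average_clustering(G, weight="weight"),
        "transitivity": nx.transitivity(G),
        "avg_betweenness": mean(nx.betweenness_centrality(G).values()),
        "avg_betweenness_weighted": mean(
            nx.betweenness_centrality(G, weight="weight").values()
        ),
        # Diameter is only defined for connected networks.
        "diameter": nx.diameter(G) if nx.is_connected(G) else None,
    }


# Toy weighted network: a triangle (a, b, c) plus a pendant node d.
G = nx.Graph()
G.add_weighted_edges_from(
    [("a", "b", 3), ("b", "c", 1), ("a", "c", 2), ("c", "d", 1)]
)
props = structural_properties(G)
for name, value in props.items():
    print(name, value)
```

Degree assortativity is available as `nx.degree_assortativity_coefficient(G)`; it is left out of the sketch because it requires NumPy. Note also that NetworkX treats the `weight` argument of `betweenness_centrality` as a distance, so association strengths may need to be inverted first, depending on the intended interpretation.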
Table 1 Structural properties of the networks described in ASNR.
In Fig. 4 we capture the structural similarity between the social networks included in the repository. Social networks of mammals tend to cluster together, although some structural overlap also exists with the social networks of insects and fish. Social networks that describe spatial proximity, physical contact or grooming interactions between individuals tend to be structurally similar.
Fig. 4
Graphical representation of similarity of networks based on six network metrics: degree heterogeneity, network density, average clustering coefficient, degree assortativity, betweenness centrality and relative modularity. Each node in the network represents a unique social group of an animal species, and an edge between two nodes indicates that their network structures are similar. If a social group contained more than one network (for example, snapshots of a temporal network), an average value was calculated for each network metric. A z-score of each network metric was calculated. Two social groups were considered to be structurally similar (and connected by an edge) if they were within one standard deviation of each other in the z-score distribution of all six network metrics. The figures on the left and right are identical except for node colors: (left) node colors indicate taxonomic classes. Green – Mammalia, orange – Aves, pink – Actinopterygii, yellow – Insecta and blue – Reptilia. (right) Node colors indicate type of interaction represented as edges. Pink – spatial proximity, green – grooming, light blue – social projection bipartite, orange – group membership, dark blue – physical contact, red – dominance interaction, dark green – trophallaxis, brown – foraging, purple – non-physical social interaction, teal – overall mix.
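The similarity rule in the caption can be sketched as follows. This is an interpretation rather than the authors' code: it assumes that "within one standard deviation" means an absolute z-score difference of at most 1 on every metric, it uses the population standard deviation, and it takes per-group metric values that have already been averaged over each group's networks. The function name and the toy values (two metrics instead of six) are hypothetical.

```python
from itertools import combinations
from statistics import mean, pstdev


def similarity_edges(group_metrics, threshold=1.0):
    """Connect groups whose z-scores differ by <= threshold on every metric."""
    groups = sorted(group_metrics)
    metric_names = sorted(next(iter(group_metrics.values())))

    # z-score each metric across all social groups.
    z = {g: {} for g in groups}
    for m in metric_names:
        values = [group_metrics[g][m] for g in groups]
        mu, sd = mean(values), pstdev(values)
        for g in groups:
            z[g][m] = (group_metrics[g][m] - mu) / sd if sd else 0.0

    # An edge requires similarity on all metrics, not just on average.
    return [
        (g, h)
        for g, h in combinations(groups, 2)
        if all(abs(z[g][m] - z[h][m]) <= threshold for m in metric_names)
    ]


# Invented per-group averages for two of the six metrics.
group_metrics = {
    "g1": {"density": 0.10, "clustering": 0.20},
    "g2": {"density": 0.12, "clustering": 0.25},
    "g3": {"density": 0.90, "clustering": 0.80},
}
edges = similarity_edges(group_metrics)
print(edges)
```

Requiring agreement on every metric makes the rule conservative: two groups that match on five metrics but differ strongly on the sixth are not connected.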