Network data

This page contains links to some network data sets I’ve compiled over the
years. All of these are free for scientific use to the best of my
knowledge, meaning that the original authors have already made the data
freely available, or that I have consulted the authors and received
permission to post the data here, or that the data are mine. If you
make use of any of these data, please cite the original sources.

The data sets are in GML format. For a description of GML see here. GML can
be read by many network analysis packages, including Gephi and Cytoscape.
I’ve written a simple parser in C that will read the files into a data
structure. It’s available here. There are many features of GML not supported
by this parser, but it will read the files in this repository just fine.
There is a Python parser for GML available as part of the NetworkX package
here and another in the igraph package, which can be used from C, Python, or
R. If you know of or develop other software (Java, C++, Perl, R, Matlab,
etc.) that reads GML, let me know.
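To give a feel for what reading a GML file into a data structure involves, here is a minimal sketch in Python using only the standard library. It is not the C parser mentioned above and, like that parser, it ignores many features of GML; the function names and the sample graph are made up for illustration. It assumes each node has an integer id and each edge has a source and a target.

```python
import re

# A token is a quoted string, a bracket, or a run of other non-space characters.
_TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]]+')


def tokenize(text):
    """Split GML text into tokens, skipping whole-line '#' comments."""
    lines = [ln for ln in text.splitlines() if not ln.lstrip().startswith("#")]
    return _TOKEN.findall("\n".join(lines))


def parse_value(tok):
    """Convert a scalar token to str, int, or float (bare words stay as str)."""
    if tok.startswith('"'):
        return tok[1:-1]
    for cast in (int, float):
        try:
            return cast(tok)
        except ValueError:
            pass
    return tok


def parse_list(tokens, pos):
    """Parse key/value pairs until a closing ']' or the end of input.

    Returns the list of (key, value) pairs and the position reached.
    """
    items = []
    while pos < len(tokens) and tokens[pos] != "]":
        key, val = tokens[pos], tokens[pos + 1]
        if val == "[":
            sub, pos = parse_list(tokens, pos + 2)
            pos += 1  # step over the closing ']'
            items.append((key, sub))
        else:
            items.append((key, parse_value(val)))
            pos += 2
    return items, pos


def read_gml(text):
    """Return (nodes, edges, directed) for the first graph in the text."""
    top, _ = parse_list(tokenize(text), 0)
    graph = next(v for k, v in top if k == "graph")
    nodes, edges, directed = {}, [], False
    for key, val in graph:
        if key == "node":
            attrs = dict(val)
            nodes[attrs["id"]] = attrs
        elif key == "edge":
            attrs = dict(val)
            edges.append((attrs["source"], attrs["target"]))
        elif key == "directed":
            directed = (val == 1)
    return nodes, edges, directed


# Hypothetical three-node example, not one of the data sets on this page.
SAMPLE = '''
Creator "example"
graph [
  directed 0
  node [ id 0 label "A" ]
  node [ id 1 label "B" ]
  node [ id 2 label "C" ]
  edge [ source 0 target 1 ]
  edge [ source 1 target 2 value 2.5 ]
]
'''

nodes, edges, directed = read_gml(SAMPLE)
print(len(nodes), "nodes,", len(edges), "edges, directed:", directed)
```

If you have NetworkX installed, its read_gml function does the same job far more completely. Passing label='id' tells it to name nodes by their id field, which helps with files whose nodes have no labels.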

Data sets

Other sources of network data

There are a number of other pages on the web from which you can download
network data. Here are a few that I am aware of:

  • UCINet
    data sets: Social network data sets released with the UCINet software
    by Steve Borgatti et al.
  • Pajek
    data sets: Example data sets released with the Pajek software by
    Vladimir Batagelj and Andrej Mrvar.

  • Indiana University
    data sets: A set of very large data sets, including some non-network
    data sets, compiled by the School of Library and Information Science at
    Indiana University. Network data sets include the NBER data set of US
    patent citations and a data set of links between articles in the on-line
    encyclopedia Wikipedia.

  • Duncan Watts’ data
    sets: Data compiled by Prof. Duncan Watts and collaborators at Columbia
    University, including data on the structure of the Western States Power
    Grid and the neural network of the worm C. elegans.

  • Laszlo Barabasi’s
    data sets: Data compiled by Prof. Albert-Laszlo Barabasi and
    collaborators at the University of Notre Dame, including web data and
    biochemical networks.

  • Alex
    Arenas’s data sets: Data compiled by Prof. Alexandre Arenas and
    collaborators at Universitat Rovira i Virgili, including metabolic network
    data and the network from their study of the collaboration patterns of jazz
    musicians.

  • Stanford Large
    Network Dataset Collection: A substantial collection of data sets
    describing very large networks, including social networks, communications
    networks, and transportation networks.

Last modified: April 19, 2013