Network datasets – Katya Ognyanova
Since I started posting network tutorials on this site, people occasionally write to ask me about the included example datasets. I also get e-mails from people asking where they might find network data to use in a project or in teaching. It seems like a good idea to post a quick reply here.
The datasets included in my tutorials are mostly synthetic (or trimmed and heavily manipulated) in order to illustrate various visualization aspects in a manageable way. Feel free to use those datasets (citing or linking to the source is appreciated), but keep in mind that they are artificially generated and not the result of actual data collection. When I do use empirical data, the download files include documentation (if the data is collected by me) or clearly point to the source (if the data was collected by someone else).
If you are looking for network data, large or small, there are a number of excellent open online repositories worth exploring. Below is a short list (feel free to e-mail me if you have other good links, and I will add them here).
Another good place where you can find a collection of links to network resources (including data repositories) is the Awesome Network Analysis list curated by François Briatte.
If you are looking for network data to use in teaching, I would also recommend having students collect social media data. For graduate students, R packages like twitteR and Rfacebook may be a good way to do this. For undergraduate students, I recommend NodeXL, an intuitive, easy-to-use Excel add-on that can grab data from Facebook, Twitter, YouTube, and other sources.