Different dataset forms in Social Networks – GeeksforGeeks

Prerequisite: Python Basics

To construct a network we need a good dataset, and different network datasets store their data in different formats. A dataset is simply a large collection of data that can be used for further analysis, and that data can come in many formats.

Examples of network datasets:

  • Ingredients network.
  • Synonymy network.
  • Web graph.
  • Zachary Karate Club network.

Types of formats of datasets:

  • CSV (Comma-Separated Values): Files have the extension .txt or .csv. A CSV file can store a network in one of two layouts: edge list or adjacency list.
    Example:
    • EdgeList format: Each row describes one edge and contains two nodes. The first node is the source node and the second node is the target node. A weight can be added to the row if required.
          0 5
          0 11
          0 34
          0 45
          1 56
          1 67
          1 76
          1 89
      
    • Adjacency list format: Each row contains two or more nodes. The first node is the source node, and the subsequent nodes in the same row are directly connected to it. For example, in the first row, node 1 is directly connected to nodes 2, 4 and 6.
      1 2 4 6
      2 3 4
      3 2 4 6
      4 6 2 3
      6 1 3
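
Both layouts can be read with a few lines of plain Python. The following is a minimal sketch, assuming whitespace-separated integer node IDs as in the samples above; the function names `parse_edge_list` and `parse_adjacency_list` are hypothetical helpers, not part of any library.

```python
def parse_edge_list(text):
    """Return a list of (source, target) pairs, one per non-empty row."""
    edges = []
    for row in text.splitlines():
        fields = row.split()
        if len(fields) < 2:
            continue  # skip blank or incomplete rows
        edges.append((int(fields[0]), int(fields[1])))
    return edges


def parse_adjacency_list(text):
    """Return {source: [neighbours]} from rows of the form 'source n1 n2 ...'."""
    adjacency = {}
    for row in text.splitlines():
        fields = row.split()
        if not fields:
            continue  # skip blank rows
        source, *neighbours = (int(f) for f in fields)
        adjacency[source] = neighbours
    return adjacency


# The two samples from the text above.
edge_text = """0 5
0 11
0 34
0 45
1 56
1 67
1 76
1 89"""

adjacency_text = """1 2 4 6
2 3 4
3 2 4 6
4 6 2 3
6 1 3"""

edges = parse_edge_list(edge_text)
adjacency = parse_adjacency_list(adjacency_text)
print(edges[:2])     # the first two edges
print(adjacency[1])  # the nodes directly connected to node 1
```

The edge list maps naturally to a list of pairs, while the adjacency list maps to a dictionary keyed by source node, which makes neighbour lookups direct.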
  • GML (Graph Modeling Language): It is one of the most commonly used formats for network datasets because it is simple and makes it easy to assign attributes to nodes and edges.
    graph
    [
      node
      [
       id 1
       label "Node 1"
      ]
      node
      [
       id 2
       label "Node 2"
      ]
      node
      [
       id 3
       label "Node 3"
      ]
      edge
      [
       source 2
       target 1
       label "Edge 2 to 1"
      ]
      edge
      [
       source 3
       target 1
       label "Edge 3 to 1"
      ]
    ]
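
The nested key/value structure of GML can be read with a small recursive parser. This is only a sketch under stated assumptions: it handles integer values, double-quoted strings and bracketed blocks, which is all the sample above uses, and the names `TOKEN` and `parse_gml_block` are hypothetical.

```python
import re

# A token is a quoted string, an opening or closing bracket, or a bare word/number.
TOKEN = re.compile(r'"[^"]*"|\[|\]|[^\s\[\]]+')


def parse_gml_block(tokens, pos=0):
    """Parse 'key value' pairs until a closing ']' or the end of the input.

    Returns (pairs, next_pos). Pairs are kept as a list of (key, value)
    tuples because keys such as 'node' and 'edge' repeat.
    """
    pairs = []
    while pos < len(tokens) and tokens[pos] != "]":
        key, value = tokens[pos], tokens[pos + 1]
        pos += 2
        if value == "[":
            value, pos = parse_gml_block(tokens, pos)
            pos += 1  # step over the closing ']'
        elif value.startswith('"'):
            value = value[1:-1]  # strip the surrounding quotes
        else:
            value = int(value)  # assumption: bare values are integers
        pairs.append((key, value))
    return pairs, pos


gml_text = """
graph
[
  node [ id 1 label "Node 1" ]
  node [ id 2 label "Node 2" ]
  node [ id 3 label "Node 3" ]
  edge [ source 2 target 1 label "Edge 2 to 1" ]
  edge [ source 3 target 1 label "Edge 3 to 1" ]
]
"""

top_level, _ = parse_gml_block(TOKEN.findall(gml_text))
graph = dict(top_level)["graph"]

# Collect node labels by id, and edges as (source, target) pairs.
nodes = {dict(v)["id"]: dict(v)["label"] for k, v in graph if k == "node"}
edges = [(dict(v)["source"], dict(v)["target"]) for k, v in graph if k == "edge"]
print(nodes)
print(edges)
```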
    
  • Pajek Net: It has the extension .net or .paj and is widely used for network datasets. The file first declares the nodes in a *Vertices section; once all nodes are declared, the *Edges section follows, where each row contains a source node and a target node. In the sample below, only the vertex count is given.
    *Vertices 6
    *Edges
    1 2
    1 6
    2 3
    2 5
    3 1 
    3 5
    3 6
    4 5 
    5 6
    6 2
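
A file in this layout can be read by tracking which section the current row belongs to. The sketch below assumes the simple structure of the sample above (a `*Vertices` line with a count, then an `*Edges` section of integer pairs); `parse_pajek` is a hypothetical helper and ignores any per-vertex rows.

```python
def parse_pajek(text):
    """Return (vertex_count, edges) from a minimal Pajek .net string."""
    vertex_count = 0
    edges = []
    section = None
    for row in text.splitlines():
        fields = row.split()
        if not fields:
            continue  # skip blank rows
        if fields[0].startswith("*"):
            # A row starting with '*' opens a new section.
            section = fields[0].lower()
            if section == "*vertices":
                vertex_count = int(fields[1])
            continue
        if section == "*edges":
            edges.append((int(fields[0]), int(fields[1])))
    return vertex_count, edges


pajek_text = """*Vertices 6
*Edges
1 2
1 6
2 3
2 5
3 1
3 5
3 6
4 5
5 6
6 2"""

vertex_count, edges = parse_pajek(pajek_text)
print(vertex_count)
print(len(edges))
```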
    
  • GraphML: Here ML stands for Markup Language, as the format is based on XML. Like any XML document, it has a hierarchical structure of tags: the XML declaration, followed by the graphml, graph, node, and edge tags.
    <?xml version="1.0" encoding="UTF-8"?>
    <graphml xmlns="http://graphml.graphdrawing.org/xmlns"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
    http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
    <graph id="A" edgedefault="undirected">
    <node id="a"/>
    <node id="b"/>
    <edge id="c" source="a" target="b"/>
    </graph>
    </graphml>
    

    In the above GraphML example, the XML declaration comes first, followed by the graphml tag. Inside the graphml tag there is a graph tag, which in turn contains the node and edge tags.
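
Because GraphML is XML, the tag hierarchy described above can be walked with Python's standard `xml.etree.ElementTree` module. This is a minimal sketch that reads only node IDs, edges and the `edgedefault` attribute from the sample; the namespace prefix `g` is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

graphml_text = """<?xml version="1.0" encoding="UTF-8"?>
<graphml xmlns="http://graphml.graphdrawing.org/xmlns"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://graphml.graphdrawing.org/xmlns
http://graphml.graphdrawing.org/xmlns/1.0/graphml.xsd">
<graph id="A" edgedefault="undirected">
<node id="a"/>
<node id="b"/>
<edge id="c" source="a" target="b"/>
</graph>
</graphml>
"""

# All GraphML tags live in this XML namespace, so lookups must be qualified.
ns = {"g": "http://graphml.graphdrawing.org/xmlns"}

# Parse from bytes so the encoding declaration in the first line is honoured.
root = ET.fromstring(graphml_text.encode("utf-8"))
graph = root.find("g:graph", ns)

edge_default = graph.get("edgedefault")
nodes = [node.get("id") for node in graph.findall("g:node", ns)]
edges = [(e.get("source"), e.get("target")) for e in graph.findall("g:edge", ns)]
print(edge_default, nodes, edges)
```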

  • GEXF (Graph Exchange XML Format): It was created by the Gephi project. Gephi is open-source software used for visualizing and analyzing social networks. This format is also based on XML and uses similar tags: the XML declaration, followed by the gexf, meta, graph, node, and edge tags.
    <?xml version="1.0" encoding="UTF-8"?>
    <gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
        <meta lastmodifieddate="2009-06-01">
            <creator>Gexf.net</creator>
            <description>Geeks for geeks</description>
        </meta>
        <graph mode="static" defaultedgetype="directed">
            <nodes>
                <node id="a" label="Hello" />
                <node id="b" label="GeeksforGeeks" />
            </nodes>
            <edges>
                <edge id="c" source="a" target="b" />
            </edges>
        </graph>
    </gexf>
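
GEXF keeps metadata and node labels alongside the graph structure, and these can be read with the standard `xml.etree.ElementTree` module as well. The sketch below assumes the layout of the sample above; the namespace prefix `x` is an arbitrary choice made for this example.

```python
import xml.etree.ElementTree as ET

gexf_text = """<?xml version="1.0" encoding="UTF-8"?>
<gexf xmlns="http://www.gexf.net/1.2draft" version="1.2">
    <meta lastmodifieddate="2009-06-01">
        <creator>Gexf.net</creator>
        <description>Geeks for geeks</description>
    </meta>
    <graph mode="static" defaultedgetype="directed">
        <nodes>
            <node id="a" label="Hello" />
            <node id="b" label="GeeksforGeeks" />
        </nodes>
        <edges>
            <edge id="c" source="a" target="b" />
        </edges>
    </graph>
</gexf>
"""

# The namespace URI is taken from the xmlns attribute of the gexf tag.
ns = {"x": "http://www.gexf.net/1.2draft"}

root = ET.fromstring(gexf_text.encode("utf-8"))

# Metadata sits in the meta tag, next to the graph tag.
creator = root.find("x:meta/x:creator", ns).text
edge_type = root.find("x:graph", ns).get("defaultedgetype")

# Nodes carry a label attribute in addition to their id.
labels = {
    node.get("id"): node.get("label")
    for node in root.findall("x:graph/x:nodes/x:node", ns)
}
edges = [
    (edge.get("source"), edge.get("target"))
    for edge in root.findall("x:graph/x:edges/x:edge", ns)
]
print(creator, edge_type, labels, edges)
```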
    
