SNAP Tutorial
Best way to get familiar to with SNAP is to first read this document. Then examine the example applications implemented in SnapSamples directory of the SNAP distribution. Refer to description of files to see what functionality is implemented in which source files of the SNAP library and directly examine the source files.
SNAP directory structure
The SNAP library contains the following directory structure:
- snap: implementation of SNAP network analysis library.
- examples: small sample applications that demonstrate functionality of the library
- glib: STL-like library that implements basic data structures, like vectors (TVec), hash-tables (THash) and strings (TStr), provides serialization and so on.
Compiling SNAP
SNAP compiles under Windows (Visual Studio, CygWin with GCC), Mac, Linux and other Unix clones (GCC).
For plotting structural properties of networks (e.g., degree distribution) the SNAP expects to find GnuPlot in the system path. Similarly for drawing and visualizing small graphs SNAP utilizes GraphViz which has to be installed on the system. Set system PATH variable appropriately or put the executables in the working directory.
Note: If compiling SNAP with GCC then depending on your platform (Windows or Linux) you may have to directly edit the Makefile (comment out two lines). See the Makefiles in examples directory for instructions on what to do.
After you edited the
Makefile, type:
cd examples/graphgen
make opt
Note make opt compiles the optimized version of the code, while make will compile an unoptimized version of the code.
And executing
./graphgen -g:w -n:1000 -k:4 -p:0.1 -o:smallworld.txt
will generate a Watts-Strogatz small-world graph where each node is connected to k=4 closest nodes and with the rewiring probability of p=0.1.
Graph and network types
SNAP supports the following graph types:
TUNGraph: undirected graph (single edge between an unordered pair of nodes)
TNGraph: directed graph (single directed edge between an ordered pair of nodes)
TNEGraph: directed multi-graph (any number of directed edges between a pair of nodes)
Graphs store only information about the network connectivity. To add data on nodes and edges of the network use network types where node/edge data is passed as template parameter:
TNodeNet<TNodeData>: like TNGraph but with TNodeData object for each node
TNodeEDatNet<TNodeData, TEdgeData>: like TNGraph but with TNodeData on each node
and TEdgeData on each edge
TNodeEdgeNet<TNodeData, TEdgeData>: like TNEGraph but with TNodeData on each node
and TEdgeData on each edge
TBigNet<TNodeData>: memory efficient implementation of TNodeNet that avoids memory
fragmentation and handles billions of edges (depending on how much RAM you have)
Refer to imdbnet.h for an example of how to create a network with rich data on its nodes. imdbnet.h provides an example of a IMDB actors-to-movies bipartite network with move and actor data on the nodes of the network.
Example usage
Example of how to create and use a directed graph:
// create a graph
PNGraph Graph = TNGraph::New();
Graph->AddNode(1);
Graph->AddNode(5);
Graph->AddNode(32);
Graph->AddEdge(1,5);
Graph->AddEdge(5,1);
Graph->AddEdge(5, 32);
Nodes have explicit (and arbitrary) node ids. There is no restriction for node ids to be contiguous integers starting at 0. In a multi-graph TNEGraph edges have explicit integer ids. In TUNGraph and TNGraph edges have no explicit ids -- edges are identified by a pair node ids.
SNAP uses smart-pointers (TPt) so there is not need to explicitly free (delete) graph objects. They self-destruct when they are not needed anymore. Prefix P in the class name stands for a pointer, while T means a type.
Iterators
SNAP heavily relies on the user of iterators that allow for fast traversals over the nodes or edges. For example:
// create a directed random graph on 100 nodes and 1k edges
PNGraph Graph = TSnap::GenRndGnm<PNGraph>(100, 1000);
// traverse the nodes
for (TNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) {
printf("node id %d with out-degree %d and in-degree %d\n",
NI.GetId(), NI.GetOutDeg(), NI.GetInDeg());
}
// traverse the edges
for (TNGraph::TEdgeI EI = Graph->BegEI(); EI < Graph->EndEI(); EI++) {
printf("edge (%d, %d)\n", EI.GetSrcNId(); EI.GetDstNId());
}
// we can traverse the edges also like this
for (TNGraph::TNodeI NI = Graph->BegNI(); NI < Graph->EndNI(); NI++) {
for (int e = 0; e < NI.GetOutDeg(); e++) {
printf("edge (%d %d)\n", NI.GetId(), NI.GetOutNId(e));
}
}
All graph and network datatypes define node and edge iterators. In general graph/network data types use the following functions to return various iterators:
BegNI(): iterator to first node
EndNI(): iterator to one past last node
GetNI(u): iterator to node with id u
BegEI(): iterator to first edge
EndEI(): iterator to one past last edge
GetEI(u, v): iterator to edge (u, v)
GetEI(e): iterator to edge with id e (only for multigraphs)
In general node iterators provide the following functionality:
GetId(): return node id
GetOutDeg(): return out-degree of a node
GetInDeg(): return in-degree of a node
GetOutNId(e): return node id of the endpoint of e-th out-edge
GetInNId(e): return node id of the endpoint of e-th in-edge
IsOutNId(int NId): do we point to node id n
IsInNId(n): does node id n point to us
IsNbhNId(n): is node n our neighbor
In addition, iterators over networks also allow for easy access to the data:
GetDat(): return data type TNodeData associated with the node
GetOutNDat(e): return data associated with node at endpoint of e-th out-edge
GetInNDat(e): return data associated with node at endpoint of e-th in-edge
GetOutEDat(e): return data associated with e-th out-edge
GetInEDat(e): return data associated with e-th in-edge
For additional information on node (TNodeI) and edge (TEdgeI) iterators see the implementation of various graph and network data types in graph.h and network.h.
Input output
With SNAP it is easy to save and load networks in various formats. Internally SNAP saves networks in compact binary format but functions for loading and saving networks in various other text and XML formats are also available (see gio.h).
For example:
// generate a network using Forest Fire model
PNGraph G = TSnap::GenForestFire(1000, 0.35, 0.35);
// save and load binary
{ TFOut FOut("test.graph"); G->Save(FOut); }
{ TFIn FIn("test.graph"); PNGraph G2 = TNGraph::Load(FIn); }
// save and load from a text file
TSnap::SaveEdgeList(G2, "test.txt", "Save as tab-separated list of edges");
PNGraph G3 = TSnap::LoadEdgeList
("test.txt", 0, 1);
Manipulating graphs and networks
SNAP provides rich functionality to efficiently manipulate graphs and networks. Functions are implemented as a part of namespace TSnap. All functions support any graph/network type.
For example:
// generate a network using Forest Fire model
PNGraph G = TSnap::GenForestFire(1000, 0.35, 0.35);
// convert to undirected graph TUNGraph
PUNGraph UG = TSnap::ConvertGraph<PUNGraph>(G);
// get largest weakly connected component of G
PNGraph WccG = TSnap::GetMxWcc(G);
// get a subgraph induced on nodes {0,1,2,3,4,5}
PNGraph SubG = TSnap::GetSubGraph(G, TIntV::GetV(0,1,2,3,4));
// get 3-core of G
PNGraph Core3 = TSnap::GetKCore(G, 3);
// delete nodes of degree 10
TSnap::DelDegKNodes(G, 10);
For more examples of how to work with networks see example applications in SnapSamples directory of the SNAP distribution. Also refer to description of files to see what functionality is implemented in which source files of the SNAP library.
Computing structural properties
SNAP provides rich functionality to efficiently compute structural properties of networks. Functions are implemented as a part of namespace TSnap. All functions support any graph/network type.
For example
// generate a Preferential Attachment graph on 1000 nodes and node out degree of 3
PUNGraph G = TSnap::GenPrefAttach(1000, 3);
// get distribution of connected components (component size, count)
TVec<TPair<TInt, TInt> > CntV; // vector of pairs of integers (size, count)
TSnap::GetWccSzCnt(G, CntV);
// get degree distribution pairs (degree, count)
TSnap::GetOutDegCnt(G, CntV);
// get first eigenvector of graph adjacency matrix
TFltV EigV; // vector of floats
TSnap::GetEigVec(G, EigV);
// get diameter of G
TSnap::GetBfsFullDiam(G);
// count the number of triads in G, get the clustering coefficient of G
TSnap::GetTriads(G);
TSnap::GetClustCf(G);
For more examples of how to work with networks see example applications in SnapSamples directory of the SNAP distribution. Also refer to description of files to see what functionality is implemented in which source files of the SNAP library.