Compressing Graphs (Kolmogorov complexity of graphs)

Li and Vitányi in their standard textbook on Kolmogorov complexity (3rd edition, p.456) observe

Almost all strings have high complexity. Therefore, almost all tournaments and almost all undirected graphs have high complexity.

This is made more precise in Section 6.4. In particular, they show that the number of distinct isomorphism classes of undirected graphs with $n$ vertices asymptotically approaches $2^\binom{n}{2}/n!$, with only a small error.

By Stirling's approximation, the Kolmogorov complexity of undirected graphs with $n$ vertices can then be seen to be close to $n(n-1)/2 + c$ bits. For undirected graphs the adjacency matrix is symmetric and has 0's on the diagonal, so one only needs to store the $n(n-1)/2$ bits above the diagonal.

This makes more precise the claim made in another answer, that the adjacency matrix representation is optimal. (Note that close in this argument may still be an unbounded function of $n$; see also Lemma 6.4.6 and the comment after it.)

  • Ming Li and Paul Vitányi, An Introduction to Kolmogorov Complexity and Its Applications, 3rd edition, Springer-Verlag, 2008. doi: 10.1007/978-0-387-49820-1

Each graph can be uniquelly represented as an adjacency matrix. This is simply an nxn matrix of values 0,1. Each graph can therefore be expressed uniquely as a string of length n^2 by sequentially concatenating each line. Graphs with cycles and directed graphs are also uniquely represented.

Examples:
The complete graph of degree 3 can be expressed as 011-101-110 -> 011101110
The graph with Nodes {1,2,3} and edges {{1, 2},{1,3}} can be expressed as 011-100-100 -> 011100100

From this representation and from kolmogorov complexity certain statements become evident.

(1) For any other representation of graphs, there exists a constant c such that the conditional kolgomorov complexity on this representation saves at most c bits from the kolmogorov complexity from adjacency matrix.

(2) Sparse graphs are highly compressible (adjacency matrix of sparse graph contain a lot of zeroes versus very few ones)

(3) if all you care about is about equivalency classes between graphs.
Program p1(n) := find the nth graph g such that no graph preceding (in my representation) is isomorphic,.
Program p2(n) := find the smallest graph that is isomorphic to the graph n.
program p3(n) := find the n' such that p1(n') = p2(n)
p3 is the function that compresses a graph
p1 is the decompression function


Graph compression algorithms are starting to be used in biological applications (e.g. Itzkovitz, et al. Coarse-graining and self-dissimilarity of complex networks; L. Peshkin, Structure induction by lossless graph compression; M. Hayashida and T. Akutsu Comparing biological networks via graph compression). Unlike the pure mathematical side, the networks involved are not the whole class of (non-isomorphic) (un)labelled graphs.

Motivated by string compression, these algorithms can detect frequently occurring induced subgraphs. Recently, the related topic of network motifs (small connected subgraphs that occur as an induced subgraph significantly more frequently than in randomized networks) has received an explosion of interest. It is hoped that graph compression algorithms can provide insight into network motifs (and other small induced sugraphs, or "graphlets"), and thus provide insight into the (biological) network as a whole.