Proportional Venn diagram for more than 3 sets

The Problem

We need to represent the counts of several interconnected categories of objects, and a Venn diagram cannot represent more than a trivial number of categories and their overlaps.
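The reason is the number of regions. With n categories, a Venn diagram needs one region for every non-empty combination of categories, which is 2^n - 1 regions. This short sketch enumerates them (the function name `venn_regions` is hypothetical, chosen for illustration):

```python
from itertools import combinations


def venn_regions(categories):
    """Return every non-empty combination of categories (one Venn region each)."""
    return [combo
            for r in range(1, len(categories) + 1)
            for combo in combinations(categories, r)]


print(len(venn_regions(['1', '2', '3', '4'])))  # 15
print(len(venn_regions(list('abcdefghij'))))    # 1023
```

Four categories already need 15 regions, and ten need 1023. In the four-category example data used below, only 7 of the 15 possible regions actually have a count.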

A Solution

Treat each category, and each combination of categories, as a node in a graph. Draw the graph so that the size of each node represents the count for that category or combination, and edges connect each combination to the categories it contains. The advantage of this approach is that many categories can be accommodated with ease; the result is a kind of connected bubble chart.
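If the combinations are labelled by joining category names with `+` (as in the data below), the edges do not have to be typed by hand. They can be derived from the labels. The sketch below assumes that labelling convention, and `build_graph_data` is a hypothetical helper rather than part of NetworkX:

```python
def build_graph_data(counts):
    """Turn a {label: count} mapping into (nodes, edges) lists.

    Assumes combination labels are category names joined with '+',
    e.g. '1+3+4'. Each combination gets one edge to each of its categories.
    """
    # (name, attribute dict) pairs, the form networkx's add_nodes_from accepts
    nodes = [(label, {'size': size}) for label, size in counts.items()]
    edges = []
    for label in counts:
        parts = label.split('+')
        if len(parts) > 1:  # single categories get no edges of their own
            edges.extend((label, part) for part in parts)
    return nodes, edges


counts = {'1': 523, '2': 231, '3': 102, '4': 72,
          '1+2': 710, '1+3': 891, '1+3+4': 621}
nodes, edges = build_graph_data(counts)
print(edges)
```

For this data it yields the same seven edges that `load_edges` lists by hand. Every single category that appears in a combination should also have its own entry in `counts`. Otherwise the edge creates a node with no `size` attribute.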

The Result

[Figure: network layout]

The Code

The proposed solution uses NetworkX to build the graph and matplotlib to draw it. As long as the data is supplied as a node list (name and count) and an edge list (pairs of related nodes), it should scale to a large number of categories with multiple connections.

import networkx as nx
import matplotlib.pyplot as plt

def load_nodes():
    text = '''  Node    Size
                1        523
                2        231
                3        102
                4         72
                1+2      710
                1+3      891
                1+3+4    621'''
    # load nodes into list, discard header
    # this may be replaced by some appropriate output 
    # from your program
    data = text.split('\n')[1:]
    data = [d.split() for d in data]
    # each node is a (name, attribute dict) pair, as networkx expects
    data = [(d[0], {'size': int(d[1])}) for d in data]
    return data

def load_edges():
    text = '''  From   To
                1+2    1
                1+2    2
                1+3    1
                1+3    3
                1+3+4    1
                1+3+4    3
                1+3+4    4'''
    # load edges into list, discard header
    # this may be replaced by some appropriate output 
    # from your program
    data = text.split('\n')[1:]
    data = [ tuple( d.split() ) for d in data ]
    return data

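if __name__ == '__main__':
    scale_factor = 5
    G = nx.Graph()
    # add the nodes first, with their 'size' attribute; adding only the
    # edges would create the nodes in edge order, not in load_nodes() order
    G.add_nodes_from(load_nodes())
    G.add_edges_from(load_edges())

    # read the sizes back in G's own node order so that each size is
    # matched to the node it belongs to
    node_sizes = [G.nodes[n]['size'] * scale_factor
                  for n in G.nodes()]

    nx.draw_networkx(G,
                     pos=nx.spring_layout(G),
                     node_size=node_sizes)
    plt.axis('off')
    plt.show()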
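NetworkX hands `node_size` to matplotlib's `scatter`, where marker size is an area in points². A linear scale therefore keeps bubble *area* proportional to the count, which is what makes the chart proportional. A fixed `scale_factor` can produce huge or tiny bubbles depending on how large the counts are. The following is a minimal sketch that normalises instead; `scaled_sizes` and the `max_area=3000` default are hypothetical choices:

```python
def scaled_sizes(counts, max_area=3000.0):
    """Scale counts linearly so the largest maps to max_area (points**2).

    Linear scaling keeps marker AREA proportional to each count.
    """
    largest = max(counts)
    return [count * max_area / largest for count in counts]


sizes = scaled_sizes([523, 231, 102, 72, 710, 891, 621])
print(round(sizes[5]))  # 3000 (the largest count, 891)
```

This could stand in for the fixed `scale_factor` multiplication. If the layout changes on every run, recent NetworkX versions also accept a `seed` argument to `nx.spring_layout`.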

Other Solutions

Other options include bubble charts, Voronoi diagrams, chord diagrams and hive plots, among others. The linked examples do not use Python; they are given for illustration only.


I believe ninjagecko is correct that this cannot, in general, be represented as a diagram of intersections, unless you don't mind the diagram having n dimensions. However, it can be represented in 2D if you draw one diagram per category showing all of that category's intersections, and these can be combined into a single diagram. This may be a more appropriate way to represent your data. I've produced a stacked bar chart to illustrate:
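The data behind such a chart is simply, for each category, every combination that contains it. This sketch computes the segments of each bar without plotting; `stacks_by_category` is a hypothetical helper name:

```python
def stacks_by_category(data, cats):
    """Map each category to the (combination, count) segments of its bar."""
    return {cat: [(key, val) for key, val in data.items() if cat in key]
            for cat in cats}


data = {('cat1',): 523, ('cat2',): 231, ('cat3',): 102, ('cat4',): 72,
        ('cat1', 'cat2'): 710, ('cat1', 'cat3'): 891,
        ('cat1', 'cat3', 'cat4'): 621}
stacks = stacks_by_category(data, ['cat1', 'cat2', 'cat3', 'cat4'])
for cat, segments in stacks.items():
    print(cat, sum(val for _, val in segments))
# cat1 2745
# cat2 941
# cat3 1614
# cat4 693
```

The bar heights add up to 5993, more than the 3150 objects in total. A combination such as cat1 cat3 cat4 appears once in the bar of each category it contains.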

[Figure: stacked bar chart]

The code:

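import matplotlib.pyplot as plt
import numpy as np
from random import random

cats = ['cat1', 'cat2', 'cat3', 'cat4']
# count for each category and each combination of categories
data = {('cat1',): 523, ('cat2',): 231, ('cat3',): 102, ('cat4',): 72,
        ('cat1', 'cat2'): 710, ('cat1', 'cat3'): 891,
        ('cat1', 'cat3', 'cat4'): 621}

# one random RGB color per combination, reused in every bar it appears in
colors = {k: (random(), random(), random()) for k in data}

for i, cat in enumerate(sorted(cats)):
    y = 0  # running height of this category's stack
    for key, val in data.items():
        if cat in key:
            # stack this combination's segment on top of the previous ones
            plt.bar(i, val, bottom=y, color=colors[key])
            plt.text(i, y, ' '.join(key), ha='center', va='bottom')
            y += val
# bars are centered on i by default in matplotlib >= 2.0, so no +0.4 offset
plt.xticks(np.arange(len(cats)), sorted(cats))
plt.show()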
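The `random()` colors change on every run and can come out nearly identical for two combinations. One alternative, sketched here as an assumption rather than part of the original answer, is evenly spaced hues from the standard-library `colorsys` module. `distinct_colors` is a hypothetical helper:

```python
import colorsys


def distinct_colors(keys, saturation=0.6, value=0.9):
    """Map each key to an RGB tuple of 0-1 floats with evenly spaced hues.

    Keys are sorted first, so the same keys always get the same colors.
    """
    ordered = sorted(keys)
    n = len(ordered)
    return {key: colorsys.hsv_to_rgb(i / n, saturation, value)
            for i, key in enumerate(ordered)}


keys = [('cat1',), ('cat2',), ('cat3',), ('cat4',),
        ('cat1', 'cat2'), ('cat1', 'cat3'), ('cat1', 'cat3', 'cat4')]
colors = distinct_colors(keys)
```

Because the keys are the same tuples used in `data`, this dict can replace the random one, and `color=colors[key]` in the bar loop works unchanged.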