Find the area between two curves plotted in matplotlib (fill_between area)

The area_between_two_curves function in pypi library similaritymeasures (released in 2018) might give you what you need. I tried a trivial example on my side, comparing the area between a function and a constant value and got pretty close tie-back to Excel (within 2%). Not sure why it doesn't give me 100% tie-back, maybe I am doing something wrong. Worth considering though.


Your set of data is quite "nice" in the sense that the two sets of data share the same set of x-coordinates. You can therefore calculate the area using a series of trapezoids.

e.g. define the two functions as f(x) and g(x), then, between any two consecutive points in x, you have four points of data:

(x1, f(x1))-->(x2, f(x2))
(x1, g(x1))-->(x2, g(x2))

Then, the area of the trapezoid is

A(x1-->x2) = ( f(x1)-g(x1) + f(x2)-g(x2) ) * (x2-x1)/2                         (1)

A complication arises that equation (1) only works for simply-connected regions, i.e. there must not be a cross-over within this region:

|\             |\/|
|_|     vs     |/\|

The area of the two sides of the intersection must be evaluated separately. You will need to go through your data to find all points of intersections, then insert their coordinates into your list of coordinates. The correct order of x must be maintained. Then, you can loop through your list of simply connected regions and obtain a sum of the area of trapezoids.

EDIT:

For curiosity's sake, if the x-coordinates for the two lists are different, you can instead construct triangles. e.g.

.____.
|   / \
|  /   \
| /     \
|/       \
._________.

Overlap between triangles must be avoided, so you will again need to find points of intersections and insert them into your ordered list. The lengths of each side of the triangle can be calculated using Pythagoras' formula, and the area of the triangles can be calculated using Heron's formula.


The area calculation is straightforward in blocks where the two curves don't intersect: thats the trapezium as has been pointed out above. If they intersect, then you create two triangles between x[i] and x[i+1], and you should add the area of the two. If you want to do it directly, you should handle the two cases separately. Here's a basic working example to solve your problem. First, I will start with some fake data:

#!/usr/bin/python
import numpy as np

# let us generate fake test data
x = np.arange(10)
y1 = np.random.rand(10) * 20
y2 = np.random.rand(10) * 20

Now, the main code. Based on your plot, looks like you have y1 and y2 defined at the same X points. Then we define,

z = y1-y2
dx = x[1:] - x[:-1]
cross_test = np.sign(z[:-1] * z[1:])

cross_test will be negative whenever the two graphs cross. At these points, we want to calculate the x coordinate of the crossover. For simplicity, I will calculate x coordinates of the intersection of all segments of y. For places where the two curves don't intersect, they will be useless values, and we won't use them anywhere. This just keeps the code easier to understand.

Suppose you have z1 and z2 at x1 and x2, then we are solving for x0 such that z = 0:

# (z2 - z1)/(x2 - x1) = (z0 - z1) / (x0 - x1) = -z1/(x0 - x1)
# x0 = x1 - (x2 - x1) / (z2 - z1) * z1
x_intersect = x[:-1] - dx / (z[1:] - z[:-1]) * z[:-1]
dx_intersect = - dx / (z[1:] - z[:-1]) * z[:-1]

Where the curves don't intersect, area is simply given by:

areas_pos = abs(z[:-1] + z[1:]) * 0.5 * dx # signs of both z are same

Where they intersect, we add areas of both triangles:

areas_neg = 0.5 * dx_intersect * abs(z[:-1]) + 0.5 * (dx - dx_intersect) * abs(z[1:])

Now, the area in each block x[i] to x[i+1] is to be selected, for which I use np.where:

areas = np.where(cross_test < 0, areas_neg, areas_pos)
total_area = np.sum(areas)

That is your desired answer. As has been pointed out above, this will get more complicated if the both the y graphs were defined at different x points. If you want to test this, you can simply plot it (in my test case, y range will be -20 to 20)

negatives = np.where(cross_test < 0)
positives = np.where(cross_test >= 0)
plot(x, y1)
plot(x, y2)
plot(x, z)
plt.vlines(x_intersect[negatives], -20, 20)

Define your two curves as functions f and g that are linear by segment, e.g. between x1 and x2, f(x) = f(x1) + ((x-x1)/(x2-x1))*(f(x2)-f(x1)). Define h(x)=abs(g(x)-f(x)). Then use scipy.integrate.quad to integrate h.

That way you don't need to bother about the intersections. It will do the "trapeze summing" suggested by ch41rmn automatically.