Cache Invalidation — Is there a General Solution?
What you are talking about is lifetime dependency chaining: one thing depends on another which can be modified outside of its control.
If you have an idempotent function from a, b to c where, if a and b are the same then c is the same, but the cost of checking b is high, then you either:
1. accept that you sometimes operate with out-of-date information and do not always check b
2. do your level best to make checking b as fast as possible
You cannot have your cake and eat it...
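The two choices can be sketched in Python roughly as follows. This is only an illustration: make_stale_tolerant, make_always_checked, compute, fingerprint and ttl_seconds are made-up names, and the fingerprint is assumed to be something cheap (a version number or modification time, say) that changes whenever b changes.

```python
import time


def make_stale_tolerant(compute, ttl_seconds, clock=time.monotonic):
    """Choice 1: reuse a result for up to ttl_seconds without looking at b again."""
    cache = {}  # (a, b) -> (expires_at, c)

    def get(a, b):
        now = clock()
        entry = cache.get((a, b))
        if entry is not None and now < entry[0]:
            return entry[1]  # may be out of date; that is the accepted trade-off
        c = compute(a, b)
        cache[(a, b)] = (now + ttl_seconds, c)
        return c

    return get


def make_always_checked(compute, fingerprint):
    """Choice 2: check b on every call, but via a cheap fingerprint (assumed)."""
    cache = {}  # (a, b) -> (fingerprint_of_b, c)

    def get(a, b):
        stamp = fingerprint(b)  # must be much cheaper than compute(a, b)
        entry = cache.get((a, b))
        if entry is not None and entry[0] == stamp:
            return entry[1]
        c = compute(a, b)
        cache[(a, b)] = (stamp, c)
        return c

    return get
```

The first wrapper never looks at b until the time limit runs out, so it can return stale results. The second is never stale, but only helps if the fingerprint really is cheap.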
If you can layer an additional cache based on a over the top then this affects the initial problem not one bit. If you chose option 1 then you have whatever freedom you gave yourself and can thus cache more, but must remember to consider the validity of the cached value of b. If you chose option 2 you must still check b every time, but can fall back on the cache for a if b checks out.
If you layer caches you must consider whether you have violated the 'rules' of the system as a result of the combined behaviour.
If you know that a always has validity if b does then you can arrange your cache like so (pseudocode):
    private map<b,map<a,c>> cache // b -> (a -> c)
    private func realFunction     // (a,b) -> c

    get(a, b)
    {
        c result;
        map<a,c> endCache;
        if (cache[b] expired or not present)
        {
            remove all b -> * entries in cache;
            endCache = new map<a,c>();
            add to cache b -> endCache;
        }
        else
        {
            endCache = cache[b];
        }
        if (endCache[a] not present) // important line
        {
            result = realFunction(a,b);
            endCache[a] = result;
        }
        else
        {
            result = endCache[a];
        }
        return result;
    }
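For anyone who wants something runnable, here is one way the pseudocode might translate to Python. The pseudocode does not say what "expired" means for b, so this sketch assumes a simple time limit (b_ttl_seconds); the class name LayeredCache and the injectable clock are likewise just illustrative choices.

```python
import time


class LayeredCache:
    """Outer cache keyed on b, inner cache keyed on a, as in the pseudocode."""

    def __init__(self, real_function, b_ttl_seconds, clock=time.monotonic):
        self._real_function = real_function  # (a, b) -> c
        self._b_ttl = b_ttl_seconds          # assumed meaning of "expired"
        self._clock = clock
        self._cache = {}                     # b -> (expires_at, {a: c})

    def get(self, a, b):
        now = self._clock()
        entry = self._cache.get(b)
        if entry is None or now >= entry[0]:
            # b is missing or expired: everything derived from it goes too.
            end_cache = {}
            self._cache[b] = (now + self._b_ttl, end_cache)
        else:
            end_cache = entry[1]
        if a not in end_cache:  # important line
            end_cache[a] = self._real_function(a, b)
        return end_cache[a]
```

Replacing the inner dictionary wholesale is what implements "remove all b -> * entries": no result computed from the old b can survive once b is considered invalid.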
Obviously successive layering (say x) is trivial so long as, at each stage, the validity of the newly added input matches the a:b relationship for x:b and x:a.
However, it is quite possible that you could get three inputs whose validity was entirely independent (or was cyclic), so no layering would be possible. This would mean the line marked // important would have to change to:
    if (endCache[a] expired or not present)
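As a rough Python sketch of that changed check, assuming again that "expired" means a time limit and that a and b each get their own (a_ttl_seconds, b_ttl_seconds, both hypothetical parameters):

```python
import time


class IndependentExpiryCache:
    """Like the layered cache, but each a entry also carries its own expiry."""

    def __init__(self, real_function, a_ttl_seconds, b_ttl_seconds,
                 clock=time.monotonic):
        self._real_function = real_function  # (a, b) -> c
        self._a_ttl = a_ttl_seconds
        self._b_ttl = b_ttl_seconds
        self._clock = clock
        self._cache = {}  # b -> (b_expires_at, {a: (a_expires_at, c)})

    def get(self, a, b):
        now = self._clock()
        outer = self._cache.get(b)
        if outer is None or now >= outer[0]:
            end_cache = {}
            self._cache[b] = (now + self._b_ttl, end_cache)
        else:
            end_cache = outer[1]
        inner = end_cache.get(a)
        if inner is None or now >= inner[0]:  # expired or not present
            c = self._real_function(a, b)
            end_cache[a] = (now + self._a_ttl, c)
            return c
        return inner[1]
```

The price is that a valid b no longer vouches for a: every lookup has to check both.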
The problem in cache invalidation is that stuff changes without us knowing about it. So, in some cases, a solution is possible if there is some other thing that does know about it and can notify us. In the given example, the getData function could hook into the file system, which does know about all changes to files, regardless of what process changes the file, and this component in turn could notify the component that transforms the data.
I don't think there is any general magic fix to make the problem go away. But in many practical cases there may very well be opportunities to transform a "polling"-based approach into an "interrupt"-based one, which can make the problem simply go away.
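To make the polling-versus-notification idea concrete, here is a minimal in-process sketch in Python. ChangeNotifier is a hypothetical stand-in for whatever component actually sees every change (a file system watcher, for instance); it is not a real library API, and NotifiedCache and compute are likewise invented for illustration.

```python
class ChangeNotifier:
    """Stand-in for a component that knows about every change to a resource."""

    def __init__(self):
        self._listeners = []

    def subscribe(self, listener):
        self._listeners.append(listener)

    def notify_changed(self, key):
        # In a real system this would be triggered by the change itself.
        for listener in self._listeners:
            listener(key)


class NotifiedCache:
    """Caches compute(key) and drops an entry only when told it has changed."""

    def __init__(self, compute, notifier):
        self._compute = compute
        self._cache = {}
        notifier.subscribe(self._invalidate)

    def _invalidate(self, key):
        self._cache.pop(key, None)

    def get(self, key):
        # No validity check on the read path: invalidation is pushed to us.
        if key not in self._cache:
            self._cache[key] = self._compute(key)
        return self._cache[key]
```

This only works if every change really does go through the notifier; any writer that bypasses it brings the original problem straight back.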