Optimize memory usage of a collection of Strings in Java
String.intern() will help you here (most likely). It will resolve multiple instances of the same string down to one copy.
EDIT: I suggested this would 'most likely' help. In what scenarios will it not? Interned strings are kept in a JVM-wide pool, and on older JVMs they could effectively stay there for the life of the process. If the problem domain is a one-shot process, this may not be an issue. If it's a long-running process (such as a web app) then you may well have a problem.
I would hesitate to say never use interning (I would hesitate to say never do anything). However, there are scenarios where it's not ideal.
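To make the deduplication described above concrete, here is a minimal sketch. The class and method names (InternDemo, sameAfterIntern) are made up for illustration; only String.intern() itself comes from the JDK.

```java
// Illustrative only: shows String.intern() collapsing equal strings into one shared instance.
class InternDemo {
    // Returns true when both arguments resolve to the very same object after interning.
    static boolean sameAfterIntern(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        // new String(...) forces two distinct objects with equal contents.
        String a = new String("duplicate value");
        String b = new String("duplicate value");

        System.out.println(a == b);                 // false: two separate objects
        System.out.println(a.equals(b));            // true: same contents
        System.out.println(sameAfterIntern(a, b));  // true: one shared instance
    }
}
```

The saving only materialises if you keep the reference returned by intern() and drop the original, so that the duplicate can be garbage collected.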
Do not use String.intern (there have been various memory issues related to this through the years). Instead, create your own cache, similar to String.intern. Basically, you want a Map where each key maps to itself. Then, before storing any string, you "intern" it:
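import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

// Weak keys and weakly referenced values let unused strings be garbage collected.
private Map<String, WeakReference<String>> myInternMap = new WeakHashMap<String, WeakReference<String>>();

// Returns the cached instance equal to value, caching value itself if there is none.
public String intern(String value) {
    synchronized (myInternMap) {
        WeakReference<String> curRef = myInternMap.get(value);
        String curValue = ((curRef != null) ? curRef.get() : null);
        if (curValue != null) {
            return curValue;
        }
        myInternMap.put(value, new WeakReference<String>(value));
        return value;
    }
}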
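Note that weak references are used for both the keys and the values, so the cache does not keep strings alive once you are no longer using them. The value has to be wrapped in a WeakReference because a strongly held value would keep its own key reachable, and the entry would never be dropped.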
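As a rough usage sketch of the cache above: the names WeakInterner, WeakInternerDemo and dedupe are hypothetical, and the wrapper is just one possible way to package the map and the method.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.WeakHashMap;

// Hypothetical wrapper around the weak intern map described above.
class WeakInterner {
    private final Map<String, WeakReference<String>> cache = new WeakHashMap<>();

    // Returns the cached instance equal to value, caching value itself if none exists.
    synchronized String intern(String value) {
        WeakReference<String> ref = cache.get(value);
        String cached = (ref != null) ? ref.get() : null;
        if (cached != null) {
            return cached;
        }
        cache.put(value, new WeakReference<>(value));
        return value;
    }
}

class WeakInternerDemo {
    // Interns every element so that equal strings share one instance in the result.
    static List<String> dedupe(WeakInterner interner, List<String> input) {
        List<String> out = new ArrayList<>(input.size());
        for (String s : input) {
            out.add(interner.intern(s));
        }
        return out;
    }

    public static void main(String[] args) {
        WeakInterner interner = new WeakInterner();
        // new String(...) simulates values read from a file or the network.
        List<String> raw = Arrays.asList(new String("GET"), new String("GET"), new String("POST"));
        List<String> compact = dedupe(interner, raw);
        System.out.println(compact.get(0) == compact.get(1)); // true: same instance
    }
}
```

The collection you actually keep should hold the returned instances; once the raw duplicates are unreachable they can be collected.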
String.intern is the obvious choice, as Brian says. But if you don't want to intern across all the Strings in memory, you can keep a reverse lookup of the values and first check whether an equal value is already present. Here's untested code. You will have to work out removing from the reverse map when removing from the main one.
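import com.google.common.collect.ForwardingMap;
import com.google.common.collect.Maps;
import java.util.Map;

// Guava's ForwardingMap supplies the remaining Map methods by delegating to _map.
class Map2<K, V> extends ForwardingMap<K, V> {
    private final Map<K, V> _map = Maps.newHashMap();
    // Reverse lookup: each distinct value maps to its canonical instance.
    private final Map<V, V> _rev = Maps.newHashMap();

    @Override
    protected Map<K, V> delegate() {
        return _map;
    }

    @Override
    public V put(K k, V v) {
        if (_rev.containsKey(v)) {
            // An equal value is already stored; reuse that instance.
            V prev = _rev.get(v);
            return _map.put(k, prev);
        } else {
            _rev.put(v, v);
            return _map.put(k, v);
        }
    }
}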
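One possible way to handle the removal problem mentioned above is to count how many keys use each value. The sketch below is JDK-only and not the code from this answer: DedupMap, distinctValues and release are hypothetical names, and it assumes values are non-null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: shares one instance per distinct value and forgets the
// canonical instance once no key refers to it any more. Assumes non-null values.
class DedupMap<K, V> {
    private final Map<K, V> map = new HashMap<>();
    // Canonical instance for each distinct value.
    private final Map<V, V> canonical = new HashMap<>();
    // Number of keys currently mapped to each distinct value.
    private final Map<V, Integer> uses = new HashMap<>();

    V put(K key, V value) {
        V shared = canonical.get(value);
        if (shared == null) {
            canonical.put(value, value);
            shared = value;
        }
        // Count the new use before releasing the old one, so re-putting the
        // same value under the same key never drops the count to zero.
        uses.merge(shared, 1, Integer::sum);
        V old = map.put(key, shared);
        if (old != null) {
            release(old);
        }
        return old;
    }

    V get(K key) {
        return map.get(key);
    }

    V remove(K key) {
        V old = map.remove(key);
        if (old != null) {
            release(old);
        }
        return old;
    }

    int distinctValues() {
        return canonical.size();
    }

    // Drops one use of value; removes the canonical entry when the count hits zero.
    private void release(V value) {
        // computeIfPresent removes the mapping when the function returns null.
        Integer left = uses.computeIfPresent(value, (v, n) -> n == 1 ? null : n - 1);
        if (left == null) {
            canonical.remove(value);
        }
    }
}

class DedupMapDemo {
    public static void main(String[] args) {
        DedupMap<Integer, String> m = new DedupMap<>();
        m.put(1, new String("London"));
        m.put(2, new String("London"));
        System.out.println(m.get(1) == m.get(2)); // true: one shared instance
        m.remove(1);
        m.remove(2);
        System.out.println(m.distinctValues());   // 0: nothing is kept alive
    }
}
```

The extra counter map costs memory of its own, so this only pays off when values are large or heavily duplicated.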