Mongo Collection `Size` is *larger* than `storageSize`?
`storageSize` is the sum of all extents allocated for that collection's data, excluding indexes. So that collection takes up 2 extents of ~2GB each, hence ~4GB. `size` includes indexes and, I believe, a couple of other things that inflate the number. Neither really represents the proper on-disk size. For disk size, `db.stats()` has a `fileSize` field, which I think is closer to what you're looking for.
The manual does a somewhat better job of outlining what the various fields mean. See here for collections:
http://docs.mongodb.org/manual/reference/collection-statistics/
And here for database stats:
http://docs.mongodb.org/manual/reference/database-statistics/
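To compare these fields side by side, something like the following rough sketch may help. It takes any stats document (for example, the result of `db.getCollection('name').stats()` or `db.stats()`) and converts the byte counts to MB. The helper name `summarizeSizes` is hypothetical and the sample numbers are made up for illustration.

```javascript
// Hypothetical helper: converts the byte counts in a stats document to MB.
// It only reads plain numeric fields, so it works on any stats-like object.
function summarizeSizes(stats) {
  const MB = 1024 * 1024;
  const fields = ["size", "storageSize", "totalIndexSize", "dataSize", "fileSize"];
  const summary = {};
  fields.forEach(function (field) {
    // Skip fields that this particular stats document does not report.
    if (typeof stats[field] === "number") {
      summary[field + "MB"] = Math.round(stats[field] / MB);
    }
  });
  return summary;
}

// Made-up numbers, for illustration only (5 GB size, 4 GB storageSize).
const exampleStats = {
  size: 5 * 1024 * 1024 * 1024,
  storageSize: 4 * 1024 * 1024 * 1024
};
const exampleSummary = summarizeSizes(exampleStats);
```

In the shell you would pass the real stats document instead of `exampleStats`.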
Some other potentially relevant information:
The compact command does not shrink any datafiles; it only defragments deleted space so that larger objects might reuse it. The compact command will never delete or shrink database files, and in general requires extra space to do its work, usually a minimum of one extra extent.
If you repair the database it will essentially rewrite the data files from scratch, which will remove padding and store them on disk as efficiently as you are going to get. However you will need to have ~2x the size on disk to do so (actually less, but it's a decent guide).
One other thing to bear in mind here: repair and compact remove padding. The padding factor varies from 1 (no document moves caused by growth) to 2 (lots of moves caused by growth). Your padding factor of ~1.67 indicates that your documents are growing (and hence being moved) quite a bit.
When you compact or repair a database you remove that padding, so subsequent document growth is going to trigger even more moves than before. Because moves are relatively expensive operations, this can have a serious impact on your performance. More info here:
http://www.mongodb.org/display/DOCS/Padding+Factor
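To get a feel for what a padding factor of ~1.67 means, here is a minimal sketch. It assumes a simplified model in which the allocated record size is the document size multiplied by the padding factor; the real allocator may round sizes further, and both function names are hypothetical.

```javascript
// Simplified model (an assumption): allocated size = document size * padding factor.
function paddedRecordSize(docBytes, paddingFactor) {
  return Math.ceil(docBytes * paddingFactor);
}

// Bytes a document can grow by before it no longer fits and has to be moved.
function growthHeadroom(docBytes, paddingFactor) {
  return paddedRecordSize(docBytes, paddingFactor) - docBytes;
}

// With a factor of 1.67, a 1000-byte document gets 670 bytes of room to grow.
const headroomWithPadding = growthHeadroom(1000, 1.67);
// After compact or repair the padding is gone, so any growth forces a move.
const headroomAfterCompact = growthHeadroom(1000, 1);
```

Under this model, removing the padding drops the headroom to zero, which is why growth after a compact or repair triggers more moves.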
For MongoDB 3.x and later
For MMAPv1:
dataSize < storageSize
For WiredTiger:
dataSize > storageSize in most cases, because of compression. storageSize can still be the larger of the two; it depends on conditions such as the compression technique and whether the compact/repair command has been run.
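One way to check which case applies is to look at the ratio between the two values. The sketch below is only illustrative: `compressionRatio` is a hypothetical helper, and the numbers passed to it would come from the data size and storage size reported by your own stats output.

```javascript
// Hypothetical helper: ratio of uncompressed data size to allocated storage.
// A value above 1 suggests compression is saving space (typical for WiredTiger);
// a value below 1 suggests padding or free space inside the allocated storage.
function compressionRatio(dataBytes, storageBytes) {
  if (!storageBytes) {
    return null; // avoid dividing by zero for empty collections
  }
  return dataBytes / storageBytes;
}

// Made-up numbers: 300 MB of documents stored in 100 MB on disk.
const exampleRatio = compressionRatio(300 * 1024 * 1024, 100 * 1024 * 1024);
```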
For db.getCollection('name').stats()
size = total size in memory of all records in the collection, including padding (excludes index size and the record header, which is 16 bytes per record)
avgObjSize = average object size, including padding
storageSize = total amount of storage allocated to this collection for document storage (index size excluded)
totalIndexSize = total size of all indexes on the collection (compressed in the case of WiredTiger)
For db.stats()
dataSize = document + padding
storageSize = document + padding + deleted space
fileSize = document + padding extents + index extents + yet-unused space
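Taking the breakdown above at face value, the differences between these fields give a rough idea of where the space goes. This is a sketch under that assumption (MMAPv1-style fields); `breakDownDbStats` is a hypothetical name and the sample values are invented.

```javascript
// Hypothetical helper derived from the field breakdown above.
function breakDownDbStats(dbStats) {
  return {
    // storageSize - dataSize: deleted space inside the allocated extents
    deletedBytes: dbStats.storageSize - dbStats.dataSize,
    // fileSize - storageSize: index extents plus yet-unused space
    indexAndUnusedBytes: dbStats.fileSize - dbStats.storageSize
  };
}

// Invented sample values, in bytes.
const sampleDbStats = { dataSize: 3000, storageSize: 4000, fileSize: 6000 };
const breakdown = breakDownDbStats(sampleDbStats);
```

With compression enabled the first difference can go negative, so it is only meaningful where storageSize is at least dataSize.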
We can reclaim unused space (holes) with:
db.getCollection('name').runCommand( "compact" )
After running the compact or repair command, we can see the exact difference between storage size and data size.
Compression techniques in MongoDB WiredTiger:
- snappy: good compression, low overhead
- zlib: better compression, more CPU
- none: compression disabled (it is enabled by default in WiredTiger)
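The compressor can be chosen per collection when the collection is created. The sketch below builds the options document for `db.createCollection()` using WiredTiger's `block_compressor` setting; `compressionOptions` is a hypothetical helper name, and you should check the option against the documentation for your MongoDB version.

```javascript
// Hypothetical helper: builds createCollection options for a given compressor.
function compressionOptions(compressor) {
  const allowed = ["snappy", "zlib", "none"];
  if (allowed.indexOf(compressor) === -1) {
    throw new Error("Unsupported compressor: " + compressor);
  }
  return {
    storageEngine: {
      wiredTiger: { configString: "block_compressor=" + compressor }
    }
  };
}

const zlibOptions = compressionOptions("zlib");

// In the mongo shell, the options would be passed like this:
// db.createCollection("name", zlibOptions);
```

This only affects collections created with these options; existing collections keep the compressor they were created with.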