NTFS performance and large volumes of files and directories
There are also performance problems with short file name creation slowing things down. Microsoft recommends turning off short filename creation if you have more than 300k files in a folder [1]. The less unique the first 6 characters are, the more of a problem this is.
[1] How NTFS Works from http://technet.microsoft.com, search for "300,000"
Here's some advice from someone with an environment where we have folders containing tens of millions of files.
- A folder stores the index information (links to child files & child folder) in an index file. This file will get very large when you have a lot of children. Note that it doesn't distinguish between a child that's a folder and a child that's a file. The only difference really is the content of that child is either the child's folder index or the child's file data. Note: I am simplifying this somewhat but this gets the point across.
- The index file will get fragmented. When it gets too fragmented, you will be unable to add files to that folder. This is because there is a limit on the # of fragments that's allowed. It's by design. I've confirmed it with Microsoft in a support incident call. So although the theoretical limit to the number of files that you can have in a folder is several billions, good luck when you start hitting tens of million of files as you will hit the fragmentation limitation first.
- It's not all bad however. You can use the tool: contig.exe to defragment this index. It will not reduce the size of the index (which can reach up to several Gigs for tens of million of files) but you can reduce the # of fragments. Note: The Disk Defragment tool will NOT defrag the folder's index. It will defrag file data. Only the contig.exe tool will defrag the index. FYI: You can also use that to defrag an individual file's data.
- If you DO defrag, don't wait until you hit the max # of fragment limit. I have a folder where I cannot defrag because I've waited until it's too late. My next test is to try to move some files out of that folder into another folder to see if I could defrag it then. If this fails, then what I would have to do is 1) create a new folder. 2) move a batch of files to the new folder. 3) defrag the new folder. repeat #2 & #3 until this is done and then 4) remove the old folder and rename the new folder to match the old.
To answer your question more directly: If you're looking at 100K entries, no worries. Go knock yourself out. If you're looking at tens of millions of entries, then either:
a) Make plans to sub-divide them into sub-folders (e.g., lets say you have 100M files. It's better to store them in 1000 folders so that you only have 100,000 files per folder than to store them into 1 big folder. This will create 1000 folder indices instead of a single big one that's more likely to hit the max # of fragments limit or
b) Make plans to run contig.exe on a regular basis to keep your big folder's index defragmented.
Read below only if you're bored.
The actual limit isn't on the # of fragment, but on the number of records of the data segment that stores the pointers to the fragment.
So what you have is a data segment that stores pointers to the fragments of the directory data. The directory data stores information about the sub-directories & sub-files that the directory supposedly stored. Actually, a directory doesn't "store" anything. It's just a tracking and presentation feature that presents the illusion of hierarchy to the user since the storage medium itself is linear.
I am building a File-Structure to host up to 2 billion (2^32) files and performed the following tests that show a sharp drop in Navigate + Read Performance at about 250 Files or 120 Directories per NTFS Directory on a Solid State Drive (SSD):
- The File Performance drops by 50% between 250 and 1000 Files.
- The Directory Performance drops by 60% between 120 and 1000 Directories.
- Values for Numbers > 1000 remain relatively stable
Interestingly the Number of Directories and Files do NOT significantly interfere.
So the Lessons are:
- File Numbers above 250 cost a Factor of 2
- Directories above 120 cost a Factor of 2.5
- The File-Explorer in Windows 7 can handle large #Files or #Dirs, but Usability is still bad.
- Introducing Sub-Directories is not expensive
This is the Data (2 Measurements for each File and Directory):
(FOPS = File Operations per Second)
(DOPS = Directory Operations per Second)
#Files lg(#) FOPS FOPS2 DOPS DOPS2
10 1.00 16692 16692 16421 16312
100 2.00 16425 15943 15738 16031
120 2.08 15716 16024 15878 16122
130 2.11 15883 16124 14328 14347
160 2.20 15978 16184 11325 11128
200 2.30 16364 16052 9866 9678
210 2.32 16143 15977 9348 9547
220 2.34 16290 15909 9094 9038
230 2.36 16048 15930 9010 9094
240 2.38 15096 15725 8654 9143
250 2.40 15453 15548 8872 8472
260 2.41 14454 15053 8577 8720
300 2.48 12565 13245 8368 8361
400 2.60 11159 11462 7671 7574
500 2.70 10536 10560 7149 7331
1000 3.00 9092 9509 6569 6693
2000 3.30 8797 8810 6375 6292
10000 4.00 8084 8228 6210 6194
20000 4.30 8049 8343 5536 6100
50000 4.70 7468 7607 5364 5365
And this is the Test Code:
[TestCase(50000, false, Result = 50000)]
[TestCase(50000, true, Result = 50000)]
public static int TestDirPerformance(int numFilesInDir, bool testDirs) {
var files = new List<string>();
var dir = Path.GetTempPath() + "\\Sub\\" + Guid.NewGuid() + "\\";
Directory.CreateDirectory(dir);
Console.WriteLine("prepare...");
const string FILE_NAME = "\\file.txt";
for (int i = 0; i < numFilesInDir; i++) {
string filename = dir + Guid.NewGuid();
if (testDirs) {
var dirName = filename + "D";
Directory.CreateDirectory(dirName);
using (File.Create(dirName + FILE_NAME)) { }
} else {
using (File.Create(filename)) { }
}
files.Add(filename);
}
//Adding 1000 Directories didn't change File Performance
/*for (int i = 0; i < 1000; i++) {
string filename = dir + Guid.NewGuid();
Directory.CreateDirectory(filename + "D");
}*/
Console.WriteLine("measure...");
var r = new Random();
var sw = new Stopwatch();
sw.Start();
int len = 0;
int count = 0;
while (sw.ElapsedMilliseconds < 5000) {
string filename = files[r.Next(files.Count)];
string text = File.ReadAllText(testDirs ? filename + "D" + FILE_NAME : filename);
len += text.Length;
count++;
}
Console.WriteLine("{0} File Ops/sec ", count / 5);
return numFilesInDir;
}