Reasonable way to predict query performance over time
With a good index present, the time taken to locate a matching row should scale roughly logarithmically with the number of rows, since a B-tree seek touches only one page per level of the index, as long as you have room for the index in memory.
I'd make the index `UNIQUE`, since the basename must be unique (otherwise your workflow is invalid), and a unique index is more efficient.
```sql
CREATE UNIQUE INDEX IX_raw_records_basename
    ON dbo.raw_records (basename);
```
Check the execution plan for the query to ensure the index is being used.
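For a quick check without opening the graphical plan, you can turn on I/O statistics and run the lookup; a minimal sketch, assuming the table and column names from above and a hypothetical literal value:

```sql
-- Logical reads appear in the Messages tab; an index seek should touch only a few pages.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT id
FROM dbo.raw_records
WHERE basename = 'some_basename';  -- hypothetical value; match the literal's type to the column

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```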
Make sure you have enough memory to keep the index cached, and assuming concurrency won't be a massive problem, you should be good for a very large number of rows.
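To see roughly how much memory the index needs in order to stay cached, you can check its size on disk; a sketch, assuming the index name used above:

```sql
-- Approximate size of each index on the table, in MB.
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 / 1024.0 AS used_mb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
    ON i.object_id = ps.object_id
   AND i.index_id  = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.raw_records')
GROUP BY i.name;
```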
I'd reconsider the length of the `basename` and `filename` columns, since the query optimizer uses the declared length when calculating how much memory to grant for a query. If, for instance, the `basename` column will never hold more than 20 characters, but you have it defined as 512 characters, the memory grant for `SELECT basename FROM dbo.raw_records;` would be 25.6 times larger than actually required. Column lengths matter far more than most people realize.
You could also change the query to `SELECT 1 FROM table1 WHERE basename = <basename>`; that way you wouldn't even need the `id` column, since all you're trying to do is verify its existence. Only do what you really need. It looks like the index you show in your question would work fine for that.
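In application code the usual pattern for this is an `EXISTS` check, which lets SQL Server stop at the first match; a sketch, using a hypothetical `@basename` parameter:

```sql
-- Existence check only: no columns need to be returned from the table itself.
-- @basename is a hypothetical parameter; match its type to the column definition.
DECLARE @basename varchar(512) = 'some_basename';

IF EXISTS (SELECT 1 FROM dbo.raw_records WHERE basename = @basename)
    SELECT 1 AS already_present;
ELSE
    SELECT 0 AS already_present;
```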
In addition, if space is an issue, you may want to consider data compression for the table and its indexes; this should allow the index to fit in a smaller memory footprint. Evaluate `DATA_COMPRESSION = ROW` versus `DATA_COMPRESSION = PAGE` to see which compression method best fits your requirements.
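You can estimate the savings before committing and then rebuild the index with the chosen setting; a sketch, again assuming the index name from above:

```sql
-- Estimate how much space PAGE compression would save (run again with N'ROW' to compare).
EXEC sys.sp_estimate_data_compression_savings
     @schema_name      = N'dbo',
     @object_name      = N'raw_records',
     @index_id         = NULL,   -- all indexes
     @partition_number = NULL,
     @data_compression = N'PAGE';

-- Apply the winner by rebuilding the index.
ALTER INDEX IX_raw_records_basename ON dbo.raw_records
    REBUILD WITH (DATA_COMPRESSION = PAGE);
```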