SQL Server: Using Dual Databases for Performance?
Two things:
You don't actually say you have an index on the table -- I expect adding one would solve your problem outright. An index on f_application, f_computername, f_endtime, f_starttime should make your update time tiny with only 6 million records.
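A minimal sketch of that index, assuming the `dbo` schema and using an illustrative name:

```sql
-- Supports the lookup for open sessions; the index name is illustrative.
CREATE NONCLUSTERED INDEX ix_usage_app_computer_end_start
ON dbo.tb_applicationusage (f_application, f_computername, f_endtime, f_starttime);
```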
If you want to split it up, don't do it the way you describe. Instead, create a table for open-but-not-yet-closed records that sits in front of your current table; when something "updates", delete the row from that staging table and insert it into your big table. Using a staging table this way is considered a leading/best practice -- having a table arbitrarily split in two is always a nightmare to work with.
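A sketch of that pattern, assuming a hypothetical `tb_applicationusage_open` staging table with the same columns (minus `f_endtime`); the `OUTPUT ... INTO` clause moves the row in a single statement:

```sql
-- Hypothetical staging table tb_applicationusage_open holds only open sessions.
-- On close, delete from staging and insert into the history table in one statement.
-- Assumes f_ID is NOT an identity column on the history table.
DELETE s
OUTPUT deleted.f_ID, deleted.f_application, deleted.f_computername,
       deleted.f_starttime, SYSUTCDATETIME()   -- stamp the close time on the way out
INTO dbo.tb_applicationusage
    (f_ID, f_application, f_computername, f_starttime, f_endtime)
FROM dbo.tb_applicationusage_open AS s
WHERE s.f_application  = @ApplicationName
  AND s.f_computername = @ComputerName;
```

Note that the target of `OUTPUT ... INTO` can't have enabled triggers or participate in foreign keys; if yours does, fall back to an `INSERT ... SELECT` followed by a `DELETE` inside one transaction.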
You can do better than two databases. There are two things you should look at in your existing database before sharding off part of the old data:
1. Choose a good clustered index. There are four rules you should follow for the clustered index to work well with this data:
- It should use an increasing value, so that new records always belong at the end of the table in cluster order, or at least on the last page. This is especially important when you have lots of inserts, as in this case. Something like an identity/autoincrement field would do, but you'll see in a moment why we can do better.
- It should uniquely or nearly-uniquely identify the record, so the updates for the application close records will be fast.
- You should be able to determine the clustered key values from the application close records coming into the DB (this rules out the identity column from earlier).
- You don't want anything changed by the application close records to be part of the index, as that could force the database to need to move the record to a new location on the disk when you have updates.
If there is an increasing timestamp (i.e. `f_starttime`), that may be good as the first field in the index, as long as it's also part of the close record as required by rule #3. Add any other fields you'll need to uniquely or nearly-uniquely identify a record. Note that you can still use an identity column for the table; just don't use it as the first column in the clustered index. Based on the SQL code in the question, I might go with `f_starttime, f_computername, f_application, f_ID` (see the sketch below). Even if you go with the staging table suggested in the other answer, these index changes may still be a good idea.
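A sketch of that clustered index, assuming any existing clustered index (e.g. a clustered primary key on `f_ID`) has first been changed to nonclustered:

```sql
-- f_ID goes last purely to make the key unique; the index name is illustrative.
CREATE UNIQUE CLUSTERED INDEX cix_usage_starttime
ON dbo.tb_applicationusage (f_starttime, f_computername, f_application, f_ID);
```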
2. Table partitioning. Table partitioning helps the DB server keep only the recent records in memory, so that older data from the same table can remain on disk. SQL Server 2016 will even let you push the historical data to cloud storage on Azure via Stretch Database. A sketch of monthly partitioning follows below.
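Continuing the index sketch above, a hypothetical monthly scheme (boundary dates and names are illustrative, and `f_starttime` is assumed to be `datetime`); new boundaries would be added over time with `ALTER PARTITION FUNCTION ... SPLIT RANGE`:

```sql
-- One partition per month; RANGE RIGHT places each boundary date in the newer partition.
CREATE PARTITION FUNCTION pf_usage_monthly (datetime)
AS RANGE RIGHT FOR VALUES ('2016-01-01', '2016-02-01', '2016-03-01');

-- Map every partition to the same filegroup for simplicity.
CREATE PARTITION SCHEME ps_usage_monthly
AS PARTITION pf_usage_monthly ALL TO ([PRIMARY]);

-- Rebuild the clustered index on the partition scheme, partitioned by f_starttime.
CREATE UNIQUE CLUSTERED INDEX cix_usage_starttime
ON dbo.tb_applicationusage (f_starttime, f_computername, f_application, f_ID)
WITH (DROP_EXISTING = ON)
ON ps_usage_monthly (f_starttime);
```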
The other suggestion, keeping completed records separate from open records, is good too. Even with that approach, though, indexing and table partitioning can help as the completed-records table grows large. Only after all of these options fail should you start looking at sharding old data off to a separate (linked) DB.
Really, though, SQL Server can easily handle six-million-ish records without resorting to these kinds of tricks (changing the index may still be worth doing). Are you sure the server is correctly provisioned for this workload? You might do just as well simply adding more RAM to the server.
Finally, separating a reporting database from the live processing database is also common, and not at all a bad thing to do. We sometimes call this a "Data Warehouse", though that also often involves schema changes and an SSIS process to move the data. This is a good feature to have, because it prevents an accidental mistake in a data analysis query from causing performance problems in production. You can best accomplish this via database mirroring/log shipping to a read-only slave, or more recently via an AlwaysOn Availability Group.
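If you take the mirroring route, reporting reads usually go through a database snapshot on the mirror, since a mirror isn't directly readable. A minimal sketch with hypothetical database and logical file names:

```sql
-- Point-in-time, read-only snapshot of the mirrored database for reporting queries.
CREATE DATABASE UsageDB_Reporting
ON (NAME = UsageDB_Data, FILENAME = 'D:\Snapshots\UsageDB_Reporting.ss')
AS SNAPSHOT OF UsageDB;
```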
I think a filtered index would fit the bill quite nicely. Based on your comments on the other answers, it sounds like a sample `CREATE INDEX` statement would be useful to you. It would look something like this:
```sql
CREATE NONCLUSTERED INDEX <index name>
ON <schema>.tb_applicationusage (f_application, f_computername, f_starttime)
INCLUDE (f_ID)
WHERE f_endtime IS NULL;
```
It would be nice if you could put a `UNIQUE` constraint on that index, but I doubt you'll be able to get away with that, because invariably something will happen that prevents a normal log-out/shutdown of the application. Of course, you should replace `<index name>` with an appropriate name and `<schema>` with the table's schema (probably `dbo`). I only added the `INCLUDE` clause because it was in your sample query; if you find you don't need it, feel free to drop it.
Your update statement as written isn't an `UPDATE` but a `SELECT`. Here is the full query written as an update:
```sql
WITH LastLogin_CTE AS
(
    SELECT TOP 1
        *   -- generally avoid SELECT *; it's used here only to drive the update
    FROM tb_applicationusage
    WHERE f_application = @ApplicationName   -- parameterize the name
      AND f_computername = @ComputerName     -- parameterize the name
      AND f_endtime IS NULL
    ORDER BY f_starttime DESC
)
UPDATE LastLogin_CTE
SET f_endtime = SYSUTCDATETIME();
```
This update will of course leave orphaned rows that were never logged out. My guess is that is what the monthly report is designed to detect.
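If so, the filtered index above would cover that report as well. A sketch of such a query (the 30-day cutoff is an arbitrary example):

```sql
-- Sessions still "open" after 30 days are presumably orphans from crashes or kills.
SELECT f_ID, f_application, f_computername, f_starttime
FROM <schema>.tb_applicationusage
WHERE f_endtime IS NULL
  AND f_starttime < DATEADD(DAY, -30, SYSUTCDATETIME());
```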
Parting thoughts:
- You should probably avoid the `tb_`, `f_`, and other prefixes if possible. They just add noise to your object names, making them harder to read, and don't seem to be industry best practice.
- On performance issues, you should really include a query plan in your question. That helps us spot missing indexes and gives us more useful information to work with.