Slow JOIN on tables with millions of rows

I think you need this index (as Krismorte suggested):

CREATE NONCLUSTERED INDEX [IX dbo.value_text id_file, id_field, value]
ON dbo.value_text (id_file, id_field, [value]);

The following index is probably not required as you appear to have a suitable existing index (not mentioned in the question) but I include it for completeness:

CREATE NONCLUSTERED INDEX [IX dbo.files cid (id, year, name)]
ON dbo.files (cid)
INCLUDE
(
    id,
    [year],
    [name]
);

Express the query as:

SELECT
    FileId = F.id,
    [FileName] = F.[name],
    FileYear = F.[year],
    V.[value]
FROM dbo.files AS F
JOIN dbo.clients AS C
    ON C.id = F.cid
OUTER APPLY
(
        SELECT DISTINCT
            VT.[value]
        FROM dbo.value_text AS VT
        WHERE
            VT.id_file = F.id
            AND VT.id_field = 65739
) AS V
WHERE 
    C.id = 10
OPTION (RECOMPILE);

This should give an execution plan like:

expected plan

The OPTION (RECOMPILE) is optional. Only add if you find the ideal plan shape is different for different parameter values. There are other possible solutions to such "parameter-sniffing" issues.

With the new index, you may also find the original query text produces a very similar plan, also with good performance.

You may also need to update the statistics on the files table, since the estimate in the supplied plan for cid = 19 is not accurate:

Inaccurate estimate

After updating the statistics on the files table the query is working very fast in all the cases. If in the future I add more fields in "files" table, should I update the index or something?

If you add more columns to the file table (and use/return them in your query) you will need to add them to the index (minimally as included columns) to keep the index "covering". Otherwise, the optimizer may choose to scan the files table rather than looking up the columns not present in the index. You might also choose to make cid part of a clustering index on that table instead. It depends. Ask a new question if you want clarification on these points.

Slow JOIN on tables with millions of rows

Tags:

Performance

Query Performance

Azure Sql Database

Related

Recent Posts