A better way to write this query?
While your query is valid, there is a lot I would differently.
Don't use CaMeL-case names in Postgres if it can be avoided. Your unnamed entity framework may force this nonsense upon you, but I don't want to deal with the double-quote mess, so I tested with your schema after removing all double-quotes - effectively making all identifiers lower-cased.
- Are PostgreSQL column names case-sensitive?
Don't use illegible or illegal column and table aliases (like
"wordTexts.WordId"
). That's a matter of taste and style (and sanity), but you also omitted the keywordAS
where you should not and kept it where you could omit it.- Date column arithmetic in PostgreSQL query
I also formatted some more to make it easier for me to grok the query. That last part is totally optional. But use some consistent formatting style.
Arriving at this:
SELECT *
FROM PulledTexts
WHERE Id IN (
SELECT w.TextIdId
FROM WordTexts w -- AS can be omitted for table alias
LEFT JOIN UniqueWords u ON w.WordIdId = u.Id -- LEFT JOIN might be necessary here
WHERE u.WordText = 'automate'
OR w.TextIdId IN (
SELECT w.TextIdId -- AS and1 -- column alias only documentation here, not visible
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id -- LEFT JOIN misleading here
WHERE u.WordText = 'audit'
INTERSECT
SELECT w.TextIdId -- AS and2 -- but don't omit AS for column alias
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id
WHERE u.WordText = 'trial'
)
);
Which can be simplified to:
SELECT *
FROM (
SELECT w.TextIdId AS Id
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id -- now we don't need LEFT any more
WHERE u.WordText = 'automate'
UNION
SELECT w.TextIdId
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id
WHERE u.WordText = 'audit'
INTERSECT
SELECT w.TextIdId
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id
WHERE u.WordText = 'trial'
) w
JOIN PulledTexts p USING (Id)
We do not need additional parentheses because, according to the manual:
INTERSECT
binds more tightly thanUNION
. That is,A UNION B INTERSECT C
will be read asA UNION (B INTERSECT C)
.
But this will be faster when replacing multiple intersected subqueries:
SELECT *
FROM (
SELECT w.TextIdId AS Id
FROM WordTexts w
JOIN UniqueWords u ON w.WordIdId = u.Id
WHERE u.WordText = 'automate'
UNION
SELECT TextIdId
FROM WordTexts w1
JOIN WordTexts w2 USING (TextIdId)
WHERE w1.WordIdId = (SELECT Id FROM UniqueWords WHERE WordText = 'audit')
AND w2.WordIdId = (SELECT Id FROM UniqueWords WHERE WordText = 'trial')
) w
JOIN PulledTexts p USING (Id)
The INTERSECT
part can be cast as relational division problem. Explanation in this related answer from just yesterday:
- Counting with multiple joins
db<>fiddle here
Most important for performance is to have the right indexes. You probably should have a UNIQUE
constraint on (WordIdId, TextIdId)
in table WordTexts
, which implements the currently missing index on those two columns in this order.
- Does creating a unique constraint on a Postgres column remove the need to index it?