SQL LIKE, how to sort results by weighted occurrence count?
Okay, let's have your search terms in a temp table:
CREATE TABLE #SearchTerms (Term varchar(50) not null)
insert into #SearchTerms (Term)
select 'yellow' union all
select 'large' union all
select 'widgets'
And let's do something silly:
select
widgets.ID,
(LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) as DescScore
(LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) as TitleScore
from
widgets,#SearchTerms
We've now counted each occurrence of each term, in both the description and the title.
So now we can sum and weight those occurrences:
select
widgets.ID,
SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
from
widgets,#SearchTerms
group by
Widgets.ID
And if we need to do more with this, I'd recommend putting the above in a subselect
select
w.*,CombinedScore
from
widgets.w
inner join
(select
widgets.ID,
SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
from
widgets,#SearchTerms
group by
Widgets.ID
) t
on
w.ID = t.ID
where
CombinedScore > 0
order by
CombinedScore desc
(Note that I've assumed there's an ID column in all these examples, but that can be expanded into as many columns as are necessary to define the PK in the widgets table)
The real trick here is counting occurrences of a word within a larger body of text, which is done by:
(LEN(text) - LEN(text with each occurrence of term removed)) / LEN(term)
Choose one of the following options:
- use a fulltext search engine (Lucene or similiar) to do weighted full text search;
- see this other StackOverflow question
- do multiple
SELECT
s, each one with a relevance field, merge them withUNION
and sort the result; - sort your results in your application, after having retrieved the results.