SQL LIKE, how to sort results by weighted occurrence count?

Okay, let's have your search terms in a temp table:

CREATE TABLE #SearchTerms (Term varchar(50) not null)
insert into #SearchTerms (Term)
select 'yellow' union all
select 'large' union all
select 'widgets'

And let's do something silly:

select
    widgets.ID,
    (LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) as DescScore
    (LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) as TitleScore
from
    widgets,#SearchTerms

We've now counted each occurrence of each term, in both the description and the title.

So now we can sum and weight those occurrences:

select
    widgets.ID,
    SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
    ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
from
    widgets,#SearchTerms
group by
    Widgets.ID

And if we need to do more with this, I'd recommend putting the above in a subselect

select
    w.*,CombinedScore
from
    widgets.w
       inner join
    (select
        widgets.ID,
        SUM((LEN(description) - LEN(REPLACE(description,Term,''))) / LEN(Term) +
        ((LEN(title) - LEN(REPLACE(title,Term,''))) / LEN(Term) *5)) as CombinedScore
    from
        widgets,#SearchTerms
    group by
        Widgets.ID
    ) t
        on
            w.ID = t.ID
where
    CombinedScore > 0
order by
    CombinedScore desc

(Note that I've assumed there's an ID column in all these examples, but that can be expanded into as many columns as are necessary to define the PK in the widgets table)


The real trick here is counting occurrences of a word within a larger body of text, which is done by:

(LEN(text) - LEN(text with each occurrence of term removed)) / LEN(term)

Choose one of the following options:

  • use a fulltext search engine (Lucene or similiar) to do weighted full text search;
  • see this other StackOverflow question
  • do multiple SELECTs, each one with a relevance field, merge them with UNION and sort the result;
  • sort your results in your application, after having retrieved the results.