Basic Search Functionality

    strs = {"first block of text with random content", 
            "different block of text", "1 2 3 4"};

    Nearest[(StringPadRight[#, 50] & /@ strs), "content random with"]

This is apparently a deep and complex question.



Mathematica has a menagerie of built-in goodies from which to assemble your own variant:

  • EditDistance
  • DamerauLevenshteinDistance
  • NeedlemanWunschSimilarity
  • SmithWatermanSimilarity
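
As a rough sketch of how these four built-ins could be compared side by side, the following tabulates each measure for the example strings from the question. The names `measures` and `scores` are only illustrative. Note that the first two functions are distances (smaller means closer), while the last two are similarities (larger means closer).

```mathematica
strs = {"first block of text with random content",
        "different block of text", "1 2 3 4"};
pattern = "content random with";

(* two distances followed by two similarities *)
measures = {EditDistance, DamerauLevenshteinDistance,
            NeedlemanWunschSimilarity, SmithWatermanSimilarity};

(* one row per measure, one column per candidate string *)
scores = Table[f[pattern, s], {f, measures}, {s, strs}];

TableForm[scores, TableHeadings -> {measures, strs}]
```

Which measure is appropriate depends on whether you care about the whole string or only its best-matching region.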

For example, pad the strings to the same length and then apply the Levenshtein distance (EditDistance):

    Module[{strs, pattern, maxLength, vectors},

      strs = {"first block of text with random content",
              "different block of text",
              "1 2 3 4"};

      pattern = "content random with";

      (* length of the longest candidate string *)
      maxLength = Max[StringLength /@ strs];

      (* character codes of each string, zero-padded to a common length *)
      vectors = PadRight[#, maxLength] & /@ ToCharacterCode /@ strs;

      (* Levenshtein distance from each padded vector to the pattern *)
      EditDistance[#, ToCharacterCode[pattern]] & /@ vectors
    ]

    (*{25, 32, 39}*)

Using this test, the first string is now the closest to the pattern.


SmithWatermanSimilarity does it :)

strs = {"first block of text with random content", "different block of text", "1 2 3 4"};
metric3 = SmithWatermanSimilarity["content random with", #] &

MaximalBy[strs, metric3]

(* {"first block of text with random content"} *)

How about checking whether the string in question contains each of the three words individually? If it contains all three words, it passes.

In this first version of my code I have done the parsing manually so you can see what it does:

  Cases[{"first block of text with random content", 
         "different block of text", 
         "1 2 3 4"}, 
        a_ /;
    StringMatchQ[a, "*with*"] && 
    StringMatchQ[a, "*content*"] && 
    StringMatchQ[a, "*random*"]]   


Automated version:

    Cases[
      {"first block of text with random content", 
       "different block of text", 
       "1 2 3 4"}, 
      (* keep a string only if it contains every word of the query *)
      a_ /; AllTrue[StringSplit["content random with"], 
                    StringMatchQ[a, "*" <> # <> "*"] &]]

Another test of greater complexity would be to test if the three words occur adjacently in a string...
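
One possible way to sketch the adjacency test is to build every ordering of the query words as a phrase and check whether a string contains any of them. This assumes the words are separated by single spaces; the names `words` and `phrases` are only illustrative.

```mathematica
strs = {"first block of text with random content",
        "different block of text", "1 2 3 4"};
words = StringSplit["content random with"];

(* every ordering of the words, joined by single spaces *)
phrases = StringRiffle[#, " "] & /@ Permutations[words];

(* keep the strings that contain at least one of the phrases *)
adjacent = Select[strs, StringContainsQ[#, Alternatives @@ phrases] &]
(* {"first block of text with random content"} *)
```

Because this is a substring test, a phrase that ends inside a longer word would also count as a match. The number of phrases grows factorially with the number of words, so this is only practical for short queries.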



As of Mathematica 11:

filenames = Table[CreateFile[], 3];
content = {"first block of text with random content", "different block of text", "1 2 3 4"};

MapThread[Put, {content, filenames}];
index = CreateSearchIndex[filenames];

Perform searches using TextSearch:

Snippet /@ Normal@TextSearch[index, "block"]

To rank search results, score them using SearchAdjustment. You can also experiment with SearchQueryString to get a series of search results with different priorities. Finally, you can post-process the results and rank them by measurements that the preceding methods cannot capture, for example by counting the number of times certain words appear, as you indicate you would like to do.


If we define

strs = {"first block of text with random content", 
   "different block of text", "1 2 3 4"};
metric = EditDistance["content random with", #] &
(* EditDistance["content random with", #1] & *)

We can do

MinimalBy[strs, metric]
(* {"different block of text", "1 2 3 4"} *)

This tells us that the 2nd and 3rd strings are tied for closest by edit distance. A different metric could give a different answer.

EDIT

For matching by words rather than characters, you could use (suggested by @Conor Cosnett)

metric2 = -StringCount[#, StringSplit["content random with"]] &;
MinimalBy[strs, metric2]
(* {"first block of text with random content"} *)
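
Note that StringCount matches substrings, so "with" would also be counted inside a longer word such as "without". As a sketch of a stricter word-level alternative, one could compare sets of whole words using the Jaccard similarity (shared words divided by total distinct words). The helpers `wordSet` and `wordOverlap` below are hypothetical names, not built-ins.

```mathematica
strs = {"first block of text with random content",
        "different block of text", "1 2 3 4"};
pattern = "content random with";

(* hypothetical helper: the set of lower-case, whitespace-separated words *)
wordSet[s_String] := Union[StringSplit[ToLowerCase[s]]];

(* hypothetical helper: Jaccard similarity of the two word sets *)
wordOverlap[a_String, b_String] :=
  With[{wa = wordSet[a], wb = wordSet[b]},
    Length[Intersection[wa, wb]]/Length[Union[wa, wb]]];

metric4 = wordOverlap[pattern, #] &;
MaximalBy[strs, metric4]
(* {"first block of text with random content"} *)
```

Here the first string scores 3/7 and the other two score 0, so it is the unique maximum.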