How do I implement tag searching with Lucene?
Edit: You can use Lucene. Here's an explanation of how to do this in Lucene.net. Some Lucene basics are:
- Document - the storage unit in Lucene. It is somewhat analogous to a database record.
- Field - the search unit in Lucene. Analogous to a database column. Lucene searches for text by taking a query and matching it against fields. A field must be indexed to be searchable.
- Token - the search atom in Lucene. Usually a word, sometimes a phrase, letter or digit.
- Analyzer - the part of Lucene that transforms a field into tokens.
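To make the analyzer's role concrete, here is a toy model in Python. It is not Lucene's API and not how any particular Lucene analyzer is implemented; the function name `analyze` and its splitting rule are assumptions chosen for illustration only.

```python
import re

def analyze(text):
    """Toy analyzer (hypothetical): lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

# A field value goes in, a list of searchable tokens comes out.
print(analyze("Apples, Carrots and fresh-fruit"))
# prints ['apples', 'carrots', 'and', 'fresh', 'fruit']
```

Note that in this sketch the tag "fresh-fruit" ends up as two tokens. Real analyzers differ in how they treat punctuation and case, so it is worth checking what your chosen analyzer does to multi-word or hyphenated tags.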
Please read this blog post about creating and using a Lucene.net index.
I assume you are tagging blog posts. If I am totally wrong, please say so. In order to search for tags, you need to represent them as Lucene entities, namely as tokens inside a "tags" field.
One way of doing so is to assign one Lucene document per blog post. The document will have at least the following fields:
- id: unique id of the blog post.
- content: the text of the blog post.
- tags: list of tags.
Indexing: Whenever you add a tag to a post, remove a tag or edit it, you will need to re-index the post. The Analyzer will transform the fields into their token representation.
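// One document per blog post. "tags" is a single space-separated string, e.g. "apples carrots".
Document doc = new Document();
// Index the id as one untokenized term so the old document can be found and replaced when re-indexing.
doc.Add(new Field("id", i.ToString(), Field.Store.YES, Field.Index.UN_TOKENIZED));
doc.Add(new Field("content", text, Field.Store.YES, Field.Index.TOKENIZED));
doc.Add(new Field("tags", tags, Field.Store.YES, Field.Index.TOKENIZED));
writer.AddDocument(doc);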
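Conceptually, indexing the "tags" field builds a mapping from each tag token to the posts that contain it, and re-indexing a post replaces its old entries. The following is a minimal Python sketch of that idea, not Lucene code: the class `TagIndex` and its methods are hypothetical names, and tags are assumed to be a whitespace-separated string.

```python
from collections import defaultdict

class TagIndex:
    """Toy inverted index over a 'tags' field (illustrative, not Lucene's API)."""

    def __init__(self):
        self.postings = defaultdict(set)  # tag token -> ids of posts carrying it
        self.tags_by_post = {}            # post id -> tokens currently indexed

    def remove_post(self, post_id):
        # Drop every posting that points at this post, if it was indexed before.
        for token in self.tags_by_post.pop(post_id, set()):
            self.postings[token].discard(post_id)

    def index_post(self, post_id, tags):
        # Re-indexing means: remove the old entries, then add the current tags.
        self.remove_post(post_id)
        tokens = {t.lower() for t in tags.split()}
        self.tags_by_post[post_id] = tokens
        for token in tokens:
            self.postings[token].add(post_id)

index = TagIndex()
index.index_post(1, "apples carrots")
index.index_post(2, "carrots")
index.index_post(1, "apples pears")   # post 1 was retagged
print(sorted(index.postings["carrots"]))  # prints [2]
```

The point of the sketch is the retagging step: without removing the old entries first, post 1 would still show up under "carrots".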
The remaining part is retrieval. For this, you need to create a QueryParser and pass it a query string, like this:
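// QueryParser needs a default field (used for terms without a "field:" prefix) and an analyzer.
// StandardAnalyzer is an assumption here; use the same analyzer you indexed with.
QueryParser qp = new QueryParser("content", new StandardAnalyzer());
Query q = qp.Parse(s);
// searcher is an IndexSearcher opened on the index.
Hits hits = searcher.Search(q);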
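The syntax you need for s will be:
tags:apples tags:carrots
to search for posts tagged apples or carrots (the query parser combines clauses with OR by default), and
tags:carrots NOT tags:apples
to search for posts tagged carrots but not apples.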
See the Lucene Query Parser Syntax for details on constructing s.
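The meaning of the two queries above can be modelled in a few lines of Python. This is only a sketch of the boolean semantics, not of how Lucene evaluates queries; the function `search`, its parameters `any_of` and `none_of`, and the sample posts are all hypothetical.

```python
def search(posts, any_of=(), none_of=()):
    """Return ids of posts with at least one tag in any_of and no tag in none_of."""
    any_of, none_of = set(any_of), set(none_of)
    return sorted(
        post_id
        for post_id, tags in posts.items()
        if tags & any_of and not tags & none_of
    )

# Hypothetical sample data: post id -> set of tags.
posts = {
    1: {"apples", "carrots"},
    2: {"carrots"},
    3: {"apples"},
    4: {"pears"},
}

# Models: tags:apples tags:carrots
print(search(posts, any_of=["apples", "carrots"]))            # prints [1, 2, 3]
# Models: tags:carrots NOT tags:apples
print(search(posts, any_of=["carrots"], none_of=["apples"]))  # prints [2]
```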
Lucene for .net seems to be mature. No need to use Java or SOLR.
The standard Lucene query language supports equally ranked search terms and negation.
So if your Lucene index had a field "tag", your query would be:
tag:apple* OR tag:carrot*
This gives equal weight to each term and ranks documents that have both tags higher.
To negate a tag, use this:
tag:carrot* NOT tag:apple*
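As a rough illustration of the behaviour described here, the sketch below treats each `prefix*` clause as a prefix match and ranks posts by how many clauses they satisfy. Lucene's actual scoring is more involved than a simple count, so this is an assumption-laden toy; `prefix_search`, `should` and `must_not` are hypothetical names.

```python
def prefix_search(posts, should=(), must_not=()):
    """Rank posts by the number of `should` prefixes matched; exclude `must_not` matches."""
    def matches(tags, prefix):
        return any(tag.startswith(prefix) for tag in tags)

    scored = []
    for post_id, tags in posts.items():
        if any(matches(tags, p) for p in must_not):
            continue  # negated clause: drop the post entirely
        score = sum(1 for p in should if matches(tags, p))
        if score > 0:
            scored.append((score, post_id))
    # Highest score first; ties broken by post id for a stable order.
    return [post_id for score, post_id in sorted(scored, key=lambda sp: (-sp[0], sp[1]))]

# Hypothetical sample data: post id -> set of tags.
posts = {
    1: {"apple", "carrot"},
    2: {"carrots"},
    3: {"applesauce"},
    4: {"pear"},
}

# Models: tag:apple* OR tag:carrot*  (post 1 matches both clauses, so it ranks first)
print(prefix_search(posts, should=["apple", "carrot"]))      # prints [1, 2, 3]
# Models: tag:carrot* NOT tag:apple*
print(prefix_search(posts, should=["carrot"], must_not=["apple"]))  # prints [2]
```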
Simple example to show indexing and querying with Lucene here