Meta Search Engine Architecture

Have a look at Lucene.

Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It is suitable for nearly any application that requires full-text search, especially cross-platform applications.


If you look at Garlic (pdf), you'll notice that its architecture is generic enough to be adapted to a meta-search engine.

UPDATE:

The rough architectural sketch is something like this:

   +---------------------------+
   |                           |
   |    Meta-Search Engine     |         +---------------+
   |                           |         |               |
   |   +-------------------+   |---------| Configuration |
   |   | Query Processor   |   |         |               |
   |   |                   |   |         +---------------+
   |   +-------------------+   |
   +-------------+-------------+
                 |
      +----------+---------------+
   +--+----------+-------------+ |
   |             |             | |
   |     +-------+-------+     | |
   |     |    Wrapper    |     | |
   |     |               |     | |
   |     +-------+-------+     | |
   |             |             | |
   |             |             | |
   |     +-------+--------+    | |
   |     |                |    | |
   |     | Search Engine  |    | |
   |     |                |    +-+
   |     +----------------+    |
   +---------------------------+

The parts depicted are:

  • Meta-Search Engine - the engine itself; it orchestrates the whole process.
  • Query Processor - part of the engine; it resolves capabilities, sends requests to the specific search engines (through the wrappers), and aggregates their results.
  • Wrapper - bridges the meta-search engine API to a specific search engine. Each wrapper works with one search engine: it exposes that engine's capabilities to the meta-search engine, accepts search requests, and responds to them.
  • Search Engine - an external search engine to query; it is exposed to the meta-search engine through its wrapper.
  • Configuration - data that configures the meta-search engine, e.g., which wrappers to use and where to find more wrappers. It can also configure the wrappers themselves.
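
To make the roles above more concrete, here is a minimal sketch in Python of how the pieces might fit together. It is one possible reading of the diagram, not a reference design. Every name in it (Result, Wrapper, StaticWrapper, QueryProcessor, MetaSearchEngine, the "wrappers" configuration key, the capability strings) is hypothetical. It also assumes that each wrapper normalises scores to a common range, and that duplicates are merged by URL, keeping the highest score. The wrappers here serve canned results instead of calling real search engines.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    url: str
    title: str
    score: float  # assumed to be normalised by the wrapper to [0, 1]


class Wrapper(ABC):
    """Bridges the meta-search API to one specific search engine."""

    @abstractmethod
    def capabilities(self) -> set[str]:
        """Features the underlying engine supports, e.g. {"keyword"}."""

    @abstractmethod
    def search(self, query: str) -> list[Result]:
        """Translate the query, call the engine, normalise the results."""


class StaticWrapper(Wrapper):
    """Stand-in for a real engine: filters canned results, no network calls."""

    def __init__(self, caps, results):
        self._caps = set(caps)
        self._results = list(results)

    def capabilities(self):
        return self._caps

    def search(self, query):
        words = query.lower().split()
        return [r for r in self._results
                if all(w in r.title.lower() for w in words)]


class QueryProcessor:
    """Resolves capabilities, sends requests, aggregates results."""

    def __init__(self, wrappers):
        self._wrappers = list(wrappers)

    def run(self, query, required=frozenset()):
        best = {}  # url -> best-scoring Result seen so far
        for wrapper in self._wrappers:
            # Skip engines that lack a capability this query needs.
            if not set(required) <= wrapper.capabilities():
                continue
            for result in wrapper.search(query):
                current = best.get(result.url)
                if current is None or result.score > current.score:
                    best[result.url] = result
        return sorted(best.values(), key=lambda r: r.score, reverse=True)


class MetaSearchEngine:
    """Orchestrates everything; the configuration selects the wrappers."""

    def __init__(self, config, registry):
        enabled = [registry[name] for name in config["wrappers"]]
        self._processor = QueryProcessor(enabled)

    def search(self, query, required=frozenset()):
        return self._processor.run(query, required)


# Example wiring with two fake engines and made-up data.
registry = {
    "alpha": StaticWrapper({"keyword"}, [
        Result("http://a.example/1", "Lucene tutorial", 0.9),
    ]),
    "beta": StaticWrapper({"keyword", "phrase"}, [
        Result("http://a.example/1", "Lucene tutorial", 0.7),
        Result("http://b.example/2", "Lucene in depth", 0.8),
    ]),
}
engine = MetaSearchEngine({"wrappers": ["alpha", "beta"]}, registry)
hits = engine.search("lucene")
phrase_hits = engine.search("lucene", required={"phrase"})
```

In this sketch, hits contains each URL once, ordered by score, while phrase_hits only consults the "beta" wrapper because "alpha" does not advertise the "phrase" capability. A real wrapper would replace StaticWrapper with calls to the external engine's API, and a real query processor would probably send its requests concurrently.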