what does a B-tree index on more than 1 column look like?
In Oracle a composite key index can be used even though the leading columns are not filtered. This is done through three mechanisms:
- A fast full index scan, in which multiblock reads are used to traverse the entire index segment.
- An index full scan, in which the index is read in the logical order of the blocks (I believe I read that in recent versions Oracle can use multiblock reads for this, but really you should count on single block reads)
- An inddex skip scan, where a very low cardinality for the non-predicated leading columns allows Oracle to perform multiple index range scans, one for each unique value of the leading column(s). These are pretty rare in my experience.
Look for articles by Richard Foote or Jonathan Lewis for more information on Oracle index internals.
Some implementations simply concatenate the values in the order of the columns, with delimiters.
Another solution is to simply have a b-tree within a b-tree. When you hit a leaf on the first column, you get both a list of matching records and a mini b-tree of the next column, and so on. Thus, the order of the columns specified in the index makes a huge difference on whether that index will be useful for particular queries.
Here's a related question I wrote last week:
Does SQL Server jump leaves when using a composite clustered index?
With most implementations, the key is simply a longer key that includes all of the key values, with a separator. No magic there ;-)
In your example the key values could look something like
"123499|John Doe|Conway, NH" "32144|Bill Gates| Seattle, WA"
One of the characteristics of these indexes with composite keys is that the intermediate tree nodes can be used in some cases to "cover" the query.
For example, if the query is to find the Name and City given the ID, since the ID is first in the index, the index can search by this efficiently. Once in the intermediate node, it can "parse" the Name and City, from the key, and doesn't need to go to the leaf node to read the same.
If however the query wanted also to display the phone number, then the logic would follow down the leaf when the full record is found.
Imagine that the key is represented by a Python tuple (col1, col2, col3) ... the indexing operation involves comparing tuple_a
with tuple_b
... if you have don't know which value of col1 and col2 that you are interested in, but only col3, then it would have to read the whole index ("full index scan"), which is not as efficient.
If you have an index on (col1, col2, col3), then you can expect that any RDBMS will use the index (in a direct manner) when the WHERE clause contains reference to (1) all 3 columns (2) both col1 and col2 (3) only col1.
Otherwise (e.g. only col3 in the WHERE clause), either the RDBMS will not use that index at all (e.g. SQLite), or will do a full index scan (e.g. Oracle) [if no other index is better].
In your specific example, presuming that id is a unique identifier of a customer, it is pointless to have it appear in an index (other than the index that your DBMS should set up for a primary key or column noted as UNIQUE).