Pages

Database Indexes in Oracle

I have often been sucked into intense debates of database indexes and how they work.

In order to explain indexes in a simple way, think of a telephone diary that most of us use (or at-least used to some years back).  A typical telephone diary has tabs from A-Z and we save names along with other information based on the person's last name in the appropriate alphabet page.
For example William Gates would be listed under the G page or Steve Jobs would be under the J page.
The tabs therefore are indexes which enable you to quickly go to the relevant page and look up a name. If there were no tabs you would be forced to flip page by page and search for a name( which equates to a table scan).

Another example (though not directly appropriate) is the Facebook list that you maintain where in the regular suspects are Family, Friends, Acquaintances. However as you keep adding more friends there comes a need to further breakdown the Friends list to Close Friends, School Friends and University friends. This enables you to search for a friend in the appropriate list and locate a name faster than as opposed to looking up a single list called Friends. Of course the assumption here is that you don't use the search field to find a Name.

Database indexes work in a similar way. The more unique values the better the index performance. However in the database world retrieving data is not simple and there are different searches that one needs to perform on a table. Tradeoffs need to be made between performance while retrieving records vs saving/updating records and retrieval speed vs storage space.

Creating an index on a single column, a composite index on a multiple columns or multiple indexes on multiple columns often come up and become hot topics of debate.The debate further becomes skewed if the persons debating do not have proper understanding as to how indexes work.

The reason for choosing the Oracle database is simply because I use it. And while the basic principles of indexing remains the same across different database vendors, each vendor incorporates some quirks into their search logic. A prime example of this is the cost based optimizer introduced by Oracle (which some people also refer to as Voodoo magic).

So here's my understanding how database indexes work in Oracle.

Consider a table X which has columns a, b, c, d, e and f.
Say we index c,d,e,f. (Note: In a composite index the order of the composite columns also play an important role.)
The possible efficient queries are

1. select * from X where c = :c and d = :d and e= :e and f = :f
2. select a,b from X where c = :c and d = :d and e= :e and f = :f
3. select a,b from X where c = :c
4. select c,d,e from X where f =:f
(I have not listed all the possible queries but I hope common sense prevails)

In the above examples Oracle will use the composite index. Even in the 3 query oracle will use the index since c is the first column in the composite index

5. select a,d,e from X where e=:e
In such a case the index will not be used and even with a hint it will cause an inefficient retrieval of data.

6. select c,d,e,f from X where c=:c
In this case oracle will use the composite index as a skinny table and directly search and retrieve data purely using the index.

Lets take another set of indexes where say, there is one index on (e,f).
And then single indexes on a, b, c, d

The possible efficient queries are
1. select *  from X where e=:e and f =: f
2. select * from X where a=:a (and similarly for all index columns)

However consider the following:
3. select * from X where e=:e
Oracle will still used the composite index (e,f). Thus making the query efficient.

4. select * from X where a:=a and b:=b and c=:c
Oracle will either say let me do a full table scan or use the first index and then do a table scan to fulfill the other search. You could also use hints to tell oracle to use the indexes on a, b and c. This will still be in-efficient as opposed to having a composite index on all 3 columns (a,b,c).


So before you decide on your indexing strategy, you need to first define your queries for retrieving data and then make a decision.