What indexes should I create on a table?

While designing a software system that involves a database, this question inevitably pops up in the minds of software developers. Indexes improve the speed of data retrieval operations, and who wouldn’t like their queries to be faster? Aside, database design is such an important decision in the lifetime of the software. Your application code is most likely to change over a few years, unlike your database schema which is most likely to stay the same. “Data matures like wine, applications like fish.”

A fitting example pointed here is about Last.fm, who were not able to allow change of usernames despite ten years of feature requests.
This article helps you understand the idea of database indexes from the fundamentals, so you can decide when to use one.

What’s an index?

Let’s assume you have a table in your database similar to this:

  CREATE TABLE blogpost (
    id integer,
    slug varchar,
    content varchar

And your application fetches individual blog posts frequently, using queries like this:

  SELECT content from blogpost WHERE slug = "greatest-post-ever";

To find all matching entries, the system would have to scan the entire blogpost table, row by row. Assuming there are many rows in the table, and only a few rows (in this case, zero or one) that would match the query, this is clearly an inefficient method. Instead, if the system maintains an index A great analogy is the alphabetical index of frequently looked up terms that you see at the back of most non-fiction books. You can scan the index and quickly flip to the appropriate pages, rather than read the entire book. on the column slug, it can use a more efficient method for locating matching rows. Databases use data structures like balanced trees, B+ trees and hashes to implement indexes; and allow you to create different types of indexes to suit your needs.

Which fields should be indexed?

Here are four simple rules thumb to keep in mind:

  1. Index every primary key.
  2. Index every foreign key.
  3. Index every column used in a JOIN clause.
  4. Index every column used in a WHERE clause.

If you are using a modern database, chances are the system will automatically create some essential indexes for you – like those on primary keys and foreign keys. It’s very helpful to understand what kind of queries are going to be made by the application, like in our example earlier, to figure out which additional columns should be indexed. Patterns like multicolumn indexes are a helpful pattern when you frequently make queries involving lookups on more than one column together.

Here be dragons.

Database indexes are indeed a great idea, and you might be tempted to index all the things Some one actually did this. For science.. Not so fast! Every index you add adds some overhead to the database. Generally, write operations (DELETE, UPDATE) become more expensive, while read operations (SELECT) generally just benefit. Too many indexes can exhaust the cache memory so that even read operations can suffer.

Finally, as with all great things, database indexes are a double-edged sword that should be wielded very carefully. There is no way around deeply understanding your own database schema and how it’s going to be used by the application to do this well.

Further reading

  1. Indexes
  2. Efficient Use of PostgreSQL Indexes
  3. Use the index, Luke