Databases / Chapter 2: SQL Commands / SQL Commands

Database Indexes

Consider the following table called students:

id given_name family_name

451834 Mary Smith

134922 Emma Lin

652331 Lucas Smalls

820980 Billy Homes

id	given_name	family_name
451834	Mary	Smith
134922	Emma	Lin
652331	Lucas	Smalls
820980	Billy	Homes

Suppose that we want the know the name of the student with ID 134922. We can issue the following query:

SELECT given_name,family_name FROM students WHERE id=134922

serve_db iterates over the students table to find the student with id 134922. This process is very fast since the server only needs to process 4 rows. But what if the table contained 1 billion rows? If the server can check 1 million rows per second, the query would take 1000 seconds to execute. But we don't want users to wait 16 minutes for a webpage to load. Can we speed up the query execution?

In the data structures course, we used binary search trees to improve search performance from O(n) to O(log(n)). In our example, n = 1 billion, which means the search performance would improve from O(1 billion) to O(30) (since 1 billion is around 2^30). Instead of taking 16 minutes to query the table, the server would take 30 microseconds!

In a database, the search tree is called an index. A database index is an additional data structure that helps the database server find records, just as an index of a book helps the reader quickly find topics. The index does not replace the table.

For the table above, the database server can construct the following index:

Each node contains the row ID and the index (location) of the row. When the query is issued, serve_db searches for 134922 in the tree, starting at the root. The root node value is 451834, so the server moves to the left child. The left child value is 134922, which is the ID the server was looking for. The left child also contains 1, which indicates that the row with ID 134922 is at index 1 of the students table. Thus, the database server retrieves the row at index 1 and adds the given_name and family_name to the output.

To save space, an index node typically do not contain an entire row. Instead, the node contains the column that we want to search and the pointer to the row that contains the column. For example, if we want to search the family name instead of the student ID, we can create another index based on the family name. This means that a database table can have many indexes that speed up different types of queries.

The database server can also use the index for other types of operations. For instance, the index can help the server quickly find rows to update or delete. When the table is updated, the server also update the index (if necessary). For example, when serve_db is inserting a new row, serve_db also needs to add a new node that points to the new row to the database indexes. The server may also have to balance the tree. The index update operations add O(log(n)) time to query execution, which is usually an acceptable amount of overhead.

Databases indexes are typically B-trees, which is a search tree that can store multiple values in a single node. B-trees share the same Big O performance as BSTs but are better suited to database servers where disk operations take much longer than CPU processing times.

To create a database index, we issue the following command:

CREATE INDEX index_name ON table_name (column_name)

For example, to create an index on the "id" column, we issue the following command:

CREATE INDEX id_index ON students (id)

Quiz: Check All That Apply (1 point)

Please choose all true statements

Quiz (1 point)

Please write the SQL statement that creates an index called name_index for the students table. The column to index is family_name.

Previous Lesson Next Lesson

Comments

Please log in to add comments