This assignment helps you understand which table columns deserve an index through a series of exercises on a database schema. You do not type SQL in this assignment; instead, you identify which columns deserve indexes, the type of index each deserves, and why. Correct placement of indexes is critical to database performance. All but the smallest databases would perform terribly if none of their table columns were indexed.
An index is a physical construct that serves two purposes: speeding up data retrieval and enforcing uniqueness. Indexes are not modeled in logical entity-relationship diagrams, because indexes do not operate at the logical level of abstraction, as tables and table columns do. While the primary purpose of creating an index is to speed up data access, modern relational DBMS also support a type of index that additionally enforces a uniqueness constraint. A uniqueness constraint can be assigned to a table column directly, or indirectly by assigning the column a primary key constraint. While a uniqueness constraint is a logical construct, many modern relational DBMS physically implement this constraint through an index.
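To see a uniqueness constraint implemented through an index, the minimal sketch below uses SQLite, through Python's built-in `sqlite3` module, as a stand-in for Oracle or SQL Server. The `Customer` table and its `email` column are hypothetical and not part of the album schema. SQLite records the index it builds for a `UNIQUE` constraint in its `sqlite_master` catalog under an automatically generated name. That index is what rejects the duplicate insert.

```python
import sqlite3

# Hypothetical table (not from the album schema) with a UNIQUE constraint on email.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# The catalog shows the index SQLite created to enforce the logical constraint.
customer_indexes = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = 'Customer'"
    )
]
print(customer_indexes)

conn.execute("INSERT INTO Customer (email) VALUES ('ann@example.com')")
try:
    # A second row with the same email is rejected through that index.
    conn.execute("INSERT INTO Customer (email) VALUES ('ann@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
print(duplicate_rejected)
```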
You might reasonably ask: why not simply add indexes to every column in the schema? After all, then we would not need to concern ourselves with index placement. The primary reason is that while indexes speed up reading from the database, they slow down writing to it. Every time rows are inserted into, updated in, or deleted from a table, each index that references the affected data must be updated as well, so every additional index on a table makes writes to that table slower. A secondary reason is that indexes increase the size of the database. This affects not only storage requirements but also performance, because index pages compete with table pages for space in the buffer cache.
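The storage cost is easy to observe. This minimal sketch assumes SQLite via Python's `sqlite3` module and a hypothetical `Event` table filled with generated rows. It compares the database's page count before and after adding two secondary indexes. Only size is measured here. The write overhead also exists, but it is not timed because timings vary from machine to machine.

```python
import sqlite3


def page_count(conn: sqlite3.Connection) -> int:
    """Number of database pages currently allocated."""
    return conn.execute("PRAGMA page_count").fetchone()[0]


# Hypothetical table populated with generated rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Event (event_id INTEGER PRIMARY KEY, kind TEXT, detail TEXT)")
conn.executemany(
    "INSERT INTO Event (kind, detail) VALUES (?, ?)",
    ((f"kind{i % 50}", f"detail text {i}") for i in range(20_000)),
)
conn.commit()
pages_without_indexes = page_count(conn)

# Each index is a separate structure that must be stored and maintained on every write.
conn.execute("CREATE INDEX idx_event_kind ON Event (kind)")
conn.execute("CREATE INDEX idx_event_detail ON Event (detail)")
conn.commit()
pages_with_indexes = page_count(conn)

print(pages_without_indexes, pages_with_indexes)
```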
Let us begin exploring index placement by examining the album schema below (see image).
1. The first item deserving our attention is that many modern relational DBMS, including Oracle and SQL Server, automatically add unique indexes to table columns covered by a primary key constraint.
Identify the primary key columns in the album schema, using the standardized dot notation, that is, TableName.ColumnName. We do not need to add indexes to these since the DBMS will create them automatically for us.
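SQLite behaves much like Oracle and SQL Server here, with one wrinkle worth knowing. The sketch below again uses Python's `sqlite3` module with hypothetical tables and lists the indexes SQLite records for each table. A composite primary key gets an automatically created unique index. A single `INTEGER PRIMARY KEY` column does not, because it is an alias for the rowid that already keys the table's own B-tree.

```python
import sqlite3


def index_names(conn: sqlite3.Connection, table: str) -> list:
    """Return the names of all indexes SQLite has recorded for a table."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'index' AND tbl_name = ?",
        (table,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
# Hypothetical tables, not the album schema.
# Composite primary key: SQLite creates a unique index to enforce it.
conn.execute(
    "CREATE TABLE Playlist_Entry ("
    " playlist_id INTEGER, position INTEGER, track_title TEXT,"
    " PRIMARY KEY (playlist_id, position))"
)
# INTEGER PRIMARY KEY aliases the rowid, so no separate index is needed.
conn.execute("CREATE TABLE Label (label_id INTEGER PRIMARY KEY, label_name TEXT)")

print(index_names(conn, "Playlist_Entry"))
print(index_names(conn, "Label"))
```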
2. The next step is to add indexes to all foreign key columns. We do this without considering any specific SQL queries (unlike step 3 below). Some DBMS, including Oracle, will lock the entire child table rather than individual rows when a row in the parent table is deleted or its primary key is updated and the corresponding foreign key column has no index. The focus of this assignment is not locking, so I will not get into fine details. Suffice it to say that such coarse-grained locks are bad for transactions: they block concurrent work and make deadlocks more likely, in ways the database developer cannot easily control. Another reason we index all foreign key columns is that, the vast majority of the time, the foreign key will be used in the join condition of SQL queries that join its table to the table it references.
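The effect of a foreign key index on lookups can be checked with SQLite's `EXPLAIN QUERY PLAN`. This sketch uses hypothetical `Customer` and `Orders` tables and Python's `sqlite3` module. The query finds one customer's orders by foreign key, which is the lookup a join performs for each parent row. SQLite does not create indexes on foreign key columns automatically, so the plan reports a full scan of `Orders` until we create the index. After that, it reports a search using the index. The exact wording of the plan text varies between SQLite versions.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str, params=()) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE Orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES Customer (customer_id),
        order_total REAL
    );
""")

# Lookup by foreign key value, as performed per parent row during a join.
lookup = "SELECT order_id, order_total FROM Orders WHERE customer_id = ?"
before = plan(conn, lookup, (42,))

# Index the foreign key column ourselves; SQLite did not do it for us.
conn.execute("CREATE INDEX idx_orders_customer_id ON Orders (customer_id)")
after = plan(conn, lookup, (42,))

print(before)
print(after)
```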
Identify the foreign key columns in the album schema, using the standardized TableName.ColumnName notation. For each foreign key, indicate whether you would create a non-unique index or a unique index. A non-unique index speeds up data retrieval but does not enforce uniqueness on the covered column, so values in the column can repeat. A unique index speeds up data retrieval and also enforces uniqueness on the covered column. Unique indexes perform better than non-unique indexes for some queries, because the DBMS query optimizer knows that each key looked up in a unique index matches at most one row, while each key looked up in a non-unique index may match many rows. Therefore, if it is guaranteed that values will not repeat, it is better to use a unique index. However, adding a unique index to a column whose values can legitimately repeat will cause valid transactions to fail every time a repeated value is added to the column, so it is important to correctly discern which type of index is needed.
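The sketch below contrasts the two index types on hypothetical foreign keys, again using SQLite through Python's `sqlite3` module. `Orders.person_id` models a one-to-many relationship, so it gets a non-unique index. `Passport.person_id` models a one-to-one relationship, so a unique index is appropriate and rejects a second passport for the same person.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Person (person_id INTEGER PRIMARY KEY, full_name TEXT);
    -- One-to-many: a person may place many orders, so values repeat.
    CREATE TABLE Orders (order_id INTEGER PRIMARY KEY,
                         person_id INTEGER REFERENCES Person (person_id));
    -- One-to-one: each person holds at most one passport.
    CREATE TABLE Passport (passport_id INTEGER PRIMARY KEY,
                           person_id INTEGER REFERENCES Person (person_id));
    CREATE INDEX idx_orders_person_id ON Orders (person_id);
    CREATE UNIQUE INDEX uq_passport_person_id ON Passport (person_id);
    INSERT INTO Person VALUES (1, 'Ann');
""")


def try_insert(sql: str) -> bool:
    """Return True if the insert succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


# Repeated values are allowed under the non-unique index...
orders_ok = [try_insert("INSERT INTO Orders (person_id) VALUES (1)") for _ in range(2)]
# ...but the unique index rejects the second passport for person 1.
passports_ok = [try_insert("INSERT INTO Passport (person_id) VALUES (1)") for _ in range(2)]

print(orders_ok)     # [True, True]
print(passports_ok)  # [True, False]
```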
3. This step concerns adding indexes needed by specific queries. We have already identified the primary key columns, which the DBMS indexes automatically, and the foreign key columns, which we index ourselves. The final step in adding indexes to a schema is to index the columns referenced in the WHERE clause and join conditions of all pertinent SQL queries. The WHERE clause and join conditions of a SQL query contain the conditions that specify which rows from the tables will be present in the result set. The query optimizer makes heavy use of these conditions to retrieve the results in a timely fashion. For example, if the underlying table has a billion rows but a condition in the WHERE clause restricts the result set to five rows, a good plan from the query optimizer will access only a small number of rows in the underlying table rather than all billion rows.
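A toy model helps show why a restrictive condition plus an index pays off. In the sketch below, a sorted Python list of `(key, row_id)` pairs searched with binary search stands in for an index. Real DBMS use B-trees and far more sophisticated optimizers, so treat this only as an illustration of the idea. The hypothetical table has 100,000 rows, and the condition matches five of them.

```python
def full_scan(table, value):
    """Read every row and keep the row_ids whose key equals value."""
    matches, examined = [], 0
    for row_id, key in table:
        examined += 1
        if key == value:
            matches.append(row_id)
    return matches, examined


def build_index(table):
    """Toy index: (key, row_id) pairs sorted by key, standing in for a B-tree."""
    return sorted((key, row_id) for row_id, key in table)


def index_lookup(index, value):
    """Binary-search to the first matching entry, then read the run of matches."""
    examined = 0
    lo, hi = 0, len(index)
    while lo < hi:
        mid = (lo + hi) // 2
        examined += 1
        if index[mid][0] < value:
            lo = mid + 1
        else:
            hi = mid
    matches = []
    while lo < len(index):
        examined += 1
        key, row_id = index[lo]
        if key != value:
            break  # past the last matching entry
        matches.append(row_id)
        lo += 1
    return matches, examined


# Hypothetical table of (row_id, key) rows; each key value occurs five times.
table = [(row_id, row_id % 20_000) for row_id in range(100_000)]
index = build_index(table)

scan_matches, scan_examined = full_scan(table, 1234)
idx_matches, idx_examined = index_lookup(index, 1234)
print(scan_examined, idx_examined)
```

Both approaches return the same five rows. The full scan reads all 100,000 rows, while the index lookup reads about log2(100,000) ≈ 17 entries to find the first match, plus the matching entries and one more to detect the end of the run. That gap widens as the table grows.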
Many factors determine whether a SQL query is a candidate to influence index creation, and in practice this decision is partly subjective. Therefore, I will explain some common reasons why a SQL query would be chosen to influence index creation, but I will not ask you to make that decision for the queries I give you below.
One significant factor is frequency of use. If a query is used often, as is usually the case when it is embedded in an application, then it is important that we create indexes for that query. If we create a one-time-use ad-hoc query, we would usually not concern ourselves with creating indexes for that query, unless another factor influences our decision.
This brings me to another significant factor: the performance of the SQL query without indexes. If a query would take hours or days to run without proper indexes, we would generally add the indexes for it regardless of how often the query executes. Even if we could wait a long time for the results, a query executing for hours or days is consuming database resources, and we usually need those resources to be available to other queries. On the flip side, if the query will not perform much better with the indexes present, usually because it queries only very small tables, then the indexes provide marginal benefit and we might choose not to create them.
a. Let's start with a single-table query that retrieves the minutes and seconds for the song Moods For Moderns.
SELECT s.song_length_minutes, s.song_length_seconds
FROM Song s
WHERE s.song_name = 'Moods For Moderns'
Identify what column you would index to speed up performance for this query. Also indicate whether you would use a unique or non-unique index. Explain your reasoning behind both choices.
b. Now for a two-table query that retrieves the artist name for the Power Play album.
SELECT artist.artist_first_name, artist.artist_last_name
FROM Album a
JOIN Recording_artist artist ON a.recording_artist_id = artist.recording_artist_id
WHERE a.album_name = 'Power Play'
Assuming that the foreign key indexes have already been created, identify the additional column needing an index for this query, and specify whether the index would be unique or non-unique. Explain the reasoning behind both choices.
c. Now for a more complex example: a query that contains a subquery. Before I ask you to identify the columns needing an index, let us review how subqueries affect indexing. The WHERE clauses and join conditions of the outer query and of all subqueries collectively determine which rows from the underlying tables will be retrieved. Because of this, the columns in the WHERE clauses and join conditions of the subqueries need indexes as well. In other words, we are simply extending a concept we are already familiar with to subqueries.
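The same reasoning can be checked with SQLite's `EXPLAIN QUERY PLAN`. This sketch uses hypothetical `Customer` and `Orders` tables, and its subquery filters with an equality condition (the assignment's query uses `<` instead). As in the exercises, the primary key and foreign key are treated as already indexed. Indexing the column in the subquery's WHERE clause lets the subquery search `Orders` through that index instead of scanning it. Plan wording varies between SQLite versions.

```python
import sqlite3


def plan(conn: sqlite3.Connection, sql: str) -> str:
    """Join the detail column of SQLite's EXPLAIN QUERY PLAN output."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(str(row[-1]) for row in rows)


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
    CREATE TABLE Orders (
        order_id     INTEGER PRIMARY KEY,
        customer_id  INTEGER REFERENCES Customer (customer_id),
        order_status TEXT
    );
    -- Foreign key index, assumed to exist already (step 2).
    CREATE INDEX idx_orders_customer_id ON Orders (customer_id);
""")

query = """
    SELECT c.customer_name
    FROM Customer c
    WHERE c.customer_id IN (SELECT o.customer_id
                            FROM Orders o
                            WHERE o.order_status = 'shipped')
"""
before = plan(conn, query)

# Index the column used in the subquery's WHERE clause (step 3).
conn.execute("CREATE INDEX idx_orders_status ON Orders (order_status)")
after = plan(conn, query)

print(before)
print(after)
```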
This query lists the names of all albums that contain at least one song shorter than four minutes.
SELECT a.album_name
FROM Album a
WHERE a.album_id IN (SELECT s.album_id
                     FROM Song s
                     WHERE s.song_length_minutes < 4)
Assuming that the primary and foreign key indexes have already been created, identify the additional column needing an index for this query, and specify whether the index would be unique or non-unique. Explain the reasoning behind both choices.