SQL Server Indexes: Clustered and Non-Clustered

Clustered Indexes

A clustered index, as the name suggests, is an index that defines the physical order of data within the table. It sorts rows in the data pages based on the key values. Clustered indexes are most useful for columns that are queried frequently, and arrange the data rows into a B-tree structure based on the indexed column values.

In SQL server, each table can have only one clustered index, also referred to as the Clustered Index Key (CIK). The CIK can be defined in either ascending or descending order.

When creating a clustered index, it is important to ensure that the indexed column represents a unique, sequential value to avoid data fragmentation, which can impact query performance. For instance, indexing a column with non-unique values will lead to multiple data entries for each indexed value, which can result in page splits and high disk activity.

Clustered indexes can be created using a T-SQL statement, such as:

CREATE CLUSTERED INDEX [CI_EmployeeID] ON [dbo].[Employees]
(
	[EmployeeID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,
IGNORE_DUP_KEY = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

In this example, a clustered index named [CI_EmployeeID] is created on the [EmployeeID] column in the [Employees] table.

It is worth noting that clustered indexes take up physical storage space on disk proportional to the size of the indexed table, so creating too many clustered indexes can result in increased disk activity and potentially slower queries.

Non-Clustered Indexes

Unlike a clustered index, a non-clustered index does not determine the physical order of data in a table. Instead, it creates a separate structure for the indexed columns that contains pointers to the actual data rows. This means that non-clustered indexes can often provide faster performance for queries that filter on non-indexed columns because it doesn’t need to scan the entire table.

Non-clustered indexes can be created on any column in a table, and a table can have multiple non-clustered indexes. When creating a non-clustered index, it is important to consider the columns being included and excluded as they can impact both the index table size and query performance. Index key columns must adhere to the same column size restrictions as a clustered index.

To create a non-clustered index in SQL Server, use the following T-SQL statement:

CREATE NONCLUSTERED INDEX [IX_Employee_LastName] ON [dbo].[Employees]
(
	[LastName] ASC,
	[FirstName] ASC
)
INCLUDE ( [HireDate], [PhoneNumber])

In this example, a non-clustered index named [IX_Employee_LastName] is created on the [LastName] and [FirstName] columns in the [Employees] table, with [HireDate] and [PhoneNumber] included columns.

When using non-clustered indexes, it is important to monitor index fragmentation to maintain optimal performance. Index fragmentation is caused when index data pages are out of order and can be resolved using T-SQL maintenance commands such as ALTER INDEX REORGANIZE and ALTER INDEX REBUILD.

Key Differences

While both clustered and non-clustered indexes store data in a B-tree structure, the main difference is that with a clustered index, the physical order of data in a table is determined by the indexed column. With non-clustered indexes, only the column data is stored in the index, with pointers to the physical data rows.

Another key difference is that a table can only have one clustered index, while it can have multiple non-clustered indexes. Clustered indexes are typically used on columns with unique data, such as primary keys, while non-clustered indexes can be created on any column.

When querying data, a non-clustered index can often provide faster performance if the filtered column is part of the index. In contrast, a clustered index can provide faster performance when returning all data from the table in clustered index order, as the server doesn’t need to perform a separate sort operation.

It is also worth noting that clustered indexes take up more disk space than non-clustered indexes, as the clustered index contains both the index data and the actual data. Non-clustered indexes only contain index data.

When deciding between using a clustered or non-clustered index, it is important to consider the specific needs of the query and the data being returned. In general, clustered indexes should be used for large datasets where data is being queried in order, while non-clustered indexes should be used for filtered queries.

To illustrate, let’s say we have a large employees table with many columns, and we want to query data on the EmployeeID column. If EmployeeID is a unique column, we may want to use a clustered index for optimal performance. However, if we want to query data based on department or hire date, we may want to use a non-clustered index instead.

Appropriate Usage

Understanding when to use a clustered index versus a non-clustered index is key in optimizing query performance. The appropriate usage can often depend on the type of data being queried and the specific needs of the query.

Clustered indexes are most useful for frequently queried columns where data is returned in order. For instance, a clustered index on a unique EmployeeID column can be ideal for returning a list of employees in ascending or descending order based on their ID. Another example would be a bank transaction table where transactions occur in chronological order.

Non-clustered indexes, on the other hand, are most useful for filtering queries that involve a smaller subset of data, such as querying for all employees hired in a specific year. Non-clustered indexes should be used for queries that involve a wide range of values or when the indexed columns have low cardinality (few distinct values).

It’s also worth noting that non-clustered indexes can often improve the performance of join operations. For example, if we have a join operation involving a large Transaction table and a smaller Customers table, creating a non-clustered index on the CustomerID column of the Transaction table can significantly improve performance by reducing the number of rows scanned during the join operation.

When deciding between a clustered or non-clustered index, it’s important to carefully consider the size of the data, frequency of queries, and specific data being queried. Using both in conjunction can be a powerful way to optimize query performance.

In summary, clustered indexes are best suited for frequently queried large datasets where data is being retrieved in order, while non-clustered indexes are ideal for filtering queries or when the indexed column has low cardinality. Ultimately, the appropriate usage of clustered and non-clustered indexes will depend on the specific needs of the query and the underlying data.

Summary

SQL Server indexes can significantly improve the performance of database queries, but it’s important to understand the differences between clustered and non-clustered indexes and when to use each one. Clustered indexes physically order data in a table based on the indexed column, while non-clustered indexes only store index data. A table can have only one clustered index, but multiple non-clustered ones. Clustered indexes are useful for frequently queried columns where data is returned in order, while non-clustered indexes are more suitable for filtering queries. It’s important to carefully consider the size of the data, frequency of queries, and specific data being queried when deciding between a clustered or non-clustered index.