Query optimization is one of the most important skills in your system performance and scalability toolkit, and correct index usage is one of the most effective ways to get good query execution plans. Queries need efficient on-disk access paths to minimize I/O, which then will also minimize memory and CPU usage. However, it is not always clear what kind of index to choose and what the system impact will be if you add or change an index.
Before you can add or change indexes on a system, you need to understand the main types of indexes that SQL Server currently supports. SQL Server 2008 R2 supports eight types of indexes:
• Non Clustered
• Non Clustered with Included Columns
• Full Text
Spatial and XML indexes are designed to organize spatial and XML data. Full text is a special index type that supports searches with character string data types. I am not going to cover these types because they are “special occasion” indexes. In the future, I’ll post a more detailed article about best practices for filtered indexes. See the Types of Indexes topic in SQL Server Books Online on MSDN for more information about each index type.
A quick aside: the term clustered index means something different in Oracle than in SQL Server. In SQL Server, a clustered index is a B-tree (a storage algorithm for optimal retrieval) of the keys that are used in the clustered index definition, with the data stored at the bottom or leaf level of the B-tree. If you use an analogy to a library card catalog, the clustered index can be thought of as the bookshelf where you can physically find the book, and the Dewey Decimal number is the clustering key.
The following diagram (from the topic in SQL Server Books Online on MSDN) illustrates a clustered index organization.
Siemens PLM Software has chosen (and tested) the clustering key for most of the Teamcenter tables, and you should not change them; changes will have a significant impact on your system. A clustered index change will cause all nonclustered indexes to be rebuilt as well, and you do not want to make that change without substantial testing. However, almost all implementations of Teamcenter have been customized, and there may be tables that have been added for your particular installation, so I will include some clustered index best practices in this blog.
Clustered Index Best Practices
I recommend that you always have a clustered index on every table, unless there is a demonstrable reason not to. Except for a query that simply requests all data from a single table, in no specific order, and that query will never include a join, every query needs an underlying index to return results efficiently. I also recommend that the clustering key be the primary key of the table, unless there is a demonstrable reason it not to be. Tables are meant to be joined by a foreign key and a primary key, and having the clustered index key be the primary key facilitates those joins.
There may be scenarios in which the primary key would not seem to be the best choice for the clustering key. But, there are also a lot of recommendations for clustered index design that are targeted at optimization of a single query. You have to test your entire system to find the optimal key for the system as a whole. See theClustered Index Structures topic in SQL Server Books Online on MSDN for more information about clustered indexes in SQL Server.
A nonclustered index is a smaller ancillary index (also a B-tree) that is used to store the clustered index key and any other columns that are needed to provide quick retrieval. For example, if you want to retrieve data that is sorted by date from an Order table (with a primary key of OrderID, which is also the clustering key), you might create a nonclustered index on the OrderDate column. When you create a nonclustered index on a table with a clustered index, you get the clustering key column as well, because it is brought over as a “bookmark” to the clustered index. A rebuild of a nonclustered index will not cause a rebuild of any other index.
In the library analogy, the nonclustered index is the card catalog box (well, those probably don’t exist anymore), which has cards that are sorted by OrderDate, and the Dewey Decimal number is the “bookmark” back to the physical book stack.
Nonclustered Index Best Practices
I recommend creating single-column nonclustered indexes on all foreign keys and columns that are commonly used in criteria or query predicates (columns that are used in WHERE clauses). This has the smallest system impact and gives the optimizer the most choices to put together indexes (including using index intersection/union, which I discuss later in this post) to produce good plans for queries in general, so that you are not individually optimizing queries. You want to give the optimizer the tools to do its job for the majority of the workload and then focus on the remainder of the problematic queries. See the Nonclustered Index Structures topic in SQL Server Books Online on MSDN for more information about nonclustered indexes.
Heap Table or Structure
A “heap” refers to data that is stored in a table without a clustered index, resulting in data that is stored in no particular logical order. This means that there is no storage pattern that the optimizer can use to return result sets. There is also no unique identifier in the row, so the engine has to supply an artificial ID called a RID (row identifier) to identify that row. Again, this RID offers nothing to the optimizer for query resolution, and it takes space on the disk. There is no guarantee that SQL Server will even continue to support heap structures.
In the library analogy, the heap table equivalent is when you just throw the books all over the floor.
Heap Table Best Practices
For these reasons, and because relational tables are required by relational theory (and therefore all optimizer algorithms derived from relational theory) to have a unique identifier, I recommend that you avoid using heap tables if at all possible, even in temporary tables and variables. See the Heap Structures topic in SQL Server Books Online on MSDN for more information.
Covering indexes are often created to “cover” a specific query, with the goal of an optimal execution plan. Typically, all of the columns that are returned by the query, as well as the columns that are used in the WHERE clause, are included in the index. If you use a covering index, use the INCLUDE clause for all nonkey columns to save index space. See the Index with Included Columns topic in SQL Server Books Online on MSDN for more information about the INCLUDE clause.
Although adding a covering index can provide optimal performance for one query, it solves that one query performance problem and only that one problem. In addition, because a covering index is actually a small datamart, you will suffer the performance hit of any insert, update, or delete operation that affects that table. And, you will still need to continue to create covering indexes for any other query and any other permutation of this query. This is a stopgap approach (which may be the first necessary step to resolve a performance problem), but it is not an enterprise approach; it will lead to significantly lower system performance and degraded scalability.
Covering Index Best Practices
Use a covering index to resolve an urgent performance problem, but do not use this as a general approach, and keep track of any that you do create. Covering indexes can become not valid when the original query changes, and then they bring no benefit but still affect index maintenance.
In summary, use these best practices:
- Create clustered indexes on all tables; by default, use the primary key as the clustering key. This supports permutations of join operations for the optimizer.
- Create a nonclustered index on each foreign key. This, again, provides the framework for the optimizer to choose its own joins.
- Avoid covering indexes except as an outlier solution until you can find a more enterprise solution.
Source : Microsoft MSDN