SQL Server Index Automation

Index DMVs

SQL Server includes several very useful DMVs for finding missing indexes and showing usage statistics on existing indexes. This is in addition to related functions that show physical index statistics, such as fragmentation, and related statistics management functions. The script included in this article improves on the default DMVs by consolidating indexes, removing redundant indexes based on include columns, and linking new indexes to existing indexes.

Missing Indexes

The DMVs for finding missing indexes and missing index groups take much of the work out of finding useful indexes. The SQL engine is constantly monitoring and maintaining statistics and comparing query plans to existing indexes. Query plans that would benefit from a new or modified index are returned by the DMVs.

Index Usage

The query engine also captures actual usage of current indexes. For larger, closely monitored systems this can be very useful when determining if an index is worth the maintenance cost associated with it. There is always a cost associated with adding an index. The initial creation time, increased time for INSERT, UPDATE, and DELETE statements, regular maintenance of the index to reduce fragmentation, increased backup size and time, index size on disk, and administrative overhead for admins. Indexes are absolutely necessary but each index added should be carefully considered.

DMV Purpose

The DMVs for missing indexes serve two primary purposes. First, they point out indexes that are completely absent, the columns needed by the indexes, and the estimated cost reduction for each index. The second purpose is to show which include columns should be added to these new indexes or to existing indexes. Include columns in the recommended indexes are where complications and redundancies can be generated. The recommended indexes don’t take into account existing indexes that need extra include columns and add those columns, it simply recommends a new index. It does the same thing for new indexes that are recommended. If a query plan would benefit from a new index with two columns and no includes, and another query plan would benefit from an index with the identical columns and one or more include columns, two indexes are recommended. From an indexing and administrative perspective, the single index with the include columns is all that is needed and is actually better from a maintenance and performance perspective.

Issues with DMVs
Manually combining the redundant indexes and updating existing indexes rather than creating new indexes can be time consuming, especially with large transactional systems. It’s not unusual for hundreds of indexes to be recommended, which is why most generation scripts utilizing these DMVs use a cost-saving metric to determine which indexes are most useful. Blindly creating hundreds of indexes would likely be detrimental to performance and the same is true for creating redundant indexes.

System views are available that provide detailed information about existing indexes, their columns, and include columns. Missing index DMVs provide the same information but the focus is on indexes that could and potentially should be created. Analyzing information from the DMVs alone provides a way to remove the redundancy from new indexes. Combining the information from these two sources provides a way to show which existing indexes should be modified rather than creating a new index. Doing the analysis using the DMVs and a custom script removes most of the manual work for this process.

Solving the DMV Redundancy Issue

There are a few complications when trying to match indexes to each other, both existing indexes to recommended and multiple recommended indexes. The first complication is that indexes can have one or more columns. So a static query won’t work. A series of static queries could work, but would be more lines of code than needed and would have an upper limit on the number of columns you would want to compare. Indexes can have up to 32 columns. That’s a lot of code.

The next choice is creating the SQL on the fly, a dynamic query. is a possibility, but it would require validations to ensure multi-column indexes match on both sides, including the number of index columns in addition to the name matching. Without checking for the number of columns, indexes with two matching columns could match to queries with 3 columns if only looking at one side of the join. The dynamic query would work but it can also be harder to read and maintain, especially for less experienced coders.

Another possible solution is to concatenate the index columns to create a single calculated column and do the comparison on that column. After some experience writing SQL, concatenating columns for the comparison feels a little kludgy, but the alternatives are less appealing and harder to maintain. The single column solution allows for easy, fast comparisons, is easy to maintain, supports an unlimited number of columns (even if the engine allows more than 32 columns in the future), and requires no dynamic SQL.

Any of the solutions listed above, and I’m sure additional solutions I didn’t consider, would work equally well. The script below concatenates columns for the comparisons. The index DMVs return a comma delimited list of index columns and include columns. The concatenation solution requires that all columns are compared in the same order, so the columns are normalized, ordered, and then put back into a single column using the STUFF function. The STUFF function is also used with the system views to put columns into a consistent order. Once the columns are in a consistent order, the comparison is trivial. A similar method is used for the INCLUDE columns with the addition of adding all columns with the same index columns, and getting a distinct list of columns.

Running the Script

The current version of the script was analyzed on Azure using a single database. Set the variables at the top of the script, set the output in SSMS to return in text mode using the button or <CTRL>+T, and run the script in your target database.

	--Include drop statements with checks for the index existence first.
	@IncludeDrops			bit		= 0		

	--Include the query cost estimate
	,@IncludeCost			bit		= 1		

Indexes are returned using my default naming pattern, IX_<schema><table>_<Columns> or using the existing index name. Existing indexes are assumed to be in a SQL project and will need to be updated in the project for them to deploy correctly. The variable, @IncludeDrops, can be used to run the script in interactive mode and delete existing indexes before the new create executes.

DMVs Used

  • sys.dm_db_missing_index_groups
  • sys.dm_db_missing_index_group_stats
  • sys.dm_db_missing_index_details