SQL Indexes - Indexing Theory Simply, Balanced Tree, Heaps

11. 2. 2020
Ing. Jan Zedníček - Data Engineer & Controlling

Table of contents

1. How Does SQL Server Organize Data Physically?

2. Key to Understanding SQL Indexes? Logical Data Organization in SQL Server.

Correct table indexing is the basis of good query performance in SQL Server. To create appropriate SQL indexes, you need to understand how SQL Server stores data in tables and indexes, and how it accesses that data when a query runs.

How Does SQL Server Organize Data Physically?

The file page is the smallest unit SQL Server reads from and writes to database objects. Every file page is 8 KB in size and belongs to exactly one object (for example, a table or an index). File pages are grouped into extents; each extent consists of 8 file pages, i.e. 64 KB.
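The sizes above lend themselves to a quick calculation. The following Python sketch is only illustrative: the function name `pages_and_extents` and the 1 MB example are assumptions, and it ignores page headers and row overhead, so a real table needs somewhat more pages.

```python
import math

PAGE_SIZE_KB = 8          # size of one file page, as stated above
PAGES_PER_EXTENT = 8      # file pages grouped into one extent
EXTENT_SIZE_KB = PAGE_SIZE_KB * PAGES_PER_EXTENT  # 64 KB


def pages_and_extents(data_size_kb):
    """Return (pages, extents) needed to hold data_size_kb of raw data.

    Simplification: assumes every page is completely filled with data.
    """
    pages = math.ceil(data_size_kb / PAGE_SIZE_KB)
    extents = math.ceil(pages / PAGES_PER_EXTENT)
    return pages, extents


# Hypothetical example: 1 MB (1024 KB) of raw data
pages, extents = pages_and_extents(1024)
print(pages, extents)
```

Under these assumptions, 1 MB of data occupies 128 file pages, which fit into 16 extents.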

At the highest level, a database is stored in two types of files. This article focuses on the first one (MDF):

MDF files, which hold the data
LDF files, which hold the transaction log

Key to Understanding SQL Indexes? Logical Data Organization in SQL Server.

The introduction covered how data is physically stored on disk. Now we finally get to the logical organization. As already mentioned, data is stored in file pages. There are a lot of them, so SQL Server needs a system for finding its way around them.

The performance of SQL queries depends directly on how quickly the SQL engine can locate the file pages that belong to the table being queried. That, in turn, depends on how the table is logically organized.

SQL Server uses special system objects called Index Allocation Maps (IAM) for this purpose. Each table has at least one such object assigned to it. An IAM works on a linking principle: it links individual file pages to the table they belong to. Depending on how a table's pages are organized, we distinguish two organization methods: heap and balanced tree.
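The linking role of an IAM can be pictured as a bitmap with one flag per extent, set when that extent belongs to the object. The sketch below is a toy model in Python, not SQL Server's on-disk format; the class `ToyIAM`, its methods and the extent numbers are hypothetical and serve only as an illustration.

```python
class ToyIAM:
    """Toy allocation map: one flag per extent in a file (illustrative only)."""

    def __init__(self, extents_in_file):
        self.bits = [False] * extents_in_file

    def allocate(self, extent_no):
        # Mark the extent as belonging to this table or index
        self.bits[extent_no] = True

    def extents(self):
        # Extents owned by the object, in file order
        return [i for i, owned in enumerate(self.bits) if owned]

    def pages(self, pages_per_extent=8):
        # Page numbers covered by the owned extents
        return [e * pages_per_extent + p
                for e in self.extents()
                for p in range(pages_per_extent)]


iam = ToyIAM(extents_in_file=16)
for extent_no in (2, 5, 11):       # hypothetical extents given to one table
    iam.allocate(extent_no)

print(iam.extents())
print(iam.pages()[:8])
```

The point of the model is that the map itself says nothing about the order of rows; it only answers the question of which pages belong to the table.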

Heap – Not Organized Data

A heap is a table that is not organized in any way :) and whose pages can be found only through its IAM (the first one is called the first IAM). It is like going to a completely unorganized library for a book: you would have to go through all the books to find the one you are looking for.

You get a table of this type whenever you create it without a clustered index, for example without a primary key. It is simply a heap of unorganized file pages. If no other index can help, SQL Server must scan the whole heap when we query such a table with a condition or JOIN it to another table. This means every file page is read, one after another, which takes a long time.

SQL heap. Source: Itzik Ben-Gan, Dejan Sarka, Ron Talmage. Querying Microsoft SQL Server 2012. Microsoft Press, 2012 edition. 752 pages. ISBN 0735666059
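The cost of searching a heap can be shown with a small Python model. It is a sketch under simplifying assumptions: the function `scan_heap`, the five pages of four rows each and the row layout are made up for illustration and do not reflect how SQL Server stores rows.

```python
def scan_heap(pages, wanted_id):
    """Read every page of an unordered heap; return (matching rows, pages read)."""
    found, pages_read = [], 0
    for page in pages:
        pages_read += 1                       # each page has to be read
        found.extend(row for row in page if row["id"] == wanted_id)
    return found, pages_read


# Hypothetical heap: 5 pages with 4 rows each, in no particular order
ids = [17, 3, 9, 12, 1, 20, 6, 15, 8, 2, 19, 5, 11, 14, 4, 18, 7, 10, 16, 13]
heap_pages = [[{"id": i} for i in ids[n:n + 4]] for n in range(0, len(ids), 4)]

rows, pages_read = scan_heap(heap_pages, wanted_id=8)
print(rows, pages_read)
```

Because nothing is known about where a row lives, the scan cannot stop early: all 5 pages are read whether the row exists or not.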

Balanced Tree – Organized Data

A balanced tree, on the other hand, is a completely different kind of data organization. A table is organized as a balanced tree whenever you create a clustered index on it (by default, defining a primary key does this).

This structure keeps the rows ordered by the index key, which makes looking up records much faster. SQL Server does not have to scan the whole table as it does with a heap; it follows the tree from the root down to the pages that hold the requested rows. SQL indexes therefore work like searching a library for books by genre and author.

Source: Itzik Ben-Gan, Dejan Sarka, Ron Talmage. Querying Microsoft SQL Server 2012. Microsoft Press, 2012 edition. 752 pages. ISBN 0735666059
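Why a lookup in a balanced tree touches so few pages can be sketched in Python as well. This is a toy model, not SQL Server's implementation: `build_tree`, `seek` and the `FANOUT` of 4 are hypothetical, and real index pages hold far more entries.

```python
from bisect import bisect_right

FANOUT = 4  # hypothetical: rows per leaf page and children per index page


def build_tree(sorted_ids):
    """Build a toy balanced tree; return its levels, root level first."""
    # Leaf level: the sorted rows split into pages
    leaves = [sorted_ids[i:i + FANOUT] for i in range(0, len(sorted_ids), FANOUT)]
    levels = [leaves]
    # Upper levels: each page holds the first key of up to FANOUT child pages
    while len(levels[0]) > 1:
        first_keys = [page[0] for page in levels[0]]
        parents = [first_keys[i:i + FANOUT] for i in range(0, len(first_keys), FANOUT)]
        levels.insert(0, parents)
    return levels


def seek(levels, wanted_id):
    """Descend from the root to one leaf; return (found, pages read)."""
    page_no, pages_read = 0, 0
    for level in levels[:-1]:
        keys = level[page_no]
        pages_read += 1
        # Last child whose first key is <= wanted_id (child 0 if none is)
        child = max(bisect_right(keys, wanted_id) - 1, 0)
        page_no = page_no * FANOUT + child
    leaf = levels[-1][page_no]
    pages_read += 1
    return wanted_id in leaf, pages_read


tree = build_tree(list(range(1, 65)))   # 64 rows -> 16 leaf pages
print(seek(tree, 42), len(tree[-1]))
```

In this model the 64 rows occupy 16 leaf pages, yet a lookup reads only 3 pages (one per level), whereas scanning the same rows as a heap would read all 16.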

In the next articles I will look at index fragmentation and demonstrate how to repair indexes automatically using a SQL script.
