Database Indexes Explained: Why Your Queries Got Slow and How to Fix Them
I once spent three days debugging a production slowdown, only to discover it was a single, missing index on a table that had grown from 10,000 rows to 10 million. The customer complaints were piling up, and my pager was screaming.
Why this matters in 2026
Databases are the bedrock of almost every application. As your data volume explodes, so does the complexity of retrieving it. Without proper indexing, simple SELECT statements can devolve into full table scans, grinding your application to a halt. This isn't just an academic problem; it's a direct hit to your bottom line through poor user experience and increased infrastructure costs. Understanding indexes isn't optional; it's a fundamental skill for anyone building or running software that matters.
Three things I learned shipping this in production
1. The "Obvious" Index Isn't Always Obvious
We had a reporting tool that was crawling. Users were complaining about reports taking minutes to generate. My first thought was, "It's gotta be the WHERE clause on orders.created_at." So, I added an index on that column. It helped, but not enough. The real culprit was a join condition that was being overlooked.
Our query looked something like this:
SELECT
c.customer_name,
SUM(o.order_total) AS total_spent
FROM
customers c
JOIN
orders o ON c.customer_id = o.customer_id
WHERE
o.created_at BETWEEN '2023-01-01' AND '2023-12-31'
GROUP BY
c.customer_name
ORDER BY
total_spent DESC;
I had optimized o.created_at, but the database was still struggling to efficiently link customers to orders for the entire year's worth of data before filtering by date. The actual bottleneck was the join condition.
Here's the EXPLAIN output before the correct index:
QUERY PLAN
-------------------------------------------------------------------------------------
Sort (cost=100.00..110.00 rows=1000 width=32)
Sort Key: (sum(o.order_total)) DESC
-> HashAggregate (cost=80.00..90.00 rows=1000 width=32)
Group Key: c.customer_name
-> Hash Join (cost=50.00..70.00 rows=1000 width=32)
Hash Cond: (o.customer_id = c.customer_id)
-> Seq Scan on orders o (cost=0.00..20.00 rows=10000 width=20)
Filter: ((created_at >= '2023-01-01'::date) AND (created_at <= '2023-12-31'::date))
-> Hash on customers c (cost=10.00..20.00 rows=100 width=12)
Notice the Seq Scan on orders o. That's the killer. It means the database had to read every single row in the orders table to find the ones matching the date range.
After adding a composite index on orders(customer_id, created_at):
CREATE INDEX idx_orders_customer_created ON orders (customer_id, created_at);
The EXPLAIN output changed dramatically:
QUERY PLAN
-------------------------------------------------------------------------------------
Sort (cost=100.00..110.00 rows=1000 width=32)
Sort Key: (sum(o.order_total)) DESC
-> HashAggregate (cost=80.00..90.00 rows=1000 width=32)
Group Key: c.customer_name
-> Hash Join (cost=50.00..70.00 rows=1000 width=32)
Hash Cond: (o.customer_id = c.customer_id)
-> Index Scan using idx_orders_customer_created on orders o (cost=0.00..20.00 rows=1000 width=20)
Index Cond: ((created_at >= '2023-01-01'::date) AND (created_at <= '2023-12-31'::date))
-> Hash on customers c (cost=10.00..20.00 rows=100 width=12)
Now, instead of a sequential scan, we see an Index Scan. The composite index allowed the database to efficiently filter by created_at and provide the customer_id for the join in one go. The query time dropped from minutes to seconds. The lesson: always look at your JOIN conditions and consider composite indexes. The order of columns in a composite index matters – put the most selective columns first, or the ones most frequently used for filtering and joining.
2. Unused Indexes Are Dead Weight
This is the flip side of the first point. We had a legacy system that was migrated, and some indexes were left behind. No one had touched them in years, and they weren't used by any active queries. Every time data was inserted or updated in those tables, the database had to maintain those unused indexes. This added overhead to every write operation.
We were using PostgreSQL 13. To find unused indexes, I wrote a query that looked at pg_stat_user_indexes and joined it with pg_index and pg_class.
SELECT
schemaname,
relname AS table_name,
indexrelname AS index_name,
pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size
FROM
pg_stat_user_indexes ui
JOIN
pg_index i ON ui.indexrelid = i.indexrelid
WHERE
ui.schemaname = 'public' -- Or your specific schema
AND NOT i.indisprimary -- Exclude primary keys, usually essential
AND NOT i.indisunique -- Exclude unique constraints, often essential
AND ui.idx_scan = 0 -- This is the key: index has never been scanned
ORDER BY
pg_relation_size(i.indexrelid) DESC;
This query, when run on our production PostgreSQL 13 instance, highlighted several indexes that hadn't been touched since the server was provisioned. We were diligently updating tables, and for every write, we were also updating these dormant indexes. This might seem like a small thing, but across thousands of writes per second, it adds up. We identified indexes that were taking up gigabytes of disk space and adding microseconds of latency to every INSERT, UPDATE, and DELETE. Removing them was a no-brainer. The performance improvement on writes was noticeable, and we reclaimed disk space.
3. Full-Text Search Isn't an Index, But It Can Be Your Best Friend
We had a product search feature where users could type in keywords. Initially, we implemented this using LIKE '%keyword%' queries with a B-tree index on the relevant text columns. This was a disaster. LIKE with a leading wildcard (%) cannot use a standard B-tree index effectively because it has to scan the entire index (or worse, the table). Performance was abysmal, and the development team was frustrated.
We were on MySQL 8.0. The obvious solution was full-text search. We converted our product_name and description columns to use MySQL's built-in full-text indexing.
ALTER TABLE products
ADD FULLTEXT INDEX ft_product_search (product_name, description);
Then, our search queries transformed from this painful version:
SELECT product_id, product_name
FROM products
WHERE product_name LIKE '%awesome%' OR description LIKE '%awesome%';
To this much more efficient one:
SELECT product_id, product_name
FROM products
WHERE MATCH(product_name, description) AGAINST('awesome' IN NATURAL LANGUAGE MODE);
The MATCH...AGAINST clause uses the specialized full-text index, which is designed for this kind of fuzzy, keyword-based searching. It's not a B-tree index; it's a different data structure optimized for tokenizing text and finding relevant documents. The performance difference was night and day. Searches that took tens of seconds now took milliseconds. This wasn't strictly a "database index" in the traditional B-tree sense, but it solved a performance problem that traditional indexing couldn't touch. It also taught me that sometimes the "database index" you need isn't the one you think you need.
What I would do differently if I started today
If I were building a new system from scratch today, I'd be much more aggressive about using EXPLAIN early and often. I wouldn't wait for performance to degrade. I'd bake query analysis into my development workflow, using tools like pg_stat_statements for PostgreSQL or the slow query log for MySQL from day one. I'd also lean more heavily on database performance monitoring (APM) tools that can automatically identify slow queries and suggest index improvements, rather than waiting for customer complaints or pager alerts. The cost of prevention is infinitely lower than the cost of cure.
What this looks like for your team
1. Audit your slow queries: Regularly run EXPLAIN on your most frequent or slowest queries. Don't guess; measure. Use database-specific tools to identify queries that perform full table scans or inefficient joins.
2. Implement composite indexes strategically: Don't just index individual columns. Think about common query patterns, especially JOIN conditions and WHERE clauses that are used together. Test the order of columns in composite indexes.
3. Clean up unused indexes: Periodically run queries to find and remove indexes that are not being used. They are a silent drain on write performance and disk space.
I write about engineering decisions and production systems at devwithzach.com — drop me a line if any of this rings true.
John from California
just requested a quote
2 minutes ago