PostgreSQL Query Optimization: From EXPLAIN to Indexes
Slow queries are the silent killer of application performance. A query that takes 2 seconds instead of 20ms can bring your entire system to its knees under load. In this comprehensive guide, we’ll master PostgreSQL query optimization using EXPLAIN, indexes, and query rewriting techniques.
The Problem: A Slow Query
Let’s start with a real-world scenario. You have a users table with 1 million rows:
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) NOT NULL,
created_at TIMESTAMP NOT NULL,
last_login TIMESTAMP,
subscription_tier VARCHAR(50)
);
This query is killing your API response times:
SELECT * FROM users
WHERE email = 'john@example.com'
AND subscription_tier = 'premium';
Execution time: 1,823ms. Unacceptable.
Step 1: Understanding EXPLAIN
EXPLAIN shows PostgreSQL’s query execution plan:
EXPLAIN SELECT * FROM users
WHERE email = 'john@example.com'
AND subscription_tier = 'premium';
Output:
Seq Scan on users (cost=0.00..25000.00 rows=1 width=120)
Filter: ((email = 'john@example.com') AND (subscription_tier = 'premium'))
Key Insights:
- Seq Scan: Sequential scan (table scan) - reading every single row
- cost: Estimated cost units (not milliseconds)
- rows: Estimated rows returned
- width: Average row size in bytes
Sequential scans are fine for small tables (<10K rows) but catastrophic for large ones.
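Not sure whether a table counts as large? You can ask the catalog for the planner's row estimate and the on-disk size (a quick check against pg_class; run ANALYZE first if the statistics might be stale):
SELECT relname,
       reltuples::bigint AS estimated_rows,
       pg_size_pretty(pg_relation_size('users')) AS table_size
FROM pg_class
WHERE relname = 'users';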
Step 2: EXPLAIN ANALYZE - Real Execution
EXPLAIN ANALYZE actually runs the query and shows real metrics:
EXPLAIN ANALYZE SELECT * FROM users
WHERE email = 'john@example.com'
AND subscription_tier = 'premium';
Output:
Seq Scan on users (cost=0.00..25000.00 rows=1 width=120)
(actual time=1823.45..1823.46 rows=1 loops=1)
Filter: ((email = 'john@example.com') AND (subscription_tier = 'premium'))
Rows Removed by Filter: 999999
Planning Time: 0.123 ms
Execution Time: 1823.52 ms
Critical Metrics:
- actual time: Real execution time in milliseconds
- Rows Removed by Filter: 999,999 rows scanned but discarded
- Execution Time: Total query time
We’re scanning 1 million rows to find 1 result. This is the problem.
Step 3: Creating an Index
Indexes are data structures that allow fast lookups. Think of them as a book’s index - instead of reading every page, you jump directly to the relevant section.
CREATE INDEX idx_users_email ON users(email);
Now let’s check the query plan:
EXPLAIN ANALYZE SELECT * FROM users
WHERE email = 'john@example.com'
AND subscription_tier = 'premium';
Output:
Index Scan using idx_users_email on users
(cost=0.42..8.44 rows=1 width=120)
(actual time=0.034..0.035 rows=1 loops=1)
Index Cond: (email = 'john@example.com')
Filter: (subscription_tier = 'premium')
Planning Time: 0.156 ms
Execution Time: 0.052 ms
Improvement: 1,823ms → 0.052ms (35,000x faster!)
Understanding Index Types
B-Tree Index (Default)
Best for equality and range queries:
CREATE INDEX idx_created_at ON users(created_at);
-- Efficient queries:
WHERE created_at > '2024-01-01'
WHERE created_at BETWEEN '2024-01-01' AND '2024-12-31'
WHERE email = 'john@example.com'
Hash Index
Only for equality:
CREATE INDEX idx_email_hash ON users USING HASH(email);
-- Efficient:
WHERE email = 'john@example.com'
-- NOT efficient:
WHERE email LIKE 'john%'
GIN Index (Generalized Inverted Index)
For full-text search and JSONB:
CREATE INDEX idx_users_search ON users USING GIN(to_tsvector('english', email));
SELECT * FROM users
WHERE to_tsvector('english', email) @@ to_tsquery('john');
Partial Index
Index only a subset of rows:
CREATE INDEX idx_premium_users ON users(email)
WHERE subscription_tier = 'premium';
-- This query uses the partial index:
SELECT * FROM users
WHERE email = 'john@example.com'
AND subscription_tier = 'premium';
Benefits:
- Smaller index size
- Faster index scans
- Reduced write overhead
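To see the smaller-index-size benefit for yourself, compare the partial index against the full email index created earlier (a quick check, assuming both indexes exist):
SELECT pg_size_pretty(pg_relation_size('idx_users_email')) AS full_index_size,
       pg_size_pretty(pg_relation_size('idx_premium_users')) AS partial_index_size;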
Composite Indexes
For queries filtering on multiple columns:
CREATE INDEX idx_users_email_tier ON users(email, subscription_tier);
Column Order Matters:
-- Uses index efficiently:
WHERE email = 'john@example.com' AND subscription_tier = 'premium'
WHERE email = 'john@example.com'
-- Cannot use the index efficiently (leading column missing):
WHERE subscription_tier = 'premium'
Rule: Put the most selective column first (the one that filters out the most rows).
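Not sure which column is more selective? The planner's statistics give a rough answer: a value near -1 in n_distinct means almost every row is unique (highly selective), while a small positive number means only a handful of distinct values (run ANALYZE users first so the figures are current):
SELECT attname, n_distinct
FROM pg_stats
WHERE tablename = 'users'
  AND attname IN ('email', 'subscription_tier');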
Covering Indexes (Index-Only Scans)
Include all queried columns in the index:
CREATE INDEX idx_users_covering ON users(email, subscription_tier, last_login);
EXPLAIN ANALYZE
SELECT email, subscription_tier, last_login
FROM users
WHERE email = 'john@example.com';
Output:
Index Only Scan using idx_users_covering on users
(cost=0.42..4.44 rows=1 width=50)
Index Cond: (email = 'john@example.com')
Heap Fetches: 0
Heap Fetches: 0 means PostgreSQL didn’t need to access the table at all - everything came from the index.
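Index-only scans rely on the table's visibility map being reasonably up to date. If you see a large Heap Fetches count instead of 0, a VACUUM lets more lookups be answered from the index alone:
-- Refresh the visibility map so index-only scans can skip the heap
VACUUM users;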
Query Rewriting Techniques
Avoid SELECT *
-- Bad: Fetches all columns
SELECT * FROM users WHERE email = 'john@example.com';
-- Good: Fetch only needed columns
SELECT id, email, subscription_tier FROM users WHERE email = 'john@example.com';
Use EXISTS Instead of COUNT
-- Bad: Counts all rows
SELECT COUNT(*) FROM users WHERE subscription_tier = 'premium';
IF count > 0 THEN ...
-- Good: Stops at first match
SELECT EXISTS(SELECT 1 FROM users WHERE subscription_tier = 'premium');
Avoid Functions on Indexed Columns
-- Bad: Index not used
WHERE LOWER(email) = 'john@example.com'
-- Good: Index used
WHERE email = 'john@example.com'
-- If you must use LOWER, create a functional index:
CREATE INDEX idx_email_lower ON users(LOWER(email));
Use LIMIT for Pagination
-- Bad: Fetches all rows, discards most
SELECT * FROM users ORDER BY created_at DESC;
-- Good: Fetches only needed rows
SELECT * FROM users ORDER BY created_at DESC LIMIT 20 OFFSET 0;
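Keep in mind that OFFSET still reads and discards every skipped row, so deep pages get slower. For large result sets, keyset pagination scales better; here is a sketch, assuming the client passes back the created_at and id of the last row it received (the literal values and index name below are placeholders):
-- Fetch the next page after the last row the client saw
SELECT * FROM users
WHERE (created_at, id) < ('2024-06-01 12:00:00', 4821)
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- An index on the same columns keeps this an index scan
CREATE INDEX idx_users_created_id ON users(created_at, id);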
Analyzing Join Performance
EXPLAIN ANALYZE
SELECT u.email, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.subscription_tier = 'premium';
Output:
Hash Join (cost=1000.00..5000.00 rows=100 width=50)
Hash Cond: (o.user_id = u.id)
-> Seq Scan on orders o (cost=0.00..3000.00 rows=10000 width=20)
-> Hash (cost=500.00..500.00 rows=100 width=30)
-> Seq Scan on users u (cost=0.00..500.00 rows=100 width=30)
Filter: (subscription_tier = 'premium')
Join Types:
- Nested Loop: Good for small tables
- Hash Join: Good for medium tables
- Merge Join: Good for large pre-sorted tables
Optimization:
CREATE INDEX idx_orders_user_id ON orders(user_id);
CREATE INDEX idx_users_tier ON users(subscription_tier);
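After creating the indexes, re-run EXPLAIN ANALYZE to confirm the join plan changed. If the planner still chooses a sequential scan and you want to check whether an index plan is possible at all, you can discourage seq scans for the current session only (a diagnostic trick, not a production setting):
SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT u.email, o.total
FROM users u
JOIN orders o ON u.id = o.user_id
WHERE u.subscription_tier = 'premium';
RESET enable_seqscan;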
Monitoring Index Usage
Check which indexes are actually being used:
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
ORDER BY idx_scan ASC;
If idx_scan is 0, the index has never been used; consider dropping it, unless it exists to enforce a unique or primary key constraint.
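Before dropping anything, check how much space those unused indexes occupy (same view as above, with a size column added):
SELECT indexrelname,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE schemaname = 'public'
  AND idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;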
Index Maintenance
Bloat
Indexes can become bloated over time:
-- Rebuild index
REINDEX INDEX idx_users_email;
-- Rebuild all indexes on a table
REINDEX TABLE users;
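Plain REINDEX takes a lock that blocks writes to the table while it runs. On PostgreSQL 12 and later you can rebuild without blocking writers:
-- Rebuild without blocking writes (PostgreSQL 12+)
REINDEX INDEX CONCURRENTLY idx_users_email;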
Vacuuming
-- Analyze table statistics
ANALYZE users;
-- Vacuum dead tuples
VACUUM users;
-- Vacuum and analyze
VACUUM ANALYZE users;
Common Anti-Patterns
1. Over-Indexing
-- Don't create indexes on every column
CREATE INDEX idx_id ON users(id); -- PRIMARY KEY already indexed!
CREATE INDEX idx_email1 ON users(email);
CREATE INDEX idx_email2 ON users(LOWER(email));
CREATE INDEX idx_email3 ON users(email, id);
Cost: Each index slows down INSERT/UPDATE/DELETE operations.
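A quick way to spot redundant or overlapping indexes is to list every index definition on the table (via the built-in pg_indexes view) and look for near-duplicates:
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users'
ORDER BY indexname;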
2. Wrong Data Types
-- Bad: String comparison
CREATE TABLE sessions (
user_id VARCHAR(50) -- Should be INTEGER
);
-- Good:
CREATE TABLE sessions (
user_id INTEGER REFERENCES users(id)
);
3. N+1 Queries
-- Bad: 1 + N queries
users = SELECT * FROM users LIMIT 10;
for user in users:
orders = SELECT * FROM orders WHERE user_id = user.id;
-- Good: 1 query with JOIN (limit users first so you get 10 users, not 10 joined rows)
SELECT u.*, o.*
FROM (SELECT * FROM users LIMIT 10) u
LEFT JOIN orders o ON u.id = o.user_id;
Real-World Example: E-Commerce Query
Before Optimization:
SELECT p.name, p.price, c.name as category
FROM products p
JOIN categories c ON p.category_id = c.id
WHERE p.price BETWEEN 10 AND 100
AND p.stock > 0
AND c.name = 'Electronics'
ORDER BY p.created_at DESC
LIMIT 20;
-- Execution Time: 3,200ms
After Optimization:
-- Indexes:
CREATE INDEX idx_products_price_stock ON products(price, stock)
WHERE stock > 0;
CREATE INDEX idx_products_category_created ON products(category_id, created_at DESC);
CREATE INDEX idx_categories_name ON categories(name);
-- Execution Time: 12ms (266x faster)
Conclusion
Query optimization is a systematic process:
- Measure: Use EXPLAIN ANALYZE
- Identify: Find sequential scans on large tables
- Index: Create appropriate indexes
- Verify: Re-run EXPLAIN ANALYZE
- Monitor: Track index usage over time
Key Takeaways:
- Indexes are not free - they slow down writes
- Column order in composite indexes matters
- Partial indexes can be more efficient than full indexes
- Always test with production-like data volumes
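One practical way to get production-like volumes in a test database is to generate synthetic rows (a rough sketch using generate_series; the distribution below is invented, so adjust it to resemble your real data):
INSERT INTO users (email, created_at, subscription_tier)
SELECT 'user' || g || '@example.com',
       NOW() - (g || ' minutes')::interval,
       CASE WHEN g % 20 = 0 THEN 'premium' ELSE 'free' END
FROM generate_series(1, 1000000) AS g;
ANALYZE users;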
Next Steps:
- Learn about query planner statistics with pg_stats
- Explore connection pooling for better resource utilization
- Study PostgreSQL’s autovacuum settings
What’s the slowest query you’ve optimized? Share your war stories in the comments!