Database16 September 2025·11 min read

Designing Scalable Database Schemas

Normalization, denormalization tradeoffs, indexing strategies, partitioning, multi-tenant architectures, soft deletes, and audit trails for production databases.

DatabaseSchema DesignIndexingPostgreSQLMulti-TenantPerformance

Schema Design Is System Design

Your database schema is the foundation of your application. A well-designed schema makes queries simple, performance predictable, and application code clean. A poorly designed schema creates cascading problems — slow queries, data inconsistencies, and application bugs that are symptoms of the real disease: bad data modeling.

At The Beyond Horizon, we invest significant time in schema design before writing application code. Here are the principles we follow.

Normalization: Start Here

Normalization eliminates data redundancy by organizing data into separate tables connected by relationships. The first three normal forms handle most cases:

1NF: Each column holds atomic values. No arrays or nested objects in a single column (with exceptions for JSONB when appropriate).

2NF: Every non-key column depends on the entire primary key, not just part of it.

3NF: No non-key column depends on another non-key column. If a user's city determines their state, state should not be stored alongside city in the users table.

When to Normalize

Data integrity is critical (financial, healthcare, legal)

The same data is referenced from multiple places

Storage efficiency matters

You need to update a value in one place and have it reflected everywhere

Denormalization: When Performance Demands It

Denormalization intentionally duplicates data to avoid expensive JOINs at read time. This is a tradeoff: faster reads at the cost of more complex writes and potential data inconsistency.

When to Denormalize

Read-heavy workloads where JOIN performance is a bottleneck

Materialized views for dashboard and reporting queries

Caching frequently accessed computed values (e.g., storing total_orders on a customer record)

When the denormalized data changes infrequently

The Hybrid Approach

In practice, most production schemas use normalized tables as the source of truth with strategic denormalization for performance-critical read paths. Use database triggers or application-level hooks to keep denormalized data in sync.

Indexing Strategies

Indexes are the single most impactful performance optimization in any database. An unindexed query on a million-row table scans every row. A properly indexed query finds the result in milliseconds.

Index Types

B-tree indexes: (default): Best for equality and range queries. Use for primary keys, foreign keys, and columns in WHERE clauses.

GIN indexes: Best for full-text search, JSONB queries, and array containment.

GiST indexes: Best for geometric data, ranges, and nearest-neighbor queries.

Partial indexes: Index only rows matching a condition. CREATE INDEX ON orders (status) WHERE status = 'pending' indexes only pending orders — smaller and faster than indexing all rows.

Composite indexes: Multi-column indexes for queries that filter on multiple columns. Column order matters — put the most selective column first.

Index Anti-Patterns

Indexing every column: Each index slows down writes and consumes storage.

Not indexing foreign keys: JOINs on unindexed foreign keys cause full table scans.

Over-indexing boolean columns: A column with only true/false values provides minimal selectivity.

Partitioning

For tables exceeding tens of millions of rows, partitioning splits data across multiple physical tables while presenting a single logical table to queries.

Range partitioning: Split by date ranges. Orders from 2024 in one partition, 2025 in another. Queries filtering by date only scan the relevant partition.

List partitioning: Split by category. Orders by region, logs by severity level.

Hash partitioning: Distribute rows evenly across partitions for balanced load.

Multi-Tenant Architectures

SaaS applications serving multiple customers need tenant isolation:

Shared database, shared schema: All tenants in the same tables with a tenant_id column. Simplest to manage but requires discipline to include tenant_id in every query.

Shared database, separate schemas: Each tenant gets their own PostgreSQL schema. Better isolation with moderate management overhead.

Separate databases: Maximum isolation. Best for enterprise customers with compliance requirements. Highest operational complexity.

We default to shared schema with tenant_id and Row Level Security (RLS) policies in PostgreSQL. This provides strong isolation with minimal infrastructure complexity.

Soft Deletes

Never hard-delete business data. Add a deleted_at timestamp column that is NULL for active records. Your application queries include WHERE deleted_at IS NULL by default. Deleted records remain available for auditing, recovery, and compliance.

Use a partial index: CREATE INDEX ON users (email) WHERE deleted_at IS NULL to maintain query performance on active records only.

Audit Trails

For regulated industries or any application where tracking who changed what matters, implement an audit trail:

Add created_at, updated_at, created_by, and updated_by columns to every table

For detailed change tracking, use a separate audit_logs table that records the table name, record ID, action (INSERT/UPDATE/DELETE), old values, new values, and the user who made the change

PostgreSQL triggers can automate audit logging without application code changes

Schema design is a skill that improves with experience. Get it right early and your application will thank you for years. Need help designing your schema? Contact us.

The Beyond Horizon Team

Engineering-led digital studio based in India. We build production-grade web apps, mobile apps, AI systems, and SaaS platforms — and write about what we learn along the way.

About Us →Our Work →

Keep Reading

All Articles →

Database

MongoDB vs PostgreSQL: Choosing the Right Database for Your Project

A practical comparison of document and relational databases. Learn when MongoDB or PostgreSQL is the right choice based on data model, scaling needs, and use case.

10 min readRead →

Web Development

PostgreSQL for Web Developers: Schema Design, Performance, and Best Practices

Why PostgreSQL is the default database choice for modern web apps. Schema design principles, indexing strategies, query optimization, and our production stack.

11 min readRead →

Database

Database Migration Best Practices for Production Systems

Zero-downtime migrations, versioned migrations with Drizzle and Prisma, rollback strategies, data backfill patterns, and testing migrations safely.

10 min readRead →

Have a Project in Mind?

We build fast, SEO-ready web and mobile applications.