Database16 September 2025·11 min read

Designing Scalable Database Schemas

Normalization, denormalization tradeoffs, indexing strategies, partitioning, multi-tenant architectures, soft deletes, and audit trails for production databases.

DatabaseSchema DesignIndexingPostgreSQLMulti-TenantPerformance

Schema Design Is System Design

Your database schema is the foundation of your application. A well-designed schema makes queries simple, performance predictable, and application code clean. A poorly designed schema creates cascading problems — slow queries, data inconsistencies, and application bugs that are symptoms of the real disease: bad data modeling.

At The Beyond Horizon, we invest significant time in schema design before writing application code. Here are the principles we follow.

Normalization: Start Here

Normalization eliminates data redundancy by organizing data into separate tables connected by relationships. The first three normal forms handle most cases:

1NF: Each column holds atomic values. No arrays or nested objects in a single column (with exceptions for JSONB when appropriate).
2NF: Every non-key column depends on the entire primary key, not just part of it.
3NF: No non-key column depends on another non-key column. If a user's city determines their state, state should not be stored alongside city in the users table.

When to Normalize

Data integrity is critical (financial, healthcare, legal)
The same data is referenced from multiple places
Storage efficiency matters
You need to update a value in one place and have it reflected everywhere

Denormalization: When Performance Demands It

Denormalization intentionally duplicates data to avoid expensive JOINs at read time. This is a tradeoff: faster reads at the cost of more complex writes and potential data inconsistency.

When to Denormalize

Read-heavy workloads where JOIN performance is a bottleneck
Materialized views for dashboard and reporting queries
Caching frequently accessed computed values (e.g., storing total_orders on a customer record)
When the denormalized data changes infrequently

The Hybrid Approach

In practice, most production schemas use normalized tables as the source of truth with strategic denormalization for performance-critical read paths. Use database triggers or application-level hooks to keep denormalized data in sync.

Indexing Strategies

Indexes are the single most impactful performance optimization in any database. An unindexed query on a million-row table scans every row. A properly indexed query finds the result in milliseconds.

Index Types

B-tree indexes: (default): Best for equality and range queries. Use for primary keys, foreign keys, and columns in WHERE clauses.
GIN indexes: Best for full-text search, JSONB queries, and array containment.
GiST indexes: Best for geometric data, ranges, and nearest-neighbor queries.
Partial indexes: Index only rows matching a condition. CREATE INDEX ON orders (status) WHERE status = 'pending' indexes only pending orders — smaller and faster than indexing all rows.
Composite indexes: Multi-column indexes for queries that filter on multiple columns. Column order matters — put the most selective column first.

Index Anti-Patterns

Indexing every column: Each index slows down writes and consumes storage.
Not indexing foreign keys: JOINs on unindexed foreign keys cause full table scans.
Over-indexing boolean columns: A column with only true/false values provides minimal selectivity.

Partitioning

For tables exceeding tens of millions of rows, partitioning splits data across multiple physical tables while presenting a single logical table to queries.

Range partitioning: Split by date ranges. Orders from 2024 in one partition, 2025 in another. Queries filtering by date only scan the relevant partition.
List partitioning: Split by category. Orders by region, logs by severity level.
Hash partitioning: Distribute rows evenly across partitions for balanced load.

Multi-Tenant Architectures

SaaS applications serving multiple customers need tenant isolation:

Shared database, shared schema: All tenants in the same tables with a tenant_id column. Simplest to manage but requires discipline to include tenant_id in every query.
Shared database, separate schemas: Each tenant gets their own PostgreSQL schema. Better isolation with moderate management overhead.
Separate databases: Maximum isolation. Best for enterprise customers with compliance requirements. Highest operational complexity.

We default to shared schema with tenant_id and Row Level Security (RLS) policies in PostgreSQL. This provides strong isolation with minimal infrastructure complexity.

Soft Deletes

Never hard-delete business data. Add a deleted_at timestamp column that is NULL for active records. Your application queries include WHERE deleted_at IS NULL by default. Deleted records remain available for auditing, recovery, and compliance.

Use a partial index: CREATE INDEX ON users (email) WHERE deleted_at IS NULL to maintain query performance on active records only.

Audit Trails

For regulated industries or any application where tracking who changed what matters, implement an audit trail:

Add created_at, updated_at, created_by, and updated_by columns to every table
For detailed change tracking, use a separate audit_logs table that records the table name, record ID, action (INSERT/UPDATE/DELETE), old values, new values, and the user who made the change
PostgreSQL triggers can automate audit logging without application code changes

Schema design is a skill that improves with experience. Get it right early and your application will thank you for years. Need help designing your schema? Contact us.

BH

The Beyond Horizon Team

We are a digital agency based in Ajmer, India, specializing in Next.js web applications, React Native mobile apps, and UI/UX design. 150+ projects delivered.

About Us →

Have a project in mind?

We build fast, SEO-ready web and mobile applications.

Get a Free Consultation