Database Fundamentals
Understanding these core database concepts is essential for any developer working with data-driven applications. These fundamentals form the building blocks of modern database management systems.
- Indexing – Speeds up read queries by letting the database skip full table scans (see the querying sketch after this list).
- Primary Key – A unique identifier for each row; ensures entity integrity.
- Foreign Key – Maintains referential integrity by linking to a primary key in another table.
- Normalization – The process of structuring data to reduce redundancy and improve integrity.
- Denormalization – Combining tables to reduce joins and improve read performance.
- ACID Properties – Guarantees reliable transactions: Atomicity, Consistency, Isolation, Durability.
- Isolation Levels – Control how concurrent transactions interact (e.g., Read Committed, Serializable); see the concurrency sketch after this list.
- Joins – Combine data from multiple tables; can be inner, left, right, or full outer.
- Query Optimization – The process of improving query performance using execution plans and indexes.
- Query Plan – A step-by-step roadmap the database follows to execute your query.
- EXPLAIN / ANALYZE – Tools to inspect and debug how your SQL query is executed.
- Sharding – Splits data across multiple machines to improve scalability.
- Partitioning – Splits a single large table into smaller, more manageable chunks (see the partitioning sketch after this list).
- Replication – Copies data from one database node to another for fault tolerance and read scaling.
- CAP Theorem – States that under a network partition, a distributed system must choose between consistency and availability; it cannot guarantee all three of Consistency, Availability, and Partition tolerance.
- Eventual Consistency – Guarantees data will converge across replicas eventually, not instantly.
- Deadlocks – When two transactions each hold a lock the other needs, so neither can proceed; most databases detect the cycle and abort one transaction.
- Write-Ahead Logging (WAL) – Ensures data durability by logging changes before applying them.
- MVCC (Multi-Version Concurrency Control) – Gives readers a consistent snapshot so reads and writes do not block each other.
- Connection Pooling – Reuses database connections to reduce per-request overhead and latency.
- Materialized Views – Precomputed query results stored like a table to speed up expensive queries (see the partitioning sketch after this list).
- NULL Handling – NULL represents missing or unknown data and behaves differently from empty strings or zero (see the schema sketch after this list).
- Schema Migration – The process of safely changing a database schema without downtime or data loss.
- Data Integrity – The accuracy, completeness, and consistency of data in a database.
- Constraint – Rules such as UNIQUE, CHECK, or NOT NULL that enforce data integrity at the schema level (see the schema sketch after this list).
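To ground the schema-level ideas above (keys, constraints, NULL handling), here is a minimal PostgreSQL-style sketch; the customers and orders tables and all column names are hypothetical:

```sql
CREATE TABLE customers (
    id    bigint  PRIMARY KEY,         -- primary key: unique row identifier
    email text    NOT NULL UNIQUE,     -- NOT NULL + UNIQUE constraints
    age   integer CHECK (age >= 0)     -- CHECK constraint; age may still be NULL
);

CREATE TABLE orders (
    id          bigint PRIMARY KEY,
    customer_id bigint NOT NULL REFERENCES customers (id),  -- foreign key
    status      text   NOT NULL,
    created_at  timestamptz NOT NULL,
    total_cents integer NOT NULL CHECK (total_cents >= 0)
);

-- NULL is "unknown", not zero or empty: this yields NULL, not true.
SELECT NULL = NULL AS is_equal;

-- Use IS NULL to test for missing values.
SELECT count(*) FROM customers WHERE age IS NULL;
```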
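Continuing with those hypothetical tables, a sketch of indexing, joins, and plan inspection:

```sql
-- B-tree index so lookups by customer avoid a full table scan.
CREATE INDEX idx_orders_customer ON orders (customer_id);

-- Inner join: only customers that have at least one order.
SELECT c.email, o.total_cents
FROM customers c
JOIN orders o ON o.customer_id = c.id;

-- Left join: every customer, with NULLs where no order exists.
SELECT c.email, o.id AS order_id
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id;

-- EXPLAIN shows the plan; adding ANALYZE also executes the query and
-- reports actual row counts and timing.
EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 42;
```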
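For isolation levels and deadlocks, a sketch using a hypothetical wallets table (DDL omitted); the two-session interleaving in the comments is the classic deadlock shape:

```sql
-- A transaction can request a stricter isolation level explicitly.
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM wallets WHERE id = 1;
COMMIT;

-- Deadlock: two sessions update the same rows in opposite order.
-- Session A: BEGIN; UPDATE wallets SET balance = balance - 10 WHERE id = 1;
-- Session B: BEGIN; UPDATE wallets SET balance = balance - 10 WHERE id = 2;
-- Session A: UPDATE wallets SET balance = balance + 10 WHERE id = 2;  -- waits on B
-- Session B: UPDATE wallets SET balance = balance + 10 WHERE id = 1;  -- waits on A
-- The database detects the cycle and aborts one session; acquiring locks
-- in a consistent order (e.g., lowest id first) prevents the deadlock.
```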
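And for partitioning and materialized views (declarative range partitioning as written here is PostgreSQL syntax; the events table is hypothetical):

```sql
-- Range-partition a large events table by month so each chunk stays
-- small and old partitions can be dropped cheaply.
CREATE TABLE events (
    id          bigint,
    occurred_at timestamptz NOT NULL,
    payload     jsonb
) PARTITION BY RANGE (occurred_at);

CREATE TABLE events_2024_01 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2024-02-01');

-- Materialized view: the expensive aggregate is computed once, stored,
-- and refreshed on a schedule instead of on every query.
CREATE MATERIALIZED VIEW daily_event_counts AS
SELECT date_trunc('day', occurred_at) AS day, count(*) AS n
FROM events
GROUP BY 1;

REFRESH MATERIALIZED VIEW daily_event_counts;
```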
Database Types and Their Trade-offs
Different database types make different trade-offs, including where they land on the CAP theorem's consistency/availability spectrum:
Relational Databases (RDBMS)
- Oracle, MySQL, PostgreSQL, SQL Server
- Prioritize ACID compliance and consistency
- Excel at complex transactions and relations between entities
- Use cases: Financial systems, ERP, CRM
NoSQL Document Stores
- MongoDB, Couchbase
- Prioritize flexibility and schema-less design
- Excel at horizontal scaling
- Use cases: Content management, catalogs, user profiles
Key-Value Stores
- Redis, DynamoDB
- Prioritize speed and simplicity
- Excel at caching and high-throughput operations
- Use cases: Session storage, caching, leaderboards
Wide-Column Stores
- Cassandra, HBase
- Prioritize high write throughput and horizontal scaling
- Excel at time-series and big data applications
- Use cases: IoT sensor data, weather data, analytics
Graph Databases
- Neo4j, ArangoDB
- Prioritize relationship modeling
- Excel at connected data and traversal queries
- Use cases: Social networks, recommendation engines, fraud detection
When to Use Which Database Strategy
- Indexing: When query performance on specific columns is critical; weigh the gain against write performance costs
- Normalization: When data consistency and integrity are paramount
- Denormalization: When read performance needs to be optimized, especially in reporting systems
- Sharding: When data volume exceeds what a single machine can handle
- Materialized Views: When expensive calculations or aggregations are frequently queried
Conclusion
Understanding these database fundamentals allows developers to make informed decisions about data storage, access patterns, and system design. Whether building a simple application or a complex distributed system, these concepts provide the foundation for efficient and reliable data management.
Index Selection Cheat Sheet
- Equality filters – B-tree index on the filtered columns; use a compound index when multiple equality filters apply.
- Range + equality – Put the equality columns first and the range column last.
- ORDER BY + LIMIT – Create an index matching both the filter and the sort order, so rows come back pre-sorted and the scan can stop at the limit.
- Covering index – Include the selected columns so queries avoid table lookups entirely.
- Avoid over-indexing – Each index slows writes; monitor usage and drop unused indexes.
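A sketch of these rules using the hypothetical orders table from the fundamentals examples (INCLUDE requires PostgreSQL 11+):

```sql
-- Equality column (customer_id) first, range/sort column (created_at) last;
-- INCLUDE makes it a covering index, so the query below reads the index only.
CREATE INDEX idx_orders_customer_created
    ON orders (customer_id, created_at)
    INCLUDE (total_cents);

-- Filter, sort order, and selected columns all match the index:
-- rows come back pre-sorted and the scan stops after 20 rows.
SELECT created_at, total_cents
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;
```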
Reading Query Plans (Practical)
- Full scan? Add or choose a better index; check the filter's selectivity.
- Row estimates off? Refresh statistics (run ANALYZE); consider extended statistics or histograms.
- Nested loops vs. hash join: prefer hash joins for large sets, nested loops for small sets and point lookups.
- Sort nodes: remove them by aligning ORDER BY with the index order.
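A sketch of that workflow in PostgreSQL terms, reusing the hypothetical orders table; the plan shapes named in the comments are what to look for, not guaranteed output:

```sql
-- Inspect the real plan with timing, row counts, and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT created_at, total_cents
FROM orders
WHERE customer_id = 42
ORDER BY created_at DESC
LIMIT 20;

-- If estimated rows diverge badly from actual rows, refresh statistics.
ANALYZE orders;

-- An explicit Sort node disappears when an index matches the ORDER BY,
-- as with idx_orders_customer_created from the previous sketch.
```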
Zero-Downtime Migration Playbook
- Backfill: Add new nullable columns/tables; backfill asynchronously.
- Dual-Write: Application writes to old and new schema behind a flag.
- Read-Compare: Shadow-read new path; compare results/metrics.
- Cutover: Flip reads to new; monitor; keep dual-write for rollback.
- Cleanup: Remove old paths after stability window.
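A sketch of the backfill step in PostgreSQL-style SQL; the column, expression, and batch size are hypothetical:

```sql
-- Add the new column as nullable: the DDL is metadata-only (no rewrite),
-- so old application code keeps working while dual-write is rolled out.
ALTER TABLE customers ADD COLUMN normalized_email text;

-- Backfill in small batches to avoid long-held locks and replication
-- lag; rerun until it reports zero rows updated.
UPDATE customers
SET normalized_email = lower(email)
WHERE id IN (
    SELECT id FROM customers
    WHERE normalized_email IS NULL
    LIMIT 1000
);

-- Only after backfill and dual-write are verified: enforce the invariant.
ALTER TABLE customers ALTER COLUMN normalized_email SET NOT NULL;
```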
Data Integrity in Practice
- Enforce constraints in the DB (not only in the app): CHECK, UNIQUE, and foreign keys where possible.
- Use UUID v7 for time-ordered IDs; avoid serial for sharded setups.
- Logical deletes: prefer a deleted_at column plus filtered unique indexes.
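A sketch of the logical-delete pattern; partial (filtered) unique indexes as written here are PostgreSQL syntax, and the accounts table is hypothetical:

```sql
CREATE TABLE accounts (
    id         bigint PRIMARY KEY,
    username   text NOT NULL,
    deleted_at timestamptz            -- NULL means the row is live
);

-- Uniqueness applies only to live rows, so a deleted account's
-- username can be reused without violating the constraint.
CREATE UNIQUE INDEX idx_accounts_username_live
    ON accounts (username)
    WHERE deleted_at IS NULL;

-- "Delete" by stamping the row instead of removing it.
UPDATE accounts SET deleted_at = now() WHERE id = 1;
```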
Checklist
- Critical queries have appropriate, measured indexes
- Migrations tested with prod-like data and timing
- Connection pool sizing matches DB and app limits
- Autovacuum/tuning (Postgres) monitored; slow query log enabled