Databases
An extensive overview of databases, their types, use cases, diagrams, and real-world examples.
In system design and high-level architecture (HLD), databases form the backbone of any system.
Choosing the right database depends on data structure, query patterns, scalability, and consistency requirements.
| Use Case | Best DB Type | Examples | Notes |
|---|
| Strict consistency, complex queries | Relational (SQL) | PostgreSQL, MySQL, Oracle | ACID, joins, transactions |
| High-speed caching | Key-Value | Redis, DynamoDB | Simple get/set, low latency |
| Flexible, evolving data | Document | MongoDB, Firestore | JSON, nested, schema-less |
| Analytics on huge data | Wide-Column | Cassandra, Bigtable | Partitioned, scalable |
| Social graphs, relationships | Graph | Neo4j, ArangoDB | Traversals, relationships |
| Time-stamped metrics | Time-Series | InfluxDB, TimescaleDB | Optimized for time, compression |
| Full-text search | Search | Elasticsearch, Solr | Text indexing, scoring |
| AI/ML similarity search | Vector | Pinecone, Milvus | Embeddings, ANN search |
| Multiple models in one | Multi-Model | ArangoDB, OrientDB, Cosmos DB | Flexible, complex apps |
| Immutable audit logs | Ledger/Blockchain | QLDB, Hyperledger | Tamper-proof, auditable |
| Mobile/embedded | Embedded | SQLite, RocksDB, Realm | Lightweight, local |
| Distributed SQL, global scale | NewSQL | CockroachDB, Spanner, YugabyteDB | ACID + horizontal scaling |
- Structure: Tables (rows & columns), fixed schema, ACID properties
- Strengths: Strong consistency, complex queries, joins, transactions
- Weaknesses: Harder to scale horizontally, rigid schema
- When NOT to use: If you need flexible schema, massive horizontal scaling, or handle unstructured data.
- Popular Managed Options: AWS RDS, Azure SQL, Google Cloud SQL
- Performance/Scalability: Vertical scaling, some support for read replicas and sharding.
- Cost: Can be expensive at scale, especially for licensing (Oracle, SQL Server).
- Banking, ERP systems, e-commerce orders, financial transactions
- PostgreSQL, MySQL, Oracle, SQL Server
- Structure: Simple
key β value mapping
- Strengths: Extremely fast, simple retrieval, easy to scale horizontally
- Weaknesses: Limited query functionality, no relationships
- When NOT to use: If you need complex queries or relationships.
- Popular Managed Options: AWS DynamoDB, Azure Table Storage, Google Cloud Datastore
- Performance/Scalability: Excellent horizontal scaling, in-memory options (Redis).
- Cost: Pay-per-request (DynamoDB), memory cost (Redis).
- Caching, session storage, shopping carts, leaderboards
- Redis, DynamoDB, Memcached
- Structure: JSON-like documents, flexible schema
- Strengths: Hierarchical data, flexible, easy to update, semi-structured
- Weaknesses: Less efficient for joins, eventual consistency in some cases
- When NOT to use: If you need complex joins or strict schema.
- Popular Managed Options: MongoDB Atlas, Firebase Firestore, AWS DocumentDB
- Performance/Scalability: Good horizontal scaling, sharding.
- Cost: Storage + throughput, can be expensive at scale.
- CMS, product catalogs, IoT apps, user profiles
- MongoDB, Firestore, CouchDB
- Structure: Columns grouped in families for scalable analytics
- Strengths: Scalable, optimized for analytics, high write throughput
- Weaknesses: Complex schema design, not ideal for transactional workloads
- When NOT to use: If you need ACID transactions or simple queries.
- Popular Managed Options: Google Bigtable, Azure Cosmos DB (Cassandra API), AWS Keyspaces
- Performance/Scalability: Excellent horizontal scaling, partitioning.
- Cost: Pay-per-use, can be high for large clusters.
- Big data analytics, recommendation engines, time-series-like workloads
- Cassandra, HBase, Bigtable
- Structure: Nodes and edges
- Strengths: Complex relationship queries, traversals, flexible schema
- Weaknesses: Slower for bulk inserts, not ideal for analytics
- When NOT to use: If you have simple, flat data or need high write throughput.
- Popular Managed Options: Neo4j Aura, AWS Neptune, Azure Cosmos DB (Gremlin API)
- Performance/Scalability: Good for traversals, can be hard to scale for massive graphs.
- Cost: Pay-per-node/edge, can be high for large graphs.
- Social networks, recommendations, fraud detection, knowledge graphs
- Neo4j, ArangoDB, JanusGraph
- Structure: Indexed by timestamp
- Strengths: Optimized for sequential data, compression, fast range queries
- Weaknesses: Narrow use case, not for general-purpose data
- When NOT to use: If you need complex relationships or non-time-based queries.
- Popular Managed Options: InfluxDB Cloud, AWS Timestream, Timescale Cloud
- Performance/Scalability: High write throughput, retention policies.
- Cost: Pay-per-ingest, storage.
- IoT, monitoring, financial tick data, metrics
- InfluxDB, TimescaleDB, Prometheus
| Timestamp | Temperature_C |
|---|
| 2025-09-11 10:00 | 23.5 |
| 2025-09-11 11:00 | 24.0 |
| 2025-09-11 12:00 | 23.8 |
- Structure: Indexed text for full-text search
- Strengths: Full-text search, scoring, faceting, analytics
- Weaknesses: Not transactional, eventual consistency
- When NOT to use: For transactional or relational data.
- Popular Managed Options: Elastic Cloud, AWS OpenSearch, Azure Cognitive Search
- Performance/Scalability: Scales horizontally, sharding.
- Cost: Storage + compute, can be high for large indexes.
- E-commerce search, logs, document search
- Elasticsearch, Solr, OpenSearch
- Structure: High-dimensional vectors
- Strengths: Semantic/AI queries, similarity search, ANN (approximate nearest neighbor)
- Weaknesses: Still evolving, not for transactional data
- When NOT to use: For traditional queries or transactional workloads.
- Popular Managed Options: Pinecone, Weaviate Cloud, Milvus Cloud
- Performance/Scalability: Optimized for vector search, horizontal scaling.
- Cost: Pay-per-vector, storage.
- AI embeddings, semantic search, recommendation, image/audio similarity
- Pinecone, Weaviate, Milvus
- Structure: Supports multiple paradigms (doc + graph + key-value)
- Strengths: Flexible, can handle complex apps with mixed requirements
- Weaknesses: Added complexity, may not be best-in-class for any one model
- When NOT to use: If you only need one data model.
- Popular Managed Options: ArangoDB Oasis, Azure Cosmos DB, OrientDB Cloud
- Performance/Scalability: Varies by model, generally good.
- Cost: Pay-per-model, storage.
- Complex apps, SaaS, hybrid data needs
- ArangoDB, OrientDB, Cosmos DB
- Structure: Immutable, append-only, cryptographically verifiable
- Strengths: Tamper-proof, auditable, traceable
- Weaknesses: Slower writes, complex to manage
- When NOT to use: For high-throughput or general-purpose data.
- Popular Managed Options: Amazon QLDB, Hyperledger Fabric, BigchainDB
- Performance/Scalability: Slower than traditional DBs, optimized for audit.
- Cost: Pay-per-write, storage.
- Audit logs, financial ledgers, supply chain, compliance
- Amazon QLDB, Hyperledger, BigchainDB
- Structure: Local, file-based, lightweight
- Strengths: No server needed, fast, portable
- Weaknesses: Not distributed, limited concurrency
- When NOT to use: For distributed or multi-user systems.
- Popular Managed Options: SQLite, RocksDB, Realm
- Performance/Scalability: Excellent for local, limited for distributed.
- Cost: Free/open source.
- Mobile apps, IoT devices, desktop software
- Structure: Relational, distributed, SQL interface
- Strengths: ACID + horizontal scaling, global distribution
- Weaknesses: Newer, less mature ecosystem
- When NOT to use: If you donβt need global scale or distributed SQL.
- Popular Managed Options: Google Spanner, CockroachDB Cloud, Yugabyte Cloud
- Performance/Scalability: Excellent horizontal scaling, global consistency.
- Cost: Pay-per-node, storage.
- Global apps, SaaS, fintech, distributed transactions
- CockroachDB, Google Spanner, YugabyteDB
| Category | Structure & Features | Strengths | Weaknesses | Best Use Cases | Examples | Managed/Cloud | Cost Notes |
|---|
| Relational (SQL) | Tables, fixed schema, ACID | Strong consistency, joins, transactions | Hard to scale horizontally | Banking, ERP, e-commerce | PostgreSQL, MySQL, Oracle | AWS RDS, Azure SQL | Can be expensive at scale |
| Key-Value (NoSQL) | Key β Value pairs | Super fast, simple, scalable | Limited queries | Caching, session storage | Redis, DynamoDB | AWS DynamoDB | Pay-per-request |
| Document (NoSQL) | JSON-like documents | Flexible schema, hierarchical | Hard for complex joins | CMS, product catalogs | MongoDB, Firestore | MongoDB Atlas | Storage + throughput |
| Wide-Column (NoSQL) | Column families | Scalable, analytical | Complex schema | Analytics, recommendations | Cassandra, HBase | Google Bigtable | Pay-per-use |
| Graph (NoSQL) | Nodes + Edges | Relationships | Slower bulk inserts | Social graphs, fraud detection | Neo4j, ArangoDB | Neo4j Aura | Pay-per-node/edge |
| Time-Series | Indexed by time | Optimized for sequential data | Narrow use case | IoT, metrics, finance | InfluxDB, TimescaleDB | InfluxDB Cloud | Pay-per-ingest |
| Search | Indexed text | Full-text search | Not transactional | Product search, logs | Elasticsearch, Solr | Elastic Cloud | Storage + compute |
| Vector | High-dimensional vectors | Semantic / AI queries | Still evolving | AI embeddings, similarity | Pinecone, Weaviate | Pinecone Cloud | Pay-per-vector |
| Multi-Model | Mix of paradigms | Flexible | Added complexity | Complex apps | ArangoDB, OrientDB | ArangoDB Oasis | Pay-per-model |
| Ledger/Blockchain | Immutable, auditable | Tamper-proof | Slower writes | Audit logs, compliance | QLDB, Hyperledger | Amazon QLDB | Pay-per-write |
| Embedded | Local, file-based | Lightweight, fast | Not distributed | Mobile, IoT, desktop | SQLite, RocksDB | - | Free |
| NewSQL | Distributed SQL | ACID + scale | Newer tech | Global SaaS, fintech | CockroachDB, Spanner | CockroachDB Cloud | Pay-per-node |
| Scenario | Recommended DB(s) | Reason |
|---|
| Twitter | PostgreSQL + Redis + Neo4j | SQL for accounts, Redis for caching, Graph DB for connections |
| Netflix | Cassandra + MongoDB | Wide-Column for analytics, Document DB for metadata |
| Uber | PostgreSQL + Redis + TimescaleDB | SQL for transactions, Key-Value cache, Time-Series for GPS |
| E-commerce site | MySQL + Elasticsearch | SQL for inventory/orders, Search DB for product search |
| IoT sensor data | InfluxDB, TimescaleDB | High write throughput, time-series optimized |
| Real-time chat | Redis, MongoDB | Fast pub/sub, flexible messages |
| Blockchain explorer | PostgreSQL, Elasticsearch | SQL for transactions, Search for logs |
| Mobile app local | SQLite, Realm | Embedded, lightweight |
Tip: For most modern systems, a polyglot persistence approach (using multiple DBs for different needs) is common!
Links: