Database Playground · six database families

Relational

Storage and mutations

Postgres stores data in tables with a fixed schema. Foreign keys connect one table to another, and the database enforces them. Writes that violate the FK get rejected.

users · id (PK), name, email

posts · id (PK), user_id (FK → users.id), title
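Here is a minimal sketch of that enforcement. It uses Python's built-in sqlite3 module as a stand-in for Postgres. SQLite only checks foreign keys after `PRAGMA foreign_keys = ON`. The rows and names are made up for illustration.

```python
import sqlite3

# In-memory SQLite as a stand-in for Postgres; schema mirrors users/posts above.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")
conn.execute(
    "CREATE TABLE posts ("
    " id INTEGER PRIMARY KEY,"
    " user_id INTEGER NOT NULL REFERENCES users(id),"
    " title TEXT)"
)

conn.execute("INSERT INTO users (id, name, email) VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO posts (id, user_id, title) VALUES (1, 1, 'Hello')")  # user 1 exists

try:
    # user 42 does not exist, so the database rejects this write
    conn.execute("INSERT INTO posts (id, user_id, title) VALUES (2, 42, 'Orphan')")
    rejected = False
except sqlite3.IntegrityError as err:
    rejected = True
    print("rejected:", err)
```

Only the offending statement fails. The earlier valid post is still there.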

Access patterns · good vs bad

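Index lookups over declared keys are fast. Queries that must scan every row are slow. A leading wildcard like '%kernel%' can't use a B-tree index, so Postgres reads the whole table.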

✓Indexed lookup

SELECT * FROM posts WHERE id = 3

Cost: O(log N) via the primary-key index

✗Full table scan

SELECT * FROM posts WHERE title LIKE '%kernel%'

Cost: O(N), every row checked
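The gap between the two queries can be sketched in plain Python. A sorted list of ids with binary search stands in for a B-tree index, since both take about log2(N) steps. A substring test over every row plays the role of the LIKE scan. The table contents are hypothetical.

```python
# Hypothetical posts table: (id, title), kept sorted by id like a primary-key index.
posts = [(i, "kernel notes" if i == 77_777 else f"post {i}") for i in range(1, 100_001)]
ids = [row[0] for row in posts]  # the "index": sorted keys

def indexed_lookup(target):
    """Binary search over sorted keys (a B-tree stand-in): about log2(N) probes."""
    lo, hi, probes = 0, len(ids), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if ids[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    found = posts[lo] if lo < len(ids) and ids[lo] == target else None
    return found, probes

def full_scan(substring):
    """LIKE '%...%': no index helps, so every row is checked."""
    checked, matches = 0, []
    for row in posts:
        checked += 1
        if substring in row[1]:
            matches.append(row)
    return matches, checked

row, probes = indexed_lookup(3)
matches, checked = full_scan("kernel")
print(row, probes)            # (3, 'post 3') after at most 17 probes
print(len(matches), checked)  # 1 100000
```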

Under load

Connections are finite. Row locks serialize concurrent writes to the same row.

Simulation · Postgres connection pool (10 slots) and request queue, with a row lock held on users.id=1. Metrics: ops / sec, p99 wait + service time, pool utilization.
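This toy model shows both limits in one Python process. A `BoundedSemaphore` plays the 10-slot connection pool, and a `threading.Lock` plays the row lock on users.id=1. Postgres implements neither this way. The sketch only reproduces the same queueing shape, and the `login_count` column is hypothetical.

```python
import threading

POOL_SLOTS = 10                        # matches the 10-slot pool above
pool = threading.BoundedSemaphore(POOL_SLOTS)
row_lock = threading.Lock()            # stands in for the lock on users.id = 1
row = {"id": 1, "login_count": 0}      # hypothetical row being updated
peak = {"active": 0, "max": 0}
peak_lock = threading.Lock()

def handle_request():
    with pool:                         # waits (queues) while all 10 slots are busy
        with peak_lock:
            peak["active"] += 1
            peak["max"] = max(peak["max"], peak["active"])
        with row_lock:                 # writers to the same row run one at a time
            current = row["login_count"]
            row["login_count"] = current + 1
        with peak_lock:
            peak["active"] -= 1

threads = [threading.Thread(target=handle_request) for _ in range(50)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(row["login_count"], peak["max"])  # 50 updates, never more than 10 in flight
```

The 50 requests never hold more than 10 slots, and the read-modify-write on the row never loses an update. The price is that requests 11 and up wait in the queue.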

When to pick it

If your data has relationships and correctness matters more than throughput, start here.

✓Use when

Data has real relationships (orders, users, line items)
You need ACID transactions across rows
Queries are varied and exploratory. You don't know them all up front
Correctness matters more than last-mile throughput
The working set fits on one beefy node

✗Avoid when

You need sub-millisecond key lookups at huge QPS (use KV)
You need to scale writes past what one node can do (use wide-column)
Rows will be constantly UPDATEd by many writers (lock contention)
Data is sparse and each row has totally different columns
You want queue-like append-only semantics (use a queue)

★Real-world

Postgres: the safe default for most products
MySQL: same shape, different ecosystem
SQLite: embedded, single-file, very capable
CockroachDB or Spanner: distributed SQL that scales writes horizontally
Typical use: billing, user accounts, CMS, orders

Key-value

Storage and mutations

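Redis is one big hash map. Values are addressed by a key: hash(key) → bucket → value. SET, GET, and DEL on a string key are O(1) on average. There is no schema, and multi-key atomicity is limited: MULTI/EXEC and Lua scripts run commands as a unit, but neither rolls back on error.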

Simulation · Redis hash table (8 buckets)

Access patterns · good vs bad

Point lookup by key is O(1). Search by value means walking every key, O(N).

✓Point lookup by key

GET user:7

Cost: O(1), one of 8 buckets visited

✗Scan all values

"Find values containing 'pro'"

Cost: O(N), all 8 buckets visited
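The bucket model and both access patterns fit in a toy 8-bucket hash table with separate chaining. Python's `hash()` stands in for Redis's internal hash function. Real Redis also resizes its table as it grows, which this sketch skips. Keys and values are hypothetical.

```python
class ToyKV:
    """Toy hash table: hash(key) -> bucket -> (key, value) pairs."""

    def __init__(self, buckets=8):
        self.buckets = [[] for _ in range(buckets)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def set(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite existing key in place
                return
        bucket.append((key, value))

    def get(self, key):
        # Only one bucket is visited, however many keys exist.
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        bucket[:] = [(k, v) for k, v in bucket if k != key]

    def scan_values(self, needle):
        # Values are not indexed, so every bucket must be walked.
        visited, hits = 0, []
        for bucket in self.buckets:
            visited += 1
            hits.extend(k for k, v in bucket if needle in v)
        return hits, visited

kv = ToyKV()
kv.set("user:7", "pro plan")
kv.set("user:8", "free plan")
kv.set("session:abc", "user:7")
print(kv.get("user:7"))         # pro plan
print(kv.scan_values("pro"))    # (['user:7'], 8)
kv.delete("user:7")
print(kv.get("user:7"))         # None
```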

Under load

Sub-millisecond latency at huge throughput. Hot keys and memory pressure can break that.

Simulation · Redis hash table under load. Metrics: ops / sec, p99 latency, capacity used (out of 24).
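When memory fills up, Redis can be configured to evict keys instead of rejecting writes, for example with `maxmemory-policy allkeys-lru`. This fixed-capacity LRU cache is a rough model of that policy. Redis approximates LRU by sampling keys, whereas this sketch always evicts the exact least recently used key, and the capacity is a hypothetical number.

```python
from collections import OrderedDict

class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key when full."""

    def __init__(self, capacity=24):     # 24 mirrors the capacity gauge above
        self.capacity = capacity
        self.data = OrderedDict()
        self.evictions = 0

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)       # mark as most recently used
        return self.data[key]

    def set(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # drop the least recently used key
            self.evictions += 1

cache = LRUCache(capacity=3)
for k in ["a", "b", "c"]:
    cache.set(k, k.upper())
cache.get("a")           # touch "a", so "b" is now the oldest
cache.set("d", "D")      # over capacity: "b" is evicted
print(list(cache.data))  # ['c', 'a', 'd']
```

Evictions keep memory bounded, but every evicted key turns into a miss that falls through to the real database.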

When to pick it

Best for caches, ephemeral state, and simple keyed lookups. Rarely your source of truth.

✓Use when

Data is naturally keyed (user_id, session_id, sku)
You need sub-millisecond latency at huge QPS
Data is ephemeral or has a TTL (sessions, caches, rate counters)
Hot data that doesn't change often (feature flags, config)
You need a cache or read accelerator in front of a real database

✗Avoid when

You need to query by anything other than the key
You need relationships between entries
You need ACID across multiple keys
Data is too big for memory (Redis and Memcached are RAM-first)
It's your only copy and durability matters

★Real-world

Redis: sessions, caches, rate limits, leaderboards, pub/sub
Memcached: pure cache, simpler
DynamoDB: disk-backed, durable, scales
etcd, ZooKeeper: KV for configuration and coordination
Typical use: session store, API cache, feature flags, rate limiter

Wide-column

Storage and mutations

Cassandra groups data by a row key (its partition key). Rows are often sparse: a row stores only the columns that were actually written. Writes are upserts at the column level. Insert with an existing row key and the new columns merge in.
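Column-level upserts can be sketched as a dict of dicts. The rows are sparse, and a write merges its columns into whatever is already stored under that row key. Real Cassandra resolves each cell by write timestamp (last write wins). This sketch simply applies writes in order, and the table and column names are hypothetical.

```python
# Toy wide-column table: row key -> {column: value}. Only written columns are stored.
table = {}

def upsert(row_key, **columns):
    """INSERT and UPDATE behave the same: new columns merge into the row,
    existing columns are overwritten, untouched columns are kept."""
    table.setdefault(row_key, {}).update(columns)

upsert("u#1001", name="Ada", language="Go")
upsert("u#1002", name="Linus")                    # sparse: no language column at all
upsert("u#1001", city="London", language="Rust")  # merges into the existing row

print(table["u#1001"])  # {'name': 'Ada', 'language': 'Rust', 'city': 'London'}
print(table["u#1002"])  # {'name': 'Linus'}
```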

Access patterns · partition key is king

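The row key hashes to a token, and each token range is owned by specific nodes (a primary plus its replicas). Query with the row key and the request goes straight to the owner. Query without it and every node has to scan.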

✓Query by row key

SELECT * FROM user_profiles WHERE row_key = 'u#1001'

Cost: 1 hop, 1 of 4 nodes contacted

✗Scatter scan

SELECT * FROM user_profiles WHERE language = 'Go' ALLOW FILTERING

Cost: all 4 nodes contacted
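The routing difference can be sketched with a simplified placement function. Cassandra actually maps Murmur3 tokens onto a ring with replicas. Here MD5 modulo the node count stands in for that, and node names and rows are hypothetical.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # the 4-node cluster above

def owner(row_key):
    """Simplified placement: hash the row key, pick one node (no ring, no replicas)."""
    digest = hashlib.md5(row_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

# Hypothetical rows, each stored on the node that owns its key.
rows = {"u#1001": {"language": "Go"}, "u#1002": {"language": "Rust"},
        "u#1003": {"language": "Go"}, "u#1004": {"language": "Python"}}
storage = {n: {} for n in NODES}
for key, cols in rows.items():
    storage[owner(key)][key] = cols

def query_by_key(row_key):
    node = owner(row_key)               # one hop: we know exactly where the row lives
    return storage[node].get(row_key), [node]

def query_by_column(column, value):
    contacted, hits = [], []
    for node in NODES:                  # no key, so every node scans its own data
        contacted.append(node)
        hits += [k for k, c in storage[node].items() if c.get(column) == value]
    return sorted(hits), contacted

print(query_by_key("u#1001"))             # the row, plus the single node contacted
print(query_by_column("language", "Go"))  # (['u#1001', 'u#1003'], all four nodes)
```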

Under load

Throughput scales roughly linearly with node count. Concentrate writes on one row key and you pin a single node (and its replicas). That's a hot partition.

Simulation · Cassandra cluster under load. Metrics: cluster ops / sec, per-node load, load balance across nodes.
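A hot partition is easy to see by counting where writes land. This uses a simplified placement: MD5 of the key modulo 4 nodes, standing in for Cassandra's token ring. The workload is a hypothetical sensor feed, once with many distinct keys and once with every write aimed at a single key.

```python
import hashlib
from collections import Counter

NODES = ["node-a", "node-b", "node-c", "node-d"]

def owner(row_key):
    # Simplified placement: hash the key, pick a node.
    digest = hashlib.md5(row_key.encode()).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

def load_per_node(keys):
    load = Counter({n: 0 for n in NODES})
    for k in keys:
        load[owner(k)] += 1
    return load

spread = load_per_node(f"sensor#{i}" for i in range(10_000))  # many distinct keys
hot = load_per_node("sensor#42" for _ in range(10_000))       # every write to one key

print(max(spread.values()) / 10_000)  # share of writes on the busiest node
print(max(hot.values()) / 10_000)     # 1.0: one node takes every write
```

Adding nodes helps the first workload and does nothing for the second, since one key always maps to the same owner.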

When to pick it

Wide-column is a specialist. Pick it when you have massive write volume and already know your access pattern.

✓Use when

Massive write volume that one node can't handle
You know your access patterns in advance (always read by row key)
Time-series, event logs, append-heavy data
You need horizontal scale. Adding nodes adds throughput roughly linearly
Eventual consistency is acceptable (tunable per query)

✗Avoid when

You need ad-hoc queries across the data
You need JOINs between entities (Cassandra has no JOINs)
You need strong consistency across partitions
The data is small enough to fit on one machine
You don't know how you'll query yet

★Real-world

Cassandra: time-series, event history, feed data
Bigtable: Google's original, backs search and analytics
HBase: Hadoop ecosystem
DynamoDB: managed, wide-column and KV hybrid
Typical use: messaging history, telemetry, IoT, activity feeds

Object storage

Storage and mutations

A bucket holds objects addressed by a key like photos/2024/sunset.jpg. Objects are immutable blobs with metadata. PUT, GET, DELETE. No partial updates.

Example bucket: media-prod
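This toy in-memory bucket models the whole-object contract. It supports PUT, GET, and DELETE, with no partial writes. The class and method names are hypothetical and not any vendor's SDK. A missing key raises `KeyError` in the role of S3's 404 NoSuchKey.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass(frozen=True)
class StoredObject:
    body: bytes              # bytes are immutable, like the stored object
    content_type: str
    last_modified: datetime

class Bucket:
    """Toy bucket: whole-object PUT, GET, DELETE, nothing finer-grained."""

    def __init__(self, name):
        self.name = name
        self.objects = {}

    def put(self, key, body, content_type="application/octet-stream"):
        # A PUT to an existing key replaces the entire object; no append, no in-place edit.
        self.objects[key] = StoredObject(bytes(body), content_type,
                                         datetime.now(timezone.utc))

    def get(self, key):
        return self.objects[key]     # KeyError plays the role of 404 NoSuchKey

    def delete(self, key):
        self.objects.pop(key, None)  # deleting a missing key is not an error

bucket = Bucket("media-prod")
bucket.put("photos/2024/sunset.jpg", b"\xff\xd8 v1", "image/jpeg")
# "Editing" means reading the bytes, changing them locally, and PUTting everything back.
old = bucket.get("photos/2024/sunset.jpg").body
bucket.put("photos/2024/sunset.jpg", old + b" edited", "image/jpeg")
print(bucket.get("photos/2024/sunset.jpg").body)  # b'\xff\xd8 v1 edited'
```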

Access patterns · key or scan

GET by exact key is one index lookup. Filter by content or size and you have to walk every object in the bucket.

✓GET by exact key

GET media-prod/photos/2024/sunset.jpg

Cost: O(1), one object touched

✗Filter by size or content

ListObjectsV2, then filter client-side for size > 5MB and key contains 'sunset'

Cost: O(N), every object listed and checked
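The list-then-filter pattern looks roughly like this. S3's ListObjectsV2 returns at most 1,000 keys per page plus a continuation token. In this sketch the token is just a stringified offset, whereas real tokens are opaque. The object names and sizes are hypothetical.

```python
# Hypothetical bucket listing: object key -> size in bytes.
objects = {f"photos/2024/img_{i:05d}.jpg": (i % 10) * 1_000_000 for i in range(2_500)}
objects["photos/2024/sunset.jpg"] = 8_000_000

def list_page(token=None, max_keys=1000):
    """One page of (key, size) in key order, plus the next token (None when done)."""
    keys = sorted(objects)
    start = 0 if token is None else int(token)
    page = [(k, objects[k]) for k in keys[start:start + max_keys]]
    nxt = start + max_keys
    return page, (str(nxt) if nxt < len(keys) else None)

def find(min_size, needle):
    """No server-side filter on size or content: list every page, filter on the client."""
    token, pages, touched, hits = None, 0, 0, []
    while True:
        page, token = list_page(token)
        pages += 1
        touched += len(page)
        hits += [k for k, size in page if size > min_size and needle in k]
        if token is None:
            return hits, pages, touched

print(find(5_000_000, "sunset"))  # (['photos/2024/sunset.jpg'], 3, 2501)
```

Each page needs the previous page's token, so the pages are fetched one after another. On a bucket with millions of keys, that sequential chain is the slow part.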

Under load

Parallel GETs and PUTs scale almost for free. The pain shows up on a LIST over a huge bucket, where you page through keys one window at a time.

Simulation · object store request fan-out. Metrics: throughput (ops / sec), p99 latency, LIST pagination progress (pages fetched out of total).

When to pick it

The default home for big files. Cheap, durable, and effectively unlimited in capacity. Everything else is a tradeoff.

✓Use when

Large blobs: images, video, audio, backups, ML datasets
Files that are written once and read many times
You want cheap durable storage at scale with no capacity planning
Static asset hosting behind a CDN
Data lakes, build artifacts, model weights, log archives

✗Avoid when

You need transactions or consistency guarantees that span multiple objects
You need to query by content, size, or any field
You need partial updates to a file
You need sub-100µs latency per request
The data is relational with joins and references

★Real-world

S3: AWS, the original. Backs much of the modern web
GCS, Azure Blob, R2: same shape, different vendors
MinIO: self-hosted, S3-compatible
Typical use: media, backups, ML datasets, build artifacts
Static sites: HTML, JS, CSS served straight from a bucket

Vector

Storage and mutations

Each item is a high-dimensional embedding vector (e.g., 1536 dims) plus a small payload. The DB builds an ANN index (HNSW, IVF) so similarity search skips most vectors instead of scanning them all.

Figure · 2D projection of the semantic space (1536 dims compressed to 2), with clusters for animals, programming, machine learning, and food.

Access patterns · approximate beats exact

ANN search uses the index to narrow down to a handful of candidates near the query, then checks just those. Brute-force KNN compares the query to every vector.

✓ANN search (HNSW)

SELECT * FROM items ORDER BY embedding <-> $query LIMIT 3

Cost: ~log N; the index narrows the search to a few candidates out of 16 vectors

✗Exact KNN (brute force)

Compare the query to every vector.

Cost: O(N · D), all 16 vectors compared
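This sketch contrasts exact KNN with an approximate search. It uses IVF, the other index family named above, because it fits in a few lines; HNSW instead navigates a layered neighbor graph. The data is hypothetical: 2,000 random 8-dim vectors in four clusters. The known cluster centers double as IVF centroids, which skips the usual k-means training step.

```python
import math
import random

random.seed(0)
DIM = 8  # real embeddings are often ~1536 dims; 8 keeps the sketch fast

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Hypothetical data: 4 clusters (think animals, programming, ML, food).
centers = [[random.uniform(-10, 10) for _ in range(DIM)] for _ in range(4)]
vectors = [[x + random.gauss(0, 1) for x in centers[i % 4]] for i in range(2_000)]

def exact_knn(query, k):
    """Brute force: compare the query with every vector, O(N * D)."""
    return sorted(range(len(vectors)), key=lambda i: dist(query, vectors[i]))[:k]

# IVF-style index: each vector goes into the list of its nearest centroid.
lists = {c: [] for c in range(len(centers))}
for i, v in enumerate(vectors):
    lists[min(lists, key=lambda c: dist(v, centers[c]))].append(i)

def ivf_search(query, k, nprobe=1):
    """Approximate: only scan the nprobe lists whose centroids are closest."""
    probe = sorted(lists, key=lambda c: dist(query, centers[c]))[:nprobe]
    candidates = [i for c in probe for i in lists[c]]
    top = sorted(candidates, key=lambda i: dist(query, vectors[i]))[:k]
    return top, len(candidates)

query = [x + 0.5 for x in centers[2]]
exact = exact_knn(query, 3)
approx, compared = ivf_search(query, 3)
print(compared, "of", len(vectors), "vectors compared")  # only the probed list
print(len(set(exact) & set(approx)) / 3)                  # recall@3 for this query
```

Raising `nprobe` scans more lists. That buys recall back at the cost of more comparisons, which is the same dial ANN indexes expose in other forms.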

Under load

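ANN trades a little recall for a lot of speed. Filtered queries can fall off a cliff when the filter is selective. The index doesn't know about your filter, so if the filter is applied after the ANN search, most candidates get thrown away and you get too few results.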

Simulation · vector index query latency vs recall under live queries. Metrics: queries / sec, recall (100% for brute force, where every true neighbor is found), p99 latency.
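Both effects can be made concrete with hypothetical IDs. Recall@k is the overlap between what the ANN search returned and the true top-k. Post-filtering applies the filter after the index has picked its candidates. Post-filtering is one common strategy; some engines filter during the index walk instead.

```python
def recall_at_k(approx_ids, exact_ids):
    """Fraction of the true top-k neighbors that the ANN search returned."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)

def post_filter(candidate_ids, payloads, predicate, k):
    """Filter after the ANN search. The index never saw the predicate,
    so a selective filter can leave fewer than k results."""
    return [i for i in candidate_ids if predicate(payloads[i])][:k]

# Hypothetical payloads: only 1 in 100 items is in the 'de' locale.
payloads = {i: {"lang": "de" if i % 100 == 0 else "en"} for i in range(10_000)}
ann_candidates = list(range(1, 41))  # pretend the index returned these 40 ids

print(recall_at_k([1, 2, 4], [1, 2, 3]))  # 2 of 3 true neighbors found
hits = post_filter(ann_candidates, payloads, lambda p: p["lang"] == "de", k=10)
print(len(hits))  # 0: none of the 40 candidates survive the filter
```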

When to pick it

Vector storage is for similarity, not lookup. Pick it when "close enough" is the question and an exact key won't do.

✓Use when

Semantic search over text, code, or docs
RAG for chatbots and AI assistants
Recommendations based on similarity, not rules
Image, audio, or video similarity matching
Anomaly detection and deduplication

✗Avoid when

You need exact key lookup (use KV)
Transactional data with strict consistency
Complex relational joins across many entities
"Similarity" is ill-defined for your data
Tiny datasets where brute force is fine

★Real-world

pgvector: Postgres extension, great default
Pinecone: fully managed service
Weaviate, Qdrant, Milvus: open-source options
Chroma: embedded, popular for local RAG
Typical use: chatbot context, semantic search, recs

Graph

Storage and mutations

Data is nodes with labels (Person, Movie, City) and edges between them (FRIENDS_WITH, ACTED_IN, LIVES_IN). Both nodes and edges carry properties, and edges are first-class citizens you traverse in either direction.

Figure · property graph with Person, Movie, and City nodes. Example write: CREATE (a:Person)-[:FRIENDS_WITH]->(b:Person)

Access patterns · traverse, don't scan

Start at a known node and hop along edges. The DB walks the local neighborhood instead of touching the whole dataset.

✓Traverse from a starting node

MATCH (a:Person {name:'Alice'})-[:FRIENDS_WITH*1..2]->(f) RETURN f

Cost: local subgraph only

✗Query without a starting node

MATCH (p:Person)-[:ACTED_IN]->(:Movie)<-[:ACTED_IN]-(:Person)-[:LIVES_IN]->(:City) ...

Cost: whole graph scanned
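The good pattern is a breadth-first walk over an adjacency list, roughly what `[:FRIENDS_WITH*1..2]` asks for. This toy graph and its names are hypothetical. The sketch also returns each reachable node once, whereas Cypher returns one row per matching path, so `f` can repeat there.

```python
from collections import defaultdict, deque

# Toy property graph as (source, edge type, target) triples.
edges = [
    ("Alice", "FRIENDS_WITH", "Bob"),
    ("Bob", "FRIENDS_WITH", "Carol"),
    ("Carol", "FRIENDS_WITH", "Dave"),
    ("Alice", "FRIENDS_WITH", "Eve"),
    ("Eve", "LIVES_IN", "Berlin"),
]
out_edges = defaultdict(list)  # adjacency list: each node knows its outgoing edges
for src, rel, dst in edges:
    out_edges[src].append((rel, dst))

def traverse(start, rel, min_depth, max_depth):
    """Breadth-first walk from one start node along edges of type rel.
    Only nodes reachable within max_depth hops are ever touched."""
    seen, found, visited = {start}, [], 0
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        visited += 1
        if depth == max_depth:
            continue
        for r, nxt in out_edges[node]:
            if r == rel and nxt not in seen:
                seen.add(nxt)
                if depth + 1 >= min_depth:
                    found.append(nxt)
                frontier.append((nxt, depth + 1))
    return found, visited

print(traverse("Alice", "FRIENDS_WITH", 1, 2))  # (['Bob', 'Eve', 'Carol'], 4)
```

Dave, three hops away, is never touched. The cost depends on the neighborhood size, not on how big the whole graph is.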

Under load

Shallow traversals stay fast on huge graphs because each hop is local. Push depth too far and the visited set fans out fast.

Simulation · Neo4j traversal cost by depth. Metrics: nodes visited, edges traversed, latency.
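Why depth hurts can be shown with back-of-envelope arithmetic. Assume, hypothetically, that every node has b neighbors and no two paths share a node. Then the nodes reachable within d hops are bounded by 1 + b + b^2 + ... + b^d. Real graphs share neighbors, so actual counts come in lower, but growth is still exponential in depth.

```python
def nodes_within(branching, depth):
    """Upper bound on nodes reached in `depth` hops: 1 + b + b^2 + ... + b^d."""
    return sum(branching ** d for d in range(depth + 1))

for depth in range(1, 6):
    print(depth, nodes_within(150, depth))  # 150 neighbors per node is an assumed figure
```

With 150 neighbors per node, two hops can reach up to 22,651 nodes, three hops about 3.4 million, and five hops tens of billions.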

When to pick it

Pick a graph when relationships are the point. Multi-hop questions about who connects to whom should be cheap.

✓Use when

Highly connected data where relationships are central to your queries
You ask multi-hop questions: friends-of-friends, paths between two entities, shared neighbors
Social graphs, fraud rings, knowledge graphs, recommendations, supply chains, identity, network topology
You want to ask "how is X connected to Y" without 6 expensive JOINs
The shape of relationships changes often, so the schema needs to flex

✗Avoid when

Simple tabular data with few relationships
High-volume simple key lookups (a KV store will do)
You don't actually traverse relationships often
Your team has no graph experience and the use case is borderline
You need full SQL analytics over flat tables

★Real-world

Neo4j: the reference graph DB, Cypher query language
Amazon Neptune: managed graph on AWS
ArangoDB, JanusGraph, Dgraph, TigerGraph, Memgraph
Typical use: social graph, recommendations, fraud detection
Knowledge graphs, identity resolution, dependency analysis