NoSQL Databases

Why NoSQL Databases Matter

NoSQL databases provide alternatives to relational databases, each optimized for particular data models and scaling patterns. They trade traditional ACID guarantees for flexibility, horizontal scalability, and performance.

Core Benefits:

  • Flexible schema: Adapt data model without migrations
  • Horizontal scalability: Add nodes to increase capacity
  • High performance: Optimized for specific access patterns
  • Specialized data models: Documents, key-value, wide-column, graphs
  • High availability: Built-in replication and fault tolerance

Problem: Relational databases struggle with massive scale, flexible schemas, and certain access patterns (e.g., caching, time-series).

Solution: Choose appropriate NoSQL database type based on data model and access patterns.

NoSQL vs SQL Trade-offs

Aspect            | SQL                          | NoSQL
------------------|------------------------------|-------------------------------------------
Schema            | Fixed (enforced)             | Flexible (schema-less or schema-optional)
Scaling           | Vertical (bigger hardware)   | Horizontal (more nodes)
Transactions      | ACID (strong consistency)    | BASE (eventual consistency)
Joins             | Complex joins supported      | Limited or no joins (denormalize)
Query flexibility | SQL (very flexible)          | Varies by database type
Use cases         | Financial, traditional apps  | Big data, real-time, flexible schema

Key insight: SQL and NoSQL are complementary; many applications use both (polyglot persistence).

CAP Theorem

The CAP theorem states that a distributed system can provide at most two of the following three guarantees simultaneously:

  • Consistency: All nodes see the same data at the same time
  • Availability: Every request receives a response (no downtime)
  • Partition tolerance: The system keeps working despite network splits

NoSQL choices:

  • CP (Consistency + Partition tolerance): MongoDB, HBase, Redis Cluster
  • AP (Availability + Partition tolerance): Cassandra, DynamoDB, Riak

Trade-off: Choose consistency (CP) for financial data, availability (AP) for social media feeds.

NoSQL Database Types

Document Stores

Model: JSON-like documents with nested structure

Examples: MongoDB, CouchDB

Use cases: Content management, user profiles, catalogs

Strengths:

  • Flexible schema (add fields without migrations)
  • Rich queries (secondary indexes, aggregation)
  • Natural mapping to objects

Weaknesses:

  • No joins (denormalize data)
  • Document size limits (16MB in MongoDB)
  • Weaker consistency for secondary reads (configurable via read preference)

Key-Value Stores

Model: Simple key → value mapping

Examples: Redis, DynamoDB, Riak

Use cases: Caching, session storage, real-time analytics

Strengths:

  • Extremely fast (O(1) lookups)
  • Simple API (GET/SET)
  • Data structures (Redis: lists, sets, sorted sets)

Weaknesses:

  • No queries (only key-based access)
  • Limited transactions
  • Value is opaque (no partial updates)

Wide-Column Stores

Model: Rows with dynamic columns (column families)

Examples: Cassandra, HBase, ScyllaDB

Use cases: Time-series, IoT sensors, event logging

Strengths:

  • Write-optimized (append-only)
  • Linear scalability (petabyte scale)
  • Time-series efficient

Weaknesses:

  • Complex data modeling
  • Limited query flexibility
  • Eventual consistency

Graph Databases

Model: Nodes and edges (relationships)

Examples: Neo4j, Amazon Neptune

Use cases: Social networks, recommendation engines, fraud detection

Strengths:

  • Relationship queries (shortest path, pattern matching)
  • Purpose-built query languages (e.g., Cypher, Gremlin)
  • ACID transactions

Weaknesses:

  • Scaling challenges
  • Limited to graph problems
  • Specialized expertise required

NoSQL Type Selection Matrix

Data model            | Access pattern        | Choose
----------------------|-----------------------|----------
Flexible documents    | Rich queries          | MongoDB
Simple key-value      | Fast cache/sessions   | Redis
Time-series/events    | High write throughput | Cassandra
Complex relationships | Graph traversal       | Neo4j

MongoDB (Document Store)

MongoDB stores JSON-like documents with flexible schema and rich query capabilities.

Why Use MongoDB

Strengths:

  • Flexible schema (add fields without migrations)
  • Rich queries (filters, projections, aggregation)
  • Horizontal scaling (sharding)
  • Multi-document ACID transactions (requires replica sets)
  • Mature ecosystem (Atlas cloud, Compass GUI)

Weaknesses:

  • No joins (denormalize or use $lookup)
  • 16MB document size limit
  • Memory-intensive (indexes in RAM)

Use when: Schema evolves frequently, need rich queries, documents are self-contained units.

MongoDB Java Driver

Maven dependency:

<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongodb-driver-sync</artifactId>
    <version>5.2.1</version>
</dependency>

Connection pattern:

import com.mongodb.client.*;
import org.bson.Document;

String connectionString = "mongodb://localhost:27017";  // => MongoDB connection URI (type: String)
                                                        // => Format: mongodb://host:port
try (MongoClient mongoClient = MongoClients.create(connectionString)) {  // => Create client connection (type: MongoClient)
                                                                         // => try-with-resources ensures close()
    MongoDatabase database = mongoClient.getDatabase("myapp");  // => Get database "myapp" (type: MongoDatabase)
                                                                // => Database created lazily if doesn't exist
    MongoCollection<Document> collection = database.getCollection("users");  // => Get collection "users" (type: MongoCollection<Document>)
                                                                             // => Collection created on first write

    // Perform operations
}  // => mongoClient.close() called automatically

CRUD Operations

Insert document:

Document user = new Document("name", "Alice")  // => Create document with name field (type: Document)
                                               // => Documents are BSON (Binary JSON) structures
    .append("email", "alice@example.com")  // => Add email field (type: String)
    .append("age", 30)  // => Add age field (type: int)
    .append("tags", Arrays.asList("developer", "java"));  // => Add tags array (type: List<String>)
                                                          // => Flexible schema: can add any fields

collection.insertOne(user);  // => Insert document into collection
                            // => MongoDB auto-generates _id field (type: ObjectId)
                            // => Returns immediately after acknowledgment
System.out.println("Inserted with ID: " + user.getObjectId("_id"));  // => Output: Inserted with ID: 507f1f77bcf86cd799439011
                                                                      // => _id is unique identifier

Find documents:

import com.mongodb.client.model.Filters;

// Find all
for (Document doc : collection.find()) {
    System.out.println(doc.toJson());
}

// Find with filter
Document filter = new Document("age", new Document("$gte", 25));
for (Document doc : collection.find(filter)) {
    System.out.println(doc.getString("name"));
}

// Find one
Document found = collection.find(Filters.eq("email", "alice@example.com"))
    .first();

if (found != null) {
    System.out.println("Found user: " + found.getString("name"));
}

Update document:

import com.mongodb.client.model.Updates;
import com.mongodb.client.result.UpdateResult;

// Update one
UpdateResult result = collection.updateOne(
    Filters.eq("email", "alice@example.com"),
    Updates.combine(
        Updates.set("age", 31),
        Updates.addToSet("tags", "mongodb")
    )
);

System.out.println("Modified " + result.getModifiedCount() + " document(s)");

// Update many
collection.updateMany(
    Filters.lt("age", 18),
    Updates.set("status", "minor")
);

Delete document:

import com.mongodb.client.result.DeleteResult;

// Delete one
DeleteResult result = collection.deleteOne(
    Filters.eq("email", "alice@example.com")
);

System.out.println("Deleted " + result.getDeletedCount() + " document(s)");

// Delete many
collection.deleteMany(Filters.eq("status", "inactive"));

Query Filters and Projections

Complex filters:

import com.mongodb.client.model.Filters;

// Comparison operators
collection.find(Filters.and(
    Filters.gte("age", 25),
    Filters.lt("age", 40)
));

// Logical operators
collection.find(Filters.or(
    Filters.eq("status", "active"),
    Filters.exists("premium", true)
));

// Array operators
collection.find(Filters.in("tags", "java", "python"));
collection.find(Filters.all("tags", Arrays.asList("java", "developer")));

// Text search
collection.createIndex(new Document("description", "text"));
collection.find(Filters.text("mongodb tutorial"));

Projections (select specific fields):

import com.mongodb.client.model.Projections;

// Include specific fields
collection.find()
    .projection(Projections.fields(
        Projections.include("name", "email"),
        Projections.excludeId()
    ))
    .forEach(doc -> System.out.println(doc.toJson()));

// Exclude specific fields
collection.find()
    .projection(Projections.exclude("password", "ssn"))
    .forEach(doc -> System.out.println(doc.toJson()));

Aggregation Pipeline

Aggregation pipeline processes documents through stages (filter → transform → group).

Pattern:

import com.mongodb.client.model.Aggregates;
import com.mongodb.client.model.Accumulators;
import com.mongodb.client.model.Sorts;
import org.bson.conversions.Bson;

// Count users by country, sorted descending
List<Bson> pipeline = Arrays.asList(
    Aggregates.match(Filters.eq("status", "active")),
    Aggregates.group("$country", Accumulators.sum("count", 1)),
    Aggregates.sort(Sorts.descending("count")),
    Aggregates.limit(10)
);

collection.aggregate(pipeline)
    .forEach(doc -> {
        System.out.println(doc.getString("_id") + ": " + doc.getInteger("count"));
    });

Average age by department:

List<Bson> pipeline = Arrays.asList(
    Aggregates.group("$department",
        Accumulators.avg("avgAge", "$age"),
        Accumulators.sum("count", 1)
    )
);

collection.aggregate(pipeline)
    .forEach(doc -> {
        System.out.printf("%s: %.1f avg age (%d users)%n",
            doc.getString("_id"),
            doc.getDouble("avgAge"),
            doc.getInteger("count"));
    });

Indexing Strategies

Create indexes for frequently queried fields:

import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.IndexOptions;
import java.util.concurrent.TimeUnit;

// Single field index
collection.createIndex(Indexes.ascending("email"));

// Compound index (multiple fields)
collection.createIndex(Indexes.compoundIndex(
    Indexes.ascending("country"),
    Indexes.descending("age")
));

// Unique index
collection.createIndex(
    Indexes.ascending("username"),
    new IndexOptions().unique(true)
);

// Text index for search
collection.createIndex(Indexes.text("description"));

// TTL index (auto-delete old documents)
collection.createIndex(
    Indexes.ascending("createdAt"),
    new IndexOptions().expireAfter(30L, TimeUnit.DAYS)
);

Performance:

  • Without index: O(n) collection scan
  • With index: O(log n) B-tree lookup

Trade-off: Indexes speed up reads but slow down writes (must update index).
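
To check whether a query actually uses an index, recent driver versions expose explain() on the find iterable; a minimal sketch:

Document plan = collection.find(Filters.eq("email", "alice@example.com"))
    .explain();                       // => Query plan as a Document
System.out.println(plan.toJson());    // "IXSCAN" in the winning plan means index scan;
                                      // "COLLSCAN" means a full collection scan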

Redis (Key-Value Store)

Redis is an in-memory data structure store supporting strings, hashes, lists, sets, and sorted sets.

Why Use Redis

Strengths:

  • Extremely fast (in-memory, ~100k ops/sec)
  • Rich data structures (not just key-value)
  • Pub/Sub messaging
  • Persistence options (RDB snapshots, AOF logs)
  • Atomic operations

Weaknesses:

  • Limited by RAM (must fit in memory)
  • Single-threaded (CPU-bound for complex operations)
  • No query language (key-based access only)

Use when: Need caching, session storage, real-time leaderboards, rate limiting, pub/sub messaging.

Redis Java Clients

Jedis (simple, synchronous):

<dependency>
    <groupId>redis.clients</groupId>
    <artifactId>jedis</artifactId>
    <version>5.2.0</version>
</dependency>

Lettuce (async, thread-safe):

<dependency>
    <groupId>io.lettuce</groupId>
    <artifactId>lettuce-core</artifactId>
    <version>6.4.0.RELEASE</version>
</dependency>

Recommendation: Use Lettuce for production (thread-safe, reactive support).

String Operations (Jedis)

import redis.clients.jedis.*;

try (Jedis jedis = new Jedis("localhost", 6379)) {  // => Create Redis connection (type: Jedis)
                                                    // => localhost:6379 is default Redis port
                                                    // => try-with-resources ensures close()
    // Set/Get
    jedis.set("user:1000:name", "Alice");  // => Set key-value pair (returns "OK")
                                          // => Key: "user:1000:name", Value: "Alice"
                                          // => Overwrites if key exists
    String name = jedis.get("user:1000:name");  // => Get value by key (type: String)
    System.out.println("Name: " + name);  // Output: Name: Alice

    // Set with expiration (TTL)
    jedis.setex("session:abc123", 3600, "user-data");  // => Set with TTL (Time To Live)
                                                       // => Expires in 3600 seconds (1 hour)
                                                       // => Automatically deleted after expiration

    // Increment counter
    jedis.incr("page:views");  // => Atomically increment by 1 (type: Long)
                              // => Creates key with value 1 if doesn't exist
    Long views = jedis.incrBy("page:views", 5);  // => Atomically increment by 5 (type: Long)
                                                 // => views is new value after increment

    // Check existence
    boolean exists = jedis.exists("user:1000:name");  // => Check if key exists (type: boolean)
                                                      // => Returns true if exists, false otherwise

    // Delete
    jedis.del("session:abc123");  // => Delete key (type: Long - number of keys deleted)
                                 // => Key no longer exists after deletion
}  // => jedis.close() called automatically
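
The atomic INCR shown above is the building block for a simple fixed-window rate limiter; a minimal sketch (the key format and limits are illustrative; in production, the INCR and EXPIRE would typically be combined atomically, e.g., with a Lua script as shown later):

// Allow at most `limit` requests per client per window
public static boolean allowRequest(Jedis jedis, String clientId, int limit, int windowSeconds) {
    String key = "rate:" + clientId;       // One counter per client
    long count = jedis.incr(key);          // Atomic; creates the key at 1 if missing
    if (count == 1) {
        jedis.expire(key, windowSeconds);  // First request starts the window
    }
    return count <= limit;                 // Reject once over the limit
}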

Hash Operations (Object Storage)

Hashes map field names to values (like Java Map).

// Store user as hash
Map<String, String> user = Map.of(
    "name", "Alice",
    "email", "alice@example.com",
    "age", "30"
);

jedis.hset("user:1000", user);

// Get all fields
Map<String, String> userData = jedis.hgetAll("user:1000");
System.out.println(userData);

// Get specific field
String email = jedis.hget("user:1000", "email");

// Increment numeric field
jedis.hincrBy("user:1000", "loginCount", 1);

// Get multiple fields
List<String> values = jedis.hmget("user:1000", "name", "email");

List Operations (Queues/Stacks)

Lists are ordered collections (doubly-linked lists).

// Push to list (queue)
jedis.rpush("tasks", "task1", "task2", "task3");

// Pop from list (FIFO queue)
String task = jedis.lpop("tasks");  // Returns "task1"

// Stack (LIFO)
jedis.rpush("stack", "item1", "item2");
String top = jedis.rpop("stack");  // Returns "item2"

// Get range
List<String> allTasks = jedis.lrange("tasks", 0, -1);

// List length
Long length = jedis.llen("tasks");

// Blocking pop (wait for items)
List<String> item = jedis.blpop(5, "tasks");  // Waits up to 5 seconds; returns [key, value] or null

Use cases: Task queues, activity feeds, recent items.
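
A common recent-items pattern combines LPUSH with LTRIM to keep the list bounded; a small sketch (key name and cap are illustrative):

jedis.lpush("user:1000:recent", "product:42");                 // Newest item at the head
jedis.ltrim("user:1000:recent", 0, 99);                        // Cap at the 100 most recent
List<String> latest = jedis.lrange("user:1000:recent", 0, 9);  // Read the latest 10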

Set Operations (Unique Collections)

Sets store unique unordered values.

// Add members
jedis.sadd("user:1000:tags", "developer", "java", "mongodb");

// Check membership
boolean isMember = jedis.sismember("user:1000:tags", "java");

// Get all members
Set<String> tags = jedis.smembers("user:1000:tags");

// Remove member
jedis.srem("user:1000:tags", "mongodb");

// Set operations
jedis.sadd("user:1000:skills", "java", "python");
jedis.sadd("user:2000:skills", "python", "go");

// Intersection (common skills)
Set<String> common = jedis.sinter("user:1000:skills", "user:2000:skills");

// Union (all skills)
Set<String> all = jedis.sunion("user:1000:skills", "user:2000:skills");

// Difference
Set<String> unique = jedis.sdiff("user:1000:skills", "user:2000:skills");

Use cases: Tags, relationships, uniqueness constraints.

Sorted Set Operations (Leaderboards)

Sorted sets store members with scores for ranking.

// Add members with scores
jedis.zadd("leaderboard", 100, "Alice");
jedis.zadd("leaderboard", 95, "Bob");
jedis.zadd("leaderboard", 120, "Charlie");

// Get top 3 (descending)
List<String> top3 = jedis.zrevrange("leaderboard", 0, 2);
// Output: [Charlie, Alice, Bob]

// Get range with scores
List<Tuple> topWithScores = jedis.zrevrangeWithScores("leaderboard", 0, 2);
for (Tuple t : topWithScores) {
    System.out.println(t.getElement() + ": " + t.getScore());
}

// Get rank (0-based)
Long rank = jedis.zrevrank("leaderboard", "Alice");  // Returns 1 (second place)

// Increment score
jedis.zincrby("leaderboard", 10, "Bob");

// Get score
Double score = jedis.zscore("leaderboard", "Alice");

// Count in range
Long count = jedis.zcount("leaderboard", 90, 110);

Use cases: Leaderboards, priority queues, time-series data.
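
These primitives compose naturally; for example, a "players around me" view combines zrevrank with zrevrangeWithScores. A sketch using the leaderboard above:

Long aliceRank = jedis.zrevrank("leaderboard", "Alice");  // null if member missing
if (aliceRank != null) {
    long start = Math.max(0, aliceRank - 2);              // Two places above
    List<Tuple> around = jedis.zrevrangeWithScores("leaderboard", start, aliceRank + 2);
    for (Tuple t : around) {
        System.out.println(t.getElement() + ": " + t.getScore());
    }
}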

Pub/Sub Messaging

Redis supports publish/subscribe messaging pattern.

Publisher:

try (Jedis jedis = new Jedis("localhost", 6379)) {
    jedis.publish("notifications", "New order received");
    jedis.publish("notifications", "Payment processed");
}

Subscriber:

import redis.clients.jedis.JedisPubSub;

JedisPubSub subscriber = new JedisPubSub() {
    @Override
    public void onMessage(String channel, String message) {
        System.out.println("Received on " + channel + ": " + message);
    }

    @Override
    public void onSubscribe(String channel, int subscribedChannels) {
        System.out.println("Subscribed to " + channel);
    }
};

try (Jedis jedis = new Jedis("localhost", 6379)) {
    jedis.subscribe(subscriber, "notifications");  // Blocks the calling thread; run subscribers on a dedicated thread
}

Pattern matching:

// Subscribe to pattern
jedis.psubscribe(subscriber, "user:*:notifications");

Redis Transactions

Redis transactions execute commands atomically with MULTI/EXEC.

Transaction tx = jedis.multi();

tx.set("key1", "value1");
tx.incr("counter");
tx.hset("user:1000", "status", "active");

List<Object> results = tx.exec();  // Execute atomically

Watch (optimistic locking):

String key = "balance";

jedis.watch(key);  // Watch for changes

Integer balance = Integer.parseInt(jedis.get(key));
if (balance < 100) {
    jedis.unwatch();
    System.out.println("Insufficient balance");
    return;
}

Transaction tx = jedis.multi();
tx.decrBy(key, 100);
List<Object> result = tx.exec();  // Fails if key changed

if (result == null) {
    System.out.println("Transaction failed (key modified)");
}
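
In practice the WATCH/MULTI/EXEC sequence is wrapped in a retry loop, since a null exec() just means another client won the race; a sketch with a bounded retry count:

for (int attempt = 0; attempt < 3; attempt++) {
    jedis.watch("balance");                            // Re-watch on each attempt
    int balance = Integer.parseInt(jedis.get("balance"));
    if (balance < 100) {
        jedis.unwatch();                               // Business failure: stop retrying
        break;
    }
    Transaction tx = jedis.multi();
    tx.decrBy("balance", 100);
    if (tx.exec() != null) {
        break;                                         // Success: no concurrent modification
    }
    // exec() returned null: key changed concurrently, loop and retry
}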

Lua Scripting

Execute complex operations atomically with Lua scripts.

String script =
    "local current = redis.call('get', KEYS[1]) " +
    "if current and tonumber(current) > tonumber(ARGV[1]) then " +
    "    redis.call('decrby', KEYS[1], ARGV[1]) " +
    "    return 1 " +
    "else " +
    "    return 0 " +
    "end";

Object result = jedis.eval(script, 1, "balance", "100");

if (result.equals(1L)) {
    System.out.println("Deducted 100 from balance");
} else {
    System.out.println("Insufficient balance");
}

Redis Persistence

RDB (Snapshots):

  • Periodic point-in-time snapshots
  • Fast recovery, compact files
  • Data loss possible between snapshots

AOF (Append-Only File):

  • Logs every write operation
  • Minimal data loss (configurable)
  • Larger files, slower recovery

Configuration (redis.conf):

# RDB
save 900 1        # Save after 900s if 1 key changed
save 300 10       # Save after 300s if 10 keys changed

# AOF
appendonly yes
appendfsync everysec   # Sync every second

Recommendation: Use both RDB + AOF for best durability.

Redis Cluster vs Sentinel

Redis Sentinel (high availability):

  • Automatic failover (promote replica on master failure)
  • Monitoring and notifications
  • Single master + multiple replicas

Redis Cluster (horizontal scaling):

  • Automatic sharding across nodes
  • Multi-master architecture
  • 16384 hash slots distributed across nodes

Choose Sentinel when: Need high availability, single master sufficient

Choose Cluster when: Dataset exceeds single server RAM, need write scalability

Cassandra (Wide-Column Store)

Cassandra is a distributed wide-column store optimized for write-heavy workloads and linear scalability.

Why Use Cassandra

Strengths:

  • Linear scalability (add nodes, increase throughput)
  • High write throughput (append-only log)
  • No single point of failure (masterless architecture)
  • Tunable consistency (ONE, QUORUM, ALL)
  • Time-series efficient

Weaknesses:

  • Complex data modeling (denormalize for queries)
  • Limited query flexibility (no joins, limited WHERE)
  • Eventual consistency (by default)
  • Learning curve (different from SQL)

Use when: High write volume, linear scalability needed, time-series data, IoT sensors.

DataStax Java Driver

Maven dependency (since 4.18 the driver is published under the org.apache.cassandra group):

<dependency>
    <groupId>org.apache.cassandra</groupId>
    <artifactId>java-driver-core</artifactId>
    <version>4.18.1</version>
</dependency>

Connection pattern:

import com.datastax.oss.driver.api.core.CqlSession;
import com.datastax.oss.driver.api.core.cql.*;

try (CqlSession session = CqlSession.builder()
        .withKeyspace("myapp")
        .build()) {

    // Execute queries
}

CQL (Cassandra Query Language)

CQL looks like SQL but has different semantics.

Create keyspace (database):

session.execute(
    "CREATE KEYSPACE IF NOT EXISTS myapp " +
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3}"
);

Create table:

session.execute(
    "CREATE TABLE IF NOT EXISTS users (" +
    "    id uuid PRIMARY KEY," +
    "    name text," +
    "    email text," +
    "    created_at timestamp" +
    ")"
);

// Index for secondary queries
session.execute("CREATE INDEX ON users (email)");

Partition Keys and Clustering Columns

Partition key determines data distribution across nodes.

Clustering columns determine sort order within partition.

Pattern:

// Time-series table
session.execute(
    "CREATE TABLE sensor_data (" +
    "    sensor_id text," +          // Partition key
    "    timestamp timestamp," +     // Clustering column
    "    temperature double," +
    "    PRIMARY KEY (sensor_id, timestamp)" +
    ") WITH CLUSTERING ORDER BY (timestamp DESC)"
);

Key insight: All queries MUST include partition key for efficiency.
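
For example, with the sensor_data table above (a sketch):

// Efficient: the partition key routes the query to the owning node(s)
session.execute(
    "SELECT * FROM sensor_data WHERE sensor_id = ?", "sensor-001");

// Rejected: no partition key, so Cassandra would have to scan every node
// "SELECT * FROM sensor_data WHERE temperature > 25"
// => InvalidQueryException unless ALLOW FILTERING is appended (avoid in production)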

CRUD Operations

Insert data:

PreparedStatement prepared = session.prepare(
    "INSERT INTO users (id, name, email, created_at) VALUES (?, ?, ?, ?)"
);

BoundStatement bound = prepared.bind(
    UUID.randomUUID(),
    "Alice",
    "alice@example.com",
    Instant.now()
);

session.execute(bound);

Query data:

// Query by partition key
ResultSet rs = session.execute(
    "SELECT * FROM users WHERE id = ?",
    UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
);

for (Row row : rs) {
    System.out.println(row.getString("name") + ": " + row.getString("email"));
}

// Query with clustering column
rs = session.execute(
    "SELECT * FROM sensor_data WHERE sensor_id = ? AND timestamp > ?",
    "sensor-001",
    Instant.now().minus(1, ChronoUnit.HOURS)
);

Update data:

session.execute(
    "UPDATE users SET email = ? WHERE id = ?",
    "newemail@example.com",
    userId
);

// Counter column (atomic increment)
session.execute(
    "UPDATE page_views SET count = count + 1 WHERE page_id = ?",
    "home"
);

Delete data:

// Delete row
session.execute("DELETE FROM users WHERE id = ?", userId);

// Delete column
session.execute("DELETE email FROM users WHERE id = ?", userId);

// TTL (time-to-live) delete
session.execute(
    "INSERT INTO sessions (id, data) VALUES (?, ?) USING TTL 3600",
    sessionId,
    sessionData
);

Tunable Consistency

Cassandra allows per-query consistency levels.

Consistency levels:

  • ONE: Single replica (fastest, least consistent)
  • QUORUM: Majority of replicas (balanced)
  • ALL: All replicas (slowest, most consistent)
  • LOCAL_QUORUM: Quorum in local datacenter

Pattern:

SimpleStatement statement = SimpleStatement.builder(
    "SELECT * FROM users WHERE id = ?"
)
.addPositionalValue(userId)  // Bind the positional value at build time
.setConsistencyLevel(ConsistencyLevel.QUORUM)
.build();

ResultSet rs = session.execute(statement);

Trade-off: Higher consistency → slower queries, lower availability

Recommendation: Use QUORUM for balanced consistency/performance.

Write Path and Read Path

Write path (why writes are fast):

  1. Write to commit log (sequential disk write)
  2. Write to memtable (in-memory)
  3. Return success immediately
  4. Background flush to SSTable (sorted string table)

Read path:

  1. Check memtable
  2. Check row cache
  3. Check bloom filters (avoid disk reads)
  4. Read SSTables, merge results

Key insight: Writes never read disk (append-only), reads may hit multiple SSTables.

When to Use Cassandra

Use Cassandra for:

  • Time-series data (sensor readings, logs)
  • High write throughput (tens of thousands of writes/sec per node)
  • Linear scalability (petabyte scale)
  • Multi-datacenter replication
  • Always-on availability (no single point of failure)

Avoid Cassandra for:

  • Complex JOINs (not supported)
  • Ad-hoc queries (limited WHERE clauses)
  • Strong consistency requirements (ACID transactions)
  • Small datasets (operational overhead)

Spring Data NoSQL

Spring Data provides consistent abstractions over NoSQL databases.

Spring Data MongoDB

Maven dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-mongodb</artifactId>
</dependency>

Entity mapping:

import org.springframework.data.mongodb.core.mapping.Document;
import org.springframework.data.annotation.Id;

@Document(collection = "users")
public class User {
    @Id
    private String id;
    private String name;
    private String email;
    private List<String> tags;

    // Getters/setters
}

Repository interface:

import org.springframework.data.mongodb.repository.MongoRepository;
import org.springframework.data.mongodb.repository.Query;

public interface UserRepository extends MongoRepository<User, String> {
    // Query derivation from method name
    List<User> findByName(String name);
    List<User> findByEmailContaining(String domain);
    List<User> findByTagsContaining(String tag);

    // Custom query
    @Query("{ 'age' : { $gte: ?0, $lte: ?1 } }")
    List<User> findByAgeBetween(int minAge, int maxAge);
}

MongoTemplate (lower-level access):

import org.springframework.data.mongodb.core.MongoTemplate;
import org.springframework.data.mongodb.core.query.Query;
import org.springframework.data.mongodb.core.query.Criteria;

@Service
public class UserService {
    @Autowired
    private MongoTemplate mongoTemplate;

    public List<User> findActiveUsers() {
        Query query = new Query();
        query.addCriteria(Criteria.where("status").is("active"));
        return mongoTemplate.find(query, User.class);
    }
}

Spring Data Redis

Maven dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-redis</artifactId>
</dependency>

Entity mapping:

import org.springframework.data.redis.core.RedisHash;
import org.springframework.data.annotation.Id;

@RedisHash("users")
public class User {
    @Id
    private String id;
    private String name;
    private String email;

    // Getters/setters
}

Repository interface:

import org.springframework.data.repository.CrudRepository;

public interface UserRepository extends CrudRepository<User, String> {
    // Inherits save(), findById(), findAll(), delete()
}

RedisTemplate (for data structures):

import org.springframework.data.redis.core.RedisTemplate;

@Service
public class CacheService {
    @Autowired
    private RedisTemplate<String, String> redisTemplate;

    public void cacheValue(String key, String value, Duration ttl) {
        redisTemplate.opsForValue().set(key, value, ttl);
    }

    public void addToSet(String key, String value) {
        redisTemplate.opsForSet().add(key, value);
    }

    public void incrementCounter(String key) {
        redisTemplate.opsForValue().increment(key);
    }
}

Spring Data Cassandra

Maven dependency:

<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-data-cassandra</artifactId>
</dependency>

Entity mapping:

import org.springframework.data.cassandra.core.mapping.Table;
import org.springframework.data.cassandra.core.mapping.PrimaryKey;

@Table("users")
public class User {
    @PrimaryKey
    private UUID id;
    private String name;
    private String email;

    // Getters/setters
}

Repository interface:

import org.springframework.data.cassandra.repository.AllowFiltering;
import org.springframework.data.cassandra.repository.CassandraRepository;

public interface UserRepository extends CassandraRepository<User, UUID> {
    @AllowFiltering  // email is not part of the primary key; index it or avoid on large tables
    List<User> findByEmail(String email);
}

Data Modeling Patterns

Document Modeling (Embedding vs Referencing)

Embedding (denormalize):

// User document with embedded addresses
{
    "_id": ObjectId("..."),
    "name": "Alice",
    "addresses": [
        { "type": "home", "street": "123 Main St", "city": "NYC" },
        { "type": "work", "street": "456 Office Blvd", "city": "SF" }
    ]
}

Referencing (normalize):

// User document
{ "_id": ObjectId("user1"), "name": "Alice" }

// Address documents
{ "_id": ObjectId("addr1"), "userId": ObjectId("user1"), "type": "home", ... }
{ "_id": ObjectId("addr2"), "userId": ObjectId("user1"), "type": "work", ... }

When to embed:

  • One-to-few relationships
  • Data accessed together
  • No independent access needed

When to reference:

  • One-to-many or many-to-many
  • Large subdocuments (approaching 16MB limit)
  • Independent access patterns
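
With referencing, related documents can still be combined at read time via the $lookup stage mentioned earlier; a sketch against the collections above (usersCollection is assumed to be the users MongoCollection):

List<Bson> pipeline = Arrays.asList(
    Aggregates.lookup("addresses", "_id", "userId", "addresses")  // Left outer join on userId
);
usersCollection.aggregate(pipeline)
    .forEach(doc -> System.out.println(doc.toJson()));  // Each user with its addresses array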

Denormalization Strategies

Problem: Most NoSQL databases lack joins, so data must be denormalized.

Pattern: Duplicate data to avoid lookups.

Example (e-commerce orders):

// Order document with denormalized user data
{
    "_id": ObjectId("order1"),
    "orderId": "ORD-123",
    "user": {                    // Denormalized user data
        "userId": ObjectId("user1"),
        "name": "Alice",
        "email": "alice@example.com"
    },
    "items": [
        {
            "productId": ObjectId("prod1"),
            "name": "Widget",    // Denormalized product name
            "price": 29.99,      // Denormalized price (at time of order)
            "quantity": 2
        }
    ],
    "total": 59.98,
    "status": "shipped"
}

Trade-off: Faster reads (no joins), but must handle stale data (user changes email).
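
When a duplicated value does change, every embedded copy must be updated; a minimal sketch for the order example above (ordersCollection and userId are illustrative):

// Propagate an email change to every order embedding the old value
ordersCollection.updateMany(
    Filters.eq("user.userId", userId),
    Updates.set("user.email", "new.email@example.com"));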

Time-Series Data Patterns

Cassandra pattern (bucketing by time):

CREATE TABLE sensor_readings (
    sensor_id text,
    bucket text,              // "2025-02-04" (daily bucket)
    timestamp timestamp,
    temperature double,
    PRIMARY KEY ((sensor_id, bucket), timestamp)
) WITH CLUSTERING ORDER BY (timestamp DESC);

Query recent data:

String bucket = LocalDate.now().toString();
session.execute(
    "SELECT * FROM sensor_readings WHERE sensor_id = ? AND bucket = ? LIMIT 100",
    "sensor-001",
    bucket
);

MongoDB pattern (time-series collections):

db.createCollection("sensor_data", {
    timeseries: {
        timeField: "timestamp",
        metaField: "sensorId",
        granularity: "seconds"
    }
});
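
The equivalent collection can be created from the Java driver (4.3+); a sketch using the driver's time-series options:

import com.mongodb.client.model.CreateCollectionOptions;
import com.mongodb.client.model.TimeSeriesGranularity;
import com.mongodb.client.model.TimeSeriesOptions;

database.createCollection("sensor_data",
    new CreateCollectionOptions().timeSeriesOptions(
        new TimeSeriesOptions("timestamp")                 // Required time field
            .metaField("sensorId")                         // Per-sensor grouping
            .granularity(TimeSeriesGranularity.SECONDS)));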

Aggregation Patterns

MongoDB aggregation (pre-aggregate for performance):

// Real-time aggregation (slower)
collection.aggregate(Arrays.asList(
    Aggregates.match(Filters.eq("status", "active")),
    Aggregates.group("$country", Accumulators.sum("count", 1))
));

// Pre-aggregated collection (faster)
// Background job: Aggregate hourly/daily into summary collection
{
    "_id": "2025-02-04:USA",
    "date": "2025-02-04",
    "country": "USA",
    "activeUsers": 12500,
    "newUsers": 150
}

Pattern: Pre-aggregate data in background jobs for fast dashboard queries.
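
One way to maintain such a summary collection is a scheduled aggregation ending in a $merge stage; a sketch (the daily_summary collection name is illustrative):

List<Bson> summaryPipeline = Arrays.asList(
    Aggregates.match(Filters.eq("status", "active")),
    Aggregates.group("$country", Accumulators.sum("activeUsers", 1)),
    Aggregates.merge("daily_summary")        // Upsert results into the summary collection
);
collection.aggregate(summaryPipeline).toCollection();  // toCollection() forces execution of $merge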

Best Practices

1. Choose Appropriate NoSQL Type

Match database to data model and access pattern.

Before: Using MongoDB as a simple cache.

After: Use Redis for caching (typically an order of magnitude faster for simple key lookups).

2. Understand Consistency Models

NoSQL databases often use eventual consistency.

Pattern: Design application to handle stale data.

// Redis cache-aside with database fallback (serialize/deserialize are app-specific helpers)
String cached = jedis.get("user:" + userId);
if (cached == null) {
    User user = database.findById(userId);                // Cache miss: load from primary store
    jedis.setex("user:" + userId, 300, serialize(user));  // Cache for 5 minutes (300s TTL)
    return user;
}
return deserialize(cached);                               // Cache hit

3. Index Frequently Queried Fields

Add indexes for performance.

MongoDB:

collection.createIndex(Indexes.ascending("email"));
collection.createIndex(Indexes.compoundIndex(
    Indexes.ascending("country"),
    Indexes.descending("createdAt")
));

Cassandra: Design schema so partition key matches query pattern.

4. Monitor Database Performance

Track slow queries and resource usage.

MongoDB profiler:

database.runCommand(new Document("profile", 1).append("slowms", 100));

Redis: Use MONITOR command (development only).

Cassandra: Use nodetool for cluster health.

5. Plan for Schema Evolution

NoSQL schemas evolve over time.

Pattern: Version documents.

{
    "_id": ObjectId("..."),
    "_version": 2,           // Schema version
    "name": "Alice",
    "email": "alice@example.com",
    "phoneNumbers": [...]    // New field in v2
}

Handle multiple versions in code:

if (doc.getInteger("_version") == 1) {
    // Migrate to v2 on read
    doc.put("phoneNumbers", new ArrayList<>());
    doc.put("_version", 2);
    collection.replaceOne(Filters.eq("_id", doc.getObjectId("_id")), doc);
}

6. Use Connection Pooling

Configure connection pools for production.

MongoDB:

MongoClientSettings settings = MongoClientSettings.builder()
    .applyConnectionString(new ConnectionString(connectionString))
    .applyToConnectionPoolSettings(builder ->
        builder.maxSize(20)
               .minSize(5)
               .maxWaitTime(30, TimeUnit.SECONDS))
    .build();

Redis (Jedis pool):

JedisPoolConfig poolConfig = new JedisPoolConfig();
poolConfig.setMaxTotal(20);
poolConfig.setMaxIdle(10);
poolConfig.setMinIdle(5);

JedisPool pool = new JedisPool(poolConfig, "localhost", 6379);

try (Jedis jedis = pool.getResource()) {
    // Use connection
}

7. Backup and Disaster Recovery

Plan for data loss scenarios.

MongoDB: Use mongodump/mongorestore or continuous backups (Atlas).

Redis: Configure RDB + AOF persistence, backup RDB files.

Cassandra: Use nodetool snapshot for backups.


Last Updated: 2026-02-04
Java Version: 17+ (baseline), 21+ (recommended)
Library Versions: MongoDB Driver 5.2.1, Jedis 5.2.0, Lettuce 6.4.0, Cassandra Driver 4.18.1
