Optimize Performance
Problem
Java applications can suffer from performance issues due to inefficient code, suboptimal JVM configuration, or lack of proper profiling. Performance problems manifest as slow response times, high memory usage, or CPU bottlenecks.
// Inefficient code - creates many temporary objects
String result = "";
for (int i = 0; i < 10000; i++) {
    result += i; // Creates a new String object each iteration
}
This guide shows practical techniques for identifying and resolving performance bottlenecks in Java applications.
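As a minimal sketch of how the loop above is usually repaired, the same text can be appended into a single StringBuilder, which grows one buffer instead of allocating a new String on every iteration. The class and method names are illustrative only.

```java
final class ConcatFix {
    // Slow form from the Problem section: each += copies the whole string so far.
    static String withConcatenation(int n) {
        String result = "";
        for (int i = 0; i < n; i++) {
            result += i;
        }
        return result;
    }

    // Same output, but all appends go into one growing buffer.
    static String withBuilder(int n) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString();
    }
}
```

Both methods return identical text; whether the difference matters in a given application is something to confirm by measurement.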
Solution
1. Profile Before Optimizing
Always measure performance before making optimizations. Use profiling tools to identify actual bottlenecks rather than guessing.
Java Flight Recorder (JFR) - Built-in profiling tool (Java 11+):
// Enable JFR at startup
// java -XX:StartFlightRecording=duration=60s,filename=recording.jfr MyApp
// Or programmatically
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;

public class PerformanceProfiler {
    public static void profileOperation() throws Exception {
        // A Recording created without settings enables no events,
        // so start from the JDK's built-in "default" configuration
        Recording recording = new Recording(Configuration.getConfiguration("default"));
        recording.setMaxSize(100_000_000); // 100MB max
        recording.setDumpOnExit(true);
        recording.setDestination(Path.of("app-recording.jfr"));
        recording.start();
        try {
            // Your application code here
            performBusinessLogic();
        } finally {
            recording.stop();  // Writes the data to the destination file
            recording.close(); // Releases the recording's resources
        }
    }

    private static void performBusinessLogic() {
        // Simulated business logic
        List<String> data = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            data.add("Item " + i);
        }
    }
}
2. Benchmark with JMH
Use Java Microbenchmark Harness (JMH) for accurate performance measurements.
Setup (Maven dependency):
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-core</artifactId>
    <version>1.37</version>
</dependency>
<dependency>
    <groupId>org.openjdk.jmh</groupId>
    <artifactId>jmh-generator-annprocess</artifactId>
    <version>1.37</version>
    <scope>provided</scope>
</dependency>
Example Benchmark:
import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS) // Report ms/op, matching the output shown below
@State(Scope.Thread)
@Fork(value = 1, warmups = 1)
@Warmup(iterations = 3, time = 1)
@Measurement(iterations = 5, time = 1)
public class StringConcatenationBenchmark {
    private static final int ITERATIONS = 10000;

    @Benchmark
    public String stringConcatenation() {
        String result = "";
        for (int i = 0; i < ITERATIONS; i++) {
            result += i;
        }
        return result;
    }

    @Benchmark
    public String stringBuilder() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < ITERATIONS; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    @Benchmark
    public String stringBuilderWithCapacity() {
        // Pre-size the buffer so it never has to grow during the loop
        StringBuilder sb = new StringBuilder(ITERATIONS * 5);
        for (int i = 0; i < ITERATIONS; i++) {
            sb.append(i);
        }
        return sb.toString();
    }
}
Running the benchmark (assumes the build packages an executable target/benchmarks.jar, as projects generated from the JMH Maven archetype do):
mvn clean install
java -jar target/benchmarks.jar
Example output (illustrative; actual numbers depend on hardware and JVM version):
Benchmark                                                Mode  Cnt   Score   Error  Units
StringConcatenationBenchmark.stringConcatenation         avgt    5  45.234 ± 1.234  ms/op
StringConcatenationBenchmark.stringBuilder               avgt    5   0.123 ± 0.001  ms/op
StringConcatenationBenchmark.stringBuilderWithCapacity   avgt    5   0.089 ± 0.001  ms/op
3. Tune Garbage Collection
Configure GC based on application characteristics.
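Before changing any flags, it can help to confirm which collector the JVM is actually running. The following is a small sketch using the standard java.lang.management API; the class name ActiveCollectors is illustrative.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

final class ActiveCollectors {
    // Returns the collector names the running JVM reports,
    // for example "G1 Young Generation" and "G1 Old Generation" under G1.
    static List<String> names() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }
}
```

Calling ActiveCollectors.names() at startup and logging the result is a cheap way to verify that a flag such as -XX:+UseZGC took effect.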
G1GC (default since Java 9) - Good balance for most applications:
java -XX:+UseG1GC \
    -XX:MaxGCPauseMillis=200 \
    -XX:InitiatingHeapOccupancyPercent=45 \
    -XX:G1HeapRegionSize=16m \
    -Xms2g -Xmx4g \
    -Xlog:gc*:file=gc.log:time,uptime,level,tags \
    MyApp
The Java 8 logging flags (-XX:+PrintGCDetails, -XX:+PrintGCDateStamps, -Xloggc) were superseded by unified logging (-Xlog) in Java 9, so the commands in this section use -Xlog.
ZGC - Low-latency collector (production-ready since Java 15):
java -XX:+UseZGC \
    -XX:ZCollectionInterval=5 \
    -Xms8g -Xmx8g \
    -Xlog:gc* \
    MyApp
Shenandoah - Low-pause collector:
java -XX:+UseShenandoahGC \
    -Xms4g -Xmx4g \
    -Xlog:gc* \
    MyApp
4. Memory Optimization
Object pooling for frequently created objects:
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ObjectPool<T> {
    private final BlockingQueue<T> pool;
    private final ObjectFactory<T> factory;

    public ObjectPool(ObjectFactory<T> factory, int size) {
        this.factory = factory;
        this.pool = new ArrayBlockingQueue<>(size);
        // Pre-populate pool
        for (int i = 0; i < size; i++) {
            pool.offer(factory.create());
        }
    }

    // Never blocks: poll() returns null when the pool is empty,
    // in which case a fresh object is created instead
    public T borrow() {
        T object = pool.poll();
        return object != null ? object : factory.create();
    }

    public void returnObject(T object) {
        factory.reset(object);
        pool.offer(object); // Dropped (left to the GC) if the pool is already full
    }

    @FunctionalInterface
    public interface ObjectFactory<T> {
        T create();

        default void reset(T object) {
            // Override to reset object state
        }
    }
}

// Usage
ObjectPool<StringBuilder> pool = new ObjectPool<>(
    new ObjectPool.ObjectFactory<>() {
        @Override
        public StringBuilder create() {
            return new StringBuilder(1024);
        }

        @Override
        public void reset(StringBuilder sb) {
            sb.setLength(0);
        }
    },
    10 // pool size
);

// Borrow, use, return
StringBuilder sb = pool.borrow();
try {
    sb.append("data");
    String result = sb.toString();
} finally {
    pool.returnObject(sb);
}
How It Works
Profiling Workflow
graph TD
A[Start Application] --> B[Enable Profiling]
B --> C[Run Workload]
C --> D[Collect Metrics]
D --> E[Analyze Results]
E --> F{Bottleneck Found?}
F -->|Yes| G[Optimize Code]
F -->|No| H[Performance Acceptable]
G --> I[Measure Again]
I --> E
style A fill:#0173B2,stroke:#000000,color:#FFFFFF
style D fill:#DE8F05,stroke:#000000,color:#FFFFFF
style F fill:#029E73,stroke:#000000,color:#FFFFFF
style G fill:#CC78BC,stroke:#000000,color:#FFFFFF
style H fill:#029E73,stroke:#000000,color:#FFFFFF
%% Color palette: Blue (#0173B2), Orange (#DE8F05), Teal (#029E73), Purple (#CC78BC)
%% Blue = Start, Orange = Data Collection, Teal = Decision/Success, Purple = Optimization
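The enable, run, collect, analyze steps in the diagram can be sketched with the JDK's own JFR API. This is a minimal illustration, assuming a JDK 11+ runtime that ships the jdk.jfr module; the class name JfrWorkflowSketch and the temporary file name are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.text.ParseException;
import java.util.Map;
import java.util.TreeMap;
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;

final class JfrWorkflowSketch {
    // Records the workload with the built-in "default" JFR settings,
    // then counts how many events of each type were captured.
    static Map<String, Long> recordAndSummarize(Runnable workload) {
        try {
            Path file = Files.createTempFile("workflow-sketch", ".jfr");
            try {
                try (Recording recording =
                         new Recording(Configuration.getConfiguration("default"))) {
                    recording.start();    // Enable profiling
                    workload.run();       // Run workload
                    recording.stop();     // Collect metrics
                    recording.dump(file); // Persist them for analysis
                }
                // Analyze results: tally events by type name
                Map<String, Long> countsByType = new TreeMap<>();
                for (RecordedEvent event : RecordingFile.readAllEvents(file)) {
                    countsByType.merge(event.getEventType().getName(), 1L, Long::sum);
                }
                return countsByType;
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException | ParseException e) {
            throw new IllegalStateException("JFR recording failed", e);
        }
    }
}
```

Which event types appear depends on the workload and JVM, so the summary is a starting point for deciding where to look next, not a verdict on its own.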
Key concepts:
- JFR Profiling: Captures detailed runtime information with low overhead (typically around 1% with the default settings)
- JMH Benchmarking: Controls for JIT warmup and run-to-run variance through warmup iterations and forked JVMs
- GC Tuning: Balances throughput vs. pause times based on application needs
- Object Pooling: Reduces allocation pressure for frequently created objects
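To make the throughput side of GC tuning concrete, one rough indicator is the fraction of elapsed time the JVM reports spending in collection: overhead = total GC time / JVM uptime. The sketch below uses the standard management beans; the class name GcOverhead is illustrative, and for concurrent collectors the reported time is not the same as pause time, so treat the result as approximate.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

final class GcOverhead {
    // Fraction of elapsed time attributed to GC, clamped to [0, 1].
    static double fraction(long gcMillis, long uptimeMillis) {
        if (uptimeMillis <= 0) {
            return 0.0;
        }
        return Math.min(1.0, Math.max(0.0, (double) gcMillis / uptimeMillis));
    }

    // Sums the accumulated collection time reported by every collector.
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 when the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    // Overhead of the current JVM since startup.
    static double current() {
        return fraction(totalGcMillis(), ManagementFactory.getRuntimeMXBean().getUptime());
    }
}
```

For example, 50 ms of reported GC time over 1,000 ms of uptime gives an overhead of 0.05, or 5%.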
GC Selection Decision Tree
Choose GC based on application characteristics:
- G1GC: General-purpose applications, balanced throughput and latency
- ZGC: Low-latency requirements (<10ms pauses), scales to very large heaps (100GB and beyond)
- Shenandoah: Pause-sensitive applications, medium to large heaps
- Parallel GC: Batch processing, throughput-critical (not latency-sensitive)
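The list above can be restated as a simple lookup from workload profile to JVM flag. This is only a sketch of the guide's rule of thumb: the Workload categories and the class name CollectorChoice are hypothetical, while the flags themselves are standard HotSpot options.

```java
final class CollectorChoice {
    // Hypothetical workload categories mirroring the four bullets above.
    enum Workload { GENERAL_PURPOSE, LOW_LATENCY_LARGE_HEAP, PAUSE_SENSITIVE, BATCH_THROUGHPUT }

    // Maps a workload profile to the flag that selects the suggested collector.
    static String jvmFlag(Workload workload) {
        switch (workload) {
            case LOW_LATENCY_LARGE_HEAP:
                return "-XX:+UseZGC";
            case PAUSE_SENSITIVE:
                return "-XX:+UseShenandoahGC";
            case BATCH_THROUGHPUT:
                return "-XX:+UseParallelGC";
            case GENERAL_PURPOSE:
            default:
                return "-XX:+UseG1GC";
        }
    }
}
```

A real decision should still be validated against GC logs from a representative workload, since two applications in the same category can behave very differently.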
Variations
CPU Profiling with Async-Profiler
For deeper CPU profiling, use async-profiler (external tool):
wget https://github.com/jvm-profiling-tools/async-profiler/releases/download/v2.9/async-profiler-2.9-linux-x64.tar.gz
tar -xzf async-profiler-2.9-linux-x64.tar.gz
cd async-profiler-2.9-linux-x64
# Replace 12345 with the PID of the target JVM
./profiler.sh -d 60 -f flamegraph.html 12345
./profiler.sh -e cpu -d 60 -f cpu-flamegraph.html 12345
./profiler.sh -e alloc -d 60 -f alloc-flamegraph.html 12345
Heap Analysis
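Capture a heap dump with jmap, then inspect it with jhat on JDK 8 and earlier. jhat was removed in JDK 9, so on newer JDKs open the dump in a heap analyzer such as Eclipse MAT or VisualVM instead:
jmap -dump:format=b,file=heap.bin <pid>
jhat -port 7000 heap.bin   # JDK 8 and earlier only
Native Memory Tracking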
Track native memory usage (Java 8+):
java -XX:NativeMemoryTracking=summary MyApp
jcmd <pid> VM.native_memory summary
jcmd <pid> VM.native_memory baseline
jcmd <pid> VM.native_memory summary.diff
Common Pitfalls
Pitfall 1: Premature Optimization
Don’t optimize without profiling data.
// Bad: Optimizing without evidence
public String formatData(List<String> items) {
    // Complex optimization that may not help
    StringBuilder sb = new StringBuilder(items.size() * 100);
    // ...
}

// Good: Optimize after profiling shows it's a bottleneck
public String formatData(List<String> items) {
    // Simple, readable code first
    return String.join(", ", items);
}
Pitfall 2: Microbenchmark Pitfalls
JMH prevents common benchmarking mistakes, but you still need to avoid:
// Bad: Dead code elimination
@Benchmark
public void badBenchmark() {
    String result = "test".toUpperCase(); // Result not used - may be optimized away
}

// Good: Return or consume result
@Benchmark
public String goodBenchmark() {
    return "test".toUpperCase(); // Result returned
}

// Or use Blackhole (requires import org.openjdk.jmh.infra.Blackhole)
@Benchmark
public void goodBenchmarkBlackhole(Blackhole bh) {
    bh.consume("test".toUpperCase()); // Result consumed
}
Pitfall 3: Ignoring GC Logs
Always analyze GC logs to understand pause times and throughput:
java -Xlog:gc*:file=gc.log:time,uptime,level,tags \
    -XX:+UseG1GC \
    MyApp
Pitfall 4: Object Pooling Overhead
Object pooling can hurt performance if:
- Objects are cheap to create
- Pool management overhead exceeds object creation cost
- Pool causes contention in multithreaded scenarios
// Don't pool simple objects
// Bad: StringBuilder pooling for small operations
String result = pool.borrow().append("x").toString(); // Overhead not worth it

// Good: Direct creation for small operations
String result = new StringBuilder().append("x").toString();
Related Patterns
Related Tutorial: See Intermediate Tutorial - Performance Optimization for performance fundamentals and Advanced Tutorial - JVM Internals for deep JVM tuning.
Related How-To: See Work with Streams Effectively for stream performance patterns and Use Collections Effectively for collection performance.
Related Cookbook: See Cookbook recipes “Efficient String Operations” and “Concurrent Collections Performance” for copy-paste ready performance patterns.
Related Explanation: See Best Practices - Performance for overarching performance principles.
Further Reading
- JMH Samples - Official JMH example benchmarks
- G1GC Tuning Guide - Oracle’s G1GC documentation
- Java Performance: The Definitive Guide - Comprehensive performance book
- Async-Profiler GitHub - Low-overhead profiling tool