Production‑Ready JVM Tuning for Containerized Spring Boot Microservices.

1. Intro – Why JVM tuning inside containers is different

Most Spring Boot + Docker guides stop at a simple recipe: build a fat JAR, put it in a container, set -Xmx, and move on. That works for a local demo, but it hides an important detail: inside a container, the memory and CPU that matter are the Linux cgroup limits, not the resources of the full host machine. If we rely on the defaults, the JVM can make some pretty bad assumptions about how much memory it has available. In production this often shows up as seemingly random OOMKills, long GC pauses, spiky latency, and even higher cloud bills because we try to fix it by just throwing more memory at the problem.

In this article we’ll walk through a small Spring Boot service running in Docker, put it under a bit of load, and then tune the JVM settings specifically for a containerized environment. We’ll adjust heap sizing and garbage collection settings so the JVM behaves correctly inside a limited-memory container.

Finally, we’ll look at how those changes affect memory usage, CPU, and request latency using metrics, so you can see the impact clearly and apply the same ideas to your own services.
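Before tuning anything, it helps to check what the JVM itself believes about its environment. The sketch below prints the maximum heap and CPU count reported by the standard Runtime API. The class name JvmView is made up for this illustration and is not part of the demo project.

```java
// Sketch: print the resources this JVM believes it has.
// JvmView is a hypothetical helper class, not part of the demo service.
class JvmView {

    // Maximum heap the JVM will attempt to use, in MiB.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Number of processors the JVM considers available to it.
    static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("available CPUs: " + cpus());
    }
}
```

Running it once on the host and once inside a container started with a memory limit is a quick way to see how much the numbers move.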

2. Example service – simple Spring Boot REST API

To keep the focus on JVM behavior and not on business logic, we’ll use a very simple Spring Boot REST API as our example service. The goal of this service is just to allocate some memory and do a bit of fake work on each request so that the JVM actually has something to manage. Every time a client calls our /work endpoint, the application will create a byte array of random size, touch that memory, and hold on to some of it for a short time so the heap usage goes up and down. This pattern is obviously not what you would write in a real production service, but it’s perfect for demonstrating how the JVM allocates heap, runs garbage collection, and behaves when the service runs inside a memory‑limited container.

Controller

// File: DemoController.java
package com.example.jvmTuningDemo;            // Package for our demo

import org.springframework.web.bind.annotation.GetMapping;  // Import GetMapping
import org.springframework.web.bind.annotation.RestController; // Import RestController

import java.util.ArrayList;                   // Import ArrayList
import java.util.List;                        // Import List
import java.util.Random;                      // Import Random

@RestController                               // Mark this class as a REST controller
public class DemoController {                 // Define controller class

    private final Random random = new Random();           // Random generator for work
    private final List<byte[]> memoryHolder = new ArrayList<>(); // List to hold allocated memory

    @GetMapping("/work")                      // Map HTTP GET /work to this method
    public String doWork() {                  // Method to simulate some work
        int size = 100_000 + random.nextInt(500_000);  // Random size: 100,000 to 599,999 bytes
        byte[] data = new byte[size];        // Allocate a byte array of that size
        for (int i = 0; i < data.length; i++) { // Loop to touch the memory
            data[i] = (byte) (i % 128);      // Write some values to the array
        }

        int held;                            // Holder size to report in the response
        synchronized (memoryHolder) {        // ArrayList is not thread-safe; requests run concurrently
            memoryHolder.add(data);          // Store reference so GC cannot free immediately
            if (memoryHolder.size() > 50) {  // If we have stored more than 50 arrays
                memoryHolder.remove(0);      // Remove the oldest one to bound memory growth
            }
            held = memoryHolder.size();      // Read the size while still holding the lock
        }

        try {
            Thread.sleep(50 + random.nextInt(50));  // Sleep 50-99 ms to simulate downstream latency
        } catch (InterruptedException e) {  // Catch interrupted exception
            Thread.currentThread().interrupt();     // Restore interrupt flag
        }

        return "OK size=" + size + " holder=" + held; // Return response string
    }
}
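To get a feel for how much memory the holder list can pin, we can do the arithmetic directly. The sketch below assumes the controller's parameters (arrays between 100,000 and 599,999 bytes, at most 50 retained after trimming) and ignores array headers and any arrays still referenced by in-flight requests. HolderEstimate is a hypothetical helper, not part of the service.

```java
// Sketch: rough bounds on the memory retained by the controller's holder list.
// HolderEstimate is a hypothetical helper; the constants mirror the demo controller.
class HolderEstimate {
    static final int MIN_BYTES = 100_000;            // smallest array a request allocates
    static final int MAX_BYTES = 100_000 + 499_999;  // largest: nextInt(500_000) tops out at 499_999
    static final int MAX_HELD = 50;                  // arrays kept after the oldest is trimmed

    // Lower bound: every retained array has the minimum size.
    static long bestCaseBytes() {
        return (long) MAX_HELD * MIN_BYTES;
    }

    // Upper bound: every retained array has the maximum size.
    static long worstCaseBytes() {
        return (long) MAX_HELD * MAX_BYTES;
    }

    public static void main(String[] args) {
        System.out.printf("retained: %.1f MB to %.1f MB%n",
                bestCaseBytes() / 1e6, worstCaseBytes() / 1e6);
    }
}
```

So the long-lived part of the heap stays in the range of roughly 5 to 30 MB. Everything above that is short-lived garbage produced by each request, which is what keeps the collector busy under load.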

Application class

// File: JvmTuningDemoApplication.java
package com.example.jvmTuningDemo;           // Package name

import org.springframework.boot.SpringApplication; // Import SpringApplication
import org.springframework.boot.autoconfigure.SpringBootApplication; // Import annotation

@SpringBootApplication                        // Enable Spring Boot auto configuration
public class JvmTuningDemoApplication {       // Main application class

    public static void main(String[] args) {  // Main method
        SpringApplication.run(JvmTuningDemoApplication.class, args); // Start Spring Boot app
    }
}

3. Dockerfile - naive vs tuned

Naive Dockerfile (what most people do)

Now that we have a small service to play with, the next step is to containerize it. The first Dockerfile most of us write is extremely simple: take a base JDK image, copy the Spring Boot JAR into it, and run java -jar. This is usually enough to get the application running on our laptop or in a test cluster, so it feels “good enough” and we rarely question what the JVM is actually doing with memory inside that container. The important detail, however, is that this naive Dockerfile does not set any JVM options that are aware of container limits, which means the JVM is free to size its heap based on assumptions that may not match the real cgroup constraints. Let’s look at that naive Dockerfile first, and then we’ll see why it can lead to problems under load.

# File: Dockerfile-naive
# Base image with JDK 17
FROM eclipse-temurin:17-jdk-alpine

# Set working directory
WORKDIR /app

# Copy fat jar into image
COPY target/jvm-tuning-demo.jar app.jar

# Expose port 8080
EXPOSE 8080

# Run jar with default JVM options
ENTRYPOINT ["java","-jar","/app/app.jar"]

Run with small memory limit (Kubernetes-style):

To see how this naive image behaves under real constraints, we’ll run it with a relatively small memory limit, similar to what you might configure for a microservice in Kubernetes. In this example we cap the container at 512 MB of RAM and expose port 8080 so we can hit the /work endpoint from our machine. This setup mimics a pretty common situation in production: the service looks fine with no limit on a developer laptop, but starts to struggle once it’s deployed into a tightly sized container. The commands below build the naive image and then start a container with that 512 MB cap, which we’ll use as the baseline for our load tests.

# build
docker build -f Dockerfile-naive -t jvm-demo-naive .  # Build naive image

# run with 512 MB limit
docker run --rm -m 512m -p 8080:8080 jvm-demo-naive   # Run container with 512MB memory

Problems:

- The JVM might size the heap poorly for the container. Older JDKs (before 8u191 and 10) ignore the cgroup limit and size the heap from host memory, while container-aware JDKs default to a maximum heap of about 25% of the limit, which may not suit the workload.
- Under sustained load, the container can be OOMKilled by the kernel even though the application “looks fine” at startup.
- Garbage collection behavior may be suboptimal, leading to longer pauses and unpredictable latency.
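One way to check the last point is to ask the running JVM which collectors it actually selected. On HotSpot, a container with fewer than two CPUs or less than roughly 2 GB of memory is usually not treated as a server-class machine, and the JVM may then fall back to the Serial collector instead of G1. The sketch below uses only the standard java.lang.management API. GcInfo is a hypothetical class name.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Sketch: list the garbage collectors active in this JVM.
// GcInfo is a hypothetical helper, not part of the demo service.
class GcInfo {

    // Names of the collector beans registered by the running JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Count and accumulated time (ms) of collections since JVM start
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
    }
}
```

Bean names that start with "G1" indicate G1. Names such as "Copy" and "MarkSweepCompact" typically indicate the Serial collector.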

With the container running, the next step is to put a bit of pressure on the service so the JVM actually has to work. We don’t need a full-blown load-testing tool here; a simple loop that repeatedly calls the /work endpoint is enough to trigger allocations, garbage collection, and potential problems. The script below fires a bunch of requests in parallel and then waits for them to finish, which roughly mimics a burst of traffic hitting the service. You can run it from your terminal while the naive container is up and watch how the application behaves under this short, synthetic load.

You can simulate load using something like:

# Simple load using curl in a loop (or use hey / wrk)
for i in $(seq 1 1000); do              # Loop 1000 times
  curl -s http://localhost:8080/work > /dev/null &  # Fire a background request
done                                    # End loop
wait                                     # Wait for all background jobs
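The curl loop generates load but does not tell us anything about latency. If you want percentiles without installing another tool, a small Java client can do the same job. This is a sketch under assumptions: the concurrency of 50 is an arbitrary choice, WorkLoad is a made-up class name, and the percentile uses the simple nearest-rank method.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Sketch: fire requests at /work and report latency percentiles.
// WorkLoad is a hypothetical helper, not part of the demo service.
class WorkLoad {

    // Nearest-rank percentile over an already sorted array of latencies.
    static long percentile(long[] sortedMillis, double p) {
        if (sortedMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int rank = (int) Math.ceil(p / 100.0 * sortedMillis.length);
        return sortedMillis[Math.max(0, rank - 1)];
    }

    public static void main(String[] args) throws Exception {
        String url = args.length > 0 ? args[0] : "http://localhost:8080/work";
        int requests = 1000;   // same request count as the curl loop
        int concurrency = 50;  // assumed value; adjust for your machine

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        ExecutorService pool = Executors.newFixedThreadPool(concurrency);

        List<Future<Long>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            Callable<Long> timedCall = () -> {
                long start = System.nanoTime();
                client.send(request, HttpResponse.BodyHandlers.discarding());
                return (System.nanoTime() - start) / 1_000_000;  // elapsed milliseconds
            };
            pending.add(pool.submit(timedCall));
        }

        long[] latencies = new long[requests];
        int completed = 0;
        for (Future<Long> f : pending) {
            try {
                latencies[completed] = f.get();
                completed++;
            } catch (ExecutionException e) {
                // Request did not complete (connection refused, container killed, ...)
            }
        }
        pool.shutdown();

        if (completed == 0) {
            System.out.println("no request completed");
            return;
        }
        long[] sorted = Arrays.copyOf(latencies, completed);
        Arrays.sort(sorted);
        System.out.println("completed=" + completed + " failed=" + (requests - completed));
        System.out.println("p50=" + percentile(sorted, 50) + " ms, p95=" + percentile(sorted, 95)
                + " ms, max=" + sorted[sorted.length - 1] + " ms");
    }
}
```

Passing the URL as the first argument lets the same client target a container published on a different port.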

After running this script a few times, you’ll often start to see the weaknesses of the naive setup. The container might look stable at first, but as memory usage grows and GC starts working harder, you can run into sudden OOMKills or noticeable pauses in responses. From the outside it just feels like “sometimes the service is slow” or “the pod randomly dies,” but under the hood it’s usually the JVM making decisions based on the wrong assumptions about how much memory it really has. To fix that, we need to stop relying on the defaults and make the JVM explicitly aware of the container’s limits, which is exactly what we’ll do with a tuned Dockerfile and better JVM options in the next section.

4. Add metrics - Spring Boot + Micrometer + Prometheus

Before we start tweaking JVM options, it’s important to have some basic visibility into how the JVM is behaving. Otherwise we’re just guessing. Spring Boot makes this pretty easy through Actuator and Micrometer, and with a Prometheus registry we can export standard JVM metrics like heap usage, GC pauses, and thread counts. By wiring these pieces together, we can see how our naive container behaves under load and then compare it directly with the tuned version later. The dependencies below enable Actuator in our application and register a Prometheus endpoint so that any metrics backend—or even a simple local Prometheus and Grafana setup—can scrape and visualize what the JVM is doing over time.

<!-- Add actuator and prometheus registry -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>

<dependency>
    <groupId>io.micrometer</groupId>
    <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

# File: application.yml
management:
  endpoints:
    web:
      exposure:
        include: health,info,prometheus  # Expose prometheus endpoint
  endpoint:
    prometheus:
      enabled: true                      # Enable prometheus endpoint
  metrics:
    tags:
      application: jvm-tuning-demo       # Add common tag

With these properties in place, Spring Boot exposes a small HTTP endpoint that publishes all of the application’s metrics in Prometheus format. The health and info endpoints give you basic liveness details, while the prometheus endpoint is where all the JVM and HTTP metrics are scraped from. By adding a simple application tag, we also make it easier to filter and group metrics later when multiple services are running in the same cluster. At this point you can start the application, open the /actuator/prometheus URL in a browser, and you should see a long list of jvm_ and http_server_ metrics that we’ll use to compare the naive and tuned JVM configurations.
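It can help to know where the memory numbers come from. As far as I know, Micrometer derives the jvm_memory_used_bytes series from the JVM's standard memory pool MXBeans, split into heap and non-heap areas. The sketch below reads those beans directly, so you can cross-check the exported values. HeapPools is a hypothetical class name.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

// Sketch: sum used bytes per memory area from the standard memory pool beans.
// HeapPools is a hypothetical helper, not part of the demo service.
class HeapPools {

    // Total bytes currently used across all pools of the given type.
    static long usedBytes(MemoryType type) {
        long total = 0;
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();  // may be null if the pool is no longer valid
            if (usage != null && pool.getType() == type) {
                total += usage.getUsed();
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            long used = usage == null ? 0 : usage.getUsed();
            System.out.println(pool.getType() + " " + pool.getName() + " used=" + used);
        }
        System.out.println("heap total:     " + usedBytes(MemoryType.HEAP));
        System.out.println("non-heap total: " + usedBytes(MemoryType.NON_HEAP));
    }
}
```

The non-heap total is worth watching too, because metaspace and the code cache count against the container limit just like the heap does.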

5. Tuned Dockerfile - container-aware JVM settings

Now that we have some visibility into how the JVM behaves, we can build a more realistic Docker image that is actually aware of the container’s limits. Instead of letting the JVM guess the heap size and GC strategy, we’ll pass a small set of options that explicitly control how much memory the heap can use and how aggressively garbage collection should run. The goal is not to find the perfect magic numbers, but to show a sensible starting point that respects the container’s RAM limit and avoids surprise OOMKills under load. In the Dockerfile below, we keep the same base image and application JAR as before, but introduce a JAVA_OPTS environment variable with container‑aware flags. This tuned image will behave much more predictably when we give it the same 512 MB limit as the naive version, which makes it easy to compare the two side by side.

Tuned Dockerfile.

# File: Dockerfile-tuned
# Use container-aware JVM
FROM eclipse-temurin:17-jdk-alpine

# Set working directory
WORKDIR /app

# Copy jar
COPY target/jvm-tuning-demo.jar app.jar

# Optional: set JVM options via environment variable
ENV JAVA_OPTS="-XX:+UseContainerSupport \
 -XX:MaxRAMPercentage=70.0 \
 -XX:InitialRAMPercentage=70.0 \
 -XX:+UseG1GC \
 -XX:MaxGCPauseMillis=200 \
 -XX:+ExitOnOutOfMemoryError"

# Expose port
EXPOSE 8080

# Run java with tuned options; exec replaces the shell so java receives stop signals directly
ENTRYPOINT ["sh","-c","exec java $JAVA_OPTS -jar /app/app.jar"]

With this Dockerfile in place, we’re still building and running the application in almost the same way as before, but the JVM now starts with a much clearer picture of its environment. The heap is capped to a percentage of the container’s memory instead of whatever the JVM might infer from the host, and G1GC is instructed to aim for reasonably short pause times. We also tell the process to exit immediately on an out‑of‑memory error so that an orchestrator like Kubernetes can restart it quickly instead of leaving a half‑broken pod running. To keep the comparison fair, we’ll run this tuned image with the same 512 MB memory limit as the naive version and send it through the same load test, then look at how memory usage, GC pauses, and latency differ between the two.

Before we look at the results, it’s worth pausing on the JVM flags we’ve just added and why they matter in a containerized environment. These options are small, but they completely change how the JVM thinks about memory, garbage collection, and failure modes inside a pod. If you understand what each of them does, you can confidently adjust the values for your own services instead of copy‑pasting random snippets from the internet. The list below walks through the key flags from the tuned Dockerfile and explains, in plain language, what role each one plays in keeping your Spring Boot application stable inside a constrained container.

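Key flags:

-XX:+UseContainerSupport
- Make the JVM respect cgroup memory/CPU limits. This is already on by default since JDK 10 (and 8u191), so on JDK 17 it mainly documents intent.
-XX:MaxRAMPercentage=70.0
- Let the heap use at most ~70% of container memory, leaving room for metaspace, thread stacks, and native buffers.
-XX:InitialRAMPercentage=70.0
- Start with the heap already at that size so you don’t pay for repeated resize operations.
-XX:+UseG1GC & -XX:MaxGCPauseMillis=200
- Low-pause collector suited for services; 200 ms is a soft pause goal that G1 tries to meet, not a hard cap.
-XX:+ExitOnOutOfMemoryError
- Exit on the first java.lang.OutOfMemoryError so Kubernetes can restart the process quickly instead of letting it limp. This covers JVM-level errors only; a kernel OOMKill still terminates the container from outside.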
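The percentage flags are easier to reason about as concrete numbers. The sketch below works out the heap ceiling for a 512 MiB container at 70% and, for comparison, at the JVM's default MaxRAMPercentage of 25. It is plain arithmetic under the assumption that the percentage is applied directly to the container limit; the real JVM also aligns the result to its own granularity, so actual values can differ slightly. HeapBudget is a hypothetical class name.

```java
// Sketch: translate MaxRAMPercentage into bytes for a given container limit.
// HeapBudget is a hypothetical helper; the JVM's real sizing also applies alignment.
class HeapBudget {

    // Heap ceiling implied by a percentage of the container memory limit.
    static long heapBytes(long containerBytes, double maxRamPercentage) {
        return (long) (containerBytes * maxRamPercentage / 100.0);
    }

    public static void main(String[] args) {
        long limit = 512L * 1024 * 1024;        // docker run -m 512m
        long tuned = heapBytes(limit, 70.0);    // value from the tuned Dockerfile
        long dflt = heapBytes(limit, 25.0);     // JVM default when no flag is given

        System.out.printf("tuned heap ceiling:   %.1f MiB%n", tuned / 1048576.0);
        System.out.printf("default heap ceiling: %.1f MiB%n", dflt / 1048576.0);
        System.out.printf("left for non-heap:    %.1f MiB%n", (limit - tuned) / 1048576.0);
    }
}
```

At 70% the heap may grow to about 358 MiB, leaving roughly 154 MiB for metaspace, thread stacks, code cache, and direct buffers. If your service uses many threads or large native buffers, that remainder is the number to watch when choosing the percentage.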

With these options in place, the JVM is no longer guessing how to behave inside the container; it has clear boundaries and goals. The heap is sized as a proportion of the container’s memory, garbage collection is tuned for shorter pauses, and any fatal memory issues cause the process to exit quickly instead of limping along in an unhealthy state. None of this is exotic tuning; it’s just a small set of sensible defaults that match how microservices are actually deployed on Kubernetes or other container platforms. In the next step, we’ll run this tuned image under the same 512 MB limit and the same load test as before, and then look at the metrics to see how the behavior changes compared to the naive configuration.

Run with same memory limit:

docker build -f Dockerfile-tuned -t jvm-demo-tuned .      # Build tuned image
docker run --rm -m 512m -p 8081:8080 jvm-demo-tuned       # Run tuned container

With the tuned image built, we run it under exactly the same conditions as before so the comparison is fair. The commands above start the container with the same 512 MB memory limit and publish it on host port 8081 instead of 8080, but otherwise nothing about the workload changes. We’ll hit the /work endpoint again using the same simple load script, with the URL changed to http://localhost:8081/work, and then watch how the JVM behaves through the metrics we exposed earlier. By keeping the test scenario identical, any differences we see in heap usage, GC pauses, or request latency can be attributed to the JVM tuning rather than to changes in traffic or environment.

Hit /work again with the same load script and collect:

- Heap used over time: jvm_memory_used_bytes{area="heap",...}
- GC pause counts and durations: jvm_gc_pause_seconds_count (durations come from the matching jvm_gc_pause_seconds_sum and jvm_gc_pause_seconds_max series)
- Request latency: http_server_requests_seconds_max{uri="/work"}

In practice, you’ll probably visualize these metrics in Grafana or another dashboarding tool, but even a quick look at the raw Prometheus output is enough to spot the differences. For the naive setup, you should see heap usage creeping close to the container limit, more frequent or longer GC pauses, and higher max latency for the /work endpoint as the JVM struggles to keep up. With the tuned image, those same metrics should look noticeably healthier: heap stays within a safer band, GC pauses are shorter or less disruptive, and request latency becomes more predictable. With that context in mind, we can now compare the two configurations side by side and summarize what actually changed.
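If you only have the raw /actuator/prometheus text to work with, a few lines of Java are enough to pull out a number such as the mean GC pause. The sketch below sums all samples of one metric name across its label sets. The sample lines and their values are invented for illustration, and the parser assumes the plain "name{labels} value" form without timestamps. PromScrape is a hypothetical class name.

```java
// Sketch: sum the samples of one metric from Prometheus text exposition output.
// PromScrape and the SAMPLE values are illustrative, not real measurements.
class PromScrape {

    // Invented sample resembling GC pause output; real label sets may differ.
    static final String SAMPLE = String.join("\n",
            "# TYPE jvm_gc_pause_seconds summary",
            "jvm_gc_pause_seconds_count{action=\"end of minor GC\"} 12.0",
            "jvm_gc_pause_seconds_sum{action=\"end of minor GC\"} 0.5",
            "jvm_gc_pause_seconds_count{action=\"end of major GC\"} 1.0",
            "jvm_gc_pause_seconds_sum{action=\"end of major GC\"} 0.25");

    // Adds up the values of every sample line whose metric name matches exactly.
    static double sum(String exposition, String metric) {
        double total = 0;
        for (String raw : exposition.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;  // skip blank lines and HELP/TYPE comments
            }
            // The metric name ends at '{' (labels follow) or at the first space (no labels)
            int brace = line.indexOf('{');
            int space = line.indexOf(' ');
            int end = (brace >= 0 && (space < 0 || brace < space)) ? brace : space;
            if (end < 0 || !line.substring(0, end).equals(metric)) {
                continue;
            }
            // Assumes the value is the last token on the line (no trailing timestamp)
            total += Double.parseDouble(line.substring(line.lastIndexOf(' ') + 1));
        }
        return total;
    }

    public static void main(String[] args) {
        double count = sum(SAMPLE, "jvm_gc_pause_seconds_count");
        double seconds = sum(SAMPLE, "jvm_gc_pause_seconds_sum");
        System.out.printf("pauses=%.0f meanPauseMs=%.1f%n", count, seconds / count * 1000);
    }
}
```

Feeding it the output saved from each container gives two mean pause figures you can compare directly, although a dashboard remains the better tool for anything beyond a quick check.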

6. Compare naive vs tuned - what you actually show.

At this point we’ve run both versions of the service through the same test, so we can finally step back and compare how they behave. Rather than drowning in every single metric, it’s more useful to focus on a few practical questions: how close does each setup get to its memory limit, how often and how long does GC pause the world, and what does all of that do to request latency? When you line up the naive and tuned containers side by side with those questions in mind, the differences become much easier to see and to explain to your team. The summary below captures the kind of patterns you’re likely to observe and gives you a simple way to talk about why JVM tuning for containers actually matters in day‑to‑day production work.

Findings:

Naive container:
- Heap grows near cgroup limit.
- Occasional OOMKills when load spikes.
- Longer GC pauses (e.g., > 800 ms).
- P95 latency for /work around 400–500 ms.
Tuned container:
- Heap capped at ~350 MB inside the 512 MB container.
- No OOMKills in the same test.
- GC pauses capped around 200–250 ms.
- P95 latency drops to ~250–300 ms.
- CPU usage a bit smoother (fewer spikes from stop-the-world GC).

These numbers will look a bit different in every environment, but the pattern is usually the same: the naive configuration feels fine until it suddenly doesn’t, while the tuned configuration behaves in a steadier, more predictable way that operators can actually trust. Once you see the two side by side, it becomes much easier to justify a small amount of JVM tuning work whenever a new Spring Boot service is deployed to containers. Instead of reacting to random OOMKills or vague complaints about latency, you can point to concrete changes in heap usage, GC pauses, and request times that came from a handful of well‑chosen flags. From there, you can continue to refine the settings for your own traffic patterns, but even this basic level of tuning is a big step up from running everything on JVM defaults.

7. Checklist for readers

If you take nothing else away from this article, let it be this: the JVM is a powerful runtime, but it doesn’t magically understand the constraints of your containers unless you tell it how to behave. A small amount of deliberate configuration—container‑aware heap sizing, a sensible GC choice, basic metrics, and tests that run inside real memory limits—can turn a fragile Spring Boot deployment into something much more predictable.

- Always run Java in containers with container‑aware settings (e.g. -XX:+UseContainerSupport and -XX:MaxRAMPercentage), instead of relying on JVM defaults.
- Set explicit memory limits for every Spring Boot service and size the heap as a percentage of that limit so there’s room left for metaspace, threads, and native buffers.
- Pick a GC that matches microservice workloads (G1GC is a good default) and set a realistic pause target rather than chasing “zero pauses.”
- Expose JVM and HTTP metrics by default using Spring Boot Actuator and Micrometer, and make sure jvm_* and http_server_* metrics are scraped in your monitoring stack.
- Test under load inside real containers, with the same memory limits and startup options you plan to use in production, before you ship a new service.
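The first two items can be turned into a small startup sanity check that compares the JVM's maximum heap with the container's memory limit. This is a sketch under assumptions: it reads the cgroup v2 file /sys/fs/cgroup/memory.max (cgroup v1 hosts use a different path), the 80% warning threshold is an arbitrary choice, and HeadroomCheck is a made-up class name.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.OptionalLong;

// Sketch: warn when the max heap takes too large a share of the container limit.
// HeadroomCheck is a hypothetical helper; the 0.80 threshold is an assumption.
class HeadroomCheck {

    // cgroup v2 stores either a byte count or the literal "max" (no limit set).
    static OptionalLong parseLimit(String raw) {
        String s = raw.trim();
        if (s.isEmpty() || s.equals("max")) {
            return OptionalLong.empty();
        }
        return OptionalLong.of(Long.parseLong(s));
    }

    // Fraction of the container limit that the maximum heap would occupy.
    static double heapShare(long maxHeapBytes, long limitBytes) {
        return (double) maxHeapBytes / limitBytes;
    }

    public static void main(String[] args) throws Exception {
        Path file = Path.of("/sys/fs/cgroup/memory.max");  // cgroup v2 location
        if (!Files.isReadable(file)) {
            System.out.println("no cgroup v2 memory.max found; skipping check");
            return;
        }
        OptionalLong limit = parseLimit(Files.readString(file));
        if (limit.isEmpty()) {
            System.out.println("container has no memory limit set");
            return;
        }
        double share = heapShare(Runtime.getRuntime().maxMemory(), limit.getAsLong());
        System.out.printf("max heap is %.0f%% of the container limit%n", share * 100);
        if (share > 0.80) {
            System.out.println("WARNING: little room left for metaspace, threads, and native buffers");
        }
    }
}
```

Running a check like this in a test stage catches the case where someone raises the heap percentage or lowers the container limit without adjusting the other.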

The checklist above is a simple starting point you can apply to every new microservice, and you can refine it over time as you learn more about your own traffic patterns and performance goals. The important part is to stop treating JVM tuning as an afterthought and start treating it as a normal, repeatable part of how you ship Java services to production.
