Building a Private Local AI Agent with Java 21 and Spring Boot

I've just released minimal-java-agent, an open-source template for building AI-powered agents using Java 21 and Spring Boot 4.0.3. The focus is on keeping it minimal, performant, and 100% local.

AI chat using Postman on localhost

Tech Stack

  • Java 21 + Spring Boot 4.0.3: Latest LTS with production-ready features
  • Virtual Threads: Lightweight concurrency for high-throughput scenarios
  • Spring WebFlux: Non-blocking reactive stack for optimal resource utilization
  • Ollama + Spring AI: Integration with local LLMs via Spring's ChatClient abstraction
  • 100% Test Coverage: Comprehensive unit tests with JaCoCo reporting
  • Multi-stage Dockerfile: Optimized container builds for deployment
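
Virtual threads need nothing beyond the JDK 21 standard library. As a minimal, self-contained sketch of the concurrency model the template leans on (class and method names here are illustrative, not taken from the repo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class VirtualThreadDemo {

    // Run n blocking tasks, one virtual thread each, and collect their results.
    // Virtual threads are cheap enough to spawn per request, which suits
    // handlers that block on LLM I/O.
    static List<String> runAll(int n) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = IntStream.range(0, n)
                    .mapToObj(i -> executor.submit(() ->
                            Thread.currentThread() + " handled request " + i))
                    .toList();
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        runAll(3).forEach(System.out::println);
    }
}
```

Each result string starts with something like VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1, which is a quick way to confirm the work really ran on virtual threads.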

Architecture

The project follows a clean separation of concerns:

  • controller/: REST endpoints for agent interaction
  • component/: Core services (instance ID, thread info)
  • model/: Data transfer objects
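
With Java 21, the DTOs in model/ can be plain records. A hypothetical sketch (field names inferred from the controller snippet in this post; the repo's actual definitions may differ):

```java
// Hypothetical response DTO as a Java record: immutable, with accessors,
// equals/hashCode, and toString generated by the compiler.
public record ChatResponse(
        String instanceId,
        String agentName,
        String threadInfo,
        String response) {

    // Example of the kind of thread metadata a component/ service could report.
    static String currentThreadInfo() {
        Thread t = Thread.currentThread();
        return t.getName() + " (virtual=" + t.isVirtual() + ")";
    }
}
```

Records keep the model layer to a few lines per type, which fits the "minimal" goal of the template.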

Code Example

The reactive controller uses WebFlux to handle requests asynchronously:

@PostMapping("/api/chat")
public Mono<ChatResponse> chat(@RequestBody String message) {
    return chatClient.prompt(message)
            .stream()                          // stream tokens from the local model
            .content()                         // Flux<String> of token chunks
            .collect(Collectors.joining())     // join chunks into the full reply
            .map(llmResponse -> new ChatResponse(
                agentInfo.getInstanceId(),
                agentInfo.getAgentName(),
                agentInfo.getThreadInfo(),
                llmResponse
            ));
}

Local LLM Integration

The agent connects to Ollama running locally. All processing happens on your machine — no data leaves your environment.
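
Under the hood, Spring AI's ChatClient talks to Ollama's local REST API, so you never need to construct the wire call yourself. Purely to illustrate what that call looks like, here is a sketch that builds (but does not send) a request against Ollama's generate endpoint:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OllamaRequestSketch {

    // Build a POST against Ollama's local generate endpoint.
    // stream=false asks for a single JSON response instead of a token stream.
    static HttpRequest buildGenerateRequest(String model, String prompt) {
        String body = """
                {"model": "%s", "prompt": "%s", "stream": false}
                """.formatted(model, prompt);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildGenerateRequest("llama3.2:1b-q4_0", "Hello");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Port 11434 is Ollama's default, matching the docker run mapping in the setup below.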

One-time setup:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3.2:1b-q4_0

Test the agent:

curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: text/plain" \
  -d "Hello"

Performance on Mac Mini M4

  • Model: llama3.2:1b-q4_0 (~600 MB)
  • Response speed: 30-40 tokens/second
  • Memory usage: ~1 GB total (Ollama + Spring Boot)
  • CPU usage: < 10% during inference
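
To put the throughput numbers in perspective: at 30-40 tokens/second, a typical 150-token answer arrives in roughly 4-5 seconds. A back-of-the-envelope helper (token count is an assumed example; the rates come from the measurements above):

```java
public class LatencyEstimate {

    // Seconds to generate `tokens` tokens at `tokensPerSecond` throughput.
    static double responseSeconds(int tokens, double tokensPerSecond) {
        return tokens / tokensPerSecond;
    }

    public static void main(String[] args) {
        // 150-token reply at the measured 30 and 40 tok/s bounds
        System.out.printf("worst case: %.2f s%n", responseSeconds(150, 30)); // 5.00 s
        System.out.printf("best case:  %.2f s%n", responseSeconds(150, 40)); // 3.75 s
    }
}
```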

Key Features

  • ✅ 100% local, no cloud dependencies
  • ✅ Virtual threads for efficient concurrency
  • ✅ Reactive streaming responses
  • ✅ Full test coverage
  • ✅ Docker-ready with multi-stage builds
  • ✅ Clean, minimal architecture

Repository

GitHub: github.com/carlosquijano/minimal-java-agent
License: Apache 2.0

#Java #SpringBoot #AI #Ollama #OpenSource #VirtualThreads #WebFlux

