Building a Private Local AI Agent with Java 21 and Spring Boot

I've just released minimal-java-agent, an open-source template for building AI-powered agents using Java 21 and Spring Boot 4.0.3. The focus is on keeping it minimal, performant, and 100% local.

AI chat using Postman on localhost

Tech Stack

  • Java 21 + Spring Boot 4.0.3: Latest LTS with production-ready features
  • Virtual Threads: Lightweight concurrency for high-throughput scenarios
  • Spring WebFlux: Non-blocking reactive stack for optimal resource utilization
  • Ollama + Spring AI: Integration with local LLMs via Spring's ChatClient abstraction
  • 100% Test Coverage: Comprehensive unit tests with JaCoCo reporting
  • Multi-stage Dockerfile: Optimized container builds for deployment
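
Virtual threads need nothing beyond the JDK 21 standard library. As a minimal, self-contained sketch of the concurrency model the template leans on (class and method names here are illustrative, not taken from the repo):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.IntStream;

public class VirtualThreadDemo {

    // Run n blocking tasks, one virtual thread each, and collect their results.
    // Virtual threads are cheap enough to spawn per request, which suits
    // handlers that block on LLM I/O.
    static List<String> runAll(int n) throws Exception {
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = IntStream.range(0, n)
                    .mapToObj(i -> executor.submit(() ->
                            Thread.currentThread() + " handled request " + i))
                    .toList();
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get());
            }
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        runAll(3).forEach(System.out::println);
    }
}
```

Each result string starts with something like VirtualThread[#21]/runnable@ForkJoinPool-1-worker-1, which is a quick way to confirm the work really ran on virtual threads.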

Architecture

The project follows a clean separation of concerns:

  • controller/: REST endpoints for agent interaction
  • component/: Core services (instance ID, thread info)
  • model/: Data transfer objects
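
With Java 21, the DTOs in model/ can be plain records. A hypothetical sketch (field names inferred from the controller snippet in this post; the repo's actual definitions may differ):

```java
// Hypothetical response DTO as a Java record: immutable, with accessors,
// equals/hashCode, and toString generated by the compiler.
public record ChatResponse(
        String instanceId,
        String agentName,
        String threadInfo,
        String response) {

    // Example of the kind of thread metadata a component/ service could report.
    static String currentThreadInfo() {
        Thread t = Thread.currentThread();
        return t.getName() + " (virtual=" + t.isVirtual() + ")";
    }
}
```

Records keep the model layer to a few lines per type, which fits the "minimal" goal of the template.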

Code Example

The reactive controller uses WebFlux to handle requests asynchronously:

@PostMapping("/api/chat")
public Mono<ChatResponse> chat(@RequestBody String message) {
    return chatClient.prompt(message)
            .stream()                          // stream tokens from the local model
            .content()                         // Flux<String> of token chunks
            .collect(Collectors.joining())     // join chunks into the full reply
            .map(llmResponse -> new ChatResponse(
                agentInfo.getInstanceId(),
                agentInfo.getAgentName(),
                agentInfo.getThreadInfo(),
                llmResponse
            ));
}

Local LLM Integration

The agent connects to Ollama running locally. All processing happens on your machine — no data leaves your environment.
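
Under the hood, Spring AI's ChatClient talks to Ollama's local REST API, so you never need to construct the wire call yourself. Purely to illustrate what that call looks like, here is a sketch that builds (but does not send) a request against Ollama's generate endpoint:

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OllamaRequestSketch {

    // Build a POST against Ollama's local generate endpoint.
    // stream=false asks for a single JSON response instead of a token stream.
    static HttpRequest buildGenerateRequest(String model, String prompt) {
        String body = """
                {"model": "%s", "prompt": "%s", "stream": false}
                """.formatted(model, prompt);
        return HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:11434/api/generate"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        HttpRequest req = buildGenerateRequest("llama3.2:1b-q4_0", "Hello");
        System.out.println(req.method() + " " + req.uri());
    }
}
```

Port 11434 is Ollama's default, matching the docker run mapping in the setup below.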

One-time setup:

docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
docker exec -it ollama ollama pull llama3.2:1b-q4_0

Test the agent:

curl -X POST http://localhost:8080/api/chat \
  -H "Content-Type: text/plain" \
  -d "Hello"

Performance on Mac Mini M4

  • Model: llama3.2:1b-q4_0 (~600 MB)
  • Response speed: 30-40 tokens/second
  • Memory usage: ~1 GB total (Ollama + Spring Boot)
  • CPU usage: < 10% during inference
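
To put the throughput numbers in perspective: at 30-40 tokens/second, a typical 150-token answer arrives in roughly 4-5 seconds. A back-of-the-envelope helper (token count is an assumed example; the rates come from the measurements above):

```java
public class LatencyEstimate {

    // Seconds to generate `tokens` tokens at `tokensPerSecond` throughput.
    static double responseSeconds(int tokens, double tokensPerSecond) {
        return tokens / tokensPerSecond;
    }

    public static void main(String[] args) {
        // 150-token reply at the measured 30 and 40 tok/s bounds
        System.out.printf("worst case: %.2f s%n", responseSeconds(150, 30)); // 5.00 s
        System.out.printf("best case:  %.2f s%n", responseSeconds(150, 40)); // 3.75 s
    }
}
```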

Key Features

  • ✅ 100% local, no cloud dependencies
  • ✅ Virtual threads for efficient concurrency
  • ✅ Reactive streaming responses
  • ✅ Full test coverage
  • ✅ Docker-ready with multi-stage builds
  • ✅ Clean, minimal architecture

Repository

GitHub: github.com/carlosquijano/minimal-java-agent
License: Apache 2.0

#Java #SpringBoot #AI #Ollama #OpenSource #VirtualThreads #WebFlux

