πŸ—ΊοΈ SPRING AI β€” COMPLETE COMPREHENSIVE ROADMAP

From Beginner to Advanced | 2025–2026 Edition

Version: 2025-2026 | Last Updated: March 2026 | Purpose: Educational and Development Roadmap

1. WHAT IS SPRING AI?

Spring AI is an application framework for AI Engineering built on top of the Spring ecosystem. It is the Java/Spring answer to Python's LangChain and LlamaIndex β€” designed to make enterprise-grade AI application development accessible to the world's largest base of Java developers.

1.1 Core Mission

  • Apply Spring ecosystem principles (portability, modularity, POJO-based design) to the AI domain.
  • Connect enterprise Data and APIs with AI Models through a clean, unified API.
  • Enable Java developers to build production-ready AI applications without switching languages.

1.2 Working Principle

Spring AI works on a Provider-Abstraction-Client model:

  1. AI Provider Layer β†’ OpenAI, Anthropic, Azure OpenAI, Google Gemini, Amazon Bedrock, Ollama, Mistral, etc.
  2. Spring AI Abstraction Layer β†’ Unified interfaces: ChatModel, EmbeddingModel, ImageModel, etc.
  3. Application Layer β†’ Your Spring Boot application using ChatClient, VectorStore, Advisors, Tools, etc.

The framework auto-configures AI model clients via Spring Boot starters. Developers interact with provider-agnostic interfaces, allowing seamless provider switching without business logic rewrites.

1.3 Key Design Principles

  • Portability: Switch AI providers without changing application code.
  • Modularity: Use only the AI features you need via starter dependencies.
  • POJO-based: Map AI outputs directly to Java objects (Structured Output).
  • Production-ready: Built-in observability, evaluation, memory, and ETL pipelines.

2. ARCHITECTURE DEEP-DIVE

2.1 High-Level Architecture Layers

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Your Spring Boot Application β”‚ β”‚ (REST APIs, Services, Repositories, Controllers) β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Spring AI Core Layer β”‚ β”‚ ChatClient β”‚ Advisors API β”‚ Tool Calling β”‚ β”‚ VectorStore β”‚ RAG Pipeline β”‚ Memory β”‚ β”‚ ETL Framework β”‚ Evaluation β”‚ Observability β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ Model Abstraction Interfaces β”‚ β”‚ ChatModel β”‚ EmbeddingModel β”‚ ImageModel β”‚ β”‚ AudioModel β”‚ ModerationModel β”‚ StreamingChatModelβ”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜ β”‚ β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β” β”‚ AI Provider Integrations β”‚ β”‚ OpenAI β”‚ Anthropic β”‚ Azure OpenAI β”‚ Google β”‚ β”‚ Amazon Bedrock β”‚ Ollama β”‚ Mistral β”‚ Groq β”‚ β”‚ Hugging Face β”‚ Perplexity β”‚ ZhiPu β”‚ Moonshot β”‚ β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

2.2 Core Components

ChatClient API
  • The primary entry point for AI interactions.
  • Fluent builder-style API (similar to WebClient/RestClient).
  • Supports system prompts, user prompts, conversation memory, and advisors.
  • Supports synchronous and reactive (streaming) responses.
ChatModel Interface
  • Provider-agnostic abstraction over any LLM.
  • Implementations: OpenAiChatModel, AnthropicChatModel, AzureOpenAiChatModel, OllamaChatModel, etc.
  • Returns ChatResponse containing Generation objects.
EmbeddingModel Interface
  • Converts text to vector embeddings.
  • Used by RAG pipelines to semantically index and search documents.
  • Implementations for OpenAI, Azure, Ollama, HuggingFace, etc.
VectorStore Interface
  • Stores and retrieves vector embeddings with semantic similarity search.
  • Supported Stores: PGVector, Chroma, Milvus, Redis, Pinecone, Weaviate, Qdrant, MongoDB Atlas, Neo4j, Oracle, Azure AI Search, OpenSearch, Apache Cassandra, Elasticsearch, GemFire.
Advisors API
  • Encapsulates recurring patterns: RAG, memory, logging, safety guardrails.
  • Key Advisors:
    • QuestionAnswerAdvisor: Injects retrieved context into prompts (RAG).
    • MessageChatMemoryAdvisor: Manages conversation history.
    • PromptChatMemoryAdvisor: Injects memory into prompts.
    • SafeGuardAdvisor: Blocks sensitive content.
    • SimpleLoggerAdvisor: Logs requests and responses.
    • ReReadingAdvisor: Implements Re-reading prompt technique.
Tool Calling / Function Calling
  • Enables AI models to invoke Java methods at runtime.
  • @Tool annotation marks Spring beans as callable tools.
  • Spring AI auto-generates JSON schema from method signatures.
  • Supports tool resolution, error handling, and result injection.
Document ETL Pipeline
  • Extract: DocumentReader implementations (PDF, Word, HTML, CSV, JSON, YouTube, GitHub, S3, etc.)
  • Transform: TextSplitter, MetadataEnricher, TokenCountEstimator, etc.
  • Load: VectorStore with embedding generation.
RAG (Retrieval-Augmented Generation)
  • Simple RAG: QuestionAnswerAdvisor with a VectorStore.
  • Modular RAG: Fully customizable pipeline β€” Query Analysis β†’ Retrieval β†’ Post-Retrieval β†’ Augmentation β†’ Generation.
  • Components: QueryTransformer, DocumentRetriever, DocumentPostProcessor, ContextualQueryAugmenter.
Memory Management
  • InMemoryChatMemoryRepository: Session-based in-memory storage.
  • JdbcChatMemoryRepository: Database-backed conversation persistence.
  • RedisChatMemoryRepository: Redis-backed memory (added in 2.0-M1).
  • CassandraChatMemoryRepository: Cassandra-backed memory.
Model Context Protocol (MCP)
  • Standard protocol connecting AI models to external tools and data sources.
  • Spring AI provides MCP Client and MCP Server implementations.
  • @Tool annotation automatically exposes Spring beans as MCP-compliant tools.
  • Supports stdio and HTTP-based SSE transports.
  • OAuth2-secured MCP server connections.
Observability
  • Built on Micrometer for metrics and tracing.
  • Tracks token usage, latency, model parameters, and request metadata.
  • Integrates with Zipkin, Jaeger, Prometheus, Grafana, and OpenTelemetry.
Structured Output
  • Maps LLM text responses to Java objects using BeanOutputConverter, MapOutputConverter.
  • Uses Jackson for JSON marshalling.
  • Enables type-safe AI responses as POJOs.
AI Model Evaluation
  • EvaluationRequest / EvaluationResponse model.
  • RelevancyEvaluator: Checks if response is relevant to the query.
  • FactCheckingEvaluator: Validates factual correctness.
  • Used for automated quality assurance of AI outputs.

2.3 Spring AI Agents Architecture

Agents combine Planning + Memory + Actions to solve user tasks autonomously.

Workflow Agents (Predictable)
  • LLMs and tools orchestrated through predefined, prescriptive paths.
  • Better for well-defined, repeatable tasks.
  • Components: Chain of tools, conditional branching, retry logic.
Autonomous Agents (Flexible)
  • LLMs decide which tools to use and in what order.
  • Better for open-ended, exploratory tasks.
  • Components: ReAct (Reason + Act) loop, tool pool, termination conditions.
Agent Patterns
  • ReAct Agent: Thought β†’ Action β†’ Observation loop.
  • Plan-and-Execute Agent: First plan all steps, then execute.
  • Reflection Agent: Self-evaluates and re-runs if response quality is low.
  • Multi-Agent: Multiple specialized agents collaborating via MCP.

3. HARDWARE & INFRASTRUCTURE REQUIREMENTS

3.1 Development Environment

Minimum Requirements (Local Development with Cloud APIs)
  • CPU: 4-core modern processor (Intel i5 / AMD Ryzen 5 or better)
  • RAM: 16 GB (8 GB minimum, 32 GB recommended for large projects)
  • Storage: 50 GB SSD free space
  • OS: Windows 10+, macOS 12+, Ubuntu 20.04+
  • Java: JDK 17 minimum (JDK 21+ recommended for virtual threads)
  • Build Tool: Maven 3.9+ or Gradle 8+
  • IDE: IntelliJ IDEA (recommended), VS Code + Java Extension Pack, Eclipse
For Local Model Inference (Ollama)
  • CPU: 8-core modern processor (Apple M-series or AMD Ryzen 7+)
  • RAM: 32 GB minimum (64 GB for large models like Llama 3.1 70B)
  • GPU: NVIDIA RTX 3060+ (12 GB VRAM) for GPU acceleration
    • RTX 3090 / RTX 4090 (24 GB VRAM): for 70B models
    • Apple M2 Ultra / M3 Max: unified memory handles 70B models efficiently
  • Storage: 100–500 GB SSD (models range from 4 GB to 140 GB)

3.2 Production Infrastructure

Cloud API–Based Deployment (Recommended for Most Teams)
  • AWS EC2 / Azure VM / GCP Compute: t3.medium to t3.xlarge (2–4 vCPU, 4–16 GB RAM)
  • Kubernetes: Recommended for scaling β€” HPA based on token usage metrics
  • Docker: Spring Boot containerized with Docker
  • Databases: PostgreSQL (pgvector), Redis, MongoDB Atlas for memory and vector storage
Self-Hosted LLM Inference (Enterprise)
  • GPU Servers: NVIDIA A100 (80 GB), H100 (80 GB), or RTX A6000 (48 GB)
  • Memory: 256–512 GB RAM for large inference clusters
  • Network: 10 Gbps+ internal networking for distributed inference
  • Software: vLLM, Ollama, llama.cpp, TGI (Text Generation Inference), Triton Inference Server
Vector Database Infrastructure
  • PGVector: Extensions on existing PostgreSQL β€” minimal additional hardware
  • Pinecone / Weaviate Cloud: SaaS β€” no hardware management
  • Chroma: Lightweight, good for development; needs 8 GB+ RAM in production
  • Milvus Distributed: Kubernetes cluster β€” requires etcd + MinIO + multiple nodes

3.3 Networking & Security Requirements

  • Outbound HTTPS (443): Required for cloud AI API calls (OpenAI, Anthropic, Azure, AWS)
  • API Keys: Stored in environment variables or secrets management (HashiCorp Vault, AWS Secrets Manager)
  • mTLS: For secure MCP server connections
  • OAuth2: For MCP server authentication (Spring AI 1.1+)
  • Rate Limiting: Configure per API provider's rate limits (tokens per minute, requests per minute)
  • VPC/Private Networking: For enterprise deployments connecting on-premises databases

4. STRUCTURED LEARNING PATH

PHASE 0 β€” FOUNDATIONS (Weeks 1–3)

4.0.1 Java & Spring Boot Prerequisites
  • Java 17+ features: Records, Sealed Classes, Pattern Matching, Text Blocks, Virtual Threads
  • Spring Boot 3.x: Auto-configuration, Starters, Application Properties, Profiles
  • Spring Web MVC: @RestController, @GetMapping, @PostMapping, ResponseEntity
  • Spring WebFlux: Reactive Streams, Mono, Flux (for streaming AI responses)
  • Spring Data JPA: Repositories, Entities, JPQL (for memory persistence)
  • Maven / Gradle: Dependency management, BOM imports, build lifecycle
  • Docker Basics: Containers, images, docker-compose for local services
4.0.2 AI/ML Concepts for Developers
  • What are Large Language Models (LLMs)? Transformers, tokens, context windows
  • Prompt Engineering: System prompts, user prompts, few-shot examples, chain-of-thought
  • Temperature & Sampling: What temperature, top-p, and top-k mean for output quality
  • Embeddings: What are vector embeddings? Semantic similarity, cosine distance
  • Tokens & Pricing: How LLM APIs charge per token; input vs output tokens
  • Hallucinations: What they are, why they happen, how to mitigate them
  • RAG Basics: Why retrieval-augmented generation reduces hallucinations
  • Fine-Tuning vs Prompting: When to use each approach
4.0.3 Spring AI Introduction
  • What Spring AI is and what problems it solves
  • Comparison with LangChain (Python) and LangChain4j (Java)
  • Spring AI project structure, GitHub repository, documentation
  • Spring Initializr: Creating a Spring AI project at start.spring.io
  • Adding Spring AI BOM and starter dependencies
  • Basic project setup: API key configuration in application.yaml
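A minimal application.yaml for the OpenAI starter might look like this (a sketch: the property keys follow the OpenAI starter; other providers use their own spring.ai.<provider> prefixes, and exact keys vary by version):

```yaml
spring:
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}   # read from an environment variable, never hard-coded
      chat:
        options:
          model: gpt-4o-mini
          temperature: 0.7
```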

PHASE 1 β€” CORE FUNDAMENTALS (Weeks 4–7)

4.1.1 ChatClient & ChatModel
  • ChatClient.Builder auto-configuration
  • Creating a simple chat service: prompt β†’ response
  • System prompt configuration: setting AI persona and behavior
  • User prompt construction: PromptTemplate, dynamic variable substitution
  • Synchronous chat: call().content()
  • Streaming chat: stream().content() returning Flux
  • Response metadata: token usage, model name, finish reason
  • ChatOptions: temperature, maxTokens, topP, stop sequences per-request
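The items above can be sketched as one small service (a sketch: ChatService is an illustrative name, and it assumes a Spring AI model starter is on the classpath so ChatClient.Builder is auto-configured):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

@Service
public class ChatService {

    private final ChatClient chatClient;

    // ChatClient.Builder is auto-configured by the model starter
    public ChatService(ChatClient.Builder builder) {
        this.chatClient = builder
                .defaultSystem("You are a concise technical assistant.")
                .build();
    }

    // Synchronous: call().content() blocks and returns the full response text
    public String ask(String question) {
        return chatClient.prompt()
                .user(question)
                .call()
                .content();
    }

    // Streaming: stream().content() returns a Flux of partial tokens
    public Flux<String> askStreaming(String question) {
        return chatClient.prompt()
                .user(question)
                .stream()
                .content();
    }
}
```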
4.1.2 Prompt Engineering in Spring AI
  • PromptTemplate: Parameterized prompts with {variable} substitution
  • SystemPromptTemplate: Configuring AI behavior and persona
  • Few-shot prompting: Providing examples in prompts
  • Chain-of-thought prompting: Getting AI to reason step-by-step
  • Output format instructions: JSON, XML, structured formats
  • Loading prompts from classpath resources (.st files)
  • Message types: SystemMessage, UserMessage, AssistantMessage, ToolResponseMessage
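Parameterized prompting with PromptTemplate can be sketched as follows (the template text and the PromptFactory name are illustrative):

```java
import java.util.Map;

import org.springframework.ai.chat.prompt.Prompt;
import org.springframework.ai.chat.prompt.PromptTemplate;

public class PromptFactory {

    // {variable} placeholders are substituted when the prompt is rendered
    private static final PromptTemplate SUMMARY_TEMPLATE = new PromptTemplate("""
            Summarize the following {language} code in at most {sentences} sentences:

            {code}
            """);

    public Prompt summaryPrompt(String language, int sentences, String code) {
        return SUMMARY_TEMPLATE.create(Map.of(
                "language", language,
                "sentences", String.valueOf(sentences),
                "code", code));
    }
}
```

The same template text could instead live in a classpath .st resource and be loaded with a Resource constructor.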
4.1.3 Structured Output
  • BeanOutputConverter: Map LLM output to Java POJOs
  • MapOutputConverter: LLM output to Map
  • ListOutputConverter: LLM output to List
  • Using @JsonProperty and @JsonPropertyDescription on output POJOs
  • Error handling for malformed LLM output
  • Combining structured output with validation (Bean Validation API)
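The structured-output flow can be sketched like this (MovieService and the record are illustrative; .entity(...) applies a BeanOutputConverter internally, so format instructions go into the prompt and Jackson parses the reply):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.stereotype.Service;

@Service
public class MovieService {

    // Field names and types drive the JSON schema sent to the model
    public record MovieRecommendation(String title, int year, String reason) {}

    private final ChatClient chatClient;

    public MovieService(ChatClient.Builder builder) {
        this.chatClient = builder.build();
    }

    public MovieRecommendation recommend(String genre) {
        return chatClient.prompt()
                .user("Recommend one " + genre + " movie.")
                .call()
                .entity(MovieRecommendation.class); // type-safe POJO, not raw text
    }
}
```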
4.1.4 AI Model Providers Configuration
  • OpenAI: GPT-4o, GPT-4o-mini, GPT-5, GPT-5-mini configuration
  • Anthropic: Claude Opus, Sonnet, Haiku configuration
  • Azure OpenAI: Deployment names, endpoints, API versions
  • Ollama: Local model setup, model pulling, endpoint configuration
  • Google Vertex AI Gemini: Project, location, model configuration
  • Amazon Bedrock: AWS credentials, region, model IDs
  • Mistral AI: API key, model selection
  • Groq: Ultra-fast inference configuration

PHASE 2 β€” EMBEDDINGS & VECTOR STORES (Weeks 8–11)

4.2.1 Embeddings
  • What are embeddings and why they matter for AI applications
  • EmbeddingModel interface: embed() convenience methods and call(EmbeddingRequest)
  • OpenAI Embeddings: text-embedding-3-small vs text-embedding-3-large
  • Dimensionality: 1536-dim vs 3072-dim vectors
  • Batch embedding: Embed multiple texts efficiently
  • EmbeddingRequest / EmbeddingResponse model
  • Cosine similarity: How to compare embedding vectors manually
  • Use cases: Semantic search, deduplication, clustering, classification
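Cosine similarity itself is plain math and can be computed without any framework, for example:

```java
public class CosineSimilarity {

    // Cosine similarity: dot(a, b) / (|a| * |b|), range [-1, 1];
    // 1 means same direction (semantically similar), 0 means orthogonal
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        double[] u = {1.0, 0.0, 1.0};
        double[] v = {1.0, 0.0, 1.0};
        double[] w = {0.0, 1.0, 0.0};
        System.out.println(cosine(u, v)); // identical direction -> ~1.0
        System.out.println(cosine(u, w)); // orthogonal -> 0.0
    }
}
```

In practice the vector store computes this for you; the toy vectors here stand in for 1536-dim embeddings.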
4.2.2 Vector Stores
  • VectorStore interface: add(), similaritySearch(), delete()
  • SearchRequest: query, topK, similarityThreshold, metadata filters
  • SimpleVectorStore: In-memory, for development and testing
  • PGVector Setup: PostgreSQL with pgvector extension, Spring Data integration
  • Redis Vector Store: Configuration and metadata filtering
  • Chroma: Docker setup, collection management
  • Milvus: Cloud and self-hosted configuration
  • Pinecone: Cloud vector database integration
  • Weaviate: Schema-less vector store with hybrid search
  • MongoDB Atlas Vector Search: Atlas cluster configuration
  • Metadata Filtering: Type-safe metadata filter expressions
4.2.3 Document Processing (ETL Pipeline)
  • DocumentReader implementations:
    • TextReader, JsonReader, CsvReader
    • PdfDocumentReader (Apache PDFBox, Tika)
    • TikaDocumentReader: Handles 1000+ file formats
    • WordDocumentReader, PowerPointDocumentReader
    • HtmlDocumentReader, MarkdownDocumentReader
    • GithubDocumentReader: Reading from repositories
    • YouTubeDocumentReader: Transcript extraction
    • S3DocumentReader, AzureBlobStorageReader, GoogleCloudStorageReader
    • KafkaDocumentReader, MongoDocumentReader, JdbcDocumentReader
  • TextSplitter implementations:
    • TokenTextSplitter: Split by token count (recommended)
    • CharacterTextSplitter: Split by character count
    • SentenceTransformersTokenTextSplitter: Semantic sentence boundary splitting
    • RecursiveCharacterTextSplitter: Hierarchical splitting strategy
  • Document Transformers:
    • MetadataEnricher: Add custom metadata fields
    • SummaryMetadataEnricher: Generate summaries using LLM and store as metadata
    • KeywordMetadataEnricher: Extract keywords and store as metadata
    • ContentFormatTransformer: Normalize content formats

PHASE 3 β€” RAG & ADVISORS (Weeks 12–16)

4.3.1 Basic RAG with QuestionAnswerAdvisor
  • QuestionAnswerAdvisor: Automatic context injection from VectorStore
  • Configuring retrieval: topK, similarity threshold, metadata filters
  • Custom prompt templates: DEFAULT_USER_TEXT_ADVISE and DEFAULT_SYSTEM_TEXT_ADVISE
  • Dynamic metadata filter expressions: runtime filter construction
  • Combining multiple vector stores in RAG queries
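A minimal RAG wiring with QuestionAnswerAdvisor might look like this (a sketch against the 1.0-line builder APIs; RagService is an illustrative name, and exact builder signatures vary between releases):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.QuestionAnswerAdvisor;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;

@Service
public class RagService {

    private final ChatClient chatClient;

    public RagService(ChatClient.Builder builder, VectorStore vectorStore) {
        this.chatClient = builder
                .defaultAdvisors(QuestionAnswerAdvisor.builder(vectorStore)
                        .searchRequest(SearchRequest.builder()
                                .topK(5)                   // retrieve the top 5 chunks
                                .similarityThreshold(0.7)  // drop weak matches
                                .build())
                        .build())
                .build();
    }

    public String ask(String question) {
        // the advisor retrieves context and injects it before the model is called
        return chatClient.prompt().user(question).call().content();
    }
}
```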
4.3.2 Modular RAG Architecture

Modular RAG pipeline stages:

  1. Query Analysis & Transformation
  2. Document Retrieval
  3. Post-Retrieval Processing
  4. Augmentation
  5. Generation
  • Query Transformation:
    • ReWriteQueryTransformer: Rewrites user queries for better retrieval
    • TranslationQueryTransformer: Translates queries to match document language
    • MultiQueryExpander: Expands one query into multiple for broader retrieval
    • CompressionQueryTransformer: Compresses context + query for follow-up questions
    • StepBackQueryTransformer: Generates more abstract "step back" queries
  • Document Retrieval:
    • VectorStoreDocumentRetriever: Semantic vector-based retrieval
    • BM25/Keyword Retrieval: Lexical search integration
    • Hybrid Retrieval: Combining vector + keyword (RRF fusion)
  • Post-Retrieval Processing:
    • DocumentRanker: Re-rank documents by relevance (Cohere Rerank integration)
    • DuplicateContentFilter: Removes semantically duplicate documents
    • TokenBudgetContentFilter: Limits context to a token budget
    • ConcatenationDocumentJoiner: Merges documents from multiple retrievers
  • Augmentation:
    • ContextualQueryAugmenter: Injects retrieved context into the prompt
    • RetrievalAugmentationAdvisor: Wires together modular RAG pipeline
4.3.3 Chat Memory & Conversation History
  • ChatMemory interface: add(), get(), clear()
  • InMemoryChatMemoryRepository: Development use
  • JdbcChatMemoryRepository: Persistent conversation history
  • RedisChatMemoryRepository: Distributed memory
  • MessageChatMemoryAdvisor: Adds memory to ChatClient conversations
  • PromptChatMemoryAdvisor: Injects memory into system prompt
  • Memory window size: Configuring how many past messages to include
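Memory wiring can be sketched as follows (API names follow recent releases and exact builder methods vary between versions; MemoryChatService is illustrative):

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.client.advisor.MessageChatMemoryAdvisor;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.chat.memory.MessageWindowChatMemory;
import org.springframework.stereotype.Service;

@Service
public class MemoryChatService {

    private final ChatClient chatClient;

    public MemoryChatService(ChatClient.Builder builder) {
        // Keep a window of the last 20 messages; configure a Jdbc/Redis
        // repository on the builder for persistent memory instead
        ChatMemory memory = MessageWindowChatMemory.builder()
                .maxMessages(20)
                .build();
        this.chatClient = builder
                .defaultAdvisors(MessageChatMemoryAdvisor.builder(memory).build())
                .build();
    }

    public String chat(String conversationId, String message) {
        return chatClient.prompt()
                .user(message)
                // scope the stored history to one conversation
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId))
                .call()
                .content();
    }
}
```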
4.3.4 Custom Advisors
  • CallAroundAdvisor interface for synchronous advisors
  • StreamAroundAdvisor interface for reactive advisors
  • AdvisedRequest and AdvisedResponse models
  • Advisor ordering with getOrder()
  • Building a custom safety guardrail advisor
  • Building a custom caching advisor
  • Building a custom logging and audit advisor
  • Advisor chains and composition
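Under the CallAroundAdvisor naming used above, a minimal audit advisor might look like this (a sketch; later releases rename these interfaces, so check the API of your version):

```java
import org.springframework.ai.chat.client.advisor.api.AdvisedRequest;
import org.springframework.ai.chat.client.advisor.api.AdvisedResponse;
import org.springframework.ai.chat.client.advisor.api.CallAroundAdvisor;
import org.springframework.ai.chat.client.advisor.api.CallAroundAdvisorChain;

public class AuditAdvisor implements CallAroundAdvisor {

    @Override
    public AdvisedResponse aroundCall(AdvisedRequest request, CallAroundAdvisorChain chain) {
        long start = System.currentTimeMillis();
        // delegate to the rest of the advisor chain (and ultimately the model)
        AdvisedResponse response = chain.nextAroundCall(request);
        System.out.printf("AI call took %d ms%n", System.currentTimeMillis() - start);
        return response;
    }

    @Override
    public String getName() {
        return "auditAdvisor";
    }

    @Override
    public int getOrder() {
        return 0; // lower values run earlier in the chain
    }
}
```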

PHASE 4 β€” TOOL CALLING & AGENTS (Weeks 17–22)

4.4.1 Tool Calling Fundamentals
  • @Tool annotation: Exposing Java methods as AI tools
  • @ToolParam annotation: Describing tool parameters for the AI
  • Tool description: Writing clear descriptions that guide AI tool selection
  • Return value handling: String, POJO, void tools
  • ToolContext: Passing application context to tools at runtime
  • Tool error handling: Exception translation and error messages
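A hedged sketch of a tool bean (OrderTools and its lookup logic are hypothetical):

```java
import org.springframework.ai.tool.annotation.Tool;
import org.springframework.ai.tool.annotation.ToolParam;
import org.springframework.stereotype.Component;

@Component
public class OrderTools {

    // The description is what the model sees when deciding whether to call this tool
    @Tool(description = "Look up the shipping status of an order by its id")
    public String orderStatus(@ToolParam(description = "The order id") String orderId) {
        // hypothetical stub — replace with a real repository or API call
        return "Order " + orderId + " shipped on 2026-03-01";
    }
}
```

The tool can then be attached per call, e.g. chatClient.prompt().user("Where is order 42?").tools(orderTools).call().content() — Spring AI generates the JSON schema from the method signature.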
4.4.2 Built-in Tool Integrations
  • WebSearchTool: Real-time web search
  • WikipediaTool: Wikipedia lookup
  • WeatherTool: Weather data retrieval
  • CalendarTool: Calendar integration
  • DallETool: Image generation from within a conversation
4.4.3 Spring AI Agents
  • Agent interface: Agent.call() and Agent.stream()
  • ReAct Agent (Reasoning + Acting):
    • Thought β†’ Action β†’ Observation loop
    • Tool selection reasoning
    • Termination conditions
    • Maximum iterations configuration
  • Plan-and-Execute Agent:
    • Planning phase: Decomposing complex tasks into steps
    • Execution phase: Executing each step with appropriate tools
    • Replanning: Handling failed steps
  • Chat Agent:
    • Stateful conversation with tool access
    • Memory integration
    • Multi-turn reasoning
4.4.4 Model Context Protocol (MCP)
  • MCP Client: Connecting to external MCP servers
  • MCP Server: Exposing Spring application as an MCP server
  • spring-ai-starter-mcp-client: Adding MCP client capability
  • spring-ai-starter-mcp-server: Exposing Spring beans as MCP server
  • Tool discovery: Listing available tools from MCP servers
  • Resource access: Files, databases, APIs via MCP resources
  • Multi-transport: stdio transport for local tools, SSE for remote
  • OAuth2 MCP authentication (Spring AI 1.1+)
  • Protocol versioning: 2024-11-05 and 2025-03-26 versions

PHASE 5 β€” MULTIMODAL & ADVANCED MODELS (Weeks 23–27)

4.5.1 Image Generation
  • ImageModel interface and ImagePrompt
  • OpenAI DALL-E 3: size, quality, style options
  • Stability AI: Image generation with style prompts
  • Azure OpenAI DALL-E: Azure-hosted image generation
  • ImageResponse and handling base64 / URL responses
  • Batch image generation
4.5.2 Multimodal Chat (Vision)
  • Passing images to chat models: UserMessage with Media
  • Media class: Data URI, URL, file path
  • Supported vision models: GPT-4o, Claude 3.x, Gemini Pro Vision
  • Document analysis: PDF/image document understanding
  • Video frame analysis (Gemini)
  • Use cases: Receipt parsing, diagram explanation, chart analysis
4.5.3 Audio
  • AudioTranscriptionModel: Speech-to-text
  • OpenAI Whisper integration
  • AudioSpeechModel: Text-to-speech
  • OpenAI TTS models: tts-1, tts-1-hd
  • Voice options: alloy, echo, fable, onyx, nova, shimmer
  • Streaming audio responses
4.5.4 Moderation
  • ModerationModel: Content safety classification
  • OpenAI Moderation API integration
  • Custom moderation pipelines with Advisor pattern
  • Combining moderation with SafeGuardAdvisor

PHASE 6 β€” PRODUCTION, OBSERVABILITY & ADVANCED PATTERNS (Weeks 28–36)

4.6.1 Observability & Monitoring
  • Micrometer integration: Automatic metrics on AI calls
  • Key metrics: token.usage, latency, model, operation.name
  • Tracing: Distributed tracing with Zipkin/Jaeger
  • Prometheus + Grafana: AI dashboard setup
  • OpenTelemetry: Vendor-neutral observability
  • Cost tracking: Monitor token spend per endpoint/user
  • Spring Boot Actuator: Health checks for AI model connectivity
4.6.2 AI Model Evaluation Framework
  • EvaluationRequest / EvaluationResponse model
  • RelevancyEvaluator: Is the answer relevant to the question?
  • FactCheckingEvaluator: Is the answer factually grounded in context?
  • Custom evaluators: Building domain-specific evaluators
  • Automated test suites for RAG pipelines
  • Regression testing: Detecting quality degradation on code changes
  • A/B testing: Comparing two AI configurations
4.6.3 Security & Safety
  • API key management: Environment variables, Spring Cloud Vault
  • Rate limiting AI endpoints: Bucket4j, Resilience4j integration
  • Input sanitization: Preventing prompt injection attacks
  • Output filtering: SafeGuardAdvisor, custom content filters
  • PII redaction: Removing sensitive data from prompts and logs
  • Audit logging: Full request/response audit trails
  • GDPR compliance: Data retention policies for conversation memory
4.6.4 Performance Optimization
  • Caching: Spring Cache on embedding generation results
  • Async processing: @Async for non-blocking AI calls
  • Connection pooling: HTTP client configuration for AI APIs
  • Streaming responses: Reactive endpoint delivery
  • Batch embedding: Processing documents in batches
  • Model selection: Choosing the right model tier for the task
  • Context window management: Summarization for long conversations
  • Token optimization: Prompt compression techniques
4.6.5 Spring Boot Native Image (GraalVM)
  • AOT (Ahead-of-Time) compilation improvements in Spring AI 2.0
  • GraalVM native image support for Spring AI apps
  • Performance benefits: Sub-second startup, reduced memory footprint
  • Limitations and workarounds for reflection-heavy AI operations
4.6.6 Testing Spring AI Applications
  • MockChatModel: Mock AI responses in unit tests
  • TestcontainersOllamaService: Integration tests with Ollama
  • VectorStore testing: In-memory vector stores for tests
  • Advisor testing: Verifying advisor chain behavior
  • Evaluation-driven testing: Using AI evaluators in test assertions
  • WireMock: Mocking external AI API endpoints

5. ALGORITHMS, TECHNIQUES & TOOLS

5.1 Core AI Algorithms Used

Embedding & Similarity
  • Cosine Similarity: Primary metric for semantic search in vector stores
  • Dot Product Similarity: Alternative to cosine for normalized embeddings
  • Euclidean Distance (L2): Distance-based similarity
  • ANN (Approximate Nearest Neighbor) Search: HNSW algorithm in vector databases
  • BM25: TF-IDF based lexical search for hybrid retrieval
Retrieval & Ranking
  • RAG (Retrieval-Augmented Generation): Ground AI responses in retrieved context
  • HyDE (Hypothetical Document Embeddings): Generate hypothetical answers to improve retrieval
  • Multi-Query Retrieval: Expand user query into multiple variants for broader recall
  • Step-Back Prompting: Generate abstract questions for better concept retrieval
  • RRF (Reciprocal Rank Fusion): Combine rankings from multiple retrievers
  • Contextual Compression: Compress retrieved documents to relevant snippets only
  • Cross-Encoder Reranking: Neural reranking of retrieved documents (Cohere Rerank)
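RRF is simple enough to sketch directly (k = 60 is the commonly used constant; the document ids are illustrative):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RrfFusion {

    // Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d)),
    // where rank is 1-based; documents ranked well by several retrievers win
    public static List<String> fuse(List<List<String>> rankings, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : rankings) {
            for (int i = 0; i < ranking.size(); i++) {
                scores.merge(ranking.get(i), 1.0 / (k + i + 1), Double::sum);
            }
        }
        return scores.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                .map(Map.Entry::getKey)
                .toList();
    }

    public static void main(String[] args) {
        List<String> vectorHits = List.of("docA", "docB", "docC");  // semantic ranking
        List<String> keywordHits = List.of("docB", "docA", "docD"); // BM25 ranking
        // docA and docB outrank docC/docD because both retrievers agree on them
        System.out.println(fuse(List.of(vectorHits, keywordHits), 60));
    }
}
```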
Prompt Engineering Techniques
  • Zero-shot: Direct question without examples
  • Few-shot: Provide examples to guide output format
  • Chain-of-Thought (CoT): "Let's think step by step"
  • Self-Consistency: Generate multiple answers, take majority vote
  • Reflection / Self-Critique: AI evaluates and refines its own output
  • Role Prompting: "You are an expert in X"
  • Structured Output: "Respond only in JSON format"
  • Tree-of-Thought (ToT): Explore multiple reasoning paths
Agent Reasoning Patterns
  • ReAct (Reasoning + Acting): Interleave thought and action steps
  • Plan-and-Execute: Explicit planning phase before execution
  • Reflection: Loop where agent critiques its own output
  • Multi-Agent Debate: Multiple agents argue to reach better conclusions
  • Tool Augmented Generation: Invoking external tools to ground responses

5.2 Spring AI–Specific Techniques

Context Management
  • Sliding Window Memory: Keep last N messages in context
  • Summary Memory: Summarize old messages to save tokens
  • Entity Memory: Extract and store key entities from conversations
  • Semantic Chunking: Chunk documents at semantic boundaries
  • Parent-Child Chunking: Store small chunks for retrieval but pass large parent chunks to LLM
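Sliding-window memory is essentially a bounded deque; a framework-free sketch:

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Sliding-window memory: keep only the last N messages in the context
public class SlidingWindowMemory {

    private final int maxMessages;
    private final Deque<String> messages = new ArrayDeque<>();

    public SlidingWindowMemory(int maxMessages) {
        this.maxMessages = maxMessages;
    }

    public void add(String message) {
        messages.addLast(message);
        if (messages.size() > maxMessages) {
            messages.removeFirst(); // evict the oldest message to stay in budget
        }
    }

    public List<String> window() {
        return List.copyOf(messages);
    }

    public static void main(String[] args) {
        SlidingWindowMemory memory = new SlidingWindowMemory(2);
        memory.add("user: hi");
        memory.add("assistant: hello");
        memory.add("user: what is RAG?");
        System.out.println(memory.window()); // oldest message has been evicted
    }
}
```

Spring AI's message-window memory applies the same idea, with the eviction count set via configuration rather than code.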
Optimization Techniques
  • Prompt Caching: Cache frequently used system prompts (Claude's prompt caching)
  • Speculative Decoding: Faster inference using draft models
  • Quantization Awareness: Choosing right model precision (FP16 vs INT4) in Ollama
  • Streaming: Deliver AI tokens to client as they are generated (reduces TTFB)
  • Batching: Process multiple embedding requests together

5.3 Major Tools & Technologies

Spring AI Ecosystem
  • spring-ai-bom: Bill of Materials for dependency management
  • spring-ai-starter-model-openai (1.0 GA naming; pre-1.0 milestones used the longer suffix, e.g. spring-ai-openai-spring-boot-starter)
  • spring-ai-starter-model-anthropic
  • spring-ai-starter-model-ollama
  • spring-ai-starter-model-vertex-ai-gemini
  • spring-ai-starter-model-bedrock-converse
  • spring-ai-starter-vector-store-pgvector
  • spring-ai-starter-vector-store-redis
  • spring-ai-starter-vector-store-chroma
  • spring-ai-starter-vector-store-milvus
  • spring-ai-starter-mcp-server
  • spring-ai-starter-mcp-client
  • spring-ai-tika-document-reader
  • spring-ai-pdf-document-reader
AI Model Providers
  • OpenAI API: GPT-4o, GPT-5, DALL-E 3, Whisper, TTS, Embeddings
  • Anthropic API: Claude 3.5/3.7 Sonnet, Claude 3 Opus, Claude 3 Haiku
  • Google Vertex AI: Gemini 1.5 Pro, Gemini 2.0, Gemini Ultra
  • Amazon Bedrock: Claude, Titan, Llama, Mistral on AWS
  • Azure OpenAI: GPT-4o on Microsoft Azure
  • Ollama: Local LLM inference (Llama 3.x, Phi-4, Mistral, Gemma, Qwen)
  • Groq: Ultra-low latency inference
  • Mistral AI: Mistral Large, Mixtral models
  • Hugging Face: Open-source model inference
Vector Databases
  • PGVector: PostgreSQL extension (best for teams already on Postgres)
  • Chroma: Lightweight, developer-friendly
  • Milvus: High-performance, cloud-native
  • Pinecone: Fully managed cloud vector database
  • Weaviate: Multi-modal, hybrid search
  • Qdrant: Rust-based, high performance
  • Redis Vector: If already using Redis
  • MongoDB Atlas: If already using MongoDB
  • Neo4j Vector: If using graph databases
Supporting Infrastructure
  • Docker / Docker Compose: Local service orchestration
  • Kubernetes: Production container orchestration
  • Testcontainers: Integration testing with real containers
  • Prometheus + Grafana: Metrics visualization
  • Zipkin / Jaeger: Distributed tracing
  • HashiCorp Vault: Secret management for API keys
  • PostgreSQL: Conversation memory persistence + PGVector
  • Redis: Session memory, caching, rate limiting
  • Apache Kafka: Event streaming for AI pipelines
  • Spring Cloud Config: Centralized AI configuration management
Build & Developer Tools
  • IntelliJ IDEA + Spring Boot Plugin
  • start.spring.io: Project scaffolding
  • Spring CLI: Rapid project generation
  • OpenRewrite: Automated migration between Spring AI versions
  • Arconia Spring AI Migrations: Migration recipes for Spring AI upgrades
  • Maven 3.9+ / Gradle 8+: Build tools

6. DESIGN & DEVELOPMENT PROCESS

6.1 Forward Design Process (Scratch to Advanced)

Stage 1: Problem Definition & Requirements
  • Define the AI use case: Chat, RAG, Agent, or multimodal
  • Identify data sources: Documents, databases, APIs
  • Select AI provider: Cloud vs local, cost vs capability
  • Define quality requirements: Response latency, accuracy, safety
  • Map data flow: User β†’ Application β†’ AI β†’ Response
Stage 2: Project Setup
  1. Go to start.spring.io
  2. Select: Spring Boot 3.4+, Java 21, Maven/Gradle
  3. Add dependencies: Spring Web, Spring AI (choose model starter), Spring Data JPA (if needed)
  4. Configure application.yaml: API keys, model options, vector store connections
  5. Set up Docker Compose for local services (PGVector, Redis, Chroma)
Stage 3: Core AI Layer Development
  1. Configure ChatModel bean (auto-configured via starter)
  2. Build ChatClient with system prompt, memory, and advisors
  3. Define tool beans with @Tool and @Component
  4. Implement PromptTemplate for dynamic prompt construction
  5. Add streaming endpoint for real-time response delivery
Stage 4: Data Ingestion Pipeline
  1. Choose DocumentReader(s) for data sources
  2. Configure TextSplitter (TokenTextSplitter with 512-token chunks, 50 overlap)
  3. Configure EmbeddingModel (OpenAI text-embedding-3-small recommended)
  4. Configure VectorStore (PGVector for production)
  5. Build ingestion service: Read β†’ Split β†’ Embed β†’ Store
  6. Run ingestion pipeline: CLI runner, scheduled job, or event-driven
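Steps 1–5 above can be sketched as one ingestion service (assumes the Tika reader and a vector store starter are configured; TokenTextSplitter defaults are used here, with chunk size configurable via its constructor):

```java
import java.util.List;

import org.springframework.ai.document.Document;
import org.springframework.ai.reader.tika.TikaDocumentReader;
import org.springframework.ai.transformer.splitter.TokenTextSplitter;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.core.io.Resource;
import org.springframework.stereotype.Service;

@Service
public class IngestionService {

    private final VectorStore vectorStore;

    public IngestionService(VectorStore vectorStore) {
        this.vectorStore = vectorStore;
    }

    // Read -> Split -> Embed -> Store
    public void ingest(Resource file) {
        List<Document> documents = new TikaDocumentReader(file).get();    // Extract
        List<Document> chunks = new TokenTextSplitter().apply(documents); // Transform
        vectorStore.add(chunks); // Load: embeds via the configured EmbeddingModel
    }
}
```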
Stage 5: RAG Query Pipeline
  1. Add QuestionAnswerAdvisor to ChatClient with VectorStore
  2. Configure topK and similarityThreshold
  3. Add metadata filters for document access control
  4. For advanced RAG: compose modular pipeline components
  5. Add query transformation if simple retrieval isn't sufficient
Stage 6: API Layer
  1. Build REST controller with ChatClient injection
  2. Add streaming endpoint using Flux
  3. Add ingestion endpoint for document upload
  4. Add conversation management endpoints (start, continue, clear)
  5. Add authentication/authorization (Spring Security)
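A minimal controller covering steps 1–2 might look like this, using the Spring AI 1.x ChatClient fluent API; the endpoint paths are illustrative:

```java
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.http.MediaType;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;
import reactor.core.publisher.Flux;

@RestController
class ChatController {

    private final ChatClient chatClient;

    ChatController(ChatClient chatClient) {
        this.chatClient = chatClient;
    }

    // Blocking variant: returns the full answer in one response body.
    @PostMapping("/api/chat")
    String chat(@RequestBody String message) {
        return chatClient.prompt().user(message).call().content();
    }

    // Streaming variant: emits tokens as server-sent events while the model generates.
    @PostMapping(value = "/api/chat/stream", produces = MediaType.TEXT_EVENT_STREAM_VALUE)
    Flux<String> stream(@RequestBody String message) {
        return chatClient.prompt().user(message).stream().content();
    }
}
```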
Stage 7: Observability & Evaluation
  1. Add Micrometer + Prometheus dependencies
  2. Configure token usage metrics collection
  3. Build evaluation test suite with RelevancyEvaluator
  4. Set up Grafana dashboard for AI KPIs
  5. Add structured logging with trace IDs
Stage 8: Production Hardening
  1. Add rate limiting per user/API key
  2. Implement circuit breakers for AI API calls (Resilience4j)
  3. Add retry logic with exponential backoff
  4. Configure API key rotation
  5. Add input/output content filtering
  6. Add cost alerts and budget limits
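The retry schedule from steps 2–3 is worth seeing concretely. This dependency-free sketch computes capped exponential delays with "full jitter"; in production, Resilience4j's Retry and CircuitBreaker modules would own this logic, and the base/cap values here are illustrative:

```java
import java.util.concurrent.ThreadLocalRandom;

public class BackoffSketch {

    // Capped exponential delay for a 0-based attempt number: base * 2^attempt, up to cap.
    public static long expDelayMillis(long baseMillis, long capMillis, int attempt) {
        long delay = baseMillis << Math.min(attempt, 20); // guard the shift against overflow
        return Math.min(delay, capMillis);
    }

    // "Full jitter": pick a random delay in [0, expDelay] so clients retrying after
    // the same provider outage de-correlate instead of hammering in synchronized waves.
    public static long jitteredDelayMillis(long baseMillis, long capMillis, int attempt) {
        return ThreadLocalRandom.current().nextLong(expDelayMillis(baseMillis, capMillis, attempt) + 1);
    }
}
```

With a 500 ms base and 30 s cap, attempts back off at up to 500, 1000, 2000, 4000 ms and so on until the cap is reached.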

6.2 Reverse Engineering Method

Reverse engineering an existing AI application built with Spring AI:

Step 1: Map the Entry Points
  • Find all @RestController classes handling AI-related routes
  • Identify ChatClient or ChatModel injection points
  • Trace the request flow from HTTP endpoint to AI call
Step 2: Understand the Prompt Architecture
  • Find all PromptTemplate, @Value loaded prompts, and SystemMessage configurations
  • Understand the system persona, instructions, and constraints
  • Identify all {variable} substitution points
  • Check for multi-turn conversation memory configuration
Step 3: Identify the Retrieval Pipeline
  • Find VectorStore beans and their configuration
  • Identify QuestionAnswerAdvisor or custom RAG advisors
  • Trace document ingestion: DocumentReader β†’ TextSplitter β†’ VectorStore
  • Check metadata filtering strategy
Step 4: Map Tool Definitions
  • Find all @Tool-annotated methods
  • Understand what actions the AI can take in the system
  • Identify tool result handling and error scenarios
Step 5: Trace the Advisor Chain
  • List all Advisor beans and their order
  • Understand what each advisor adds or modifies
  • Identify memory, safety, logging, and RAG advisors
Step 6: Identify Configuration
  • application.yaml: model provider, model name, temperature, tokens
  • Vector store connection: host, port, collection/table name
  • Memory store: type and configuration
  • Observability: metrics, tracing configuration
Step 7: Reproduce & Modify
  • Replicate core functionality in a test environment
  • Substitute components with alternatives (e.g., swap OpenAI for Ollama)
  • Add or remove advisors to change behavior
  • Experiment with different chunking strategies

7. CUTTING-EDGE DEVELOPMENTS (2025–2026)

7.1 Spring AI 1.0 GA (May 2025)

  • First production-ready release with stable APIs
  • MCP Client and Server GA β€” connect Spring apps to any MCP tool ecosystem
  • Modular RAG pipeline with all components
  • Full advisor API with ordering and composition
  • Comprehensive vector store support (15+ providers)
  • Agent framework: workflow and autonomous agent implementations

7.2 Spring AI 1.1 (Late 2025)

  • OAuth2-secured MCP server connections
  • Multi-protocol MCP version negotiation (2024-11-05 and 2025-03-26)
  • Deep integration with latest MCP Java SDK
  • Redis-based chat memory repository
  • Enhanced observability hooks
  • Additional model provider integrations

7.3 Spring AI 2.0 (2026)

  • Built on Spring Boot 4.0 and Spring Framework 7.0
  • GraalVM native image AOT compilation improvements (contributed by Netflix/Bedrin)
  • Official OpenAI Java SDK native integration
  • Kotlin 2.2.x compatibility
  • Default model updated to GPT-5-mini
  • Removal of default temperature β€” explicit configuration required
  • Testcontainers 2.0 integration

7.4 Industry Trends Influencing Spring AI Roadmap

Agentic AI (2025 is the Year of Agents)
  • Multi-agent orchestration frameworks
  • Agent memory: episodic, semantic, and procedural memory types
  • Agent safety: bounded execution, resource limits, human-in-the-loop
  • Computer Use: agents controlling desktop and browser applications (e.g., Chrome, Excel, PowerPoint)
Model Context Protocol Ecosystem
  • MCP becoming the standard for AI tool interoperability
  • Hundreds of MCP servers in the community ecosystem
  • Spring AI as a first-class MCP citizen (client + server)
Open-Weight Model Surge
  • Llama 3.x, Phi-4, Gemma 3, Qwen 2.5, Mistral 24B rivaling closed models
  • Ollama + Spring AI enabling full local AI stacks
  • Hybrid deployments: local for privacy-sensitive data, cloud for complex reasoning
Multimodal Expansion
  • Vision + Text becoming standard for enterprise AI apps
  • Audio transcription + synthesis in agent workflows
  • PDF and document intelligence as first-class use case
RAG Evolution
  • Moving beyond simple RAG to GraphRAG (knowledge graph + vector)
  • Agentic RAG: AI decides when and how to retrieve
  • Long-context models reducing (but not eliminating) need for retrieval
  • Reranking as standard practice for production RAG

8. BUILD IDEAS: BEGINNER TO ADVANCED

🟒 BEGINNER LEVEL (Phase 1–2 Skills)

Build 1: AI Personal Assistant API

Goal: Simple ChatClient with a system prompt defining a helpful assistant persona
Skills: ChatClient, PromptTemplate, streaming

Build 2: AI Text Summarizer

Goal: Accept long text or URL, return structured summary
Skills: Structured Output, BeanOutputConverter, PromptTemplate

Build 3: Multi-Language Translator

Goal: Accept text + target language, return translation
Skills: ChatClient, PromptTemplate, Structured Output

Build 4: Code Reviewer Bot

Goal: Accept code snippet, return analysis
Skills: ChatClient, Structured Output, prompt engineering

Build 5: AI FAQ Generator

Goal: Accept a document, generate a structured FAQ
Skills: ChatClient, Document reading, Structured Output

🟑 INTERMEDIATE LEVEL (Phase 3–4 Skills)

Build 6: Document Q&A System (RAG)

Goal: Upload PDFs/documents, ingest, query with grounded answers
Skills: DocumentReader, TokenTextSplitter, EmbeddingModel, PGVector, QuestionAnswerAdvisor

Build 7: Conversational Customer Support Bot

Goal: Multi-turn conversation with memory and RAG
Skills: ChatMemory, QuestionAnswerAdvisor, Tool Calling, MessageChatMemoryAdvisor

Build 8: AI-Powered Code Generation Assistant

Goal: Accept user description, generate and review code
Skills: Tool Calling, multi-step prompting, Structured Output

Build 9: Research Assistant with Web Search

Goal: Agent searches the web for current information and summarizes
Skills: Agents, WebSearchTool, ReAct pattern, Tool Calling

Build 10: Multi-Source Knowledge Base

Goal: Ingest from multiple sources with metadata filtering
Skills: Multiple DocumentReaders, metadata filtering, modular RAG

Build 11: AI Email Assistant

Goal: Connect to email via MCP, classify, summarize, draft responses
Skills: MCP integration, Tool Calling, Structured Output

πŸ”΄ ADVANCED LEVEL (Phase 5–6 Skills)

Build 12: Autonomous Research Agent

Goal: Given a research topic, autonomously search, ingest, cross-reference, and report
Skills: Autonomous Agents, dynamic RAG, Tool Calling, multi-step planning

Build 13: Enterprise Document Intelligence Platform

Goal: Multi-tenant document management with RBAC and analytics
Skills: Multi-tenancy, metadata filtering, Observability, Security

Build 14: AI-Powered Data Analysis Platform

Goal: Natural language to SQL, chart generation, anomaly detection
Skills: MCP, Tool Calling, Structured Output, Agents, multimodal output

Build 15: Multimodal AI Expense Tracker

Goal: Upload receipt images → extract data via vision, categorize, report
Skills: Multimodal (vision), Structured Output, Tool Calling, ChatMemory

Build 16: AI Code Review & CI/CD Integration

Goal: GitHub webhook integration, automated PR review with suggestions
Skills: MCP, Agent, Tool Calling, Evaluation framework

Build 17: Multi-Agent Legal Document Analyzer

Goal: Multiple specialized agents extract, identify risks, compare, summarize
Skills: Multi-agent architecture, MCP, RAG, Structured Output

Build 18: Production AI Platform with Full Observability

Goal: Complete API gateway with auth, rate limiting, tracing, A/B testing
Skills: All Phase 6 skills, full production stack

9. FLOW DIAGRAMS & REFERENCE STRUCTURES

9.1 Basic Chat Flow

User Request
  │
  ▼
HTTP POST /api/chat {message: "..."}
  │
  ▼
ChatController
  │
  ▼
ChatClient.builder()
  .defaultSystem("You are a helpful assistant")
  .defaultAdvisors(advisor1, advisor2)
  │
  ▼
Advisor Chain Processing (pre-call)
  → Memory Advisor: Inject conversation history
  → RAG Advisor: Retrieve and inject relevant context
  → Safety Advisor: Check input for harmful content
  │
  ▼
ChatModel.call(Prompt) → AI Provider API
  │
  ▼
Advisor Chain Processing (post-call)
  → Memory Advisor: Save new messages to memory
  → Logging Advisor: Log request and response
  │
  ▼
ChatResponse → Generation → Content
  │
  ▼
HTTP Response {answer: "..."}

9.2 RAG Pipeline Flow

INGESTION (Offline)
───────────────────
DocumentReader(s)       → Extract raw text/metadata from source
  │
  ▼
TextSplitter            → Split into chunks (e.g., 512 tokens, 50 overlap)
  │
  ▼
MetadataEnricher        → Add source, date, department, etc.
  │
  ▼
EmbeddingModel.embed()  → Convert chunks to vectors
  │
  ▼
VectorStore.add()       → Persist embeddings

QUERY (Runtime)
───────────────
User Query
  │
  ▼
QueryTransformer        → Rewrite/expand/translate query
  │
  ▼
EmbeddingModel.embed()  → Embed user query
  │
  ▼
VectorStore.similaritySearch() → Top-K similar chunks
  │
  ▼
DocumentReranker        → Reorder by cross-encoder score
  │
  ▼
ContextualQueryAugmenter → Inject context into prompt
  │
  ▼
ChatModel.call()        → Generate answer grounded in context
  │
  ▼
Answer + Source Citations
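Underneath VectorStore.similaritySearch() is a nearest-neighbour lookup over embeddings. A dependency-free sketch of the core arithmetic (production stores replace the brute-force scan with ANN indexes such as HNSW or IVF):

```java
import java.util.Arrays;
import java.util.List;

public class SimilaritySketch {

    // Cosine similarity: 1.0 for identical directions, 0.0 for orthogonal vectors.
    public static double cosine(double[] a, double[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Brute-force top-K: rank every stored vector against the query embedding
    // and return the indices of the K most similar documents.
    public static List<Integer> topK(double[] query, double[][] store, int k) {
        Integer[] ids = new Integer[store.length];
        for (int i = 0; i < store.length; i++) ids[i] = i;
        Arrays.sort(ids, (x, y) -> Double.compare(cosine(query, store[y]), cosine(query, store[x])));
        return Arrays.asList(ids).subList(0, Math.min(k, ids.length));
    }
}
```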

9.3 Agent ReAct Loop

User Task / Goal
  │
  ▼
Agent.call()
  │
  ▼
┌───────────────────────────────────┐
│            ReAct Loop             │
│                                   │
│  THINK:   Analyze task            │
│           Select next action      │
│           └→ Terminate? → Answer  │
│                                   │
│  ACT:     Call tool(s)            │
│           Execute action          │
│                                   │
│  OBSERVE: Process tool result     │
│           Update working memory   │
│           └→ Loop back to THINK   │
└───────────────────────────────────┘
  │
  ▼
Final Answer
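The loop above reduces to a small control structure. A stubbed, dependency-free skeleton: in a real agent, THINK would be a ChatModel call deciding the next action and ACT would dispatch to registered @Tool methods; the stop policy here is deliberately trivial:

```java
public class ReActSketch {

    public interface Tool {
        String run(String input);
    }

    // THINK → ACT → OBSERVE, bounded by a step budget so the agent cannot loop forever.
    public static String run(String task, Tool tool, int maxSteps) {
        StringBuilder scratchpad = new StringBuilder(task);
        for (int step = 0; step < maxSteps; step++) {
            // THINK: stub policy — terminate once at least one observation exists.
            if (scratchpad.indexOf("Observation:") >= 0) {
                return "Final answer derived from: " + scratchpad;
            }
            // ACT: invoke the selected tool.
            String result = tool.run(task);
            // OBSERVE: append the result to working memory and loop back to THINK.
            scratchpad.append("\nObservation: ").append(result);
        }
        return "Step budget exhausted after " + maxSteps + " steps";
    }
}
```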

9.4 MCP Architecture

Your Spring Boot App
  │
  ├── MCP Client ──────────────────► External MCP Server A (Filesystem)
  │     (stdio or HTTP/SSE transport)
  │
  ├── MCP Client ──────────────────► External MCP Server B (GitHub)
  │
  └── MCP Server ◄────────────────── Claude Desktop / Cursor / Other AI
        (Exposes @Tool beans as MCP)

9.5 Spring AI Module Dependencies Map

Core Modules
├── spring-ai-core (interfaces, models, advisors framework)
├── spring-ai-client-chat (ChatClient API)
└── spring-ai-rag (RAG pipeline components)

Model Provider Modules
├── spring-ai-openai
├── spring-ai-anthropic
├── spring-ai-azure-openai
├── spring-ai-vertex-ai-gemini
├── spring-ai-bedrock-converse
├── spring-ai-ollama
└── spring-ai-mistral-ai

Vector Store Modules
├── spring-ai-pgvector-store
├── spring-ai-redis-store
├── spring-ai-chroma-store
├── spring-ai-milvus-store
└── spring-ai-pinecone-store

Document Reader Modules
├── spring-ai-tika-document-reader
├── spring-ai-pdf-document-reader
└── spring-ai-jsoup-document-reader

MCP Modules
├── spring-ai-starter-mcp-client
└── spring-ai-starter-mcp-server

Spring Boot Auto-Configuration Modules
└── spring-ai-spring-boot-autoconfigure
      (Auto-configures all of the above via starters)

10. LEARNING RESOURCES

Official Resources

Books

Community

Key Conference Talks (2025)

πŸ“š QUICK REFERENCE: SPRING AI DECISION MATRIX

Requirement                    Recommended Component
─────────────────────────────  ──────────────────────────────────────────
Simple Q&A chat                ChatClient + SystemPrompt
Multi-turn conversation        ChatClient + MessageChatMemoryAdvisor
Answer from documents          QuestionAnswerAdvisor + VectorStore
Complex document Q&A           Modular RAG pipeline
Call external APIs             @Tool annotated methods
Autonomous task execution      ReAct Agent
Connect to external tools      MCP Client
Expose app as AI tool          MCP Server
Extract structured data        BeanOutputConverter
Analyze images                 Multimodal ChatModel + Media
Local models (no cloud)        Ollama + spring-ai-ollama-starter
Production vector DB           PGVector (if Postgres) or Pinecone
Conversation persistence       JdbcChatMemoryRepository
Token usage tracking           Micrometer + Prometheus
Test AI quality                RelevancyEvaluator + FactCheckingEvaluator
Enterprise secrets             Spring Cloud Vault + @Value