Amazon Nova Sonic is a foundation model that creates natural, human-like speech-to-speech conversations for generative AI applications, allowing users to interact with AI through voice in real-time, with capabilities for understanding tone, enabling natural flow, and performing actions.
Multi-agent architecture offers a modular, robust, and scalable design pattern for production-level voice assistants. This blog post explores Amazon Nova Sonic voice agent applications and demonstrates how they integrate with Strands Agents framework sub-agents while leveraging Amazon Bedrock AgentCore to create an effective multi-agent system.
Why multi-agent architecture?
Imagine developing a financial assistant application responsible for user onboarding, information collection, identity verification, account inquiries, exception handling, and handing off to human agents based on predefined conditions. As functional requirements expand, the voice agent continues to add new inquiry types. The system prompt grows enormous, and the underlying logic becomes increasingly complex. This illustrates a persistent challenge in software development: monolithic designs lead to systems that are difficult to maintain and enhance.
Think of multi-agent architecture as building a team of specialized AI assistants rather than relying on a single do-it-all helper. Just like companies divide responsibilities across different departments, this approach breaks complex tasks into smaller, manageable pieces. Each AI agent becomes an expert in a specific area—whether that’s fact-checking, data processing, or handling specialized requests. For the user, the experience feels seamless: there’s no delay, no change in voice, and no visible handoff. The system functions behind the scenes, directing each expert agent to step in at the right moment.
In addition to modularity and robustness, multi-agent systems offer advantages similar to those of a microservice architecture, a popular enterprise software design pattern: scalability, distribution, and maintainability. They also allow organizations to reuse agentic workflows already developed for their large language model (LLM)-powered applications.
Sample application
In this post, we refer to the Amazon Nova Sonic workshop multi-agent lab code, which uses a banking voice assistant as a sample to demonstrate how to deploy specialized agents on Amazon Bedrock AgentCore. The sample uses Nova Sonic as the voice interface layer and as the orchestrator, delegating detailed inquiries to sub-agents written with Strands Agents and hosted on AgentCore Runtime. You can find the sample source code on the GitHub repo.
In the banking voice agent sample, the conversation flow begins with a greeting and collecting the user’s name, and then it handles inquiries related to banking or mortgages. We use three secondary level agents hosted on AgentCore to handle specialized logic:
- Authenticate sub-agent: Handles user authentication using the account ID and other information
- Banking sub-agent: Handles account balance checks, statements, and other banking-related inquiries
- Mortgage sub-agent: Handles mortgage-related inquiries, including refinancing, rates, and repayment options
Sub-agents are self-contained, handling their own logic such as input validation. For instance, the authentication agent validates account IDs and returns errors to Nova Sonic if needed. This simplifies the reasoning logic in Nova Sonic while keeping business logic encapsulated, similar to the software engineering modular design patterns.
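To make the idea of self-contained validation concrete, here is a minimal sketch of how an authentication sub-agent might check an account ID before doing anything else. The eight-digit format, the `AuthResult` type, and the `validate_account_id` helper are assumptions made for illustration; they are not taken from the sample repo.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical account ID format (eight digits); the post does not specify one.
ACCOUNT_ID_PATTERN = re.compile(r"\d{8}")


@dataclass
class AuthResult:
    ok: bool
    message: str  # short text that Nova Sonic can speak back to the user
    account_id: Optional[str] = None


def validate_account_id(raw: str) -> AuthResult:
    """Normalize and validate an account ID captured from speech."""
    # Spoken digits often arrive with spaces or dashes, so strip them first.
    cleaned = re.sub(r"[\s-]", "", raw)
    if not ACCOUNT_ID_PATTERN.fullmatch(cleaned):
        return AuthResult(False, "That account ID doesn't look right. It should be eight digits.")
    return AuthResult(True, "Thanks, I have your account ID.", cleaned)
```

Because the sub-agent returns a short, speakable error message, Nova Sonic only has to relay it rather than reason about the validation rules itself.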
Integrate Nova Sonic with AgentCore through tool use events
Amazon Nova Sonic relies on tool use to integrate with agentic workflows. During the Nova Sonic event lifecycle, you provide tool use configurations through the promptStart event. Tool use is then initiated when Sonic receives specific types of input.
For example, in the following Sonic tool configuration sample, tool use is configured to initiate events based on Sonic’s built-in reasoning model, which classifies the inquiry for routing to the banking sub-agents.
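A tool configuration of this kind might look roughly like the sketch below, which builds a promptStart event as a Python dictionary. Treat it as an approximation under stated assumptions: the field names follow the Nova Sonic event schema as we understand it and should be checked against the User Guide, the audio and text output settings are omitted for brevity, and the tool names other than bankAgent, along with the single `query` parameter, are hypothetical.

```python
import json


def tool_spec(name: str, description: str) -> dict:
    """Build one tool entry; every tool here takes a single free-text `query`."""
    schema = {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "The user's request"}
        },
        "required": ["query"],
    }
    # Assumption: the input schema is passed as a serialized JSON string.
    return {
        "toolSpec": {
            "name": name,
            "description": description,
            "inputSchema": {"json": json.dumps(schema)},
        }
    }


def build_prompt_start(prompt_name: str) -> dict:
    """Build a promptStart event that registers one tool per sub-agent."""
    tools = [
        tool_spec("authAgent", "Authenticate the user with their account ID."),
        tool_spec("bankAgent", "Answer account balance and statement questions."),
        tool_spec("mortgageAgent", "Answer mortgage rate, refinancing, and repayment questions."),
    ]
    return {
        "event": {
            "promptStart": {
                "promptName": prompt_name,
                # Audio and text output configuration omitted for brevity.
                "toolUseOutputConfiguration": {"mediaType": "application/json"},
                "toolConfiguration": {"tools": tools},
            }
        }
    }
```

The tool descriptions matter here: they are what the model uses to decide which sub-agent a given utterance should be routed to.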
When a user asks Nova Sonic a question such as ‘What is my account balance?’, Sonic sends a toolUse event to the client application with the specified toolName (for example, bankAgent) defined in the configuration. The application can then invoke the sub-agent hosted on AgentCore to handle the banking logic and return the response to Sonic, which in turn generates an audio reply for the user.
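On the client side, this routing step can be sketched as a small dispatcher. The example assumes the toolUse event carries `toolName`, `toolUseId`, and a JSON string in `content`; the `handle_tool_use` function and the `query` argument are hypothetical. Each sub-agent is represented as a plain callable so the sketch runs without AWS access; in a real application the callable would wrap a call to the sub-agent hosted on AgentCore Runtime.

```python
import json
from typing import Callable, Dict

# A sub-agent invoker takes the user's query and returns the sub-agent's text reply.
SubAgent = Callable[[str], str]


def handle_tool_use(tool_use: dict, sub_agents: Dict[str, SubAgent]) -> dict:
    """Route one toolUse event to the matching sub-agent and build the result payload."""
    tool_name = tool_use["toolName"]
    # Assumption: the tool input arrives as a JSON string in `content`.
    args = json.loads(tool_use.get("content") or "{}")
    agent = sub_agents.get(tool_name)
    if agent is None:
        result = {"error": f"Unknown tool: {tool_name}"}
    else:
        try:
            result = {"result": agent(args.get("query", ""))}
        except Exception as exc:
            # Return the failure as data so the voice layer can respond gracefully.
            result = {"error": str(exc)}
    # The caller sends this back to Nova Sonic as the tool result for the same toolUseId.
    return {"toolUseId": tool_use["toolUseId"], "content": json.dumps(result)}


def fake_bank_agent(query: str) -> str:
    """Stand-in for the banking sub-agent, used for local testing only."""
    return f"Bank agent received: {query}"
```

Returning errors as structured content, instead of raising, keeps the conversation alive: Sonic can tell the user something went wrong and offer an alternative.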
Sub-agent on AgentCore
The following sample showcases the banking sub-agent developed using the Strands Agents framework, specifically configured for deployment on Bedrock AgentCore. It leverages Nova Lite through Amazon Bedrock as its reasoning model, providing effective cognitive capabilities with minimal latency. The agent implementation features a system prompt that defines its banking assistant responsibilities, complemented by two specialized tools: one for account balance inquiries and another for bank statement retrieval.
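A sub-agent along these lines could be sketched as follows. This is not the workshop's code: the mock account data, the system prompt wording, the tool function names, the `build_app` helper, and the Nova Lite model ID are assumptions for illustration. The Strands Agents and AgentCore imports are deferred into `build_app` so the tool logic can be exercised without either SDK installed.

```python
from typing import Any, Dict

# Mock data standing in for a real banking backend (hypothetical).
_ACCOUNTS = {
    "12345678": {
        "balance": 2500.75,
        "statement": ["2024-05-01 Coffee shop -4.50", "2024-05-03 Salary +3000.00"],
    }
}

SYSTEM_PROMPT = (
    "You are a banking assistant. Answer account balance and statement "
    "questions briefly, in one or two sentences suitable for speech."
)


def get_account_balance(account_id: str) -> str:
    """Return the current balance for the given account ID."""
    account = _ACCOUNTS.get(account_id)
    if account is None:
        return f"No account found for ID {account_id}."
    return f"The balance for account {account_id} is ${account['balance']:,.2f}."


def get_statement(account_id: str) -> str:
    """Return recent transactions for the given account ID."""
    account = _ACCOUNTS.get(account_id)
    if account is None:
        return f"No account found for ID {account_id}."
    return "; ".join(account["statement"])


def build_app():
    """Wire the tools into a Strands agent served through the AgentCore runtime wrapper."""
    # Imported lazily so the functions above can be tested without the SDKs.
    from bedrock_agentcore.runtime import BedrockAgentCoreApp
    from strands import Agent, tool
    from strands.models import BedrockModel

    agent = Agent(
        model=BedrockModel(model_id="amazon.nova-lite-v1:0"),  # assumed model ID
        system_prompt=SYSTEM_PROMPT,
        tools=[tool(get_account_balance), tool(get_statement)],
    )
    app = BedrockAgentCoreApp()

    @app.entrypoint
    def invoke(payload: Dict[str, Any]) -> Dict[str, Any]:
        # Assumption: the caller sends the user's request under a "prompt" key.
        result = agent(payload.get("prompt", ""))
        return {"result": str(result)}

    return app

# To serve the agent locally or inside AgentCore Runtime: build_app().run()
```

Keeping the tools as ordinary, typed functions with docstrings means they can be unit tested directly, while the docstrings double as the tool descriptions the model sees.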
Best practices for voice-based multi-agent systems
Multi-agent architecture provides exceptional flexibility and a modular design approach, allowing developers to structure voice assistants efficiently and potentially reuse existing specialized agent workflows. When implementing voice-first experiences, there are important best practices to consider that address the unique challenges of this modality.
- Balance flexibility and latency: Although the ability to invoke sub-agents using Nova Sonic tool use events creates powerful capabilities, it can introduce additional latency to voice responses. For use cases that require a synchronous experience, each agent handoff is a potential delay point in the interaction flow. Therefore, it’s important to design with response time in mind.
- Optimize model selection for sub-agents: Starting with smaller, more efficient models like Nova Lite for sub-agents can significantly reduce latency while still handling specialized tasks effectively. Reserve larger, more capable models for complex reasoning or when sophisticated natural language understanding is essential.
- Craft voice-optimized responses: Voice assistants perform best with concise, focused responses that can be followed by additional details when needed. This approach not only improves latency but also creates a more natural conversational flow that aligns with human expectations for verbal communication.
Consider stateless vs. stateful sub-agent design
Stateless sub-agents handle each request independently, without retaining memory of past interactions or session-level states. They are simple to implement, easy to scale, and work well for straightforward, one-off tasks. However, they cannot provide context-aware responses unless external state management is introduced.
Stateful sub-agents, on the other hand, maintain memory across interactions to support context-aware responses and session-level states. This enables more personalized and cohesive user experiences, but comes with added complexity and resource requirements. They are best suited for scenarios involving multi-turn interactions and user or session-level context caching.
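The difference between the two designs can be illustrated with a small sketch. Both the `stateless_reply` function and the `StatefulSubAgent` class are hypothetical, and the in-memory dictionary is only a stand-in: a production system would more likely keep session state in an external store or a managed memory service.

```python
from typing import Dict, List


def stateless_reply(query: str) -> str:
    """Stateless: the answer depends only on the current request."""
    return f"Handling: {query}"


class StatefulSubAgent:
    """Stateful: keeps per-session history so later turns can use earlier context."""

    def __init__(self) -> None:
        # Maps a session ID to the list of queries seen so far in that session.
        self._history: Dict[str, List[str]] = {}

    def reply(self, session_id: str, query: str) -> str:
        turns = self._history.setdefault(session_id, [])
        turns.append(query)
        return f"Turn {len(turns)} in session {session_id}: {query}"

    def history(self, session_id: str) -> List[str]:
        """Return a copy of the queries recorded for a session."""
        return list(self._history.get(session_id, []))
```

The stateless function can be scaled out freely because any instance can serve any request, whereas the stateful agent needs requests from the same session to reach the same state, which is where the added complexity comes from.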
Conclusion
Multi-agent architectures unlock flexibility, scalability, and accuracy for complex AI-driven workflows. By combining the Nova Sonic conversational capabilities with the orchestration power of Bedrock AgentCore, you can build intelligent, specialized agents that work together seamlessly. If you’re exploring ways to enhance your AI applications, multi-agent patterns with Nova Sonic and AgentCore are a powerful approach worth testing.
Learn more about Amazon Nova Sonic by visiting the User Guide, building your application with the sample applications, and exploring the Nova Sonic workshop to get started. You can also refer to the technical report and model card for additional benchmarks.
About the authors
Lana Zhang is a Senior Specialist Solutions Architect for Generative AI at AWS within the Worldwide Specialist Organization. She specializes in AI/ML, with a focus on use cases such as AI voice assistants and multimodal understanding. She works closely with customers across diverse industries, including media and entertainment, gaming, sports, advertising, financial services, and healthcare, to help them transform their business solutions through AI.