Agentic AI with multi-model framework using Hugging Face smolagents on AWS

This post is cowritten by Jeff Boudier, Simon Pagezy, and Florent Gbelidji from Hugging Face.

Agentic AI systems represent an evolution from conversational AI to autonomous agents capable of complex reasoning, tool usage, and code execution. Enterprise applications benefit from strategic deployment approaches tailored to specific needs. These needs include managed endpoints, which deliver auto-scaling capabilities, foundation model APIs to support complex reasoning, and containerized deployment options that support custom integration requirements.

Hugging Face smolagents is an open source Python library designed to make it straightforward to build and run agents using a few lines of code. We will show you how to build an agentic AI solution by integrating Hugging Face smolagents with Amazon Web Services (AWS) managed services. You’ll learn how to deploy a healthcare AI agent that demonstrates multi-model deployment options, vector-enhanced knowledge retrieval, and clinical decision support capabilities.

While we use healthcare as an example, this architecture applies to multiple industries where domain-specific intelligence and reliability are critical. The solution uses the model-agnostic, modality-agnostic, and tool-agnostic design of smolagents to orchestrate across Amazon SageMaker AI endpoints, Amazon Bedrock APIs, and containerized model servers.

Solution overview

Many AI systems face limitations with single-model approaches that can’t adapt to diverse enterprise needs. These systems often have rigid deployment options, inconsistent APIs across different AI services, and lack multi-model deployment options for optimal model selection.

This solution demonstrates how organizations can build AI systems that address these limitations. The solution allows deployment selection based on operational needs and provides consistent request and response formats across different AI backends and deployment methods. It generates contextual responses through medical knowledge integration and vector search, supporting deployment from development to production environments through containerized architecture.

This healthcare use case illustrates how the AI agent can process complex medical queries for six medications with clinical decision support and AWS security and compliance capabilities.

Architecture

The solution consists of the following services and features to deliver the agentic AI capabilities:

SageMaker AI with BioM-ELECTRA-Large-SQuAD2 model for specialized medical queries and managed auto-scaling.
Amazon Bedrock with Claude 3.5 Sonnet V2 by Anthropic for complex reasoning and foundation model access.
Amazon OpenSearch Service for vector similarity matching and contextual knowledge retrieval with medical knowledge indexing.
Amazon Elastic Container Service (Amazon ECS), with AWS Fargate, for serverless container orchestration and scalable deployment of the Python application that uses the smolagents library.
AWS Identity and Access Management (IAM) for security and access control.
Containerized model server with BioM-ELECTRA-Large-SQuAD2 for self-hosted model deployment.

The following diagram illustrates the solution architecture.

The architecture is a complete integration of the Hugging Face smolagents framework with AWS services. A client web interface connects to a healthcare agent container that orchestrates across three model backends: SageMaker AI with BioM-ELECTRA, Amazon Bedrock with Claude 3.5 Sonnet V2, and a containerized model server with BioM-ELECTRA. The solution includes a vector store powered by OpenSearch Service and a security layer with data encryption at rest and in transit. The security layer also handles IAM access control and authentication, and any medical disclaimers for regulatory compliance.

This solution supports deployment options through smolagents with each backend optimized for different scenarios:

SageMaker AI for managed endpoints with auto-scaling and production workloads using Hugging Face Hub models.
Amazon Bedrock for serverless access to foundation models and complex reasoning through AWS APIs.
A containerized model server for self-hosted model deployment and tool integration from Hugging Face Hub.

The three backends implement Hugging Face Messages API compatibility, confirming consistent request and response formats regardless of the selected model service. Users select the appropriate backend based on their requirements—the solution provides deployment options rather than automatic routing.

The complete implementation is available in the sample-healthcare-agent-with-smolagents-on-aws GitHub repository.

Key benefits

The integration of Hugging Face smolagents with AWS managed services offers significant advantages for enterprise agentic AI deployments.

Deployment choice

Organizations can choose the optimal deployment for each use case: Amazon Bedrock for serverless access to foundation models and self-hosted containerized deployment for custom tool integration or SageMaker AI for specialized domain models. These options help to match specific workload requirements, rather than a one-size-fits-all approach.

Multi-model deployment options

Organizations can optimize their infrastructure choices without changing their agent logic. You can switch between containerized model server, SageMaker AI, and Amazon Bedrock without modifying your application code. This provides deployment options, while maintaining consistent agent behavior.

Code generation capabilities

The CodeAgent approach of smolagents streamlines multi-step operations through direct Python code generation and processing. The following comparison illustrates the multi-step operations of smolagents:

Multi-step JSON-based approach:

{
  "action": "search",
  "parameters": {"query": "drug interactions"},
  "next_action": {
    "action": "filter",
    "parameters": {"criteria": "severity > moderate"}
  }
}

smolagents CodeAgent:

# Search and filter in single code block
results = search_tool("drug interactions")
filtered_results = [r for r in results if r.severity > "moderate"]
final_answer(f"Found {len(filtered_results)} severe interactions: {filtered_results}")

The smolagents CodeAgent supports single code blocks to handle multi-step operations, reducing large language model (LLM) calls while streamlining agent development. It provides full control of agent logic across AWS service deployments.

Scalable architecture

By deploying the application on AWS, you gain access to security features and auto-scaling capabilities that help you meet organizational security requirements and maintain regulatory compliance. Running containerized workloads with Amazon ECS and Fargate helps you achieve reliable operations and optimize costs through automated resource scaling.

Let’s walk through implementing this solution.

Prerequisites

Before you deploy the solution, you need the following:

An AWS account with appropriate permissions to create IAM roles, Amazon ECS clusters, and Amazon OpenSearch Service domains.
AWS Command Line Interface (AWS CLI) version 2.0 or later installed and configured.
Python 3.10 or later for running deployment scripts.
Docker installed and running (required for production environments to provide secure code execution sandbox).
Access to Amazon Bedrock, SageMaker AI, and OpenSearch Service in your AWS Region with appropriate IAM permissions to create and manage resources.
For this implementation, we’re using Python 3.10+, smolagents framework, transformers 4.28.1+, PyTorch 2.0.0+, and boto3.

Run the following command to install the required Python packages:

pip install -r healthcare_ai_agent/phase_00_installation/requirements.txt

Configure environment variables

Set the required environment variables for your AWS Region and resource names before deploying the infrastructure.

Set the following environment variables in your terminal:

export AWS_REGION=us-west-2
export SAGEMAKER_ENDPOINT_NAME=healthcare-qa-endpoint-1
export OPENSEARCH_DOMAIN=healthcare-vector-store
export OPENSEARCH_INDEX=medical-knowledge
export BEDROCK_MODEL_ID=anthropic.claude-3-5-sonnet-20241022-v2:0
export SAGEMAKER_MODEL_ID=sultan/BioM-ELECTRA-Large-SQuAD2
export CONTAINERIZED_MODEL_ID=sultan/BioM-ELECTRA-Large-SQuAD2

Verify the variables are set:

echo $AWS_REGION
echo $SAGEMAKER_ENDPOINT_NAME

These environment variables are used throughout the deployment and testing processes. Verify that they’re set before proceeding to the next step.

Set up AWS infrastructure

Start by creating the foundational AWS infrastructure components using the SampleAWSInfrastructureManager class from the Smolagents_SageMaker_Bedrock_Opensearch.py implementation.

Deploy complete infrastructure (automated approach)

For automated deployment of the AWS infrastructure components, you can use the enhanced main function.

Start the enhanced main function:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
# Select option 1 for complete AWS infrastructure deployment

The deployment automatically creates an Amazon ECS cluster, IAM roles, and an OpenSearch Service domain.
Wait for deployment to complete (approximately 15–20 minutes).

Create individual AWS components (alternative approach)

If you prefer to create components individually, you can set up an OpenSearch Service domain for vector-enhanced knowledge retrieval and an Amazon ECS cluster for containerized deployment.

Both the OpenSearch Service domain and Amazon ECS cluster are automatically created as part of the complete AWS infrastructure deployment (Option 1 in enhanced_main). If you’ve already deployed the complete infrastructure, both components are ready and you can skip to the Deploy the Amazon SageMaker AI endpoint section.

Deploy the Amazon SageMaker AI endpoint

Deploy the BioM-ELECTRA-Large-SQuAD2 model to SageMaker AI for specialized medical query processing. An automated method (for deployment through the enhanced main) and a manual method (for deployment through the SageMaker AI endpoint) are provided.

To deploy using enhanced main (automated method)

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 2 for SageMaker endpoint deployment

To deploy the SageMaker AI endpoint (manual method)

Verify your SageMaker environment variables are set (from the Configure environment variables section):

echo $SAGEMAKER_MODEL_ID
echo $SAGEMAKER_ENDPOINT_NAME

Run the Amazon SageMaker deployment function:

from Smolagents_SageMaker_Bedrock_Opensearch import deploy_sagemaker_endpoint_safe
endpoint_name = deploy_sagemaker_endpoint_safe()

The deployment uses the HuggingFaceModel with transformers 4.28.1, PyTorch 2.0.0, and ml.m5.xlarge instance type.
Wait for the endpoint deployment to complete (approximately 5–10 minutes).
Verify the endpoint deployment:

# Verify endpoint status
print(f"✅ Endpoint deployed: {endpoint_name}")

The endpoint is configured for question-answering tasks with MAX_LENGTH=512 and TEMPERATURE=0.1.

Configure the multi-model backends

Configure the two additional backend options using the SampleTripleHealthcareAgent class for model selection based on operational needs.

Option 1 – Set up Amazon Bedrock access

Configure access to Amazon Bedrock for foundation model integration with Claude 3.5 Sonnet V2.

Verify your Amazon Bedrock model configuration is set (from the Configure environment variables section):

echo $BEDROCK_MODEL_ID

Access to Claude 3.5 Sonnet V2 is automatically available for your AWS account.
You can verify model availability in the Amazon Bedrock console under Model catalog.

Initialize the medical knowledge base

Set up the medical knowledge database with six medications and vector embeddings in OpenSearch Service. You can use the enhanced_main() function, which provides an interactive menu for deployment tasks, or initialize manually using the SampleOpenSearchManager class.

To initialize using enhanced main (Automated method)

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 4 for OpenSearch initialization and medical knowledge indexing

To initialize the medical knowledge base (Manual method)

Initialize the SampleOpenSearchManager:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 4 for OpenSearch initialization and medical knowledge indexing

The system creates the medical knowledge index with mappings for drug_name, content, and content_type fields.
Medical knowledge for Metformin, Lisinopril, Atorvastatin, Amlodipine, Omeprazole, and Simvastatin is automatically indexed.
Each medication includes side effects, monitoring requirements, and drug class information.
The vector store supports similarity search with content type filtering for targeted queries.
Verify the indexing completed successfully:

# Verify the indexing completed successfully 
print("Medical knowledge base initialized with 6 medications")

Option 2 – Deploy the containerized model server

After setting up the core infrastructure, deploy a containerized model server that provides self-hosted model deployment capabilities.

Deploy the containerized model server using the LocalContainerizedModelServer class with BioM-ELECTRA-Large-SQuAD2 for model deployment.

To deploy containers to Amazon ECS (automated method)

For automated deployment, use the containerized Amazon ECS deployment:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 3 for ECS container deployment

To deploy the containerized model server (manual method)

Verify your containerized model configuration is set (from the Configure environment variables step):

echo $CONTAINERIZED_MODEL_ID

Initialize the containerized model server:

from Smolagents_SageMaker_Bedrock_Opensearch import LocalContainerizedModelServer
containerized_server = LocalContainerizedModelServer()
containerized_server.start_server()

The containerized model server uses Docker sandboxing for secure code execution.

Verify that the containerized model server is running:

print(f"✅ Containerized Model Server status: {containerized_server.get_status()}")

The containerized model server includes fallback mechanisms with medical knowledge database integration.
The containerized model server supports deployment with Amazon ECS service discovery.

Implement the healthcare agent

Deploy the core healthcare agent using the SampleTripleHealthcareAgent class that demonstrates smolagents integration with deployment capabilities across the three backends.

Initialize the triple healthcare agent

Set up the main healthcare agent that orchestrates across SageMaker AI, Amazon Bedrock, and containerized model server with smolagents framework integration.

Create the vector store instance:

from Smolagents_SageMaker_Bedrock_Opensearch import SampleOpenSearchVectorStore
vector_store = SampleOpenSearchVectorStore()
vector_store_available = vector_store.initialize_client()

Initialize the triple healthcare agent:

from Smolagents_SageMaker_Bedrock_Opensearch import SampleTripleHealthcareAgent
import os
agent = SampleTripleHealthcareAgent(
endpoint_name=os.getenv('SAGEMAKER_ENDPOINT_NAME'),
vector_store=vector_store if vector_store_available else None
)

The agent initializes three model backends:
- SampleSageMakerEndpointModel with BioM-ELECTRA integration
- SampleBedrockClaudeModel with Claude 3.5 Sonnet V2 API
- SampleContainerizedModel with fallback mechanisms
Each backend is wrapped with SampleHealthcareCodeAgent for smolagents integration.
The agents are configured with max_steps=3 and DuckDuckGoSearchTool integration.

Configure deployment options

The system demonstrates three deployment options based on operational needs.

To understand the deployment options:

SageMaker AI deployment: Specialized medical queries about drug information, side effects, and monitoring requirements.
Amazon Bedrock deployment: Complex reasoning tasks requiring detailed medical analysis and multi-step reasoning.
Containerized deployment: Custom model integration with specialized tools and self-hosted model access.

Each backend includes database fallback with medical knowledge for six medications and Amazon OpenSearch Service provides contextual information across the model backends with similarity matching.

Test the interactive system

Use the sample_interactive_healthcare_assistant function to test the multi-model deployment.

Start the interactive healthcare assistant:

from Smolagents_SageMaker_Bedrock_Opensearch import sample_interactive_healthcare_assistant
sample_interactive_healthcare_assistant()

The system provides commands for model switching:
- 1 for SageMaker Endpoint
- 2 for containerized model server
- 3 for Amazon Bedrock Claude API
- compare <question> to view the three responses
Test queries are limited to 100 characters for optimal performance.
Responses include vector context when available and database fallback information.

Run and test the solution

You can now run and test the multi-model healthcare agent to observe the multi-backend deployment capabilities of smolagents using the actual medical knowledge database with six medications.

Run the application

The solution provides multiple ways to interact with the healthcare AI agent based on your preferred development environment.

Streamlit web interface

For an interactive web-based experience:

cd healthcare_ai_agent/streamlit_demo
streamlit run healthcare_app.py

Jupyter Notebook

For interactive development and experimentation:

jupyter lab Smolagents_SageMaker_Bedrock_Opensearch.ipynb

Direct Python

For command-line execution:

python Smolagents_SageMaker_Bedrock_Opensearch.py

These methods provide access to the same multi-model healthcare agent functionality with different user interfaces.

Test the deployment options

Use the enhanced_main function to access testing utilities and validate the multi-model deployment.

Interactive healthcare demo

Experience the healthcare assistant’s conversational interface with near real-time model switching capabilities.

Start the enhanced main function:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()

Choose option 5 for Interactive 3-model demo to launch the conversational interface.
The system provides commands for model switching:
- 1 for SageMaker Endpoint
- 2 for containerized model server
- 3 for Amazon Bedrock Claude API
- compare <question> to see the three responses
Test queries are limited to 100 characters for optimal performance.
Responses include vector context when available and database fallback information.

Compare model responses

Use the enhanced_main function to compare the three deployment options across different

query types.

To run interactive tests

Start the enhanced main function:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()

Choose option 7 for Compare the 3 models to test deployment options.

Test a medical query using SageMaker AI:

Query: "What are the side effects of metformin?"
Expected deployment: SageMaker AI (specialized medical knowledge)
Response: "Metformin side effects: Nausea, Diarrhea. Serious: Lactic acidosis."

Test a complex reasoning query using Amazon Bedrock:

Query: "Compare cardiovascular risks of atorvastatin and simvastatin"
Expected deployment: Amazon Bedrock (complex reasoning with Claude 3.5 Sonnet V2)
Response: Detailed analysis with mechanism of action and monitoring requirements

Test a query that uses the containerized model server:

Query: "Tell me about lisinopril monitoring requirements"
Expected deployment: Containerized Model Server (specialized tools and fallback)
Response: "Monitor Blood pressure, Kidney function for Lisinopril"

Actual responses include vector context from OpenSearch Service when available, showing similarity matching results alongside the model responses.

Verify deployment status

Use the built-in testing utilities to verify healthcare agent deployment and component status.

Run the component test (option 6 in enhanced_main):

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
enhanced_main()
# Select option 6 for component test

Expected output:

OpenSearch: Connected
SageMaker endpoint: Active
Bedrock: Available
Containerized model server: Started
ECS cluster: Active

Monitor the agent status using get_container_status():

status = agent.get_container_status()
print(status)
# Shows: Healthcare Agent Container, smolagents framework Active,
# Messages API Compatible, 6 medications loaded

Expected output:

Healthcare agent container: Active
smolagents framework: Active
Messages API: Compatible
Drug database: 6 medications loaded
Vector store: OpenSearch integrated

Test containerized model server functionality (option 10) to validate the containerized model server independently:

from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main
# Start the enhanced main function
enhanced_main()
# Select option 10 for containerized model server test

Expected output:

Containerized model server: Started successfully
BioM-ELECTRA model: Loaded
Medical knowledge: Available
Fallback mechanisms: Active

Test OpenSearch Service vector similarity (option 9) to observe how the system matches queries (such as metformin side effects and lisinopril monitoring) with vector similarity scores and content matching.

Clean up resources

To avoid incurring future charges, delete the resources you created using the cleanup utilities provided in the implementation.

Delete the Amazon OpenSearch Service domain:

aws opensearch delete-domain --domain-name healthcare-vector-store

Delete the Amazon ECS cluster and healthcare agent service:

aws ecs update-service --cluster healthcare-agent-cluster --service healthcare-agent-service --desired-count 0
aws ecs delete-service --cluster healthcare-agent-cluster --service healthcare-agent-service
aws ecs delete-cluster --cluster healthcare-agent-cluster

Clean up the Amazon SageMaker AI endpoint and containerized model server. Choose either of the following methods:
- Method 1: Use the enhanced_main menu
  - Run the enhanced main function:
```
from Smolagents_SageMaker_Bedrock_Opensearch import enhanced_main enhanced_main()
```
  - Select option 8 for SageMaker cleanup when prompted.
- Method 2: Use direct cleanup functions
  - Use the direct cleanup functions:
```
from Smolagents_SageMaker_Bedrock_Opensearch import cleanup_sagemaker_endpoint cleanup_sagemaker_endpoint("healthcare-qa-endpoint-1")
```
  - If the containerized server is running, stop it:
```
containerized_server.stop_server()
```

Remove the IAM role:

aws iam detach-role-policy --role-name ecsTaskExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
aws iam delete-role --role-name ecsTaskExecutionRole

Additional considerations

For production deployments, implementing observability is essential for monitoring agent performance, tracking execution traces, and verifying reliability. Amazon Bedrock AgentCore Runtime provides observability with automatic instrumentation. It captures session metrics, performance data, error tracking, and complete execution traces (including tool invocations). Read more about implementing observability in Build trustworthy AI agents with Amazon Bedrock AgentCore observability.

Use cases

While we demonstrated a healthcare implementation, this solution applies to multiple industries requiring domain-specific intelligence and reliability. With multi-model deployment, organizations can choose the optimal backend for each use case. This includes managed endpoints for production workloads, foundation models for complex reasoning, or self-hosted deployment for custom integration.

Financial services

Financial institutions can deploy agents for regulatory compliance, risk assessment, and fraud detection while meeting strict security and audit requirements. This deployment approach supports specialized fraud detection models, complex regulatory analysis, and custom financial tools integration.

Manufacturing and industrial operations

Manufacturing organizations can implement intelligent agents for predictive maintenance, quality control, and supply chain optimization. Multi-model deployment allows equipment monitoring, with domain-specific models and complex supply chain reasoning with foundation models.

Energy and utilities

Energy companies can deploy agents for grid operations, regulatory compliance, and infrastructure management. This approach supports near real-time demand forecasting with specialized models and complex environmental impact analysis with foundation models.

Conclusion

We showed how to build an agentic AI solution by integrating Hugging Face smolagents with AWS managed services. The solution demonstrates multi-model deployment options, vector-enhanced knowledge retrieval, and deployment capabilities so organizations can deploy domain-specific AI agents with AWS security features and compliance controls.

The healthcare use case illustrates how the model-agnostic design of smolagents supports deployment orchestration across Amazon SageMaker AI, Amazon Bedrock, and a containerized model server. Key technical innovations include the messages API compatibility across the backends, smolagents framework integration, and containerized deployment with AWS Fargate.

This solution architecture is extensible to financial services, manufacturing, energy, and other industries where domain-specific intelligence and reliability are critical.

Discuss a project and requirements to find the right help for your business needs with Hugging Face. Or if there are questions about getting started with AWS, speak with an AWS generative AI Specialist to learn how we can help accelerate your business today.

Further reading

About the authors

Sanhita Sarkar, PhD, Global Partner Solutions, AI/ML, and generative AI at AWS. She drives AI/ML and generative AI partner solutions at AWS, with extensive leadership experience across edge, cloud, and data center environments. She holds several patents, has published research papers, and serves as chair for technical conferences.

Jeff Boudier, Head of Products at Hugging Face. Jeff was also a co-founder of Stupeflix, acquired by GoPro, where he served as director of product management, product marketing, business development, and corporate development.

Simon Pagezy, Partner Success Manager at Hugging Face. He leads partnerships at Hugging Face with major cloud and hardware companies. He works to make the Hugging Face offerings broadly accessible across diverse deployment environments.

Florent Gbelidji, Cloud Partnership Tech Lead for the AWS account, driving integrations between the Hugging Face offerings and AWS services. He has also been an ML Engineer in the Expert Acceleration Program where he helped companies build solutions with open-source AI.

Solution overview

Architecture

Key benefits

Deployment choice

Multi-model deployment options

Code generation capabilities

Multi-step JSON-based approach:

smolagents CodeAgent:

Scalable architecture

Prerequisites

Configure environment variables

Set up AWS infrastructure

Deploy complete infrastructure (automated approach)

Create individual AWS components (alternative approach)

Deploy the Amazon SageMaker AI endpoint

To deploy using enhanced main (automated method)

To deploy the SageMaker AI endpoint (manual method)

Configure the multi-model backends

Option 1 – Set up Amazon Bedrock access

Initialize the medical knowledge base

To initialize using enhanced main (Automated method)

To initialize the medical knowledge base (Manual method)

Option 2 – Deploy the containerized model server

To deploy containers to Amazon ECS (automated method)

To deploy the containerized model server (manual method)

Implement the healthcare agent

Initialize the triple healthcare agent

Configure deployment options

To understand the deployment options:

Test the interactive system

Run and test the solution

Run the application

Streamlit web interface

Jupyter Notebook

Direct Python

Test the deployment options

Interactive healthcare demo

Compare model responses

Verify deployment status

Clean up resources

Additional considerations

Use cases

Financial services

Manufacturing and industrial operations

Energy and utilities

Conclusion

About the authors

Related Posts

Leave a Comment Cancel Reply

Sign In

Register

Reset Password