Recent enhancements in the field of generative AI, such as media generation technologies, are rapidly transforming the way businesses create and manipulate visual content. Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, Stability AI, and Amazon through a single API, along with a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI. With that, it brings functionalities such as model customization, fine-tuning, and Retrieval Augmented Generation (RAG).
In your business, you might want to use those capabilities to improve the user experience and generate media content—such as images, diagrams, infographics or custom shapes—and understand the level of confidence of that generated content according to another model or even a customized, pre-trained evaluation model, with data and parameters from your own organization.
In this post, we demonstrate how to interact with the Amazon Titan Image Generator G1 v2 model on Amazon Bedrock to generate an image. Then, we show you how to use Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock to describe it, evaluate it with a score from 1–10, explain the reason behind the given score, and suggest improvements to the image. Amazon Titan Image Generator G1 v2 was recently released on Amazon Bedrock bringing new features in the image generation field and Anthropic’s Claude 3.5 Sonnet, also newly released, setting new industry benchmarks for graduate-level reasoning and improvements in grasping complex instructions.
Amazon Titan Image Generator G1 v2
Exclusive to Amazon Bedrock, the Amazon Titan models incorporate the 25 years of experience that Amazon has innovating with AI and machine learning (ML) across its business. It allows content creators to quickly generate high-quality, realistic images using simple English text prompts, and returns studio-quality images suitable for advertising, ecommerce, and entertainment.
The newly announced Amazon Titan Image Generator G1 v2 expands its initial version by allowing you to guide image creation using reference images, edit existing visuals, remove backgrounds, generate image variations, and securely customize the model to maintain brand style and subject consistency.
Anthropic Claude 3.5 Sonnet
Anthropic Claude 3.5 Sonnet raises the industry bar for intelligence, outperforming other generative AI models on a wide range of evaluations, including Anthropic’s previously most intelligent model, Anthropic Claude 3 Opus. Anthropic Claude 3.5 Sonnet is available on Amazon Bedrock with the speed and cost of the original Anthropic Claude 3 Sonnet model.
Solution overview
This solution is running in AWS Region us-east-1. It exposes an API endpoint through Amazon API Gateway that proxies the initial prompt request to a Python-based AWS Lambda function, which calls Amazon Bedrock twice. The following diagram illustrates the flow of events.
Users or applications submit a prompt as an API request.
The prompt and parameters are passed to Amazon Bedrock using an inference API called by the Lambda function.
Amazon Bedrock generates a high-quality image based on the prompt with Amazon Titan Image Generator G1 v2.
The Lambda function sends the image bytes and the original prompt to Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock.
Anthropic’s Claude 3.5 Sonnet evaluates the generated image against the original prompt.
The Lambda function saves the image to an Amazon Simple Storage Service (Amazon S3) bucket and generates a pre-signed URL.
The pre-signed URL and the evaluation are returned as an API response in JSON format.
Ultimately, the function saves the image in an S3 bucket and generates a pre-signed URL, returning it and the evaluation summary as the API response.
API Gateway proxies the request to a Lambda function that uses the Python Boto3 library to call Amazon Titan Image Generator v2 on Amazon Bedrock to generate the image and then decodes the image bytes. Then, it passes the image and an evaluation prompt through a multimodal call to Anthropic’s Claude 3.5 Sonnet and, after receiving the score, saves the image to Amazon S3, generates a pre-signed URL, and returns the complete response.
Prerequisites
You should have the following prerequisites:
An AWS account to create and manage the necessary AWS resources for this solution
Amazon Titan Image Generator G1 v2 and Anthropic Claude 3.5 Sonnet models enabled on Amazon Bedrock in AWS Region us-east-1
Provision the solution
You can build the solution architecture using AWS CloudFormation. A single YAML file contains the infrastructure, including AWS Identity and Access Management (IAM) users, policies, API methods, the S3 bucket, and the Lambda function code. Complete the following steps to set up the solution resources:
Sign in to the AWS Management Console as an IAM administrator or appropriate IAM user.
Choose Launch Stack to deploy the CloudFormation template.
Choose Next.
In the Parameters section, enter the following:
A name for the new S3 bucket that will receive the images (for example, image-gen-your-initials)
The name of an existing S3 bucket where access logs will be stored.
A token that you will use to authenticate your API (a string of your choice)
After entering the parameters, choose Next.
Choose Next again.
Acknowledge the creation of IAM resources and choose Submit.
When the stack status is CREATE_COMPLETE, navigate to the Outputs tab and find the API information. Copy the ApiId, the ApiUrl and ResourceId to a safe place and continue to test.
Test the solution
You can test the deployed API by calling it with a programming language of your choice (Python, React, and so in), using the console, a terminal window, or the AWS Command Line Interface (AWS CLI). In this post, we will review the console, the terminal, and AWS CLI. For a visual reference, the following picture is a rendered representation of the image and its evaluation using Streamlit (Python) and the prompt a black cat in an alleyway with blue eyes.
Note that the use of Amazon Bedrock is subject to the AWS Responsible AI Policy. If you encounter errors, or if generations or evaluations are being blocked, your prompt might conflict with the AWS Acceptable Use Policy or the AWS Responsible AI Policy. Retry with a different prompt that adheres to the policy.
Test the solution using the console
Complete the following steps to test the solution using the console:
On the API Gateway console, choose APIs in the navigation pane.
On the APIs list, choose BedrockImageGenEval.
In the Resources section, select the POST method below /generate-image.
Choose the Test tab in the method execution settings.
In the Request body section, enter the following JSON structure:
{ “prompt”:”your prompt” }
Choose Test.
Test the solution using the AWS CLI
To test the solution using the AWS CLI, make sure you have the latest version installed and configured. For instructions, see Install or update to the latest version of the AWS CLI. For configuration, see Configure the AWS CLI. Then complete the following steps:
Retrieve the ApiId and ResourceId information you saved from the Outputs tab.
In an environment running AWS CLI, run the following command:
Test the solution using the terminal
To test the solution using a terminal window, you need to have the curl tool installed. After you have it, run the following command:
Regardless of your choice, you should get a response with the following JSON structure:
Clean up
To avoid incurring future charges, clean up all the AWS resources that you created using CloudFormation. You can delete these resources on the console or using the AWS CLI. To clean up using the console:
On the Amazon S3 console, empty the S3 bucket that you created and delete it.
On the CloudFormation console, select the stack and choose Delete.
Conclusion
In this post, we demonstrated how to use Amazon Titan Generator G1 v2 and Anthropic’s Claude 3.5 Sonnet on Amazon Bedrock to generate and evaluate media assets (images) and create accurate, fine-grained, exclusive content for your users or internal business case. Thanks to the multimodal capabilities of Amazon Bedrock models, you can apply this solution to different types of media, such as documents, summarizations, translations, and more.
We encourage you to learn and experiment with Amazon Bedrock capabilities, such as how to customize a model to use your own data for generation or evaluation, or try different models and apply security guardrails to have standardized safety controls over the content generated.
About the Author
Raul Tavares is a Solutions Architect focused on games customers across EMEA. With a strong engineering approach, when not knee-deep in cloud architecture, you can find him transforming ideas into solutions, writing code samples or listening to some Japanese heavy metal bands to relax.