Simulate realistic users to evaluate multi-turn AI agents in Strands Evals

Evaluating single-turn agent interactions follows a pattern that most teams understand well. You provide an input, collect the output, and judge the result. Frameworks like Strands Evaluation SDK make this process systematic through evaluators that assess helpfulness, faithfulness, and tool usage. In a previous blog post, we covered how to build comprehensive evaluation suites for …

Scaling seismic foundation models on AWS: Distributed training with Amazon SageMaker HyperPod and expanding context windows

This post is cowritten with Altay Sansal and Alejandro Valenciano from TGS. TGS, a geoscience data provider for the energy sector, supports companies’ exploration and production workflows with advanced seismic foundation models (SFMs). These models analyze complex 3D seismic data to identify geological structures vital for energy exploration. To help enhance their next-generation models as …

Control which domains your AI agents can access

AI agents that can browse the web open powerful possibilities—from research automation to real-time data gathering. However, giving an AI agent unrestricted internet access raises security and compliance concerns. What if the agent accesses unauthorized websites? What if sensitive data is exfiltrated to external domains? Amazon Bedrock AgentCore provides managed tools that enable AI agents …

Rocket Close transforms mortgage document processing with Amazon Bedrock and Amazon Textract

This post is cowritten by Jeremy Little and Chris Day from Rocket Close. Rocket Close, a Detroit-based title and appraisal management company within the Rocket Companies environment, has enhanced mortgage document processing by transforming a time-consuming manual process into an efficient automated solution. Processing approximately 2,000 abstract package files daily, with each file averaging 75 …

Persist session state with filesystem configuration and execute shell commands

AI agents have evolved significantly beyond chat. Writing code, persisting filesystem state, executing shell commands, and managing state throughout the filesystem are some examples of what they can do. As agentic coding assistants and development workflows have matured, the filesystem has become agents’ primary working memory, extending their capabilities beyond the context window. This …

Automating competitive price intelligence with Amazon Nova Act

Monitoring competitor prices is essential for ecommerce teams to maintain a market edge. However, many teams remain trapped in manual tracking, wasting hours daily checking individual websites. This inefficient approach delays decision-making, raises operational costs, and risks human errors that result in missed revenue and lost opportunities. Amazon Nova Act is an open-source browser automation …

Build reliable AI agents with Amazon Bedrock AgentCore Evaluations

Your AI agent worked in the demo, impressed stakeholders, handled test scenarios, and seemed ready for production. Then you deployed it, and the picture changed. Real users experienced wrong tool calls, inconsistent responses, and failure modes nobody anticipated during testing. The result is a gap between expected agent behavior and actual user experience in production. …
