Blog_dumb

Introducing container caching in Amazon SageMaker AI for faster model scaling

Today, we’re excited to announce container image caching for Amazon SageMaker AI inference, the next major advancement in our faster scaling optimization journey. This speeds up end-to-end latency by up to 2x for generative AI models during scale-out events. Over the years, Amazon SageMaker AI has continued to reduce latency across these scaling stages: detecting …

Parallelize speculative decoding with P-EAGLE on Amazon SageMaker AI

As large language models (LLMs) grow in size and complexity, maximizing inference throughput while minimizing latency remains a critical challenge for enterprise production deployments. Speculative decoding is one effective strategy to address this: a lightweight draft model guesses future tokens, which the target LLM then verifies in a single forward pass. While state-of-the-art frameworks like Extrapolation Algorithm for Greater …
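
The draft-then-verify loop described above can be illustrated with a toy sketch. This is plain greedy speculative decoding, not P-EAGLE, and everything in it is an assumption made for illustration: `draft_next` and `target_next` are hypothetical stand-ins for real models, tokens are small integers, and the verification loop calls the target once per position where a real system would score all drafted positions in one forward pass.

```python
from typing import Callable, List

Token = int
NextToken = Callable[[List[Token]], Token]


def target_next(prefix: List[Token]) -> Token:
    """Hypothetical target model: always counts upward modulo 10."""
    return (prefix[-1] + 1) % 10


def draft_next(prefix: List[Token]) -> Token:
    """Hypothetical draft model: agrees with the target except after token 4."""
    last = prefix[-1]
    return 0 if last == 4 else (last + 1) % 10


def speculative_step(prefix: List[Token], draft: NextToken,
                     target: NextToken, k: int) -> List[Token]:
    """Draft k tokens, then keep the longest prefix the target agrees with."""
    proposed: List[Token] = []
    for _ in range(k):
        proposed.append(draft(prefix + proposed))

    accepted: List[Token] = []
    for tok in proposed:
        # A real system gets all of these target predictions from one pass.
        expected = target(prefix + accepted)
        if tok != expected:
            # First mismatch: take the target's token and discard the rest.
            accepted.append(expected)
            return accepted
        accepted.append(tok)

    # Every draft token matched, so the target contributes one extra token.
    accepted.append(target(prefix + accepted))
    return accepted


def generate(prefix: List[Token], draft: NextToken, target: NextToken,
             k: int, max_new: int) -> List[Token]:
    """Repeat speculative steps until max_new tokens have been produced."""
    out = list(prefix)
    while len(out) - len(prefix) < max_new:
        out.extend(speculative_step(out, draft, target, k))
    return out[: len(prefix) + max_new]


if __name__ == "__main__":
    print(generate([0], draft_next, target_next, k=3, max_new=8))
```

Because a draft token is kept only when it equals the target's own greedy choice, the output matches what the target would produce alone; the draft model affects only how many tokens each step yields.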

AI Agent Failure Detection and Root Cause Analysis with Strands Evals

When your AI agent fails in production, knowing that it failed is only the beginning. The harder question is why it failed and what to fix. Traditional evaluation tells you “this agent scored 60 percent on goal completion,” but leaves you manually reviewing execution traces to understand what went wrong. For teams operating agents at …

Build context-rich research agents with Deep Agents and Bedrock AgentCore

A common challenge in AI-powered research workflows is depth versus context. If your agent reads ten web pages, its context window (the amount of text a large language model (LLM) can process at once) gets filled with raw content. If it also runs data analysis code, chart-generation logic competes with strategic reasoning for limited space. …

Building Supercharger: How Rocket Close optimized title operations with agentic AI

Rocket Close is a Detroit-based title agency and appraisal management company within Rocket Companies that provides title insurance, property valuation, and settlement services. As demand for mortgages and loans grew, title operations became a bottleneck in the homebuying process. Time-intensive, state-specific title examinations, combined with manual research and fragmented systems, slowed throughput and made it …

Build a meeting prep and follow-up assistant with Amazon Quick and Cisco Webex MCP servers

Amazon Quick and Cisco Webex MCP servers can turn meeting prep and follow-up into a single conversational workflow. Instead of switching between Webex meetings, Vidcast videos, transcripts, recordings, and message spaces, users ask one assistant to gather the context they need. This post shows how to build a custom meeting prep and follow-up assistant using …

From PDFs to insights: Architecting an intelligent document processing pipeline with AWS generative AI services

Organizations process millions of documents daily, from insurance claims and invoices to legal contracts and medical records. While traditional optical character recognition (OCR) solutions extract text, they can’t understand context, relationships, or meaning embedded within complex documents. This limitation creates bottlenecks that require manual intervention, increasing processing time and costs while introducing potential errors. Amazon …

Built from the inside out: How AWS Professional Services became a frontier team first

AWS Professional Services (AWS ProServe) compressed engagement timelines from months to days, not by adding artificial intelligence (AI) tools to an existing process, but by fundamentally rebuilding how we deliver from the inside out. The shift mirrors what my colleague Swami Sivasubramanian outlined in How Frontier Teams Are Reinventing AI-Native Development: real productivity gains come …
