Blog Archives - Page 23 of 213

Adaptive infrastructure for foundation model training with elastic training on SageMaker HyperPod

Modern AI infrastructure serves multiple concurrent workloads on the same cluster, from foundation model (FM) pre-training and fine-tuning to production inference and evaluation. In this shared environment, the demand for AI accelerators fluctuates continuously as inference workloads scale with traffic patterns, and experiments complete and release resources. Despite this dynamic availability of AI accelerators, traditional …

Customize agent workflows with advanced orchestration techniques using Strands Agents

Leave a Comment / Blog / By admin

Large Language Model (LLM) agents have revolutionized how we approach complex, multi-step tasks by combining the reasoning capabilities of foundation models with specialized tools and domain expertise. While single-agent systems using frameworks like ReAct work well for straightforward tasks, real-world challenges often require multiple specialized agents working in coordination. Think about planning a business trip: …

Operationalize generative AI workloads and scale to hundreds of use cases with Amazon Bedrock – Part 1: GenAIOps

Enterprise organizations are rapidly moving beyond generative AI experiments to production deployments and complex agentic AI solutions, facing new challenges in scaling, security, governance, and operational efficiency. This blog post series introduces generative AI operations (GenAIOps), the application of DevOps principles to generative AI solutions, and demonstrates how to implement it for applications powered by …

Applying data loading best practices for ML training with Amazon S3 clients

Amazon Simple Storage Service (Amazon S3) is a highly elastic service that automatically scales with application demand, offering the high throughput performance required for modern ML workloads. High-performance client connectors such as the Amazon S3 Connector for PyTorch and Mountpoint for Amazon S3 provide native S3 integration in training pipelines without dealing directly with the …

We’re publishing an AI playbook to help others with sustainability reporting.

We’re sharing a practical playbook to help organizations streamline and enhance sustainability reporting with AI. Corporate transparency is essential, but navigating frag…

Building a voice-driven AWS assistant with Amazon Nova Sonic

As cloud infrastructure becomes increasingly complex, the need for intuitive and efficient management interfaces has never been greater. Traditional command-line interfaces (CLI) and web consoles, while powerful, can create barriers to quick decision-making and operational efficiency. What if you could speak to your AWS infrastructure and get immediate, intelligent responses? In this post, we explore …
