Amazon SageMaker AI introduces EAGLE based adaptive speculative decoding to accelerate generative AI inference

Generative AI models continue to expand in scale and capability, increasing the demand for faster and more efficient inference. Applications need low latency and consistent performance without compromising output quality. Amazon SageMaker AI introduces new enhancements to its inference optimization toolkit that bring EAGLE based adaptive speculative decoding to more model architectures. These updates make …

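The full post covers the SageMaker-specific tooling; as background, the sketch below shows only the generic draft-and-verify loop that speculative decoding builds on (a greedy variant, with toy stand-in models instead of a real draft/target pair, and illustrative names throughout). EAGLE additionally learns a lightweight draft head over the target model's hidden states, which is not shown here.

```python
# Minimal sketch of draft-and-verify speculative decoding (greedy variant).
# The "models" here are toy stand-ins, not SageMaker or EAGLE components.
from typing import Callable, List

def speculative_decode(
    draft_next: Callable[[List[int]], int],   # cheap draft model: context -> next token
    target_next: Callable[[List[int]], int],  # expensive target model: context -> next token
    prompt: List[int],
    max_new_tokens: int = 32,
    k: int = 4,                               # tokens drafted per verification step
) -> List[int]:
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft k candidate tokens autoregressively with the cheap model.
        draft, ctx = [], list(tokens)
        for _ in range(k):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)
        # 2) Verify: accept the longest prefix where the target model agrees,
        #    then take one corrective token from the target model.
        accepted = []
        for t in draft:
            expected = target_next(tokens + accepted)
            if expected == t:
                accepted.append(t)          # draft token confirmed "for free"
            else:
                accepted.append(expected)   # correction from the target model
                break
        tokens.extend(accepted)
    return tokens[: len(prompt) + max_new_tokens]

# Toy usage: the draft model is a noisy copy of the target model.
target = lambda ctx: (ctx[-1] + 1) % 100
draft = lambda ctx: (ctx[-1] + 1) % 100 if ctx[-1] % 7 else 0
print(speculative_decode(draft, target, prompt=[1, 2, 3], max_new_tokens=10))
```

In a real system the verification step is a single batched forward pass of the target model over all drafted positions, which is where the latency win comes from; the per-token calls above are only for readability.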

Train custom computer vision defect detection model using Amazon SageMaker

On October 10, 2024, Amazon announced the discontinuation of the Amazon Lookout for Vision service, with a scheduled shutdown date of October 31, 2025 (see the blog post Exploring alternatives and seamlessly migrating data from Amazon Lookout for Vision). As part of our transition guidance for customers, we recommend the use of Amazon SageMaker AI tools …

Practical implementation considerations to close the AI value gap

Artificial Intelligence (AI) is changing how businesses operate. Gartner® predicts that at least 15% of day-to-day work decisions will be made autonomously through agentic AI by 2028, and 92% of companies are boosting their AI spending, according to McKinsey. But here’s the problem: most companies have yet to realize a positive impact of AI on their …

Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI

In 2025, generative AI has evolved from text generation to multi-modal use cases ranging from audio transcription and translation to voice agents that require real-time data streaming. Today’s applications demand something more: continuous, real-time dialogue between users and models—the ability for data to flow both ways, simultaneously, over a single persistent connection. Imagine a speech …

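The post introduces the SageMaker AI capability itself; purely as a conceptual illustration, the sketch below shows the full-duplex pattern it enables: one task keeps sending input chunks while another concurrently consumes model output over the same persistent connection. The DuplexConnection class is a toy in-process stand-in, not the SageMaker runtime client.

```python
# Conceptual sketch of bidirectional (full-duplex) streaming: sending and
# receiving run concurrently over one logical connection.
import asyncio

class DuplexConnection:
    """Toy stand-in for a persistent bidirectional stream (echoes upper-case)."""
    def __init__(self):
        self._outbound = asyncio.Queue()

    async def send(self, chunk: str) -> None:
        # A real client would write a frame to the open connection here.
        await self._outbound.put(chunk.upper())

    async def receive(self):
        # A real client would yield frames as the model produces them.
        while True:
            chunk = await self._outbound.get()
            if chunk == "<EOS>":
                return
            yield chunk

async def main():
    conn = DuplexConnection()

    async def sender():
        for chunk in ["hello ", "stream ", "world", "<eos>"]:
            await conn.send(chunk)          # keep pushing input while output flows back
            await asyncio.sleep(0.05)       # simulate microphone / upstream pacing

    async def receiver():
        async for piece in conn.receive():  # consume partial results as they arrive
            print("received:", piece)

    await asyncio.gather(sender(), receiver())

asyncio.run(main())
```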

Warner Bros. Discovery achieves 60% cost savings and faster ML inference with AWS Graviton

This post is written by Nukul Sharma, Machine Learning Engineering Manager, and Karthik Dasani, Staff Machine Learning Engineer, at Warner Bros. Discovery. Warner Bros. Discovery (WBD) is a leading global media and entertainment company that creates and distributes the world’s most differentiated and complete portfolio of content and brands across television, film and streaming. With iconic …

Physical AI in practice: Technical foundations that fuel human-machine interactions

In our previous post, Transforming the physical world with AI: the next frontier in intelligent automation, we explored how the field of physical AI is redefining a wide range of industries including construction, manufacturing, healthcare, and agriculture. Now, we turn our attention to the complete development lifecycle behind this technology – the process of creating intelligent …

HyperPod now supports Multi-Instance GPU to maximize GPU utilization for generative AI tasks

We are excited to announce the general availability of GPU partitioning with Amazon SageMaker HyperPod, using NVIDIA Multi-Instance GPU (MIG). With this capability you can run multiple tasks concurrently on a single GPU, minimizing the wasted compute and memory that comes from dedicating an entire GPU to tasks that under-utilize it. By …

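The post walks through the HyperPod experience; as generic background on MIG itself, the sketch below shows the idea outside of any orchestrator: each MIG partition has its own device identifier (as listed by nvidia-smi -L), and a process whose CUDA_VISIBLE_DEVICES is set to that identifier sees only its slice of the GPU. The UUIDs and the inference_worker.py script are placeholders; on HyperPod the orchestration layer performs this assignment for you.

```python
# Illustrative sketch: pin one worker process to each MIG partition by setting
# CUDA_VISIBLE_DEVICES to that partition's device ID. IDs are placeholders;
# on a real host they come from `nvidia-smi -L`.
import os
import subprocess

mig_device_ids = [
    "MIG-00000000-1111-2222-3333-444444444444",  # placeholder UUIDs
    "MIG-55555555-6666-7777-8888-999999999999",
]

workers = []
for i, mig_id in enumerate(mig_device_ids):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=mig_id)  # restrict the worker to one partition
    # Hypothetical worker script; each process sees only its MIG slice as "GPU 0".
    workers.append(
        subprocess.Popen(["python", "inference_worker.py", f"--worker-id={i}"], env=env)
    )

for p in workers:
    p.wait()
```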
