How Workhuman built multi-tenant self-service reporting using Amazon Quick Sight embedded dashboards

This post is cowritten with Ilija Subanovic and Michael Rice from Workhuman. Workhuman's customer service and analytics team was drowning in one-time reporting requests from seven million users worldwide—a common challenge with legacy reporting tools at scale. Business intelligence (BI) admins faced mounting pressure as their teams became overwhelmed with these requests. By rebuilding their …
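The full walkthrough is in the linked post; as a minimal sketch, multi-tenant embedding typically starts with a per-user embed URL request. The helper below only builds the request parameters for the real QuickSight API call `generate_embed_url_for_registered_user`; the account ID, user ARN, and dashboard ID are placeholders, not values from the post.

```python
# Sketch: build the parameters for a QuickSight registered-user embed URL.
# All identifiers below are illustrative placeholders.
def build_embed_request(account_id, user_arn, dashboard_id, lifetime_minutes=60):
    """Shape the request for quicksight.generate_embed_url_for_registered_user."""
    return {
        "AwsAccountId": account_id,
        "UserArn": user_arn,
        "SessionLifetimeInMinutes": lifetime_minutes,
        "ExperienceConfiguration": {
            "Dashboard": {"InitialDashboardId": dashboard_id}
        },
    }

params = build_embed_request(
    "111122223333",
    "arn:aws:quicksight:us-east-1:111122223333:user/default/reporting-user",
    "sales-dashboard-id",
)
# In a deployment this dict would be passed to the boto3 call:
# boto3.client("quicksight").generate_embed_url_for_registered_user(**params)
```

Because the user ARN is part of the request, each tenant's users see only the dashboards their registered identity is entitled to.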


Build an offline feature store using Amazon SageMaker Unified Studio and SageMaker Catalog

Building and managing machine learning (ML) features at scale is one of the most critical and complex challenges in modern data science workflows. Organizations often struggle with fragmented feature pipelines, inconsistent data definitions, and redundant engineering efforts across teams. Without a centralized system for storing and reusing features, models risk being trained on outdated or …
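The post covers the full setup; a central piece of any feature group is its schema. The sketch below shapes a `FeatureDefinitions` list the way the SageMaker `CreateFeatureGroup` API expects; the feature names and record-identifier/event-time columns are illustrative assumptions, not the post's actual schema.

```python
# Minimal sketch: feature definitions for an offline feature group.
# Feature names and the identifier/event-time columns are illustrative.
def build_feature_definitions(schema):
    """Map {name: type} pairs to SageMaker's FeatureDefinitions shape.
    Valid FeatureType values are "Integral", "Fractional", and "String"."""
    return [{"FeatureName": name, "FeatureType": ftype}
            for name, ftype in schema.items()]

definitions = build_feature_definitions({
    "customer_id": "String",        # record identifier
    "event_time": "String",         # event-time feature, required by Feature Store
    "avg_order_value": "Fractional",
    "orders_last_30d": "Integral",
})
# These would feed the real API call, roughly:
# sagemaker.create_feature_group(FeatureGroupName="customer-features",
#     RecordIdentifierFeatureName="customer_id", EventTimeFeatureName="event_time",
#     FeatureDefinitions=definitions,
#     OfflineStoreConfig={"S3StorageConfig": {"S3Uri": "s3://my-bucket/feature-store/"}})
```

Centralizing definitions like this is what lets teams reuse features instead of re-deriving them per pipeline.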


P-EAGLE: Faster LLM inference with Parallel Speculative Decoding in vLLM

EAGLE is the state-of-the-art method for speculative decoding in large language model (LLM) inference, but its autoregressive drafting creates a hidden bottleneck: the more tokens you speculate, the more sequential forward passes the drafter needs. Eventually that overhead eats into your gains. P-EAGLE removes this ceiling by generating all K draft tokens in a …
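The bottleneck described above can be illustrated with a toy cost model: autoregressive drafting needs one sequential drafter pass per speculated token, while parallel drafting needs a single pass regardless of K. The constants below are illustrative, not measured numbers from the post.

```python
# Toy cost model contrasting autoregressive drafting (EAGLE-style)
# with single-pass parallel drafting (the P-EAGLE idea).
def drafter_forward_passes(k, parallel):
    """Sequential drafter passes needed to propose k speculative tokens."""
    return 1 if parallel else k  # autoregressive drafting: one pass per token

def cost_per_accepted_token(k, accepted, parallel, verify_cost=1.0, draft_cost=0.2):
    """Relative cost of one speculation round, amortized over accepted tokens.
    verify_cost and draft_cost are made-up relative weights for illustration."""
    passes = drafter_forward_passes(k, parallel)
    return (passes * draft_cost + verify_cost) / max(accepted, 1)

seq = cost_per_accepted_token(k=8, accepted=4, parallel=False)
par = cost_per_accepted_token(k=8, accepted=4, parallel=True)
# As k grows, the autoregressive drafter's cost grows linearly with k,
# while the parallel drafter's cost stays flat.
```

In this toy model the sequential drafter's overhead scales with K, which is exactly the ceiling the post says P-EAGLE removes.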


Improve operational visibility for inference workloads on Amazon Bedrock with new CloudWatch metrics for TTFT and Estimated Quota Consumption

As organizations scale their generative AI workloads on Amazon Bedrock, operational visibility into inference performance and resource consumption becomes critical. Teams running latency-sensitive applications must understand how quickly models begin generating responses. Teams managing high-throughput workloads must understand how their requests consume quota so they can avoid unexpected throttling. Until now, gaining this visibility required …
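Once the new metrics are published, consuming them is a standard CloudWatch query. The sketch below builds a `GetMetricData` query entry; note that the metric name `TimeToFirstToken`, the dimension, and the model ID are assumptions for illustration; check the `AWS/Bedrock` namespace for the exact names the post introduces.

```python
# Sketch: one MetricDataQueries entry for a Bedrock latency metric.
# Metric name, dimension, and model ID are assumptions, not confirmed names.
def build_ttft_query(model_id, period_seconds=300):
    """Build a p90 time-to-first-token query for cloudwatch.get_metric_data."""
    return {
        "Id": "ttft_p90",
        "MetricStat": {
            "Metric": {
                "Namespace": "AWS/Bedrock",
                "MetricName": "TimeToFirstToken",  # assumed name
                "Dimensions": [{"Name": "ModelId", "Value": model_id}],
            },
            "Period": period_seconds,
            "Stat": "p90",
        },
    }

query = build_ttft_query("anthropic.claude-3-5-sonnet-20240620-v1:0")
# Would be passed as:
# cloudwatch.get_metric_data(MetricDataQueries=[query], StartTime=..., EndTime=...)
```

Tracking a percentile statistic such as p90 rather than the average is what surfaces the tail latency that matters for latency-sensitive applications.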


Secure AI agents with Policy in Amazon Bedrock AgentCore

Deploying AI agents safely in regulated industries is challenging. Without proper boundaries, agents that access sensitive data or execute transactions can pose significant security risks. Unlike traditional software, an AI agent chooses actions to achieve a goal by invoking tools, accessing data, and adapting its reasoning using data from its environment and users. This autonomy …
