Secure distributed logging in scalable multi-account deployments using Amazon Bedrock and LangChain

Data privacy is a critical issue for software companies that provide services in the data management space. If they want customers to trust them with their data, software companies need to show and prove that their customers’ data will remain confidential and within controlled environments. Some companies go to great lengths to maintain confidentiality, sometimes adopting multi-account architectures, where each customer has their data in a separate AWS account. By isolating data at the account level, software companies can enforce strict security boundaries, help prevent cross-customer data leaks, and support adherence with industry regulations such as HIPAA or GDPR with minimal risk.

Multi-account deployment represents the gold standard for cloud data privacy, allowing software companies to make sure customer data remains segregated even at massive scale, with AWS accounts providing security isolation boundaries as highlighted in the AWS Well-Architected Framework. Software companies increasingly adopt generative AI capabilities like Amazon Bedrock, which provides fully managed foundation models with comprehensive security features. However, managing a multi-account deployment powered by Amazon Bedrock introduces unique challenges around access control, quota management, and operational visibility that could complicate its implementation at scale. Constantly requesting and monitoring quota for invoking foundation models on Amazon Bedrock becomes a challenge when the number of AWS accounts reaches double digits. One approach to simplify operations is to configure a dedicated operations account to centralize management while data from customers transits through managed services and is stored at rest only in their respective customer accounts. By centralizing operations in a single account while keeping data in different accounts, software companies can simplify the management of model access and quotas while maintaining strict data boundaries and security isolation.

In this post, we present a solution for securing distributed logging multi-account deployments using Amazon Bedrock and LangChain.

Challenges in logging with Amazon Bedrock

Observability is crucial for effective AI implementations—organizations can’t optimize what they don’t measure. Observability can help with performance optimization, cost management, and model quality assurance. Amazon Bedrock offers built-in invocation logging to Amazon CloudWatch or Amazon Simple Storage Service (Amazon S3) through a configuration on the AWS Management Console, and individual logs can be routed to different CloudWatch accounts with cross-account sharing, as illustrated in the following diagram.

Routing logs to each customer account presents two challenges: logs containing customer data would be stored in the operations account for the user-defined retention period (at least 1 day), which might not comply with strict privacy requirements, and CloudWatch has a limit of five monitoring accounts (customer accounts). With these limitations, how can organizations build a secure logging solution that scales across multiple tenants and customers?

In this post, we present a solution for enabling distributed logging for Amazon Bedrock in multi-account deployments. The objective of this design is to provide robust AI observability while maintaining strict privacy boundaries for data at rest by keeping logs exclusively within the customer accounts. This is achieved by moving logging to the customer accounts rather than invoking it from the operations account. By configuring the logging instructions in each customer’s account, software companies can centralize AI operations while enforcing data privacy, by keeping customer data and logs within strict data boundaries in each customer’s account. This architecture uses AWS Security Token Service (AWS STS) to allow customer accounts to assume dedicated roles in AWS Identity and Access Management (IAM) in the operations account while invoking Amazon Bedrock. For logging, this solution uses LangChain callbacks to capture invocation metadata directly in each customer’s account, making the entire process in the operations account memoryless. Callbacks can be used to log token usage, performance metrics, and the overall quality of the model in response to customer queries. The proposed solution balances centralized AI service management with strong data privacy, making sure customer interactions remain within their dedicated environments.

Solution overview

The complete flow of model invocations on Amazon Bedrock is illustrated in the following figure. The operations account is the account where the Amazon Bedrock permissions will be managed using an identity-based policy, where the Amazon Bedrock client will be created, and where the IAM role with the correct permissions will exist. Every customer account will assume a different IAM role in the operations account. The customer accounts are where customers will access the software or application. This account will contain an IAM role that will assume the corresponding role in the operations account, to allow Amazon Bedrock invocations. It is important to note that it is not necessary for these two accounts to exist in the same AWS organization. In this solution, we use an AWS Lambda function to invoke models from Amazon Bedrock, and use LangChain callbacks to write invocation data to CloudWatch. Without loss of generality, the same principle can be applied to other forms of compute such as servers in Amazon Elastic Compute Cloud (Amazon EC2) instances or managed containers on Amazon Elastic Container Service (Amazon ECS).

The sequence of steps in a model invocation are:

The process begins when the IAM role in the customer account assumes the role in the operations account, allowing it to access the Amazon Bedrock service. This is accomplished through the AWS STS AssumeRole API operation, which establishes the necessary cross-account relationship.
The operations account verifies that the requesting principal (IAM role) from the customer account is authorized to assume the role it is targeting. This verification is based on the trust policy attached to the IAM role in the operations account. This step makes sure that only authorized customer accounts and roles can access the centralized Amazon Bedrock resources.
After trust relationship verification, temporary credentials (access key ID, secret access key, and session token) with specified permissions are returned to the customer account’s IAM execution role.
The Lambda function in the customer account invokes the Amazon Bedrock client in the operations account. Using temporary credentials, the customer account’s IAM role sends prompts to Amazon Bedrock through the operations account, consuming the operations account’s model quota.
After the Amazon Bedrock client response returns to the customer account, LangChain callbacks log the response metrics directly into CloudWatch in the customer account.

Enabling cross-account access with IAM roles

The key idea in this solution is that there will be an IAM role per customer in the operations account. The software company will manage this role and assign permissions to define aspects such as which models can be invoked, in which AWS Regions, and what quotas they’re subject to. This centralized approach significantly simplifies the management of model access and permissions, especially when scaling to hundreds or thousands of customers. For enterprise customers with multiple AWS accounts, this pattern is particularly valuable because it allows the software company to configure a single role that can be assumed by a number of the customer’s accounts, providing consistent access policies and simplifying both permission management and cost tracking. Through carefully crafted trust relationships, the operations account maintains control over who can access what, while still enabling the flexibility needed in complex multi-account environments.

The IAM role can have assigned one or more policies. For example, the following policy allows a certain customer to invoke some models:

{
    "Version": "2012-10-17",
    "Statement": {
        "Sid": "AllowInference",
        "Effect": "Allow",
        "Action": [
            "bedrock:Converse",
            "bedrock:ConverseStream",
            "bedrock:GetAsyncInvoke",
            "bedrock:InvokeModel",
            "bedrock:InvokeModelWithResponseStream",
            "bedrock:StartAsyncInvoke"
        ],
        "Resource": "arn:aws:bedrock:*::foundation-model/<model-id>"
    }
}

The control would be implemented at the trust relationship level, where we would only allow some accounts to assume that role. For example, in the following script, the trust relationship allows the role for customer 1 to only be assumed by the allowed AWS account when the ExternalId matches a specified value, with the purpose of preventing the confused deputy problem:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "AmazonBedrockModelInvocationCustomer1",
            "Effect": "Allow",
            "Principal": {
                "Service": "bedrock.amazonaws.com"
                },
            "Action": "sts:AssumeRole",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "<account-customer-1>",
                    "sts:ExternalId": "<external-id>"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:bedrock::<account-customer-1>:*"
                }
            }
        }
    ]
}

AWS STS AssumeRole operations constitute the cornerstone of secure cross-account access within multi-tenant AWS environments. By implementing this authentication mechanism, organizations establish a robust security framework that enables controlled interactions between the operations account and individual customer accounts. The operations team grants precisely scoped access to resources across the customer accounts, with permissions strictly governed by the assumed role’s trust policy and attached IAM permissions. This granular control makes sure that the operational team and customers can perform only authorized actions on specific resources, maintaining strong security boundaries between tenants.

As organizations scale their multi-tenant architectures to encompass thousands of accounts, the performance characteristics and reliability of these cross-account authentication operations become increasingly critical considerations. Engineering teams must carefully design their cross-account access patterns to optimize for both security and operational efficiency, making sure that authentication processes remain responsive and dependable even as the environment grows in complexity and scale.

When considering the service quotas that govern these operations, it’s important to note that AWS STS requests made using AWS credentials are subject to a default quota of 600 requests per second, per account, per Region—including AssumeRole operations. A key architectural advantage emerges in cross-account scenarios: only the account initiating the AssumeRole request (customer account) counts against its AWS STS quota; the target account’s (operations account) quota remains unaffected. This asymmetric quota consumption means that the operations account doesn’t deplete their AWS STS service quotas when responding to API requests from customer accounts. For most multi-tenant implementations, the standard quota of 600 requests per second provides ample capacity, though AWS offers quota adjustment options for environments with exceptional requirements. This quota design enables scalable operational models where a single operations account can efficiently service thousands of tenant accounts without encountering service limits.

Writing private logs using LangChain callbacks

LangChain is a popular open source orchestration framework that enables developers to build powerful applications by connecting various components through chains, which are sequential series of operations that process and transform data. At the core of LangChain’s extensibility is the BaseCallbackHandler class, a fundamental abstraction that provides hooks into the execution lifecycle of chains, allowing developers to implement custom logic at different stages of processing. This class can be extended to precisely define behaviors that should occur upon completion of a chain’s invocation, enabling sophisticated monitoring, logging, or triggering of downstream processes. By implementing custom callback handlers, developers can capture metrics, persist results to external systems, or dynamically alter the execution flow based on intermediate outputs, making LangChain both flexible and powerful for production-grade language model applications.

Implementing a custom CloudWatch logging callback in LangChain provides a robust solution for maintaining data privacy in multi-account deployments. By extending the BaseCallbackHandler class, we can create a specialized handler that establishes a direct connection to the customer account’s CloudWatch logs, making sure model interaction data remains within the account boundaries. The implementation begins by initializing a Boto3 CloudWatch Logs client using the customer account’s credentials, rather than the operations account’s credentials. This client is configured with the appropriate log group and stream names, which can be dynamically generated based on customer identifiers or application contexts. During model invocations, the callback captures critical metrics such as token usage, latency, prompt details, and response characteristics. The following Python script serves as an example of this implementation:

class CustomCallbackHandler(BaseCallbackHandler):

    def log_to_cloudwatch(self, message: str):
        """Function to write extracted metrics to CloudWatch"""

    def on_llm_end(self, response, **kwargs):
        print("nChat model finished processing.")
        # Extract model_id and token usage from the response
        input_token_count = response.llm_output.get("usage", {}).get("prompt_tokens", None)
        output_token_count = response.llm_output.get("usage", {}).get("completion_tokens", None)
        model_id=response.llm_output.get("model_id", None)

        # Here we invoke the callback
        self.log_to_cloudwatch(
              f"User ID: {self.user_id}nApplication ID: {self.application_id}n Input tokens: {input_token_count}n Output tokens: {output_token_count}n Invoked model: {model_id}"
             )

    def on_llm_error(self, error: Exception, **kwargs):
        print(f"Chat model encountered an error: {error}")

The on_llm_start, on_llm_end, and on_llm_error methods are overridden to intercept these lifecycle events and persist the relevant data. For example, the on_llm_end method can extract token counts, execution time, and model-specific metadata, formatting this information into structured log entries before writing them to CloudWatch. By implementing proper error handling and retry logic within the callback, we provide reliable logging even during intermittent connectivity issues. This approach creates a comprehensive audit trail of AI interactions while maintaining strict data isolation in the customer account, because the logs do not transit through or rest in the operations account.

The AWS Shared Responsibility Model in multi-account logging

When implementing distributed logging for Amazon Bedrock in multi-account architectures, understanding the AWS Shared Responsibility Model becomes paramount. Although AWS secures the underlying infrastructure and services like Amazon Bedrock and CloudWatch, customers remain responsible for securing their data, configuring access controls, and implementing appropriate logging strategies. As demonstrated in our IAM role configurations, customers must carefully craft trust relationships and permission boundaries to help prevent unauthorized cross-account access. The LangChain callback implementation outlined places the responsibility on customers to enforce proper encryption of logs at rest, define appropriate retention periods that align with compliance requirements, and implement access controls for who can view sensitive AI interaction data. This aligns with the multi-account design principle where customer data remains isolated within their respective accounts. By respecting these security boundaries while maintaining operational efficiency, software companies can uphold their responsibilities within the shared security model while delivering scalable AI capabilities across their customer base.

Conclusion

Implementing a secure, scalable multi-tenant architecture with Amazon Bedrock requires careful planning around account structure, access patterns, and operational management. The distributed logging approach we’ve outlined demonstrates how organizations can maintain strict data isolation while still benefiting from centralized AI operations. By using IAM roles with precise trust relationships, AWS STS for secure cross-account authentication, and LangChain callbacks for private logging, companies can create a robust foundation that scales to thousands of customers without compromising on security or operational efficiency.

This architecture addresses the critical challenge of maintaining data privacy in multi-account deployments while still enabling comprehensive observability. Organizations should prioritize automation, monitoring, and governance from the beginning to avoid technical debt as their system scales. Implementing infrastructure as code for role management, automated monitoring of cross-account access patterns, and regular security reviews will make sure the architecture remains resilient and will help maintain adherence with compliance standards as business requirements evolve. As generative AI becomes increasingly central to software provider offerings, these architectural patterns provide a blueprint for maintaining the highest standards of data privacy while delivering innovative AI capabilities to customers across diverse regulatory environments and security requirements.

To learn more, explore the comprehensive Generative AI Security Scoping Matrix through Securing generative AI: An introduction to the Generative AI Security Scoping Matrix, which provides essential frameworks for securing AI implementations. Building on these security foundations, strengthen Amazon Bedrock deployments by getting familiar with IAM authentication and authorization mechanisms that establish proper access controls. As organizations grow to require multi-account structures, these IAM practices connect seamlessly with AWS STS, which delivers temporary security credentials enabling secure cross-account access patterns. To complete this integrated security approach, delve into LangChain and LangChain on AWS capabilities, offering powerful tools that build upon these foundational security services to create secure, context-aware AI applications, while maintaining appropriate security boundaries across your entire generative AI workflow.

About the Authors

Mohammad Tahsin is an AI/ML Specialist Solutions Architect at AWS. He lives for staying up-to-date with the latest technologies in AI/ML and helping customers deploy bespoke solutions on AWS. Outside of work, he loves all things gaming, digital art, and cooking.

Felipe Lopez is a Senior AI/ML Specialist Solutions Architect at AWS. Prior to joining AWS, Felipe worked with GE Digital and SLB, where he focused on modeling and optimization products for industrial applications.

Aswin Vasudevan is a Senior Solutions Architect for Security, ISV at AWS. He is a big fan of generative AI and serverless architecture and enjoys collaborating and working with customers to build solutions that drive business value.