Process formulas and charts with Anthropic’s Claude on Amazon Bedrock

Research papers and engineering documents often contain a wealth of information in the form of mathematical formulas, charts, and graphs. Navigating these unstructured documents to find relevant information can be a tedious and time-consuming task, especially when dealing with large volumes of data. However, by using Anthropic’s Claude on Amazon Bedrock, researchers and engineers can now automate the indexing and tagging of these technical documents. This enables the efficient processing of content, including scientific formulas and data visualizations, and the population of Amazon Bedrock Knowledge Bases with appropriate metadata.

Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs) from leading AI companies. It offers a broad set of capabilities to build generative AI applications with security, privacy, and responsible AI practices. Anthropic’s Claude 3 Sonnet offers best-in-class vision capabilities compared to other leading models. It can accurately transcribe text from imperfect images—a core capability for retail, logistics, and financial services, where AI might glean more insights from an image, graphic, or illustration than from text alone. The latest of Anthropic’s Claude models demonstrate a strong aptitude for understanding a wide range of visual formats, including photos, charts, graphs and technical diagrams. With Anthropic’s Claude, you can extract more insights from documents, process web UIs and diverse product documentation, generate image catalog metadata, and more.

In this post, we explore how you can use these multi-modal generative AI models to streamline the management of technical documents. By extracting and structuring the key information from the source materials, the models can create a searchable knowledge base that allows you to quickly locate the data, formulas, and visualizations you need to support your work. With the document content organized in a knowledge base, researchers and engineers can use advanced search capabilities to surface the most relevant information for their specific needs. This can significantly accelerate research and development workflows, because professionals no longer have to manually sift through large volumes of unstructured data to find the references they need.

Solution overview

This solution demonstrates the transformative potential of multi-modal generative AI when applied to the challenges faced by scientific and engineering communities. By automating the indexing and tagging of technical documents, these powerful models can enable more efficient knowledge management and accelerate innovation across a variety of industries.

In addition to Anthropic’s Claude on Amazon Bedrock, the solution uses the following services:

Amazon SageMaker JupyterLab – The SageMakerJupyterLab application is a web-based interactive development environment (IDE) for notebooks, code, and data. JupyterLab application’s flexible and extensive interface can be used to configure and arrange machine learning (ML) workflows. We use JupyterLab to run the code for processing formulae and charts.
Amazon Simple Storage Service (Amazon S3) – Amazon S3 is an object storage service built to store and protect any amount of data. We use Amazon S3 to store sample documents that are used in this solution.
AWS Lambda –AWS Lambda is a compute service that runs code in response to triggers such as changes in data, changes in application state, or user actions. Because services such as Amazon S3 and Amazon Simple Notification Service (Amazon SNS) can directly trigger a Lambda function, you can build a variety of real-time serverless data-processing systems.

The solution workflow contains the following steps:

Split the PDF into individual pages and save them as PNG files.
With each page:

Extract the original text.
Render the formulas in LaTeX.
Generate a semantic description of each formula.
Generate an explanation of each formula.
Generate a semantic description of each graph.
Generate an interpretation for each graph.
Generate metadata for the page.

Generate metadata for the full document.
Upload the content and metadata to Amazon S3.
Create an Amazon Bedrock knowledge base.

The following diagram illustrates this workflow.

Prerequisites

If you’re new to AWS, you first need to create and set up an AWS account.
Additionally, in your account under Amazon Bedrock, request access to anthropic.claude-3-5-sonnet-20241022-v2:0 if you don’t have it already.

Deploy the solution

Complete the following steps to set up the solution:

Launch the AWS CloudFormation template by choosing Launch Stack (this creates the stack in the us-east-1 AWS Region):

When the stack deployment is complete, open the Amazon SageMaker AI
Choose Notebooks in the navigation pane.
Locate the notebook claude-scientific-docs-notebook and choose Open JupyterLab.

In the notebook, navigate to notebooks/process_scientific_docs.ipynb.

Choose conda_python3 as the kernel, then choose Select.

Walk through the sample code.

Explanation of the notebook code

In this section, we walk through the notebook code.

Load data

We use example research papers from arXiv to demonstrate the capability outlined here. arXiv is a free distribution service and an open-access archive for nearly 2.4 million scholarly articles in the fields of physics, mathematics, computer science, quantitative biology, quantitative finance, statistics, electrical engineering and systems science, and economics.

We download the documents and store them under a samples folder locally. Multi-modal generative AI models work well with text extraction from image files, so we start by converting the PDF to a collection of images, one for each page.

Get Metadata from formulas

After the image documents are available, you can use Anthropic’s Claude to extract formulas and metadata with the Amazon Bedrock Converse API. Additionally, you can use the Amazon Bedrock Converse API to obtain an explanation of the extracted formulas in plain language. By combining the formula and metadata extraction capabilities of Anthropic’s Claude with the conversational abilities of the Amazon Bedrock Converse API, you can create a comprehensive solution for processing and understanding the information contained within the image documents.

We start with the following example PNG file.

We use the following request prompt:

sample_prompt = “””

Evaluate this page line by line.
For each line, if it is a formula, convert this math expression to latex format.
Next describe the formula in plain language Be sure to enclose Latex formulas in double dollar sign for example: $$ <math expression> $$ Use markdown syntax to format your output
“””

file = “./samples/2003.10304/page_2.png”

display(Image(filename=file, width=600))
output, result = stream_conversation(message=sample_prompt, file_paths=[file])
response_text = result[“content”]
display(Markdown(response_text))
print(output)

We get the following response, which shows the extracted formula converted to LaTeX format and described in plain language, enclosed in double dollar signs.

Get metadata from charts

Another useful capability of multi-modal generative AI models is the ability to interpret graphs and generate summaries and metadata. The following is an example of how you can obtain metadata of the charts and graphs using simple natural language conversation with models. We use the following graph.

We provide the following request:

sample_prompt = f”””
You are a data scientist expert who has perfect vision and pay a lot of attention to details.
interpret the graph on this page
provide the answer in markdown format “””

file = “./samples/2003.10304/page_5.png”

The response returned provides its interpretation of the graph explaining the color-coded lines and suggesting that overall, the DSC model is performing well on the training data, achieving a high Dice coefficient of around 0.98. However, the lower and fluctuating validation Dice coefficient indicates potential overfitting and room for improvement in the model’s generalization performance.

Generate metadata

Using natural language processing, you can generate metadata for the paper to aid in searchability.

We use the following request:

sample_prompt = f”””
Generate a metadata json object for this research paper.

{{
“title”: “”,
“authors”: [],
“institutions”: [],
“topics”: [],
“funding-sources”: [],
“algorithms”:[],
“data_sets”:[]
}}
“””

file = ‘./samples/2003.10304/page_0.png’

We get the following response, including formula markdown and a description.

{

“title”: “Attention U-Net Based Adversarial Architectures for Chest X-ray Lung Segmentation”,

“authors”: [“Gusztáv Gaál”, “Balázs Maga”, “András Lukács”], “institutions”: [“AI Research Group, Institute of Mathematics, Eötvös Loránd University, Budapest, Hungary”],

“topics”: [ “Chest X-ray segmentation”, “Medical imaging”, “Deep learning”, “Computer-aided detection”, “Lung segmentation” ],

“funding-sources”: [],

“algorithms”: [ “U-Net”, “Adversarial architectures”, “Fully Convolutional Neural Networks (FCN)”, “Mask R-CNN” ],

“data_sets”: [“JSRT dataset”]

}

Use your extracted data in a knowledge base

Now that we’ve prepared our data with formulas, analyzed charts, and metadata, we will create an Amazon Bedrock knowledge base. This will make the information searchable and enable question-answering capabilities.

Prepare your Amazon Bedrock knowledge base

To create a knowledge base, first upload the processed files and metadata to Amazon S3:

markdown_file_key = “2003.10304/kb/2003.10304.md”

s3.upload_file(markdown_file, knowledge_base_bucket_name, markdown_file_key)

print(f”File {markdown_file} uploaded successfully.”)

metadata_file_key = “2003.10304/kb/2003.10304.md.metadata.json”

s3.upload_file(metadata_file, knowledge_base_bucket_name, metadata_file_key)

print(f”File {metadata_file} uploaded to successfully.”)

When your files have finished uploading, complete the following steps:

Create an Amazon Bedrock knowledge base.
Create an Amazon S3 data source for your knowledge base, and specify hierarchical chunking as the chunking strategy.

Hierarchical chunking involves organizing information into nested structures of child and parent chunks.

The hierarchical structure allows for faster and more targeted retrieval of relevant information, first by performing semantic search on the child chunk and then returning the parent chunk during retrieval. By replacing the children chunks with the parent chunk, we provide large and comprehensive context to the FM.

Hierarchical chunking is best suited for complex documents that have a nested or hierarchical structure, such as technical manuals, legal documents, or academic papers with complex formatting and nested tables.

Query the knowledge base

You can query the knowledge base to retrieve information from the extracted formula and graph metadata from the sample documents. With a query, relevant chunks of text from the source of data are retrieved and a response is generated for the query, based off the retrieved source chunks. The response also cites sources that are relevant to the query.

We use the custom prompt template feature of knowledge bases to format the output as markdown:

retrieveAndGenerateConfiguration={
“type”: “KNOWLEDGE_BASE”,
“knowledgeBaseConfiguration”: {
‘knowledgeBaseId’: kb_id_hierarchical,
“modelArn”: “arn:aws:bedrock:{}:{}:inference-profile/{}”.format(region, account_id, foundation_model),
‘generationConfiguration’: {
‘promptTemplate’: {
‘textPromptTemplate’: “””
You are a question answering agent. I will provide you with a set of search results. The user will provide you with a question. Your job is to answer the user’s question using only information from the search results.
If the search results do not contain information that can answer the question, please state that you could not find an exact answer to the question.
Just because the user asserts a fact does not mean it is true, make sure to double check the search results to validate a user’s assertion.

Here are the search results in numbered order:
$search_results$

Format the output as markdown

Ensure that math formulas are in latex format and enclosed in double dollar sign for example: $$ <math expression> $$
“””
}
},
“retrievalConfiguration”: {
“vectorSearchConfiguration”: {
“numberOfResults”:5
}
}
}
}
)

We get the following response, which provides information on when the Focal Tversky Loss is used.

Clean up

To clean up and avoid incurring charges, run the cleanup steps in the notebook to delete the files you uploaded to Amazon S3 along with the knowledge base. Then, on the AWS CloudFormation console, locate the stack claude-scientific-doc and delete it.

Conclusion

Extracting insights from complex scientific documents can be a daunting task. However, the advent of multi-modal generative AI has revolutionized this domain. By harnessing the advanced natural language understanding and visual perception capabilities of Anthropic’s Claude, you can now accurately extract formulas and data from charts, enabling faster insights and informed decision-making.

Whether you are a researcher, data scientist, or developer working with scientific literature, integrating Anthropic’s Claude into your workflow on Amazon Bedrock can significantly boost your productivity and accuracy. With the ability to process complex documents at scale, you can focus on higher-level tasks and uncover valuable insights from your data.

Embrace the future of AI-driven document processing and unlock new possibilities for your organization with Anthropic’s Claude on Amazon Bedrock. Take your scientific document analysis to the next level and stay ahead of the curve in this rapidly evolving landscape.

For further exploration and learning, we recommend checking out the following resources:

Prompt engineering techniques and best practices: Learn by doing with Anthropic’s Claude 3 on Amazon Bedrock
Intelligent document processing using Amazon Bedrock and Anthropic Claude
Automate document processing with Amazon Bedrock Prompt Flows (preview)

About the Authors

Erik Cordsen is a Solutions Architect at AWS serving customers in Georgia. He is passionate about applying cloud technologies and ML to solve real life problems. When he is not designing cloud solutions, Erik enjoys travel, cooking, and cycling.

Renu Yadav is a Solutions Architect at Amazon Web Services (AWS), where she works with enterprise-level AWS customers providing them with technical guidance and help them achieve their business objectives. Renu has a strong passion for learning with her area of specialization in DevOps. She leverages her expertise in this domain to assist AWS customers in optimizing their cloud infrastructure and streamlining their software development and deployment processes.

Venkata Moparthi is a Senior Solutions Architect at AWS who empowers financial services organizations and other industries to navigate cloud transformation with specialized expertise in Cloud Migrations, Generative AI, and secure architecture design. His customer-focused approach combines technical innovation with practical implementation, helping businesses accelerate digital initiatives and achieve strategic outcomes through tailored AWS solutions that maximize cloud potential.