Turbocharging premium audit capabilities with the power of generative AI: Verisk’s journey toward a sophisticated conversational chat platform to enhance customer support

This post is co-written with Sajin Jacob, Jerry Chen, Siddarth Mohanram, Luis Barbier, Kristen Chenowith, and Michelle Stahl from Verisk.

Verisk (Nasdaq: VRSK) is a leading data analytics and technology partner for the global insurance industry. Through advanced analytics, software, research, and industry expertise across more than 20 countries, Verisk helps build resilience for individuals, communities, and businesses. The company is committed to ethical and responsible AI development with human oversight and transparency. Verisk is using generative AI to enhance operational efficiencies and profitability for insurance clients while adhering to its ethical AI principles.

Verisk’s Premium Audit Advisory Service (PAAS®) is the leading source of technical information and training for premium auditors and underwriters. PAAS helps users classify exposure for commercial casualty insurance, including general liability, commercial auto, and workers’ compensation. PAAS offers a wide range of essential services, including more than 40,000 classification guides and more than 500 bulletins. PAAS now includes PAAS AI, the first commercially available interactive generative-AI chats specifically developed for premium audit, which reduces research time and empower users to make informed decisions by answering questions and quickly retrieving and summarizing multiple PAAS documents like class guides, bulletins, rating cards, etc.

In this post, we describe the development of the customer support process in PAAS, incorporating generative AI, the data, the architecture, and the evaluation of the results. Conversational AI assistants are rapidly transforming customer and employee support. Verisk has embraced this technology and developed its own PAAS AI, which provides an enhanced self-service capability to the PAAS platform.

The opportunity

The Verisk PAAS platform houses a vast array of documents—including class guides, advisory content, and bulletins—that aid Verisk’s customers in determining the appropriate rules and classifications for workers’ compensation, general liability, and commercial auto business. When premium auditors need accurate answers within this extensive document repository, the challenges they face are:

Overwhelming volume – The sheer volume of documents (advisories, bulletins, and so on) makes manual searching time-consuming and inefficient
Slow response times – Finding accurate information within this vast repository can be slow, hindering timely decision-making
Inconsistent quality of responses – Manual searches might yield irrelevant or incomplete results, leading to uncertainty and potential errors

To address this issue, Verisk PAAS AI is designed to alleviate the burden by providing round-the-clock support for business processing and delivering precise and quick responses to customer queries. This technology is deeply integrated into Verisk’s newly reimagined PAAS platform, using all of Verisk’s documentation, training materials, and collective expertise. It employs a retrieval augmented generation (RAG) approach and a combination of AWS services alongside proprietary evaluations to promptly answer most user questions about the capabilities of the Verisk PAAS platform.

When deployed at scale, this PAAS AI will enable Verisk staff to dedicate more time to complex issues, critical projects, and innovation, thereby enhancing the overall customer experience. Throughout the development process, Verisk encountered several considerations, key findings, and decisions that provide valuable insights for any enterprise looking to explore the potential of generative AI.

The approach

When creating an interactive agent using large language models (LLMs), two common approaches are RAG and model fine-tuning. The choice between these methods depends on the specific use case and available data. Verisk PAAS began developing a RAG pipeline for its PAAS AI and has progressively improved this solution. Here are some reasons why continuing with a RAG architecture was beneficial for Verisk:

Dynamic data access – The PAAS platform is constantly evolving, adding new business functions and technical capabilities. Verisk needed to make sure its responses are based on the most current information. The RAG approach allows access to continuously updated data, providing responses with the latest information without frequently retraining the model.
Multiple data sources – Besides data recency, another crucial aspect is the ability to draw from multiple PAAS resources to acquire relevant context. The ease of expanding the knowledge base without the need for fine-tuning new data sources makes the solution adaptable.
Reduced hallucinations – Retrieval minimizes the risk of hallucinations compared with free-form text generation because responses come directly from the provided excerpts. Verisk developed an evaluation tool to enhance response quality.
LLM linguistics – Although appropriate context can be retrieved from enterprise data sources, the underlying LLM manages the linguistics and fluency.
Transparency – Verisk aimed to consistently improve the PAAS AI’s response generation ability. A RAG architecture offered the transparency required in the context retrieval process, which would ultimately be used to generate user responses. This transparency helped Verisk identify areas where document restructuring was needed.
Data governance – With diverse users accessing the platform and differing data access permissions, data governance and isolation were critical. Verisk implemented controls within the RAG pipeline to restrict data access based on user permissions, helping to ensure that responses are delivered only to authorized users.

Although both RAG and fine-tuning have their pros and cons, RAG is the best approach for building a PAAS AI on the PAAS platform, given Verisk’s needs for real-time accuracy, explainability, and configurability. The pipeline architecture supports iterative enhancement as the use cases for the Verisk PAAS platform develop.

Solution overview

The following diagram showcases a high-level architectural data flow that highlights various AWS services used in constructing the solution. Verisk’s system demonstrates a complex AI setup, where multiple components interact and frequently call on the LLM to provide user responses. Employing the PAAS platform to manage these varied components was an intuitive decision.

The key components are as follows:

Amazon ElastiCache
Amazon Bedrock
Amazon OpenSearch Service
Snowflake in Amazon
Evaluation API
Feedback loop (implementation in progress)

Amazon ElastiCache

Verisk’s PAAS team determined that ElastiCache is the ideal solution for storing all chat history. This storage approach allows for seamless integration in conversational chats and enables the display of recent conversations on the website, providing an efficient and responsive user experience.

Amazon Bedrock

Anthropic’s Claude, available in Amazon Bedrock, played various roles within Verisk’s solution:

Response generation – When building their PAAS AI, Verisk conducted a comprehensive evaluation of leading LLMs, using their extensive dataset to test each model’s capabilities. Through Amazon Bedrock, Verisk gained streamlined access to multiple best-in-class foundation models (FMs), enabling efficient testing and comparison across key performance criteria. The Amazon Bedrock unified API and robust infrastructure provided the ideal platform to develop, test, and deploy LLM solutions at scale. After this extensive testing, Verisk found Anthropic’s Claude model consistently outperformed across key criteria. Anthropic’s Claude demonstrated superior language understanding in Verisk’s complex business domain, allowing more pertinent responses to user questions. Given the model’s standout results across Verisk PAAS platform use cases, it was the clear choice to power the PAAS AI’s natural language capabilities.
Conversation summarization – When a user asks a follow-up question, the PAAS AI can continue the conversational thread. To enable this, Verisk used Claude to summarize the dialogue to update the context from ElastiCache. The full conversation summary and new excerpts are input to the LLM to generate the next response. This conversational flow allows the PAAS AI to answer user follow-up questions and have a more natural, contextual dialogue, bringing Verisk PAAS closer to having a true AI assistant that can engage in useful, back-and-forth conversations with users.
Keyword extraction – Keywords are extracted from user questions and previous conversations to be used for creating the new summarized prompt and to be input to Verisk’s knowledge base retrievers to perform vector similarity search.

Amazon OpenSearch Service

Primarily used for the storage of text embeddings, OpenSearch facilitates efficient document retrieval by enabling rapid access to indexed data. These embeddings serve as semantic representations of documents, allowing for advanced search capabilities that go beyond simple keyword matching. This semantic search functionality enhances the system’s ability to retrieve relevant documents that are contextually similar to the search queries, thereby improving the overall accuracy and speed of data queries. Additionally, OpenSearch functions as a semantic cache for similarity searches, optimizing performance by reducing the computational load and improving response times during data retrieval operations. This makes it an indispensable tool in the larger PAAS ecosystem, where the need for quick and precise information access is paramount.

Snowflake in Amazon

The integration of Snowflake in the PAAS AI ecosystem helps provide scalable and real-time access to data, allowing Verisk to promptly address customer concerns and improve its services. By using Snowflake’s capabilities, Verisk can perform advanced analytics, including sentiment analysis and predictive modeling, to better understand customer needs and enhance user experiences. This continuous feedback loop is vital for refining the PAAS AI and making sure it remains responsive and relevant to user demands.

Structuring and retrieving the data

An essential element in developing the PAAS AI’s knowledge base was properly structuring and effectively querying the data to deliver accurate answers. Verisk explored various techniques to optimize both the organization of the content and the methods to extract the most relevant information:

Chunking – A key step in preparing the accumulated questions and answers was splitting the data into individual documents to facilitate indexing into OpenSearch Service. Rather than uploading large files containing multiple pages of content, Verisk chunked the data into smaller segments by document section and character lengths. By splitting the data into small, modular chunks focused on a single section of a document, Verisk could more easily index each document and had greater success in pulling back the correct context. Chunking the data also enabled straightforward updating and reindexing of the knowledge base over time.
Hybrid query – When querying the knowledge base, Verisk found that using just standard vector search wasn’t enough to retrieve all the relevant contexts pertaining to a question. Therefore, a solution was implemented to combine a sparse bm25 search in combination with the dense vector search to create a hybrid search approach, which yielded much better context retrieval results.
Data separation and filters – Another issue Verisk ran into was that, because of the vast amount of documents and the overlapping content within certain topics, incorrect documents were being retrieved for some questions that asked for specific topics that were present across multiple sources—some of these weren’t needed or appropriate in the context of the user’s question. Therefore, data separation was implemented to split the documents based on document type and filter by line of business to improve context retrieval within the application.

By thoroughly experimenting and optimizing both the knowledge base powering the PAAS AI and the queries to extract answers from it, Verisk was able to achieve very high answer accuracy during the proof of concept, paving the way for further development. The techniques explored—hybrid querying, HTML section chunking, and index filtering—became core elements of Verisk’s approach for extracting quality contexts.

LLM parameters and models

Experimenting with prompt structure, length, temperature, role-playing, and context was key to improving the quality and accuracy of the PAAS AI’s Claude-powered responses. The prompt design guidelines provided by Anthropic were incredibly helpful.

Verisk crafted prompts that provided Anthropic’s Claude with clear context and set roles for answering user questions. Setting the temperature to 0 helped reduce the randomness and indeterministic nature of LLM-generated responses.

Verisk also experimented with different models to improve the efficiency of the overall solution. For scenarios where latency was more important and less reasoning was required, Anthropic’s Claude Haiku was the perfect solution. For other scenarios such as question answering using provided contexts where it was more important for the LLM to be able to understand every detail given in the prompt, Anthropic’s Claude Sonnet was the better choice to balance latency, performance, and cost.

Guardrails

LLM guardrails were implemented in the PAAS AI project using both the guardrails provided by Amazon Bedrock and specialized sections within the prompt to detect unrelated questions and prompt attack attempts. Amazon Bedrock guardrails can be attached to any Amazon Bedrock model invocation call and automatically detect if the given model input and output are in violation of the language filters that are set (violence, misconduct, sexual, and so on), which helps with screening user inputs. The specialized prompts further improve LLM security by creating a second net that uses the power of the LLMs to catch any inappropriate inputs from the users.

This allows Verisk to be confident that the model will only answer to its intended purpose surrounding premium auditing services and will not be misused by threat actors.

After validating several evaluation tools such as Deepeval, Ragas, Trulens, and so on, the Verisk PAAS team realized that there were certain limitations to using these tools for their specific use case. Consequently, the team decided to develop its own evaluation API, shown in the following figure.

This custom API evaluates the answers based on three major metrics:

Answer relevancy score – Using LLMs, the process assesses whether the answers provided are relevant to the customer’s prompt. This helps make sure that the responses are directly addressing the questions posed.
Context relevancy score – By using LLMs, the process evaluates whether the context retrieved is appropriate and aligns well with the question. This helps make sure that the LLM has the appropriate and accurate contexts to generate a response.
Faithfulness score – Using LLMs, the process checks if the responses are generated based on their retrieved context or if they are hallucinated. This is crucial for maintaining the integrity and reliability of the information provided.

This custom evaluation approach helps make sure that the answers generated are not only relevant and contextually appropriate but also faithful to the established generative AI knowledge base, minimizing the risk of misinformation. By incorporating these metrics, Verisk has enhanced the robustness and reliability of their PAAS AI, providing customers with accurate and trustworthy responses.

The Verisk PAAS team has implemented a comprehensive feedback loop mechanism, shown in the following figure, to support continuous improvement and address any issues that might arise.

This feedback loop is structured around the following key components:

Customer feedback analysis – The team actively collects and analyzes feedback from customers to identify potential data issues or problems with the generative AI responses. This analysis helps pinpoint specific areas that need improvement.
Issue categorization – After an issue is identified, it’s categorized based on its nature. If it’s a data-related issue, it’s assigned to the internal business team for resolution. If it’s an application issue, a Jira ticket is automatically created for the PAAS IT team to address and fix the problem.
QA test case updates – The system provides an option to update QA test cases based on the feedback received. This helps make sure that the test scenarios remain relevant and comprehensive, covering a wide range of potential issues.
Ground truth agreements – Ground truth agreements, which serve as the benchmark for evaluating LLM response quality, are periodically reviewed and updated. This helps make sure that the evaluation metrics remain accurate and reflective of the desired standards.
Ongoing evaluations – Regular evaluations of the LLM responses are conducted using the updated QA test cases and ground truth agreements. This helps in maintaining high-quality responses and quickly addressing any deviations from the expected standards.

This robust feedback loop mechanism enables Verisk to continuously fine-tune the PAAS AI, making sure that it delivers precise, relevant, and contextually appropriate answers to customer queries. By integrating customer feedback, categorizing issues efficiently, updating test scenarios, and adhering to stringent evaluation protocols, Verisk maintains a high standard of service and drives continuous improvement in its generative AI capabilities.

Business impact

Verisk initially rolled out the PAAS AI to one beta customer to demonstrate real-world performance and impact. Supporting a customer in this way is a stark contrast to how Verisk has historically engaged with and supported customers in the past, where Verisk would typically have a team allocated to interact with the customer directly. Verisk’s PAAS AI has revolutionized the way subject matter experts (SMEs) work and cost-effectively scales while still providing high-quality assistance. What previously took hours of manual review can now be accomplished in minutes, resulting in an extraordinary 96–98% reduction in processing time per specialist. This dramatic improvement in efficiency not only streamline operations but also allows Verisk’s experts to focus on more strategic initiatives that drive greater value for the organization.

In analyzing this early usage data, Verisk uncovered additional areas where it can drive business value for its customers. As Verisk collects additional information, this data will help uncover what will be needed to improve results and prepare to roll out to a wider customer base of approximately 15,000 users.

Ongoing development will focus on expanding these capabilities, prioritized based on the collected questions. Most exciting, though, are the new possibilities on the horizon with generative AI. Verisk knows this technology is rapidly advancing and is eager to harness innovations to bring even more value to customers. As new models and techniques emerge, Verisk plans to adapt the PAAS AI to take advantage of the latest capabilities. Although the PAAS AI currently focuses on responding to user questions, this is only the starting point. Verisk plans to quickly improve its capabilities to proactively make suggestions and configure functionality directly in the system itself. The Verisk PAAS team is inspired by the challenge of pushing the boundaries of what’s possible with generative AI and is excited to test those boundaries.

Conclusion

Verisk’s development of a PAAS AI for its PAAS platform demonstrates the transformative power of generative AI in customer support and operational efficiency. Through careful data harvesting, structuring, retrieval, and the use of LLMs, semantic search functionalities, and stringent evaluation protocols, Verisk has crafted a robust system that delivers accurate, real-time answers to user questions. By continuing to enhance the PAAS AI’s features while maintaining ethical and responsible AI practices, Verisk is set to provide increased value to its customers, enable staff to concentrate on innovation, and establish new benchmarks for customer service in the insurance sector.

For more information, see the following resources:

Explore generative AI on AWS
Learn about unlocking the business value of generative AI
Learn more about Anthropic’s Claude 3 models on Amazon Bedrock
Learn about Amazon Bedrock and how to build and scale generative AI applications with FMs
Explore generative AI quickstart proofs of concept

About the Authors

Sajin Jacob is the Director of Software Engineering at Verisk, where he leads the Premium Audit Advisory Service (PAAS) development team. In this role, Sajin plays a crucial part in designing the architecture and providing strategic guidance to eight development teams, optimizing their efficiency and ensuring the maintainability of all solutions. He holds an MS in Software Engineering from Periyar University, India.

Jerry Chen is a Lead Software Developer at Verisk, based in Jersey City. He leads the GenAi development team, working on solutions for projects within the Verisk Underwriting department to enhance application functionalities and accessibility. Within PAAS, he has worked on the implementation of the conversational RAG architecture with enhancements such as hybrid search, guardrails, and response evaluations. Jerry holds a degree in Computer Science from Stevens Institute of Technology.

Sid Mohanram is the Senior Vice President of Core Lines Technology at Verisk. His area of expertise includes data strategy, analytics engineering, and digital transformation. Sid is head of the technology organization with global teams across five countries. He is also responsible for leading the technology transformation for the multi-year Core Lines Reimagine initiative. Sid holds an MS in Information Systems from Stevens Institute of Technology.

Luis Barbier is the Chief Technology Officer (CTO) of Verisk Underwriting at Verisk. He provides guidance to the development teams’ architectures to maximize efficiency and maintainability for all underwriting solutions. Luis holds an MBA from Iona University.

Kristen Chenowith, MSMSL, CPCU, WCP, APA, CIPA, AIS, is PAAS Product Manager at Verisk. She is currently the product owner for the Premium Audit Advisory Service (PAAS) product suite, including PAAS AI, a first to market generative AI chat tool for premium audit that accelerates research for many consultative questions by 98% compared to traditional methods. Kristen holds an MS in Management, Strategy and Leadership at Michigan State University and a BS in Business Administration at Valparaiso University. She has been in the commercial insurance industry and premium audit field since 2006.

Michelle Stahl, MBA, CPCU, AIM, API, AIS, is a Digital Product Manager with Verisk. She has over 20 years of experience building and transforming technology initiatives for the insurance industry. She has worked as a software developer, project manager, and product manager throughout her career.

Arun Pradeep Selvaraj is a Senior Solutions Architect at AWS. Arun is passionate about working with his customers and stakeholders on digital transformations and innovation in the cloud while continuing to learn, build, and reinvent. He is creative, fast-paced, deeply customer-obsessed, and uses the working backward process to build modern architectures to help customers solve their unique challenges. Connect with him on LinkedIn.

Ryan Doty is a Solutions Architect Manager at AWS, based out of New York. He helps financial services customers accelerate their adoption of the AWS Cloud by providing architectural guidelines to design innovative and scalable solutions. Coming from a software development and sales engineering background, the possibilities that the cloud can bring to the world excite him.

Apoorva Kiran, PhD, is a Senior Solutions Architect at AWS, based out of New York. He is aligned with the financial service industry, and is responsible for providing architectural guidelines to design innovative and scalable fintech solutions. He specializes in developing and commercializing artificial intelligence and machine learning products. Connect with him on LinkedIn.