Title: AI-Based Claims Handling: A Systematic Performance and Bias Assessment of Large Language Models for Automated Insurance Claims Handling

Authors: Philipp Winder, Laurenz Hommel, Christian Alexander Hildebrand

Type: working paper

Dates: 2025-05-28, 2025-05-28, 2025-05-27

URL: https://www.alexandria.unisg.ch/handle/20.500.14171/122793

Abstract: The integration of Large Language Models (LLMs) into insurance operations is expected to transform traditional insurance claims handling. Despite anecdotal evidence from single cases reported by insurance providers and agencies, a systematic assessment of the performance and potential biases of LLMs in claims handling across a wide variety of insurance domains is absent. This paper presents a systematic evaluation of LLM-powered claims handling across three studies. First, we benchmark LLM performance against human insurance clerks using standardized exam cases. Second, we assess the performance and potential biases of LLM-based claims handling in simulated cases across four major insurance domains and tasks. Third, we examine the applicability of LLMs by comparing their claim assessments to those of an expert third-party administrator based on real-world claim cases. Our findings show that LLMs process claims not only efficiently (taking seconds to process image data and to process and match text data from policy documents and damage reports) but also highly accurately (close to perfect assessments for simple claims and up to 94% accuracy for more complex claims involving multiple parties and policies), often exceeding human benchmarks and reducing claims leakage. These findings have important implications for the future of automated claims handling and point to the potential to substantially improve not only the speed but also the accuracy of claims handling in industry practice.

Keywords: LLM; Generative AI; Algorithmic Bias; Claims Handling; Claim Settlement; Claim Leakage