by Dr. Megan Ma and Klaudia Gak, CodeX Affiliates and VP of Underwriting at Swiss Re
In our previous work, we developed a diagnostic tool to assess elasticity. We argued that elasticity, defined as intentional or strategic vagueness, gives contracting parties the ability to deal with the unknown. We also suggested that elasticity is a marker of the relationship between the parties: the language is embedded with the DNA of the contracting parties’ trust and inherent negotiating power. By understanding the role that vagueness plays in contracts, and by making the implicit explicit in the process, we can offer insight that encourages more intentional contract drafting.
Our research has revealed that linguistic clues known as epistemic stretchers signal the preferences of contracting parties. These stretchers can reveal a party’s intentions and how it uses vagueness strategically. We created an abstract framework and then assigned qualitative scores to the clauses of a contract. The result is a structured model of knowledge that serves as the semantic fingerprint of contractual language.
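To make the idea concrete, the sketch below shows one way such a fingerprint might be represented in code. The class names, the low/medium/high scale, and the example stretchers are our illustrative assumptions, not the framework’s actual schema.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Elasticity(Enum):
    """Qualitative elasticity score for a clause (illustrative scale)."""
    LOW = "low"        # tightly drafted, little room for interpretation
    MEDIUM = "medium"  # some intentional flexibility
    HIGH = "high"      # strategically vague, broad interpretive room


@dataclass
class ClauseFingerprint:
    """One clause's entry in a contract's semantic fingerprint."""
    clause_name: str                                      # e.g., "Access to Records"
    elasticity: Elasticity                                # qualitative score assigned to the clause
    stretchers: List[str] = field(default_factory=list)   # epistemic stretchers found in the text
    rationale: str = ""                                   # why the score was assigned


@dataclass
class SemanticFingerprint:
    """Structured, clause-by-clause model of a contract's elasticity."""
    contract_id: str
    clauses: List[ClauseFingerprint] = field(default_factory=list)


# Toy example with two clauses (wording and scores are purely illustrative)
fingerprint = SemanticFingerprint(
    contract_id="reinsurance-treaty-001",
    clauses=[
        ClauseFingerprint(
            clause_name="Access to Records",
            elasticity=Elasticity.LOW,
            stretchers=["reasonable notice"],
            rationale="Well-established clause with narrow interpretive room.",
        ),
        ClauseFingerprint(
            clause_name="Cyber Exclusion",
            elasticity=Elasticity.HIGH,
            stretchers=["arising directly or indirectly from", "any similar event"],
            rationale="Emerging risk; wording left deliberately open.",
        ),
    ],
)
```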
William O’Hanley (then our Computable Contracts Developer) and I built an early prototype of this tool, called the Stretch Factor.
We imagined that the diagnostic tool would not only identify areas of intentional and strategic vagueness but also areas where context could be formalized to facilitate other symbolic approaches, such as computable contracts. We further argued that, by making explicit the reasons behind elastic contract wording and resolving them over the long term, clauses could be made more precise, and the tool could help us understand party behavior.
When we began discussing what the next phase of this project might be, large language models (LLMs) started to play an important role. The benefits of their scalability and user-friendliness could not be ignored. We envisioned LLMs as the ideal testing ground to determine the impact of measuring elasticity. We hypothesized that the results and observations from testing on LLMs would give us information not only about the generalizability of our framework, but also about its importance relative to risk. We might also be able to gain deeper insights into the contract drafting process. In what circumstances, for example, could we imagine drafting low-elasticity clauses? Does there seem to be a preference for medium elasticity when drafting? Is there a difference between the use of elasticity in clauses and topics that are well established, versus those covering new, not yet tested risks, e.g., cyber exclusions?
We went one step further. We chose to build the next version of the Stretch Factor and launch an early trial for reinsurance attorneys. Using the Quora platform Poe, we created chatbots built on three different LLMs: (1) OpenAI’s GPT-4, (2) Anthropic’s Claude-2-100k, and (3) Meta’s open-source Llama-2-70b.
First, we wanted to see how different LLMs reacted to our Semantic Fingerprint. Second, we wanted to test whether our framework could extend beyond standard clauses, such as “Access to Records”, and be applied to complex clauses; in this instance, we used the Cyber Exclusion clause. Third, we wanted to make the tool highly accessible, so that it could take any clause from a contract or policy and generate an elasticity analysis based on the Semantic Fingerprint. To that end, we evolved our prototype from a simple markup into a report that we felt would be more informative to the user.
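For readers curious how such a clause-to-report pipeline might be wired to one of the underlying models, here is a minimal sketch using OpenAI’s GPT-4 through its Python SDK. The system prompt, report fields, and the analyze_clause helper are hypothetical stand-ins for illustration; they are not the Stretch Factor’s actual prompts or implementation.

```python
from openai import OpenAI  # assumes the openai Python SDK (v1+) is installed

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Hypothetical instructions standing in for the Semantic Fingerprint framework.
SYSTEM_PROMPT = (
    "You analyze the elasticity of contract clauses. Identify epistemic "
    "stretchers, assign a qualitative elasticity score (low, medium, or high), "
    "and explain your reasoning in a short report."
)


def analyze_clause(clause_text: str, model: str = "gpt-4") -> str:
    """Send a single clause to the model and return an elasticity report."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Analyze the following clause:\n\n{clause_text}"},
        ],
        temperature=0,  # keep the analysis as deterministic as possible
    )
    return response.choices[0].message.content


if __name__ == "__main__":
    cyber_exclusion = (
        "This agreement excludes any loss arising directly or indirectly "
        "from any cyber incident, however caused."
    )
    print(analyze_clause(cyber_exclusion))
```

In practice, the same prompt could be pointed at any of the three models hosted on Poe; only the model identifier and client library would change.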
We then conducted a phase of beta testing with a sample of senior contract and underwriting experts. We gave them web access to the three Stretch Factor chatbots and observed their interactions through user interviews. At this stage, we determined that the ideal user is a reinsurance attorney with at least five years’ experience in the industry. This decision was made for two reasons: 1) we wanted to establish a benchmark against which to measure the tool’s performance; and 2) we wanted the tool to serve as a useful metric for encouraging human-machine cooperation.
Having watched the experts put the chatbots to the test, we list a few initial observations below:
- The tool was widely regarded as effective; most of our users found it not only useful but impressive. The Stretch Factor arrived at an opportune moment, as many users were approaching the period when the majority of their clients’ contracts would need to be renegotiated or renewed. It could serve as an additional pair of eyes to assess whether any risks fell outside the contract’s coverage.
- The tool was a great help to some users, as it clarified the “stretchiness” of their clauses. They felt they could better understand the linguistic flexibility of their wording and control how elastic a clause should be. Some confirmed that reinsurance provisions must have some degree of flexibility, but they prefer that it remain in line with underwriting standards. A reinsurance clause’s elasticity is therefore a useful metric to help underwriters determine whether they have drafted the clause in a reasonable manner.
- The tool also allows users to create risk scenarios at the contract level.
Because our sample size was small, we remain cautious. Notably, the user interviews did not reveal much difference between the chatbots. That observation is itself quite informative when you consider that the chatbots were built on three different LLMs. One of the chatbots, in particular, was built on Llama-2-70b with the aim of exploring the possibilities of open-source models and reducing vendor dependency.
In the next round, we anticipate asking more specific questions and expanding the pilot to include more contracting and underwriting experts across the entire insurance value chain.