CodeX – the Stanford Center for Legal Informatics, Casetext, and their research collaborators recently announced a "watershed moment": GPT-4, the latest large language model (LLM), passed the Uniform Bar Exam (UBE). And it did not just barely pass. GPT-4 passed both the multiple-choice and written portions of the test, surpassing not only the scores of previous LLMs but also the average bar exam score, placing in the 90th percentile.

Pablo Arredondo is a CodeX fellow.

Pablo Arredondo (JD ’05), co-founder and Chief Innovation Officer of Casetext and a fellow at CodeX, worked with CodeX affiliates Daniel Katz and Michael Bommarito on a study of GPT-4’s performance on the UBE. In earlier research, Katz and Bommarito had found that an LLM released in late 2022 could not pass the UBE’s multiple-choice section. Their recently published paper, “GPT-4 Passes the Bar Exam,” quickly gained national attention; even The Late Show With Stephen Colbert joked about robo-lawyers running late-night TV advertisements for clients who slip and fall.


Arredondo and his colleagues take this technology seriously. He says that while GPT-4 on its own is not reliable enough for lawyers to use professionally, it is the first large language model “smart” enough to power professional-grade AI products.


Here, Arredondo discusses the implications of this AI breakthrough for the legal profession and the evolution of Casetext’s products.


How do you explain the leap from GPT-3 to GPT-4, in terms of its ability to comprehend text and its facility with the bar exam?

Taken in a broader perspective, the technological advances behind this new generation of AI began some 80 years ago, with the first computational model of the neuron (the McCulloch-Pitts neuron). Recent advances, including GPT-4, have been powered by neural networks, a type of AI loosely modeled on biological neurons, applied to natural language processing. It would be remiss of me not to refer you to a fantastic article by Stanford Professor Chris Manning, director of the Stanford Artificial Intelligence Laboratory; its opening pages provide an excellent history of the developments that led to current models.


You have noted that computational technologies struggle with complex, domain-specific tasks such as those in law, but that with the advancing capabilities of large language models, and GPT-4 in particular, you sought to demonstrate their potential in law. Could you discuss language models and how they have improved for legal work? If it is a learning model, does that mean the more the technology is used by the legal profession, or the more often it takes the bar exam, the more useful a tool it becomes?

The development of large language models is accelerating at an astonishing rate. The study I conducted with CodeX fellows Dan Katz and Michael Bommarito is illustrative: while GPT-3.5 scored in the 10th percentile or lower, GPT-4 not only passed the bar but approached the 90th percentile. Notably, the scale of the model matters more than fine-tuning on law. In our experience, GPT-4 outperforms smaller models that have been fine-tuned for legal tasks. And from a security perspective, it is important that the general model does not retain or learn from attorneys’ information and activity.


What technologies are coming next, and how will they impact the practice of law?

This area is seeing rapid progress; every day I hear about a new application or version. Agentic AI is one of the most interesting areas: LLMs designed to strategize and execute a task on their own, evaluating the situation along the way. You could, for example, ask an agent to book travel for a meeting, and it would check multiple airlines and rent a car if necessary, all without further prompting. Imagine applying this to substantive tasks in law (e.g., “First gather supporting testimony from the deposition, then go through the discovery responses to find additional support,” etc.).


Multi-modal AI is another area of growth, where models look beyond text to include things like vision. This could allow AI to compare video evidence with written testimony, or to comprehend and describe patent diagrams.


Big law firms have certain advantages, and I expect they will want to maintain those advantages with this sort of evolving technology. Do you think AI will level the playing field?


This technology will certainly level the playing field; in fact, it has already begun to do so. And it will not just level the playing field but elevate it.


Can AI-powered technologies such as LLMs help reduce the gap in access to justice?


Absolutely. This may be the most important work LLMs do. Rule 1 of the Federal Rules of Civil Procedure calls for the “just, speedy, and inexpensive” determination of every action. Yet if you asked most people to name three words that come to mind when they think of the legal system, “just,” “speedy,” and “inexpensive” would not be among the top answers. LLMs can increase access to justice by making attorneys more efficient.


We have all read about AI as a double-edged sword. Do you have major concerns? Do you think we’re getting closer to the “Robocop moment”?


Casetext and I agree that, despite the technology’s power, it still requires attorney supervision. This is not a robot lawyer but a powerful tool that helps lawyers better represent their clients. In debates about AI, I believe it is important to distinguish between near-term and long-term questions.


Most of the dramatic claims you hear (e.g., AI will lead us to utopia; AI will lead to our extinction) concern artificial general intelligence (“AGI”), which many believe is decades off and will not be achieved simply by scaling up current methods. In the near term, the discussion is more measured, and I believe that is where the legal profession needs to focus.


At a workshop we recently held during CodeX’s FutureLaw Conference, Professor Larry Lessig raised several immediate concerns about issues such as control and access. Managing partners of law firms have asked what this means for the training of associates: how do you train the next generation of lawyers in a world where a large part of attorney work is delegated to AI? I am more interested in these questions than in the apocalyptic predictions, though it is good to see some people focusing on the long-term implications.

Pablo Arredondo, a fellow at CodeX – the Stanford Center for Legal Informatics, is co-founder and Chief Innovation Officer of Casetext, a legal AI company. Casetext’s CoCounsel platform, powered by GPT-4, assists attorneys with document review, legal research memos, deposition preparation, and contract analysis, among other tasks. Arredondo’s work at CodeX focuses on civil litigation, with an emphasis on how litigators access and assemble the law. He earned his JD from Stanford Law School in 2005 and his undergraduate degree from the University of California, Berkeley.
