By Aimee Furness and Sam Mallick
Just as surely as lawyers worry whether artificial intelligence will replace them, clients wonder whether AI can make their legal services better or cheaper. When OpenAI released its new ChatGPT model late last year, these questions abounded.
We decided to answer them by putting ChatGPT to the test. Our verdict: The robots are not ready to replace lawyers just yet, and there are still significant limitations in how these programs can help lawyers serve their clients better — at least for now.
First, this article looks at what ChatGPT is. Second, it presents the results of a simple test of ChatGPT's ability to play lawyer. Third, it discusses the legal and ethical issues implicated by the model, under both the Texas Disciplinary Rules of Professional Conduct and court rules of civil procedure.
What is ChatGPT?
OpenAI is an artificial intelligence lab that reportedly could be valued at $29 billion in a contemplated deal.[1] That valuation is thanks largely to the widely discussed launch of its new AI chatbot, ChatGPT, built on the lab's GPT-3.5 family of language models.
ChatGPT can interpret a variety of user utterances: questions, instructions or other text typed by the user. Drawing on a model trained on enormous amounts of text from the internet, it generates a natural language response. As the name suggests, the program is designed specifically for chat functions, but it is capable of providing long-form responses to user utterances, writing poetry and doing other things.
OpenAI states:
We've trained a model called ChatGPT which interacts in a conversational way. The dialogue format makes it possible for ChatGPT to answer follow-up questions, admit its mistakes, challenge incorrect premises, and reject inappropriate requests.
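To make that dialogue format concrete, here is a minimal sketch of how a developer might hold a multi-turn exchange through OpenAI's Python library. The model name and the sample questions are illustrative assumptions, not anything we used; at the time of our test, ChatGPT itself was available only through OpenAI's web interface rather than through a programmatic route like this.

```python
# A minimal sketch, assuming OpenAI's Python SDK ("pip install openai") and an
# API key stored in the OPENAI_API_KEY environment variable. The model name and
# the example questions are illustrative.
from openai import OpenAI

client = OpenAI()

# First turn: a single user utterance.
messages = [{"role": "user", "content": "What is the summary judgment standard in Texas?"}]
first = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(first.choices[0].message.content)

# Follow-up turn: the earlier exchange is sent back along with the new question,
# which is what lets the model answer follow-ups in context.
messages.append({"role": "assistant", "content": first.choices[0].message.content})
messages.append({"role": "user", "content": "How does the no-evidence standard differ?"})
second = client.chat.completions.create(model="gpt-3.5-turbo", messages=messages)
print(second.choices[0].message.content)
```

The design point is simply that each new request resends the prior turns, so the model can respond to follow-up questions with the conversation's history in view.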
ChatGPT is astounding. Academics are lamenting that the traditional essay is dead because ChatGPT's output is as good as or better than that of an average college student.[2] One user built a virtual machine inside it.[3]
It is far from perfect, however. Sometimes it just gets answers wrong.[4] It is also subject to abuse and could potentially help facilitate cybercrime.[5] ChatGPT's shortcomings are not a secret: OpenAI warns users about the possibility of ChatGPT generating wrong or harmful information.
But an AI chatbot trained on vast swaths of the internet that can distill relevant facts into a human-sounding response has seemingly unlimited applications in the legal field: from writing amicus briefs to drafting contracts to generating responses to evidentiary objections at trial.
Andrew Perlman, dean of Suffolk University Law School, even co-wrote a scholarly legal article on the implications of ChatGPT for the legal world with "OpenAI's Assistant," the attribution the chatbot itself suggested.[6]
Meanwhile, DoNotPay, a company that attempts to provide legal services without the involvement of lawyers, has used the GPT-3 application programming interface to create a robot lawyer and is offering $1 million to any lawyer or pro se party with a case before the U.S. Supreme Court who will let the chatbot argue it.[7] The plan is for the lawyer or pro se party to wear AirPods and repeat exactly what the robot says — assuming, of course, the Supreme Court allows the stunt.
DoNotPay is apparently planning to have its robot lawyer litigate traffic tickets in municipal court next month as a literal trial run.
With such obvious implications for the practice of law, we decided to try ChatGPT for ourselves and see how it handles basic litigation issues.
Testing ChatGPT
Legal Research
We tested ChatGPT on various legal topics — specifying the location of the question as Texas. ChatGPT had some accurate responses to questions about general propositions in Texas law. For example, it accurately stated the traditional summary judgment standard. It also correctly summarized the duty of a party issuing a subpoena to a nonparty in Texas litigation.
Despite the success on procedural questions, ChatGPT failed on substantive Texas law. It incorrectly explained the difference between a fixed and floating royalty. ChatGPT also misstated the interplay between the statute of frauds and the claims and damages available if a contract does not comply with the statute.
ChatGPT wholly failed in its response to "What is the most important case on the recovery of attorney fees from the Texas Supreme Court?" — identifying a case that did not even contain the word "fees." And when asked to write a four-paragraph statement on how to collect attorney fees in litigation, it helpfully concluded by stating, "It is important for parties ... to consult with an attorney if they have any questions about their ability to recover attorney's fees in a particular case."
Drafting Motions
We also prepared a simple complaint: Peter Plaintiff is suing Dana Defendant in the U.S. District Court for the Northern District of Texas for declaratory judgment and fraud arising out of the plaintiff's unhappiness after he purchased an ownership stake in the defendant's business. But the plaintiff's complaint is fraught with deficiencies. It's the kind of straightforward fact pattern in which any first-year lawyer could spot a host of issues.
We drafted the full complaint and copied and pasted the whole document into ChatGPT, preceded by a simple prompt:
Prepare a Rule 12 motion to dismiss the following complaint, copied and pasted below, including citations to the Federal Rules of Civil Procedure and case law from the U.S. Court of Appeals for the Fifth Circuit.
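For readers curious about the mechanics, the same request could in principle be sent through OpenAI's API rather than the web interface. The sketch below is an assumption-laden illustration: it presumes the Python SDK, an illustrative model name and a hypothetical complaint.txt file holding the complaint's full text. We simply pasted the document into the chat window.

```python
# A minimal sketch, assuming OpenAI's Python SDK, an OPENAI_API_KEY environment
# variable, and a hypothetical "complaint.txt" file containing the complaint.
from pathlib import Path

from openai import OpenAI

client = OpenAI()

complaint_text = Path("complaint.txt").read_text()

prompt = (
    "Prepare a Rule 12 motion to dismiss the following complaint, copied and "
    "pasted below, including citations to the Federal Rules of Civil Procedure "
    "and case law from the U.S. Court of Appeals for the Fifth Circuit.\n\n"
    + complaint_text
)

response = client.chat.completions.create(
    model="gpt-3.5-turbo",  # illustrative model name, not the ChatGPT web product
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```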
The complaint was so deficient, and so blatantly misstated the law, that it would probably be subject to Rule 11 sanctions. Any summer associate could have drafted a motion that would likely be granted. A great summer associate would have spotted several issues:
- The Declaratory Judgment Act offers a remedy, but standing alone it does not offer a basis for federal question jurisdiction.
- The complaint misstates the requisite amount in controversy for diversity jurisdiction and fails to plead damages even in that lower amount.
- The complaint asserts personal jurisdiction and venue based on the plaintiff's home state, not the defendant's, and the defendant does not have sufficient general or specific contacts with the forum state to establish personal jurisdiction.
- The complaint is vague about the allegations of fraud, failing to state a who, what, when or where under Rule 9(b).
- The complaint fails to state a claim for fraud because it is not plausible the plaintiff was duped into the sale given that the plaintiff admits having done extensive diligence prior to the purchase.
- The complaint's allegations are vague, saying that the defendant represented the business "was a financially successful enterprise" and that the plaintiff was "shocked by how small" his first profits distribution was, without offering even the amount of the check.
ChatGPT's first response was four paragraphs long and spotted only the diversity jurisdiction issue, though its brief analysis on that point was correct. It incorrectly asserted there was no standing and concluded the motion: "In support of this motion, Defendant cites the Federal Rules of Civil Procedure and the Fifth Circuit case law, including Aetna Life Ins. Co. v. Haworth, 300 U.S. 227 (1937)."
It had cited both of those sources earlier, but Haworth is a Supreme Court case, not a Fifth Circuit case. This is also simply an awkward line to include.
We prompted ChatGPT again, asking it to prepare additional sections on personal jurisdiction and the heightened pleading standard for fraud.
It gave some background information about personal jurisdiction, referenced Texas' long-arm statute, stated that the defendant had not committed a tort in Texas and said that there was no other basis for jurisdiction. That was fine, although it covered ground that all judges know and that lawyers rarely retread.
A better response would have pointed out that asserting jurisdiction based on a plaintiff's home state turns jurisdictional rules upside-down, tackling the other side's clear misstatement of law. The chatbot also added a correct statement of the law on the Rule 9(b) pleading standard for fraud and added some thin arguments as to why it was not satisfied here — although it left a lot untilled on that front.
We kept prodding ChatGPT to give more detail, to include more specific references to the facts of the case, and to cite more case law. It did better with more direction, and did the best when asked to draft a strong brief and told what specific sections and subsections to include.
It eventually articulated that the Declaratory Judgment Act is not an independent basis for federal question jurisdiction, but it stated that
in order for the Declaratory Judgment Act to provide a basis for federal question jurisdiction, the action must be founded on a federal question, or the parties must be diverse and the amount in controversy must exceed $75,000.
While it may be hair-splitting, the act would still not be the basis for jurisdiction — the federal question or diversity would be. A judge might pick up on that distinction, but would still grant the motion. This was a vast improvement from the weak opening, but it never reached the expectation we would set for a first-year associate.
One clear benefit ChatGPT has over a lawyer is that it drafts quickly.
When we finally found the right way to phrase the inquiry, it churned out four professional-sounding pages in a couple of minutes. The quality was below the standard that most lawyers would set for themselves, but it bridged the gap between a general outline and a first draft. There could be value in using this type of model to short-circuit a time-consuming step in the writing process.
On the other hand, we had to rely on preexisting knowledge of the law or independent research to get the inquiries right.
In ChatGPT's defense, it was not built to draft motions, while litigators are trained to do precisely that.
We fed the chatbot a simpler task: What would our deadline be to file this motion, and how long could our brief be?
A summer associate could have returned a correct answer in a couple of minutes. ChatGPT, for once, equivocated and said that the deadline and page limit would depend on a number of factors, that it could not determine the answer, and that we should check the local rules.
We asked more directly: Under the Northern District of Texas' local rules, what is the page limit for a brief?
Anyone relying on the chatbot's answer would have missed out on a substantial amount of argument: It told us 14 pages, when the correct answer is 25.[8]
Professional Responsibility and Ethical Implications of ChatGPT
As should be clear by now, there are massive professional responsibility concerns with simply relying on ChatGPT.
Based on our test, submitting what ChatGPT produced would fall short of a lawyer's most basic obligations: the duties of competent and diligent representation.[9] While Rule 3.03 of the Texas Disciplinary Rules of Professional Conduct, which covers candor to the tribunal, imposes discipline only for knowingly making false statements to a court, Rule 3.01 prohibits bringing or defending claims, or asserting or controverting issues, without a reasonable belief that doing so is not frivolous.
Even more definitively, Texas Rules of Civil Procedure, Rule 13, and Federal Rules of Civil Procedure, Rule 11, both require a reasonable inquiry before making a representation to the court.
ChatGPT warns: "While we have safeguards in place, the system may occasionally generate incorrect or misleading information and produce offensive or biased content. It is not intended to give advice." It further warns that it "[m]ay occasionally generate incorrect information ... may occasionally produce harmful instructions or biased content ... [and has] limited knowledge of world and events after 2021." Relying on ChatGPT after reading these warnings would not be a reasonable inquiry.
There are also substantial confidentiality concerns with using ChatGPT. Texas Disciplinary Rules of Professional Conduct, Rule 1.05 is broader than the evidentiary rules governing work product and attorney-client privilege. It generally prohibits revealing client confidential information, whether privileged or not, to anyone outside the lawyer-client relationship without the client's permission.
Attorneys and clients can also agree to a higher degree of confidentiality, and law firm policies are often robust and specific about how client confidential information is stored and managed. Using cloud storage services, for example, could put client confidential information at risk. Many clients and law firms forbid using all but a single, approved cloud storage vendor.
We asked ChatGPT whether it is confidential, and it said yes. But it also posts warnings: "Conversations may be reviewed by our AI trainers to improve our systems," and "Please don't share any sensitive information in your conversations."
Absent a formal confidentiality agreement between a firm and OpenAI, review of a chat containing client confidential information would likely violate Rule 1.05. The fact that OpenAI discloses that its staff may view the chats makes it patently unreasonable to put client confidential information into the program.
Users also have to register for ChatGPT, so if there were a data breach and user IDs were matched with user utterances, client confidential information could be tied to a particular lawyer and firm. More fundamentally, AI learns as it goes. One lawyer insisting on inclusion of particular language in a certain document could be misinterpreted by the model as a rule that such language belongs in all documents of that type, and ChatGPT could give client confidential information to another user.
It is worth noting that the rules of professional conduct do not mean lawyers can never rely on AI: quite the opposite. Comment 8 to Rule 1.01 states that "each lawyer should strive to become and remain proficient and competent in the practice of law, including the benefits and risks associated with relevant technology."
If AI becomes an effective tool available to attorneys, and if AI vendors offer secure services through formal confidentiality agreements with law firms, refusing to use it on principle or out of habit would mean falling short of professional responsibility obligations. Imagine a world where lawyers refused to use Westlaw or LexisNexis.
Attorneys should therefore begin familiarizing themselves with AI now, so that when it crosses the threshold of usefulness, they are ready and willing to incorporate it into their practice. Moreover, lawyers will play an important part in the regulation of AI generally, through legislation, lawsuits, and even updates to professional conduct rules.
In a way, though, ChatGPT is currently the opposite of a lawyer. ChatGPT delivers often-wrong answers with a high degree of confidence.[10] Meanwhile, good lawyers deliver well-researched information with a substantially moderated level of confidence, couching answers with "maybe" and "probably" and "it depends."
Lawyers are highly self-conscious about the consequences of being wrong, and even more aware of the potential disaster that giving a wrong answer confidently could have on their case, their client, or their law license. ChatGPT, meanwhile, does not know or care if it is wrong.
And maybe therein lies the biggest difference between ChatGPT and human lawyers: for ChatGPT, the law is just a novelty. For lawyers, the law is a vocation, a profession, something they took an oath to do well. Lawyers understand the stakes for clients and their own reputations if they fail.
Perhaps that is why we take it so personally when someone suggests that AI is replacing us. Maybe one day AI will be developed enough to substantially disrupt the practice of law. But today is not that day.
Update: This article has been updated to reflect the authors' ChatGPT test on legal research.
Aimee Furness is a partner and Sam Mallick is an associate at Haynes and Boone LLP.
The opinions expressed are those of the author(s) and do not necessarily reflect the views of their employer, its clients, or Portfolio Media Inc., or any of its or their respective affiliates. This article is for general information purposes and is not intended to be and should not be taken as legal advice.
[1] Erin Griffith and Cade Metz, A New Area of A.I. Booms, Even Amid the Tech Gloom, New York Times (Jan. 7, 2023) https://www.nytimes.com/2023/01/07/technology/generative-ai-chatgpt-investments.html.
[2] See Stephen Marche, The College Essay Is Dead, The Atlantic (Dec. 6, 2022) https://www.theatlantic.com/technology/archive/2022/12/chatgpt-ai-writing-college-student-essays/672371/.
[3] See Jonas DeGrave, Building a Virtual Machine inside ChatGPT (accessed Jan. 9, 2023) https://www.engraved.blog/building-a-virtual-machine-inside/.
[4] See Mike Pearl, The ChatGPT chatbot from OpenAI is amazing, creative, and totally wrong, Mashable (Dec. 3, 2022) https://mashable.com/article/chatgpt-amazing-wrong.
[5] See Marco Marcelline, Cybercriminals Using ChatGPT to Build Hacking Tools, Write Code (Jan. 8, 2023) https://www.pcmag.com/news/cybercriminals-using-chatgpt-to-build-hacking-tools-write-code.
[6] See ChatGPT, OpenAI's Assistant, and Perlman, Andrew, The Implications of OpenAI's Assistant for Legal Services and Society (Dec. 5, 2022), https://ssrn.com/abstract=4294197.
[7] Claire Goforth, 'World's first robot lawyer': DoNotPay wants to build an AI to help people fight traffic tickets, Daily Dot (Dec. 13, 2022) https://www.dailydot.com/debug/donotpay-buidling-ai-traffic-tickets-robot-lawyer/.
[8] See N.D. Tex. Local Rule 7.2(c).
[9] See TDRPC 1.01.
[10] See Tim Parsons, The Promise and Peril of ChatGPT, a Remarkably Powerful AI Chatbot, Expert Insights, Johns Hopkins University (Dec. 16, 2022) https://hub.jhu.edu/2022/12/16/what-is-chatgpt-artificial-intelligence-tinglong-dai/.