Deep Thoughts

Large Language Models (LLMs): I’ll have the Special Sauce

Written by Cheryl Wilson Griffin | March 7 2023

Though many products use the same LLMs, the value of any product is in its "special sauce."

 
With OpenAI releasing ChatGPT last year and the recent Casetext announcement, it’s become clear that artificial intelligence and machine learning are going to be a part of our day-to-day lives in legal sooner than anyone (or perhaps, sooner than I) thought. With the emergence of generative AI in legal comes a new obligation to develop some baseline understanding of not only its risks and benefits, but also how to evaluate it. After all, how does a lawyer or head of innovation decide whether to move forward with Product A or Product B, when both claim to do a similar thing, without a solid comprehension of what’s driving the solution set each product offers? Enter large language models (or LLMs). For many platforms, the way LLMs are chosen, taught, and controlled acts as the “special sauce” that can set one product apart from another.

 

What the heck is a Large Language Model (LLM)?!

For many of us, this is a new term that we’ve only begun learning about within the past year or two. According to tech company Nvidia, an LLM is a “deep learning algorithm that can recognize, summarize, translate, predict and generate text and other content based on knowledge gained from massive datasets.” Said another way, these models are trained to understand natural (human) language and to use that understanding to make predictions and generate content based on those predictions. 

 

Have you ever had an argument with a spouse, parent, or colleague where you could easily predict the next element of their argument based on the years of past conversations combined with their most recent arguments? Think of an LLM as a more scientific version of that everyday experience – looking back to what you have learned, adding the knowledge of what has just occurred, and predicting what will come next.
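For the curious, here is a toy sketch in Python of that "predict what comes next" idea. It is nowhere near a real LLM: it simply counts which word most often followed each word in a made-up "conversation history" and uses those counts to guess the next word. The function names and the sample text are invented for this illustration.

```python
from collections import Counter, defaultdict

def train_bigram_counts(text):
    """Count, for every word, which words followed it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        followers[current][nxt] += 1
    return followers

def predict_next(followers, word):
    """Return the most frequent follower of `word`, or None if it was never seen."""
    options = followers.get(word.lower())
    if not options:
        return None
    return options.most_common(1)[0][0]

# Hypothetical "years of past conversations" used as training text.
history = (
    "you never take out the trash "
    "you never listen to me "
    "you never take my side"
)
model = train_bigram_counts(history)
print(predict_next(model, "never"))  # prints take ("take" followed "never" twice, "listen" once)
```

A real LLM replaces the simple word counts with billions of learned numbers and looks at far more context than one word, but the basic job is the same: given what came before, predict what comes next.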

 

You’re probably already interacting with language models on a day-to-day basis whether you know it or not. For several years, many companies’ chatbots have employed very specialized models focused on a specific topic, like helping you reset a lost password. Do you ever ask Siri or Google to handle a task for you? They’re also leveraging language models, though on a much more limited basis. Unless you’ve been living under a rock, you’ve certainly heard about OpenAI’s ChatGPT bot, which reached an estimated 100 million active users within two months of launch. Google, Nvidia, Microsoft, Meta, and a host of others have new models in development, with LLMs being used for everything from composing social media posts to helping find cures for cancer.

 

Where’s the secret sauce? 

Many products are using the same LLMs. Doesn’t that mean all the products will be the same? Absolutely not.

[Figure: LLM diagram]

 

Though each LLM has its own unique capabilities, there are a couple of key places where they foundationally differentiate themselves, starting with the number of parameters. Parameters are the parts of the model that are learned from the background training data. Strictly speaking, each parameter is just a number, but as a loose analogy you can think of a parameter as an element of knowledge that can be updated as the model ‘learns’ over time. As a simple example, ‘blue’ might be a parameter that can be updated as a model learns what color the ocean is. The model can update that parameter as it learns more about what oceans look like. And while it is true that an LLM with more parameters has more capacity for learning, the largest LLMs can be huge (hundreds of gigabytes or more), slow to return results, expensive to operate, and not necessarily appropriate for every use case.
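To put rough numbers on model size, here is a back-of-the-envelope sketch in Python. It assumes each parameter is stored as a 16-bit (2-byte) or 32-bit (4-byte) number, which is an assumption about storage rather than a detail from any vendor, and it uses the publicly reported figure of 175 billion parameters for GPT-3. The function name model_size_gb is made up for this illustration.

```python
def model_size_gb(num_parameters, bytes_per_parameter=2):
    """Approximate storage for the model's parameters alone, in gigabytes (10**9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9

# Publicly reported parameter count for GPT-3.
gpt3_parameters = 175_000_000_000

print(model_size_gb(gpt3_parameters))     # prints 350.0 (assuming 2 bytes per parameter)
print(model_size_gb(gpt3_parameters, 4))  # prints 700.0 (assuming 4 bytes per parameter)
```

Under those assumptions, the parameters of a model that size take up hundreds of gigabytes before you count anything else needed to run it, which is part of why the biggest models are slow and expensive to operate.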

 

Regardless of the model’s capacity to learn (i.e., the number of available parameters), the product’s designers decide how the model will learn over time. Especially in highly confidential industries like legal, this is likely to be one of your core areas of evaluation and inquiry: how does this product use my data and my clients’ data to teach its model? For instance, according to its recent announcement, Casetext’s new AI legal assistant CoCounsel doesn’t store any content its customers upload, meaning “none of the information used in CoCounsel is sent back to ‘train’ OpenAI’s model.” This seems to mean that customer data is not being used to train the model for the public’s use, but it also means that the results it provides probably won’t get better based on your particular interactions with the model over time, because it will “forget” what you’ve previously told it.

 

In contrast, other products allow their LLMs to learn from interactions with customers, with the goal of improving the quality of the results they deliver over time. Facebook uses reinforcement learning techniques to understand which notifications should be delivered, and at what times, to be most relevant to you by learning from the entire Facebook user base’s interaction with the platform. Adopting yet another approach, the discovery and investigation platform Altumatim uses reinforcement learning techniques to teach the model about your specific case and improve the results you receive for that case. Instead of the model learning from everyone for everyone, its model learns from you for you.

 

Finally, there are any number of processes and controls that can be included to make the “special sauce” even more special. For instance, a product might limit how many questions it can ask you before delivering results, or limit its ability to produce results with offensive language. Or, if a task is particularly complex, the product might combine different LLMs running in sequence or in parallel to provide further interpretation and prediction.
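As a rough illustration of how such controls can wrap around the models themselves, here is a minimal Python sketch. Everything in it is hypothetical: the two "models" are simple stand-in functions rather than real LLMs, and the blocklist contains a placeholder word. It shows two models running in sequence, followed by a language filter that decides whether the result is delivered.

```python
# Hypothetical blocklist; a real product would use something far more sophisticated.
BLOCKED_WORDS = {"badword"}

def draft_model(question):
    """Stand-in for a first LLM that drafts an answer."""
    return f"Draft answer to: {question}"

def review_model(draft):
    """Stand-in for a second LLM that refines the first model's draft."""
    return draft.replace("Draft answer", "Reviewed answer")

def passes_language_filter(text):
    """Return True if none of the blocked words appear in the text."""
    words = {w.strip(".,!?:").lower() for w in text.split()}
    return words.isdisjoint(BLOCKED_WORDS)

def answer(question):
    """Run the two stand-in models in sequence, then apply the filter."""
    result = review_model(draft_model(question))
    if not passes_language_filter(result):
        return "Response withheld by content filter."
    return result

print(answer("What is an LLM?"))        # prints Reviewed answer to: What is an LLM?
print(answer("Tell me about badword"))  # prints Response withheld by content filter.
```

Two products could plug the very same underlying model into draft_model and still behave quite differently, depending on what they chain after it and which controls they put around it.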

 

What’s next?

You should expect to see regular announcements about new products leveraging large language models as more companies find ways to incorporate the technology. In some cases, companies will be very transparent about which LLMs are being leveraged by their products. We know from their own marketing materials that Casetext is using OpenAI’s GPT models, as are Jasper.ai and the world-famous ChatGPT bot. Other companies may be less wed to specific models and thus less likely to share specifics. Altumatim, for instance, says only that it has adopted a multi-layer LLM architecture for performance and accuracy.

 

And, though there is plenty of room to continue expanding the vision for how we use these technologies today, there’s potentially game-changing hardware waiting just around the corner. Enter quantum computing, a form of high-performance computing that leverages the science of quantum physics with the aim of drastically increasing computing power for certain kinds of problems. Though largely theoretical to date, quantum computing services are starting to become commercially available, which could eventually make it possible to deal with big, messy data sets far more quickly than ever before.

 

Hey legal friends, do we know anyone with big, messy data sets?