Keeping up with AI is a full-time job; I would know, since I’m looking for one! Every day, new start-ups launch on GPT, new extensions appear for Stable Diffusion, and new research papers announce major advancements. If you’re thinking about an AI start-up, someone else is thinking of the same idea, and you’re both likely too late, because a third person has already raised funding for it. Anyways, all kidding aside, I thought I’d dive a little deeper into where start-ups need to compete, and that’s leveraging open source.
Stable Diffusion in particular has been very challenging to keep up with, because so many different techniques, models, and new ideas keep getting released:
(The Extensions list on Automatic1111 in Stable Diffusion keeps growing)
The list is almost impossible to stay current on. How are things moving so fast? In only six months or so, we went from “Oh cool, you can generate a wacky image from text!” to “Holy crap, you can generate anything that can fool AI detection, plus video, depth models, and more?!”
Well, in Stable Diffusion’s case the answer is pretty straightforward: it’s open source. With an open-source platform, the entire community can contribute and build composable layers on top of each other, moving Generative Art forward at a blisteringly fast pace. But this article won’t be about Generative Art or Waifus (sorry); let’s talk about why this kind of community involvement hadn’t really happened in the LLM space until this month.
Even with multiple institutions trying to play catch-up, OpenAI still has a commanding lead with ChatGPT and its GPT-4 model, which can seemingly do anything. With OpenAI so far ahead, many companies are just giving up and using its API instead. After all, what’s the point of creating your own technology when everyone using GPT-4 will beat your start-up idea anyway? Going to market quickly and raising money feel more important than proprietary tech right now, but that will come back to bite many of these ideas when Microsoft adds more features to its productivity stack, or when OpenAI ships a plugin that automates the same flow.
However, a month ago, Facebook Research’s attempt to join the LLM bandwagon, LLaMA, got leaked via a GitHub pull request. With the weights now public, anybody could start using them, and an open-source LLM community was born…
(The first two models sit at 36GB, much larger than generative art models)
But weight! I mean, but wait! Why do I care about some random files when I don’t know how to run them?
Because these weights are now stored locally, they are not bound by the API limitations OpenAI imposes on everyone else. You can feed in any number of instruction, input, and output examples and pull the responses closer and closer to what they should sound like. Within days, Stanford came out with Alpaca, a fine-tuned version of LLaMA. By taking LLaMA 7B (the smallest model) and training it on another 52K instruction-following examples, they created Alpaca 7B. What would normally take millions of dollars of training cost only about $500. Ooooh Alpaca, now I get the name.
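Each of those 52K examples is just a templated chunk of text with an instruction, an optional input, and the desired output. Here’s a minimal sketch in Python; the template mirrors the one published in Stanford’s Alpaca repo, though treat the exact wording as an approximation:

```python
# Sketch of the Alpaca-style instruction/input/output training format.
# The template text approximates the one in Stanford's Alpaca repo.

def format_example(instruction: str, output: str, input: str = "") -> str:
    """Render one fine-tuning example as a single prompt + response string."""
    if input:
        return (
            "Below is an instruction that describes a task, paired with an input "
            "that provides further context. Write a response that appropriately "
            "completes the request.\n\n"
            f"### Instruction:\n{instruction}\n\n"
            f"### Input:\n{input}\n\n"
            f"### Response:\n{output}"
        )
    return (
        "Below is an instruction that describes a task. Write a response that "
        "appropriately completes the request.\n\n"
        f"### Instruction:\n{instruction}\n\n"
        f"### Response:\n{output}"
    )

print(format_example("Name the llama's smaller cousin.", "The alpaca."))
```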
So now I can run LLaMA on my computer, and it’s pretty awesome, though it requires some tweaking…:
(Apparently I created Facebook Llama)
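If you’re curious what the tweaking looks like, here’s a rough sketch, assuming you’ve converted the leaked weights to Hugging Face format (the local path below is hypothetical) and installed transformers 4.28+ plus accelerate; this is one way to do it, not the only one:

```python
# Minimal local inference sketch with Hugging Face transformers.
# Assumes the weights were already converted to HF format at a
# hypothetical local path.
import torch
from transformers import LlamaForCausalLM, LlamaTokenizer

model_path = "./llama-7b-hf"  # hypothetical path to converted weights

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,  # half precision to fit in 24GB of VRAM
    device_map="auto",          # requires the accelerate package
)

inputs = tokenizer("The meaning of life is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```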
But wait! Now that I can run an LLM on my computer, why does it matter? Well, first, there were a few limitations up to this point (two weeks ago). The biggest is that I’m running a pretty beefy machine with a 4090 GPU and 24GB of VRAM. That isn’t feasible for anyone on a laptop or a mobile phone. So the first order of business is to make this more efficient. Enter llama.cpp.
llama.cpp made it possible to run LLaMA on a MacBook M1 by rewriting it in C++ to run on CPUs. It also uses 4-bit quantization, which is another way of saying “the weights take a quarter of the memory, so the model fits on much smaller machines,” bringing the dream of running these models locally within reach for many more people. Someone even got it working on a Raspberry Pi (though it takes 10 seconds to generate each word…).
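To get an intuition for what 4-bit quantization does, here’s a toy sketch; this is not llama.cpp’s actual scheme, just the core idea of trading a sliver of precision for a roughly 4x smaller memory footprint than 16-bit floats:

```python
# Toy 4-bit quantization: map floats to 16 integer levels plus one scale.
# llama.cpp's real scheme quantizes weights in small blocks, each with its
# own scale; this single-scale version just illustrates the idea.
import numpy as np

def quantize_4bit(weights: np.ndarray) -> tuple[np.ndarray, float]:
    """Map float weights to integers in [-8, 7] plus a shared scale."""
    scale = np.abs(weights).max() / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale

def dequantize_4bit(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights at inference time."""
    return q.astype(np.float32) * scale

w = np.random.randn(8).astype(np.float32)
q, scale = quantize_4bit(w)
print(w)
print(dequantize_4bit(q, scale))  # close to w, at a quarter of the memory
```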
Okay, now we’re at a point where we can run an LLM on a laptop. What’s next? Well, what if we made it even more lightweight? Andriy Mulyar came up with GPT4All, which is fine-tuned on a whopping 800K instruction examples on top of LLaMA; it will soon be rebuilt on GPT-J to get around distribution issues, so Facebook’s lawyers don’t come knocking. Not only that, the model is now below 4GB!
Then came Vicuna, another fine-tuned model, this one trained on over 70K user conversations shared through ShareGPT.com. Training it was even cheaper, at about $140. In their evaluation, with GPT-4 as the judge, it achieves roughly 90% of ChatGPT’s quality.
So now we have the trifecta: fast performance, portability, and a much more capable LLM, all within weeks of the LLaMA leak. The power of open source is real! But that’s not all…
What if we fine-tuned the model for specific use cases? To do this, you need to provide some sort of instruction, an example input, and an example output. What’s the most important use case you can think of? Well, the first thing off the top of Replicate.com’s mind was a language model that speaks in the voice of Homer Simpson.
To do this, Replicate took all of the Simpsons scripts from seasons 1 through 12 and used Homer’s lines as the outputs, with the preceding line of dialogue as each input. Then, given a prompt to voice Homer’s response, the language model gave a pretty accurate output:
With 60K lines of dialogue and only 90 minutes of training, they had a bot that could talk exactly like Homer Simpson. On this one job it would outperform GPT-4, precisely because it’s fine-tuned for that specific use case. Best of all, none of the data is uploaded to OpenAI for them to use in future models. If there is one thing the open-source community can do, it’s develop these use cases faster than OpenAI can.
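The data prep behind a fine-tune like this is simple enough to sketch. The file and column names below are hypothetical, but the idea follows what Replicate describes: each of Homer’s lines becomes an output, and the line spoken just before it becomes the input:

```python
# Hedged sketch of building (input, output) pairs from script data.
# The CSV path and column names are hypothetical placeholders.
import csv
import json

with open("simpsons_scripts_s01_s12.csv") as f:  # hypothetical file
    rows = list(csv.DictReader(f))

examples = []
for prev, cur in zip(rows, rows[1:]):
    if cur["character"] == "Homer Simpson":
        examples.append({
            "instruction": "Respond as Homer Simpson.",
            "input": prev["line"],   # the line spoken just before
            "output": cur["line"],   # Homer's reply
        })

# Write one JSON object per line, ready for instruction fine-tuning.
with open("homer_finetune.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```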
Meta can make a fine-tuned model for each user and have it write social media posts in that user’s voice. Substack can write entire newsletters in each of its authors’ voices. Twitter can make posts in its users’ voices. Pretty soon, more niche industry use cases will start popping up, in fields like sales, ops, and medicine. If start-ups want to survive the arms race, they will need to fine-tune proprietary information into these models to get an edge.
If you think OpenAI has an unbeatable head start, you are looking in the wrong place.