February was perhaps the biggest month of AI news in a while; on one day alone there were more than a dozen announcements that were true game changers. Let’s cover some of the major news items from the month:
Amazon announced an upcoming chatbot named Rufus, which shoppers can ask about products on its website.
Sam Altman made news when he said he would probably need up to $7T to build out new AI chip manufacturing. That’s more than three Nvidias. OpenAI also announced that they are building agents to automate work processes. ChatGPT also briefly went haywire after a bad model configuration and gave users strange, garbled responses.
A relatively unknown chip company called Groq went mainstream after appearing on the All-In podcast twice, and demos showing how much faster its hardware runs inference than ChatGPT went viral.
Stability announced Stable Cascade, a model that generates images through a three-stage process and can run on much less powerful machines. They also announced Stable Diffusion 3, their best model yet, which can handle multi-subject prompts and add legible text to images without much difficulty.
Google rebranded Bard to Gemini, hoping to shake off the bad press, but ran into major controversy when it filtered prompts such as “white man” and refused to respond. One item that didn’t get much press: Google also announced Gemma, a family of open models released at 2B and 7B parameters.
Reddit signed a $60M deal with Google allowing Reddit user data to be used to train Google’s Gemini models.
Mistral signed a partnership with Microsoft to serve a new model and chatbot. They’re going to try to become the next OpenAI or Anthropic; this reminds me of blockchain L1’s lol
Okay, you’re probably noticing that a bunch of major announcements are missing. I wanted to summarize those in the next section.
Feb 15, “AI”-Day
In the span of 12 hours on February 15th, 2024, there were more than a dozen major AI news releases. Let’s go over some of these in detail and list the rest.
It almost seemed like everything was timed to match the main news from Google, which announced Gemini 1.5, its answer to GPT-4. The Pro model has a whopping 1M-token context window, which means you can fit roughly a 1 hour video, 11 hours of audio, 30k lines of code, or 700k words. You can literally put any of that in front of the model and it will answer within seconds…this is incredible. Can you imagine dropping in 10 YouTube videos and asking for a summary of each within a few seconds? Or taking a whole book and writing a new chapter? Oh, and here’s another kicker: their research version goes up to 10M tokens of context. That’s stepping into “no one else is doing this” territory.
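To sanity-check those numbers, here’s a quick back-of-envelope sketch. The ~1.33 tokens-per-word ratio is my assumption (a common rule of thumb for BPE-style tokenizers), not a figure from Google:

```python
# Rough check: does 700k words plausibly fit in a 1M-token window?
# Assumption (not from Google): ~1.33 tokens per English word.

TOKENS_PER_WORD = 1.33
CONTEXT_WINDOW = 1_000_000  # Gemini 1.5 Pro's advertised token limit

def words_to_tokens(words: int) -> int:
    """Rough token estimate for a given word count."""
    return round(words * TOKENS_PER_WORD)

estimate = words_to_tokens(700_000)
print(estimate, estimate <= CONTEXT_WINDOW)
```

Under that assumption, 700k words comes out to roughly 930k tokens, so Google’s figure lines up.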
To serve this model, Google is also using a mixture-of-experts (MoE) architecture, in which a learned router sends each token through only a small subset of specialized “expert” sub-networks instead of the entire model, cutting the compute needed per token. Google has also claimed advancements in efficiently training and running these models. Let the battle begin!
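The routing idea can be sketched in a few lines. This is purely illustrative (Google has not published Gemini 1.5’s internals); the expert count and top-k value are made-up stand-ins:

```python
# Minimal sketch of mixture-of-experts routing (illustrative only).
# A learned router scores each token against every expert; the token is
# then processed by only the top-k experts, so most parameters stay idle.

import math

NUM_EXPERTS = 8  # hypothetical expert count
TOP_K = 2        # experts activated per token

def softmax(xs):
    """Numerically stable softmax over a list of router logits."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(token_logits, k=TOP_K):
    """Return the indices of the k highest-scoring experts for one token."""
    probs = softmax(token_logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return ranked[:k]

# One token's router logits (in a real model, a learned projection of
# the token's hidden state); here just example numbers.
chosen = route([3.0, 1.0, 2.0, 0.0])
print(chosen)  # indices of the 2 experts this token is sent to
```

In a real MoE layer the selected experts’ outputs are combined, weighted by the router probabilities; the sketch stops at the routing decision itself.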
Somehow Google botched this release. Not only did controversy arise over its prompt filtering, but even with this amazing technology the public reaction was muted. It is unlikely they pulled much market share away from OpenAI. And of course, OpenAI strategically picked a product announcement to overshadow the Gemini 1.5 Pro news.
OpenAI announced Sora, their text to video model which is unbelievably realistic.
None of the video examples has been modified; they are the raw results from a prompt such as the one above of a woman walking through Tokyo with neon lights (granted, they probably chose the best results). In the example above, the reflections, the lighting, everything looks almost indistinguishable from an AI-generated image…let alone video. That’s not all, though: it can do 3D graphics as if it were a Pixar movie, pan through a scene, or recreate complicated physics.
The physics part is mind-blowing. Recreating how the real world works simply from a text description means they must have trained their model in a way that it understands how objects interact with each other. It also means that rendering graphics or building 3D worlds by hand might soon be made obsolete by whatever method OpenAI is using to create these scenes. Remember, just two months ago I wrote an article about how you could use Stable Diffusion to generate an AI image and “link” it to another frame, one by one, to generate video. This completely blows that technique out of the water.
Just to prove it was real, Sam Altman asked on Twitter for people to submit example prompts, and he posted quite a few of the results as they came in. Those were also mind-blowing (though not as crazy as the cherry-picked examples on their blog post). Every time OpenAI comes out with the next generation of technology, a few startups die; Pika Labs and RunwayML may have just realized they are now probably a year behind. As usual, OpenAI might be killing off part of the startup landscape. Who says their researchers’ $1M salaries aren’t justified? At the rate this technology is evolving, there is serious disruption ahead across the video industries. In response…Pika Labs announced lip syncing later in the month.
The rest of the news on the same day…
Okay, so ironically everyone else decided to announce their news at the same time. Here are some of the interesting ones:
LangChain announced LangSmith (its platform for LLM development) along with a Series A round led by Sequoia.
Magic.dev raised a $100M round to compete with the likes of Copilot and Codeium, but has not yet shown a product.
Sourcegraph announced Cody for Enterprise, another AI coding assistant. It looks like it is still playing catch-up; its differentiator is letting customers choose among cloud LLM providers.
AI2 announced OLMo, a new open-source LLM that anyone can use; they’ve put out two 7B models.
There was probably more news piled onto this day, but at this point I would question why these other companies felt it was the right time to announce anything when both Google and OpenAI had mega-hitter announcements.
OpenAI, however, picked the right day to announce Sora, and by keeping the market underwhelmed by Google’s news, likely staved off a big market-share shift toward Gemini.