OpenAI took the world by storm in 2023 with ChatGPT. I was blown away by its remarkably humanlike and compelling responses. Other companies quickly followed suit, introducing their own competitors to ChatGPT. In this update, I want to share my thoughts on these products and what the future looks like.
What is a generative AI model?
When I use the term “generative AI”, I am referring to a computer program that can generate text, images, sound, or other types of content. Such a program can understand natural language and follow complex instructions. One notable generative AI model is GPT-4 from OpenAI, the model behind ChatGPT. Another well-known generative AI is Stable Diffusion from Stability AI.
What does it mean to say a program can understand language and follow instructions? For instance, with GPT-4, you can say “Please review this blog post and provide feedback as if you were an editor.” and the program’s output will look like an email from a professional book editor who read your writing, thought about it, then offered constructive feedback. Let’s pause for a moment and think about how amazing that is. The output looks like someone has thought about it. I wouldn’t say that I think GPT-4 is sapient, but it sometimes looks the part.
What is a generative AI product?
A generative AI product builds on top of generative AI technologies. I would describe these products as user interfaces to generative AI. While GPT-4 is a generative AI model, the chatbot that OpenAI released and dubbed ChatGPT is a product. A few products stand out to me: Perplexity, GitHub Copilot, ChatGPT, and Bard. I’ve used each of these in the past year and I think each has its unique strengths.
ChatGPT was the technology that brought generative AI into the limelight. When ChatGPT was released this time last year, its underlying language model was state-of-the-art. It shook the world with its humanlike responses and its ability to understand plain speech. ChatGPT happened to be especially impressive at turning vague instructions into code, making it a hot topic among programmers. The adulation from programmers and geeks helped propel its growth much the same way it did for the Internet in the 90s.
Capabilities alone didn’t make ChatGPT revolutionary. OpenAI made ChatGPT incredibly easy to use by putting the technology behind a chatbot. Even a non-technical user can “talk” to GPT and witness its superhuman powers. Everyone, technical or not, tried ChatGPT, and everyone got hooked. 2023 was the year of OpenAI and ChatGPT.
ChatGPT continued to stay ahead of the competition during the past year. It remains the most practical AI and OpenAI further improved it with the ability to understand images and speech. Recently, OpenAI released a voice assistant that blew Siri, Alexa, and (whatever Google calls their product) out of the water. Even now, much larger companies like Amazon and Google are still playing catchup. If you want to use a generative AI product, it is hard to go wrong with ChatGPT.
However, I am concerned about OpenAI’s future as a company. It is no longer an AI safety research group but a profit-seeking enterprise. Earlier in the year, I predicted OpenAI would stop selling its APIs and instead focus on direct-to-consumer products. I turned out to be wrong: OpenAI continues to sell its APIs. But selling APIs makes your business easy to replace. OpenAI has turned out to be really good at releasing products. Will it sustain that reputation? If it fails to develop a moat, we just might see it miss its window of opportunity, and all the very smart people working there could be lured away by better-paying opportunities.
Over the past year, Perplexity was the generative AI product I used the most. Perplexity uses the same technology behind large language models (LLMs) to build a better search engine. Retraining an LLM on new content from the web is very costly, so technologists use a technique called retrieval-augmented generation (RAG) to teach an LLM new information cheaply. In RAG, new information is transformed into a numerical format (an embedding) that is easy to store in a database and search. When a question is posed to Perplexity, it uses the same algorithm to turn the question into an embedding as well. Then Perplexity can compare the question embedding against the database and look up relevant information. It’s like teaching an LLM to use an encyclopedia: the LLM doesn’t need to memorize the encyclopedia, it can supplement what’s “in the LLM’s head” with knowledge from the encyclopedia.
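The retrieval step described above can be sketched in a few lines. This is a toy illustration, not Perplexity’s actual implementation: it fakes the embedding with simple word counts (real systems use a learned embedding model) and compares embeddings with cosine similarity.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": a bag-of-words count vector.
    # Real RAG systems use a learned embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

# The "encyclopedia": documents embedded once and stored in a database.
documents = [
    "Perplexity is a search engine built on large language models",
    "Stable Diffusion generates images from text prompts",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question):
    # Embed the question with the same algorithm, then return the
    # stored document whose embedding is most similar to it.
    q = embed(question)
    return max(index, key=lambda pair: cosine(q, pair[1]))[0]

best = retrieve("which product generates images from text?")
```

In a real system, the retrieved passages would then be pasted into the LLM’s prompt alongside the question, so the model can answer from material it was never trained on.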
Both Bing and Google Bard do the same thing, but Perplexity has been the best product in this area. Perplexity is the product I reach for when I want information backed by reliable sources and the ability to ask follow-up questions. Under the hood, Perplexity uses ChatGPT and other top-of-the-line models, with Bing as its encyclopedia.
I am also concerned about the long-term viability of Perplexity. It’s a great product, but it is competing against Google on Google’s turf. Assuming Google can get its act together and incorporate RAG into Google Search, why would anyone switch to Perplexity? And even if Perplexity displaces Google as the internet’s search engine, how will it monetize? Using LLMs to post-process search results means the LLM will also strip out paid ads. For now, Perplexity tries to make money through a subscription model. Is that really sustainable?
Google Bard was seen as a piece of junk when it was first released. Most responses came back with “I am a large language model. I cannot perform that task.” Thankfully, it is much better now. Like ChatGPT, it understands images as well as text. Unlike ChatGPT, we hear little about its improvements or Google’s plans for it. Google Bard really seems to suffer from the “Google has too much bureaucracy to polish a product” symptom. Recently, Google claimed that it improved Bard with an LLM called Gemini that rivals GPT-4. Bard can also now access your documents in Google Drive, which is a huge win for companies using the Google Workspace suite of products. I found the Drive integration very useful. With so much data in its cloud, Google Bard has a lot of potential.
The question remains: will Google Bard linger in limbo like other Google products? Recently I got access to yet another Google product called NotebookLM. Much like Bard, it uses generative AI to help you understand and make use of documents in your Google Drive. Which product is Google going to really invest in? Who knows? If Google’s track record with messaging apps is any indication, its incredible advantage in AI might just go to waste.
Of all the generative AI offerings, Copilot was the least impressive to me. GitHub Copilot’s headline feature is generating the implementation of a function from its name alone. That’s fine, I suppose, but I rarely want AI to write code for me. I prefer to write the code myself, then have AI review and improve it. I didn’t find GitHub Copilot useful when I tried it. I think the future of AI code assistants will take the form of automated code review and refactoring tools, and the current completion-based Copilot will eventually be replaced by these richer workflows.
We will see more generative AI products in 2024. I believe there will be significant investments in the retrieval-augmented generation space. In the workplace, I can imagine a chatbot replacing “that guy who’s been around forever and knows all the answers.” I can also see more uses of products like NotebookLM to help with research and personal development. In the hardware space, I can see augmented reality goggles that use AI to help you navigate the world. These opportunities promise an exciting year ahead.