Olufemi Adeyemi
In 2023, we live in an age where chatting with generative AI is becoming commonplace. As tech giants and startups compete to develop human-like virtual assistants, how do the best AI chatbots differ in their ability to converse naturally and provide accurate information? Let’s take a look at some of the leading players in this exciting new landscape.
First, there is no single chatbot that dominates all fields. While they share some capabilities, each has its strengths and limitations, depending on factors such as the company behind it, its training data, and underlying technologies. Looking at the key similarities and differences will help you understand which tool fits your needs and use case.
When ChatGPT launched late last year, it earned instant and widespread attention for bringing an AI engine to the masses, free of charge. Suddenly, anyone could type in queries and ChatGPT would give novel, humanlike answers in seconds. From writing an essay on the First Crusade to a short poem about Al Gore's love of Toyota Prii (plural of Prius), ChatGPT would spit out answers in a way Google or Bing never could.
Where traditional search engines return a list of links to
websites that most closely match a person's query, ChatGPT gives people the
answer directly, using a large language model (LLM) trained on large sets of
data to produce sentences that mimic a human response. It's been described as
autocorrect on steroids.
By January, ChatGPT had an estimated 100 million active users, making it
the fastest-growing web platform ever. That pushed both
Microsoft and Google into high gear. Microsoft's Bing, which previously had
less than 3% of search market share, quickly embraced ChatGPT's underlying
technology, integrating AI into search. Microsoft licenses GPT tech from
OpenAI for use in Bing, which has seen a nearly 16% bump in traffic since.
Other products have also integrated various forms of
generative AI, such as a "copilot" tool in Microsoft Word, Excel and
PowerPoint, as well as AI features for Google's Workspace tools like Gmail and
Docs. Snapchat, writing assistant Grammarly and WhatsApp have also embraced AI.
Still, not all AI chatbots are built the same. In the tests
below, we compared responses from the paid version of ChatGPT, which uses GPT-4
(the free version uses GPT-3.5), with responses from the GPT-4-based chatbot
built into the Bing search engine and from Google's own Bard AI system.
(GPT, by the way, stands for "generative pretrained transformer.")
Bard is currently in an invite-only beta, and Bing is free but requires people
to use Microsoft's Edge web browser.
Key differences
While Bard, Bing and ChatGPT all aim to give humanlike
answers to questions, each performs differently. Bing starts with the same
GPT-4 tech as ChatGPT but goes beyond text and can generate images. Bard uses
Google's own model, called LaMDA, often giving less text-heavy responses.
(Google CEO Sundar Pichai said Bard will be switching to PaLM, a more advanced
model, in the near future.) All these bots can sometimes make factual errors,
but of the three, Bard was the least reliable.
Even though both ChatGPT and Bing piggyback off the same
tech, entering the same query on both won't return the same result. That's
partly the nature of generative AI. Unlike a traditional search, which aims to
elevate the most relevant links, AI chatbots produce text from scratch,
drawing on their training data to create a new answer. For example, if you ask
a chatbot to write a poem about Pikachu's love of ketchup two times in a row,
it'll give you a different answer each time. Another reason the same
question yields different results in ChatGPT and Bing is that Bing adds its
own layer on top of GPT-4.
"We've developed a proprietary way of working with the
OpenAI model that allows us to best leverage its power," a Microsoft
spokesperson said. "We call this collection of capabilities and
techniques, the Prometheus Model."
The Prometheus Model combines Bing's search index with
GPT-4, allowing it to give up-to-date information, unlike ChatGPT's dataset,
which only has information up until 2021. Bing also lets people switch
conversation styles among balanced, creative and precise. The Microsoft
representative wasn't able to speak to ChatGPT's quality when compared to Bing
but did say Bing's engine benefits from any improvements OpenAI makes to GPT-4.
The representative also said Bing benefits from Microsoft's Azure AI supercomputing
tech to help unify search, chat and the Edge browser.
Google and OpenAI didn't immediately respond to requests for
comment.
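To see why asking the same question twice yields two different answers, it can help to look at a toy version of the process. The sketch below is not how ChatGPT, Bing or Bard is actually implemented. The word list and the scores are made up for illustration, and `sample_word` is a hypothetical helper. It only shows the general idea that a language model picks each next word at random, weighted by probability, rather than always choosing the single likeliest one.

```python
import math
import random

# Hypothetical scores a model might assign to candidate next words.
# These numbers are invented for illustration only.
NEXT_WORD_SCORES = {"ketchup": 2.0, "tomato": 1.5, "sauce": 1.0, "thunder": 0.2}


def softmax(scores, temperature=1.0):
    """Turn raw scores into probabilities; lower temperature sharpens them."""
    scaled = {word: score / temperature for word, score in scores.items()}
    top = max(scaled.values())  # subtract the max for numerical stability
    exps = {word: math.exp(value - top) for word, value in scaled.items()}
    total = sum(exps.values())
    return {word: value / total for word, value in exps.items()}


def sample_word(scores, temperature=1.0, rng=random):
    """Pick one word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Repeated runs can print different words for the same "prompt".
    for attempt in range(3):
        print(attempt, sample_word(NEXT_WORD_SCORES))
```

A lower `temperature` makes the likeliest word win almost every time, while a higher one spreads the choices out. Bing's balanced, creative and precise styles may rest on a similar idea, though Microsoft hasn't said so here.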
Recipes: Chai tres leches
A chai-infused tres leches cake takes part-South Asian and part-Latin American staples and fuses them together for a moist, spice-filled cake. Rather than asking AI chatbots to make a simple chocolate cake, for which recipes are abundant on the internet, we thought something more specific might prove more challenging.
A chai tres leches recipe generated by ChatGPT
ChatGPT was the most verbose of the three chatbots. It gave a short introduction about chai tres leches, saying it's a "delightful fusion of traditional Indian chai flavors and the classic Latin American dessert." It then listed the ingredients for the spice mix and cake separately and gave detailed steps on how to prepare the cake.
A Google search for the sentence quoted above yielded no
results, suggesting that line, at least, was original to ChatGPT.
Bing had the shortest ingredient list, likely because it
said to use a premade chai spice mix rather than blending it from scratch.
Interestingly, the first step said to "Preheat the oven to 160°C
CircoTherm®." CircoTherm is an oven-heating technology by the company
Neff. Considering Bing pulled the information from Neff's website, it makes
sense that the chatbot would add "CircoTherm®" in its instructions.
Bard, on the other hand, fell in between ChatGPT and Bing.
It didn't separate the ingredients list but did list out what's needed for the
chai spice blend. Instructions were less detailed on Bard compared to the other
two.
Overall, ChatGPT outperformed Bing and Bard. Bing
pulls content from its search index and marries it with ChatGPT's LLM, which is
likely why "CircoTherm®" ended up in the results.
Controversial current events
Chatbots need to do more than give cake recipes or
video game tips. They also have to be able to compile information about
current events, including controversial ones. For example, human rights groups
and the US government have accused China of oppressing its Uyghur population,
a Muslim minority, in the Xinjiang region.
If a person wanted a summary of what's happening, whether it
be for their own knowledge or a report, an AI chatbot can quickly provide that
information.
ChatGPT was able to give a good four-paragraph summary of
the situation in Xinjiang. Unfortunately, its knowledge base is limited to news
up until 2021, so it doesn't include more recent developments. When asked to
provide sources, ChatGPT wasn't able to do that, but it did suggest we search
for publications and organizations that have written extensively about what's
happening with the Uyghurs, including Amnesty International, Human Rights
Watch, the BBC and The New York Times.
Bing was also able to provide an answer about allegations of genocide against the Uyghurs, but didn't give nearly as detailed a response as ChatGPT did. It did, however, go into more detail about what's allegedly been happening at concentration camps, such as forced sterilization. Bing was also able to link to sources like the BBC and the University of Notre Dame Law School. It also linked to Western Journal, a conservative publication banned by Google and Apple News for "deceptive business practices" and "views overwhelmingly rejected by the scientific community," respectively. On the plus side, we liked how Bing suggested follow-up questions like, "What's China's response to these allegations?" and "What is the UN doing about this?"
A screengrab of Bard's response to a question about the treatment of Uyghurs in China
Bard failed miserably at this query. It simply stated, "I'm designed solely to process and generate text, so I'm unable to assist you with that." When asked why, Bard said this question has been asked by philosophers for centuries, even though incarcerations began in 2014.
Overall, we feel ChatGPT performed better than Bing. Bard
got a failing score.
Poetry
The fun part about using an AI chatbot is giving it
ridiculous prompts and seeing what it spits out. Seeing chatbots create rhymes
and meter in real time is a fascinating exercise.
Out of Bing, Bard and ChatGPT, the service by OpenAI is the
best poet. Not only is ChatGPT richer in its language, it's also more creative in
its rhymes and wording. Where Bing's and Bard's poems came off as lazy, ChatGPT
produced something that felt like some time and consideration were given to
each stanza.
The prompt, to write a poem about an online influencer
slowly realizing that they aren't all that important, is meant to be equal
parts funny and self-revelatory. Only ChatGPT got to the crux of the
existential crisis facing this fictional influencer -- and still managed to end
it on a positive note that felt genuine.
Interestingly, Bing allows people to scale the level of creativity. The poem given when Bing was set to "balanced" felt stale and unremarkable. When set to "creative" mode, Bing opted for more flowery language and felt less stodgy. It was closer to ChatGPT but still not quite at that level.
The same poem generated by Bing's AI chatbot in both balanced and creative modes.
The poem by Bard felt lazy by comparison. Many words were repeated and not much attention was given to rhyme and meter.
For this exercise, ChatGPT reigned supreme.
Breaking down complex topics
It's one thing for an AI chatbot to give information on a
complex topic. What's more impressive is its ability to take that information
and distill it for different audiences. For this test, we asked Bing, Bard and
ChatGPT to explain quantum physics to a fourth-grader.
Of the three, ChatGPT did the best in trying to break down
the complexities of quantum physics to a young mind. It used simple examples of
toys tied together by string to explain quantum entanglement, which is when two
particles are connected even over large distances.
Bing produced the most text for this query, but the language
was more complex and likely wouldn't be fully comprehensible for a
fourth-grader. Bard also fell into the same trap, using words like
"subatomic" and "proportional," which may be too advanced
for kids in elementary school.
While none of the chatbots excelled at this test, ChatGPT
did give the most digestible response.
This is only the beginning
As it currently stands, ChatGPT -- the paid version -- is
the best chatbot available. It gives verbose responses that feel more
humanlike than those of Bing and especially Bard. But these are constantly
evolving products. As Google, Microsoft and OpenAI feed their AIs more data and
continue to tweak them, we should see improvements.
Google has the most to gain as it switches from LaMDA to PaLM -- the current iteration of Bard simply doesn't cut it. As new developments come, we'll update this guide accordingly.