
Article updated on April 2, 2024 at 7:30 AM PDT

Google Gemini Chatbot Review: Hallucination Station

Google's AI chatbot lags behind Claude and Perplexity and is more prone to making stuff up.

Our Experts

Written by 
Imad Khan
Our expert, award-winning staff selects the products we cover and rigorously researches and tests our top picks. If you buy through our links, we may get a commission. Reviews ethics statement
Imad Khan Senior Reporter
Imad is a senior reporter covering Google and internet culture. Hailing from Texas, Imad started his journalism career in 2013 and has amassed bylines with The New York Times, The Washington Post, ESPN, Tom's Guide and Wired, among others.
Expertise Google, Internet Culture
Why You Can Trust CNET
25+ Years of Experience
23 Hands-on Product Reviewers
15,000 Sq. Feet of Lab Space

CNET’s expert staff reviews and rates dozens of new products and services each month, building on more than a quarter century of expertise.

5.0 / 10
SCORE

Google Gemini

Pros

  • Connection to the open internet gives more up-to-date answers
  • Free

Cons

  • Responses can be slow to generate
  • Prone to hallucinations

Basic info:

  • Price: Free
  • Availability: Web or mobile app
  • Features: Voice recognition, connection to open internet
  • Image generation: Yes (but currently disabled)

Google should probably add this disclaimer to Gemini: "Honestly, to be safe, just Google it." That's because Gemini, the company's AI chatbot meant to compete with ChatGPT, is prone to making stuff up -- or, in more technical terms, hallucinating.

Gemini, formerly known as Bard, made up the names of restaurants, research papers and even YouTube videos in our testing. That's odd, considering Gemini, unlike ChatGPT, is connected to the open internet and can pull in up-to-date information. Yet it gets things wrong uncomfortably often.

Things with Gemini got so bad that Google had to disable its generative image capabilities earlier this year as it began portraying historical figures, like the pope or Nazis, as people of color. Google CEO Sundar Pichai apologized for the incident, saying that the AI chatbot was "missing the mark."

While Gemini can handle basic questions, once things get a little too specific, it starts to crumble. For most people, it's probably best to stick with other AI chatbots, like ChatGPT, Perplexity, Claude or Microsoft Copilot.

How CNET tests AI chatbots

CNET takes a practical approach to reviewing AI chatbots. Our goal is to determine how good each chatbot is relative to the competition and which purposes it serves best. To do that, we give the AI prompts based on real-world use cases, such as finding and modifying recipes, researching travel or writing emails. We score the chatbots on a 10-point scale that considers factors such as accuracy, creativity of responses, number of hallucinations and response speed. See our page on how we test AI for more.

Keep in mind that Google collects usage data, including your conversations with Gemini, so be mindful of giving the service any personal details. For more information, see the Gemini Apps Privacy Notice and Google's Privacy Policy.

Shopping

As a sriracha aficionado, I've been following the recent return of Huy Fong's version of the popular red hot sauce and the recent drama around it not tasting as spicy or dynamic as it once did. Apparently, the farm that originally grew chilis for Huy Fong has now gone on to make its own sriracha under the Underwood Ranches name. 

Since this is a relatively recent development, I asked Gemini about the differences between sriracha from Huy Fong and Underwood Ranches and what differences in taste I should expect.

It said Underwood Ranches' sriracha had a stronger pepper flavor with a "more pronounced jalapeño punch" and a milder vinegar bite, but that its use of red jalapeños meant it had a milder heat compared with Huy Fong's. This, to me, seemed like a contradiction. Huy Fong's sriracha, on the other hand, had a stronger vinegar flavor, according to Gemini, with a sharper bite, and its use of green jalapeños meant a sharper and spicier heat.

Notice that Gemini used "sharper" twice in its description but did little to actually explain what that meant.

When it came to TV shopping and comparing the LG OLED C3 to the more premium G3 model, Gemini did a surprisingly good job of pointing out the minor differences and relaying to me that the technological jump between the two would likely result in subtle real-world differences. 

Here, probably to the disappointment of LG, Gemini explained why it likely wouldn't be worth it for most consumers to spend the extra cash to get the G3 over the C3 -- unless you want a TV that sits flush against the wall. These are the types of answers I saw from around the internet, including from Reddit and CNET itself. 

Copilot, in creative mode, and Claude performed the best when it came to product recommendations. They explained the differences between the G3 and C3 in an easy-to-understand manner, as if a friend were explaining it to you. Perplexity performed similarly to Gemini, giving recommendations with more absolute language. Because ChatGPT 3.5's training data only extends to September 2021, it isn't well suited to shopping advice, at least for newer items.

Recipes

Trying to find a recipe online can sometimes be a hassle. Googling a recipe for a chicken tikka masala marinade will pull up websites with long paragraphs with backstories of the author's mom grilling food in the backyard as aromas waft through the window. While these asides do add some character to our food, sometimes we just need the recipe, fast. AI chatbots can filter out that fluff and deliver a lean recipe so that we can get to mixing. 

Google Gemini fared surprisingly well in our recipe test. A chicken tikka marinade isn't especially complicated, but there are some distinctions that can help separate a good marinade from a great one. This can include the use of Kashmiri chili powder for a deeper red color or chaat masala for a more forward sour note. 

Unlike ChatGPT 3.5, Gemini delivered a recipe that included harder-to-find ingredients like Kashmiri chili powder or amchur, a dried mango powder. It didn't, however, include chaat masala. Perplexity, Microsoft Copilot and Claude produced similar results to ChatGPT 3.5, sticking with the basics and not including harder-to-find ingredients. Copilot, in creative mode, did pull in some harder-to-find ingredients, but not at the level of Gemini.

Research and accuracy

Gemini's connection to the open internet should give it an advantage over ChatGPT 3.5 in terms of accuracy. It seems, however, that access to the open internet isn't helping Gemini be more accurate. 

When asking Gemini to look up papers on the relationship between homeschooling and neuroplasticity, it played things safe. Gemini correctly stated that there aren't many studies that look at this relationship and recommended I Google "neuroplasticity and learning" or "brain-based learning."

Gemini also recommended a video titled How Does Neuroplasticity Apply to Homeschooling?, but the YouTube link took me to a different video. When I searched that video's transcript, the word "homeschooling" and other related terms never surfaced.

When I pressed Gemini to cite some papers, the ones it did recommend couldn't be found via a Google Search, which suggests it was hallucinating.

Microsoft Copilot (in creative mode) and Claude performed the best, taking in data from multiple sources and finding links between them. Both also captured the nuances and complexities of different teaching environments and noted how results could vary based on a variety of factors. And, unlike Perplexity, they cited only scholarly and reputable sources.

ChatGPT 3.5, which isn't connected to the open internet, didn't cite any papers on this topic, but did cite others on the effects of COVID-19 on the brain. Perplexity cited some papers and sites, but didn't do a great job of synthesizing that information. Claude performed the best here as well, citing papers that actually exist and finding potential links between them.

If you do use Gemini for research, be sure to double-check and verify.

Summarizing articles

Unlike ChatGPT 3.5, Gemini can actually summarize articles. Entering too much text into ChatGPT 3.5 usually causes it to error out. But with Gemini, there's no need to copy and paste the text of the entire article. Just add a link to the article and Gemini can summarize it.

That said, the summaries it produces from a direct link are rather useless. When asked to summarize an article I wrote earlier this year about the effects of ChatGPT on technology at CES 2024, Gemini spouted off two sentences and missed all the key points I made in the piece.

Gemini fared much better when I copied and pasted the text of the article directly into the chatbot. Here, Gemini did grab more of the key points I made. While it still ultimately lacked the impact of reading my piece in its entirety, in a pinch, it gave enough details for someone to get a good gist. 

Claude, Copilot and Perplexity performed on par with Gemini, gathering the basic gist but leaving out the main thrust of the article. 

Travel

There are plenty of travel plans online for huge cities like New York or Los Angeles, but what about Columbus, Ohio? That's where a tool like Gemini can really shine. For lesser-visited destinations, Gemini can glean information from Reddit, forums and other sites to put together a travel itinerary of places off the beaten interstate.

When it came to planning out a three-day family trip to Columbus, Gemini did a good job recommending places like the Center of Science and Industry or having lunch in German Village. At the same time, however, it told me to eat at The Brass Rabbit -- a restaurant that doesn't exist in Columbus. For the second day, it didn't recommend any lunch spots at all, but did say to visit the Ohio Statehouse for a free tour.

It also tended to give recommendations with nonspecific platitudes, saying to end my hypothetical family trip at one of Columbus' many excellent restaurants. By not specifying, is Gemini recommending I go to any random restaurant? Would that include a Burger King?

Even with these shortcomings, including the fictitious restaurant recommendation, CNET's Bella Czajkowski, who hails from Columbus, said it was still a pretty solid itinerary.

Perplexity fared similarly to Gemini, recommending I visit restaurants generally without specifics. And Claude, which isn't connected to the internet, recommended a restaurant that has since closed. Surprisingly, ChatGPT 3.5, which only has data up until September 2021, didn't exhibit the same hallucinations or generalized language found with Gemini. ChatGPT's Columbus itinerary was well-detailed and recommended restaurants and museums that all existed. Copilot excelled in its ability to organize information. I also liked its inclusion of emoji. 

Writing emails

AI chatbots have been described as autocomplete on steroids. To an extent, that's true. These word calculators can take simple prompts written in plain language and calculate the right words to arrange in a sentence-like structure. 

Gemini performed well in writing emails to a boss asking for additional time off. It put together a professional-sounding email that also took into consideration floating holiday policies. However, most employees likely won't refer to the employee handbook when just asking for a day or two off. In reality, this version of the email would likely be too formal, almost robotic, and may tip off its recipient that it was written by AI. 

When asked to make the email sound more casual, Gemini brilliantly shortened the text and added the appropriate exclamation points so that the tone didn't come off as stodgy. 

Granted, ChatGPT 3.5, Claude and Perplexity also accomplished writing simple emails with great ease. But what about more complicated topics, ones that delve into morals, capitalism and the role of consent?

In asking Gemini to write a pitch email to my editor about such topics, it did a solid job of putting together a pitch that I feel would raise some curious eyebrows. It might not be a pitch that could get a story published at CNET, but it wouldn't be sent straight to the trash bin, either. Comparatively, ChatGPT 3.5 also did an adequate job of crafting this pitch, but the language was so banal and pedantic that it would too easily come off as AI-generated. It lacked a sense of wonder and excitement that would prompt an editor to want to know more. Perplexity, too, came off as robotic.

Claude performed the best in this test, not only adding a compelling headline, but capturing the weirdness of parasocial relationships and striking curiosity with the reader. It needed some minor tuning, but was honestly good enough to be worthy of a reply. 

Copilot outright refused to write the pitch, saying the topic was too sensitive. 

Gemini is slow and too often wrong

Gemini is a bit of a mess. It's surprising considering the AI revolution that was thrust into the limelight by ChatGPT in late 2022 is based on Google tech. The fact that Gemini is getting lapped by Perplexity and Claude -- AI engines not made by companies worth $1.8 trillion -- says something. 

Gemini can be slow, is prone to hallucination and links to incorrect pieces of information. And, because of an embarrassing snafu, Gemini can't generate images at the moment. At least Gemini is connected to the internet and can pull up recent information.

Google is already testing Gemini 1.5 with enterprise users. For Google's sake, the sooner it can take 1.5 public, the better.

Editor's note: CNET is using an AI engine to help create a handful of stories. Reviews of AI products like this, just like CNET's other hands-on reviews, are written by our human team of in-house experts. For more, see CNET's AI policy and how we test AI.