
Article updated on April 2, 2024 at 8:01 AM PDT

Microsoft Copilot Chatbot Review: Bing Is My Default Search Engine Now

Copilot is so good I've installed Bing on my phone as my default search app.

Our Experts

Written by 
Imad Khan
Our expert, award-winning staff selects the products we cover and rigorously researches and tests our top picks. If you buy through our links, we may get a commission. Reviews ethics statement
Imad Khan Senior Reporter
Imad is a senior reporter covering Google and internet culture. Hailing from Texas, Imad started his journalism career in 2013 and has amassed bylines with The New York Times, The Washington Post, ESPN, Tom's Guide and Wired, among others.
Expertise Google, Internet Culture
Why You Can Trust CNET
25+
Years of Experience
23
Hands-on Product Reviewers
15,000
Sq. Feet of Lab Space

CNET’s expert staff reviews and rates dozens of new products and services each month, building on more than a quarter century of expertise.

7.0 / 10
SCORE

Microsoft Copilot

Pros

  • Uses GPT-4 and GPT-4 Turbo
  • Free
  • Accurately links to relevant information
  • Includes emojis and pictures in responses

Cons

  • While prettier, not as cleanly organized as ChatGPT and Claude
  • Jumping between different modes requires an entirely new search
  • Can avoid making definitive statements
  • Refuses to answer prompts deemed controversial

Basic info:

  • Price: Free
  • Availability: Web, Windows 11 or mobile app
  • Features: Voice recognition, connection to the open internet via Bing, ability to tune answers to be more creative or more precise
  • Image generation: Yes

For Microsoft search engineers, there's probably no higher praise than telling them you've switched your default search engine from Google to Bing. Sure, it took a multibillion-dollar investment from Microsoft to integrate OpenAI's GPT-4 tech into its engine. But when Bing is operating at 3.3% global market share, compared to Google's 91.6%, drastic measures have to be taken.

The thing is, I'm not really using Bing. I'm actually using Copilot, Microsoft's renamed AI chatbot that's a part of Bing.

What makes Copilot unique is that it's essentially three GPT engines in one. Copilot has three modes: balanced, precise and creative. As of this review, the balanced and precise modes are using GPT-4, a model by OpenAI, creator of ChatGPT, that reportedly has over 1 trillion parameters. That's substantially more than ChatGPT 3.5, which has 175 billion. Creative, however, is using GPT-4 Turbo, which uses data up until April 2023, as opposed to September 2021 in GPT-4. It can also give substantially larger responses, the equivalent of 300 pages of text. It's uncertain when Microsoft will bring the power of GPT-4 Turbo to Copilot's balanced and precise modes. 
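If it helps to see the three modes side by side, here is a small, purely illustrative Python sketch that restates this review's description of Copilot's modes. The dataclass, field names and structure are assumptions made for illustration, not an actual Microsoft API; the model names and data cutoffs come from the reporting above.

from dataclasses import dataclass

# Illustrative only: summarizes the mode-to-model mapping described in this review.
@dataclass
class CopilotMode:
    name: str         # mode name as it appears in Copilot
    model: str        # underlying OpenAI model, per this review
    data_cutoff: str  # how recent the training data is

modes = [
    CopilotMode("balanced", "GPT-4", "September 2021"),
    CopilotMode("precise", "GPT-4", "September 2021"),
    CopilotMode("creative", "GPT-4 Turbo", "April 2023"),
]

for mode in modes:
    print(f"{mode.name}: {mode.model}, data through {mode.data_cutoff}")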

Copilot is the best of both ChatGPT and Google's Gemini. It has the accuracy and fine-tuning of ChatGPT with the internet connectivity found in Gemini. This means that answers read more like they were written by a human, and Copilot can pull up-to-date information from the internet. Really, Copilot delivers such good results that it's a wonder Microsoft isn't charging for it.

While Copilot can generate images, we won't be testing that feature for the purposes of this review.

How CNET tests AI chatbots

CNET takes a practical approach to reviewing AI chatbots. Our goal is to determine how good each chatbot is relative to the competition and which purposes it serves best. To do that, we give the AI prompts based on real-world use cases, such as finding and modifying recipes, researching travel or writing emails. We score the chatbots on a 10-point scale that considers factors such as accuracy, creativity of responses, number of hallucinations and response speed. See how we test AI for more.

Do note that Microsoft collects data when you use Copilot, including through Copilot integrations in Word, PowerPoint, Excel, OneNote, Loop and Whiteboard.

Shopping

As a hot sauce aficionado, I've been following the recent drama surrounding Huy Fong Foods, the purveyors of the iconic red sriracha sauce, and how the flavor has changed since its hiatus and recent return. Turns out, there's been an ongoing dispute with its original jalapeño supplier and Huy Fong Foods now sources chilis from Mexico. To add another wrinkle in this saga, Underwood Ranches, the original jalapeño supplier, has entered the market with its own sriracha sauce. 

I asked Copilot if it could help describe the differences I should expect between the new sriracha from Huy Fong and the copycat from Underwood Ranches. Copilot excelled in giving a full breakdown with specific language and even gave a quick summary of the ongoing corporate drama. 

Copilot described Huy Fong's sriracha as more garlicky, with sweeter notes and less spice than before, whereas Underwood Ranches has added kick and is more reminiscent of the old sriracha. This description fell in line with other testimonies I've seen on YouTube and Reddit. 

Unlike Gemini and ChatGPT 3.5, Copilot gave specific descriptors and laid the information out in a manner that was easier to follow. 

Beyond sriracha sauces, I've also been in the market for a new TV. In comparing last year's LG OLED C3 and G3 models, Copilot did a good job breaking down the differences and explaining which one would be the better buy. It got the key details right, like the fact that both televisions use the same processor and that the G3 gets brighter. However, it didn't make the kinds of definitive arguments that Gemini did when prompted with the same question.

But when I asked the same question in Copilot's "creative" mode, which utilizes GPT-4 Turbo, it provided answers that felt more thought out, rather than a string of boilerplate bullet points. Here, Copilot put together cogent thoughts on brightness, design and performance, with a concluding paragraph explaining that, for most people, the increased brightness won't be noticeable on the more expensive G3. 

Copilot in "creative" mode felt most like Claude. Information was better synthesized and did feel like it was put together by a real person. Gemini and Perplexity performed similarly, with sharp descriptions and little fence-sitting. While all the AI chatbots performed well, I'd have to give the edge to Copilot and Claude. 

ChatGPT 3.5 currently can't make these types of shopping comparisons, as its training data is only up to September 2021. 

Recipes

Sometimes finding a good recipe online can be a chore. Popular dishes can vary wildly, making it difficult to find the best one. Plus, having to scroll through long-winded preambles about memorable flavors of yore can get tiresome. An AI can filter through all the fluff and generate recipes in an instant. 

Copilot did a decent job of generating a chicken tikka recipe in creative mode. It got the basic ingredients down, as well as a list of instructions on how to prepare the mix. However, it left out harder-to-find ingredients, ones that Gemini did capture, like Kashmiri chili powder, chaat masala and amchur, a dried mango powder.

I was curious what answer Copilot would yield if I switched to precise mode. Interestingly, it included mustard powder, which isn't as common, and kasuri methi, or dried fenugreek leaves.

Given Copilot's trifurcated nature, you might need to weigh which mode will yield the best answer. Just because creative mode uses GPT-4 Turbo doesn't mean it'll give the best result for every query.

Overall, Google Gemini performed best in this test, providing the most robust recipe. This was followed by Copilot in precise mode. ChatGPT 3.5, Perplexity and Claude all performed similarly, with very basic recipes.

Research

The power of AI in doing research is that the model can look at multiple pieces of information and find the points that link them in seconds. Normally, you would have to read through research papers yourself to make these sorts of connections. Copilot not only does this well, but links to sources, too.

Copilot gets excellent marks as a research tool. When I asked Copilot about the relationship between homeschooling and neuroplasticity, it pulled up research papers related to childhood education and brain development, and it even linked directly to PDF files containing the research. 

I then switched to creative mode and got an even better response, with Copilot synthesizing additional sources and giving more nuanced answers. It felt as if Copilot had a greater understanding of the topic and the complexities different schooling environments present.

Copilot in creative mode and Claude performed similarly in this test, and beat out Gemini, ChatGPT 3.5 and Perplexity. And unlike Gemini, all of Copilot's responses were real. It didn't make up the names of research papers in the way that Gemini did. 

While ChatGPT 3.5 was also accurate in recommending and summarizing research papers, it isn't connected to the open internet, so it can only recommend you go to Google and search for it yourself. 

Summarizing articles

Copilot does a decent job of summarizing articles, but like all the other AI chatbots we've tested, it continually fails to capture the central focus.

Copilot, like Gemini, ChatGPT 3.5, Perplexity and Claude, was able to capture the basic points of an article I wrote earlier this year about AI at CES 2024. But all of them seemed unable to pinpoint the major crux of the piece: that a lot of AI hype is a rebranding of older smart tech.

Can Copilot give you a good rundown of an article in a pinch? Sure. Should you rely on article summaries for a class presentation? Probably not.

Travel

The internet is glutted with travel recommendations. From bloggers and travel guide publishers to TikTokers and YouTubers, so many people are trying to fill you in on the best sights and eats in iconic cities like Paris or London. But what about Columbus, Ohio? This is where AI can come into play, with its ability to glean data from across the web and synthesize information about lesser-traveled locations.

When I asked Copilot for a three-day travel itinerary to Columbus, it performed spectacularly well in putting together recommendations for locations and restaurants in a bullet-pointed, easy-to-understand format. We cross-referenced Copilot's results with CNET's Bella Czajkowski, who hails from Cowtown. Copilot also did a great job weaving in bonus recommendations, something ChatGPT 3.5 and Gemini neglected to do. 

All the restaurants Copilot recommended were real. It didn't make up restaurants like Google Gemini did. And I have to hand it to the Microsoft team for coding Copilot to bake emoji into responses. It adds that slight hint of personality and makes a lengthy set of travel recommendations easier to follow. For example, if you want to pinpoint the bar recs, look for the beer emoji.

Copilot outperformed all the other AI bots tested. It made recommendations for locales and restaurants, all of which exist and are still open, producing articulate and accurate results with easy-to-follow language and structure. ChatGPT performed adequately, despite not being connected to the open internet.

Writing emails

Like every other chatbot tested, Copilot performs well at writing basic emails. You can easily ask Copilot to tune an email to be more or less formal. Regardless of the tone you go with, emails read as believable.

When asking Copilot to create an article pitch on racier topics, however, like the increased sexualization of online content creators and the ongoing changes in parasocial relationships with fans across the internet, Microsoft's AI engine refused to engage in discussions about explicit content or the moral and ethical qualms related to it.

All the other AI chatbots were able to take on this task. Claude performed the best, creating a pitch that was compelling and written well enough to be passed off as human-made. 

Better than ChatGPT, Gemini or Perplexity

Copilot is versatile and can generate responses that are more creative or more precise, something the other AI chatbots can't do unless prompted to. The way Copilot presents information, often with bullet points and emojis, makes it easy to read. It's also accurate, linking to actual pieces of news and information, and it showed no instances of hallucination, at least in our testing.

While Copilot doesn't have Claude's personality, it usually performs at or above Claude's level, depending on the task. Microsoft, however, has seemingly put high guardrails on Copilot, which means it'll refuse to answer dicier questions, even if the use is legitimate.

Microsoft Copilot is excellent. And it should be, right? It's powered by GPT-4 and GPT-4 Turbo, and has access to Bing's search data to help bolster its generative capabilities. Gaining access to GPT-4 tech with ChatGPT requires a $20 monthly subscription. My recommendation: Don't pay $20 per month when Microsoft is giving away OpenAI's tech for free.

Editor's note: CNET is using an AI engine to help create a handful of stories. Reviews of AI products like this, just like CNET's other hands-on reviews, are written by our human team of in-house experts. For more, see CNET's AI policy and how we test AI.