Comparing ChatGPT, Claude, Bing, and Bard with its new Gemini Pro model

December 7, 2023

by Michael Browning, Co-Founder

Google's announcement yesterday of its Gemini family of AI models was fascinating. The marketing language about when this new AI would actually reach consumers was carefully crafted to make it seem like all of it might be available in Bard today. That isn't entirely true. Bard now includes Gemini Pro, but the exciting announcement of the day was Gemini Ultra, which won't be available until early next year. Gemini Ultra is the model that is comparable to or slightly exceeds GPT 4 on performance. Gemini Pro is roughly equivalent to GPT 3.5, or somewhere between GPT 3.5 and GPT 4. So, it seems like a good time to compare the various AI chat agents on a question that requires reasoning and an understanding of physics.

I will pose the same question to each AI to see how it responds and which one gets it right. The sample question is part of a question bank that's used to analyze and score AI chat agents so that we can better understand how they perform. So, let's dive into the process.

Question:

A model airplane flies slower when flying into the wind and faster with wind at its back. When launched at right angles to the wind, a cross wind, its groundspeed compared with flying in still air is:
(A) the same
(B) greater
(C) less
(D) either greater or less depending on wind speed

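The correct answer is B. You can see this illustration for more details on the impact of a crosswind on a plane. Because the wind blows at right angles to the plane's heading, its velocity adds to the plane's airspeed as a perpendicular vector, so the resulting groundspeed is greater than the speed in still air.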
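For anyone who wants to check the physics, here is a minimal sketch of the vector addition behind that answer. It assumes the plane holds a heading at right angles to the wind and does not turn into the wind to correct its drift. The function name `groundspeed` and the speeds used (10 for the plane, 4 for the wind) are made up for illustration and are not part of the question bank.

```python
import math

def groundspeed(airspeed: float, wind_speed: float, wind_angle_deg: float) -> float:
    """Return the magnitude of the plane's velocity over the ground.

    wind_angle_deg is the angle between the plane's heading and the
    direction the wind is blowing toward: 0 is a tailwind, 180 is a
    headwind, and 90 is a pure crosswind.
    """
    theta = math.radians(wind_angle_deg)
    # Component along the plane's heading: airspeed plus the wind's share.
    along = airspeed + wind_speed * math.cos(theta)
    # Component across the plane's heading: comes from the wind alone.
    across = wind_speed * math.sin(theta)
    return math.hypot(along, across)

# Hypothetical numbers, in any consistent unit of speed.
airspeed = 10.0
wind = 4.0

tail = groundspeed(airspeed, wind, 0)     # wind at its back
head = groundspeed(airspeed, wind, 180)   # flying into the wind
cross = groundspeed(airspeed, wind, 90)   # launched at right angles

print(f"tailwind:  {tail:.2f}")
print(f"headwind:  {head:.2f}")
print(f"crosswind: {cross:.2f}")
```

With a pure crosswind the two components are the airspeed and the wind speed, so the groundspeed is the square root of the sum of their squares. That is always larger than the airspeed alone for any nonzero wind, which is why (D) is not the right choice either.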

Bard with Gemini Pro

❌ None of the various draft responses provided by Bard with Gemini Pro were correct. At least it was succinct in its response though. The initial reports about Gemini Ultra are certainly exciting. It should get questions like this correct when it finally lands in the hands of consumers sometime next year.

Bing Chat

❌ Bing didn't get the question correct either. In fact, it gave the most incorrect response of the group ("less" instead of "greater"). But it has some nice model airplanes to sell you for testing this yourself.

Claude 2 from Anthropic

❌ Claude missed this one as well. But, as always, it provided plenty of reasons why it was correct (it wasn't).

ChatGPT 3.5 Turbo

❌ OpenAI's ChatGPT with GPT 3.5 did not answer this correctly either. It isn't shocking since that model struggled a bit with questions like this in the past.

ChatGPT 4 Turbo

✅ OpenAI's ChatGPT with GPT 4 Turbo was the only one to answer this question correctly. There is a reason why this product is still the highest-scoring model on various tests given to AIs to assess their capability.

Note: There can be inconsistency in responses from the various models when asked the same question. So, you may get slightly different responses than the ones featured here.
