Experimental Gemini 1.5 Pro takes first place against other LLMs

Google DeepMind's experimental Gemini 1.5 Pro climbs to the top of the LMSYS leaderboard with a record Elo rating of 1300

Google's Latest AI Breakthrough

Seemingly out of nowhere, the team at Google DeepMind have released an experimental version of Gemini 1.5 Pro that quickly topped the LMSYS leaderboard, ahead of other Large Language Models (LLMs). So far, this experimental version marks a substantial advance in Google's AI chatbot capabilities, with strong performance across various benchmarks.

Superior Performance

Gemini 1.5 Pro has demonstrated remarkable results in several key areas:

- Benchmark Tests: According to Google, it outperformed its predecessor, Gemini 1.0 Pro, on 87% of the benchmarks Google uses for developing its LLMs.

- Long-Context Understanding: In the Needle In A Haystack (NIAH) evaluation, Gemini 1.5 Pro found embedded text with 99% accuracy in blocks of data up to 1 million tokens long.
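
To make the NIAH idea above concrete, here is a minimal sketch of how such a test can be set up. It is not Google's evaluation harness: the function names, the filler text, and the `toy_model` stand-in are all hypothetical, and the haystack length is counted in words rather than tokens to keep the example self-contained. In a real test, `model` would be a call to an actual LLM.

```python
from typing import Callable


def build_haystack(filler: str, needle: str, n_words: int, depth: float) -> str:
    """Repeat the filler to n_words words and insert the needle at a relative depth (0.0 to 1.0)."""
    words = filler.split()
    repeated = [words[i % len(words)] for i in range(n_words)]
    position = int(depth * n_words)
    return " ".join(repeated[:position] + [needle] + repeated[position:])


def niah_accuracy(model: Callable[[str], str], secret: str,
                  depths: list[float], n_words: int) -> float:
    """Fraction of insertion depths at which the model's answer contains the secret."""
    needle = f"The secret passphrase is {secret}."
    question = "\n\nWhat is the secret passphrase?"
    filler = "the quick brown fox jumps over the lazy dog"
    hits = 0
    for depth in depths:
        prompt = build_haystack(filler, needle, n_words, depth) + question
        if secret in model(prompt):
            hits += 1
    return hits / len(depths)


def toy_model(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: a plain string search, used only so the sketch runs."""
    marker = "The secret passphrase is "
    start = prompt.find(marker)
    if start == -1:
        return ""
    end = prompt.find(".", start)
    return prompt[start + len(marker):end]


if __name__ == "__main__":
    score = niah_accuracy(toy_model, "mango-42", [0.0, 0.25, 0.5, 0.75, 1.0], 10_000)
    print(f"Retrieval accuracy: {score:.0%}")
```

Sweeping both the haystack length and the insertion depth, then averaging the hits, gives a retrieval accuracy figure of the kind quoted above.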

Key Innovations

While Google DeepMind has yet to make any public announcement about this model's specific improvements, some of the features it inherits from Gemini 1.5 Pro are known.

What sets this experimental model apart from the competition is its unusually large context window of up to 1,000,000 tokens, which allows the model to process and analyse large amounts of information while maintaining high accuracy. Gemini 1.5 can also learn new skills from information provided in long prompts, without additional fine-tuning of the model.

Like its competitors, Gemini 1.5 is multimodal: it handles text, image, audio, and video, adding versatility to its long list of capabilities.

Performance in LMSYS and Significance

After only a week of testing in the LMSYS Chatbot Arena, the experimental version of Gemini 1.5 Pro achieved an impressive Elo rating of 1300. This edged past OpenAI's GPT-4o (1286 Elo) and Anthropic's Claude 3.5 Sonnet (1271 Elo). Remarkably, Google DeepMind released this new model without any prior announcement, and it quickly shot to the top of the AI chatbot leaderboard, establishing Google as a new leader in the field of large language models (LLMs).

The sudden emergence and strong performance of Gemini 1.5 Pro have led to speculation that Google might have even more advanced models in development that have not yet been made public. This unexpected advancement is intensifying the competition among major AI companies and is likely to fuel further innovation from competitors.
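
To put the rating gaps reported above in perspective, the short sketch below converts them into expected head-to-head win rates. It assumes the classic Elo expected-score formula with a scale of 400; the leaderboard's actual rating procedure may differ, so treat the output as a rough illustration rather than an official figure. The function name `expected_score` is our own.

```python
def expected_score(rating_a: float, rating_b: float, scale: float = 400.0) -> float:
    """Expected score of A against B under the classic Elo formula (a tie counts as half a win)."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / scale))


# Ratings as reported in this article.
ratings = {
    "Gemini 1.5 Pro (experimental)": 1300,
    "GPT-4o": 1286,
    "Claude 3.5 Sonnet": 1271,
}

leader = "Gemini 1.5 Pro (experimental)"
for name, rating in ratings.items():
    if name != leader:
        p = expected_score(ratings[leader], rating)
        print(f"{leader} vs {name}: {p:.1%}")
```

Under that assumption, a 14-point lead corresponds to an expected win rate of roughly 52%, and a 29-point lead to roughly 54%. The ranking is clear, but the margins between the top three models are narrow.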

How It Affects AI Call Agents

OpenAI led the LLM race for some time with well-known models such as GPT-3.5 Turbo, GPT-4 and GPT-4 Turbo. With the release of GPT-4o, it made a significant pivot towards voice capabilities and conversational AI. This focus on enhancing voice interactions set a new benchmark in the industry.

Now that Google ranks first against other LLMs, could we expect to see more focus on voice capabilities from DeepMind? While it seems unlikely at this point, only time will tell how conversational AI evolves beyond text. Google did release a snippet showcasing its multimodal AI capabilities in response to the initial release of OpenAI's GPT-4o, but there hasn't been much news since then. If Google decides to focus on voice capabilities, it will be interesting to see how its performance compares to GPT-4o and what unique approach it might take.

CFive AI helps businesses implement cutting-edge AI calling solutions that enhance customer experience while reducing operational costs. Contact us to learn how we can transform your customer service operations with sophisticated Voice AI technology.
