Google launches Gemini, an AI model it hopes will defeat GPT-4

For nearly a decade, Google has called itself an ‘AI-first’ company. Now, a year into ChatGPT’s AI era, it is finally making its big move.

A new era of artificial intelligence has begun at Google, heralded by CEO Sundar Pichai as the Gemini era. Pichai initially hinted at Gemini during the I/O developer conference in June, and now the model is being unveiled to the public. Pichai, along with Google DeepMind CEO Demis Hassabis, describes Gemini as a significant advancement in AI that will have a profound impact on virtually all of Google’s products. Pichai highlights the remarkable synergy achieved by enhancing one underlying technology, noting that improvements immediately translate across their diverse range of products.

Gemini is not a single AI model but a family of versions tailored to different applications. The lightweight Gemini Nano is designed to run natively and offline on Android devices; the more capable Gemini Pro will power many of Google’s AI services and serves as the foundation for Bard; and at the top of the lineup, Gemini Ultra is built for high-performance work in data centers and enterprise applications, slated for release next year.

The model’s rollout is multifaceted: Bard now runs on Gemini Pro; Pixel 8 Pro users gain new on-device features powered by Gemini Nano; and Gemini Ultra is expected next year. Beginning December 13th, developers and enterprise customers can access Gemini Pro through Google Generative AI Studio or Vertex AI in Google Cloud. The model is currently available only in English, with broader language support planned. Sundar Pichai envisions Gemini eventually woven into Google’s search engine, ad products, Chrome browser, and more worldwide, marking it as the future of Google, arriving at an opportune moment.
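For developers, that access typically amounts to a short SDK call. Here is a minimal sketch using the `google-generativeai` Python package; the model name `gemini-pro`, the `GOOGLE_API_KEY` environment variable, and the wrapper function are assumptions for illustration, so check Google’s current documentation before relying on them:

```python
# Minimal sketch of querying Gemini Pro via the google-generativeai SDK.
# Assumes `pip install google-generativeai` and an API key exported in
# the GOOGLE_API_KEY environment variable (both are assumptions here).
import os


def ask_gemini(prompt: str) -> str:
    """Send a single text prompt to Gemini Pro and return the reply text."""
    import google.generativeai as genai  # imported lazily; requires the package

    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-pro")  # the text-oriented Pro tier
    response = model.generate_content(prompt)
    return response.text


if __name__ == "__main__":
    # Only runs when executed directly with a valid API key configured.
    print(ask_gemini("Summarize Gemini's three model tiers in one sentence."))
```

Enterprise users on Google Cloud would reach the same model through Vertex AI instead, with authentication handled by their cloud project rather than a standalone API key.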

OpenAI launched ChatGPT a year and a week ago, and the chatbot quickly became the defining player in the AI landscape. Now Google, a pioneer of much of the underlying technology and a self-proclaimed “AI-first” organization, is gearing up to respond to the industry dominance achieved by OpenAI’s ChatGPT. The much-anticipated showdown between OpenAI’s GPT-4 and Google’s Gemini is on the horizon.

Addressing the competition, Google DeepMind CEO Demis Hassabis reveals that the company has run an extensive head-to-head comparison between GPT-4 and Gemini. Google took a comprehensive approach, evaluating the two models across 32 benchmarks, ranging from broad assessments like the Massive Multitask Language Understanding (MMLU) benchmark to narrower tests of the models’ ability to generate Python code. Hassabis asserts that Gemini comes out ahead on most of them: “I think we’re substantially ahead on 30 out of 32” benchmarks, a claim that signals Google’s confidence in its technology. Some of those benchmarks are narrow in scope and others far more extensive, but the contest between GPT-4 and Gemini is poised to shape the future of AI, and Google appears ready for a robust challenge.

In most of those benchmarks, GPT-4 and Gemini performed closely, but Gemini’s clearest edge came in understanding and interacting with video and audio. That advantage is not incidental: multimodality has been a deliberate, core element of Gemini’s design. Unlike OpenAI, which trained separate models for images (DALL-E) and voice (Whisper), Google built a unified multisensory model from the project’s inception.

According to Hassabis, Google’s emphasis has always been on developing highly general systems: models that collect information from diverse inputs and senses, and that can respond with outputs as varied as the data they process. This strategic focus on multimodality positions Gemini as a versatile and comprehensive AI model, capable of handling a wide range of inputs and delivering nuanced responses.


At its current stage, the foundational models of Gemini primarily operate on a text-to-text basis. However, as the complexity of the models increases, especially with the likes of Gemini Ultra, the capabilities extend to processing images, video, and audio. Google’s ambition with Gemini goes beyond the current range, aiming for even greater generality.

According to Hassabis, the roadmap for Gemini involves expanding its capabilities into additional modalities such as action and touch, closer to the functionality associated with robotics. Over time, he expects these models to gain more senses, becoming more aware, more accurate, and better at understanding the world around them. He acknowledges that the models still exhibit issues like hallucinations and biases, but argues that the continuous accumulation of knowledge will refine them, making them progressively more capable and more grounded in their understanding.

While benchmarks provide a quantitative measure, the true gauge of Gemini’s capabilities lies in the hands of everyday users leveraging it for brainstorming, information retrieval, coding, and more. Google places particular emphasis on coding as a potential game-changer for Gemini, introducing a new code-generating system called AlphaCode 2, boasting improved performance over its predecessor—now outperforming 85 percent of coding competition participants compared to the original AlphaCode’s 50 percent.


Notably, Gemini is positioned as a more efficient model, trained on Google’s Tensor Processing Units (TPUs) and surpassing the speed and cost-effectiveness of previous models like PaLM. Concurrently, Google introduced the TPU v5p, a new version of its TPU system designed for data center use, catering to the demands of training and running large-scale models.

In discussions with Sundar Pichai and Demis Hassabis, it becomes evident that the launch of Gemini is not just the introduction of a model but a significant milestone in a broader project. It represents the culmination of years of development and strategic planning, positioning Gemini as the model Google has long been working towards—possibly even one that could have predated the rise of OpenAI and ChatGPT.

Reflecting on the aftermath of ChatGPT’s launch, which prompted a “code red” response from Google, the company is striving to maintain its “bold and responsible” approach. Both Hassabis and Pichai express a commitment to cautious progress, emphasizing that the journey toward artificial general intelligence (AGI) demands careful consideration. As AGI represents a paradigm shift—a self-improving, superintelligent AI—the approach remains cautious yet optimistic, acknowledging the transformative potential it holds for the world.

Google emphasizes its commitment to the safety and responsibility of Gemini through rigorous internal and external testing, including red-teaming. Sundar Pichai underscores the importance of data security and reliability, particularly for enterprise products, where generative AI often finds its primary utility. Demis Hassabis, however, acknowledges the inherent risks of launching a cutting-edge AI system: unforeseen issues and attack vectors will inevitably emerge. That, he argues, is part of why such systems must be released at all. A controlled rollout of new models allows Google to uncover and address problems before they can adversely impact users.


With the Ultra release, Google adopts a cautious approach, likening it to a controlled beta phase with a designated “safer experimentation zone” for the most powerful and unrestrained model. This careful rollout aims to identify and address any potential pitfalls or unintended consequences, demonstrating Google’s commitment to user safety and satisfaction.

Despite the uncertainty surrounding the immediate impact of the first-generation Gemini model, Pichai and other Google executives express a profound belief in the transformative potential of AI. Pichai has previously likened AI’s impact on humanity to that of fire or electricity. While Gemini may not revolutionize the world in its initial iteration, the optimistic perspective is that it could propel Google forward in the race to develop advanced generative AI, potentially surpassing the capabilities of competing models like OpenAI’s ChatGPT. The vision at Google is expansive, with many foreseeing Gemini as the catalyst for a new era, possibly surpassing the transformative influence that the internet had in establishing Google as a tech giant.
