Google IO 2024 biggest announcements: Gemini, AI coworkers, and more
Google doubles down on AI
📱 Project Astra looks like Google’s first crack at an everyday AI assistant in the vein of the Humane AI Pin and Rabbit R1
🤖 AI coworkers powered by Gemini can join your workforce to help manage projects and data
🎼 Music AI Sandbox can produce musical loops with a single text prompt and some audio samples
🎥 Veo is a brand new AI video generation tool from Google
🖌️ Imagen 3 improves generative AI imagery with photorealistic detail and text
It’s the first day of Google IO and it’s a big day of Gemini and AI announcements. Streaming just a day after OpenAI revealed its expansive GPT-4o improvements to ChatGPT, Google’s first developer keynote had to be all about AI. Google announced all-new Gemini features including real-time vision AI, video generation, photo search, summaries of emails and meetings, and more. Here’s everything Google announced for its services and Gemini from Google IO day 1.
🖼️ Gemini in Google Photos. If you’ve ever hunted for an image in your camera roll and just couldn’t find it, Gemini is here to help. Google introduced a new Ask Photos with Gemini feature that lets Google’s AI assistant find photos for you. Sundar Pichai asked Gemini what his license plate was and Gemini searched images of his car with the plate visible to give him the exact number. In another example, Google asked Gemini how their child’s swimming was going and it pulled up a series of photos showing the person swimming, from childhood to adulthood.
📬 Gemini in Gmail and Meet. Gemini can also summarize things and find information in your inbox. You can ask Gemini about something like your child’s upcoming field trip and it’ll comb through your inbox to create a helpful summary of dates and locations. Alternatively, you can ask Gemini what a meeting was about and it’ll analyze what was said and create a summary of everything that was discussed.
🖊️ Contextual Smart Reply in Gmail. Gemini is also making auto replies smarter. Instead of just helping you write short phrases and generic introductions, Gemini smart reply can create multiple full responses, saving you from having to type out anything at all. For example, in response to a roofing appointment, you could have Gemini agree to the proposed time or suggest another one.
🗣️ Conference with AI. Google showed off the new multimodal capabilities of Gemini that now allow it to hold conversations on its own. You can basically tell Gemini to explain a topic like physics and it’ll create a debate or conversation about it. You can also join the conversation and steer it toward a different topic like basketball.
📦 Returns made easy. Returning things is always a struggle, so Google wants to take the pain out of the process and let Gemini do it for you. You can simply take a photo of a product and let Gemini know you want to return it. From there, the AI assistant can look up your email invoice, contact the retailer, and start the return process for you.
👀 Project Astra. Vision AI that can describe the things you look at is all the rage, and Google is developing a version of its own called Project Astra. Google introduced it as an AI assistant that can be truly helpful in everyday life. Google demoed Project Astra running on an Android phone, asking it what things are and drawing arrows to point at specific things for more context. Amazingly, Project Astra can also make inferences: shown a random piece of code, it worked out that the code involved encryption. Project Astra was also shown a doodle of a live cat and a dead cat above a cardboard box and deduced it was Schrödinger’s cat.
🎹 Text to Music AI Sandbox. Google has unveiled a new music-making tool called the Music AI Sandbox. It can generate loops from AI prompts that sound more human-made than previous iterations. All the user needs to provide the tool is a text prompt and short audio clips or "stems" to create a sound sample.
🎨 Imagen 3. Google flexed the muscles of its latest AI image generation tool, Imagen 3. The company claims it can produce an incredible level of photorealistic, life-like images with more detail and far fewer distracting visual artifacts than its prior models. Google also claims to have cracked the code on rendering text in generative images, opening up the possibilities for personalized birthday messages, title slides, presentations, and more.
🎞️ Veo video generation. Google has added video generation to Gemini’s toolbox in VideoFX. Basically, Veo can create 1080p videos from a text prompt like a blooming flower, a moving sci-fi car, a drone shot of a skyscraper in a fake city, and more.
❓ AI Overviews on steroids. If you’ve Googled anything recently, you might have noticed a generative AI overview popping up at the top of your search results. Now, Google says it plans to roll them out more widely in the US and beyond. What’s more, Google says you can ask longer, multi-part questions, and it’ll produce an AI overview that tries to answer all of your questions and subqueries at once.
📂 AI-organized search results. Beyond creating AI overviews, Google says AI will change how your search results are organized. When you look up “anniversary-worthy restaurants,” Google will divide location suggestions into categories like outdoor rooftops or restaurants with live music. Google says AI-organized results will roll out first for food and movies, then music, books, shopping, and more.
🤖 Chip, your new AI coworker. AI will soon literally work with and for you. Google has introduced virtual teammates: AI-powered coworkers that have their own tasks, responsibilities, and workplace email. You can set up a virtual coworker to monitor and track projects, organize information, analyze data, and facilitate team collaboration.
💎 Gems. Google announced that you can save chatbot functions you use repeatedly with a new feature called Gems. These personalized chatbots give you a shortcut to tasks and interactions you use regularly, such as a yoga bestie or a calculus tutor. Additionally, you can create chatbot “personalities” for other things like studying for a test, preparing to run a marathon, getting help to eat more healthily, and more.
✈️ Trip planning coming this summer. Gemini trip planning right now is limited to creating a short list of tourist attractions, but at I/O Google unveiled a much more expansive trip planning tool that can create a multi-day itinerary planned out to the hour in a matter of seconds. What’s more, Gemini can pull the flight and hotel details from your email inbox and use them to find nearby restaurants and attractions. It can also further customize its choices based on your prompt, dietary restrictions, and other things you choose to avoid.
🚩 AI-powered scam detection. Android phones can already help you avoid scam calls, and now Google says Gemini Nano will help you catch scams hidden in calls. Basically, Google’s AI will listen in on your phone call (with your permission) to catch common scammer conversation patterns and pop up a real-time warning suggesting that you end the call.
📚 Gemma 2 arriving in June. Moving on to Google’s lightweight language models, Google announced a new Gemma 2 with a larger 27-billion-parameter model. According to Google, the new LLM is sized to run optimally on Nvidia’s next-gen GPUs.
Kevin Lee is The Shortcut’s Creative Director. Follow him on Twitter @baggingspam.