Happy Sunday! We just had another crazy week in AI. Google held its annual I/O event, dropping Antigravity 2.0, Gemini Omni, and Gemini 3.5 Flash, while NVIDIA and ElevenLabs released their own speech models and engines.

And that's not all. Here are the most important AI moves you need to know this week.

NVIDIA has open-sourced Nemotron Speech, a family of leaderboard-topping speech models that give any app real-time, low-latency speech recognition for free. Built from the ground up for voice agents and live captioning, the models ship with open weights, training data, and recipes.

  • Built for Real-Time: The streaming ASR model uses a cache-aware FastConformer encoder that finalizes transcription in as little as 24 milliseconds, with end-to-end latency kept under 500ms. Pick your latency mode (80ms, 160ms, 560ms, or 1.12s) at inference time without retraining.

  • Leaderboard Accuracy: Evaluated on the Hugging Face OpenASR leaderboard datasets (AMI, Earnings22, Gigaspeech, LibriSpeech), it keeps word error rate under 8% even in its most aggressive low-latency mode.

  • Massive Concurrency: On a single NVIDIA H100, the 600M-parameter model supports around 560 concurrent streams, making large-scale voice agents and live captioning genuinely affordable to run.
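To put the concurrency figure in perspective, here is a minimal Python sketch of the per-stream arithmetic. The 560-stream figure comes from the bullet above; the GPU hourly price is a hypothetical placeholder, not a quoted rate, and the function names are mine, so plug in your own numbers.

```python
# Rough cost-per-stream arithmetic for one GPU serving many ASR streams.
STREAMS_PER_H100 = 560  # approximate figure cited above


def cost_per_stream_hour(gpu_hourly_usd: float, streams: int = STREAMS_PER_H100) -> float:
    """Return the GPU cost attributed to one stream for one hour."""
    if streams <= 0:
        raise ValueError("streams must be positive")
    return gpu_hourly_usd / streams


def gpus_needed(concurrent_streams: int, streams_per_gpu: int = STREAMS_PER_H100) -> int:
    """Return how many GPUs are needed, rounding up (ceiling division)."""
    if streams_per_gpu <= 0:
        raise ValueError("streams_per_gpu must be positive")
    return -(-concurrent_streams // streams_per_gpu)


if __name__ == "__main__":
    hypothetical_price = 3.00  # USD per GPU-hour; an assumption for illustration only
    print(f"{cost_per_stream_hour(hypothetical_price):.4f} USD per stream-hour")
    print(gpus_needed(2000), "GPUs for 2,000 concurrent streams")
```

This ignores headroom, batching effects, and traffic spikes, so treat the result as a lower bound rather than a capacity plan.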

Google unveiled Antigravity 2.0 at I/O 2026, transforming its agentic coding tool into a full developer platform. It ships as two surfaces: a revamped desktop app and a brand-new CLI built in Go, both for orchestrating multiple AI agents that plan, build, and test software for you.

  • Multi-Agent Command Center: The standalone desktop app puts orchestration front and center. Set several agents to work on problems simultaneously, design custom subagent workflows, and schedule tasks that run automatically in the background.

  • Terminal-First CLI: For developers who live in the terminal, the new Antigravity CLI is faster and more responsive than its predecessor. It retains Skills, Hooks, Subagents, and plugins, and fully replaces the old Gemini CLI.

  • Deep Ecosystem Integration: Connects natively with Google AI Studio, Firebase, and Android. Export projects straight from AI Studio with full context intact, and build custom agents with the new Antigravity SDK. Much of it runs on the new Gemini 3.5 Flash model.

ElevenLabs has launched Speech Engine, an API-driven tool that turns any text-based chat agent into a full voice agent with a single prompt. Wire it into your coding agent or app so it can generate voices, clone them, and narrate audio with no architecture changes.

  • One Prompt to Voice: Speech Engine unifies speech, transcription, and voice orchestration models into one layer. Add it to an existing agent without changing its current setup or rebuilding anything.

  • Generate, Clone, Narrate: Through the ElevenLabs MCP server, agents like Claude Code, Codex, and Cursor get text-to-speech, voice cloning from audio samples, transcription, and sound effects, all callable in natural language.

  • Enterprise-Ready and Multilingual: Supports 70+ languages, with SOC 2, HIPAA, and GDPR compliance plus EU Data Residency and Zero Retention Mode options for sensitive workloads.

Google has launched Gemini 3.5 Flash, delivering flagship-level intelligence at Flash speed. It's free in the Gemini app and is now the default model in AI Mode for Search worldwide.

  • Flash Beats the Old Flagship: Gemini 3.5 Flash outscores the previous Gemini 3.1 Pro on coding and agentic benchmarks, hitting 76.2% on Terminal-Bench 2.1 and 83.6% on MCP Atlas tool use, while running roughly 4x faster and costing about 25% less.

  • Free and Default Everywhere: It's now the default model powering the Gemini app and AI Mode in Search globally, so billions of people get frontier-level intelligence at no cost.

  • Built to Act, Not Just Answer: Designed for long-horizon agentic tasks, it works with Antigravity to deploy parallel subagents and powers Gemini Spark, Google's new 24/7 personal AI agent.

Alibaba has unveiled Qwen3.7-Max, a frontier reasoning agent with a 1-million-token context window. It rivals top Western models on key benchmarks while costing a fraction of GPT-5.5 and Claude Opus 4.7.

  • Million-Token Context: A 1M-token window (up from 256K in the previous generation) holds an entire mid-sized code repository or a large stack of documents in a single request, with 64K max output and extended reasoning on by default.

  • Frontier Performance, Lower Price: It scores 92.4 on GPQA Diamond, edging past Claude Opus 4.6 Max's 91.3, and ranks among the top models on the Artificial Analysis Intelligence Index at 56.6. API pricing is $2.50 per million input tokens and $7.50 per million output tokens, materially below Western frontier rates.

  • Long-Horizon Endurance: Built as an "agent foundation," it ran 35 hours of continuous autonomous work in one test, making 1,158 tool calls. It also plugs directly into Claude Code via the Anthropic API protocol.

Try it now → https://chat.qwen.ai/
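
For a rough sense of what those Qwen3.7-Max prices mean per request, here is a small Python sketch. The per-million-token rates are the ones quoted above; the function name and the reading of "64K" as 64,000 tokens are my assumptions, and a real bill may differ (caching, tiered pricing, and so on).

```python
# Back-of-the-envelope request cost from the per-million-token rates quoted above.
INPUT_USD_PER_MILLION = 2.50
OUTPUT_USD_PER_MILLION = 7.50


def estimate_request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MILLION + output_tokens * OUTPUT_USD_PER_MILLION
    return total / 1_000_000


if __name__ == "__main__":
    # A request that fills the 1M-token window and uses the full 64K output
    # (assuming 64K means 64,000 tokens).
    cost = estimate_request_cost(1_000_000, 64_000)
    print(f"${cost:.2f}")
```

Under those assumptions, a maxed-out request comes to about $2.98, with the input side accounting for most of it.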

Google has dropped Gemini Omni, a multimodal model that generates video with realistic physics. Blend real footage with generated content and edit it all in plain conversation, no timelines or keyframes required.

  • Physics-Aware Generation: Omni reasons about gravity, kinetic energy, and fluid dynamics rather than just pattern-matching pixels, so generated scenes hold up physically instead of looking floaty or morphing between cuts.

  • Blend Real and Generated: Mix existing footage, images, sketches, text prompts, and audio references into one cohesive clip. Start from rough footage or a sketch and build it into a polished cinematic result.

  • Edit by Conversation: Swap backgrounds, change wardrobe, adjust lighting, or add objects just by typing. Each instruction builds on the last, keeping characters, scenes, and physics consistent across edits. The first version, Gemini Omni Flash, is live now in the Gemini app, Google Flow, and YouTube Shorts.

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas