🤖 AI Weekly Recap (Week 24)

Happy Sunday! We just had another crazy week in AI. Google dropped a free offline dictation app, and we got two insane new image models from Ideogram and Reve.

And that's not all. Here are the most important AI moves you need to know this week.

6. Google Drops AI Edge Eloquent: A Free Offline Dictation App

Google has quietly released AI Edge Eloquent, a free dictation app that runs Gemma speech models directly on your device. It strips every "um," "ah," and self-correction in real time, transforming messy speech into clean, publication-ready prose, and your voice never has to touch the cloud.

100% On-Device: Once the Gemma-based ASR model is downloaded (~400MB), the app works fully offline. No subscriptions, no usage caps, and your audio never leaves your phone or Mac.
Smart Cleanup, Not Just Transcription: Unlike standard dictation tools that capture stumbles verbatim, Eloquent uses Gemma to infer your intended meaning, removing filler words and reformatting on the fly into "Key Points," "Formal," or "Short" modes.
Free vs. $15/Month Competitors: It directly undercuts paid apps like Wispr Flow and Willow ($15/mo each) and out-features Apple's built-in dictation, which offers no filler-word removal or text transformation.

Try it now → https://developers.google.com/edge/gallery

5. Reve Drops Reve 2.0: The Fastest Native 4K Image Model

Reve has launched Reve 2.0, a layout-based image model that generates true 4K × 4K (16 megapixel) images natively, no upscaling required. It ranks #2 on the Text-to-Image Arena, sitting just behind GPT-image-2 and above Google's Nano Banana, and pairs the model with a drag-and-drop editor where you can grab and adjust every element.

Layouts, Not Just Prompts: Reve 2.0 plans composition, spacing, text, and object relationships before rendering, treating image generation as a next-token-prediction problem on layouts rather than chaotic diffusion.
Native 4K, No Upscaling Roulette: Most models generate at 1K-2K and rely on third-party upscalers that shift subtle details. Reve renders at 16MP directly, so what you see is what you get.
Touch-and-Edit Every Element: The new editor lets you click on any object, text block, or region and adjust it precisely, turning "regenerate and pray" into actual design work.
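
For a sense of scale, the pixel math behind "4K × 4K" is easy to check. This is a quick sketch that assumes "4K" means 4,096 pixels per side and uses 1,024 and 2,048 as stand-ins for "1K" and "2K"; the announcement does not spell out exact dimensions, so treat the numbers as illustrative.

```python
# Assumption: "4K" = 4096 px per side; "1K" / "2K" = 1024 / 2048 px per side.
def megapixels(side_px: int) -> float:
    """Megapixels of a square image with the given side length."""
    return side_px * side_px / 1_000_000

native_4k = megapixels(4096)
typical_2k = megapixels(2048)
typical_1k = megapixels(1024)

print(f"4K x 4K: {native_4k:.1f} MP")
print(f"vs 2K x 2K: {native_4k / typical_2k:.0f}x the pixels")
print(f"vs 1K x 1K: {native_4k / typical_1k:.0f}x the pixels")
```

Under those assumptions, a native 4K render carries 4 to 16 times the pixels of a 1K-2K generation, which is the gap an upscaler would otherwise have to fill in.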

Try it now → app.reve.com

4. NVIDIA Drops Nemotron 3 Ultra

NVIDIA has open-sourced Nemotron 3 Ultra, a 550-billion-parameter Mixture-of-Experts model that's now the highest-scoring US-developed open-weight model on Artificial Analysis's Intelligence Index. It ships under the permissive OpenMDW-1.1 license with weights, training data, and recipes included, and runs 5× faster than comparable frontier models.

Hybrid Mamba-Transformer MoE: 550B total parameters with only 55B active per token. Mamba-2 state-space layers handle long context with minimal memory overhead, while transformer layers preserve reasoning and tool-use.
1M Token Context at Speed: Supports a 1-million-token context window with 5× higher throughput and roughly 30% lower inference cost than dense models at similar scale, built for long-horizon agent workflows.
Truly Open Supply Chain: Released June 4, 2026 with four checkpoints (NVFP4, BF16 instruct, BF16 base, GenRM), training data, and reproducible recipes, not just weights tossed over the wall.
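
To put the parameter counts in perspective, here is a back-of-the-envelope sketch. The 550B and 55B figures come from the announcement; the bytes-per-parameter values are my assumption (2 bytes for BF16, about 0.5 bytes for a 4-bit format like NVFP4, ignoring quantization scales and other metadata), so the sizes are rough estimates and not official numbers.

```python
# Figures from the announcement.
TOTAL_PARAMS = 550e9    # total parameters
ACTIVE_PARAMS = 55e9    # parameters active per token

# Assumption: storage per parameter, ignoring quantization scales
# and any other per-checkpoint overhead.
BYTES_PER_PARAM = {"BF16": 2.0, "NVFP4": 0.5}

# Share of the model that does work on any single token.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

def weight_size_gb(params: float, bytes_per_param: float) -> float:
    """Approximate size of the weights alone, in gigabytes."""
    return params * bytes_per_param / 1e9

print(f"Active per token: {active_fraction:.0%}")
for fmt, nbytes in BYTES_PER_PARAM.items():
    print(f"{fmt}: ~{weight_size_gb(TOTAL_PARAMS, nbytes):,.0f} GB of weights")
```

The takeaway: only about a tenth of the model runs per token, but all of the weights still have to be stored somewhere, which is why the 4-bit checkpoint matters for anyone hosting it.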

Try it now → https://openrouter.ai/nvidia/nemotron-3-ultra-550b-a55b:free

3. Ideogram Open-Sources Ideogram 4.0

Ideogram has released Ideogram 4.0 as a 9.3B-parameter open-weight text-to-image model with native 2K resolution, JSON-based layout control via bounding boxes, and the best text rendering of any open-weight release tested. It now ranks #1 on the open-weight DesignArena leaderboard.

Native 2K + Flawless Text: Trained from scratch for design-grade output. Handles multilingual typography, dense small-scale text, and reliably renders headlines, packaging copy, and signage without typos.
JSON Layout Control: Instead of guessing where elements land, you specify bounding boxes and color palettes in a structured prompt. Logos, callouts, and subjects go exactly where you put them.
Local Inference, Open Weights: NF4 quantized weights live on Hugging Face for download and fine-tuning. Apache 2.0 code; commercial deployment requires a separate Ideogram license.
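
Here is what a bounding-box layout prompt could look like in practice. This is a hypothetical sketch: the field names (`canvas`, `palette`, `elements`, `bbox`), the `[x, y, width, height]` box format, and the 2048 × 2048 canvas are all my assumptions for illustration, not Ideogram's documented schema. The helper simply checks that every box fits on the canvas before the prompt is serialized.

```python
import json

# Hypothetical schema: every field name below is illustrative,
# not taken from Ideogram's documentation.
CANVAS = {"width": 2048, "height": 2048}  # assumed size for "native 2K"

layout_prompt = {
    "canvas": CANVAS,
    "palette": ["#0B1F3A", "#F5A623", "#FFFFFF"],
    "elements": [
        # bbox is assumed to be [x, y, width, height] in pixels.
        {"type": "text", "content": "SUMMER SALE", "bbox": [128, 96, 1792, 320]},
        {"type": "subject", "content": "a glass bottle of lemonade", "bbox": [512, 480, 1024, 1280]},
        {"type": "logo", "content": "brand mark", "bbox": [1664, 1792, 256, 128]},
    ],
}

def bbox_in_canvas(bbox, canvas):
    """Return True if the box has positive size and lies fully inside the canvas."""
    x, y, w, h = bbox
    return (
        x >= 0 and y >= 0 and w > 0 and h > 0
        and x + w <= canvas["width"]
        and y + h <= canvas["height"]
    )

# Collect any elements whose boxes spill off the canvas.
invalid = [e for e in layout_prompt["elements"] if not bbox_in_canvas(e["bbox"], CANVAS)]

payload = json.dumps(layout_prompt, indent=2)
print(f"{len(invalid)} invalid element(s)")
print(payload)
```

The appeal of a structured prompt like this is that placement becomes something you can validate and version, instead of something you describe in words and hope for.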

Try it now → ideogram.ai

2. Microsoft Launches Scout

At Build 2026, Microsoft announced Scout, its first "Autopilot" agent and a fundamentally new category that goes far beyond Copilot. Unlike chatbots that wait to be prompted, Scout stays always-on in the background, learns your priorities, coordinates work across your apps, and takes action under its own identity, all within the governance guardrails your org sets.

Always-On, Not Episodic: Scout runs continuously across Teams, Outlook, OneDrive, and SharePoint, surfacing risks, drafting meetings, and keeping work moving without you having to ask. A chatbot can be ignored; an agent can notice.
Its Own Identity & Permissions: Every Scout instance operates under its own Entra identity, making actions fully accountable and auditable. Sensitive operations still require human approval before execution.
Local Desktop App + Cross-Platform: Runs locally on Windows and macOS, extends to browsers and MCP servers, and is powered by Microsoft's OpenClaw open-source agentic framework. Currently in Frontier private preview for enrolled customers.

Try it now → Microsoft Scout (Frontier Preview)

1. MiniMax Open-Sources M3

MiniMax has open-sourced M3, the first open-weight model to bring together three capabilities that until now only closed frontier models offered: top-tier coding, a one-million-token context window, and native image and video input, all in one architecture. It scored 59.0% on SWE-Bench Pro, edging past OpenAI's GPT-5.5 (58.6%) and Google's Gemini 3.1 Pro (54.2%).

MiniMax Sparse Attention (MSA): Replaces full attention with KV-block selection, cutting per-token compute at 1M tokens to roughly 1/20 of the previous generation's cost, with substantially faster prefill and decode.
Multimodal Across Text, Image & Video: Accepts text, images, and video as input, making it the first open-weight model to handle all three at frontier-class quality. It is designed for long-horizon agentic work, coding, and tool use.
Days-Long Autonomous Execution: In one test, M3 optimized an FP8 CUDA kernel over 24 hours, made 147 benchmark submissions and 1,959 tool calls, and pushed Hopper GPU utilization from 7.6% to 71.3%, pulling ahead of Claude Opus 4.7.
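
If "KV-block selection" sounds abstract, here is a toy sketch of the general idea. To be clear, this is a generic top-k block-selection scheme that I wrote for illustration, not MiniMax's actual MSA implementation: the scoring rule (query dotted with each block's mean key), the function name, and the parameters `block_size` and `top_k` are all assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def block_sparse_attention(query, keys, values, block_size, top_k):
    """Attend only to the top_k key/value blocks that best match the query."""
    # Split positions into contiguous blocks.
    blocks = [
        list(range(start, min(start + block_size, len(keys))))
        for start in range(0, len(keys), block_size)
    ]

    # Score each block cheaply: query dotted with the block's mean key.
    def block_score(block):
        mean_key = [
            sum(keys[i][d] for i in block) / len(block)
            for d in range(len(query))
        ]
        return dot(query, mean_key)

    chosen = sorted(blocks, key=block_score, reverse=True)[:top_k]
    selected = sorted(i for block in chosen for i in block)

    # Ordinary softmax attention, restricted to the selected positions.
    scale = math.sqrt(len(query))
    logits = [dot(query, keys[i]) / scale for i in selected]
    peak = max(logits)
    weights = [math.exp(logit - peak) for logit in logits]
    total = sum(weights)
    output = [
        sum(w / total * values[i][d] for w, i in zip(weights, selected))
        for d in range(len(values[0]))
    ]
    return output, selected

# Two blocks of four positions; the query matches the second block.
keys = [[1.0, 0.0] for _ in range(4)] + [[0.0, 1.0] for _ in range(4)]
values = [[float(i)] for i in range(8)]
out, selected = block_sparse_attention([0.0, 1.0], keys, values, block_size=4, top_k=1)
print(selected, out)
```

In this toy version, the expensive attention step touches `top_k * block_size` positions instead of the full sequence, which is the basic reason block selection gets cheaper as context grows.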

Try it now → chat.minimaxi.com

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas