🤖 AI Weekly Recap (Week 25)

Happy Sunday! We just had another crazy week in AI. Google has dropped two new AI models, while NVIDIA opens its NIM platform to give you free access to more than 80 advanced AI models.

And that's not all, here are the most important AI moves you need to know this week.

6. NVIDIA Quietly Opens Its NIM Platform: 80+ AI Models, Free

NVIDIA has made its NIM (NVIDIA Inference Microservices) APIs publicly available, giving developers free access to more than 80 advanced AI models via a simple REST API that works seamlessly with the OpenAI SDK. Grab an API key, pick a model, start sending requests.

Real Models, Not Toys: The catalog spans chat, coding, embeddings, vision, and speech, including MiniMax M2.7, DeepSeek 3.2, Kimi 2.5, GLM 5.1, GPT-OSS-120B, and Sarvam-M alongside NVIDIA's own Nemotron 3 family.
OpenAI-Compatible: The API is OpenAI-compatible, any tool or framework that supports a custom base URL can point to NVIDIA NIM instead of OpenAI with no other code changes. Plugs into Cursor, Claude Code, LangChain, Aider out of the box.
Real Free Tier: Upon sign-up, users are granted 1,000 API credits, with an additional 4,000 unlocked by providing a business email to activate a free 90-day NVIDIA AI Enterprise license.

Try it now → https://build.nvidia.com/models

5. Google's Gemini 3.5 Live Translate Speaks 70+ Languages

Google has released Gemini 3.5 Live Translate, a real-time speech-to-speech model that automatically detects 70+ languages and generates smooth, natural-sounding translated speech that preserves the speakers' intonation, pacing and pitch. Instead of waiting for full sentences, it generates speech continuously and stays just a few seconds behind the speaker throughout the session.

Auto-Detect, No Manual Setup: The technology automatically detects more than 70 languages without requiring manual language selection and can generate low-latency translated speech that sounds more natural and conversational, and it holds up in noisy environments.
Your Voice, Not a Robot's: It preserves intonation, tempo, and vocal pitch so the translated voice sounds closer to how the original speaker actually sounds, rather than flat synthesized audio. Every translated audio output also includes a SynthID watermark identifying it as AI-generated.
Shipping Now Across the Stack: Available today for developers in public preview via the Gemini Live API and Google AI Studio, on the Google Translate app globally for both Android and iOS, and in private preview on Google Meet for enterprise.

Try it now → https://aistudio.google.com/

4. Understand Anything Turns Any Codebase into an Interactive Knowledge Graph

A new MIT-licensed open-source tool, Understand Anything, turns any codebase, knowledge base, or docs into an interactive knowledge graph you can explore, search, and ask questions about. It analyzes your project with a multi-agent pipeline, builds a knowledge graph of every file, function, class, and dependency, then gives you an interactive dashboard to explore it all visually.

Works Everywhere You Code: Supports Claude Code, Cursor, VS Code + GitHub Copilot, Copilot CLI, Codex, OpenCode, OpenClaw, Antigravity, Gemini CLI, Pi Agent, Vibe CLI, Hermes, Cline, and KIMI CLI, 15+ platforms in total, all from one install command.
Structure + Semantics: Tree-sitter parses every file into a concrete syntax tree and extracts the same imports, exports, function definitions, call sites, and inheritance relationships every single run. On top of that, an LLM layer reads the parsed structure alongside the source to produce summaries, architectural layer assignments, business-domain mapping, and guided tours.
Hands-On Commands: Ships with /understand-chat to ask natural-language questions about the codebase, /understand-diff to analyze the impact of current uncommitted changes before committing, and /understand-explain for a deep-dive into a specific file.

Try it now → https://github.com/Lum1104/Understand-Anything

3. Google Open-Sources DiffusionGemma

Google DeepMind has released DiffusionGemma, a 26-billion-parameter open-weights model that abandons the standard token-by-token approach to text generation in favor of diffusion, the same technique underpinning image generators like Stable Diffusion. Instead of predicting word-by-word, it generates entire blocks of text simultaneously.

Local on a Consumer GPU: DiffusionGemma is a mixture-of-experts architecture that activates only 3.8 billion of its 26 billion total parameters during inference, allowing it to fit within 18GB of VRAM when quantized to Nvidia's NVFP4 format, running on a GeForce RTX 5090 or 4090 without cloud dependency.
Blistering Throughput: On a single H100, the model exceeds 1,000 tokens per second at FP8 precision, with NVIDIA's model card citing over 1,100 tokens/sec. On a consumer GeForce RTX 5090, throughput reaches 700+ tokens per second, roughly 4x faster than equivalent autoregressive Gemma 4.
256-Token Parallel Blocks: Generates text in 256-token blocks in parallel, which allows every token to attend to all others, providing significant advantages for non-linear domains such as in-line editing and code infilling.

Try it now → https://huggingface.co/google/diffusiongemma-26B-A4B-it

2. Decart's Oasis 3 Spins Up Photorealistic Driving Worlds

AI startup Decart has launched Oasis 3, a new world model designed to generate photorealistic driving environments in real time. Available immediately through an API, the model is initially aimed at autonomous vehicle companies that need large-scale simulation environments for training and testing systems. The pitch: instead of capped demos, you can wander a generated world for hours.

Closed-Loop, Multi-Camera Sim: Oasis 3 generates physically accurate, multi-camera environments designed for training and testing autonomous systems. It responds to steering, movement, and API commands in real time, creating a true closed-loop simulation.
Order-of-Magnitude Cheaper Inference: Decart claims its models are more than an order of magnitude cheaper to run than any other in the industry, and that it has burned through drastically less than $100 million in its lifetime, which is how it can let you generate infinitely.
A Caveat Worth Knowing: The model can degrade significantly when you let it generate a world for too long. The system consistently sets up a strong initial scene that matches the prompt, but the thematic integrity degrades rapidly as you move through the world.

Try it now → https://decart.ai/oasis

1. Moonshot AI Open-Sources Kimi K2.7 Code

Moonshot AI has released Kimi K2.7 Code, an open-source, coding-focused agentic model built on a Mixture-of-Experts architecture with 1 trillion total parameters and 32 billion activated parameters per token. It beats Claude Opus 4.8 on tool-use benchmarks while reducing thinking-token usage by approximately 30% compared with K2.6.

Trillion-Parameter MoE, 256K Context: 1 trillion total parameters with 32 billion activated per token, supporting a 256K context length and Multi-head Latent Attention (MLA), tuned for long-horizon, multi-file engineering rather than one-shot snippets.
Double-Digit Benchmark Gains Over K2.6: +21.8% on Kimi Code Bench v2 (62.0 vs 50.9), +11.0% on Program Bench (53.6 vs 48.3), and +31.5% on MLS Bench Lite (35.1 vs 26.7), closing the gap to frontier closed models significantly.
Run It Yourself: The model is live on Moonshot AI's Kimi platform APIs and hosted on Hugging Face under a Modified MIT License. That license permits commercial use with attribution for large-scale deployments.

Try it now → https://www.kimi.com/

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas