Happy Sunday! We just had another crazy week in AI. Z.AI just dropped the world's most powerful open-source AI model, which beats GPT-5.5 on several coding benchmarks, while a new open-source system keeps AI agents running non-stop.

And that's not all. Here are the most important AI moves you need to know this week.

Apple has open-sourced coreai-models, a complete toolkit that lets developers export models from Hugging Face (and other sources) and run them natively on iPhone, iPad, and Mac with zero cloud dependency. It effectively turns the 2 billion+ active Apple devices into local AI machines.

  • Native Swift Runtime: Ships with a Swift package built directly on the Core AI framework, so developers can integrate models into iOS and macOS apps with just a few lines of code.

  • Coding-Agent Skills: Includes pluggable skills so agents like Codex and Claude Code can use Core AI as an expert, handling weight compression, palettization, quantization, and BC1S layouts automatically.

  • Zero Cloud, Zero Cost: Models run entirely across Apple Silicon's CPU, GPU, and Neural Engine. Private inference, no API fees, and no data ever leaves the device.
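For readers curious what palettization means in practice, here is a minimal sketch of the general idea: cluster a layer's weights into a small lookup table (the "palette") and store only a short index per weight. This is plain Python for illustration, not Apple's implementation or the coreai-models API. The names `palettize` and `depalettize` are hypothetical, and it assumes `n_bits` is at least 1.

```python
def palettize(weights, n_bits=2, iters=10):
    """Cluster weights into 2**n_bits shared values: a palette plus one index per weight.

    Illustrative 1-D k-means only; hypothetical helper, not a Core AI call.
    """
    n_colors = 2 ** n_bits
    lo, hi = min(weights), max(weights)
    # Start with palette entries spaced evenly across the weight range.
    palette = [lo + (hi - lo) * i / (n_colors - 1) for i in range(n_colors)]

    def nearest(w):
        # Index of the palette entry closest to weight w.
        return min(range(n_colors), key=lambda c: abs(w - palette[c]))

    for _ in range(iters):
        indices = [nearest(w) for w in weights]
        # Move each palette entry to the mean of the weights assigned to it.
        for c in range(n_colors):
            members = [w for w, idx in zip(weights, indices) if idx == c]
            if members:
                palette[c] = sum(members) / len(members)

    return palette, [nearest(w) for w in weights]


def depalettize(palette, indices):
    """Rebuild approximate weights by looking each index up in the palette."""
    return [palette[i] for i in indices]


weights = [-1.0, -0.9, 0.0, 0.1, 0.5, 0.6, 1.0, 1.1]
palette, indices = palettize(weights, n_bits=2)
restored = depalettize(palette, indices)
```

With 2-bit indices, each weight costs 2 bits plus a shared four-entry table instead of a full float, at the price of a small rounding error per weight.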

LangChain's LangGraph has quietly become the default execution layer for serious agentic systems, with 32K+ GitHub stars and production use at Klarna, Uber, LinkedIn, and Databricks. It solves the #1 cause of production agent failures: agents that crash mid-workflow, forget context, or repeat work because earlier frameworks discarded state between steps.

  • Durable Execution: Agents survive crashes, deployments, and timeouts. State is checkpointed after every step, so a workflow resumes from exactly where it left off.

  • Human-in-the-Loop: Pause an agent mid-task, inspect its reasoning, edit its state, then resume. Useful for high-risk decisions like approving contracts or reviewing generated code.

  • Memory That Persists: Short-term working memory for the active session, plus long-term memory that carries facts and preferences across threads, sessions, and even multiple agents.
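To make the durable-execution idea concrete, here is a minimal sketch of checkpoint-and-resume using only the Python standard library. It is not LangGraph's API: `run_workflow`, the JSON checkpoint file, and the step functions are all hypothetical, and the state is assumed to be JSON-serializable.

```python
import json
import os


def run_workflow(steps, checkpoint_path, state=None):
    """Run steps in order, saving state after each one so a rerun can resume.

    Hypothetical helper for illustration; not part of LangGraph.
    """
    # Load the last checkpoint if one exists; otherwise start from step 0.
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            saved = json.load(f)
    else:
        saved = {"next_step": 0, "state": state or {}}

    for i in range(saved["next_step"], len(steps)):
        saved["state"] = steps[i](saved["state"])
        saved["next_step"] = i + 1
        # Write to a temp file, then rename it into place, so a crash
        # never leaves a half-written checkpoint behind.
        tmp_path = checkpoint_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump(saved, f)
        os.replace(tmp_path, checkpoint_path)

    return saved["state"]
```

If a step raises halfway through, calling `run_workflow` again with the same `checkpoint_path` skips every step that already completed and picks up at the one that failed.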

Google's NotebookLM just shipped Cinematic Video Overviews: immersive, narrated, deep-dive videos generated from anything you drop in. Three models work in tandem to produce the final video, all from one click.

  • Three Models, One Pipeline: Gemini 3 acts as creative director (narrative arc, visual style, structural decisions), Nano Banana Pro handles imagery, and Veo 3 produces the video output.

  • Full Studio Workspace: The same notebook can also generate editable slide decks (with PPTX export), 10 infographic styles, audio overviews, flashcards, quizzes, and citation-backed reports.

  • Slide Revisions Without Regen: A new Pencil UI lets you target a specific slide ("fix slide 3, swap the chart for a bar graph") instead of regenerating the whole deck.

Block (formerly Square) open-sourced Goose, an Apache 2.0 AI agent with 27K+ GitHub stars and 350+ contributors. Tell it "build me a website like YouTube" and it scaffolds the project, writes the code, runs the tests, debugs the failures, and keeps iterating until it actually works.

  • Any LLM You Want: Works with Claude, GPT, Gemini, or local models via Ollama. Bring your own API key, or plug in an existing Claude Code or Codex subscription via ACP.

  • Real Tool Use, Not Autocomplete: Goose installs dependencies, runs scripts, edits files, executes tests, and self-corrects. A true agent, not a chat window dressed up as one.

  • Runs Anywhere: Native desktop app for macOS, Linux, and Windows, plus a full CLI and an API. Written in Rust, vendor-neutral, and governed by the Linux Foundation's Agentic AI Foundation.

Google Research open-sourced TimesFM, a decoder-only foundation model for time-series forecasting that predicts sales trends, market prices, energy load, user traffic, or crypto volatility, with no fine-tuning required. Pretrained on roughly 10 billion real-world time points, the 2.5 release sits at #1 on the GIFT-Eval benchmark across 28 datasets.

  • Zero-Shot Forecasting: Drop in any series and get a forecast. No retraining, no per-dataset tuning, no ARIMA hand-tweaking. It generalizes across domains the way LLMs generalize across topics.

  • Compact and Fast: Just 200M parameters with a 16,000-step context window, small enough to run on CPU or edge hardware, yet beats models several times its size on the public leaderboard.

  • Available Everywhere: Hugging Face, PyPI, Vertex AI Model Garden, BigQuery, and even Google Sheets for spreadsheet-native forecasting.

Z.AI just dropped GLM-5.2, a 753B-parameter open-weights Mixture-of-Experts model purpose-built for long-horizon engineering. It ships with a stable 1-million-token context window, an MIT license, and benchmark scores that beat OpenAI's GPT-5.5 on multiple long-horizon coding evals, at roughly one-sixth the price.

  • True 1M-Token Context: Five times the size of GLM-5.1's 200K window. A new architecture trick called IndexShare reuses one indexer across every four sparse-attention layers, cutting per-token FLOPs by 2.9× at full context.

  • Beats GPT-5.5 on Coding: 81.0 on Terminal-Bench 2.1 (still behind Claude Opus 4.8's 85.0), 62.1 on SWE-Bench Pro, and #1 on Design Arena (Elo 1,360), ahead of even Claude Fable 5 on frontend design quality.

  • MIT-Licensed, Fully Open: Weights are live on Hugging Face under an MIT license: commercial use, fine-tuning, and self-hosting are all allowed with zero royalties. An FP8 variant is included for cheaper inference.
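The IndexShare idea mentioned above (one indexer shared by every four sparse-attention layers) can be pictured with a toy sketch. Only the "one indexer per four layers" detail comes from the announcement. Everything else here is an assumption for illustration: `top_k_indices` stands in for a learned indexer, `plan_sparse_attention` is a hypothetical name, and none of this is Z.AI's actual code.

```python
def top_k_indices(scores, k):
    """Positions of the k highest scores (toy stand-in for a learned indexer)."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def plan_sparse_attention(layer_scores, k, share_every=4):
    """For each layer, return the token positions it would attend to.

    The indexer runs only on the first layer of each group of `share_every`
    layers; the other layers in the group reuse that selection.
    """
    selections = []
    indexer_calls = 0
    shared = None
    for layer, scores in enumerate(layer_scores):
        if layer % share_every == 0:
            # First layer of a group: pay for the indexer once.
            shared = top_k_indices(scores, k)
            indexer_calls += 1
        # Every layer in the group attends to the same selected tokens.
        selections.append(shared)
    return selections, indexer_calls


# Eight layers, four tokens each: the indexer runs twice instead of eight times.
layer_scores = [[0.1, 0.9, 0.5, 0.3]] * 4 + [[0.8, 0.2, 0.1, 0.7]] * 4
selections, indexer_calls = plan_sparse_attention(layer_scores, k=2)
```

The sketch only shows where the savings come from (fewer indexer passes). It does not attempt to reproduce the 2.9× FLOPs figure.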

Try it now → https://chat.z.ai/

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas