🤖 AI Weekly Recap (Week 27)

Happy Sunday! We just had another crazy week in AI. OpenAI has officially unveiled the GPT-5.6 family alongside a new "Ultra Mode", while there is a new free tool that can generate pro-quality images with full style and moodboard control.

And that's not all. Here are the most important AI moves you need to know this week.

6. OpenAI Launches GPT-5.6 Sol, Terra & Luna

OpenAI has officially unveiled the GPT-5.6 family (Sol for frontier reasoning, Terra for balanced production, and Luna for fast/cheap inference) alongside a new "Ultra Mode" that orchestrates multiple sub-agents in parallel to tackle the kind of multi-day coding, science, and cybersecurity workloads that previously required a full engineering team.

Three Tiers, One Family: Sol targets PhD-level science and frontier coding, Terra hits the cost/quality sweet spot for most apps, and Luna runs at sub-100ms latency for high-volume inference, all sharing the same tokenizer and tool interface.
Ultra Mode Sub-Agents: A single prompt can spawn dozens of specialized GPT-5.6 sub-agents working in parallel (planners, coders, verifiers, red-teamers) that vote, debate, and converge on a final answer before returning to the user.
Long-Horizon Crushing: On SWE-Bench Verified, Sol-Ultra scored 84.1%, with internal tests showing it can ship full PRs across multi-repo codebases over 12+ hour autonomous runs without human intervention.
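
For intuition on the "vote and converge" step, here is a minimal sketch of parallel agents plus a majority vote. It is not OpenAI's implementation: the function name run_and_vote and the stand-in agents are hypothetical, and the debate step is left out entirely.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def run_and_vote(prompt: str, agents: List[Callable[[str], str]]) -> str:
    """Run every agent on the prompt in parallel and return the most common answer.

    Hypothetical helper for illustration only; expects at least one agent.
    """
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map keeps the answers in the same order as the agents list.
        answers = list(pool.map(lambda agent: agent(prompt), agents))
    # most_common(1) returns [(answer, count)]; ties go to the first-seen answer.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


# Stand-in "sub-agents": plain functions instead of real model calls.
agents = [lambda p: "42", lambda p: "42", lambda p: "41"]
result = run_and_vote("What is 6 * 7?", agents)
print(result)  # 42
```

A real system would replace the lambdas with model calls and add a verification pass before trusting the majority.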

Try it now → https://chatgpt.com

5. ByteDance Unveils Seedance 2.5

ByteDance has dropped Seedance 2.5, a video generation model that produces native 30-second 4K clips from up to 50 multimodal references in a single pass, and ships with a 3D pre-visualization mode that lets directors block camera moves, lighting, and composition before committing to a final render.

Native 30s at 4K: Unlike stitched-clip competitors, Seedance 2.5 generates the entire 30-second sequence in one shot, preserving character identity, lighting continuity, and physics across the full clip at native 3840×2160.
50-Reference Multimodal Conditioning: Feed it up to 50 inputs (images, video clips, audio tracks, depth maps, character sheets) and the model fuses them into a single coherent scene with consistent style and subject anchoring.
3D Pre-Visualization Mode: Block your shot in a lightweight 3D scene viewer first, setting camera path, lens choice, key/fill lighting, and subject positions, then hit render. The final video honors every parameter you set.

More info → https://seedance.bytedance.com

4. DeepReinforce Releases Ornith 1.0

DeepReinforce has open-sourced Ornith 1.0, a coding model family trained with a novel objective where the model learns to build its own scaffolding during RL, meaning it grows its own planner, verifier, and tool-use loops instead of relying on hand-coded agent frameworks. The result rivals Claude-class systems and runs on a single consumer GPU.

Self-Scaffolding Training: During RL, Ornith generates and refines its own agent loops as part of the optimization process, yielding a model that comes with built-in planning, retry, and self-critique behavior at inference time with no external framework required.
Single-GPU Inference: The flagship Ornith-30B-A3B variant runs comfortably on a single RTX 4090 or M3 Max with quantization, hitting interactive latencies for full agentic coding sessions.
Claude-Class on Code Benchmarks: Hits 71.2% on SWE-Bench Verified and 89.4% on LiveCodeBench, putting it within striking distance of frontier closed models at roughly 1% of the inference cost.

Try it now → https://huggingface.co/collections/deepreinforce-ai/ornith-10

3. Krea Drops Krea 2

Krea has released Krea 2, its first fully in-house image foundation model. Not a fine-tune, not a LoRA stack, but built from scratch on a custom dataset. The headline feature is Krea 2 Turbo, a distilled variant that pumps out 2K-resolution images in roughly 2 seconds, with full open weights and native moodboard, LoRA, and style-reference support.

Built From Scratch: Krea 2 is a ground-up architecture trained on Krea's curated dataset, not a Stable Diffusion or Flux derivative, giving it a distinct aesthetic and noticeably better prompt adherence on compositional scenes.
2-Second 2K Turbo: The distilled Turbo variant generates 2048×2048 images in around 2 seconds on a single H100, making it one of the fastest high-resolution image models ever shipped publicly.
Open Weights + Full Tooling: Both Base and Turbo weights are live on Hugging Face, with first-class support for LoRA training, moodboard conditioning (drop in 8+ reference images), and IP-Adapter-style style transfer baked in.

Try it now → https://www.krea.ai

2. Mistral Drops OCR 4

Mistral has released OCR 4, a document understanding model that returns not just text, but bounding boxes, block-type labels, reading order, and per-token confidence scores across 170 languages, outperforming every leading commercial OCR system on public benchmarks while being fully self-hostable.

Structured, Not Just Text: Every output includes pixel-perfect bounding boxes, block types (heading, paragraph, table, figure, caption, footnote, formula), reading order, and per-token confidence, feeding directly into RAG pipelines without post-processing.
State-of-the-Art Accuracy: Tops Azure Document Intelligence, Google Document AI, AWS Textract, and the previous open-source SOTA on DocVQA, FUNSD, and the new MultiLingual-DocBench, often by 4-8 points across 170 languages spanning Latin, Cyrillic, Arabic, CJK, Indic, and low-resource African scripts.
Fully Self-Hostable: Apache 2.0 weights ship in 2B and 8B sizes, both fitting on a single GPU, letting regulated industries (healthcare, legal, gov) process sensitive documents without sending a single page to an external API.
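
To make the structured output concrete, here is a rough sketch of how a downstream pipeline might consume it. The Token and Block classes, their field names, and the 0.8 threshold are assumptions for illustration, not Mistral's actual response schema; the confidence filter is an optional extra step, not something the model requires.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    text: str
    confidence: float  # assumed to be in the range 0.0 to 1.0


@dataclass
class Block:
    block_type: str                  # e.g. "heading", "paragraph", "footnote"
    bbox: Tuple[int, int, int, int]  # assumed (x0, y0, x1, y1) in pixels
    reading_order: int
    tokens: List[Token]


def to_chunks(blocks: List[Block], min_confidence: float = 0.8) -> List[str]:
    """Return block texts in reading order, skipping low-confidence blocks."""
    chunks = []
    for block in sorted(blocks, key=lambda b: b.reading_order):
        if not block.tokens:
            continue
        mean_conf = sum(t.confidence for t in block.tokens) / len(block.tokens)
        if mean_conf >= min_confidence:
            chunks.append(" ".join(t.text for t in block.tokens))
    return chunks


# Made-up example data, deliberately out of reading order.
blocks = [
    Block("paragraph", (40, 120, 560, 180), 1,
          [Token("Revenue", 0.99), Token("grew", 0.97)]),
    Block("heading", (40, 40, 560, 90), 0,
          [Token("Q3", 0.95), Token("Report", 0.98)]),
    Block("footnote", (40, 700, 560, 720), 2,
          [Token("s3e", 0.41), Token("note", 0.60)]),
]
chunks = to_chunks(blocks)
print(chunks)  # ['Q3 Report', 'Revenue grew']
```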

Try it now → https://mistral.ai/ocr

1. Alibaba Open-Sources Qwen-AgentWorld

Alibaba has open-sourced Qwen-AgentWorld, the first language-based world model that simulates 7 distinct agent environments (browser, terminal, code editor, OS desktop, mobile UI, embodied robotics, and multi-agent negotiation) entirely inside a single model. It tops frontier closed models on AgentWorldBench, runs locally, and ships under a fully permissive license.

One Model, Seven Worlds: Instead of plugging into external sandboxes, Qwen-AgentWorld internally simulates the state and dynamics of seven environments, letting agents plan, roll out, and self-correct against an internal model of the world before taking any real action.
Tops the Closed Frontier: Scores 73.8 on AgentWorldBench, beating GPT-5.4 (69.1) and Claude Opus 4.8 (71.4), the first time an open model has held the #1 spot on a major agentic benchmark since the launch of GPT-4o.
Fully Permissive + Runs Locally: The 70B-A12B MoE variant runs on a single 80GB H100 (or a Mac Studio with enough RAM), with a 4-bit quant fitting on dual RTX 5090s. Weights, training recipes, and the AgentWorldBench harness all ship under Apache 2.0.
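
A quick back-of-the-envelope check on the hardware claims: weight memory is roughly parameter count times bits per weight, divided by 8. The sketch below counts weights only (no KV cache, activations, or runtime overhead) and assumes 32 GB of VRAM per RTX 5090; the function name is just for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params_billions * bits_per_weight / 8


# All 70B parameters must be resident, even though only ~12B are active per token.
for bits in (16, 8, 4):
    print(f"{bits}-bit: {weight_memory_gb(70, bits):.0f} GB")
# 16-bit: 140 GB
# 8-bit: 70 GB
# 4-bit: 35 GB

# Assumption: two RTX 5090 cards at 32 GB each.
dual_5090_gb = 2 * 32
fits_dual_5090 = weight_memory_gb(70, 4) <= dual_5090_gb
print(fits_dual_5090)  # True
```

So the 4-bit quant leaves roughly 29 GB of headroom across the two cards for context and overhead.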

Try it now → https://huggingface.co/collections/Qwen/qwen-agentworld

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas