
Happy Sunday! We just had another crazy week in AI. OpenAI has officially unveiled the GPT-5.6 family alongside a new "Ultra Mode", and a new free tool can generate pro-quality images with full style and moodboard control.
And that's not all. Here are the most important AI moves you need to know this week.

OpenAI has officially unveiled the GPT-5.6 family (Sol for frontier reasoning, Terra for balanced production, and Luna for fast/cheap inference) alongside a new "Ultra Mode" that orchestrates multiple sub-agents in parallel to tackle the kind of multi-day coding, science, and cybersecurity workloads that previously required a full engineering team.
Three Tiers, One Family: Sol targets PhD-level science and frontier coding, Terra hits the cost/quality sweet spot for most apps, and Luna runs at sub-100ms latency for high-volume inference, all sharing the same tokenizer and tool interface.
Ultra Mode Sub-Agents: A single prompt can spawn dozens of specialized GPT-5.6 sub-agents working in parallel (planners, coders, verifiers, red-teamers) that vote, debate, and converge on a final answer before returning to the user.
Long-Horizon Coding: On SWE-Bench Verified, Sol-Ultra scored 84.1%, and internal tests show it shipping full PRs across multi-repo codebases over 12+ hour autonomous runs without human intervention.
Try it now → https://chatgpt.com
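
For the curious, here is a toy sketch of the "fan out, then vote" pattern described above. It is not OpenAI's implementation: the three agent functions are hypothetical stubs standing in for real model calls, and a simple majority vote replaces whatever debate and convergence logic Ultra Mode actually uses.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


# Hypothetical stand-ins for sub-agents; a real system would call a model here.
def planner(task: str) -> str:
    return "use binary search"


def coder(task: str) -> str:
    return "use binary search"


def red_teamer(task: str) -> str:
    return "use linear scan"


def run_ultra(task: str, agents: List[Callable[[str], str]]) -> str:
    """Run every agent in parallel and return the most common answer."""
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        # pool.map keeps results in the same order as the agents list.
        answers = list(pool.map(lambda agent: agent(task), agents))
    # Majority vote; ties resolve to the answer that appeared first.
    winner, _count = Counter(answers).most_common(1)[0]
    return winner


if __name__ == "__main__":
    print(run_ultra("find an item in a sorted list", [planner, coder, red_teamer]))
```

Two of the three stubs agree, so the vote returns their answer. Swapping the stubs for real model calls would keep the same shape: parallel fan-out, collect, then reduce to one answer.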

ByteDance has dropped Seedance 2.5, a video generation model that produces native 30-second 4K clips from up to 50 multimodal references in a single pass, and ships with a 3D pre-visualization mode that lets directors block camera moves, lighting, and composition before committing to a final render.
Native 30s at 4K: Unlike stitched-clip competitors, Seedance 2.5 generates the entire 30-second sequence in one shot, preserving character identity, lighting continuity, and physics across the full clip at native 3840×2160.
50-Reference Multimodal Conditioning: Feed it up to 50 inputs (images, video clips, audio tracks, depth maps, character sheets) and the model fuses them into a single coherent scene with consistent style and subject anchoring.
3D Pre-Visualization Mode: Block your shot in a lightweight 3D scene viewer first, setting camera path, lens choice, key/fill lighting, and subject positions, then hit render. The final video honors every parameter you set.
More info → https://seedance.bytedance.com

DeepReinforce has open-sourced Ornith 1.0, a coding model family trained with a novel objective where the model learns to build its own scaffolding during RL, meaning it grows its own planner, verifier, and tool-use loops instead of relying on hand-coded agent frameworks. The result rivals Claude-class systems and runs on a single consumer GPU.
Self-Scaffolding Training: During RL, Ornith generates and refines its own agent loops as part of the optimization process, yielding a model that comes with built-in planning, retry, and self-critique behavior at inference time with no external framework required.
Single-GPU Inference: The flagship Ornith-30B-A3B variant runs comfortably on a single RTX 4090 or M3 Max with quantization, hitting interactive latencies for full agentic coding sessions.
Claude-Class on Code Benchmarks: Hits 71.2% on SWE-Bench Verified and 89.4% on LiveCodeBench, putting it within striking distance of frontier closed models at roughly 1% of the inference cost.

Krea has released Krea 2, its first fully in-house image foundation model: not a fine-tune or a LoRA stack, but a model built from scratch on a custom dataset. The headline feature is Krea 2 Turbo, a distilled variant that pumps out 2K-resolution images in roughly 2 seconds, with full open weights and native moodboard, LoRA, and style-reference support.
Built From Scratch: Krea 2 is a ground-up architecture trained on Krea's curated dataset, not a Stable Diffusion or Flux derivative, giving it a distinct aesthetic and noticeably better prompt adherence on compositional scenes.
2-Second 2K Turbo: The distilled Turbo variant generates 2048×2048 images in around 2 seconds on a single H100, making it one of the fastest high-resolution image models ever shipped publicly.
Open Weights + Full Tooling: Both Base and Turbo weights are live on Hugging Face, with first-class support for LoRA training, moodboard conditioning (drop in 8+ reference images), and IP-Adapter-style style transfer baked in.
Try it now → https://www.krea.ai

Mistral has released OCR 4, a document understanding model that returns not just text, but bounding boxes, block-type labels, reading order, and per-token confidence scores across 170 languages, outperforming every leading commercial OCR system on public benchmarks while being fully self-hostable.
Structured, Not Just Text: Every output includes pixel-perfect bounding boxes, block types (heading, paragraph, table, figure, caption, footnote, formula), reading order, and per-token confidence, feeding directly into RAG pipelines without post-processing.
State-of-the-Art Accuracy: Tops Azure Document Intelligence, Google Document AI, AWS Textract, and the previous open-source SOTA on DocVQA, FUNSD, and the new MultiLingual-DocBench, often by 4-8 points. Coverage spans 170 languages across Latin, Cyrillic, Arabic, CJK, Indic, and low-resource African scripts.
Fully Self-Hostable: Apache 2.0 weights ship in 2B and 8B sizes, both fitting on a single GPU, letting regulated industries (healthcare, legal, gov) process sensitive documents without sending a single page to an external API.
Try it now → https://mistral.ai/ocr
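
To make the "structured output" idea concrete, here is a minimal sketch of how a downstream RAG pipeline might consume it. The `Block` schema, its field names, and the single per-block `confidence` value are assumptions for illustration, not Mistral's actual response format.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Block:
    """Hypothetical shape of one OCR block; not Mistral's real schema."""
    text: str
    block_type: str                  # e.g. "heading", "paragraph", "table"
    bbox: Tuple[int, int, int, int]  # (x0, y0, x1, y1) in pixels
    reading_order: int
    confidence: float                # assumed: mean of per-token confidences


def to_chunks(blocks: List[Block], min_conf: float = 0.8) -> List[str]:
    """Drop low-confidence blocks, restore reading order, tag each chunk with its type."""
    kept = [b for b in blocks if b.confidence >= min_conf]
    kept.sort(key=lambda b: b.reading_order)
    return [f"[{b.block_type}] {b.text}" for b in kept]


blocks = [
    Block("Total due: 120 EUR", "paragraph", (40, 200, 400, 230), 1, 0.97),
    Block("Invoice 2024-01", "heading", (40, 40, 400, 80), 0, 0.99),
    Block("sm4dged st4mp", "figure", (420, 40, 600, 200), 2, 0.31),
]

if __name__ == "__main__":
    print(to_chunks(blocks))
```

The point is that reading order and confidence arrive with the text, so ordering and filtering become a few lines rather than a separate layout-analysis step.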

Alibaba has open-sourced Qwen-AgentWorld, the first language-based world model that simulates 7 distinct agent environments (browser, terminal, code editor, OS desktop, mobile UI, embodied robotics, and multi-agent negotiation) entirely inside a single model. It tops frontier closed models on AgentWorldBench, runs locally, and ships under a fully permissive license.
One Model, Seven Worlds: Instead of plugging into external sandboxes, Qwen-AgentWorld internally simulates the state and dynamics of seven environments, letting agents plan, roll out, and self-correct against an internal model of the world before taking any real action.
Tops the Closed Frontier: Scores 73.8 on AgentWorldBench, beating GPT-5.4 (69.1) and Claude Opus 4.8 (71.4). This is the first time an open model has held the #1 spot on a major agentic benchmark since the launch of GPT-4o.
Fully Permissive + Runs Locally: The 70B-A12B MoE variant runs on a single 80GB H100 (or a Mac Studio with enough RAM), with a 4-bit quant fitting on dual RTX 5090s. Weights, training recipes, and the AgentWorldBench harness all ship under Apache 2.0.
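
A quick back-of-the-envelope check on those hardware claims, as a sketch rather than a sizing guide. It counts weights only (no KV cache, activations, or runtime overhead) and assumes all 70B parameters must sit in memory, since an MoE model keeps every expert resident even though only about 12B are active per token. The 32 GB per RTX 5090 figure is the card's published VRAM, not something stated in the release.

```python
def weight_gb(params_billion: float, bits_per_weight: int) -> float:
    """Weights-only memory in decimal GB; ignores KV cache, activations, and overhead."""
    # params_billion * 1e9 weights * (bits / 8) bytes each, divided by 1e9 bytes per GB.
    return params_billion * bits_per_weight / 8


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"70B at {bits}-bit: {weight_gb(70, bits):.0f} GB")
```

By this estimate the 4-bit weights take about 35 GB, comfortably under the 64 GB of two RTX 5090s, while 16-bit weights would need about 140 GB, so the single 80 GB H100 setup presumably relies on 8-bit or lower precision.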

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.
See you tomorrow :)
Dr. Alvaro Cintas







