🧠 ChatGPT finally remembers you

Good Morning! Here is today's breakdown:

ChatGPT now remembers your preferences and you control what it knows
NVIDIA's 550B open model just went live on Hugging Face
Google Magenta 2 brings live AI music to your MacBook
The first mistake most people make with Claude
4 new AI tools worth trying today

AI MEMORY

🧠 ChatGPT just upgraded its memory

OpenAI shared that ChatGPT is now better at remembering preferences, constraints, and context across conversations, with a new memory summaries feature that lets you review and steer exactly what it stores about you. It is rolling out to all users over the next few weeks, starting with Plus and Pro in the US.

ChatGPT now retains preferences, constraints, and ongoing project context across sessions, so you spend less time re-explaining who you are and how you work each time you open a new chat
Memory summaries give you a readable view of everything ChatGPT has stored, with direct controls to edit, remove, or correct what it keeps
Contextual pickup is meaningfully improved: returning to a long-running task now feels continuous rather than starting from scratch
Rolling out to Plus and Pro users in the US first, then expanding to all users over the next few weeks

ChatGPT's memory has existed for over a year, but the experience was opaque. You could turn it on but could not see what it knew or fix what it got wrong. Memory summaries fix the visibility problem. Combined with better contextual retention, this moves ChatGPT from a session-based tool to something that builds a real working model of how you operate over time. For daily users, this is the update that finally makes it feel like ChatGPT knows you.

AI MODELS

💻 NVIDIA's 550B open model is live

NVIDIA released Nemotron 3 Ultra, a 550B parameter open-source model now live on Hugging Face, OpenRouter, ModelScope, and NVIDIA NIM. Its Mixture-of-Experts architecture uses only 55B active parameters per token and runs at 300+ tokens per second. Commercial use is permitted under the NVIDIA Open Model License.

550B total parameters with 55B active per token via Latent MoE, delivering frontier-scale capability at inference costs closer to those of a 55B dense model
Ranks #1 among all US open-weights models on the Artificial Analysis intelligence index with a score of 48, and supports a 1 million token context window
Available now via Hugging Face for download and OpenRouter or NVIDIA NIM for managed API access, with confirmed early adopters including CrowdStrike, Palantir, Accenture, and Perplexity
Requires datacenter-grade infrastructure to self-host; the practical path for most teams is NVIDIA NIM or OpenRouter at standard API rates

A 550B open-weight model with commercial use permission was not possible outside of proprietary APIs six months ago. For enterprises evaluating whether to move off closed API providers, Nemotron 3 Ultra is the first serious candidate at this scale. The throughput and 1M context window are competitive with closed frontier models. It does not run on a laptop, but for any team with cloud or on-premises GPU access, it is now a real option.

AI MUSIC

🎵 Google Magenta 2 brings live AI music to your MacBook

Google DeepMind's Magenta team launched Magenta RealTime 2, an open-weights live music model that runs locally on Apple Silicon MacBooks at 200ms latency, controlled simultaneously via MIDI, audio, and text. Available today as a standalone app and DAW plugin.

Runs natively on Apple Silicon MacBooks via a C++ MLX engine at ~200ms end-to-end latency, enabling real-time musical performance with no cloud dependency
Accepts MIDI controllers, audio input, and text prompts simultaneously so musicians can blend styles, clone sounds, and build live accompaniment in one session
Works as a standalone app or drops directly into your DAW as a plugin, integrating with your existing production setup without replacing any current tool
Open-weights under Apache 2.0 with model weights on Hugging Face and code on GitHub, plus a collection of built-in playable instruments and experiences

Most other AI music tools work as generators. You give one a prompt and it produces a track. Magenta RealTime 2 works as an instrument. It responds to what you play, in real time, while you play it. That distinction matters for any musician who wants AI in the creative loop rather than just producing the final output. It runs locally on hardware you already own, costs nothing to download, and ships as a DAW plugin on day one.

PREMIUM GUIDE

📋 The Cowork and Claude Code playbook I use daily to avoid hitting usage limits

I hit Claude's usage limit three times in one day last week.

Not a huge research project. Not a coding sprint. Normal work. A few Cowork sessions. Some writing. Some debugging in Claude Code. Then at 3pm, the limit screen.

I sat down and checked what was actually eating my tokens. The math was uglier than I expected.

Here's the part most people miss: every message you send to Claude re-reads the full conversation before it responds. Message 1 costs a few thousand tokens. Message 30 costs roughly 100,000 tokens, because Claude re-reads the 29 messages before it even starts thinking about the new one.
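To make that arithmetic concrete, here is a minimal Python sketch. It assumes every exchange adds the same number of tokens (3,400 is a made-up average, picked so that message 30 lands near the 100,000 figure above) and that the full history is re-read on every message. Real conversations vary, so treat the numbers as illustrative only.

```python
# Simplified model: each message re-reads everything before it.
# TOKENS_PER_TURN is a hypothetical average, not a measured value.
TOKENS_PER_TURN = 3_400


def message_cost(n: int, tokens_per_turn: int = TOKENS_PER_TURN) -> int:
    """Tokens processed when sending message n (history plus the new turn)."""
    return n * tokens_per_turn


def conversation_cost(n_messages: int, tokens_per_turn: int = TOKENS_PER_TURN) -> int:
    """Total tokens processed across a whole conversation of n_messages."""
    return sum(message_cost(i, tokens_per_turn) for i in range(1, n_messages + 1))


if __name__ == "__main__":
    print("Message 1: ", message_cost(1))
    print("Message 30:", message_cost(30))
    print("One 30-message chat:     ", conversation_cost(30))
    print("Three 10-message chats:  ", 3 * conversation_cost(10))
```

Under these assumptions, one 30-message chat processes about 1.58 million tokens in total, while the same 30 messages split across three fresh chats process about 561,000 (ignoring whatever summary you paste in to restart each one).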

Your conversation is a bank account that gets more expensive every time you dip in.

Most habits in this guide come back to that single idea: stop paying for context you aren't using.

I split this into two parts. Cowork is the chat side. Claude Code is the terminal side. Both burn tokens. Both have specific fixes.

First, the framework.

Two numbers run your life with Claude:

Context window. Roughly 200K tokens per conversation. Everything counts: your messages, Claude's replies, file uploads, tool outputs, system instructions.
Usage limit. How much you can send across all conversations in a 5-hour window (plan-dependent).

The mental model: context is like RAM on your computer. You wouldn't load every file on your drive into memory to edit one document. You load what's needed, release what isn't.
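The RAM analogy can be sketched as a simple budget. This is a rough Python illustration, assuming the roughly 200K window mentioned above; the item names and token counts are hypothetical examples, not measurements.

```python
# Rough context budget: everything loaded into the chat counts against it.
CONTEXT_WINDOW = 200_000  # approximate per-conversation figure from this guide


def context_used(items: dict[str, int]) -> int:
    """Sum the token estimates for everything currently in the conversation."""
    return sum(items.values())


def context_remaining(items: dict[str, int], window: int = CONTEXT_WINDOW) -> int:
    """Tokens left before the conversation fills the window."""
    return window - context_used(items)


# Hypothetical snapshot of a mid-afternoon chat.
example = {
    "system instructions": 2_000,
    "uploaded PDF": 60_000,
    "messages so far": 45_000,
    "tool outputs": 15_000,
}

if __name__ == "__main__":
    used = context_used(example)
    print(f"Used: {used:,} of {CONTEXT_WINDOW:,} tokens")
    print(f"Remaining: {context_remaining(example):,}")
```

In this made-up snapshot the PDF alone takes almost a third of the window, which is the kind of load worth releasing once you have what you need from it.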

Most Claude users work the opposite way. One chat for a week. Same PDF dropped into five sessions. Extended Thinking on by default. Every connector enabled at all times.

All of that compounds. And here's how to reverse it.

Part 1: Cowork

1. Pick the right model before you start.

You don't drive a semi-truck to the grocery store.

Claude has three tiers. Haiku is fast and cheap. Sonnet is the daily workhorse. Opus is the deep reasoning model, roughly 5x the cost of Sonnet.

Default rule: start on Sonnet. Escalate to Opus only when you hit something that actually needs it.

When Opus earns its cost:

→ Multi-step architectural decisions.

→ Complex financial, legal, or scientific reasoning.

→ Writing that needs real nuance (scripts, speeches, long-form essays).

When Sonnet handles it fine:

→ Drafting emails, posts, summaries.

→ Research recaps.

→ Any single-step task.

When Haiku is the right pick:

→ Simple lookups.

→ Formatting and cleanup.

→ Anything where the answer takes Claude under 30 seconds.

Model switching is one click. Make it reflex.
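If it helps to see the rule written down, here is a small Python sketch of it. The `Task` flags and the `pick_model` function are hypothetical names invented for this illustration, and the returned strings are just the tier names from this guide, not API model identifiers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """Hypothetical description of a task, used only for this sketch."""
    architectural_decision: bool = False   # multi-step architectural decisions
    complex_reasoning: bool = False        # financial, legal, or scientific reasoning
    nuanced_writing: bool = False          # scripts, speeches, long-form essays
    quick_answer: bool = False             # lookups, formatting, cleanup


def pick_model(task: Task) -> str:
    """Start on Sonnet; escalate to Opus or drop to Haiku only when warranted."""
    if task.architectural_decision or task.complex_reasoning or task.nuanced_writing:
        return "Opus"
    if task.quick_answer:
        return "Haiku"
    return "Sonnet"


if __name__ == "__main__":
    print(pick_model(Task()))                         # everyday drafting
    print(pick_model(Task(quick_answer=True)))        # formatting and cleanup
    print(pick_model(Task(complex_reasoning=True)))   # deep reasoning
```

The order of the checks is a design choice: anything that needs deep reasoning goes to Opus even if it also looks quick, and Sonnet stays the default when no flag is set.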

2. Kill PDFs before you upload them.

This is the single biggest token trap in Cowork.

Micro launched as an AI agent built to remember more than Claude, Codex, and OpenClaw across sessions. Brett Goldstein introduced it to 632K views in 24 hours.

Arena launched the Agent Arena leaderboard, ranking AI models on real-world agentic tasks by measuring behavioral signals like file downloads, disapproval events, and retries rather than static benchmarks.

ElevenLabs brought GenFM to the web, making it accessible directly from the ElevenLabs website in addition to the ElevenReader app.

🧠 ChatGPT: It now remembers your preferences and context across sessions, with summaries you can review and edit.

🎵 Magenta RealTime 2: Google's open live music model. Runs on your MacBook at 200ms latency. MIDI, audio, and text control. Free to download today.

💻 Nemotron 3 Ultra: NVIDIA's 550B open model, live on Hugging Face and OpenRouter. Commercial use permitted. API access available now.

🤖 Micro: An AI agent built to remember more than Claude, Codex, and OpenClaw. 30 days free to try.

THAT’S IT FOR TODAY

Thanks for making it to the end! I put my heart into every email I send, and I hope you are enjoying it. Let me know your thoughts so I can make the next one even better!

See you tomorrow :)

- Dr. Alvaro Cintas