🤖 AI Weekly Recap (Week 28)

Happy Sunday! We just had another crazy week in AI. Anthropic has released Claude Science, Google just rolled out Gemini Omni Flash, and there is a new free tool that turns Claude into a full job-hunting agent.

And that's not all. Here are the most important AI moves you need to know this week.

6. Alibaba Open-Sources PageAgent

Alibaba just released PageAgent, an open-source JavaScript library that embeds an AI agent directly into any website. One script tag turns your web app into something users control with plain language: no browser extension, no Python, no headless browser, and no backend rewrite. It's MIT licensed and free.

It Runs Inside the Page: Unlike Playwright or Selenium, which drive the browser from outside, PageAgent lives in the page as plain JavaScript. It reads the live DOM as text (no screenshots, no multimodal models) and uses "DOM dehydration" to compress thousands of nodes so even small text models can act precisely.
Bring Any LLM: It works with any model served through an OpenAI-compatible endpoint, such as GPT, Claude, Qwen, DeepSeek, or Gemini, or fully offline via Ollama. There's even a free demo LLM baked into the CDN script for instant evaluation.
Real Use Cases, Day One: Ship a SaaS copilot that actually clicks buttons instead of describing them, collapse 20-click ERP/CRM workflows into one sentence, or make any web app accessible through voice and natural language. A built-in MCP server even lets Claude Code and Cursor control your browser.
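
To make the "DOM dehydration" idea concrete, here is a rough Python sketch of the general technique: walk the markup, keep only the elements a user could act on, and hand the model a short indexed text list instead of the full tree. This is not PageAgent's actual code (PageAgent is JavaScript running inside the page), and the tag list, the `Dehydrator` class, and the output format are all assumptions made up for illustration.

```python
from html.parser import HTMLParser

# Tags treated as interactive in this sketch (an assumption, not PageAgent's list).
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}
VOID_TAGS = {"input"}  # elements that never get a closing tag


class Dehydrator(HTMLParser):
    """Collects interactive elements as short, indexed text lines."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished one-line summaries
        self._open = None  # (tag, attrs, text parts) of the element being read

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return  # layout wrappers and plain text containers are dropped
        attrs = dict(attrs)
        if tag in VOID_TAGS:
            self._emit(tag, attrs, "")
        else:
            self._open = (tag, attrs, [])

    def handle_data(self, data):
        # Only text inside an open interactive element is kept.
        if self._open is not None:
            self._open[2].append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and self._open[0] == tag:
            tag, attrs, parts = self._open
            self._emit(tag, attrs, " ".join(p for p in parts if p))
            self._open = None

    def _emit(self, tag, attrs, text):
        # Prefer visible text, then fall back to placeholder or name.
        label = text or attrs.get("placeholder") or attrs.get("name") or ""
        self.lines.append(f"[{len(self.lines)}] <{tag}> {label}".rstrip())


def dehydrate(html: str) -> str:
    """Return a compact, numbered text view of the page's interactive elements."""
    parser = Dehydrator()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

Each line carries an index, so a text-only model could answer with something like "click 1" instead of describing a location on a screenshot. A real implementation would also need to handle visibility, nesting, and dynamic state, which this sketch ignores.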

Try it now → https://alibaba.github.io/page-agent/

5. Anthropic Launches Claude Science

Anthropic just released Claude Science, an AI workbench that pulls over 60 scientific databases, computation tools, and research workflows into a single environment. Instead of bouncing between PubMed, Jupyter, R, and a cluster terminal all day, researchers ask questions in plain language and specialist agents handle the querying, synthesis, and analysis.

60+ Databases, One Question: Sources like UniProt, PDB, Ensembl, ClinVar, and ChEMBL, each with its own schema and query language, get queried and synthesized automatically. It's pre-configured for genomics, single-cell, proteomics, and cheminformatics, and connects to NVIDIA BioNeMo models like Evo 2, Boltz-2, and OpenFold3.
Fully Reproducible Output: It renders 3D protein structures, genome browser tracks, and chemical structures natively, and every figure ships with the exact code, environment, and message history that produced it, so results stay traceable months later.
Real Results Already: In beta, a UCSF Brain Tumor Center team cut glioma germline analysis to roughly a tenth of its former time, and the Allen Institute built a multi-agent pipeline producing 100+ page literature reviews. Anthropic is also funding research projects with up to $30,000 in compute credits (apply by July 15).

Try it now → https://claude.com/product/claude-science

4. Google Open-Sources Stitch Design Skills

Google Labs just open-sourced Stitch Skills, a library of agent skills built on the Agent Skills open standard that connects its Stitch design tool to the coding agents you already use: Claude Code, Codex, Cursor, Gemini CLI, and Antigravity. Your agent can now generate designs, convert them to production code, and keep everything visually consistent.

Full Design-to-Code Cycle: The 7 skills cover everything from generating new design screens in Stitch to converting Stitch HTML into production-ready React and React Native components, complete with automated validation and design token consistency.
DESIGN.md Is the Secret Weapon: One skill analyzes your Stitch project and generates a DESIGN.md, a portable markdown file encoding your entire design system (colors, typography, spacing). Any agent reads it before generating UI, so screen ten matches screen one instead of drifting off-brand.
Multi-Page Sites From a Prompt: Combined with the Stitch MCP server, agents can map screens to routes and build complete multi-page websites. No more copy-pasting HTML from a design tool into your editor.

Try it now → https://stitch.withgoogle.com

3. Google Launches Gemini Omni Flash

Google just rolled out Gemini Omni Flash, the first model in its new Omni family, and it changes how video editing works. Generate a clip, then just tell it what to change: swap the background, adjust the lighting, add a character. It applies the edit while preserving everything else, and every instruction builds on the last.

It Remembers Your Scene: Characters stay consistent, physics hold up, and the scene remembers what came before across multiple editing turns. You can change the camera angle, environment, or style without re-describing anything or losing the thread of your original video.
Truly Multimodal Input: It natively processes text, images, audio, and video together. Feed it a product photo, a style reference, and a text prompt in a single request and it blends all three into one coherent output.
Cheap and Available Now: It's rolling out in the Gemini app, Google Flow, and YouTube Shorts, plus the Gemini API and AI Studio for developers at $0.10 per second of output, a quarter of the price of standard Veo 3.1. Every clip carries an invisible SynthID watermark.
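
For a feel of what that pricing means per clip, here is a quick back-of-the-envelope calculation. The 8-second clip length is a hypothetical example, the Veo figure is simply derived from the "a quarter of" comparison above, and the sketch assumes billing depends only on seconds of output.

```python
OMNI_FLASH_USD_PER_SECOND = 0.10  # price quoted in this recap
# Derived from "a quarter of standard Veo 3.1's price"; not an official figure.
VEO_STANDARD_USD_PER_SECOND = OMNI_FLASH_USD_PER_SECOND * 4


def clip_cost(seconds: float, usd_per_second: float) -> float:
    """Cost of one generated clip, assuming pure per-second billing."""
    return round(seconds * usd_per_second, 2)


# Hypothetical 8-second clip
omni_cost = clip_cost(8, OMNI_FLASH_USD_PER_SECOND)   # 0.8
veo_cost = clip_cost(8, VEO_STANDARD_USD_PER_SECOND)  # 3.2
```

If each editing turn produces a fresh clip, every turn would presumably be billed again, so a long editing session could cost several times the single-clip figure.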

Try it now → https://gemini.google.com

2. A Danish Scientist Open-Sourced AI Job Search

A Danish scientist just open-sourced AI Job Search, a framework that turns Claude Code into a complete job application assistant. Fork the repo, fill in your profile, and it handles the rest: evaluating job fit, tailoring your CV per posting, writing the cover letter, and prepping you for the interview.

A Two-Agent Quality Loop: It's not one-shot generation. A drafter agent writes your tailored CV (in LaTeX) and cover letter, then a separate reviewer agent critiques the drafts, forces revisions, and only then presents the final output.
It Knows You Better Than Your CV Does: The /expand command scans your GitHub repos, portfolio site, Kaggle, and Google Scholar to surface skills your documents never made explicit, each added to your profile with a source tag. The /upskill command then builds a prioritized skill-gap heatmap plus a learning plan with resources and time estimates.
Swap In Your Local Market: The search skills ship configured for Danish job portals (Jobindex, Jobnet), but the whole pattern is designed to be swapped for the job boards in your country.
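
The drafter-and-reviewer pattern is easy to picture as a small loop. The sketch below is a generic illustration, not code from the AI Job Search repo: `quality_loop`, the toy agents, and the three-round cap are all made-up names and choices, and in the real framework both roles would be played by Claude Code agents rather than plain functions.

```python
from typing import Callable, Optional


def quality_loop(
    draft: Callable[[str, Optional[str]], str],
    review: Callable[[str], Optional[str]],
    posting: str,
    max_rounds: int = 3,  # arbitrary cap so the loop always terminates
) -> str:
    """Alternate drafting and reviewing until the reviewer has no objections."""
    feedback: Optional[str] = None
    text = ""
    for _ in range(max_rounds):
        text = draft(posting, feedback)  # drafter sees the last critique, if any
        feedback = review(text)          # reviewer returns None when satisfied
        if feedback is None:
            break
    return text


# Toy stand-ins for the two agents (hypothetical, for demonstration only).
def toy_draft(posting: str, feedback: Optional[str]) -> str:
    base = f"Cover letter for {posting}"
    return base if feedback is None else f"{base} ({feedback})"


def toy_review(text: str) -> Optional[str]:
    return None if "metrics" in text else "add metrics"


final = quality_loop(toy_draft, toy_review, "Data Engineer")
```

Here the first draft is rejected, the second one folds in the critique, and only the approved version is returned, which mirrors the "critique, revise, then present" flow described above.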

Try it now → https://github.com/MadsLorentzen/ai-job-search

1. Meituan Open-Sources LongCat-2.0

China's food delivery giant Meituan just open-sourced LongCat-2.0, a colossal 1.6-trillion-parameter agentic coding model with a native 1-million-token context window, and Meituan reports that it beats GPT-5.5 on SWE-bench Pro. The kicker: the company says it was trained entirely on a 50,000-card cluster of domestic Chinese chips, with no Nvidia hardware anywhere in the pipeline. It's completely free under the MIT license, meaning full commercial use, unrestricted fine-tuning, and self-hosting.

Frontier Scale, Efficient Inference: The Mixture-of-Experts architecture holds 1.6T total parameters but dynamically activates only 33B to 56B per token (~48B on average), while the 1M context window lets it reason over an entire large codebase at once.
Benchmark Wins: Meituan reports 59.5 on SWE-bench Pro, edging past GPT-5.5's 58.6, plus 70.8 on Terminal-Bench 2.1 and 77.3 on SWE-bench Multilingual. It was also revealed as "Owl Alpha," the stealth model that quietly topped OpenRouter's developer charts for two months.
Zero Nvidia, Full Stack: Meituan claims it's the industry's first trillion-parameter model to complete full training AND inference on a 50,000-card domestic compute cluster, a direct answer to US export controls. Weights are live on Hugging Face and GitHub, built to slot into coding harnesses like Claude Code as the agent's "brain."
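
To put the Mixture-of-Experts numbers in perspective, this small calculation shows how little of the model is active for any single token. It uses only the parameter counts quoted above; the function name is just for illustration.

```python
TOTAL_PARAMS = 1.6e12  # 1.6T total parameters, as reported


def active_share(active_params: float) -> float:
    """Percentage of the full model used for a single token."""
    return active_params / TOTAL_PARAMS * 100


low = active_share(33e9)      # about 2.06 %
average = active_share(48e9)  # about 3.0 %
high = active_share(56e9)     # about 3.5 %
```

In other words, roughly 97% of the weights sit idle on any given token. That is why per-token compute is closer to that of a mid-sized dense model, even though all 1.6T parameters still have to be stored somewhere to serve it.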

Try it now → https://longcat.ai

Thanks for making it to the end! I put my heart into every email I send. I hope you are enjoying it. Let me know your thoughts so I can make the next one even better.

See you tomorrow :)

Dr. Alvaro Cintas