Issue #12 — Sora-Level Open Models & New VBench Records

Published 2026-03-03 · 6 papers · 3 generation · 3 editing

generation · OpenSora-v2 · 3.2B

Introduces a 3D variational autoencoder that enables 1080p video generation at 24fps with significantly reduced memory cost.

📄 arXiv · 🔗 GitHub · Dataset: WebVid-10M + Panda-70M
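Why does a 3D VAE cut memory? The diffusion model runs on a compressed latent instead of raw pixels. Here is a back-of-the-envelope sketch. It assumes 4x temporal and 8x8 spatial compression with 16 latent channels, which are choices used by some published video VAEs and are not figures from the OpenSora-v2 paper. The helper names are hypothetical.

```python
from math import prod

# Hypothetical compression factors (4x temporal, 8x8 spatial, 16 channels);
# these are NOT taken from the OpenSora-v2 paper.
def latent_shape(frames, height, width, t_down=4, s_down=8, channels=16):
    """Return (channels, latent_frames, latent_h, latent_w) for a causal 3D VAE.

    Assumes the first frame is encoded on its own and the remaining frames
    are compressed in groups of t_down, so 1 + 4n input frames map to 1 + n latents.
    """
    latent_frames = 1 + (frames - 1) // t_down
    return channels, latent_frames, height // s_down, width // s_down


def compression_ratio(frames, height, width, **kwargs):
    """Ratio of RGB pixel values to latent values (higher means less memory)."""
    pixels = 3 * frames * height * width
    return pixels / prod(latent_shape(frames, height, width, **kwargs))


# Roughly 5 seconds of 1080p video at 24 fps (121 = 1 + 4 * 30 frames).
shape = latent_shape(121, 1080, 1920)
ratio = compression_ratio(121, 1080, 1920)
print(shape, round(ratio, 1))
```

Under these assumptions, the clip shrinks to a 16 x 31 x 135 x 240 latent, roughly 47x fewer values than the RGB frames. That reduction is where the memory savings of latent video diffusion come from.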

editing · VideoEdit-XL · 1.8B

A flow-matching approach for instruction-guided video editing that preserves temporal coherence across 128-frame clips.
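Flow matching trains a network to predict a velocity field that carries noise to data. The sketch below shows the common straight-line (rectified-flow) variant on a flattened toy latent, plus a plain Euler sampler. The interpolation path, the sampler, and every function name here are assumptions for illustration. The sketch does not model how VideoEdit-XL conditions on the instruction or keeps 128 frames coherent.

```python
def flow_matching_pair(x0, x1, t):
    """Build one training example on the straight-line (rectified-flow) path.

    x0: noise sample, x1: clean latent (flattened lists of floats), t in [0, 1].
    Returns x_t = (1 - t) * x0 + t * x1 and the velocity target x1 - x0.
    """
    x_t = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v_target = [b - a for a, b in zip(x0, x1)]
    return x_t, v_target


def mse(pred, target):
    """Training loss: mean squared error between predicted and target velocity."""
    return sum((p - q) ** 2 for p, q in zip(pred, target)) / len(target)


def euler_sample(velocity_fn, x0, steps=10):
    """Integrate dx/dt = v(x, t) from t = 0 (noise) to t = 1 (data) with Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = velocity_fn(x, i * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


x0 = [0.5, -1.0, 2.0]  # toy "noise"
x1 = [1.0, 0.0, 1.0]   # toy "clean latent"
x_t, v_target = flow_matching_pair(x0, x1, t=0.25)

# Stand-in for a trained network v_theta(x_t, t, instruction): on a single
# straight path the true velocity is constant, so the oracle just returns it.
def oracle(x, t):
    return [b - a for a, b in zip(x0, x1)]

loss = mse(oracle(x_t, 0.25), v_target)        # 0.0 for the oracle
recovered = euler_sample(oracle, x0, steps=4)  # lands on x1
```

Straight paths have constant velocity, so even a few Euler steps can work well. This is one reason flow-matching models are often sampled with fewer steps than classic diffusion.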

generation · CogVideoX-5B · 5B

Mixture-of-experts architecture generating coherent 60-second videos with consistent characters and scene transitions.

📄 arXiv · 🔗 GitHub · Dataset: CogVid-Internal + WebVid-10M
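A mixture-of-experts layer swaps a single feed-forward block for several experts and a router that sends each token to only a few of them. Below is a minimal sketch of generic top-k gating for one token. The k=2 default, the softmax renormalisation over the selected experts, and the toy experts are common choices assumed for illustration, not details of CogVideoX-5B.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(token, gate_logits, experts, k=2):
    """Route one token to its top-k experts and mix their outputs.

    gate_logits: one router score per expert (in a real layer, W_g @ token).
    experts: callables mapping a vector to a vector of the same length.
    Only the k selected experts run, which keeps per-token compute sparse.
    """
    top = sorted(range(len(experts)), key=lambda i: gate_logits[i], reverse=True)[:k]
    weights = softmax([gate_logits[i] for i in top])  # renormalise over chosen experts
    out = [0.0] * len(token)
    for w, i in zip(weights, top):
        y = experts[i](token)
        out = [o + w * yi for o, yi in zip(out, y)]
    return out, top


# Toy experts: expert j scales its input by (j + 1).
experts = [lambda v, s=s: [s * x for x in v] for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = moe_forward([1.0, -2.0], gate_logits=[0.1, 2.0, -1.0, 1.0], experts=experts)
```

Total parameters grow with the number of experts. Per-token compute, however, grows only with k, which is the usual argument for MoE at large model sizes.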

editing · StableVideo-3.0 · 2.4B

DiT-based inpainting model that handles 4K-resolution video with spatiotemporal attention for seamless object removal.

📄 arXiv · 🤗 HuggingFace · Dataset: DAVIS + YouTube-VOS + Internal

generation · MotionForge · 4.1B

Combines a learned world model with diffusion generation to produce physically plausible motion and object interactions.

📄 arXiv · 🔗 GitHub · Dataset: PhysVid-500K

editing · VidStyler · 890M

Applies artistic styles to video with zero-shot generalization using per-frame adaptive low-rank adaptation modules.
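Low-rank adaptation (LoRA) places a small trainable rank-r update BA beside a frozen weight W. Below is a minimal pure-Python sketch of the LoRA forward pass. It attaches one hypothetical (A, B) pair to each frame to mirror the "per-frame" phrasing. It does not capture what makes VidStyler's modules adaptive or how the model generalizes zero-shot. `stylize_frames` and the toy matrices are illustrative assumptions.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


def lora_forward(x, W, A, B, alpha):
    """y = W x + (alpha / r) * B (A x): W is frozen, only A and B are trained.

    Shapes: W is d_out x d_in, A is r x d_in, B is d_out x r, with r << d_in.
    """
    r = len(A)
    base = matvec(W, x)
    delta = matvec(B, matvec(A, x))
    return [b + (alpha / r) * d for b, d in zip(base, delta)]


def stylize_frames(frames, W, adapters, alpha=1.0):
    """Run the same frozen layer on every frame, each with its own (A, B) pair.

    `adapters` is a hypothetical per-frame list of LoRA pairs.
    """
    return [lora_forward(x, W, A, B, alpha) for x, (A, B) in zip(frames, adapters)]


W = [[1.0, 0.0], [0.0, 1.0]]                # frozen base weight (identity for clarity)
zero_init = ([[1.0, 1.0]], [[0.0], [0.0]])  # B = 0: adapter starts as a no-op
trained = ([[1.0, 1.0]], [[0.5], [0.0]])    # rank-1 update after "training"
frames = [[1.0, 2.0], [1.0, 2.0]]
outputs = stylize_frames(frames, W, [zero_init, trained])
```

Initialising B to zero is the standard LoRA choice, so an untrained adapter leaves the base layer's output unchanged. Each adapter also trains only r(d_in + d_out) values rather than d_in x d_out, which keeps per-frame modules cheap.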
