🎬 VideoModelWeekly
โ† Back to archive

Issue #12: Sora-Level Open Models & New VBench Records

Published 2026-03-03 · 6 papers · 3 generation · 3 editing

generation · OpenSora-v2 · 3.2B

OpenSora v2: Scaling Video Diffusion to 1080p with 3D-VAE

Introduces a 3D variational autoencoder (VAE) that enables 1080p video generation at 24 fps with significantly lower memory cost.

| Metric | Score | Δ |
|---|---|---|
| VBench Total | 84.3 | +2.1 |
| FVD (UCF-101) | 198 | -47 |
| CLIPSIM | 0.312 | +0.008 |
| Temporal Consistency | 97.1 | +1.4 |

📄 arXiv · 🔗 GitHub · Dataset: WebVid-10M + Panda-70M
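
A quick calculation shows why compressing time as well as space matters at 1080p. The sketch below assumes hypothetical factors (4× temporal, 8× spatial, 16 latent channels) chosen purely for illustration; the issue does not report OpenSora v2's actual 3D-VAE configuration.

```python
import math

def latent_shape(frames, height, width, t_down=4, s_down=8, channels=16):
    """Return (C, T, H, W) of a video latent under assumed compression factors.

    t_down, s_down and channels are hypothetical defaults, not values
    reported for OpenSora v2.
    """
    # Round partial blocks up; a real VAE may pad or handle edge frames differently.
    t = math.ceil(frames / t_down)
    h = math.ceil(height / s_down)
    w = math.ceil(width / s_down)
    return (channels, t, h, w)

def compression_ratio(frames, height, width, **kwargs):
    """Ratio of raw RGB values to latent values for the assumed factors."""
    pixels = 3 * frames * height * width
    c, t, h, w = latent_shape(frames, height, width, **kwargs)
    return pixels / (c * t * h * w)

# 5 seconds of 1080p video at 24 fps
print(latent_shape(120, 1080, 1920))                 # (16, 30, 135, 240)
print(compression_ratio(120, 1080, 1920))            # 48.0
print(compression_ratio(120, 1080, 1920, t_down=1))  # 12.0 (per-frame VAE)
```

Under these assumed factors the diffusion model works on 48× fewer values than raw RGB, while a per-frame VAE with the same spatial factor and channel count would give only 12×; the temporal axis accounts for the difference.
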

editing · VideoEdit-XL · 1.8B

VideoEdit-XL: Instruction-Based Video Editing with Flow Matching

A flow-matching approach for instruction-guided video editing that preserves temporal coherence across 128-frame clips.

| Metric | Score | Δ |
|---|---|---|
| Edit Accuracy | 91.2% | +3.8% |
| Temporal Consistency | 95.6 | +2.3 |
| CLIPSIM | 0.298 | |
| FVD | 312 | -28 |
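
Flow matching trains a network to regress a velocity field that carries noise to data; sampling then integrates that field. The sketch below uses the common straight-line (rectified-flow) path x_t = (1 - t)·x0 + t·x1 with target velocity x1 - x0. The issue does not say which path or sampler VideoEdit-XL uses, and the learned network is replaced here by a hypothetical `oracle` that returns the exact target, so the sampler can be checked.

```python
import random

def interpolate(x0, x1, t):
    """Point on the straight path from noise x0 (t=0) to data x1 (t=1)."""
    return [(1 - t) * a + t * b for a, b in zip(x0, x1)]

def target_velocity(x0, x1):
    """Regression target for the velocity network on the straight path."""
    return [b - a for a, b in zip(x0, x1)]

def flow_matching_loss(predict, x0, x1, t):
    """Mean squared error between predicted and target velocity at time t."""
    xt = interpolate(x0, x1, t)
    pred = predict(xt, t)
    tgt = target_velocity(x0, x1)
    return sum((p - q) ** 2 for p, q in zip(pred, tgt)) / len(tgt)

def euler_sample(predict, x0, steps=10):
    """Integrate dx/dt = predict(x, t) from t=0 to t=1 with fixed Euler steps."""
    x, dt = list(x0), 1.0 / steps
    for i in range(steps):
        v = predict(x, i * dt)
        x = [a + dt * b for a, b in zip(x, v)]
    return x

random.seed(0)
x0 = [random.gauss(0, 1) for _ in range(4)]  # noise sample
x1 = [0.5, -1.0, 2.0, 0.0]                   # stand-in for a data latent

def oracle(x, t):
    """Hypothetical perfect model: always returns the true straight-line velocity."""
    return target_velocity(x0, x1)

print(flow_matching_loss(oracle, x0, x1, t=0.3))  # 0.0
print(euler_sample(oracle, x0))                   # close to x1
```

In an editing model the predictor would also receive the source clip and the text instruction as conditioning; those inputs are omitted here.
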

generation · CogVideoX-5B · 5B

CogVideoX-5B: Scaling Expert Mixture for Long Video Generation

A mixture-of-experts architecture that generates coherent 60-second videos with consistent characters and scene transitions.

| Metric | Score | Δ |
|---|---|---|
| VBench Total | 83.7 | +1.9 |
| FVD (UCF-101) | 211 | -35 |
| Subject Consistency | 96.8 | +2.1 |
| CLIPSIM | 0.307 | +0.005 |

📄 arXiv · 🔗 GitHub · Dataset: CogVid-Internal + WebVid-10M
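
The issue doesn't describe CogVideoX-5B's router, so the sketch below shows generic top-k gating instead: a softmax over router scores, keep the k best experts, renormalize their weights, and mix their outputs. The experts are toy scaling functions built by a hypothetical helper, `make_scaling_expert`; in a real model the gate logits come from a learned layer.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_moe(x, gate_logits, experts, k=2):
    """Route input x to the k highest-scoring experts and mix their outputs.

    gate_logits: one score per expert.
    experts: callables mapping a vector to a vector of the same length.
    Returns (mixed output, indices of the chosen experts).
    """
    probs = softmax(gate_logits)
    top = sorted(range(len(experts)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)  # renormalize over the selected experts only
    out = [0.0] * len(x)
    for i in top:
        weight = probs[i] / norm
        y = experts[i](x)  # unselected experts are never evaluated
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out, top

def make_scaling_expert(scale):
    """Hypothetical toy expert: multiplies its input by a constant."""
    return lambda v: [scale * a for a in v]

experts = [make_scaling_expert(s) for s in (1.0, 2.0, 3.0, 4.0)]
out, chosen = top_k_moe([1.0, 1.0], gate_logits=[0.0, 0.0, 5.0, 5.0], experts=experts)
print(chosen)  # [2, 3]
print(out)     # [3.5, 3.5]
```

In a transformer MoE this routing typically runs per token, so only k experts' parameters are active for each token even though the total parameter count is much larger.
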

editing · StableVideo-3.0 · 2.4B

StableVideo 3.0: Diffusion Transformers for 4K Video Inpainting

DiT-based inpainting model that handles 4K resolution video with spatiotemporal attention for seamless object removal.

| Metric | Score | Δ |
|---|---|---|
| Inpainting PSNR | 34.2 dB | +1.8 dB |
| LPIPS | 0.042 | -0.011 |
| Temporal Flicker | 0.98 | +0.03 |

📄 arXiv · 🤗 HuggingFace · Dataset: DAVIS + YouTube-VOS + Internal
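
PSNR, the inpainting metric reported for StableVideo 3.0, is 10·log10(MAX² / MSE). A minimal sketch follows, assuming pixel values in [0, 1] and a clip score that averages per-frame PSNR. The issue doesn't say whether the reported number averages per frame, pools over the whole clip, or is restricted to the inpainted mask, and those choices change the result.

```python
import math

def psnr(reference, reconstructed, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists.

    Pixel values are assumed to lie in [0, max_value].
    """
    if len(reference) != len(reconstructed):
        raise ValueError("inputs must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstructed)) / len(reference)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(max_value ** 2 / mse)

def video_psnr(ref_frames, out_frames, max_value=1.0):
    """Average of per-frame PSNR over a clip (one of several conventions)."""
    scores = [psnr(r, o, max_value) for r, o in zip(ref_frames, out_frames)]
    return sum(scores) / len(scores)

# Two flattened 4-pixel frames; every pixel is off by 0.01, so MSE = 1e-4.
ref = [[0.2, 0.4, 0.6, 0.8]] * 2
out = [[0.21, 0.39, 0.61, 0.79]] * 2
print(round(video_psnr(ref, out), 1))  # 40.0
```

Because PSNR is logarithmic, the reported +1.8 dB gain corresponds to roughly a 34% reduction in mean squared error (10^(-0.18) ≈ 0.66).
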

generation · MotionForge · 4.1B

MotionForge: Physics-Aware Video Generation with World Models

Combines a learned world model with diffusion generation to produce physically plausible motion and object interactions.

| Metric | Score | Δ |
|---|---|---|
| VBench Physics | 78.9 | +5.2 |
| FVD (Kinetics) | 267 | -41 |
| Motion Realism | 88.4 | +3.7 |

📄 arXiv · 🔗 GitHub · Dataset: PhysVid-500K
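
One way to picture "world model + diffusion" is a two-stage loop: a dynamics model rolls object states forward, and the generator is conditioned on those states. The toy below uses hand-written ballistic physics as a stand-in for MotionForge's learned world model, which the issue does not describe. It produces one state per output frame that could serve as a conditioning signal; the conditioning interface itself is not shown, and `physics_step` and `rollout` are hypothetical names.

```python
from dataclasses import dataclass

GRAVITY = -9.81  # m/s^2, acting along the y axis

@dataclass
class State:
    x: float
    y: float
    vx: float
    vy: float

def physics_step(s: State, dt: float) -> State:
    """Stand-in 'world model': exact constant-acceleration update under gravity."""
    return State(
        x=s.x + s.vx * dt,
        y=s.y + s.vy * dt + 0.5 * GRAVITY * dt * dt,
        vx=s.vx,
        vy=s.vy + GRAVITY * dt,
    )

def rollout(initial: State, seconds: float, fps: int = 24) -> list[State]:
    """Predict one state per output frame, starting from the initial state."""
    dt = 1.0 / fps
    states = [initial]
    for _ in range(round(seconds * fps)):
        states.append(physics_step(states[-1], dt))
    return states

# A ball thrown sideways from 10 m, simulated for one second at 24 fps.
traj = rollout(State(x=0.0, y=10.0, vx=1.0, vy=0.0), seconds=1.0)
print(len(traj))              # 25 (initial state + 24 frames)
print(round(traj[-1].y, 3))   # 5.095
```

A learned world model would replace `physics_step` with a network trained on video, and the trajectory would guide denoising rather than being rendered directly.
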

editing · VidStyler · 890M

VidStyler: Zero-Shot Video Style Transfer via Adaptive LoRA

Applies artistic styles to video zero-shot, using per-frame adaptive low-rank adaptation (LoRA) modules.

| Metric | Score | Δ |
|---|---|---|
| Style Accuracy | 87.3% | +4.1% |
| Temporal Consistency | 94.2 | +1.9 |
| CLIPSIM | 0.285 | |
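
LoRA adapts a frozen weight W by adding a low-rank update, W' = W + (alpha / r)·B·A, where only A and B are trained. The sketch below shows that standard update with plain Python lists. The "adaptive, per-frame" mechanism that distinguishes VidStyler is not described in the issue and is not modeled here.

```python
def matmul(a, b):
    """Multiply matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def lora_weight(w, a, b, alpha, rank):
    """Effective weight W + (alpha / rank) * B @ A.

    w: d_out x d_in frozen base weight
    b: d_out x rank and a: rank x d_in, the trainable low-rank factors
    """
    delta = matmul(b, a)
    scale = alpha / rank
    return [[wij + scale * dij for wij, dij in zip(wr, dr)] for wr, dr in zip(w, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]  # 2x2 frozen base weight
A = [[1.0, 2.0]]              # rank 1: 1x2
B_zero = [[0.0], [0.0]]       # B is usually initialized to zero, so training starts from W
B = [[0.5], [1.0]]            # B after some (illustrative) training

print(lora_weight(W, A, B_zero, alpha=1.0, rank=1))  # [[1.0, 0.0], [0.0, 1.0]]
print(lora_weight(W, A, B, alpha=1.0, rank=1))       # [[1.5, 1.0], [1.0, 3.0]]
```

For a d_out x d_in layer, the adapter stores r·(d_out + d_in) trainable values instead of d_out·d_in, which is what makes many small per-style or per-frame adapters affordable.
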

Want full archive access?

Upgrade to Pro for $5/month and get every past issue plus early Tuesday delivery.
