S2DM:

Sector-Shaped Diffusion Models for Video Generation

Anonymous

Abstract

Diffusion models have achieved remarkable success in image generation. However, applying this concept to video generation introduces significant challenges, particularly in maintaining consistency and continuity throughout video frames. Existing approaches primarily address these challenges by incorporating spatiotemporal attention modules or additional temporal conditions. However, they often overlook the impact of non-shared noise between frames in the diffusion process, which can disrupt both semantic coherence and consistent stochastic details in the video. To tackle this problem, we introduce the Sector-Shaped Diffusion Model (S2DM), which employs a sector-shaped diffusion process with shared noise across frames under specific conditions. S2DM ensures that video frames maintain consistent semantic features and stochastic details, while preserving continuous temporal characteristics through guided conditions. We evaluate S2DM on various conditional video generation tasks, using optical flow or posture information as temporal conditions, and descriptive text or reference images as semantic conditions. Experimental results demonstrate that S2DM outperforms existing methods in generating videos with thematic coherence and smooth narrative progression. For text-to-video generation, where temporal conditions are not explicitly provided, we propose a three-step generation strategy that decouples the generation of temporal characteristics from semantic features. Our results can be viewd at https://s2dm.github.io/S2DM/.

Pipeline

Performance Comparison (TikTok)

Reference

Ours

Disco [Wang et al.]

Magic Animate [Xu et al.]

Performance Comparison (MHAD)

Ours

(Flow Conditioned)

Ours

(Text Conditioned)

LFDM [Ni et al.]

A lady with glasses, wearing a black T-shirt and black jeans is right arm swipe to the right

A lady with glasses, wearing a floral sweater and black jeans is knock on door

A lady with glasses, wearing a red sweater and brown pants is draw circle clockwise

A lady with glasses, wearing a red sweater and brown pants is knock on door

A lady with glasses, wearing a red sweater and brown pants is stand to sit

A man wearing a gray T-shirt and green pants is draw circle counter clockwise

A man wearing a gray T-shirt and green pants is right arm swipe to the left

A man with glasses, wearing a black coat and blue jeans is squat

A man with glasses, wearing a blue shirt and black jeans is draw triangle

Performance Comparison (MUG)

Ours

(Flow Conditioned)

Ours

(Text Conditioned)

LFDM [Ni et al.]

Person 010 is making surprise expression

Person 020 is making fear expression

Person 020 is making happiness expression

Person 039 is making fear expression

Person 076 is making happiness expression

Person 065 is making fear expression

Performance Comparison (OpenVid)

Reference Image

Ours

(Flow Conditioned)

SVD [Andreas et al.]

High-resolution Demo (MHAD 256*256)

A lady with glasses, wearing a red sweater and brown pants is knock on door.

A man wearing a gray T-shirt and green pants is right arm swipe to the left

A man with glasses, wearing a blue shirt and black jeans is sit to stand

A lady with glasses, wearing a red sweater and brown pants is stand to sit

A man with glasses, wearing a black coat and blue jeans is right arm throw

A man wearing a gray T-shirt and green pants is draw circle counter clockwise

A man with glasses, wearing a blue shirt and black jeans is draw circle counter clockwise

A lady with glasses, wearing a black T-shirt and black jeans is right arm swipe to the right

Ablation

Training Shared Noise

Sampling Shared Noise

(Ours)

Training Shared Noise

Sampling Non-shared Noise

Training Non-shared Noise

Sampling Non-shared Noise

Training Non-shared Noise

Sampling Shared Noise

A man with glasses, wearing a black coat and blue jeans is tennis forehand swing

A man with glasses, wearing a blue shirt and black jeans is walking

Person 020 is making happiness expression

Person 076 is making happiness expression

BibTeX

TBA