Foley Control: Aligning a Frozen Latent Text-to-Audio Model to Video
Foley Control is a lightweight approach to video-guided Foley that keeps
pretrained single-modality models frozen and learns only a small
cross-attention bridge between them.
We present Stable Video Materials 3D (SViM3D), a framework to predict
multi-view consistent physically based rendering (PBR) materials, given a
single image. Recently, video diffusion models have been used to
efficiently reconstruct 3D objects from a single image.
We introduce Reservoir SWD (ReSWD), which integrates Weighted Reservoir
Sampling into Sliced Wasserstein Distance (SWD) to adaptively retain
informative projection directions across optimization steps, yielding
stable gradients while remaining unbiased.
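The blurb does not spell out how directions are retained; as a rough illustration, weighted reservoir sampling in the A-Res style (key u^(1/w), keep the k largest keys) is the standard way to keep a fixed-size, unbiased weighted sample from a stream. The function name and the idea of weighting directions by an informativeness score are assumptions for this sketch, not the paper's implementation.

```python
import heapq
import random


def weighted_reservoir_sample(items, weights, k, rng=random):
    """A-Res weighted reservoir sampling: each item draws a key
    u**(1/w) for u ~ Uniform(0, 1); the k items with the largest
    keys form an unbiased weighted sample of the stream."""
    heap = []  # min-heap of (key, item); root is the weakest survivor
    for item, w in zip(items, weights):
        if w <= 0:
            continue  # zero-weight items can never be selected
        key = rng.random() ** (1.0 / w)
        if len(heap) < k:
            heapq.heappush(heap, (key, item))
        elif key > heap[0][0]:
            heapq.heapreplace(heap, (key, item))
    return [item for _, item in heap]
```

In a ReSWD-like setting, `items` would be random projection directions and `weights` a per-direction informativeness score (e.g. its contribution to the sliced loss), so informative directions survive across optimization steps.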
We introduce Stable Cinemetrics, a structured evaluation framework that
formalizes filmmaking controls into four disentangled, hierarchical
taxonomies: Setup, Event, Lighting, and Camera.
We study how musicians use artificial intelligence (AI) across formats such
as singles, albums, performances, installations, voices, ballets, operas,
and soundtracks.
We present SD3.5-Flash, an efficient few-step distillation framework that
brings high-quality image generation to accessible consumer devices.
We present Stable Part Diffusion 4D (SP4D), a framework for generating
paired RGB and kinematic part videos from monocular inputs.
Editing the materials of objects in images based on exemplar images is an
active area of research in computer vision and graphics. We propose MARBLE,
a method for blending materials and recomposing fine-grained material
properties by finding material embeddings in CLIP-space and using them to
control pre-trained text-to-image models.
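The blurb only says that material embeddings live in CLIP-space; a minimal sketch of what "blending" two such embeddings could look like is spherical-surface-friendly linear interpolation with renormalization (CLIP embeddings are conventionally L2-normalized). The function name and the choice of plain lerp-plus-renormalize are assumptions, not MARBLE's actual method.

```python
import numpy as np


def blend_material_embeddings(emb_a, emb_b, alpha):
    """Blend two (hypothetical) CLIP-space material embeddings.

    Normalizes both inputs, linearly interpolates with weight
    `alpha` toward `emb_b`, and projects back to the unit sphere.
    """
    emb_a = emb_a / np.linalg.norm(emb_a)
    emb_b = emb_b / np.linalg.norm(emb_b)
    mixed = (1.0 - alpha) * emb_a + alpha * emb_b
    return mixed / np.linalg.norm(mixed)
```

The blended vector could then be fed to whatever conditioning pathway the text-to-image model exposes for image or material embeddings.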
We present Adversarial Relativistic-Contrastive (ARC) post-training, the
first adversarial acceleration algorithm for diffusion/flow models not
based on distillation.
We present a novel framework for generating high-quality, animatable 4D
avatars from a single image. While recent advances have shown promising
results in 4D avatar creation, existing methods either require extensive
multiview data or struggle with shape accuracy and identity consistency.