NABLA: Neighborhood Adaptive Block-Level Attention for Efficient Video Generation

Journal article

Dmitrii Mikhailov et al · IEEE · 2026

Open access available

Author / responsible party

Dmitrii Mikhailov et al.

Publisher

IEEE

Year

2026

ISSN

2169-3536

Language

English

Access to the resource

This resource is identified as open access; whether the link leads directly to the full text has not been confirmed automatically.

Abstract

Full self-attention in video diffusion transformers scales quadratically with the spatio-temporal token count, making high-resolution video generation prohibitively slow and memory-intensive. We introduce NABLA, a Neighborhood-Adaptive Block-Level Attention mechanism that constructs a per-head sparse mask in three steps: 1) average-pooling queries and keys into $N \times N$ blocks, 2) retaining the highest-probability blocks via a cumulative distribution function (CDF) threshold, and 3) optionally unioning the result with Sliding-Tile Attention (STA) to mitigate boundary artifacts. NABLA integrates seamlessly into PyTorch’s FlexAttention without requiring custom kernels or auxiliary losses. Extensive experiments demonstrate significant acceleration for both training and inference: on the Wan 2.1 14B text-to-video model at 720p, NABLA achieves up to $2.7\times$ speed-up in inference while matching baseline quality metrics (CLIP: $42.06 \rightarrow 42.08$, VBench: $83.16 \rightarrow 83.17$, FVD: $68.9 \rightarrow 67.5$). Furthermore, during pre-training of a 2B DiT at $512^{2}$ resolution, NABLA reduces iteration time from 10.9s to 7.5s ($1.46\times$ acceleration) while achieving lower validation loss.
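The three mask-building steps named in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names `nabla_block_mask` and `sliding_block_mask`, the parameters `block`, `keep_mass` and `window`, the 1-D token blocking, and the use of a simple band over blocks as a stand-in for Sliding-Tile Attention are all assumptions made for illustration. The real method operates per head on spatio-temporal tokens and runs through PyTorch's FlexAttention.

```python
import numpy as np

def nabla_block_mask(q, k, block=4, keep_mass=0.9):
    """Hypothetical single-head sketch. q, k: (tokens, dim) arrays.

    Returns a boolean (query_blocks, key_blocks) mask of blocks to attend to.
    """
    n_q, d = q.shape
    n_k = k.shape[0]
    assert n_q % block == 0 and n_k % block == 0, "token count must divide evenly"

    # Step 1: average-pool queries and keys over blocks of `block` tokens.
    qb = q.reshape(n_q // block, block, d).mean(axis=1)
    kb = k.reshape(n_k // block, block, d).mean(axis=1)

    # Block-level attention probabilities (row-wise softmax).
    logits = qb @ kb.T / np.sqrt(d)
    logits -= logits.max(axis=-1, keepdims=True)
    probs = np.exp(logits)
    probs /= probs.sum(axis=-1, keepdims=True)

    # Step 2: CDF threshold. Sort each row by descending probability and keep
    # a block while the mass accumulated before it is still below keep_mass.
    order = np.argsort(-probs, axis=-1)
    sorted_p = np.take_along_axis(probs, order, axis=-1)
    cdf = np.cumsum(sorted_p, axis=-1)
    keep_sorted = (cdf - sorted_p) < keep_mass

    # Scatter the keep decisions back to the original block positions.
    mask = np.zeros(probs.shape, dtype=bool)
    np.put_along_axis(mask, order, keep_sorted, axis=-1)
    return mask

def sliding_block_mask(num_blocks, window=1):
    """Simplified 1-D local band over blocks (an assumed stand-in for STA)."""
    idx = np.arange(num_blocks)
    return np.abs(idx[:, None] - idx[None, :]) <= window

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    q = rng.standard_normal((16, 8))
    k = rng.standard_normal((16, 8))
    adaptive = nabla_block_mask(q, k, block=4, keep_mass=0.9)
    # Step 3 (optional): union the adaptive mask with the local mask.
    combined = adaptive | sliding_block_mask(adaptive.shape[0], window=1)
    print(combined.astype(int))
```

Lowering `keep_mass` keeps fewer blocks per query row and so yields a sparser mask; the union in step 3 guarantees that each query block still attends to its local neighbors regardless of the threshold.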

How to cite

APA 7

Mikhailov, D., et al. (2026). NABLA: Neighborhood Adaptive Block-Level Attention for Efficient Video Generation. https://doi.org/10.1109/ACCESS.2026.3686867

MLA

Mikhailov, Dmitrii, et al. "NABLA: Neighborhood Adaptive Block-Level Attention for Efficient Video Generation." 2026. https://doi.org/10.1109/ACCESS.2026.3686867.

Chicago

Mikhailov, Dmitrii, et al. 2026. "NABLA: Neighborhood Adaptive Block-Level Attention for Efficient Video Generation." https://doi.org/10.1109/ACCESS.2026.3686867.

Harvard

Mikhailov, D. et al. 2026, NABLA: Neighborhood Adaptive Block-Level Attention for Efficient Video Generation, IEEE, available at: https://doi.org/10.1109/ACCESS.2026.3686867 [Accessed 28 Jun. 2026].

Subjects

Diffusion models; efficient attention; sparse attention; transformer acceleration; video generation
