How to Set Up Local AI Video Generation With Wan and LTX-2

Wan and LTX-2 are open-weight video diffusion models that run locally through ComfyUI. Generating even short clips demands far more VRAM than still-image generation: plan on 24GB or more if you want to avoid offloading layers to system RAM.

Build Lab · 27 Jun 2026 · 5 min read · NexaForge

Open-Weight Video Models Running Locally in ComfyUI

Generating video on your own machine is far more demanding than making still images, and the gap is mostly VRAM. Local AI video generation with Wan and LTX-2 runs through ComfyUI on Windows, but where a still-image model is happy on 8GB, a video diffusion model wants 24GB or more before it stops spilling layers into system RAM and slowing to a crawl. Get the model variant and the setup right and you can produce short clips entirely offline.

Quick Answer

Wan and LTX-2 are open-weight video diffusion models that run locally through ComfyUI. Plan for 24GB of VRAM for a comfortable fp16 setup; an int8 build fits a 16GB card, and quantised GGUF versions can run on 8 to 12GB cards with quality and speed trade-offs. You install ComfyUI, add the video custom nodes through ComfyUI Manager, drop in the model checkpoint, and load a video workflow.

What the VRAM numbers actually mean

The model variant you pick decides everything. Full bf16 precision wants around 40GB or more and gives the best quality. The fp16 build lands near 22GB and is the sweet spot for a 24GB card with a little optimisation. An int8 version drops to roughly 11GB and runs on a 16GB card with a visible quality reduction. Below that, GGUF quantised builds let mainstream cards take part: a Q4 GGUF is the practical balance for 8 to 16GB GPUs, and people have even pushed LTX-2 onto 6GB with large amounts of system RAM, though that route is slow.
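The tiers above can be turned into a small lookup. This is a minimal sketch, not a benchmark: the thresholds are this article's rough figures, and the function names are made up for illustration. The `nvidia-smi` query flags are the standard ones for reading total memory, and the parser rounds to the nearest whole gigabyte because a "24GB" card typically reports slightly under 24 × 1024 MiB.

```python
import subprocess

# Thresholds copied from the rough figures in this article (assumptions, not measurements).
# Each entry is (minimum card VRAM in GB, variant name), ordered largest first.
VARIANTS = [
    (40, "bf16"),     # full-precision build, around 40GB or more
    (24, "fp16"),     # roughly 22GB build, suits a 24GB card
    (16, "int8"),     # roughly 11GB build, suits a 16GB card
    (8, "Q4 GGUF"),   # quantised build for 8 to 16GB cards
]


def pick_variant(vram_gb: float) -> str:
    """Return the largest variant whose threshold fits the given VRAM."""
    for min_vram, name in VARIANTS:
        if vram_gb >= min_vram:
            return name
    # Below 8GB the article only mentions slow runs backed by lots of system RAM.
    return "Q4 GGUF with heavy system-RAM offload (slow)"


def parse_nvidia_smi_gb(output: str) -> int:
    """Turn nvidia-smi's MiB figure for the first GPU into whole gigabytes."""
    first_line = output.strip().splitlines()[0]
    return round(int(first_line) / 1024)


def detect_vram_gb() -> int:
    """Ask nvidia-smi for total memory; requires an NVIDIA driver on PATH."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.total", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_nvidia_smi_gb(result.stdout)
```

On a machine with an NVIDIA card, `pick_variant(detect_vram_gb())` gives a starting point; treat the result as a suggestion and check the download page of the specific build for its real requirements.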

Model variants: Wan vs LTX-2

Wan 2.1 and 2.2 come in 1.3B and 14B parameter sizes. The 1.3B variant is the accessible entry point, running on consumer GPUs with 8 to 16GB of VRAM and producing decent short clips. The 14B model is broadcast-quality but demanding: full fp16 wants a 24GB card and full bf16 needs 40GB or more. GGUF quantised versions bring the 14B model within reach of mainstream hardware. LTX-2 (built on a 22B architecture) follows a similar pattern, where the official build wants 32GB but fp8 and GGUF variants squeeze onto smaller cards at reduced quality.

Setting up ComfyUI for Wan and LTX-2

1. Install ComfyUI and update it to a current build (LTX-2 workflows need a recent version, so do not skip the update).
2. Open ComfyUI Manager, which is the easiest way to pull the video custom nodes that Wan and LTX-2 need to slot into the workflow.
3. Download the model checkpoint that matches your card. Match the variant to your VRAM: fp16 for a 24GB GPU, int8 for 16GB, or a Q4 GGUF for smaller cards.
4. Place the checkpoint and any text-encoder and VAE files in the correct ComfyUI model folders so the nodes can find them.
5. Load a video workflow, set a short clip length and a sensible resolution, and run a test generation before committing to a long render.
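A common stumbling block is the file-placement step, where a workflow fails because a model file sits in the wrong folder. The sketch below lists what is present in each expected location. The folder names are an assumption based on a typical recent ComfyUI layout; older installs and some custom nodes use different names, so adjust `EXPECTED_FOLDERS` to whatever your workflow's loader nodes actually read from.

```python
from pathlib import Path

# Assumed layout for a recent ComfyUI install; change these if yours differs.
EXPECTED_FOLDERS = {
    "diffusion model": "models/diffusion_models",
    "text encoder": "models/text_encoders",
    "VAE": "models/vae",
}

# File types the loaders commonly accept (GGUF needs its own custom node).
MODEL_SUFFIXES = {".safetensors", ".gguf"}


def find_model_files(comfy_root):
    """Map each role to the sorted model file names found in its folder."""
    root = Path(comfy_root)
    found = {}
    for role, relative in EXPECTED_FOLDERS.items():
        folder = root / relative
        if folder.is_dir():
            found[role] = sorted(
                p.name for p in folder.iterdir()
                if p.is_file() and p.suffix.lower() in MODEL_SUFFIXES
            )
        else:
            found[role] = []
    return found


def missing_roles(comfy_root):
    """Return the roles that have no model file in place yet."""
    return [role for role, files in find_model_files(comfy_root).items() if not files]
```

Calling `missing_roles("C:/ComfyUI")` (the path is only an example) before loading a workflow returns an empty list when every role has at least one file.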

What to expect once it runs

Video is slow compared to images even on strong hardware. As a reference point, an fp16 model on a high-end 24GB card producing a roughly ten-second 1080p clip can take several minutes per render, and pushing to 4K stretches that to the better part of half an hour. Start short and low-resolution while you dial in your prompts and settings, then scale up once a workflow behaves. If your current card is offloading layers and dragging, more VRAM is the single biggest improvement, and the high-VRAM cards in Evetech's GPU range show current options and price points for the upgrade step.
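To see why starting short and low-resolution pays off, it helps to compare clips by how many pixels the model has to produce in total. The sketch below uses width × height × frames as a crude proxy for work. The 24 fps default and the sample clip sizes are assumptions for illustration, and real render time usually grows faster than this proxy because attention cost rises more than linearly with the amount of data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clip:
    width: int
    height: int
    seconds: float
    fps: int = 24  # assumed frame rate; use the value your workflow actually sets

    @property
    def pixel_frames(self) -> float:
        """Total pixels across all frames, a rough stand-in for workload."""
        return self.width * self.height * self.seconds * self.fps


def relative_cost(clip: Clip, reference: Clip) -> float:
    """How much raw pixel output a clip needs compared with a reference clip."""
    return clip.pixel_frames / reference.pixel_frames


# The ten-second 1080p clip from the text, plus two illustrative comparisons.
REFERENCE = Clip(1920, 1080, 10)
UHD = Clip(3840, 2160, 10)      # same length at 4K
DRAFT = Clip(1280, 720, 3)      # short 720p test clip

if __name__ == "__main__":
    print(f"4K vs 1080p:    {relative_cost(UHD, REFERENCE):.2f}x the pixels")
    print(f"draft vs 1080p: {relative_cost(DRAFT, REFERENCE):.2f}x the pixels")
```

The 4K clip carries four times the pixels of the 1080p reference, while the three-second 720p draft carries well under a fifth, which is why a draft pass is a cheap way to tune prompts and settings.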

Frequently Asked Questions

How much VRAM do I really need for local video generation?

For a smooth fp16 setup, aim for 24GB. You can run an int8 build on roughly 16GB and a Q4 GGUF on 8 to 12GB cards, but each step down costs quality or speed. Video is far more memory-hungry than still-image generation.

Can I run Wan or LTX-2 on a smaller card?

Yes, with quantised GGUF models. A Q4 GGUF is the usual choice for 8 to 16GB GPUs, and very low-VRAM runs are possible with plenty of system RAM, though render times become long enough that it is mainly worthwhile for testing.

Why is ComfyUI the recommended way to run these?

Both models slot into ComfyUI through custom nodes, and ComfyUI Manager makes installing those nodes straightforward. The node-based workflow also lets you control clip length, resolution and sampling steps precisely, which matters a great deal for managing VRAM.
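That control is also available from a script, because a running ComfyUI server accepts workflows over HTTP. The sketch below assumes the server is on its default local address (127.0.0.1:8188) and that the workflow was exported in API format. The node ID and input name in the usage note are hypothetical: they depend entirely on your own workflow file.

```python
import copy
import json
import urllib.request


def set_input(workflow, node_id, name, value):
    """Return a copy of an API-format workflow with one node input replaced."""
    updated = copy.deepcopy(workflow)
    updated[node_id]["inputs"][name] = value
    return updated


def build_prompt_request(workflow, host="127.0.0.1:8188"):
    """Build the POST request that queues a workflow on a ComfyUI server."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        f"http://{host}/prompt",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def queue_workflow(workflow, host="127.0.0.1:8188"):
    """Send the workflow to the server and return its JSON reply."""
    with urllib.request.urlopen(build_prompt_request(workflow, host)) as response:
        return json.load(response)
```

For example, if node "3" in your exported file is the sampler, `queue_workflow(set_input(workflow, "3", "steps", 20))` queues a run with fewer sampling steps without touching the original workflow.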

How long does a clip take to generate?

It depends on the card, resolution and length. A ten-second 1080p clip on a strong 24GB GPU can take several minutes, and 4K can run close to half an hour, so start small while tuning.

Ready to generate video offline without renting cloud time? Browse the AI PC range at Evetech for machines built around high-VRAM cards, and pick one with the memory your video workflow needs.
