
LTX-2 ComfyUI Guide: Complete Local Deployment Tutorial
Step-by-step guide to running LTX-2 locally with ComfyUI. Learn how to set up text-to-video, image-to-video, and audio synchronization workflows.
“Full control over AI video generation—run LTX-2 on your own hardware with ComfyUI's powerful node-based workflow.”
Why Run LTX-2 Locally with ComfyUI?
Running LTX-2 locally offers several compelling advantages over cloud-based solutions. You gain complete privacy—your prompts and generated videos never leave your machine. You eliminate per-generation costs after the initial hardware investment. You can customize workflows with LoRA models and fine-tune the base model for specific styles. And you get faster iteration without network latency or queue times. ComfyUI provides the ideal interface for LTX-2, offering a node-based visual workflow that makes complex video generation pipelines intuitive and reproducible. This guide will walk you through everything from initial setup to advanced optimization techniques.
System Requirements and Prerequisites
Before starting, ensure your system meets the minimum requirements. For the GPU, you need an NVIDIA card with at least 24GB of VRAM (RTX 4090, A6000, or A100 recommended); for optimal performance at 4K resolution, 48GB or more is ideal. Your system should have at least 32GB of RAM and 100GB of free disk space for models. Software requirements: Python 3.10 or higher, CUDA 12.0 or higher with compatible drivers, Git for cloning repositories, and FFmpeg for video processing. Windows users should install Visual Studio Build Tools; on Linux, the standard build essentials are sufficient. Mac users should note that LTX-2 currently requires NVIDIA CUDA and does not run natively on Apple Silicon.
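Before installing anything, a quick sanity check from a terminal confirms the toolchain is in place (Linux commands shown; Windows users can run the equivalents in PowerShell):

    # Check GPU model, driver version, and available VRAM
    nvidia-smi
    # Confirm Python and CUDA toolkit versions
    python3 --version
    nvcc --version
    # Confirm FFmpeg is on the PATH
    ffmpeg -version
    # Check free disk space (the models need roughly 100GB)
    df -h .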
Installing ComfyUI and LTX-2 Models
Start by cloning the ComfyUI repository: git clone https://github.com/comfyanonymous/ComfyUI. Navigate into the directory and install the dependencies with pip install -r requirements.txt. Next, download the LTX-2 model weights from Hugging Face: place the main model file in ComfyUI/models/checkpoints/ and the VAE in ComfyUI/models/vae/. For audio generation, download the audio model separately and place it in its corresponding models folder. Install the LTX-2 custom nodes by cloning the extension repository into ComfyUI/custom_nodes/. After installation, start (or restart) ComfyUI and verify that the LTX-2 nodes appear in the node menu. The initial model load may take a few minutes depending on your storage speed.
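The full sequence looks roughly like the sketch below. The ComfyUI repository URL comes from the text above; the custom-node repository shown is the Lightricks LTX-Video node pack and may differ for LTX-2, and the Hugging Face repo ID and filenames are placeholders to replace with the values from the official LTX-2 model card:

    # Clone ComfyUI and install its dependencies
    git clone https://github.com/comfyanonymous/ComfyUI
    cd ComfyUI
    pip install -r requirements.txt

    # Install the LTX custom nodes (substitute the official LTX-2 repo if different)
    cd custom_nodes
    git clone https://github.com/Lightricks/ComfyUI-LTXVideo
    cd ..

    # Download the model weights -- replace <repo-id>, <checkpoint-file>,
    # and <vae-file> with the values from the LTX-2 model card
    huggingface-cli download <repo-id> <checkpoint-file> --local-dir models/checkpoints
    huggingface-cli download <repo-id> <vae-file> --local-dir models/vae

    # Launch ComfyUI (web UI at http://127.0.0.1:8188 by default)
    python main.py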
Building a Text-to-Video Workflow
Create a basic text-to-video workflow by adding the following nodes: LTX-2 Model Loader (connects to your checkpoint), CLIP Text Encode (for your prompt), LTX-2 Video Sampler (the core generation node), VAE Decode (converts latents to video frames), and Video Combine (writes the final video file). Connect the nodes in sequence and configure the sampler settings. For best results, use 30-50 denoising steps, a CFG scale between 7 and 9, and your target resolution (720p for testing, 4K for final output). Frame count determines video length: at 25 FPS, 150 frames gives you 6 seconds of video. Add the Audio Generator node after the Video Sampler if you want synchronized audio output.
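Once the graph works in the browser, you can also drive it headlessly. ComfyUI exposes an HTTP endpoint at /prompt: export the workflow with "Save (API Format)" (enable Dev mode in the settings first), then queue it from the command line. The filename here is just an example:

    # Queue an exported workflow; the API expects {"prompt": <graph>} as the body
    curl -s -X POST http://127.0.0.1:8188/prompt \
         -H "Content-Type: application/json" \
         -d "{\"prompt\": $(cat text2video_api.json)}"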
Image-to-Video Animation Workflow
For animating static images, modify the text-to-video workflow by adding an Image Loader node. The image provides the first-frame reference, ensuring visual consistency throughout the video. Connect your image to the LTX-2 Video Sampler's image input and adjust the image influence strength: higher values (0.7-0.9) stay closer to the source image, while lower values (0.3-0.5) allow more creative motion. The prompt should describe the desired animation rather than the image content, for example 'camera slowly pans right, subtle wind movement in hair' rather than a description of the person in the frame. This workflow is well suited to product animations, portrait animation, and style-consistent video series.
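One optional preparation step (an assumption on my part, not a requirement of the nodes) is to resize the source image to your target resolution beforehand so the sampler doesn't crop or stretch it. FFmpeg handles this in one line; the input and output names are illustrative:

    # Scale to 1280x720, padding instead of distorting the aspect ratio
    ffmpeg -i source.jpg -vf "scale=1280:720:force_original_aspect_ratio=decrease,pad=1280:720:(ow-iw)/2:(oh-ih)/2" first_frame.png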
Configuring Native Audio Synchronization
LTX-2's breakthrough feature is native audio generation that stays in sync with the video content. Enable it by adding the LTX-2 Audio Generator node after your Video Sampler. The audio node analyzes the generated video and produces matching sound: dialogue with accurate lip sync, environmental ambience, and background music. Configure the audio type: 'full' generates all audio types, 'dialogue' focuses on speech, 'ambient' creates environmental sounds, and 'music' adds background tracks. For dialogue, include a speaker description in your prompt, such as 'a man with a deep voice speaking slowly about technology'. The audio sampling rate defaults to 44.1 kHz; adjust it if your downstream workflow requires a different rate. Supported output formats are WAV and MP3.
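If your workflow writes the video and the WAV track as separate files rather than a single muxed output, FFmpeg can combine them without re-encoding the frames, and ffprobe can confirm the stream durations line up. Filenames are illustrative:

    # Mux the generated audio into the video; copy the video stream as-is
    ffmpeg -i output_video.mp4 -i output_audio.wav -c:v copy -c:a aac -shortest final.mp4
    # Compare video and audio stream durations to spot desync
    ffprobe -v error -show_entries stream=codec_type,duration -of default=noprint_wrappers=1 final.mp4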
Performance Optimization Tips
Maximize your generation speed and quality with these optimizations:

Precision: enable FP16 in model loading to halve VRAM usage with minimal quality loss (launch-flag equivalents are sketched after this list).
Attention: use xformers or flash-attention for faster attention computation; install with pip install xformers.
Multi-GPU: ComfyUI supports distributing the model across devices.
Batch processing: queue multiple generations and let them run overnight.
Resolution strategy: generate at 720p while testing prompts, then regenerate the winners at 4K.
Caching: enable model caching to avoid reloading between generations.
VRAM management: close other GPU-intensive applications during generation.

For 4K at 50 FPS, expect 3-5 minutes per 10-second clip on an RTX 4090, or 1-2 minutes on an A100.
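Several of these optimizations map directly to ComfyUI launch flags. The flags below exist in recent ComfyUI builds, but run python main.py --help to confirm what your version supports:

    # Force FP16 weights and an FP16 VAE decode to cut VRAM usage
    python main.py --force-fp16 --fp16-vae
    # If you still hit out-of-memory errors, trade speed for headroom
    python main.py --lowvram
    # Faster attention kernels (restart ComfyUI after installing)
    pip install xformers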
Common Issues and Solutions
CUDA out of memory: reduce the resolution, enable memory-efficient attention, or generate fewer frames per batch.
Model not loading: verify file placement in the correct model directories and check that the model files aren't corrupted (compare checksums).
Black or corrupted output: update your GPU drivers to the latest version and make sure your CUDA version matches your PyTorch CUDA build (a one-line check follows this list).
Audio desync: regenerate with explicit audio timing parameters, and check that the video FPS matches the audio sample rate calculations.
Slow generation: enable all of the recommended optimizations; consider a GPU with more VRAM.
ComfyUI won't start: delete the ComfyUI/custom_nodes/__pycache__ folders, update all custom nodes to their latest versions, and restart.

For persistent issues, the LTX-2 community Discord and GitHub issues are excellent resources for troubleshooting specific configurations.
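The driver/PyTorch mismatch check and the cache cleanup both reduce to one-liners (Linux/macOS shown):

    # Confirm PyTorch sees the GPU and report its CUDA build
    # for comparison against nvidia-smi
    python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"
    # Clear stale bytecode caches in custom nodes
    find ComfyUI/custom_nodes -type d -name __pycache__ -exec rm -rf {} +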
Running LTX-2 locally with ComfyUI gives you complete control over AI video generation—privacy, cost savings, and unlimited customization. With proper setup, you can generate 4K videos with synchronized audio on consumer hardware.