
How to Install and Run Wan 2.1 Image to Video Generator Locally
Learn how to install and run the Wan 2.1 image to video generator on your local machine with this step-by-step guide. We'll cover the complete installation process, troubleshoot common issues, and show you how to generate high-quality videos from images or text prompts without relying on cloud services.
How to Set Up Your Developer Environment
Before running Wan 2.1 locally, you need to install Git and the .NET 8 SDK. Visit the official Git website, download the installer for your operating system, run it, and follow the setup prompts until it completes.
Next, download the .NET 8 SDK from Microsoft's official website. In the table of available downloads, select the installer that matches your system (typically x64). Double-click the downloaded file and follow the installation prompts.
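Once both installers finish, you can confirm the tools are visible from a terminal. The following Python sketch is an optional helper, not part of SwarmUI: it checks that `git` is on your PATH and parses the output of `dotnet --list-sdks` for an 8.x SDK. The function names are hypothetical.

```python
import re
import shutil
import subprocess


def has_dotnet8(list_sdks_output: str) -> bool:
    """Return True if `dotnet --list-sdks` output lists an 8.x SDK."""
    # Each line looks like "8.0.100 [<install path>]"; the major version comes first.
    for line in list_sdks_output.splitlines():
        match = re.match(r"\s*(\d+)\.\d+\.\d+", line)
        if match and match.group(1) == "8":
            return True
    return False


def check_prerequisites() -> dict:
    """Report whether git and a .NET 8 SDK are available on PATH."""
    status = {"git": shutil.which("git") is not None, "dotnet8": False}
    if shutil.which("dotnet") is not None:
        result = subprocess.run(
            ["dotnet", "--list-sdks"], capture_output=True, text=True, check=False
        )
        status["dotnet8"] = has_dotnet8(result.stdout)
    return status


if __name__ == "__main__":
    print(check_prerequisites())
```

If either value prints as `False`, rerun the corresponding installer before continuing.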

How to Install SwarmUI for Wan 2.1
Navigate to the SwarmUI GitHub page and click on "Releases" on the right side. Scroll down to find "install-windows.bat" and click to download this startup script.
Move the downloaded file to a drive with at least 80GB of free space, preferably not your C drive, since the model files are large and can quickly fill a system drive.
Double-click "install-windows.bat" to start the installation. A command window will appear; let it run without interruption. The script automatically downloads the necessary files and opens the SwarmUI installer page in your browser.
Click "Agree," then "Install," and finally "Yes I'm sure install." Wait for the installation to complete, then locate the new SwarmUI folder. Inside, you'll find "launch-windows.bat," which starts SwarmUI, and "update-windows.bat," which updates it and can fix backend errors.
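As a quick sanity check before moving the script, this minimal Python sketch uses the standard library's `shutil.disk_usage` to test whether a drive has the 80GB this guide recommends. The helper name is an illustrative assumption; point it at whichever drive you plan to use.

```python
import shutil

REQUIRED_FREE_GB = 80  # minimum free space recommended by this guide


def has_enough_space(path: str, required_gb: float = REQUIRED_FREE_GB) -> bool:
    """Return True if the drive holding `path` has at least `required_gb` free."""
    free_bytes = shutil.disk_usage(path).free
    # Windows Explorer reports sizes in binary units (1 GB shown = 1024**3 bytes).
    return free_bytes >= required_gb * 1024**3
```

For example, `has_enough_space("D:\\")` returns `True` only if the D: drive has at least 80GB free.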
How to Download Wan 2.1 Model Files
Open the Hugging Face page hosting the Wan 2.1 files. You'll see four essential folders:
Clip Vision: Contains the file needed for image to video (1.26 GB). Skip this folder if you only need text to video.
Diffusion Models: Includes eight model files. The first four (i2v) handle image to video, while the last four (t2v) handle text to video. Choose based on your system's performance: fp8 or 1.3b models for weaker systems, and the advanced models for high-end GPUs such as an RTX 4080 or better.
Text Encoders: Contains the clip (text encoder) files. Download the fp16 version for better quality, though generation takes longer.
VAE: Has one file that you must download.
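If you're unsure which diffusion model files to grab, the naming hints above (i2v vs t2v, fp8, 1.3b) can be read programmatically. This Python sketch assumes the file names contain those tokens; the example names are illustrative, so check them against the actual listing on Hugging Face.

```python
def classify_wan_model(filename: str) -> dict:
    """Guess task, precision, and weight class from a Wan 2.1 file name."""
    name = filename.lower()
    if "i2v" in name:
        task = "image to video"
    elif "t2v" in name:
        task = "text to video"
    else:
        task = "unknown"

    if "fp8" in name:
        precision = "fp8"
    elif "fp16" in name:
        precision = "fp16"
    else:
        precision = "other"

    # Per this guide, fp8 and 1.3b models suit weaker systems.
    lightweight = precision == "fp8" or "1.3b" in name
    return {"task": task, "precision": precision, "lightweight": lightweight}


# Illustrative names only; compare against the real file listing.
for example in ["wan2.1_i2v_480p_14B_fp8.safetensors", "wan2.1_t2v_1.3B_fp16.safetensors"]:
    print(example, classify_wan_model(example))
```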
Place each file in its corresponding folder inside the SwarmUI/models directory. Important: text encoder files go inside the clip folder, not directly in the main models directory.
Then download the workflow file that matches your needs: 480p image to video, 720p image to video, or text to video.
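The placement rule can be summarized as a lookup table. This Python sketch assumes the SwarmUI subfolder names mirror the Hugging Face folder names, except for text encoders, which go in clip as this guide notes. The names and the `destination_for` helper are assumptions, so adjust them to match the folders in your own SwarmUI/models directory.

```python
from pathlib import Path

# Assumed mapping from the Hugging Face folder names to SwarmUI model subfolders.
# Text encoders go in "clip", as this guide notes; check the other names against your install.
DESTINATIONS = {
    "clip_vision": "clip_vision",
    "diffusion_models": "diffusion_models",
    "text_encoders": "clip",
    "vae": "vae",
}


def destination_for(hf_folder: str, swarm_root: str) -> Path:
    """Return the SwarmUI folder where files from `hf_folder` should go."""
    key = hf_folder.strip().lower().replace(" ", "_")  # "Text Encoders" -> "text_encoders"
    if key not in DESTINATIONS:
        raise ValueError(f"Unknown model folder: {hf_folder!r}")
    return Path(swarm_root) / "models" / DESTINATIONS[key]


for folder in ["Clip Vision", "Diffusion Models", "Text Encoders", "VAE"]:
    print(folder, "->", destination_for(folder, "SwarmUI"))
```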
How to Run SwarmUI and Generate Videos
Double-click "launch-windows.bat" in the SwarmUI folder. A browser page should open automatically; if it doesn't, copy the link from the command window into your browser.
Verify your models installed correctly by clicking "Models" at the bottom - you should see two new models listed.
Click "Comfy Workflow" at the top and drag your downloaded workflow file into the interface. Ensure it matches your chosen model (e.g., 720p workflow for 720p model).
The workflow interface shows various nodes and parameters. The middle boxes hold the positive and negative prompts, the top lets you switch the base model, and the left side lets you switch clip models (fp16 is recommended for best results).
Upload your source image using the bottom-left node. Configure video settings on the right:
- Keep width and height at defaults
- Set length (in frames) based on the desired duration: 17 = 1 second, 33 = 2 seconds, 49 = 3 seconds, 65 = 4 seconds, 81 = 5 seconds
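The length values above follow a simple pattern: each extra second adds 16 frames, plus one starting frame, so length = 16 × seconds + 1. The 16 frames-per-second rate is inferred from these numbers rather than stated in the workflow, so treat this Python helper as a convenience sketch.

```python
FPS = 16  # inferred from the values above: each extra second adds 16 frames


def frames_for_seconds(seconds: int) -> int:
    """Return the workflow 'length' value for a clip of the given duration."""
    if seconds < 1:
        raise ValueError("duration must be at least 1 second")
    return FPS * seconds + 1


def seconds_for_frames(length: int) -> float:
    """Invert the pattern: approximate duration for a given 'length' value."""
    return (length - 1) / FPS
```

For example, `frames_for_seconds(5)` gives 81, matching the 5-second setting.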
Enter your prompt in the positive prompt box. Use AI tools like ChatGPT to generate prompts if needed. Click "Queue" to start generation.
Generation time varies with system performance. An RTX 4090 takes about 500 seconds (just over 8 minutes) to generate a 3-second 720p video. On slower systems, use the 480p model instead.
Preview your video by clicking the small eye icon, then right-click and select "Save Video As" to download.
How to Troubleshoot Common Issues
Backend Error: Update SwarmUI by double-clicking "update-windows.bat." If problems persist, reinstall SwarmUI on a different drive, transfer your model files to the new installation, and only then delete the old one.
VRAM Requirements: 8GB of VRAM is enough for the 1.3B text to video model and for 480p image to video. 720p video needs at least 16GB of VRAM. With insufficient VRAM, processing becomes extremely slow.
Clip Component Error: This indicates that a clip file was corrupted during download. Redownload it and replace the corrupted file. Also verify that you're using the correct workflow, since pairing the 720p workflow with the 480p model causes errors.
MP4 Export Issues: Double-click an empty area of the workflow, type "vs," and select the second node in the results. Delete the default output node and replace it with the new one, then change the format setting at the bottom of the node to MP4.
Files Not Detected: Compare file sizes against those listed on the Hugging Face page, since corrupted files cause detection problems. Also ensure the fp8 text encoder files are placed in the clip folder, not the main models directory.
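If you do reinstall, copying the models folder across is the slow part. This Python sketch copies everything from an old install's models folder into a new one with `shutil.copytree` (`dirs_exist_ok` needs Python 3.8+) and leaves the originals in place, so you delete the old installation only after checking the copy. The folder layout and function name are assumptions.

```python
import shutil
from pathlib import Path


def transfer_models(old_root: str, new_root: str) -> list:
    """Copy old_root/models into new_root/models and list the files now there.

    The source files are left untouched, so the old install can be deleted
    only after the copy has been checked.
    """
    src = Path(old_root) / "models"
    dst = Path(new_root) / "models"
    if not src.is_dir():
        raise FileNotFoundError(f"No models folder at {src}")
    # dirs_exist_ok lets this merge into a models folder the new install already created.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    return sorted(p.relative_to(dst) for p in dst.rglob("*") if p.is_file())
```

A call such as `transfer_models(r"C:\SwarmUI", r"D:\SwarmUI")` (hypothetical paths) returns the relative paths of all files in the new models folder, which you can compare against the old one before deleting it.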
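The VRAM guidance above can be expressed as a small lookup. This Python sketch encodes only the thresholds stated in this guide (8GB and 16GB); the function name and option labels are hypothetical, and real headroom also depends on the model precision you chose.

```python
def supported_options(vram_gb: float) -> list:
    """List the generation options this guide says fit in `vram_gb` of VRAM."""
    options = []
    if vram_gb >= 8:
        options += ["1.3B text to video", "480p image to video"]
    if vram_gb >= 16:
        options.append("720p image to video")
    # Below 8GB the list is empty: per the guide, expect extremely slow processing.
    return options
```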
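To catch the workflow and model mismatch before queueing, you could compare the resolution tags in the two file names. This Python sketch assumes both names contain "480p" or "720p", which depends on how the files you downloaded are named; when a tag is missing it does not flag a mismatch. The file names in the comments are hypothetical.

```python
import re
from typing import Optional


def resolution_tag(filename: str) -> Optional[str]:
    """Return "480p" or "720p" if the name contains one, else None."""
    match = re.search(r"(?<!\d)(480|720)p", filename.lower())
    return match.group(0) if match else None


def workflow_matches_model(workflow_file: str, model_file: str) -> bool:
    """False only when both names carry a resolution tag and the tags differ."""
    wf_tag, model_tag = resolution_tag(workflow_file), resolution_tag(model_file)
    if wf_tag is None or model_tag is None:
        return True  # can't tell from the names, so don't flag a mismatch
    return wf_tag == model_tag


# Hypothetical names: a 720p workflow paired with a 480p model is flagged.
print(workflow_matches_model("wan_720p_i2v_workflow.json", "wan2.1_i2v_480p_14B_fp8.safetensors"))
```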
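Beyond eyeballing sizes, you can check a download against an exact byte count and, where the Hugging Face file page lists one, its SHA-256 hash. This Python sketch reads the file in chunks so multi-gigabyte models don't need to fit in memory; the function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in 1 MiB chunks so large models need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(
    path: str,
    expected_size: Optional[int] = None,
    expected_sha256: Optional[str] = None,
) -> bool:
    """Check a file against an exact byte count and/or a SHA-256 hex digest."""
    if expected_size is not None and Path(path).stat().st_size != expected_size:
        return False
    if expected_sha256 is not None and sha256_of(path) != expected_sha256.strip().lower():
        return False
    return True
```

If `verify_download` returns `False`, redownload the file before troubleshooting anything else.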
Running Wan 2.1 image to video locally gives you complete control over your video generation without depending on cloud services. Follow this guide carefully, make sure your system has adequate resources, and use the solutions above to troubleshoot any issues.