For Patreon Supporters - Wan Fun Control -ImageRestyle V2V (Version 20250328) (Workflow)
Added 2025-03-28 17:05:16 +0000 UTC
Video: https://youtu.be/YiVkevHuXIU
Related Post: https://www.patreon.com/posts/125384420
In this post, I’ll walk you through my ComfyUI workflow for generating AI-enhanced videos using Alibaba’s WAN 2.1 Fun Control models. The workflow restyles existing videos while preserving motion consistency, which suits influencer content, music videos, and AI-driven animations.
Workflow Overview
The WAN 2.1 Video-to-Video (V2V) workflow leverages ControlNet-guided diffusion to:
✅ Restyle videos (e.g., change outfits, backgrounds, or entire aesthetics).
✅ Preserve motion from reference videos (e.g., dance moves, gestures).
✅ Maintain consistency across frames (no flickering or distortions).
Unlike traditional methods (e.g., AnimateDiff), WAN 2.1 uses a diffusion transformer architecture, ensuring smoother outputs with coherent styles.
Step-by-Step Workflow
1. Setup & Model Installation
Download the WAN 2.1 Fun Control 1.3B SafeTensor file from Hugging Face.
Place it in ComfyUI/models/diffusion_models/.
Ensure you have the latest ComfyUI update (git pull in the terminal).
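To confirm the model file landed where you expect, you can list the .safetensors files in the model folder. This is a minimal sketch: list_model_files is a hypothetical helper, not part of ComfyUI, and the folder you pass in is whatever path your own install uses.

```python
import sys
from pathlib import Path


def list_model_files(model_dir: str) -> list[str]:
    """Return the sorted names of .safetensors files found directly in model_dir."""
    folder = Path(model_dir)
    if not folder.is_dir():
        # A missing folder usually means the path is wrong for this install.
        return []
    return sorted(p.name for p in folder.glob("*.safetensors"))


if __name__ == "__main__":
    # Pass your model folder as the first argument; defaults to the current folder.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(list_model_files(target))
```

An empty list means either the folder path is wrong or the download did not finish.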
2. Key Components
A. Reference Video & ControlNet
Input: A reference video
ControlNet Preprocessors:
DW Pose (for skeletal motion tracking).
Line Art (for outline consistency).
Depth Maps (for spatial depth).
These guide the AI to mimic motions while allowing creative restyling.
B. Style Transfer with Flux
The first frame of the reference video is restyled using:
Text prompts (e.g., "hip-hop dancer in a blue jacket").
LoRA models (for character consistency, if needed).
This "style seed" ensures the entire video follows the new aesthetic.
C. WAN 2.1 Fun Control Processing
The restyled frame + ControlNet data are fed into WAN 2.1’s diffusion pipeline:
Clip Vision Encode: Embeds the style reference.
KSampler: Generates frames (steps: 20, CFG: 7.5).
Skip Layer Guidance: Enhances details (blocks 9-10).
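If you want to keep the sampler values from this step in one place (for notes or for scripting), a small settings object works. This is only a sketch: the class and field names are made up for illustration and do not correspond to ComfyUI node inputs. The comma-separated block text is an assumption about how you might type the blocks into a text field.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WanSamplerSettings:
    """Values quoted in this post; names are illustrative, not ComfyUI inputs."""

    steps: int = 20                         # sampling steps
    cfg: float = 7.5                        # CFG scale
    slg_blocks: tuple[int, ...] = (9, 10)   # blocks used for Skip Layer Guidance

    def slg_blocks_text(self) -> str:
        # Assumed format: block indices joined with commas, e.g. for a text field.
        return ",".join(str(block) for block in self.slg_blocks)


if __name__ == "__main__":
    settings = WanSamplerSettings()
    print(settings.steps, settings.cfg, settings.slg_blocks_text())
```

Freezing the dataclass keeps a baseline you can compare against when you start experimenting with other values.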
D. Refinement (Optional)
Tile Control LoRA: Upscales and sharpens frames.
CFG-Zero Star: Improves contrast/coherence.
How It Works
Motion Extraction: ControlNet (DW Pose/Line Art) extracts poses/outlines from the reference video.
Style Injection: The first frame is restyled via Flux + text prompts.
Diffusion: WAN 2.1 regenerates the video frame-by-frame, blending the new style with the original motion.
Output: A seamless, high-quality video with:
Consistent character designs (thanks to LoRAs).
Stable backgrounds (no flickering).
Example Use Cases
Dance Video Restyling
Input: A dancer in casual clothes.
Output: Same moves, but with a cyberpunk outfit and neon-lit backdrop.
AI Influencer Content
Input: A stock video of a person talking.
Output: The same speech delivered by a custom AI avatar.
Animation Enhancement
Input: A rough storyboard animation.
Output: A polished, stylized final render.
Optimization Tips
For Creativity: Use only DW Pose (less restrictive than Line Art).
For Accuracy: Combine Line Art + DW Pose for precise motion tracking.
For Speed: Use the 1.3B model (VRAM-friendly; ~5GB).
For Quality: Refine with Tile LoRA + Skip Layer Guidance.
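The tips above can be written down as a small lookup table so you can combine goals. This is a hypothetical sketch: the preset keys and field names are invented for illustration and are not ComfyUI settings.

```python
# Hypothetical preset table built from the tips above.
TIP_PRESETS = {
    "creativity": {"preprocessors": ("DW Pose",)},
    "accuracy": {"preprocessors": ("Line Art", "DW Pose")},
    "speed": {"model": "1.3B"},
    "quality": {"refiners": ("Tile LoRA", "Skip Layer Guidance")},
}


def combine_presets(*goals: str) -> dict:
    """Merge the presets for the given goals; later goals override earlier ones."""
    merged: dict = {}
    for goal in goals:
        key = goal.strip().lower()
        if key not in TIP_PRESETS:
            raise ValueError(f"unknown goal: {goal!r}")
        merged.update(TIP_PRESETS[key])
    return merged


if __name__ == "__main__":
    print(combine_presets("accuracy", "speed", "quality"))
```

Note that "creativity" and "accuracy" both set the preprocessors, so whichever you list last wins.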
Conclusion
The WAN 2.1 Video-to-Video workflow in ComfyUI is a breakthrough for AI video generation. By combining ControlNet-guided motion with diffusion-based style transfer, it outperforms older tools like AnimateDiff in consistency and ease of use.
Ready to try it?
Grab the WAN 2.1 Fun Control model here: https://huggingface.co/alibaba-pai/Wan2.1-Fun-1.3B-Control
Follow the workflow above in ComfyUI.
Experiment with poses, styles, and refinements!
Workflow updated (2025-03-29):
In the Flux group, I doubled the input resolution for the Empty Latent, because Flux Union Pro ControlNet works better with images above 1024px.
The output image is then resized back to 832x480px or 480x832px so it can serve as the first frame for Wan.
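The size arithmetic in this update can be sketched as follows. The function name is hypothetical and the code only computes the numbers; it does not touch ComfyUI or resize any image.

```python
def flux_and_wan_sizes(wan_width: int, wan_height: int, scale: int = 2):
    """Return ((flux_w, flux_h), (wan_w, wan_h)) for a given Wan frame size.

    The Flux latent is generated at `scale` times the Wan size, and the
    result is resized back down to the Wan size for the first frame.
    """
    if wan_width <= 0 or wan_height <= 0 or scale <= 0:
        raise ValueError("sizes and scale must be positive")
    flux_size = (wan_width * scale, wan_height * scale)
    wan_size = (wan_width, wan_height)
    return flux_size, wan_size


if __name__ == "__main__":
    for size in [(832, 480), (480, 832)]:
        flux_size, wan_size = flux_and_wan_sizes(*size)
        print(f"Flux latent {flux_size} -> resize to Wan first frame {wan_size}")
```

For the landscape case, the Flux latent comes out at 1664x960 before it is resized back to 832x480.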

Attached the Wan 2.1 Fun Control Video2video workflow Version 20250328
Comments
Tuple is a data type in Python. Are you using Python 3.10 or above?
Benjamin Law
2025-05-16 18:28:07 +0000 UTC
I am getting this error. unsupported operand type(s) for /: 'tuple' and 'tuple'
Mayur Jha
2025-05-16 17:11:50 +0000 UTC
Then I paid for the patreon membership, even if the free one worked it supports you, you put work into it. That one is complex for a beginner, and offers no way to load an image to style the first frame. I don't get it, the beginner workflow does nothing useful for me, I load my first frame image and it does nothing with it, it just outputs a video of the wireframe with a black background, the same way it looks in dwpose.
Steven Haffley
2025-05-09 05:09:41 +0000 UTC
I am confused. All I want to do is what you did with the man with demon wings. I downloaded your starter workflow, and after 4 days of trying to figure out why it doesn't work with a 5000 series card (you need triton nightly), I finally got everything to work. At the end it generates the wireframe where I see the animation moving, but it does absolutely nothing with the image loaded for the first frame style. It outputs no video other than the wireframe.
Steven Haffley
2025-05-09 05:08:00 +0000 UTC