Published on: July 30, 2025 / last updated: July 30, 2025 – Author: Konrad Wolfenstein
China's big AI offensive: with WAN2.2, Alibaba wants to overtake the West – and makes it all open source – Image: Xpert.digital
This is Alibaba's new wonder AI WAN2.2: free, more powerful than the competition, and available to everyone
China's video answer to OpenAI's Sora: this new AI generates videos in cinema quality – and is free of charge, too
On July 29, 2025, the Chinese technology company Alibaba released WAN2.2, a notable new version of its open source video model, fundamentally changing the landscape of artificial intelligence for video production. It is the world's first open source video generation model to implement a mixture-of-experts (MoE) architecture, and it was designed both for professional film productions and for use on commercially available hardware.
Suitable for:
- Alibaba invests over $50 billion in AI and cloud computing – Artificial General Intelligence (AGI) plays a central role
Technological revolution through the MoE architecture
For the first time, WAN2.2 introduces a mixture-of-experts architecture to video diffusion models, a significant technological breakthrough. The architecture uses a dual-expert system that splits the video generation (denoising) process into two specialized phases. The first expert handles the early, high-noise denoising steps and establishes the basic layout of the scene, while the second expert takes over the later steps and refines details and textures.
The system has 27 billion parameters in total but activates only 14 billion per inference step, which cuts the compute effort by up to 50 percent without affecting quality. This efficiency gain makes it possible to generate high-quality videos while compute costs stay constant and overall model capacity grows.
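The routing logic can be pictured in a few lines of code. The following is a minimal sketch of the two-expert hand-off, not WAN2.2's actual implementation: class and parameter names are illustrative, and the hand-off threshold is an assumed value chosen for the example.

```python
import torch
import torch.nn as nn

class TwoExpertDenoiser(nn.Module):
    """Illustrative two-expert routing in the spirit of WAN2.2's MoE design.

    Both experts are full diffusion backbones (about 14 billion parameters
    each in WAN2.2); only one runs per denoising step, so per-step compute
    matches a single 14B model despite 27B total parameters.
    """

    def __init__(self, high_noise_expert: nn.Module,
                 low_noise_expert: nn.Module,
                 boundary_t: float = 0.875):
        super().__init__()
        self.high_noise_expert = high_noise_expert  # early steps: scene layout
        self.low_noise_expert = low_noise_expert    # late steps: detail, texture
        self.boundary_t = boundary_t                # hand-off point (assumed value)

    def forward(self, latents: torch.Tensor, t: float,
                cond: torch.Tensor) -> torch.Tensor:
        # t runs from 1.0 (pure noise) down to 0.0 (clean video latents);
        # high-noise steps go to the layout expert, later steps to the detail expert.
        expert = self.high_noise_expert if t >= self.boundary_t else self.low_noise_expert
        return expert(latents, t, cond)
```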
Film aesthetics and cinematic control
A standout feature of WAN2.2 is its cinematic aesthetic control system, which gives users precise control over multiple visual dimensions. The model was trained on carefully curated aesthetic data containing detailed labels for lighting, composition, contrast, color, camera movement, shot size, focal length, and other cinematic parameters.
This functionality rests on a cinematically inspired prompt system that categorizes key dimensions such as lighting, composition, and color. As a result, WAN2.2 can precisely interpret and realize users' aesthetic intentions during the generation process, enabling the creation of videos with customizable cinematic preferences.
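To make the idea concrete, here is a hypothetical helper that assembles such a cinematic prompt. The dimension names mirror the ones listed above; the concrete keyword values are examples only, not WAN2.2's official training vocabulary.

```python
# Hypothetical helper that assembles a WAN2.2-style cinematic prompt.
# The dimension names mirror those described above; the keyword values
# are illustrative examples, not an official vocabulary.
def cinematic_prompt(subject: str, **aesthetics: str) -> str:
    tags = ", ".join(f"{dim.replace('_', ' ')}: {value}"
                     for dim, value in aesthetics.items())
    return f"{subject}. {tags}" if tags else subject

prompt = cinematic_prompt(
    "A lone sailboat crossing a storm-lit sea",
    lighting="low-key, golden hour",
    composition="rule of thirds, wide establishing shot",
    color="teal and orange, high contrast",
    camera_movement="slow dolly-in",
    focal_length="35mm",
)
print(prompt)
```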
Extended training data and complex movement generation
Compared to its predecessor WAN2.1, the training data set was expanded significantly: 65.6 percent more image data and 83.2 percent more video data. This massive data expansion markedly improves the model's generalization capabilities and increases creative diversity across several dimensions such as motion, semantics, and aesthetics.
The model shows significant improvements in generating complex motion, including lively facial expressions, dynamic hand gestures, and intricate athletic movements. It also delivers more realistic renderings with better prompt adherence and better compliance with physical laws, resulting in more natural and convincing video sequences.
Efficient hardware use and accessibility
WAN2.2 is offered in three model variants that cover different requirements and hardware configurations:
- WAN2.2-T2V-A14B: a text-to-video model with 27 billion parameters (14 billion active) that generates videos at 720p resolution and 16 fps.
- WAN2.2-I2V-A14B: an image-to-video model with the same architecture, for converting static images into videos.
- WAN2.2-TI2V-5B: a compact 5-billion-parameter model that combines text-to-video and image-to-video capabilities in a unified framework.
The compact TI2V-5B model is a breakthrough in its own right, since it can generate a five-second 720p video in under 9 minutes on a single consumer GPU such as the RTX 4090. This speed makes it one of the fastest 720p@24fps models available and lets industrial applications and academic research alike benefit from the technology.
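A quick back-of-the-envelope calculation puts that figure in perspective; the clip length and frame rate are taken from the text above.

```python
# Back-of-the-envelope check of the quoted TI2V-5B throughput:
# 5 s of 720p video at 24 fps in under 9 minutes on one RTX 4090.
clip_seconds = 5
fps = 24
max_wall_time_s = 9 * 60

frames = clip_seconds * fps                     # 120 frames per clip
seconds_per_frame = max_wall_time_s / frames
print(f"{frames} frames, at most {seconds_per_frame:.1f} s of compute per frame")
# -> 120 frames, at most 4.5 s of compute per frame
```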
Advanced VAE architecture for optimized compression
The TI2V-5B model is built on a highly efficient 3D VAE with a compression ratio of 4×16×16 (time × height × width), raising the total information compression rate to 64. With an additional patchification layer, the effective compression ratio of TI2V-5B even reaches 4×32×32, enabling high-quality video reconstruction with minimal memory requirements.
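The quoted figures fit together if one assumes the VAE encodes 3-channel RGB video into a 48-channel latent space. That channel count is not stated above and is an assumption, but with it the numbers reproduce exactly:

```python
# Reconstructing the quoted compression figures.
# Assumption: the VAE maps 3-channel RGB video into a 48-channel latent
# space (the channel count is not given in the text above).
ct, cs_h, cs_w = 4, 16, 16          # temporal and spatial downsampling factors
rgb_channels, latent_channels = 3, 48

pixels_per_latent_cell = ct * cs_h * cs_w                     # 1024
info_compression = rgb_channels * pixels_per_latent_cell / latent_channels
print(info_compression)                                       # 64.0 -> the "64" above

# The additional patchification layer halves each spatial axis once more,
# yielding the effective 4 x 32 x 32 ratio quoted for TI2V-5B.
```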
This advanced compression technology lets the model support both text-to-video and image-to-video tasks within a single, unified framework, serving academic research and practical applications alike.
Benchmark performance and market position
WAN2.2 was benchmarked against leading commercial AI video generation models, including Sora, Kling 2.0, and Hailuo 02, using the new Wan-Bench 2.0 evaluation suite. The results show that WAN2.2 achieves state-of-the-art performance in the majority of categories and surpasses its proprietary competitors.
In the direct ranking comparison, WAN2.2-T2V-A14B took first place in four of the six core benchmark dimensions, including aesthetic quality and motion dynamics. This performance establishes WAN2.2 as the new open source leader in high-resolution video generation.
Open source availability and integration
WAN2.2 is available as fully open source software under the Apache 2.0 license and can be downloaded from Hugging Face, GitHub, and ModelScope. The models have already been integrated into popular frameworks such as ComfyUI and Diffusers, enabling seamless use in existing workflows.
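Assuming WAN2.2 follows the same Diffusers interface documented for earlier WAN releases, text-to-video generation looks roughly like the sketch below. The checkpoint id is an assumption modeled on the Wan-AI naming scheme and should be verified on Hugging Face before use.

```python
# Sketch of text-to-video generation with Hugging Face Diffusers' Wan pipeline.
# The checkpoint id is an assumption; verify the exact WAN2.2 repository
# name on Hugging Face before use.
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"  # assumed id
pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

result = pipe(
    prompt="A tracking shot of a red vintage car on a coastal road, golden hour",
    height=704,
    width=1280,
    num_frames=121,      # roughly 5 s at 24 fps
    guidance_scale=5.0,
)
export_to_video(result.frames[0], "wan22_demo.mp4", fps=24)
```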
For the TI2V-5B model, a Hugging Face Space is available for direct use, so users can try out the technology immediately without any complex installation. This accessibility democratizes access to state-of-the-art video generation technology and fosters innovation across the entire developer community.
China's strategic AI offensive
The release of WAN2.2 is part of a broader Chinese open source AI strategy that has already drawn international attention with models such as DeepSeek. This strategy follows China's official digitization plan, which has promoted open source collaboration as a national resource since 2018 and provides for massive state investment in AI infrastructure.
Alibaba has already recorded over 5.4 million downloads of its WAN models on Hugging Face and ModelScope, underscoring the strong international demand for Chinese open source AI solutions. The company plans further investments of around $52 billion in cloud computing and AI infrastructure to consolidate its position in this rapidly growing market.
WAN2.2 delivers a breakthrough for AI video: open source at a professional level
WAN2.2 represents a turning point in AI video generation because it is the first open source alternative to paid, proprietary models that can compete with commercial solutions. The combination of cinematic quality, efficient hardware use, and complete open source availability positions the model as an attractive alternative for content creators, filmmakers, and developers worldwide.
The release is likely to intensify competition in the field of AI video generation and could prompt other companies to pursue similar open source strategies. With its ability to run on consumer hardware and deliver professional results, WAN2.2 has the potential to democratize video production and open up new creative opportunities.
By combining advanced technology with an open development philosophy, Alibaba sets new standards in AI video generation with WAN2.2 and establishes China as a leading force in global AI innovation. The far-reaching effects of this development will change how videos are created and produced in the years ahead.