Why AI-Generated Videos Still Struggle to Feel Right

AI-generated videos are visually stunning but falter when it comes to motion. Researchers explore why and unveil a strategy to improve realism.
AI systems capable of generating high-quality videos from simple text prompts have made significant strides, producing results that are eye-catching, customizable, and inexpensive. Yet despite their ability to create visually appealing frames, users often find that something feels 'off' when watching AI-generated videos. The root of the problem is motion, or rather its lack of authenticity, and researchers are exploring innovative ways to address it.
The Allure and Limitations of AI-Generated Videos
AI models for video generation are undeniably groundbreaking. They can deliver videos that look nearly flawless in terms of individual frames' quality and photorealism. According to a light transport researcher, these advancements are particularly impressive given that they took AI systems only a few years to master what many traditional professionals spent decades perfecting. However, the illusion falls apart when those frames are played in sequence. Movements that should feel natural instead appear unnerving—disrupting immersion for viewers.
Many AI developers believe that throwing more data and computational power at the problem will resolve these imperfections. Scaling up compute does yield incremental improvement; one AI model, for instance, demonstrated significantly better motion continuity as compute resources scaled to 32 times the base level. But the correlation only goes so far: simply feeding more training data into the system often fails to teach AI models how real-world physics works.
Bad Data, Worse Motion
The real challenge lies not in insufficient data but in the quality of the training data. AI systems learn by observing patterns in their datasets, and those datasets may include conflicting or unrealistic examples, often drawn from animations or cartoons that violate the laws of physics. Cartoon characters defy gravity by floating mid-air, for instance, or bounce as if they were made of rubber. When an AI model internalizes such examples, its understanding of motion becomes skewed.
One particularly compelling research study attempted to solve this problem by trimming low-quality, unrealistic training examples from the dataset. The researchers hypothesized that less, but higher-quality, data could improve learning. Their hypothesis proved correct. By filtering out problematic training examples and focusing on “clean” data, the AI showed significant improvements in generating realistic motion. Even deceptively simple movements, such as a spinning coin, looked far better when the model was fine-tuned with well-curated data.
Cutting Junk to Enhance Learning
The breakthrough in this study wasn’t just about removing bad examples—it involved isolating motion from other visual aspects using a method called optical flow. Optical flow is a time-tested technique for tracking how objects move between frames. By applying an optical flow mask—not to the video itself but to the AI's learning signals—the researchers were able to analyze precisely which parts of the training data influenced the AI’s motions. This allowed the team to identify and cut out the negative influences while keeping only the useful examples.
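The paper's exact pipeline is not reproduced here, but the core idea of masking a learning signal with optical flow can be sketched in a few lines of NumPy. In this sketch the flow field is assumed to come from an off-the-shelf estimator; the function names and the motion threshold are illustrative assumptions, not the study's actual values:

```python
import numpy as np

def motion_mask_from_flow(flow, threshold=0.5):
    """Binary mask of pixels whose flow magnitude exceeds a threshold.

    flow: (H, W, 2) array of per-pixel (dx, dy) displacements between frames.
    """
    magnitude = np.linalg.norm(flow, axis=-1)      # (H, W) motion strength
    return (magnitude > threshold).astype(np.float32)

def masked_loss(per_pixel_loss, flow, threshold=0.5):
    """Restrict a per-pixel training loss to moving regions only."""
    mask = motion_mask_from_flow(flow, threshold)
    masked = per_pixel_loss * mask
    # Average over moving pixels only (guard against an all-static frame).
    return masked.sum() / max(mask.sum(), 1.0)

# Toy example: a 4x4 frame where only the top-left 2x2 block moves.
flow = np.zeros((4, 4, 2), dtype=np.float32)
flow[:2, :2] = [3.0, 0.0]                 # strong horizontal motion
loss = np.ones((4, 4), dtype=np.float32)  # uniform per-pixel loss
print(masked_loss(loss, flow))            # loss averaged over the 4 moving pixels
```

Applying the mask to the loss rather than to the pixels means static backgrounds no longer dilute the signal: only the regions that actually move contribute to what the model learns about motion.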
However, AI models are notoriously large and complex. Many modern models contain over a billion parameters, which makes storing and analyzing per-video learning signals for thousands of videos computationally prohibitive. To tackle this, the study used a Johnson–Lindenstrauss projection: a random projection that maps high-dimensional vectors into a much smaller space while approximately preserving the pairwise distances between them. Here, signals spanning the model's billion-plus parameters were compressed into just 512 dimensions, making the analysis far more efficient without sacrificing accuracy.
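A minimal NumPy sketch of a Johnson–Lindenstrauss-style random Gaussian projection, using the 512 target dimensions mentioned above but a far smaller toy source dimension than the billion-scale signals in the study; the function name and sizes are assumptions for illustration:

```python
import numpy as np

def jl_project(vectors, target_dim=512, seed=0):
    """Compress high-dimensional vectors with a random Gaussian projection.

    The Johnson-Lindenstrauss lemma guarantees that pairwise distances are
    approximately preserved when target_dim is large enough.
    """
    n, source_dim = vectors.shape
    rng = np.random.default_rng(seed)
    # Entries ~ N(0, 1/target_dim) so expected squared norms are preserved.
    projection = rng.normal(0.0, 1.0 / np.sqrt(target_dim),
                            size=(source_dim, target_dim))
    return vectors @ projection

# Toy demo: project 20 random 10,000-dimensional vectors down to 512
# dimensions and check that a pairwise distance survives approximately.
rng = np.random.default_rng(1)
x = rng.normal(size=(20, 10_000))
y = jl_project(x, target_dim=512)

orig = np.linalg.norm(x[0] - x[1])
comp = np.linalg.norm(y[0] - y[1])
print(f"distance ratio after projection: {comp / orig:.3f}")  # close to 1.0
```

Because distances are what matter for comparing training examples' influence, this kind of compression lets the analysis run in a 512-dimensional space instead of one the size of the full model.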
The Results: Better Motion, Proven Effectiveness
The outcomes were clear and measurable. The researchers performed a rigorous user study, asking participants to compare 50 videos generated by different methods. Across 850 individual comparisons, the improved method achieved a decisive 74.1% preference rate over the original, less-refined approach. This is no small feat, given the subjective and varied nature of human perception.
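As a back-of-the-envelope check on those numbers (the study's own statistical analysis is not reproduced here), a 74.1% preference rate over 850 comparisons sits far outside the 50% expected by chance, as a simple normal-approximation confidence interval shows:

```python
import math

def preference_rate(wins, total):
    """Preference rate with a 95% normal-approximation confidence interval."""
    p = wins / total
    half_width = 1.96 * math.sqrt(p * (1 - p) / total)
    return p, (p - half_width, p + half_width)

# Numbers from the study: a 74.1% preference rate over 850 comparisons
# implies roughly 630 wins for the improved method.
wins = round(0.741 * 850)
rate, (lo, hi) = preference_rate(wins, 850)
print(f"{rate:.1%} preference, 95% CI [{lo:.1%}, {hi:.1%}]")
```

Even the low end of that interval stays well above 50%, so the preference for the refined method is unlikely to be noise in subjective judgments.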
A Lesson Beyond AI: Quality Over Quantity
The significance of this study goes beyond improving AI-generated videos. It offers broader insights into how learning works—both for machines and for humans. The researchers emphasize that adding more information doesn’t always result in better outcomes. In fact, excessive or low-quality information can distort understanding rather than refine it. The key is to focus on fewer, high-quality examples to achieve better results—a lesson that applies as much to human education as it does to training AI models.
Industry Implications
The findings are particularly timely as video-generation AI tools become more prevalent across creative industries. While motion defects currently limit their applications in professional filmmaking or realistic video simulations, the methods outlined in this research could pave the way for substantial improvements. Moreover, these techniques hint at potential advancements in other AI applications where realism and fidelity are crucial, such as autonomous driving simulations and virtual reality training environments.
What’s Next?
The researchers have promised to release their code freely, ensuring that others can build upon their work. This openness aligns with ongoing efforts in the AI community to make systems more interpretable and controllable. As techniques like these mature, they could fundamentally reshape how AI models are trained—not by drowning them in data, but by feeding them smarter, curated inputs.
For now, while AI-generated videos may still struggle with motion realism, this study represents a major step forward. By focusing on the quality of training data and refining the way AI learns motion, researchers are charting a path toward more immersive and believable AI creations. The future of AI-generated content, it seems, lies not in overwhelming power, but in precision and strategy.
Staff Writer
Maya writes about AI research, natural language processing, and the business of machine learning.