Researchers Apply AlphaGo-Style MCTS to Video Generation, Producing Videos 18% Longer Than Sora's

According to Beating, researchers from the University of Waterloo and Brown University have introduced Planning at Inference, a new test-time scaling framework, in a paper submitted to ICLR 2026. The framework applies Monte Carlo Tree Search (MCTS), the search algorithm used in AlphaGo, to long-form video generation for the first time. It targets the semantic drift and error accumulation that affect traditional sequential generation methods.

In experiments using Nvidia's open-source Cosmos-Predict2 model, Planning at Inference generated coherent videos longer than 20 seconds. The system outperformed baselines such as Greedy Search and Beam Search on object persistence, temporal consistency, and text-video alignment. Compared with industry-leading closed-source models, its videos were 18% longer than those from Sora and 47% longer than those from Kling, with comparable visual fidelity. Because it is a plug-and-play inference-time optimization, the framework requires no retraining of the underlying model.
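
For readers unfamiliar with MCTS, the general idea of planning over video segments can be illustrated with a toy sketch. This is not the paper's implementation: the article gives no details of the scoring function, branching factor, or rollout strategy, so every name here (`Node`, `toy_score`, `mcts_plan`) is hypothetical, and integer "chunk ids" stand in for candidate video segments that a real system would sample from the generator. The placeholder score simply penalizes large jumps between consecutive chunks, as a crude analogue of drift.

```python
import math
import random


class Node:
    """One partial plan: the sequence of chunk ids chosen so far."""

    def __init__(self, chunks, parent=None):
        self.chunks = chunks
        self.parent = parent
        self.children = {}        # chunk id -> child Node
        self.visits = 0
        self.total_reward = 0.0


def ucb1(child, parent_visits, c=1.4):
    """Standard UCB1: mean reward plus an exploration bonus."""
    if child.visits == 0:
        return float("inf")
    mean = child.total_reward / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def toy_score(chunks):
    """Hypothetical consistency score (assumption): less drift is better."""
    drift = sum(abs(a - b) for a, b in zip(chunks, chunks[1:]))
    return 1.0 / (1.0 + drift)


def mcts_plan(horizon, n_candidates, n_iters, score=toy_score, seed=0):
    """Search for a sequence of up to `horizon` chunks with a high score."""
    rng = random.Random(seed)
    root = Node(())
    for _ in range(n_iters):
        node = root
        # Selection: descend while the node is fully expanded and not terminal.
        while len(node.chunks) < horizon and len(node.children) == n_candidates:
            parent = node
            node = max(parent.children.values(),
                       key=lambda ch: ucb1(ch, parent.visits))
        # Expansion: add one untried continuation.
        if len(node.chunks) < horizon:
            untried = [a for a in range(n_candidates) if a not in node.children]
            action = rng.choice(untried)
            node.children[action] = Node(node.chunks + (action,), parent=node)
            node = node.children[action]
        # Simulation: finish the sequence randomly and score the result.
        rollout = list(node.chunks)
        while len(rollout) < horizon:
            rollout.append(rng.randrange(n_candidates))
        reward = score(tuple(rollout))
        # Backpropagation: update statistics along the path to the root.
        while node is not None:
            node.visits += 1
            node.total_reward += reward
            node = node.parent
    # Read out the plan by following the most-visited child at each level.
    plan, node = [], root
    while node.children:
        action, node = max(node.children.items(), key=lambda kv: kv[1].visits)
        plan.append(action)
    return tuple(plan)


if __name__ == "__main__":
    print(mcts_plan(horizon=4, n_candidates=3, n_iters=2000))
```

A greedy baseline would commit to the best-looking chunk at each step and never revisit it. The tree search instead spends extra inference-time compute evaluating whole continuations before committing, which is the sense in which such a method is "test-time scaling" and needs no change to the underlying model.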
