According to Beating, researchers from the University of Waterloo and Brown University have introduced Planning at Inference, a test-time scaling framework described in a paper submitted to ICLR 2026. The framework applies Monte Carlo Tree Search (MCTS), the search algorithm behind AlphaGo, to long-form video generation, which the researchers present as a first. It searches over alternative continuations instead of committing to each step in sequence, and in doing so it targets the semantic drift and error accumulation that affect conventional sequential generation methods.
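The report does not detail the paper's actual search procedure. As a rough illustration only, here is a generic UCT-style MCTS loop (selection, expansion, random rollout, backpropagation) that picks one candidate "segment" per step over a fixed horizon. Note that this is plain UCB1-based UCT, not AlphaGo's network-guided variant. Everything here is hypothetical: `mcts_plan`, `persistence_score`, and the integer segment labels are stand-ins rather than the paper's code or any Cosmos-Predict2 API. A real system would generate and score actual video chunks with a learned verifier, which makes each rollout far more expensive.

```python
import math
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical: a plan is a sequence of segment choices (one label per step).
Plan = Tuple[int, ...]


class Node:
    def __init__(self, prefix: Plan, num_candidates: int, horizon: int,
                 parent: Optional["Node"] = None) -> None:
        self.prefix = prefix
        self.parent = parent
        self.children: List["Node"] = []
        # Complete plans are terminal and have nothing left to expand.
        self.untried = list(range(num_candidates)) if len(prefix) < horizon else []
        self.visits = 0
        self.value_sum = 0.0

    def ucb(self, c: float) -> float:
        # UCB1: mean reward plus an exploration bonus that shrinks with visits.
        mean = self.value_sum / self.visits
        return mean + c * math.sqrt(math.log(self.parent.visits) / self.visits)


def mcts_plan(score: Callable[[Plan], float], num_candidates: int, horizon: int,
              iterations: int = 1000, c: float = 1.4, seed: int = 0) -> Tuple[Plan, float]:
    rng = random.Random(seed)
    root = Node((), num_candidates, horizon)
    best_plan: Plan = ()
    best_reward = -math.inf
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=lambda ch: ch.ucb(c))
        # 2. Expansion: add one untried candidate segment as a new child.
        if node.untried:
            action = node.untried.pop()
            child = Node(node.prefix + (action,), num_candidates, horizon, node)
            node.children.append(child)
            node = child
        # 3. Simulation: finish the plan with uniformly random choices.
        plan = list(node.prefix)
        while len(plan) < horizon:
            plan.append(rng.randrange(num_candidates))
        reward = score(tuple(plan))
        if reward > best_reward:
            best_plan, best_reward = tuple(plan), reward
        # 4. Backpropagation: update visit counts and value sums up to the root.
        while node is not None:
            node.visits += 1
            node.value_sum += reward
            node = node.parent
    return best_plan, best_reward


def persistence_score(plan: Plan) -> float:
    # Toy stand-in for a quality verifier: fraction of adjacent segments that
    # keep the same label, so switching labels ("drift") lowers the score.
    if len(plan) < 2:
        return 1.0
    same = sum(a == b for a, b in zip(plan, plan[1:]))
    return same / (len(plan) - 1)
```

For example, `mcts_plan(persistence_score, num_candidates=3, horizon=3)` returns the best complete plan seen during search together with its score. Returning the best observed rollout, rather than the most-visited root child, is a simplification chosen for this toy setting.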
In experiments with Nvidia's open-source Cosmos-Predict2 model, Planning at Inference produced coherent videos longer than 20 seconds. It outperformed baseline search methods such as greedy search and beam search on object persistence, temporal consistency, and text-video alignment. Against leading closed-source models, its videos were 18% longer than Sora's and 47% longer than Kling's, with comparable visual fidelity. Because it is a plug-and-play inference-time optimization, the framework requires no retraining of the underlying model.