Yesterday marked a significant moment for Microsoft Xbox: the company introduced Muse, described as “a generative AI model crafted for gameplay ideation.” The reveal was accompanied by an open-access article on Nature.com and a corresponding blog post, complete with a YouTube video. If the term “gameplay ideation” has you scratching your head, you’re not alone. Microsoft explains it as the generation of “game visuals, controller actions, or both.” However, Muse’s current capabilities are fairly limited and certainly don’t replace the traditional game development process.
Still, there’s some compelling data to note. Muse was trained at scale on Nvidia H100 GPUs, requiring around a million training updates before it could stretch a single second of actual gameplay into nine more seconds of simulated gameplay that remains responsive and true to the engine. Most of the training data came from existing multiplayer sessions.
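To make the “one second in, nine seconds out” framing more concrete, here is a minimal, purely illustrative Python sketch of the general autoregressive-rollout idea behind models like this: condition on a short window of real frames and controller actions, then generate further frames one step at a time. Everything here (the frame rate, the action vector size, and the `toy_world_model` stub) is a made-up placeholder rather than Microsoft’s implementation; the stub merely perturbs the previous frame so the script actually runs.

```python
import numpy as np

FPS = 10                     # assumed frame rate for this illustration only
FRAME_SHAPE = (180, 300, 3)  # matches the 300x180 output resolution reported for Muse

def toy_world_model(context_frames, context_actions):
    """Hypothetical stand-in for a learned world model.

    A real model would predict the next frame (and possibly the next
    controller action) from the conditioning window; this stub simply
    nudges the most recent frame with noise so the example is runnable.
    """
    last = context_frames[-1].astype(np.float32)
    noise = np.random.normal(0.0, 2.0, size=last.shape)
    return np.clip(last + noise, 0, 255).astype(np.uint8)

def rollout(real_frames, real_actions, extra_seconds=9):
    """Start from real gameplay and autoregressively extend it."""
    frames = list(real_frames)       # ~1 second of real conditioning frames
    actions = list(real_actions)
    for _ in range(extra_seconds * FPS):
        next_frame = toy_world_model(frames, actions)
        frames.append(next_frame)
        actions.append(actions[-1])  # naively repeat the last controller action
    return frames

# One second of "real" gameplay: random frames and dummy action vectors.
seed_frames = [np.random.randint(0, 256, FRAME_SHAPE, dtype=np.uint8)
               for _ in range(FPS)]
seed_actions = [np.zeros(16, dtype=np.float32) for _ in range(FPS)]

generated = rollout(seed_frames, seed_actions)
print(f"{len(seed_frames)} real frames -> {len(generated)} total frames")
```

The real model obviously learns its transition function from gameplay data; the sketch only shows where the one second of conditioning and the nine seconds of generation enter the loop.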
None of this runs on a standard PC: Microsoft relied on a cluster of 100 Nvidia H100 GPUs. That hardware significantly increases cost and power consumption, yet the output resolution is only 300×180 pixels and the extended gameplay lasts roughly nine seconds.
A particularly intriguing demonstration from the Muse team involved replicating existing props and enemies within a game environment, with Muse able to mimic their functionality. But given the hefty investment in hardware, energy, and AI training, one can’t help but wonder why the straightforward route of adding enemies or props with existing development tools wouldn’t do the job.
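For what it’s worth, the prop demo essentially comes down to editing a conditioning frame and asking the model to keep that edit consistent in later generations. The toy snippet below shows only the frame-editing half, with hypothetical sprite sizes and positions; the persistence would come from feeding the edited frame back into a rollout loop like the one sketched above.

```python
import numpy as np

# One game frame and a hypothetical prop sprite (both random placeholders).
frame = np.random.randint(0, 256, (180, 300, 3), dtype=np.uint8)
prop = np.random.randint(0, 256, (20, 20, 3), dtype=np.uint8)

# Paste the prop into the frame at an arbitrary position.
y, x = 80, 140
edited = frame.copy()
edited[y:y + prop.shape[0], x:x + prop.shape[1]] = prop

# A world model conditioned on `edited` (plus recent history) would then be
# expected to keep the inserted prop consistent across the frames it generates.
print("edited frame shape:", edited.shape)
```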
Even though it’s impressive to see Muse handling object permanence and emulating the original game’s behavior to an extent, the end results still seem inefficient when compared to proven methods in the video game development industry.
Perhaps future iterations of Muse will accomplish something more groundbreaking. For now, it joins a long list of ventures attempting to simulate gameplay entirely through AI. And while maintaining a degree of engine accuracy and object permanence is commendable, the whole process feels suboptimal for developing, testing, or playing video games. After diving deep into the details, I’m at a loss as to why anyone would opt for this method over tried-and-true techniques.