SIMA, the Scalable Instructable Multiworld Agent, was trained and tested on nine video games in collaboration with eight game studios. It outputs keyboard and mouse actions, was evaluated on 600 basic skills, and was tested on its ability to follow instructions across nearly 1500 unique in-game tasks, judged in part by humans. The study also compared environment-specialized SIMA agents with generalist SIMA agents, and it acknowledges the contributions of numerous authors and game developers.
Main Points
SIMA's Design
SIMA combines pre-trained vision models with a main model that includes a memory and outputs keyboard and mouse actions.
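The data flow described above (frame and instruction in, memory update, keyboard and mouse action out) can be illustrated with a toy sketch. This is not SIMA's architecture or code. Every name here (`Action`, `encode_frame`, `ToyAgent`) is hypothetical, the "vision encoder" is a mean-brightness stand-in, and the policy is a hand-written rule rather than a learned model.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple

Frame = List[List[float]]  # toy grayscale image: rows of pixel intensities in [0, 1]


@dataclass(frozen=True)
class Action:
    """Hypothetical keyboard-and-mouse action for one time step."""
    keys: Tuple[str, ...] = ()  # keys pressed this step
    mouse_dx: int = 0           # horizontal mouse movement
    mouse_dy: int = 0           # vertical mouse movement
    click: bool = False         # left mouse button


def encode_frame(frame: Frame) -> float:
    """Stand-in for a pre-trained vision model: returns mean brightness."""
    pixels = [p for row in frame for p in row]
    return sum(pixels) / len(pixels) if pixels else 0.0


class ToyAgent:
    """Keeps a short memory of frame features and maps them to an action."""

    def __init__(self, memory_size: int = 4) -> None:
        # Bounded memory: the oldest feature is dropped once the deque is full.
        self.memory: Deque[float] = deque(maxlen=memory_size)

    def act(self, frame: Frame, instruction: str) -> Action:
        self.memory.append(encode_frame(frame))
        text = instruction.lower()
        # Hand-written rules stand in for a learned, instruction-conditioned policy.
        if "forward" in text:
            return Action(keys=("w",))
        if "open" in text:
            return Action(click=True)
        # Fallback: pan the mouse based on average remembered brightness.
        average = sum(self.memory) / len(self.memory)
        return Action(mouse_dx=10 if average < 0.5 else -10)
```

Calling `ToyAgent().act(frame, "go forward")` returns an `Action` holding the `w` key. The point of the sketch is only the interface shape: the agent sees pixels and text, and emits the same kind of input a human player would.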
Evaluation of SIMA
SIMA was benchmarked on 600 basic skills, including object interaction, menu use, and navigation.
Acknowledgements
The project credits numerous paper authors and game developers for their contributions, reflecting its collaborative nature.
Insights
SIMA was trained and tested on nine different video games in collaboration with eight game studios.
We collaborated with eight game studios to train and test SIMA on nine different video games.
SIMA's performance was evaluated across 600 basic skills.
SIMA was evaluated across 600 basic skills, spanning navigation, object interaction, and menu use.
SIMA's ability to follow instructions was tested on nearly 1500 unique in-game tasks, judged in part by humans.
We evaluated SIMA’s ability to follow instructions to complete nearly 1500 unique in-game tasks, in part using human judges.
The study compared environment-specialized SIMA agents with generalist SIMA agents trained across multiple environments.
We compare the performance of environment-specialized SIMA agents with three types of generalist SIMA agents, each trained across multiple environments.