NVIDIA Introduces 'NitroGen': A Foundation Model for Generalist Gaming Agents
NVIDIA researchers developed NitroGen, a vision-action foundation model trained on 40,000 hours of gameplay video from more than 1,000 games, with the goal of building generalist AI agents that can operate in unknown virtual environments.
Why it matters
This research demonstrates a scalable pipeline for training generalist AI agents that can operate in unknown virtual environments, a key step towards more capable and versatile embodied AI.
Key Points
- NitroGen is a vision-action foundation model trained on internet-scale gameplay video data
- It exhibits strong competence across diverse gaming domains like combat, platforming, and exploration
- NitroGen transfers effectively to unseen games, outperforming models trained from scratch
- The dataset, evaluation suite, and model weights are open-sourced to advance research on generalist AI agents
Details
NVIDIA researchers developed NitroGen to address the data bottleneck in training embodied AI agents. By automatically extracting player actions from 40,000 hours of publicly available gameplay video covering more than 1,000 games, the team built a large-scale video-action dataset and used it to train a single, unified vision-action model. This 'scale is all you need' approach mirrors the recipe behind large language models, and the resulting model shows strong competence across diverse gaming domains such as combat, high-precision control, and exploration. Notably, NitroGen transfers effectively to unseen games, achieving up to a 52% relative improvement in task success rate over models trained from scratch. By open-sourcing the dataset, evaluation suite, and model weights, the researchers lower the barrier for the community to fine-tune the model on new tasks, which should accelerate progress towards universally capable AI agents.
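For readers unfamiliar with the term, a "relative improvement" is measured against the baseline's own success rate rather than in percentage points. The short Python sketch below illustrates the arithmetic only. The function name and both success rates are hypothetical, chosen so the result comes out to 52%; they are not figures reported by the researchers.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a fraction of `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline success rate must be positive")
    return (improved - baseline) / baseline


# Hypothetical success rates, used only to illustrate the calculation.
scratch_success = 0.25    # assumed: model trained from scratch
finetuned_success = 0.38  # assumed: model fine-tuned from a pretrained checkpoint

gain = relative_improvement(scratch_success, finetuned_success)
print(f"Relative improvement: {gain:.0%}")
```

In this made-up case the absolute gain is 13 percentage points, while the relative gain is 52%, so the two framings can differ substantially when the baseline success rate is low.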