Roam is a research lab building its first social product: a mobile game builder that lets anyone create multiplayer games in minutes. We run targeted bounties to solve complex technical challenges that unlock scalability and extensibility in our stack.
This bounty focuses on developing a Spatial Reasoning Engine, a critical component of our AI pipeline designed to analyze gameplay screenshots and videos. The primary goal is to accurately convert 2D screen-space object positions into 3D world-space coordinates, bridging the gap between visual analysis and spatial understanding. The budget for this bounty is $8,000.
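To make the 2D-to-3D conversion concrete, here is a minimal sketch of one common approach: cast a ray through the pixel using a pinhole camera model, then intersect it with a ground plane. It is an illustration under stated assumptions, not the required design. The camera pose, the vertical field of view, the flat ground plane at a known height, and all function names (screen_to_ray, intersect_ground) are hypothetical, and in practice the camera parameters would themselves have to be estimated from the footage.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)

def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def screen_to_ray(px, py, width, height, fov_y_deg, forward, up):
    """Direction of the world-space ray through pixel (px, py).

    Assumes a pinhole camera with a known vertical field of view and a
    pixel origin at the top-left corner of the image (hypothetical setup).
    """
    # Map the pixel to normalized device coordinates in [-1, 1].
    ndc_x = 2.0 * px / width - 1.0
    ndc_y = 1.0 - 2.0 * py / height  # flip: pixel y grows downward
    tan_half = math.tan(math.radians(fov_y_deg) / 2.0)
    aspect = width / height
    # Build an orthonormal camera basis from the forward and up hints.
    f = normalize(forward)
    r = normalize(cross(f, up))
    u = cross(r, f)
    d = tuple(f[i] + ndc_x * aspect * tan_half * r[i] + ndc_y * tan_half * u[i]
              for i in range(3))
    return normalize(d)

def intersect_ground(origin, direction, ground_y=0.0):
    """Intersect a ray with the horizontal plane y = ground_y.

    Returns the hit point, or None if the ray is parallel to the plane
    or points away from it.
    """
    if abs(direction[1]) < 1e-9:
        return None
    t = (ground_y - origin[1]) / direction[1]
    if t <= 0:
        return None
    return tuple(origin[i] + t * direction[i] for i in range(3))

# Usage: a camera 10 units above the ground, looking straight down.
camera_pos = (0.0, 10.0, 0.0)
ray = screen_to_ray(50, 50, 100, 100, 90.0, forward=(0, -1, 0), up=(0, 0, -1))
hit = intersect_ground(camera_pos, ray)
```

A single screenshot does not determine depth on its own, which is why this sketch leans on a ground-plane assumption. Objects that are airborne or resting on elevated surfaces would need additional cues such as known object sizes or multiple frames.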
Our AI pipeline excels at visual analysis of gameplay footage, identifying objects, and understanding on-screen events. However, a significant gap exists in translating this 2D understanding into a coherent 3D representation of the game world. The core challenges to be addressed are:
Our current ecosystem consists of several key components that the Spatial Reasoning Engine will need to integrate with:
The central task of this bounty is to build the Spatial Reasoning Engine, which acts as the missing link in our data flow pipeline. The engine takes the 2D screen-space positions and object data from roam-game-analysis and converts them into accurate 3D world-space positions, which are then used to generate a dynamic scene graph in JSON format.
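As a rough illustration of the final step, the sketch below turns a list of objects with 3D world-space positions into a JSON scene graph. The schema is an assumption made for this example: the field names (id, label, world_position, position, children) and the flat root-with-children layout are hypothetical, since the actual output format of roam-game-analysis and the target scene graph schema are not specified here.

```python
import json
from dataclasses import dataclass, field, asdict

@dataclass
class SceneNode:
    """One node in a hypothetical scene graph."""
    id: str
    label: str
    position: tuple  # (x, y, z) in world space
    children: list = field(default_factory=list)

def build_scene_graph(detections):
    """Attach each detected object to a single root node.

    `detections` is assumed to be a list of dicts with the keys
    "id", "label", and "world_position" (hypothetical input format).
    """
    root = SceneNode(id="root", label="scene", position=(0.0, 0.0, 0.0))
    for det in detections:
        root.children.append(
            SceneNode(id=det["id"],
                      label=det["label"],
                      position=tuple(det["world_position"]))
        )
    return root

def scene_graph_to_json(root):
    """Serialize the scene graph; tuples are written as JSON arrays."""
    return json.dumps(asdict(root), indent=2)

# Usage with two made-up detections.
detections = [
    {"id": "obj-1", "label": "player", "world_position": [1.0, 0.0, -2.0]},
    {"id": "obj-2", "label": "crate", "world_position": [4.5, 0.0, 3.0]},
]
scene_json = scene_graph_to_json(build_scene_graph(detections))
```

A real scene graph would likely carry more than positions, for example rotation, scale, and parent-child relationships between objects. The nested children list leaves room for that hierarchy without changing the serialization code.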