Roam Bounty Program #019 – $5000
Overview
Roam is a research lab building its first social product: a mobile game builder that lets anyone create multiplayer games in minutes. We run targeted bounties to solve complex technical challenges that unlock scalability and extensibility in our stack.
This bounty is focused on building a text-to-audio generation system for games.
Problem Statement
Modern game creation tools let anyone design visuals quickly, but audio (background music and sound effects) remains a bottleneck. Players expect audio that matches the vibe, timing, and feel of a game. Static libraries of sound files don’t provide the flexibility needed for dynamic, player-driven content.
We want a system that can generate high-quality, perfectly loopable background music and unique, optimized SFX clips directly from a text prompt. The system should:
- Match Vibe and Context: Understand the style and emotional tone of the game (e.g., “retro cyberpunk arcade”, “calm mountain temple”).
- Generate Event-Aware Sounds: Ensure timing matches gameplay events (e.g., jump sound ≤ jump duration, punch impact syncs with animation).
- Produce Clean Audio: Sounds must start exactly at the point of play (no silence or fade-in unless explicitly requested) and be well-compressed for mobile/web.
- Handle Both Music and SFX: Support dynamic background tracks as well as individual action-based effects.
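To make the "Produce Clean Audio" and "Event-Aware" requirements concrete, here is a minimal post-processing sketch, not a prescribed implementation. It assumes the generated audio is available as a list of float samples in the range -1.0 to 1.0. The function names `trim_leading_silence` and `fits_event` and the default threshold of 0.001 are illustrative assumptions, not part of the bounty specification.

```python
from typing import List


def trim_leading_silence(samples: List[float], threshold: float = 0.001) -> List[float]:
    """Drop every sample before the first one whose magnitude exceeds `threshold`.

    The threshold value is an assumption; a real pipeline would tune it.
    """
    for i, sample in enumerate(samples):
        if abs(sample) > threshold:
            return samples[i:]
    return []  # the whole clip was below the threshold


def fits_event(num_samples: int, sample_rate: int, event_seconds: float) -> bool:
    """Return True if the clip is no longer than the gameplay event it accompanies."""
    return num_samples / sample_rate <= event_seconds


# Hypothetical jump SFX with three near-silent samples at the start.
clip = [0.0, 0.0, 0.0005, 0.2, -0.3, 0.1]
trimmed = trim_leading_silence(clip)
print(trimmed)                               # [0.2, -0.3, 0.1]
print(fits_event(len(trimmed), 44100, 0.4))  # True
```

A check like `fits_event` could run as a gate after generation, so a jump sound longer than the jump itself is rejected or regenerated rather than shipped.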
Context and References
Current Gaps in Audio Generation
- Many AI audio systems produce long, unstructured clips that are unsuitable for real-time games.
- Game-ready audio requires:
  - Loopability: background tracks must loop seamlessly and transition smoothly as the player moves between environments.
  - Timing Precision: SFX must align with gameplay events to frame accuracy.
  - Optimization: mobile-ready compression without audible artifacts.
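The loopability and timing requirements above can be expressed as simple numeric checks. The sketch below is one possible way to do so, under stated assumptions: the frame rate (60 fps) and sample rates (44,100 Hz and 48,000 Hz) are example values, and `frames_to_samples` and `loop_seam_jump` are hypothetical helper names. A small seam jump is a necessary condition for a click-free loop, not a sufficient one; musical continuity needs further checks.

```python
from typing import Sequence


def frames_to_samples(frames: int, fps: int, sample_rate: int) -> int:
    """Number of audio samples spanning `frames` animation frames, rounded."""
    return round(frames * sample_rate / fps)


def loop_seam_jump(samples: Sequence[float]) -> float:
    """Absolute amplitude step between the last and first sample of a loop.

    A large value suggests an audible click when the track wraps around.
    """
    return abs(samples[0] - samples[-1])


# One frame at 60 fps is 735 samples at 44,100 Hz.
print(frames_to_samples(1, 60, 44100))   # 735
# A 12-frame punch animation at 60 fps spans 9,600 samples at 48,000 Hz.
print(frames_to_samples(12, 60, 48000))  # 9600
# A loop that starts at 0.25 and ends at -0.25 has a seam jump of 0.5.
print(loop_seam_jump([0.25, 0.5, -0.25]))  # 0.5
```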
Scope of Deliverable
- Background music should adapt to the described setting, mood, and game genre.
- SFX generation must be parameterized: type (jump, punch, pickup), duration, and context.
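As an illustration of what "parameterized" could look like, the following sketch models an SFX request as a validated data structure. The class name `SfxRequest`, its field names, and the choice of milliseconds for duration are assumptions made for this example; only the three parameters (type, duration, context) and the example types (jump, punch, pickup) come from the scope above.

```python
from dataclasses import dataclass

# Example types named in the scope; a real system would likely extend this set.
SFX_TYPES = {"jump", "punch", "pickup"}


@dataclass(frozen=True)
class SfxRequest:
    """Hypothetical request shape for one sound effect."""

    sfx_type: str     # e.g. "jump"
    duration_ms: int  # target clip length in milliseconds (assumed unit)
    context: str      # free-text vibe, e.g. "retro cyberpunk arcade"

    def __post_init__(self) -> None:
        if self.sfx_type not in SFX_TYPES:
            raise ValueError(f"unknown SFX type: {self.sfx_type!r}")
        if self.duration_ms <= 0:
            raise ValueError("duration_ms must be positive")
        if not self.context.strip():
            raise ValueError("context must not be empty")


request = SfxRequest(sfx_type="jump", duration_ms=400, context="retro cyberpunk arcade")
print(request.sfx_type, request.duration_ms)  # jump 400
```

Validating at construction time keeps malformed requests from reaching the generation step, where a failure would be slower and harder to diagnose.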