1 code implementation • 23 Oct 2024 • Juyong Lee, Dongyoon Hahm, June Suk Choi, W. Bradley Knox, Kimin Lee
In this work, we introduce MobileSafetyBench, a benchmark designed to evaluate the safety of device-control agents within a realistic mobile environment based on Android emulators.
no code implementations • 25 Apr 2024 • Juyong Lee, Taywon Min, Minyong An, Dongyoon Hahm, Haeone Lee, Changyeon Kim, Kimin Lee
In this work, we introduce B-MoCA: a novel benchmark with interactive environments for evaluating and developing mobile device control agents.