publications

2025

  1. ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
    Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, and 6 more authors
    arXiv preprint arXiv:2502.09696, 2025
  2. Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
    Jonathan Roberts, Kai Han, and Samuel Albanie
    ICLR, 2025

2024

  1. Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
    Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, and 1 more author
    arXiv preprint arXiv:2412.13602, 2024
  2. GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
    Jonathan Roberts, Kai Han, and Samuel Albanie
    arXiv preprint arXiv:2408.11817, 2024
  3. SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
    Jonathan Roberts, Kai Han, Neil Houlsby, and Samuel Albanie
    NeurIPS, 2024
  4. Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
    Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, and Samuel Albanie
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024

2023

  1. GPT4GEO: How a Language Model Sees the World’s Geography
    Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie
    In NeurIPS Foundation Models for Decision Making Workshop, 2023
  2. SATIN: A Multi-task Metadataset for Classifying Satellite Imagery using Vision-language Models
    Jonathan Roberts, Kai Han, and Samuel Albanie
    In ICCV Towards the Next Generation of Computer Vision Datasets Workshop, 2023