Jonathan Roberts


Cambridge, UK

I am a PhD student in the Machine Intelligence Laboratory at the University of Cambridge, supervised by Samuel Albanie (Google DeepMind), Kai Han (The University of Hong Kong), and Emily Shuckburgh (University of Cambridge). I am part of the Application of Artificial Intelligence for Environmental Risk (AI4ER) CDT.

My research focuses on evaluating, benchmarking and understanding the behaviour and capabilities of frontier models (VLMs, LLMs and LMMs). I am particularly interested in hard evals and the application of these models to the scientific and geospatial domains, as well as long-context settings. More recently, I've worked on low-latency browser automation and regularly update warpsurf.

Previously, I completed an MRes in Environmental Data Science at the University of Cambridge. Before this, I worked as a Systems Engineer in the aerospace industry. I initially completed a Master of Physics (BSc MPhys) at the University of Warwick, supervised by Don Pollacco and Marco Polin.



selected publications

  1. How Long Is a Piece of String? A Brief Empirical Analysis of Tokenizers
    Jonathan Roberts, Kai Han, and Samuel Albanie
    arXiv preprint arXiv:2601.11518, 2026
  2. ZeroBench: An Impossible Visual Benchmark for Contemporary Large Multimodal Models
    Jonathan Roberts, Mohammad Reza Taesiri, Ansh Sharma, Akash Gupta, Samuel Roberts, and 6 more authors
    arXiv preprint arXiv:2502.09696, 2025
  3. Beyond Outcomes: Transparent Assessment of LLM Reasoning in Games
    Wenye Lin, Jonathan Roberts, Yunhan Yang, Samuel Albanie, Zongqing Lu, and 1 more author
    ACL, 2025
  4. Needle Threading: Can LLMs Follow Threads through Near-Million-Scale Haystacks?
    Jonathan Roberts, Kai Han, and Samuel Albanie
    ICLR, 2025
  5. GRAB: A Challenging GRaph Analysis Benchmark for Large Multimodal Models
    Jonathan Roberts, Kai Han, and Samuel Albanie
    ICCV, 2025
  6. SciFIBench: Benchmarking Large Multimodal Models for Scientific Figure Interpretation
    Jonathan Roberts, Kai Han, Neil Houlsby, and Samuel Albanie
    NeurIPS, 2024
  7. Charting New Territories: Exploring the Geographic and Geospatial Capabilities of Multimodal LLMs
    Jonathan Roberts, Timo Lüddecke, Rehan Sheikh, Kai Han, and Samuel Albanie
    In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2024
  8. GPT4GEO: How a Language Model Sees the World’s Geography
    Jonathan Roberts, Timo Lüddecke, Sowmen Das, Kai Han, and Samuel Albanie
    In NeurIPS Foundation Models for Decision Making Workshop, 2023
  9. SATIN: A Multi-task Metadataset for Classifying Satellite Imagery using Vision-language Models
    Jonathan Roberts, Kai Han, and Samuel Albanie
    In ICCV Towards the Next Generation of Computer Vision Datasets Workshop, 2023