Progress Towards Robust and Deployable AI Detectors in the Real World

Date: February 20, 2025


Abstract

In this talk, I discuss my work on evaluating and improving the robustness of multimodal AI detection systems. I show that while AI detectors perform well in in-domain settings, they frequently fail to generalize to generative models, domains, and adversarial attacks not seen at training time. I also discuss how we may be able to use bottlenecks to overcome these limitations. In addition, I show how tuning classification thresholds can help mitigate fairness issues in classifiers. Finally, I end with a vision for the future real-world application of detectors and how we should embed these models in larger systems to prevent downstream harms.
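As a rough illustration of the threshold-tuning idea mentioned above, the sketch below picks a separate decision threshold for each group of human-written texts so that every group has at most the same false positive rate. It is not the method presented in the talk; the function names, group labels, and scores are all hypothetical, and it assumes the detector outputs a score where higher means "more likely AI-generated".

```python
import math


def threshold_for_fpr(human_scores, target_fpr):
    """Return a threshold t such that at most target_fpr of the given
    human-written scores are strictly greater than t (i.e. flagged as AI)."""
    ranked = sorted(human_scores)
    n = len(ranked)
    # Number of human-written samples we tolerate flagging by mistake.
    allowed = math.floor(target_fpr * n + 1e-9)
    if allowed >= n:
        return -math.inf  # every sample may be flagged
    # Only the top `allowed` scores can lie strictly above this value.
    return ranked[n - allowed - 1]


def per_group_thresholds(scores, groups, target_fpr):
    """Compute one threshold per group from human-written calibration data."""
    by_group = {}
    for score, group in zip(scores, groups):
        by_group.setdefault(group, []).append(score)
    return {g: threshold_for_fpr(v, target_fpr) for g, v in by_group.items()}


def false_positive_rate(human_scores, threshold):
    """Fraction of human-written samples flagged as AI at this threshold."""
    return sum(s > threshold for s in human_scores) / len(human_scores)


# Hypothetical detector scores on human-written text from two groups.
group_a = [0.1, 0.2, 0.3, 0.4, 0.9]
group_b = [0.3, 0.5, 0.6, 0.7, 0.95]
scores = group_a + group_b
groups = ["a"] * len(group_a) + ["b"] * len(group_b)

thresholds = per_group_thresholds(scores, groups, target_fpr=0.2)
```

With these made-up scores, a single threshold calibrated on group `a` alone would flag most of group `b`, while the per-group thresholds hold both groups to the same false positive rate. Whether group-specific thresholds are appropriate in a given deployment is a separate policy question that this sketch does not address.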

Location

This talk was given on February 20, 2025, at the Hoover Institution at Stanford University in Palo Alto, CA, as part of the weekly Challenges and Safeguards against AI-Generated Disinformation Seminar.

Slides