The official website for the Responsibly Building Generative Models tutorial at ICCV 2025.
Date and Time: TBD
Zoom link: TBD
Password: TBD
Hosted by Changhoon Kim (SSU), Wanyu Du (AWS), Dongyoon Yang (SK Hynix), Kyle Min (Intel Labs), Maitreya Patel (ASU), and Yezhou Yang (ASU)
Over the past few years, text-to-image (T2I) and image-to-text (I2T) vision-language generative models have evolved from research prototypes to widely deployed systems, transforming fields such as entertainment, journalism, and education. While these models enable compelling applications, they also introduce critical challenges related to robustness, controllability, and ethical risks. They can generate unauthorized content, including explicit or copyright-protected material, in violation of ethical and legal guidelines. They also suffer from prompt misalignment, biases, factual inconsistencies, and hallucinations, raising serious reliability concerns in high-stakes applications. Despite extensive guardrails, adversarial actors continue to develop jailbreaking techniques, leading to an ongoing arms race between security measures and attacks. Addressing these vulnerabilities demands a systematic analysis of failure modes, rigorous mitigation strategies, and deeper theoretical insights into adversarial robustness. Advancing our understanding of these challenges is essential for the safe, reliable, and responsible deployment of vision-language generative models.
In this tutorial, we will provide a comprehensive exploration of vision-language generative models, covering both key challenges and recent advancements.
Each segment of the tutorial comprises two key components: (1) an in-depth lecture covering foundational methodologies, significant research findings, and relevant literature, and (2) interactive hands-on sessions, where applicable. The hands-on sessions will allow participants to directly experiment with vision-language generative models, providing practical insights into the reliability challenges discussed in the lectures. By analyzing failure modes firsthand, attendees will gain a deeper understanding of both theoretical frameworks and real-world implications.
| Time (UTC-10) | Topic | Presenter |
|---|---|---|
| 10 min | Welcome Message and Tutorial Overview | Changhoon Kim (Assistant Professor, Soongsil University) |
| 15 min | Reliability Challenges and Advances in Vision-Language Generative Models | Changhoon Kim (Assistant Professor, Soongsil University) |
| 50+5 min | Concept Erasure in Text-to-Image Diffusion Models and Jailbreak Mitigation | TBD |
| 50+5 min | Adversarial Attacks and Robustness in Foundation Models | Wanyu Du (Applied Scientist, AWS) |
| 50+5 min | Theoretical Foundations of Robustness in Vision-Language Generative Models | Dongyoon Yang (Research Scientist, SK Hynix) |
| 10 min | Concluding Remarks and Future Directions | Yezhou Yang (Associate Professor, ASU) |
Minjun So (SSU)
This website will be updated closer to the event date.