The official website for the Responsibly Building Generative Models tutorial at ICCV 2025.
Date and Time: TBD
Zoom link: TBD
Password: TBD
Hosted by Changhoon Kim (SSU), Wanyu Du (AWS), Dongyoon Yang (SK Hynix), Kyle Min (Intel Labs), Maitreya Patel (ASU), and Yezhou Yang (ASU)
Over the past few years, text-to-image (T2I) and image-to-text (I2T) vision-language generative models have evolved from research prototypes to widely deployed systems, transforming fields such as entertainment, journalism, and education. While these models enable compelling applications, they also introduce critical challenges related to robustness, controllability, and ethical risks. They can generate unauthorized content, including explicit or copyright-protected material, in violation of ethical and legal guidelines. They also suffer from prompt misalignment, biases, factual inconsistencies, and hallucinations, raising serious reliability concerns in high-stakes applications. Despite extensive guardrails, adversarial actors continue to develop jailbreaking techniques, leading to an ongoing arms race between security measures and attacks. Addressing these vulnerabilities demands a systematic analysis of failure modes, rigorous mitigation strategies, and deeper theoretical insights into adversarial robustness. Advancing our understanding of these challenges is essential for the safe, reliable, and responsible deployment of vision-language generative models.
In this tutorial, we will provide a comprehensive exploration of vision-language generative models, covering both key challenges and recent advancements.
Each segment of the tutorial comprises two key components: (1) an in-depth lecture covering foundational methodologies, significant research findings, and relevant literature, and (2) interactive hands-on sessions, where applicable. The hands-on sessions will allow participants to directly experiment with vision-language generative models, providing practical insights into the reliability challenges discussed in the lectures. By analyzing failure modes firsthand, attendees will gain a deeper understanding of both theoretical frameworks and real-world implications.
| Time (UTC-10) | Topic | Presenter |
|---|---|---|
| 10 min | Welcome Message and Tutorial Overview | Changhoon Kim (Assistant Professor, Soongsil University) |
| 15 min | Reliability Challenges and Advances in Vision-Language Generative Models | Changhoon Kim (Assistant Professor, Soongsil University) |
| 50+5 min | Concept Erasure in Text-to-Image Diffusion Models and Jailbreak Mitigation | TBD |
| 50+5 min | Adversarial Attacks and Robustness in Foundation Models | Wanyu Du (Applied Scientist, AWS) |
| 50+5 min | Theoretical Foundations of Robustness in Vision-Language Generative Models | Dongyoon Yang (Research Scientist, SK Hynix) |
| 10 min | Concluding Remarks and Future Directions | Yezhou Yang (Associate Professor, ASU) |
Minjun So (SSU)
This website will be updated closer to the event date.