Publications

  • [P2] Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs
    Sangyeon Yoon*, Wonje Jeung*, Yoonjun Cho, Dongjae Jeon, and Albert No
    Preprint, under review, 2026 [pdf]
    TL;DR: We show that DPO fine-tuning can create a hard-to-audit jailbreak risk: just 10 harmless preference pairs, each preferring a helpful answer over a refusal, can broadly suppress refusal behavior on harmful prompts.
  • [P1] BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs
    Sangyeon Yoon, Sunkyoung Kim, Hyesoo Hong, Wonje Jeung, Yongil Kim, Wooseok Seo, Heuiyeen Yeen, and Albert No
    Preprint, under review, 2026 [pdf]
  • [C7] Position: The Term “Machine Unlearning” Is Overused in LLMs
    Sangyeon Yoon*, Yeachan Jun*, and Albert No
    ICML, 2026
  • [C6] DUSK: Do Not Unlearn Shared Knowledge
    Wonje Jeung*, Sangyeon Yoon*, Hyesoo Hong*, Soeun Kim, Seungju Han, Youngjae Yu, and Albert No
    ACL, Findings, 2026 [pdf]
  • [T2] K-EXAONE Technical Report
    LG AI Research
    Technical Report, 2026 [pdf]
  • [T1] EXAONE 4.5 Technical Report
    LG AI Research
    Technical Report, 2026 [pdf]
  • [C5] Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures
    Sangyeon Yoon, Hyesoo Hong, Wonje Jeung, and Albert No
    ICLR, 2026 [pdf]
  • [C4] A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models
    Wonje Jeung*, Sangyeon Yoon*, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, and Albert No
    ICLR, 2026 [pdf]
  • [C3] R-TOFU: Unlearning in Large Reasoning Models
    Sangyeon Yoon, Wonje Jeung, and Albert No
    EMNLP, Main, 2025 [pdf]
  • [C2] SEPS: A Separability Measure for Robust Unlearning in LLMs
    Wonje Jeung*, Sangyeon Yoon*, and Albert No
    EMNLP, Main, 2025 [pdf]
  • [C1] SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment
    Wonje Jeung, Sangyeon Yoon, Minsuk Kang, and Albert No
    NeurIPS, 2025 [pdf]
  • [W1] Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios
    Sangyeon Yoon*, Wonje Jeung*, and Albert No
    NeurIPS Workshop (SFLLM), 2024 [pdf]