Sangyeon Yoon

Few-Shot Truly Benign DPO Attack for Jailbreaking LLMs

Sangyeon Yoon*, Wonje Jeung*, Yoonjun Cho, Dongjae Jeon, and Albert No

Preprint

We show that DPO fine-tuning can create a hard-to-audit jailbreak risk: just 10 harmless preference pairs that prefer helpful answers over refusals can broadly suppress refusal behavior on harmful prompts.

VLMs Trace Without Tracking: Diagnosing Failures in Visual Path Following

Hyesoo Hong, Minsoo Kim, Wonje Jeung, Sangyeon Yoon, Dongjae Jeon, and Albert No

Preprint

We show that state-of-the-art VLMs still struggle with visual path following: even in controlled line-tracing tasks, nearby similar distractors often pull models onto the wrong path, and scaling, reasoning, or explicit tracing instructions do not fully fix this local-competition failure.

BenchPreS: A Benchmark for Context-Aware Personalized Preference Selectivity of Persistent-Memory LLMs

Sangyeon Yoon, Sunkyoung Kim, Hyesoo Hong, Wonje Jeung, Yongil Kim, Wooseok Seo, Heuiyeen Yeen, and Albert No

Preprint

Previously at ICML 2026@SCALE (Oral Presentation)

We introduce BenchPreS, a benchmark for context-aware preference selectivity in persistent-memory LLMs, testing whether models apply user preferences only when appropriate and suppress them in formal communication contexts.

Position: The Term "Machine Unlearning" Is Overused in LLMs

Sangyeon Yoon*, Yeachan Jun*, and Albert No

ICML, 2026

DUSK: Do Not Unlearn Shared Knowledge

Wonje Jeung*, Sangyeon Yoon*, Hyesoo Hong*, Soeun Kim, Seungju Han, Youngjae Yu, and Albert No

ACL, 2026 (Findings)

We introduce DUSK, a realistic LLM unlearning benchmark that tests whether methods can remove specific forget data while preserving shared knowledge that also appears in retain data.

K-EXAONE Technical Report

LG AI Research

Technical Report, 2026

We introduce K-EXAONE, a 236B-parameter MoE multilingual foundation model from LG AI Research that activates 23B parameters, supports 256K context, and targets frontier-level Korean, multilingual, reasoning, coding, agentic, and safety performance.

EXAONE 4.5 Technical Report

LG AI Research

Technical Report, 2026

We introduce EXAONE 4.5, LG AI Research's first open-weight VLM, integrating a 1.2B vision encoder into EXAONE 4.0 to improve document understanding, Korean contextual reasoning, multilingual multimodal ability, and long-context industrial use cases.

Rethinking Benign Relearning: Syntax as the Hidden Driver of Unlearning Failures

Sangyeon Yoon, Hyesoo Hong, Wonje Jeung, and Albert No

ICLR, 2026

We introduce a new perspective on benign relearning in machine unlearning, showing that syntactic similarity rather than topical overlap is the main cause of relearning failures, and propose syntactic diversification to improve forgetting robustness and utility preservation.

A2D: Any-Order, Any-Step Safety Alignment for Diffusion Language Models

Wonje Jeung*, Sangyeon Yoon*, Yoonjun Cho, Dongjae Jeon, Sangwoo Shin, Hyesoo Hong, and Albert No

ICLR, 2026

We introduce A2D, a token-level safety alignment method for diffusion LLMs that emits [EOS] whenever harmful content appears at any masked position, defending against any-order and any-step attacks while preserving utility.

R-TOFU: Unlearning in Large Reasoning Models

Sangyeon Yoon, Wonje Jeung, and Albert No

EMNLP, 2025 (Main)

We introduce R-TOFU, the first benchmark for unlearning in Large Reasoning Models, showing that forgotten knowledge can remain in CoT traces even when final answers appear erased, and proposing Reasoned IDK to better balance forgetting and reasoning utility.

SEPS: A Separability Measure for Robust Unlearning in LLMs

Wonje Jeung*, Sangyeon Yoon*, and Albert No

EMNLP, 2025 (Main)

We introduce SEPS, a mixed-query evaluation framework for LLM unlearning that tests whether models can forget targeted knowledge while retaining unrelated knowledge within the same prompt, and propose Mixed Prompt unlearning to improve this separability.

SAFEPATH: Preventing Harmful Reasoning in Chain-of-Thought via Early Alignment

Wonje Jeung, Sangyeon Yoon, Minsuk Kang, and Albert No

NeurIPS, 2025

We introduce SAFEPATH, a lightweight safety-alignment method for Large Reasoning Models that fine-tunes the model to emit a short 8-token "Safety Primer" at the start of reasoning for harmful prompts, reducing harmful outputs and jailbreak success while preserving reasoning ability.

Adversarial Sample-Based Approach for Tighter Privacy Auditing in Final Model-Only Scenarios

Sangyeon Yoon*, Wonje Jeung*, and Albert No

NeurIPS Workshop (SFLLM), 2024

We introduce an adversarial sample-based privacy auditing method that crafts worst-case inputs from the final model to obtain tighter empirical lower bounds for DP-SGD privacy leakage, showing that canary-based final-model audits can underestimate leakage.

EXAONE Lab, LG AI Research
Research Intern, Sep 2025 ~ Feb 2026, Seoul, South Korea
Mentor: Sunkyoung Kim

AI-ISL Lab, Yonsei University
Research Student, Mar 2024 ~ Present, Seoul, South Korea
Advisor: Albert No

News

Research Experience

Publications

Preprints

Peer-Reviewed

Academic Services

NeurIPS 2026

Blog

Coming Soon