arXiv cs.LG · about 8 hours ago · 5 min read

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

arXiv:2606.07141v1 Announce Type: new Abstract: Language models for clinical disease inference are trained on patient data, which may include sensitive and private information, and data owners may request the removal of their data from a trained model due to privacy or copyright concerns. However, exactly unlearning patient-specific data is intractable, and retraining after removing only a small amount of data is resource-intensive. While several machine unlearning methods exist, their utility is generally restricted to non-medical domains. Moreover, the existing benchmarks for evaluating such unlearning methods primarily use synthetically curated datasets, which are not truly representative of real-world systems. Hence, the effectiveness of these unlearning methods in the medical domain is largely unclear. To this end, we introduce REMEDI, an extensive benchmark for machine unlearning tailored to multi-label and multi-class clinical disease inference, where label correlations, longitudinal structure, and safety constraints make unlearning particularly challenging. Unlike the existing benchmarks, REMEDI considers: (1) a relevant application domain (medical), (2) comprehensive unlearning setups involving diverse sets of forget instances, (3) challenging unlearning scenarios including multi-label and multi-class classification tasks, and (4) evaluation metrics that measure both utility and the extent of unlearning achieved. REMEDI is built on the MIMIC-III clinical database, which contains comprehensive clinical data of patients. Experiments with existing unlearning methods indicate a trade-off between utility and unlearning performance, and show that these methods are largely unsuited to multi-label classification tasks. To facilitate reproducibility, we make our benchmark publicly available.
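As a rough illustration of the utility versus unlearning trade-off the abstract describes, the sketch below scores a multi-label model separately on a retain set and a forget set. The abstract does not name REMEDI's metrics, so the choice of micro-averaged F1, the function names `micro_f1` and `unlearning_report`, and the toy disease labels are all assumptions made for this example, not the benchmark's actual protocol.

```python
from typing import Dict, List, Set


def micro_f1(y_true: List[Set[str]], y_pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions (one label set per patient)."""
    tp = sum(len(t & p) for t, p in zip(y_true, y_pred))  # labels correctly predicted
    fp = sum(len(p - t) for t, p in zip(y_true, y_pred))  # predicted but not true
    fn = sum(len(t - p) for t, p in zip(y_true, y_pred))  # true but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def unlearning_report(
    retain_true: List[Set[str]],
    retain_pred: List[Set[str]],
    forget_true: List[Set[str]],
    forget_pred: List[Set[str]],
) -> Dict[str, float]:
    """Hypothetical summary: utility on retained data vs. residual fit on forgotten data."""
    return {
        "retain_f1": micro_f1(retain_true, retain_pred),
        "forget_f1": micro_f1(forget_true, forget_pred),
    }


# Toy, made-up labels purely for illustration (not drawn from MIMIC-III).
retain_true = [{"sepsis", "aki"}, {"chf"}]
retain_pred = [{"sepsis", "aki"}, {"chf", "copd"}]
forget_true = [{"sepsis"}, {"chf", "aki"}]
forget_pred = [{"copd"}, {"chf"}]

report = unlearning_report(retain_true, retain_pred, forget_true, forget_pred)
print(report)
```

A high `retain_f1` alongside a low `forget_f1` is only a crude proxy for successful unlearning. In practice, one would typically also compare against a model retrained without the forget instances.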

Latest News

Learning Explicit Behavioral Models with Adaptive Questions and World-Model Probes

OffQ: Taming Structured Outliers in LLM Quantization by Offsetting

REMEDI: A Benchmark for Retention and Unlearning Evaluation in Multi-label Clinical Disease Inference

RETROSPECT: RETROsynthesis via Sequential Prediction, and Chemically Transformed-ranking

A Held-Out Transition-Pair Falsifier for Long-Horizon Non-Abelian State Tracking

Trio: Learning Time-Series Forecasting with Temporal-Spatial-Sample Attention and Structural Causal Priors

CoMetaPNS: Continually Meta-learning Personalized Neural Surrogates for Cardiac Electrophysiology Simulations

Probabilistic learning to perform pre-onset individualised prediction of disease severity: application to Veno Occlusive Disease

CrowdMath: A Dataset of Crowdsourced Mathematical Research Discussions

Closed-Form Spectral Regularization for Multi-Task Model Merging

Self-evolving LLM agents with in-distribution Optimization

Generative Modeling of Discrete Latent Structures via Dynamic Policy Gradients

A Comprehensive Anatomy of Human and DeepSeek-R1 LLM Mathematical Reasoning

Sparsely gated tiny linear experts

Reversible Foundations: Training a 120B Sparse MoE through State-Preserving Scaling

Time series Foundation Models based on Physics-Informed Synthetic Histories for Cold-Start Photovoltaic Forecasting

Drifting Models for Surrogate Flow Modeling

SafeGene: Reusable Adapters for Transferable Safety Alignment

Attack Selection in Agentic AI Control Evaluations Meaningfully Decreases Safety

Does Topic Sentiment Cause Perceived Ideology? Comparing Human and LLM Annotations in Political News Articles