Latest AI News

arXiv cs.CL · about 6 hours ago · 6 min read

Interpreting Brain Responses to Language with Sparse Features from Language Models

arXiv:2606.06857v1 · Announce Type: new

Abstract: A central goal of cognitive neuroscience is to characterize the features that are represented by human language cortex. Artificial language models (LMs) have emerged as a powerful tool to address this challenge, but studies relating biological and artificial representations are often criticized as relating one black box to another. The present work introduces Augmented Sparse Encoding Models, an encoding framework that replaces dense LM hidden states with hierarchically organized sparse autoencoder (SAE) features, while explicitly including surprisal as a predictor. Using this approach, we (i) produce interpretations of neural responses and (ii) test whether model-brain alignment reflects primary or idiosyncratic variation in LM representations. Using a high-field 7T fMRI dataset of eight participants listening to 200 linguistically diverse sentences, we first validate our modeling framework by recovering previous interpretations of voxel populations tuned to processing difficulty and meaning abstractness. We then interpret a previously uncharacterized (but reliable) voxel population and find that it is tuned to people-related content. Next, we show that the fronto-temporal human language network is predicted by a common set of features across its constituent regions, but find that frontal regions are relatively well explained by surprisal alone, even in the absence of LM-based features. Finally, we show that brain responses during language processing are not merely predictable from an arbitrary set of LM features. Rather, brain responses are best explained by the features that tend to capture the most general information encoded in LM representations, suggesting a nontrivial correspondence between brain and LM language representation.
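The abstract describes the encoding framework only at a high level, so the following is a minimal sketch under stated assumptions and not the authors' implementation. It assumes a standard ReLU SAE encoder (the paper's hierarchically organized SAE is not modeled), one feature vector and one surprisal value per sentence, closed-form ridge regression with k-fold cross-validation (the abstract does not name the regression method), and entirely synthetic data. The helper names `sae_encode`, `surprisal_bits`, `ridge_fit`, and `cv_pearson` are hypothetical. The sketch contrasts a model built from SAE features plus surprisal with a surprisal-only baseline, the two kinds of predictor sets the abstract compares.

```python
import numpy as np

def sae_encode(h, W_enc, b_enc, b_dec):
    """ReLU sparse-autoencoder encoder: f = ReLU((h - b_dec) @ W_enc + b_enc).

    Assumed common SAE form; the paper's hierarchical SAE may differ.
    """
    return np.maximum(0.0, (h - b_dec) @ W_enc + b_enc)

def surprisal_bits(p):
    """Surprisal -log2 p(word | context), in bits."""
    return -np.log2(p)

def ridge_fit(X, Y, alpha):
    """Closed-form ridge regression with an unpenalized intercept (via centering)."""
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ Yc)
    return W, y_mean - x_mean @ W

def cv_pearson(X, Y, alpha=1.0, n_folds=5, seed=0):
    """Per-voxel Pearson r between held-out (k-fold) predictions and responses."""
    idx = np.random.default_rng(seed).permutation(X.shape[0])
    preds = np.zeros_like(Y)
    for test in np.array_split(idx, n_folds):
        train = np.setdiff1d(idx, test)
        W, b = ridge_fit(X[train], Y[train], alpha)
        preds[test] = X[test] @ W + b
    Yc, Pc = Y - Y.mean(axis=0), preds - preds.mean(axis=0)
    return (Yc * Pc).sum(axis=0) / np.sqrt((Yc**2).sum(axis=0) * (Pc**2).sum(axis=0))

# --- Synthetic demo: all values are random; no real fMRI data or LM is used ---
rng = np.random.default_rng(42)
n_sent, d_model, n_feat, n_vox = 200, 16, 32, 10

h = rng.normal(size=(n_sent, d_model))  # stand-in sentence-level LM hidden states
W_enc = rng.normal(size=(d_model, n_feat)) / np.sqrt(d_model)
F = sae_encode(h, W_enc, b_enc=-0.5 * np.ones(n_feat), b_dec=np.zeros(d_model))
s = surprisal_bits(rng.uniform(0.05, 0.5, size=n_sent))  # stand-in sentence surprisal

# Simulated voxels driven by a few SAE features plus surprisal, plus Gaussian noise
w_true = rng.normal(size=(4, n_vox))
Y = F[:, :4] @ w_true + 1.5 * s[:, None] + 0.5 * rng.normal(size=(n_sent, n_vox))

r_full = cv_pearson(np.column_stack([F, s]), Y)  # SAE features + surprisal
r_surp = cv_pearson(s[:, None], Y)               # surprisal-only baseline
print(f"sparsity={np.mean(F == 0):.2f}  full r={r_full.mean():.2f}  "
      f"surprisal-only r={r_surp.mean():.2f}")
```

Comparing `r_full` with `r_surp` voxel by voxel mirrors, in toy form, the question of how much of a region's response surprisal alone accounts for. Because the data are simulated, the printed numbers carry no empirical meaning; with real recordings the outcome would depend on the dataset, preprocessing, and the regression choices assumed here.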

Latest News

ADAGE: Active Defenses Against GNN Extraction

MoDA: Modulation Adapter for Fine-Grained Visual Grounding in Instructional MLLMs

When Better Codebooks Are Not Enough: Predictive Performance and Behavioral Reliability in LLM Political Event Coding

An Expanded Synthetic Conversation Dataset for Multi-Turn Smishing Detection

What Do People Actually Want From AI? Mapping Preference Plurality

HKJudge: A Legal Discourse-Annotated Corpus for Interpreting What Courts Find, How They Reason, and What They Rule

Data-Efficient Autoregressive-to-Diffusion Language Models via On-Policy Distillation

CRAFT: A Unified Counterfactual Reasoning Framework for Tabular Question Answering and Fact Verification

Interpreting Brain Responses to Language with Sparse Features from Language Models

Korean Culture into LLM Alignment: Toward Cultural Coherence

Geometry of Semantic Space: Comparative Study of Discrete and Continuous Models

From Correctness to Utility: Gain-Based Prefix Evaluation for LLM Reasoning

M³Exam: Benchmarking Multimodal Memory for Realistic User-Agent Interactions

Contrastive Training with LLM-generated Near-Misses for Robust Code-Switching Speech Recognition

OpenHalDet: A Unified Benchmark for Hallucination Detection across Diverse Generation Scenarios

Modeling semantic association in self-paced reading with language model embeddings

Your UnEmbedding Matrix is Secretly a Feature Lens for Text Embeddings

mmPISA-bench: Do LLMs Reason Equally Well Across 43 Languages?

MADRAG: Multi-Agent Debate with Retrieval-Augmented Generation for Training-Free Analytic Essay Scoring

Sycophantic Praise: Evaluating Excessive Praise in Language Models