Latest AI News

arXiv cs.AI•about 8 hours ago·6 min read

How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

arXiv:2603.13259v2 Announce Type: replace-cross Abstract: When a decoder-only transformer is forced to process matched correct and incorrect single-token continuations of a factual query, the two pathways through hidden-state space diverge in a specific way: displacement vectors from the query-only representation maintain approximately equal magnitude but rotate apart in direction. The angular separation grows through mid-depth, and late layers resolve the asymmetric outcome -a logit-lens preference that, in the incorrect run, falls far below the naive prior of equal probability, corresponding to the model assigning approximately 11.5 times more probability to the incorrect token than to the correct one. We characterize this two-phase pattern-rotational divergence in mid-depth followed by late-layer asymmetric commitment-as the empirical geometric signature of what looks externally like the model rejecting a wrong continuation, while remaining explicit that it is an observational characterization, not a causal account. The pattern is consistent across six decoder-only transformers including five architecture families from 1B to 13B parameters. A seventh model (Qwen2 1.5B) shows a flat profile under the present extraction protocol that is plausibly a tokenizer-fragmentation artefact rather than a real scale floor; the question of an emergence threshold is left open. Single-layer activation patching does not recover the correct token at any layer band, meaning the late-layer asymmetry is not localized to a discrete component under the protocol used. Taken together, the evidence is consistent with a distributed-by-trajectory account of factual constraint processing-geometric structure that emerges cumulatively across many layers rather than from a single localized circuit and inconsistent with the simplest single-layer localized-recall account.

arXiv cs.AI•about 8 hours ago·4 min read

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

arXiv:2603.25184v2 Announce Type: replace-cross Abstract: Reinforcement learning (RL) has become essential for post-training large language models (LLMs) in reasoning tasks. While scaling rollouts can stabilize training and enhance performance, the computational overhead is a critical issue. In algorithms like GRPO, multiple rollouts per prompt incur prohibitive costs, as a large portion of prompts provide negligible gradients and are thus of low utility. To address this problem, we investigate how to select high-utility prompts before the rollout phase. Our experimental analysis reveals that sample utility is non-uniform and evolving: the strongest learning signals concentrate at the ``learning edge", the intersection of intermediate difficulty and high uncertainty, which shifts as training proceeds. Motivated by this, we propose HIVE (History-Informed and online-VErified prompt selection), a dual-stage framework for data-efficient RL. HIVE utilizes historical reward trajectories for coarse selection and employs prompt entropy as a real-time proxy to prune instances with stale utility. By evaluating HIVE across multiple math reasoning benchmarks and models, we show that HIVE yields significant rollout efficiency without compromising performance.

arXiv cs.AI•about 8 hours ago·4 min read

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

arXiv:2605.11855v2 Announce Type: replace-cross Abstract: Sequence learning is dominated by Transformers and parallelizable recurrent neural networks (RNNs) such as state-space models, yet learning long-term dependencies remains challenging, and state-of-the-art designs trade power consumption for performance. The Bistable Memory Recurrent Unit (BMRU) was introduced to enable hardware-software co-design of ultra-low power RNNs: quantized states with hysteresis provide persistent memory while mapping directly to analog primitives. However, BMRU performance lags behind parallelizable RNNs on complex sequential tasks. In this paper, we identify gradient blocking during state updates as a key limitation and propose a cumulative update formulation that restores gradient flow while preserving persistent memory, creating skip-connections through time. This leads to the Cumulative Memory Recurrent Unit (CMRU) and its relaxed variant, the $\alpha$CMRU. Experiments show that the cumulative formulation dramatically improves convergence stability and reduces initialization sensitivity. The CMRU and $\alpha$CMRU match or outperform Linear Recurrent Units (LRUs) and minimal Gated Recurrent Units (minGRUs) across diverse benchmarks at small model sizes, with particular advantages on tasks requiring discrete long-range retention, while the CMRU retains quantized states, persistent memory, and noise-resilient dynamics essential for analog implementation.

arXiv cs.AI•about 8 hours ago·5 min read

CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

arXiv:2605.21854v2 Announce Type: replace-cross Abstract: Vision-Language-Action (VLA) models have rapidly converged on a small set of architectural patterns: discrete-token autoregression (e.g. OpenVLA) and continuous-action flow-matching (e.g. pi-0.5). Yet preference alignment via Direct Preference Optimisation (DPO) -- the de-facto post-training step in language models -- has been studied almost exclusively on autoregressive VLAs. We present CrossVLA, an empirical study of cross-paradigm VLA post-training. Three contributions: (i) a surrogate flow-matching log-probability estimator that lets DPO operate on continuous-action backbones without probability-flow ODE integration; (ii) a head-to-head comparison of LoRA and DoRA as the parameter-efficient layer for VLA DPO, finding DoRA improves over OpenVLA SFT by a mean +10.4 pp across LIBERO 4-suite (600 trials, 3 seeds) -- per-suite +20.0 Object, +11.0 Long-horizon, +8.0 Goal, +2.7 Spatial -- with zero seed variance on Object (38/50 on each of 3 seeds); (iii) an inference-time anatomy showing the denoise loop dominates 78.6% of sample_actions latency and prefix-K/V caching a la VLA-Cache caps at a 21% acceleration ceiling -- both chunk-level and token-level cache strategies degrade success rate to 0-80% in our benchmarks. We further pretrain a multi-view + temporal projection head on 6000 LIBERO frames, achieving 99.5% k-NN recall@1 for same-task retrieval (36x over random), available as a downstream initialisation. All code, ckpts, training logs, and reproduction scripts are open at https://github.com/lz-googlefycy/vla-lab.

arXiv cs.AI•about 8 hours ago·3 min read

Locality-Aware Redundancy Pruning for LLM Depth Compression

arXiv:2605.27786v2 Announce Type: replace-cross Abstract: Large language models are known to contain representational redundancy across network depth, making depth pruning an effective approach for improving inference efficiency. Existing one-shot pruning methods rely on local layer importance or fixed redundancy assumptions across architectures. We propose Locality-Aware Redundancy Pruning (LoRP), a training-free one-shot depth pruning framework guided by representation locality. We show that inter-layer redundancy can be either localized or globally distributed depending on the LLM architecture. To characterize this phenomenon, we introduce Representation Locality Score (RLS), derived from global inter-layer hidden-state similarity. Using a small calibration set, LoRP computes pairwise layer similarity, clusters layers by representational similarity, and allocates pruning according to residual intra-cluster redundancy. Experiments across diverse LLM families show improvements in both perplexity and downstream task accuracy. Official github repository: https://github.com/daniel-eai/LoRP-Locality-Aware-Redundancy-Pruning/

arXiv cs.AI•about 8 hours ago·6 min read

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

arXiv:2605.28510v2 Announce Type: replace-cross Abstract: Large language models (LLMs) for code completion and generation are increasingly used in software development, yet they may reproduce training examples verbatim and without authorship attribution, raising legal and ethical concerns around plagiarism and license compliance. Classical fingerprint-based plagiarism detectors based on fingerprinting, such as Winnowing, remain highly effective, yet the inspection requires comparing fragments of code to the entire training set, and their linear-time search makes them impractical for the billion-scale corpora used to train modern code LLMs. To bridge this gap, we introduce SOURCETRACKER, a 300M-parameter encoder tailored for code retrieval, together with a hybrid two-stage provenance-tracking pipeline HYBRIDSOURCETRACKER (HST). HST first narrows down a small set of candidate snippets via vector search, then re-ranks those candidates using Winnowing on exact fingerprints. We train and evaluate our system on a 10M-snippet subset of the THESTACKV2 dataset, with both verbatim and adapted snippets that emulate realistic identifier renaming. On an in vitro 100k-snippet search space with adapted queries, our hybrid approach reaches a mean reciprocal rank on par with Winnowing for 30-token fragments. Then, starting from windows >= 60 tokens, it consistently over-performs by up to 5.4% while preserving logarithmic-time query complexity. In a complementary evaluation using an LLM-based judge, we find that many retrieved snippets not labeled as ground truth are still highly similar to the expected sources, particularly with longer context windows, and thus remain useful for end users. Overall, our results demonstrate that integrating vector search with fingerprinting enables scalable, high-precision provenance tracking for code produced by LLMs.

arXiv cs.LG•about 8 hours ago·5 min read

SPIN: Decentralized Swarm Control via Tensorized Policy Coordination

arXiv:2606.07557v1 Announce Type: new Abstract: Decentralized multi-agent swarm coordination on resource-constrained edge platforms remains fundamentally bottlenecked by the exponential scaling of joint action spaces and high-latency communication overhead. This paper introduces the Swarm Policy Interference Network (SPIN) framework, an architectural paradigm that bypasses these limitations by modeling swarm topologies as a compressed tensor network. We factorize the joint policy tensors of local multi-agent cliques into Matrix Product State (MPS) chains, reducing the computational complexity of evaluation from an exponential $O(n^m)$ wall to a strictly linear $O(m \cdot n \cdot \chi^2)$ constraint. To bridge local continuous spatial geometry with this discrete algebraic backend without requiring power-intensive online training loops, we introduce a decoupled, hybrid neuro-symbolic control pipeline. Local multi-layered neural networks operate as structural coordination encoders, pre-trained offline to nonlinearly map hand-engineered geometric descriptors into abstract environmental target measures. At runtime, edge agents execute instantaneous behavioral adaptations by applying the Radon-Nikod\'ym derivative directly as a zero-shot importance-reweighting filter. We validate the framework within a discrete-time multi-agent simulation sandbox spanning tracking, decentralized dispersion/area coverage, and multi-goal coordination regimes. Qualitative telemetry demonstrates that the integrated pipeline achieves stable target-directed motion, anti-collapse spatial spreading under decentralized constraints, and structured subgroup formation across multiple targets, providing a mathematically grounded route to tractable, low-power edge swarm intelligence.

arXiv cs.LG•about 8 hours ago·4 min read

When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery

arXiv:2606.07576v1 Announce Type: new Abstract: We present CARTOGRAPH, a verification layer for AI scientists that couples unresolved-subspace experiment steering (select), explicit ambiguity closure (resolve), and residual-based library inadequacy detection (refuse). Under a local linear-Gaussian bridge, raw unresolved projection is the isotropic unresolved Fisher-information trace, while CARTOGRAPH-A is the exact unresolved A-optimal rule; closed-form EIG and Box-Hill arise as local comparators rather than global equivalents. Across five testbeds, CARTOGRAPH-A beats raw projection 129W/0T/15L at d = 8 (p < 10^-21) in a replicated structured cascade. More distinctively, the framework tentatively identifies three out-of-library pharmacokinetic mechanisms and then revokes those identifications as residuals expose structural misfit, while one perturbed in-library control stays identified throughout. In low-dimensional pharmacokinetic and filtered EPA settings, near-ties against disagreement are predicted by theory and observed. Finally, in a retrospective audit of 40 positive claims from the published A-Lab autonomous materials system, the refuse guard flags all 4 claims later marked inconclusive under manual reanalysis while passing 32/36 confirmed claims. Code is available at https://github.com/ai4science-boed/cartograph.git

arXiv cs.LG•about 8 hours ago·5 min read

QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants

arXiv:2606.07606v1 Announce Type: new Abstract: Very low birth weight infants (VLBWI) are at high risk of mortality and severe neurodevelopmental impairment, including cerebral palsy, yet reliable discharge-time prognostic stratification remains challenging in high-dimensional and data-limited clinical settings. To address this problem, we propose QDSP, an interpretable structured learning framework that integrates Quota-guided Subspace Sampling (QSS) and Differentiable-decision-guided Structure Perception (DSP). The QSS module constructs stability-aware and low-redundancy feature subspaces through bootstrap-based feature consistency estimation, whereas the DSP module employs differentiable soft oblique decision structures to model nonlinear clinical interactions while preserving traceable decision evidence. The proposed framework was evaluated on a real-world VLBWI cohort comprising 51 infants and further validated on three public medical tabular datasets. On the primary cohort, QDSP achieved an accuracy of 0.9200 and an AUC of 0.9714, outperforming representative machine learning and deep tabular learning baselines, including XGBoost, TabNet, and TabPFN. Across external datasets, QDSP maintained competitive discrimination and calibration under varying sample sizes and clinical distributions. In addition, SHAP-based analyses and differentiable decision-path tracing identified clinically relevant predictors, including cystic periventricular leukomalacia (cPVL) and birth weight, consistent with established neonatal pathophysiological evidence. These results suggest that QDSP provides an interpretable and robust framework for discharge-time risk stratification in VLBWI and may support early individualized clinical decision-making in neonatal intensive care settings.

arXiv cs.LG•about 8 hours ago·5 min read

Measuring Poverty and Inequality with Reduced Data: A Machine Learning Approach Using Nigerian Household Data

arXiv:2606.07614v1 Announce Type: new Abstract: Reliable measurement of income and consumption is essential for monitoring poverty and inequality in low- and middle-income countries, yet full household surveys are costly and difficult to implement regularly. This paper examines whether reduced survey instruments can preserve key distributional information. We apply Random Forest Recursive Feature Elimination (RF-RFE) to the 2018/19 Nigeria General Household Survey-Panel to identify the income sources, consumption categories and household characteristics that best classify individuals within the welfare distribution. The analysis focuses on three outcomes: poverty status, location in the quintile distribution and position relative to the Gini-based inequality line. The survey's post-planting and post-harvest periods allow us to assess performance under different seasonal contexts. Results show that RF-RFE achieves strong classification accuracy with few predictors. For consumption, poverty status and inequality-line position are accurately predicted using a small set of expenditure categories, while quintile classification reaches about 80 percent accuracy for seasonal consumption and 60--65 percent for annual consumption predicted from a single seasonal visit. For income, poverty status reaches around 90 percent accuracy with five predictors, and inequality-line position is largely captured by labour earnings. The findings suggest that machine-learning methods can help improve survey design and reduce data requirements while retaining much of the distributional information needed to measure and monitor poverty and inequality.

arXiv cs.AI•about 8 hours ago·3 min read

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

arXiv:2602.08222v2 Announce Type: replace Abstract: As post-training optimization becomes central to improving large language models, we observe a persistent saturation bottleneck: once models grow highly confident, further training yields diminishing returns. While existing methods continue to reinforce target predictions, we find that informative supervision signals remain latent in models' own historical weak states. Motivated by this observation, we propose WMSS (Weak Agents Can Make Strong Agents Stronger), a post-training paradigm that leverages weak checkpoints to guide continued optimization. By identifying recoverable learning gaps via entropy dynamics and reinforcing them through compensatory learning, WMSS enables strong agents to improve beyond conventional post-training saturation. Experiments on mathematical reasoning and code generation datasets show that agents trained with our approach achieve effective performance improvements, while incurring zero additional inference cost.

arXiv cs.LG•about 8 hours ago·4 min read

STARIXNet: Multivariate and Multi-attribute Deep Learning Approach to Real-Time Resource Allocation in Cloud Platforms

arXiv:2606.07565v1 Announce Type: new Abstract: Intelligent scaling of microservices in cloud platforms is crucial for mitigating escalating compute costs while avoiding service disruptions. Current solutions are limited to the univariate space, typically focusing on CPU usage alone to drive scaling decisions. Moreover, they address the problem as a purely forecasting task, focusing on prediction precision while neglecting the greater risks of underestimation and delays in system responsiveness. Alternative solutions are computationally complex, making them impractical for large-scale, real-time deployments. To address these challenges, we present STARIXNet, a lightweight neural network that guides resource allocation decisions in the multivariate space by capturing spatio-temporal relationships among multiple system metrics. STARIXNet models multiple quasi-dependent attributes, in particular the (S)easonal, (T)emporal, (A)uto-(R)egressive (I)ntegrated, and e(X)ogenous patterns, then implements an aggregation policy to finalize scaling decisions, prioritizing service stability, followed by cost-efficiency, over raw forecast accuracy. We empirically demonstrate the performance of STARIXNet by benchmarking against existing solutions in real-world settings. STARIXNet is deployed for critical production microservices at Walmart achieving tangible savings ranging from 10\% to 50\%, in addition to intangible benefits through improved service stability and customer experience.

arXiv cs.LG•about 8 hours ago·4 min read

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

arXiv:2606.07587v1 Announce Type: new Abstract: LLM routing has become a popular approach to improve the cost-quality trade-off of LLM services by dynamically selecting a model for each query. Recent work has explored a broad range of routing methods, including clustering-based routers, learned classifiers, pairwise ranking, and confidence-based approaches. Our extensive study of 21 routing methods across five benchmarks reveals a consistent phenomenon that we call the routing plateau: many methods, including kNN, achieve very similar accuracy and converge to a narrow performance range that remains far below the oracle router. Our investigation shows that the plateau is largely caused by a predictability bottleneck: current routers mainly learn global averaged model-performance trends rather than fine-grained query-specific routing signals. As a result, they solve overlapping easy queries but collectively fail on hard queries that require instance-specific routing decisions. We further study how to move beyond the plateau and find that larger training datasets, stronger encoders, and end-to-end fine-tuning can further improve routing accuracy. These findings characterize the common limits of current routing methods and provide insights and actionable directions for the community to build more effective routing systems.

arXiv cs.LG•about 8 hours ago·4 min read

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

arXiv:2606.07592v1 Announce Type: new Abstract: Offline reinforcement learning requires careful conservatism to mitigate distribution shift, yet most existing methods apply a fixed penalty uniformly across all states regardless of local data coverage. We present UNIQ (Uncertainty-Informed Quantile), an offline RL method that introduces state-adaptive conservatism through conformally calibrated uncertainty estimation. Built on the Implicit Q-Learning (IQL) backbone, UNIQ trains a multi-expectile value ensemble, computes distribution-free uncertainty estimates using split conformal prediction, and maps the resulting signal to a state-dependent expectile that relaxes conservatism in well-covered regions while strengthening it in uncertain regions near the data frontier. On D4RL MuJoCo benchmarks, UNIQ consistently improves over IQL, with the largest gains observed on Walker2d and replay-heavy tasks. At the same time, UNIQ operates at near-IQL memory cost (approximately 250 MB peak VRAM), providing roughly a 10x reduction compared to EDAC. Rather than pursuing overall state-of-the-art performance, we position UNIQ as a practical mechanism contribution that improves the performance-efficiency trade-off in offline reinforcement learning.

arXiv cs.LG•about 8 hours ago·5 min read

Finite Certificates for In-Context Determinacy and a Threshold Theory of Emergence in Language Models

arXiv:2606.07623v1 Announce Type: new Abstract: This paper develops a model-theoretic framework for verifying context-conditioned language-model behavior by replacing benchmark labels with finite semantic certificates. The first problem is finite determinacy: when do examples in a context force the answer to a query without changing model parameters? In finite-field linear task families, we prove an exact row-space criterion, compute the residual hypothesis count, derive full and query-local identification curves, and show that extracting a smallest forcing subcontext is NP-complete even for binary outputs. The second problem is threshold emergence: when does an apparent benchmark jump reflect a semantic transition rather than a discontinuity of the scoring map? We prove an anti-mirage theorem separating thresholded metrics from semantic confidence and give a rate-sensitive crossing bound for latent commitments becoming visible above threshold. The common semantic object is a confidence functional on definable events. We show that it is a Boolean probability measure, equivalently a Keisler measure on the relevant type space, whose measure-one formulas form a proper filter and whose Stone-space representation is invariant under definitional expansion. The resulting calculus provides finite context certificates, pair-separator hitting sets, query teaching dimension, prompt-preservation criteria, and scale-limit witnesses. Exact-arithmetic ancillary scripts reproduce the finite-field and threshold calculations and generate the data used by the figures.

arXiv cs.AI•about 8 hours ago·4 min read

Seq103: A Unified Neuroevolution Framework for Compact Sequence Architecture Discovery

arXiv:2606.07664v1 Announce Type: cross Abstract: Neuroevolution is a representative neural architecture search paradigm that evolves both network topology and weights through evolutionary algorithms. In this paper, we propose Seq103, a unified NEAT-style neuroevolution framework for compact sequence architecture discovery. Seq103 consists of a shared evolutionary backbone and an optional recurrent extension. The shared backbone includes an elementary node-and-connection representation, per-class RMSE-based evaluation, mutation-based evolution with class-wise recombination, and elitism. The optional hidden-state mechanism extends the search space with hidden-state nodes and hidden connections, enabling temporal memory when step-wise recurrent inference is required. With this design, Seq103 applies the same core search pipeline to both step-wise recurrent and sample-wise feedforward sequence classification. In recurrent tasks, the hidden-state extension is enabled to provide temporal memory; in feedforward tasks, it is disabled while the shared evolutionary backbone remains unchanged. We evaluate Seq103 on 8 text classification datasets and the full UCRArchive2018 benchmark with 128 univariate time-series datasets. On step-wise tasks, Seq103 retains 86.96% of the best-baseline accuracy on average while using 34.6x to 3218.0x fewer parameters. On sample-wise tasks over the full UCRArchive2018 benchmark, Seq103 retains 81.95% of the best-baseline accuracy on average while using 11.8x to 160,601.0x fewer parameters.

arXiv cs.LG•about 8 hours ago·4 min read

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

arXiv:2606.07596v1 Announce Type: new Abstract: Fine-tuning often introduces spurious correlations alongside task knowledge, causing systematic failures on underrepresented groups. Existing mitigations require retraining, group labels, or curated counterfactual data. We show a simple post-hoc intervention reduces shortcut reliance without any of these: truncating the tail of the SVD of $\Delta W = W_\mathrm{ft} - W_\mathrm{base}$ reduces the spurious-group gap while preserving task accuracy. Across three instruction-tuned models ($0.5$B--$7$B) and four classification benchmarks, top-$k$ truncation reduces the gap on every cell at $<2$ pp accuracy loss, by up to $5\times$ on CivilComments. We propose this works because the shortcut response sits in the tail of the singular ordering of $\Delta W$, a claim about how truncation behaves rather than about the raw singular values, which are broadly distributed and look the same across all four datasets. A controlled boundary case in which fine-tuning has only a shortcut to learn shows the predicted FT-to-base collapse, and bottom-/random-$k$ and matched-rank LoRA controls rule out generic low-rank approximation and rank-constrained training as the explanation. We read this as preliminary evidence that the singular basis of $\Delta W$ is a useful coordinate system for studying what fine-tuning has learned.

arXiv cs.LG•about 8 hours ago·4 min read

Position: Genomic Model Research Must Move Beyond Anecdotal Evaluation of Interpretability Methods

arXiv:2606.07607v1 Announce Type: new Abstract: Advances in machine learning and computational power have unlocked the predictive potential of the human genome, yet biologists now demand that these models also elucidate the underlying biological mechanisms. While interpretable machine learning (IML) techniques have been increasingly applied to bridge this gap, there has been a pervasive reliance on anecdotal validation: the vast majority of research relies on a single IML method and reports only isolated successful instances. Through a benchmarking study on transcription factor binding, we demonstrate the risks of current practices. We show that different IML methods can often (1) yield contradictory explanations for the same predictions, (2) fail to localize known regulatory motifs, and (3) fail to faithfully reflect the model's internal decision process. In light of this, we argue for a validation framework analogous to clinical trials: just as trials require rigorous design and adverse-event reporting, genomic interpretability must move beyond cherry-picked plausibility toward systematic assessment of consistency, faithfulness, and biological validity. To facilitate this, we propose a tiered framework to guide rigorous evaluation and reporting of genomic IML methods.

arXiv cs.LG•about 8 hours ago·4 min read

Learning Transfers: Kan Extensions for Neural Invariants

arXiv:2606.07627v1 Announce Type: new Abstract: Transfer learning presumes that a representation learned on source tasks carries structure that remains usable on related target tasks. Standard evaluations probe this through target accuracy or distributional discrepancy, yet leave unspecified which structural invariant is meant to transfer. We supply that invariant categorically. A source task category $\mathcal A$, a target task category $\mathcal B$, and a task-change functor $J:\mathcal A\to\mathcal B$ determine, for every invariant-valued source representation $F:\mathcal A\to\mathcal V$, the universal transferred invariant $\operatorname{Lan}J F$. Given a target invariant $G:\mathcal B\to\mathcal V$, we define the transfer discrepancy $\operatorname{Comp}J(F,G)=\sup{b\in\operatorname{Ob}(\mathcal B)} d{\mathcal V}\bigl((\operatorname{Lan}_J F)(b),G(b)\bigr)$, evaluating transfer not by an objectwise comparison of source and target, but by comparing the target invariant against the one forced by the prescribed task transformation. We prove finite cokernel formulas for $(\operatorname{Lan}_J F)(b)$ in chain complexes and persistence modules, indexed by the comma category $J\downarrow b$. For persistence-valued finite-type one-parameter invariants, the discrepancy is computed exactly by bottleneck distances between barcodes. Controlled experiments on neural latent point clouds then test whether the score recovers the correct task functor and flags representation collapses that preserve classification accuracy while destroying transfer-relevant topology.

arXiv cs.LG•about 8 hours ago·4 min read

Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment

arXiv:2606.07632v1 Announce Type: new Abstract: Proper accounting of the energy requirements and environmental impact of artificial intelligence (AI) systems is necessary for researchers, developers, policy makers, and users to assess the barriers to building systems at scale. With the growing complexity of pipelines and underlying infrastructure needed to develop and deploy AI systems, previous approaches for evaluating AI efficiency which focus on the costs of a single training run or an individual inference prediction are no longer sufficient. In this position paper, we enunciate the need for applying life cycle assessment to evaluate the costs of the machine learning model development and deployment pipeline to properly account for the required resources and downstream impact. Life cycle assessments enable the incorporation of costs across the full life cycle of an AI system and its underlying infrastructure, from the embodied costs associated with the physical computing hardware through the operational costs in training and inference.

Latest News

Latest News

How Transformers Reject Wrong Answers: Rotational Dynamics of Factual Constraint Processing

Train at Moving Edge: Online-Verified Prompt Selection for Efficient RL Training of Large Reasoning Model

Improving the Performance and Learning Stability of Parallelizable RNNs Designed for Ultra-Low Power Applications

CrossVLA: Cross-Paradigm Post-Training and Inference Optimization for Vision-Language-Action Models

Locality-Aware Redundancy Pruning for LLM Depth Compression

Efficient and Scalable Provenance Tracking for LLM-Generated Code Snippets

SPIN: Decentralized Swarm Control via Tensorized Policy Coordination

When Should an AI Scientist Stop? Verifiable Experiment Steering and Refusal for Autonomous Discovery

QDSP: An Interpretable Structured Learning Framework for Predicting Death or Cerebral Palsy in Very Low Birth Weight Infants

Measuring Poverty and Inequality with Reduced Data: A Machine Learning Approach Using Nigerian Household Data

Weak-Driven Learning: How Weak Agents make Strong Agents Stronger

STARIXNet: Multivariate and Multi-attribute Deep Learning Approach to Real-Time Resource Allocation in Cloud Platforms

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

UNIQ: Conformal Calibration for Adaptive Conservatism in Offline Reinforcement Learning

Finite Certificates for In-Context Determinacy and a Threshold Theory of Emergence in Language Models

Seq103: A Unified Neuroevolution Framework for Compact Sequence Architecture Discovery

Shortcuts in the Tail: Debiasing via Post-Hoc Spectral Compression of Fine-Tuning Updates

Position: Genomic Model Research Must Move Beyond Anecdotal Evaluation of Interpretability Methods

Learning Transfers: Kan Extensions for Neural Invariants

Evaluation of ML Resource Utilization Requires Model Life Cycle Assessment