RECALL proposes active recovery-data collection to fine-tune vision-language-action models

A new arXiv preprint introduces RECALL, an active continual-learning method for vision-language-action models that uses uncertainty to decide when to collect recovery demonstrations. The authors say it makes fine-tuning more efficient than passive data collection, but that training on the new data alone causes forgetting unless mitigations are applied.

A new arXiv preprint argues that robot policies built on vision-language-action (VLA) models can be fine-tuned more efficiently when recovery demonstrations are chosen actively, rather than gathered passively after failures occur.

The paper, RECALL: Recovery Experience Collection for Active Lifelong Learning in Vision-Language-Action Models, was posted on June 22, 2026 by Ulas Berk Karli and Tesca Fitzgerald. It frames the work as an empirical study of active continual learning for autoregressive VLA systems.

How RECALL works

The basic idea is to use uncertainty signals to decide when to collect recovery experiences. Instead of relying only on demonstrations gathered after a robot policy breaks, RECALL targets situations where the model appears uncertain and then uses those moments to collect training data.

According to the paper, that approach makes fine-tuning more efficient than passive demonstration collection. The authors position the method as a way to reduce wasted data collection during robot policy adaptation.
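As a rough illustration of uncertainty-triggered collection, the sketch below flags control steps where a policy's action-token distributions have high average entropy. This is an assumption-laden example, not the paper's method: the article does not say which uncertainty signal RECALL uses, and the function names, the entropy measure, and the threshold value are all hypothetical.

```python
import math
from typing import List, Sequence

# Hypothetical sketch: entropy is ONE possible uncertainty signal.
# The article does not state which signal RECALL actually uses.

def token_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy (in nats) of a single action-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_action_entropy(step_probs: Sequence[Sequence[float]]) -> float:
    """Average entropy over the action tokens decoded at one control step."""
    return sum(token_entropy(p) for p in step_probs) / len(step_probs)

def should_request_recovery(step_probs: Sequence[Sequence[float]],
                            threshold: float = 1.0) -> bool:
    """Flag a step for recovery-data collection when uncertainty is high.

    The threshold is an illustrative value, not one taken from the paper.
    """
    return mean_action_entropy(step_probs) > threshold

def collect_recovery_steps(rollout: Sequence[Sequence[Sequence[float]]],
                           threshold: float = 1.0) -> List[int]:
    """Return the indices of rollout steps where a demonstration is requested."""
    return [t for t, step_probs in enumerate(rollout)
            if should_request_recovery(step_probs, threshold)]

# Usage: a three-step rollout, one action token per step, four-way vocabulary.
confident = [[0.97, 0.01, 0.01, 0.01]]
uncertain = [[0.25, 0.25, 0.25, 0.25]]
flagged = collect_recovery_steps([confident, uncertain, confident])
```

In this toy rollout only the middle step is flagged, so a demonstrator would be asked for a recovery example at that point instead of after a full task failure.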

The forgetting tradeoff

The paper also highlights the downside: training only on the newly collected recovery data leads to catastrophic forgetting. In other words, the model can get better at the new recovery behavior while losing performance on older skills.

To address that, the authors test continual-learning mitigations, including replay-based data mixing and elastic weight consolidation. The study sits in a fast-moving corner of robotics ML that is already debating how much pretrained VLA models forget during continual learning.

That context matters because a March 2026 arXiv paper, Pretrained Vision-Language-Action Models are Surprisingly Resistant to Forgetting in Continual Learning, reported that pretrained VLAs can be more robust to forgetting than expected, especially when replay is used. A 2024 paper on OpenVLA helped establish the broader open-source VLA fine-tuning setting, while a 2025 paper on inference-time introspection reflects the field's growing interest in uncertainty signals that trigger requests for help.
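The two mitigations named above, replay-based data mixing and elastic weight consolidation (EWC), can be sketched in a few lines. This is a minimal illustration under stated assumptions rather than the authors' implementation: the mixing ratio, the function names, and the flat parameter lists are hypothetical. The EWC term follows the standard quadratic penalty, which weights each parameter's drift from its earlier value by a diagonal Fisher-information estimate.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def mixed_batch(new_data: Sequence[T], replay_data: Sequence[T],
                batch_size: int, replay_fraction: float,
                rng: random.Random) -> List[T]:
    """Build one training batch that mixes new and replayed examples.

    replay_fraction is an illustrative knob; the article does not report
    the ratio used in the paper.
    """
    n_replay = round(batch_size * replay_fraction)
    n_new = batch_size - n_replay
    # Sample with replacement from each pool, then shuffle the combined batch.
    batch = [rng.choice(replay_data) for _ in range(n_replay)]
    batch += [rng.choice(new_data) for _ in range(n_new)]
    rng.shuffle(batch)
    return batch

def ewc_penalty(params: Sequence[float], anchor_params: Sequence[float],
                fisher_diag: Sequence[float], strength: float) -> float:
    """Standard EWC regularizer: (strength / 2) * sum_i F_i * (p_i - a_i)^2.

    anchor_params are the weights before fine-tuning; fisher_diag estimates
    how important each weight was for the earlier tasks.
    """
    return 0.5 * strength * sum(
        f * (p - a) ** 2
        for p, a, f in zip(params, anchor_params, fisher_diag)
    )

# Usage: a batch of 8 with a quarter drawn from older data.
rng = random.Random(0)
batch = mixed_batch(["new"] * 5, ["old"] * 5, batch_size=8,
                    replay_fraction=0.25, rng=rng)
# The penalty would be added to the task loss during fine-tuning.
penalty = ewc_penalty([1.0, 2.0], [1.0, 1.0], [0.5, 2.0], strength=10.0)
```

Replay works on the data side by keeping older examples in every batch, while EWC works on the parameter side by discouraging changes to weights that mattered for earlier tasks. The two can be used separately or together.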

What is still unknown

For now, RECALL is a preprint, not a peer-reviewed paper. No independent coverage had been found at the time of writing, and the open questions are practical ones: whether the authors will release code or a project page, which robot tasks and datasets they used, how large the efficiency gains are, and how well the different mitigation methods limit forgetting.

That makes RECALL an early result worth tracking rather than a settled finding. The next step is to watch for code, benchmarks, and reproductions from other robotics labs.
