Learning adaptive policies that transfer to unseen environments remains challenging in generalizable visual reinforcement learning (RL). Existing methods commonly learn a robust policy via data augmentation and domain randomization for better generalization. Because the target environment is unobservable during training, these methods cannot exploit reward signals to fine-tune the model and adaptively transfer it to new scenarios. In this work, we first investigate how a visual RL agent can benefit from test-time adaptation. Surprisingly, we find that optimizing the Batch Normalization layers alone can significantly improve the generalization of visual RL. We therefore propose Mix Test-time Batch Normalization (MixTBN), a lightweight test-time adaptation algorithm that adaptively transfers the learned policy to unseen environments without introducing any additional parameters. By solely mixing the statistics of the Batch Normalization layers, our method achieves state-of-the-art performance on two robotic manipulation tasks. Extensive ablation experiments demonstrate the effectiveness of each component of our method.
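The core idea of mixing Batch Normalization statistics at test time can be sketched as follows. This is a minimal illustration, not the paper's implementation: the mixing coefficient `alpha` and the linear interpolation rule between stored training statistics and current test-batch statistics are assumptions, since the abstract does not specify the exact mixing scheme.

```python
import numpy as np

def mix_tbn_normalize(x, running_mean, running_var, alpha=0.5, eps=1e-5):
    """Normalize a test batch by mixing stored (training) BN statistics
    with statistics computed on the current test batch.

    alpha = 1.0 uses only the stored training statistics (standard BN
    at inference); alpha = 0.0 uses only the test batch's own statistics.
    The interpolation rule here is a hypothetical sketch of the idea.
    """
    batch_mean = x.mean(axis=0)
    batch_var = x.var(axis=0)
    mean = alpha * running_mean + (1.0 - alpha) * batch_mean
    var = alpha * running_var + (1.0 - alpha) * batch_var
    return (x - mean) / np.sqrt(var + eps)

# Toy usage: features from a shifted "test environment" whose
# distribution differs from the training statistics (mean 0, var 1).
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=(64, 8))
y = mix_tbn_normalize(x, running_mean=np.zeros(8),
                      running_var=np.ones(8), alpha=0.0)
```

Note that no new parameters are introduced: adaptation happens purely by re-estimating and blending normalization statistics, which matches the abstract's claim of a parameter-free transfer mechanism.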
MixTBN: A Fully Test-Time Adaptation Method for Visual Reinforcement Learning on Robotic Manipulation
2023-10-11
2618619 bytes
Conference paper
Electronic Resource
English
Reinforcement learning for robotic manipulation