ViTacGen: Robotic Pushing with Vision-to-Touch Generation (IEEE RA-L 2025)

King's College London, University of Bristol

Abstract

Robotic pushing is a fundamental manipulation task that requires tactile feedback to capture the subtle contact forces and dynamics between the end-effector and the object. However, real tactile sensors often suffer from hardware limitations such as high cost and fragility, as well as deployment challenges involving calibration and variation across sensors, while vision-only policies struggle to achieve satisfactory performance. Inspired by humans' ability to infer tactile states from vision, we propose ViTacGen, a novel robot manipulation framework for visual robotic pushing with vision-to-touch generation in reinforcement learning, which eliminates the reliance on high-resolution real tactile sensors and enables effective zero-shot deployment on vision-only robotic systems. Specifically, ViTacGen consists of an encoder-decoder vision-to-touch generation network that generates contact depth images, a standardized tactile representation, directly from visual image sequences, followed by a reinforcement learning policy that fuses visual and generated tactile observations with contrastive learning. We validate the effectiveness of our approach in both simulation and real-world experiments, demonstrating superior performance and achieving a success rate of up to 86%.
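
As a rough illustration of the vision-to-touch generation idea, the minimal PyTorch sketch below maps a short stack of RGB frames to a single-channel contact depth image with an encoder-decoder. The module name, layer sizes, frame count, and resolution are assumptions made for illustration, not the architecture used in the paper.

```python
import torch
import torch.nn as nn

class VTGenSketch(nn.Module):
    """Illustrative encoder-decoder: a stack of RGB frames in, one
    contact depth image out. Layer sizes are placeholders only."""

    def __init__(self, n_frames: int = 4):
        super().__init__()
        # Encoder: the frame stack is concatenated along the channel axis.
        self.encoder = nn.Sequential(
            nn.Conv2d(3 * n_frames, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
        )
        # Decoder: upsample back to input resolution, one depth channel out.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),
            nn.Sigmoid(),  # contact depth normalized to [0, 1]
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (B, n_frames * 3, H, W) -> contact depth: (B, 1, H, W)
        return self.decoder(self.encoder(frames))

# Shape check with a 4-frame, 128x128 visual sequence.
depth = VTGenSketch()(torch.randn(2, 12, 128, 128))
assert depth.shape == (2, 1, 128, 128)
```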

Pipeline

The workflow of ViTacGen comprises two components: VT-Gen, a vision-to-touch generation network, and VT-Con, a reinforcement learning policy trained with contrastive learning on visual observations and the generated tactile contact depth images.
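
To make the contrastive objective in VT-Con concrete, the sketch below shows a symmetric InfoNCE-style loss between visual embeddings and embeddings of the generated contact depth images. The pairing scheme, temperature, and embedding size are assumptions for illustration; the paper's exact loss may differ.

```python
import torch
import torch.nn.functional as F

def infonce_sketch(z_vis: torch.Tensor, z_tac: torch.Tensor,
                   temperature: float = 0.1) -> torch.Tensor:
    """Symmetric InfoNCE over a batch: the visual embedding of a step and
    the embedding of its generated contact depth image form a positive
    pair; all other pairings in the batch act as negatives."""
    z_vis = F.normalize(z_vis, dim=-1)
    z_tac = F.normalize(z_tac, dim=-1)
    logits = z_vis @ z_tac.t() / temperature  # (B, B) similarity matrix
    labels = torch.arange(z_vis.size(0), device=z_vis.device)
    return 0.5 * (F.cross_entropy(logits, labels) +
                  F.cross_entropy(logits.t(), labels))

# Example with a batch of 8 paired 64-dimensional embeddings.
loss = infonce_sketch(torch.randn(8, 64), torch.randn(8, 64))
```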

Simulation Results

We build our simulation environment based on Tactile Gym 2.
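Tactile Gym environments follow the standard Gym interface, so a rollout loop has the usual shape. The environment id below is a hypothetical placeholder; consult the Tactile Gym 2 repository for the actual registered names and configuration options.

```python
import gym  # Tactile Gym 2 environments expose the standard Gym API

# "object_push-v0" is a hypothetical id used for illustration only.
env = gym.make("object_push-v0")

obs = env.reset()
done = False
while not done:
    # A random action stands in for the trained VT-Con policy.
    action = env.action_space.sample()
    # Classic Gym 4-tuple step; newer Gymnasium versions return 5-tuples.
    obs, reward, done, info = env.step(action)
env.close()
```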

Real-World Setting

Our real-world setup includes an external RGB camera, a UR5e robot arm, an end-effector without tactile sensing used only for pushing, and the object to be pushed.
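
For context, zero-shot visual-only deployment reduces to the loop sketched below: grab camera frames, generate a contact depth image with VT-Gen in place of a real sensor, and feed both to the VT-Con policy. Every function here is a hypothetical stub standing in for the camera driver, the trained networks, and the UR5e interface.

```python
import numpy as np

# All functions below are hypothetical stubs for illustration only.

def read_rgb_camera() -> np.ndarray:
    """Stub: grab one frame from the external RGB camera."""
    return np.zeros((128, 128, 3), dtype=np.uint8)

def vt_gen(frames: list) -> np.ndarray:
    """Stub: trained VT-Gen network producing a contact depth image."""
    return np.zeros((128, 128), dtype=np.float32)

def vt_con_policy(frames: list, depth: np.ndarray) -> np.ndarray:
    """Stub: trained VT-Con policy producing a planar pushing command."""
    return np.zeros(2, dtype=np.float32)

def send_ur5e_command(cmd: np.ndarray) -> None:
    """Stub: forward an end-effector command to the UR5e driver."""
    pass

frames = [read_rgb_camera() for _ in range(4)]  # short visual history
for _ in range(100):                            # control loop
    frames = frames[1:] + [read_rgb_camera()]
    depth = vt_gen(frames)                      # no real tactile sensor needed
    send_ur5e_command(vt_con_policy(frames, depth))
```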

Real-World Validation

Seen Objects
Unseen Objects

Quantitative Results

[Figure: quantitative results]