<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>1 | I-Chun (Arthur) Liu</title><link>https://arthurliu.netlify.app/publication-type/1/</link><atom:link href="https://arthurliu.netlify.app/publication-type/1/index.xml" rel="self" type="application/rss+xml"/><description>1</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 02 Feb 2026 01:00:00 +0000</lastBuildDate><image><url>https://arthurliu.netlify.app/media/icon_hu01dd9009026c55abe17115c16e2bf2fe_85898_512x512_fill_lanczos_center_3.png</url><title>1</title><link>https://arthurliu.netlify.app/publication-type/1/</link></image><item><title>CLAMP: Contrastive Learning for 3D Multi‑View Action‑Conditioned Robotic Manipulation Pretraining</title><link>https://arthurliu.netlify.app/publication/clamp/</link><pubDate>Mon, 02 Feb 2026 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/clamp/</guid><description/></item><item><title>D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation</title><link>https://arthurliu.netlify.app/publication/d-coda/</link><pubDate>Wed, 07 May 2025 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/d-coda/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured_d-coda.png" alt="D-CODA" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Learning bimanual manipulation is challenging due to its high dimensionality and the tight coordination required between the two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), an offline data augmentation method tailored to eye-in-hand bimanual imitation learning. D-CODA trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels. It employs constrained optimization to ensure that augmented states involving gripper-to-object contacts adhere to constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2250 simulation trials and 300 real-world trials demonstrate that it outperforms baselines and ablations, showing its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our project website is at: &lt;a href="https://dcodaaug.github.io/D-CODA/">https://dcodaaug.github.io/D-CODA/&lt;/a>.
&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2025.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu, &lt;a href="https://jasoonchen.com/" target="_blank" rel="noopener">Jason Chen&lt;/a>, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>, &lt;a href="https://danielseita.github.io/" target="_blank" rel="noopener">Daniel Seita&lt;/a>.&lt;/p></description></item><item><title>VoxAct‐B: Voxel‐Based Acting and Stabilizing Policy for Bimanual Manipulation</title><link>https://arthurliu.netlify.app/publication/voxact-b/</link><pubDate>Thu, 05 Sep 2024 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/voxact-b/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured_voxact-b.png" alt="VoxAct-B" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks.
To this end, we propose VoxAct-B, a language-conditioned, voxel-based method that leverages Vision Language Models (VLMs) to prioritize key regions within the scene and reconstruct a voxel grid.
We provide this voxel grid to our bimanual manipulation policy to learn acting and stabilizing actions.
This approach enables more efficient policy learning from voxels and is generalizable to different tasks. In simulation, we show that VoxAct-B outperforms strong baselines on fine-grained bimanual manipulation tasks. Furthermore, we demonstrate VoxAct-B on real-world Open Drawer and Open Jar tasks using two UR5s.
Code, data, and videos are available at &lt;a href="https://voxact-b.github.io">https://voxact-b.github.io&lt;/a>.
&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2024.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu, &lt;a href="https://hesicheng.net/" target="_blank" rel="noopener">Sicheng He&lt;/a>, &lt;a href="https://danielseita.github.io/" target="_blank" rel="noopener">Daniel Seita&lt;/a>*, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>*.&lt;/p>
&lt;p>* denotes equal advising.&lt;/p>
&lt;p>Blog post: &lt;a href="https://rasc.usc.edu/blog/voxact-b/" target="_blank" rel="noopener">https://rasc.usc.edu/blog/voxact-b/&lt;/a>.&lt;/p>
&lt;p>Twitter thread: &lt;a href="https://x.com/arthur801031/status/1851072842114482222" target="_blank" rel="noopener">https://x.com/arthur801031/status/1851072842114482222&lt;/a>.&lt;/p></description></item><item><title>Learning Robot Manipulation from Cross-Morphology Demonstration</title><link>https://arthurliu.netlify.app/publication/mail/</link><pubDate>Sat, 25 Feb 2023 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/mail/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-mail.gif" alt="Morphological Adaption in Imitation Learning (MAIL)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
&lt;p style="text-align: justify!important">
&lt;p>Some Learning from Demonstrations (LfD) methods handle small mismatches in the action spaces of the teacher and student. Here we address the case where the teacher’s morphology is substantially different from that of the student. Our framework, &lt;b>Morphological Adaptation in Imitation Learning (MAIL)&lt;/b>, bridges this gap allowing us to train an agent from demonstrations by other agents with significantly different morphologies. MAIL learns from suboptimal demonstrations, so long as they provide some guidance towards a desired solution. We demonstrate MAIL on manipulation tasks with rigid and deformable objects including 3D cloth manipulation interacting with rigid obstacles. We train a visual control policy for a robot with one end-effector using demonstrations from a simulated agent with two end-effectors. MAIL shows up to 24% improvement in a normalized performance metric over LfD and non-LfD baselines. It is deployed
to a real Franka Panda robot, handles multiple variations in properties for objects (size, rotation, translation), and cloth-specific properties (color, thickness, size, material).&lt;/p>
&lt;/p>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2023.&lt;/p>
&lt;p>Authors: &lt;a href="https://www.gautamsalhotra.com" target="_blank" rel="noopener">Gautam Salhotra&lt;/a> *, I-Chun Arthur Liu *, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p></description></item><item><title>Learning Deformable Object Manipulation from Expert Demonstrations</title><link>https://arthurliu.netlify.app/publication/dmfd/</link><pubDate>Mon, 20 Jun 2022 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/dmfd/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-dmfd.gif" alt="DMfD" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
We present a novel Learning from Demonstration (LfD) method, Deformable Manipulation from Demonstrations (DMfD), to solve deformable manipulation tasks using states or images as inputs, given expert demonstrations. Our method uses demonstrations in three different ways and balances the trade-off between exploring the environment online and using guidance from experts to explore high-dimensional spaces effectively. We test DMfD on a set of representative manipulation tasks for a 1-dimensional rope and a 2-dimensional cloth from the SoftGym suite of tasks, each with state and image observations. Our method exceeds baseline performance by up to 12.9% on state-based tasks and up to 33.44% on image-based tasks, with comparable or better robustness to randomness. Additionally, we create two challenging environments for folding a 2D cloth using image-based observations, and set a performance benchmark for them. We deploy DMfD on a real robot with a minimal loss in normalized performance during real-world execution compared to simulation (~6%).&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)&lt;/em>&lt;/b> and &lt;b>&lt;em>IEEE Robotics and Automation Letters (RA-L)&lt;/em>&lt;/b>, 2022.&lt;/p>
&lt;p>Authors: &lt;a href="https://www.gautamsalhotra.com" target="_blank" rel="noopener">Gautam Salhotra&lt;/a> *, I-Chun Arthur Liu *, &lt;a href="https://doku88.github.io/website.github.io/" target="_blank" rel="noopener">Marcus Dominguez-Kuhne&lt;/a>, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p>
&lt;p>Also published in &lt;a href="../../post/may-2022-one-paper-is-accepted-to-icra-2022-2nd-workshop-on-representing-and-manipulating-deformable-objects">ICRA 2022 2nd Workshop on Representing and Manipulating Deformable Objects&lt;/a>.&lt;/p>
</description></item><item><title>Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation</title><link>https://arthurliu.netlify.app/publication/mopa-pd/</link><pubDate>Fri, 12 Nov 2021 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/mopa-pd/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-mopa-pd.gif" alt="MoPA-PD" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Learning complex manipulation tasks in realistic, obstructed
environments is a challenging problem due to hard exploration in the presence
of obstacles and high-dimensional visual observations. Prior work tackles the
exploration problem by integrating motion planning and reinforcement learning.
However, the motion planner augmented policy requires access to state
information, which is often not available in real-world settings. To this
end, we propose to distill a state-based motion planner augmented policy to a
visual control policy via (1) visual behavioral cloning to remove the motion
planner dependency along with its jittery motion, and (2) vision-based
reinforcement learning with the guidance of the smoothed trajectories from the
behavioral cloning agent. We evaluate our method on three manipulation tasks
in obstructed environments and compare it against various reinforcement
learning and imitation learning baselines. The results demonstrate that our
framework is highly sample-efficient and outperforms the state-of-the-art
algorithms. Moreover, coupled with domain randomization, our policy is capable
of zero-shot transfer to unseen environment settings with distractors. Code
and videos are available at &lt;a href="https://clvrai.com/mopa-pd">https://clvrai.com/mopa-pd&lt;/a>.&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2021.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu*, &lt;a href="https://shagunuppal.github.io" target="_blank" rel="noopener">Shagun Uppal&lt;/a>*, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>, &lt;a href="https://viterbi-web.usc.edu/~limjj/" target="_blank" rel="noopener">Joseph J. Lim&lt;/a>, &lt;a href="http://www.peter-englert.net" target="_blank" rel="noopener">Peter Englert&lt;/a>, &lt;a href="https://youngwoon.github.io" target="_blank" rel="noopener">Youngwoon Lee&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p>
</description></item></channel></rss>