<?xml version="1.0" encoding="utf-8" standalone="yes"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>1 | I-Chun (Arthur) Liu</title><link>https://arthurliu.netlify.app/publication-type/1/</link><atom:link href="https://arthurliu.netlify.app/publication-type/1/index.xml" rel="self" type="application/rss+xml"/><description>1</description><generator>Wowchemy (https://wowchemy.com)</generator><language>en-us</language><lastBuildDate>Mon, 02 Feb 2026 01:00:00 +0000</lastBuildDate><image><url>https://arthurliu.netlify.app/media/icon_hu01dd9009026c55abe17115c16e2bf2fe_85898_512x512_fill_lanczos_center_3.png</url><title>1</title><link>https://arthurliu.netlify.app/publication-type/1/</link></image><item><title>CLAMP: Contrastive Learning for 3D Multi‑View Action‑Conditioned Robotic Manipulation Pretraining</title><link>https://arthurliu.netlify.app/publication/clamp/</link><pubDate>Mon, 02 Feb 2026 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/clamp/</guid><description/></item><item><title>D-CODA: Diffusion for Coordinated Dual-Arm Data Augmentation</title><link>https://arthurliu.netlify.app/publication/d-coda/</link><pubDate>Wed, 07 May 2025 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/d-coda/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured_d-coda.png" alt="D-CODA" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Learning bimanual manipulation is challenging due to its high dimensionality and the tight coordination required between the two arms. Eye-in-hand imitation learning, which uses wrist-mounted cameras, simplifies perception by focusing on task-relevant views. However, collecting diverse demonstrations remains costly, motivating the need for scalable data augmentation. While prior work has explored visual augmentation in single-arm settings, extending these approaches to bimanual manipulation requires generating viewpoint-consistent observations across both arms and producing corresponding action labels that are both valid and feasible. In this work, we propose Diffusion for COordinated Dual-arm Data Augmentation (D-CODA), an offline data augmentation method tailored to eye-in-hand bimanual imitation learning. D-CODA trains a diffusion model to synthesize novel, viewpoint-consistent wrist-camera images for both arms while simultaneously generating joint-space action labels. It employs constrained optimization to ensure that augmented states involving gripper-to-object contacts adhere to constraints suitable for bimanual coordination. We evaluate D-CODA on 5 simulated and 3 real-world tasks. Our results across 2250 simulation trials and 300 real-world trials demonstrate that it outperforms baselines and ablations, showing its potential for scalable data augmentation in eye-in-hand bimanual manipulation. Our project website is at: &lt;a href="https://dcodaaug.github.io/D-CODA/">https://dcodaaug.github.io/D-CODA/&lt;/a>.
&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2025.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu, &lt;a href="https://jasoonchen.com/" target="_blank" rel="noopener">Jason Chen&lt;/a>, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>, &lt;a href="https://danielseita.github.io/" target="_blank" rel="noopener">Daniel Seita&lt;/a>.&lt;/p></description></item><item><title>VoxAct‐B: Voxel‐Based Acting and Stabilizing Policy for Bimanual Manipulation</title><link>https://arthurliu.netlify.app/publication/voxact-b/</link><pubDate>Thu, 05 Sep 2024 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/voxact-b/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured_voxact-b.png" alt="VoxAct-B" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Bimanual manipulation is critical to many robotics applications. In contrast to single-arm manipulation, bimanual manipulation tasks are challenging due to higher-dimensional action spaces. Prior works leverage large amounts of data and primitive actions to address this problem, but may suffer from sample inefficiency and limited generalization across various tasks.
To this end, we propose VoxAct-B, a language-conditioned, voxel-based method that leverages Vision Language Models (VLMs) to prioritize key regions within the scene and reconstruct a voxel grid.
We provide this voxel grid to our bimanual manipulation policy to learn acting and stabilizing actions.
This approach enables more efficient policy learning from voxels and is generalizable to different tasks. In simulation, we show that VoxAct-B outperforms strong baselines on fine-grained bimanual manipulation tasks. Furthermore, we demonstrate VoxAct-B on real-world Open Drawer and Open Jar tasks using two UR5s.
Code, data, and videos are available at &lt;a href="https://voxact-b.github.io">https://voxact-b.github.io&lt;/a>.
&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2024.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu, &lt;a href="https://hesicheng.net/" target="_blank" rel="noopener">Sicheng He&lt;/a>, &lt;a href="https://danielseita.github.io/" target="_blank" rel="noopener">Daniel Seita&lt;/a>*, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>*.&lt;/p>
&lt;p>* denotes equal advising.&lt;/p>
&lt;p>Blog post: &lt;a href="https://rasc.usc.edu/blog/voxact-b/" target="_blank" rel="noopener">https://rasc.usc.edu/blog/voxact-b/&lt;/a>.&lt;/p>
&lt;p>Twitter thread: &lt;a href="https://x.com/arthur801031/status/1851072842114482222" target="_blank" rel="noopener">https://x.com/arthur801031/status/1851072842114482222&lt;/a>.&lt;/p></description></item><item><title>Learning Robot Manipulation from Cross-Morphology Demonstration</title><link>https://arthurliu.netlify.app/publication/mail/</link><pubDate>Sat, 25 Feb 2023 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/mail/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-mail.gif" alt="Morphological Adaption in Imitation Learning (MAIL)" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
&lt;p style="text-align: justify!important">
&lt;p>Some Learning from Demonstrations (LfD) methods handle small mismatches in the action spaces of the teacher and student. Here we address the case where the teacher’s morphology is substantially different from that of the student. Our framework, &lt;b>Morphological Adaptation in Imitation Learning (MAIL)&lt;/b>, bridges this gap allowing us to train an agent from demonstrations by other agents with significantly different morphologies. MAIL learns from suboptimal demonstrations, so long as they provide some guidance towards a desired solution. We demonstrate MAIL on manipulation tasks with rigid and deformable objects including 3D cloth manipulation interacting with rigid obstacles. We train a visual control policy for a robot with one end-effector using demonstrations from a simulated agent with two end-effectors. MAIL shows up to 24% improvement in a normalized performance metric over LfD and non-LfD baselines. It is deployed
to a real Franka Panda robot, handles multiple variations in properties for objects (size, rotation, translation), and cloth-specific properties (color, thickness, size, material).&lt;/p>
&lt;/p>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2023.&lt;/p>
&lt;p>Authors: &lt;a href="https://www.gautamsalhotra.com" target="_blank" rel="noopener">Gautam Salhotra&lt;/a> *, I-Chun Arthur Liu *, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p></description></item><item><title>Learning Deformable Object Manipulation from Expert Demonstrations</title><link>https://arthurliu.netlify.app/publication/dmfd/</link><pubDate>Mon, 20 Jun 2022 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/dmfd/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-dmfd.gif" alt="DMfD" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
We present a novel Learning from Demonstration (LfD) method, Deformable Manipulation from Demonstrations (DMfD), to solve deformable manipulation tasks using states or images as inputs, given expert demonstrations. Our method uses demonstrations in three different ways and balances the trade-off between exploring the environment online and using guidance from experts to explore high-dimensional spaces effectively. We test DMfD on a set of representative manipulation tasks for a 1-dimensional rope and a 2-dimensional cloth from the SoftGym suite of tasks, each with state and image observations. Our method exceeds baseline performance by up to 12.9% on state-based tasks and up to 33.44% on image-based tasks, with comparable or better robustness to randomness. Additionally, we create two challenging environments for folding a 2D cloth using image-based observations, and set a performance benchmark for them. We deploy DMfD on a real robot with a minimal loss in normalized performance during real-world execution compared to simulation (~6%).&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)&lt;/em>&lt;/b> and &lt;b>&lt;em>IEEE Robotics and Automation Letters (RA-L)&lt;/em>&lt;/b>, 2022.&lt;/p>
&lt;p>Authors: &lt;a href="https://www.gautamsalhotra.com" target="_blank" rel="noopener">Gautam Salhotra&lt;/a> *, I-Chun Arthur Liu *, &lt;a href="https://doku88.github.io/website.github.io/" target="_blank" rel="noopener">Marcus Dominguez-Kuhne&lt;/a>, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p>
&lt;p>Also published in &lt;a href="../../post/may-2022-one-paper-is-accepted-to-icra-2022-2nd-workshop-on-representing-and-manipulating-deformable-objects">ICRA 2022 2nd Workshop on Representing and Manipulating Deformable Objects&lt;/a>.&lt;/p>
</description></item><item><title>Distilling Motion Planner Augmented Policies into Visual Control Policies for Robot Manipulation</title><link>https://arthurliu.netlify.app/publication/mopa-pd/</link><pubDate>Fri, 12 Nov 2021 01:00:00 +0000</pubDate><guid>https://arthurliu.netlify.app/publication/mopa-pd/</guid><description>&lt;p>
&lt;figure >
&lt;div class="d-flex justify-content-center">
&lt;div class="w-100" >&lt;img src="../../uploads/featured-mopa-pd.gif" alt="MoPA-PD" loading="lazy" data-zoomable />&lt;/div>
&lt;/div>&lt;/figure>
&lt;/p>
&lt;h2>Abstract&lt;/h2>
Learning complex manipulation tasks in realistic, obstructed
environments is a challenging problem due to hard exploration in the presence
of obstacles and high-dimensional visual observations. Prior work tackles the
exploration problem by integrating motion planning and reinforcement learning.
However, the motion planner augmented policy requires access to state
information, which is often not available in real-world settings. To this
end, we propose to distill a state-based motion planner augmented policy to a
visual control policy via (1) visual behavioral cloning to remove the motion
planner dependency along with its jittery motion, and (2) vision-based
reinforcement learning with the guidance of the smoothed trajectories from the
behavioral cloning agent. We evaluate our method on three manipulation tasks
in obstructed environments and compare it against various reinforcement
learning and imitation learning baselines. The results demonstrate that our
framework is highly sample-efficient and outperforms the state-of-the-art
algorithms. Moreover, coupled with domain randomization, our policy is capable
of zero-shot transfer to unseen environment settings with distractors. Code
and videos are available at &lt;a href="https://clvrai.com/mopa-pd">https://clvrai.com/mopa-pd&lt;/a>.&lt;br>
&lt;br>
&lt;p>Accepted to &lt;b>&lt;em>Conference on Robot Learning (CoRL)&lt;/em>&lt;/b>, 2021.&lt;/p>
&lt;p>Authors: I-Chun Arthur Liu*, &lt;a href="https://shagunuppal.github.io" target="_blank" rel="noopener">Shagun Uppal&lt;/a>*, &lt;a href="https://robotics.usc.edu/~gaurav/" target="_blank" rel="noopener">Gaurav S. Sukhatme&lt;/a>, &lt;a href="https://viterbi-web.usc.edu/~limjj/" target="_blank" rel="noopener">Joseph J. Lim&lt;/a>, &lt;a href="http://www.peter-englert.net" target="_blank" rel="noopener">Peter Englert&lt;/a>, &lt;a href="https://youngwoon.github.io" target="_blank" rel="noopener">Youngwoon Lee&lt;/a>.&lt;/p>
&lt;p>* indicates equal contribution.&lt;/p>
</description></item></channel></rss>