RoboChemist: Vision-Language-Action Models for Robotic Chemistry

CoRL 2025


Zongzheng Zhang*1,2, Chenghao Yue*1, Haobo Xu*1,
Minwen Liao1, Xianglin Qi1, Huan-ang Gao1, Ziwei Wang3, Hao Zhao^1,2

*Equal Contribution ^Corresponding Author
1Institute for AI Industry Research (AIR), Tsinghua University
2Beijing Academy of Artificial Intelligence (BAAI)
3Nanyang Technological University
Pipeline (played at 2x speed)

Abstract

Robotic chemists promise to both liberate human experts from repetitive tasks and accelerate scientific discovery, yet remain in their infancy. Chemical experiments involve long-horizon procedures over hazardous and deformable substances, where success requires not only task completion but also strict compliance with experimental norms. To address these challenges, we propose RoboChemist, a dual-loop framework that integrates Vision-Language Models (VLMs) with Vision-Language-Action (VLA) models. Unlike prior VLM-based systems (e.g., VoxPoser, ReKep) that rely on depth perception and struggle with transparent labware, and existing VLA systems (e.g., RDT, π0) that lack semantic-level feedback for complex tasks, our method leverages a VLM to serve as (1) a planner to decompose tasks into primitive actions, (2) a visual prompt generator to guide VLA models, and (3) a monitor to assess task success and regulatory compliance. Notably, we introduce a VLA interface that accepts image-based visual targets from the VLM, enabling precise, goal-conditioned control. Our system successfully executes both primitive actions and complete multi-step chemistry protocols. Results show a 23.57% higher success rate and a 0.298 increase in compliance rate over state-of-the-art VLA baselines, while also demonstrating strong generalization to novel objects and tasks.
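To make the dual-loop structure concrete, the sketch below wires a VLM planner/prompter/monitor around a goal-conditioned VLA policy. Every name in it (VLM, VLA, run_experiment, the camera.capture() interface) is a hypothetical placeholder rather than the released RoboChemist API; it only illustrates how the roles described above could fit together.

from typing import List, Tuple


class VLM:
    """Placeholder for the vision-language model (planner, visual prompter, monitor)."""

    def plan(self, task: str, image: bytes) -> List[str]:
        """Decompose a high-level experiment into primitive-action instructions."""
        raise NotImplementedError

    def visual_prompt(self, instruction: str, image: bytes) -> bytes:
        """Annotate the current image with the visual target (e.g., a grasp point or pour target)."""
        raise NotImplementedError

    def monitor(self, instruction: str, image: bytes) -> Tuple[bool, bool]:
        """Return (subtask completed, experimental norms respected)."""
        raise NotImplementedError


class VLA:
    """Placeholder for the goal-conditioned vision-language-action policy."""

    def execute(self, instruction: str, prompted_image: bytes) -> None:
        """Run low-level control conditioned on the instruction and the prompted goal image."""
        raise NotImplementedError


def run_experiment(task: str, vlm: VLM, vla: VLA, camera, max_retries: int = 2) -> bool:
    """Outer loop: plan once, then execute each subtask under monitor feedback."""
    subtasks = vlm.plan(task, camera.capture())
    for instruction in subtasks:
        for _ in range(max_retries + 1):
            frame = camera.capture()
            prompted = vlm.visual_prompt(instruction, frame)  # image-based visual target
            vla.execute(instruction, prompted)                # inner (VLA) control loop
            done, compliant = vlm.monitor(instruction, camera.capture())
            if done and compliant:
                break                                         # proceed to the next subtask
        else:
            return False                                      # subtask failed after all retries
    return True

In this reading, the VLM forms the outer loop (plan and verify at the subtask level) while the VLA forms the inner loop (continuous visuomotor control within each subtask), which is what makes the framework dual-loop.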

RoboChemist Teaser

(a) Overview of RoboChemist. The VLM in our system acts as the planner, decomposing high-level tasks into subtasks. Based on each subtask, the VLM generates prompted images through visual prompting and provides them, along with other relevant information, to the VLA models. The VLM also functions as the monitor, assessing the completion status of each subtask and thereby closing the feedback loop (a sketch of this monitoring query follows the caption).

(b) RoboChemist outperforms baselines in both primitive tasks and complete chemical experiment tasks.

(c) Some tasks performed by RoboChemist.
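The monitor role referenced in (a) can be realized as a single structured VLM query per subtask. The snippet below is a minimal sketch assuming an OpenAI-compatible vision chat endpoint; the model name, prompt wording, and JSON schema are illustrative assumptions, not the exact ones used in RoboChemist.

import base64
import json
from openai import OpenAI

client = OpenAI()

MONITOR_PROMPT = (
    "You are supervising a robotic chemistry experiment. "
    "The robot just attempted the subtask: '{subtask}'. "
    "Given the current camera image, answer in JSON as "
    '{{"success": true|false, "compliant": true|false, "reason": "..."}}, '
    "where 'compliant' means the experimental norms (e.g., no spills, "
    "a correct pouring angle) were respected."
)


def monitor_subtask(subtask: str, image_path: str) -> dict:
    """Ask the VLM whether the subtask succeeded and stayed within the experimental norms."""
    with open(image_path, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()
    response = client.chat.completions.create(
        model="gpt-4o",  # any chat model with vision input could stand in here
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": MONITOR_PROMPT.format(subtask=subtask)},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
        response_format={"type": "json_object"},
    )
    return json.loads(response.choices[0].message.content)

Because the check covers both success and compliance, a subtask that completes but violates the norms can still trigger a retry in the outer loop.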


Primitive Tasks

In this section, we present videos of the seven primitive tasks used to build the complete tasks, along with their corresponding visual prompt examples. Each task is shown with two example prompted images and a video played at 1x speed.


"Grasp the Glass Rod"
Prompted Image
Prompted Image
Video
"Heat Platium Wire"
Prompted Image
Prompted Image
Video
"Insert Into Solution"
Prompted Image
Prompted Image
Video
"Pour Liquid"
Prompted Image
Prompted Image
Video
"Stir Liquid"
Prompted Image
Prompted Image
Video
"Transfer Solid"
Prompted Image
Prompted Image
Video
"Press Button"
Prompted Image
Prompted Image
Video



Complete Tasks

In this section, we present videos of several complete tasks that our RoboChemist can perform. The videos are played at 1x speed.


Mixing NaCl and CuSO\(_4\) Solutions
Thermal Decomposition of Cu(OH)\(_2\)
Flame Test of CuSO\(_4\) Solution
Evaporation of NaCl Solution



Generalization

In this section, we present videos of several generalization tasks that our RoboChemist can perform. The videos are played at 1x speed.

Primitive Task Generalization

Place Glass Rod
Grasp Test Tube
Stir Solid Reagents
Heat Test Tube
Insert a Thermometer
Place Test Tube into Cooling Liquid

Complete Task Generalization

Combination Reaction: CaO+H\(_2\)O
Decomposition Reaction: H\(_2\)O\(_2\)
Displacement Reaction: Fe+CuSO\(_4\)
Displacement Reaction: Zn+HCl
Double Displacement Reaction: NaOH+CuSO\(_4\)
Double Displacement Reaction: NaHCO\(_3\)+HCl


Citation

We kindly request that you cite our work if you utilize the code or reference our findings in your research.

  @inproceedings{zhang2025robochemist,
    title={RoboChemist: Vision-Language-Action Models for Robotic Chemistry},
    author={Zhang, Zongzheng and Yue, Chenghao and Xu, Haobo and Liao, Minwen and Qi, Xianglin and Gao, Huan-ang and Wang, Ziwei and Zhao, Hao},
    booktitle={Conference on Robot Learning (CoRL)},
    year={2025}
  }