Beyond VLA: A Roundup of Embodied Vision + Action (VA) Work

具身智能之心 2025-07-14 12:00



For more in-depth material, you are welcome to join 具身智能之心知识星球 (the Embodied Intelligence Heart Knowledge Planet), the first full-stack embodied-intelligence learning community in China; it covers everything you might need.

We have previously been sharing VLA-related work. Here we compile embodied Vision + Action (VA) work for you, covering robotic manipulation, diffusion policies (DP), whole-body control, one-shot imitation, sim2real, end-to-end learning, and more. The content comes from the 具身智能之心知识星球 community.
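
Since diffusion policies (DP) appear throughout the lists below, here is a minimal illustrative sketch of the generic inference loop such vision+action policies share: encode the current camera observation, then iteratively denoise a randomly initialized action chunk and execute it receding-horizon style. This is not code from any of the listed papers; the noise-prediction network is a placeholder, and the schedule, dimensions, and all names are assumptions for illustration only.

```python
# Illustrative sketch only: a DDPM-style reverse process over an action chunk,
# conditioned on an encoded observation. Not taken from any listed paper.
import numpy as np

HORIZON, ACTION_DIM, STEPS = 16, 7, 50           # chunk length, action DoF, denoising steps (assumed)
betas = np.linspace(1e-4, 0.02, STEPS)           # linear noise schedule (assumed)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def noise_pred_net(noisy_actions, obs_feat, t):
    """Placeholder for the learned noise-prediction network
    (in practice a U-Net or transformer conditioned on visual features)."""
    return np.zeros_like(noisy_actions)          # dummy output, same shape as the input chunk

def sample_action_chunk(obs_feat):
    """Sample one action chunk by iteratively denoising Gaussian noise."""
    x = np.random.randn(HORIZON, ACTION_DIM)     # start from pure noise
    for t in reversed(range(STEPS)):
        eps = noise_pred_net(x, obs_feat, t)
        # Posterior mean of x_{t-1} given x_t and the predicted noise.
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x += np.sqrt(betas[t]) * np.random.randn(*x.shape)
    return x                                     # (HORIZON, ACTION_DIM) action chunk

obs_feat = np.random.randn(512)                  # stand-in for an encoded camera observation
actions = sample_action_chunk(obs_feat)
print(actions.shape)                             # (16, 7), executed receding-horizon style
```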

Works from 2025

  • [2025] Steering Your Diffusion Policy with Latent Space Reinforcement Learning
  • [2025] [ByteDance Seed] Chain-of-Action: Trajectory Autoregressive Modeling for Robotic Manipulation
  • [2025] [RSS 25] Unified Video Action Model
  • [2025] Streaming Flow Policy: Simplifying diffusion/flow-matching policies by treating action trajectories as flow trajectories
  • [2025] Modality-Composable Diffusion Policy via Inference-Time Distribution-level Composition
  • [2025] Adapt3R: Adaptive 3D Scene Representation for Domain Transfer in Imitation Learning
  • [2025] BEHAVIOR Robot Suite: Streamlining Real-World Whole-Body Manipulation for Everyday Household Activities
  • [2025] [RSS 25] Reactive Diffusion Policy: Slow-Fast Visual-Tactile Policy Learning for Contact-Rich Manipulation
  • [2025] Robotic World Model: A Neural Network Simulator for Robust Policy Optimization in Robotics
  • [2025] You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations
  • [2025] ASAP: Aligning Simulation and Real-World Physics for Learning Agile Humanoid Whole-Body Skills
  • [2025] VILP: Imitation Learning with Latent Video Planning
  • [2025] Learning the RoPEs: Better 2D and 3D Position Encodings with STRING
  • [2025] When Pre-trained Visual Representations Fall Short: Limitations in Visuo-Motor Robot Learning
  • [2025] RoboGrasp: A Universal Grasping Policy for Robust Robotic Control
  • [2025] CordViP: Correspondence-based Visuomotor Policy for Dexterous Manipulation in Real-World
  • [2025] Learning to Group and Grasp Multiple Objects
  • [2025] Beyond Behavior Cloning: Robustness through Interactive Imitation and Contrastive Learning
  • [2025] COMBO-Grasp: Learning Constraint-Based Manipulation for Bimanual Occluded Grasping
  • [2025] DexTrack: Towards Generalizable Neural Tracking Control for Dexterous Manipulation from Human References
  • [2025] S2-Diffusion: Generalizing from Instance-level to Category-level Skills in Robot Manipulation
  • [2025] MTDP: Modulated Transformer Diffusion Policy Model
  • [2025] FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
  • [2025] RHINO: Learning Real-Time Humanoid-Human-Object Interaction from Human Demonstrations
  • [2025] Responsive Noise-Relaying Diffusion Policy: Responsive and Efficient Visuomotor Control
  • [2025] Learning a High-quality Robotic Wiping Policy Using Systematic Reward Analysis and Visual-Language Model Based Curriculum
  • [2025] IMLE Policy: Fast and Sample Efficient Visuomotor Policy Learning via Implicit Maximum Likelihood Estimation
  • [2025] X-IL: Exploring the Design Space of Imitation Learning Policies
  • [2025] Towards Fusing Point Cloud and Visual Representations for Imitation Learning
  • [2025] Pick-and-place Manipulation Across Grippers Without Retraining: A Learning-optimization Diffusion Policy Approach
  • [2025] FACTR: Force-Attending Curriculum Training for Contact-Rich Policy Learning
  • [2025] DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning
  • [2025] Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
  • [2025] AnyDexGrasp: General Dexterous Grasping for Different Hands with Human-level Learning Efficiency
  • [2025] COMPASS: Cross-embOdiment Mobility Policy via ResiduAl RL and Skill Synthesis
  • [2025] Retrieval Dexterity: Efficient Object Retrieval in Clutters with Dexterous Hand
  • [2025] From planning to policy: distilling Skill-RRT for long-horizon prehensile and non-prehensile manipulation
  • [2025] FetchBot: Object Fetching in Cluttered Shelves via Zero-Shot Sim2Real
  • [2025] Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation
  • [2025] FuseGrasp: Radar-Camera Fusion for Robotic Grasping of Transparent Objects
  • [2025] Sensor-Invariant Tactile Representation
  • [2025] Generalist World Model Pre-Training for Efficient Reinforcement Learning
  • [2025] ProDapt: Proprioceptive Adaptation using Long-term Memory Diffusion
  • [2025] Falcon: Fast Visuomotor Policies via Partial Denoising
  • [2025] HGDiffuser: Efficient Task-Oriented Grasp Generation via Human-Guided Grasp Diffusion Models
  • [2025] SHADOW: Leveraging Segmentation Masks for Cross-Embodiment Policy Transfer
  • [2025] Phantom: Training Robots Without Robots Using Only Human Videos
  • [2025] General Force Sensation for Tactile Robot
  • [2025] Action Tokenizer Matters in In-Context Imitation Learning
  • [2025] AVR: Active Vision-Driven Robotic Precision Manipulation with Viewpoint and Focal Length Optimization
  • [2025] FRMD: Fast Robot Motion Diffusion with Consistency-Distilled Movement Primitives for Smooth Action Generation
  • [2025] Variable-Friction In-Hand Manipulation for Arbitrary Objects via Diffusion-Based Imitation Learning
  • [2025] Learning Dexterous In-Hand Manipulation with Multifingered Hands via Visuomotor Diffusion
  • [2025] RGBSQGrasp: Inferring Local Superquadric Primitives from Single RGB Image for Graspability-Aware Bin Picking
  • [2025] ArticuBot: Learning Universal Articulated Object Manipulation Policy via Large Scale Simulation
  • [2025] SRSA: Skill Retrieval and Adaptation for Robotic Assembly Tasks
  • [2025] GAGrasp: Geometric Algebra Diffusion for Dexterous Grasping
  • [2025] OPG-Policy: Occluded Push-Grasp Policy Learning with Amodal Segmentation
  • [2025] RA-DP: Rapid Adaptive Diffusion Policy for Training-Free High-frequency Robotics Replanning
  • [2025] Robotic Compliant Object Prying Using Diffusion Policy Guided by Vision and Force Observations
  • [2025] CoinRobot: Generalized End-to-end Robotic Learning for Physical Intelligence
  • [2025] Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects
  • [2025] How to Train Your Robots? The Impact of Demonstration Modality on Imitation Learning
  • [2025] One-Shot Dual-Arm Imitation Learning
  • [2025] GAT-Grasp: Gesture-Driven Affordance Transfer for Task-Aware Robotic Grasping
  • [2025] Enhanced View Planning for Robotic Harvesting: Tackling Occlusions with Imitation Learning
  • [2025] ES-Parkour: Advanced Robot Parkour with Bio-inspired Event Camera and Spiking Neural Network
  • [2025] NIL: No-data Imitation Learning by Leveraging Pre-trained Video Diffusion Models
  • [2025] World Modeling Makes a Better Planner: Dual Preference Optimization for Embodied Task Planning
  • [2025] RILe: Reinforced Imitation Learning
  • [2025] HumanoidPano: Hybrid Spherical Panoramic-LiDAR Cross-Modal Perception for Humanoid Robots
  • [2025] Distillation-PPO: A Novel Two-Stage Reinforcement Learning Framework for Humanoid Robot Perceptive Locomotion
  • [2025] Trinity: A Modular Humanoid Robot AI System
  • [2025] LiPS: Large-Scale Humanoid Robot Reinforcement Learning with Parallel-Series Structures
  • [2025] Elastic Motion Policy: An Adaptive Dynamical System for Robust and Efficient One-Shot Imitation Learning
  • [2025] Learning Gentle Grasping Using Vision, Sound, and Touch
  • [2025] RoboCopilot: Human-in-the-loop Interactive Imitation Learning for Robot Manipulation
  • [2025] Rethinking Bimanual Robotic Manipulation: Learning with Decoupled Interaction Framework
  • [2025] MoE-Loco: Mixture of Experts for Multitask Locomotion
  • [2025] Humanoid Policy ~ Human Policy
  • [2025] Dense Policy: Bidirectional Autoregressive Learning of Actions
  • [2025] Learning to Play Piano in the Real World
  • [2025] CCDP: Composition of Conditional Diffusion Policies with Guided Sampling
  • [2025] DyWA: Dynamics-adaptive World Action Model for Generalizable Non-prehensile Manipulation
  • [2025] AdaWorld: Learning Adaptable World Models with Latent Actions
  • [2025] Visuo-Tactile Object Pose Estimation for a Multi-Finger Robot Hand with Low-Resolution In-Hand Tactile Sensing
  • [2025] Empirical Analysis of Sim-and-Real Cotraining Of Diffusion Policies For Planar Pushing from Pixels
  • [2025] ManipTrans: Efficient Dexterous Bimanual Manipulation Transfer via Residual Learning
  • [2025] Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation
  • [2025] HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
  • [2025] ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
  • [2025] Learning Coordinated Bimanual Manipulation Policies using State Diffusion and Inverse Dynamics Models
  • [2025] Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets
  • [2025] RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics
  • [2025] Slot-Level Robotic Placement via Visual Imitation from Single Human Video
  • [2025] Robust Dexterous Grasping of General Objects from Single-view Perception
  • [2025] Two by Two: Learning Multi-Task Pairwise Objects Assembly for Generalizable Robot Manipulation
  • [2025] ZeroGrasp: Zero-Shot Shape Reconstruction Enabled Robotic Grasping
  • [2025] Novel Demonstration Generation with Gaussian Splatting Enables Robust One-Shot Manipulation
  • [2025] Grasping Deformable Objects via Reinforcement Learning with Cross-Modal Attention to Visuo-Tactile Inputs
  • [2025] Few-Shot Vision-Language Action-Incremental Policy Learning
  • [2025] Latent Diffusion Planning for Imitation Learning
  • [2025] Physically Consistent Humanoid Loco-Manipulation using Latent Diffusion Models
  • [2025] PRISM-DP: Spatial Pose-based Observations for Diffusion-Policies via Segmentation, Mesh Generation, and Pose Tracking
  • [2025] Rethinking Latent Representations in Behavior Cloning: An Information Bottleneck Approach for Robot Manipulation
  • [2025] Predictive Inverse Dynamics Models are Scalable Learners for Robotic Manipulation
  • [2025] Fast Flow-based Visuomotor Policies via Conditional Optimal Transport Couplings
  • [2025] KineDex: Learning Tactile-Informed Visuomotor Policies via Kinesthetic Teaching for Dexterous Manipulation
  • [2025] CLAM: Continuous Latent Action Models for Robot Learning from Unlabeled Demonstrations
  • [2025] H3DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
  • [2025] UniSkill: Imitating Human Videos via Cross-Embodiment Skill Representations
  • [2025] Learning Long-Context Diffusion Policies via Past-Token Prediction
  • [2025] DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
  • [2025] [ICLR 25] Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning
  • [2025] IN-RIL: Interleaved Reinforcement and Imitation Learning for Policy Fine-Tuning
  • [2025] NVSPolicy: Adaptive Novel-View Synthesis for Generalizable Language-Conditioned Policy Learning
  • [2025] EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
  • [2025] FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation
  • [2025] Conditioning Matters: Training Diffusion Policies is Faster Than You Think
  • [2025] H2R: A Human-to-Robot Data Augmentation for Robot Pre-training from Videos
  • [2025] GLOVER++: Unleashing the Potential of Affordance Learning from Human Behaviors for Robotic Manipulation
  • [2025] Zero-Shot Visual Generalization in Robot Manipulation
  • [2025] Object-Centric Representations Improve Policy Generalization in Robot Manipulation
  • [2025] LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
  • [2025] GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
  • [2025] A Practical Guide for Incorporating Symmetry in Diffusion Policy
  • [2025] Adaptive Visuo-Tactile Fusion with Predictive Force Attention for Dexterous Manipulation
  • [2025] EquAct: An SE(3)-Equivariant Multi-Task Transformer for Open-Loop Robotic Manipulation
  • [2025] Spatial RoboGrasp: Generalized Robotic Grasping Control Policy
  • [2025] Learning Generalizable Robot Policy with Human Demonstration Video as a Prompt
  • [2025] [AAAI 25] FlowPolicy: Enabling Fast and Robust 3D Flow-Based Policy via Consistency Flow Matching for Robot Manipulation
  • [2025] Object-centric 3D Motion Field for Robot Learning from Human Videos
  • [2025] Evaluating Robot Policies in a World Model
  • [2025] 3DFlowAction: Learning Cross-Embodiment Manipulation from 3D Flow World Model
  • [2025] SpikePingpong: High-Frequency Spike Vision-based Robot Learning for Precise Striking in Table Tennis Game
  • [2025] SAIL: Faster-than-Demonstration Execution of Imitation Learning Policies
  • [2025] Gondola: Grounded Vision Language Planning for Generalizable Robotic Manipulation
  • [2025] Touch begins where vision ends: Generalizable policies for contact-rich manipulation
  • [2025] AMPLIFY: Actionless Motion Priors for Robot Learning from Videos
  • [2025] GAF: Gaussian Action Field as a Dynamic World Model for Robotic Manipulation
  • [2025] Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation
  • [2025] Latent Action Diffusion for Cross-Embodiment Manipulation
  • [2025] Vision in Action: Learning Active Perception from Human Demonstrations
  • [2025] [IROS 25] Robust Instant Policy: Leveraging Student’s t-Regression Model for Robust In-context Imitation Learning of Robot Manipulation
  • [2025] [RSS 25] Dex1B: Learning with 1B Demonstrations for Dexterous Manipulation
  • [2025] DemoDiffusion: One-Shot Human Imitation using pre-trained Diffusion Policy
  • [2025] World4Omni: A Zero-Shot Framework from Image Generation World Model to Robotic Manipulation
  • [2025] ViTacFormer: Learning Cross-Modal Representation for Visuo-Tactile Dexterous Manipulation
  • [2025] [ICCV 25] Spatial-Temporal Aware Visuomotor Diffusion Policy Learning

Works from 2024

  • [2024] Learning Robotic Manipulation Policies from Point Clouds with Conditional Flow Matching
  • [2024] Point Cloud Matters: Rethinking the Impact of Different Observation Spaces on Robot Learning
  • [2024] [RSS 24] 3D Diffusion Policy: Generalizable Visuomotor Policy Learning via Simple 3D Representations
  • [2024] Sparse Diffusion Policy: A Sparse, Reusable, and Flexible Policy for Robot Learning
  • [2024] ManiCM: Real-time 3D Diffusion Policy via Consistency Model for Robotic Manipulation
  • [2024] [ICLR 25] Diffusion Policy Policy Optimization
  • [2024] Language-Guided Object-Centric Diffusion Policy for Collision-Aware Robotic Manipulation
  • [2024] EquiBot: SIM(3)-Equivariant Diffusion Policy for Generalizable and Data Efficient Learning
  • [2024] Equivariant Diffusion Policy
  • [2024] [IROS 25] Mamba Policy: Towards Efficient 3D Diffusion Policy with Hybrid Selective State Models
  • [2024] Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies
  • [2024] Motion Before Action: Diffusing Object Motion as Manipulation Condition
  • [2024] One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
  • [2024] Consistency Policy: Accelerated Visuomotor Policies via Consistency Distillation
  • [2024] SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
  • [2024] Few-Shot Task Learning through Inverse Generative Modeling
  • [2024] G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation
  • [2024] Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
  • [2024] Diffusion Policy Attacker: Crafting Adversarial Attacks for Diffusion-based Policies
  • [2024] Imagination Policy: Using Generative Point Cloud Models for Learning Manipulation Policies
  • [2024] Scaling Diffusion Policy in Transformer to 1 Billion Parameters for Robotic Manipulation
  • [2024] Data Scaling Laws in Imitation Learning for Robotic Manipulation
  • [2024] Hierarchical Diffusion Policy for Kinematics-Aware Multi-Task Robotic Manipulation
  • [2024] Learning Universal Policies via Text-Guided Video Generation
  • [2024] Crossway Diffusion: Improving Diffusion-based Visuomotor Policy via Self-supervised Learning
  • [2024] 3D Diffuser Actor: Policy Diffusion with 3D Scene Representations
  • [2024] Act3D: 3D Feature Field Transformers for Multi-Task Robotic Manipulation
  • [2024] GenDP: 3D Semantic Fields for Category-Level Generalizable Diffusion Policy
  • [2024] Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
  • [2024] Prediction with Action: Visual Policy Learning via Joint Denoising Process
  • [2024] Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations
  • [2024] Bidirectional Decoding: Improving Action Chunking via Closed-Loop Resampling
  • [2024] Streaming Diffusion Policy: Fast Policy Synthesis with Variable Noise Diffusion Models
  • [2024] CARP: Visuomotor Policy Learning via Coarse-to-Fine Autoregressive Prediction
  • [2024] In-Context Imitation Learning via Next-Token Prediction
  • [2024] Learning Diffusion Policies from Demonstrations For Compliant Contact-rich Manipulation

Works from 2023

  • [2023] Diffusion Policy: Visuomotor Policy Learning via Action Diffusion
  • [2023] Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods


