RMP explained
If you want a concise account of how the framework works, plus a quick-reference list of formulas, the RMP paper already suffices. This blog is a more elaborate treatment that additionally explains the "why" of RMP, namely, the motivations for designing this framework and the intuitions behind the designs.
The primary reference for this post is RMPflow: A computational graph for automatic motion policy generation. Credit also goes to Jason Jz Liu, who taught me everything I know about RMPs and a large amount of the control-theory intuition behind them.
Motivation
When people talk about robot motion, they often start with motion planning. In the classical sense, motion planning is a global problem: given a start and a goal, find a valid path that connects them while avoiding collisions. This viewpoint is extremely useful, but it is only part of the story. A robot moving in the real world does not just need a path. It also needs a way to continuously decide how to move at the current moment, based on its current state and whatever is happening nearby. This is where local motion generation enters. Instead of solving for one complete trajectory ahead of time, local motion generation focuses on the question: given the robot’s current state, what should it do next? A good local controller should be reactive. If the robot is nudged away from its expected path, if an obstacle moves unexpectedly, or if the environment changes slightly, the robot should not have to throw away everything and restart from scratch. It should be able to respond immediately based on the local situation.
That intuition is important because it shifts the emphasis from “find me one path” to “define a behavior.” In local motion generation, we are no longer just interested in a single solution trajectory. We want a rule that tells the robot how to move from many possible nearby states. In that sense, good local motion generation is really about designing motion behaviors that remain responsive as the state changes.
This picture becomes clearer when we look at some common older approaches and where they fall short.
- Many methods are fundamentally trajectory-centric rather than reactive. They compute one desired trajectory and then ask the robot to track it. This works well when reality matches the plan, but it can become brittle when the robot is perturbed. Imagine a robot arm moving along a carefully planned path and being pushed away slightly during execution. If the underlying method is too tied to a single trajectory, the robot does not naturally “know” what to do from the new state. It only knows the original path it was supposed to follow. A reactive local behavior should instead tell the robot how to continue moving sensibly from wherever it currently is.
- Many methods are too tightly bound to a single task space. Some controllers are written directly in joint space, where they are convenient for low-level execution but awkward for expressing geometric goals like reaching, orientation, or obstacle avoidance. Others are written in end-effector space, where reaching is natural but posture or joint constraints become harder to express. In practice, different objectives belong naturally in different spaces. Reaching is naturally described in end-effector position space. Orientation control may live in an orientation-related space. Joint limits are naturally expressed in joint coordinates. Obstacle avoidance may be defined relative to distances or body-point spaces. A good framework should not force every objective into one representation.
- Many motion-generation schemes rely on strict hierarchies. A classic example is stack-of-tasks or null-space control, where one task is declared highest priority and lower-priority tasks are projected into whatever freedom remains. This can be very useful, but it also introduces a rigid structure: one behavior wins first, another gets whatever is left over, and trade-offs are handled through hard precedence rather than smooth blending. In many real situations, though, behavior should interpolate continuously. A robot should not have to choose between “only follow the goal” and “only avoid the obstacle.” It should be able to balance both in a smooth, state-dependent way.
From these shortcomings, we can extract three things that a good local motion-generation framework should provide.
- reactivity. The framework should define behavior as a function of the current state, not as commitment to a single precomputed trajectory. In other words, the robot should always have a local answer to the question: what motion should I generate from here?
- flexibility across task spaces. Different objectives should be definable in the spaces where they make the most sense. A reaching behavior should be easy to define in end-effector space. A posture behavior should be easy to define in joint space. A collision behavior should be easy to define relative to obstacles. These behaviors should then be transformable into a common representation without losing their meaning.
- smooth interpolation between objectives. Real robot behavior is rarely all-or-nothing. Multiple local goals usually matter at once: move toward the target, avoid obstacles, maintain a reasonable posture, respect limits, perhaps also orient the gripper in a useful way. A good framework should allow these objectives to blend continuously rather than forcing them into a rigid priority ladder.
This leads naturally to the idea of behavioral design. In an ideal world, we would like to define many different motion-generation objectives, each as its own local behavior, and then combine them under one unified mathematical framework. Each behavior would specify what kind of motion it wants. Then, when multiple behaviors are active at once, the framework would provide a principled way to interpolate between them. This is exactly the perspective that gives rise to Riemannian Motion Policies.
Basic Notations
Let’s introduce some basic notations.
Let $q(t)\in Q\subset \mathbb{R}^d$ be the $d$-dimensional configuration (usually joint angles) of the robot at time $t$. $\dot{q}, \ddot{q}$ are the joint velocities and accelerations.
Let $x_i(t)\in X_i\subseteq\mathbb{R}^{k_i}$ denote the $k_i$-dimensional task variable (usually end-effector pose) in the $i$th task space $X_i$ at time $t$, with associated velocities and accelerations given by $\dot{x}_i$ and $\ddot{x}_i$.
The differentiable task map $\phi_i:\mathbb{R}^d\to\mathbb{R}^{k_i}$ relates the configuration space to the $i$th task space, so that $x_i=\phi_i(q)$.
For example, if $x_i$ is the end-effector pose, then $\phi_i$ is the forward-kinematics map of the robot.
The subscript $i$ is dropped hereafter for simplicity.
Given a Jacobian of a task map $\phi$ as
\[J_\phi\equiv\frac{\partial\phi}{\partial q}\in\mathbb{R}^{k\times d}\]then the task space velocity and acceleration is given by
\[\begin{align} \dot{x}&=\frac{d}{dt}\phi(q)=J_\phi\dot{q}\\ \ddot{x}&=\frac{d^2}{dt^2}\phi(q)=J_\phi\ddot{q}+\dot{J}_\phi\dot{q}\approx J_\phi\ddot{q} \end{align}\]where the approximation drops the second-order correction term $\dot{J}_\phi\dot{q}$; in practice this is acceptable because the integration steps in control loops are small.
The subscript of the Jacobian is dropped when it is clear from the context.
Motion Policy
At the most basic level, an RMP starts with a motion policy, which mathematically describes an intended local behavior. A motion policy $f:(q, \dot{q})\mapsto\ddot{q}$ is a dynamical system mapping position and velocity to acceleration. However, since the configuration space is simply the task space whose $\phi$ is the identity map, we can write $f:(x, \dot{x})\mapsto\ddot{x}$ without loss of generality, which simplifies what follows.
A motion policy encodes an infinite bundle of trajectories in its integral curves, which makes the robot reactive at any state of $(q, \dot{q})$. For any initial state, the policy generates a trajectory through forward integration. A trajectory is represented as
\[\begin{align} q(t+1)&=q(t)+\dot{q}(t)\Delta t\\ \dot{q}(t+1)&=\dot{q}(t)+f(q(t), \dot{q}(t))\Delta t \end{align}\]The robot's underlying control system can then accept $(q, \dot{q}, \ddot{q})$ targets to execute the motion.
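As a minimal sketch of this forward integration, here is an Euler rollout of a motion policy; the damped-spring policy and its gains are illustrative assumptions, not a policy from the paper:

```python
import numpy as np

def rollout(f, q0, qd0, dt=0.01, steps=500):
    """Integrate a motion policy f(q, qd) -> qdd with explicit Euler steps."""
    q, qd = np.asarray(q0, float), np.asarray(qd0, float)
    traj = [q.copy()]
    for _ in range(steps):
        qdd = f(q, qd)
        q = q + qd * dt        # q(t+1) = q(t) + qd(t) * dt
        qd = qd + qdd * dt     # qd(t+1) = qd(t) + f(q(t), qd(t)) * dt
        traj.append(q.copy())
    return np.array(traj)

# A simple damped attractor toward the origin (illustrative gains).
policy = lambda q, qd: -10.0 * q - 4.0 * qd

traj = rollout(policy, q0=[1.0, -0.5], qd0=[0.0, 0.0])
```

Starting from a different initial state gives a different trajectory of the same bundle, which is exactly what makes the policy reactive.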
But that alone is not enough, because once several behaviors interact, we also need a way to say how important each one is, and in what directions. That is where the metric enters. Informally, the motion policy says what motion is preferred, and the metric says how strongly that preference should matter at the current state.
Riemannian Metric
A Riemannian metric (often just called a metric) $A$ is a symmetric positive semidefinite matrix defined on the tangent space (the vector space of all instantaneous velocities at a point on a curve) that measures the inner product between two tangent vectors $u$ and $v$ as $\langle u, v\rangle_A=u^TAv$.
Symmetric:
\[A^T=A\]Positive semidefinite: for every $z$
\[z^TAz\geq 0\]Thus $\|u\|^2_A=u^T Au$. Being symmetric positive semidefinite gives the metric desirable properties.
For a given Riemannian metric $A$, we have the following results
\[\begin{align} \dot{x}^TA\dot{x}&=(J\dot{q})^TA(J\dot{q})=\dot{q}^T(J^TAJ)\dot{q}\\ \ddot{x}^TA\ddot{x}&=(J\ddot{q})^TA(J\ddot{q})=\ddot{q}^T(J^TAJ)\ddot{q} \end{align}\]Therefore,
\[\begin{align} \|\dot{x}\|^2_A&=\|\dot{q}\|^2_B\\ \|\ddot{x}\|^2_A&=\|\ddot{q}\|^2_B \end{align}\]where $B\equiv J^TAJ$ is the metric in the domain of the map that mimics the metric $A$ in the codomain. Note that this gives us a way to transform a metric defined in some task space into a new vector space. We can therefore define metrics in arbitrary task spaces and unify them by transforming them into a single space of interest, such as the joint space.
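A quick numerical check of this identity, with arbitrarily chosen dimensions and a random PSD metric:

```python
import numpy as np

rng = np.random.default_rng(0)
J = rng.standard_normal((3, 7))   # task Jacobian for k=3, d=7
qd = rng.standard_normal(7)       # joint velocity
S = rng.standard_normal((3, 3))
A = S @ S.T                       # symmetric PSD metric on the task space
B = J.T @ A @ J                   # pulled-back metric B = J^T A J
xd = J @ qd                       # task-space velocity xd = J qd
lhs = xd @ A @ xd                 # ||xd||^2_A
rhs = qd @ B @ qd                 # ||qd||^2_B
```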
The state $(x, \dot{x})$ is $2k$-dimensional, but the policy's output lives in the $k$-dimensional space of accelerations. We therefore restrict the formulation to the $k$-dimensional subspace of the tangent space identifiable with the acceleration vector $\ddot{x}$, and consider only Riemannian metrics $A(x, \dot{x})\in\mathbb{R}^{k\times k}$ defined on this subspace.
Riemannian Motion Policy
Together, the metric and the motion policy form an RMP. An RMP, denoted by the tuple $^X(f, A)$, is a motion policy in a space $X$ augmented with a Riemannian metric. The dynamical system $f:(x, \dot{x})\mapsto\ddot{x}^d$, where $x\in X$, is a second-order differential equation mapping position and velocity to a desired acceleration $\ddot{x}^d=f(x, \dot{x})$. For this reason, $f$ is also known as the acceleration policy. The Riemannian metric $A(x, \dot{x})$ is a positive semidefinite matrix that varies smoothly with the state. At a high level, an RMP specifies a behavior in some task space (e.g. the space of end-effector positions) and its importance relative to other behaviors.
As promised before, it is possible to convert many RMPs defined in different task spaces into a unified space and combine them into a single RMP. This is enabled by the following properties.
Properties
- Closed under addition (we can combine RMPs in the same space)
Let $R_1=^\pi(f_1, A_1)$ and $R_2=^\pi(f_2, A_2)$ be two RMPs in some space $\pi$.
\[R_1+R_2=\,^\pi\left((A_1+A_2)^+(A_1f_1+A_2f_2),\; A_1+A_2\right)\]where $^+$ is the pseudoinverse, which reduces to the inverse when the matrix is full rank. More generally, a collection of RMPs $\{R_i\}_{i=1}^n$ can be combined into a single RMP $R_c=^\pi(f_c, A_c)$
\[R_c=\sum_{i}R_i=^\pi\left(\left(\sum_i A_i\right)^+\sum_i A_if_i, \sum_i A_i\right)\]that yields the optimal solution to the combined system.
Note that if each metric is of the form $A_i=w_i I$, the combination reduces to a weighted average: $(f_c, A_c)=\left(\frac{1}{w}\sum_i w_i f_i,\; wI\right)$, where $w\equiv \sum_i w_i$.
Additionally, RMPs are commutative and associative under addition,
\[\begin{align} R_1+R_2&=R_2+R_1\\ (R_1+R_2)+R_3&=R_1+(R_2+R_3) \end{align}\]- existence of pullback (We can convert RMPs from the space they are defined in to a new space as long as a mapping exists.)
In differential geometry, the pullback of a function $f$ by a mapping $\phi$ satisfies ${\mathbf{pull}}_\phi(f)(q)=f(\phi(q))$. In other words, the pullback applied to the domain yields the same result as the function applied to the codomain. The pullback also exists for an RMP
\[{\mathbf{pull}}_\phi(^X(f, A))=^Q((J^TAJ)^+J^T Af, J^TAJ)\]where the transformation of $A$ into $J^TAJ$ is the same as we saw before. If $J$ is full rank, then the expression becomes
\[{\mathbf{pull}}_\phi(^X(f, A))=^Q(J^+f, J^TAJ)\]Importantly, we can view the pullback as a form of natural gradient. If $f(x, \dot{x})=\nabla_{\ddot{x}}F(\ddot{x}; x, \dot{x})$ is the gradient of a potential function $F$, then $J^TAf$ is the metric-weighted gradient of the composed objective with respect to $\ddot{q}$. The pullback can thus be viewed as a natural vector field.
- existence of pushforward (We can convert RMPs from the space they are defined in to a new space as long as an inverse mapping exists.)
Similarly, we can transform an RMP from the domain of a task map to its co-domain. The pushforward operation applied to an RMP $(h, B)$ defined on the domain $Q$ yields an equivalent RMP defined on the co-domain $X$
\[{\mathbf{push}}_\phi(^Q(h, B))=^X(Jh, (J^+)^TBJ^+)\]where $B=J^TAJ$ as before. For example, if $h$ is defined on the configuration space $Q$ and $\phi$ is a forward-kinematics map to the end-effector space $x=\phi(q)$, then the pushforward of $h$ by $\phi$ describes the end-effector movement under $h$, along with its associated metric.
- linearity of pullback and pushforward
- associativity of pullback and pushforward
Let $z=\phi_1(q)$ and $x=\phi_2(z)$ so that $x=(\phi_2\circ\phi_1)(q)=\phi_2(\phi_1(q))$ is well defined, and suppose $R$ is an RMP on $X$ and $R'$ an RMP on $Q$. Then
\[\begin{align} {\mathbf{pull}}_{\phi_1}({\mathbf{pull}}_{\phi_2}(R))&={\mathbf{pull}}_{\phi_2\circ\phi_1}(R)\\ {\mathbf{push}}_{\phi_2}({\mathbf{push}}_{\phi_1}(R'))&={\mathbf{push}}_{\phi_2\circ\phi_1}(R') \end{align}\]- covariance of pullback and pushforward
The pullback operation is covariant to reparameterization, which means that it is unaffected by a change of coordinates. Specifically, given two coordinate spaces $Q$ and $U$, let $q=\zeta(u)$ be a bijective differentiable map with Jacobian $J_\zeta$, and let $R=^Q(f, A)$ be an RMP on $Q$. Then
\[{\mathbf{pull}}_\zeta(^Q(f, A))=^U(J_\zeta^T(J_\zeta J_\zeta^T)^+f, J_\zeta^T AJ_{\zeta})\]is equivalent to $R$ up to numerical precision. That is, for any $(u, \dot{u})$, the following holds
\[J_\zeta h(u, \dot{u})=f(q, \dot{q})\]where $q=\zeta(u)$, $\dot{q}=J_\zeta\dot{u}$, and $h:(u, \dot{u})\to\ddot{u}$ is given by the transformed differential equation. The important result is that integral curves created in $U$ and transformed to $Q$ match the corresponding integral curves found directly in $Q$. Essentially, this gives a way to update the pullback of an RMP after the coordinate frame is reparameterized.
This makes it extremely convenient to handle joint limits. At a high level, we can pull the constrained C-space RMP back through an inverse sigmoid into an unconstrained space that always satisfies the joint limit.
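The pullback and addition properties above can be sketched directly in code. This is a hedged numpy sketch following the formulas, not the reference implementation:

```python
import numpy as np

def pullback(f, A, J):
    """pull_phi(^X(f, A)) = ^Q((J^T A J)^+ J^T A f, J^T A J)."""
    B = J.T @ A @ J
    return np.linalg.pinv(B) @ (J.T @ A @ f), B

def combine(rmps):
    """Sum RMPs in a shared space: f_c = (sum A_i)^+ sum A_i f_i."""
    A_c = sum(A for _, A in rmps)
    f_c = np.linalg.pinv(A_c) @ sum(A @ f for f, A in rmps)
    return f_c, A_c

# With A_i = w_i * I the combination is just a weighted average.
f1, f2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
fc, Ac = combine([(f1, 3.0 * np.eye(2)), (f2, 1.0 * np.eye(2))])
```

With these weights, `fc` is $(3f_1+f_2)/4$ and `Ac` is $4I$, matching the special case above.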
Workflow
The workflow of using RMP is
- find a list of behaviors you want the robot to exhibit
- define their motion policies and metrics
- pull them back to joint space into a single unified RMP
- use this RMP to output joint position and velocity targets for a low-level controller, such as PID
- tune the metric weights until you obtain a suitable combined behavior
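Putting the workflow together, here is a hedged sketch of one control-loop step. The posture behavior and all gains are illustrative stand-ins, not the actual behaviors defined below:

```python
import numpy as np

def rmp_step(q, qd, behaviors, dt=0.01, lam=1e-6):
    """One control-loop step: pull each behavior back to joint space,
    combine, solve for qdd, and integrate to get (q, qd) targets."""
    d = q.shape[0]
    M_sum, Mf_sum = np.zeros((d, d)), np.zeros(d)
    for behavior in behaviors:
        f, A, J = behavior(q, qd)      # task-space policy, metric, Jacobian
        M_sum += J.T @ A @ J           # pulled-back metric
        Mf_sum += J.T @ A @ f          # pulled-back metric-weighted policy
    qdd = np.linalg.solve(M_sum + lam * np.eye(d), Mf_sum)
    return q + qd * dt, qd + qdd * dt  # position/velocity targets for PID

# Illustrative joint-space attractor behavior (identity task map).
def posture(q, qd, q_goal=np.zeros(2), kp=5.0, kd=3.0):
    return kp * (q_goal - q) - kd * qd, np.eye(2), np.eye(2)

q, qd = np.array([1.0, -1.0]), np.zeros(2)
for _ in range(1000):
    q, qd = rmp_step(q, qd, [posture])
```

Adding more behaviors is just a matter of appending them to the list; the metric sums do the blending.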
Now we will dive into the way to design a list of common behaviors.
List of RMPs
One of the nicest aspects of RMP is that once the framework is in place, designing a new behavior becomes conceptually simple: choose a task space, define a local acceleration policy in that space, and then choose a metric that encodes how strongly that policy should matter and in which directions.
Target Attractor
This is the most basic reaching behavior. Its job is to pull a task-space point, usually the end effector, toward a desired target.
task space
Let $x=\phi_e(q)\in\mathbb{R}^3$ be the 3D end-effector position, and let $x_g$ be a desired target.
motion policy
The target acceleration can be described by a damped attractor towards the goal.
\[f_g(x, \dot{x})=k_p\frac{x_g-x}{\|x_g-x\|+\epsilon}-k_d\dot{x}\]metric
The metric is a blend of two terms. Let $e=x_g-x$ and $d^2=\|e\|^2$. The code defines a near-target isotropic metric
\[M_{near}=\mu_{near}I_3\]and a far-target directional metric
\[M_{far}=\frac{\mu_{far}}{(d^2)^2+\epsilon}ee^T\]The final metric is
\[M=(\beta b+(1-\beta))(\alpha M_{near}+(1-\alpha)M_{far})\]where
\[\begin{align*} \alpha&=(1-\alpha_{\min})\exp(\frac{-(d^2)^2}{2\sigma_a^2})+\alpha_{\min}\\ \beta&=\exp(-\frac{(d^2)^2}{2\sigma^2_b}) \end{align*}\]The idea is that near the target the metric penalizes error isotropically in all directions (allowing small corrective adjustments), while far from the target the metric mainly penalizes error along the direction toward the target.
pullback
The pullback metric is
\[\begin{align*} M_{pb}&=J^TMJ\\ (Mf)_{pb}&=J^TM\ddot{x} \end{align*}\]with task Jacobian $J=\frac{\partial x}{\partial q}\in \mathbb{R}^{3\times n}$
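The attractor and its metric might be written as follows; the structure mirrors the formulas above, but all gains and scales here are illustrative assumptions, not the reference implementation's values:

```python
import numpy as np

# Illustrative gains/scales (assumptions, not the code's actual values).
KP, KD, EPS = 10.0, 8.0, 1e-6
MU_NEAR, MU_FAR = 1.0, 2.0
ALPHA_MIN, SIGMA_A, SIGMA_B, B_BOOST = 0.1, 0.3, 0.5, 1.0

def target_attractor(x, xd, x_goal):
    """Damped attractor policy plus its blended metric in R^3."""
    e = x_goal - x
    d2 = e @ e
    f = KP * e / (np.linalg.norm(e) + EPS) - KD * xd
    M_near = MU_NEAR * np.eye(3)                          # isotropic near term
    M_far = MU_FAR / (d2**2 + EPS) * np.outer(e, e)       # directional far term
    alpha = (1 - ALPHA_MIN) * np.exp(-d2**2 / (2 * SIGMA_A**2)) + ALPHA_MIN
    beta = np.exp(-d2**2 / (2 * SIGMA_B**2))
    M = (beta * B_BOOST + (1 - beta)) * (alpha * M_near + (1 - alpha) * M_far)
    return f, M

def pull(f, M, J):
    """Pull the (policy, metric) pair back through task Jacobian J."""
    return J.T @ M @ J, J.T @ M @ f
```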
Target Axis Attractor
This is the orientation-plus-position attractor. We use the same idea as the target attractor, but additionally track three points, one along each of the canonical xyz axes of the link frame, so orientation is handled as a point-wise tracking error.
task space
The position part is the same as before with $x=\phi(q)\in\mathbb{R}^3$.
The orientation part uses three axis-endpoint points
\[n_i=p+r_i\]with $i=1,2,3$, where $p$ is the link position and $r_i$ are the world frame unit axes of the link.
motion policy
The position part is unchanged
\[f_g(x, \dot{x})=k_p\frac{x_g-x}{\|x_g-x\|+\epsilon}-k_d\dot{x}\]The orientation part, for each axis point
\[\ddot{n}_i=k_p^{axis}(n_{g, i}-n_i)-k_d^{axis}\dot{n}_i\]metric
For the position metric, it’s the same as in the Target Attractor. The axis-space metric is purely isotropic
\[M_{axis}=(\beta_{axis}b_{axis}+(1-\beta_{axis}))\mu_{axis}I_3\]with
\[\beta_{axis}=\exp(-\frac{(d^2)^2}{2\sigma_{b, axis}^2})\]pullback
Let $J$ be the position Jacobian and $J_{axis, i}$ the Jacobian of the ith axis point.
\[\begin{align*} M_{pb}&=J^TMJ+\sum_{i=1}^3J^T_{axis, i}M_{axis}J_{axis, i}\\ (Mf)_{pb}&=J^TM\ddot{x}+\sum_{i=1}^3J^T_{axis, i}M_{axis}\ddot{n}_i \end{align*}\]Joint-space attractor
This is used to set a canonical joint configuration that the controller prefers when there are redundant degrees of freedom, similar in spirit to null-space control.
task space
\[x=q\in\mathbb{R}^m\]Note that we can pick only a subset of joints.
motion policy
Let
\[p=q_g-q\]We clip the joint position error
\[r(p)=\begin{cases}p, &\|p\|<\theta\\ \theta\frac{p}{\|p\|}, & \|p\|\geq \theta\end{cases}\]Then
\[\ddot{q}=k_p r(q_g-q)-k_d\dot{q}\]metric
We use the simple isotropic metric
\[M=\mu I_m\]pullback
We are already in a joint subspace, so the pullback simply places these entries at the coordinates of the selected joints and fills the rest of the matrix with zeros.
\[\begin{align*} M_{pb}=(\mu+h)I_m\\ (Mf)_{pb}=\mu\ddot{q} \end{align*}\]Note that $h$ is an extra inertia-like stabilization term in the code
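A sketch of this attractor; the gains, weight, and clipping threshold are illustrative assumptions:

```python
import numpy as np

KP, KD, MU, THETA = 2.0, 3.0, 0.05, 0.3   # illustrative values

def joint_space_attractor(q, qd, q_goal):
    """Clipped proportional-derivative pull toward a canonical posture."""
    p = q_goal - q
    n = np.linalg.norm(p)
    r = p if n < THETA else THETA * p / n   # clip the joint position error
    f = KP * r - KD * qd
    M = MU * np.eye(q.shape[0])             # weak isotropic metric
    return f, M
```

The clipping keeps the posture pull bounded far from the goal, so it acts as a gentle bias rather than a competing attractor.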
Joint limit avoidance
task space
For each selected joint with joint angle $q$ and limit $[q_{\min}, q_{\max}]$, define a normalized scalar distance to the limit. For max and min we have separate RMPs.
The lower limit is
\[x=\frac{q-q_{\min}}{q_{\max}-q_{\min}}, \dot{x}=\frac{\dot{q}}{q_{\max}-q_{\min}}, J=\frac{1}{q_{\max}-q_{\min}}\]The upper limit is
\[x=\frac{q_{\max}-q}{q_{\max}-q_{\min}}, \dot{x}=-\frac{\dot{q}}{q_{\max}-q_{\min}}, J=-\frac{1}{q_{\max}-q_{\min}}\]We also clip so that $x\geq 0$. Essentially, the closer to the joint limit, the closer $x$ is to 0.
motion policy
\[\ddot{x}=\frac{k_p}{x^2/l_p^2+\epsilon_p}-k_d\dot{x}\]where $l_p, \epsilon_p$ are coefficients. $l_p$ controls how large the repulsion becomes at the joint limit. Essentially, the closer the joint is to its limit, the closer $x$ is to 0, and the more the repulsive $k_p$ term dominates.
metric
\[M=(1-\sigma(\dot{x}/v_m))\frac{\mu}{x/l_m+\epsilon_m}\]where
\[\sigma(z)=\frac{1}{1+e^{-z}}\]$l_m$ controls how fast the metric grows as the joint approaches the limit; essentially, how close the joint must be to its limit before this behavior becomes important.
pullback
Each joint-limit task is scalar, so
\[M_{pb}=JMJ, (Mf)_{pb}=JM\ddot{x}\]Joint-velocity limit avoidance
task space
for each monitored joint
\[x=|\dot{q}|, J=\operatorname{sign}(\dot{q})\]motion policy
\[\ddot{x}=-k_d(x-(v_{\max}-v_r))\]where $v_r$ is the width of the damping region and $v_{\max}$ is the maximum velocity. We start applying regularization to the joint velocity once it enters the damping region below $v_{\max}$.
metric
\[M=\frac{\mu}{1-(x-(v_{\max}-v_r))^2/v_r^2}\]This is deactivated in the safe region, so $M=0$ for $x<v_{\max}-v_r$
pullback
This is again scalar
\[M_{pb}=JMJ, (Mf)_{pb}=JM\ddot{x}\]Collision Avoidance
task space
We create a series of collision spheres on the robot and create separate RMPs for each.
The task space is
\[x=d_{signed}, \dot{x}=\frac{x_t-x_{t-1}}{\Delta t}\]where $d_{signed}$ is the signed scalar distance to the nearest obstacle. Note that for self-collision avoidance, the obstacles for a sphere are simply the other spheres. For general collision avoidance, we can add more obstacles.
motion policy
\[\ddot{x}=k_p e^{-x/l_p}-k_d(1-\sigma(\dot{x}/v_d))\frac{\dot{x}}{\max(x, 0)/l_d+\epsilon_d}\]$l_p$ and $l_d$ are coefficients. $\sigma$ is the sigmoid function.
metric
We define three factors
A velocity gate
\[g_v=1-\sigma(\dot{x}/v_d)\]A compact distance modulation
\[g_x(x)=\begin{cases}x^2/r^2-2x/r+1, & x\leq r\\ 0, & x>r\end{cases}\]A barrier-like inverse-distance factor
\[b(x)=\frac{\mu}{\max(x, 0)/l_m+\epsilon_m}\]The final scalar metric is
\[M=g_vg_x(x)b(x)\]pullback
For each sphere subtask
\[M_{pb}=J_r^TMJ_r, (Mf)_{pb}=J_r^TM\ddot{x}\]The total collision-avoidance pullback is summed over all spheres
\[M_{pb, total}=\sum_i J_{r, i}^T M_i J_{r, i}, (Mf)_{pb, total}=\sum_i J_{r, i}^T M_i \ddot{x}_i\]Final combination across behaviors
After obtaining every behavior’s pullback pair $(M_{pb, i}, (Mf)_{pb, i})$, the code sums them to produce the unified single RMP
\[\begin{align*} M_{sum}&=\sum_i M_{pb, i}\\ (Mf)_{sum}&=\sum_i (Mf)_{pb, i} \end{align*}\]Then we solve for
\[\ddot{q}=(M_{sum}+\lambda I)^{-1}(Mf)_{sum}\]where the small damping term $\lambda$ is added for numerical stability. This comes from the solution to the least-squares equation
\[\ddot{q}^*=\arg\min_{\ddot{q}}\sum_i\frac{1}{2}\|\ddot{x}_i^d-J_i\ddot{q}\|^2_{A_i}+\frac{\lambda}{2}\|\ddot{q}\|^2\]where the last term is the added damping.
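Numerically, the damped solve agrees with the stacked least-squares solution of this objective. A sketch with random data and $A_i=I$ for simplicity:

```python
import numpy as np

def resolve(pulled, lam=1e-4):
    """qdd = (sum M_i + lam I)^{-1} sum (Mf)_i."""
    M_sum = sum(M for M, _ in pulled)
    Mf_sum = sum(Mf for _, Mf in pulled)
    return np.linalg.solve(M_sum + lam * np.eye(M_sum.shape[0]), Mf_sum)

rng = np.random.default_rng(1)
Js = [rng.standard_normal((2, 4)) for _ in range(3)]   # per-task Jacobians
fs = [rng.standard_normal(2) for _ in range(3)]        # desired task accels
pulled = [(J.T @ J, J.T @ f) for J, f in zip(Js, fs)]  # A_i = I pullbacks
qdd = resolve(pulled)
```

Stacking the residuals $\ddot{x}_i^d - J_i\ddot{q}$ together with a $\sqrt{\lambda}I$ row and solving the resulting least-squares problem gives the same $\ddot{q}$, confirming the claimed equivalence.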