image.png
Contact-Aware Retargeting of Skinned Motion

paper, code (not released), project

Original post: https://www.yuque.com/jinluzhang/researchblog/hbz8sm

Summary🔖

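Fill this in last, after the notes are finished: outline what the paper is about, so this paragraph can be read first when revisiting the notes.
Note: the summary should come from your own thinking and be written in your own words. Do not simply copy and paste the original text.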

Motivation👓

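In the skinned motion retargeting task, existing methods have not systematically studied self-contact (the body touching itself) or ground contact (the body touching the ground). Yet poorly modeled self-contact at specific body parts (e.g., hands, feet, head), showing up as interpenetration or separation, degrades the quality of the retargeted 3D character.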

Method💡

What method/algorithm do the authors use to solve the problem? Which earlier baselines does it build on?
Input:

  • skeleton motion of the source character A,
  • skinned motion of the source character A,
  • target character B represented by its skeleton
  • and its skinned geometry

Output:

  • skeleton motion of the target character B
  • skinned motion of the target character B that preserves contacts

Baseline: SAN [1]
Core modules: energy function, RNN, and the corresponding encoder-space optimization

Overview

image.png

  1. The method first detects two kinds of contact (hand contact and foot contact); the former is detected from the skinned motion, the latter from the skeleton motion;
  2. contact-aware motion retargeting has two parts: the energy function and the RNN. The contacts detected in step 1 are fed into the proposed energy function (which plays a role similar to a loss function) so that the contacts are preserved;
  3. a Geometry-Conditioned RNN produces the output motion (the target character's skeleton motion and skinned motion).

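Input contact detection

The paper puts this part at the very end of the method section, perhaps because it is the least novel, but to follow the logic of the method it should be understood first.

self-contact

The overall idea is to first detect contacts between the hands and the other body parts in the source motion, and then transfer the detected self-contacts to the output motion, i.e., map them onto the target character.
Specifically, in a preprocessing step the vertices of the source skinned motion are grouped (by vertex weight > 0.5, presumably the skinning weight). The authors do not say how many groups there are; judging from the paper's wording (either of the character’s hands intersect any other body part), there should be one group per hand, with the rest of the body split into further groups;
then, following [2], two groups are judged to be in contact based on distance (< 0.2 cm) or on the average cosine similarity of the per-vertex velocities (> 0.9);
finally, for each pair of groups in contact, the 3 closest vertex pairs are selected as the contact vertices.

The two groups are determined to be in contact if the average cosine similarity of the per-vertex velocities in global coordinates is greater than 0.9, or if the distance between their nearest vertices is less than 0.2 cm where the shortest character in our dataset is 138 cm. For each detected contact, we identify the top 3 closest pairs of vertices between the two groups. The same process is repeated for all pairs of intersecting groups.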
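The detection rule quoted above can be sketched in a few lines. This is only an illustration under assumptions: the function names and the brute-force pair search are hypothetical, coordinates are taken to be in cm, and the average cosine similarity is passed in as a precomputed number because the notes do not say how vertices are paired for the velocity comparison.

```python
import math

def closest_pairs(group_a, group_b, k=3):
    """Return the k closest (distance, i, j) vertex pairs between two groups.

    Brute force over all pairs; fine for a sketch, too slow for full meshes.
    """
    pairs = [
        (math.dist(p, q), i, j)
        for i, p in enumerate(group_a)
        for j, q in enumerate(group_b)
    ]
    pairs.sort()
    return pairs[:k]

def groups_in_contact(group_a, group_b, avg_cos_sim,
                      dist_thresh_cm=0.2, cos_thresh=0.9):
    """Contact if velocities are aligned OR the nearest vertices are close.

    avg_cos_sim is assumed to be computed elsewhere (hypothetical input).
    """
    nearest = closest_pairs(group_a, group_b, k=1)[0][0]
    return avg_cos_sim > cos_thresh or nearest < dist_thresh_cm

# Toy vertex groups (positions in cm), purely illustrative.
hand = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
hip = [(0.0, 0.1, 0.0), (1.0, 0.5, 0.0), (2.0, 3.0, 0.0), (5.0, 5.0, 0.0)]

if groups_in_contact(hand, hip, avg_cos_sim=0.0):
    contact_vertex_pairs = [(i, j) for _, i, j in closest_pairs(hand, hip)]
```

Here the groups count as touching because their nearest vertices are 0.1 cm apart, and the three closest index pairs become the contact vertices that the energy function later tries to keep together.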

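After the contact vertex pairs on the source character have been found, the corresponding vertex pairs still have to be matched on the target character to obtain the contact vertex pairs of the output motion. The authors perform feature matching on the mesh models: for each contact pair on A they look up the most similar feature points on B (with a KD-tree). The features take two attributes into account, skinning weights and vertex offsets, and the supplementary material visualizes them.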
image.png
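The correspondence lookup can be illustrated as a nearest-neighbour search in feature space. The sketch below swaps the KD-tree for a brute-force search to stay dependency-free, and the feature layout (skinning weights concatenated with an offset value) is an assumption made for illustration, not the paper's exact feature definition.

```python
import math

def match_vertex(feature, target_features):
    """Index of the target vertex whose feature vector is closest (Euclidean)."""
    return min(
        range(len(target_features)),
        key=lambda k: math.dist(feature, target_features[k]),
    )

def transfer_contact_pairs(pairs_a, features_a, features_b):
    """Map each contact pair (i, j) on character A to a vertex pair on B."""
    return [
        (match_vertex(features_a[i], features_b),
         match_vertex(features_a[j], features_b))
        for i, j in pairs_a
    ]

# Hypothetical per-vertex features: [weight_bone0, weight_bone1, offset].
features_a = [[1.0, 0.0, 0.1], [0.0, 1.0, 0.2]]
features_b = [[0.0, 1.0, 0.25], [0.9, 0.1, 0.1], [0.5, 0.5, 0.0]]

pairs_b = transfer_contact_pairs([(0, 1)], features_a, features_b)
```

With many vertices, a KD-tree built once over the target features would answer the same queries much faster, which is presumably why the authors use one.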

foot contact

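Detecting contact between the feet and the ground is comparatively easy: it is enough to check the toe and the heel. The toe is judged by both its height above the ground and its displacement from the previous frame (i.e., its velocity), whereas the heel, according to the passage quoted below, is judged only by its displacement from the previous frame.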

The toe joint is determined to be in contact if its height from the ground is at most 3cm and displacement from previous time-step is at most 1cm, all at 180 cm scale. The heel is determined to be in contact only if its displacement from the previous time-step is at most 1 cm at 180 cm scale.
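A minimal sketch of these thresholds follows. It assumes that "at 180 cm scale" means measurements are rescaled by 180 / character height before comparison; the function names are hypothetical.

```python
def to_180cm_scale(value_cm, character_height_cm):
    """Rescale a length to what it would be on a 180 cm tall character (assumed convention)."""
    return value_cm * 180.0 / character_height_cm

def toe_in_contact(height_cm, displacement_cm, character_height_cm):
    """Toe: at most 3 cm above the ground AND moved at most 1 cm since the last frame."""
    return (to_180cm_scale(height_cm, character_height_cm) <= 3.0
            and to_180cm_scale(displacement_cm, character_height_cm) <= 1.0)

def heel_in_contact(displacement_cm, character_height_cm):
    """Heel: moved at most 1 cm since the last frame (displacement only)."""
    return to_180cm_scale(displacement_cm, character_height_cm) <= 1.0
```

For a 90 cm character, a toe that is 2 cm above the ground rescales to 4 cm and therefore does not count as a contact, even though the same raw value would on a 180 cm character.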

Energy Function

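The energy function is the core of the paper. It plays a role similar to a loss function: its purpose is to preserve ground and self contact while reducing interpenetration. The full energy function is defined as: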
image.png
The full energy consists of a geometry-based motion modeling term (operating on the skinned motion) and a skeleton-based motion modeling term (operating on the skeleton motion); the two are described in turn below.

Geometry term

The geometry term has 3 components:
image.png
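The first is the self-contact term. It uses the mean squared distance between contact vertices to measure how well a contact is reproduced; the smaller the value, the more accurate the contact: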
image.png
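As a rough illustration of the self-contact term, the sketch below averages the squared distance over the contact vertex pairs on the target mesh. The function name and the plain averaging are assumptions based on the description in these notes, not a transcription of the paper's equation.

```python
def self_contact_energy(vertices, contact_pairs):
    """Mean squared distance between the vertices of each contact pair.

    vertices: list of (x, y, z) positions on the target mesh for one frame.
    contact_pairs: list of (i, j) vertex indices that should be touching.
    """
    if not contact_pairs:
        return 0.0
    total = 0.0
    for i, j in contact_pairs:
        total += sum((a - b) ** 2 for a, b in zip(vertices[i], vertices[j]))
    return total / len(contact_pairs)

# Toy example: vertex 0 should touch vertices 1 and 2.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
energy = self_contact_energy(verts, [(0, 1), (0, 2)])
```

The energy is zero exactly when every pair coincides, so minimizing it pulls the detected contact vertices together.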
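The second is the penetration term. Its definition follows [3]: for each colliding triangle (the basic unit of the mesh) the authors build a cone-shaped region, and the penetration of a vertex of the other triangle into that region is expressed by a penetration function that is positive when there is penetration and 0 otherwise. My understanding is that each penetration value multiplied by the vertex normal gives the final penetration term (the length of the penetration vector along the normal).
See [3] for the details.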
image.png
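In addition, the authors use a weight coefficient. Penetration between different body parts has a very noticeable visual effect, so a relatively small weight is used for triangles that are very close to each other (I did not understand why a smaller rather than a larger weight is used here; one plausible reading, since the weight depends on geodesic distance, is that triangles close together on the surface intersect easily even in ordinary poses and should therefore be penalized less).
The weight is set as shown below. It mainly depends on the geodesic distance (see http://lemonc.me/average-geodesic-distance.html), and one quantity in it is the ratio between the minimum and the maximum geodesic distance.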
image.png
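The third is the foot contact term. Its first part minimizes the velocity of the contact points (why not minimize only the horizontal velocity?), and its second part minimizes the distance along the y-axis.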
image.png
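The two parts of the foot contact term can be sketched as follows. The sketch assumes y is the up axis with the ground at y = 0, approximates velocity by the difference between consecutive frames, and leaves out any scaling or weighting the paper may apply; all names are hypothetical.

```python
def foot_contact_energy(pos_prev, pos_curr, in_contact):
    """Sum of squared velocity and squared height over joints flagged as in contact.

    pos_prev, pos_curr: lists of (x, y, z) joint positions at frames t-1 and t.
    in_contact: list of booleans, one per joint.
    """
    energy = 0.0
    for p0, p1, contact in zip(pos_prev, pos_curr, in_contact):
        if not contact:
            continue
        velocity_sq = sum((b - a) ** 2 for a, b in zip(p0, p1))  # sliding penalty
        height_sq = p1[1] ** 2                                   # floating/sinking penalty
        energy += velocity_sq + height_sq
    return energy

prev_pos = [(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)]
curr_pos = [(1.0, 2.0, 0.0), (5.0, 5.0, 5.0)]
energy = foot_contact_energy(prev_pos, curr_pos, [True, False])
```

Because the full velocity is penalized rather than only its horizontal part, a contact joint is discouraged from lifting off as well as from sliding.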

Skeleton term

image.png
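The goal is to keep the motion style consistent by preserving the local joint rotations, the global motion trajectory, and the motion of the end-effectors: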

preserving the local motion represented as joint rotations, the global motion represented as the root trajectory and the global motion of the end-effectors (i.e., hands and feet).

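Local rotations and root velocity consistent with those of A are learned in a weakly supervised manner (I could not tell which part is the weak supervision):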
image.png
The velocities of the end-effectors (hands, feet), each scaled by the height of the respective character, are kept consistent:
image.png
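The height-scaled end-effector constraint can be sketched like this. It assumes a plain squared difference between velocities divided by each character's height; the exact norm and weighting in the paper may differ, and the names are hypothetical.

```python
def end_effector_energy(vel_a, vel_b, height_a, height_b):
    """Squared difference between height-normalized end-effector velocities.

    vel_a, vel_b: lists of (x, y, z) velocities for matching end-effectors
                  (hands, feet) of the source A and the target B.
    """
    energy = 0.0
    for va, vb in zip(vel_a, vel_b):
        energy += sum((x / height_a - y / height_b) ** 2 for x, y in zip(va, vb))
    return energy

# A is twice as tall as B and moves its hand twice as fast: no penalty.
matched = end_effector_energy([(2.0, 0.0, 0.0)], [(1.0, 0.0, 0.0)], 2.0, 1.0)
# B moves its hand too fast for its size: penalized.
mismatched = end_effector_energy([(2.0, 0.0, 0.0)], [(3.0, 0.0, 0.0)], 2.0, 1.0)
```

Normalizing by height lets a short character take proportionally shorter steps and reaches without being penalized.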

Geometry-Conditioned RNN and Encoder-Space Optimization

image.png
To recap the input and output of the whole task:
Input:

  • skeleton motion of the source character A,
  • skinned motion of the source character A,
  • target character B represented by its skeleton
  • and its skinned geometry

Output:

  • skeleton motion of the target character B
  • skinned motion of the target character B that preserves contacts

For the RNN:
input of RNN encoder:

  • source skeleton motion -> encoder -> motion feature

The RNN decoder takes more inputs, 6 in total:

  • motion feature of encoder
  • local motion (the local joint coordinates of the previous frame)
  • target character represented by its skeleton
  • geometric encoding of the skinned geometry using PointNet
  • root velocity
  • and the output of the previous decoder layer

output of RNN decoder:

  • global skeleton joint positions (I believe this is simply one frame of the target skeleton motion)
  • skinned motion that preserves contacts

The encoding and decoding process expressed as formulas:
image.png

Encoder-Space Optimization

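The RNN alone can perform motion retargeting, but it cannot preserve all contacts or remove penetration.
The authors first run the RNN and then feed the resulting output motion into the energy function for optimization: the motion is updated by updating the latent vector output by the encoder together with the root velocity.
The benefits are that the update works frame by frame, the result is smoother, the search space is low-dimensional, and the embedding is well disentangled.
The update procedure is as follows: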
image.png
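The idea of optimizing in encoder space rather than directly on the pose can be illustrated with a toy example. Everything here is a stand-in: the decoder is a hand-written linear map, the energy is a simple squared error, gradients are taken by finite differences, and the latent list is assumed to pack both the encoder feature and the root velocity. It only shows the update pattern of repeatedly stepping the latent variables down the gradient of the energy.

```python
def numerical_grad(f, x, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at list x."""
    grad = []
    for k in range(len(x)):
        hi = x[:k] + [x[k] + eps] + x[k + 1:]
        lo = x[:k] + [x[k] - eps] + x[k + 1:]
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def optimize_latent(energy_of_latent, z0, step=0.1, iters=200):
    """Gradient descent on the latent variables; network weights stay fixed."""
    z = list(z0)
    for _ in range(iters):
        g = numerical_grad(energy_of_latent, z)
        z = [zi - step * gi for zi, gi in zip(z, g)]
    return z

def toy_decoder(z):
    """Hypothetical stand-in for the RNN decoder: latent -> pose."""
    return [2.0 * z[0], z[0] + z[1]]

def toy_energy(z):
    """Hypothetical stand-in for the full energy, evaluated on the decoded pose."""
    pose = toy_decoder(z)
    target = [2.0, 0.0]
    return sum((p - t) ** 2 for p, t in zip(pose, target))

z_init = [0.0, 0.0]            # what the encoder would have produced
z_opt = optimize_latent(toy_energy, z_init)
```

In the real method the gradient would come from backpropagating the energy through the decoder, but the structure of the loop is the same: the pose changes only through the latent variables.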

Evaluation🧪

Dataset

Mixamo Dataset

Result

comparisons with SOTAs:
image.png
User Study:
image.png
Ablation Study:
image.png

Conclusion⭐️

Contribution

  • introduce a novel geometry-conditioned recurrent network with an encoder-space optimization strategy
  • propose an energy function that preserves self-contact and ground contact and reduces penetration.

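Rethink❓

Highlights / ideas worth borrowing:

  • The optimization of self-contact, ground contact, and self-penetration is tied to RNN training through the design of the energy function; updating the encoder's latent vector and the root velocity in a weakly supervised manner yields a globally optimized motion. A good balance between penetration and contact is achieved.

Limitations:

  • Does the frame-by-frame nature of the RNN limit efficiency or lead to over-smoothing?
  • The energy function decomposes the motion in an overly complicated way, with as many as 5 optimization terms

Possible extensions:

  • The RNN could perhaps be replaced with a TCN, a (lightweight) Transformer, or a GCN
  • Could the energy function be improved into a unified paradigm for weakly supervised training of the latent vector?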

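Notes📝

(optional) Notes that do not fit this template but still need to be recorded.
What questions remain open?

Track📚

(optional) List closely related papers and the blogs, notes, etc. used while writing these notes, so they can be tracked later (both earlier and later work, i.e., papers cited by this paper and papers that cite it)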

Ref:
[1] K. Aberman, P. Li, D. Lischinski, O. Sorkine-Hornung, D. Cohen-Or, and B. Chen, “Skeleton-aware networks for deep motion retargeting,” ACM Trans. Graph., vol. 39, no. 4, Jul. 2020, doi: 10.1145/3386569.3392462.
[2] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M.-P. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, and P. Volino, “Collision detection for deformable objects,” in Eurographics, 2004.
[3] D. Tzionas, L. Ballan, A. Srikantha, P. Aponte, M. Pollefeys, and J. Gall, “Capturing hands in action using discriminative salient points and physics simulation,” IJCV, vol. 118, no. 2, pp. 172–193, Jun. 2016.