Contact-Aware Retargeting of Skinned Motion

paper, code (not released), project

Original post: https://www.yuque.com/jinluzhang/researchblog/hbz8sm

Summary🔖

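Fill this in last, after the notes are finished: outline what the paper is about, so that this paragraph is the first thing to read when revisiting the notes.
Note: the summary should come from your own thinking and be written in your own words. Do not copy and paste from the paper.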

Motivation👓

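In skinned motion retargeting, existing methods have not systematically studied self-contact (the body touching itself) or ground contact (the body touching the ground). Yet when self-contact of specific body parts (hands, feet, head and so on) is modeled poorly, producing penetration or separation, the quality of the retargeted 3D model suffers.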

Method💡

What method or algorithm do the authors use to solve the problem? Which earlier baselines does it build on?
Input:

source skeleton motion
source skinned motion
target character represented by its skeleton
skinned geometry of the target character

Output:

skeleton motion
skinned motion with contacts preserved

Baseline: SAN [1]
Core modules: the energy function, the RNN, and the accompanying encoder-space optimization

Overview

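The method first detects two kinds of contact: hand contact, detected from the skinned motion data, and foot contact, detected from the skeleton motion data.
Contact-aware motion retargeting has two parts: the energy function and the RNN. The contacts detected in the first step are fed into the proposed energy function (which plays a role similar to a loss function) so that they are preserved.
A geometry-conditioned RNN then produces the output motion (skeleton motion and skinned motion).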
Input contact detection
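The paper puts this part at the very end of the method section, perhaps because it is the least novel, but for following the logic of the method it has to be understood first.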
self-contact
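The overall idea is to first detect contacts between the hands and other body parts in the source motion, and then transfer the detected self-contacts to the output motion, i.e. map them onto the target character.
Concretely, in a preprocessing step the authors group the vertices of the source skinned mesh by skinning weight (a vertex belongs to a group when its weight exceeds 0.5). The paper does not say how many groups there are, but judging from its wording (either of the character's hands intersect any other body part), each hand should form one group and the rest of the body is split into further groups.
Then, following [2], two groups are judged to be in contact using a distance test (< 0.2 cm) and a velocity cosine-similarity test (> 0.9); in the passage quoted below, either test alone is sufficient.
Finally, for each pair of groups in contact, the 3 closest vertex pairs are selected as the contact vertices.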

The two groups are determined to be in contact if the average cosine similarity of the per-vertex velocities in global coordinates is greater than 0.9, or if the distance between their nearest vertices is less than 0.2 cm where the shortest character in our dataset is 138 cm. For each detected contact, we identify the top 3 closest pairs of vertices between the two groups. The same process is repeated for all pairs of intersecting groups.
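The quoted rule can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names, the input layout (lists of 3D tuples in cm), and the choice to average the cosine similarity over all cross-group vertex pairs are mine, not the paper's.

```python
import math

def cosine(u, v):
    """Cosine similarity of two 3D vectors; 0.0 if either has zero length."""
    nu, nv = math.hypot(*u), math.hypot(*v)
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def detect_group_contact(pos_a, vel_a, pos_b, vel_b,
                         dist_thresh_cm=0.2, cos_thresh=0.9, top_k=3):
    """Return (in_contact, closest_pairs) for two vertex groups.

    pos_* / vel_* are per-vertex global positions and velocities.
    Assumption: the "average cosine similarity" is taken over all
    cross-group vertex pairs.
    """
    # All cross-group vertex pairs, sorted by Euclidean distance.
    pairs = sorted((math.dist(p, q), i, j)
                   for i, p in enumerate(pos_a)
                   for j, q in enumerate(pos_b))
    mean_cos = (sum(cosine(u, v) for u in vel_a for v in vel_b)
                / (len(vel_a) * len(vel_b)))
    # Either criterion is sufficient, following the quoted passage.
    in_contact = mean_cos > cos_thresh or pairs[0][0] < dist_thresh_cm
    closest = [(i, j) for _, i, j in pairs[:top_k]] if in_contact else []
    return in_contact, closest
```

The returned index pairs play the role of the "top 3 closest pairs of vertices" that are later transferred to the target character.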

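Once the contact vertex pairs on the source mesh are known, the corresponding vertices still have to be matched on the target mesh before the contact vertex pairs of the output motion are available. The authors do this with feature matching on the mesh models: for each contact pair on A they look up the most similar feature points on B (using a KD-tree). The per-vertex feature takes two attributes into account, the skinning weights and the vertex offset, and the supplementary material visualizes the result.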
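A minimal sketch of this matching step follows. It is not the authors' code: the feature layout (skinning weights concatenated with an offset vector), the `offset_scale` parameter, and both function names are hypothetical, and a plain linear scan stands in for the KD-tree lookup mentioned in the notes.

```python
import math

def vertex_feature(skin_weights, offset, offset_scale=1.0):
    """Concatenate skinning weights and a per-vertex offset into one feature.

    offset_scale is a hypothetical knob balancing the two attributes.
    """
    return tuple(skin_weights) + tuple(offset_scale * o for o in offset)

def match_vertices(source_feats, target_feats):
    """For each source feature, return the index of the nearest target feature.

    Brute-force nearest neighbour in Euclidean feature space; a KD-tree
    would give the same answer faster on large meshes.
    """
    return [
        min(range(len(target_feats)),
            key=lambda j: math.dist(feat, target_feats[j]))
        for feat in source_feats
    ]
```

For example, a source contact vertex dominated by the first bone is matched to the target vertex with the most similar weights and offset, regardless of how the two meshes are tessellated.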

foot contact

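Detecting contact between the foot and the ground is comparatively easy: only the toe and the heel need to be checked. The toe is judged by both its height above the ground and its displacement from the previous frame (i.e. its velocity); the heel, per the passage quoted below, is judged by its frame-to-frame displacement alone.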

The toe joint is determined to be in contact if its height from the ground is at most 3cm and displacement from previous time-step is at most 1cm, all at 180 cm scale. The heel is determined to be in contact only if its displacement from the previous time-step is at most 1 cm at 180 cm scale.
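The thresholds in the quote translate into a small check like the one below. This is a sketch under one assumption of mine: "at 180 cm scale" is read as scaling each threshold linearly with the character's height. The function names are hypothetical.

```python
def scaled(threshold_cm, character_height_cm, reference_cm=180.0):
    """Rescale a threshold given at 180 cm to a character of another height.

    Assumption: thresholds scale linearly with character height.
    """
    return threshold_cm * character_height_cm / reference_cm

def toe_in_contact(height_cm, displacement_cm, character_height_cm):
    """Toe: height above ground <= 3 cm AND frame displacement <= 1 cm."""
    return (height_cm <= scaled(3.0, character_height_cm)
            and displacement_cm <= scaled(1.0, character_height_cm))

def heel_in_contact(displacement_cm, character_height_cm):
    """Heel: frame displacement <= 1 cm, as worded in the quoted passage."""
    return displacement_cm <= scaled(1.0, character_height_cm)
```

With this reading, a toe 2 cm above the ground that moved 0.5 cm counts as in contact for a 180 cm character but not for a 90 cm one, because the height threshold shrinks to 1.5 cm.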

Energy Function

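The energy function is the core of the paper. It plays a role similar to a loss function: its purpose is to preserve ground and self contact while reducing penetration. The full energy function is made up of two terms.

The geometry-based term models the motion at the mesh level (the skinned motion), while the skeleton-based term models it at the skeleton level (the skeleton motion). Each is described below.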

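Geometry term

The geometry term has 3 components.

The first is the self-contact term. It measures contact by the mean squared distance between contact vertices; the smaller the value, the more accurately the contact is reproduced.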
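As a rough illustration of the self-contact term, the sketch below averages the squared distance over the contact vertex pairs on the output mesh. The function name and the data layout are assumptions, and any weighting used in the paper is left out.

```python
def self_contact_energy(vertices, contact_pairs):
    """Mean squared distance between paired contact vertices.

    vertices: list of 3D positions on the output mesh (hypothetical layout).
    contact_pairs: list of (i, j) vertex index pairs that should touch.
    """
    if not contact_pairs:
        return 0.0
    total = 0.0
    for i, j in contact_pairs:
        total += sum((a - b) ** 2 for a, b in zip(vertices[i], vertices[j]))
    return total / len(contact_pairs)
```

The energy is zero exactly when every paired vertex coincides, and it grows quadratically as the hand drifts away from the body part it touched in the source motion.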

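The second is the penetration term, which the authors define following [3]. For each colliding triangle (the mesh primitive) they build a cone-shaped region; the penetration of the other triangle's vertices inside that region is measured by a value that is positive under penetration and 0 otherwise. My reading is that each penetration value multiplied by the vertex normal gives the final penetration term (the length of the penetration along the normal).
See [3] for the details.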

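The authors also introduce a weighting coefficient. Penetrations differ in how visually noticeable they are depending on the body parts involved, so triangles that are very close to each other are constrained with a relatively small weight (I did not understand why a small weight is used here rather than a larger one).
This weight depends mainly on geodesic distance (see http://lemonc.me/average-geodesic-distance.html): it is the ratio of the minimum to the maximum geodesic distance.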

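The third is the foot contact term. Its first part minimizes the velocity of the contact points (why not minimize only the horizontal velocity?), and its second part minimizes their distance to the ground along the y axis.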
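The two parts of the foot contact term can be sketched as follows. The assumptions are mine: the ground is the plane y = ground_y, velocity is the displacement between consecutive frames, and both parts are squared and summed with equal weight.

```python
def foot_contact_energy(prev_positions, curr_positions, in_contact, ground_y=0.0):
    """Penalise motion and height of foot joints flagged as in contact.

    prev_positions / curr_positions: 3D joint positions at frames t-1 and t.
    in_contact: per-joint booleans from the contact detection step.
    """
    energy = 0.0
    for prev, curr, contact in zip(prev_positions, curr_positions, in_contact):
        if not contact:
            continue
        # Part 1: squared frame-to-frame displacement (all three axes).
        velocity_sq = sum((c - p) ** 2 for c, p in zip(curr, prev))
        # Part 2: squared height above the ground plane (y axis).
        height_sq = (curr[1] - ground_y) ** 2
        energy += velocity_sq + height_sq
    return energy
```

Because the first part uses all three axes, it also penalizes vertical motion of a contact joint, which is the point the question in the notes is about.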

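Skeleton term

Its purpose is to keep the motion style consistent by preserving the local joint rotations, the global motion trajectory, and the motion of the end-effectors.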

preserving the local motion represented as joint rotations, the global motion represented as the root trajectory and the global motion of the end-effectors (i.e., hands and feet).

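The local rotations and root velocity are learned to agree with those of A in a weakly supervised way (I could not see where the weak supervision comes in).

The velocities of the end-effectors (hands and feet), each scaled by its character's height, are kept consistent.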
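A minimal sketch of the end-effector part of the skeleton term is given below. It assumes a plain squared difference of height-normalized velocities, summed over end-effectors; the function name and the exact form of the penalty are illustrative, not taken from the paper.

```python
def end_effector_energy(vel_source, vel_target, height_source, height_target):
    """Squared difference of height-normalised end-effector velocities.

    vel_source / vel_target: per-end-effector 3D velocities of characters
    A and B at one frame; heights are the characters' heights in the same unit.
    """
    energy = 0.0
    for va, vb in zip(vel_source, vel_target):
        energy += sum((a / height_source - b / height_target) ** 2
                      for a, b in zip(va, vb))
    return energy
```

A target character half as tall is therefore expected to move its hands and feet half as fast; moving at the source's absolute speed would be penalized.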

Geometry-Conditioned RNN and Encoder-Space Optimization

To recap the input and output of the whole task:
Input:

source skeleton motion
source skinned motion
target character represented by its skeleton
skinned geometry of the target character

Output:

skeleton motion
skinned motion with contacts preserved

For the RNN:
Input of the RNN encoder:

skeleton motion —> encoder —> motion feature

The RNN decoder takes more inputs, 6 in total:

motion feature from the encoder
local motion (the local joint coordinates of the previous frame)
target character represented by its skeleton
geometric encoding of the skinned geometry, computed with PointNet
root velocity
the decoder's own output from the previous step

output of RNN decoder:

global skeleton joint positions (I take this to be one frame of the output skeleton motion)
skinned motion with contacts preserved

The paper writes the encoding and decoding process as equations.

Encoder-Space Optimization

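Motion retargeting with the RNN alone is possible, but it neither preserves all contacts nor removes penetration.
The authors therefore first run the RNN, feed the resulting output motion into the energy function, and then refine the motion by updating the encoder's latent vector and the root velocity.
The benefits are that the update proceeds frame by frame, the result is smoother, and the optimized space is low-dimensional and well disentangled.
The update works as follows.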
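The idea of optimizing in encoder space rather than directly on the output pose can be illustrated with a toy example. Everything here is a stand-in of my own: `toy_decode` replaces the RNN decoder, `toy_energy` replaces the full energy function, the latent list `z` plays the role of the encoder's latent vector together with the root velocity, and the gradient comes from finite differences instead of backpropagation.

```python
def numerical_grad(f, z, eps=1e-5):
    """Central finite-difference gradient of a scalar function f at z."""
    grad = []
    for k in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[k] += eps
        z_minus[k] -= eps
        grad.append((f(z_plus) - f(z_minus)) / (2.0 * eps))
    return grad

def optimize_latent(decode, energy, z0, steps=200, lr=0.1):
    """Gradient descent on the latent code z so that energy(decode(z)) drops.

    Only z is updated; the decoder itself stays fixed.
    """
    z = list(z0)
    for _ in range(steps):
        grad = numerical_grad(lambda zz: energy(decode(zz)), z)
        z = [zi - lr * gi for zi, gi in zip(z, grad)]
    return z

def toy_decode(z):
    """Hypothetical fixed decoder: maps a 2D latent code to a 2D 'pose'."""
    return [2.0 * z[0], z[0] + z[1]]

def toy_energy(pose):
    """Hypothetical energy: squared distance of the pose to a contact target."""
    return (pose[0] - 1.0) ** 2 + (pose[1] - 0.0) ** 2
```

Calling `optimize_latent(toy_decode, toy_energy, [0.0, 0.0])` drives the latent code toward the point whose decoded pose satisfies the toy contact target, which mirrors how the real method changes the motion only through the decoder's inputs.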

Evaluation🧪

Dataset

Mixamo Dataset

Result

comparisons with SOTAs:

User Study:

Ablation Study:

Conclusion⭐️

Contribution

introduce a novel geometry-conditioned recurrent network with an encoder-space optimization strategy
propose the energy function to optimize the self-contact, ground contact and reduce penetration.

Rethink❓
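Highlights / ideas worth borrowing:
The optimization of self-contact, ground contact and self-penetration is tied together through the design of the energy function and the training of the RNN. Updating the encoder's latent vector and the root velocity in a weakly supervised way is enough to obtain a globally optimized motion, and the method strikes a good balance between reducing penetration and preserving contact.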

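Limitations:

Does the RNN's frame-by-frame processing limit efficiency or over-smooth the motion?
The energy function decomposes the motion in an overly complicated way, with as many as 5 optimization terms.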

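Possible extensions:

The RNN could perhaps be replaced by a TCN, a (lightweight) Transformer, or a GCN.
Could the energy function be improved into a unified paradigm for weakly supervised training of the latent vector?
Notes📝
(optional) Notes that do not fit this framework but still need to be recorded.
What questions remain open?
Track📚
(optional) List closely related papers and the blogs, notes and so on used while writing these notes, so that they can be tracked later (both earlier and later work, i.e. papers cited by this one and papers citing it).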

Ref:
[1] K. Aberman, P. Li, D. Lischinski, O. Sorkine-Hornung, D. Cohen-Or, and B. Chen, “Skeleton-aware networks for deep motion retargeting,” ACM Trans. Graph., vol. 39, no. 4, Jul. 2020, doi: 10.1145/3386569.3392462.
[2] M. Teschner, S. Kimmerle, B. Heidelberger, G. Zachmann, L. Raghupathi, A. Fuhrmann, M.-P. Cani, F. Faure, N. Magnenat-Thalmann, W. Strasser, and P. Volino, “Collision detection for deformable objects,” in Eurographics, 2004.
[3] D. Tzionas, L. Ballan, A. Srikantha, P. Aponte, M. Pollefeys, and J. Gall, “Capturing hands in action using discriminative salient points and physics simulation,” IJCV, vol. 118, no. 2, pp. 172–193, Jun. 2016.

【ICCV2021】Contact-Aware Retargeting of Skinned Motion