Module Design - Transform - AntV G2: A Grammar of Graphics and Interaction

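Transform is the data-preprocessing stage: it converts data and encoding, and it may run synchronously or asynchronously. Once preprocessing finishes, the properties each channel needs can be extracted and the data can be mapped.

In G2 5.0, transforms fall into the following categories by use case:

- Connector: obtains data, for example by fetching it from a URL. This kind of transform mainly produces the value of data.
- Layout: layout algorithms such as treemap. This kind of transform mainly generates new data and updates the encoding.
- Inference: infers encodings and statistics, for example filling in the zero baseline in the y direction for a bar chart, or inferring stack.
- Statistic: statistical transforms that filter, aggregate, and so on. They are mainly used to change a mark's position channels.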
## Interface Design
```typescript
type Primitive = number | string | Date | boolean;

type TransformContext = {
  // The original data; optional
  data?: any;
  // Encoding information
  encode: Record<string, Encode>;
  // Index array of the original data
  I: number[];
  // Extracts one column from the current data according to the given encode
  columnOf: (data: any, encode: Encode) => Primitive[];
};

type TransformProps = {
  // Distinguishes the kind of transform
  type: 'connector' | 'inference' | 'layout' | 'statistic';
  // Whether the transform is asynchronous; mainly used to decide which kind of cache to add
  async: boolean;
  // Whether caching is needed; defaults to true
  memo: boolean;
};

type TransformComponent<O> = {
  (options?: O): Transform;
  props: TransformProps;
};

type Transform = (
  context: TransformContext,
) => TransformContext | Promise<TransformContext>;
```
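
Because a transform may return either a context or a promise of one, the runtime has to chain both kinds uniformly. The following is a minimal sketch of such a chain, not the actual G2 runtime: `applyTransforms`, `loadData`, `zeroY1`, and the simplified `Context` type are hypothetical names introduced only for illustration.

```typescript
// Hypothetical, simplified context: only the fields this sketch needs.
type Context = {
  data?: unknown[];
  encode: Record<string, unknown>;
  I: number[];
};

type Transform = (context: Context) => Context | Promise<Context>;

// Run the transforms in order. `await` accepts plain values as well as
// promises, so synchronous and asynchronous transforms can be mixed.
async function applyTransforms(
  initial: Context,
  transforms: Transform[],
): Promise<Context> {
  let context = initial;
  for (const transform of transforms) {
    context = await transform(context);
  }
  return context;
}

// Stand-in for a connector: resolves the data asynchronously.
const loadData: Transform = async (context) => {
  const data = [
    { sex: 'male', weight: 70 },
    { sex: 'female', weight: 60 },
  ];
  return { ...context, data, I: data.map((_, i) => i) };
};

// Stand-in for an inference: adds a default y1 encode if none was declared.
const zeroY1: Transform = (context) => ({
  ...context,
  encode: { y1: { type: 'constant', value: 0 }, ...context.encode },
});

applyTransforms({ encode: { x: 'sex', y: 'weight' }, I: [] }, [loadData, zeroY1])
  .then((context) => console.log(context.I, Object.keys(context.encode)));
```

Awaiting every step keeps the chain simple at the cost of making the whole pipeline asynchronous, even when every transform in it is synchronous.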


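In the earlier design, Layout and Connector were together called Transform, and Transform, Inference, and Statistic were three separate parts. The reasons for merging them, or rather the advantages of doing so, are:
- Lower learning cost: users no longer need to tell Transform and Statistic apart, since both are essentially data preprocessing.
- A Layout transform can modify the encoding directly. This removes the steps of generating new fields and then assigning them to an encoding, which makes it cheaper to use.
- A Statistic can see the original data. This is very useful for transforms that involve aggregation, because the reducer gets the complete data rather than only the columns declared in the encoding.
- The rendering pipeline is simpler to understand and implement. Before: Transform -> Inference -> Encoding -> Statistic -> ...; now: Transform -> Encoding -> ...
- More flexibility: everything that was possible before is still possible, and because more information (parameters) is available, transforms are more capable.

In addition, transform functions no longer wrap themselves in a cache. Instead, the cache is added to every transform in the library before the runtime runs, so users do not need to care about caching.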
```javascript
// Before
function transform() {}
export const Sort = useMemo(transform);

// After
export function Sort() {}

function memoTransform(library) {
  return mapObject(library, (value, key) => {
    // Only wrap transforms that ask for caching
    if (key.startsWith('transform') && value.props.memo) {
      return value.props.async ? useAsyncMemo(value) : useMemo(value);
    }
    // Leave every other component untouched
    return value;
  });
}
```
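
The source does not show `useMemo` or `useAsyncMemo`. The sketch below is one possible shape for them, under two assumptions of mine: the cache key is the JSON serialization of the argument, and the async variant caches the pending promise so that concurrent calls share one request.

```typescript
// Hypothetical synchronous cache, keyed by the serialized argument.
function useMemo<A, R>(fn: (arg: A) => R): (arg: A) => R {
  const cache = new Map<string, R>();
  return (arg) => {
    const key = JSON.stringify(arg);
    if (!cache.has(key)) cache.set(key, fn(arg));
    return cache.get(key) as R;
  };
}

// Hypothetical asynchronous cache. It stores the promise itself, and evicts
// a rejected promise so that a later call can retry.
function useAsyncMemo<A, R>(
  fn: (arg: A) => Promise<R>,
): (arg: A) => Promise<R> {
  const cache = new Map<string, Promise<R>>();
  return (arg) => {
    const key = JSON.stringify(arg);
    let pending = cache.get(key);
    if (!pending) {
      pending = fn(arg).catch((error) => {
        cache.delete(key);
        throw error;
      });
      cache.set(key, pending);
    }
    return pending;
  };
}

// Usage: the wrapped function runs once per distinct argument.
let calls = 0;
const slowSquare = useMemo((options: { n: number }) => {
  calls += 1;
  return options.n * options.n;
});
slowSquare({ n: 4 });
slowSquare({ n: 4 });
console.log(calls); // 1
```

Serializing the argument is only suitable for small, plain option objects; a real implementation would likely need a different key for large data.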

The following utility functions are used in the examples below.

```javascript
// Fill in default values for source
function applyDefaults(source, defaults) {
  const target = { ...source };
  for (const [key, value] of Object.entries(defaults)) {
    target[key] = target[key] ?? value;
  }
  return target;
}

// Returns a function that merges its argument with its return value
function merge(transform) {
  return (options) => {
    const newOptions = transform(options);
    return { ...options, ...newOptions };
  };
}

// Wraps an array of values as a column
function column(value) {
  return { type: 'column', value };
}
```
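
Note that `merge` spreads the return value of its callback, so a callback that returns a promise would contribute nothing to the merged object. A variant that also accepts asynchronous callbacks could look like the sketch below; `mergeAsync` and its types are hypothetical names, not part of the original design.

```typescript
type Options = Record<string, unknown>;
type MaybeAsync<T> = T | Promise<T>;

// Like merge, but waits for the patch when the callback is asynchronous.
function mergeAsync(transform: (options: Options) => MaybeAsync<Options>) {
  return (options: Options): MaybeAsync<Options> => {
    const patch = transform(options);
    if (patch instanceof Promise) {
      return patch.then((resolved) => ({ ...options, ...resolved }));
    }
    return { ...options, ...patch };
  };
}

// A synchronous callback yields a plain object ...
const addFlag = mergeAsync(() => ({ ready: true }));
console.log(addFlag({ a: 1 }));

// ... and an asynchronous callback yields a promise of the merged object.
const addData = mergeAsync(async () => ({ data: [1, 2, 3] }));
Promise.resolve(addData({ a: 1 })).then((merged) => console.log(merged));
```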

## Connector

- Fetch

```typescript
// Fetch.ts
import { TransformComponent as TC } from 'runtime';
import { useAsyncMemoTransform } from 'utils';

export type FetchOptions = {
  url?: string;
};

const transform: TC<FetchOptions> = (options) => {
  const { url } = options;
  return merge(async () => {
    const response = await fetch(url);
    const data = await response.json();
    return { data };
  });
};

export const Fetch = useAsyncMemoTransform(transform);

Fetch.props = {
  type: 'connector',
};
```

<a name="lM1NS"></a>
## Layout
- Treemap
- Force Graph
- Sankey
- Voronoi
- Tree
```typescript
// Usage
const options = {
  type: 'polygon',
  transform: [
    { type: 'fetch', url: 'xxx' },
    { type: 'treemap' },
  ],
  encode: { color: 'name' },
};

// Treemap.ts
import * as d3 from 'd3';
import { TransformComponent as TC } from 'runtime';
import { useMemoTransform } from 'utils';

export type TreemapOptions = {};

export const Treemap: TC<TreemapOptions> = () => {
  return merge((context) => {
    const { data: treeData, encode } = context;
    // Lay out the hierarchy, then build the new data and its index array.
    // Assumes every leaf carries a numeric `value` field.
    const root = d3.hierarchy(treeData).sum((d) => d.value);
    const data = d3.treemap()(root).leaves();
    const I = d3.range(data.length);
    // Generate the new channels: the four corners of each rectangle
    const count = data.length;
    const X1 = new Array(count);
    const X2 = new Array(count);
    const X3 = new Array(count);
    const X4 = new Array(count);
    const Y1 = new Array(count);
    const Y2 = new Array(count);
    const Y3 = new Array(count);
    const Y4 = new Array(count);
    for (let i = 0; i < data.length; i++) {
      const { x0, y0, x1, y1 } = data[i];
      // Corners in order: top-left, top-right, bottom-right, bottom-left
      X1[i] = x0;
      X2[i] = x1;
      X3[i] = x1;
      X4[i] = x0;
      Y1[i] = y0;
      Y2[i] = y0;
      Y3[i] = y1;
      Y4[i] = y1;
    }
    // Update the index, data, and encoding
    return {
      I,
      data,
      encode: {
        ...encode,
        x1: column(X1),
        x2: column(X2),
        x3: column(X3),
        x4: column(X4),
        y1: column(Y1),
        y2: column(Y2),
        y3: column(Y3),
        y4: column(Y4),
      },
    };
  });
};

Treemap.props = {
  type: 'layout',
};
```

## Inference

- MaybeType: infers the kind of each encoding.
- MaybeArray: splits an array encoding into multiple channels that share one scale.
- MaybeZeroX1: infers x1 = 0.
- MaybeZeroY1: infers y1 = 0.
- MaybeZeroY2: infers y2 = 0.
- MaybeSeries: infers the series field from the color field.
- MaybeTooltip: infers the tooltip value; it currently uses y1 and position (this needs improvement).
- MaybeTitle: infers the title value; it currently uses the value of x (this also needs improvement).
- MaybeKey: generates a default key.
- MaybeStackY: infers stackY.

```typescript
import { TransformComponent as TC } from 'runtime';
import { useMemoTransform } from 'utils';

// Inference
function zero() {
  return { type: 'constant', value: 0 };
}

export type MaybeZeroX1Options = {};

export const MaybeZeroX1: TC<MaybeZeroX1Options> = () => {
  return merge(({ encode }) => ({
    encode: applyDefaults(encode, { x1: zero() }),
  }));
};

MaybeZeroX1.props = {
  type: 'inference',
};
```
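
The examples so far use three encode shapes: a field name such as 'name', a column produced by `column()`, and a constant produced by `zero()`. The interface declares `columnOf` but does not show it; the sketch below is one way it could resolve those three shapes. The `Encode` and `Datum` types are assumptions made for this sketch.

```typescript
type Primitive = number | string | Date | boolean;
type Datum = Record<string, Primitive>;

// Assumed union of the encode shapes that appear in the examples.
type Encode =
  | string
  | { type: 'column'; value: Primitive[] }
  | { type: 'constant'; value: Primitive };

function columnOf(data: Datum[], encode: Encode): Primitive[] {
  if (typeof encode === 'string') {
    // A field name: read that field from every datum.
    const field = encode;
    return data.map((d) => d[field]);
  }
  // A column already holds one value per datum.
  if (encode.type === 'column') return encode.value;
  // A constant is repeated for every datum.
  const constant = encode.value;
  return data.map(() => constant);
}

const athletes: Datum[] = [
  { sex: 'male', weight: 70 },
  { sex: 'female', weight: 60 },
];
console.log(columnOf(athletes, 'weight')); // [ 70, 60 ]
console.log(columnOf(athletes, { type: 'constant', value: 0 })); // [ 0, 0 ]
```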

<a name="AxoRe"></a>
## Statistic
Statistical transforms are the most complex kind of transform; they involve aggregating data, filtering it, and so on. They are also the most important kind of transform in G2 5.0, because they directly determine its statistical-analysis capability.<br />![image.png](https://cdn.nlark.com/yuque/0/2022/png/418707/1652672626585-94b2f6bf-0f36-4d45-a083-bfd71cd4c296.png#clientId=ue9bcffb7-53d5-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=231&id=gsMy9&margin=%5Bobject%20Object%5D&name=image.png&originHeight=818&originWidth=1326&originalType=binary&ratio=1&rotation=0&showTitle=true&size=473311&status=done&style=stroke&taskId=u4c064cf3-37a0-48be-9d28-34794b07c82&title=%E8%81%9A%E5%90%88%E5%89%8D&width=374 "Before aggregation")![image.png](https://cdn.nlark.com/yuque/0/2022/png/418707/1652678295228-dae5f83e-f622-4161-8642-cf9982574c88.png#clientId=u0a139caf-0be1-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=231&id=sUVKo&margin=%5Bobject%20Object%5D&name=image.png&originHeight=822&originWidth=1254&originalType=binary&ratio=1&rotation=0&showTitle=true&size=226322&status=done&style=stroke&taskId=u10798c56-7c14-4743-ba95-35d8a990170&title=%E8%81%9A%E5%90%88%E5%90%8E&width=352 "After aggregation")
<a name="h8Bdo"></a>
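### Related Work
Most charting libraries preprocess the data before encoding: they generate new fields and then use those fields in the encoding. Vega Lite and Plot declare things differently, in a way that is much closer to the grammar of graphics.
**Vega Lite**
- Characteristic: encoding and statistic are declared together.
- Pros: concise.
- Cons: the encoding configuration is relatively complex.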
```javascript
// Vega Lite
const vegaLite = {
  data: { value: athletes },
  mark: 'bar',
  encoding: {
    x: { field: 'sex', group: true },
    y: { aggregate: 'mean', field: 'weight' },
  },
};
```

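The compiled Vega spec shows that Vega Lite implements statistical transforms by having a transform generate new fields, which then take part in the encoding declaration. The trade-offs of this approach are:

- Pros: simple to implement. Because the transform and encoding logic are decoupled, the transform module only has to deal with transforming the data itself.
- Cons:
   - Field conflicts: a generated field may collide with an existing field (although this is unlikely).
   - An extra mechanism is needed to map a statistic onto a transform plus an encoding.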

```javascript
const vega = {
  data: {
    value: athletes,
    transform: [{
      type: 'aggregate',
      groupby: ['sex'],
      fields: ['weight'],
      ops: ['mean'],
      // Generates an intermediate field
      as: ['__y'],
    }],
  },
  marks: [{
    type: 'bar',
    encoding: {
      x: { field: 'sex' },
      // Uses the intermediate field, which the user never sees
      y: { field: '__y' },
    },
  }],
};
```
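
To make the intermediate-field approach concrete, the sketch below performs the same kind of aggregation on rows: it groups by one field, averages another, and writes the result to a generated field. It is an illustration only, not Vega's implementation; `aggregateMean` and the sample rows are hypothetical.

```typescript
type Row = Record<string, string | number>;

// Group rows by `groupby`, average `field`, and store the mean under `output`.
function aggregateMean(
  rows: Row[],
  groupby: string,
  field: string,
  output: string,
): Row[] {
  const sums = new Map<string | number, { sum: number; count: number }>();
  for (const row of rows) {
    const key = row[groupby];
    const entry = sums.get(key) ?? { sum: 0, count: 0 };
    entry.sum += Number(row[field]);
    entry.count += 1;
    sums.set(key, entry);
  }
  // Emit one row per group; the encoding then refers to the `output` field.
  const result: Row[] = [];
  sums.forEach((entry, key) => {
    result.push({ [groupby]: key, [output]: entry.sum / entry.count });
  });
  return result;
}

const sample: Row[] = [
  { sex: 'male', weight: 80 },
  { sex: 'male', weight: 70 },
  { sex: 'female', weight: 60 },
];
console.log(aggregateMean(sample, 'sex', 'weight', '__y'));
// [ { sex: 'male', __y: 75 }, { sex: 'female', __y: 60 } ]
```

If the rows already contained a field named `__y`, the generated field would silently replace it in the output, which is the field-conflict risk described above.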

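**Plot**
- Characteristic: nested statistics. The encoding serves as the initial value and is converted by a series of statistics into the values finally needed.
- Pros: encoding and statistic are declared separately, which is somewhat easier to understand.
- Cons: multiple statistics have to be nested, which hurts readability and makes changes harder.
```javascript
// Plot
Plot.barY(
  athletes,
  Plot.groupX({ y: 'mean' }, { x: 'sex', y: 'weight', fill: 'sex' }),
).plot();

// Nested statistics
// normalizeY + groupX
Plot.barY(
  athletes,
  Plot.normalizeY(
    Plot.groupX({ y: 'mean' }, { x: 'sex', y: 'weight', fill: 'sex' }),
  ),
).plot();
```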

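Plot implements statistics differently from Vega-Lite: there is no intermediate field, and the encoding is modified directly. The trade-offs of this approach are:
- Pros: no field conflicts.
- Cons: more complex to implement. The complexity comes mainly from the following:
   - Data and encoding are transformed at the same time.
   - Plot's API design means that data is not available when the transform is declared, so its value has to be obtained lazily through a lazy channel.
   - Unlike conventional data processing, this operates directly on columns of data rather than on rows.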
```javascript
// The stack transform as an example
function stack(x, y = one, ky, {offset, order, reverse}, options) {
  const z = maybeZ(options);
  const [X, setX] = maybeColumn(x);
  const [Y1, setY1] = column(y);
  const [Y2, setY2] = column(y);
  offset = maybeOffset(offset);
  order = maybeOrder(order, offset, ky);
  return [
    basic(options, (data, facets) => {
      const X = x == null ? undefined : setX(valueof(data, x));
      const Y = valueof(data, y, Float64Array);
      const Z = valueof(data, z);
      const O = order && order(data, X, Y, Z);
      const n = data.length;
      const Y1 = setY1(new Float64Array(n));
      const Y2 = setY2(new Float64Array(n));
      const facetstacks = [];
      for (const facet of facets) {
        const stacks = X ? Array.from(group(facet, i => X[i]).values()) : [facet];
        if (O) applyOrder(stacks, O);
        for (const stack of stacks) {
          let yn = 0, yp = 0;
          if (reverse) stack.reverse();
          for (const i of stack) {
            const y = Y[i];
            if (y < 0) yn = Y2[i] = (Y1[i] = yn) + y;
            else if (y > 0) yp = Y2[i] = (Y1[i] = yp) + y;
            else Y2[i] = Y1[i] = yp; // NaN or zero
          }
        }
        facetstacks.push(stacks);
      }
      if (offset) offset(facetstacks, Y1, Y2, Z);
      return {data, facets};
    }),
    X,
    Y1,
    Y2
  ];
}
```

### Design

Characteristic: encode and statistic are declared separately, and an array replaces the nested form.

```javascript
// G2 5.0
const g2 = {
  data: athletes,
  type: 'interval',
  transform: [
    { type: 'groupX', y: 'sum' },
  ],
  encode: { x: 'sex', y: 'weight', fill: 'sex' },
};
```

```javascript
import { group } from 'd3-array';

function StackY() {
  return merge((context) => {
    const { data, I, columnOf, encode } = context;
    const { x, y } = encode;
    // Extract these two columns from the original data
    const X = columnOf(data, x);
    const Y = columnOf(data, y);
    // Group the indices by the x channel
    const groups = Array.from(group(I, (i) => X[i]).values());
    // Stack the y channel of the marks within each group
    const newY = new Array(I.length);
    const newY1 = new Array(I.length);
    for (const G of groups) {
      for (let py = 0, i = 0; i < G.length; i += 1) {
        const index = G[i];
        newY1[index] = py;
        newY[index] = py + Y[index];
        py = newY[index];
      }
    }
    return {
      // Update encode
      encode: {
        ...encode,
        x: column(X),
        y: column(newY),
        y1: column(newY1),
      },
    };
  });
}
```
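
The groupX transform used in the declaration above can be written in the same column-oriented style as StackY. The sketch below is a simplified, standalone version under my own assumptions: it takes the index array and the two columns directly instead of a context, and it supports only the two reducers that appear in this document, 'sum' and 'mean'.

```typescript
type Reducer = 'sum' | 'mean';

const reducers: Record<Reducer, (values: number[]) => number> = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  mean: (values) => values.reduce((a, b) => a + b, 0) / values.length,
};

// Group the indices by X, then reduce the Y values of each group to one value.
function groupX(I: number[], X: string[], Y: number[], reducer: Reducer) {
  const groups = new Map<string, number[]>();
  for (const i of I) {
    const members = groups.get(X[i]) ?? [];
    members.push(i);
    groups.set(X[i], members);
  }
  const newX: string[] = [];
  const newY: number[] = [];
  groups.forEach((members, key) => {
    newX.push(key);
    newY.push(reducers[reducer](members.map((i) => Y[i])));
  });
  // One mark per group, so the index array shrinks as well.
  return { I: newX.map((_, i) => i), X: newX, Y: newY };
}

const grouped = groupX(
  [0, 1, 2],
  ['male', 'male', 'female'],
  [80, 70, 60],
  'sum',
);
console.log(grouped);
// { I: [ 0, 1 ], X: [ 'male', 'female' ], Y: [ 150, 60 ] }
```

Unlike StackY, which keeps one mark per datum, an aggregating transform changes the number of marks, which is why it returns a new index array along with the new columns.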

### Types

- dodgeX
- stackY
- groupX
- binX
- binY
- bin
- selectFirst
- selectLast
- selectMax
- selectMin
- summary
## Case Studies
### Treemap
```javascript
// https://g2.antv.vision/zh/examples/relation/relation#treemap

// axisX and axisY are hidden by default
// The x and y scales are set to identity by default
const config = {
  type: 'polygon',
  data,
  transform: [{ type: 'treemap' }],
  encode: {
    color: 'name',
    label: 'name',
  },
};
```

<a name="ZjgfU"></a>
### Voronoi Diagram
![image.png](https://cdn.nlark.com/yuque/0/2022/png/418707/1654502978747-d91240a3-ef96-492f-9cb5-e8daa2fb6334.png#clientId=uba5cc825-7196-4&crop=0&crop=0&crop=1&crop=1&from=paste&height=356&id=u828b06a7&margin=%5Bobject%20Object%5D&name=image.png&originHeight=712&originWidth=1284&originalType=binary&ratio=1&rotation=0&showTitle=false&size=727074&status=done&style=stroke&taskId=ub2b6f2e2-afb1-4ac6-9d07-915b78c20f1&title=&width=642)
```javascript
// https://g2.antv.vision/zh/examples/relation/relation#voronoi
const config = {
  type: 'polygon',
  data,
  transform: [{type: 'voronoi'}],
  encode: {
    color: 'value',
    label: 'value'
  },
};
```

### Sankey Diagram

```javascript
const config = {
  type: 'view',
  data,
  children: [
    {
      type: 'polygon',
      transform: [{ type: 'sankey.nodes' }],
      encode: {
        color: 'name',
      },
    },
    {
      type: 'polygon',
      transform: [{ type: 'sankey.links' }],
      encode: {
        shape: 'arc',
        color: 'name',
      },
    },
  ],
};
```

### Force-Directed Graph

```javascript
const config = {
  type: 'view',
  data,
  children: [
    {
      type: 'point',
      transform: [{ type: 'force.nodes' }],
      encode: {
        color: 'name',
      },
    },
    {
      type: 'edge',
      transform: [{ type: 'force.links' }],
    },
  ],
};
```
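
In each case above, a transform is referenced by a `type` string such as 'treemap' or 'force.nodes', and the remaining properties act as its options. The sketch below shows one way such a declaration could be resolved against a library of transform components. The `transform.` key prefix follows the `key.startsWith('transform')` check in `memoTransform`, but the exact key format, `resolveTransform`, and the simplified types are assumptions.

```typescript
type Context = { data?: unknown; encode: Record<string, unknown>; I: number[] };
type Transform = (context: Context) => Context;
type TransformComponent = (options: Record<string, unknown>) => Transform;
type Library = { [key: string]: TransformComponent | undefined };

// Look up the component named by `type` and instantiate it with the
// remaining properties of the declaration as options.
function resolveTransform(
  library: Library,
  spec: { type: string } & Record<string, unknown>,
): Transform {
  const { type, ...options } = spec;
  const component = library[`transform.${type}`];
  if (!component) throw new Error(`Unknown transform: ${type}`);
  return component(options);
}

const library: Library = {
  // Hypothetical stand-in that only records which layout ran.
  'transform.treemap': () => (context) => ({
    ...context,
    encode: { ...context.encode, layout: 'treemap' },
  }),
};

const treemap = resolveTransform(library, { type: 'treemap' });
console.log(treemap({ encode: { color: 'name' }, I: [] }).encode);
// { color: 'name', layout: 'treemap' }
```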
