Projects | Tools - Batch-cloning GitLab repositories: optimizations & pitfalls - 《become developer - 开发成长之旅》

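0. Preface
1. Getting the addresses of the GitLab repositories to clone
- 1.1 GitLab Access Token
- 1.2 GitLab open API
2. clone with oauth2
3. Running shell scripts in node
4. Optimization - an async queue for a large number of projects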

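0. Preface

When you are handed a requirement like "batch-clone specific GitLab projects into a given directory", does it strike you as complicated?
Not really. Broken down, it takes only two steps: get the addresses, then clone.
In practice, though, I hit a few pitfalls and made some extra optimizations along the way. I am recording them here in the hope that they are useful to you.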

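1. Getting the addresses of the GitLab repositories to clone

1.1 GitLab Access Token

A self-hosted GitLab instance usually enforces permissions, and cloning is no exception. To clone projects you need an Access Token. Getting one is simple: avatar in the top-right corner -> Settings -> Access Tokens in the left sidebar, then follow the instructions on the page to generate one.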

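1.2 GitLab open API

Think about what an everyday git clone command needs: the clone URL of the project. The most direct way to obtain those URLs is GitLab's open API.
Here is the link: https://docs.gitlab.com/ee/api/projects.html .
A quick summary of the points we need:

- The base URL for requests is https://gitlab.example.com/api/v4 (replace example with your own domain).
- The endpoint that lists every project you have access to is /projects. Note that it also returns every project whose visibility level is public.
- The endpoint that lists every project in a specific group is /groups/${groupId}/projects. To avoid pulling in public projects, this is the one used here.
- The endpoint requires authentication, so the Access Token goes into the request headers: the key is PRIVATE-TOKEN and the value is the token obtained in 1.1.
- The endpoint is paginated, so we must supply a page number (page) and a page size (per_page). To keep the number of requests down, per_page is set to its maximum of 100.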

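Putting this together, you can try the request with Postman or a similar tool:

curl --location --request GET 'https://gitlab.example.com/api/v4/groups/${groupId}/projects?page=1&per_page=100' \
--header 'PRIVATE-TOKEN: <your generated Access Token>'

Of the fields returned, name and http_url_to_repo are the two we need.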
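The pagination described above can be sketched in node as follows. This is only a minimal sketch: the function name fetchGroupProjects and its parameters are made up for illustration, it assumes the global fetch of Node 18+, and it assumes a page with fewer than 100 entries is the last page.

```javascript
// Hypothetical helper: collect { name, http_url_to_repo } for every project in a group.
// `fetchImpl` is injectable so the function can be tried without a network;
// it defaults to the global fetch available in Node 18+.
async function fetchGroupProjects({ baseUrl, groupId, token, fetchImpl = fetch }) {
  const perPage = 100; // the maximum page size mentioned above
  const projects = [];
  for (let page = 1; ; page++) {
    const url = `${baseUrl}/api/v4/groups/${encodeURIComponent(groupId)}/projects`
      + `?page=${page}&per_page=${perPage}`;
    const res = await fetchImpl(url, { headers: { 'PRIVATE-TOKEN': token } });
    if (!res.ok) {
      throw new Error(`GitLab API responded with HTTP ${res.status}`);
    }
    const batch = await res.json();
    // Keep only the two fields the later steps need.
    for (const { name, http_url_to_repo } of batch) {
      projects.push({ name, http_url_to_repo });
    }
    // Assumption: a page shorter than per_page is the last one.
    if (batch.length < perPage) break;
  }
  return projects;
}
```

It would be called as fetchGroupProjects({ baseUrl: 'https://gitlab.example.com', groupId, token: gitlabToken }), and the resulting array has the shape used in the next section.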

2. clone with oauth2

After the first step we have an array that looks like this:

[
  { name: 'a', http_url_to_repo: '' },
  { name: 'b', http_url_to_repo: '' },
]

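When cloning by hand, you can simply run git clone and type your credentials into the prompt. An automated process cannot be blocked like that, so this is where a special URL form comes in ↓

git clone https://oauth2:{gitlabToken}@{repository URL without the leading http:// or https://}

// e.g.
function clearHttpHead(url) {
  // Strip the leading http:// or https://
  return url.replace(/^https?:\/\//, '');
}
const cloneCommand = `git clone https://oauth2:${gitlabToken}@${clearHttpHead(item.http_url_to_repo)}`;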
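As an alternative to string replacement, the same URL can be assembled with node's built-in URL class. This is just a sketch, and buildCloneUrl is a hypothetical name. Unlike the command above, it keeps whatever scheme the original URL has instead of always using https, and it percent-encodes special characters in the token.

```javascript
// Hypothetical helper: embed oauth2 credentials into a repository URL.
function buildCloneUrl(repoUrl, token) {
  const url = new URL(repoUrl);
  url.username = 'oauth2';
  // The URL setter percent-encodes characters such as '@' or ':' in the token.
  url.password = token;
  return url.toString();
}

// Example: the command string for one project from the array above.
const exampleCommand = `git clone ${buildCloneUrl('https://gitlab.example.com/group/a.git', 'TOKEN')}`;
```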

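3. Running shell scripts in node

With the command assembled, the next question is how to run it from node.
In short, node provides child_process for running shell commands, so in theory you can of course use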

const child = require('child_process')
const { promisify } = require('util'); // turns callback-style functions into promise-returning ones
await promisify(child.exec)(`git clone https://oauth2:${gitlabToken}@${clearHttpHead(item.http_url_to_repo)}`)

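to run the git clone command. For all its merits, though, child.exec fails outright as soon as the command errors, which is inconvenient in a batch run. I therefore suggest the third-party library shelljs for this step, that is:

const shell = require('shelljs');
await promisify(shell.exec)(`git clone https://oauth2:${gitlabToken}@${clearHttpHead(item.http_url_to_repo)}`);

You can think of shelljs as providing an isolated terminal environment: whatever goes wrong inside it, shell.exec itself does not throw, and the outer code receives an exit code and ordinary output rather than an exception. One caveat: its callback signature is (code, stdout, stderr), so the promisified form treats a non-zero exit code as a rejection, which is why the task in section 4 wraps the call in try/catch.
For a batch of only a few projects, this is already enough.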
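The "failures come back as ordinary results" behaviour can also be sketched with child_process alone, without a third-party library. This is an illustrative alternative rather than what the script above uses, and the name run is hypothetical.

```javascript
const { exec } = require('child_process');

// Hypothetical helper: run a shell command and always resolve.
// Failure is reported through `code` instead of a rejected promise.
function run(command) {
  return new Promise((resolve) => {
    exec(command, (error, stdout, stderr) => {
      let code = 0;
      if (error) {
        // error.code is the exit status when the command exits non-zero;
        // fall back to 1 for other failures (e.g. killed by a signal).
        code = typeof error.code === 'number' ? error.code : 1;
      }
      resolve({ code, stdout, stderr });
    });
  });
}
```

A caller then checks the result instead of catching, for example: const { code } = await run(cloneCommand); if (code !== 0) { /* record the failure */ }.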

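4. Optimization - an async queue for a large number of projects

As you can see above, I deliberately used await so that the asynchronous shell.exec calls run one after another. The reason is that if more than 10 projects (5 on the server) start downloading at the same time, node reports an error to the effect of "the number of concurrent processes exceeds the maximum limit". Running every download sequentially is the simplest approach, but also the least efficient.
To solve this and let the script run as efficiently as the limit allows, we need an async queue: start n asynchronous tasks at once, and whenever any of them finishes, take one of the remaining tasks and run it, until all tasks are done.
The idea is simple. Start with a Promise.race, factor out the method that creates a Promise, and keep the raw data as the list of remaining tasks. Whenever a promise finishes, take the first remaining task and create its promise. Repeat until no tasks remain, then switch to Promise.all and wait for everything to finish.
Without further ado, here is the implementation: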

/**
 * Declaration example
 */
class Race {
  constructor(data) {
    // rest: the full queue of data items
    // queueLength: the `n` in "run n tasks at the same time"
    // method: the function that creates a promise for one item
    const { rest, queueLength, method } = data;
    this.rest = rest || [];
    this.queueLength = queueLength || 2;
    this.queue = [];
    this.promiseMethods = method || (() => {
    });
    return new Promise((resolve) => {
      this.startRace(resolve);
    });
  }
  race(resolve) {
    if (this.rest.length > 0) {
      Promise.race(this.queue).then((res) => {
        this.queue.splice(res, 1, this.promiseMethods(this.rest[0], res));
        this.rest.shift();
        console.log(`${this.rest.length} projects remaining`);
        this.race(resolve);
      }, (e) => {
        console.log('race error!');
        console.log(e);
      });
    } else {
      Promise.all(this.queue).then((res) => {
        resolve(res);
      }, (e) => {
        console.log('promise.all error!');
        console.log(e);
      });
    }
  }
  startRace(resolve) {
    for (let i = 0; i < this.queueLength; i++) {
      this.queue.push(this.promiseMethods(this.rest[i], this.queue.length));
    }
    this.rest.splice(0, this.queueLength);
    this.race(resolve);
  }
}
module.exports = Race;

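/**
 * Usage example
 * Assumes gitlabToken, cloneToPath, projects and clearHttpHead are defined
 * as in the earlier sections, and that Race is the class declared above.
 */
const path = require('path');
const { promisify } = require('util');
const chalk = require('chalk'); // chalk 4.x; chalk 5 is ESM-only
const shell = require('shelljs');

const successProject = []; // names of projects cloned successfully
const failProjects = []; // names of projects that failed to clone

/**
 * Create the Promise for one asynchronous clone task
 * @param {object} item
 * @param {number} index
 */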
const getTask = async (item, index) => {
  if (item && item.http_url_to_repo) {
    console.log(chalk.green(`${item.name} start`));
    try {
      await promisify(shell.exec)(`git clone https://oauth2:${gitlabToken}@${clearHttpHead(item.http_url_to_repo)} ${path.join(cloneToPath, item.name)}`);
      successProject.push(item.name);
      console.log(chalk.green(`${item.name} over`));
    } catch (e) {
      console.log(chalk.red(`${item.name} fail`));
      console.error(e);
      failProjects.push(item.name);
    }
    return index;
  }
  return Promise.resolve(index);
};
new Race({
    rest: [...projects],
    queueLength: 5,
    method: getTask,
  }).then(() => {
    console.log();
    console.log(chalk.green('success finished!!'));
    console.log();
    console.log(chalk.green('success projects:'));
    console.log(successProject.join(',') || 'no Success project');
    console.log();
    console.log(chalk.red('failed projects:'));
    console.log(failProjects.join(',') || 'no Fail project');
  });

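Of course, because Promise.all is used, every Promise has to resolve. That is another reason for using shelljs above, and it is also why getTask catches failures and still returns its index.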
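The same "n tasks at a time" scheduling can also be sketched as a small worker pool, without Promise.race. This is an alternative illustration under my own assumptions, not the Race class above: runWithLimit is a hypothetical name, and each task's outcome is recorded as { ok, value } or { ok, error } so that a failing task never rejects the whole run.

```javascript
// Hypothetical helper: run `task(item, index)` for every item,
// with at most `limit` tasks in flight at any moment.
async function runWithLimit(items, limit, task) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each worker keeps pulling the next remaining item until none are left.
  async function worker() {
    while (next < items.length) {
      const index = next++;
      try {
        results[index] = { ok: true, value: await task(items[index], index) };
      } catch (error) {
        // Record the failure so one bad task cannot abort the batch.
        results[index] = { ok: false, error };
      }
    }
  }

  // Start at most `limit` workers, and never more than there are items.
  const workerCount = Math.min(limit, items.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

With the names from the usage example, it could be called as runWithLimit(projects, 5, getTask). Because results are stored by index, the output order matches the input order regardless of which clone finishes first.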