Google Compute Engine Setup

This documentation provides instructions on how to set up Flink fully automatically with Hadoop 1 or Hadoop 2 on top of a Google Compute Engine cluster. This is made possible by Google’s bdutil, which starts a cluster and deploys Flink with Hadoop. To get started, just follow the steps below.

Prerequisites

Install Google Cloud SDK

Please follow the instructions on how to set up the Google Cloud SDK. In particular, make sure to authenticate with Google Cloud using the following command:

  gcloud auth login
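
Optionally, you can also set a default project for the Cloud SDK so that subsequent gcloud commands use it. This is only a convenience sketch, not a required step; the placeholder is the same project ID you will later put into the bdutil configuration:

  gcloud config set project <compute_engine_project_name>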

Install bdutil

At the moment, there is no bdutil release yet which includes the Flink extension. However, you can get the latest version of bdutil with Flink support from GitHub:

  git clone https://github.com/GoogleCloudPlatform/bdutil.git

After you have downloaded the source, change into the newly created bdutil directory and continue with the next steps.
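
For example, assuming git placed the clone in your current working directory:

  cd bdutil                # directory created by the git clone above
  ls extensions/flink/     # contains flink_env.sh, used in the steps below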

Deploying Flink on Google Compute Engine

Set up a bucket

If you have not done so, create a bucket for the bdutil config and staging files. A new bucket can be created with gsutil:

  gsutil mb gs://<bucket_name>
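
You can optionally verify that the bucket was created by listing it with gsutil; this is only a quick sanity check, not a required step:

  gsutil ls gs://<bucket_name>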

Adapt the bdutil config

To deploy Flink with bdutil, adapt at least the following variables in bdutil_env.sh:

  CONFIGBUCKET="<bucket_name>"
  PROJECT="<compute_engine_project_name>"
  NUM_WORKERS=<number_of_workers>
  # set this to 'n1-standard-2' if you're using the free trial
  GCE_MACHINE_TYPE="<gce_machine_type>"
  # for example: "europe-west1-d"
  GCE_ZONE="<gce_zone>"
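
As an illustration only, a filled-in configuration might look like the following; the bucket name, project ID, and worker count are placeholders you must replace with your own values, and the machine type and zone simply repeat the examples from the comments above:

  CONFIGBUCKET="my-flink-bucket"      # placeholder bucket name
  PROJECT="my-gcp-project"            # placeholder project ID
  NUM_WORKERS=2                       # placeholder worker count
  GCE_MACHINE_TYPE="n1-standard-2"    # suggested for the free trial, see above
  GCE_ZONE="europe-west1-d"           # example zone from above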

Adapt the Flink config

bdutil’s Flink extension handles the configuration for you. You may additionally adjust configuration variables in extensions/flink/flink_env.sh. If you want to make further configuration changes, please take a look at the Flink configuration documentation. After changing the configuration, you will have to restart Flink using bin/stop-cluster and bin/start-cluster.
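
A minimal sketch of restarting Flink after a configuration change, run from within the cluster (for example via ./bdutil shell); the installation path matches the example job below, and the .sh suffix on the scripts is an assumption that may differ between Flink versions:

  cd /home/hadoop/flink-install/bin
  ./stop-cluster.sh     # stop the running Flink cluster
  ./start-cluster.sh    # start it again so the new configuration takes effect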

Bring up a cluster with Flink

To bring up the Flink cluster on Google Compute Engine, execute:

  ./bdutil -e extensions/flink/flink_env.sh deploy

Run a Flink example job:

  ./bdutil shell
  cd /home/hadoop/flink-install/bin
  ./flink run ../examples/batch/WordCount.jar gs://dataflow-samples/shakespeare/othello.txt gs://<bucket_name>/output
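
Once the job has finished, you can inspect the result directly in your bucket with gsutil; this is only a quick check, and the exact file layout under the output path may vary:

  gsutil ls gs://<bucket_name>/output
  # print an individual output file with: gsutil cat <object listed above>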

Shut down your cluster

Shutting down a cluster is as simple as executing:

  ./bdutil -e extensions/flink/flink_env.sh delete