    🔗 Original link: https://airflow.apache.org/docs/apache-airflow/stable/start/local.html

    This quick start guide will help you bootstrap an Airflow standalone instance on your local machine.

    :::info Note:
    In November 2020, a new version of pip (20.3) was released with a new 2020 resolver. This resolver might work with Apache Airflow as of 20.3.3, but it might lead to errors in installation, depending on your choice of extras. In order to install Airflow you might need to either downgrade pip to version 20.2.4 (`pip install --upgrade pip==20.2.4`) or, if you use pip 20.3, add the option `--use-deprecated legacy-resolver` to your `pip install` command.

    While pip 20.3.3 solved most of the teething problems of 20.3, this note will remain here until we set pip 20.3 as the official version in our CI pipeline, where we also test the installation. Due to those constraints, only pip installation is currently officially supported.

    While there have been some successes with using other tools like poetry or pip-tools, they do not share the same workflow as pip, especially when it comes to constraint vs. requirements management. Installing via Poetry or pip-tools is not currently supported.

    If you wish to install Airflow using those tools you should use the constraint files and convert them to the appropriate format and workflow that your tool requires. :::
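    For example, a minimal sketch of the two pip workarounds described above (assuming Airflow 2.1.2 on Python 3.6, the versions used later in this guide):

    # Option 1: downgrade pip to the last release before the 2020 resolver
    $ pip install --upgrade pip==20.2.4

    # Option 2: keep pip 20.3+ but fall back to the legacy resolver
    $ pip install "apache-airflow==2.1.2" \
        --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.2/constraints-3.6.txt" \
        --use-deprecated legacy-resolver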

    The installation of Airflow is painless if you follow the instructions below. Airflow uses constraint files to enable reproducible installation, so using pip and constraint files is recommended.

    # airflow needs a home, ~/airflow is the default,
    # but you can lay foundation somewhere else if you prefer
    # (optional)
    $ export AIRFLOW_HOME=~/airflow
    $ AIRFLOW_VERSION=2.1.2
    # Python version of your environment, for example: 3.6
    $ PYTHON_VERSION=3.6
    $ CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
    # For example: https://raw.githubusercontent.com/apache/airflow/constraints-2.1.2/constraints-3.6.txt
    $ pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
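    If the install succeeded, a quick sanity check (the version printed should match AIRFLOW_VERSION pinned above):

    # verify the installation
    $ airflow version
    2.1.2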

    If you use MySQL as the metadata database, Airflow relies on MySQLdb, so the mysqlclient library needs to be installed:

    $ conda install -c conda-forge mysqlclient
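    If you are not using conda, a pip-based sketch of the same step (assuming the AIRFLOW_VERSION and CONSTRAINT_URL variables set above; the mysql extra pulls in the MySQL support including mysqlclient):

    # alternative: install the MySQL extra with pip, reusing the constraint file
    $ pip install "apache-airflow[mysql]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"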

    Next, initialize the Airflow database:

    # initialize the database
    $ airflow db init

    This step commonly runs into a few problems:
    ❌ First: the error message Can't connect to local MySQL server through socket '/tmp/mysql.sock'
    ✔️ Solution: fix the database connection string in airflow.cfg

    sql_alchemy_conn = mysql://root:root@localhost:3306/airflow?charset=utf8&unix_socket=/var/run/mysqld/mysqld.sock
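    The file lives at $AIRFLOW_HOME/airflow.cfg. After editing it, you can confirm connectivity before re-running the init (a sketch; the socket path and credentials above must match your MySQL setup):

    # verify Airflow can reach the database, then re-run the init
    $ airflow db check
    $ airflow db init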

    ❌ Second: the error message Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
    ✔️ Solution: connect to MySQL and turn on the global variable explicit_defaults_for_timestamp

    mysql> SHOW GLOBAL VARIABLES LIKE '%timestamp%';
    +---------------------------------+-------+
    | Variable_name                   | Value |
    +---------------------------------+-------+
    | explicit_defaults_for_timestamp | OFF   |
    | log_timestamps                  | UTC   |
    +---------------------------------+-------+
    2 rows in set (0.00 sec)
    mysql> SET GLOBAL explicit_defaults_for_timestamp=1;
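    Note that SET GLOBAL only lasts until the MySQL server restarts. To make the setting permanent, a sketch for a Debian/Ubuntu-style layout (the config drop-in path and service name are assumptions for your distribution):

    # persist the setting in a MySQL config drop-in and restart the server
    $ printf '[mysqld]\nexplicit_defaults_for_timestamp = 1\n' | sudo tee /etc/mysql/conf.d/airflow.cnf
    $ sudo systemctl restart mysql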

    ❌ Third: the error message sqlalchemy.exc.OperationalError: (MySQLdb._exceptions.OperationalError) (1071, 'Specified key was too long; max key length is 767 bytes')
    ✔️ Solution: connect to MySQL and set the character set of the airflow database to UTF8.

    mysql> ALTER DATABASE `airflow` CHARACTER SET utf8;
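    If you are creating the metadata database from scratch, you can avoid this error by choosing the character set up front (a sketch; the root credentials are placeholders for your own):

    # create the airflow database with a 3-byte utf8 charset from the start
    $ mysql -u root -p -e "CREATE DATABASE airflow CHARACTER SET utf8 COLLATE utf8_general_ci;"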

    Start Airflow:

    $ airflow users create --username admin --firstname Mingmin --lastname Yu --role Admin --email yu_mingm623@163.com
    # start the web server, default port is 8080
    $ airflow webserver --port 8080
    # start the scheduler
    # open a new terminal or else run webserver with ``-D`` option to run it as a daemon
    $ airflow scheduler
    # visit localhost:8080 in the browser and use the admin account you just
    # created to login. Enable the example_bash_operator dag in the home page
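    As the comment above mentions, both processes can also be run in the background with the daemon flag instead of occupying two terminals (a sketch; the PID files end up under $AIRFLOW_HOME):

    # run both services as daemons
    $ airflow webserver --port 8080 -D
    $ airflow scheduler -D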

    Upon running these commands, Airflow will create the $AIRFLOW_HOME folder and create the "airflow.cfg" file with defaults that will get you going fast. You can inspect the file either in $AIRFLOW_HOME/airflow.cfg, or through the UI in the Admin->Configuration menu. The PID file for the webserver will be stored in $AIRFLOW_HOME/airflow-webserver.pid or in /run/airflow/webserver.pid if started by systemd.
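    After the first run, the home directory typically looks something like this (the exact file names vary slightly between versions, and airflow.db only appears with the default SQLite backend):

    $ ls "$AIRFLOW_HOME"
    airflow.cfg  airflow.db  airflow-webserver.pid  logs  webserver_config.py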

    Out of the box, Airflow uses a SQLite database, which you should outgrow fairly quickly since no parallelization is possible using this database backend. It works in conjunction with the [SequentialExecutor](https://airflow.apache.org/docs/apache-airflow/stable/_api/airflow/executors/sequential_executor/index.html#airflow.executors.sequential_executor.SequentialExecutor) which will only run task instances sequentially. While this is very limiting, it allows you to get up and running quickly and take a tour of the UI and the command line utilities.
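    You can confirm which executor and database backend are actually in use with the config CLI (a sketch; the output below shows the SQLite defaults, and with the MySQL connection configured earlier sql_alchemy_conn would show your mysql:// URI instead):

    # check the executor and database backend currently configured
    $ airflow config get-value core executor
    SequentialExecutor
    $ airflow config get-value core sql_alchemy_conn
    sqlite:////home/<user>/airflow/airflow.db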

    Here are a few commands that will trigger a few task instances. You should be able to see the status of the jobs change in the example_bash_operator DAG as you run the commands below.

    # run your first task instance
    $ airflow tasks run example_bash_operator runme_0 2015-01-01
    # run a backfill over 2 days
    $ airflow dags backfill example_bash_operator \
        --start-date 2015-01-01 \
        --end-date 2015-01-02
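    To follow what the backfill did from the command line instead of the UI, a couple of read-only commands (a sketch; the date matches the backfill window above):

    # check the state of a single task instance
    $ airflow tasks state example_bash_operator runme_0 2015-01-01
    # list the DAG runs created by the backfill
    $ airflow dags list-runs -d example_bash_operator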