Installing and Using DolphinScheduler

Author: 呦丶耍脾气 | Published: 2025-05-11 18:51

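What Is DolphinScheduler

Apache DolphinScheduler is a distributed, decentralized, easily extensible visual DAG workflow scheduling system. It aims to untangle the complex dependencies in data processing pipelines so that the scheduler works out of the box.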

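Key Features

  1. Visual DAG (directed acyclic graph): DolphinScheduler links tasks by their dependencies in a DAG and lets you monitor task status visually in real time.
  2. Rich task types: supports Shell, MR, Spark, SQL (mysql, postgresql, hive, sparksql), Python, Sub_Process, Procedure, and more.
  3. High reliability and scalability: the Master and Worker clusters are decentralized and coordinated through ZooKeeper, with cluster HA (high availability) supported.
  4. Real-time monitoring and failure handling: workflows can be scheduled by timer, by dependency, or manually, and can be paused, stopped, and resumed by hand; failure retry, recovery from a specified node, and killing tasks are also supported.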

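Components

DolphinScheduler's architecture consists mainly of the following parts:

  • MasterServer: splits the DAG into tasks, submits and monitors them, and watches the health of the other MasterServers and WorkerServers.
  • WorkerServer: executes tasks and provides the log service.
  • ZooKeeper: cluster management and fault tolerance.
  • AlertServer: alerting services.
  • API layer: handles requests from the front-end UI.
  • UI: the visual interface for operating the system.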

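Installation

Version: DolphinScheduler 3.2.2 (official Chinese documentation)
DolphinScheduler supports Standalone, Pseudo-Cluster, Cluster, and Kubernetes deployments.
This document mainly covers installation with docker-compose.
Standalone deployment is simpler: install a Java environment, and if you need additional drivers, put the jar files into the libs directory of the corresponding application.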
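
As a rough illustration of the driver step, the sketch below copies one jar into every libs directory found directly under an install root. The function name install_driver and the <root>/<module>/libs layout are assumptions made for illustration; check the directory names of your own installation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: copy one JDBC driver jar into every <module>/libs
# directory found directly under a DolphinScheduler install root.
install_driver() {
  jar="$1"
  ds_home="$2"
  if [ ! -f "$jar" ]; then
    echo "driver jar not found: $jar" >&2
    return 1
  fi
  count=0
  for libs in "$ds_home"/*/libs; do
    [ -d "$libs" ] || continue   # skip when the glob matched nothing
    cp "$jar" "$libs"/ || return 1
    count=$((count + 1))
  done
  echo "copied $(basename "$jar") into $count libs directories"
}
```

A call would look like install_driver ./mysql-connector-java.jar /path/to/dolphinscheduler, where the second argument is wherever the distribution was unpacked.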

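Files

These are the main files. If any of the other mounted directories raises a permission error, grant the required permissions on it.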

/mnt/data/www/dolphinscheduler/
├── docker-compose.yml
├── Dockerfile/
│   └── ds-worker/
│       └── Dockerfile  # your custom file
├── drivers/  # driver directory; its contents are copied into the libs directory of each image
│   ├── ...
│   └── mysql-connector-java.jar  # MySQL JDBC driver; required when DolphinScheduler uses MySQL as its database
└── .env
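
The bind-mounted host directories can be created ahead of time, so that Docker does not create them as root on first start. A minimal sketch, assuming the directory names used in the volumes entries of the docker-compose.yml in this article; prepare_dirs is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: create the host directories that the compose file
# bind-mounts. The names are taken from its volumes entries.
prepare_dirs() {
  base="$1"
  for d in dolphinscheduler-postgresql dolphinscheduler-zookeeper \
           dolphinscheduler-logs dolphinscheduler-shared-local \
           dolphinscheduler-resource-local dolphinscheduler-worker-data \
           drivers datax; do
    mkdir -p "$base/$d" || return 1
  done
}
```

It would be called as prepare_dirs /mnt/data/www/dolphinscheduler. If a container still reports a permission error, adjusting the owner or mode of the affected directory with chown or chmod is the usual remedy; the right owner depends on the user the image runs as.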
  • docker-compose.yml
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

version: "3.8"

services:
  dolphinscheduler-postgresql:
    image: bitnami/postgresql:15.2.0
    ports:
      - "5432:5432"
    profiles: ["all", "schema"]
    environment:
      POSTGRESQL_USERNAME: root
      POSTGRESQL_PASSWORD: root
      POSTGRESQL_DATABASE: dolphinscheduler
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-postgresql:/bitnami/postgresql
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/5432"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-zookeeper:
    image: bitnami/zookeeper:3.7.1
    profiles: ["all"]
    environment:
      ALLOW_ANONYMOUS_LOGIN: "yes"
      ZOO_4LW_COMMANDS_WHITELIST: srvr,ruok,wchs,cons
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-zookeeper:/bitnami/zookeeper
    healthcheck:
      test: ["CMD", "bash", "-c", "cat < /dev/null > /dev/tcp/127.0.0.1/2181"]
      interval: 5s
      timeout: 60s
      retries: 120
    networks:
      - dolphinscheduler

  dolphinscheduler-schema-initializer:
    image: ${HUB}/dolphinscheduler-tools:${TAG}
    env_file: .env
    profiles: ["schema"]
    command: [ tools/bin/upgrade-schema.sh ]
    depends_on:
      dolphinscheduler-postgresql:
        condition: service_healthy
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-shared-local:/opt/soft
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-resource-local:/dolphinscheduler
    networks:
      - dolphinscheduler

  dolphinscheduler-api:
    image: ${HUB}/dolphinscheduler-api:${TAG}
    ports:
      - "12345:12345"
      - "25333:25333"
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:12345/dolphinscheduler/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-shared-local:/opt/soft
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-resource-local:/dolphinscheduler
      # Mount the driver directory
      - /mnt/data/www/dolphinscheduler/drivers:/opt/dolphinscheduler/lib
    command: >
      sh -c "
        # Copy the drivers into the libs directory
        cp /opt/dolphinscheduler/lib/* /opt/dolphinscheduler/libs/ &&
        # Run the original start command of the container
        /opt/dolphinscheduler/bin/start.sh
      "
    networks:
      - dolphinscheduler

  dolphinscheduler-alert:
    image: ${HUB}/dolphinscheduler-alert-server:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:50053/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-logs:/opt/dolphinscheduler/logs
    networks:
      - dolphinscheduler

  dolphinscheduler-master:
    image: ${HUB}/dolphinscheduler-master:${TAG}
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:5679/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-shared-local:/opt/soft
      - /mnt/data/www/dolphinscheduler/drivers:/opt/dolphinscheduler/lib # driver directory mapping
    command: >
      sh -c "
        # Copy the drivers into the libs directory
        cp /opt/dolphinscheduler/lib/* /opt/dolphinscheduler/libs/ &&
        # Run the original start command of the container
        /opt/dolphinscheduler/bin/start.sh
      "
    networks:
      - dolphinscheduler

  dolphinscheduler-worker:
    #image: ${HUB}/dolphinscheduler-worker:${TAG}
    build:
      context: .
      dockerfile: Dockerfile/ds-worker/Dockerfile  # use the custom Dockerfile
    profiles: ["all"]
    env_file: .env
    healthcheck:
      test: [ "CMD", "curl", "http://localhost:1235/actuator/health" ]
      interval: 30s
      timeout: 5s
      retries: 3
    depends_on:
      dolphinscheduler-zookeeper:
        condition: service_healthy
    environment:
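      - PYTHON_HOME=/usr/local/python2.4/bin
      # Note: Compose substitutes $PATH from the shell that runs docker-compose (the host), not from inside the container
      - PATH=/usr/local/python2.4/bin:$PATH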
    volumes:
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-worker-data:/tmp/dolphinscheduler
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-logs:/opt/dolphinscheduler/logs
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-shared-local:/opt/soft
      - /mnt/data/www/dolphinscheduler/dolphinscheduler-resource-local:/dolphinscheduler
      - /mnt/data/www/dolphinscheduler/drivers:/opt/dolphinscheduler/lib # driver directory mapping
      - /mnt/data/www/dolphinscheduler/datax:/opt/soft/datax # DataX mapping
    command: >
      sh -c "
        # Copy the drivers into the libs directory
        cp /opt/dolphinscheduler/lib/* /opt/dolphinscheduler/libs/ &&
        # Run the original start command of the container
        /opt/dolphinscheduler/bin/start.sh
      "
    networks:
      - dolphinscheduler

networks:
  dolphinscheduler:
    driver: bridge
  • Dockerfile
    DataX and Python are needed, so the worker image is customized; anything else you need later can be added to this Dockerfile.
FROM apache/dolphinscheduler-worker:3.2.2
# Install DataX
RUN wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz \
    && tar -zxvf datax.tar.gz -C /opt/ \
    && rm datax.tar.gz
# Install Python 3 and pip
RUN apt-get update && \
    apt-get install -y python3 python3-pip && \
    rm -rf /var/lib/apt/lists/*
# Put DataX on the PATH
ENV DATAX_HOME=/opt/datax \
    PATH=$PATH:/opt/datax/bin
  • .env
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.
#
HUB=apache
TAG=3.2.2

TZ=Asia/Shanghai
DATABASE=postgresql
SPRING_JACKSON_TIME_ZONE=UTC
SPRING_DATASOURCE_URL=jdbc:postgresql://dolphinscheduler-postgresql:5432/dolphinscheduler
REGISTRY_ZOOKEEPER_CONNECT_STRING=dolphinscheduler-zookeeper:2181

#DATABASE=mysql
#DATABASE_TYPE=mysql
#SPRING_DATASOURCE_DRIVER_CLASS_NAME=com.mysql.cj.jdbc.Driver
#SPRING_DATASOURCE_URL=jdbc:mysql://<your-host-ip>:3306/dolphinscheduler?useUnicode=true&characterEncoding=UTF-8&useSSL=false
#SPRING_DATASOURCE_USERNAME=dolphinscheduler
#SPRING_DATASOURCE_PASSWORD=123456

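The commented-out block above configures MySQL as DolphinScheduler's persistence database. For a standalone deployment, the corresponding configuration is: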

export DATABASE=mysql
export SPRING_PROFILES_ACTIVE=${DATABASE}
export SPRING_DATASOURCE_USERNAME=dolphinscheduler
export SPRING_DATASOURCE_PASSWORD=123456
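
Before starting the containers it can help to confirm that .env defines the keys the compose file and the images rely on. A minimal sketch; check_env_keys is a hypothetical helper, and which keys to pass is up to you (the usage line below uses keys from the .env shown in this article).

```shell
#!/bin/sh
# Hypothetical helper: report keys that have no uncommented KEY=... line
# in an env file. Returns 1 if any key is missing, 0 otherwise.
check_env_keys() {
  env_file="$1"
  shift
  missing=0
  for key in "$@"; do
    if ! grep -q "^${key}=" "$env_file"; then
      echo "missing key: $key"
      missing=1
    fi
  done
  return "$missing"
}
```

For example: check_env_keys .env HUB TAG DATABASE SPRING_DATASOURCE_URL REGISTRY_ZOOKEEPER_CONNECT_STRING. Lines that start with # are treated as absent, which is the point when switching between the PostgreSQL and MySQL blocks.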

Starting and Using

Commands

Start: docker-compose --profile all up -d
Stop: docker-compose --profile all down
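
The compose file in this article also defines a schema profile, which contains the PostgreSQL service and the dolphinscheduler-schema-initializer service, so on a fresh installation the schema presumably has to be initialized before the all profile is started. A hedged sketch of that order; run and start_dolphinscheduler are hypothetical helper names, and DRY_RUN is an illustrative switch that only prints the commands.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN is non-empty.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Hypothetical wrapper: initialize the schema, then start the services.
start_dolphinscheduler() {
  # Runs the services tagged with the schema profile (database + initializer).
  run docker-compose --profile schema up -d || return 1
  # Runs the services tagged with the all profile.
  run docker-compose --profile all up -d
}
```

With DRY_RUN=1 set, calling start_dolphinscheduler prints the two docker-compose commands without running them. The schema step should only be needed on the first start or after an upgrade.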

Usage

Login URL: http://<your-ip>:12345/dolphinscheduler

  • Add an environment



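DATAX_LAUNCHER and PYTHON_LAUNCHER are set because I use DataX and Python, so they have to be added to the environment; they correspond to the applications installed in the Dockerfile.
JAVA_HOME and PATH must be added as well. A standalone deployment does not need them, but the Docker installation does: when a script is executed, it first switches to the default user with sudo -u default -i, which resets PATH and loses the Java path.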
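
The screenshot of the environment form is not reproduced here, so the following is only a sketch of what the environment configuration might contain. JAVA_HOME and its PATH entry follow the values given later in this article, and the DataX path matches the /opt/datax location used in the Dockerfile; /usr/bin/python3 is an assumption about where apt installs Python 3 in the worker image.

```shell
# Java, because PATH is reset when the task switches to the default user
export JAVA_HOME=/opt/java/openjdk
export PATH=/opt/java/openjdk/bin:$PATH
# Interpreter used to launch datax.py (assumed apt install location)
export PYTHON_LAUNCHER=/usr/bin/python3
# DataX entry script installed by the custom worker Dockerfile
export DATAX_LAUNCHER=/opt/datax/bin/datax.py
```

Running which python3 inside the worker container is a quick way to confirm the interpreter path before saving the environment.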

  • Create a project

  • Create a workflow in the current project

  • Create a DataX node

    Custom template for reference:

{
  "setting": {},
  "job": {
        "content":[
            {
                "reader":{
                    "name":"oraclereader",
                    "parameter":{
                        "username":"<source-db-username>",
                        "password":"<source-db-password>",
                        "connection":[
                            {
                                "querySql":[
                                    "SELECT
                                        bp.PK_PSNDOC,
                                        bp.name,
                                        bp.code,
                                        bp.MOBILE,
                                        CASE
                                            WHEN bp.SEX IS NOT NULL AND bp.SEX=1 THEN 1
                                            ELSE 0
                                        END AS SEX,
                                        CASE
                                            WHEN bp.BIRTHDATE IS NULL THEN '1000-01-01'
                                            ELSE bp.BIRTHDATE
                                        END AS BIRTHDATE,
                                        br.NAME AS native_place,
                                        bp.TS AS updated_at,
                                        bp.CREATIONTIME AS created_at
                                    FROM
                                        BD_PSNDOC bp
                                    LEFT JOIN BD_REGION br ON bp.nativeplace=br.PK_REGION"
                                ],
                                "jdbcUrl":[
                                    "jdbc:oracle:thin:@//<source-db-host>:1521/orcl"
                                ]
                            }
                        ]
                    }
                },
                "writer":{
                    "name":"mysqlwriter",
                    "parameter":{
                        "username":"<target-db-username>",
                        "password":"<target-db-password>",
                        "writeMode": "update",
                        "primaryKey": ["PK_PSNDOC"],
                        "column":[
                            "`PK_PSNDOC`",
                            "`name`",
                            "`code`",
                            "`mobile`",
                            "`sex`",
                            "`BIRTHDATE`",
                            "`native_place`",
                            "`updated_at`",
                            "`created_at`"
                        ],
                        "connection":[
                            {
                                "table":[
                                    "bi_nc_psndoc"
                                ],
                                "jdbcUrl":"jdbc:mysql://<target-db-ip>:3306/data_warehouse"
                            }
                        ]
                    }
                }
            }
        ],
        "setting":{
            "speed": {
                "channel": 2
            }
        }
    }
}
  • Save and run

  • View the results



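Common Problems

    1. /tmp/dolphinscheduler/exec/process/default/140885259385600/140887069920000_1/1/2/1_2.sh: line 5: --jvm=-Xms1G -Xmx1G: command not found
      Fix: set the Python path for the environment in use under Security Center / Environment Management: PYTHON_LAUNCHER=<path to your python>
    2. The following error:
    File "/opt/datax/bin/datax.py", line 114
        print readerRef
        ^^^^^^^^^^^^^^^
    SyntaxError: Missing parentheses in call to 'print'. Did you mean print(...)?

      Fix: the installed datax.py uses Python 2 syntax but is being run by Python 3. This is a DataX version issue; upgrade to 3.0: wget https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz && tar -zxvf datax.tar.gz -C /opt/

    3. /bin/sh: 1: java: not found

      When the script runs, it first executes sudo -u default -i to switch to the default user, which drops the Java environment, so Java code cannot run. Add the following to the environment under Security Center / Environment Management:
      export JAVA_HOME=/opt/java/openjdk
      export PATH=/opt/java/openjdk/bin:$PATH
    4. Running a script in another Docker container
      Taking PHP as an example: I want a Shell task in DolphinScheduler to run a script inside the PHP container, and I do this by connecting to Docker over TCP.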
# Append to ExecStart
-H tcp://0.0.0.0:2375
# Then run
$ systemctl daemon-reload
$ systemctl restart docker

A reference shell script:

#!/bin/bash

echo "Running full generation of monthly and yearly snapshots"
# 1. Create an exec instance
# --max-time 0 disables curl's timeout
# --no-buffer streams the output as it arrives
EXEC_RESPONSE=$(curl -s -X POST --max-time 0 --no-buffer \
  "http://<host-ip>:2375/containers/php8-oracle-sqlsvr/exec" \
  -H "Content-Type: application/json" \
  -d '{"Cmd": ["php", "/var/www/data-warehouse/artisan", "nc:generate_work_snapshot_history"], "AttachStdout": true, "AttachStderr": true}')

# Extract the exec ID (grep + cut instead of jq)
EXEC_ID=$(echo "$EXEC_RESPONSE" | grep -o '"Id":"[^"]*"' | cut -d'"' -f4)

if [ -z "$EXEC_ID" ]; then
  echo "Error: could not create the Docker exec instance"
  echo "API response: $EXEC_RESPONSE"
  exit 1
fi

# 2. Start the exec and capture its output
OUTPUT=$(curl -s -X POST \
  "http://<host-ip>:2375/exec/$EXEC_ID/start" \
  -H "Content-Type: application/json" \
  -d '{"Detach": false, "Tty": false}' \
  | tail -c +9)  # skip the first 8-byte Docker stream header

# 3. Get the exit code (extracted directly from the JSON)
EXIT_CODE_JSON=$(curl -s "http://<host-ip>:2375/exec/$EXEC_ID/json")
EXIT_CODE=$(echo "$EXIT_CODE_JSON" | grep -o '"ExitCode":[0-9]*' | cut -d':' -f2)

# 4. Print the result
echo "----------------------------------------"
echo "PHP script output:"
echo "$OUTPUT"
echo "----------------------------------------"
echo "Exit code: ${EXIT_CODE:-unknown}"

if [ -z "$EXIT_CODE" ] || [ "$EXIT_CODE" -ne 0 ]; then
  echo "Error: script execution failed"
  exit "${EXIT_CODE:-1}"
fi
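
The grep + cut parsing in the script can be tried offline, without a Docker daemon. The sketch below wraps the same two patterns in functions; extract_exec_id and extract_exit_code are hypothetical helper names, and the patterns assume compact JSON with no whitespace around the colons, as in the script above.

```shell
#!/bin/sh
# Hypothetical helpers that isolate the JSON parsing used in the script above.

# Print the value of the "Id" field from an exec-create response.
extract_exec_id() {
  printf '%s' "$1" | grep -o '"Id":"[^"]*"' | cut -d'"' -f4
}

# Print the numeric "ExitCode" from an exec-inspect response.
# Prints nothing useful when the field is null or absent.
extract_exit_code() {
  printf '%s' "$1" | grep -o '"ExitCode":[0-9]*' | cut -d':' -f2
}
```

Feeding the functions a saved API response is a cheap way to check the patterns before wiring them into a workflow. An empty result from extract_exit_code corresponds to the "unknown" branch of the script, which it treats as a failure.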
