Summary of common problems in the use of Apache DolphinScheduler (continuously updated)

Description

The problems collected here were all encountered during actual use. If you hit a problem that is not covered, please open an issue on GitHub.

Official website address

GitHub issue address

Precautions for use

Installation package and installation directory
	Installation package: the downloaded source archive. It is only needed at install time; once the scheduler has been installed, this directory can be deleted.
	Installation directory: the directory the scheduler is installed into after running install.sh. All scheduler management happens under the installation directory, e.g. starting/stopping services, configuration, viewing logs, and so on.
	The official documentation describes it as follows (configured in install.sh). **Remember**: perform every operation under the installation directory.
 # The directory DS is installed into, e.g. /opt/soft/dolphinscheduler; it must differ from the current directory
  installPath="/opt/soft/dolphinscheduler"
Log view directory

Assuming the installation was configured with installPath="/opt/soft/dolphinscheduler", the logs live under
/opt/soft/dolphinscheduler/logs

View the worker log
tail -f /opt/soft/dolphinscheduler/logs/dolphinscheduler-worker.log

View the master log
tail -f /opt/soft/dolphinscheduler/logs/dolphinscheduler-master.log

View the API log

tail -f /opt/soft/dolphinscheduler/logs/dolphinscheduler-api-server.log

View the alert log

tail -f /opt/soft/dolphinscheduler/logs/dolphinscheduler-alert.log

View the logger service log

tail -f /opt/soft/dolphinscheduler/logs/dolphinscheduler-logger-server-huaweiyun.out

Here huaweiyun is the hostname; replace it with your own host's name.

Common problems of development environment

API starts on port 8080 instead of 12345

In a fresh development environment the API should listen on its default port 12345. If it comes up on 8080 instead, edit the run configuration and add the following options:

-Dserver-api-server -Dspring.profiles.active=api


Can't find mysql driver

Because DolphinScheduler uses PostgreSQL by default, the MySQL driver dependency is not included out of the box. You need to modify the pom manually.

Solution:
locate the following dependency in pom.xml in the root directory

<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>${mysql.connector.version}</version>
	<scope>test</scope>
</dependency>

Remove `<scope>test</scope>` and it's done.
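After the change, the dependency should read:

```xml
<dependency>
	<groupId>mysql</groupId>
	<artifactId>mysql-connector-java</artifactId>
	<version>${mysql.connector.version}</version>
</dependency>
```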


Re-import maven dependencies.

Worker execution fails with an NPE (NullPointerException)


Solution:
add the following parameters to the WorkerServer VM options

-Dspring.profiles.active=worker -Dlogging.config="dolphinscheduler-server/src/main/resources/logback-worker.xml"

If the file cannot be found, remove the double quotes:

-Dspring.profiles.active=worker -Dlogging.config=dolphinscheduler-server/src/main/resources/logback-worker.xml

Frequently asked questions about scheduling deployment

Ubuntu deployment error: source: not found


Solution

Search (Baidu/Google) for the keywords "ubuntu source: not found"; solutions are readily available. The usual cause is that Ubuntu's default /bin/sh is dash, which does not support the `source` builtin; run the deployment scripts with bash instead.

Error 404 when accessing API service


Solution

The project context path is missing from the request; the full address is http://ip:12345/dolphinscheduler/

The API default port is 12345.

Frequently Asked Questions about Scheduling

System initial login failed

This error is usually a wrong password. The initial credentials are below; make sure no extra spaces are included when copying.
Account: admin
Password: dolphinscheduler123

The monitoring page's Master/Worker tab keeps loading, or the query returns no data

The `jps` command shows that the WorkerServer and MasterServer processes are running; a likely cause is that ZooKeeper is down.

Solution

Check whether zk is started. If it is not, start the zk service, then check the Master/Worker status on the monitoring page again.

The workflow runs manually, but there is no data on the task instance page

There are many possible causes. Assuming zk, worker, and master all started successfully: checking the worker log shows load or availablePhysicalMemorySize(G) is too high. This is the same issue as "load or availablePhysicalMemorySize(G) is too high" below; refer to that section for the solution.

The task instance status is submitted successfully, but it never runs

There are many possible causes. One common case: the log keeps printing consume tasks: [], there still have 1 tasks need to be executed, and the task count never decreases. Here the IP configured for the task's Worker group may not match the IP of the Worker service, so the task can never be dispatched.


Solution:

Go to Security Center -> Worker group management and change the group's IP to match the IP of the Worker service.
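The mismatch can be checked with a small shell helper (a hypothetical sketch, not part of DolphinScheduler; the IPs below are illustrative):

```shell
#!/bin/sh
# Hypothetical helper: check whether a worker's IP appears in the
# comma-separated IP list configured for the Worker group.
ip_in_group() {
  worker_ip="$1"
  group_ips="$2"
  case ",$group_ips," in
    *",$worker_ip,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: the worker host reports 192.168.1.10 (e.g. via `hostname -I`),
# but the group was configured with other IPs, so tasks stay "submitted".
if ip_in_group "192.168.1.10" "192.168.1.11,192.168.1.12"; then
  echo "worker IP is in the group"
else
  echo "mismatch: fix the Worker group IPs in Security Center"
fi
```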

File upload failed

Assuming the prerequisite file storage service has been configured, the upload fails with Nginx: 413 Request Entity Too Large. The cause is Nginx's upload file size limit.

Solution:

Modify /etc/nginx/nginx.conf and raise the upload file size limit by adding the following directive inside the http{} block:

client_max_body_size 200M;

Restart nginx.

ps: If this description is not detailed enough, search (Baidu/Google) for the keywords 413 nginx
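For reference, a sketch of the change (the 200M value is just an example; size it to your largest expected upload):

```nginx
# /etc/nginx/nginx.conf
http {
    # raise the request-body limit so large resource files can be uploaded
    client_max_body_size 200M;

    # ... the rest of the existing http block stays unchanged ...
}
```

After editing, `nginx -t` checks the syntax and `nginx -s reload` applies the change without a full restart.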

The SQL query was successful, but the task instance failed

The failure log is as follows: the result of the SQL query is sent by email, and the error says no mail service is configured.

Configure the mail service in conf/alert.properties

For a 163 mailbox configuration example and other mailbox configuration details, please refer to this article
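A sketch for a 163 mailbox is below. The key names follow the 1.x conf/alert.properties layout and may differ in your version; compare against the file shipped with your release, and note that 163 requires the SMTP authorization code, not the login password:

```properties
# conf/alert.properties -- 163 mailbox example (key names may vary by version)
mail.protocol=SMTP
mail.server.host=smtp.163.com
mail.server.port=25
mail.sender=your_account@163.com
mail.user=your_account@163.com
mail.passwd=your_smtp_authorization_code
mail.smtp.starttls.enable=true
mail.smtp.ssl.enable=false
```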

SQL insert data task execution failed

[ERROR] 2020-06-05 10:20:07.756  - [taskAppId=TASK-9-493-494]:[336] - Can not issue data manipulation statements with executeQuery().
java.sql.SQLException: Can not issue data manipulation statements with executeQuery().
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:1084)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:987)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:973)
	at com.mysql.jdbc.SQLError.createSQLException(SQLError.java:918)
	at com.mysql.jdbc.StatementImpl.checkForDml(StatementImpl.java:501)
	at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:2150)
	at org.apache.dolphinscheduler.server.worker.task.sql.SqlTask.executeFuncAndSql(SqlTask.java:295)
	at org.apache.dolphinscheduler.server.worker.task.sql.SqlTask.handle(SqlTask.java:176)
	at org.apache.dolphinscheduler.server.worker.runner.TaskScheduleThread.run(TaskScheduleThread.java:142)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

Solution:

The error occurs because a non-query SQL statement was executed with the task's SQL type set to query. Edit the task and change the SQL type to non-query.

load or availablePhysicalMemorySize(G) is too high

The worker log keeps printing the following, because the server is short on memory or CPU (each service starts with a 1G heap by default, so starting all 5 services consumes 5G of memory). This is the worker's self-protection mechanism: it stops accepting new tasks and resumes once the current tasks finish or CPU and memory are freed up.
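The self-protection decision can be sketched roughly as follows. This is an illustrative sketch, not the actual implementation; in 1.3 the real thresholds live in conf/worker.properties under keys such as worker.max.cpuload.avg and worker.reserved.memory (verify against your version):

```shell
#!/bin/sh
# Illustrative sketch of the worker's self-protection check: accept new
# tasks only while load average and free memory are within the thresholds.
should_accept_task() {
  load="$1"       # current 1-minute load average
  free_g="$2"     # available physical memory, in G
  max_load="$3"   # maximum tolerated load average
  min_free_g="$4" # minimum free memory to keep in reserve, in G
  # awk handles the floating-point comparison (plain sh is integer-only)
  awk -v l="$load" -v f="$free_g" -v ml="$max_load" -v mf="$min_free_g" \
    'BEGIN { exit !(l < ml && f > mf) }'
}

should_accept_task 0.5 4 8 0.3 && echo "accept new tasks"
should_accept_task 9.9 0.1 8 0.3 || echo "refuse: load or memory over limit"
```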

Solution

1 If other memory-hungry programs are running, wait for them to finish, then check whether the worker still logs the capacity warnings.

2 If budget is no issue, simply add more memory.

3 If the physical machine has less than 5G of memory, there are two options:

  • Shut down some services first, e.g. kill the alert, logger, and other non-essential services directly.
  • Adjust the startup parameters: edit the configuration file bin/dolphinscheduler-daemon.sh and lower the -Xms value.
export DOLPHINSCHEDULER_OPTS="-server -Xmx16g -Xms1g -Xss512k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
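For example, on a small machine the same line could be shrunk like this (the heap values are illustrative; tune -Xms/-Xmx to what the host can actually spare):

```shell
# bin/dolphinscheduler-daemon.sh -- reduced heap for a memory-constrained host
export DOLPHINSCHEDULER_OPTS="-server -Xmx1g -Xms256m -Xss512k -XX:+DisableExplicitGC -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:LargePageSizeInBytes=128m -XX:+UseFastAccessorMethods -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=70"
```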

Source reference

Zookeeper monitoring status is abnormal

zk starts normally, and Master/Worker start normally, but the self-check status of the zk node is abnormal because zk's four-letter-word (FourLetterWord) commands could not be queried.

Solution

First run the following command (localhost is the zk service address):

 echo ruok|nc localhost 2181

If the following message appears instead of the healthy reply imok, you need to modify the zoo.cfg configuration:

ruok is not executed because it is not in the whitelist.

Add the following configuration in zoo.cfg

  4lw.commands.whitelist=*

Restart zk

If the nc command is not available, install it first: yum install nc

Source reference

Failed to create tenant

Creating a tenant requires HDFS, and errors here generally occur when operating HDFS.
Possible causes include HDFS not being configured, unauthorized HDFS operations, etc.

For example, a typical error is a no-permission (Permission denied) message from HDFS.

Solution

When you hit this problem, check the api log first and fix whatever its error message points to.
In any case, make sure your HDFS has been configured correctly. (HDFS is currently the mainstream storage type; other types have not been tried.)

ps: Refer to this issue for hdfs configuration file modification

Master and Worker stop abnormally

Cause

Master and Worker report heartbeats to ZooKeeper. If a heartbeat is not reported within the configured timeout, they stop themselves automatically and the following log appears:

[INFO] 2020-04-30 06:48:28.032 org.apache.dolphinscheduler.server.master.MasterServer:[180] - master server is stopping ..., cause : i was judged to death, release resources and stop myself
[INFO] 2020-04-30 06:48:29.425 org.apache.dolphinscheduler.server.master.runner.MasterSchedulerThread:[143] - master server stopped...
[INFO] 2020-04-30 06:48:31.033 org.apache.dolphinscheduler.server.master.MasterServer:[197] - heartbeat service stopped

So if ZooKeeper loses the connection (goes down), Master and Worker stop as well.

Solution

1 Ensure that the ZooKeeper service can be accessed normally.
2 Increase the ZooKeeper timeout. In version 1.2 the timeout is 300ms; edit the configuration file conf/zookeeper.properties and adjust it to your actual situation.
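A sketch of the change (the key names below follow the 1.2 conf/zookeeper.properties layout as far as I can tell; verify them against your file, and note the values are in milliseconds):

```properties
# conf/zookeeper.properties -- raise the timeouts (values in ms, illustrative)
zookeeper.session.timeout=3000
zookeeper.connection.timeout=3000
```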

Refer to issue

Data truncation: Data too long for column 'app_link' at row 1

The cause is that the app_link value exceeds the column length.

Solution

1) Before version 1.3, t_ds_task_instance.app_link has a length of 255. You can extend the field with the official script:

ALTER TABLE t_ds_task_instance ALTER COLUMN app_link type text

2) Or upgrade to the latest version, where the problem has been fixed.
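The official script above uses PostgreSQL syntax. If your metastore is MySQL, the equivalent DDL would be (a sketch; back up the table before altering it):

```sql
-- MySQL equivalent: widen t_ds_task_instance.app_link to text
ALTER TABLE t_ds_task_instance MODIFY COLUMN app_link text;
```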

Refer to issue


Origin blog.csdn.net/samz5906/article/details/106434430