[Apollo] How to locate the cause of an abnormal process crash

Symptom

In dreamview, after the switch for the Navi_planning or Planning module is turned on, it runs for a while and then turns itself off and back on again.

Troubleshooting process

Looking at the dreamview code, the module switches are defined in modules/dreamview/conf/hmi.conf:

modules {
  key: "navigation_planning"
  value: {
    display_name: "Navi Planning"
    supported_commands {
      key: "start"
      value: "supervisorctl start navigation_planning &"
    }
    supported_commands {
      key: "stop"
      value: "supervisorctl stop navigation_planning &"
    }
  }
}

Searching the code for navigation_planning then turns up the following entry in modules/tools/supervisord/dev.conf:

[program:navigation_planning]
command=/apollo/bazel-bin/modules/planning/planning --flagfile=/apollo/modules/planning/conf/planning_navi.conf --stderrthreshold=3 --use_navigation_mode
autostart=false
numprocs=1
exitcodes=0
stopsignal=INT
startretries=10
autorestart=unexpected
redirect_stderr=true
stdout_logfile=/apollo/data/log/planning.out
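
The autorestart and exitcodes settings above explain the symptom: the process dies, and supervisord starts it again. The following Python sketch models that decision as I read the supervisord documentation: an exit counts as "expected" only if its code is listed in exitcodes, and a process killed by a signal never counts as expected. The function names and the trimmed copy of the section are illustrative only and are not part of Apollo or supervisord.

```python
import configparser

# Trimmed copy of the [program:navigation_planning] section shown above.
DEV_CONF = """\
[program:navigation_planning]
autostart=false
exitcodes=0
startretries=10
autorestart=unexpected
"""


def restart_policy(conf_text, program):
    """Return (autorestart mode, set of expected exit codes) for a program."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    section = parser["program:" + program]
    expected = {int(code) for code in section["exitcodes"].split(",")}
    return section["autorestart"], expected


def restarts_after_exit(conf_text, program, exit_code):
    """Simplified model of whether supervisord restarts an exited process.

    exit_code is None when the process was killed by a signal (e.g. SIGILL).
    """
    mode, expected = restart_policy(conf_text, program)
    if mode == "true":
        return True
    if mode == "false":
        return False
    # mode == "unexpected": restart only when the exit was not an expected one.
    return exit_code is None or exit_code not in expected
```

Under this model a crash by signal always triggers a restart with the configuration above, while a clean exit with code 0 does not, which matches the switch flipping off and on in dreamview.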

Running the command /apollo/bazel-bin/modules/planning/planning --flagfile=/apollo/modules/planning/conf/planning_navi.conf --stderrthreshold=3 --use_navigation_mode by hand
fails with Illegal instruction (core dumped). When a core dumped error appears, the next step is to find the core dump file.

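Finding the core dump file: first run ulimit -c to check whether core dumps are enabled; 0 means disabled and unlimited means enabled. If it prints 0, run ulimit -c unlimited and reproduce the crash. Then go to /proc/sys/kernel and cat the file core_pattern. It shows the core file path /apollo/data/core/core_%e.%p, where %e is the executable name and %p is the process ID. From that path and the process ID, the core file is core_planning.1193.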
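
The same checks can be scripted. The sketch below is only an illustration of the steps above and not part of Apollo: it reads the core-size limit that ulimit -c reports, reads core_pattern, and expands the %e and %p specifiers (other specifiers are left untouched). All function names are made up for this example.

```python
import re
import resource


def core_dumps_enabled():
    """True if the soft core-size limit (what `ulimit -c` prints) is non-zero."""
    soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_CORE)
    return soft_limit != 0  # "unlimited" (RLIM_INFINITY) is also non-zero


def read_core_pattern(path="/proc/sys/kernel/core_pattern"):
    """Return the kernel's core file name template (Linux only)."""
    with open(path) as pattern_file:
        return pattern_file.read().strip()


def expand_core_pattern(pattern, exe_name, pid):
    """Expand %e (executable name, at most 15 chars), %p (PID) and %%."""
    values = {"e": exe_name[:15], "p": str(pid), "%": "%"}
    # Specifiers this sketch does not know about are kept as they are.
    return re.sub(r"%(.)", lambda m: values.get(m.group(1), m.group(0)), pattern)
```

For example, expand_core_pattern("/apollo/data/core/core_%e.%p", "planning", 1193) gives the file name used in this post, /apollo/data/core/core_planning.1193.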

With the core file found, analyze it with gdb. The command is gdb <executable> <core file>, where the core file name ends in the process ID:

ubuntu@in_dev_docker:/apollo/data/core$ gdb /apollo/bazel-bin/modules/planning/planning core_planning.1193
GNU gdb (Ubuntu 7.7.1-0ubuntu5~14.04.3) 7.7.1
Copyright (C) 2014 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /apollo/bazel-bin/modules/planning/planning...done.
[New LWP 1228]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".

warning: the debug information found in "/home/caros/secure_upgrade/depend_lib/libyaml-cpp.so.0.5.1" does not match "/home/caros/secure_upgrade/depend_lib/libyaml-cpp.so.0.5" (CRC mismatch).

Core was generated by `/apollo/bazel-bin/modules/planning/planning --flagfile=/apollo/modules/planning'.
Program terminated with signal SIGILL, Illegal instruction.
#0  qpOASES::SubjectTo::init (this=this@entry=0x20792b0, _n=_n@entry=18)
    at external/qpOASES/src/SubjectTo.cpp:128
128 external/qpOASES/src/SubjectTo.cpp: No such file or directory.
(gdb) 
(gdb) 
(gdb) bt
#0  qpOASES::SubjectTo::init (this=this@entry=0x20792b0, _n=_n@entry=18)
    at external/qpOASES/src/SubjectTo.cpp:128
#1  0x0000000000943523 in qpOASES::Bounds::init (this=this@entry=0x20792b0, 
    _n=_n@entry=18) at external/qpOASES/src/Bounds.cpp:116
#2  0x000000000093d89f in qpOASES::QProblemB::QProblemB (this=0x2079280, 
    _nV=18, _hessianType=qpOASES::HST_UNKNOWN, allocDenseMats=qpOASES::BT_TRUE)
    at external/qpOASES/src/QProblemB.cpp:128
#3  0x0000000000931c3f in qpOASES::QProblem::QProblem (this=0x2079280, _nV=18, 
    _nC=226, _hessianType=<optimized out>, allocDenseMats=qpOASES::BT_TRUE)
    at external/qpOASES/src/QProblem.cpp:83
#4  0x000000000093ec41 in qpOASES::SQProblem::SQProblem (this=0x2079280, 
    _nV=<optimized out>, _nC=<optimized out>, _hessianType=<optimized out>, 
    allocDenseMats=<optimized out>) at external/qpOASES/src/SQProblem.cpp:60
#5  0x00000000004a3b28 in apollo::planning::Spline1dGenerator::Solve (
    this=0x1f66cd0)
    at modules/planning/math/smoothing_spline/spline_1d_generator.cc:98
#6  0x00000000004a0ccc in apollo::planning::QpSplineStGraph::Solve (
    this=0x7ffc35f7fbb0)
    at modules/planning/tasks/qp_spline_st_speed/qp_spline_st_graph.cc:332
#7  0x000000000049e986 in apollo::planning::QpSplineStGraph::Search (
    this=0x7ffc35f7fbb0, st_graph_data=..., accel_bound=..., 
    reference_speed_data=..., speed_data=0x207ffd0)
    at modules/planning/tasks/qp_spline_st_speed/qp_spline_st_graph.cc:129
#8  0x000000000048e1d2 in apollo::planning::QpSplineStSpeedOptimizer::Process (
    this=0x1f873f0, adc_sl_boundary=..., path_data=..., init_point=..., 
    reference_line=..., reference_speed_data=..., path_decision=0x207fe98, 
    speed_data=0x207ffd0)
    at modules/planning/tasks/qp_spline_st_speed/qp_spline_st_speed_optimizer.cc:165
#9  0x00000000004b1e7b in apollo::planning::SpeedOptimizer::Execute (
    this=0x1f873f0, frame=0x205b0b0, reference_line_info=0x207fae0)
    at modules/planning/tasks/speed_optimizer.cc:44
#10 0x0000000000452d32 in apollo::planning::EMPlanner::PlanOnReferenceLine (
    this=0x1fa1560, planning_start_point=..., frame=0x205b0b0, 
    reference_line_info=0x207fae0)
    at modules/planning/planner/em/em_planner.cc:192
#11 0x0000000000452794 in apollo::planning::EMPlanner::Plan (this=0x1fa1560, 
    planning_start_point=..., frame=0x205b0b0)
    at modules/planning/planner/em/em_planner.cc:157
#12 0x0000000000432d74 in apollo::planning::Planning::Plan (
    this=0x7ffc35f80cf0, current_time_stamp=1527583889.0500541, 
    stitching_trajectory=std::vector of length 1, capacity 1 = {...}, 
    trajectory_pb=0x205b3b0) at modules/planning/planning.cc:473
#13 0x0000000000431cdb in apollo::planning::Planning::RunOnce (
    this=0x7ffc35f80cf0) at modules/planning/planning.cc:375
#14 0x00000000004304d2 in apollo::planning::Planning::OnTimer (
    this=0x7ffc35f80cf0) at modules/planning/planning.cc:149
#15 0x000000000044cc40 in boost::_mfi::mf1<void, apollo::planning::Planning, ros::TimerEvent const&>::operator() (this=0x1f67628, p=0x7ffc35f80cf0, a1=...)
    at /usr/include/boost/bind/mem_fn_template.hpp:165
#16 0x000000000044c2b8 in boost::_bi::list2<boost::_bi::value<apollo::planning::Planning*>, boost::arg<1> >::operator()<boost::_mfi::mf1<void, apollo::planning::Planning, ros::TimerEvent const&>, boost::_bi::list1<ros::TimerEvent const&> >
    (this=0x1f67638, f=..., a=...) at /usr/include/boost/bind/bind.hpp:313
#17 0x000000000044b736 in boost::_bi::bind_t<void, boost::_mfi::mf1<void, apollo::planning::Planning, ros::TimerEvent const&>, boost::_bi::list2<boost::_bi::value<apollo::planning::Planning*>, boost::arg<1> > >::operator()<ros::TimerEvent>
    (this=0x1f67628, a1=...) at /usr/include/boost/bind/bind_template.hpp:47
#18 0x000000000044a926 in boost::detail::function::void_function_obj_invoker1<boost::_bi::bind_t<void, boost::_mfi::mf1<void, apollo::planning::Planning, ros::TimerEvent const&>, boost::_bi::list2<boost::_bi::value<apollo::planning::Planning*>, boost::arg<1> > >, void, ros::TimerEvent const&>::invoke (
    function_obj_ptr=..., a0=...)
    at /usr/include/boost/function/function_template.hpp:153
#19 0x00007f20f54bb780 in operator() (a0=..., this=<optimized out>)
    at /usr/include/boost/function/function_template.hpp:767
#20 ros::TimerManager<ros::Time, ros::Duration, ros::TimerEvent>::TimerQueueCallback::call (this=0x7f207c0008e0)
    at /apollo/apollo-platform/ros/ros_comm/roscpp/include/ros/timer_manager.h:184
#21 0x00007f20f54d8ad6 in ros::CallbackQueue::callOneCB (
    this=this@entry=0x199e3a0, tls=tls@entry=0x1f97820)
    at /apollo/apollo-platform/ros/ros_comm/roscpp/src/libros/callback_queue.cpp:393
#22 0x00007f20f54d92a3 in ros::CallbackQueue::callAvailable (
    this=this@entry=0x199e3a0, timeout=...)
    at /apollo/apollo-platform/ros/ros_comm/roscpp/src/libros/callback_queue.cpp:334
#23 0x00007f20f5523e35 in ros::SingleThreadedSpinner::spin (
    this=<optimized out>, queue=0x199e3a0)
    at /apollo/apollo-platform/ros/ros_comm/roscpp/src/libros/spinner.cpp:62
#24 0x00007f20f55087db in ros::spin ()
    at /apollo/apollo-platform/ros/ros_comm/roscpp/src/libros/init.cpp:567
#25 0x000000000044dd37 in apollo::common::ApolloApp::Spin (this=0x7ffc35f80cf0)
    at modules/common/apollo_app.cc:78
#26 0x000000000042a48f in main (argc=1, argv=0x7ffc35f80ee0)
    at modules/planning/main.cc:20

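The backtrace shows the complete call chain of the process. What remains is to work through it and find where the problem actually occurs.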
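
For long traces it can help to capture the backtrace non-interactively and pick out the innermost frame that belongs to the project's own sources. The sketch below is one possible way to do that and not an Apollo tool: it assumes gdb is on the PATH, uses gdb's -batch and -ex options to run bt without the pager prompts, and treats any frame whose source path starts with modules/ as project code. The helper names are hypothetical.

```python
import re
import subprocess

FRAME_START = re.compile(r"^#(\d+)\s+", re.MULTILINE)
LOCATION = re.compile(r"\bat (\S+):(\d+)")


def gdb_backtrace_command(executable, core_file):
    """Command line that prints a backtrace and exits (no pager in batch mode)."""
    return ["gdb", "-batch", "-ex", "bt", executable, core_file]


def gdb_backtrace(executable, core_file):
    """Run gdb on the core file and return its standard output as text."""
    result = subprocess.run(
        gdb_backtrace_command(executable, core_file),
        capture_output=True, text=True, check=False,
    )
    return result.stdout


def frame_locations(backtrace):
    """Return [(frame number, source file, line)] for frames with source info."""
    parts = FRAME_START.split(backtrace)
    # parts is [text before frame #0, "0", body, "1", body, ...]
    frames = []
    for number, body in zip(parts[1::2], parts[2::2]):
        match = LOCATION.search(" ".join(body.split()))
        if match:
            frames.append((int(number), match.group(1), int(match.group(2))))
    return frames


def first_frame_under(backtrace, prefix="modules/"):
    """Innermost frame whose source path starts with prefix, or None."""
    for frame in frame_locations(backtrace):
        if frame[1].startswith(prefix):
            return frame
    return None
```

Applied to the trace in this post, the first frame under modules/ is frame #5, spline_1d_generator.cc line 98, which is where Apollo code calls into qpOASES; everything above it is inside the third-party library.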

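The Illegal instruction (core dumped) error above is, in most cases, caused by a version mismatch in a library the program calls (in this trace the innermost frames are all inside qpOASES).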

This example is on the Apollo platform, but on an ordinary Ubuntu/Linux/Android system the core dump file can be found and analyzed in the same way.

Reposted from blog.csdn.net/sunyoop/article/details/80500055