Apollo Application and Source Code Analysis: The Monitor Module - Basic Concepts and Entry Point Analysis

Monitor: system monitoring

Table of contents

Basic Concepts

Code Structure Analysis

Overall Logic Analysis


Basic Concepts

Overall classification

This module contains system-level software, such as code that checks hardware status and monitors the health of the system.

As of Apollo 5.5, the monitoring module performs the following checks:

  • Status of running modules
  • Data integrity
  • Data frequency
  • System health (e.g. CPU, memory, and disk usage)
  • End-to-end latency statistics reports

The first three checks can be configured independently.

Classification by attribute

By attribute, Apollo's monitors fall into two groups: hardware status monitoring and software status monitoring.

Hardware status monitoring covers:

  • GPS
  • Resource
  • ESD-CAN
  • Socket-CAN

Software status monitoring covers:

  • Channel status
  • Functional safety status
  • Latency status
  • Localization status
  • Module status
  • Process status
  • Recorder status

The statuses above are then aggregated and published by the SummaryMonitor.

Code Structure Analysis

├── BUILD
├── README.md
├── common
│   ├── BUILD
│   ├── monitor_manager.cc
│   ├── monitor_manager.h
│   ├── recurrent_runner.cc
│   ├── recurrent_runner.h
│   └── recurrent_runner_test.cc
├── hardware
│   ├── BUILD
│   ├── esdcan_monitor.cc
│   ├── esdcan_monitor.h
│   ├── gps_monitor.cc
│   ├── gps_monitor.h
│   ├── resource_monitor.cc
│   ├── resource_monitor.h
│   ├── socket_can_monitor.cc
│   └── socket_can_monitor.h
├── monitor.cc
├── monitor.h
├── proto
│   ├── BUILD
│   └── system_status.proto
└── software
    ├── BUILD
    ├── camera_monitor.cc
    ├── camera_monitor.h
    ├── channel_monitor.cc
    ├── channel_monitor.h
    ├── functional_safety_monitor.cc
    ├── functional_safety_monitor.h
    ├── latency_monitor.cc
    ├── latency_monitor.h
    ├── localization_monitor.cc
    ├── localization_monitor.h
    ├── module_monitor.cc
    ├── module_monitor.h
    ├── process_monitor.cc
    ├── process_monitor.h
    ├── recorder_monitor.cc
    ├── recorder_monitor.h
    ├── summary_monitor.cc
    └── summary_monitor.h

The module consists of four main parts:

  • monitor.cc/.h: the component entry point
  • common: shared base classes
  • hardware: hardware monitors
  • software: software monitors

Overall Logic Analysis

Overall process

When the Monitor runs, it first runs the individual sub-monitors and then has SummaryMonitor report on the overall status, which takes one of five values:

  • Fatal
  • Error
  • Warn
  • OK
  • Unknown
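
One way to picture the summarizing step is "keep the worst status seen so far". The sketch below is only an illustration under that assumption: the Status enum, its numeric ordering (Unknown lowest, Fatal highest), ComponentReport, and Summarize are hypothetical names, not the code in summary_monitor.cc.

```cpp
#include <string>
#include <vector>

// Hypothetical status levels, ordered from least to most severe so that a
// plain comparison expresses "worse than". The ordering is an assumption.
enum class Status { kUnknown = 0, kOk = 1, kWarn = 2, kError = 3, kFatal = 4 };

// Hypothetical per-component result produced by one sub-monitor.
struct ComponentReport {
  std::string name;
  Status status;
};

// Returns the most severe status among the reports, or kUnknown when there
// is nothing to summarize.
Status Summarize(const std::vector<ComponentReport>& reports) {
  Status worst = Status::kUnknown;
  for (const auto& report : reports) {
    if (report.status > worst) {
      worst = report.status;
    }
  }
  return worst;
}
```

With reports of OK, Warn, and Error, Summarize returns Error: a single failing component is enough to degrade the overall status.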

Based on the summarized status, FunctionalSafetyMonitor then takes up to two actions:

  1. Notify the driver to take action
  2. Trigger the Guardian module (emergency stop) if the expected safety measures do not take effect
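
The two-step escalation can be sketched as a small state machine: warn first, and request an emergency stop only if the unsafe condition outlasts a grace period. Everything here (SafetyEscalator, SafetyAction, the grace period parameter) is a hypothetical illustration of the idea, not the logic in functional_safety_monitor.cc.

```cpp
#include <cassert>

// Hypothetical outcome of one safety check.
enum class SafetyAction { kNone, kNotifyDriver, kEmergencyStop };

// Hypothetical escalation helper: notify the driver when the system first
// becomes unsafe, then escalate if it stays unsafe for too long.
class SafetyEscalator {
 public:
  explicit SafetyEscalator(double grace_seconds)
      : grace_seconds_(grace_seconds) {
    assert(grace_seconds_ >= 0.0);
  }

  // system_safe: whether the summarized status is acceptable, or the expected
  // safety measure (e.g. a driver takeover) is already in effect.
  SafetyAction Update(bool system_safe, double current_time) {
    if (system_safe) {
      unsafe_ = false;  // Condition cleared; forget the earlier trigger time.
      return SafetyAction::kNone;
    }
    if (!unsafe_) {
      unsafe_ = true;  // First unsafe observation starts the grace period.
      unsafe_since_ = current_time;
      return SafetyAction::kNotifyDriver;
    }
    if (current_time - unsafe_since_ > grace_seconds_) {
      return SafetyAction::kEmergencyStop;
    }
    return SafetyAction::kNotifyDriver;
  }

 private:
  double grace_seconds_;
  bool unsafe_ = false;
  double unsafe_since_ = 0.0;
};
```

With a 10-second grace period, an unsafe status first seen at t = 1.0 yields a driver notification, and the same status still present at t = 11.5 yields an emergency stop request.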

Code analysis

Monitor class structure analysis: monitor.h/.cc

class Monitor : public apollo::cyber::TimerComponent {
 public:
  bool Init() override;
  bool Proc() override;

 private:
  std::vector<std::shared_ptr<RecurrentRunner>> runners_;
};

Monitor is a timer component derived from TimerComponent: Init() performs initialization, and Proc() does the actual work each time the timer fires.

Monitor initialization analysis

bool Monitor::Init() {
  MonitorManager::Instance()->Init(node_);
  // Only the one CAN card corresponding to current mode will take effect.
  runners_.emplace_back(new EsdCanMonitor());
  runners_.emplace_back(new SocketCanMonitor());
  // To enable the GpsMonitor, you must add FLAGS_gps_component_name to the
  // mode's monitored_components.
  runners_.emplace_back(new GpsMonitor());
  // To enable the LocalizationMonitor, you must add
  // FLAGS_localization_component_name to the mode's monitored_components.
  runners_.emplace_back(new LocalizationMonitor());
  // To enable the CameraMonitor, you must add
  // FLAGS_camera_component_name to the mode's monitored_components.
  runners_.emplace_back(new CameraMonitor());
  // Monitor if processes are running.
  runners_.emplace_back(new ProcessMonitor());
  // Monitor if modules are running.
  runners_.emplace_back(new ModuleMonitor());
  // Monitor message processing latencies across modules
  const std::shared_ptr<LatencyMonitor> latency_monitor(new LatencyMonitor());
  runners_.emplace_back(latency_monitor);
  // Monitor if channel messages are updated in time.
  runners_.emplace_back(new ChannelMonitor(latency_monitor));
  // Monitor if resources are sufficient.
  runners_.emplace_back(new ResourceMonitor());
  // Monitor all changes made by each sub-monitor, and summarize to a final
  // overall status.
  runners_.emplace_back(new SummaryMonitor());
  // Check functional safety according to the summary.
  if (FLAGS_enable_functional_safety) {
    runners_.emplace_back(new FunctionalSafetyMonitor());
  }

  return true;
}

runners_ is the class's member container, declared as std::vector<std::shared_ptr<RecurrentRunner>> runners_.

Init function flow:

  1. Initialize MonitorManager with the current node (node_ is available because Monitor inherits from a component class)
  2. Add the following monitors to the container, in this order:
    1. EsdCanMonitor
    2. SocketCanMonitor
    3. GpsMonitor
    4. LocalizationMonitor
    5. CameraMonitor
    6. ProcessMonitor
    7. ModuleMonitor
    8. LatencyMonitor
    9. ChannelMonitor (constructed with the LatencyMonitor instance)
    10. ResourceMonitor
    11. SummaryMonitor
  3. Check whether functional safety is enabled (FLAGS_enable_functional_safety)
    1. If it is, add FunctionalSafetyMonitor to the container
  4. Return true
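
Every object added to runners_ is held through the RecurrentRunner base class. A minimal sketch of how such a base class could let each monitor run at its own interval is shown below. It is a simplified stand-in written for this article: the real class in common/recurrent_runner.h may differ, and CountingMonitor and TickAll are hypothetical helpers.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for a recurrent runner: Tick() may be called often,
// but RunOnce() executes at most once per interval.
class RecurrentRunner {
 public:
  RecurrentRunner(std::string name, double interval)
      : name_(std::move(name)), interval_(interval) {
    assert(interval_ >= 0.0);
  }
  virtual ~RecurrentRunner() = default;

  // Runs the task if the next scheduled round is due at current_time.
  void Tick(double current_time) {
    if (next_round_ <= current_time) {
      ++round_count_;
      next_round_ = current_time + interval_;
      RunOnce(current_time);
    }
  }

  int round_count() const { return round_count_; }

 protected:
  virtual void RunOnce(double current_time) = 0;

 private:
  std::string name_;
  double interval_;
  double next_round_ = 0.0;
  int round_count_ = 0;
};

// Hypothetical sub-monitor that runs every 5 seconds and remembers when.
class CountingMonitor : public RecurrentRunner {
 public:
  CountingMonitor() : RecurrentRunner("CountingMonitor", 5.0) {}
  double last_run_time() const { return last_run_time_; }

 protected:
  void RunOnce(double current_time) override { last_run_time_ = current_time; }

 private:
  double last_run_time_ = -1.0;
};

// Ticks every runner in insertion order, as a container like runners_ would.
void TickAll(const std::vector<std::shared_ptr<RecurrentRunner>>& runners,
             double current_time) {
  for (const auto& runner : runners) {
    runner->Tick(current_time);
  }
}
```

Ticking a CountingMonitor at t = 1.0, 3.0, and 6.0 runs it twice: the call at 3.0 is skipped because the next round is not due until 6.0. This keeps the timer callback cheap even when it fires far more often than any single monitor needs to run.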

Monitor Execution Function Analysis

bool Monitor::Proc() {
  const double current_time = apollo::cyber::Clock::NowInSeconds();
  if (!MonitorManager::Instance()->StartFrame(current_time)) {
    return false;
  }
  for (auto& runner : runners_) {
    runner->Tick(current_time);
  }
  MonitorManager::Instance()->EndFrame();

  return true;
}

Proc function flow:

  1. Record the current time.
  2. MonitorManager starts a frame (one round of monitoring), taking the current time.
    1. If this fails, return false.
  3. Iterate over the runners_ container and call Tick on every monitor in it, passing in the current time; this triggers each sub-monitor's monitoring task.
  4. MonitorManager ends the frame.
  5. Return true.
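
One consequence of this loop is that runners execute in insertion order, so a summarizing runner placed last can read what earlier runners wrote during the same frame. The sketch below illustrates that with hypothetical types (FrameState, Runner, RunFrame, MakeDemoRunners); the null check merely stands in for a failing StartFrame and says nothing about what MonitorManager actually checks.

```cpp
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical shared state that all runners read and write within a frame.
struct FrameState {
  std::map<std::string, bool> component_ok;
  bool all_ok = false;
};

using Runner = std::function<void(double, FrameState*)>;

// Mirrors the shape of Proc(): begin a frame, tick each runner in insertion
// order, then finish. Returns false if the frame cannot be started.
bool RunFrame(double current_time, const std::vector<Runner>& runners,
              FrameState* state) {
  if (state == nullptr) {
    return false;  // Stand-in for StartFrame() failing.
  }
  state->component_ok.clear();
  for (const auto& runner : runners) {
    runner(current_time, state);
  }
  return true;
}

// Builds two hypothetical sub-monitors followed by a summary runner.
std::vector<Runner> MakeDemoRunners(bool gps_ok) {
  return {
      [gps_ok](double, FrameState* s) { s->component_ok["GPS"] = gps_ok; },
      [](double, FrameState* s) { s->component_ok["Resource"] = true; },
      // Summary runner: placed last so it sees both results above.
      [](double, FrameState* s) {
        s->all_ok = true;
        for (const auto& entry : s->component_ok) {
          s->all_ok = s->all_ok && entry.second;
        }
      },
  };
}
```

Running a frame with MakeDemoRunners(false) leaves all_ok false, because the summary runner observes the failing GPS entry written earlier in the same frame.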

Origin blog.csdn.net/qq_32378713/article/details/128035178