Basic knowledge of server BMC and IPMI

Getting to know BMC and IPMI

What is BMC?

BMC stands for Baseboard Management Controller, an embedded management microcontroller. It monitors the system's power and temperature to keep the system in a normal operating state.

In effect, the BMC plays the role of a housekeeper. It manages the power, temperature and other aspects of the whole system, and it can also act as a watchdog that resets the system when it hangs.
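
The watchdog role can be sketched in a few lines of Python. This is only an illustrative model, not BMC firmware: the class name Watchdog, its timeout_s parameter and the injected clock are assumptions made for the example. The idea is that a healthy host keeps restarting the countdown; if it stays silent for too long, the timer expires and the BMC would reset it.

```python
import time


class Watchdog:
    """Toy model of a watchdog timer (hypothetical names, not a real BMC API)."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self._clock = clock
        self._last_kick = clock()

    def kick(self):
        """Called by the healthy host to restart the countdown."""
        self._last_kick = self._clock()

    def expired(self):
        """True once the host has been silent for longer than the timeout."""
        return self._clock() - self._last_kick > self.timeout_s


# Simulated time so the example runs instantly.
fake_now = [0.0]
wd = Watchdog(timeout_s=5.0, clock=lambda: fake_now[0])

fake_now[0] = 3.0
wd.kick()                       # host is alive at t=3
fake_now[0] = 7.0
alive_at_7 = not wd.expired()   # 4 s of silence: still within the timeout
fake_now[0] = 9.0
hung_at_9 = wd.expired()        # 6 s of silence: a reset would fire here
```

Injecting the clock is a convenience for the example; a real watchdog counts down in hardware.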

Ordinary computers, such as a home PC, do not include a BMC because they do not need one: the CPU can do everything a BMC would do. For complex servers, however, a BMC is essential. It is like housekeeping: an average family gets by without a housekeeper, but a large household with a big business hires one to share the work.

A BMC is generally built around a dedicated BMC chip; a commonly used one today is the AST2500 from ASPEED.

The AST2500 is used for remote management of servers and usually doubles as the server's display chip, outputting a VGA signal. Its display function is very basic, but it is enough for a server.

As far as is currently known, most manufacturers (DELL, HP, Lenovo, Inspur, Sugon, etc.) use BMC chips produced by ASPEED. The models include but are not limited to the AST2050/2300/2400/2520. Huawei also used ASPEED BMCs in the past but, to ensure information security, has gradually switched to self-developed BMC chips.

The BMC is the core controller that implements the IPMI interface specification. A typical IPMI v1.5 implementation requires 32K of RAM and 128K of flash memory; a more generous configuration naturally gives the BMC more capability.

So how does the BMC play its role in the system? The following is the IPMI v1.5 architecture diagram released by Intel in 2001 (picture from Intel):

From the figure we can see that the BMC connects outward to the system bus through the System Interface, and inward to other components through the IPMB (Intelligent Platform Management Bus). Notably, the BMC is connected to two network cards, one for local connection and one for remote connection over the network port, which makes remote management with the ipmitool tool possible.

In addition, the configuration of each sensor the BMC monitors, such as its alarm thresholds and whether event triggering is allowed, is stored in a set of data called the SDR (Sensor Data Record). The alarm events generated by sensors are stored in a set of data called the SEL (System Event Log).
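
The relationship between the SDR and the SEL can be illustrated with a small Python sketch. The class and field names below (SensorRecord, EventLog, upper_critical, events_enabled) are invented for the example and are far simpler than the real record formats; the point is only that the threshold lives in the sensor's record and the resulting alarm lands in the event log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SensorRecord:
    """Simplified stand-in for one SDR entry (field names are illustrative)."""
    name: str
    upper_critical: float
    events_enabled: bool = True


@dataclass
class EventLog:
    """Simplified stand-in for the SEL: an append-only list of event strings."""
    entries: List[str] = field(default_factory=list)

    def add(self, message: str) -> None:
        self.entries.append(message)


def check_reading(record: SensorRecord, value: float, sel: EventLog) -> bool:
    """Log an event if the reading exceeds the threshold; return True if logged."""
    if record.events_enabled and value > record.upper_critical:
        sel.add(f"{record.name}: {value} above upper critical {record.upper_critical}")
        return True
    return False


sel = EventLog()
cpu_temp = SensorRecord(name="CPU Temp", upper_critical=95.0)
check_reading(cpu_temp, 60.0, sel)   # normal reading, nothing is logged
check_reading(cpu_temp, 98.5, sel)   # over the threshold, one event is logged
```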

What is IPMI?

IPMI stands for Intelligent Platform Management Interface. As the name suggests, it is an interface and a protocol.
In other words, whatever IPMI defines as an interface can be seen by administrators using the corresponding tools: the BMC monitors the sensors, fans, voltages and so on, and reports the resulting values to the user.
These parameters can be queried through the BMC.
As mentioned above, the BMC can also store system data and event logs in memory and external storage. These functions are implemented much like the storage functions studied in a microcomputer systems course, and their implementation involves low-level programming.
The system components of IPMI mainly include the following:

  • BMC: Baseboard Management Controller;
  • IPMB: Intelligent Platform Management Bus;
  • ICMB: Intelligent Chassis Management Bus;
  • SDR: Sensor Data Record;
  • SEL: System Event Log;
  • FRU: Field Replaceable Unit.

Users can query IPMI data in three ways:

  • From the command line under a Linux operating system, for example to read the CPU temperature;
  • Through management software;
  • Through a browser with a Java virtual machine.
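
For the command-line route, a minimal Python sketch around ipmitool might look like the following. It assumes ipmitool is installed and that its "sdr list" output uses the usual "name | reading | status" columns; the sample text, the host name and the helper names are made up for illustration. For remote access the sketch uses the lanplus interface and the -E option, which makes ipmitool read the password from the IPMI_PASSWORD environment variable.

```python
import subprocess


def build_ipmitool_cmd(args, host=None, user=None):
    """Build an ipmitool argv; with host set, go over the LAN (lanplus) interface."""
    cmd = ["ipmitool"]
    if host is not None:
        cmd += ["-I", "lanplus", "-H", host]
        if user is not None:
            cmd += ["-U", user]
        cmd.append("-E")  # password comes from the IPMI_PASSWORD variable
    return cmd + list(args)


def parse_sdr(text):
    """Parse 'name | reading | status' lines into a dict keyed by sensor name."""
    sensors = {}
    for line in text.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and parts[0]:
            sensors[parts[0]] = {"reading": parts[1], "status": parts[2]}
    return sensors


def read_sensors(host=None, user=None):
    """Run 'ipmitool sdr list' and parse it (needs ipmitool and a reachable BMC)."""
    cmd = build_ipmitool_cmd(["sdr", "list"], host, user)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_sdr(out)


# Made-up sample in the usual column layout, so the parser can be tried offline.
SAMPLE = """\
CPU Temp         | 45 degrees C      | ok
FAN1             | 5400 RPM          | ok
"""
sensors = parse_sdr(SAMPLE)
```

On a real machine, read_sensors() queries the local BMC, while read_sensors(host=..., user=...) goes over the network; the same builder works for other subcommands such as "sel list" or "fru print".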

What is BMC

Before introducing BMC, we need to understand a concept, namely platform management.

Platform management refers to a series of monitoring and control functions whose object is the system hardware. For example, it monitors the system's temperature, voltage, fans and power supply, and makes corresponding adjustments to keep the system in a healthy state.

Of course, if the system really does become abnormal, it can also be restarted by a reset.

Platform management is also responsible for recording hardware information and logs, which are used to alert users and to help locate problems later.

The following figure is an overview of the functions involved in platform management:

The above functions can be integrated into a single controller, which is called the Baseboard Management Controller (BMC).

It should be noted that the BMC is an independent system. It does not depend on other hardware in the system (such as the CPU or memory), nor on the BIOS or the OS. The BMC can, however, interact with the BIOS and OS, and system management software running under the OS can work with the BMC to achieve better platform management.

Normally, a personal computer does not include a BMC because it would be of little use: the CPU (or the EC, which is another topic) is enough to manage temperature, power and the like.

But for equipment with high system requirements, such as servers, BMC will be used.

Of course, because the BMC is an independent system, some embedded devices need no other processor at all: a BMC alone can do the work.

After all, the BMC is itself a small system with its own processor (usually ARM), so it is entirely possible to use it alone for certain tasks.

But since it is called a BMC, its focus is platform management, so this article mainly discusses the BMC in servers.

The position of BMC in the system is roughly as shown in the figure below:

BMC connects with other components in the system through different interfaces.

LPC, I2C, SMBus, serial and so on are relatively basic interfaces, while IPMI is the interface specific to the BMC. Every BMC needs to implement it, so it deserves a separate introduction here.

IPMI

IPMI stands for Intelligent Platform Management Interface.

The name largely explains what it is for. For a detailed introduction, please refer to https://www.intel.com/content/www/us/en/servers/ipmi/ipmi-home.html; only a brief description is given here.

IPMI is a specific specification definition of the concept of "platform management". The specification defines the software and hardware architecture, interactive instructions, event format, data records, and capability sets of "platform management". The BMC is a core part of IPMI and belongs to the IPMI hardware architecture. The gray part of the figure below is the scope of IPMI:

You can see that the BMC is at the bottom of the hardware, and the upper white part is the management software in the system.

Since this article introduces BMC, only the IPMI hardware modules related to BMC are introduced here.

 

IPMI hardware module

IPMI specifies many things, of which the BMC is one of the most important. In addition, there are "satellite" controllers connected to the BMC through the IPMB; these generally control specific pieces of equipment.

The IPMB (Intelligent Platform Management Bus) is a serial bus based on I2C. It carries IPMI commands between the BMC and the "satellite" controllers.
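
To make the idea of IPMI commands travelling over the IPMB more concrete, here is a sketch of how a request frame is assembled. The layout (responder address, NetFn/LUN, header checksum, requester address, sequence/LUN, command, data, final checksum) and the two's-complement checksum follow the IPMB protocol as commonly described, but the details should be checked against the specification. The helper names are invented for the example, and 0x20 and 0x81 are the conventional BMC and system-software addresses.

```python
def checksum(data):
    """Two's-complement checksum: the bytes plus the checksum sum to 0 mod 256."""
    return (-sum(data)) & 0xFF


def build_ipmb_request(rs_addr, netfn, cmd, payload=b"",
                       rq_addr=0x81, rq_seq=0, rs_lun=0, rq_lun=0):
    """Assemble one IPMB request frame as bytes."""
    header = bytes([rs_addr, (netfn << 2) | rs_lun])
    body = bytes([rq_addr, (rq_seq << 2) | rq_lun, cmd]) + bytes(payload)
    return header + bytes([checksum(header)]) + body + bytes([checksum(body)])


# Get Device ID: NetFn 0x06 (Application), command 0x01, sent to the BMC at 0x20.
frame = build_ipmb_request(rs_addr=0x20, netfn=0x06, cmd=0x01)
frame_hex = " ".join(f"{b:02x}" for b in frame)   # "20 18 c8 81 00 01 7e"
```

Each of the two checksums covers only its own part of the frame, so a receiver can validate the header before reading the rest of the message.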

For relatively simple systems the BMC alone is sufficient, but when a system is more complex and consists of multiple subsystems, the IPMB and "satellite" controllers allow it to be managed better.

The following figure describes the various hardware modules related to IPMI:

The following briefly introduces each part.

 

Motherboard

First comes the lower left corner of the picture, labeled Mother Board.

In a server this part is usually the protagonist; it contains the main components such as the CPU and PCH.

Here we can see several components, namely the network card, the serial port and the IPMI bus, as well as a PCI bus that appears at the top middle of the picture.

Network card: A server needs a network card, which requires no introduction. The point of interest is the connection between the BMC and the network card, which will be introduced later.

Serial port: The serial port outputs the server's debugging information. What is worth noting here is Serial Port Sharing, which lets the server's serial output go either directly to the port or to the BMC. Why send it to the BMC? Consider a common scenario: the server sits in a machine room, and staff usually do not work in the machine room but operate over the network (which is why the BMC connects to the network card). When the server's serial output is needed, going to the machine room is inconvenient, so obtaining it through the BMC is a good solution.

IPMI bus: This is the main channel through which the BMC communicates with and controls the server, so it is indispensable.

PCI bus: Its role is much like that of the serial port. Besides serial output, the server also needs to output a graphical interface. From the server's side, it is connected to a graphics card over PCI, and the display is output through that card.

IPMB

Next is the upper right corner of the picture, which shows the devices connected via IPMB.

These devices are similar to the BMC and are also management chips. They supplement the BMC and extend its functions.

Non-volatile Storage

We know that the BMC is actually an independent chip, so it must also run its own system.

The BMC runs a Unix-like system, which is stored in non-volatile storage, usually SPI flash.

This is essentially no different from ordinary storage media.

In addition to the system itself, it holds the information that the BMC stores, for example the serial output captured from the server, the system's own alarm information, and FRU information.

Sensors & Control Circuitry

Although this part occupies only a small area of the picture, it is the most basic function of the BMC: obtaining information and controlling the environment.

The BMC reads device temperatures over buses such as I2C and PECI, and then regulates the temperature according to a preset strategy.

There are two ways to do this. One is to adjust the fans, which is active cooling; the other is to reduce power consumption, for example by changing the CPU's P-state or shutting down redundant hard drives, which is passive cooling.
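
A "preset strategy" for active cooling is often a fan curve. The sketch below is an assumption about what such a policy could look like, not how any particular BMC does it: the function name fan_duty and the curve points are invented, and the duty cycle is obtained by linear interpolation between the points.

```python
def fan_duty(temp_c, curve):
    """Map a temperature to a fan PWM duty (%) by linear interpolation.

    curve is a list of (temperature, duty) points sorted by temperature.
    """
    if temp_c <= curve[0][0]:
        return curve[0][1]
    if temp_c >= curve[-1][0]:
        return curve[-1][1]
    for (t0, d0), (t1, d1) in zip(curve, curve[1:]):
        if t0 <= temp_c <= t1:
            return d0 + (d1 - d0) * (temp_c - t0) / (t1 - t0)


# Hypothetical policy: 30% up to 40 C, ramping to 100% at 80 C.
CURVE = [(40, 30), (60, 60), (80, 100)]

idle = fan_duty(35, CURVE)   # below the first point: minimum duty
warm = fan_duty(50, CURVE)   # halfway between 40 C and 60 C
hot = fan_duty(90, CURVE)    # above the last point: maximum duty
```

A passive-cooling policy could be layered on top of this, for example by requesting a lower CPU P-state once the fans are already at their maximum duty.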

 

FRU

The full name of FRU is Field Replaceable Unit.

As the figure also shows, components such as memory modules and CPUs are FRUs; they can usually be replaced in the server.

The BMC detects these devices and saves the relevant information.

When the presence of one of these devices changes, the BMC generates a corresponding alarm.
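
Presence-change detection can be pictured as comparing two snapshots. The sketch below is purely illustrative: the function name, the FRU names and the event wording are assumptions, and a real BMC would record such changes as events in its log.

```python
def presence_events(previous, current):
    """Compare two presence snapshots and describe what changed.

    Each snapshot maps a FRU name to True (present) or False (absent).
    """
    events = []
    for name in sorted(set(previous) | set(current)):
        was, now = previous.get(name, False), current.get(name, False)
        if was and not now:
            events.append(f"{name} removed")
        elif now and not was:
            events.append(f"{name} inserted")
    return events


before = {"DIMM_A1": True, "DIMM_A2": True, "PSU2": False}
after = {"DIMM_A1": True, "DIMM_A2": False, "PSU2": True}
events = presence_events(before, after)   # ["DIMM_A2 removed", "PSU2 inserted"]
```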

Origin blog.csdn.net/star871016/article/details/112257689