How to Print Logs Sensibly in Java

1. The main role of the log

1) Logs are a mirror of the running system: they reflect its operating state in real time;

As shown in the figure, the producer in system A continuously generates data and puts it into a data queue, while the sender continuously takes data from the queue and sends it to the receiver in downstream system B. For system A, the amount of data waiting in the queue is a key indicator, because it indirectly reflects the system's current operating state. If the number of elements in the queue exceeds 90% of its capacity, the system may not be operating normally and the queue is at risk of congestion; if the number is below 10% of its capacity, the system is operating normally and the risk of congestion is low.

If this indicator is not written to the log, developers and operations staff cannot know the current operating state of system A (there are of course other ways to obtain it, such as exposing it through an HTTP interface).
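The queue indicator described above could be logged along the following lines. This is only a sketch: the class name QueueUsageReporter, the state names, and the middle "WATCH" band between 10% and 90% are assumptions made for illustration, and java.util.logging is used only to keep the example free of external dependencies.

```java
import java.util.concurrent.BlockingQueue;
import java.util.logging.Logger;

/** Hypothetical helper that reports how full the send queue is. */
final class QueueUsageReporter {
    private static final Logger LOG = Logger.getLogger(QueueUsageReporter.class.getName());

    // Thresholds from the text: above 90% is risky, below 10% is healthy.
    static final double HIGH_WATERMARK = 0.90;
    static final double LOW_WATERMARK = 0.10;

    /** Classifies the queue state from its current size and total capacity. */
    static String classify(int size, int capacity) {
        double usage = (double) size / capacity;
        if (usage > HIGH_WATERMARK) {
            return "CONGESTION_RISK";
        }
        if (usage < LOW_WATERMARK) {
            return "NORMAL";
        }
        return "WATCH"; // assumed label for the band the text does not describe
    }

    /** Logs one status line for a bounded queue and returns it for inspection. */
    static String report(BlockingQueue<?> queue) {
        int size = queue.size();
        int capacity = size + queue.remainingCapacity();
        String state = classify(size, capacity);
        String line = "send queue usage, size:" + size + ", capacity:" + capacity + ", state:" + state;
        if ("CONGESTION_RISK".equals(state)) {
            LOG.warning(line);
        } else {
            LOG.info(line);
        }
        return line;
    }
}
```

Calling report from a scheduled task, for example once a minute, would give the monitoring side a steady series of data points to read.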

2) Good logs let operations staff and developers locate production problems quickly, stop the damage sooner, and reduce the losses caused by system failures;

3) Logs can also feed a monitoring system: the monitoring system collects the logs and extracts performance indicators from them, which helps to analyze performance bottlenecks and head off risks in advance;

For example:

Suppose a shopping mall system initially serves its database from two servers (one master, one slave), and most interfaces respond to user requests within seconds. Over time the number of users grows, concurrent queries and writes increase, and the amount of data in the database slowly grows as well, so some SQL statements become slower and slower. Eventually the slave database is overwhelmed by slow queries and goes down completely, making the mall service unavailable.

If the mall system records the elapsed time of every HTTP request in its log, and the monitoring system is configured to collect that log and raise alarms, the performance bottleneck caused by business growth can be discovered early and the system can be optimized in advance (for example by adding machines, optimizing SQL statements, or splitting databases and tables), which avoids the risk.
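Recording the elapsed time of each request might look like the sketch below. The class name RequestTimer, the log line layout, and the 1000 ms slow-request threshold are all assumptions for illustration; a real web application would more likely do this in a servlet filter or interceptor, and java.util.logging stands in for whatever log framework the project uses.

```java
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical wrapper that logs one line with the elapsed time per request. */
final class RequestTimer {
    private static final Logger LOG = Logger.getLogger(RequestTimer.class.getName());

    // Assumed threshold; the article does not specify one.
    static final long SLOW_REQUEST_MS = 1000;

    /** Formats the line a log collector could parse for the costMs field. */
    static String formatLine(String uri, boolean success, long costMs) {
        return "http request done, uri:[" + uri + "], success:[" + success + "], costMs:[" + costMs + "]";
    }

    /** Runs the handler, measures the elapsed time, and logs it even if the handler throws. */
    static <T> T timed(String uri, Supplier<T> handler) {
        long start = System.nanoTime();
        boolean success = false;
        try {
            T result = handler.get();
            success = true;
            return result;
        } finally {
            long costMs = (System.nanoTime() - start) / 1_000_000;
            String line = formatLine(uri, success, costMs);
            if (costMs >= SLOW_REQUEST_MS) {
                LOG.warning(line);
            } else {
                LOG.info(line);
            }
        }
    }
}
```

Keeping the field names stable (here costMs) is what lets the monitoring system extract the value and alarm on it.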

4) Logs make it easy to compute business metrics, which supports business analysis and feature optimization.

For example:

Suppose a search system wants to know what share of last week's searches came from different regions (such as the north and the south). If the IP of every search request is already printed in the log, this is easy to compute; otherwise a new release that adds the IP to the log has to go live before any statistics can be collected.

Therefore, everyone should pay attention to logging standards when writing code day to day, so that logs deliver their full value: they help keep our services running stably and make later system maintenance far more efficient.

2. How to print the program log in a standardized manner?

Next, we will look at how to print logs in a standardized way from the following aspects.

  1. Log file naming
  2. Log rolling
  3. Log level
  4. Selection of log printing timing
  5. Log content and format

2.1 Log file naming

Generally speaking, the naming of log files can include the following key information:

  1. Type identification (logTypeName)
  2. Log level (logLevel)
  3. Log generation time (logCreateTime)
  4. Log backup number (logBackupNum)

Type identification: the function or purpose of the log file. For a web service, the log that records HTTP requests is usually named request.log or access.log, where request and access are the type identification; the Java GC log is usually named gc.log, so its purpose is clear at a glance. The log that records the overall operation of a service is generally named after the service (serviceName, appKey) or the machine (hostName), such as nginx.log;

Log level: separating levels into different files is the recommended approach. If all levels are written to the same file, locating a problem means searching through that file, which is cumbersome. The levels are generally DEBUG, INFO, WARN, ERROR, and FATAL. In practice, either a strict or a non-strict matching mode can be used. In strict mode, the INFO log file contains only INFO entries, the ERROR log file contains only ERROR entries, and so on. In non-strict mode, the INFO log file may contain INFO, WARN, ERROR, and FATAL entries, the WARN log file may contain WARN, ERROR, and FATAL entries, and so on;
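The difference between the two matching modes can be expressed as a small predicate. The sketch below is illustrative only: LevelFileFilter and its Level enum are hypothetical names, not part of any log framework. In Logback, for instance, LevelFilter (exact match) and ThresholdFilter (level and above) play comparable roles.

```java
/** Hypothetical predicate that decides which entries a per-level log file accepts. */
final class LevelFileFilter {
    /** Levels in increasing order of severity, as listed in the text. */
    enum Level { DEBUG, INFO, WARN, ERROR, FATAL }

    /**
     * Strict mode: the file accepts only entries of exactly its own level.
     * Non-strict mode: the file accepts its own level and every more severe level.
     */
    static boolean accepts(Level fileLevel, Level entryLevel, boolean strict) {
        if (strict) {
            return entryLevel == fileLevel;
        }
        // Enum constants compare by declaration order, which matches severity here.
        return entryLevel.compareTo(fileLevel) >= 0;
    }
}
```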

Log generation time: the time the log file was created is appended to the file name, which makes files easy to sort when searching for a log;

Log backup number: when logs are rolled by file size, a sequence number can be appended to the end of the file name.
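Putting the four parts together, a file name could be assembled as in the sketch below. The layout type.level.yyyyMMddHH.backupNum.log and the class name LogFileNames are assumptions chosen for illustration; the article does not prescribe a particular order or separator, and in practice the log framework's rolling configuration usually produces the name.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.Locale;

/** Hypothetical helper that builds a log file name from the four parts in the text. */
final class LogFileNames {
    // Assumed layout: <logTypeName>.<logLevel>.<yyyyMMddHH>.<logBackupNum>.log
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("yyyyMMddHH");

    static String build(String logTypeName, String logLevel, LocalDateTime logCreateTime, int logBackupNum) {
        return logTypeName
                + "." + logLevel.toLowerCase(Locale.ROOT)
                + "." + HOUR.format(logCreateTime)
                + "." + logBackupNum
                + ".log";
    }
}
```

With this layout, build("access", "INFO", LocalDateTime.of(2024, 5, 1, 13, 0), 2) yields access.info.2024050113.2.log, and files of the same type and level sort chronologically by name.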

2.2 Log rolling

Logs preserve key information about the running system, but disk space is limited and logs cannot be kept forever, so a log rolling strategy is required. Log rolling usually takes one of the following forms:

  1. Roll by time
  2. Roll by the size of a single log file
  3. Roll by both time and the size of a single log file

Rolling by time means creating a new log file at a fixed interval, usually every hour or every day, depending on how much the system logs. If the log volume is relatively small, roll daily; if it is relatively large, rolling hourly is recommended.

Rolling by the size of a single log file means creating a new log file once the current one reaches a certain size. It is generally recommended that a single log file not exceed 500 MB; a file that is too large can get in the way of log monitoring and troubleshooting.

Rolling by both time and file size usually suits scenarios where logs must be kept for a certain period but a single log file must not grow too large.

Two further parameters of the rolling strategy are critical: the maximum number of log files to keep and the maximum disk space they may occupy. Remember to set both. If they are not set, the disk of the production machine is very likely to fill up.
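Log frameworks normally enforce these two limits themselves (Logback, for example, exposes maxHistory and totalSizeCap on its time-based rolling policies), but the underlying rule is simple enough to sketch. The code below is a hypothetical, simplified illustration of the idea: keep the newest files until either limit is hit and mark everything older for deletion. LogRetention and RolledFile are invented names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical retention rule: cap both the number of files and their total size. */
final class LogRetention {
    /** Minimal description of a rolled log file. */
    static final class RolledFile {
        final String name;
        final long sizeBytes;
        final long lastModifiedMillis;

        RolledFile(String name, long sizeBytes, long lastModifiedMillis) {
            this.name = name;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    /** Returns the files to delete, newest files being kept first. */
    static List<RolledFile> selectForDeletion(List<RolledFile> files, int maxFiles, long maxTotalBytes) {
        List<RolledFile> newestFirst = new ArrayList<>(files);
        newestFirst.sort(Comparator.comparingLong((RolledFile f) -> f.lastModifiedMillis).reversed());

        List<RolledFile> toDelete = new ArrayList<>();
        int kept = 0;
        long keptBytes = 0;
        boolean limitReached = false;
        for (RolledFile f : newestFirst) {
            if (!limitReached && kept < maxFiles && keptBytes + f.sizeBytes <= maxTotalBytes) {
                kept++;
                keptBytes += f.sizeBytes;
            } else {
                // Once a limit is hit, this file and all older ones are removed.
                limitReached = true;
                toDelete.add(f);
            }
        }
        return toDelete;
    }
}
```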

2.3 Log level

The usual log levels are:

debug/trace, info, warning, error, fatal

Their severity increases in that order:

  • debug / trace: These levels produce a lot of output, so they are generally unsuitable for the production environment and are mostly used for debugging in offline environments. If they must be used in production, they should be controlled by a switch that is turned on only while locating and tracking a production problem;

  • info: The info log generally records key states of the running system, key business logic, or key execution points. Do not overuse it: an overused info log is not much different from a debug / trace log.

  • warning: The warning log generally records unexpected situations that arise while the system is running. As the name implies, it warns developers and operations staff that something deserves attention, but it does not call for immediate human intervention.

  • error: The error log generally records ordinary errors that occur while the system is running. Once such errors appear, users' normal access or usage has been affected, which usually means human intervention is required. In production, however, not every error log entry needs manual intervention; the decision is usually based on both the number of error entries and how long they persist.

  • fatal: A fatal error of the system. It generally means the system is essentially down and manual intervention is required.

Here is a simple example. Suppose a salary calculation system has to fetch the attendance data of all employees from the employee attendance system on the 1st of every month and then calculate last month's salaries from that data. It needs a function that fetches employee attendance data from the attendance system:

public Map<Long, Double> getEmployeeWorkDaysFromAttendance(int year, int month, Set<Long> employeeList) throws BusinessException {
        // Key entry log: print the key parameters. employeeList may be large,
        // so only its size is printed here, not its contents.
        logger.info("get employee work days, year:{}, month:{}, employeeList.size:{}", year, month, employeeList.size());
 
        // To inspect the employee list temporarily, turn on the debug log switch
        if (debugOpen()) {
            logger.debug("employee list content:{}", JSON.toJsonString(employeeList));
        }
         
        Exception lastError = null;
        for (int retry = 1; retry <= MAX_RETRY_TIMES; retry++) {
            try {
                Map<Long, Double> employeeWorkDays = employeeAttendanceRPC.getEmployeeWorkDays(year, month, employeeList);
                logger.info("get employee work days success, year:{}, month:{}, employeeList.size:{}, employeeWorkDays.size:{}", year, month, employeeList.size(), employeeWorkDays.size());
                return employeeWorkDays;
            } catch (Exception ex) {
                // A failed attempt that may still be retried is a warning, not yet an error
                logger.warn("rpc invoke failed(employeeAttendanceRPC.getEmployeeWorkDays), retry times:{}, year:{}, month:{},  employeeList.size:{}", retry, year, month, employeeList.size(), ex);
                lastError = ex;
            }
        }
 
        // All retries failed: throw the exception upward.
        // In a language without an exception mechanism, an error log should be printed here.
        throw new BusinessException(lastError, "rpc invoke failed(employeeAttendanceRPC.getEmployeeWorkDays)");
    }

2.4 Selection of log printing timing

Because logs exist to help us understand the system's current operating state and locate production problems, when to print a log matters a great deal. If logs are printed too freely, there is too much log content and problems become harder to locate; if too few are printed, key logs are likely to be missing and the root cause cannot be found when troubleshooting in production. The following are common points at which a log should be printed:

1) http call or rpc interface call

When the program calls another service or system, print the call parameters and the call result (success / failure).

2) Program exceptions

When an exception occurs, either throw it upward or print the exception stack in the catch block. Avoid logging the same exception repeatedly, for example by both printing an error log and throwing the exception upward in the same catch block (the entry function of an external RPC interface is the exception to this rule).

3) Special condition branch

Print a log when the program enters a special conditional branch, such as an unusual else or switch branch. For example, when calculating salary based on seniority:

public double calSalaryByWorkingAge(int age) {
       if (age < 0) {
           logger.error("wrong age value, age:{}", age);
           return 0;
       }
       // ..
   }

In theory, the length of service cannot be less than 0, so it is necessary to print out this unexpected situation. Of course, it is also feasible to throw an exception.

4) Critical execution path and intermediate state

Key log information should also be recorded on critical execution paths and at intermediate states. For example, an algorithm may be divided into many steps; the intermediate result of each step should be recorded so that the algorithm's execution can be located and traced later.

5) Request entry and exit

Print entry / exit logs at the entry and exit of a function or external interface. This makes later log statistics easier and also helps with monitoring the system's operating state.

2.5 Log content and format

The timing of log printing determines whether a problem can be located from the logs at all; the content of the logs determines whether its cause can be found quickly. Log content is therefore just as important. Generally speaking, a log entry should include at least the following components:

logTag, param, exceptionStacktrace

  • logTag is the log identifier, which identifies the scenario or reason for this log entry,
  • param is the function call parameters,
  • exceptionStacktrace is the exception stack.

For example:

Good case:

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;

// Assumes SLF4J, which matches the LoggerFactory.getLogger and {} placeholder usage
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class HttpClient {
        private static final Logger LOG = LoggerFactory.getLogger(HttpClient.class);
 
        private static final int CONNECT_TIMEOUT = 5000;   // unit ms
        private static final int READ_TIMEOUT = 10000;     // unit ms
 
        public static String sendPost(String url, String param) {
            PrintWriter out = null;
            BufferedReader in = null;
            StringBuilder result = new StringBuilder();
            try {
                URL realUrl = new URL(url);
                URLConnection conn = realUrl.openConnection();
                conn.setDoInput(true);
                conn.setDoOutput(true);
                conn.setConnectTimeout(CONNECT_TIMEOUT);
                conn.setReadTimeout(READ_TIMEOUT);
                conn.setRequestProperty("charset", "UTF-8");
                out = new PrintWriter(conn.getOutputStream());
                out.print(param);
                out.flush();
                in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
                String line;
                while ((line = in.readLine()) != null) {
                    result.append(line);
                }
            } catch (Exception ex) {
                // Has a clear logTag, the parameters, and the exception stack
                LOG.error("post request error!!!, url:[{}], param:[{}]", url, param, ex);
            } finally {
                try {
                    if (out != null) {
                        out.close();
                    }
                    if (in != null) {
                        in.close();
                    }
                } catch (IOException ex) {
                    LOG.error("close stream error!!!, url:[{}], param:[{}]", url, param, ex);
                }
            }
            // Returning after the finally block avoids masking exceptions with a return inside it
            return result.toString();
        }
}

Bad case:

public class HttpClient {
    private static final Logger LOG = LoggerFactory.getLogger(HttpClient.class);
 
    private static final int CONNECT_TIMEOUT = 5000;   // unit ms
    private static final int READ_TIMEOUT = 10000;     // unit ms
     
    public static String sendPost(String url, String param) {
        PrintWriter out = null;
        BufferedReader in = null;
        StringBuilder result = new StringBuilder();
        try {
            URL realUrl = new URL(url);
            URLConnection conn = realUrl.openConnection();
            conn.setDoInput(true);
            conn.setDoOutput(true);
            conn.setConnectTimeout(CONNECT_TIMEOUT);
            conn.setReadTimeout(READ_TIMEOUT);
            conn.setRequestProperty("charset", "UTF-8");
            out = new PrintWriter(conn.getOutputStream());
            out.print(param);
            out.flush();
            in = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String line;
            while ((line = in.readLine()) != null) {
                result.append(line);
            }
        } catch (Exception ex) {
            // Bad: no parameters and no exception stack, so nothing to troubleshoot with
            LOG.error("post request error!!!");
        } finally {
            try {
                if (out != null) {
                    out.close();
                }
                if (in != null) {
                    in.close();
                }
            } catch (IOException ex) {
                // Bad: same problem, the cause of the failure is discarded
                LOG.error("close stream error!!!");
            }
        }
        return result.toString();
    }
}

In addition, for an external HTTP or RPC interface, it is best to assign a requestId to every request so that the request's entire subsequent execution path can be traced.
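One way to carry a requestId through a request is a thread-local context, sketched below. The class RequestContext and its methods are hypothetical names for illustration, and the sketch assumes each request is handled on a single thread. With SLF4J, the MDC class serves this purpose (MDC.put("requestId", id)), so that the log pattern can print the id without touching each message.

```java
import java.util.UUID;

/** Hypothetical per-thread holder for the id of the request being processed. */
final class RequestContext {
    private static final ThreadLocal<String> REQUEST_ID = new ThreadLocal<>();

    /** Call at the request entry: generates an id and binds it to the current thread. */
    static String begin() {
        String id = UUID.randomUUID().toString();
        REQUEST_ID.set(id);
        return id;
    }

    /** Call at the request exit (ideally in a finally block) so pooled threads do not leak ids. */
    static void end() {
        REQUEST_ID.remove();
    }

    /** Prefixes a log message with the current requestId, or "-" outside a request. */
    static String tag(String message) {
        String id = REQUEST_ID.get();
        return "requestId:[" + (id == null ? "-" : id) + "] " + message;
    }
}
```

Searching the logs for a single requestId then returns every line written while that request was being processed.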

How to properly log in the project?

1. Correctly define the log

2. Use parameterized {} placeholders, and wrap each parameter in [] to delimit it

LOG.debug("Save order with order no:[{}], and order amount:[{}]", orderNo, orderAmount);

3. Output different levels of logs

The most commonly used log levels in the project are ERROR, WARN, INFO, and DEBUG. What are the application scenarios of these four?

Several wrong ways to log

1. Do not use System.out.print...

Logs should be written only through the log framework, never with System.out.print.... Such output goes only to the Tomcat console and is not recorded in a log file, which makes the logs hard to manage. When the application runs as a service, this output is discarded and cannot be found afterwards.

2. Do not use e.printStackTrace()

It writes to the Tomcat console through System.err.

3. Do not log an exception and then rethrow it

If a custom business exception is thrown after an exception is caught, there is no need to record an error log at that point; whoever finally catches the exception will handle it. Throwing the exception and printing the error log at the same time causes the same failure to be logged repeatedly.


try {
    // ...
} catch (Exception e) {
    // Wrong: the error is logged here and thrown again, so it gets logged twice
    LOG.error("xxx", e);
    throw new RuntimeException();
}

4. Do not log only part of the error information

In the following code, the first call does not record the detailed exception stack, only a basic description of the error, which makes troubleshooting harder.

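try {
    // ...
} catch (Exception e) {
    // Wrong: only the message text is passed, so the stack trace is lost
    LOG.error("XX exception occurred", e.getMessage());
 
    // Correct: pass the exception itself so the full stack trace is recorded
    LOG.error("XX exception occurred", e);
}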

5. Do not use the wrong log level

I was once locating a production problem when a colleague told me with full confidence: "I definitely logged it, so why can't I find it?" Later I read his code, which looked like this:

try {
    // ...
} catch (Exception e) {
    // Wrong: an exception is recorded at the info level
    LOG.info("XX exception occurred...", e);
}

The error was recorded at the info level, so it was written to the info log file. How could my colleague ever find it in the error log file?

6. Do not print logs inside loops with thousands of iterations

What does this mean? If your project uses a log framework with relatively low performance, such as Log4j, do not print a log on every pass of a for loop that runs thousands of times, as this may drag down your application. If your program responds slowly, consider whether it is printing too many logs.

for(int i=0; i<2000; i++){
    LOG.info("XX");
}
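A common alternative is to count inside the loop and print a single summary line afterwards. The sketch below is only an illustration: BatchLogging is a hypothetical class, the item >= 0 check stands in for the real per-item work, and java.util.logging is used to keep the example self-contained.

```java
import java.util.List;
import java.util.logging.Logger;

/** Hypothetical batch job that logs one summary line instead of one line per item. */
final class BatchLogging {
    private static final Logger LOG = Logger.getLogger(BatchLogging.class.getName());

    static String processAll(List<Integer> items) {
        int ok = 0;
        int failed = 0;
        for (int item : items) {
            // Placeholder for the real work; no log call on the hot path.
            if (item >= 0) {
                ok++;
            } else {
                failed++;
            }
        }
        String summary = "batch processed, total:" + items.size() + ", ok:" + ok + ", failed:" + failed;
        LOG.info(summary);
        return summary;
    }
}
```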

7. Disable debug logging in the production environment
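A minimal sketch of such a switch, using java.util.logging so that it runs without extra dependencies (its FINE level roughly corresponds to debug). DebugSwitch, expensiveDump, and the DUMPS_BUILT counter are hypothetical names that exist only to show that the debug message is never built while the switch is off.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical example of keeping debug output off by default in production. */
final class DebugSwitch {
    private static final Logger LOG = Logger.getLogger(DebugSwitch.class.getName());

    // Counts how often the costly debug message was actually built.
    static final AtomicInteger DUMPS_BUILT = new AtomicInteger();

    static String expensiveDump() {
        DUMPS_BUILT.incrementAndGet();
        return "large object dump";
    }

    /** INFO for normal production operation, FINE only while investigating a problem. */
    static void setDebug(boolean on) {
        LOG.setLevel(on ? Level.FINE : Level.INFO);
    }

    static void handle() {
        // The supplier is invoked only when FINE is enabled for this logger.
        LOG.fine(() -> "state: " + expensiveDump());
        LOG.info("request handled");
    }
}
```

Note that with the default java.util.logging configuration the console handler itself prints only INFO and above, so the handler level would also need lowering before FINE output becomes visible.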


Origin blog.csdn.net/jeikerxiao/article/details/99851611