Front-end code exception log collection and monitoring

☞ How to collect logs

Logs are usually collected in one of two ways. One is to check for errors in the logic itself, which is active judgment; the other is to use the shortcuts the language provides to grab error information by brute force, such as try..catch and window.onerror.

1. Active judgment

After some operations we expect a certain result, but the result we get is not the one we want:

// test.js
function calc(){
  // code...
  return val;
}
if(calc() !== "someVal"){
  Reporter.send({
    position: "test.js::<Function>calc",
    msg: "calc error"
  });
}

This kind of feedback reports a logic error or state error. Checking a status value in this way is common when handling interface responses.

2. try..catch Capture

Check for errors in a code segment:

try {
  init();
  // code...
} catch(e){
  Reporter.send(format(e));
}

Here init is taken to be the entry point of the program, so any error thrown synchronously inside it is caught.
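The format function used above is not defined in the article. A minimal sketch of what it might do for a caught value, assuming a hypothetical helper named formatError and a record shape chosen only for illustration:

```javascript
// Hypothetical helper (not from the article): normalizes whatever was
// thrown into a plain record that a reporter could serialize.
function formatError(e) {
  var isError = e instanceof Error;
  return {
    name: isError ? e.name : "NonError",
    msg: isError ? e.message : String(e),
    // stack is not standardized, so it may be missing
    stack: isError && e.stack ? String(e.stack) : ""
  };
}

var record;
try {
  JSON.parse("{bad json");
} catch (e) {
  record = formatError(e);
}
```

Because anything can be thrown in JS, including strings, the helper does not assume that the caught value is an Error instance.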

3. window.onerror

Catch global errors:

window.onerror = function() {
  var errInfo = format (arguments);
  Reporter.send(errInfo);
  return true;
};

Because the function above returns true, the error is not printed to the console. Its parameters are as follows:

/**
 * @param {String} errorMessage error message
 * @param {String} scriptURI URL of the file in which the error occurred
 * @param {Number} lineNumber line number of the error
 * @param {Number} columnNumber column number of the error
 * @param {Object} errorObj error details (can be any thrown value)
 */
window.onerror = function(errorMessage, scriptURI, lineNumber, columnNumber, errorObj) {
    // code..
};
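As a sketch of how these five parameters could be turned into a log record: the helper name formatOnErrorArgs and the field names are assumptions for illustration, not part of the article. It tolerates missing trailing arguments, since older browsers may not pass the column number or the error object.

```javascript
// Hypothetical helper: maps the arguments of window.onerror to a record.
function formatOnErrorArgs(args) {
  var errorObj = args[4];
  return {
    msg: String(args[0]),
    url: args[1] || "",
    line: Number(args[2]) || 0,
    col: Number(args[3]) || 0, // may be absent in older browsers
    stack: errorObj && errorObj.stack ? String(errorObj.stack) : ""
  };
}

// In a browser it could be wired up like this (Reporter is the article's object):
// window.onerror = function () {
//   Reporter.send(formatOnErrorArgs(arguments));
//   return true;
// };
```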

window.onerror is a particularly brute-force way of tolerating faults, and so is try..catch. Under the hood both are implemented with goto statements in C/C++: once an error is found, no matter how deep the current stack is or where the code is running, control jumps straight to the top level or to the layer where try..catch catches it. This kind of kick-it-upstairs error handling is not great.

☞ Problems with collecting logs

The purpose of collecting logs is to find problems in time. A good log tells us where the error is; a better one also tells us how to deal with it. The ultimate goal is to detect errors and recover from them automatically, and this step is the hardest.

1. No specific error message, Script error.

First look at the following example, test.html

<!-- http://barret/test.html -->
<script>
  window.onerror = function(){
    console.log(arguments);
  };
</script>
<script src="http://barret/test.js"></script>

test.js

// http://barret/test.js
function test(){
  see a = 1; // deliberately invalid statement, so that window.onerror fires
  return a+1;
}
test();

The log we expect to collect contains specific information: the error message, the URL of the file, and the line number.

To make resources easier to deploy and manage, we usually serve static resources from a different domain:

<!-- http://barret/test.html -->
<script>
  window.onerror = function(){
    console.log(arguments);
  };
</script>
<script src="http://localhost/test.js"></script>

But now the only result we get is Script error.

The reason can be found in Chromium's WebCore source code: when the script is cross-domain, the reported message is replaced with Script error.

// http://trac.webkit.org/browser/branches/chromium/1453/Source/WebCore/dom/ScriptExecutionContext.cpp#L333
String message = errorMessage;
int line = lineNumber;
String sourceName = sourceURL;
// All the error details are available at this point, but if the script is not same-origin, `sanitizeScriptError` overwrites them
sanitizeScriptError(message, line, sourceName, cachedScript);

Older versions of WebCore only checked securityOrigin()->canRequest(targetURL); newer versions add one more check, on cachedScript. Browsers are clearly becoming stricter and stricter in this respect.

A local test shows that under the file:// protocol, securityOrigin()->canRequest(targetURL) is also false.
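Since a cross-domain script loaded without CORS yields only Script error., it can help to recognize such records on the client so they can be counted separately. A small sketch, assuming a hypothetical helper isOpaqueScriptError and assuming that the browser blanks the URL and zeroes the line number for these errors:

```javascript
// Hypothetical helper: true when the arguments look like a sanitized
// cross-domain error, i.e. the message is "Script error." and the
// location has been removed.
function isOpaqueScriptError(msg, url, line) {
  var text = String(msg).trim();
  return /^script error\.?$/i.test(text) && !url && !line;
}
```

A reporter could use this to send such records under a dedicated category instead of mixing them with errors that carry a usable location.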

☞ Why Script error.?

In short, Script error. exists to avoid leaking data to insecure domains. A simple example:

<script src="bank.com/login.html"></script>

Above, we did not load a JS file but an HTML page: the bank's login page. If you are already logged in to bank.com, the login page automatically redirects to a page showing Welcome xxx...; otherwise it shows Please Login.... Parsed as script, the page then fails with either "Welcome xxx... is not defined" or "Please Login... is not defined". From this message it is possible to tell whether a user is logged in to the bank account, which gives hackers a very convenient way to probe and is quite unsafe.

☞ The crossorigin attribute bypasses the cross-domain restriction

Both image and script tags support the crossorigin attribute. It tells the browser: I want to load a resource from an external domain, and I trust this resource.

<script src="http://localhost/test.js" crossorigin></script>

However, with this attribute alone the browser reports an error when loading the script. This is expected: the cross-origin resource sharing policy requires the server to also set the Access-Control-Allow-Origin response header:

header('Access-Control-Allow-Origin: *');

Looking back at our CDN resources, static resources such as JavaScript/CSS/Image/Font/SWF have in fact carried CORS response headers for a long time.

2. Minified code makes it impossible to locate the exact position of the error

Almost all production code is bundled and minified: dozens or hundreds of files are merged into one, on a single line. When we receive "a is not defined" and the error only occurs in a specific scenario, we have no way to tell which variable was minified to a, so the error log is useless.

The first method that comes to mind is sourceMap, which maps a position in the minified code to the corresponding position in the original code. A source map is referenced by a comment on the last line of the code:

//# sourceMappingURL=index.js.map

The prefix used to be '//@' and is now '//#'. For error reporting, however, this is of no use: JS cannot obtain the original line number at runtime, so source maps only help when locating problems with tools such as Chrome DevTools, and not every production resource ships with a sourceMap file. For now, source maps are mainly useful during development.

Of course, if you understand how the VLQ encoding in a sourceMap corresponds to positions, you can post-process the collected logs and map them back to the original locations. The cost is relatively high, and it seems that no one has tried it yet.
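To give an idea of what that post-processing involves, here is a minimal sketch of a Base64 VLQ decoder for one segment of a source map's "mappings" string. The function name decodeVLQ is an assumption for illustration. A complete tool would also have to split the string on ";" and ",", and accumulate the values, because each field is stored relative to the previous segment.

```javascript
var BASE64_CHARS =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Decodes one segment of Base64 VLQ digits into a list of signed integers.
function decodeVLQ(segment) {
  var values = [];
  var value = 0;
  var shift = 0;
  for (var i = 0; i < segment.length; i++) {
    var digit = BASE64_CHARS.indexOf(segment.charAt(i));
    if (digit === -1) {
      throw new Error("Invalid base64 character: " + segment.charAt(i));
    }
    value += (digit & 31) << shift; // the low 5 bits carry data
    if (digit & 32) {
      shift += 5; // continuation bit set: more digits follow
    } else {
      var negative = value & 1; // the lowest bit of the number is the sign
      value = value >> 1;
      values.push(negative ? -value : value);
      value = 0;
      shift = 0;
    }
  }
  return values;
}

var fields = decodeVLQ("AAgBC"); // [0, 0, 16, 1]
```

In a segment with four fields, the values are the generated column, the source file index, the original line and the original column, each as a delta from the previous segment.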

So, is there any way to locate the specific location of the error, or is there any way to reduce the difficulty of locating the problem?

One idea: when bundling, insert 1000 blank lines between every two merged files, so that the final production file becomes:

(function(){var longCode.....})(); // file 1

// 1000 blank lines

(function(){var longCode.....})(); // file 2

// 1000 blank lines

(function(){var longCode.....})(); // file 3

// 1000 blank lines

(function(){var longCode.....})(); // file 4


var _fileConfig = ['file 1', 'file 2', 'file 3', 'file 4']

If the error is on line 2003,

window.onerror = function(msg, url, line, col, error){
  // line = 2003
  // Each file sits on one line followed by 1000 blank lines,
  // so file k starts at line 1001 * (k - 1) + 1.
  var index = Math.floor((line - 1) / 1001);
  console.log("Error location: " + _fileConfig[index]);
  // -> "Error location: file 3"
};

It can be calculated that the error occurs in the third file, and the scope is narrowed down a lot.
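The same idea can be made independent of the exact padding if the build step records the first line of each file. A sketch under that assumption; the fileStarts table and the locateFile helper are hypothetical and would be generated by the bundling script:

```javascript
// Hypothetical build output: the first line of each merged file in the bundle.
var fileStarts = [
  { name: "file 1", startLine: 1 },
  { name: "file 2", startLine: 1002 },
  { name: "file 3", startLine: 2003 },
  { name: "file 4", startLine: 3004 }
];

// Returns the name of the last file that starts at or before the given
// line, or null if the line precedes every file.
function locateFile(line, starts) {
  var found = null;
  for (var i = 0; i < starts.length; i++) {
    if (starts[i].startLine <= line) {
      found = starts[i].name;
    } else {
      break;
    }
  }
  return found;
}
```

With such a table the lookup still works when a file spans several lines or when the padding changes.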

3. Registration of error events

Registering the same handler for the error event several times does not make the callback run several times:

var fn = window.onerror = function() {
  console.log(arguments);
};
window.addEventListener("error", fn);
window.addEventListener("error", fn);

After an error is triggered, the handler set through window.onerror and the one added through addEventListener both run, and each runs only once: the second, identical addEventListener call is ignored.
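The duplicate-listener behaviour can be checked outside a page with a plain EventTarget. This is only a sketch: it assumes an environment that provides the global EventTarget and Event constructors, and it does not reproduce window.onerror itself.

```javascript
var target = new EventTarget();
var calls = 0;

function onError() {
  calls += 1;
}

target.addEventListener("error", onError);
target.addEventListener("error", onError); // same type and callback: ignored
target.dispatchEvent(new Event("error"));
// calls is now 1
```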

4. The amount of logs collected

There is no need to send every error message to the log server; the volume is too large. If a page has 10 million page views and a certain error occurs on every one of them, that error alone sends 10 million log entries, roughly 1 GB of logs. We can add a sampling rate to the Reporter function:

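function needReport(sampling) {
  // sampling: 0 - 1 (0 reports nothing, 1 reports everything)
  return Math.random() < sampling;
}
Reporter.send = function(errInfo, sampling) {
  // default to reporting everything when no sampling rate is given
  if (needReport(sampling === undefined ? 1 : sampling)) {
    Reporter._send(errInfo);
  }
};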

The sampling can be implemented as needed. It can use a random number as above, or be decided by the last letter/digit of some field in the cookie (such as the nickname), or by hashing the user's nickname and looking at the last letter/digit of the hash. In short, there are many ways.
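A sketch of the hash-based variant, so that a given user is either always sampled or never sampled. The hash (a djb2-style string hash) and the helper names hashString and needReportFor are assumptions chosen for illustration; any stable hash would do.

```javascript
// Simple, stable string hash (djb2-style), kept as an unsigned 32-bit integer.
function hashString(str) {
  var hash = 5381;
  for (var i = 0; i < str.length; i++) {
    hash = ((hash * 33) ^ str.charCodeAt(i)) >>> 0;
  }
  return hash;
}

// sampling: 0 - 1. The decision depends only on the nickname,
// so it is the same on every page view of the same user.
function needReportFor(nickname, sampling) {
  return hashString(nickname) % 1000 < Math.round(sampling * 1000);
}
```

Compared with Math.random(), this keeps all the logs of a sampled user together, which makes one user's session easier to follow.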

☞ Where to place log collection points

To get more accurate error information and to count error logs effectively, we should rely more on actively placed tracking points, for example in an interface request:

// Module A Get Shops Data
$.ajax({
  url: URL,
  dataType: "jsonp",
  success: function (ret) {
    if(ret.status === "failed") {
      // Tracking point 1
      return Reporter.send({
        category: "WARN",
        msg: "Module_A_GET_SHOPS_DATA_FAILED"
      });
    }
    if(!ret.data || !ret.data.length) {
      // Tracking point 2
      return Reporter.send({
        category: "WARN",
        msg: "Module_A_GET_SHOPS_DATA_EMPTY"
      });
    }
  },
  error: function() {
    // Tracking point 3
    Reporter.send({
      category: "ERROR",
      msg: "Module_A_GET_SHOPS_DATA_ERROR"
    });
  }
});

We have placed three precise tracking points above, each with a clear description. They will give us very useful information when troubleshooting production problems later.
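The two checks in the success callback can also be pulled out into a pure function, which makes them easy to test without a network request. A sketch, assuming a hypothetical helper checkShopsResponse; treating a missing response object as a failure is an added assumption:

```javascript
// Hypothetical helper: returns the record to report for a response,
// or null when the response looks fine.
function checkShopsResponse(ret) {
  if (!ret || ret.status === "failed") {
    return { category: "WARN", msg: "Module_A_GET_SHOPS_DATA_FAILED" };
  }
  if (!ret.data || !ret.data.length) {
    return { category: "WARN", msg: "Module_A_GET_SHOPS_DATA_EMPTY" };
  }
  return null;
}

// Possible use inside the success callback:
// var problem = checkShopsResponse(ret);
// if (problem) { return Reporter.send(problem); }
```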

☞ On the use of try..catch

My advice on try..catch: if you can avoid it, avoid it. You wrote the JS code yourself, so you should have a good idea of where problems can occur and what they will be. try..catch is usually needed in only two places:

// JSON format is incorrect
try{
  JSON.parse(JSONString);
}catch(e){}

// there are undecodeable characters
try{
  decodeURIComponent(string);
}catch(e){}

Errors like these are hard to control. Wherever you use try..catch, consider whether some other approach could handle the case instead. Thanks to EtherDream for the addition.

☞ On the use of window.onerror
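One way to keep such a try..catch in a single place is a small wrapper that returns a fallback value and reports the failure instead of swallowing it. A sketch; the helper name safeJSONParse, the report callback and the message format are all assumptions for illustration:

```javascript
// Hypothetical helper: parses JSON, and on failure reports the problem
// and returns the given fallback instead of throwing.
function safeJSONParse(text, fallback, report) {
  try {
    return JSON.parse(text);
  } catch (e) {
    if (typeof report === "function") {
      report({ category: "WARN", msg: "JSON_PARSE_FAILED: " + e.message });
    }
    return fallback;
  }
}

var reported = [];
var parsed = safeJSONParse("{bad json", {}, function (info) {
  reported.push(info);
});
```

In the article's setup the report callback would typically be a function that forwards the record to Reporter.send.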

You can try the following code:

// test.js
throw new Error("SHOW ME");
window.onerror = function(){
  console.log(arguments);
  // prevent error messages from being printed to the console
  return true;
};

The code above throws immediately and stops executing, so the handler is never registered. A page may contain several script tags, but the window.onerror error listener must be placed at the very top!

☞ Error alerts and reminders

When should an alert be raised? Not for every error. Because of network and browser environment factors, we tolerate an error rate of one in a thousand for complex pages. Take a chart of the processed log data as an example.

The chart has two lines: the orange line is today's data and the light blue line is the historical average. A record is generated every 10 minutes; the horizontal axis is the time of day from 0:00 to 24:00, and the vertical axis is the number of errors. It shows clearly that at around one or two in the morning the service was abnormal, with more than ten times the average number of errors, so an alarm should be raised at that point.

The alert conditions should be set fairly strictly, because false alarms are very annoying: text messages, emails and app notifications bombard you, sometimes in the middle of the night. In general, raise an alert when the following conditions are met:

The error count exceeds a threshold: for example, at most 100 errors are allowed in 10 minutes, and the count goes above 100.
The error count exceeds 10 times the average. Alerting as soon as the average is exceeded would obviously be wrong, but once the count exceeds 10 times the average, it is almost certain that the service has a problem.
Before the comparison, repeated errors from the same IP must be filtered out, for example an error thrown inside a for or while loop, or a user waiting for a flash sale who keeps refreshing the page.
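The conditions above can be sketched as two small functions. The names countDistinctIps and shouldAlert and the record shape are hypothetical, and requiring both conditions at once is an assumption made to keep the rule strict; replace && with || if either condition alone should trigger an alert.

```javascript
// Counts errors after collapsing repeats of the same message from the same IP.
function countDistinctIps(records) {
  var seen = Object.create(null);
  var count = 0;
  for (var i = 0; i < records.length; i++) {
    var key = records[i].ip + "|" + records[i].msg;
    if (!seen[key]) {
      seen[key] = true;
      count += 1;
    }
  }
  return count;
}

// count: filtered errors in the current 10-minute window
// average: historical average for the same window
function shouldAlert(count, average, options) {
  return count > options.threshold && count > average * options.factor;
}

var alertNow = shouldAlert(1000, 50, { threshold: 100, factor: 10 });
```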

☞ Friendly error message

Compare the following two logs. The error log from catch:

Uncaught ReferenceError: vd is not defined

Custom error log:

"While fetching the back-end interface data in the birthday module, eval failed to parse the response; the error was: vd is not defined."
This error has occurred 1000 times in the last 10 minutes; its historical average is 50 times per 10 minutes.