Cloudopt open-sources Baize (白泽), a privacy protection engine based on machine learning

Bai Ze is a noble beast in ancient Chinese mythology and a symbol of good fortune. According to legend, Bai Ze can foretell good and bad luck, speak human language, communicate with all creatures, and understand the nature of everything in the world.

Baize is a machine-learning-based privacy protection engine that runs directly in the browser and can effectively block tracking scripts and malicious advertising scripts. It is written in JavaScript and runs in both Node.js and browser environments.

Traditional blockers rely on filtering rules, but these rules are updated by hand by open source organizations, non-profit organizations, or individuals, and take a lot of manpower to maintain. Taking the heuristic engines of anti-virus software as a model, we propose an automatic and effective machine learning method based on ensemble learning. It learns classifiers from multi-dimensional features of scripts and uses them to block tracking scripts and malicious advertising scripts.
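The idea of combining several classifiers over request features can be sketched as follows. This is a minimal illustration of ensemble voting, not Baize's actual model: the feature names, the keyword list, the three toy classifiers, and the request shape (`url`, `pageUrl`, `type`) are all assumptions made up for this example.

```javascript
// Hypothetical feature extractor: these features are illustrative
// assumptions, not the features Baize really uses.
function extractFeatures(request) {
  const url = new URL(request.url);
  const pageHost = new URL(request.pageUrl).hostname;
  return {
    // Naive third-party check: exact hostname comparison only.
    isThirdParty: url.hostname !== pageHost,
    isScript: request.type === "script",
    queryParamCount: [...url.searchParams.keys()].length,
    urlLength: request.url.length,
    hasTrackingKeyword: /track|analytics|pixel|beacon|adserv/i.test(request.url),
  };
}

// Three toy weak classifiers; each votes 1 (block) or 0 (allow).
const weakClassifiers = [
  (f) => (f.isThirdParty && f.hasTrackingKeyword ? 1 : 0),
  (f) => (f.isThirdParty && f.queryParamCount >= 3 ? 1 : 0),
  (f) => (f.isScript && f.urlLength > 100 ? 1 : 0),
];

// Ensemble by majority vote: block when more than half of the
// classifiers vote to block.
function predict(request) {
  const features = extractFeatures(request);
  const votes = weakClassifiers.reduce((sum, clf) => sum + clf(features), 0);
  return votes * 2 > weakClassifiers.length ? "block" : "allow";
}

console.log(predict({
  url: "https://analytics.example.net/track.js?uid=1&sid=2&ref=3",
  pageUrl: "https://news.example.com/",
  type: "script",
})); // prints "block"

console.log(predict({
  url: "https://news.example.com/static/app.js",
  pageUrl: "https://news.example.com/",
  type: "script",
})); // prints "allow"
```

In a trained ensemble the weak classifiers and their weights would be learned from labeled requests rather than written by hand; the hand-written rules here only show how the votes are combined.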

We collected all network requests made by the homepages of the Alexa top 100 websites, 11,764 rows in total, as the training set. To make the evaluation more meaningful, we built a test set of 760 rows from the homepage network requests of some well-known Chinese websites and of other websites not included in the training set.

On the test set, Baize achieved an accuracy of 91.8%. It identifies most malicious requests, with a precision of 65%.

Name    Accuracy   AUC     Recall
Baize   91.8%      78.3%   80.2%
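For readers unfamiliar with these metrics, the sketch below shows how accuracy and recall, plus precision for comparison, are derived from a confusion matrix. The counts are invented for illustration and are not Baize's test results; AUC is left out because it needs the classifier's scores, not just the final labels.

```javascript
// tp: malicious requests blocked, fp: safe requests blocked,
// tn: safe requests allowed,     fn: malicious requests allowed.
function classificationMetrics({ tp, fp, tn, fn }) {
  const total = tp + fp + tn + fn;
  return {
    accuracy: (tp + tn) / total,  // share of all requests classified correctly
    precision: tp / (tp + fp),    // share of blocked requests that were malicious
    recall: tp / (tp + fn),       // share of malicious requests that were blocked
  };
}

// Made-up counts for illustration only (not taken from the Baize test set).
const exampleMetrics = classificationMetrics({ tp: 80, fp: 40, tn: 860, fn: 20 });
console.log(exampleMetrics.accuracy.toFixed(3));  // prints "0.940"
console.log(exampleMetrics.precision.toFixed(3)); // prints "0.667"
console.log(exampleMetrics.recall.toFixed(3));    // prints "0.800"
```

When safe requests far outnumber malicious ones, as in this invented example, accuracy can be high even though precision is noticeably lower, which is why the metrics are reported separately.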

In most cases, it only takes 0.1 ms to predict whether a network request is safe.
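A per-prediction latency like this can be estimated by timing many calls and averaging. The sketch below assumes a global `performance.now()` (available in browsers and in Node.js 16 or later) and uses a stand-in predictor, since the helper name `averageLatencyMs` and the dummy function are hypothetical and not part of Baize.

```javascript
// Average wall-clock time per call, in milliseconds.
function averageLatencyMs(predictFn, inputs, repeats = 1000) {
  // Warm-up pass so JIT compilation does not dominate the measurement.
  for (const input of inputs) predictFn(input);

  const start = performance.now();
  for (let i = 0; i < repeats; i++) {
    for (const input of inputs) predictFn(input);
  }
  const elapsed = performance.now() - start;
  return elapsed / (repeats * inputs.length);
}

// Stand-in predictor (hypothetical); replace with the real prediction call.
const dummyPredict = (url) => (url.includes("track") ? "block" : "allow");

const averageMs = averageLatencyMs(dummyPredict, [
  "https://a.example/track.js",
  "https://b.example/app.js",
]);
console.log(`average: ${averageMs.toFixed(5)} ms per prediction`);
```

The measured value depends heavily on the machine and the JavaScript engine, so figures from different environments are not directly comparable.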

English documentation:

https://github.com/cloudoptlab/baize

Chinese documentation:

https://github.com/cloudoptlab/baize/blob/master/README_ZH.md

Origin www.oschina.net/news/119495/cloudopt-opensource-baize