Tencent’s open-source distributed data science component: Fast-Causal-Inference

Tencent has announced the open-source release of Fast-Causal-Inference, a distributed data science component. Developed by Tencent WeChat, it is a statistical analysis and causal inference library built on distributed vectorized execution and driven through SQL. It is already used in several internal WeChat businesses, including WeChat Channels and WeChat Search.

According to the announcement, the project aims to remove the performance bottleneck that existing statistical modeling libraries (R/Python) hit on big data, and to provide causal inference that runs over tens of billions of rows in seconds. It also uses SQL to lower the barrier to using statistical models, making them easy to apply in production environments.

Main advantages of the project

1. Causal inference on massive data in seconds

Built on the vectorized OLAP execution engines ClickHouse/StarRocks, it is fast enough to keep the user experience highly responsive.

2. A minimalist SQL interface

The SQLGateway WebServer lowers the barrier to using statistical models by exposing them through SQL. It offers users a minimalist SQL layer on top and transparently handles engine-specific SQL expansion and optimization underneath.

3. Causal inference capabilities spanning basic operators, higher-order operators, and upper-layer application wrappers

Supports ttest, OLS, Lasso, tree-based models, matching, bootstrap, DML, and more.
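One general reason a distributed, vectorized engine can fit a regression over very large tables quickly is that OLS depends only on a few sums, which each node can accumulate over its own partition and which merge by plain addition, much like SUM() in SQL. The sketch below illustrates that idea for a single regressor in plain Python; `OlsStats` and its methods are hypothetical names chosen for illustration, and this is not Fast-Causal-Inference's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class OlsStats:
    """Hypothetical mergeable sufficient statistics for y = a + b*x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def add(self, x: float, y: float) -> None:
        # Per-row update: the work each node would run over its own partition.
        self.n += 1
        self.sx += x
        self.sy += y
        self.sxx += x * x
        self.sxy += x * y

    def merge(self, other: "OlsStats") -> "OlsStats":
        # Partial states combine by addition, so merge order does not matter.
        return OlsStats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                        self.sxx + other.sxx, self.sxy + other.sxy)

    def fit(self) -> tuple[float, float]:
        # Closed-form OLS: slope = cov(x, y) / var(x); intercept from the means.
        denom = self.n * self.sxx - self.sx ** 2
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


# Two "shards" of rows generated from y = 1 + 2x, processed independently.
shard_a, shard_b = OlsStats(), OlsStats()
for x in [0.0, 1.0, 2.0]:
    shard_a.add(x, 1 + 2 * x)
for x in [3.0, 4.0]:
    shard_b.add(x, 1 + 2 * x)

intercept, slope = shard_a.merge(shard_b).fit()
print(intercept, slope)  # → 1.0 2.0
```

With several regressors the same pattern applies to the matrices X^T X and X^T y: each partition contributes its own partial matrices, the coordinator adds them, and solves the small resulting linear system once.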

The first release already supports the following features:

Basic causal inference tools

  1. t-test based on the delta method, with CUPED support
  2. OLS on billions of rows in sub-second time

Advanced causal inference tools

  1. OLS-based IV, WLS and other GLS variants, DID, synthetic control, CUPED, and mediation analysis are incubating
  2. Uplift modeling: minute-level runs on tens of millions of rows
  3. Data simulation frameworks such as bootstrap and permutation address variance estimation for estimators that have no explicit (closed-form) variance formula
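The bootstrap and permutation ideas in item 3 can be sketched in plain Python: the bootstrap resamples rows with replacement to estimate the standard error of a statistic such as the median, which has no simple closed-form variance, and a permutation test shuffles group labels to get a p-value without distributional formulas. This is a minimal single-machine illustration under our own assumptions; `bootstrap_se` and `permutation_pvalue` are hypothetical names, not the project's API.

```python
import random
import statistics


def bootstrap_se(data, stat, n_resamples=2000, seed=0):
    """Bootstrap standard error of stat(data).

    Resample rows with replacement, recompute the statistic on each
    resample, and use the spread of those replicates as the SE estimate.
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    n = len(data)
    reps = [stat(rng.choices(data, k=n)) for _ in range(n_resamples)]
    return statistics.stdev(reps)


def permutation_pvalue(a, b, n_perm=2000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling under the null hypothesis
        diff = abs(statistics.mean(pooled[:len(a)])
                   - statistics.mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (hits + 1) / (n_perm + 1)


sample = [2.1, 3.4, 1.9, 5.6, 2.8, 4.4, 3.3, 2.2, 6.1, 3.0]
print("bootstrap SE of median:", bootstrap_se(sample, statistics.median))
print("permutation p:", permutation_pvalue([5.1, 6.0, 5.8, 6.3],
                                           [4.2, 4.9, 4.5, 5.0]))
```

Both procedures repeat the same cheap computation many times over independent resamples, which is why they parallelize well; on a real cluster each worker could generate its own replicates with a distinct seed.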


Source: www.oschina.net/news/258226/fast-causal-inference-open-source