Implementing the probability density matching method in Python


1 The role of the probability density matching method

Any measurement can contain both systematic and random errors:

  • Random errors are normally distributed and are largely eliminated by climatological averaging.
  • Systematic errors differ from random errors mainly in that their mean is not zero, so averaging does not remove them.

Systematic errors are divided into independent errors and dependent errors:

  • An independent error does not change with location, time, or the magnitude of the observed value. It can be removed by subtracting the same constant from the entire meteorological field, so the correction is relatively simple.
  • A dependent error changes with the observed quantity, which makes it harder to correct.
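The difference between the two kinds of systematic error can be illustrated with a minimal sketch. The data are synthetic, and the offset of 2.0 and the factor of 1.2 are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
truth = rng.uniform(0.0, 50.0, size=1000)   # hypothetical "true" values

# Independent systematic error: the same offset everywhere (assumed 2.0).
obs_const = truth + 2.0
corrected_const = obs_const - np.mean(obs_const - truth)

# Dependent systematic error: the bias grows with the value (assumed factor 1.2).
obs_dep = 1.2 * truth
corrected_dep = obs_dep - np.mean(obs_dep - truth)

# Largest remaining error after subtracting a single constant from each field.
print(np.abs(corrected_const - truth).max())  # essentially zero
print(np.abs(corrected_dep - truth).max())    # still large
```

Subtracting one constant removes the independent error completely but leaves most of the dependent error in place, which is why a value-dependent method is needed.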

The probability density function matching method (PDF method) is well suited to correcting dependent systematic errors, so this article tries it out.

2 The idea behind the PDF correction method

  • For each grid point to be corrected, select an appropriate spatio-temporal window based on the spatio-temporal resolution and error characteristics of the data, and collect matched ground observations and grid estimates. When the sample is large enough, stable cumulative probability distributions of both are obtained.
  • When the same cumulative probability corresponds to different ground-observed and estimated precipitation amounts, the estimate is corrected by the difference between the two. For example, if at a cumulative probability of 30% the satellite-retrieved precipitation is 4.5 mm and the ground-observed precipitation is 5 mm, the error of the satellite precipitation is -0.5 mm.
  • Ten-fold cross-validation can be used to check the effect of the correction.
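
The matching step described above can be sketched with empirical cumulative distributions. This is only an illustration under stated assumptions: `quantile_map` is a hypothetical helper, and the samples are synthetic (gamma-distributed ground values with a satellite estimate that is exactly 10% too low), chosen so that the 4.5 mm / 5 mm example is reproduced.

```python
import numpy as np

def quantile_map(sat_value, sat_sample, ground_sample):
    """Return the ground value at the same cumulative probability as sat_value."""
    sat_sorted = np.sort(sat_sample)
    ground_sorted = np.sort(ground_sample)
    # Empirical cumulative probabilities of the sorted samples, in (0, 1].
    p_sat = np.arange(1, sat_sorted.size + 1) / sat_sorted.size
    p_ground = np.arange(1, ground_sorted.size + 1) / ground_sorted.size
    # Cumulative probability of the satellite value, then the ground quantile at it.
    prob = np.interp(sat_value, sat_sorted, p_sat)
    return np.interp(prob, p_ground, ground_sorted)

rng = np.random.default_rng(0)
ground = rng.gamma(shape=2.0, scale=3.0, size=5000)  # synthetic ground precipitation (mm)
satellite = 0.9 * ground                             # synthetic satellite estimate, 10% too low

matched = quantile_map(4.5, satellite, ground)  # ground value at the same probability
error = 4.5 - matched                           # satellite minus ground
print(matched, error)
```

With this synthetic bias, a satellite value of 4.5 mm maps to a ground value of about 5 mm, so the error is about -0.5 mm, as in the example above.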

3 PDF correction procedure

① Select a spatio-temporal window for each grid point and collect matched ground observations and satellite precipitation grid data. For example, use the 30 days counting back from the current date as the time window and a 10° × 10° area centered on the target grid point as the spatial window.

② Include in the statistics only those grid cells within the above spatio-temporal window that contain an observation station and for which neither the ground observation nor the satellite precipitation is missing. At least 300 samples should be selected; if this requirement is not met, expand the spatial search range as needed.
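
Steps ① and ② might look roughly like the sketch below. It assumes the time window has already been applied, so each array entry is one matched station/satellite pair with its coordinates. `collect_samples`, the expansion `step`, and the `max_half_width` cap are all hypothetical choices, not part of the original method description.

```python
import numpy as np

def collect_samples(station_lon, station_lat, ground, satellite,
                    center_lon, center_lat, half_width=5.0,
                    min_samples=300, step=1.0, max_half_width=20.0):
    """Return matched (ground, satellite) pairs, widening the window if needed."""
    # Keep only pairs where both values are present.
    valid = ~np.isnan(ground) & ~np.isnan(satellite)
    while True:
        # Square window of 2 * half_width degrees centered on the target point.
        inside = ((np.abs(station_lon - center_lon) <= half_width)
                  & (np.abs(station_lat - center_lat) <= half_width))
        mask = valid & inside
        if mask.sum() >= min_samples or half_width >= max_half_width:
            return ground[mask], satellite[mask], half_width
        half_width += step  # not enough samples: expand the spatial window

# Synthetic example data (assumed, for illustration only).
rng = np.random.default_rng(0)
n = 2000
lon = rng.uniform(90.0, 130.0, n)
lat = rng.uniform(20.0, 50.0, n)
ground = rng.gamma(2.0, 3.0, n)
ground[rng.random(n) < 0.1] = np.nan        # about 10% missing observations
satellite = 0.9 * rng.gamma(2.0, 3.0, n)

g, s, used_half_width = collect_samples(lon, lat, ground, satellite, 110.0, 35.0)
print(g.size, used_half_width)
```

The default `half_width=5.0` corresponds to the 10° × 10° window; the function reports the half-width it actually ended up using.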

③ Compute the cumulative probability $f_s$ of the satellite precipitation $R_s$, and find the ground precipitation $R_g$ at that same cumulative probability. Their difference gives the correction $\Delta r = R_s - R_g$, and the corrected satellite precipitation $R_c$ is $R_c = R_s - \Delta r$.

4 Python implementation

1) Build a data set

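import scipy.stats as st
import numpy as np
import matplotlib.pyplot as plt
# Jupyter magic; remove this line when running as a plain script
%matplotlib inline

# Simulated 100 x 100 grid of estimated values
grid = np.random.randint(0, 1000, size=10000)
grid = grid.reshape(100, 100)
# Random column (ix) and row (iy) indices of 300 stations
ix = np.random.randint(0, 100, size=300)
iy = np.random.randint(0, 100, size=300)
# Simulated station observations
station = np.random.randint(100, 800, size=300)
# Grid values at the station locations
grid_station = grid[iy, ix]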


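2) Calculate the cumulative probability

# Cumulative probability, assuming each data set follows a normal distribution
cdf_station = st.norm.cdf(station, loc=np.mean(station), scale=np.std(station))  # cumulative probability of each station value
cdf_grid = st.norm.cdf(grid, loc=np.mean(grid), scale=np.std(grid))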
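
The code above fits a normal distribution to each data set. Real precipitation is usually strongly skewed, so an empirical cumulative distribution may be a better fit in practice. The following is a minimal sketch of that alternative, not the method used in this article; `empirical_cdf` is a hypothetical helper, and the data are regenerated here so the block runs on its own.

```python
import numpy as np

def empirical_cdf(sample, values):
    """Fraction of `sample` that is less than or equal to each entry of `values`."""
    sample_sorted = np.sort(np.ravel(sample))
    # Number of sample entries <= each value, divided by the sample size.
    ranks = np.searchsorted(sample_sorted, values, side="right")
    return ranks / sample_sorted.size

rng = np.random.default_rng(0)
station = rng.integers(100, 800, size=300)      # simulated station observations
grid = rng.integers(0, 1000, size=(100, 100))   # simulated grid estimates

cdf_station = empirical_cdf(station, station)   # shape (300,)
cdf_grid = empirical_cdf(grid, grid)            # same shape as grid
```

The results have the same shapes as the normal-fit versions, so they could be swapped into the matching step without other changes.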


3) Probability density matching

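# Cumulative probabilities of the grid values at the station locations
cdf_grid_station = cdf_grid[iy, ix]
# Deviation at each station location; stays 0 if no match is found
delta = np.zeros(cdf_grid_station.shape)
for i, prob in enumerate(cdf_grid_station):
    for j, prob_true in enumerate(cdf_station):
        # Use the first station whose cumulative probability is within 0.01
        if abs(prob - prob_true) <= 0.01:
            delta[i] = grid_station[i] - station[j]
            break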
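
The double loop takes the first station whose cumulative probability lies within 0.01. A vectorized variant is sketched below; note that it deliberately differs by picking the nearest cumulative probability rather than the first one within tolerance. `match_delta` is a hypothetical helper, and it assumes at least two station values.

```python
import numpy as np

def match_delta(cdf_grid_station, cdf_station, grid_station, station, tol=0.01):
    """Deviation between each grid value and the station value nearest in probability."""
    order = np.argsort(cdf_station)
    cdf_sorted = cdf_station[order]
    # Candidate neighbours on either side of each grid probability.
    pos = np.clip(np.searchsorted(cdf_sorted, cdf_grid_station), 1, cdf_sorted.size - 1)
    left, right = cdf_sorted[pos - 1], cdf_sorted[pos]
    nearest = np.where(cdf_grid_station - left <= right - cdf_grid_station, pos - 1, pos)
    j = order[nearest]  # index of the matched station in the original order
    delta = (grid_station - station[j]).astype(float)
    # No station within the tolerance: leave the deviation at 0, as in the loop version.
    delta[np.abs(cdf_grid_station - cdf_station[j]) > tol] = 0.0
    return delta

# Tiny hand-made example (assumed values, for illustration only).
cdf_station = np.array([0.9, 0.1, 0.5])
station = np.array([90, 10, 50])
cdf_grid_station = np.array([0.105, 0.495, 0.7])
grid_station = np.array([12, 40, 70])
delta = match_delta(cdf_grid_station, cdf_station, grid_station, station)
print(delta)
```

In the small example, the first two grid values are matched to the stations with probabilities 0.1 and 0.5, while the third has no station within the tolerance and keeps a deviation of 0.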

4) Check the effect
[Figure: error before correction (blue) and after correction (orange)]

Judging from the figure, the correction has some effect.
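
Beyond the visual check, the ten-fold cross-validation mentioned in section 2 could be sketched as follows. This is an illustration only: it uses an empirical quantile mapping rather than the normal fit above, the helper names `quantile_correct` and `ten_fold_mae` are hypothetical, and the data are synthetic (a satellite estimate that is 20% too low plus random noise).

```python
import numpy as np

def quantile_correct(sat_values, sat_train, ground_train):
    """Map satellite values to ground values at the same empirical cumulative probability."""
    sat_sorted = np.sort(sat_train)
    ground_sorted = np.sort(ground_train)
    p_sat = np.arange(1, sat_sorted.size + 1) / sat_sorted.size
    p_ground = np.arange(1, ground_sorted.size + 1) / ground_sorted.size
    prob = np.interp(sat_values, sat_sorted, p_sat)
    return np.interp(prob, p_ground, ground_sorted)

def ten_fold_mae(satellite, ground, n_folds=10, seed=0):
    """Mean absolute error before and after correction, averaged over held-out folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(satellite.size), n_folds)
    before, after = [], []
    for test_idx in folds:
        train = np.ones(satellite.size, dtype=bool)
        train[test_idx] = False  # hold this fold out of the fitted distributions
        corrected = quantile_correct(satellite[test_idx], satellite[train], ground[train])
        before.append(np.abs(satellite[test_idx] - ground[test_idx]).mean())
        after.append(np.abs(corrected - ground[test_idx]).mean())
    return float(np.mean(before)), float(np.mean(after))

rng = np.random.default_rng(1)
ground = rng.gamma(2.0, 3.0, 1000)                      # synthetic ground values (mm)
satellite = 0.8 * ground + rng.normal(0.0, 0.5, 1000)   # biased, noisy synthetic estimate
mae_before, mae_after = ten_fold_mae(satellite, ground)
print(mae_before, mae_after)
```

Each fold is corrected using distributions built only from the other nine folds, so the reported error reflects data the correction has not seen.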

5) Grid point deviation correction

Interpolate the station deviations to every grid point, then subtract the interpolated deviations from the grid values.
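
One possible way to do this is with `scipy.interpolate.griddata`. The sketch below uses nearest-neighbour interpolation so that every grid point receives a value; the choice of method and the random stand-in for `delta` are assumptions, and the data are regenerated so the block runs on its own.

```python
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
grid = rng.integers(0, 1000, size=(100, 100)).astype(float)  # simulated grid estimates
ix = rng.integers(0, 100, size=300)        # station column indices
iy = rng.integers(0, 100, size=300)        # station row indices
delta = rng.normal(0.0, 50.0, size=300)    # stand-in for the station deviations

# Coordinates of every grid point: yy holds row indices, xx holds column indices.
yy, xx = np.mgrid[0:100, 0:100]

# Spread the station deviations over the whole grid (nearest neighbour).
delta_grid = griddata((ix, iy), delta, (xx, yy), method="nearest")

# Corrected field: grid value minus interpolated deviation.
grid_corrected = grid - delta_grid
```

Other methods such as `"linear"` give smoother fields but return NaN outside the convex hull of the stations, so they would need a fallback for the grid edges.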


Reference: http://html.rhhz.net/yyqxxb/html/20130504.htm


Origin blog.csdn.net/mengjizhiyou/article/details/127678406