Isotonic Regression

Source: https://blog.csdn.net/bea_tree/article/details/51009810

1. About isotonic regression

First, the blog post by the original sklearn contributor: Isotonic Regression, http://fa.bianp.net/blog/2013/isotonic-regression/

And the explanation on Wikipedia: https://en.wikipedia.org/wiki/Isotonic_regression

Isotonic regression fits a monotone function: after fitting, whenever one x is larger than another, its fitted value is no smaller, so the fitted values are ordered. The precise mathematical formulation can be found at the two URLs above.
Unlike ordinary regression, isotonic regression does not require specifying a parametric form for the objective function.
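For reference, the weighted least-squares formulation given on the Wikipedia page can be written as:

$$ \min_{\hat{y}} \; \sum_{i=1}^{n} w_i \,(y_i - \hat{y}_i)^2 \qquad \text{subject to } \hat{y}_i \le \hat{y}_j \text{ whenever } x_i \le x_j $$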

One application of isotonic regression is statistical inference, for example the relationship between dose and toxicity: toxicity is generally assumed to be monotonically non-increasing or non-decreasing in dose, and from the fitted relationship the maximum safe dose can be estimated.

 

2. Functions used in the example

2.1 matplotlib.collections.LineCollection
http://matplotlib.org/api/collections_api.html#matplotlib.collections.LineCollection
Its main purpose is to draw a collection of line segments in one go.
The two parameters of interest here are segments and zorder.
segments can be a sequence or a numpy.array; in this example:
segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
which represents the n segments to be drawn, one connecting each observed point to its fitted value.
 
zorder controls the drawing order of a plot's artists; change it to different values to see the effect. For the meaning of "artist", see the following URL:
http://old.sebug.net/paper/books/scipydoc/matplotlib_intro.html#axes
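A minimal sketch of how these two parameters are used (the segment coordinates here are made up for illustration):

import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# three short vertical segments, one per x position
segs = [[[i, 0], [i, i + 1]] for i in range(3)]
lc = LineCollection(segs, zorder=0)  # zorder=0: drawn beneath later artists

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.plot(range(3), range(1, 4), 'r.', markersize=12, zorder=1)  # points on top
ax.autoscale()
plt.show()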
 

2.2 sklearn.utils.check_random_state

It returns a np.random.RandomState instance:
If the argument is None, it returns the global RandomState that np.random uses.
If the argument is an integer, it returns a new RandomState seeded with that integer.
If the argument is already a RandomState instance, it is returned unchanged; anything else raises a ValueError.
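A short sketch of the three accepted argument types:

import numpy as np
from sklearn.utils import check_random_state

rs = check_random_state(None)  # the global RandomState used by np.random
rs = check_random_state(42)    # a new RandomState seeded with 42
rs = check_random_state(rs)    # an existing RandomState passes through unchanged
print(rs.randint(-50, 50, size=3))  # reproducible draws from the seeded state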
 

2.3 sklearn.isotonic.IsotonicRegression

http://scikit-learn.org/stable/modules/generated/sklearn.isotonic.IsotonicRegression.html#sklearn.isotonic.IsotonicRegression
 
The example mainly uses fit_transform: it first fits the data, then transforms it.
Here y_ = ir.fit_transform(x, y) is equivalent to y2 = ir.fit(x, y) followed by y3 = y2.predict(x), since fit returns the fitted estimator.
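A quick check of this equivalence on toy data:

import numpy as np
from sklearn.isotonic import IsotonicRegression

x = np.arange(5)
y = np.array([1., 3., 2., 4., 6.])

ir = IsotonicRegression()
y_a = ir.fit_transform(x, y)                     # fit, then transform x
y_b = IsotonicRegression().fit(x, y).predict(x)  # fit returns the estimator
print(np.allclose(y_a, y_b))                     # True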
 

2.4 numpy.newaxis

http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#numpy.newaxis
It inserts a new dimension during a slicing operation. For example, with n = 100, x = np.arange(n) originally has shape (100,); x[:, np.newaxis] has shape (100, 1). A handy trick, well worth writing down.
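A quick check of the shapes:

import numpy as np

x = np.arange(100)
print(x.shape)                 # (100,)
print(x[:, np.newaxis].shape)  # (100, 1)
print(x[np.newaxis, :].shape)  # (1, 100)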
 

2.5 matplotlib.pyplot.gca

gca = get the current Axes
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.gca

3. Code

# Author: Nelle Varoquaux <[email protected]>
#         Alexandre Gramfort <[email protected]>
# copy by IKoala

import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

from sklearn.linear_model import LinearRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state

n = 100
x = np.arange(n)
rs = check_random_state(333)

y = rs.randint(-50, 50, size=(n,)) + 50. * np.log(1 + np.arange(n))

###############################################################################
# Fit IsotonicRegression and LinearRegression models (IR and LR respectively)

ir = IsotonicRegression()
y_ = ir.fit_transform(x, y)

lr = LinearRegression()
lr.fit(x[:, np.newaxis], y)  # x needs to be 2d for LinearRegression

###############################################################################
# Plot result

segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
lc = LineCollection(segments, zorder=0)
lc.set_array(np.ones(len(y)))  # change this to np.arange(n) to see the effect
lc.set_linewidths(0.5 * np.ones(n))

fig = plt.figure()
plt.plot(x, y, 'r.', markersize=12)
plt.plot(x, y_, 'g.-', markersize=12)
plt.plot(x, lr.predict(x[:, np.newaxis]), 'b-')
plt.gca().add_collection(lc)
plt.legend(('Data', 'Isotonic Fit', 'Linear Fit'), loc='lower right')
plt.title('Isotonic regression')
plt.show()

 

Isotonic regression: calibrating model prediction results

Source: https://blog.csdn.net/wangxiao7474/article/details/81069815


Brief Introduction:

Isotonic Regression: the method used by Zadrozny and Elkan (2002; 2001) to calibrate predictions from boosted naive Bayes, SVM, and decision tree models. [1]

Zadrozny and Elkan (2002; 2001) successfully used a more general method based on Isotonic Regression (Robertson et al., 1988) to calibrate predictions from SVMs, Naive Bayes, boosted Naive Bayes, and decision trees. This method is more general in that the only restriction is that the mapping function be isotonic (monotonically increasing). [1]

Isotonic regression is a non-parametric approach.

Denote the model's prediction by $f_i$ and the true target by $y_i$; the basic assumption of Isotonic Regression is then

$$ y_i = m(f_i) + \epsilon_i $$

where $m$ is an isotonic (monotonically increasing) function.

Given a training set $(f_i, y_i)$, $m$ can be found by solving

$$ \hat{m} = \arg\min_{z} \sum_i \big(y_i - z(f_i)\big)^2 $$

over isotonic functions $z$.
The standard algorithm for Isotonic Regression is the pair-adjacent violators algorithm (PAV for short), with time complexity O(N). Its main idea is to repeatedly merge and average adjacent intervals that violate monotonicity, until the resulting sequence is monotone. PAV is the algorithm used by scikit-learn's isotonic module. An animation of the algorithm can be found in [2].

One algorithm that finds a stepwise constant solution for the Isotonic Regression problem is the pair-adjacent violators (PAV) algorithm (Ayer et al., 1955) presented in Table 1.
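For intuition, here is a minimal PAV sketch in plain Python/NumPy; this is a didactic version, not scikit-learn's optimized implementation. The two printed results match the worked examples given below.

import numpy as np

def pav(y):
    """Pool Adjacent Violators: project y onto the set of
    non-decreasing sequences under squared error."""
    blocks = []  # each block: [mean, weight]
    for v in y:
        blocks.append([float(v), 1])
        # merge while the newest block violates monotonicity
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2 = blocks.pop()
            m1, w1 = blocks.pop()
            blocks.append([(m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2])
    # expand the blocks back into a full sequence
    return np.concatenate([[m] * w for m, w in blocks])

print(pav([9, 14, 10]))      # -> [ 9. 12. 12.]
print(pav([14, 9, 10, 15]))  # -> [11. 11. 11. 15.]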

A popular explanation of isotonic regression:

1. Problem statement: given a sequence of numbers, without changing the position of any element but allowing its value to be modified, produce a non-decreasing sequence; how can the error (sum of squared differences) be minimized?
2. The isotonic regression method: scan the sequence from the first element onward; as soon as an out-of-order element is observed, stop that round of scanning and absorb elements one by one, starting from the out-of-order element, into a sub-sequence, until the average of all elements in the sub-sequence is less than or equal to the next element to be absorbed.
3. For example:
4. Original sequence: <9, 10, 14>
5. Result sequence: <9, 10, 14>
6. Analysis: scanning from 9 to the last element 14, no disorder is observed, so no processing is needed.
7. Original sequence: <9, 14, 10>
8. Result sequence: <9, 12, 12>

Application workflow:

For CTR (click-through rate) prediction, feature selection may pick many fine-grained features, so the click-through rate computed directly as clicks/impressions can be very inaccurate.

Reference [4] proposes an approximate method for solving t() based on the formula below. The methods by Wang et al. [5] and Meyer [6] find a non-decreasing mapping function t() that minimizes:

$$ \sum_{i=1}^{n} \big(c_i - t(p_i)\big)^2 + M \int_a^b \big(t''(x)\big)^2 \, dx $$

where $c_i$ is the true label and $p_i$ is the predicted probability output by the model. M is a parameter controlling the degree of smoothing, and a and b are the bounds of the input prediction range; the two terms balance goodness-of-fit (the first term) against the smoothness of the transformation t() (the second term).

In addition, to preserve the model's discriminative power, the mapping must be guaranteed to be monotonically increasing.

The implementation proceeds as follows (Algorithm 1: Smooth Isotonic Regression):

1. Use isotonic regression to obtain a monotone, non-parametric function f() that minimizes the squared fitting error.
2. From the data mapped through the isotonic regression function, select s representative points; record their predicted values and corresponding labels as two sets.
3. Interpolate the points sampled in step 2 with a Piecewise Cubic Hermite Interpolating Polynomial (PCHIP) to obtain a smoothed monotone curve, and use that curve as the final calibration mapping. (A sketch follows below.)

In theory, this method is smoother than isotonic regression and more flexible than sigmoid regression.
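A rough sketch of this smooth isotonic idea, assuming scikit-learn and SciPy are available (SciPy's PchipInterpolator implements PCHIP); the choice of the s sample points is simplified to an even grid here:

import numpy as np
from scipy.interpolate import PchipInterpolator
from sklearn.isotonic import IsotonicRegression

# toy predicted probabilities p and binary labels c
rng = np.random.RandomState(0)
p = np.sort(rng.rand(200))
c = (rng.rand(200) < p).astype(float)

# Step 1: monotone, non-parametric isotonic fit
iso = IsotonicRegression(out_of_bounds='clip')
f = iso.fit_transform(p, c)

# Step 2: sample s representative points (an even grid for simplicity)
s = 10
idx = np.linspace(0, len(p) - 1, s).astype(int)
xs, keep = np.unique(p[idx], return_index=True)  # PCHIP needs strictly increasing x
ys = f[idx][keep]

# Step 3: PCHIP interpolation yields a smooth, monotone calibration map t()
t = PchipInterpolator(xs, ys)
print(t(0.5))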

Applicable situations:

Isotonic Regression is a more powerful calibration method that can correct any monotonic distortion. Unfortunately, this extra power comes at a price. A learning curve analysis shows that Isotonic Regression is more prone to overfitting, and thus performs worse than Platt Scaling, when data is scarce.[1]

Isotonic regression places no requirements on the model's output features.

It suits situations with many samples; with few samples, isotonic regression overfits easily.

Isotonic Regression is often used to assist other methods in repairing non-smooth calibration results caused by data sparsity. [7]

Microsoft used Isotonic Regression to calibrate the CTR prediction model in [3].
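In scikit-learn, this style of isotonic calibration is exposed through CalibratedClassifierCV; a minimal sketch (the dataset and base model here are illustrative):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# method='isotonic' needs plenty of data; with scarce data prefer method='sigmoid'
clf = CalibratedClassifierCV(GaussianNB(), method='isotonic', cv=3)
clf.fit(X_tr, y_tr)
print(clf.predict_proba(X_te)[:5])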

 

References:

[1] Alexandru Niculescu-Mizil, et al. Predicting Good Probabilities With Supervised Learning. ICML2005.

[2] https://en.wikipedia.org/wiki/Isotonic_regression

[3] Thore Graepel, et al. Web-Scale Bayesian Click-Through Rate Prediction for Sponsored Search Advertising in Microsoft's Bing Search Engine. ICML 2010.

[4] Jiang X, Osl M, Kim J, Ohno-Machado L. Smooth Isotonic Regression: A New Method to Calibrate Predictive Models. AMIA Summits on Translational Science Proceedings. 2011;2011:16-20.

[5] X. Wang and F. Li. Isotonic smoothing spline regression. J Comput Graph Stat, 17(1):21–37, 2008.

[6] M. C. Meyer. Inference using shape-restricted regression splines. Annals of Applied Statistics, 2(3):1013–1033, 2008.

[7] 预测模型结果校准 (Prediction Model Result Calibration): https://sensirly.github.io/prediction-model-calibration/

 

 

Isotonic regression: an algorithm that can maximize resource utilization

1. Mathematical definition

Isotonic regression is a kind of regression algorithm. The basic idea: given a finite set of real numbers $Y = \{y_1, y_2, \ldots, y_n\}$ with positive weights $w_i$, train a model that minimizes

$$ f(x) = \sum_{i=1}^{n} w_i (y_i - x_i)^2 $$

subject to the ordering constraint

$$ x_1 \le x_2 \le \cdots \le x_n $$

2. Algorithm walkthrough

Scan from the first element of the sequence onward; as soon as an out-of-order element appears, stop that round of scanning and begin absorbing elements one by one, starting from the out-of-order element, into a sub-sequence, until the average of all elements in the sub-sequence is less than or equal to the next element to be absorbed.

Example:

Original sequence: <9, 10, 14>

Result sequence: <9, 10, 14>

Analysis: scanning from 9 onward, no disorder is observed all the way to the last element 14, so no processing is needed.

Original sequence: <9, 14, 10>

Result sequence: <9, 12, 12>

Analysis: scanning from 9, disorder occurs at 14 (14 > 10); stop that round of scanning and switch to absorbing. After absorbing 10, the sub-sequence is <14, 10>; its average is 12, so <14, 10> is replaced by <12, 12>. Having absorbed 10, we have reached the last element, and processing is complete.

Original sequence: <14, 9, 10, 15>

Result sequence: <11, 11, 11, 15>

Analysis: scanning from 14, disorder occurs at 9 (14 > 9); stop that round of scanning and switch to absorbing. After absorbing 9, the sub-sequence is <14, 9> with average 12.5. Since 12.5 is greater than the next element to be absorbed, 10, absorb 10 as well, giving <14, 9, 10> with average 11. Since 11 is less than the next element, 15, stop absorbing and replace <14, 9, 10> with <11, 11, 11>.
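These hand-worked results can be reproduced directly with scikit-learn's IsotonicRegression:

import numpy as np
from sklearn.isotonic import IsotonicRegression

for seq in ([9, 10, 14], [9, 14, 10], [14, 9, 10, 15]):
    y = np.array(seq, dtype=float)
    x = np.arange(len(y))
    print(seq, '->', IsotonicRegression().fit_transform(x, y))

# [9, 10, 14]     -> [ 9. 10. 14.]
# [9, 14, 10]     -> [ 9. 12. 12.]
# [14, 9, 10, 15] -> [11. 11. 11. 15.]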

3. An example to illustrate the principle of the experiment below

Take the usage amount of some drug as an example:

Suppose the drug dose is the array X = 0, 1, 2, 3, 4, ..., 99 and the patients' responses are Y = y1, y2, y3, ..., y99. Because of individual differences, Y is not a monotone function (i.e., it fluctuates). If we sorted by drug response, the corresponding X would fall out of order and lose its research meaning. The purpose of the study is to observe the average patient response as the dose increases. In this situation, isotonic regression keeps the ordering of X unchanged while recovering the average behavior of Y, as shown in the figure below.

The figure shows that the longest green segment spans x values of roughly 30 to 60; within this interval the average of Y is the same, so considering cost, drug resistance, and similar factors, a dose of 30 units is ideal.

Virtualization is currently popular in the IT industry; by choosing suitable decision metrics in the same way, this algorithm can be used to make the most reasonable use of resources.

4. Experiment code

 
     
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
from sklearn.isotonic import IsotonicRegression
from sklearn.utils import check_random_state

n = 100
# produce the list 0..99
x = np.arange(n)
# instantiate np.random.RandomState so the same random values are drawn each run
rs = check_random_state(0)
# randint(-50, 50): integers between -50 and 50
# np.log: logarithm with base e
y = rs.randint(-50, 50, size=(n,)) + 50. * np.log(1 + np.arange(n))

# set up the isotonic regression estimator
ir = IsotonicRegression()
# fit the data
y_ = ir.fit_transform(x, y)

# plotting
segments = [[[i, y[i]], [i, y_[i]]] for i in range(n)]
# together with plt.gca().add_collection(lc) below, these two steps draw the
# lines connecting each data point to the averaged (fitted) line
lc = LineCollection(segments)

fig = plt.figure()
plt.plot(x, y, 'r.', markersize=12)
plt.plot(x, y_, 'g.-', markersize=12)
plt.gca().add_collection(lc)
plt.legend(('Data', 'Isotonic Fit'), loc='lower right')
plt.title('Isotonic regression')
plt.show()
 

Origin: www.cnblogs.com/think90/p/11764012.html