Some understanding of "Improved Use of Continuous Attributes in C4.5"


Category: Other | 2018-11-02 12:02:49

Copyright notice: this is the blogger's original article and may be freely reposted. https://blog.csdn.net/appleyuchi/article/details/83154696

Here are the formulas given in
“Improved Use of Continuous Attributes in C4.5”
(J. R. Quinlan, Journal of Artificial Intelligence Research 4 (1996), 77-90):

$Info(D)=-\sum_{j=1}^{C}p(D,j)\cdot\log_2(p(D,j))$

$Gain(D,T)=Info(D)-\sum_{i=1}^{k}\frac{|D_i|}{|D|}\cdot Info(D_i)$

$Split(D,T)=-\sum_{i=1}^{k}\frac{|D_i|}{|D|}\cdot\log_2\left(\frac{|D_i|}{|D|}\right)$
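
To make the three formulas concrete, here is a minimal Python sketch. The function names `info`, `gain`, and `split_info`, and the representation of $D$ as a plain list of class labels with the subsets $D_i$ as a list of lists, are my own assumptions for illustration and do not come from the paper.

```python
import math
from collections import Counter


def info(labels):
    """Info(D): entropy, in bits, of the class distribution in `labels`."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain(labels, subsets):
    """Gain(D,T): Info(D) minus the size-weighted Info of each subset D_i."""
    n = len(labels)
    return info(labels) - sum(len(s) / n * info(s) for s in subsets if s)


def split_info(labels, subsets):
    """Split(D,T): entropy of the subset sizes themselves (classes ignored)."""
    n = len(labels)
    return -sum(len(s) / n * math.log2(len(s) / n) for s in subsets if s)


# Toy data (hypothetical): a test T that separates the two classes perfectly.
D = ["yes", "yes", "no", "no"]
parts = [["yes", "yes"], ["no", "no"]]

print(info(D), gain(D, parts), split_info(D, parts))
```

In this toy case all three quantities equal 1 bit: the class distribution is 50/50, the test removes all uncertainty, and the test itself splits the data into two equal halves.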

The following is my understanding:
------------------first change-----------------------------
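The standard gain ratio is
$Gain\_Ratio=\frac{Gain(D,T)}{Split(D,T)}$

My understanding of the "first change" is that, for a continuous attribute, the gain is reduced by $\frac{\log_2(N-1)}{|D|}$ before the ratio is taken:
$Gain\_Ratio\_adjusted=\frac{Gain(D,T)-\frac{\log_2(N-1)}{|D|}}{Split(D,T)}$
where $N$ is the number of distinct values of the continuous attribute in $D$ (so there are $N-1$ candidate thresholds).
Is this right?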
Many Thanks~
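
To check my reading of the adjusted formula, here is a small worked example. All the numbers are made up for illustration and are not from the paper: assume $|D|=8$, a continuous attribute with $N=5$ distinct values, and a best threshold that gives $Gain(D,T)=0.5$ and $Split(D,T)=1$.

$\frac{\log_2(N-1)}{|D|}=\frac{\log_2 4}{8}=\frac{2}{8}=0.25$

$Gain\_Ratio\_adjusted=\frac{0.5-0.25}{1}=0.25$

Without the adjustment the ratio would be $\frac{0.5}{1}=0.5$, so under these assumed numbers the penalty halves the score of the continuous attribute.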
--------------------second change---------------------------
The relevant part of the paper on the "second change" is:
"This seems to be an unnecessary complication, so the threshold t is chosen instead to maximize gain. Once the threshold is chosen, however, the final selection of the attribute to be used for the test is still made on the basis of the gain ratio criterion using the adjusted gain."
My understanding is:

1st step:
choose the threshold t that maximizes $Gain(D,T)$,
not the one that maximizes $Gain\_Ratio$,
and not the one that maximizes $Gain(D,T)-\frac{\log_2(N-1)}{|D|}$.
2nd step:
choose the best feature according to:
$Gain\_Ratio(discrete\ feature)=\frac{Gain(D,T)}{Split(D,T)}$
$Gain\_Ratio\_adjusted(continuous\ feature)=\frac{Gain(D,T)-\frac{\log_2(N-1)}{|D|}}{Split(D,T)}$
Finally, choose the feature whose Gain Ratio (discrete) or adjusted Gain Ratio (continuous) is the largest.

Is this understanding right?
Many thanks~
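
The two steps as I understand them can be sketched in Python as below. This is only my reading of the paper, not C4.5's actual source code. The function names, the toy data, and the choice to use each distinct value except the largest as a candidate threshold (with the test `x <= t`) are assumptions made for illustration.

```python
import math
from collections import Counter


def info(labels):
    """Info(D): entropy, in bits, of the class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gain_and_split(labels, subsets):
    """Return (Gain(D,T), Split(D,T)) for a partition of `labels`."""
    n = len(labels)
    g = info(labels) - sum(len(s) / n * info(s) for s in subsets if s)
    sp = -sum(len(s) / n * math.log2(len(s) / n) for s in subsets if s)
    return g, sp


def continuous_score(values, labels):
    """Step 1: pick t by plain Gain. Step 2: score with the adjusted gain ratio."""
    distinct = sorted(set(values))
    if len(distinct) < 2:
        return None, 0.0  # no threshold possible
    best = None
    for t in distinct[:-1]:  # N-1 candidate thresholds (assumed placement)
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        g, sp = gain_and_split(labels, [left, right])
        if best is None or g > best[1]:  # maximise Gain, not Gain Ratio
            best = (t, g, sp)
    t, g, sp = best
    penalty = math.log2(len(distinct) - 1) / len(labels)  # log2(N-1)/|D|
    return t, (g - penalty) / sp


def discrete_score(values, labels):
    """Unadjusted gain ratio for a discrete attribute (one subset per value)."""
    groups = {}
    for x, y in zip(values, labels):
        groups.setdefault(x, []).append(y)
    g, sp = gain_and_split(labels, list(groups.values()))
    return g / sp if sp > 0 else 0.0


# Hypothetical toy data: both attributes separate the classes perfectly.
y = ["a", "a", "a", "b", "b", "b"]
size = [1, 2, 3, 4, 5, 6]                # continuous attribute
colour = ["r", "r", "r", "g", "g", "g"]  # discrete attribute

t, cont = continuous_score(size, y)
disc = discrete_score(colour, y)
print("threshold:", t, "adjusted ratio:", cont, "discrete ratio:", disc)
print("chosen:", "colour" if disc > cont else "size")
```

On this toy data both attributes have the same Gain and Split, so the only difference between their scores is the $\frac{\log_2(N-1)}{|D|}$ penalty applied to the continuous one. Under my reading, that is why the discrete attribute gets chosen here.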
