Design and Realization of Anti-spam Management System Based on ASP

With the rapid spread of the Internet, e-mail has become one of the main media for exchanging information because it is fast, convenient and cheap, but the spam that accompanies it is increasingly rampant. Spam occupies limited storage, computing and network resources, costs users considerable processing time, and interferes with their normal work, life and study. Controlling spam effectively is a difficult problem worldwide and one that the Internet still has to solve. This article first introduces the importance of e-mail in daily life and then briefly reviews the history of anti-spam technology. It studies three spam filtering methods, namely black and white lists, subject keyword filtering and Bayesian filtering, and explains how each is designed, focusing on the design principle and implementation steps of Bayesian filtering. Finally, it summarizes the shortcomings of these techniques and the difficulties encountered in the design.

The explosive growth of the Internet has brought people a wealth of information. E-mail is fast and convenient and has become one of the quickest and most economical means of communication. However, the Internet is largely unregulated. Advertisements are mailed out indiscriminately, filling many people's mailboxes with junk. Some people build mail bombs that paralyze e-mail servers, and others use e-mail to spread viruses. All of this causes endless trouble for users, so effective spam filtering has become an important practical problem. At present, the semantic analysis and text classification techniques used in spam filtering research in China are still relatively backward, so many large domestic e-mail systems fail to detect and reject spam promptly and effectively, which harms users. More seriously, many foreign spammers exploit this weakness to send spam through Chinese mail servers, which leads many foreign ISPs to block the IP addresses of those servers and causes Chinese users considerable distress and economic loss. As the Chinese economy becomes more closely tied to the world economy, exchanges between China and other countries are increasing and the volume of international e-mail has risen sharply. If Chinese e-mail were blocked entirely because of spam, domestic enterprises and organizations would be severely affected and China's economic development would be seriously hindered. Research on new and reliable spam filtering techniques has therefore become an urgent task.

1.2 Development overview

1.2.1 Email Overview

Email refers to writing, sending and receiving letters through electronic communication systems. The most widely used such system today is the Internet, and email is one of its most popular functions. Through the e-mail system, you can communicate with network users anywhere in the world at very low cost and very quickly (a message can reach any destination you specify within a few seconds). You can also receive a great deal of free news and special-interest mail, and search for information easily. No traditional method can match this. It is precisely because e-mail is easy to use, fast, cheap, easy to store and globally accessible that it is so widely used, and it has greatly changed the way people communicate. Every user who applies for an Internet account receives an email address. The address is much like a house number or, more precisely, like a rented mailbox at the post office: a traditional letter is delivered to your door by the postman, whereas with e-mail you check the mailbox yourself, although you do not have to leave the house to do so. Email originated in proprietary mail systems. It existed long before the Internet became popular and was developed as a relatively simple way of sending text messages from one terminal to another in a host-and-terminal (master-slave) system.

After a long period of development, it has evolved into a more complex and much richer system that can transmit multimedia information such as sound, pictures and documents, so that more specialized files such as databases or accounting reports can be distributed online as e-mail attachments.

1.2.2 Overview of Anti-Spam

"Spam" mostly refers to unsolicited e-mail; it can also mean copies of the same message posted to newsgroups or list servers unrelated to the subject of the message. Technical experts and anti-spam organizations at home and abroad define "spam" in the same way: e-mail sent in bulk without the consent of the recipients. Although each message may not carry much information, its content is not what most users need and is often something they resent. The flood of promotional e-mail not only intrudes on users' private space but also interferes with their normal use of e-mail and wastes their online time and money, which is why it is called "spam". The terms SPAM, UCE (Unsolicited Commercial Email) and UBE (Unsolicited Bulk Email) that are common on the Internet all refer to what is generally called spam. Research on anti-spam technology is a long-term and arduous task, and it has gone through the following generations:

Table 1-1 History of Anti-Spam

First generation: basic MTA control, whitelists and blacklists, simple keyword search, header tests, subject (title) filtering, simple DNS tests
Second generation: real-time blacklists, electronic signatures
Third generation: Bayesian filtering, artificial intelligence, machine learning
Fourth generation: multi-technology integration and hierarchical filtering

Current anti-spam techniques fall into four categories: filtering, reverse lookup, challenges and cryptography. All of them have their limitations. Filtering includes keyword filtering, black and white lists, hashing, rule-based filtering, intelligent and probabilistic systems, and Bayesian algorithms. Verification and lookup techniques include reverse lookup, DKIM, Sender ID and FairUCE. Challenge techniques include challenge-response and computational challenges.

2. How Email Works

2.1 Structure of the e-mail

Emails can be thought of as semi-structured text files. RFC822 clearly divides the mail into two parts: the first part is called the mail header, which contains several data fields, which are used to identify important parts of the mail, such as sender, receiver, subject and comments. Email header fields should appear before the email body, separated by a blank line. The second part is the mail body (body), which is the content of the mail sent by the sending user to the receiving user.
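The header/body split described above can be illustrated with a minimal Python sketch (the system itself is written in ASP; this only demonstrates the message structure). The sample message, the addresses and the helper name `split_message` are made up for illustration.

```python
from email import message_from_string

# A made-up message: header fields, one blank line, then the body.
RAW_MESSAGE = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Meeting notes\n"
    "\n"
    "Hi Bob,\n"
    "the notes are attached.\n"
)

def split_message(raw):
    """Return (headers, body) for an RFC 822 style message."""
    msg = message_from_string(raw)
    # Header names are case-insensitive, so store them in lower case.
    headers = {name.lower(): value for name, value in msg.items()}
    return headers, msg.get_payload()

headers, body = split_message(RAW_MESSAGE)
print(headers["subject"])
print(body)
```

The sender, receiver and subject fields extracted this way are the values that the list and keyword filters described later operate on.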

2.2 Transmission of e-mails

Emails are similar to ordinary letters. The sender specifies the recipient's name and address (i.e., the email address); the sender's server transmits the email to the recipient's server, and the recipient's server delivers it to the recipient's mailbox, as shown in the following figure:

The email system consists mainly of three parts: the MUA (Mail User Agent), which helps users read and write emails; the MTA (Mail Transfer Agent), which transfers emails from one server to another; and the MDA (Mail Delivery Agent), which delivers mail to users' mailboxes. The entire mail transmission process is shown in the figure below:

 

3. Requirements analysis

3.1 Database Requirements Analysis

This system uses a Microsoft SQL Server 2000 database named mail. The overall ER diagram of the system, derived from the analysis of system functions, is shown in Figure 3-1:

To reduce data redundancy, each table has a primary key. The following tables were established according to the requirements of the different functional modules:

  1. The data table that stores the mail in the mail folders; the design is shown in Table 3-1:

Table 3-1 mail data table

Field Name     Field Description     Field Type     Primary Key
mailid         Mail ID               int            *
mailfrom       Sender address        varchar
mailto         Receiver's address    varchar
maildate       Send date             datetime
mailsubject    Email subject         varchar
mailbody       Email content         varchar

  2. The data tables that store the black and white list email addresses; the design is shown in Tables 3-2 and 3-3:

Table 3-2 black_mailadd data table

Field Name     Field Description            Field Type     Primary Key
ID             Serial number                int            *
Mailadd        Blacklist email addresses    varchar

Table 3-3 white_mailadd data table

Field Name     Field Description            Field Type     Primary Key
ID             Serial number                int            *
Mailadd        Whitelist email addresses    varchar

  3. The table that stores the subject keywords added by users; the design is shown in Table 3-4:

Table 3-4 key_word data table

Field Name     Field Description     Field Type     Primary Key
ID             Serial number         int            *
word           Filtered keywords     varchar

  4. The data tables used for Bayesian filtering; the design is shown in Tables 3-5 to 3-8:

Table 3-5 drop_word data table

Field Name     Field Description                       Field Type     Primary Key
ID             Serial number                           int            *
word           Common words excluded from analysis     varchar

Table 3-6 bayes_field data table

Field Name     Field Description     Field Type     Primary Key
ID             Serial number         int            *
value          Threshold             int

Table 3-7 hash_all data table

Field Name     Field Description                              Field Type     Primary Key
ID             Serial number                                  int            *
token          Independent string                             varchar
good_time      Number of occurrences in legitimate mail       int
good_pro       Probability of appearing in legitimate mail    float
bad_time       Number of occurrences in spam                  int
bad_pro        Probability of appearing in spam               float

Table 3-8 hash_pro data table

Field Name     Field Description                                     Field Type     Primary Key
ID             Serial number                                         int            *
token          Independent string                                    varchar
token_pro      Probability that mail containing the token is spam    float
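As a rough illustration of how the two Bayesian tables relate, the following Python sketch recreates hash_all and hash_pro in an in-memory SQLite database. SQLite stands in for SQL Server 2000 here, and the column defaults of 0 and the NOT NULL constraints are assumptions that the original design does not state.

```python
import sqlite3

# Column names follow Tables 3-7 and 3-8; defaults and constraints are assumed.
SCHEMA = """
CREATE TABLE hash_all (
    id        INTEGER PRIMARY KEY,
    token     TEXT NOT NULL,
    good_time INTEGER DEFAULT 0,
    good_pro  REAL DEFAULT 0,
    bad_time  INTEGER DEFAULT 0,
    bad_pro   REAL DEFAULT 0
);
CREATE TABLE hash_pro (
    id        INTEGER PRIMARY KEY,
    token     TEXT NOT NULL,
    token_pro REAL
);
"""

def create_schema():
    """Create the two tables in a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn

conn = create_schema()
# Record a token seen once in legitimate mail; the spam counters stay at 0.
conn.execute("INSERT INTO hash_all (token, good_time) VALUES (?, ?)", ("fa", 1))
row = conn.execute("SELECT token, good_time, bad_time FROM hash_all").fetchone()
print(row)
```

hash_all holds the raw counts and per-set probabilities gathered during learning, while hash_pro holds only the derived per-token spam probability that is consulted when a new message arrives.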

3.2 Development environment requirements

The basic software and hardware environment required to install this system is:

  1. Windows 95, Windows 98 or Windows NT/2000/XP.
  2. Microsoft SQL Server 2000 database.
  3. An IBM PC or compatible with a Pentium 100 processor or better.
  4. More than 128 MB of memory.
  5. More than 5000 MB of available hard disk space.
  6. High-density floppy drive.
  7. VGA monitor.
  8. CD-ROM drive.

The system was developed under Windows 2000 Professional, using the Chinese version of Dreamweaver MX 2004 as the front-end development tool and the Chinese version of Microsoft SQL Server 2000 as the back-end database.

4. System functions and technical description

4.1 System function module design

The system is divided into three major modules, and each major module has different functional divisions. The module structure diagram is shown in Figure 4-1:

 

  1. Daily Operation Module

This module has two parts: receiving emails and writing emails. Through it users send and receive mail, both normal mail and spam. The system reads its data from the local database to provide a test environment for anti-spam research.

  2. Mail folder

This module provides two folders, the Inbox and the Junk Mail folder. Mail that passes the filters is displayed in the Inbox, and mail identified as junk is displayed in the Junk Mail folder. Mail in either folder can be viewed and deleted.

  3. Spam filtering

This module is the core of the design. It uses blacklist, whitelist, subject keyword and Bayesian filtering to filter spam, and users can start and stop each of these filtering rules through the filter settings.

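4.2 Basic functions

Client-side spam filtering is performed with blacklist, whitelist, subject keyword and Bayesian filtering. After each rule has examined a message, if it can determine the message's class, that is, spam or non-spam, the message is displayed directly in the Junk Mail folder or the Inbox accordingly. The client-side spam filtering model is shown in the figure below: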
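The rule chain described above can be sketched in a few lines of Python (an illustration only, not the system's ASP code). The rule order, the verdict strings and the names `make_list_rule` and `classify` are assumptions; the original figure that fixes the actual order is not reproduced here.

```python
def make_list_rule(addresses, verdict):
    """Build a rule that returns `verdict` when the sender is in `addresses`."""
    known = {a.lower() for a in addresses}
    return lambda mail: verdict if mail["from"].lower() in known else None

def classify(mail, rules, enabled):
    """Run the enabled rules in order; the first decisive rule wins."""
    for name, rule in rules:
        if not enabled.get(name, False):
            continue  # the user has switched this rule off
        verdict = rule(mail)
        if verdict == "ham":
            return "inbox"
        if verdict == "spam":
            return "junk"
    return "inbox"  # no rule could decide, so the mail is treated as normal

rules = [
    ("whitelist", make_list_rule(["friend@example.com"], "ham")),
    ("blacklist", make_list_rule(["bulk@example.net"], "spam")),
]
mail = {"from": "bulk@example.net", "subject": "hello"}
print(classify(mail, rules, {"whitelist": True, "blacklist": True}))
print(classify(mail, rules, {"whitelist": True, "blacklist": False}))
```

With the blacklist enabled the message goes to the junk folder; with it disabled no rule objects and the message reaches the inbox, which mirrors the per-rule start and stop switches in the filter settings.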

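4.3 Black and white list technology

The blacklist is a simple, effective and very widely used filtering method. It first checks the mail header and rejects the message if the sender is on the blacklist. A blacklist can contain spam-sending servers, open proxies, open relays and sender mailbox addresses. Many organizations now maintain *bl (block list) services, which collect the IP addresses (or even IP address ranges) that frequently send spam into a block list.

Whitelist filtering maintains a whitelist table in the mail filtering system that records the email addresses the user trusts. When the sender of a received message is on the user's whitelist, the message is judged to be normal mail. If only whitelisted senders are accepted, this method can in principle block 100% of spam, but it also filters out many legitimate messages from people writing to the recipient for the first time, because those senders are not yet on the whitelist.

Many mail receivers, both MUAs and MTAs, now use black and white lists to handle spam. The lists are used more widely in MTAs, where they effectively reduce the load on the server. In this article, the blacklist and whitelist hold the email addresses of known spammers and of trusted senders respectively. This is the most traditional approach: the blacklist blocks spam and the whitelist lets permitted mail through.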

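4.4 Keyword filtering technology

This technique decides whether a message is spam according to whether the mail header, subject or body contains configured keywords, and then handles the message accordingly. It is very simple to implement, and most current mail clients provide it. According to surveys, a filter based on keywords can catch about 60% of spam. However, judging a message to be spam simply because it contains a certain keyword has a fatal weakness: the false positive rate is very high. For example, if the word "free" is set as a filter keyword, every message containing that word is filtered out, whether it comes from a friend or from a spammer. In this system, the keywords apply to the mail subject: messages whose subject contains a configured keyword are filtered.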
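The false positive problem with the keyword "free" can be made concrete with a small Python sketch. The three subjects and their labels are invented for illustration, and `subject_has_keyword` is a hypothetical helper that matches whole words split on spaces.

```python
def subject_has_keyword(subject, keywords):
    """Return True if any space-separated word of the subject is a keyword."""
    words = subject.lower().split()
    return any(word in keywords for word in words)

keywords = {"free"}
# Invented sample subjects with their true class.
subjects = {
    "Get a free prize now": "spam",
    "Are you free on Friday": "legitimate",
    "Quarterly report": "legitimate",
}

# Legitimate subjects that the keyword rule would still filter out.
false_positives = [
    subject for subject, label in subjects.items()
    if label == "legitimate" and subject_has_keyword(subject, keywords)
]
print(false_positives)
```

The friend's message about Friday is filtered together with the real spam, which is exactly the weakness that motivates the statistical approach in the next section.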

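4.5 Bayesian filtering technology

4.5.1 Basic steps of the Bayesian filtering algorithm

Step 1: Collect a large number of emails and classify them by rule into spam and non-spam, building a spam set and a non-spam set, which act as two databases.

Step 2: Extract the independent strings in the subject and body of each email, such as 商品 (goods) or 易趣 (EachNet), as TOKEN strings, and count how many times each extracted TOKEN string occurs, i.e. its frequency. Process all emails in the spam set and the non-spam set in this way.

Step 3: Each mail set corresponds to one hash table: hashtable_good for the non-spam set and hashtable_bad for the spam set. Each table stores the mapping from TOKEN string to frequency, as follows:

TOKEN string           Occurrences
商品 (goods)            N1
易趣 (EachNet)          N2
法轮功 (Falun Gong)     N3
色情 (pornography)      N4

Step 4: Compute the probability of each TOKEN string in each hash table: P = (frequency of the TOKEN string) / (length of the corresponding hash table).

Step 5: Taking hashtable_good and hashtable_bad together, infer the probability that a new email is spam when a given TOKEN string appears in it. Let S be the event that the email is spam and let t1, t2, …, tn denote TOKEN strings; then P(S|ti) is the probability that an email is spam given that TOKEN string ti appears in it. Writing Pbad(ti) and Pgood(ti) for the step 4 probabilities of ti in hashtable_bad and hashtable_good, P(S|ti) = Pbad(ti) / (Pbad(ti) + Pgood(ti)).

Step 6: Build a new hash table, hashtable_probability, that stores the mapping from TOKEN string ti to P(S|ti), as follows:

TOKEN string           Probability of spam
商品 (goods)            P(S|t1)
易趣 (EachNet)          P(S|t2)
法轮功 (Falun Gong)     P(S|t3)
色情 (pornography)      P(S|t4)

Once these steps have been repeated until P(S|ti) is known for every string, the learning process on the spam and non-spam sets is complete. The table hashtable_probability can then be used to estimate how likely a newly arrived email is to be spam. When a new email arrives, TOKEN strings are generated from it as in step 2 and each is looked up in hashtable_probability. Suppose the email yields n TOKEN strings t1, t2, …, tn, whose values in hashtable_probability are P1, P2, …, Pn, and let P(S|t1,t2,…,tn) denote the probability that the email is spam given that t1, t2, …, tn all appear in it. The compound probability formula gives:

P(S|t1,t2,…,tn) = (P1 * P2 * … * Pn) / [P1 * P2 * … * Pn + (1 - P1) * (1 - P2) * … * (1 - Pn)]

When P(S|t1,t2,…,tn) exceeds the preset threshold, the email is judged to be spam.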
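The compound probability formula can be written as a short Python function (a sketch for illustration, not the system's ASP code). Clamping each probability to the range 0.01 to 0.99 is an added assumption that keeps the denominator from becoming zero; the 0.95 default threshold is likewise only an example value.

```python
from math import prod

def combine(probabilities, floor=0.01, ceiling=0.99):
    """Combine per-token spam probabilities P1..Pn into one spam probability.

    Values are clamped to [floor, ceiling] (an assumption, not in the
    original design) so that the denominator can never be zero.
    """
    clamped = [min(max(p, floor), ceiling) for p in probabilities]
    spam = prod(clamped)                   # P1 * P2 * ... * Pn
    ham = prod(1.0 - p for p in clamped)   # (1 - P1) * ... * (1 - Pn)
    return spam / (spam + ham)

def is_spam(probabilities, threshold=0.95):
    """Apply the preset threshold to the combined probability."""
    return combine(probabilities) > threshold

print(round(combine([0.9, 0.8, 0.7]), 3))
print(is_spam([0.9, 0.8, 0.7]))
```

Three tokens that each lean moderately towards spam combine into a probability of about 0.988, which is why a fairly high threshold such as 0.95 is still reachable in practice.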

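4.5.2 Example of the Bayesian filtering algorithm

Consider a spam email A containing the string 法轮功 (Falun Gong) and a non-spam email B containing the string 法律 (law). Each Chinese character is treated as one TOKEN string.

hashtable_bad is generated from email A, with these records:

法: 1
轮: 1
功: 1

In this table, 法, 轮 and 功 each have probability 1/3, rounded here to 0.3.

hashtable_good is generated from email B, with these records:

法: 1
律: 1

In this table, 法 and 律 each have probability 0.5.

Taking the two hash tables together, there are four TOKEN strings: 法, 轮, 功 and 律.

When 法 appears in an email, the probability that the email is spam is:

P = 0.3/(0.3+0.5) = 0.375

When 轮 appears:

P = 0.3/(0.3+0) = 1

When 功 appears:

P = 0.3/(0.3+0) = 1

When 律 appears:

P = 0/(0+0.5) = 0

This gives the third hash table, hashtable_probability, with the following data:

法: 0.375
轮: 1
功: 1
律: 0

When a new email containing 功律 arrives, it yields two TOKEN strings, 功 and 律. Looking them up in hashtable_probability gives:

P(spam|功) = 1

P(spam|律) = 0

The probability that this email is spam is then:

P = (1*0)/[1*0 + (1-1)*(1-0)] = 0/0

The numerator is 0, and the result is taken as 0 here. Strictly speaking the denominator is also 0, so the expression is undefined whenever one token has probability exactly 1 and another exactly 0. If the token probabilities are instead kept slightly inside the interval, for example 0.99 and 0.01 (an assumption, not part of this design), the result is 0.5, which is still below any reasonable threshold.

The email is therefore judged to be non-spam.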
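The table-building part of this example can be reproduced with the Python sketch below (an illustration, not the system's ASP code; the function names are hypothetical). It uses the exact value 1/3 rather than the rounded 0.3, so the entry for 法 comes out as 0.4 instead of 0.375, while the other three entries are unchanged.

```python
def token_probabilities(token_counts):
    """Step 4: frequency of each token divided by the length of its table."""
    size = len(token_counts)
    return {token: count / size for token, count in token_counts.items()}

def spam_probability_table(good_counts, bad_counts):
    """Steps 5 and 6: P(S|t) = Pbad(t) / (Pbad(t) + Pgood(t)) for every token."""
    good = token_probabilities(good_counts)
    bad = token_probabilities(bad_counts)
    table = {}
    for token in set(good) | set(bad):
        b = bad.get(token, 0.0)
        g = good.get(token, 0.0)
        table[token] = b / (b + g)  # b + g > 0 because the token is in a table
    return table

hashtable_bad = {"法": 1, "轮": 1, "功": 1}   # from spam email A
hashtable_good = {"法": 1, "律": 1}           # from non-spam email B
hashtable_probability = spam_probability_table(hashtable_good, hashtable_bad)
for token in "法轮功律":
    print(token, round(hashtable_probability[token], 3))
```

Tokens seen in only one of the two sets end up with probability exactly 1 or 0, which is what makes the combined formula degenerate for a message such as 功律 unless the values are clamped first.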

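4.5.3 Modules of the Bayesian filter

Based on the Bayesian filtering process and the functions it requires, the filter can be divided functionally into four main modules: mail preprocessing, the Bayesian algorithm, database access and the main filtering logic. The system structure is shown in the figure below:

 

Mail preprocessing module: reads the mail, handles its encoding and decoding, strips HTML tags, and so on.

Bayesian algorithm module: vectorizes the mail text, counts the occurrences of feature words, trains, adjusts and updates the classifier, and filters new mail.

Database access module: performs the database operations needed when vectorizing text, counting frequencies and computing probabilities.

Main filtering logic module: calls the other modules and implements the main logic of spam filtering.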
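The tag-stripping step of the preprocessing module might look like the following Python sketch, which uses the standard library HTML parser. It is an illustration only; the function name `preprocess`, the lower-casing and the whitespace tokenization are assumptions rather than the system's actual behavior.

```python
from html.parser import HTMLParser

class _TextExtractor(HTMLParser):
    """Collect only the text between tags, discarding the tags themselves."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)

def preprocess(html_body):
    """Strip HTML tags from a mail body and return lower-case tokens."""
    parser = _TextExtractor()
    parser.feed(html_body)
    parser.close()
    text = " ".join(parser.parts)
    return text.lower().split()

tokens = preprocess("<p>Buy <b>cheap</b> watches</p>")
print(tokens)
```

The resulting token list is the input that the Bayesian algorithm module would count and look up in the probability table.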

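5. System workflow and detailed design

5.1 System workflow diagram

The workflow of the system service is shown in Figure 5-1:

5.2 Mail statistics design

On entering the anti-spam management system, the user sees at a glance the number of messages in the Inbox and in the Junk Mail folder, and can click through to browse them. When the filtering strategies are enabled, a received message that matches the blacklist, keyword or Bayesian filtering conditions and does not match the whitelist condition is displayed in the Junk Mail folder, while normal mail is displayed in the Inbox. The mail statistics interface is shown in Figure 5-2:

5.3 Inbox design

Normal mail that has passed the filters is displayed in the Inbox with its sender, date and subject, and each message can be deleted or opened to view its content, as shown in Figure 5-3: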

 

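5.4 Anti-spam function design

5.4.1 Black and white list filtering

This part handles adding, modifying and deleting blacklist and whitelist entries. Mail from an address on the blacklist is filtered and displayed in the Junk Mail folder; mail from an address on the whitelist is displayed directly in the Inbox. The blacklist interface is shown in the figure below:

 

The blacklist filtering code is as follows: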

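rem Blacklist filter: returns "true" if the sender passes, "false" if it is filtered
function black_leach(add)
    dim sqlb, rsb
    ' Double any single quote so the address cannot break out of the SQL string
    sqlb = "select * from black_mailadd where mailadd = '" & replace(add, "'", "''") & "'"
    set rsb = server.CreateObject("adodb.recordset")
    rsb.open sqlb, conn, 1
    if rsb.eof then
        black_leach = "true"   ' not in the blacklist, not filtered
    else
        black_leach = "false"  ' in the blacklist, filtered
    end if
    rsb.close
    set rsb = nothing
end function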

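5.4.2 Subject keyword filtering

The interface for adding subject keywords:

 

Once subject keywords have been added, any received message whose subject contains one of them is displayed in the Junk Mail folder, where the user can review and delete it selectively. After a keyword is added, the following interface appears, from which subject keywords in the database can be added, modified and deleted, as shown in the figure:

 

The core code for subject keyword filtering is as follows: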

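rem Split the subject into words and check each word against the key_word table
function sub_leach(strf)
    dim strtemp, i, sqls, rss
    strf = Trim(strf) & " "      ' the trailing space flushes the last word
    strtemp = ""
    for i = 1 to len(strf)
        if mid(strf, i, 1) <> " " then
            strtemp = strtemp & mid(strf, i, 1)
        else
            ' Double any single quote so the word cannot break out of the SQL string
            sqls = "select * from key_word where word = '" & replace(strtemp, "'", "''") & "'"
            set rss = server.createobject("adodb.recordset")
            rss.open sqls, conn, 1, 1
            if not rss.eof then
                rss.close
                sub_leach = "false"  ' contains a keyword, filtered
                exit function
            end if
            rss.close
            strtemp = ""
        end if
    next
    sub_leach = "true"               ' no keyword found, not filtered
end function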

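5.4.3 Bayesian filtering

This part handles setting the Bayesian filtering threshold, learning from the non-spam and spam sample sets, and generating the hash probability table. When a message arrives and its computed spam probability exceeds the preset threshold, it is displayed in the Junk Mail folder. The interface for setting the threshold and learning the non-spam sample set is as follows:

 

  1. The code that segments the non-spam set into words and counts word frequencies is as follows:

' strtemp holds the current word; common words listed in drop_word are skipped
sqld = "select * from drop_word where word = '" & strtemp & "'"
set rs = server.createobject("adodb.recordset")
rs.open sqld, conn, 1, 1
if rs.eof then       ' not in drop_word
    rs.close
    sqlh = "select token,good_time from hash_all where token = '" & strtemp & "'"
    rs.open sqlh, conn, 1, 3
    if rs.eof then   ' new token: add it to the hash table
        conn.execute "insert into hash_all(token,good_time) values('" & strtemp & "','1')"
    else             ' known token: add 1 to good_time
        rs("good_time") = rs("good_time") + 1
        rs.update
    end if
end if
rs.close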

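  2. The code that computes the probability of each token string in the non-spam set is as follows:

set rs = server.createobject("adodb.recordset")
sql = "select good_time,good_pro from hash_all where good_time<>0"
rs.open sql, conn, 1, 3
r_t = rs.recordcount           ' number of distinct tokens in the non-spam set
do until rs.eof
    i = rs("good_time") / r_t
    i = int(i * 1000) / 1000   ' truncate to three decimal places
    rs("good_pro") = i
    rs.update
    rs.movenext
loop
rs.close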

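  3. The code that builds the hash_pro table is as follows:

function make_pro
    dim rss, sqls, str, i
    set rss = server.CreateObject("adodb.recordset")
    sqls = "select token,good_pro,bad_pro from hash_all"
    rss.open sqls, conn, 1, 3
    do until rss.eof
        str = rss("token")
        ' P(S|token) = bad_pro / (bad_pro + good_pro); both columns must be non-null
        i = rss("bad_pro") / (rss("bad_pro") + rss("good_pro"))
        i = int(i * 1000) / 1000   ' truncate to three decimal places
        conn.execute "insert into hash_pro values('" & str & "','" & i & "')"
        rss.movenext
    loop
    rss.close
end function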

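5.4.4 Filter parameter settings

This module starts and stops the filtering strategies and configures the four filtering rules. The interface is shown in Figure 5-14.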

 

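6. Testing and analysis

6.1 System testing

  1. Black and white list test: add the address to be filtered to the blacklist and the address to be allowed to the whitelist, enable the black and white list function, and send two emails, the first from the blacklisted address and the second from the whitelisted address. The first email is displayed in the Junk Mail folder and the second in the Inbox. The test succeeded.
  2. Subject keyword filtering test: add the keywords to be filtered, enable subject keyword filtering, and send an email whose subject contains one of the keywords. The email is displayed in the Junk Mail folder. The test succeeded.
  3. Bayesian filtering test: enable Bayesian filtering, write the non-spam sample set to 1.txt and the spam sample set to 2.txt, learn from both files to compute the hash probability table, set a threshold, and send an email containing strings from the two sample sets. If the spam probability exceeds the threshold the email is displayed in the Junk Mail folder; if it is below the threshold the email is displayed in the Inbox. The test succeeded. In the test I wrote the two strings fa and lv to 1.txt and the three strings fa, lun and gong to 2.txt; learning produced the data table shown in the figure below:

 

When an email with the content "fa lun mail" is sent, the computed spam probability exceeds the preset threshold of 95, and the email is displayed in the Junk Mail folder. Clicking the Bayes analysis button in the mail browsing interface displays the indicators of the analysis, as shown in the figure below: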

 

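6.2 Difficult problems in the design

  1. Centralized management and configuration of the filtering parameters used by the system. This mainly involves two aspects:

(1) Providing a user interface through which users can modify the relevant parameters and customize the system.

(2) Allowing users, in light of their actual circumstances, to switch mail filtering on or off conveniently at any time while using the system.

To address this I designed a parameter settings module that turns the different filtering rules on and off as the user requires.

  2. Selection of feature strings

The quality of the selected feature strings determines the final filtering result. The feature string vocabulary is not built statically; it changes dynamically with the spam and non-spam sets, which is what gives the filter a degree of intelligence and the ability to keep learning. Many cases therefore have to be considered so that the characteristic information in the spam and non-spam sets is extracted accurately and a reasonably complete vocabulary is built. For English mail, token strings cannot simply be obtained by splitting the text into words; variations such as letter case and disguised spellings must also be taken into account.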
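One possible way to handle letter case and disguised spellings in English mail is sketched below in Python. The substitution table `LOOKALIKES` is hypothetical and far from complete, and this normalization step is not part of the system described here.

```python
import re

# Hypothetical table of look-alike characters and the letters they imitate.
LOOKALIKES = str.maketrans({
    "0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "@": "a", "$": "s",
})

def normalize_tokens(text):
    """Fold case, undo common character substitutions, and split into words."""
    folded = text.lower().translate(LOOKALIKES)
    return re.findall(r"[a-z]+", folded)

print(normalize_tokens("FR33 M0NEY c@sh"))
```

Mapping digits to letters also distorts genuine numbers such as prices or dates, so a real filter would need to weigh that cost against the benefit of catching disguised words.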

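6.3 Analysis of the three filtering techniques

Black and white lists use few computing resources and are easy to deploy, but the address lists must be maintained by hand; in mature anti-spam solutions this technique plays only a supplementary role.

Keyword filtering is a simple but effective way to block a large share of spam. Its advantages are that it is simple to construct, easy to implement and highly reliable. Its drawbacks are that the keywords must be changed frequently and that it produces many false positives.

Points to note when applying the Bayesian algorithm in practice:

  1. Pure Bayesian filtering considers only the content of the mail body, yet information in the mail header is often important. TOKENs appearing in the different parts of the header must therefore be marked as such and given larger weights.
  2. Certain special messages must be judged by their specific characteristics. For example, a message whose body contains very little text apart from a single link to a web page is very likely to be spam.
  3. Because misclassifying normal mail as spam can cause users great loss, measures must be taken to reduce such misjudgments. A two-level filtering strategy can be used: after a message has been judged to be spam, it is checked again with other filtering rules, and if it meets certain conditions it is reclassified as non-spam.
  4. The application of the Bayesian algorithm to e-mail was proposed by English speakers, and TOKEN strings are processed according to English conventions, so many problems arise when it is applied to Chinese or to other languages unlike English. This system does not support recognition of Chinese either, because Chinese word segmentation algorithms are relatively complex and their accuracy is not high.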

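6.4 Analysis of common modules

The system includes some common modules, saved as files, that other files can include with the #include statement in order to use the functions they define. For example:

conn.asp implements the connection to the database. The code is as follows:

<%
' Open the shared database connection and create the recordsets used by the pages
set conn = server.CreateObject("adodb.connection")
set rs = server.createobject("adodb.recordset")
set subrs = server.createobject("adodb.recordset")
str = "PROVIDER=SQLOLEDB;DATA SOURCE=127.0.0.1;UID=sa;" & _
      "PWD=123;DATABASE=mail"
conn.open str
%>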

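Conclusion

During the design I came to appreciate how important research on anti-spam technology is, and how greatly spam affects our work and life. I also surveyed the anti-spam techniques currently in use and finally chose three common ones for the functional design: black and white lists, subject keyword filtering and Bayesian filtering. This article has described these three techniques in detail and explained the development process, in the hope that they improve the spam filtering rate. However, because many spammers also exploit the loopholes in filtering techniques, filtering out one hundred percent of spam is impossible. The three techniques used in the design each have their own shortcomings, and many difficult problems remain, such as choosing the black and white list addresses, choosing the subject keywords, and collecting the spam and non-spam sets for Bayesian filtering. There are certainly more efficient filtering techniques still to be studied, and controlling and eliminating spam is not something a few people or organizations can accomplish; it requires the joint effort of society as a whole to build a clean cyberspace.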

 

 


Origin blog.csdn.net/axingxiansen/article/details/129836341