[.Net] Parsing Chinese Addresses with Regular Expressions

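Preface:

For work I needed to break customer-supplied addresses into their component parts. The existing code, written by a predecessor, used IndexOf + Substring to take apart the whole address string.

That brute-force approach does work, but I don't think it is a good solution. Addresses in Taiwan also do not always fit the index rules that Substring relies on.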
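To make the problem concrete, here is a minimal sketch of the IndexOf + Substring style. The helper `CityByIndexOf` is hypothetical and is not the predecessor's actual code; it only assumes the rule "the city ends at the first 市".

```csharp
using System;

public static class IndexOfDemo
{
    // Hypothetical rule: everything up to and including the first "市" is the city.
    public static string CityByIndexOf(string address)
    {
        int i = address.IndexOf("市", StringComparison.Ordinal);
        return i < 0 ? string.Empty : address.Substring(0, i + 1);
    }

    public static void Main()
    {
        // Works when the address starts with a city.
        Console.WriteLine(CityByIndexOf("台中市顺天路250号"));
        // prints 台中市

        // Breaks when a county comes first: county and city are fused.
        Console.WriteLine(CityByIndexOf("苗栗县苗栗市嘉盛里25邻为公路584号"));
        // prints 苗栗县苗栗市
    }
}
```

Every such special case needs another index check, which is why the code grows quickly with this approach.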

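So I wanted to use a regular expression to keep the code clean and concise. I am still learning regex, so please bear with any mistakes.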

Requirement:

Given a partial or a complete address, break it down and classify it into its individual components.

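Implementation:

1. Create an ADDRESS class whose constructor takes the raw address.

2. Properties: City is the county or city (县市), Region is the township or district (乡镇市区), Village is the village (村里), Neighbor is the neighborhood (邻), Road is the road (路), Section is the section (段), Lane is the lane (巷), Alley is the alley (弄), No is the house number (号), Seq is the sub-number (序号), Floor is the floor (楼), and Others is whatever remains.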


using System.Text.RegularExpressions;

public class ADDRESS
{
    public ADDRESS(string address)
    {
        this.OrginalAddress = address;
        this.ParseByRegex(address);
    }

    public static string GetDTName = "SELECT";

    /// <summary>
    /// County or city (县市)
    /// </summary>
    public string City { get; set; }

    /// <summary>
    /// Township or district (乡镇市区)
    /// </summary>
    public string Region { get; set; }

    /// <summary>
    /// Village (村里)
    /// </summary>
    public string Village { get; set; }

    /// <summary>
    /// Neighborhood (邻)
    /// </summary>
    public string Neighbor { get; set; }

    /// <summary>
    /// Road (路)
    /// </summary>
    public string Road { get; set; }

    /// <summary>
    /// Section (段)
    /// </summary>
    public string Section { get; set; }

    /// <summary>
    /// Lane (巷)
    /// </summary>
    public string Lane { get; set; }

    /// <summary>
    /// Alley (弄)
    /// </summary>
    public string Alley { get; set; }

    /// <summary>
    /// House number (号)
    /// </summary>
    public string No { get; set; }

    /// <summary>
    /// Sub-number after the hyphen (序号)
    /// </summary>
    public string Seq { get; set; }

    /// <summary>
    /// Floor (楼)
    /// </summary>
    public string Floor { get; set; }

    /// <summary>
    /// Whatever remains after the other parts
    /// </summary>
    public string Others { get; set; }

    /// <summary>
    /// Whether the address matched the pattern
    /// </summary>
    public bool IsParseSuccessed { get; set; }

    /// <summary>
    /// The address as originally passed in
    /// </summary>
    public string OrginalAddress { get; private set; }

    private void ParseByRegex(string address)
    {
        // Each named group feeds the property of the same name.
        // \D is a non-digit, \d a digit; only the city group is mandatory.
        var pattern = @"(?<city>\D+?[县市])(?<region>\D+?(市区|镇区|镇市|[乡镇市区]))?(?<village>\D+?[村里])?(?<neighbor>\d+[邻])?(?<road>\D+?(村路|[路街道段]))?(?<section>\D?段)?(?<lane>\d+巷)?(?<alley>\d+弄)?(?<no>\d+号?)?(?<seq>-\d+?(号))?(?<floor>\d+楼)?(?<others>.+)?";

        Match match = Regex.Match(address, pattern);

        if (match.Success)
        {
            this.IsParseSuccessed = true;
            // A group that did not take part in the match yields an empty string.
            this.City = match.Groups["city"].Value;
            this.Region = match.Groups["region"].Value;
            this.Village = match.Groups["village"].Value;
            this.Neighbor = match.Groups["neighbor"].Value;
            this.Road = match.Groups["road"].Value;
            this.Section = match.Groups["section"].Value;
            this.Lane = match.Groups["lane"].Value;
            this.Alley = match.Groups["alley"].Value;
            this.No = match.Groups["no"].Value;
            this.Seq = match.Groups["seq"].Value;
            this.Floor = match.Groups["floor"].Value;
            this.Others = match.Groups["others"].Value;
        }
    }
}
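As a quick check of how the named groups behave, the following standalone sketch runs the pattern against one of the sample addresses. It assumes each group is named after the key used in the `match.Groups[...]` lookups and that `D`/`d` stand for `\D`/`\d`; `AddressRegexDemo` and `Parse` are illustrative names, not part of the class above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AddressRegexDemo
{
    // Assumed pattern: group names mirror the keys read via match.Groups[...].
    public const string Pattern =
        @"(?<city>\D+?[县市])(?<region>\D+?(市区|镇区|镇市|[乡镇市区]))?" +
        @"(?<village>\D+?[村里])?(?<neighbor>\d+[邻])?" +
        @"(?<road>\D+?(村路|[路街道段]))?(?<section>\D?段)?" +
        @"(?<lane>\d+巷)?(?<alley>\d+弄)?(?<no>\d+号?)?" +
        @"(?<seq>-\d+?(号))?(?<floor>\d+楼)?(?<others>.+)?";

    public static Match Parse(string address)
    {
        return Regex.Match(address, Pattern);
    }

    public static void Main()
    {
        Match m = Parse("中国台北市大安区忠孝东路四段101巷45号");

        // Print a few of the groups; unmatched groups would print as empty.
        foreach (string name in new[] { "city", "region", "road", "section", "lane", "no" })
        {
            Console.WriteLine(name + ": " + m.Groups[name].Value);
        }
        // city: 中国台北市
        // region: 大安区
        // road: 忠孝东路
        // section: 四段
        // lane: 101巷
        // no: 45号
    }
}
```

Because every group except city is optional, a partial address such as 台中市顺天路250号 still matches, with the missing parts left empty.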

Summary:

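The breakdown is fairly fine-grained, and at first I did not account for road names that have a 段 (section) but no 路 (road), such as 澎湖县马公市锁港里锁管港段. That caused parse errors, and it took me some time to work out how to split these addresses.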
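The sketch below isolates just the road and section parts to show how listing 段 among the road endings handles that case. It is an illustration under the same assumptions about the group names; the `^` anchor and the `RoadSectionDemo.Split` helper are added for the demo only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class RoadSectionDemo
{
    // Road may end in 路, 街, 道 or 段; a separate section may follow.
    private const string Pattern = @"^(?<road>\D+?(村路|[路街道段]))(?<section>\D?段)?";

    public static (string Road, string Section) Split(string text)
    {
        Match m = Regex.Match(text, Pattern);
        return (m.Groups["road"].Value, m.Groups["section"].Value);
    }

    public static void Main()
    {
        // Section name with no road: the whole name lands in the road group.
        var a = Split("锁管港段1439号");
        Console.WriteLine(a.Road + " | " + a.Section);
        // prints 锁管港段 | 

        // Ordinary road followed by a section.
        var b = Split("忠孝东路四段101巷45号");
        Console.WriteLine(b.Road + " | " + b.Section);
        // prints 忠孝东路 | 四段
    }
}
```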

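As a result, apart from a few unusual addresses such as those in Diaoyutai, the South China Sea islands, and the Nansha Islands, most addresses can be parsed.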

The online regex tester I used and the sample addresses are listed below:

http://www.rubular.com/

中国台北市信义区市府路1号
中国台北市大安区忠孝东路四段101巷45号
中国台北市信义区吴兴街220巷11弄39号
北市三重路66-2号14楼
中国台北市南港区合成里1邻中坡北路30巷19-2号3楼之3
北市八德路四段145巷6弄12-2号5楼
苗栗县苗栗市嘉盛里25邻为公路584号
南投县仁爱乡大同村定远新村路18-1号
连江县南竿乡介寿村3邻48-2号
澎湖县马公市锁港里锁管港段1439号
台中市顺天路250号

Any resemblance between the addresses above and real ones is purely coincidental.

Reference for this article: http://www.dotblogs.com.tw/hatelove/archive/2012/06/05/parse-taiwan-address-with-regex.aspx

Original post: 大专栏, "[.Net] 用正则拆解中文地址"
