Note 12: Sets - 代码天地

Other 2019-04-28 01:48:04

Sets

+ An unordered collection of unique elements (keys)
+ Similar to a dictionary, but with no values

Creating a set

x = set()
x = {key1,key2,...}
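
As a minimal sketch of the two creation forms above (all variable names here are illustrative and not from the original note): duplicates collapse on construction, an empty set has to be written `set()` because `{}` creates an empty dictionary, and converting a dictionary to a set keeps only its keys.

```python
# Illustrative names; none of these come from the original note.
empty = set()                          # the only way to write an empty set
not_a_set = {}                         # {} is an empty dict, not a set
fruits = {'apple', 'pear', 'apple'}    # the duplicate 'apple' is stored once
letters = set('hello')                 # a set can be built from any iterable
keys_only = set({'a': 1, 'b': 2})      # a dict contributes only its keys

print(type(empty).__name__)            # set
print(type(not_a_set).__name__)        # dict
print(len(fruits))                     # 2
print(sorted(letters))                 # ['e', 'h', 'l', 'o']
print(sorted(keys_only))               # ['a', 'b']
```

`sorted()` is used for printing because a set has no defined element order.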

Adding and removing elements

x.add('body')
x.remove('body')
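
A short sketch of how these two calls behave at the edges: adding an element that is already present changes nothing, and `remove()` raises `KeyError` when the element is absent. The built-in `discard()` method, not mentioned in the note, is shown as the non-raising alternative; the flag `removed_missing` is only an illustrative name.

```python
x = set()
x.add('body')
x.add('body')            # adding an existing element is a no-op
print(len(x))            # 1

x.remove('body')
print(len(x))            # 0

# remove() raises KeyError when the element is not in the set ...
try:
    x.remove('body')
except KeyError:
    removed_missing = False
else:
    removed_missing = True
print(removed_missing)   # False

# ... while discard() silently does nothing in that case.
x.discard('body')
```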

Set operators

Operator          Meaning
-                 difference
&                 intersection
|                 union
!=                not equal
==                equal
in                membership
for key in set    iteration
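
The operators listed above can be tried on two small sets. This is a minimal sketch; the names `a` and `b` and their contents are chosen only for illustration.

```python
a = {1, 2, 3, 4}
b = {3, 4, 5}

difference = a - b        # elements in a but not in b
intersection = a & b      # elements in both a and b
union = a | b             # elements in a, in b, or in both

print(sorted(difference))     # [1, 2]
print(sorted(intersection))   # [3, 4]
print(sorted(union))          # [1, 2, 3, 4, 5]
print(a != b)                 # True
print({1, 2} == {2, 1})       # True: element order does not matter
print(3 in a)                 # True

# Enumerate the elements; sorted() fixes the order, since a set has none.
for key in sorted(a):
    print(key)
```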

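+ Chinese word segmentation
Example: 我爱北京天安门。 -> 我/爱/北京/天安门/。
Algorithm: forward maximum matching
Scan from left to right, taking the longest possible word at each position
Example: 研究生命的起源 -> 研究生/命/的/起源
"研究生" is a word in the dictionary and is longer than "研究", so it is chosen first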
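
The scan described above can be traced step by step. The following is a minimal sketch, assuming a tiny hypothetical lexicon (`lexicon`) and a helper name (`fmm_trace`) that are not in the original note; it prints every candidate tried at each position.

```python
# Hypothetical five-word lexicon, chosen only to reproduce the example above.
lexicon = {'研究', '研究生', '生命', '的', '起源'}
max_len = max(len(w) for w in lexicon)    # longest word: 3 characters

def fmm_trace(sent):
    """Segment sent by forward maximum matching, printing each attempt."""
    words = []
    begin = 0
    while begin < len(sent):
        # Try the longest candidate first, then shrink one character at a time.
        for end in range(min(begin + max_len, len(sent)), begin, -1):
            candidate = sent[begin:end]
            hit = candidate in lexicon
            print(candidate, 'match' if hit else 'no match')
            if hit:
                break
        # If nothing matched, end == begin + 1 and the single character is kept.
        words.append(sent[begin:end])
        begin = end
    return words

result = fmm_trace('研究生命的起源')
print('/'.join(result))    # 研究生/命/的/起源
```

Because the match is greedy, "研究生" wins at the first position even though this lexicon also contains "研究" and "生命"; the leftover "命" is then emitted as a single character.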

Natural language processing

Solving this problem requires a dictionary (a word list).

Forward maximum matching segmentation

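def load_dict(filename):
    """Read one word per line; return (length of longest word, set of words)."""
    word_dict = set()
    max_len = 1
    # Python 3: open() decodes to str, so unicode() is no longer needed
    with open(filename, encoding='utf-8') as f:
        for line in f:
            word = line.strip()
            if not word:
                continue  # skip blank lines
            word_dict.add(word)
            if len(word) > max_len:
                max_len = len(word)

    return max_len, word_dict

def fmm_word_seg(sent, max_len, word_dict):
    """Segment sent by forward maximum matching against word_dict."""
    begin = 0
    words = []

    while begin < len(sent):
        # try the longest candidate first, shrinking one character at a time
        for end in range(min(begin + max_len, len(sent)), begin, -1):
            if sent[begin:end] in word_dict:
                break
        # if nothing matched, end == begin + 1, so the single character
        # is kept as a word instead of being dropped
        words.append(sent[begin:end])
        begin = end

    return words

max_len, word_dict = load_dict('lexicon.dic')
sent = input('Input a sentence:')
words = fmm_word_seg(sent, max_len, word_dict)
for word in words:
    print(word)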

Comparison of data structures

Reposted from www.cnblogs.com/OceanF/p/10781356.html