Applications by Discipline

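Applications by Discipline
- Science and engineering applications
- Humanities and social science applications
  - Project Gutenberg
  - Inaugural Address Corpus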

Science and engineering applications

Simple trigonometric function calculation

#Filename: mathA.py
import numpy as np
import pylab as pl
x = np.linspace(-np.pi, np.pi, 256)    # 256 evenly spaced points in [-pi, pi], returned as an array
s = np.sin(x)
c = np.cos(x)
pl.title('Trigonometric Function')
pl.xlabel('X')
pl.ylabel('Y')
pl.plot(x, s)    # sine curve
pl.plot(x, c)    # cosine curve
pl.show()        # display the figure

Fourier transform of a data set

Array: [1, 1, …, 1, -1, -1, …, -1, 1, 1, …, 1] (500 elements, with indices 100 to 299 set to -1)

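#Filename: mathB.py
import numpy as np
import pylab as pl
listA = np.ones(500)       # NumPy equivalents replace the older sp.ones / sp.fft calls
listA[100:300] = -1
f = np.fft.fft(listA)      # discrete Fourier transform (complex-valued)
pl.plot(np.abs(f))         # plot the magnitude of the spectrum
pl.show()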
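
As a quick sanity check on the transform above, here is a minimal sketch that is not part of the original script. The names `signal`, `spectrum`, and `recovered` are illustrative. It uses NumPy's FFT routines to check three properties one would expect for this real-valued array: the zero-frequency bin equals the sum of the samples, the spectrum is conjugate-symmetric, and the inverse transform recovers the input.

```python
import numpy as np

# Same test signal as in mathB.py: 500 samples, indices 100..299 set to -1
signal = np.ones(500)
signal[100:300] = -1

spectrum = np.fft.fft(signal)

# Bin 0 is the plain sum of the samples: 300 * 1 + 200 * (-1) = 100
print(spectrum[0].real)

# For real input, bin k is the complex conjugate of bin N - k
print(np.allclose(spectrum[1:], np.conj(spectrum[1:][::-1])))

# The inverse FFT should reproduce the original samples (up to rounding)
recovered = np.fft.ifft(spectrum).real
print(np.allclose(recovered, signal))
```

Because of the conjugate symmetry, the magnitude plot is mirrored around its midpoint, so only the first half of the bins carries independent information.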

Biopython

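- Parses bioinformatics files into data structures that Python can work with
- Provides code for accessing commonly used online bioinformatics databases
- Provides interfaces to commonly used bioinformatics programs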
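
To make the first point concrete, here is a minimal sketch of parsing FASTA data with `Bio.SeqIO`. It assumes Biopython is installed (for example via `pip install biopython`). The two records are made-up sequences embedded in a string so that no file is needed; a real script would more likely pass a file name or file handle instead.

```python
from io import StringIO

from Bio import SeqIO

# Hypothetical FASTA content; the IDs and sequences are invented for illustration
fasta_text = """>seq1 hypothetical example record
ATGGCCATTGTAATGGGCCGC
>seq2 another hypothetical record
ATGAAATAG
"""

# SeqIO.parse yields one SeqRecord per FASTA entry
records = list(SeqIO.parse(StringIO(fasta_text), "fasta"))

for record in records:
    # record.id is the first word of the header line; record.seq is a Seq object
    print(record.id, len(record.seq), record.seq.reverse_complement())
```

Each `SeqRecord` behaves like an ordinary Python object, so the parsed sequences can be filtered, sliced, or counted with the usual language tools.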

Humanities and social science applications

Downloading NLTK data

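# To download specific corpora, enter the following in a Python session in the terminal:
import nltk
nltk.download()
# Then select the corpora you need in the downloader window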

Project Gutenberg

List the Project Gutenberg books currently included in NLTK

>>> from nltk.corpus import gutenberg
>>> gutenberg.fileids()
['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt',
'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt',
'carroll-alice.txt', 'chesterton-ball.txt',
'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt',
'melville-moby_dick.txt', 'milton-paradise.txt',
'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt',
'whitman-leaves.txt']

Some simple calculations

>>> from nltk.corpus import gutenberg
>>> allwords = gutenberg.words('shakespeare-hamlet.txt')
>>> len(allwords)
37360
>>> len(set(allwords))    # number of distinct tokens in allwords
5447
>>> allwords.count('Hamlet')
99
>>> A = set(allwords)
>>> longwords = [w for w in A if len(w)>12]
>>> print(sorted(longwords))  # sort the long words
['Circumstances', 'Guildensterne', 'Incontinencie', 'Recognizances', 'Vnderstanding', 'determination', 'encompassement', 'entertainment', 'imperfections', 'indifferently', 'instrumentall', 'reconcilement', 'stubbornnesse', 'transformation', 'vnderstanding']
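
The same operations can be tried without downloading any corpus. The sketch below is illustrative only: `tokens` is a hand-written list standing in for what `gutenberg.words()` returns. It shows that these counts are case-sensitive and include punctuation tokens, which is why normalizing with `lower()` and `isalpha()` changes the numbers.

```python
# Hand-written stand-in for a tokenized text (punctuation marks are tokens too)
tokens = ["To", "be", ",", "or", "not", "to", "be", ":", "that", "is", "the", "question"]

total = len(tokens)                  # every token counts, including "," and ":"
distinct = len(set(tokens))          # "To" and "to" are different strings
be_count = tokens.count("be")        # exact, case-sensitive match
long_words = sorted(w for w in set(tokens) if len(w) > 3)

# Lowercase and keep alphabetic tokens only before counting
normalized = [w.lower() for w in tokens if w.isalpha()]

print(total, distinct, be_count, long_words)
print(len(normalized), len(set(normalized)))
```

Note that `sorted()` orders strings by code point, so capitalized words come before lowercase ones, as in the Hamlet output above where 'Vnderstanding' precedes 'determination'.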

Other functions applied to the Project Gutenberg corpus

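# Filename: freqG20.py
from nltk.corpus import gutenberg
from nltk.probability import FreqDist

allwords = gutenberg.words('shakespeare-hamlet.txt')
# FreqDist(): build a frequency distribution of the given data
# (here: lowercased, alphabetic tokens only)
fd2 = FreqDist([sx.lower() for sx in allwords if sx.isalpha()])

print(fd2.B())  # .B(): number of distinct words (bins)
print(fd2.N())  # .N(): total number of words (samples)

fd2.tabulate(20)  # tabulate the 20 most frequent words
fd2.plot(20)      # plot the counts of those 20 words
fd2.plot(20, cumulative=True)  # cumulative=True: plot cumulative counts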
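
In NLTK 3, `FreqDist` is built on `collections.Counter`, so the quantities above can be illustrated with the standard library alone. The following is a rough sketch on a made-up sentence, not a replacement for `FreqDist`; the variable names are illustrative.

```python
from collections import Counter
from itertools import accumulate

# Made-up sample text standing in for the Hamlet tokens
words = "the cat and the hat and the bat".split()
counts = Counter(words)

n_bins = len(counts)                 # corresponds to fd.B(): distinct words
n_samples = sum(counts.values())     # corresponds to fd.N(): all words
top2 = counts.most_common(2)         # what tabulate(2) would list

# Running totals of the sorted counts: the values a cumulative plot shows
cumulative = list(accumulate(c for _, c in counts.most_common()))

print(n_bins, n_samples, top2, cumulative)
```

The last running total always equals the total number of samples, which is why a cumulative plot over all words ends at `N()`.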

Inaugural Address Corpus

>>> from nltk.corpus import inaugural
>>> from nltk.probability import FreqDist
>>> fd3 = FreqDist(inaugural.words())
>>> print(fd3.freq('freedom'))
0.0011939479191683535

# Filename: inaugural.py
from nltk.corpus import inaugural
from nltk.probability import ConditionalFreqDist

# Conditional frequency distribution: for each inaugural address after 1950,
# count how many times each word length occurs
cfd = ConditionalFreqDist(
            (fileid, len(w))
            for fileid in inaugural.fileids()
            for w in inaugural.words(fileid)
            if fileid > '1950')
# In Python 3, cfd.items() returns a view, so convert it to a list before slicing
print(list(cfd.items())[:40])
cfd.plot()

Output:

[('1981-Reagan.txt', FreqDist({2: 538, 3: 525, 1: 420, 4: 390, 5: 235, 7: 192, 6: 176, 8: 109, 9: 93, 10: 66, ...})), ('2009-Obama.txt', FreqDist({3: 599, 2: 441, 4: 422, 1: 350, 5: 236, 6: 225, 7: 198, 8: 96, 9: 63, 10: 59, ...})), ('1973-Nixon.txt', FreqDist({2: 430, 3: 397, 4: 289, 1: 252, 5: 203, 6: 129, 7: 128, 8: 62, 10: 48, 9: 37, ...})), ('2001-Bush.txt', FreqDist({3: 334, 2: 317, 1: 294, 4: 234, 5: 163, 7: 152, 6: 134, 8: 70, 9: 64, 10: 32, ...})), ('1957-Eisenhower.txt', FreqDist({3: 380, 2: 378, 1: 259, 4: 259, 5: 203, 7: 157, 6: 129, 8: 63, 9: 42, 10: 25, ...})), ('1985-Reagan.txt', FreqDist({3: 570, 2: 510, 1: 430, 4: 407, 5: 294, 7: 206, 6: 204, 8: 138, 9: 71, 10: 61, ...})), ('1961-Kennedy.txt', FreqDist({3: 333, 2: 273, 4: 226, 1: 187, 5: 155, 7: 115, 6: 110, 8: 64, 9: 42, 10: 24, ...})), ('1997-Clinton.txt', FreqDist({3: 534, 2: 378, 4: 352, 1: 350, 5: 225, 6: 179, 7: 171, 8: 117, 9: 70, 10: 45, ...})), ('1969-Nixon.txt', FreqDist({2: 511, 3: 450, 4: 351, 1: 327, 5: 244, 6: 175, 7: 146, 8: 92, 9: 63, 10: 32, ...})), ('1977-Carter.txt', FreqDist({3: 281, 2: 249, 1: 184, 4: 184, 5: 139, 6: 107, 7: 74, 8: 61, 9: 42, 10: 36, ...})), ('1965-Johnson.txt', FreqDist({3: 355, 2: 301, 1: 256, 4: 255, 5: 138, 7: 133, 6: 127, 8: 68, 9: 45, 10: 30, ...})), ('1953-Eisenhower.txt', FreqDist({3: 538, 2: 514, 4: 389, 1: 348, 5: 270, 7: 210, 6: 185, 8: 131, 9: 84, 10: 60, ...})), ('1989-Bush.txt', FreqDist({3: 556, 1: 486, 2: 437, 4: 435, 5: 253, 6: 184, 7: 147, 8: 72, 9: 60, 10: 51, ...})), ('2005-Bush.txt', FreqDist({3: 469, 2: 395, 4: 332, 1: 320, 7: 234, 5: 203, 6: 162, 9: 90, 8: 79, 10: 49, ...})), ('1993-Clinton.txt', FreqDist({3: 380, 2: 307, 1: 277, 4: 274, 5: 152, 6: 137, 7: 107, 8: 86, 9: 67, 10: 39, ...}))]
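
Conceptually, a conditional frequency distribution is one counter per condition, filled from (condition, value) pairs. The sketch below imitates that idea with `defaultdict(Counter)` from the standard library. The file names and word lists in `docs` are hypothetical stand-ins for the corpus, and this is not how NLTK is necessarily implemented internally.

```python
from collections import Counter, defaultdict

# Hypothetical mini corpus: file id -> tokens
docs = {
    "1949-Example.txt": ["we", "the", "people"],
    "1953-Example.txt": ["peace", "is", "our", "goal"],
    "1961-Example.txt": ["ask", "not", "what", "you", "can", "do"],
}

# Same pair structure as in inaugural.py: (file id, word length).
# The string comparison works because every file id starts with a four-digit year.
pairs = (
    (fileid, len(w))
    for fileid, words in docs.items()
    for w in words
    if fileid > "1950"
)

cfd_sketch = defaultdict(Counter)
for condition, length in pairs:
    cfd_sketch[condition][length] += 1

for condition in sorted(cfd_sketch):
    print(condition, dict(cfd_sketch[condition]))
```

Each inner counter plays the role of one `FreqDist` in the output above: its keys are word lengths and its values are how often that length occurs in the given address.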

Advanced Python Data Processing and Visualization (6): Applications by Discipline
