NumPy, SciPy, pandas, scikit-learn, gensim

NumPy, SciPy
Matrix and vector computation.
NumPy provides a high-performance multidimensional array and basic tools for computing with and manipulating these arrays.
SciPy builds on this and provides a large number of functions that operate on NumPy arrays and are useful for many kinds of scientific and engineering applications.
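A minimal sketch of how the two libraries divide the work: NumPy holds the data in an ndarray and performs vectorized arithmetic, while SciPy supplies higher-level routines that take those arrays as input. The matrix values are made up for illustration, and scipy.linalg.solve is just one example of such a routine.

```python
import numpy as np
from scipy import linalg

# NumPy: a 2-D array plus vectorized operations (no explicit Python loops)
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

print(A.shape)         # (2, 2)
print(A.T @ b)         # matrix-vector product with the @ operator
print(A.mean(axis=0))  # column means

# SciPy: a dense linear solver for A x = b that works directly on NumPy arrays
x = linalg.solve(A, b)
print(x)               # the exact solution is x = [2, 3]
```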
Reference:
http://cs231n.github.io/python-numpy-tutorial/ (covers Python basics, NumPy, SciPy, and Matplotlib)

Scikit-learn
Data modeling and analysis.
scikit-learn is a Python module for machine learning built on top of SciPy and distributed under the 3-Clause BSD license.
To update scikit-learn with conda: conda update scikit-learn
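The documentation is quite detailed. The official homepage groups the library's functionality into areas such as classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.

The User Guide lists everything the library covers.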
Installation:
pip install -U scikit-learn (NumPy and SciPy must be installed first)
After installing this way, running from sklearn.ensemble import RandomForestClassifier may fail with ImportError: cannot import name check_arrays.
Fix: conda update scikit-learn
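A quick post-install sanity check, assuming scikit-learn was installed with pip or conda as above: if the RandomForestClassifier import succeeds and a small model trains on the bundled Iris dataset, the installation is usable. The dataset and the n_estimators value are arbitrary choices for this sketch.

```python
import sklearn
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier  # the import that failed above

print(sklearn.__version__)

# Iris ships with scikit-learn, so no download is needed
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier(n_estimators=50, random_state=0)
clf.fit(X, y)

# Accuracy on the training data: only a smoke test, not a model evaluation
train_acc = clf.score(X, y)
print(f"training accuracy: {train_acc:.3f}")
```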

sklearn.model_selection provides grid search (GridSearchCV) for hyperparameter tuning.
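A minimal GridSearchCV sketch. The estimator (SVC), the Iris dataset, and the parameter grid are illustrative assumptions; the pattern is the same for any estimator: list candidate values for each hyperparameter, and GridSearchCV cross-validates every combination and keeps the best one.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Candidate hyperparameters: every combination (3 x 2 = 6) is evaluated
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}

# 5-fold cross-validation per combination; the best one is refit on all data
search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)
print(f"best CV accuracy: {search.best_score_:.3f}")
```

Because refit=True by default, search can then be used like a fitted model, for example search.predict(X).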

sklearn provides a TF-IDF implementation that can be used to vectorize Chinese text and extract keywords from it. Reference blog post: http://www.cnblogs.com/chenbjin/p/3851165.html

Pandas
Data reading and writing.
A powerful Python data analysis toolkit.
Official site: http://pandas.pydata.org/
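A small sketch of typical read, analyze, and write usage. An in-memory string stands in for a CSV file so the example is self-contained; with a real file you would pass a path to pd.read_csv and to_csv instead. The column names and values are made up.

```python
import io

import pandas as pd

# Read CSV data (io.StringIO stands in for a file on disk)
csv_text = """name,dept,salary
Alice,eng,120
Bob,eng,100
Carol,sales,90
"""
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())

# Filter rows with a boolean mask, then aggregate per group
high = df[df["salary"] >= 100]
by_dept = df.groupby("dept")["salary"].mean()
print(by_dept)

# Write results out; to_csv() with no path returns the CSV text as a string
out = by_dept.to_csv()
print(out)
```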

gensim
Gensim is a highly specialized Python toolkit for topic modeling.
Gensim is an open-source vector space modeling and topic modeling toolkit, implemented in the Python programming language. It uses NumPy, SciPy and, optionally, Cython for performance. It is specifically intended for handling large text collections, using efficient online, incremental algorithms. Gensim is commercially supported by the startup RaRe Technologies.
Gensim includes implementations of tf-idf, random projections, the word2vec and document2vec algorithms, hierarchical Dirichlet processes (HDP), latent semantic analysis (LSA) and latent Dirichlet allocation (LDA), including distributed parallel versions.
Installation: pip install gensim


Reposted from blog.csdn.net/zhangweijiqn/article/details/53215996