1, Set up the environment
Development is done on Windows with PyCharm. The installer is on the FTP server 165, under the path /ambari/soft/pycharm.
The jieba demo project is at https://github.com/WanZhang1/cars_jieba; after downloading it, you can open it directly in PyCharm.
2, Install the required Python packages
Online installation
pip install jieba
pip install python-docx
Offline installation
Download the packages
jieba: jieba-0.39.zip
python-docx: python-docx-0.8.6.tar.gz
lxml: lxml-2.3.4.tar.gz
Install (lxml first, because python-docx depends on it)
pip install jieba-0.39.zip
pip install lxml-2.3.4.tar.gz
pip install python-docx-0.8.6.tar.gz
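After installing, it can be worth confirming that the packages are importable from the interpreter PyCharm uses. The following is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical. Note that python-docx is imported under the module name docx.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# python-docx installs a module named "docx"; jieba and lxml keep their names.
print(missing_modules(["jieba", "docx", "lxml"]))
```

An empty list means that every package was found. Any name that is printed still needs to be installed.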
3, Develop the RESTful interface
Install the Python package
pip install flask
Offline installation
Download Flask and its dependencies from https://pypi.python.org/simple/
Flask-0.12.2.tar.gz
click-2.0.tar.gz
itsdangerous-0.21.tar.gz
MarkupSafe-0.23.tar.gz
Werkzeug-0.7.2.tar.gz
Jinja2-2.4.1.tar.gz
Install
pip install click-2.0.tar.gz
pip install itsdangerous-0.21.tar.gz
pip install MarkupSafe-0.23.tar.gz
pip install Werkzeug-0.7.2.tar.gz
pip install Jinja2-2.4.1.tar.gz
pip install Flask-0.12.2.tar.gz
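Installing each archive by hand means getting the dependency order right. As an alternative sketch, pip can be pointed at a local directory with --no-index and --find-links so that it resolves Flask's dependencies from the downloaded files itself. The directory ./packages and the helper names offline_install_command and install_offline are assumptions for illustration only.

```python
import subprocess
import sys


def offline_install_command(requirement, package_dir):
    """Build a pip command that installs only from a local directory."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-index",                 # never contact PyPI
        "--find-links", package_dir,  # look for archives in this directory
        requirement,
    ]


def install_offline(requirement, package_dir):
    """Run the command; raises CalledProcessError if pip fails."""
    subprocess.check_call(offline_install_command(requirement, package_dir))


# Example (assumes the downloaded archives were copied to ./packages):
# install_offline("flask", "./packages")
```

Running pip through sys.executable makes sure the packages land in the same interpreter that runs the script, which matters when several Python installations exist on the machine.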
4, Write the code
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# by zhangw 2017/11/8
from flask import Flask, abort, request, jsonify
import jieba

app = Flask(__name__)

# Load the custom dictionary once at startup instead of on every request
jieba.load_userdict("../../../file/user_dict.dat")


@app.route('/user_dict/', methods=['POST'])
def user_dict():
    # Reject requests that are not JSON or that lack a "text" field
    if not request.json or 'text' not in request.json:
        abort(400)
    text = request.json['text']
    # Precise mode (cut_all=False); jieba.cut returns a generator of tokens
    seg_list = jieba.cut(text, cut_all=False)
    seg = " ".join(seg_list)
    print(seg)
    return jsonify({'result': seg})


if __name__ == "__main__":
    # Binding to 0.0.0.0 makes the service reachable from other machines
    app.run(host="0.0.0.0", debug=True)
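The handler only checks that a "text" key exists, so a body such as {"text": 123} would still reach jieba.cut with a non-string value. A slightly stricter check could look like the sketch below. It assumes Python 3, and extract_text is a hypothetical helper that is not part of the original demo; it works on the already parsed JSON body.

```python
def extract_text(payload):
    """Return the text to segment, or None if the payload is unusable.

    `payload` is the parsed JSON body (what Flask exposes as request.json).
    """
    if not isinstance(payload, dict):
        return None
    text = payload.get("text")
    # Accept only non-empty strings
    if not isinstance(text, str) or not text.strip():
        return None
    return text
```

In the view, abort(400) would then be called whenever extract_text(request.json) returns None.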
5, Test
POST http://127.0.0.1:5000/user_dict/
Example:
curl -H "Content-Type: application/json" -X POST -d '{"text":"近日,国外几名网友整理了一份自然语言处理的免费/公开数据集(包含文本数据)清单,为防止大家错过这个消息,论智暂且把清单内容搬运如下。有需要的读者可直接收藏本文,或去github点个星星以示感谢"}' http://127.0.0.1:5000/user_dict/
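The same request can also be sent from Python, which is convenient on Windows where curl may not be available. This is a minimal sketch using only the standard library. It targets the /user_dict/ route registered in the code of section 4, and the function names build_request and segment are hypothetical.

```python
import json
import urllib.request


def build_request(url, text):
    """Build a POST request carrying {"text": ...} as UTF-8 JSON."""
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def segment(url, text, timeout=10):
    """Send the request and return the "result" field of the JSON reply."""
    with urllib.request.urlopen(build_request(url, text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["result"]


# Example (requires the Flask service from section 4 to be running):
# print(segment("http://127.0.0.1:5000/user_dict/", "有需要的读者可直接收藏本文"))
```

The URL keeps its trailing slash because the route is declared as '/user_dict/'. Without it, Flask answers with a redirect instead of handling the POST directly.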