Linux: Setting Up a Natural Language Processing (NLP) Environment [Installing and Deploying an Intelligent Text Classification System]

I. Installing the Environment

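1. Install the Anaconda scientific computing environment

Anaconda is a scientific computing distribution that includes python3, pip, pandas, numpy, and other scientific computing packages.

Download Anaconda3-5.2.0-Linux-x86_64.sh: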
curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh

[root@ainlp ~]# curl -O https://repo.anaconda.com/archive/Anaconda3-5.2.0-Linux-x86_64.sh
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  621M  100  621M    0     0  1785k      0  0:05:56  0:05:56 --:--:-- 2241k
[root@ainlp ~]# 

Run the Anaconda3-5.2.0-Linux-x86_64.sh installer:
sh Anaconda3-5.2.0-Linux-x86_64.sh

[root@ainlp ~]# sh Anaconda3-5.2.0-Linux-x86_64.sh

Configure ~/.bashrc:

[root@centos608 ~]# ll -la

Add one line:

export PATH=/root/anaconda3/bin:$PATH

The .bashrc file after modification:

# .bashrc

# User specific aliases and functions

alias rm='rm -i'
alias cp='cp -i'
alias mv='mv -i'

# Source global definitions
if [ -f /etc/bashrc ]; then
        . /etc/bashrc
fi

# added by Anaconda3 installer
export PATH="/root/anaconda3/bin:$PATH"
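Because the Anaconda directory is prepended to PATH, the shell should find Anaconda's python before the system one. The following is a minimal sketch for checking that ordering, assuming the install prefix /root/anaconda3 shown above and /usr/bin as the location of the system interpreter (an assumption); the helper name path_precedes is hypothetical.

```python
import os


def path_precedes(path_value, first, second):
    """Return True if directory `first` comes before `second` in a PATH string.

    A `first` that is missing from the PATH never precedes anything;
    a missing `second` is treated as coming last.
    """
    # Normalize entries so "/root/anaconda3/bin/" and "/root/anaconda3/bin" match.
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    first = os.path.normpath(first)
    second = os.path.normpath(second)
    if first not in entries:
        return False
    if second not in entries:
        return True
    return entries.index(first) < entries.index(second)


if __name__ == "__main__":
    # Assumed locations: Anaconda under /root/anaconda3, system tools under /usr/bin.
    current = os.environ.get("PATH", "")
    print(path_precedes(current, "/root/anaconda3/bin", "/usr/bin"))
```

Run it in a new shell after editing .bashrc; if it prints False, the export line has not been picked up yet.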

2. Install the required components: supervisor and nginx

yum install supervisor -y
yum install nginx -y

3. Install pip

wget https://bootstrap.pypa.io/2.7/get-pip.py
python get-pip.py

4. Install the build tools

yum install -y gcc* pcre-devel openssl-devel

...
Dependency Updated:
  e2fsprogs.x86_64 0:1.42.9-19.el7        e2fsprogs-libs.x86_64 0:1.42.9-19.el7    glibc.x86_64 0:2.17-323.el7_9    glibc-common.x86_64 0:2.17-323.el7_9    krb5-libs.x86_64 0:1.15.1-50.el7    krb5-workstation.x86_64 0:1.15.1-50.el7   
  libcom_err.x86_64 0:1.42.9-19.el7       libgcc.x86_64 0:4.8.5-44.el7             libgomp.x86_64 0:4.8.5-44.el7    libkadm5.x86_64 0:1.15.1-50.el7         libselinux.x86_64 0:2.5-15.el7      libselinux-python.x86_64 0:2.5-15.el7     
  libselinux-utils.x86_64 0:2.5-15.el7    libsepol.x86_64 0:2.5-10.el7             libss.x86_64 0:1.42.9-19.el7     libstdc++.x86_64 0:4.8.5-44.el7         openssl.x86_64 1:1.0.2k-21.el7_9    openssl-libs.x86_64 1:1.0.2k-21.el7_9     
  zlib.x86_64 0:1.2.7-19.el7_9           

Complete!
[root@ainlp django-uwsgi]# 

5. Install the Python development dependencies

yum install -y python-devel

[root@ainlp django-uwsgi]# yum install -y python-devel
Loaded plugins: fastestmirror, langpacks
Loading mirror speeds from cached hostfile
 * base: mirrors.aliyun.com
 * epel: mirror.sjtu.edu.cn
 * extras: mirrors.aliyun.com
 * updates: mirrors.aliyun.com
Package python-devel-2.7.5-90.el7.x86_64 already installed and latest version
Nothing to do
[root@ainlp django-uwsgi]# 

6. Install the Python packages the project needs (uwsgi, tensorflow, keras, django, and so on). We install them together using requirements.txt.

cd /data/django-uwsgi/
pip install -r requirements.txt

The requirements.txt file contains:

## The following requirements were added by pip freeze:
neo4j-driver
pandas>=0.20.3
numpy>=1.13.1
jieba>=0.39
Django>=1.11.7
djangorestframework>=3.7.3
django-filter>=1.1.0
flower>=0.9.2
uwsgi>=2.0.15
requests>=2.18.4
django-cors-headers
tensorflow==1.14.0
keras==2.2.4
celery>=3.1.25 
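In the list above only tensorflow and keras are pinned with ==. The other entries are lower bounds or carry no version at all, so pip is free to pick newer releases for them. A minimal sketch for listing which entries are pinned is shown here; the helper names parse_requirements and pinned are hypothetical, and the regular expression only covers the simple "name, operator, version" lines used in this file, not the full requirements syntax.

```python
import re

# Matches a package name, optionally followed by a comparison operator and a version.
_REQ = re.compile(r"^\s*([A-Za-z0-9._-]+)\s*(==|>=|<=|~=|!=|<|>)?\s*([^\s#]+)?")


def parse_requirements(text):
    """Return (name, operator, version) tuples; blank lines and comments are skipped."""
    parsed = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _REQ.match(line)
        if match:
            parsed.append(match.groups())
    return parsed


def pinned(requirements):
    """Names of the packages fixed to one exact version with '=='."""
    return [name for name, op, _ in requirements if op == "=="]


if __name__ == "__main__":
    sample = "neo4j-driver\npandas>=0.20.3\ntensorflow==1.14.0\nkeras==2.2.4\n"
    print(pinned(parse_requirements(sample)))
```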

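Check whether Django is installed and which version. If the command below prints a version number, Django is installed and that is the installed version. If you get a "No module named django" error instead, Django has not been installed yet.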

$ python -m django --version
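The same check can be made from Python, which is handy when several packages from requirements.txt need verifying at once. This is a minimal sketch using only the standard library; the helper name installed_version is hypothetical, and it assumes the package exposes a __version__ attribute (Django does, but not every package does).

```python
import importlib
import importlib.util


def installed_version(module_name):
    """Return the module's __version__, "unknown" if it has none, or None if it is not installed."""
    # find_spec returns None when a top-level module cannot be located.
    if importlib.util.find_spec(module_name) is None:
        return None
    module = importlib.import_module(module_name)
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    for name in ("django", "numpy", "pandas"):
        print(name, installed_version(name))
```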

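Create your own Django project. Open a terminal, cd to the directory where you want to keep your code (for example, cd /User/tester/myonesite), then run the command below. It creates a project directory named mysite in the current directory.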

$ django-admin startproject mysite

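To verify that the project was created successfully, first switch to your project directory with cd /User/tester/myonesite/mysite, then run:

$ python manage.py runserver

Output like the following means the project was created successfully: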

Performing system checks...

System check identified no issues (0 silenced).

You have unapplied migrations; your app may not work properly until they are applied.
Run 'python manage.py migrate' to apply them.

August 08, 2018 - 15:50:53
Django version 2.0, using settings 'mysite.settings'
Starting development server at http://127.0.0.1:8000/
Quit the server with CONTROL-C.

By default, the runserver command makes the server listen on port 8000 of the local loopback IP. To use a different port, such as 8082, run:

$ python manage.py runserver 8082
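If runserver fails because the chosen port is already taken, it can help to test the port before starting the server. A minimal sketch, assuming the loopback address 127.0.0.1 that the development server uses by default; the helper name port_is_free is hypothetical.

```python
import socket


def port_is_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Typically "address already in use" or a permission problem.
            return False
    return True


if __name__ == "__main__":
    for candidate in (8000, 8082):
        print(candidate, "free" if port_is_free(candidate) else "in use")
```

The result is only a snapshot: another process can still claim the port between the check and the moment the server starts.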

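Create an app. In Django, every app is a Python package that follows the same conventions. Django ships with a utility that generates the basic directory structure of an app, so you can focus on writing code rather than creating directories. Use this command to create an app named login under mysite: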

$ python manage.py startapp login

Generate migration files for model changes:

$ python manage.py makemigrations

Apply the database migrations:

$ python manage.py migrate 

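Create a Django admin superuser account. Enter the administrator username you want, then the email address, the password, and the password confirmation.

$ python manage.py createsuperuser

The message "Superuser created successfully." means the account was created.

Change the administrator password:

$ python manage.py changepassword admin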

7. Install the Neo4j graph database

To install Neo4j on CentOS, you need to add the Yum repository manually.

cd /tmp
wget http://debian.neo4j.org/neotechnology.gpg.key
sudo rpm --import neotechnology.gpg.key
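  • cd /tmp changes to the system's tmp directory;
  • wget then downloads the signing key file neotechnology.gpg.key into the current directory;
  • sudo rpm --import neotechnology.gpg.key imports that key into the system.

Next, use a text editor to create /etc/yum.repos.d/neo4j.repo with the following content: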

[neo4j] 
name=Neo4j RPM Repository
baseurl=http://yum.neo4j.org/stable
enabled=1
gpgcheck=1
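With gpgcheck=1, yum verifies package signatures against the key imported in the previous step, so it is worth confirming the repo file says what you expect before installing. The repo file uses INI syntax, so the standard-library configparser can read it. A minimal sketch; the helper name read_repo is hypothetical.

```python
import configparser


def read_repo(text):
    """Parse the text of a yum .repo file into {section: {key: value}}."""
    # Interpolation is disabled so that '%' in URLs is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


if __name__ == "__main__":
    repo_text = (
        "[neo4j]\n"
        "name=Neo4j RPM Repository\n"
        "baseurl=http://yum.neo4j.org/stable\n"
        "enabled=1\n"
        "gpgcheck=1\n"
    )
    repo = read_repo(repo_text)
    print(repo["neo4j"]["baseurl"], repo["neo4j"]["gpgcheck"])
```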

Finally, install neo4j with the yum command.

yum install neo4j-3.3.5

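Neo4j is now installed on CentOS. The file paths after installation are:

  1. Neo4j installation directory: /usr/share/neo4j
  2. Neo4j configuration directory: /etc/neo4j
  3. Default Neo4j database directory: /var/lib/neo4j

Change to the /usr/share/neo4j/bin directory and run neo4j start to start the Neo4j database.

Copy your own configuration file to /etc/neo4j/neo4j.conf:

# Use your own configuration file
cp /data/django-uwsgi/util/neo4j.conf /etc/neo4j/neo4j.conf

The neo4j.conf configuration file is: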

#*****************************************************************
# Neo4j configuration
#
# For more details and a complete list of settings, please see
# https://neo4j.com/docs/operations-manual/current/reference/configuration-settings/
#*****************************************************************

# The name of the database to mount
#dbms.active_database=graph.db

# Paths of directories in the installation.
dbms.directories.data=/var/neo4j/db
dbms.directories.plugins=/var/lib/neo4j/plugins
dbms.directories.certificates=/var/lib/neo4j/certificates
dbms.directories.logs=/var/log/neo4j/
dbms.directories.lib=/usr/share/neo4j/lib
dbms.directories.run=/var/run/neo4j

# This setting constrains all `LOAD CSV` import files to be under the `import` directory. Remove or comment it out to
# allow files to be loaded from anywhere in the filesystem; this introduces possible security problems. See the
# `LOAD CSV` section of the manual for details.
dbms.directories.import=/var/neo4j/import

# Whether requests to Neo4j are authenticated.
# To disable authentication, uncomment this line
#dbms.security.auth_enabled=false

# Enable this to be able to upgrade a store from an older version.
#dbms.allow_upgrade=true

# Java Heap Size: by default the Java heap size is dynamically
# calculated based on available system resources.
# Uncomment these lines to set specific initial and maximum
# heap size.
dbms.memory.heap.initial_size=512m
#dbms.memory.heap.max_size=10g

# The amount of memory to use for mapping the store files, in bytes (or
# kilobytes with the 'k' suffix, megabytes with 'm' and gigabytes with 'g').
# If Neo4j is running on a dedicated server, then it is generally recommended
# to leave about 2-4 gigabytes for the operating system, give the JVM enough
# heap to hold all your transaction state and query context, and then leave the
# rest for the page cache.
# The default page cache memory assumes the machine is dedicated to running
# Neo4j, and is heuristically set to 50% of RAM minus the max Java heap size.
#dbms.memory.pagecache.size=10g

#*****************************************************************
# Network connector configuration
#*****************************************************************

# With default configuration Neo4j only accepts local connections.
# To accept non-local connections, uncomment this line:
dbms.connectors.default_listen_address=0.0.0.0

# You can also choose a specific network interface, and configure a non-default
# port for each connector, by setting their individual listen_address.

# The address at which this server can be reached by its clients. This may be the server's IP address or DNS name, or
# it may be the address of a reverse proxy which sits in front of the server. This setting may be overridden for
# individual connectors below.
dbms.connectors.default_advertised_address=0.0.0.0

# You can also choose a specific advertised hostname or IP address, and
# configure an advertised port for each connector, by setting their
# individual advertised_address.

# Bolt connector
dbms.connector.bolt.enabled=true
dbms.connector.bolt.tls_level=OPTIONAL
dbms.connector.bolt.listen_address=0.0.0.0:7687

# HTTP Connector. There must be exactly one HTTP connector.
dbms.connector.http.enabled=true
dbms.connector.http.listen_address=0.0.0.0:7474

# HTTPS Connector. There can be zero or one HTTPS connectors.
#dbms.connector.https.enabled=true
#dbms.connector.https.listen_address=:7473

# Number of Neo4j worker threads.
#dbms.threads.worker_count=

#*****************************************************************
# SSL system configuration
#*****************************************************************

# Names of the SSL policies to be used for the respective components.

# The legacy policy is a special policy which is not defined in
# the policy configuration section, but rather derives from
# dbms.directories.certificates and associated files
# (by default: neo4j.key and neo4j.cert). Its use will be deprecated.

# The policies to be used for connectors.
#
# N.B: Note that a connector must be configured to support/require
#      SSL/TLS for the policy to actually be utilized.
#
# see: dbms.connector.*.tls_level

#bolt.ssl_policy=legacy
#https.ssl_policy=legacy

#*****************************************************************
# SSL policy configuration
#*****************************************************************

# Each policy is configured under a separate namespace, e.g.
#    dbms.ssl.policy.<policyname>.*
#
# The example settings below are for a new policy named 'default'.

# The base directory for cryptographic objects. Each policy will by
# default look for its associated objects (keys, certificates, ...)
# under the base directory.
#
# Every such setting can be overridden using a full path to
# the respective object, but every policy will by default look
# for cryptographic objects in its base location.
#
# Mandatory setting

#dbms.ssl.policy.default.base_directory=certificates/default

# Allows the generation of a fresh private key and a self-signed
# certificate if none are found in the expected locations. It is
# recommended to turn this off again after keys have been generated.
#
# Keys should in general be generated and distributed offline
# by a trusted certificate authority (CA) and not by utilizing
# this mode.

#dbms.ssl.policy.default.allow_key_generation=false

# Enabling this makes it so that this policy ignores the contents
# of the trusted_dir and simply resorts to trusting everything.
#
# Use of this mode is discouraged. It would offer encryption but no security.

#dbms.ssl.policy.default.trust_all=false

# The private key for the default SSL policy. By default a file
# named private.key is expected under the base directory of the policy.
# It is mandatory that a key can be found or generated.

#dbms.ssl.policy.default.private_key=

# The public certificate for the default SSL policy. By default a file
# named public.crt is expected under the base directory of the policy.
# It is mandatory that a certificate can be found or generated.

#dbms.ssl.policy.default.public_certificate=

# The certificates of trusted parties. By default a directory named
# 'trusted' is expected under the base directory of the policy. It is
# mandatory to create the directory so that it exists, because it cannot
# be auto-created (for security purposes).
#
# To enforce client authentication client_auth must be set to 'require'!

#dbms.ssl.policy.default.trusted_dir=

# Client authentication setting. Values: none, optional, require
# The default is to require client authentication.
#
# Servers are always authenticated unless explicitly overridden
# using the trust_all setting. In a mutual authentication setup this
# should be kept at the default of require and trusted certificates
# must be installed in the trusted_dir.

#dbms.ssl.policy.default.client_auth=require

# A comma-separated list of allowed TLS versions.
# By default only TLSv1.2 is allowed.

#dbms.ssl.policy.default.tls_versions=

# A comma-separated list of allowed ciphers.
# The default ciphers are the defaults of the JVM platform.

#dbms.ssl.policy.default.ciphers=

#*****************************************************************
# Logging configuration
#*****************************************************************

# To enable HTTP logging, uncomment this line
#dbms.logs.http.enabled=true

# Number of HTTP logs to keep.
#dbms.logs.http.rotation.keep_number=5

# Size of each HTTP log that is kept.
#dbms.logs.http.rotation.size=20m

# To enable GC Logging, uncomment this line
#dbms.logs.gc.enabled=true

# GC Logging Options
# see http://docs.oracle.com/cd/E19957-01/819-0084-10/pt_tuningjava.html#wp57013 for more information.
#dbms.logs.gc.options=-XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintPromotionFailure -XX:+PrintTenuringDistribution

# Number of GC logs to keep.
#dbms.logs.gc.rotation.keep_number=5

# Size of each GC log that is kept.
#dbms.logs.gc.rotation.size=20m

# Size threshold for rotation of the debug log. If set to zero then no rotation will occur. Accepts a binary suffix "k",
# "m" or "g".
#dbms.logs.debug.rotation.size=20m

# Maximum number of history files for the internal log.
#dbms.logs.debug.rotation.keep_number=7

#*****************************************************************
# Miscellaneous configuration
#*****************************************************************

# Enable this to specify a parser other than the default one.
#cypher.default_language_version=3.0

# Determines if Cypher will allow using file URLs when loading data using
# `LOAD CSV`. Setting this value to `false` will cause Neo4j to fail `LOAD CSV`
# clauses that load data from the file system.
#dbms.security.allow_csv_import_from_file_urls=true


# Value of the Access-Control-Allow-Origin header sent over any HTTP or HTTPS
# connector. This defaults to '*', which allows broadest compatibility. Note
# that any URI provided here limits HTTP/HTTPS access to that URI only.
#dbms.security.http_access_control_allow_origin=*

# Value of the HTTP Strict-Transport-Security (HSTS) response header. This header
# tells browsers that a webpage should only be accessed using HTTPS instead of HTTP.
# It is attached to every HTTPS response. Setting is not set by default so
# 'Strict-Transport-Security' header is not sent. Value is expected to contain
# directives like 'max-age', 'includeSubDomains' and 'preload'.
#dbms.security.http_strict_transport_security=

# Retention policy for transaction logs needed to perform recovery and backups.
dbms.tx_log.rotation.retention_policy=1 days

# Enable a remote shell server which Neo4j Shell clients can log in to.
#dbms.shell.enabled=true
# The network interface IP the shell will listen on (use 0.0.0.0 for all interfaces).
#dbms.shell.host=127.0.0.1
# The port the shell will listen on, default is 1337.
#dbms.shell.port=1337

# Only allow read operations from this Neo4j instance. This mode still requires
# write access to the directory for lock purposes.
#dbms.read_only=false

# Comma separated list of JAX-RS packages containing JAX-RS resources, one
# package name for each mountpoint. The listed package names will be loaded
# under the mountpoints specified. Uncomment this line to mount the
# org.neo4j.examples.server.unmanaged.HelloWorldResource.java from
# neo4j-server-examples under /examples/unmanaged, resulting in a final URL of
# http://localhost:7474/examples/unmanaged/helloworld/{nodeId}
#dbms.unmanaged_extension_classes=org.neo4j.examples.server.unmanaged=/examples/unmanaged

#********************************************************************
# JVM Parameters
#********************************************************************

# G1GC generally strikes a good balance between throughput and tail
# latency, without too much tuning.
dbms.jvm.additional=-XX:+UseG1GC

# Have common exceptions keep producing stack traces, so they can be
# debugged regardless of how often logs are rotated.
dbms.jvm.additional=-XX:-OmitStackTraceInFastThrow

# Make sure that `initmemory` is not only allocated, but committed to
# the process, before starting the database. This reduces memory
# fragmentation, increasing the effectiveness of transparent huge
# pages. It also reduces the possibility of seeing performance drop
# due to heap-growing GC events, where a decrease in available page
# cache leads to an increase in mean IO response time.
# Try reducing the heap memory, if this flag degrades performance.
dbms.jvm.additional=-XX:+AlwaysPreTouch

# Trust that non-static final fields are really final.
# This allows more optimizations and improves overall performance.
# NOTE: Disable this if you use embedded mode, or have extensions or dependencies that may use reflection or
# serialization to change the value of final fields!
dbms.jvm.additional=-XX:+UnlockExperimentalVMOptions
dbms.jvm.additional=-XX:+TrustFinalNonStaticFields

# Disable explicit garbage collection, which is occasionally invoked by the JDK itself.
dbms.jvm.additional=-XX:+DisableExplicitGC

# Remote JMX monitoring, uncomment and adjust the following lines as needed. Absolute paths to jmx.access and
# jmx.password files are required.
# Also make sure to update the jmx.access and jmx.password files with appropriate permission roles and passwords,
# the shipped configuration contains only a read only role called 'monitor' with password 'Neo4j'.
# For more details, see: http://download.oracle.com/javase/8/docs/technotes/guides/management/agent.html
# On Unix based systems the jmx.password file needs to be owned by the user that will run the server,
# and have permissions set to 0600.
# For details on setting these file permissions on Windows see:
#     http://docs.oracle.com/javase/8/docs/technotes/guides/management/security-windows.html
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.port=3637
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.authenticate=true
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.ssl=false
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.password.file=/absolute/path/to/conf/jmx.password
#dbms.jvm.additional=-Dcom.sun.management.jmxremote.access.file=/absolute/path/to/conf/jmx.access

# Some systems cannot discover host name automatically, and need this line configured:
#dbms.jvm.additional=-Djava.rmi.server.hostname=$THE_NEO4J_SERVER_HOSTNAME

# Expand Diffie Hellman (DH) key size from default 1024 to 2048 for DH-RSA cipher suites used in server TLS handshakes.
# This is to protect the server from any potential passive eavesdropping.
dbms.jvm.additional=-Djdk.tls.ephemeralDHKeySize=2048

# This mitigates a DDoS vector.
dbms.jvm.additional=-Djdk.tls.rejectClientInitiatedRenegotiation=true

#********************************************************************
# Wrapper Windows NT/2000/XP Service Properties
#********************************************************************
# WARNING - Do not modify any of these properties when an application
#  using this configuration file has been installed as a service.
#  Please uninstall the service before modifying this section.  The
#  service can then be reinstalled.

# Name of the service
dbms.windows_service_name=neo4j

#********************************************************************
# Other Neo4j system properties
#********************************************************************
dbms.jvm.additional=-Dunsupported.dbms.udc.source=rpm
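
Most of the file above is commented out, and dbms.jvm.additional appears several times, so it is easy to lose track of which settings are actually in effect. The following is a minimal sketch that lists the active settings, keeping every value of a repeated key. The helper name parse_neo4j_conf is hypothetical, and the parser only handles the plain key=value lines used in this file.

```python
from collections import defaultdict


def parse_neo4j_conf(text):
    """Return {key: [values]} for every active (uncommented) setting."""
    settings = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        # Split on the first '=' only, because JVM flags may contain '=' themselves.
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()].append(value.strip())
    return dict(settings)


if __name__ == "__main__":
    # Path taken from the cp command earlier in this section.
    with open("/etc/neo4j/neo4j.conf") as handle:
        for key, values in sorted(parse_neo4j_conf(handle.read()).items()):
            print(key, values)
```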

Start the graph database and check its status:
neo4j start
neo4j status

[root@ainlp ~]# cd /usr/share/neo4j/bin
[root@ainlp bin]# neo4j start
Active database: graph.db
Directories in use:
  home:         /var/lib/neo4j
  config:       /etc/neo4j
  logs:         /var/log/neo4j/
  plugins:      /var/lib/neo4j/plugins
  import:       /var/neo4j/import
  data:         /var/neo4j/db
  certificates: /var/lib/neo4j/certificates
  run:          /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
Started neo4j (pid 2701). It is available at http://0.0.0.0:7474/
There may be a short delay until the server is ready.
See /var/log/neo4j//neo4j.log for current status.
[root@ainlp ~]# neo4j status
Neo4j is running at pid 2701
[root@ainlp ~]# 

II. Setting Up the Backend Services

1. Start the graph database and check its status

cd /data/django-uwsgi

Start the graph database:
neo4j start

Check the status:
neo4j status

[root@ainlp ~]# cd /data/django-uwsgi
[root@ainlp django-uwsgi]# neo4j start
Active database: graph.db
Directories in use:
  home:         /var/lib/neo4j
  config:       /etc/neo4j
  logs:         /var/log/neo4j/
  plugins:      /var/lib/neo4j/plugins
  import:       /var/neo4j/import
  data:         /var/neo4j/db
  certificates: /var/lib/neo4j/certificates
  run:          /var/run/neo4j
Starting Neo4j.
WARNING: Max 1024 open files allowed, minimum of 40000 recommended. See the Neo4j manual.
/usr/share/neo4j/bin/neo4j: line 410: /var/run/neo4j/neo4j.pid: No such file or directory
[root@ainlp django-uwsgi]# neo4j status
Neo4j is not running
[root@ainlp django-uwsgi]# 
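
In the transcript above, the start attempt fails while writing /var/run/neo4j/neo4j.pid, and neo4j status then reports that Neo4j is not running. The source does not diagnose this. One possibility, stated here only as an assumption, is that the run directory /var/run/neo4j does not exist at that point. A minimal sketch for checking the directories from the "Directories in use" listing; the helper name missing_directories is hypothetical.

```python
import os


def missing_directories(paths):
    """Return the entries of `paths` that are not existing directories."""
    return [path for path in paths if not os.path.isdir(path)]


if __name__ == "__main__":
    # Directories copied from the "Directories in use" listing in the transcript.
    configured = [
        "/var/lib/neo4j",
        "/etc/neo4j",
        "/var/log/neo4j/",
        "/var/lib/neo4j/plugins",
        "/var/neo4j/import",
        "/var/neo4j/db",
        "/var/lib/neo4j/certificates",
        "/var/run/neo4j",
    ]
    for path in missing_directories(configured):
        print("missing:", path)
```

If /var/run/neo4j is reported as missing, that would be consistent with the error in the transcript, though other causes such as permissions are not ruled out by this check.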

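2. Start the main service with supervisor and check the service status

  • Start the main service with supervisord; the -c option tells it to read a custom configuration file
  • supervisord.conf is the configuration file in the project's root directory; it defines how the django and nginx processes are monitored and kept alive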
cd /data/django-uwsgi
[root@ainlp django-uwsgi]# supervisord -c supervisord.conf
[root@ainlp django-uwsgi]# 

Check the status of all monitored and supervised processes:

[root@ainlp django-uwsgi]# supervisorctl status all
main_server                      BACKOFF   Exited too quickly (process log may have details)
nginx                            RUNNING   pid 3096, uptime 0:00:06
[root@ainlp django-uwsgi]# 
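
In the output above, nginx is RUNNING but main_server is in BACKOFF, the supervisor state for a process that exited too quickly after starting and will be retried; the process log is the place to look for the cause. When many programs are supervised, a small script can pick out the ones that are not running. A minimal sketch that parses text in the format shown above; the helper names parse_supervisor_status and not_running are hypothetical.

```python
def parse_supervisor_status(output):
    """Map each program name to (state, detail) from `supervisorctl status` text."""
    programs = {}
    for line in output.splitlines():
        # Columns are whitespace-separated: name, state, then free-form detail.
        parts = line.split(None, 2)
        if len(parts) < 2:
            continue
        detail = parts[2].strip() if len(parts) > 2 else ""
        programs[parts[0]] = (parts[1], detail)
    return programs


def not_running(programs):
    """Sorted names of the programs whose state is anything other than RUNNING."""
    return sorted(name for name, (state, _) in programs.items() if state != "RUNNING")


if __name__ == "__main__":
    sample = (
        "main_server                      BACKOFF   Exited too quickly (process log may have details)\n"
        "nginx                            RUNNING   pid 3096, uptime 0:00:06\n"
    )
    print(not_running(parse_supervisor_status(sample)))
```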



Reposted from blog.csdn.net/u013250861/article/details/114296963