In the previous chapters, you learned about the many different machine learning concepts and algorithms that can help us with better and more efficient decision making.
However, machine learning techniques are not limited to offline applications and analyses, and they have become the predictive engine of various web services. For example, popular and useful applications of machine learning models in web applications include spam detection in submission forms, search engines, recommendation systems for media or shopping portals, and many more.

In this chapter, you will learn how to embed a machine learning model into a web application that can not only classify, but also learn from data in real time.
The topics that we will cover are as follows:

• Saving the current state of a trained machine learning model
• Using SQLite databases for data storage
• Developing a web application using the popular Flask web framework
• Deploying a machine learning application to a public web server

Serializing fitted scikit-learn estimators

Training a machine learning model can be computationally expensive, as you saw in Chapter 8, Applying Machine Learning to Sentiment Analysis (https://blog.csdn.net/Linli522362242/article/details/110155280). Surely, we don't want to retrain our model every time we close our Python interpreter and want to make a new prediction or reload our web application?

One option for model persistence is Python's built-in pickle module (https://docs.python.org/3.7/library/pickle.html), which allows us to serialize and deserialize Python object structures to compact byte streams so that we can save our classifier in its current state and reload it if we want to classify new, unlabeled examples, without needing the model to learn from the training data all over again. Before you execute the following code, please make sure that you have trained the out-of-core logistic regression model from the last section of Chapter 8 and have it ready in your current Python session:

import numpy as np
import re
from nltk.corpus import stopwords
 
import nltk
 
nltk.download('stopwords')
 
# stop-words probably bear no (or only a little) useful information that 
# can be used to distinguish between different classes of documents
stop = stopwords.words('english')
 
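def tokenizer(text):
    text = re.sub(r'<[^>]*>', '', text)  # remove HTML markup such as <a> or <br />
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    # \W matches [^A-Za-z0-9_]: replace runs of non-word characters with a space,
    # then append the emoticons (without the '-' nose) at the end
    text = re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')
    # HashingVectorizer expects a list of tokens, so split and drop the stop words
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized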
    
def stream_docs(path):
    with open(path, 'r', encoding='utf-8') as csv:
        next(csv) #skip header
        for line in csv:
            text, label = line[:-3], int(line[-2])  # line[-1] is the trailing newline
            yield text, label

next(stream_docs(path='movie_data.csv'))

def get_minibatch(doc_stream, size):
    docs, y= [], []
    try:
        for _ in range(size):
            text, label = next(doc_stream)
            docs.append(text)
            y.append(label)
    except StopIteration:
        return None, None
    return docs, y

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier
 
vect = HashingVectorizer( decode_error='ignore', 
                          n_features=2**21, 
                          preprocessor=None, 
                          tokenizer=tokenizer )

# Note: scikit-learn 1.1 renamed this loss to 'log_loss'; use that name on newer versions
clf = SGDClassifier(loss='log', random_state=1, max_iter=1)
doc_stream = stream_docs(path='movie_data.csv')

import pyprind
pbar = pyprind.ProgBar(45)
 
classes = np.array([0,1])
for _ in range(45):
    X_train, y_train = get_minibatch( doc_stream, size=1000 )
    if X_train is None:
        break
    X_train = vect.transform(X_train)
    clf.partial_fit(X_train, y_train, classes=classes)
    pbar.update()

X_test, y_test = get_minibatch(doc_stream, size=5000)
X_test = vect.transform(X_test)
print('Accuracy: %.3f' % clf.score(X_test, y_test))

partial_fit(X, y, classes=None, sample_weight=None) : Perform one epoch of stochastic gradient descent on given samples.

Internally, this method uses max_iter = 1. Therefore, it is not guaranteed that a minimum of the cost function is reached after calling it once. Matters such as objective convergence and early stopping should be handled by the user.

classes : ndarray of shape (n_classes,), default=None

Classes across all calls to partial_fit. Can be obtained via np.unique(y_all), where y_all is the target vector of the entire dataset. This argument is required for the first call to partial_fit and can be omitted in the subsequent calls. Note that y doesn't need to contain all labels in classes.

clf = clf.partial_fit(X_test, y_test)
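
The role of the classes argument can be seen in a minimal sketch, assuming scikit-learn and NumPy are installed. The toy arrays (X_first, X_second, and their labels) and the name toy_clf are made up for illustration and are not part of the movie review data:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Hypothetical toy data: the first mini-batch happens to contain only class 0.
X_first = np.array([[0.0, 0.1], [0.2, 0.0]])
y_first = np.array([0, 0])
X_second = np.array([[1.0, 0.9], [0.8, 1.0]])
y_second = np.array([1, 1])

toy_clf = SGDClassifier(random_state=1)

# The first call must list every class that can ever appear ...
toy_clf.partial_fit(X_first, y_first, classes=np.array([0, 1]))
# ... later calls may omit `classes`.
toy_clf.partial_fit(X_second, y_second)

known_classes = toy_clf.classes_.tolist()
print(known_classes)
```

Because both classes were declared up front, the first batch is accepted even though it contains a single label.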

####################################

Note

The pickling section may be a bit tricky, so I included simpler test scripts in this directory (pickle-test-scripts/) to check whether your environment is set up correctly. Basically, they are a trimmed-down version of the relevant sections from Ch08 (https://blog.csdn.net/Linli522362242/article/details/110155280), including a very small movie_data subset.

Executing

python pickle-dump-test.py

will train a small classification model from the movie_data_small.csv and create the 2 pickle files

stopwords.pkl
classifier.pkl

Next, if you execute

python pickle-load-test.py

you should see the following two lines as output:

Prediction: positive
Probability: 85.71%

####################################

After we trained the logistic regression model as shown above, we now save the classifier along with the stop words as serialized objects to our local disk so that we can use the fitted classifier in our web application later.

import pickle
import os

dest = os.path.join('movieclassifier', 'pkl_objects')
if not os.path.exists(dest):
    os.makedirs(dest)

pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None)

Write the pickled representation of the object obj to the open file object file. This is equivalent to Pickler(file, protocol).dump(obj).

Arguments file, protocol, fix_imports and buffer_callback have the same meaning as in the Pickler constructor.

class pickle.Pickler(file, protocol=None, *, fix_imports=True, buffer_callback=None)

This takes a binary file for writing a pickle data stream.

The optional protocol argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to HIGHEST_PROTOCOL. If not specified, the default is DEFAULT_PROTOCOL. If a negative number is specified, HIGHEST_PROTOCOL is selected.

There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.

Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python.
Protocol version 1 is an old binary format which is also compatible with earlier versions of Python.
Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of new-style classes. Refer to PEP 307 for information about improvements brought by protocol 2.
Protocol version 3 was added in Python 3.0. It has explicit support for bytes objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7.
Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to PEP 3154 for information about improvements brought by protocol 4.
Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to PEP 574 for information about improvements brought by protocol 5.

# stop = stopwords.words('english')
# The with-blocks close each file as soon as the object has been written
with open(os.path.join(dest, 'stopwords.pkl'), 'wb') as f:
    pickle.dump(stop, f, protocol=4)
with open(os.path.join(dest, 'classifier.pkl'), 'wb') as f:
    pickle.dump(clf, f, protocol=4)

Using the preceding code, we created a movieclassifier directory where we will later store the files and data for our web application. Within this movieclassifier directory, we created a pkl_objects subdirectory to save the serialized Python objects to our local hard drive or solid-state drive. Via the dump method of the pickle module, we then serialized the trained logistic regression model as well as the stop-word set from the Natural Language Toolkit (NLTK) library, so that we don't have to install the NLTK vocabulary on our server.

The dump method takes as its first argument the object that we want to pickle. For the second argument, we provided an open file object that the Python object will be written to. Via the wb argument inside the open function, we opened the file in binary mode for pickle, and we set protocol=4 to choose an efficient pickle protocol that was added in Python 3.4 and is compatible with Python 3.4 or newer (protocol 5, added in Python 3.8, is more recent but cannot be read by older versions). If you have problems using protocol=4, please check whether you are using a recent Python 3 version; Python 3.7 is recommended for this book. Alternatively, you may consider choosing a lower protocol number.

Also note that if you are using a custom web server, you have to ensure that the Python installation on that server is compatible with this protocol version as well.
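
The dump and load round trip described above can be tried on any picklable object. The following is a minimal sketch that uses a small dictionary (toy_model, a hypothetical stand-in for the fitted classifier) and a temporary directory instead of movieclassifier/pkl_objects:

```python
import os
import pickle
import tempfile

# Hypothetical stand-in for a fitted model: any picklable Python object works.
toy_model = {'weights': [0.25, -1.5, 3.0], 'classes': [0, 1]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'toy_model.pkl')

    # 'wb' opens the file in binary mode; the with-block closes it for us.
    with open(path, 'wb') as f:
        pickle.dump(toy_model, f, protocol=4)

    with open(path, 'rb') as f:
        header = f.read(2)   # pickles with protocol >= 2 start with 0x80 + version
        f.seek(0)
        restored = pickle.load(f)

print(restored == toy_model)
print(header)
print(pickle.DEFAULT_PROTOCOL, pickle.HIGHEST_PROTOCOL)
```

The second byte of the file records the protocol number, which is one way to check which protocol an existing pickle file was written with before moving it to a server.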

Serializing NumPy arrays with joblib

Our logistic regression model contains several NumPy arrays, such as the weight vector, and a more efficient way to serialize NumPy arrays is to use the alternative joblib library. To ensure compatibility with the server environment that we will use in later sections, we will use the standard pickle approach. If you are interested, you can find more information about joblib at https://joblib.readthedocs.io.

We don't need to pickle HashingVectorizer, since it does not need to be fitted. Instead, we can create a new Python script file from which we can import the vectorizer into our current Python session. Now, copy the following code and save it as vectorizer.py in the movieclassifier directory:

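Using os.path.dirname(__file__) in Python: it strips the file name and returns the directory.

(1) When the script that contains print(os.path.dirname(__file__)) is run with its full path, it prints the full path of the directory that contains the script. For example:

python d:/pythonSrc/test/test.py

prints d:/pythonSrc/test

(2) When the script that contains print(os.path.dirname(__file__)) is run with a relative path, it prints an empty directory (on Python versions before 3.9, where __file__ of the main script is not yet converted to an absolute path). For example:

python test.py

prints an empty string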
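
The behaviour of os.path.dirname in the two cases above can be checked on plain path strings, without running a script. This is a minimal sketch; test.py is a hypothetical file name and does not need to exist:

```python
import os

# Path given with directories: dirname returns everything before the file name.
full = os.path.dirname('d:/pythonSrc/test/test.py')

# Bare file name: there is no directory part, so the result is an empty string.
bare = os.path.dirname('test.py')

# Wrapping the path in abspath first avoids the empty-string case.
robust = os.path.dirname(os.path.abspath('test.py'))

print(repr(full), repr(bare))
print(robust)
```

An empty string still works with os.path.join, which is why the vectorizer script can build its path to pkl_objects in either case.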

%%writefile movieclassifier/vectorizer.py

from sklearn.feature_extraction.text import HashingVectorizer
import re
import os
import pickle

cur_dir = os.path.dirname(__file__)  # directory that contains this script
with open(os.path.join(cur_dir, 'pkl_objects', 'stopwords.pkl'), 'rb') as f:
    stop = pickle.load(f)

def tokenizer(text):
    text = re.sub(r'<[^>]*>', '', text)  # remove HTML markup such as <a> or <br />
    emoticons = re.findall(r'(?::|;|=)(?:-)?(?:\)|\(|D|P)', text.lower())
    # \W matches [^A-Za-z0-9_]: replace runs of non-word characters with a space,
    # then append the emoticons (without the '-' nose) at the end
    text = re.sub(r'[\W]+', ' ', text.lower()) + ' '.join(emoticons).replace('-', '')
    tokenized = [w for w in text.split() if w not in stop]
    return tokenized

vect = HashingVectorizer( decode_error='ignore',
                          n_features=2**21,
                          preprocessor=None,
                          tokenizer=tokenizer)
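
As a quick check of the statement that HashingVectorizer does not need to be fitted, the sketch below (assuming scikit-learn is installed) builds two independent instances and compares their output. The names vect_a and vect_b are illustrative, and n_features is reduced only to keep the toy example light:

```python
from sklearn.feature_extraction.text import HashingVectorizer

docs = ["I love this movie", "I disliked this movie"]

# Two independent, never-fitted instances with the same settings.
vect_a = HashingVectorizer(n_features=2**10)
vect_b = HashingVectorizer(n_features=2**10)

X_a = vect_a.transform(docs)
X_b = vect_b.transform(docs)

# The hashing trick is deterministic, so both sparse matrices are identical.
same = (X_a != X_b).nnz == 0
print(X_a.shape, same)
```

Since no state is learned from the data, recreating the vectorizer from code on the server yields the same features that the classifier was trained on, provided the settings and tokenizer match.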


After we have pickled the Python objects and created the vectorizer.py file, it would be a good idea to restart our Python interpreter or Jupyter Notebook kernel to test whether we can deserialize the objects without error.

Pickle can be a security risk

Please note that unpickling data from an untrusted source can be a potential security risk, since the pickle module is not secured against malicious code. Since pickle was designed to serialize arbitrary objects, the unpickling process will execute code that has been stored in a pickle file. Thus, if you receive pickle files from an untrusted source (for example, by downloading them from the internet), please proceed with extra care and unpickle the items in a virtual environment and/or on a non-essential machine that does not store important data that no one except you should have access to.
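
To make the risk concrete, here is a harmless sketch of the mechanism. The Demo class is hypothetical; its __reduce__ method tells pickle which callable to invoke during loading. The built-in int is used here, but a malicious file could name any importable callable:

```python
import pickle

class Demo:
    # __reduce__ tells pickle how to rebuild the object: "call this callable
    # with these arguments". Here the callable is the harmless built-in int.
    def __reduce__(self):
        return (int, ('42',))

payload = pickle.dumps(Demo())

# Loading does not return a Demo instance; it returns whatever the call produced.
result = pickle.loads(payload)
print(type(result).__name__, result)
```

The call happens inside pickle.loads itself, before the caller has any chance to inspect the result.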

From your terminal, navigate to the movieclassifier directory, start a new Python session, and execute the following code to verify that you can import the vectorizer and unpickle the classifier:
First, change the current Python directory to movieclassifier:

import os

os.chdir('movieclassifier')
os.getcwd()

import pickle
import re
import os
from vectorizer import vect

clf = pickle.load( open(os.path.join(os.getcwd(),'pkl_objects', 'classifier.pkl'), 'rb') )

import numpy as np

label = {0:'negative', 1:'positive'}
example =["I love this movie. It's amazing."]

X = vect.transform(example)
print( 'Prediction: %s\nProbability: %.2f%%' % ( label[ clf.predict(X)[0] ], 
                                                 np.max(clf.predict_proba(X))*100
                                               )
     )

Since our classifier returns the class label predictions as integers, we defined a simple Python dictionary to map these integers to their sentiment ("positive" or "negative"). While this is a simple application with two classes only, it should be noted that this dictionary-mapping approach also generalizes to multiclass settings. Furthermore, this mapping dictionary should also be archived alongside the model.

Continuing with the discussion of the previous code example, we then used HashingVectorizer to transform the simple example document into a word vector, X. Finally, we used the predict method of the logistic regression classifier to predict the class label, as well as the predict_proba method to return the corresponding probability of our prediction. Note that the predict_proba method call returns an array with a probability value for each unique class label. Since the class label with the largest probability corresponds to the class label that is returned by the predict call, we used the np.max function to return the probability of the predicted class.
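
The relationship between predict, predict_proba, and np.max can be illustrated without a trained model. In this sketch, proba is a made-up array in the shape that predict_proba returns for a single document; the values are assumptions chosen for illustration:

```python
import numpy as np

label = {0: 'negative', 1: 'positive'}

# Hypothetical predict_proba output for one document: one column per class.
proba = np.array([[0.1429, 0.8571]])

predicted = int(np.argmax(proba, axis=1)[0])   # index of the most probable class
confidence = float(np.max(proba)) * 100        # its probability, in percent

message = 'Prediction: %s\nProbability: %.2f%%' % (label[predicted], confidence)
print(message)
```

With more than two classes the same code applies, as long as the label dictionary has one entry per column.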

Setting up an SQLite database for data storage

In this section, we will set up a simple SQLite database to collect optional feedback about the predictions from users of the web application. We can use this feedback to update our classification model. SQLite is an open source SQL database engine that doesn't require a separate server to operate, which makes it ideal for smaller projects and simple web applications. Essentially, an SQLite database can be understood as a single, self-contained database file that allows us to directly access storage files.

Furthermore, SQLite doesn't require any system-specific configuration and is supported by all common operating systems. It has gained a reputation for being very reliable and is used by popular companies such as Google, Mozilla, Adobe, Apple, Microsoft, and many more. If you want to learn more about SQLite, visit the official website at http://www.sqlite.org.

Fortunately, following Python's batteries included philosophy, there is already an API in the Python standard library, sqlite3, which allows us to work with SQLite databases. (For more information about sqlite3, please visit https://docs.python.org/3.7/library/sqlite3.html.)

By executing the following code, we will create a new SQLite database inside the movieclassifier directory and store two example movie reviews:

import sqlite3 gives our Python program access to the sqlite3 module. The sqlite3.connect() function returns a Connection object that we will use to interact with the SQLite database held in the file reviews.db. The reviews.db file is created automatically by sqlite3.connect() if it does not already exist on our computer.

# Step 1 — Creating a Connection to a SQLite Database

import sqlite3
import os

conn = sqlite3.connect('reviews.db') #create the new database file reviews.db
print(conn.total_changes) #==>0

conn.total_changes is the total number of database rows that have been changed by the connection. Since we have not executed any SQL commands yet, a total_changes value of 0 is correct.

# Step 2 — Adding Data to the SQLite Database

c = conn.cursor()

conn.cursor() returns a Cursor object. Cursor objects allow us to send SQL statements to an SQLite database using cursor.execute().

c.execute('DROP TABLE IF EXISTS tbl_reviews')
c.execute( 'CREATE TABLE tbl_reviews'\
            '(review TEXT,sentiment INTEGER, date TEXT)'
         )

The "CREATE TABLE tbl_reviews ..." string is a SQL statement that creates a table named tbl_reviews with three columns: review of type TEXT, sentiment of type INTEGER, and date of type TEXT.

Now that we have created a table, we can insert rows of data into it:

example1 = 'I love this movie'
c.execute("INSERT INTO tbl_reviews (review, sentiment, date) VALUES(?,?, DATETIME('now'))", 
          (example1, 1)
         )

example2 = 'I disliked this movie'
c.execute("INSERT INTO tbl_reviews (review, sentiment, date) VALUES(?,?, DATETIME('now'))",
          (example2, 0)
         )

conn.commit()
conn.close()

Following the preceding code example, we created a connection (conn) to an SQLite database file by calling the connect method of the sqlite3 library, which created the new database file reviews.db in the movieclassifier directory if it didn't already exist.

Next, we created a cursor via the cursor method, which allows us to traverse over the database records using the versatile SQL syntax. Via the execute calls, we then dropped any existing tbl_reviews table and created a new one. We used this table to store and access database entries. Along with tbl_reviews, we also created three columns in this database table: review, sentiment, and date. We used these to store two example movie reviews and respective class labels (sentiments).

Using the DATETIME('now') SQL command, we also added date and timestamps to our entries. In addition to the timestamps, we used the question mark symbols (?) to pass the movie review texts (example1 and example2) and the corresponding class labels (1 and 0) as positional arguments to the execute method, as members of a tuple. Lastly, we called the commit method to save the changes that we made to the database and closed the connection via the close method.
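
The question mark placeholders matter most when the text contains characters such as quotes. The following minimal sketch uses an in-memory database (':memory:') so that no file is written; the review text is made up for illustration:

```python
import sqlite3

# ':memory:' creates a throwaway database in RAM, so no file is written.
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE tbl_reviews (review TEXT, sentiment INTEGER, date TEXT)')

# A review containing a quote character; the ? placeholder handles it safely.
tricky_review = "It's amazing"
c.execute("INSERT INTO tbl_reviews (review, sentiment, date) "
          "VALUES (?, ?, DATETIME('now'))",
          (tricky_review, 1))
conn.commit()

changes = conn.total_changes   # counts the inserted row, not the CREATE TABLE
stored = c.execute('SELECT review, sentiment FROM tbl_reviews').fetchall()
conn.close()

print(changes, stored)
```

Passing values as a tuple instead of formatting them into the SQL string also protects the web application against SQL injection through user-submitted reviews.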

To check if the entries have been stored in the database table correctly, we will now reopen the connection to the database and use the SQL SELECT command to fetch all rows in the database table that have been committed between the beginning of the year 2017 and today:

conn = sqlite3.connect('reviews.db')
c = conn.cursor()
c.execute("SELECT * "\
          "FROM tbl_reviews "\
          "WHERE date BETWEEN '2017-01-01 00:00:00' AND DATETIME('now')" 
         )
results = c.fetchall()
conn.close()  # close the connection itself, not only the cursor

print(results)
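
The date filter works because the timestamps are stored as ISO-formatted text, which sorts chronologically. A minimal sketch with an in-memory database and two made-up rows with fixed timestamps shows the effect reproducibly:

```python
import sqlite3

conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE tbl_reviews (review TEXT, sentiment INTEGER, date TEXT)')

# Hypothetical rows with fixed timestamps so the result is reproducible.
rows = [('old review', 0, '2016-06-01 12:00:00'),
        ('new review', 1, '2018-03-15 08:30:00')]
c.executemany('INSERT INTO tbl_reviews VALUES (?, ?, ?)', rows)
conn.commit()

# 'YYYY-MM-DD HH:MM:SS' strings sort chronologically, so BETWEEN works on TEXT.
c.execute("SELECT review FROM tbl_reviews "
          "WHERE date BETWEEN ? AND DATETIME('now')",
          ('2017-01-01 00:00:00',))
recent = [row[0] for row in c.fetchall()]
conn.close()

print(recent)
```

Only the row dated after the start of 2017 is returned; the 2016 row falls outside the range.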

Alternatively, we could also use the free DB browser for SQLite app (available at https://sqlitebrowser.org/dl/), which offers a nice graphical user interface for working with SQLite databases, as shown in the following figure:

Developing a web application with Flask

Having prepared the code for classifying movie reviews in the previous subsection, let's discuss the basics of the Flask web framework to develop our web application. Since Armin Ronacher's initial release of Flask in 2010, the framework has gained huge popularity, and examples of popular applications that use Flask include LinkedIn and Pinterest. Since Flask is written in Python, it provides us Python programmers with a convenient interface for embedding existing Python code, such as our movie classifier.

The Flask microframework

Flask is also known as a microframework, which means that its core is kept lean and simple but it can be easily extended with other libraries. Although the learning curve of the lightweight Flask API is not nearly as steep as those of other popular Python web frameworks, such as Django, you are encouraged to take a look at the official Flask documentation at https://flask.palletsprojects.com/en/1.0.x/ to learn more about its functionality.

If the Flask library is not already installed in your current Python environment, you can simply install it via conda or pip from your terminal (at the time of writing, the latest stable release was version 1.0.2): https://blog.csdn.net/Linli522362242/article/details/108037567

Our first Flask web application

In this subsection, we will develop a very simple web application to become more familiar with the Flask API before we implement our movie classifier. This first application that we are going to build consists of a simple web page with a form field that lets us enter a name. After submitting the name to the web application, it will render it on a new page. While this is a very simple example of a web application, it helps with building an understanding of how to store and pass variables and values between the different parts of our code within the Flask framework.

First, we create a directory tree:

1st_flask_app_1/
    app.py
    templates/
        first_app.html

The app.py file will contain the main code that will be executed by the Python interpreter to run the Flask web application. The templates directory is the directory in which Flask will look for static HTML files for rendering in the web browser. Let's now take a look at the contents of app.py:

from flask import Flask, render_template

# Initialize a new Flask instance. The Flask constructor takes the name of the
# current module (__name__) as its argument, which lets Flask find the HTML
# template folder (templates) in the same directory as this module.
app = Flask(__name__)

# app.route(rule, **options)
# rule: the URL that is bound to the function.
# options: keyword arguments that are forwarded to the underlying Rule object.

# In the following example, the route decorator (@app.route('/')) binds the '/'
# URL to the index() function. When this page of the web server is opened in a
# browser, the output of index() is rendered.
@app.route('/')
def index():
    return render_template('first_app.html')

render_template
Definition : render_template(template_name_or_list: Union[Text, Iterable[Text]], **context: Any) -> Text
Renders a template from the template folder with the given context.

param template_name_or_list
the name of the template to be rendered, or an iterable with template names the first one existing will be rendered

param context
the variables that should be available in the context of the template.

Here, our index function simply rendered the first_app.html HTML file, which is located in the templates folder.

if __name__ == '__main__':
    app.run()

We used the run function to run the application on the server only when this script is executed directly by the Python interpreter, which we ensured using the if statement with __name__ == '__main__'. See https://www.tutorialspoint.com/flask/flask_application.htm or https://www.w3cschool.cn/flask/flask_application.html.
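
The route binding can also be exercised without starting a server, using Flask's built-in test client. This is a minimal sketch, assuming Flask is installed; it renders an inline template with render_template_string instead of first_app.html only so that the example fits in a single file:

```python
from flask import Flask, render_template_string

app = Flask(__name__)

@app.route('/')
def index():
    # An inline template is used only to keep the sketch in a single file.
    return render_template_string('<div>Hi, this is my {{ what }}!</div>',
                                  what='first Flask web app')

# The test client sends a request to the app without starting a server.
client = app.test_client()
response = client.get('/')
body = response.get_data(as_text=True)

print(response.status_code)
print(body)
```

Because the module is imported rather than run as a script in such a test, nothing guarded by the __name__ == '__main__' check would be executed.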

app.py

# -*- coding: utf-8 -*-
"""
Created on Fri Dec 11 15:18:59 2020

@author: LlQ
"""

from flask import Flask, render_template

# initialized a new Flask instance with the argument __name__ to let Flask know
# that it can find the HTML template folder (templates) in the same directory 
# where it is located
# Flask constructor takes the name of current module (__name__) as argument.
app = Flask(__name__)


# The route() function of the Flask class is a decorator, which tells the 
# application which URL should call the associated function

# ‘/’ URL is bound with index() function. Hence, when the web page of web server 
# is opened in browser, the output of this function will be rendered.
# OR the URL('/') that should trigger the execution of the index function. 
@app.route('/')
def index():
    return render_template('first_app.html')


if __name__ =='__main__':
    app.run()

Now, let's take a look at the contents of the first_app.html file:

the doctype for HTML5: <!DOCTYPE html>

HTML5 documents should always be opened in standards mode because they are based on the latest specifications for the HTML language.

<!doctype html>
<html>
    <head>
        <title>First app</title>
    </head>
    <body>
        <div>Hi, this is my first Flask web app!</div>
    </body>
</html>

HTML basics
If you are not familiar with the HTML syntax yet, visit https://developer.mozilla.org/en-US/docs/Web/HTML for useful tutorials on learning the basics of HTML.

Here, we have simply filled an empty HTML template file with a <div> element (a block-level element) that contains this sentence: Hi, this is my first Flask web app!.

Conveniently, Flask allows us to run our applications locally, which is useful for developing and testing web applications before we deploy them on a public web server. Now, let's start our web application by executing the following command from the terminal inside the 1st_flask_app_1 directory:

python app.py

Alternatively, run app.py from the Spyder IDE. Either way, open the address that Flask prints in a web browser to view the page.

########################################################
app.py

# -*- coding: utf-8 -*-
"""
Created on Fri Dec 11 15:18:59 2020

@author: LlQ
"""

from flask import Flask, render_template
from gevent import pywsgi  # WSGI server used in the __main__ block below

# initialized a new Flask instance with the argument __name__ to let Flask know
# that it can find the HTML template folder (templates) in the same directory 
# where it is located
# Flask constructor takes the name of current module (__name__) as argument.
app = Flask(__name__)


# The route() function of the Flask class is a decorator, which tells the 
# application which URL should call the associated function

# ‘/’ URL is bound with index() function. Hence, when the web page of web server 
# is opened in browser, the output of this function will be rendered.
# OR the URL('/') that should trigger the execution of the index function. 
@app.route('/')
def index():
    return render_template('first_app.html')


if __name__ =='__main__':
    # app.run()
    server = pywsgi.WSGIServer(('127.0.0.1', 5000), app)
    server.serve_forever()

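If running this script in Spyder fails with

ModuleNotFoundError: No module named 'gevent'

you have to install gevent first (for example, with pip install gevent).

You can then run the script from Spyder or from cmd.exe and open http://127.0.0.1:5000 in a web browser.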

########################################################

Form validation and rendering

In this subsection, we will extend our simple Flask web application with HTML form elements to learn how to collect data from a user using the WTForms library (https://wtforms.readthedocs.org/en/latest/), which can be installed via conda or pip: https://blog.csdn.net/Linli522362242/article/details/108037567

conda install wtforms
# or pip install wtforms

This web application will prompt a user to type in his or her name into a text field, as shown in the following screenshot:

After the submission button (Say Hello) has been clicked and the form has been validated, a new HTML page will be rendered to display the user's name:

Setting up the directory structure

The new directory structure that we need to set up for this application looks like this:

The following are the contents of our modified app.py file:

For example, a form containing a text field can be defined as follows (adapted from https://www.tutorialspoint.com/flask/flask_wtf.htm):
from wtforms import Form, TextAreaField, validators

class HelloForm(Form):
    sayhello = TextAreaField('', [validators.DataRequired()])

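In the linked tutorial, which builds its form with Flask-WTF, a hidden field for a CSRF token is created automatically in addition to the 'sayhello' field. This is to prevent Cross-Site Request Forgery attacks. The plain wtforms Form class used here does not add such a field unless CSRF protection is enabled explicitly.

When rendered, the tutorial's form results in HTML equivalent to the following (the tutorial uses a single-line text field; TextAreaField renders a <textarea> element instead):

<input id = "csrf_token" name = "csrf_token" type = "hidden" />

<label for = "sayhello"></label><br>
<input id = "sayhello" name = "sayhello" type = "text" value = "" required="required" />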

Using wtforms, we extended the index function with a text field that we will embed in our start page using the TextAreaField class. The DataRequired validator attached to the field checks whether a user has provided valid input text or not.

TextAreaField:
Definition : TextAreaField(label=None, validators=None, filters=tuple(), description='', id=None, default=None, widget=None, render_kw=None, _form=None, _name=None, _prefix='', _translations=None, _meta=None)

This field represents an HTML <textarea> and can be used to take multi-line input.

DataRequired
Definition : DataRequired(message=None)

Checks that the field's data is 'truthy'; otherwise stops the validation chain.

This validator checks that the data attribute on the field is a 'true' value (effectively, it does if field.data.) Furthermore, if the data is a string type, a string containing only whitespace characters is considered false.

If the data is empty, also removes prior errors (such as processing errors) from the field.

NOTE this validator used to be called Required but the way it behaved (requiring coerced data, not input data) meant it functioned in a way which was not symmetric to the Optional validator and furthermore caused confusion with certain fields which coerced data to 'falsey' values like 0, Decimal(0), time(0) etc. Unless a very specific reason exists, we recommend using the InputRequired instead.

param message
Error message to raise in case of a validation error.

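For comparison, the following notes describe the Request object of classic ASP (Flask's counterparts are request.form, request.args, and request.values):

Request.Form: retrieves data submitted with the POST method (that is, data submitted from a form). The client-side form must state the POST method explicitly: <form method=post>.
Request.Form(element)[(index)|.Count] : namh=Request.Form("name")
The Form collection retrieves the values of form elements that were posted to the HTTP request body by a form using the POST method.
The Form collection is indexed by the names of the parameters in the request body. The value of Request.Form(element) is an array of all the element values in the request body. Call Request.Form(element).Count to determine the number of values for a parameter. If a parameter does not have multiple values associated with it, the count is 1. If the parameter is not found, the count is 0.
To reference a single value of a form element that has multiple values, you must specify an index. The index can be any number from 1 to Request.Form(element).Count. If you reference a multi-valued form parameter without specifying an index, the data is returned as a comma-separated string.
When you use Request.Form with parameters, the web server parses the HTTP request body and returns the specified data. If the application needs the unparsed form data, it can access it by calling Request.Form without parameters.

Request.QueryString: retrieves the URL parameters from the address bar (data submitted with the GET method).

Request: covers both of the above (data submitted with GET takes precedence); it searches QueryString, Form, and ServerVariables in turn. It is still better to name the collection explicitly,
because the results can differ. If you only need one value from Form but use Request instead of Request.Form, the program also searches QueryString and ServerVariables. If QueryString or ServerVariables happens to contain an item with the same name, you will not get the value you wanted.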

@app.route('/')
def index():
    # request.form is a dictionary-like object containing the form parameters and their values
    form = HelloForm(request.form)

    # The index() function wraps the form data in request.form in a HelloForm
    # object and sends it to first_app.html for rendering.
    return render_template('first_app.html', form=form)

We have already seen that the HTTP method can be specified in the URL rule. The triggered function can collect the received form data as a dictionary-like object and forward it to a template, which renders it on the corresponding web page.

In the example, the '/' URL renders a web page (first_app.html) that contains a form. The data entered in the form is posted to the '/hello' URL, which triggers the hello() function.
@app.route('/hello', methods=['POST'])
def hello():
    form = HelloForm(request.form)
    if request.method == "POST" and form.validate():
        name = request.form['sayhello']
        return render_template('hello.html', name=name)
    # else:
    return render_template('first_app.html', form=form)

We defined a new function, hello, which will render an HTML page, hello.html, after validating the HTML form.

The hello() function reads the submitted form data from request.form, a dictionary-like object, and passes the value of the 'sayhello' field to hello.html for rendering. If validation fails, it renders first_app.html again with the form.

Here, we used the POST method to transport the form data to the server in the message body.
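
How form data travels in the body of a POST request can be tried out with Flask's test client. The sketch below is a simplified stand-in for the hello function: it assumes Flask is installed, checks the field by hand instead of using WTForms, and returns plain text instead of rendering hello.html:

```python
from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/hello', methods=['POST'])
def hello():
    # request.form holds the fields sent in the body of the POST request.
    name = request.form.get('sayhello', '').strip()
    if not name:
        return 'Please enter a name.', 400
    # escape() guards against HTML injected through the submitted text.
    return 'Hello %s' % escape(name)

client = app.test_client()

ok = client.post('/hello', data={'sayhello': 'Ada'})
empty = client.post('/hello', data={'sayhello': '   '})
wrong_method = client.get('/hello')   # GET is not listed in methods=[...]

print(ok.status_code, ok.get_data(as_text=True))
print(empty.status_code, wrong_method.status_code)
```

A GET request to the same URL is rejected because only POST is listed in the route's methods argument.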

if __name__ == '__main__':
    app.run(debug=True)

Finally, by setting the debug=True argument inside the app.run method, we further activated Flask's debugger. This is a useful feature for developing new web applications.

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators

app = Flask(__name__)

class HelloForm(Form):
    sayhello = TextAreaField('', [validators.DataRequired()])
    
@app.route('/')
def index():
    form = HelloForm(request.form)
    # form − It is a dictionary object containing key and value pairs of form parameters and their values
    
    # The index() function collects form data present in request.form in a dictionary 
    # object and sends it for rendering to first_app.html.
    return render_template('first_app.html', form = form)

@app.route('/hello', methods=['POST'])
def hello():
    form = HelloForm(request.form)
    if request.method == "POST" and form.validate():
        name = request.form['sayhello']
        return render_template('hello.html', name=name)
    # else:
    return render_template('first_app.html', form=form)
    
if __name__ == '__main__':
    app.run(debug=True)

Implementing a macro using the Jinja2 templating engine

Now, we will implement a generic macro in the _formhelpers.html file via the Jinja2 templating engine, which we will later import in our first_app.html file to render the text field:

Following the Flask documentation (https://flask.palletsprojects.com/en/1.1.x/patterns/wtforms/), we can write a macro that renders a field with its label and a list of errors, if there are any.

This macro accepts a couple of keyword arguments that are forwarded to WTForms' field function, which renders the field for us. The keyword arguments will be inserted as HTML attributes. So, for example, you can call render_field(form.username, class='username') to add a class to the input element. Note that WTForms returns standard Python unicode strings, so we have to tell Jinja2 that this data is already HTML-escaped with the |safe filter.

{% macro render_field(field) %}
    <dt>{{ field.label }}
    <dd>{{ field(**kwargs)|safe }}
    {% if field.errors %}
        <ul class=errors>
            {% for error in field.errors %}
                <li>{{ error }}</li>
            {% endfor %}
        </ul>
    {% endif %}
    </dd>
    </dt>
{% endmacro %}
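
How the macro's keyword arguments end up as HTML attributes can be sketched in plain Python. The render_textarea helper below is hypothetical and written only for illustration; it is not the WTForms implementation, although it follows the same idea of turning each keyword argument into an attribute.

```python
from html import escape


def render_textarea(name, **kwargs):
    """Hypothetical helper: turn keyword arguments into HTML attributes."""
    attrs = {"id": name, "name": name, **kwargs}
    # Sort for a stable attribute order and escape values for safe quoting.
    rendered = " ".join(
        f'{key}="{escape(str(value), quote=True)}"'
        for key, value in sorted(attrs.items())
    )
    return f"<textarea {rendered}></textarea>"


print(render_textarea("moviereview", cols="30", rows="10"))
```

A call such as render_field(form.moviereview, cols='30', rows='10') in a template relies on the same mechanism to set the size of the text area.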

An in-depth discussion about the Jinja2 templating language is beyond the scope of this book. However, you can find comprehensive documentation on the Jinja2 syntax at http://jinja.pocoo.org.

Adding style via CSS

Next, we will set up a simple Cascading Style Sheets (CSS) file, style.css, to demonstrate how the look and feel of HTML documents can be modified. We have to save the following CSS file, which will simply double the font size of our HTML body elements, in a subdirectory called static, which is the default directory where Flask looks for static files such as CSS. The file content is as follows:

# style.css

body{
    font-size: 2em;
}

Before turning to the modified first_app.html file, which will render a text form where a user can enter a name, recall the original version of the file:

first_app.html (original)

<!doctype html>
<html>
    <head>
        <title>First app</title>
    </head>
    <body>
        <div>Hi, this is my first Flask web app!</div>
    </body>
</html>

The modified first_app.html adds a link element for the stylesheet and a form element. The general syntax of a stylesheet link is:

<link href="file" rel="stylesheet" type="text/css" />

e.g.
<link href="jpsstyles.css" rel="stylesheet" type="text/css" />

The general syntax of a form element is:

<form action="url" method="type" enctype="type"> ... </form>

e.g.
<form id="survey" name="survey" action="http://www.redballpizza.com/cgi-bin/survey" method="post">

where url specifies the filename and location of the program that processes the form, the method attribute specifies how Web browsers should send data to the server, and the enctype attribute specifies the format of the data stored in the fields.

The method attribute has two possible values: get and post. The get method, the default, appends the form data to the end of the URL specified in the action attribute. The post method, on the other hand, sends form data in a separate data stream. Each method has its uses. Web searches often use the get method because the search parameters become part of the URL and thus can be bookmarked for future searching using the same parameters. However, this also can result in a long and cumbersome URL if several fields and field values are attached to the URL, and it may even result in data being truncated if the URL text string becomes too long. There is also a security risk in having name/value pairs attached to a URL that easily can be read by others. Your Web site administrator can supply the necessary information about which of the two methods you should use when accessing the scripts running on its server.
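
The difference between the two methods can be sketched with the standard library. The field names and the action URL below are assumptions chosen for illustration (the /hello route in this chapter accepts POST only), and the snippet only builds the strings a browser would send; it does not contact a server.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical form fields and action URL, for illustration only.
fields = {"sayhello": "Lin", "lang": "en"}
action = "http://127.0.0.1:5000/hello"

# GET: the encoded fields become part of the URL itself.
get_url = f"{action}?{urlencode(fields)}"

# POST: the URL stays clean; the same encoding travels in the request body.
post_url = action
post_body = urlencode(fields).encode("ascii")

print(get_url)
print(post_url, post_body)

# The server decodes either form the same way.
decoded = parse_qs(post_body.decode("ascii"))
print(decoded)
```

The encoding used here corresponds to the default application/x-www-form-urlencoded type.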

The enctype attribute determines how the form data should be encoded as it is sent to the server. Figure 6-5 describes the three most common encoding types.

first_app.html (modified)

<!doctype html>
<html>
    <head>
        <title>First app</title>
        <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}" type="text/css" />
    </head>
    
    <body>
        {% from "_formhelpers.html" import render_field %}
        
        <div>What's your name?</div>
        
        <!-- app.py must be running first. It defines the form class:
        class HelloForm(Form):
            sayhello = TextAreaField('', [validators.DataRequired()])

        and processes the following form via:
        @app.route('/hello', methods=['POST'])
        ...

        The '/' URL renders this web page (first_app.html), which contains a form. The data
        filled in is posted to the '/hello' URL, which triggers the hello() function.
        -->
        <form method=post action="/hello">
            <dl>
                {{ render_field(form.sayhello) }}
            </dl>
            <input type=submit value="Say Hello" name="submit_btn" />
        </form>
    </body>
</html>

The <dl> </dl> element encloses a definition list that is built from dt and dd elements.

#########################################

For example, a form containing a text field can be designed as follows (adapted from https://www.tutorialspoint.com/flask/flask_wtf.htm):
from wtforms import Form, TextAreaField, validators

class HelloForm(Form):
    sayhello = TextAreaField('', [validators.DataRequired()])

When rendered, the field results in HTML roughly equivalent to the following. A TextAreaField is rendered as a textarea element, and the exact attributes depend on the WTForms version. A hidden csrf_token input appears only when CSRF protection is enabled, for example with Flask-WTF's FlaskForm; the plain wtforms Form used here does not add one by default.

<label for="sayhello"></label>
<textarea id="sayhello" name="sayhello"></textarea>

#########################################

Run app.py and open http://127.0.0.1:5000/ in the browser. The index() function passes the form object to first_app.html, which calls {{ render_field(form.sayhello) }} to render the field with the render_field macro from _formhelpers.html.

Enter "Lin" and click the Say Hello button to post the form to '/hello'. The hello() function validates the data (form = HelloForm(request.form) in app.py) and calls render_template('hello.html', name=name), which renders hello.html.

In the header section of first_app.html, we loaded the CSS file. It should now alter the size of all text elements in the HTML body. In the HTML body section, we imported the form macro from _formhelpers.html, and we rendered the sayhello form that we specified in the app.py file. Furthermore, we added a button to the same form element so that a user can submit the text field entry.

Lastly, we create a hello.html file that will be rendered via the line return render_template('hello.html', name=name) inside the hello function, which we defined in the app.py script to display the text that a user submitted via the text field. The code is as follows:
hello.html

@app.route('/hello', methods=['POST'])
def hello():
    form = HelloForm(request.form)
    if request.method == "POST" and form.validate():
        name = request.form['sayhello']
        return render_template('hello.html', name=name)

<!doctype html>
<html>
    <head>
        <title>First app</title>
        <link rel='stylesheet' href="{{ url_for('static', filename='style.css') }}" />
    </head>

    <body>
        <div>Hello {{ name }}</div>
    </body>
</html>

Having set up our modified Flask web application, we can run it locally by executing the following command from the app's main directory and we can view the result in our web browser at http://127.0.0.1:5000/:

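In Spyder, run app.py, open the page rendered from first_app.html in the browser, enter "Lin", and click the Say Hello button.

If the browser reports jinja2.exceptions.UndefinedError: 'urlfor' is undefined, the template misspells the url_for function as urlfor.

Solution: <link rel='stylesheet' href="{{ url_for('static', filename='style.css') }}" />

Alternatively, start the application from the command prompt (cmd):
python app.py

Since we installed wtforms only in the tensorflow environment, run the command from within that environment:
python app.py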

If you are new to web development, some of those concepts may seem very complicated at first sight. In that case, I encourage you to simply set up the preceding files in a directory on your hard drive and examine them closely. You will see that the Flask web framework is actually pretty straightforward and much simpler than it might initially appear! Also, for more help, don't forget to look at the excellent Flask documentation and examples at http://flask.pocoo.org/docs/0.10/.

Turning the movie classifier into a web application

Now that we are somewhat familiar with the basics of Flask web development, let's advance to the next step and implement our movie classifier into a web application. In this section, we will develop a web application that will first prompt a user to enter a movie review, as shown in the following screenshot:

After the review has been submitted, the user will see a new page that shows the predicted class label and the probability of the prediction. Furthermore, the user will be able to provide feedback about this prediction by clicking on the Correct or Incorrect button, as shown in the following screenshot:

If a user clicked on either the Correct or Incorrect button, our classification model will be updated with respect to the user's feedback. Furthermore, we will also store the movie review text provided by the user, as well as the suggested class label, which can be inferred from the button click, in an SQLite database for future reference. (Alternatively, a user could skip the update step and click the Submit another review button to submit another review.)

The third page that the user will see after clicking on one of the feedback buttons is a simple thank you screen with a Submit another review button that redirects the user back to the start page. This is shown in the following screenshot:

Live demo

Before we take a closer look at the code implementation of this web application, take a look at this live demo at http://raschkas.pythonanywhere.com to get a better understanding of what we are trying to accomplish in this section.

Files and folders – looking at the directory tree

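To start with the big picture, let's take a look at the directory tree that we are going to create for this movie classification application, which is shown here:

movieclassifier/
    app.py
    reviews.db
    vectorizer.py
    pkl_objects/
        classifier.pkl
        stopwords.pkl
    static/
        style.css
    templates/
        _formhelpers.html
        results.html
        reviewform.html
        thanks.html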

Earlier in this chapter, we created the vectorizer.py file, the SQLite database, reviews.db, and the pkl_objects subdirectory with the pickled Python objects.

The app.py file in the main directory is the Python script that contains our Flask code, and we will use the reviews.db database file (which we created earlier in this chapter) to store the movie reviews that are being submitted to our web application. The templates subdirectory contains the HTML templates that will be rendered by Flask and displayed in the browser, and the static subdirectory will contain a simple CSS file to adjust the look of the rendered HTML code.
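
Before starting the application, it can help to confirm that all of these files are in place. The following is a minimal sketch; the missing_files helper and the EXPECTED list are assumptions written for this illustration, with the file names taken from this chapter.

```python
from pathlib import Path

# Expected layout, based on the files named in this chapter.
EXPECTED = [
    "app.py",
    "reviews.db",
    "vectorizer.py",
    "pkl_objects/classifier.pkl",
    "pkl_objects/stopwords.pkl",
    "static/style.css",
    "templates/_formhelpers.html",
    "templates/results.html",
    "templates/reviewform.html",
    "templates/thanks.html",
]


def missing_files(base_dir):
    """Return the expected files that do not exist under base_dir."""
    base = Path(base_dir)
    return [rel for rel in EXPECTED if not (base / rel).is_file()]


if __name__ == "__main__":
    # Assumes the script is run from inside the movieclassifier directory.
    for rel in missing_files("."):
        print("missing:", rel)
```

An empty result means that the directory matches the layout described above.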

Getting the movieclassifier code files

A separate directory containing the movie review classifier application with the code discussed in this section is provided with the code examples for this book, which you can either obtain directly from Packt or download from GitHub at https://github.com/rasbt/python-machine-learning-book-3rd-edition/. The code in this section can be found in the .../code/ch09/movieclassifier subdirectory.

Implementing the main application as app.py

Since the app.py file is rather long, we will conquer it in two steps. The first section of app.py imports the Python modules and objects that we are going to need, as well as the code to unpickle and set up our classification model:
app.py

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np

# import HashingVectorizer from local dir
from vectorizer import vect

app = Flask(__name__)

######## Preparing the Classifier ########
cur_dir = os.path.dirname(__file__) 
# print(cur_dir) # C:\Users\LlQ\0Python Machine Learning\movieclassifier

clf = pickle.load( open( os.path.join(cur_dir,
                                      'pkl_objects', # folder name
                                      'classifier.pkl'
                                      ), 
                         'rb' 
                        ) )
db = os.path.join(cur_dir, 'reviews.db')

def classify(document):
    label = {0:'negative', 1:'positive'} # since classes = np.array([0,1])
    X = vect.transform( [document] )
    y = clf.predict(X)[0] # clf = SGDClassifier(loss='log', random_state=1, max_iter=1)
    proba = np.max( clf.predict_proba(X) )
    return label[y], proba

def train( document, y):
    X = vect.transform( [document] )
    clf.partial_fit(X, [y]) # Perform one epoch of stochastic gradient descent on given samples
    
def sqlite_entry(path, document, y):
    conn = sqlite3.connect(path) # path ： reviews.db file
    # connection.cursor() returns a Cursor object. 
    # Cursor objects allow us to send SQL statements to a SQLite database using cursor.execute()
    c = conn.cursor()
    c.execute("INSERT INTO tbl_reviews(review, sentiment, date)"\
              " VALUES( ?,?,DATETIME('now') )", (document, y)
             )
    conn.commit()
    conn.close()
    
######## Flask ########
class ReviewForm( Form ):
    moviereview = TextAreaField('', 
                                [validators.DataRequired(), validators.length(min=15)]
                               )
# When rendered, the field results in HTML roughly like the following
# (the exact attributes depend on the WTForms version):
# <label for="moviereview"></label>
# <textarea id="moviereview" name="moviereview"></textarea>

@app.route('/')
def index():
    form = ReviewForm( request.form )
    return render_template( 'reviewform.html', form=form )

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm( request.form )
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify( review )
        return render_template('results.html',
                               content=review,
                               prediction=y,
                               probability=round(proba*100, 2)
                              )
    # else:
    return render_template('reviewform.html', form=form)

@app.route('/thanks', methods=['POST'])
def feedback():
    feedback = request.form['feedback_button']
    review = request.form['review']
    prediction = request.form['prediction']
    
    inv_label = {'negative':0, 'positive':1}
    y = inv_label[prediction]
    if feedback =='Incorrect':
        y = int( not(y) )
    train(review, y)
    sqlite_entry(db, review, y)
    return render_template('thanks.html')

if __name__ == '__main__':
    app.run( debug=True )

This first part of the app.py script should look very familiar by now. We simply imported HashingVectorizer and unpickled the logistic regression classifier. Next, we defined a classify function to return the predicted class label, as well as the corresponding probability prediction of a given text document. The train function can be used to update the classifier, given that a document and a class label are provided.

Using the sqlite_entry function, we can store a submitted movie review in our SQLite database along with its class label and timestamp for our personal records. Note that the clf object will be reset to its original, pickled state if we restart the web application. At the end of this chapter, you will learn how to use the data that we collect in the SQLite database to update the classifier permanently.
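
The database write can be tried in isolation with an in-memory database. This is a sketch under assumptions: the table schema below is assumed (the real table was created earlier in the chapter), and the store_review helper is a hypothetical variant that takes an open connection instead of a file path.

```python
import sqlite3


def store_review(conn, document, y):
    """Store one review with its label and a timestamp."""
    # The ? placeholders keep the user-supplied text out of the SQL string.
    conn.execute(
        "INSERT INTO tbl_reviews (review, sentiment, date) "
        "VALUES (?, ?, DATETIME('now'))",
        (document, y),
    )
    conn.commit()


# Assumed schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tbl_reviews (review TEXT, sentiment INTEGER, date TEXT)"
)
store_review(conn, "I love this movie", 1)
rows = conn.execute(
    "SELECT review, sentiment, date FROM tbl_reviews"
).fetchall()
print(rows)
conn.close()
```

Passing the review text as a parameter rather than formatting it into the SQL string matters here, because the text comes directly from users of the web application.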

We defined a ReviewForm class that instantiates a TextAreaField, which will be rendered in the reviewform.html template file (the landing page of our web application). This, in turn, will be rendered by the index function. With the validators.length(min=15) parameter, we require the user to enter a review that contains at least 15 characters. Inside the results function, we fetch the contents of the submitted web form and pass it on to our classifier to predict the sentiment of the movie review, which will then be displayed in the rendered results.html template.
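
The effect of the two validators can be approximated in plain Python. The validate_review function below is a hypothetical stand-in, not the WTForms implementation, and the error messages are illustrative.

```python
def validate_review(text, min_length=15):
    """Approximate the two checks: non-empty input and a minimum length."""
    errors = []
    if not text or not text.strip():
        # Corresponds to the "data required" check.
        errors.append("This field is required.")
    elif len(text) < min_length:
        # Corresponds to the minimum-length check.
        errors.append(f"Field must be at least {min_length} characters long.")
    return errors


print(validate_review(""))
print(validate_review("Too short"))
print(validate_review("I love this movie."))
```

An empty list means that the review would be passed on to the classifier; otherwise, the form is rendered again together with the error messages.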

The feedback function, which we implemented in app.py above, may look a little bit complicated at first glance. It essentially fetches the predicted class label from the results.html template if a user clicked on the Correct or Incorrect feedback button, and it transforms the predicted sentiment back into an integer class label that will be used to update the classifier via the train function, which we implemented in the first section of the app.py script. Also, a new entry to the SQLite database will be made via the sqlite_entry function if feedback was provided, and eventually, the thanks.html template will be rendered to thank the user for the feedback.
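
The label logic inside the feedback function can be isolated as a small helper. The resolve_label function below is a hypothetical refactoring for illustration; it mirrors the inv_label lookup and the inversion on Incorrect from app.py.

```python
INV_LABEL = {"negative": 0, "positive": 1}


def resolve_label(prediction, feedback):
    """Return the integer label implied by the prediction and the clicked button."""
    y = INV_LABEL[prediction]
    # "Incorrect" means the true label is the opposite of the prediction.
    if feedback == "Incorrect":
        y = 1 - y
    return y


for prediction in ("negative", "positive"):
    for feedback in ("Correct", "Incorrect"):
        print(prediction, feedback, "->", resolve_label(prediction, feedback))
```

The resulting integer is the value that is passed to both train and sqlite_entry.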

Setting up the review form

Next, let's take a look at the reviewform.html template, which constitutes the starting page of our application:

# reviewform.html

<!doctype html>
<html>
    <head>
        <title>Movie Classification</title>
        <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}" />
    </head>
    
    <body>
        <h2>Please enter your movie review:</h2>
        
        {% from "_formhelpers.html" import render_field %}
        
        <form method=post action="/results">
            <dl>
                {{ render_field( form.moviereview, cols='30', rows='10') }}
            </dl>
            
            <div>
                <input type=submit value='Submit review' name='submit_btn' /> 
            </div>
        </form>
    </body>
</html>

_formhelpers.html

{% macro render_field(field) %}
    <dt>{{ field.label }}
    <dd>{{ field(**kwargs)|safe }}
    {% if field.errors %}
        <ul class=errors>
            {% for error in field.errors %}
                <li>{{ error }}</li>
            {% endfor %}
        </ul>
    {% endif %}
    </dd>
    </dt>
{% endmacro %}

style.css

Each template imports a CSS file (style.css) in its head section, including the results.html file. The setup of this file is quite simple: it limits the width of the contents of this web application to 600 pixels and moves the Correct and Incorrect buttons, which sit in a div with the id button, down by 20 pixels:

body{
    width:600px;
}

#button{
    padding-top: 20px;
}

This CSS file is merely a placeholder, so please feel free to modify it to adjust the look and feel of the web application to your liking.

Here, we simply imported the same _formhelpers.html template that we defined in the Form validation and rendering section earlier in this chapter. The render_field function of this macro is used to render a TextAreaField where a user can provide a movie review and submit it via the Submit review button displayed at the bottom of the page. This TextAreaField is 30 columns wide and 10 rows tall, and will look like this:

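If the page does not look as expected, right-click it and choose Inspect to check the rendered HTML.

Solution: change the port by passing port=12345 to app.run at the end of app.py:
app.py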

if __name__ == '__main__':
    app.run( port=12345, debug=True )

########################################
Note: Remember to clear the browser cache every time you change the content of the stylesheet file, for example, via the browser's Clear browsing data option.

########################################

Creating a results page template

Our next template, results.html, looks a little bit more interesting:

<!doctype html>
<html>
    <head>
        <title>Movie Classification</title>
        <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}" />
    </head>
    
    <body>
    
        <h3>Your movie review:</h3>
        <!--
            reviewform.html was filled with data <form method=post action="/results">
            ==> app.py(review,prediction,probability) ==> render_template('results.html')
        -->
        <div>{{ content }}</div>
        
        <h3>Prediction:</h3>
        <div>This movie review is <strong>{{ prediction }}</strong>
            (probability: {{ probability }}%).
        </div>
        
        <div id='button'>
            <form action='/thanks' method='post'>
                <input type=submit value='Correct' name='feedback_button' />
                <input type=submit value='Incorrect' name='feedback_button' />
                <input type=hidden value='{{ prediction }}' name='prediction' />
                <input type=hidden value='{{ content }}' name='review' />
            </form>
        </div>
        
        <div id='button'>
            <form action='/'>
                <input type=submit value='Submit another review' />
            </form>
        </div>
               
    </body>
</html>

First, we inserted the submitted review, as well as the results of the prediction, in the corresponding fields {{ content }}, {{ prediction }}, and {{ probability }}. You may notice that we used the {{ content }} and {{ prediction }} placeholder variables (in this context, also known as hidden fields) a second time in the form that contains the Correct and Incorrect buttons. This is a workaround to POST those values back to the server to update the classifier and store the review in case the user clicks on one of those two buttons (Correct or Incorrect).
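
Because the review text is embedded in an HTML attribute, quotes and angle brackets in it must be escaped; Flask enables Jinja2's autoescaping for .html templates, so the templates above are covered. The effect can be sketched with the standard library; the hidden_input helper below is hypothetical and only imitates what the rendered template produces.

```python
from html import escape


def hidden_input(name, value):
    """Build a hidden input whose value is safe to embed in an attribute."""
    # quote=True also escapes ' and ", which would otherwise end the attribute.
    return f"<input type=hidden value='{escape(value, quote=True)}' name='{name}' />"


print(hidden_input("prediction", "positive"))
print(hidden_input("review", "It's a <great> movie"))
```

The browser converts the escaped characters back when it submits the form, so the server receives the original review text.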

The last HTML file we will implement for our web application is the thanks.html template. As the name suggests, it simply provides a nice thank you message to the user after providing feedback via the Correct or Incorrect button. Furthermore, we will put a Submit another review button at the bottom of this page, which will redirect the user to the starting page. The contents of the thanks.html file are as follows:

<!doctype html>
<html>
    <head>
        <title>Movie Classification</title>
        <link rel="stylesheet" href="{{ url_for('static', filename='style.css') }}" />
    </head>
    
    <body>
        <h3>Thank you for your feedback!</h3>
        <div id='button'>
            <form action='/'>
                <input type=submit value='Submit another review' />
            </form>
        </div>
    </body>
    
</html>

Now, it would be a good idea to start the web application locally from our command-line terminal via the following command before we advance to the next subsection and deploy it on a public web server:

python app.py

After we have finished testing our application, we also shouldn't forget to remove the debug=True argument in the app.run() command of our app.py script (or set debug=False ) as illustrated in the following figure:

Deploying the web application to a public server

After we have tested the web application locally, we are now ready to deploy our web application onto a public web server. For this tutorial, we will be using the PythonAnywhere web hosting service, which specializes in the hosting of Python web applications and makes it extremely simple and hassle-free. Furthermore, PythonAnywhere offers a beginner account option that lets us run a single web application free of charge.

Creating a PythonAnywhere account

To create a new PythonAnywhere account, we visit the website at https://www.pythonanywhere.com/ and click on the Pricing & signup link that is located in the top-right corner. Next, we click on the Create a Beginner account button where we need to provide a username, password, and valid email address. After we have read and agreed to the terms and conditions, we should have a new account.

Unfortunately, the free beginner account doesn't allow us to access the remote server via the Secure Shell (SSH) protocol from our terminal. Thus, we need to use the PythonAnywhere web interface to manage our web application. But before we can upload our local application files to the server, we need to create a new web application for our PythonAnywhere account. After we click on the Dashboard button in the top-right corner, we have access to the control panel shown at the top of the page. Next, we click on the Web tab that is now visible at the top of the page. We proceed by clicking on the +Add a new web app button on the left, which lets us create a new Python 3.7 Flask web application that we name movieclassifier.

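In the setup wizard, select Flask as the web framework, then select Python 3.7 and set the path of the Flask file to /home/LlQ54951/movieclassifier/app.py (LlQ54951 is the username in this example), so that the web application is named movieclassifier.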

Uploading the movie classifier application

After creating a new application for our PythonAnywhere account, we head over to the Files tab to upload the files from our local movieclassifier directory using the PythonAnywhere web interface. After uploading the web application files that we created locally on our computer, we should have a movieclassifier directory in our PythonAnywhere account. It will contain the same directories and files as our local movieclassifier directory, as shown in the following screenshot:

Create three directories: pkl_objects, static, and templates.

Upload to the movieclassifier directory: app.py (replacing the generated file with ours), reviews.db, and vectorizer.py.

Upload to the movieclassifier/pkl_objects directory: classifier.pkl and stopwords.pkl.

Upload to the movieclassifier/static directory: style.css.

Upload to the movieclassifier/templates directory: _formhelpers.html, results.html, reviewform.html, and thanks.html.

Then, we head over to the Web tab one more time and click on the Reload <username>.pythonanywhere.com button to propagate the changes and refresh our web application. Finally, our web application should now be up and running and publicly available via <username>.pythonanywhere.com.

In this example, we click Reload LlQ54951.pythonanywhere.com and then click the LlQ54951.pythonanywhere.com link next to Configuration for to open the web application.

Troubleshooting

Unfortunately, web servers can be quite sensitive to the tiniest problems in our web application. If you are experiencing problems with running the web application on PythonAnywhere and are receiving error messages in your browser, you can check the server and error logs, which can be accessed from the Web tab in your PythonAnywhere account, to better diagnose the problem.

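Solution: Go to the Consoles tab and select an IPython 3.8 console, since app.py runs under Python 3.8 here. Then upgrade scikit-learn:

pip install scikit-learn --user --upgrade

Next, open app.py and click Run. Click Reload LlQ54951.pythonanywhere.com, and then click the LlQ54951.pythonanywhere.com link next to Configuration for to open the web application.

To test the deployed application, type in "I love this movie." and click the Submit review button, then click the Correct button, and finally click the Submit another review button.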
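
Since the fix above was a package upgrade, one possible cause of such errors (an assumption here, not something the logs in this section confirm) is a mismatch between the package versions used locally to pickle the classifier and those installed on the server. The following sketch prints the installed versions so that both environments can be compared; installed_version is a hypothetical helper.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names as published on PyPI; run this locally and on the server.
for name in ("scikit-learn", "numpy", "Flask", "WTForms"):
    print(name, installed_version(name))
```

If the two outputs differ for scikit-learn, re-creating the pickle file with matching versions is worth trying before looking further.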

Updating the movie classifier

While our predictive model is updated on the fly whenever a user provides feedback about the classification, the updates to the clf object will be reset if the web server crashes or restarts. If we reload the web application, the clf object will be reinitialized from the classifier.pkl pickle file. One option to apply the updates permanently would be to pickle the clf object once again after each update. However, this would become computationally very inefficient with a growing number of users and could corrupt the pickle file if users provide feedback simultaneously.
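
If we did want to re-pickle the classifier from the running application, the risk of a partially written file could be reduced by writing to a temporary file first and renaming it afterward. This is a sketch of that general technique, not part of the chapter's application: dump_atomically is a hypothetical helper, and it neither addresses the computational cost nor prevents one user's update from overwriting another's.

```python
import os
import pickle
import tempfile


def dump_atomically(obj, dest_path, protocol=4):
    """Pickle obj to a temporary file, then rename it over dest_path."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(obj, f, protocol=protocol)
        # os.replace swaps the file in a single step on the same filesystem.
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Small demonstration with a stand-in object instead of a real classifier.
with tempfile.TemporaryDirectory() as demo_dir:
    target = os.path.join(demo_dir, "classifier.pkl")
    dump_atomically({"weights": [0.1, 0.2]}, target)
    with open(target, "rb") as f:
        print(pickle.load(f))
```

Readers of the file see either the old or the new version, but never a half-written one.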

An alternative solution is to update the predictive model from the feedback data that is being collected in the SQLite database. One option would be to download the SQLite database from the PythonAnywhere server, update the clf object locally on our computer, and upload the new pickle file to PythonAnywhere. To update the classifier locally on our computer, we create an update.py script file in the movieclassifier directory with the following contents:
#update.py

import pickle
import sqlite3
import numpy as np
import os

# import HashingVectorizer from local dir
from vectorizer import vect

def update_model(db_path, model, batch_size=10000):
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('SELECT * from tbl_reviews')
    
    results = c.fetchmany(batch_size)
    
    while results:
        data = np.array(results)
        X = data[:,0] #review
        y = data[:,1].astype(int) # data[:,1] : 0 or 1
        
        classes = np.array([0,1]) #{0:'negative', 1:'positive'} 
        X_train = vect.transform(X)
        model.partial_fit(X_train, y, classes=classes)
        results = c.fetchmany(batch_size)
    
    conn.close()
    return model

cur_dir = os.path.dirname(__file__) # remove the filename and get the current directory

# Load the pickled classifier; the with block closes the file afterwards
with open(os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 'rb') as f:
    clf = pickle.load(f)
db = os.path.join(cur_dir, 'reviews.db')
clf = update_model(db_path=db, model=clf, batch_size=10000)

# Overwrite classifier.pkl with the updated model.
# Comment out the following lines if you do not want
# to update the classifier.pkl file permanently.
with open(os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 'wb') as f:
    pickle.dump(clf, f, protocol=4)

The update_model function will fetch entries from the SQLite database in batches of 10,000 entries at a time, unless the database contains fewer entries. Alternatively, we could also fetch one entry at a time by using fetchone instead of fetchmany, which would be computationally very inefficient. However, keep in mind that using the alternative fetchall method could be a problem if we are working with large datasets that exceed the computer or server's memory capacity.
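The batching behavior of fetchmany can be checked in isolation. The following is a minimal sketch, assuming an in-memory SQLite database with the same tbl_reviews layout (review, sentiment, date); the helper name iter_batches and the 25 dummy rows are made up for illustration and are not part of the movieclassifier project:

```python
import sqlite3


def iter_batches(cursor, batch_size):
    """Yield lists of rows from the cursor until it is exhausted."""
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:  # fetchmany returns an empty list when no rows are left
            break
        yield rows


# Hypothetical in-memory database that mimics the tbl_reviews table
conn = sqlite3.connect(':memory:')
c = conn.cursor()
c.execute('CREATE TABLE tbl_reviews (review TEXT, sentiment INTEGER, date TEXT)')
dummy_rows = [('review %d' % i, i % 2) for i in range(25)]
c.executemany("INSERT INTO tbl_reviews (review, sentiment, date)"
              " VALUES (?, ?, DATETIME('now'))", dummy_rows)
conn.commit()

c.execute('SELECT * from tbl_reviews')
batch_sizes = [len(batch) for batch in iter_batches(c, batch_size=10)]
conn.close()

print(batch_sizes)  # [10, 10, 5]
```

Only one batch is held in memory at a time, and the last batch is simply shorter when the number of rows is not a multiple of the batch size.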

Now that we have created the update.py script, we could also upload it to the movieclassifier directory on PythonAnywhere and import the update_model function in the main application script, app.py, to update the classifier from the SQLite database every time we restart the web application. In order to do so, we first comment out the pickle.dump call at the bottom of update.py, and then add a line of code to import the update_model function from the update.py script at the top of app.py:

#update.py

import pickle
import sqlite3
import numpy as np
import os

# import HashingVectorizer from local dir
from vectorizer import vect

def update_model(db_path, model, batch_size=10000):
    conn = sqlite3.connect(db_path)
    c = conn.cursor()
    c.execute('SELECT * from tbl_reviews')
    
    results = c.fetchmany(batch_size)
    
    while results:
        data = np.array(results)
        X = data[:,0] #review
        y = data[:,1].astype(int) # data[:,1] : 0 or 1
        
        classes = np.array([0,1]) #{0:'negative', 1:'positive'} 
        X_train = vect.transform(X)
        model.partial_fit(X_train, y, classes=classes)
        results = c.fetchmany(batch_size)
    
    conn.close()
    return model

cur_dir = os.path.dirname(__file__) # remove the filename and get the current directory

with open(os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 'rb') as f:
    clf = pickle.load(f)
db = os.path.join(cur_dir, 'reviews.db')
clf = update_model(db_path=db, model=clf, batch_size=10000)

# Uncomment the following lines if you are sure that
# you want to update the classifier.pkl file
#pickle.dump( clf, 
#             open( os.path.join(cur_dir, 'pkl_objects', 'classifier.pkl'), 
#                   'wb'
#                 ),
#             protocol=4 )

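We comment out the pickle.dump(clf, ...) call because importing update_model into app.py executes all module-level code in update.py every time the web application restarts, which would otherwise overwrite the classifier.pkl file on each restart.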
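To see why module-level code matters here, the following minimal sketch writes a small throwaway module and imports it. The module name demo_update and its contents are hypothetical and not part of the movieclassifier project. Statements at module level run on import, while statements under an if __name__ == "__main__": guard do not. Such a guard would be one alternative to commenting out pickle.dump, assuming update.py should still be runnable directly as a script:

```python
import importlib.util
import os
import tempfile

# Source code of a hypothetical stand-in for update.py
MODULE_SOURCE = '''
side_effects = []

def update_model():
    return "updated"

side_effects.append("module level")   # runs on every import

if __name__ == "__main__":
    side_effects.append("guarded")     # runs only via `python demo_update.py`
'''


def import_from_source(source, name="demo_update"):
    """Write the source to a temporary file and import it as a module."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, name + ".py")
        with open(path, "w") as f:
            f.write(source)
        spec = importlib.util.spec_from_file_location(name, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # executes the module-level code
    return module


demo = import_from_source(MODULE_SOURCE)
print(demo.side_effects)  # ['module level']
```

The guarded statement is skipped because the imported module's __name__ is "demo_update" rather than "__main__".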

app.py

from flask import Flask, render_template, request
from wtforms import Form, TextAreaField, validators
import pickle
import sqlite3
import os
import numpy as np
# Step 1: import the update_model function from the local directory
from update import update_model

# import HashingVectorizer from local dir
from vectorizer import vect

app = Flask(__name__)

######## Preparing the Classifier ########
cur_dir = os.path.dirname(__file__) 
#print(cur_dir) # C:\Users\LlQ\0Python Machine Learning\movieclassifier

# Step 2: load the pickled classifier and set the path to the database
clf = pickle.load( open( os.path.join(cur_dir,
                                      'pkl_objects', # folder name
                                      'classifier.pkl'
                                      ), 
                         'rb' 
                        ) )
db = os.path.join(cur_dir, 'reviews.db')

def classify(document):
    label = {0:'negative', 1:'positive'} # since classes = np.array([0,1])
    X = vect.transform( [document] )
    y = clf.predict(X)[0] # clf = SGDClassifier(loss='log', random_state=1, max_iter=1)
    proba = np.max( clf.predict_proba(X) )
    return label[y], proba

def train( document, y):
    X = vect.transform( [document] )
    clf.partial_fit(X, [y]) # Perform one epoch of stochastic gradient descent on given samples
    
def sqlite_entry(path, document, y):
    conn = sqlite3.connect(path) # path: the reviews.db file
    # connection.cursor() returns a Cursor object. 
    # Cursor objects allow us to send SQL statements to a SQLite database using cursor.execute()
    c = conn.cursor()
    c.execute("INSERT INTO tbl_reviews(review, sentiment, date)"\
              " VALUES( ?,?,DATETIME('now') )", (document, y)
             )
    conn.commit()
    conn.close()
    
######## Flask ########
class ReviewForm( Form ):
    moviereview = TextAreaField('', 
                                [validators.DataRequired(), validators.length(min=15)]
                               )
# When rendered, this field produces a <textarea> element roughly like
# the following (the exact attributes depend on the WTForms version):
#
# <textarea id="moviereview" name="moviereview" required></textarea>

@app.route('/')
def index():
    form = ReviewForm( request.form )
    return render_template( 'reviewform.html', form=form )

@app.route('/results', methods=['POST'])
def results():
    form = ReviewForm( request.form )
    if request.method == 'POST' and form.validate():
        review = request.form['moviereview']
        y, proba = classify( review )
        return render_template('results.html',
                               content=review,
                               prediction=y,
                               probability=round(proba*100, 2)
                              )
    # else:
    return render_template('reviewform.html', form=form)

@app.route('/thanks', methods=['POST'])
def feedback():
    feedback = request.form['feedback_button']
    review = request.form['review']
    prediction = request.form['prediction']
    
    inv_label = {'negative':0, 'positive':1}
    y = inv_label[prediction]
    if feedback =='Incorrect':
        y = int( not(y) )
    train(review, y)
    sqlite_entry(db, review, y)
    return render_template('thanks.html')

if __name__ == '__main__':
    # Step 3: call update_model in the main application body to update the
    # classifier from the SQLite database every time the web application is
    # restarted, without modifying the classifier.pkl file
    clf = update_model( db_path=db,
                        model=clf,
                        batch_size=10000
                      )
    app.run( port=12345, debug=False )

Note that this updates the clf object in memory only, without modifying the classifier.pkl file.
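The label handling in the feedback function of app.py can be isolated as follows. This is a minimal sketch; the helper name corrected_label is made up for illustration, while the 'Correct' and 'Incorrect' button values and the label mapping follow app.py:

```python
INV_LABEL = {'negative': 0, 'positive': 1}


def corrected_label(prediction, feedback):
    """Return the class label to train on, given the user's feedback."""
    y = INV_LABEL[prediction]
    if feedback == 'Incorrect':
        y = int(not y)  # flip 0 <-> 1 when the prediction was wrong
    return y


print(corrected_label('positive', 'Correct'))    # 1
print(corrected_label('positive', 'Incorrect'))  # 0
print(corrected_label('negative', 'Incorrect'))  # 1
```

The resulting label is what gets passed both to partial_fit and to the SQLite database, so the stored feedback always reflects the user's correction rather than the original prediction.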

Summary

In this chapter, you learned about many useful and practical topics that extend our knowledge of machine learning theory. You learned how to serialize a model after training and how to load it for later use cases. Furthermore, we created a SQLite database for efficient data storage and created a web application that lets us make our movie classifier available to the outside world.

We have discussed many machine learning concepts, best practices, and supervised models for classification. In the next chapter, we will take a look at another subcategory of supervised learning, regression analysis, which lets us predict outcome variables on a continuous scale, in contrast to the categorical class labels of the classification models that we have been working with so far.
