Using Flask to implement a web search function based on Elasticsearch

Overview

I have been doing this remote internship for a month, and it feels good. A salary of 200 a day is not low for a college student. Last week, the leaders assigned the tasks for this week. The general requirement is to build a web search page, and the overall logic is shown in the following figure:

First of all, the web side could use either Flask or Streamlit; I am more familiar with Flask, so I wrote it in Flask. The general idea is to let the user first choose the type of file to upload: either a list of Pubmed IDs, which are searched by ID, or a list of key phrases, which are searched against the title and abstract. Then comes the usual routine: a file upload button and a submit button. After the file is uploaded, the back end queries Elasticsearch according to the selected option and the contents of the uploaded file, and then generates a CSV file with three columns: pmid, title and abstract. The file is then copied to the S3 storage system from the command line; the leader was afraid that I would not understand this step, so he also wrote out the S3 command for me. Once everything is done, a success message is returned to the front end.
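
The last step of this flow, copying the generated CSV to S3, can be sketched as building the argument list for the AWS CLI. This is only a sketch: build_s3_cp_command and the file name query/result.csv are hypothetical, the destination prefix is the one used later in this article, and the --recursive flag that appears in the article's command is left out here because that flag is meant for directories rather than a single file.

```python
import shlex

# Destination prefix taken from the command used later in this article
S3_DEST = "s3://meta-adhoc/nlp/es/"


def build_s3_cp_command(csv_path, dest=S3_DEST):
    """Return the argument list for `aws s3 cp <csv_path> <dest>` (hypothetical helper)."""
    return ["aws", "s3", "cp", csv_path, dest]


# Build the command for a hypothetical result file and show it as a shell line
cmd = build_s3_cp_command("query/result.csv")
print(shlex.join(cmd))
```

Passing such a list to subprocess.run(cmd, check=True) instead of formatting a string for os.system avoids shell quoting problems when the file name is derived from search results, and raises an error if the copy fails.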

Front-end implementation

For the front end, I use Bootstrap together with Flask's Jinja2 templates. I did not use flask-bootstrap, mainly because I am not very familiar with it. I used to use Django, but Django is too heavy for such a simple system, so I kept my usual approach of combining Bootstrap with Django-style templates. The front-end page looks roughly like this:

 Here is my template code:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Uploads</title>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/bootstrap.min.css') }}">
    <script src="//cdn.bootcss.com/jquery/1.11.3/jquery.min.js"></script>
    <script src="{{ url_for('static', filename='js/bootstrap.bundle.min.js') }}"></script>
</head>
<body>

<form method="post" enctype="multipart/form-data" action="{{ url_for('upload') }}">
    <div class="jumbotron">
    <h1 class="display-4">Upload Files</h1>
    <hr class="my-4">
    <p>Choose file type.</p>
    <div class="form-group">
    <select name="option" class="form-control">
      <option >Pubmed ids</option>
      <option >Key phrases</option>
    </select>
    <hr class="my-4">
    <p>Choose your local file to upload.</p>
    <input type="file" name="file_name">
    <hr class="my-4">
    <input class="btn btn-primary btn-lg" type="submit">
    </div>
    <ul>
    {% for message in get_flashed_messages() %}
        <div class="alert alert-warning alert-dismissible fade show" role="alert">
            {{ message }}
  <button type="button" class="close" data-dismiss="alert" aria-label="Close">
    <span aria-hidden="true">&times;</span>
  </button>
</div>
    {% endfor %}
    </ul>
    </div>

</form>
</body>
</html>

Note the block below, which is not visible in the screenshot above. It works together with Flask's message flashing feature, which passes messages from the back end to the front end in a friendly and convenient way. Here it is used to return the "success message" mentioned above, as well as error messages.

    <ul>
    {% for message in get_flashed_messages() %}
        <div class="alert alert-warning alert-dismissible fade show" role="alert">
            {{ message }}
  <button type="button" class="close" data-dismiss="alert" aria-label="Close">
    <span aria-hidden="true">&times;</span>
  </button>
</div>
    {% endfor %}
    </ul>
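
To see the flashing mechanism in isolation, here is a minimal sketch of a tiny Flask app that flashes one message and renders it on the next request. The /notify route, the inline template and the secret key are assumptions made for this illustration and are not part of the project; only flash() and get_flashed_messages() are the same calls used in this article.

```python
from flask import Flask, flash, redirect, render_template_string

app = Flask(__name__)
# flash() stores messages in the session, so a secret key is required
app.config["SECRET_KEY"] = "dev-only-key"  # hypothetical value

# Inline template with the same loop as the snippet above
PAGE = """
{% for message in get_flashed_messages() %}
<div class="alert alert-warning">{{ message }}</div>
{% endfor %}
"""


@app.route("/")
def index():
    return render_template_string(PAGE)


@app.route("/notify")
def notify():
    # Queue a message, then redirect; the message is shown on the next page load
    flash("success!!!")
    return redirect("/")


# Exercise the flow with Flask's test client instead of a browser
client = app.test_client()
html = client.get("/notify", follow_redirects=True).get_data(as_text=True)
html_after = client.get("/").get_data(as_text=True)
```

Here html contains the flashed message, while html_after does not, because a flashed message is removed from the session once it has been read.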

You can see the following display, how the flash message is displayed on the front end:

Here is a display of the success message:

 Here is the failure message:

Back-end implementation

Next, let's talk about the back-end implementation. Since there is only one page, the application is very light, so there is no need for a second file: everything is written directly in manage.py. The downside is that if more development tasks come later, the code will need to be moved out. For now, for convenience, it all lives in manage.py.

First we need a routing function and some general configuration. I used the flask_script extension so that the app can be started from the command line like Django, for example python manage.py runserver -p 8000 -r -d, which is very similar to how Django is started.

import os

from elasticsearch import Elasticsearch
from flask import Flask, request, flash, redirect
from flask import render_template
from flask_script import Manager
import pandas as pd
import numpy as np


app = Flask(__name__)
# Enable Django-like command-line support
manage = Manager(app)
# App configuration
app.config["SECRET_KEY"] = "yuetian"

# Home page route
@app.route('/')
def index():
    return render_template('index.html')

Next comes the function that handles the form submission and file upload. This is where the flash calls are made: the matching deal_with_XXX() function is executed, and if it returns True, the message success!!! is flashed; otherwise the error message Something goes Wrong!!!Check the file that you uploaded. is flashed. I won't go into the other upload details; they use Flask's native interface.

from werkzeug.utils import secure_filename

# File upload handler
@app.route('/upload/', methods=["POST"])
def upload():
    file = request.files.get("file_name")
    # secure_filename() drops path separators from the client-supplied name
    filename = secure_filename(file.filename) if file else ""
    option = request.form.get("option")
    res = False
    if filename:
        file.save(f"fileSaved/{filename}")
        if option == "Pubmed ids":
            res = deal_with_pmid(filename)
        elif option == "Key phrases":
            res = deal_with_kp(filename)
    # The route only accepts POST, so no GET branch is needed
    if res is True:
        flash('success!!!')
    else:
        flash('Something goes Wrong!!!Check the file that you uploaded.')
    return redirect("/")
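
The upload flow can be tried without a browser by posting a file through Flask's test client. The sketch below is self-contained and makes several assumptions: handle_lines is a hypothetical stand-in for the deal_with_XXX() functions, the file is read in memory instead of being saved to fileSaved/, and the index route returns the flashed messages as plain text rather than rendering index.html.

```python
import io

from flask import Flask, flash, get_flashed_messages, redirect, request

app = Flask(__name__)
app.config["SECRET_KEY"] = "dev-only-key"  # hypothetical value


def handle_lines(lines):
    """Hypothetical stand-in for deal_with_XXX(): succeed if there is any query."""
    return len(lines) > 0


@app.route("/")
def index():
    # Return flashed messages as plain text so they are easy to inspect
    return "|".join(get_flashed_messages())


@app.route("/upload/", methods=["POST"])
def upload():
    file = request.files.get("file_name")
    option = request.form.get("option")
    ok = False
    if file is not None and option in ("Pubmed ids", "Key phrases"):
        text = file.read().decode("utf-8")
        lines = [line.strip() for line in text.splitlines() if line.strip()]
        ok = handle_lines(lines)
    flash("success!!!" if ok else "Something goes Wrong!!!")
    return redirect("/")


# Post a two-line in-memory file, using the same field names as the form
client = app.test_client()
response = client.post(
    "/upload/",
    data={"option": "Pubmed ids", "file_name": (io.BytesIO(b"1\n2\n"), "ids.txt")},
    content_type="multipart/form-data",
    follow_redirects=True,
)
result = response.get_data(as_text=True)
```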

Then there are the two deal_with_XXX() functions. As the names suggest, deal_with_pmid() processes uploaded files of the Pubmed ID type, and deal_with_kp() processes files of the key phrases type. The two data formats are roughly as follows; in both, each line is one search.

Pubmed ID type data format          Key phrases type data format
1                                   Heart disease
2                                   Genetic variation
3                                   Biological pathway
4
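
Since every line is one search, reading such a file comes down to collecting its non-empty lines with surrounding whitespace removed. A minimal sketch, where read_queries is a hypothetical helper name and the sample text mirrors the key phrases format above:

```python
import io


def read_queries(stream):
    """Return the non-empty lines of a text stream, stripped of whitespace."""
    return [line.strip() for line in stream if line.strip()]


# Sample input with a trailing space and a blank line, as real files often have
sample = io.StringIO("Heart disease\nGenetic variation \n\nBiological pathway\n")
queries = read_queries(sample)
print(queries)
```

The same helper works on a file opened in text mode, for example with open(path, "r", encoding="utf-8").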

 The specific code is as follows:

# Handle a Pubmed ID file: one ID per line
def deal_with_pmid(file_name):
    try:
        pmid_list = []
        title_list = []
        abstract_list = []
        es = Elasticsearch(hosts=['http://52.14.194.191:9200'])
        # Read in text mode and strip newlines; str() on raw bytes would give "b'...\n'"
        with open(f"fileSaved/{file_name}", "r", encoding="utf-8") as f:
            list_pmid = [line.strip() for line in f if line.strip()]
        for pmid in list_pmid:
            json_data = {
                "query": {
                    "match": {
                        "pmid": pmid
                    }
                }
            }

            res = es.search(index="pubmed-paper-index-2", body=json_data)
            # A pmid is unique, so only the first hit is needed
            source = res["hits"]["hits"][0]["_source"]
            pmid_list.append(source["pmid"])
            title_list.append(source["title"])
            abstract_list.append(source["abstract"])
        data = pd.DataFrame({"pmid": pmid_list, "title": title_list, "abstract": abstract_list})
        # Name the CSV after the first and last pmid in the result
        csv_path = f"query/{data['pmid'].iloc[0]}~{data['pmid'].iloc[-1]}.csv"
        data.to_csv(csv_path)
        os.system(f"aws s3 cp {csv_path} s3://meta-adhoc/nlp/es/ --recursive")
        return True
    except Exception:
        return False

# Handle a key phrases file: one phrase per line
def deal_with_kp(file_name):
    try:
        pmid_list = []
        title_list = []
        abstract_list = []
        es = Elasticsearch(hosts=['http://52.14.194.191:9200'])
        # Read in text mode and strip newlines; str() on raw bytes would give "b'...\n'"
        with open(f"fileSaved/{file_name}", "r", encoding="utf-8") as f:
            kp_list = [line.strip() for line in f if line.strip()]
        for kp in kp_list:
            # Match the phrase against the title or the abstract. The match query
            # analyzes the phrase itself, so it does not need to be split by hand.
            json_data = {
                "query": {
                    "bool": {
                        "should": [
                            {"match": {"title": kp}},
                            {"match": {"abstract": kp}}
                        ]
                    }
                }
            }

            res = es.search(index="pubmed-paper-index-2", body=json_data)
            # A key phrase can match many papers, so collect every hit
            for tmp in res["hits"]["hits"]:
                pmid_list.append(tmp["_source"]["pmid"])
                title_list.append(tmp["_source"]["title"])
                abstract_list.append(tmp["_source"]["abstract"])
        data = pd.DataFrame({"pmid": pmid_list, "title": title_list, "abstract": abstract_list})
        # Name the CSV after the first pmid in the result
        csv_path = f"query/{data['pmid'].iloc[0]}.csv"
        data.to_csv(csv_path)
        os.system(f"aws s3 cp {csv_path} s3://meta-adhoc/nlp/es/ --recursive")
        return True
    except Exception:
        return False
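
The two query bodies can also be built by small helper functions, which makes them easy to inspect without a running Elasticsearch cluster. This is a sketch: build_pmid_query and build_kp_query are hypothetical names, and the field names pmid, title and abstract are the ones used in this article. The last lines show why calling update() with the same key once per word keeps only the final word.

```python
def build_pmid_query(pmid):
    """Query body that matches a single Pubmed ID (hypothetical helper)."""
    return {"query": {"match": {"pmid": pmid}}}


def build_kp_query(phrase):
    """Query body that matches a phrase in the title or the abstract (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "should": [
                    {"match": {"title": phrase}},
                    {"match": {"abstract": phrase}},
                ]
            }
        }
    }


kp_query = build_kp_query("Heart disease")

# Updating the same dict key once per word overwrites the previous word
overwritten = {}
for word in "Heart disease".split(" "):
    overwritten.update({"title": word})
print(overwritten)
```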

When handling the JSON response, note that a pmid search returns exactly one result per pmid, because the pmid is a unique identifier, so only the first hit is taken. A key phrase search returns multiple results, so a for loop is used to collect them.
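
The difference between the two cases can be shown on a hand-written response. In this sketch, fake_response only imitates the hits.hits[]._source layout that the code in this article reads; the records in it are made up for illustration, hits_to_rows is a hypothetical helper, and the standard csv module is used instead of pandas to keep the example dependency-free.

```python
import csv
import io

COLUMNS = ("pmid", "title", "abstract")


def hits_to_rows(res, first_only=False):
    """Turn a search response into (pmid, title, abstract) rows."""
    hits = res["hits"]["hits"]
    if first_only:
        # pmid search: the ID is unique, so one hit is enough
        hits = hits[:1]
    return [tuple(hit["_source"][col] for col in COLUMNS) for hit in hits]


# Made-up response with the same nesting as a real search result
fake_response = {"hits": {"hits": [
    {"_source": {"pmid": "1", "title": "Title A", "abstract": "Abstract A"}},
    {"_source": {"pmid": "2", "title": "Title B", "abstract": "Abstract B"}},
]}}

pmid_rows = hits_to_rows(fake_response, first_only=True)
kp_rows = hits_to_rows(fake_response)

# Write the key phrase rows as CSV with the three expected columns
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(COLUMNS)
writer.writerows(kp_rows)
csv_text = buffer.getvalue()
```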


END

Origin blog.csdn.net/qq_41938259/article/details/124338119