On building a question bank for quiz-answering assistants

Disclaimer: This is an original article by the blogger, released under the CC 4.0 BY-SA license. If you reproduce it, please attach the original source link and this statement.
This link: https://blog.csdn.net/neal1991/article/details/79357497

Some time ago, live quiz apps were in full swing, and the major Internet companies all joined the cash-giveaway war with apps such as Chongding Dahui, Million Heroes, and Cheese Superman. Along with them came a wave of quiz-answering assistant tools.

There are many quiz-answering assistants on the Internet. They generally involve two steps: acquiring the question and its options, and searching for the answer. One way to acquire the question and options is to grab a screenshot of the phone with adb and then use OCR (optical character recognition) to read the question and options. The most commonly used OCR tools are Google's open-source tesseract-ocr and Baidu's OCR API. tesseract-ocr can be installed locally; the installer can be downloaded from https://digi.bib.uni-mannheim.de/tesseract/tesseract-ocr-setup-3.05.01.exe. During installation, be sure to select the Simplified Chinese language pack, otherwise Chinese text cannot be recognized. Baidu's OCR API can be applied for free of charge, is more convenient to use, and has a relatively higher recognition rate. Another advantage of the Baidu API is that it can recognize the picture without any preprocessing, whereas tesseract-ocr generally needs the picture to go through some simple processing first. Another way to get the question and options is to use a packet capture tool to intercept the app's requests and read the question and option data from them.
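The screenshot-plus-OCR step can be sketched as follows. This is only a minimal sketch, assuming that the `adb` and `tesseract` command-line tools are on the PATH and that the Simplified Chinese language data (`chi_sim`) is installed. The function names and the `screen.png` file name are hypothetical and do not come from the original code.

```python
import subprocess

def screencap_command():
    # `adb exec-out screencap -p` writes the current screen as PNG bytes to stdout.
    return ["adb", "exec-out", "screencap", "-p"]

def tesseract_command(image_path, lang="chi_sim"):
    # `tesseract <image> stdout -l <lang>` prints the recognized text to stdout.
    return ["tesseract", image_path, "stdout", "-l", lang]

def grab_and_recognize(image_path="screen.png", run=subprocess.run):
    # `run` is injectable so the function can be exercised without a phone attached.
    shot = run(screencap_command(), capture_output=True, check=True)
    with open(image_path, "wb") as f:
        f.write(shot.stdout)
    ocr = run(tesseract_command(image_path), capture_output=True, check=True)
    return ocr.stdout.decode("utf-8")
```

In practice the screenshot would usually be cropped to the question area before OCR, since tesseract-ocr tends to need some preprocessing.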

The second step is searching for the answer. A common approach is to open a browser with the question as the search keywords, or to search for the question plus each option and compare the number of search engine results. The number of results is then taken as a measure of how relevant each option is to the question, and the answer is chosen accordingly. Generally speaking, answers obtained this way are not very accurate: first, because questions are now phrased in all sorts of odd ways, and second, because higher relevance does not necessarily mean the option is correct. Judging a question against its options is inherently difficult; without near-perfect semantic understanding, it is hard to determine the correct option. A more straightforward approach is to build a question bank. This article discusses how to build one. It is only a simple exploration and may not be usable in practice, because a question bank has to be comprehensive enough to be effective.
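The result-count heuristic described above can be sketched like this. It is a minimal sketch under stated assumptions: `count_results` is a hypothetical callable that returns the number of search engine hits for a query string, and how it talks to a search engine is left open.

```python
def rank_options(question, options, count_results):
    """Return (option, hit_count) pairs, highest hit count first."""
    # Query "question + option" once per option and record the hit count.
    counts = [(opt, count_results(question + " " + opt)) for opt in options]
    # The option with the most hits is only the most *relevant* one,
    # which is not necessarily the correct one.
    return sorted(counts, key=lambda pair: pair[1], reverse=True)
```

As the text notes, the top-ranked option is a guess at best, which is the motivation for a question bank.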

Building the question bank with Elasticsearch

This article only explores one small aspect, building a question bank. For the complete usage of the answering assistant, see the original write-up; the code is mainly based on TopSup, with some adjustments. Elasticsearch (ES) is used to build the question bank; for installing ES, see my earlier article. One might think that using ES for a question bank is like shooting mosquitoes with an anti-aircraft gun, which is to say overkill. But I find ES easy to install and easy to use, and thanks to its powerful RESTful interface, virtually any tool can work with it. Talk is cheap, show me the code.

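from elasticsearch import Elasticsearch

def write_question():
  # One question/answer pair to store as a document.
  question = {
    'question': '谁是世界上最帅的人',  # "Who is the most handsome person in the world?"
    'answer': 'Neal'
  }
  # Connect to a local node on the default port.
  es = Elasticsearch(['localhost'])
  # doc_type applies to Elasticsearch 6.x and earlier; later versions removed mapping types.
  es.index(index='question-index', doc_type='question', id=1, body=question)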

The snippet above writes a single record into an index. ES can in fact be regarded as a non-relational database; in the latest DB-Engines ranking at the time of writing, ES had climbed to No. 9. Some Elasticsearch concepts map onto relational database concepts:

Relational database    Elasticsearch
database               index
table                  type
row                    document
column                 field

Searching for a question in ES then looks like this:

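def search_question(key_words):
  es = Elasticsearch(['localhost'])
  # Require at least 75% of the analyzed query terms to match the stored question.
  res = es.search(index='question-index', body={
    "query": {
      "match": {
        "question": {
          "query": key_words,
          "minimum_should_match": "75%"
        }
      }
    }
  })
  # On Elasticsearch 6.x and earlier, hits.total is a plain integer.
  if res['hits']['total'] > 0:
    for hit in res['hits']['hits']:
      print(hit['_source']['question'] + ':' + hit['_source']['answer'])
  else:
    print('No similar results found')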

Getting questions and answers from a screenshot

A question bank can be built from text, or directly from screenshots of the quiz app on the phone; the latter is undoubtedly the more valuable source. Suppose we have a screenshot like this:

This image already contains the correct option, but how do we recognize the picture and find out which option is correct? Relying on the number shown after each option does not work, because the correct answer is not necessarily the most-selected option. Thanks to an image processing course I once took, a very basic concept from it helped me solve this problem. A color image is generally converted to a grayscale image by a function that maps the color space to the gray space. Take MATLAB's rgb2gray function, which converts an RGB image (that is, a color image) to grayscale, as an example: for a color pixel with RGB value (R, G, B), the gray value is calculated as:

Gray = 0.2989 * R + 0.5870 * G + 0.1140 * B

The general practice is to compute a pixel's gray value as a weighted sum of its color channels. Using a color picker on the screenshot, we find that the background color of the correct answer has RGB value (80, 215, 216), while the background color of the wrong answers has RGB value (194, 194, 194).
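Plugging the two background colors into the weighted sum gives the following gray values. This is a small worked sketch; the function name `rgb_to_gray` is only illustrative.

```python
def rgb_to_gray(r, g, b):
    # Weighted sum with the rgb2gray coefficients quoted above.
    return 0.2989 * r + 0.5870 * g + 0.1140 * b

correct_bg = rgb_to_gray(80, 215, 216)   # 23.912 + 126.205 + 24.624, about 174.7
wrong_bg = rgb_to_gray(194, 194, 194)    # 194 * 0.9999, about 194.0

print(round(correct_bg, 1), round(wrong_bg, 1))
```

The correct option's background is therefore roughly 19 gray levels darker than the background of the wrong options.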

936LqI.md.png

Today's lesson is the distributive law of multiplication, a little show of elementary school math. Back to the point: as you can see, the color of the correct answer maps to a lower gray value. This helps a great deal in telling the correct option from the wrong ones. First we crop the image down to the options area, so that the numbers on the right do not affect the recognition results. Then, with a binarization algorithm, we can convert the options image into two different images using two different thresholds: pixels below the threshold become black, and pixels at or above the threshold become white. The binarization algorithm is very simple:

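def binarizing(img, threshold):
    # img is assumed to be a grayscale (mode 'L') PIL image; it is modified in place.
    pixdata = img.load()
    w, h = img.size
    for y in range(h):
        for x in range(w):
            # Below the threshold becomes black, everything else white.
            if pixdata[x, y] < threshold:
                pixdata[x, y] = 0
            else:
                pixdata[x, y] = 255
    return img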

Binarizing the image with a threshold of 120 and with a threshold of 180 (for the latter, any value between 175 and 194 works) gives the following two results:

93c8dx.png

93clLR.png

At this point the answer is practically in plain sight. We run OCR on both images: the first image yields all the options, while the second yields only the wrong options, so the difference between the two is the correct option! Isn't that clever, and didn't see it coming, did you?
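The final comparison can be sketched as a simple difference between the two OCR results. This is a minimal sketch, assuming each OCR pass has already been split into one string per option; the function name and the sample option strings are hypothetical.

```python
def find_correct_option(all_options, wrong_options):
    """Return the single option missing from the second OCR pass, else None."""
    wrong = {opt.strip() for opt in wrong_options}
    remaining = [opt.strip() for opt in all_options if opt.strip() not in wrong]
    # Anything other than exactly one leftover means OCR noise; do not guess.
    return remaining[0] if len(remaining) == 1 else None

# Threshold 120 keeps every option readable; threshold 180 blacks out the correct one.
print(find_correct_option(["Option A", "Option B", "Option C"],
                          ["Option A", "Option C"]))
```

Exact string comparison is fragile when the two OCR passes read the same option slightly differently, so a real pipeline would probably need some fuzzy matching.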

Epilogue

This article has looked, from one small angle, at a way to build a question bank, using a simple image processing technique to obtain the correct option. Doesn't that show that what we learn in school courses is still valuable? Of course, this is only an exploration of a technique and does not guarantee that it works in practice. For the detailed code, see the original write-up.

That's all.

You are welcome to search for the WeChat ID mad_coder, or scan the QR code, to follow the official account:

93cfyj.jpg
