Using the k-nearest neighbor algorithm to get past Maoyan Movies' font-based anti-scraping

A record of using the k-nearest neighbor algorithm to get past Maoyan Movies' font-based anti-scraping.
This blog post is only my own amateur record, published for readers' reference. If it infringes on anything, please let me know and I will delete it.

Some of the finer details are not spelled out in this article. I trust that anyone who has found it can follow along without me narrating every step.

1. Background

  • Open Maoyan's real-time box office page. No doubt about it, the site uses a custom font for some of its text.
  • On the right, the font-family is named stonefont, which is the name of the site's custom font. Then click the spot framed in the picture, on the right.

  • Wow, clicking there jumps to this page. The font-family name here is stonefont, so this is the custom font file.
  • See the woff suffix? That is the font file. Visiting its URL downloads the font file. Bingo!!

Open the downloaded font file in a font editor.
Two font editing tools are recommended.

This article uses FontCreator (a cracked copy, I admit! If you can afford it, please support the developers and buy a genuine license).

  • After opening the font file, you can see 10 digits in total.
  • The digit 5 corresponds to the uni code uniF208, at index 2.
  • Each digit is drawn by connecting a dense set of coordinate points.

The table below clearly shows how the page source corresponds to the font.

Shown on page    In page source    In font file
3                &#xe1bb           uniF1BB
5                &#xe208           uniF208
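  • Look at the page source for the real-time box office of the film 《误杀》 (Sheep Without a Shepherd): the digits of its box office figure have been replaced by codes beginning with &#xe.
  • But compare them carefully with the font file on the right and you find that their last three characters are the same. That is the pattern!!!
  • In that case, build a dictionary from this correspondence.
  • End of article.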
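
The dictionary idea above can be sketched roughly as follows. This is only an illustration under assumptions: the names `ENTITY_TO_DIGIT` and `decode_digits` are made up here, and the table contains just the two rows shown in the article (the other eight digits would be read off the font editor in the same way).

```python
import re

# Page-source code -> digit, read off the table in the article.
# Only the two documented rows are filled in; the rest are left out on purpose.
ENTITY_TO_DIGIT = {
    "e1bb": "3",
    "e208": "5",
}

# Matches entities such as "&#xe208;" (the trailing semicolon is optional).
ENTITY_RE = re.compile(r"&#x([0-9a-fA-F]{4});?")


def decode_digits(html, table=ENTITY_TO_DIGIT):
    """Replace every known font entity in `html` with its digit."""
    def repl(match):
        key = match.group(1).lower()
        # Unknown codes are left untouched rather than guessed.
        return table.get(key, match.group(0))
    return ENTITY_RE.sub(repl, html)


print(decode_digits("&#xe208;&#xe1bb;"))
```

A fixed table like this only works for as long as the site keeps serving the same font file.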

 
 
 
 
 
 
 
 
 
 
 
 

Wait?? Why does the site's font file change every time I refresh the page???
No big deal, just match on the coordinates in the font file instead.

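  • But!! Look closely and the coordinates that draw the same digit differ slightly from one font file to the next!
  • So! Below, the k-nearest neighbor algorithm is used to crack the Maoyan Movies font.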

2. Approach + key code

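What is the k-nearest neighbor algorithm:

The k-nearest neighbor (k-Nearest Neighbor, KNN) classification algorithm is a theoretically mature method and one of the simplest machine learning algorithms.
The idea is: if most of the k samples nearest to a given sample in feature space belong to a certain class, then that sample belongs to that class as well.

Put in terms of this problem: first store several samples of font coordinates together with the digits they represent. Then feed in the samples from a new font file to determine which digit each of its glyphs is.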
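
To make the voting idea concrete, here is a minimal sketch that uses only the standard library. The function name `knn_classify` and the tiny four-number "coordinate arrays" are invented for illustration; real glyphs have far more points.

```python
import math
from collections import Counter


def knn_classify(sample, samples, labels, k=3):
    """Return the majority label among the k stored samples closest to `sample`."""
    distances = [math.dist(sample, s) for s in samples]  # Euclidean distance
    nearest = sorted(range(len(samples)), key=lambda i: distances[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up data: two digits, each seen in three slightly different font files.
samples = [
    [10, 10, 50, 90], [11, 9, 51, 91], [9, 11, 49, 89],    # digit 1
    [80, 20, 20, 80], [81, 21, 19, 79], [79, 19, 21, 81],  # digit 7
]
labels = [1, 1, 1, 7, 7, 7]

# A new glyph whose coordinates are close to, but not equal to, the stored ones.
print(knn_classify([10, 11, 50, 88], samples, labels))  # → 1
```

The new array matches none of the stored ones exactly, yet its three nearest neighbors all carry the label 1, which is why small coordinate changes between font files do not matter.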

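So the approach from here is:

  1. First save several sets of sample data that map digits to font coordinates
  2. Extract the font coordinates used by the page and compare them against the sample set to get the digit for each new set of coordinates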
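
Step 1 could look something like the sketch below. It assumes each sample is stored as a `(digit, flattened coordinates)` pair, and that the digit for each glyph name has been labelled by hand in a font editor. `flatten` and `build_samples` are hypothetical helper names, and the coordinates are placeholders.

```python
def flatten(points):
    """[(142, 151), (154, 89)] -> [142, 151, 154, 89]"""
    return [value for point in points for value in point]


def build_samples(glyph_points, glyph_digits):
    """Pair each hand-labelled glyph with its flattened coordinates.

    glyph_points: {glyph name: [(x, y), ...]} taken from one font file
    glyph_digits: {glyph name: digit} labelled by hand for that same file
    """
    return [
        (glyph_digits[name], flatten(points))
        for name, points in glyph_points.items()
        if name in glyph_digits  # skip glyphs that were not labelled
    ]


dataset = []
# One call per downloaded font file; the values here are placeholders.
dataset += build_samples({"uniF208": [(142, 151), (154, 89)]}, {"uniF208": 5})
print(dataset)
```

Repeating this for several downloaded font files gives the classifier more than one example of each digit to vote with.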

Basic fontTools operations

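from fontTools.ttLib import TTFont

base_font = TTFont('xxx.woff')          # open the font file
base_font.saveXML('xxx.xml')            # save the font file as XML
base_font.getBestCmap()                 # mapping from Unicode code point to glyph name
name = base_font.getGlyphOrder()[2]     # a glyph name, e.g. 'uniF208'
base_font['glyf'][name].coordinates     # outline of the glyph (coordinate data)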

Getting the coordinate arrays of the new font

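from fontTools.ttLib import TTFont


def parse_font():
    """
    Get the coordinates of each uni-coded glyph in the new font file.
    :return: dict mapping glyph name to its flattened coordinate list
    """
    base_font = TTFont('maoyan.woff')
    # Skip the first two entries of the glyph order
    uni_list = base_font.getGlyphOrder()[2:]
    print(uni_list)

    num_dict = {}
    for name in uni_list:
        # Glyph coordinates: [(142, 151), (154, 89), (215, 35), ...]
        # Flattened:         [142, 151, 154, 89, 215, 35, ...]
        coordinate = list(base_font['glyf'][name].coordinates)
        font_0 = [i for item in coordinate for i in item]
        print(font_0)
        num_dict[name] = font_0
    return num_dict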

The k-nearest neighbor algorithm:
Here you only need to pass in the coordinate array from the new font file, and the digit matching those coordinates is returned.

# -*- coding: utf-8 -*-


"""k-NN implementation: takes the coordinate array of a new font file and returns the matched digit."""

import numpy as np
from font_dataset import dataset_	# font sample dataset


def handle_dataset():
    """
    :return: the training sample matrix and the corresponding labels
    """
    # Note: the samples and the new font's coordinate array must have the same length!!
    # So build a zero-filled matrix of width 200 here as well, then overwrite the front of each row
    lables = list(data[0] for data in dataset_)
    dataset = list(data[1] for data in dataset_)
    returnmat = np.zeros((len(dataset), 200))
    for index, data in enumerate(dataset):
        returnmat[index, :len(data)] = data
    return returnmat, lables


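def classify_knn(new_array, dataset, lables, k):
    """
    :param new_array: the new instance
    :param dataset:   the training dataset
    :param lables:    the training labels
    :param k:         number of nearest neighbors
    :return:          the label chosen by the algorithm
    """
    # dataset.shape[0] is the number of training samples; np.tile repeats the
    # new array that many times so it can be subtracted from every sample.
    # The differences are then squared.
    # sum(axis=1) adds up each row, and the square root gives the Euclidean distance
    datasetsize = dataset.shape[0]
    diffmat = np.tile(new_array, (datasetsize, 1)) - dataset
    sqrdiffmat = diffmat ** 2
    distance = sqrdiffmat.sum(axis=1) ** 0.5
    # argsort returns the indices that sort the distances in ascending order
    # dict.get returns the value for a key, or the default if the key is missing
    # The sorted indices give the labels of the k nearest samples
    sortdistance = distance.argsort()
    count = {}
    for i in range(k):
        volelable = lables[sortdistance[i]]
        count[volelable] = count.get(volelable, 0) + 1
    count_list = sorted(count.items(), key=lambda x: x[1], reverse=True)
    return count_list[0][0]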


def knn_num(inX):
    """
    :param inX: the coordinate array from the new font file
    :return:    the digit it was matched to
    """
    # returnMats is a zero-filled array of length 200;
    # its front is overwritten with the incoming font coordinate array
    returnMats = np.zeros([200])
    returnMats[:len(inX)] = inX
    inX = returnMats
    dataset, lables = handle_dataset()
    result = classify_knn(inX, dataset, lables, 5)
    return result
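
Once a classifier like the one above is available, one way to tie the pieces together is to run it over every glyph of a freshly downloaded font and check the result. The sketch below is an assumption-laden illustration: `map_glyphs_to_digits` is a hypothetical helper, `glyph_coords` is assumed to be a dict of glyph name to flattened coordinate list, and `fake_classify` is only a stand-in for `knn_num` so the example runs on its own.

```python
def map_glyphs_to_digits(glyph_coords, classify):
    """Classify every glyph and insist that no digit is assigned twice."""
    mapping = {name: classify(coords) for name, coords in glyph_coords.items()}
    seen = {}
    for name, digit in mapping.items():
        if digit in seen:
            # Two glyphs landing on the same digit suggests a misclassification.
            raise ValueError(f"{seen[digit]} and {name} were both classified as {digit}")
        seen[digit] = name
    return mapping


# Stand-in classifier for demonstration only: it "recognises" a glyph by its
# first coordinate. In the article's setup this role is played by knn_num.
def fake_classify(coords):
    return {142: 5, 30: 3}[coords[0]]


glyphs = {"uniF208": [142, 151, 154, 89], "uniF1BB": [30, 40, 55, 60]}
print(map_glyphs_to_digits(glyphs, fake_classify))
```

Since the font holds one glyph per digit, a duplicate digit in the result is a cheap signal that the sample set needs more examples.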

3. Closing words

Well, that's all for this post. If you have any questions, please leave a comment below.


Origin blog.csdn.net/weixin_45081575/article/details/103593947