An eye-searing, record-low score for "The Deer and the Cauldron"? Let the data show which wife Wei Xiaobao is closest to

Produced by CDA Data Analyst  

Author: Mika

Data: Zhenda  

[Introduction]

Recently, the new version of "The Deer and the Cauldron" starring Zhang Yishan has been trending on social media for all the wrong reasons. After his excellent performance in "Remaining Sins", audiences had high expectations of Zhang Yishan as Wei Xiaobao, but within a few days of broadcast the show's reputation collapsed. It scored just 2.6 on Douban, firmly claiming the title of the worst version of "The Deer and the Cauldron" ever made.

Netizens criticized not only the exaggerated acting but also the plot. In the original novel, Wei Xiaobao is clever and witty; Zhang Yishan turns him into a monkey show, and watching it makes viewers squirm with embarrassment.

Is the new "The Deer and the Cauldron" really that bad? Which of the major versions is the most classic? And which of his seven wives does Wei Xiaobao like best? Today we let the data settle it.

01. Of all the versions of "The Deer and the Cauldron", which is the most acclaimed?

"The Deer and the Cauldron" was the last novel Jin Yong wrote before laying down his pen, and since 1984 it has been adapted again and again. So which of these versions is the most acclaimed?

Here we compare the seven versions most viewers are familiar with:

  • Tony Leung version (1984)
  • Stephen Chow version (1992)
  • Chen Xiaochun version (1998)
  • Zhang Weijian version (2000)
  • Huang Xiaoming version (2008)
  • Han Dong version (2014)
  • Zhang Yishan version (2020)

Comparison of Douban Scores of Different Editions

The Douban scores show that the Tony Leung, Stephen Chow, Chen Xiaochun and Zhang Weijian versions, all made in 2000 or earlier, enjoy good reputations, each scoring above 7. The 1998 Chen Xiaochun version scored highest, at 8.8, and has become a classic for countless viewers. The later Huang Xiaoming and Han Dong versions both hover around 5 to 6 points, and the Zhang Yishan version is the lowest, at only 2.6.

Rating distribution of each version of "The Deer and the Cauldron"

We then broke down the ratings of each version. The Chen Xiaochun version has the highest share of positive ratings, at 92%. The Zhang Yishan version is the opposite extreme, with 92% negative ratings.

02. The worst "The Deer and the Cauldron" ever scores 2.6 on Douban. Is that unfair?

So does the worst version in history really deserve its 2.6 on Douban?

Star-rating distribution of Zhang Yishan's "The Deer and the Cauldron"

We crawled 500 ratings from Douban and found that 87.2% of them give 1 star: an overwhelmingly negative verdict.
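The Douban crawler itself is not shown in the original. As a minimal sketch, assuming the ratings have already been collected as integers from 1 to 5, the share of each star level can be computed as below; the function name `star_distribution` is hypothetical and the sample list is toy data, not the crawled ratings.

```python
from collections import Counter


def star_distribution(ratings):
    """Return {stars: percentage of ratings} for star levels 1 to 5."""
    if not ratings:
        return {}  # avoid dividing by zero on an empty crawl
    counts = Counter(ratings)
    total = len(ratings)
    # Percentage of ratings at each star level, rounded to one decimal place
    return {star: round(100 * counts[star] / total, 1) for star in range(1, 6)}


# Toy data for illustration only
print(star_distribution([1, 1, 1, 2, 5]))
# {1: 60.0, 2: 20.0, 3: 0.0, 4: 0.0, 5: 20.0}
```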

Word cloud comparison: Chen Xiaochun version vs. Zhang Yishan version

What is everyone talking about? Comparing the highest-rated Chen Xiaochun version with the lowest-rated Zhang Yishan version, we can see that reviews of the Chen Xiaochun version center on praise such as "classic" and "good-looking".

In reviews of the Zhang Yishan version, complaints such as "exaggerated", "unbearable", "forced" and "ugly" come up again and again.

The discussion also centers mainly on Zhang Yishan's acting, the exaggeration and the plot.
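Word clouds are drawn from term frequencies. The original does not show this step; as a sketch, assuming each review has already been segmented into words (for example with a Chinese tokenizer such as jieba) and using a small hypothetical stop-word list, the most frequent terms can be counted like this (`top_terms` is a hypothetical name):

```python
from collections import Counter

# Hypothetical stop words; a real analysis would use a full Chinese stop-word list
STOP_WORDS = {'的', '了', '是', '我'}


def top_terms(tokenized_reviews, n=10, stop_words=STOP_WORDS):
    """Count words across all reviews and return the n most frequent (word, count) pairs."""
    counter = Counter(
        word
        for review in tokenized_reviews
        for word in review
        if word not in stop_words  # drop filler words that carry no opinion
    )
    return counter.most_common(n)


reviews = [['经典', '好看'], ['经典', '的', '童年'], ['经典']]
print(top_terms(reviews, n=1))  # [('经典', 3)]
```

The resulting (word, count) pairs are what a word-cloud library sizes the words by.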

03. Which wife is Wei Xiaobao closest to? Let data analysis tell you

One of the highlights of "The Deer and the Cauldron" is Wei Xiaobao's seven beautiful wives. In earlier versions, each of the seven was beautiful and distinctive, which made them unforgettable.

Wei Xiaobao and his seven wives in Chen Xiaochun's "The Deer and the Cauldron"

The seven wives in the new version, however, made netizens cry that it hurt their eyes, and viewers could hardly tell them apart.

The Seven Wives of Wei Xiaobao

So which of his seven wives is Wei Xiaobao closest to? We crawled the full text of the novel as txt files and used data analysis to find out.

First, we define an intimacy index. The novel is divided into paragraphs, and the intimacy index between Wei Xiaobao and a given wife is the number of paragraphs in which both of them appear. For example, every paragraph in which Wei Xiaobao and Shuang'er both appear adds 1 to their intimacy score.
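A minimal Python sketch of this intimacy index follows. It is an illustration, not the authors' code: it assumes the novel has already been split into a list of paragraph strings and uses plain substring matching on names, and the function name `intimacy_index` is hypothetical.

```python
def intimacy_index(paragraphs, hero, wives):
    """Count, for each wife, the paragraphs in which she and the hero both appear."""
    scores = {wife: 0 for wife in wives}
    for paragraph in paragraphs:
        if hero not in paragraph:
            continue  # only paragraphs that mention the hero can count
        for wife in wives:
            if wife in paragraph:
                scores[wife] += 1  # co-occurrence in one paragraph: intimacy +1
    return scores


paragraphs = ['韦小宝拉着双儿的手', '双儿在门外等候', '韦小宝见到阿珂']
print(intimacy_index(paragraphs, '韦小宝', ['双儿', '阿珂']))
# {'双儿': 1, '阿珂': 1}
```

Note that a paragraph counts at most once per wife, however many times the names are repeated inside it, which matches the "same paragraph" definition above.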

The overall workflow is as follows:

  • Crawling the novel text
  • Data cleaning and organization
  • Exploratory data visualization
  • Apriori association analysis

1. Crawling the novel text

We crawl the text from the Jin Yong Collected Works website at http://jinyong.zuopinj.com/

Crawler approach:

  • First, request the novel's index page to get the URL of each chapter's detail page;
  • Then request each detail page and parse out the chapter text;
  • Finally, save the scraped text locally as one txt file per chapter.

Implementation code:

# Import libraries
import os
from multiprocessing.dummy import Pool  # a thread pool with the multiprocessing API

import parsel
import requests


class JinyongSpider(object):
    def __init__(self):
        self.headers = {
            'User-Agent': ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
                           'AppleWebKit/537.36 (KHTML, like Gecko) '
                           'Chrome/86.0.4240.198 Safari/537.36')
        }
        # Chapter titles and the URLs of their detail pages
        self.titles = []
        self.chapter_links = []

        # Create the output folder if it does not exist yet
        os.makedirs('../鹿鼎记', exist_ok=True)

    def parse_home_page(self, url='http://jinyong.zuopinj.com/3/'):
        # Request the novel's index page
        response = requests.get(url, headers=self.headers)
        # Use the encoding detected from the page content
        response.encoding = response.apparent_encoding
        # Parse the HTML
        selector = parsel.Selector(response.text)
        # Extract the chapter titles and chapter links
        title = selector.xpath('//div[@class="book_list"]/ul/li/a/@title').extract()
        chapter_link = selector.xpath('//div[@class="book_list"]/ul/li/a/@href').extract()

        # Store the results
        self.titles.extend(title)
        self.chapter_links.extend(chapter_link)

    def parse_detail_page(self, zip_list):
        # zip_list is a (chapter title, chapter URL) pair
        print(f'Crawling chapter {zip_list[0]}!')

        # Request the chapter page with the same headers as the index page
        response = requests.get(url=zip_list[1], headers=self.headers)
        response.encoding = response.apparent_encoding
        selector = parsel.Selector(response.text)
        # Extract every text node of the chapter body
        noval_text = selector.xpath('//div[@id="htmlContent"]//text()').extract()
        noval_text = '\n'.join(noval_text)

        # Write one txt file per chapter; the with block closes the file automatically
        with open(f'../鹿鼎记/{zip_list[0]}.txt', 'w', encoding='utf-8') as fp:
            print(f'Writing chapter {zip_list[0]}')
            fp.write(noval_text)
        print('Finished writing, file closed!')

    def multiprocees_function(self):
        # Pool of 10 threads running inside a single process
        pool = Pool(10)
        zip_list = list(zip(self.titles, self.chapter_links))
        # Map each (title, link) pair in zip_list to parse_detail_page,
        # which receives one pair per call
        pool.map(self.parse_detail_page, zip_list)
        # Stop accepting new tasks
        pool.close()
        # Wait for all worker threads to finish before the main thread exits
        pool.join()


if __name__ == '__main__':
    jinyongspider = JinyongSpider()
    # First collect the chapter page links
    jinyongspider.parse_home_page(url='http://jinyong.zuopinj.com/3/')
    # Then crawl all chapters through the thread pool
    jinyongspider.multiprocees_function()

The crawled chapters are saved locally in the following format:

2. Data cleaning and organization

We preprocess the data with pandas, as follows:

  • First, split the crawled novel text into a list of paragraphs;
  • Then loop over the list and check whether each name appears in each paragraph, marking it T if it does and F otherwise.
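The two steps above can be sketched with pandas as follows. This is a minimal sketch under stated assumptions, not the authors' script: it assumes the chapter files written by the crawler sit in `../鹿鼎记`, treats each non-empty line as a paragraph, and searches for a hypothetical list of Chinese name strings. Characters are often referred to by nicknames in the novel, which plain substring matching misses, so real counts depend on which name variants are searched.

```python
import glob
import os

import pandas as pd

# Hypothetical search strings: Wei Xiaobao and his seven wives, in Chinese
NAMES = ['韦小宝', '双儿', '阿珂', '方怡', '苏荃', '建宁', '曾柔', '沐剑屏']


def load_paragraphs(folder='../鹿鼎记'):
    """Read every chapter file and return its non-empty lines as paragraphs."""
    paragraphs = []
    for path in sorted(glob.glob(os.path.join(folder, '*.txt'))):
        with open(path, encoding='utf-8') as fp:
            paragraphs.extend(line.strip() for line in fp if line.strip())
    return paragraphs


def to_flag_table(paragraphs, names=NAMES):
    """Build one row per paragraph and one column per name, marked 'T' or 'F'."""
    df = pd.DataFrame({'paragraph': paragraphs})
    for name in names:
        # regex=False: treat the name as a literal substring
        found = df['paragraph'].str.contains(name, regex=False)
        df[name] = found.map({True: 'T', False: 'F'})
    return df
```

A typical run would be `to_flag_table(load_paragraphs())`, after which the name columns can be exported with `DataFrame.to_csv` for SPSS Modeler.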

The format after processing is as follows:

3. Data visualization

We imported the processed data into SPSS Modeler for further data mining. Some of the results are shown below:

Number of paragraphs in which Wei Xiaobao appears

After preprocessing there are 7,880 paragraph records in total, and the keyword "Wei Xiaobao" appears in 4,981 of them, i.e. 63.21%.
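The percentage is the number of paragraphs flagged T for "Wei Xiaobao" divided by the total number of paragraphs. A small sketch, assuming the flags are available as a list of 'T'/'F' strings (one column of the T/F table from the cleaning step); the helper name `appearance_share` is hypothetical:

```python
def appearance_share(flags):
    """Return (number of 'T' flags, percentage of all records rounded to 2 decimals)."""
    hits = sum(1 for flag in flags if flag == 'T')
    return hits, round(100 * hits / len(flags), 2)


# Reproduce the figures in the text: 4,981 'T' flags out of 7,880 paragraphs
flags = ['T'] * 4981 + ['F'] * (7880 - 4981)
print(appearance_share(flags))  # (4981, 63.21)
```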

Intimacy between Wei Xiaobao and his seven wives

The link diagram shows that Shuang'er has the highest intimacy score: she appears in the same paragraph as Wei Xiaobao 284 times.

Among the seven wives:

  • Strong links: Shuang'er, Shizhu, A'ke and Fang Yi;
  • Medium link: Su Quan;
  • Weak links: Princess Jianning and Zeng Rou.

Link relationship diagram between characters

We can also draw a link diagram covering all the characters.
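A link diagram like this is typically built from pairwise co-occurrence counts: each pair of characters mentioned in the same paragraph adds 1 to the weight of the link between them. As a sketch, assuming the novel is available as a list of paragraph strings and names are matched as plain substrings (`cooccurrence_links` is a hypothetical name):

```python
from collections import Counter
from itertools import combinations


def cooccurrence_links(paragraphs, names):
    """Return a Counter mapping (name_a, name_b) pairs to the number of shared paragraphs."""
    links = Counter()
    for paragraph in paragraphs:
        # Names found in this paragraph, kept in the order of `names`
        present = [name for name in names if name in paragraph]
        # Every pair of names found together gets one more co-occurrence
        for pair in combinations(present, 2):
            links[pair] += 1
    return links


paragraphs = ['韦小宝和双儿、阿珂', '双儿与阿珂', '韦小宝独自一人']
links = cooccurrence_links(paragraphs, ['韦小宝', '双儿', '阿珂'])
print(links[('双儿', '阿珂')])
```

The resulting counts can be fed to any graph-drawing tool as edge weights; the thicker the edge, the more often two characters share a paragraph.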

4. Apriori association analysis

Apriori is an algorithm commonly used to mine association rules: it finds sets of items that frequently occur together in a data set.

In Apriori, the support of an itemset is the proportion of records in the data set that contain all of its items, and the confidence of a rule measures how likely one item is to appear given that another appears, i.e. a conditional probability.
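In formula form, for a data set of N records (here, paragraphs) and itemsets A and B, where support(A ∪ B) is the share of records that contain every item of both A and B:

```latex
\text{support}(A) = \frac{\#\{\text{records containing } A\}}{N}
\qquad
\text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)} = P(B \mid A)
```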

The results below were obtained with a minimum condition (antecedent) support of 3% and a minimum confidence of 30%:
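For readers without SPSS Modeler, here is a minimal pure-Python sketch of textbook Apriori using the same 3% and 30% thresholds. It is an illustration, not what Modeler runs internally: here the minimum support applies to the whole itemset, whereas the setting above is a minimum condition (antecedent) support, so the rule lists may differ somewhat. Each transaction is assumed to be the set of names found in one paragraph, and the function name `apriori` is our own.

```python
from itertools import combinations


def apriori(transactions, min_support=0.03, min_confidence=0.30):
    """Textbook Apriori: frequent itemsets level by level, then single-consequent rules.

    transactions: list of sets of items (e.g. the names found in one paragraph).
    Returns a list of (antecedent, consequent, support, confidence) tuples.
    """
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items if support(frozenset([i])) >= min_support}
    frequent = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in frequent for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        frequent.update({c: support(c) for c in current})
        k += 1

    # Rules "antecedent => consequent" with a single consequent item
    rules = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for item in itemset:
            antecedent = itemset - {item}
            confidence = sup / frequent[antecedent]  # subsets of frequent sets are frequent
            if confidence >= min_confidence:
                rules.append((set(antecedent), item, sup, confidence))
    return rules


# Toy transactions for illustration only
paragraph_items = [{'韦小宝', '双儿'}, {'韦小宝', '双儿'}, {'韦小宝'}, {'双儿'}]
for antecedent, consequent, sup, conf in apriori(paragraph_items):
    print(antecedent, '=>', consequent, f'support={sup:.1%}', f'confidence={conf:.1%}')
```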

The results show that when Shuang'er appears, there is a 79.77% probability that Wei Xiaobao appears in the same paragraph, and paragraphs containing Shuang'er make up 4.5% of the data set.
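As a quick consistency check on these figures: the share of paragraphs containing both Shuang'er and Wei Xiaobao equals the share containing Shuang'er times the confidence, which agrees, up to the rounding of 4.5%, with the 284 co-occurrences out of 7,880 paragraphs reported in the visualization section:

```latex
P(\text{Shuang'er} \wedge \text{Wei Xiaobao})
  = P(\text{Shuang'er}) \times P(\text{Wei Xiaobao} \mid \text{Shuang'er})
  \approx 4.5\% \times 79.77\% \approx 3.59\%,
\qquad
\frac{284}{7880} \approx 3.60\%
```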

Conclusion

That's all for today. One last thought: with so many classic versions before it, any remake will inevitably be compared with them. That is exactly why its makers should think about what new highlights they can bring, instead of blindly imitating earlier versions and ending up with a cheap-looking copy.

A classic becomes a classic because it is hard to surpass. I still hope to see better domestic remakes in the future.

 


Origin blog.csdn.net/yoggieCDA/article/details/109989646