WeChat friend data analysis based on Python

  WeChat recently received an important update that lets users customize the Discover page. I don't know when it started, but WeChat Moments has become more and more complicated. When more and more people choose to "show only the last three days of Moments", even the WeChat team is probably helpless. As friend relationships have broadened, WeChat has gradually shifted from socializing with acquaintances to socializing with strangers, and the status updates in Moments have become half real and half illusory, as if everyone were trying to prove that they are "interesting". Some people record the details of their lives in Moments, and others air their opinions there. In the final analysis, people are always peeking into other people's lives while fearing that others will learn too much about their own. The light and darkness intertwined in human nature are like hedgehogs covered in spines: too far apart and they feel cold, too close and they are afraid of being pricked. Moments is like visiting relatives during Chinese New Year. Even with ten thousand grievances in your heart, you are reluctant to fall out openly, so you mute each other, or hide your posts from them, or show only the last three days. The circle of friends keeps growing, but there is no longer a "little red dot" that can really touch your heart. Humans make a product more and more complicated and then complain that it cannot meet human needs. This was probably unexpected from the very beginning!

introduction

  Some people say that human nature is far more complex than computer programs, because even the computer, humanity's greatest invention to date, is at a loss when confronted with human natural language. So much of human language is ambiguous. I think language is the greatest misunderstanding of humankind. People like to speculate about the hidden meaning behind words; guessing a girlfriend's true intentions is a case in point. In Mr. Jin Yong's martial arts novel "The Legend of the Condor Heroes", set in the Southern Song Dynasty when information travelled poorly, a few words of nonsense from Qiu Qianzhang turned the entire martial arts world upside down. Wouldn't it have been better to clear things up in a sentence or two? Of the many entanglements among Huang Yaoshi, the Seven Masters of Quanzhen, and the Six Eccentrics of Jiangnan, which was not a misunderstanding? So many martial arts masters renowned through the ages lacked the ability to sift the false from the true, and language caused all those misunderstandings.

  But even if human language is as inscrutable as a wordless book, humans still find clues in it. In ancient times, King Wen "elaborated the Zhouyi while imprisoned", and Dongfang Shuo told fortunes from characters and divination. This kind of primitive worship, tinged with "superstition", is just like people's faith in horoscopes today. Over thousands of years of evolution, human beings have continuously summarized their experience, a process not unlike training and obtaining results. Seen this way, isn't our artificial intelligence just a more scientific "superstition"? Data and algorithms keep convincing us that all of this is true. We who live in the digital age are undoubtedly unfortunate: while trying to hide our true selves from others, we lament that there is nowhere to hide. Every digital nerve is closely connected to you and me. You can't expect any digital device to have true intelligence, yet every moment of your life is quietly reflected in data.

  Today's article analyzes WeChat friend data with Python. The dimensions selected are gender, avatar, signature, and location, and the results are presented mainly as charts and word clouds. For the text information, two methods are used: word frequency analysis and sentiment analysis. As the saying goes, a craftsman who wants to do good work must first sharpen his tools. Before we formally begin, here is a brief introduction to the third-party modules used in this article:
* itchat : A Python wrapper around the WeChat web interface, used in this article to obtain WeChat friend information.
* jieba : The Python version of the jieba ("stutter") Chinese word segmenter, used in this article to segment text information.
* matplotlib : A chart drawing module in Python, used in this article to draw bar charts and pie charts.
* snownlp : A Chinese text processing module in Python, used in this article to judge the sentiment of text information.
* PIL : The image processing module in Python, used in this article to process images.
* numpy : The numerical calculation module in Python, used together with the wordcloud module in this article.
* wordcloud : The word cloud module in Python, used in this article to draw word cloud pictures.
* TencentYoutuyun : The Python SDK provided by Tencent Youtu, used in this article to recognize faces and extract image tag information.
The above modules can be installed through pip. For detailed instructions on the use of each module, please refer to their respective documentation.

data analysis

  The premise of analyzing WeChat friend data is to obtain friend information. By using the itchat module, all this will become very simple. We can achieve it with the following two lines of code:

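import itchat

itchat.auto_login(hotReload=True)          # log in by QR code; hotReload reuses a cached session
friends = itchat.get_friends(update=True)  # list of dicts; friends[0] is the current user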

  Just as with the web version of WeChat, we log in by scanning a QR code with a mobile phone. The friends object returned here is a list whose first element is the current user. Therefore, in the following analysis we always take friends[1:] as the raw input, and each element in the list is a dictionary. Taking my own record as an example, you can see that it contains the fields Sex, City, Province, HeadImgUrl, and Signature. The analysis below starts from these fields, which cover the four dimensions of gender, location, avatar, and signature:

Friend information structure display
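
  To make the structure concrete, here is a minimal sketch built on made-up records that mimic the fields named above. Every value is invented for illustration, the helper name pick_fields is hypothetical, and real records returned by itchat carry many more keys than these.

```python
# Hypothetical records shaped like the dictionaries described above.
# All values are invented; only the field names come from the article.
sample_friends = [
    {"NickName": "me", "Sex": 1, "Province": "Shaanxi", "City": "Xi'an",
     "HeadImgUrl": "", "Signature": ""},  # element 0: the logged-in user
    {"NickName": "friend_a", "Sex": 2, "Province": "Ningxia", "City": "Yinchuan",
     "HeadImgUrl": "", "Signature": "Keep going"},
    {"NickName": "friend_b", "Sex": 0, "Province": "", "City": "",
     "HeadImgUrl": "", "Signature": ""},
]

FIELDS = ("Sex", "Province", "City", "HeadImgUrl", "Signature")


def pick_fields(friends, fields=FIELDS):
    """Skip the logged-in user and keep only the fields analysed in this article."""
    return [{name: friend.get(name, "") for name in fields} for friend in friends[1:]]


rows = pick_fields(sample_friends)
print(len(rows))        # 2
print(rows[0]["City"])  # Yinchuan
```

  Using friend.get() with a default is a defensive choice for the sketch, so that a record without one of the keys does not raise a KeyError.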

friend gender

  To analyze the gender of friends, we first need the gender information of every friend. Here we extract the Sex field from each friend's record and count the numbers of Male, Female, and Unknown. Once these three values are assembled into a list, the matplotlib module can draw them as a pie chart. The code is as follows:

from collections import Counter
import matplotlib.pyplot as plt

def analyseSex(friends):
    # Collect the Sex code of every friend, skipping the logged-in user
    sexs = list(map(lambda x:x['Sex'],friends[1:]))
    # Look the counts up by key so they always line up with the labels below
    counter = Counter(sexs)
    counts = [counter[0], counter[1], counter[2]]
    labels = ['Unknown','Male','Female']
    colors = ['red','yellowgreen','lightskyblue']
    plt.figure(figsize=(8,5), dpi=80)
    plt.axes(aspect=1) 
    plt.pie(counts, # gender counts
            labels=labels, # label shown for each gender
            colors=colors, # color of each wedge
            labeldistance = 1.1, # distance of the labels from the center
            autopct = '%3.1f%%', # format of the percentage text
            shadow = False, # whether to draw a shadow
            startangle = 90, # starting angle of the pie
            pctdistance = 0.6 # distance of the percentage text from the center
    )
    plt.legend(loc='upper right',)
    plt.title(u"Gender composition of %s's WeChat friends" % friends[0]['NickName'])
    plt.show()

  Here is a brief explanation of this code. The gender field in WeChat takes the values Unknown, Male, and Female, encoded as 0, 1, and 2 respectively. The three values are counted with Counter() from the collections module. Counter.items() returns (key, count) tuples, but in the order in which the keys were first seen rather than sorted by key, so the code looks up the counts for 0, 1, and 2 explicitly to keep them aligned with the labels. We pass these counts to matplotlib, which calculates the percentages of the three values itself. The following figure is the gender distribution of friends drawn by matplotlib:
Gender analysis of WeChat friends
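
  One detail of the counting step deserves care: a Counter keeps its keys in first-seen order, not sorted order, so the counts are safest when looked up by key. The following is a minimal sketch of that idea on made-up input; the function name count_by_sex is hypothetical.

```python
from collections import Counter


def count_by_sex(sex_values):
    """Return counts in the fixed order [Unknown, Male, Female] (codes 0, 1, 2)."""
    counter = Counter(sex_values)
    # Index by key instead of relying on the order of counter.items():
    # a Counter keeps first-seen order, and a missing key simply counts as 0.
    return [counter[code] for code in (0, 1, 2)]


# Made-up input in which code 2 is seen first and code 0 never appears.
print(count_by_sex([2, 1, 1, 2, 1]))  # [0, 3, 2]
```

  With this lookup, the list passed to the pie chart always matches the label order, even when one of the three values never occurs among the friends.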

  Seeing this result, I am not surprised at all. The ratio of men to women is seriously unbalanced. Although this may explain why I am single, I don't think the problem can be solved by adjusting that ratio. Many people think they are single because their social circle is small. So can you stop being single by expanding your social circle? I think it may raise the odds, but the god of luck is unlikely to favor me, because my good luck ran out long before I turned 24. There is a hot question on Zhihu: "Are men generally no longer pursuing women?" In fact, who would actually like to be alone? It is nothing more than the fear of being disappointed again and again. Some people are not my flower; I only happened to pass by while she was in bloom. Someone once said that I was a passionate person, but she will never know that every decision I made was both fiery and tragic. As the saying goes, "Extreme wisdom invites harm, and deep love does not last; a modest gentleman is as gentle as jade." The sufferings and poisons of this world are probably just like this.

friend avatar

  Friends' avatars can be analyzed from two angles: first, what proportion of friends use a face as their avatar; second, what valuable keywords can be extracted from the avatars. To do this, we download each avatar locally based on the HeadImgUrl field, and then use the face recognition API provided by Tencent Youtu to detect whether the image contains a face and to extract the tags in the image. The former is a simple count, which we present as a pie chart; the latter is text analysis, which we present as a word cloud. The key code is as follows:

import os
import time

import itchat
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from wordcloud import WordCloud

def analyseHeadImage(friends):
    # Init Path: avatars are stored in ./HeadImages
    baseFolder = os.path.join(os.path.abspath('.'), 'HeadImages')
    if not os.path.exists(baseFolder):
        os.makedirs(baseFolder)

    # Analyse Images (FaceAPI is the author's wrapper around the Tencent Youtu SDK)
    faceApi = FaceAPI()
    use_face = 0
    not_use_face = 0
    image_tags = ''
    for index in range(1,len(friends)):
        friend = friends[index]
        # Save HeadImages, downloading only the avatars not yet on disk
        imgFile = os.path.join(baseFolder, 'Image%s.jpg' % str(index))
        if not os.path.exists(imgFile):
            imgData = itchat.get_head_img(userName = friend['UserName'])
            with open(imgFile,'wb') as file:
                file.write(imgData)

        # Detect Faces
        time.sleep(1)
        result = faceApi.detectFace(imgFile)
        if result == True:
            use_face += 1
        else:
            not_use_face += 1 

        # Extract Tags (trailing comma keeps tags of adjacent avatars apart)
        result = faceApi.extractTags(imgFile)
        image_tags += ','.join(list(map(lambda x:x['tag_name'],result))) + ','

    labels = [u'Uses a face avatar',u'Does not use a face avatar']
    counts = [use_face,not_use_face]
    colors = ['red','yellowgreen']
    plt.figure(figsize=(8,5), dpi=80)
    plt.axes(aspect=1) 
    plt.pie(counts, # face / non-face avatar counts
            labels=labels, # label shown for each group
            colors=colors, # color of each wedge
            labeldistance = 1.1, # distance of the labels from the center
            autopct = '%3.1f%%', # format of the percentage text
            shadow = False, # whether to draw a shadow
            startangle = 90, # starting angle of the pie
            pctdistance = 0.6 # distance of the percentage text from the center
    )
    plt.legend(loc='upper right',)
    plt.title(u"Face avatar usage among %s's WeChat friends" % friends[0]['NickName'])
    plt.show() 

    # Re-decode the tag text from ISO-8859-1 bytes to UTF-8
    image_tags = image_tags.encode('iso8859-1').decode('utf-8')
    # face.jpg is used as the mask that shapes the word cloud
    back_coloring = np.array(Image.open('face.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring, 
        max_font_size=75,
        random_state=45,
        width=800, 
        height=480, 
        margin=15
    )

    wordcloud.generate(image_tags)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()

  Here we create a new HeadImages directory under the current directory to store the avatars of all friends. We then use a class named FaceAPI, a wrapper around the Tencent Youtu SDK, and call two of its API interfaces: face detection and image tag recognition. The former counts the friends who "use a face avatar" and who "do not use a face avatar", and the latter accumulates the tags extracted from each avatar. The analysis results are shown in the following figure:
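
  Since the FaceAPI wrapper itself is not shown in this article, the counting and tag accumulation can only be sketched with stand-ins. In the sketch below, detect_face and extract_tags are hypothetical callables that play the role of the two API calls, and the file names and tags are invented.

```python
def summarise_avatars(image_files, detect_face, extract_tags):
    """Count face / non-face avatars and collect every tag name.

    detect_face(path) -> bool and extract_tags(path) -> list of dicts with a
    'tag_name' key are hypothetical callables standing in for the FaceAPI wrapper.
    """
    use_face = 0
    tags = []
    for path in image_files:
        if detect_face(path):
            use_face += 1
        tags.extend(item["tag_name"] for item in extract_tags(path))
    return use_face, len(image_files) - use_face, ",".join(tags)


# Stub detectors with invented behaviour, for illustration only.
fake_faces = {"Image1.jpg"}
fake_tags = {
    "Image1.jpg": [{"tag_name": "girl"}],
    "Image2.jpg": [{"tag_name": "sky"}, {"tag_name": "sea"}],
}

result = summarise_avatars(
    ["Image1.jpg", "Image2.jpg"],
    detect_face=lambda path: path in fake_faces,
    extract_tags=lambda path: fake_tags.get(path, []),
)
print(result)  # (1, 1, 'girl,sky,sea')
```

  Collecting the tags in a list and joining once at the end keeps the tags of neighbouring avatars separated, which matters when the joined string is later fed to the word cloud.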

  It can be seen that about 1/4 of all my WeChat friends use face avatars, while nearly 3/4 do not. This suggests that friends who are confident in their "looks" make up only 25% of the total, or that 75% of my WeChat friends keep a low profile and prefer not to use their faces as avatars. Does this mean that "good-looking skins" are not all alike after all, and that good-looking people really are a minority within a minority? When girls' makeup converges on "Korean semi-permanent thick straight eyebrows", "melon-seed faces", and "bright red lips", and boys' style converges on "slicked-back hair", "turtleneck sweaters", and "long coats", can we really have a personality of our own for once? Too many things in life are held hostage by convention: we have to be different from others and, at the same time, the same as most people. This is one of life's frustrations. Considering that Tencent Youtu cannot truly recognize a "face", we also extract the tags in friends' avatars to see which keywords they contain. The analysis results are shown in the figure:
WeChat friend avatar tag word cloud display

  From the word cloud, we can see that in the avatar tag word cloud of my WeChat friends, the keywords with relatively high frequency are: girl, tree, house, text, screenshot, cartoon, group photo, sky, and sea. This suggests that my friends' avatars come mainly from four sources: daily life, travel, scenery, and screenshots; that the dominant style is cartoon; and that the common elements are sky, sea, houses, and trees. Looking through all the avatars by hand, I found that 15 friends use personal photos, 53 use pictures from the internet, 25 use anime pictures, 3 use group photos, 5 use photos of children, 13 use landscape pictures, and 18 use photos of girls, which is broadly in line with the results of the image tag extraction.

friend signature

  Next we analyze friend signatures. The signature is the richest piece of text in a friend's information. Following the "labeling" approach that people are used to, a signature can reveal a person's state over a certain period, just as people laugh when they are happy and cry when they are sad: crying and laughing are two labels that indicate sadness and happiness respectively. Here we process the signatures in two ways. The first is to segment them with jieba and generate a word cloud, in order to learn which keywords appear in friends' signatures and which appear relatively often. The second is to use SnowNLP to analyze the sentiment of the signatures, that is, whether they are positive, negative, or neutral overall, and in what proportions. Both are based on the Signature field. The core code is as follows:

import re

import jieba.analyse
import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from snownlp import SnowNLP
from wordcloud import WordCloud

def analyseSignature(friends):
    signatures = ''
    emotions = []
    # Skip friends[0], which is the logged-in user
    for friend in friends[1:]:
        signature = friend['Signature']
        if signature is not None:
            # Strip the leftovers of emoji markup
            signature = signature.strip().replace('span', '').replace('class', '').replace('emoji', '')
            # Drop emoji codes such as 1f33b, together with whatever follows them
            signature = re.sub(r'1f(\d.+)','',signature)
            if len(signature) > 0:
                nlp = SnowNLP(signature)
                emotions.append(nlp.sentiments)  # score between 0 and 1
                # Keep the top 5 keywords; the trailing space separates friends
                signatures += ' '.join(jieba.analyse.extract_tags(signature,5)) + ' '
    with open('signatures.txt','wt',encoding='utf-8') as file:
        file.write(signatures)

    # Signature WordCloud
    back_coloring = np.array(Image.open('flower.jpg'))
    wordcloud = WordCloud(
        font_path='simfang.ttf',
        background_color="white",
        max_words=1200,
        mask=back_coloring, 
        max_font_size=75,
        random_state=45,
        width=960, 
        height=720, 
        margin=15
    )

    wordcloud.generate(signatures)
    plt.imshow(wordcloud)
    plt.axis("off")
    plt.show()
    wordcloud.to_file('signatures.jpg')

    # Signature Emotional Judgment
    count_good = len(list(filter(lambda x:x>0.66,emotions)))
    count_normal = len(list(filter(lambda x:x>=0.33 and x<=0.66,emotions)))
    count_bad = len(list(filter(lambda x:x<0.33,emotions)))
    labels = [u'Negative',u'Neutral',u'Positive']
    values = (count_bad,count_normal,count_good)
    # Use a font with Chinese glyphs so that Chinese nicknames render in the title
    plt.rcParams['font.sans-serif'] = ['simHei'] 
    plt.rcParams['axes.unicode_minus'] = False
    plt.xlabel(u'Sentiment')
    plt.ylabel(u'Frequency')
    plt.xticks(range(3),labels)
    # One color per bar; a single 'rgb' string is not a valid matplotlib color
    plt.bar(range(3), values, color = ['r','g','b'])
    plt.title(u"Sentiment analysis of %s's WeChat friends' signatures" % friends[0]['NickName'])
    plt.show()
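
  The cleaning step in the code above removes emoji leftovers with chained replace() calls and a greedy pattern, which can also swallow text that follows an emoji. As a hedged alternative, here is a minimal sketch that assumes emoji arrive as spans such as <span class="emoji emoji1f33b"></span>. That markup is a guess based on the strings the code strips, not something confirmed by the article, and clean_signature is a hypothetical helper.

```python
import re

# Assumed markup: emoji are taken to arrive as spans such as
# <span class="emoji emoji1f33b"></span>. This pattern is a guess at that format.
EMOJI_SPAN = re.compile(r'<span class="emoji emoji[0-9a-fA-F]+"></span>')


def clean_signature(signature):
    """Remove emoji spans and collapse whitespace; return '' for None."""
    if signature is None:
        return ""
    without_emoji = EMOJI_SPAN.sub(" ", signature)
    return " ".join(without_emoji.split())


print(clean_signature('Keep going <span class="emoji emoji1f33b"></span> every day'))
# Keep going every day
```

  Matching the whole span leaves the words after an emoji intact, so they still reach the word segmentation and sentiment steps.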

  From the word cloud, we can see that in my WeChat friends' signatures, the keywords with relatively high frequency are: hard work, growing up, beauty, joy, living, happiness, life, distance, time, and walking. Sure enough, my WeChat friends are all warm and upright young people! :smile: In fact, a signature reflects a state of mind to some extent. When people are young, they can't help "feigning sorrow for the sake of a new verse". Once circumstances have changed, this is perhaps why we are reluctant to let others know our past: as we grow up, there are momentary states we can no longer bear to look at. Qzone accompanied our generation through our entire youth, and its memorable "On This Day" feature sometimes lets us feel the warmth of memories and sometimes the cruelty of the years. The sense that things remain but people have changed in "at the time it all seemed ordinary", the calm composure of "looking back at the bleak road I came by", the loss and melancholy of "what night is tonight"... all left marks, deep or shallow, in these single-line signatures. There is a discussion about signatures on Zhihu, and interested readers may wish to take a look. :smile:

WeChat friend signature information word cloud display

  From the bar chart, we can see that in my WeChat friends' signatures, positive sentiment accounts for about 55.56%, neutral sentiment for about 32.10%, and negative sentiment for about 12.35%. This is broadly consistent with what the word cloud showed: about 87.66% of the signatures (positive plus neutral) convey an attitude that is not negative. There are basically two types of users in Moments: the first type uses Moments to record their own lives, and the second uses it to broadcast their opinions. Clearly, the second type does not mind others knowing about their past; they care more about whether their opinions have been consistent from beginning to end. So whether others in Moments are posting food, travel, displays of affection, baby photos, or chicken-soup platitudes, in my view it is simply a way of life. If people whose spiritual and material level is higher than yours find the content of your Moments "boring", that is in line with how human cognition has always worked. In most cases, it is people at about your own level who pass rash judgment on people or things they are unfamiliar with. If you don't like the content in my Moments, please just mute me, because then we can still be friends; but if you tell me that B is no good merely because you like A, then our values really do not match. I believe no two people have perfectly matching interests, not even close friends of the opposite sex or lovers. In a word, sincerity and mutual respect are the basic requirements for getting along.

Sentiment analysis display of WeChat friends' signature information
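
  The percentages quoted above come from bucketing each sentiment score with the 0.33 and 0.66 thresholds and dividing by the number of scored signatures. A minimal sketch of that arithmetic follows; the function name bucket_sentiments is hypothetical and the scores are invented, not the author's data.

```python
def bucket_sentiments(scores, low=0.33, high=0.66):
    """Split scores in [0, 1] into negative / neutral / positive percentage shares."""
    negative = sum(1 for s in scores if s < low)
    positive = sum(1 for s in scores if s > high)
    neutral = len(scores) - negative - positive
    total = len(scores) or 1  # avoid dividing by zero for an empty list
    return {
        name: round(100 * count / total, 2)
        for name, count in (
            ("negative", negative),
            ("neutral", neutral),
            ("positive", positive),
        )
    }


# Invented scores, not the author's data.
print(bucket_sentiments([0.9, 0.8, 0.5, 0.1]))
# {'negative': 25.0, 'neutral': 25.0, 'positive': 50.0}
```

  Computing the neutral bucket as the remainder guarantees that the three shares always add up to the whole, whatever the thresholds are.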

friend location

  The location of friends is analyzed mainly by extracting the two fields Province and City. Map visualization in Python is mostly done with the Basemap module, but this module needs to download map data from overseas websites, which is very inconvenient. Baidu's ECharts is widely used on the front end, and the community provides the pyecharts project, but I noticed that, because of policy changes, ECharts no longer supports exporting maps, so customizing maps remains a problem. The mainstream technical solution is to configure JSON data for the provinces and cities of the whole country. Here, the blogger uses the BDP personal edition, a zero-programming solution: we export a CSV file with Python and upload it to BDP to build a visual map. It couldn't be easier. Here we show only the code that generates the CSV:

import csv

def analyseLocation(friends):
    headers = ['NickName','Province','City']
    # newline='' lets the csv module control the line endings
    with open('location.csv','w',encoding='utf-8',newline='') as csvFile:
        writer = csv.DictWriter(csvFile, headers)
        writer.writeheader()
        # Skip friends[0], which is the logged-in user
        for friend in friends[1:]:
            row = {}
            row['NickName'] = friend['NickName']
            row['Province'] = friend['Province']
            row['City'] = friend['City']
            writer.writerow(row)
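
  Before uploading the file, a quick tally per province can serve as a sanity check on the export. This is only a sketch: it reads CSV text in the same NickName/Province/City layout, the rows are invented, and count_by_province is a hypothetical helper that is not part of the original workflow.

```python
import csv
import io
from collections import Counter


def count_by_province(csv_text):
    """Count rows per Province, labelling empty values as 'Unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return Counter(row["Province"] or "Unknown" for row in reader)


# Invented rows in the same NickName/Province/City layout as location.csv.
sample = (
    "NickName,Province,City\n"
    "a,Shaanxi,Xi'an\n"
    "b,Ningxia,Yinchuan\n"
    "c,Shaanxi,\n"
    "d,,\n"
)
print(count_by_province(sample).most_common())
# [('Shaanxi', 2), ('Ningxia', 1), ('Unknown', 1)]
```

  Friends who left their location blank end up in the 'Unknown' bucket here, which makes it easy to see how much of the data a map would silently leave out.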

  The following picture is the geographical distribution of my WeChat friends generated in BDP. It shows that my WeChat friends are concentrated mainly in two provinces, Ningxia and Shaanxi. The nerves of the digital age reach everyone in the chain of social relationships, and the privacy we want to protect shows through in this data bit by bit. Humans may be able to keep disguising themselves, but the patterns and connections extracted from data do not lie. Mathematics was once called the most useless subject, because daily life has no need for such sacred and pure calculation, and in every discipline empirical formulas are used more often than theoretical ones. But at this moment, you see, the world is like a ticking clock in which every minute and every second fits together precisely.
Geographical distribution map of WeChat friends

Summary of this article

  When I set out to write this article, I did not know where to begin, because WeChat is a magical thing: it is a truly national app, so its product design has always been an interesting phenomenon. A series of design details, such as the number of tabs, the name of each tab, the customization of the "Discover" page, the entry point for mini programs, the entry point for Moments, and comments in Moments, all deserve to be studied through the lens of human nature and psychology. Even Zhang Xiaolong, whom people have all but deified, is often helpless when facing the most complex user base in China, and some features remain unsatisfactory. When any ecosystem faces a huge user base, adding or removing features becomes a problem. As the saying goes, "when the forest is big, it has every kind of bird." Zhihu faces the same problem. The trouble with marketing-driven official accounts is that, while constantly feeding on hot social topics, they steer the values of one group of followers after another. Humans always long for others to understand them, but do humans really understand themselves? This post is another attempt at data analysis. It analyzes WeChat friends along four dimensions: gender, avatar, signature, and location, and presents the results as charts and word clouds. In a word, "data visualization is a means, not an end." What matters is not that we produced these pictures, but the phenomena they reflect and the insight we can draw from them. A friend of mine asked me why I want to scrape everything. Why? Because I don't understand human beings!
