[Python Treasure Box] Delving into the ocean of social media data: Python tools unlock the door to analysis

The Big Secret of Social Media Data: A Complete Analysis of Python Tools and Techniques

Preface

In the digital age, social media links the world, and a deep understanding of this vast and complex network is key to interpreting current trends and user behavior. This article explores a series of powerful Python tools and techniques to help you harness social media data and reveal the insights beneath it.


1. Tweepy

1.1 API authentication and basic usage

Tweepy is a Python library for accessing the Twitter API. First, obtain the API keys and access tokens from a Twitter developer account, then authenticate and run a basic demonstration:

import tweepy

# API authentication
consumer_key = 'Your_Consumer_Key'
consumer_secret = 'Your_Consumer_Secret'
access_token = 'Your_Access_Token'
access_token_secret = 'Your_Access_Token_Secret'

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

api = tweepy.API(auth)

# Fetch user information
user = api.get_user(screen_name='twitter_handle')
print(f"User: {user.screen_name}, Followers: {user.followers_count}")

# Post a tweet
api.update_status("Hello, Twitter API!")

This code demonstrates Tweepy's API authentication and basic usage, including obtaining user information and sending tweets.

1.2 Data collection and analysis techniques

Tweepy provides a variety of methods to collect and analyze Twitter data, including obtaining users' timelines, searching for tweets with specific keywords, etc. Here is a simple example:

# Fetch the user's timeline
tweets = api.user_timeline(screen_name='twitter_handle', count=10)

for tweet in tweets:
    print(f"{tweet.user.screen_name}: {tweet.text}")

# Search for a keyword (named search_tweets in Tweepy 4.x)
search_results = api.search(q='python', count=5)

for result in search_results:
    print(f"{result.user.screen_name}: {result.text}")

This code shows how to use Tweepy to get tweets from a user's timeline and search keywords.

1.3 Real-time data stream acquisition

Tweepy also supports real-time data streams: tweets can be processed as they are generated through a StreamListener subclass. Here is a simple example:

from tweepy.streaming import StreamListener
from tweepy import Stream

# Note: StreamListener is the Tweepy 3.x interface; Tweepy 4.x replaced it
# with subclasses of tweepy.Stream / tweepy.StreamingClient
class MyStreamListener(StreamListener):
    def on_status(self, status):
        print(f"{status.user.screen_name}: {status.text}")

# Create a Stream object and start the real-time stream
my_stream_listener = MyStreamListener()
my_stream = Stream(auth=api.auth, listener=my_stream_listener)

# Filter tweets containing the keyword 'python'
my_stream.filter(track=['python'])

With the above code, you can get tweets containing the keyword 'python' in real time. This demonstrates Tweepy's real-time data streaming capabilities.

1.4 Analyze user interactions and trends

Tweepy can not only fetch user information and tweets, but also analyze user interactions and follower trends. Here is sample code:

# Fetch the user's follower list
followers = api.followers(screen_name='twitter_handle', count=5)

print("Followers of twitter_handle:")
for follower in followers:
    print(follower.screen_name)

# Analyze the user's engagement
interactions = api.user_timeline(screen_name='twitter_handle', count=100)

likes_count = 0
retweets_count = 0

for tweet in interactions:
    likes_count += tweet.favorite_count
    retweets_count += tweet.retweet_count

print(f"Total Likes: {likes_count}, Total Retweets: {retweets_count}")

This code shows how to use Tweepy to get a user's follower list and analyze the user's interaction data, including likes and retweets.
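Once totals like these are in hand, simple per-tweet engagement metrics can be derived offline. Below is a minimal sketch; the `engagement_summary` helper and the sample (favorite_count, retweet_count) pairs are hypothetical, not part of Tweepy:

```python
def engagement_summary(tweets):
    """Summarize likes/retweets for a list of (likes, retweets) pairs."""
    total_likes = sum(likes for likes, _ in tweets)
    total_retweets = sum(retweets for _, retweets in tweets)
    n = len(tweets) or 1  # avoid division by zero on an empty timeline
    return {
        'total_likes': total_likes,
        'total_retweets': total_retweets,
        'avg_engagement': (total_likes + total_retweets) / n,
    }

# Hypothetical counts pulled from a 3-tweet timeline
sample = [(10, 2), (4, 1), (30, 8)]
print(engagement_summary(sample))
```

In practice the same pairs would come from `(tweet.favorite_count, tweet.retweet_count)` while iterating the timeline.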

1.5 Use Cursor to process large amounts of data

For processing large amounts of data, Tweepy provides Cursor to easily traverse the result set. Here is an example of getting all tweets from a user:

# Use Cursor to iterate over all of a user's tweets (it handles paging)
all_tweets = tweepy.Cursor(api.user_timeline, screen_name='twitter_handle').items()

for tweet in all_tweets:
    print(f"{tweet.user.screen_name}: {tweet.text}")

This code demonstrates how to use Tweepy's Cursor to obtain all tweets of a user, making it easier to process large amounts of data.

1.6 Data Visualization and Insights

Combining Tweepy with data visualization tools such as Matplotlib or Seaborn can present analysis results more intuitively. Here is a simple example:

import matplotlib.pyplot as plt

# User engagement totals from the analysis above
labels = ['Likes', 'Retweets']
counts = [likes_count, retweets_count]

plt.bar(labels, counts, color=['blue', 'green'])
plt.title('User Interaction Analysis')
plt.xlabel('Interaction Type')
plt.ylabel('Count')
plt.show()

This code demonstrates how to use Matplotlib to count user interaction data and perform simple data visualization.

2. python-twitter

2.1 Interface calling and permission configuration

python-twitter is another library for accessing the Twitter API. Before using it, you need to authenticate and configure permissions. The following shows the basic interface call and permission configuration:

import twitter

# API authentication
api = twitter.Api(
    consumer_key='Your_Consumer_Key',
    consumer_secret='Your_Consumer_Secret',
    access_token_key='Your_Access_Token',
    access_token_secret='Your_Access_Token_Secret'
)

# Fetch user information
user = api.GetUser(screen_name='twitter_handle')
print(f"User: {user.screen_name}, Followers: {user.followers_count}")

This code demonstrates python-twitter's API authentication and a basic interface call.

2.2 Obtaining user information and post data

python-twitter can be used to fetch user information and post data. Here is an example:

# Fetch user information
user = api.GetUser(screen_name='twitter_handle')
print(f"User: {user.screen_name}, Followers: {user.followers_count}")

# Fetch the user's posts
statuses = api.GetUserTimeline(screen_name='twitter_handle', count=5)

for status in statuses:
    print(f"{status.user.screen_name}: {status.text}")

This code shows how to use python-twitter to fetch user information and post data.

2.2.1 Data cleaning and processing techniques

The fetched post data often needs cleaning and processing. Here's a simple cleaning technique:

import re

# Clean tweet text: strip URLs
cleaned_tweets = [re.sub(r'http\S+', '', status.text) for status in statuses]

for tweet in cleaned_tweets:
    print(tweet)

This code demonstrates using regular expressions to clean tweet text and remove URLs.
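URLs are rarely the only noise; mentions and hashtag markers often need stripping too. Here is a hedged sketch extending the same regex approach (the `clean_tweet` helper and the sample string are illustrative, not part of python-twitter):

```python
import re

def clean_tweet(text):
    """Strip URLs and @mentions, drop '#' markers, and collapse whitespace."""
    text = re.sub(r'http\S+', '', text)  # URLs
    text = re.sub(r'@\w+', '', text)     # @mentions
    text = text.replace('#', '')         # keep hashtag words, drop the marker
    return re.sub(r'\s+', ' ', text).strip()

print(clean_tweet("Loving #python! cc @friend http://t.co/xyz"))
# Loving python! cc
```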

2.3 Publishing and interactive operations

python-twitter also supports posting tweets and interactive actions. Here's an example of tweeting and liking a tweet:

# Post a tweet
new_status = api.PostUpdate("Hello, python-twitter!")

# Like the tweet
api.CreateFavorite(status=new_status)

This code demonstrates python-twitter's basic posting and liking operations.

2.4 Processing media content

python-twitter supports processing media content, including uploading images and videos. Here is an example of uploading an image:

# Upload an image (UploadMediaChunked returns a media ID)
with open('path/to/image.jpg', 'rb') as file:
    media_id = api.UploadMediaChunked(file)

# Post a tweet with the image attached
api.PostUpdate("Check out this image!", media=media_id)

This code demonstrates how to upload an image with python-twitter and share it in a tweet.

2.5 Live Tweet Streaming

python-twitter also supports live tweet streams through Api.GetStreamFilter, which yields matching tweets as dictionaries. Here is an example of listening for live tweets containing a keyword:

# GetStreamFilter returns a generator of tweet dicts matching the filter
for tweet in api.GetStreamFilter(track=['python']):
    print(f"{tweet['user']['screen_name']}: {tweet['text']}")

With this code, you can get tweets containing the keyword 'python' in real time.

2.6 Advanced search and filtering

python-twitter provides rich search and filtering functions to meet different needs. Here is an example of an advanced search:

# Advanced search
search_results = api.GetSearch(
    term='python',
    lang='en',
    result_type='recent',
    count=5
)

for result in search_results:
    print(f"{result.user.screen_name}: {result.text}")

This code demonstrates how to perform an advanced search with python-twitter, including specifying the language and result type.

3. facebook-sdk

3.1 Authentication and permission management

To access the Facebook Graph API with facebook-sdk, you first need to authenticate and configure permissions. The following is a simple example:

import facebook

# App credentials (used when exchanging for a long-lived token)
app_id = 'Your_App_ID'
app_secret = 'Your_App_Secret'
user_access_token = 'User_Access_Token'

graph = facebook.GraphAPI(access_token=user_access_token, version='v14.0')

This code demonstrates how to authenticate and configure permissions with facebook-sdk.

3.2 Pulling user information and post data

facebook-sdk can be used to fetch user information and post data. Here is an example:

# Fetch user information
user_info = graph.get_object('me')
print(f"User: {user_info['name']}, ID: {user_info['id']}")

# Fetch the user's posts
user_posts = graph.get_connections('me', 'posts')

for post in user_posts['data']:
    # 'message' can be absent (e.g. photo-only posts), so use .get()
    print(f"{post['from']['name']}: {post.get('message', '')}")

This code shows how to fetch user information and post data with facebook-sdk.

3.3 Data analysis and insights

facebook-sdk can be combined with other data analysis tools for deeper insights. Here is a simple example:

import pandas as pd

# Convert the post data into a DataFrame
posts_df = pd.DataFrame(user_posts['data'])

# 'from' holds nested dicts, so reduce it to the poster's name before grouping
posts_df['from'] = posts_df['from'].apply(lambda f: f['name'])
post_analysis = posts_df.groupby('from')['message'].count().reset_index()
print(post_analysis)

This code demonstrates a simple analysis of post data fetched with facebook-sdk.
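The same per-author aggregation can be sketched without pandas. The post dicts below are hypothetical but mirror the Graph API's nested 'from' structure:

```python
from collections import Counter

# Hypothetical posts shaped like Graph API 'posts' entries
posts = [
    {'from': {'name': 'Alice'}, 'message': 'hi'},
    {'from': {'name': 'Bob'}, 'message': 'hey'},
    {'from': {'name': 'Alice'}, 'message': 'again'},
]

# Count posts per author name
post_counts = Counter(p['from']['name'] for p in posts)
print(post_counts)  # Counter({'Alice': 2, 'Bob': 1})
```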

3.4 Publishing and interactive operations

facebook-sdk supports publishing posts and interactive operations such as likes and comments. Here is an example of posting and liking:

# Publish a post
post_message = "Hello, Facebook Graph API!"
graph.put_object(parent_object='me', connection_name='feed', message=post_message)

# Fetch the ID of the latest post
last_post_id = graph.get_connections('me', 'posts')['data'][0]['id']

# Like the post
graph.put_like(object_id=last_post_id)

This code demonstrates facebook-sdk's basic publishing and liking operations.

3.5 Picture, video and file upload

facebook-sdk also supports uploading multimedia files, including pictures and videos. Here is an example of uploading an image:

# Upload an image
with open('path/to/image.jpg', 'rb') as photo:
    graph.put_photo(image=photo, message='Check out this photo!')

This code demonstrates how to upload images with facebook-sdk.

3.6 Data visualization and report generation

Combining facebook-sdk with data visualization tools lets you create attractive charts and reports. Here is a simple example:

import matplotlib.pyplot as plt

# Visualize the post data
post_analysis.plot(kind='bar', x='from', y='message', legend=False)
plt.title('User Post Analysis')
plt.xlabel('User')
plt.ylabel('Post Count')
plt.show()

This code demonstrates how to use Matplotlib to visualize post data.

3.7 Advanced functions and extensions

facebook-sdk provides many advanced features and extension options, including event management, ad operations, and more. Here is a simple example:

# Fetch the user's events
user_events = graph.get_connections('me', 'events')

for event in user_events['data']:
    print(f"Event Name: {event['name']}, Location: {event.get('location', 'N/A')}")

This code demonstrates how to fetch a user's event information with facebook-sdk.

4. Instaloader

4.1 Download of pictures, videos and posts

Instaloader is a tool for downloading Instagram data, supporting pictures, videos, and posts. Here is a simple example:

from instaloader import Instaloader, Profile

# Create an Instaloader object
loader = Instaloader()

# Fetch profile information
profile = Profile.from_username(loader.context, 'instagram_handle')
print(f"User: {profile.username}, Followers: {profile.followers}")

# Download the user's pictures and videos
loader.download_profile(profile.username, profile_pic_only=False)

This code demonstrates how to download an Instagram user's images and videos with Instaloader.

4.2 User information and interaction data extraction

Instaloader also supports extracting user information and interaction data. Here is an example:

# Fetch profile information
profile = Profile.from_username(loader.context, 'instagram_handle')
print(f"User: {profile.username}, Followers: {profile.followers}")

# Tally the user's interaction data
likes_count = 0
comments_count = 0

for post in profile.get_posts():
    likes_count += post.likes
    comments_count += post.comments

print(f"Total Likes: {likes_count}, Total Comments: {comments_count}")

This code demonstrates how to obtain Instagram user information and interaction data with Instaloader.

4.3 Data processing and presentation skills

Downloaded data can be processed and displayed with other libraries. The following is a simple matplotlib demonstration:

import matplotlib.pyplot as plt

# Count image vs. video posts (Profile has no video count, so iterate the posts)
video_count = sum(1 for post in profile.get_posts() if post.is_video)
post_types = ['Image', 'Video']
post_counts = [profile.mediacount - video_count, video_count]

plt.bar(post_types, post_counts, color=['blue', 'orange'])
plt.title('Post Type Distribution')
plt.xlabel('Post Type')
plt.ylabel('Count')
plt.show()

This code demonstrates how to use matplotlib to show the distribution of downloaded Instagram post types.

4.4 Analyze user activity patterns

Instaloader can not only download data, but also help analyze user activity patterns. Here is an example:

# Fetch the user's posts
posts = list(profile.get_posts())

# Count posts per month
monthly_post_count = {}
for post in posts:
    month_year = post.date.strftime("%Y-%m")
    monthly_post_count[month_year] = monthly_post_count.get(month_year, 0) + 1

# Plot the monthly post counts
months = list(monthly_post_count.keys())
post_counts = list(monthly_post_count.values())

plt.plot(months, post_counts, marker='o', linestyle='-')
plt.title('Monthly Post Count')
plt.xlabel('Month')
plt.ylabel('Post Count')
plt.xticks(rotation=45)
plt.show()

This code demonstrates how to fetch a user's posts with Instaloader and analyze the number of posts per month.
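The month-bucketing step can be verified offline without touching Instagram; a minimal sketch with hypothetical dates (the `monthly_counts` helper is illustrative):

```python
from datetime import date

def monthly_counts(dates):
    """Count items per 'YYYY-MM' bucket."""
    counts = {}
    for d in dates:
        key = d.strftime("%Y-%m")
        counts[key] = counts.get(key, 0) + 1
    return counts

sample = [date(2023, 1, 5), date(2023, 1, 20), date(2023, 2, 3)]
print(monthly_counts(sample))  # {'2023-01': 2, '2023-02': 1}
```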

4.5 Download multiple user data in batches

Instaloader supports batch downloading data for multiple users. The following is an example of batch downloading profile pictures:

users_to_download = ['user1', 'user2', 'user3']

for user in users_to_download:
    try:
        profile = Profile.from_username(loader.context, user)
        loader.download_profile(profile.username, profile_pic_only=True)
        print(f"Downloaded profile pictures for {profile.username}")
    except Exception as e:
        print(f"Error downloading data for {user}: {e}")

This code demonstrates how to batch-download profile pictures for multiple users with Instaloader.

4.6 Using a proxy for downloading

In some network environments, it may be necessary to use a proxy for downloading. Here is an example of using a proxy to download:

from instaloader import Instaloader

# Instaloader does not expose a public proxy parameter; a common workaround is
# to set proxies on the requests session it uses internally (a private
# attribute, so this may break across versions)
loader_with_proxy = Instaloader()
loader_with_proxy.context._session.proxies = {
    'http': 'http://your_proxy_here',
    'https': 'http://your_proxy_here',
}

# Download the user's pictures and videos through the proxy
loader_with_proxy.download_profile(profile.username, profile_pic_only=False)

This code demonstrates how to route Instaloader downloads through a proxy.

The above further extends Instaloader usage to user activity analysis, batch downloading, and proxy use.

5. SocialMediaMineR

5.1 Introduction to social media data mining tools

SocialMediaMineR is a tool for social media data mining that supports multiple platforms. Here is a brief introduction:

from SocialMediaMineR import SocialMediaMiner

# Create a SocialMediaMiner object
miner = SocialMediaMiner(api_key='Your_API_Key')

# Fetch tweets for a specific keyword on Twitter
tweets = miner.search_tweets(query='data mining', count=5)

for tweet in tweets:
    print(f"{tweet['user']['screen_name']}: {tweet['text']}")

This code demonstrates how to fetch tweets for a specific keyword on Twitter with SocialMediaMineR.

5.2 Data capture and analysis functions

SocialMediaMineR provides rich data capture and analysis functions, including user information, post data, and more. Here is an example:

# Fetch user information
user_info = miner.get_user_info(screen_name='twitter_handle')
print(f"User: {user_info['screen_name']}, Followers: {user_info['followers_count']}")

# Fetch the user's posts
user_posts = miner.get_user_posts(screen_name='twitter_handle', count=5)

for post in user_posts:
    print(f"{post['user']['screen_name']}: {post['text']}")

This code shows how to fetch user information and post data with SocialMediaMineR.

5.3 Data visualization and application cases

Data visualization is a powerful way to present social media data mined with SocialMediaMineR. Here is a simple example:

import matplotlib.pyplot as plt

# Tally tweet sources
source_counts = miner.count_tweet_sources(query='data mining', count=100)

plt.pie(source_counts.values(), labels=source_counts.keys(), autopct='%1.1f%%')
plt.title('Tweet Sources Distribution')
plt.show()

This code demonstrates how to tally tweet sources with SocialMediaMineR and visualize them.

5.4 Mining user relationship network

# Fetch the user's followers and followees
user_followers = miner.get_user_followers(screen_name='twitter_handle', count=10)
user_following = miner.get_user_following(screen_name='twitter_handle', count=10)

print(f"Followers: {user_followers}")
print(f"Following: {user_following}")
5.5 Sentiment Analysis and Topic Identification

# Run sentiment analysis over tweets
sentiment_analysis = miner.sentiment_analysis(query='data mining', count=50)

positive_tweets = sum(1 for sentiment in sentiment_analysis if sentiment == 'positive')
negative_tweets = sum(1 for sentiment in sentiment_analysis if sentiment == 'negative')

print(f"Positive Tweets: {positive_tweets}, Negative Tweets: {negative_tweets}")
5.6 Scheduled tasks and automation

from apscheduler.schedulers.blocking import BlockingScheduler

# Create a scheduler
scheduler = BlockingScheduler()

# Define the scheduled job
def scheduled_job():
    tweets = miner.search_tweets(query='automation', count=5)
    for tweet in tweets:
        print(f"{tweet['user']['screen_name']}: {tweet['text']}")

# Run the job once per day
scheduler.add_job(scheduled_job, 'interval', days=1)

# Start the scheduler (this call blocks)
scheduler.start()

The above further extends SocialMediaMineR usage to mining user relationship networks, sentiment analysis, and scheduled tasks and automation.

6. PRAW (Python Reddit API Wrapper)

6.1 Reddit API connection and usage method

PRAW is a Python package for accessing the Reddit API, supporting posts, comments, and other data. Here is the basic way to connect to and use the Reddit API:

import praw

# Reddit API authentication
reddit = praw.Reddit(
    client_id='Your_Client_ID',
    client_secret='Your_Client_Secret',
    user_agent='Your_User_Agent'
)

# Fetch hot posts from a specific subreddit
subreddit = reddit.subreddit('python')
hot_posts = subreddit.hot(limit=5)

for post in hot_posts:
    print(f"Title: {post.title}, Upvotes: {post.ups}")

This code demonstrates how to authenticate with the Reddit API using PRAW and fetch hot posts from a specific subreddit.

6.2 Post and comment data extraction

PRAW can be used to extract posts, comments, and related data. Here is an example:

# Fetch post information
post = reddit.submission(id='post_id')
print(f"Title: {post.title}, Comments: {post.num_comments}")

# Fetch the post's comments (replace_more expands "load more" stubs first)
post.comments.replace_more(limit=0)
comments = post.comments.list()

for comment in comments:
    # The author is None for deleted accounts
    author = comment.author.name if comment.author else '[deleted]'
    print(f"{author}: {comment.body}")

This code shows how to fetch post information and comment data with PRAW.

6.3 Reddit data analysis and visualization skills

Reddit data acquired with PRAW can be combined with other libraries for analysis and visualization. Here is a simple example:

import matplotlib.pyplot as plt

# Count link vs. self-text posts among the subreddit's hot submissions
# (Subreddit objects have no karma attributes, so tally posts instead)
post_type_counts = {'Link': 0, 'Text': 0}
for post in subreddit.hot(limit=50):
    post_type_counts['Text' if post.is_self else 'Link'] += 1

plt.bar(post_type_counts.keys(), post_type_counts.values(), color=['red', 'green'])
plt.title('Post Type Distribution')
plt.xlabel('Post Type')
plt.ylabel('Count')
plt.show()

This code demonstrates how to analyze and visualize the Reddit post type distribution with matplotlib.

6.4 User interaction analysis

# Fetch a user's recent posts and comments
user = reddit.redditor('username')
user_posts = list(user.submissions.new(limit=5))
user_comments = list(user.comments.new(limit=5))

print(f"User: {user.name}, Total Posts: {len(user_posts)}, Total Comments: {len(user_comments)}")
6.5 Explore trends across multiple Subreddits

# Define the list of subreddits
subreddits = ['python', 'datascience', 'machinelearning']

# Fetch the newest posts from each subreddit
subreddit_posts = {name: reddit.subreddit(name).new(limit=10) for name in subreddits}

for name, posts in subreddit_posts.items():
    print(f"{name} Posts:")
    for post in posts:
        print(f"  - {post.title}")

6.6 Reddit Bots and Automation

# Create a Reddit bot (script-app credentials)
reddit_bot = praw.Reddit(
    client_id='Bot_Client_ID',
    client_secret='Bot_Client_Secret',
    user_agent='Bot_User_Agent',
    username='Bot_Username',
    password='Bot_Password'
)

# Submit a post
subreddit = reddit_bot.subreddit('test')
subreddit.submit(title='Automated Post', selftext='This post was created by a bot.')

The above further extends PRAW usage to user interaction analysis, trend exploration across multiple subreddits, and Reddit bots and automation.

7. Facepy

7.1 How to use Facebook Graph API

Facepy is a Python library for accessing the Facebook Graph API, supporting user information, posts, and other data. Here is a simple example:

from facepy import GraphAPI

# Facebook Graph API authentication
access_token = 'Your_Access_Token'
graph = GraphAPI(access_token)

# Fetch user information
user_info = graph.get('me')
print(f"User: {user_info['name']}, ID: {user_info['id']}")

This code demonstrates how to authenticate with the Facebook Graph API using Facepy and fetch user information.

7.2 Data capture and analysis techniques

Facepy supports data capture and analysis, including fetching user posts. Here is an example:

# Fetch the user's posts
user_posts = graph.get('me/posts', limit=5)

for post in user_posts['data']:
    # 'message' can be absent for some post types, so use .get()
    print(f"{post['from']['name']}: {post.get('message', '')}")

This code shows how to fetch a user's post data with Facepy.

7.3 User interaction and content publishing operations

Facepy also supports user interaction and content publishing. Here is an example of posting and liking:

# Publish a post
new_post = graph.post('me/feed', message='Hello, Facebook Graph API!')

# Like the post
graph.post(f"{new_post['id']}/likes")

This code demonstrates Facepy's basic publishing and liking operations.

7.4 Get the user’s friend list

# Fetch the user's friend list
friends = graph.get('me/friends')

for friend in friends['data']:
    print(f"Friend: {friend['name']}, ID: {friend['id']}")
7.5 Analyze post interaction data

# Fetch a post's like and comment counts
post_id = 'post_id_here'
post_interactions = graph.get(f'{post_id}?fields=likes.summary(true),comments.summary(true)')

likes_count = post_interactions['likes']['summary']['total_count']
comments_count = post_interactions['comments']['summary']['total_count']

print(f"Likes: {likes_count}, Comments: {comments_count}")
7.6 Use data to analyze user relationships

# Build (friend, friend-of-friend) ID pairs from each friend's friend list
friends_and_friends_of_friends = []
for friend in friends['data']:
    friend_id = friend['id']
    friend_friends = graph.get(f'{friend_id}/friends')['data']
    friends_and_friends_of_friends.extend(
        (friend_id, friend_friend['id']) for friend_friend in friend_friends
    )

print("User and Friends of Friends:")
for pair in friends_and_friends_of_friends:
    print(pair)

The above further extends Facepy usage to fetching the user's friend list, analyzing post interaction data, and analyzing user relationships.
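Edge pairs like these can be turned into a simple adjacency structure for relationship analysis. A minimal pure-Python sketch with hypothetical user IDs (the `build_adjacency` helper is illustrative, not part of Facepy):

```python
from collections import defaultdict

def build_adjacency(edges):
    """Build an undirected adjacency map from (a, b) edge pairs."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj

# Hypothetical friend-of-friend edge pairs
edges = [('u1', 'u2'), ('u1', 'u3'), ('u2', 'u3'), ('u3', 'u4')]
adj = build_adjacency(edges)
print(sorted(adj['u3']))  # ['u1', 'u2', 'u4']
```

From here, standard graph metrics (degree, mutual friends) follow directly from the adjacency map.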

8. tweepy-streaming

8.1 Streaming data acquisition and processing

tweepy-streaming refers to Tweepy's streaming interface, used to process tweets as they are generated. Here is a simple example:

from tweepy.streaming import StreamListener
from tweepy import Stream

class MyStreamListener(StreamListener):
    def on_status(self, status):
        print(f"{status.user.screen_name}: {status.text}")

# Create a Stream object and start the real-time stream
my_stream_listener = MyStreamListener()
my_stream = Stream(auth=api.auth, listener=my_stream_listener)

# Filter tweets containing the keyword 'python'
my_stream.filter(track=['python'])

This code demonstrates how to process tweets containing the keyword 'python' in real time.

8.2 Real-time social media data analysis

Combining real-time data streaming and analysis tools enables real-time social media data analysis. Here is a simple example:

from collections import Counter

# Track keyword frequency across the live stream
keywords = ['data', 'analysis', 'python']  # example keywords
keyword_counter = Counter()

class MyStreamListener(StreamListener):
    def on_status(self, status):
        for keyword in keywords:
            if keyword.lower() in status.text.lower():
                keyword_counter[keyword] += 1

        print(f"Real-time Keyword Frequency: {keyword_counter}")

# Create a Stream object and start the real-time stream
my_stream_listener = MyStreamListener()
my_stream = Stream(auth=api.auth, listener=my_stream_listener)

# Filter the live stream on the keywords
my_stream.filter(track=keywords)

This code demonstrates how to use a real-time data stream to count the frequency of tweets containing keywords.
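The counting logic itself can be exercised without a live stream; here is a sketch over hypothetical tweet texts (`count_keywords` is an illustrative helper, not a Tweepy API):

```python
from collections import Counter

def count_keywords(texts, keywords):
    """Count how many texts mention each keyword (case-insensitive)."""
    counter = Counter()
    for text in texts:
        lowered = text.lower()
        for kw in keywords:
            if kw in lowered:
                counter[kw] += 1
    return counter

sample = ["Python analysis rocks", "Big DATA pipelines", "python + data = fun"]
print(count_keywords(sample, ['data', 'analysis', 'python']))
```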

8.3 Real-time sentiment analysis

from textblob import TextBlob

# Run sentiment analysis on each live tweet
class MyStreamListener(StreamListener):
    def on_status(self, status):
        polarity = TextBlob(status.text).sentiment.polarity
        sentiment = 'Positive' if polarity > 0 else 'Negative' if polarity < 0 else 'Neutral'
        print(f"{status.user.screen_name}: {status.text}, Sentiment: {sentiment}")

# Create a Stream object and start the real-time stream
my_stream_listener = MyStreamListener()
my_stream = Stream(auth=api.auth, listener=my_stream_listener)

# Filter the live stream
my_stream.filter(track=['data science', 'machine learning'])
8.4 Real-time data storage

import json

# Append each live tweet to a JSON Lines file
class MyStreamListener(StreamListener):
    def on_status(self, status):
        with open('real_time_tweets.json', 'a') as f:
            tweet_data = {
                'user': status.user.screen_name,
                'text': status.text,
                'created_at': str(status.created_at)
            }
            f.write(json.dumps(tweet_data) + '\n')

# Create a Stream object and start the real-time stream
my_stream_listener = MyStreamListener()
my_stream = Stream(auth=api.auth, listener=my_stream_listener)

# Filter the live stream
my_stream.filter(track=['python', 'programming'])

The above further extends tweepy-streaming usage to real-time sentiment analysis and real-time data storage.
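Records stored as JSON Lines can later be read back for batch analysis. A self-contained sketch using a temporary file (the field names mirror the stored tweet data; the sample records are hypothetical):

```python
import json
import tempfile

records = [
    {'user': 'alice', 'text': 'hello'},
    {'user': 'bob', 'text': 'world'},
]

# Write one JSON object per line
with tempfile.NamedTemporaryFile('w', suffix='.json', delete=False) as f:
    for rec in records:
        f.write(json.dumps(rec) + '\n')
    path = f.name

# Read the file back into Python objects
with open(path) as f:
    loaded = [json.loads(line) for line in f]

print(loaded[0]['user'])  # alice
```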

Summary

This article systematically introduces multiple social media analysis tools and provides readers with a foundation for in-depth learning. By learning these tools, readers can easily obtain social media data, analyze user behavior, perform real-time data stream processing, and present deep insights with the help of data visualization tools. This is of practical value to professionals engaged in marketing, public opinion analysis, social trend research and other fields, as well as learners interested in social media data mining.


Origin blog.csdn.net/qq_42531954/article/details/135278385