Dynamic topic modeling of Trump's tweets with BERTopic in Google Colab

We'll use dynamic topic modeling with BERTopic to visualize how the topics in Trump's tweets evolve over time, and then explore those topics in detail.

Dynamic topic modeling


Dynamic topic models can be used to analyze the evolution of topics over time for a collection of documents.

Install BERTopic

%%capture
!pip install bertopic

Data processing

import re
import pandas as pd
from datetime import datetime

# Load data
trump = pd.read_csv('https://drive.google.com/uc?export=download&id=1xRKHaP-QwACMydlDnyFPEaFdtskJuBa6')

# Clean the text: remove URLs and lowercase
trump.text = trump.apply(lambda row: re.sub(r"http\S+", "", row.text).lower(), 1)
# Drop @mentions
trump.text = trump.apply(lambda row: " ".join(filter(lambda x:x[0]!="@", row.text.split())), 1)
# Keep letters only and collapse whitespace
trump.text = trump.apply(lambda row: " ".join(re.sub("[^a-zA-Z]+", " ", row.text).split()), 1)
# Keep only non-empty tweets that are not retweets
trump = trump.loc[(trump.isRetweet == "f") & (trump.text != ""), :]
timestamps = trump.date.to_list()
tweets = trump.text.to_list()
tweets[0]
# Output: 'republicans and democrats have both created our economic problems'
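To make the three cleaning steps easier to follow, here is a minimal sketch that applies them to a single string. The helper name clean_tweet and the example tweet are made up for illustration; they are not part of the notebook or the dataset.

```python
import re

def clean_tweet(text: str) -> str:
    """Apply the notebook's three cleaning steps to one tweet."""
    # 1. Remove URLs and lowercase everything
    text = re.sub(r"http\S+", "", text).lower()
    # 2. Drop tokens that start with "@" (mentions)
    text = " ".join(tok for tok in text.split() if not tok.startswith("@"))
    # 3. Replace every run of non-letters with a space, collapse whitespace
    return " ".join(re.sub(r"[^a-zA-Z]+", " ", text).split())

# Made-up example tweet, not taken from the dataset
raw = "Thank you @FoxNews! Great poll numbers: https://t.co/abc123 #MAGA"
cleaned = clean_tweet(raw)
print(cleaned)
```

Note that a tweet consisting only of mentions and links becomes an empty string, which is why the notebook filters out empty texts afterwards.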

Basic topic model


To perform dynamic topic modeling with BERTopic, we first need to create a basic topic model using all tweets. The temporal aspect will be ignored as we are currently only interested in the topics in these tweets.

from bertopic import BERTopic
topic_model = BERTopic(min_topic_size=35, verbose=True)
topics, _ = topic_model.fit_transform(tweets)

We can then extract the most common topics: 

freq = topic_model.get_topic_info(); freq.head(10)

 

Topic -1 collects all outliers and should generally be ignored. Next, let's look at one of the frequent topics that was generated:

topic_model.get_topic(4)
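As a rough illustration of what the outlier topic means in practice, the sketch below counts topic assignments and sets the -1 bucket aside. The list used here is made up and only stands in for the per-document topic ids returned by fit_transform.

```python
from collections import Counter

# Made-up topic assignments, one per document; -1 marks an outlier
topics = [0, 1, -1, 0, 2, -1, 0, 1, -1, -1]

counts = Counter(topics)
n_outliers = counts.pop(-1, 0)            # set the outlier bucket aside
outlier_share = n_outliers / len(topics)  # fraction of unassigned documents
largest = counts.most_common(2)           # biggest real topics first

print(f"outliers: {n_outliers} ({outlier_share:.0%})")
print("largest topics:", largest)
```

A large outlier share is a hint that parameters such as min_topic_size may be worth revisiting before moving on.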

 

We can visualize the basic topics that were created with an inter-topic distance plot. This lets us judge visually whether the basic topics are good enough before we continue with topics over time.

fig = topic_model.visualize_topics(); fig

Topics over time


Before starting the dynamic topic modeling step, it is important that you are satisfied with the topics created previously. We will use these specific topics as the basis for dynamic topic modeling.

Therefore, this step will primarily show you how previously defined topics evolve over time.

 

Important parameters

docs


The tweets we are using.


timestamps


The timestamp of each tweet/document.


global_tuning


Whether to average the topic representation of a topic at time t with its global topic representation


evolution_tuning


Whether to average the topic representation of a topic at time t with the topic representation of that topic at time t-1


nr_bins


The number of bins to put the timestamps into. Extracting topics at thousands of distinct timestamps is computationally inefficient, so it is recommended to keep this value below 20.
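To give a feel for what binning does, here is a minimal sketch that maps timestamps onto equal-width intervals. It is only an illustration: the helper assign_bins, the example dates, and the "YYYY-MM-DD HH:MM:SS" format are assumptions, and BERTopic's own binning may differ in detail.

```python
from datetime import datetime

def assign_bins(timestamps, nr_bins):
    """Map each datetime to one of nr_bins equal-width intervals (0-based)."""
    start = min(timestamps)
    offsets = [(ts - start).total_seconds() for ts in timestamps]
    # Fall back to a width of 1.0 if all timestamps are identical
    width = max(offsets) / nr_bins or 1.0
    # The latest timestamp would land in bin nr_bins, so clamp it
    return [min(int(off / width), nr_bins - 1) for off in offsets]

# Made-up timestamps; the string format is an assumption
raw = ["2015-01-01 08:00:00", "2015-06-15 12:30:00",
       "2016-11-08 21:00:00", "2019-03-02 07:45:00",
       "2020-12-31 23:59:59"]
parsed = [datetime.strptime(r, "%Y-%m-%d %H:%M:%S") for r in raw]
bins = assign_bins(parsed, nr_bins=4)
print(bins)
```

With only a handful of bins, thousands of distinct timestamps collapse into a few groups, and a topic representation only has to be computed once per group.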

topics_over_time = topic_model.topics_over_time(docs=tweets, 
                                                timestamps=timestamps, 
                                                global_tuning=True, 
                                                evolution_tuning=True, 
                                                nr_bins=20)
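The two tuning flags in the call above both come down to averaging term-weight vectors. The sketch below shows the idea for a single topic with made-up weights. The order in which the two averages are applied is an assumption of this sketch, and any normalization BERTopic performs beforehand is left out.

```python
def average(a, b):
    """Element-wise mean of two equally long weight vectors."""
    return [(x + y) / 2.0 for x, y in zip(a, b)]

# Made-up term weights for one topic over the vocabulary
# ["china", "trade", "deal", "tariffs"]
global_repr = [0.40, 0.30, 0.20, 0.10]  # fitted on all tweets
repr_prev = [0.50, 0.30, 0.10, 0.10]    # time bin t-1
repr_now = [0.10, 0.20, 0.20, 0.50]     # time bin t, before tuning

# evolution_tuning: pull the bin-t weights toward bin t-1
tuned = average(repr_now, repr_prev)
# global_tuning: pull the result toward the global representation
tuned = average(tuned, global_repr)
print([round(w, 3) for w in tuned])
```

Both averages smooth the representation: evolution tuning keeps a topic from changing abruptly between neighboring bins, while global tuning keeps it recognizable as the same topic overall.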

Visualize topics over time


After creating topics_over_time, we need to visualize these topics, because inspecting them directly becomes harder once the time dimension is added.

To do this, we will plot the frequency of each topic over time. Doing so allows us to see how topics evolve. Make sure to hover over any point to see how the topic representation at time t differs from the global topic representation.

topic_model.visualize_topics_over_time(topics_over_time, top_n_topics=20)
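The plot is built on per-topic frequencies in each time bin. A minimal sketch of that counting step, using made-up (topic, bin) pairs rather than the real model output, could look like this:

```python
from collections import Counter

# Made-up (topic, time bin) pair for each document
assignments = [(0, 0), (0, 0), (1, 0), (0, 1), (1, 1), (1, 1), (1, 2)]

# Number of documents per topic within each time bin
frequency = Counter(assignments)

# One frequency series per topic, ordered by bin, as a line plot needs
n_bins = 3
series = {
    topic: [frequency[(topic, b)] for b in range(n_bins)]
    for topic in sorted({t for t, _ in assignments})
}
print(series)
```

Each list in series corresponds to one line in the chart: here topic 0 fades out over the three bins while topic 1 peaks in the middle bin.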

 

 

 


Origin blog.csdn.net/timberman666/article/details/132708070