We'll use dynamic topic modeling with BERTopic to explore and visualize how the topics in Trump's tweets evolve over time.
Dynamic topic models
Dynamic topic models can be used to analyze the evolution of topics over time for a collection of documents.
Install BERTopic
%%capture
!pip install bertopic
Data processing
import re
import pandas as pd
from datetime import datetime
# Load data
trump = pd.read_csv('https://drive.google.com/uc?export=download&id=1xRKHaP-QwACMydlDnyFPEaFdtskJuBa6')
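# Clean the text: strip URLs and lowercase
trump.text = trump.apply(lambda row: re.sub(r"http\S+", "", row.text).lower(), axis=1)
# Drop @mentions
trump.text = trump.apply(lambda row: " ".join(filter(lambda x: x[0] != "@", row.text.split())), axis=1)
# Keep letters only and collapse whitespace
trump.text = trump.apply(lambda row: " ".join(re.sub("[^a-zA-Z]+", " ", row.text).split()), axis=1)
# Remove retweets and tweets that are empty after cleaning
trump = trump.loc[(trump.isRetweet == "f") & (trump.text != ""), :]
# Parallel lists of timestamps and cleaned tweets
timestamps = trump.date.to_list()
tweets = trump.text.to_list()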
tweets[0]
'republicans and democrats have both created our economic problems'
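The three cleaning steps above can also be bundled into one helper, which makes them easy to check on a single string. This is only a minimal sketch: the name `clean_tweet` and the sample text are made up for illustration and are not part of the original notebook.

```python
import re

def clean_tweet(text):
    """Apply the same three cleaning steps used on the tweets above."""
    # 1. Remove URLs and lowercase the text
    text = re.sub(r"http\S+", "", text).lower()
    # 2. Drop every whitespace-separated token that starts with "@"
    text = " ".join(tok for tok in text.split() if not tok.startswith("@"))
    # 3. Replace runs of non-letters with a space and collapse whitespace
    return " ".join(re.sub(r"[^a-zA-Z]+", " ", text).split())

# Hypothetical sample tweet, not taken from the dataset
sample = "Thank you @FoxNews! Full interview: https://t.co/abc123 #MAGA 2020"
print(clean_tweet(sample))
```

Note that the last step also removes digits and hashtag symbols, so only plain lowercase words reach the topic model.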
Basic topic model
To perform dynamic topic modeling with BERTopic, we first need to create a basic topic model using all tweets. The temporal aspect will be ignored as we are currently only interested in the topics in these tweets.
from bertopic import BERTopic
topic_model = BERTopic(min_topic_size=35, verbose=True)
topics, _ = topic_model.fit_transform(tweets)
We can then extract the most common topics:
freq = topic_model.get_topic_info()
freq.head(10)
Topic -1 collects all outliers and should generally be ignored. Next, let's take a look at one of the frequent topics that was generated:
topic_model.get_topic(4)
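Because topic -1 only collects outliers, it usually helps to leave it out when counting how many tweets fall into each topic. The sketch below does this on a plain list of topic ids, like the `topics` list returned by `fit_transform`. The helper name `topic_counts` and the example ids are hypothetical.

```python
from collections import Counter

def topic_counts(topics, outlier_id=-1):
    """Count documents per topic, skipping the outlier topic."""
    counts = Counter(t for t in topics if t != outlier_id)
    # most_common() orders topics from most to least frequent
    return counts.most_common()

# Made-up topic assignments for seven documents
example_topics = [-1, 0, 0, 1, -1, 2, 0]
print(topic_counts(example_topics))
```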
We can visualize the topics that were created using an inter-topic distance plot. This lets us judge visually whether the basic topics are good enough before we continue to create topics over time.
fig = topic_model.visualize_topics()
fig
Topics over time
Before starting the dynamic topic modeling step, make sure you are satisfied with the topics created previously, because they serve as the basis for dynamic topic modeling.
This step therefore mainly shows how those previously defined topics evolve over time.
Pay attention to the following parameters:
docs: The tweets we are using.
timestamps: The timestamp of each tweet/document.
global_tuning: Whether to average the topic representation of a topic at time t with its global topic representation.
evolution_tuning: Whether to average the topic representation of a topic at time t with the topic representation of that topic at time t-1.
nr_bins: The number of bins to put the timestamps into. Extracting topics across thousands of different timestamps is computationally inefficient, so it is recommended to keep this value below 20.
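To make the two tuning options concrete, here is a minimal sketch of the averaging they describe, applied to one topic whose representation is a small list of term weights. The function names, the toy vectors, and the order in which the two averages are applied are assumptions for illustration; BERTopic's internal implementation may differ.

```python
def average(u, v):
    """Element-wise mean of two equally long weight vectors."""
    return [(a + b) / 2 for a, b in zip(u, v)]

def tune_over_time(per_time, global_vec, global_tuning=True, evolution_tuning=True):
    """Smooth one topic's per-timestep vectors as the two flags describe."""
    tuned, previous = [], None
    for vec in per_time:
        current = list(vec)
        # evolution_tuning: average with the representation at time t-1
        if evolution_tuning and previous is not None:
            current = average(current, previous)
        # global_tuning: average with the global representation
        if global_tuning:
            current = average(current, global_vec)
        tuned.append(current)
        previous = current  # assumption: the tuned vector feeds the next step
    return tuned

# Toy weights for two terms at two timesteps (made-up numbers)
per_time = [[1.0, 0.0], [0.0, 1.0]]
global_vec = [0.5, 0.5]
print(tune_over_time(per_time, global_vec, evolution_tuning=False))
print(tune_over_time(per_time, global_vec, global_tuning=False))
```

With both flags on, each timestep is pulled toward its predecessor and toward the global representation, which smooths abrupt changes in a topic's keywords.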
topics_over_time = topic_model.topics_over_time(docs=tweets,
                                                timestamps=timestamps,
                                                global_tuning=True,
                                                evolution_tuning=True,
                                                nr_bins=20)
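The `nr_bins` argument groups the timestamps into a fixed number of intervals, and topics are then extracted per interval. As a rough illustration of equal-width binning, the sketch below assigns ISO-formatted date strings to bins using only the standard library. The helper `bin_timestamps` and the sample dates are hypothetical, and the exact bin boundaries BERTopic uses may differ.

```python
from datetime import datetime

def bin_timestamps(timestamps, nr_bins):
    """Assign each ISO-formatted timestamp to one of nr_bins equal-width bins."""
    parsed = [datetime.fromisoformat(t) for t in timestamps]
    start = min(parsed)
    span = (max(parsed) - start).total_seconds()
    if span == 0:
        # All timestamps are identical, so everything lands in bin 0
        return [0] * len(parsed)
    width = span / nr_bins
    # The latest timestamp sits on the right edge, so clamp it into the last bin
    return [min(int((t - start).total_seconds() / width), nr_bins - 1) for t in parsed]

# Made-up dates spanning five consecutive days
dates = ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"]
print(bin_timestamps(dates, nr_bins=2))
```

Fewer bins mean fewer topic representations to compute, at the cost of a coarser view of how each topic changes.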
Visualize topics over time
After creating topics_over_time, we need to visualize the topics, since inspecting them directly becomes harder once the time dimension is added.
To do this, we plot the frequency of each topic over time, which shows how the topics evolve. Hover over any point to see how the topic representation at time t differs from the global topic representation.
topic_model.visualize_topics_over_time(topics_over_time, top_n_topics=20)