Project Introduction:
The dataset contains records for more than 34,000 articles published on the GeeksforGeeks website.
Data fields
field | description |
---|---|
title | title of the article |
author_id | identifier of the article's author |
last_updated | date the article was last updated |
link | link to the article on GeeksforGeeks |
category | category the article is classified under |
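
As a quick sanity check on the schema in the table above, the following is a minimal sketch that reads rows and confirms that all five columns are present before any analysis. It uses only the Python standard library so that it runs on its own. The sample rows, the `load_rows` helper, the date format, and the category values are all hypothetical placeholders, not taken from the real dataset.

```python
import csv
import io
from collections import Counter

# Column names as listed in the field table.
EXPECTED_FIELDS = ["title", "author_id", "last_updated", "link", "category"]

# Hypothetical sample standing in for the real file; every value here is made up.
SAMPLE_CSV = """title,author_id,last_updated,link,category
Sample article A,author_1,2021-01-01,https://example.com/a,cat_a
Sample article B,author_2,2021-02-15,https://example.com/b,cat_b
Sample article C,author_1,2021-03-03,https://example.com/c,cat_a
"""


def load_rows(handle):
    """Read rows as dicts and fail early if an expected column is missing."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = [name for name in EXPECTED_FIELDS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    return list(reader)


rows = load_rows(io.StringIO(SAMPLE_CSV))

# Count how many articles fall under each category label.
category_counts = Counter(row["category"] for row in rows)

print(len(rows), "rows")
print(category_counts.most_common())
```

To try this on the actual data, replace the `io.StringIO(SAMPLE_CSV)` argument with an open file handle for the downloaded CSV, for example `open(path, newline="", encoding="utf-8")`, where `path` is wherever the file was saved.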
Data Sources
Multi-category dataset (CSDN resource library)
Data cleaning and overview
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB, ComplementNB, BernoulliNB
from sklearn.metrics import brier_score_loss as BS
from sklearn.feature_extraction.text import TfidfVectorizer as TFIDF
import pyecharts.options as opts
from pyecharts.charts import WordCloud,Ta