Comparison of Automated Data Analysis Frameworks—EDA Is All You Need
Introduction
WeChat official account: ChallengeHub
This article introduces several leading automated EDA (exploratory data analysis) tools and demonstrates each one with a short example. Code link: https://www.kaggle.com/andreshg/automatic-eda-libraries-comparisson/notebook
AutoViz
AutoViz stands out among the many free Python tools for rapid, automated EDA: it runs faster than its competitors SweetViz and Pandas Profiling.
Installation
!pip install git+https://github.com/AutoViML/AutoViz.git
!pip install xlrd
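from autoviz.AutoViz_Class import AutoViz_Class

# df must already be a pandas DataFrame that contains a 'target' column
AV = AutoViz_Class()
dftc = AV.AutoViz(
    filename='',               # left empty because the data is passed in through dfte
    sep='',                    # separator, only relevant when reading from a file
    depVar='target',           # name of the target (dependent) variable
    dfte=df,                   # the DataFrame to analyze
    header=0,                  # header row, only relevant when reading from a file
    verbose=1,                 # verbosity level
    lowess=False,              # no LOWESS smoothing
    chart_format='png',        # chart output format
    max_rows_analyzed=300000,  # upper limit on the number of rows analyzed
    max_cols_analyzed=30       # upper limit on the number of columns analyzed
)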
Pandas Profiling
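import datetime as dt
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('/kaggle/input/titanic/train.csv')
report = ProfileReport(df)
# Start of Pandas Profiling process
start_time = dt.datetime.now()
print("Started at ", start_time)
report  # last expression in the notebook cell, so the report is rendered inline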
SweetViz
!pip install sweetviz
import datetime as dt
import pandas as pd
import sweetviz as sv

# Only the first 2,000 rows are analyzed
df = pd.read_csv('/kaggle/input/credit-card-customers/BankChurners.csv').head(2000)
start_time = dt.datetime.now()  # reset the timer so that only SweetViz is measured
advert_report = sv.analyze([df, 'Data'])
advert_report.show_html()
print('SweetViz finished!!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)
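The start and finish timestamps used above can be wrapped in a small helper so that each tool is timed on its own. The following is only a sketch: the helper name `timed` and the `timings` dictionary are hypothetical and do not come from the original notebook, and the workload is a stand-in for a real call such as `sv.analyze([df, 'Data'])`.

```python
import datetime as dt
import time
from contextlib import contextmanager


# Hypothetical helper (not part of the original notebook): measures how long
# the body of a `with` block takes and stores the result under `label`.
@contextmanager
def timed(label, results):
    print("Started", label, "at", dt.datetime.now())
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = dt.timedelta(seconds=time.perf_counter() - start)
        results[label] = elapsed
        print("Elapsed time for", label, ":", elapsed)


timings = {}

# Stand-in workload; in the notebook this would be an EDA call such as
# sv.analyze([df, 'Data']).
with timed("placeholder workload", timings):
    total = sum(i * i for i in range(100_000))

# Rank the measured steps from fastest to slowest.
ranking = sorted(timings, key=timings.get)
print(ranking)
```

The sketch uses `time.perf_counter()` for the measurement because it is monotonic, whereas differences between two `datetime.now()` values can be distorted if the system clock is adjusted during the run.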
D-Tale
!pip install dtale
import dtale
dtale.show(df)
GitHub repository: https://github.com/man-group/dtale
Dataprep
!pip install -U dataprep
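from dataprep.eda import plot, plot_correlation
plot(df)                            # overview of every column
plot_correlation(df)                # correlations between the numeric columns
plot(df, "Customer_Age")            # detailed view of a single column
plot(df, "Customer_Age", "Gender")  # relationship between two columns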
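As a cross-check on plots like these, the numbers behind a one-column view and a two-column view can be computed with plain pandas. This is a minimal sketch and not what DataPrep runs internally; the rows are made up for illustration, and only the column names `Customer_Age` and `Gender` are taken from the example above.

```python
import pandas as pd

# Made-up rows for illustration; only the column names come from the article.
df = pd.DataFrame({
    "Customer_Age": [45, 49, 51, 40, 40, 44],
    "Gender": ["M", "F", "M", "F", "M", "M"],
})

# One-column view: summary statistics of a single numeric column.
age_summary = df["Customer_Age"].describe()

# Two-column view: the same statistics, split by a categorical column.
age_by_gender = df.groupby("Gender")["Customer_Age"].describe()

print(age_summary)
print(age_by_gender)
```

If the per-group statistics printed here disagree with what a chart suggests, that is usually a sign that the chart was built from a different subset of rows.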
[1]: Pandas Profiling GitHub - https://github.com/pandas-profiling/pandas-profiling
[2]: Dan Roth, AutoViz: A New Tool for Automated Visualization - https://towardsdatascience.com/autoviz-a-new-tool-for-automated-visualization-ec9c1744a6ad
[3]: George Vyshnya, PROs and CONs of Rapid EDA Tools - https://medium.com/sbc-group-blog/pros-and-cons-of-rapid-eda-tools-e1ccd159ab07
[4]: SweetViz - https://towardsdatascience.com/sweetviz-automated-eda-in-python-a97e4cabacde
[5]: DataPrep - https://sfu-db.github.io/dataprep/user_guide/eda/plot.html