Comparison of Automated Data Analysis Frameworks—EDA Is All You Need

Introduction

WeChat official account: ChallengeHub

This article introduces several popular automated EDA (exploratory data analysis) libraries for Python and shows what each one produces through short examples. Code link: https://www.kaggle.com/andreshg/automatic-eda-libraries-comparisson/notebook

AutoViz

AutoViz stands out among the many free Python tools for rapid, automated EDA. Its main advantage is speed: it runs faster than competitors such as SweetViz or Pandas Profiling.

Installation and usage

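# GitHub no longer serves the unauthenticated git:// protocol, so install over https
!pip install git+https://github.com/AutoViML/AutoViz.git
!pip install xlrd
from autoviz.AutoViz_Class import AutoViz_Class

# df must already be a pandas DataFrame that contains a 'target' column
AV = AutoViz_Class()
dftc = AV.AutoViz(
    filename='',               # empty because the data is passed in through dfte
    sep='',                    # separator, only relevant when reading from filename
    depVar='target',           # name of the target (dependent) column
    dfte=df,                   # the DataFrame to analyze
    header=0,                  # row number of the header
    verbose=1,
    lowess=False,
    chart_format='png',
    max_rows_analyzed=300000,  # upper limit on rows used for the charts
    max_cols_analyzed=30       # upper limit on columns used for the charts
)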

Pandas Profiling

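import datetime as dt
import pandas as pd
from pandas_profiling import ProfileReport

df = pd.read_csv('/kaggle/input/titanic/train.csv')

# Record the start time before building the report
start_time = dt.datetime.now()
print("Started at ", start_time)
report = ProfileReport(df)
report  # as the last expression in a notebook cell, this renders the report inline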

SweetViz

!pip install sweetviz
import datetime as dt
import pandas as pd
import sweetviz as sv

# Analyze only the first 2,000 rows to keep the run short
df = pd.read_csv('/kaggle/input/credit-card-customers/BankChurners.csv').head(2000)

start_time = dt.datetime.now()
advert_report = sv.analyze([df, 'Data'])
advert_report.show_html()  # writes the report to an HTML file

print('SweetViz finished!!')
finish_time = dt.datetime.now()
print("Finished at ", finish_time)
elapsed = finish_time - start_time
print("Elapsed time: ", elapsed)
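
Since this comparison is largely about speed, the start/finish bookkeeping can be wrapped in a small helper and reused for each library. The following is a minimal sketch using only the standard library; the name `timed` is hypothetical and not part of any of the EDA packages, and the workload is a stand-in so the snippet runs without them installed.

```python
import time
from typing import Any, Callable, Tuple


def timed(label: str, func: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Run func(*args, **kwargs), print the elapsed wall-clock time, return (result, seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    print(f"{label} finished in {elapsed:.2f} s")
    return result, elapsed


# Stand-in workload (hypothetical); replace it with a real EDA call.
total, seconds = timed("demo workload", sum, range(1_000_000))
```

With SweetViz installed, the same helper could be called as `timed("SweetViz", sv.analyze, [df, 'Data'])`. A single run is only a rough indication, since timings vary with the dataset size and the machine.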

D-Tale

!pip install dtale
import dtale
dtale.show(df)

GitHub repository: https://github.com/man-group/dtale

Dataprep

!pip install -U dataprep
from dataprep.eda import plot, plot_correlation
plot(df)

plot_correlation(df)
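
To make the correlation view less of a black box, here is a rough sketch of the kind of number a correlation heatmap displays for each pair of numeric columns. It assumes the Pearson coefficient, which is one common choice; the source does not state which coefficients Dataprep computes or how. The function names and toy columns are hypothetical and are not taken from BankChurners.csv.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / math.sqrt(var_x * var_y)


def correlation_pairs(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Compute the coefficient for every unordered pair of columns."""
    return {(a, b): pearson(columns[a], columns[b]) for a, b in combinations(columns, 2)}


# Hypothetical toy data: two columns rise together, the third falls as they rise.
toy = {
    "age": [30.0, 40.0, 50.0, 60.0],
    "months_on_book": [20.0, 30.0, 40.0, 50.0],
    "utilization": [0.8, 0.6, 0.4, 0.2],
}
pairs = correlation_pairs(toy)
for (a, b), r in pairs.items():
    print(f"{a} vs {b}: {r:+.2f}")
```

Values near +1 or -1 indicate a strong linear relationship and values near 0 a weak one, which is what the colour scale of a heatmap encodes.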

plot(df, "Customer_Age")

plot(df, "Customer_Age", "Gender")


Origin blog.csdn.net/qq_39158406/article/details/114239617