3 Automated Analysis Tools to Improve Data Analysis Efficiency

Data matters to every industry today, and nearly every company collects data and uses it to make business decisions. One of the most important steps in this process is analyzing the data. Many Python libraries are dedicated to data visualization, such as Matplotlib and Seaborn, but they only provide plotting functions. If we need to do exploratory data analysis (EDA), we still have to write the code by hand. In this article, we introduce 3 tools that can automate most of our exploratory data analysis.
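To make "writing the code by hand" concrete, here is a minimal sketch of the kind of manual EDA these tools automate. It uses only pandas; the sample DataFrame and its column names are hypothetical and exist purely for illustration.

```python
import pandas as pd

# Hypothetical sample data, used only for illustration.
df = pd.DataFrame(
    {
        "age": [23, 35, 41, None, 29],
        "income": [3200.0, 5400.0, 6100.0, 4800.0, None],
        "city": ["Pune", "Delhi", "Delhi", "Mumbai", "Pune"],
    }
)

# Column types and summary statistics for the numeric columns.
print(df.dtypes)
print(df.describe())

# Number of missing values per column.
missing = df.isna().sum()
print(missing)

# Frequency of each category in the text column.
city_counts = df["city"].value_counts()
print(city_counts)

# Pairwise correlation between the numeric columns
# (pandas skips rows where either value is missing).
corr = df[["age", "income"]].corr()
print(corr)
```

Each of these steps has to be repeated and adapted for every new dataset, which is the effort the three tools below try to remove.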

1. pandas_profiling

pandas_profiling extends the pandas DataFrame with a df.profile_report() function that performs a quick analysis and produces a descriptive summary of the dataset. It generates reports for datasets and offers a number of options for customizing the generated reports.

To install pandas_profiling, run the following command in your Jupyter notebook.

!pip install pandas_profiling

Import ProfileReport from pandas_profiling and run the following code, where df is the DataFrame that holds the dataset.

from pandas_profiling import ProfileReport

# explorative=True enables the more detailed analyses; dark_mode=True applies the dark theme
profile = ProfileReport(df, explorative=True, dark_mode=True)
# Write the report to an HTML file
profile.to_file('output.html')

The code above generates a report and saves it as output.html in the current working directory, which is normally the folder that contains the running notebook. The report contains a detailed descriptive summary of the dataset and can be explored interactively. Below are some screenshots from the generated report.

[Figure: Overall overview of the DataFrame]

[Figure: Information about a single variable]
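The package has since been published under the name ydata-profiling, so the import may fail on a fresh install that only has the newer name. The sketch below tries both import names and writes a lighter report with minimal=True, an option that switches off the most expensive computations. The helper names load_profile_report and write_minimal_report, the output file name, and the sample DataFrame are hypothetical and chosen only for illustration.

```python
import pandas as pd


def load_profile_report():
    """Return the ProfileReport class under whichever package name is installed."""
    try:
        from ydata_profiling import ProfileReport  # newer package name
    except ImportError:
        try:
            from pandas_profiling import ProfileReport  # name used in this article
        except ImportError:
            return None  # neither package is installed
    return ProfileReport


def write_minimal_report(df, path="output_minimal.html"):
    """Write a lighter report to `path`; return the path, or None if unavailable."""
    ProfileReport = load_profile_report()
    if ProfileReport is None:
        return None
    # minimal=True switches off the most expensive computations.
    profile = ProfileReport(df, title="Demo report", minimal=True)
    profile.to_file(path)
    return path


# Hypothetical sample data, used only for illustration.
df = pd.DataFrame({"x": [1, 2, 3, 4], "label": ["a", "b", "a", "b"]})
```

Calling write_minimal_report(df) then writes the HTML file when one of the two packages is installed and returns None otherwise. The minimal mode is mainly useful for large datasets, where the full explorative report can take a long time to build.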

2. D-Tale

D-Tale is a tool built with a Flask backend and a React frontend. It lets you view and analyze pandas data structures and integrates seamlessly with Jupyter notebooks and Python/IPython terminals. It currently supports pandas objects such as DataFrame, Series, MultiIndex, DatetimeIndex, and RangeIndex.

Install dtale using the command below.

!pip install dtale

The following code displays the data as an interactive table that you can work with directly to analyze the data. The available operations include data cleaning, highlighting outliers, checking for missing values, performing correlation checks, charting, and more.

import dtale

# Open df in the interactive D-Tale table view
dtale.show(df)

After running the code above, open the corresponding options in the table view to perform data analysis operations, as in the following figure:

[Figure: screenshot of the D-Tale interface]
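For readers who want to see what a check such as "highlighting outliers" or a correlation check involves, here is a pandas-only sketch. It uses the common 1.5 × IQR rule as an assumption; this is not D-Tale's implementation, and the rule D-Tale applies may differ. The helper iqr_outliers and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the value 95 is a deliberately extreme reading.
df = pd.DataFrame(
    {
        "temperature": [21, 22, 20, 23, 95, 21],
        "humidity": [40, 42, 39, 44, 80, 41],
    }
)


def iqr_outliers(series, k=1.5):
    """Return a boolean mask marking values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


# Rows whose temperature is flagged as an outlier.
mask = iqr_outliers(df["temperature"])
print(df[mask])

# Pearson correlation between the two columns.
corr = df["temperature"].corr(df["humidity"])
print(corr)
```

In D-Tale the same kind of check is a menu click on the table, with no code to write or maintain.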

3. dataprep

Dataprep is an open-source Python library that automates exploratory data analysis. (It was covered in our previous article.) Install it with the command below.

!pip install dataprep

The following code automatically generates an EDA report. In the report, the statistics for each variable can be checked individually, and multiple charts are provided for in-depth analysis.

from dataprep.eda import create_report
create_report(df)

[Figure: screenshot of the dataprep EDA report]

The snippet above shows only part of what dataprep provides. Dataprep can also be used for NLP tasks, since it offers options such as checking word frequency.
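As a rough illustration of what "checking word frequency" means for a text column, here is a sketch that uses only pandas and the standard library. It is not dataprep's implementation; the helper word_frequency, the simple tokenizing rule, and the sample reviews are assumptions made for this example.

```python
from collections import Counter
import re

import pandas as pd

# Hypothetical text column, used only for illustration.
df = pd.DataFrame(
    {"review": ["Great product, great price", "Poor quality", "Great quality"]}
)


def word_frequency(texts):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer: runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


freq = word_frequency(df["review"])
print(freq.most_common(3))
```

A real NLP workflow would usually also remove stop words and handle punctuation more carefully, which is the kind of detail an automated tool takes care of.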

Summary

This article briefly introduced 3 very useful data visualization and analysis tools. They let us perform fast and detailed data analysis automatically with very little code. I hope these three tools will be helpful to you.

Author: Tamanna Sharma


Origin blog.csdn.net/qq_33431368/article/details/123564284