Practical Tips in Python Data Analysis

This article lists some tips to improve or speed up your daily data analysis work, including:

1. Pandas Profiling

2. Use Cufflinks and Plotly to plot Pandas data

3. IPython magic commands

4. Formatting in Jupyter

5. Jupyter shortcuts

6. Make a cell have multiple outputs simultaneously in Jupyter (or IPython)

7. Create slideshows on the fly for Jupyter Notebook

1. Pandas Profiling

This tool is very effective. The figure below shows the result of a simple call to df.profile_report().

To use it, simply install and import the Pandas Profiling package.
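Here is a minimal sketch of that setup, assuming the package is installed. Recent releases are published under the name ydata-profiling (pip install ydata-profiling), so the sketch tries that import first and falls back to the older pandas_profiling name. The small DataFrame is made up for illustration.

```python
import pandas as pd

# Importing the package registers a .profile_report() method on DataFrame.
try:
    import ydata_profiling  # noqa: F401  (current package name)
    HAVE_PROFILING = True
except ImportError:
    try:
        import pandas_profiling  # noqa: F401  (older package name)
        HAVE_PROFILING = True
    except ImportError:
        HAVE_PROFILING = False

# Small, made-up DataFrame just for illustration
df = pd.DataFrame({
    "age": [23, 35, 31, 52, 46],
    "city": ["Paris", "Oslo", "Paris", "Rome", "Oslo"],
})

if HAVE_PROFILING:
    report = df.profile_report(title="Quick look")
    # In Jupyter, leaving `report` as the last expression renders it inline;
    # report.to_file("report.html") saves a standalone HTML page instead.
```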

This article won't describe the tool in detail; to learn more, please read: https://towardsdatascience.com/exploring-your-data-with-just-1-line-of-python-4b35ce21a82d

2. Use Cufflinks and Plotly to plot Pandas data

"Experienced" data scientists and data analysts are probably familiar with matplotlib and pandas: you can quickly plot a pd.DataFrame or pd.Series just by calling its .plot() method:

A little boring?
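For reference, a minimal sketch of that plain pandas call, using made-up random-walk data (the column names and the seed are arbitrary choices).

```python
import numpy as np
import pandas as pd

# Made-up data: three random walks, one per column
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)).cumsum(axis=0), columns=["a", "b", "c"])

# pandas draws one matplotlib line per column, with the index on the x-axis.
# In Jupyter the static figure appears inline; in a script, call
# matplotlib.pyplot.show() or ax.figure.savefig("walks.png") to see it.
ax = df.plot(title="Random walks")
```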

That works, but what about interactive plots that you can zoom and pan? That's where Cufflinks comes in! (Cufflinks is a wrapper built on top of Plotly.)

To install Cufflinks in your environment, run pip install cufflinks --upgrade in a terminal (or !pip install cufflinks --upgrade in a notebook cell). Check out the picture below:

The effect is much better!

Note that the only changes from the previous image are importing Cufflinks, setting it up with cf.go_offline(), and replacing the .plot() method with .iplot().
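A minimal sketch of those changes, assuming cufflinks and plotly are installed (the sketch simply skips the interactive part if they aren't); the data is made up. In Jupyter you would normally just call df.iplot(); asFigure=True is used here so that .iplot() returns the Plotly figure object instead of rendering it, which also works outside a notebook.

```python
import numpy as np
import pandas as pd

try:
    import cufflinks as cf
except ImportError:  # cufflinks not installed: skip the interactive part
    cf = None

# The same kind of made-up data as in the .plot() example
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((100, 3)).cumsum(axis=0), columns=["a", "b", "c"])

if cf is not None:
    cf.go_offline()  # render plots locally instead of through Plotly's online service
    # In Jupyter, df.iplot() draws the interactive chart inline;
    # asFigure=True returns the Plotly figure object instead.
    fig = df.iplot(asFigure=True, title="Random walks")
```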

Other methods, such as .scatter_matrix(), also produce great visualizations:

If you do a lot of data visualization work, read the Cufflinks and Plotly documentation to find more methods.

  • Cufflinks documentation: https://plot.ly/ipython-notebooks/cufflinks/

  • Plotly documentation: https://plot.ly/

3. IPython magic commands

IPython's "magic" commands are a set of enhancements that IPython adds on top of standard Python syntax. They come in two kinds: line magics, prefixed with %, operate on a single input line; cell magics, prefixed with %%, operate on multiple input lines (the whole cell). Here are some useful features provided by IPython magic commands:

%lsmagic: find all commands

If you only remember one magic command, it has to be this one. Executing the %lsmagic command will provide a list of all available magic commands:

%debug: interactive debug

This is probably the magic command I use most often.

Most data scientists have been in this situation: a block of code keeps breaking, and in desperation you write 20 print() statements to output the content of each variable. Then, once you finally fix the problem, you have to go back and remove all those print() statements again.

No more. When an error occurs, just run the %debug command in a new cell. It opens an interactive debugger at the point where the exception was raised, where you can inspect variables and run any code you want:

What's going on in the picture above?

  1. We have a function that takes a list as input and squares all even numbers.

  2. We run the function, but something goes wrong, and it isn't obvious why!

  3. Run the %debug command in the next cell to open the debugger at the point of failure.

  4. Ask the debugger for the values of x and type(x).

  5. The problem is obvious: we passed '6' into the function as a string!

This is very useful for more complex functions.
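The function from the screenshot isn't reproduced in the text, so below is a hypothetical reconstruction (square_evens is our name for it). %debug itself only exists in IPython; in plain Python the equivalent post-mortem debugger is pdb.pm() or pdb.post_mortem(tb). This sketch does non-interactively what the debugging session above does by hand: it follows the traceback to the frame where the exception was raised and reads x from it.

```python
def square_evens(numbers):
    """Hypothetical reconstruction: return the squares of the even numbers."""
    result = []
    for x in numbers:
        if x % 2 == 0:  # for x = "6" this is string formatting, not modulo
            result.append(x ** 2)
    return result

print(square_evens([1, 2, 3, 4]))  # [4, 16]

try:
    square_evens([1, 2, 3, 4, 5, "6"])
except TypeError as exc:
    # Follow the traceback to the innermost frame, which is where %debug opens
    tb = exc.__traceback__
    while tb.tb_next is not None:
        tb = tb.tb_next
    bad_x = tb.tb_frame.f_locals["x"]
    print(repr(bad_x), type(bad_x))  # '6' <class 'str'>
```

In a notebook, the interactive version is to run %debug after the failing cell and type p x and p type(x) at the ipdb> prompt.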

%store: passing variables between notebooks

This command is also pretty cool. Suppose you have spent some time cleaning data in one notebook, and now you want to test some functionality in another notebook. Do you implement that functionality in the same notebook, or save the data and load it in the other one? With the %store command, you need neither: it stores a variable so that you can retrieve it in any other notebook:

  • %store [variable]: store a variable.

  • %store -r [variable]: read (retrieve) a stored variable.
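%store keeps values (which must be picklable) in IPython's profile directory, so they are available from other notebooks and after a kernel restart. In a notebook you would simply run %store clean_df in the first notebook and %store -r clean_df in the second. As a rough plain-Python illustration of the same idea (not %store itself), the sketch below pickles a variable to a shared folder and reads it back; the folder and the store/restore helpers are hypothetical.

```python
import pickle
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical shared location; %store itself uses IPython's profile directory
STORE_DIR = Path(tempfile.gettempdir()) / "notebook_store"
STORE_DIR.mkdir(exist_ok=True)

def store(name, value):
    """Rough analogue of `%store name`: pickle the value under its name."""
    with open(STORE_DIR / f"{name}.pkl", "wb") as f:
        pickle.dump(value, f)

def restore(name):
    """Rough analogue of `%store -r name`: load the pickled value back."""
    with open(STORE_DIR / f"{name}.pkl", "rb") as f:
        return pickle.load(f)

# Notebook 1: after cleaning the data (made-up example)
clean_df = pd.DataFrame({"city": ["Paris", "Oslo"], "temp_c": [21.5, 14.0]})
store("clean_df", clean_df)

# Notebook 2 (a separate process on the same machine)
clean_df_copy = restore("clean_df")
```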

%who: list all global variables

Have you ever forgotten a variable's name after assigning a value to it? Or accidentally deleted the cell that assigned the value? The %who command gives you a list of all global variables:
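%who also accepts type names: for example, %who DataFrame lists only DataFrames, and %who function str lists only functions and strings. The helper below is a hypothetical plain-Python analogue that shows the idea; in a script you would call it as who(globals()), and the session dictionary is made up.

```python
def who(namespace, *type_names):
    """Rough plain-Python analogue of IPython's %who (hypothetical helper).

    Lists public names in `namespace`; if type names are given (as in
    `%who DataFrame`), keeps only variables whose type name matches.
    """
    names = sorted(name for name in namespace if not name.startswith("_"))
    if type_names:
        names = [n for n in names if type(namespace[n]).__name__ in type_names]
    return names

# Hypothetical session state
session = {"values": [3, 1, 2], "threshold": 0.5, "label": "sales", "_hidden": 1}
print(who(session))                  # ['label', 'threshold', 'values']
print(who(session, "float", "str"))  # ['label', 'threshold']
```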

%%time: timing magic command

Use this command to get timing information for a whole cell. Just put %%time at the top of any cell you run and you'll get output like this:
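%%time reports the CPU time (user and system) and the wall-clock time of the cell. Outside IPython, a similar report can be produced with time.process_time() and time.perf_counter(). The timed context manager below is a hypothetical helper, and unlike %%time it reports user and system CPU time combined rather than separately.

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label="cell"):
    """Hypothetical helper: report CPU and wall time of the enclosed block, like %%time."""
    stats = {}
    cpu_start, wall_start = time.process_time(), time.perf_counter()
    try:
        yield stats
    finally:
        stats["cpu"] = time.process_time() - cpu_start    # user + system CPU time
        stats["wall"] = time.perf_counter() - wall_start  # elapsed real time
        print(f"{label}: CPU time {stats['cpu']:.3f} s, wall time {stats['wall']:.3f} s")

with timed("sum of squares") as t:
    total = sum(i * i for i in range(1_000_000))
```

For a single statement rather than a whole cell, the line magic %time does the same job.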

%%writefile: write cell content to file

This magic command is very useful when you write complex functions or classes in a notebook and want to save them in a dedicated file. Just start the cell containing the function or class with %%writefile followed by the name of the file you want to save to:

As shown above, we can save the function to utils.py and then import it whenever we like. Other notebooks can import it too, as long as they are in the same directory as utils.py.
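In the notebook, the cell would start with %%writefile utils.py followed by the function definition. The sketch below does the same thing in plain Python: it writes the cell text verbatim to utils.py in a temporary folder and imports it. The square_evens function and the temporary folder are made up for illustration.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# The cell body that `%%writefile utils.py` would save verbatim (hypothetical function)
cell_source = '''\
def square_evens(numbers):
    """Return the squares of the even numbers in `numbers`."""
    return [x ** 2 for x in numbers if x % 2 == 0]
'''

workdir = Path(tempfile.mkdtemp())
(workdir / "utils.py").write_text(cell_source)

# Make the folder importable (a notebook in the same folder gets this for free)
sys.path.insert(0, str(workdir))
importlib.invalidate_caches()   # the file was created after the interpreter started
sys.modules.pop("utils", None)  # don't reuse an unrelated, previously imported "utils"

import utils  # noqa: E402

print(utils.square_evens([1, 2, 3, 4]))  # [4, 16]
```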

4. Formatting in Jupyter

This one is cool! Jupyter renders HTML/CSS formatting inside Markdown cells. Here are the ones I use most often:

Blue, fancy:

<div class="alert alert-block alert-info">   This is <b>fancy</b>!</div>

Red, slightly alarmed:

<div class="alert alert-block alert-danger">   This is <b>baaaaad</b>!</div>

Green, calm:

<div class="alert alert-block alert-success"> This is <b>gooood</b>!</div>

The diagram below shows how they work:

This is very useful when you want to present some findings in Notebook format!

5. Jupyter shortcuts

To discover and learn keyboard shortcuts, use the Command Palette: press Ctrl + Shift + P to get a list of all notebook commands. Here are a few of the most basic shortcuts:

  • Esc: Enter command mode. In command mode, you can use the arrow keys to navigate within the notebook.

In command mode:

  • A and B: Insert a new cell above (A) or below (B) the current cell.

  • M: Convert the current cell to Markdown.

  • Y: Convert the current cell to code.

  • D, D (press D twice): Delete the current cell.

  • Enter: Switch the current cell back to edit mode.

In edit mode:

  • Shift + Tab: Show the docstring (documentation) of the object under the cursor. Press it repeatedly to cycle through more detailed views of the documentation.

  • Ctrl + Shift + -: Split the current cell at the cursor position.

  • Esc + F: Find and replace code (excluding output).

  • Esc + O: toggle cell output.

Select multiple cells:

  • Shift + Down and Shift + Up: Extend the selection to the cell below or above.

  • Shift + M: Merge selected cells.

Note that after selecting multiple cells, you can perform delete/copy/cut/paste/run operations in batches.

6. Make a cell have multiple outputs simultaneously in Jupyter (or IPython)

Have you ever wanted to show both the .head() and .tail() of a pandas DataFrame, only to give up because creating an extra code cell just to run .tail() felt like too much work? By default, a cell displays only the output of its last expression. Fear not: the following lines make a cell display the output of every expression:

from IPython.core.interactiveshell import InteractiveShell

InteractiveShell.ast_node_interactivity = "all"
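The setting above applies to the whole session. If you only need several outputs in one particular cell, calling IPython's display() on each object also works without changing any setting. A minimal sketch with a made-up DataFrame; the print fallback is only there so it also runs outside IPython.

```python
import pandas as pd

try:
    from IPython.display import display  # renders rich output (HTML tables) in Jupyter
except ImportError:
    display = print  # plain-text fallback outside IPython

df = pd.DataFrame({"x": range(10), "y": [v * v for v in range(10)]})

# Each call produces its own output, regardless of ast_node_interactivity
display(df.head())
display(df.tail())

# To restore the default behaviour (only the last expression is shown):
# InteractiveShell.ast_node_interactivity = "last_expr"
```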

The figure below shows the result of multiple outputs:

7. Create slideshows on the fly for Jupyter Notebook

With RISE, you can turn your Jupyter Notebook into a slideshow with a single keystroke. The notebook stays live, so you can run code while presenting your slides!

To use the tool, you just need to install RISE via conda or pip.

conda install -c conda-forge rise

or

pip install RISE

Now you can click the new button to create nice slideshows for your notebook:


Origin blog.csdn.net/veratata/article/details/128656094