Slicing a multi-index pandas dataframe with a large list - Code World

2022-04-22 03:54:07

najeem :

I have a large DataFrame with a MultiIndex, and I want to slice it using a fairly large list of values for the first index level. Sample code is below. The operation takes almost 10 seconds.

import pandas as pd
import numpy as np

df = pd.DataFrame(
    {
        "x": np.repeat(np.arange(10000), 50),
        "y": np.repeat(np.arange(50), 10000),
        "val": np.random.rand(50*10000)
    }
).set_index(["x", "y"])

large_list = range(5000, 10000)

# Named "sliced" so that the built-in slice() used on the right-hand side is not shadowed
sliced = df.loc[(large_list, slice(None)), :]  # Takes 10 seconds on my machine

For comparison, if I write this DataFrame to an HDF file and read it back with a where condition equivalent to my slicing operation, it takes only 1.5 seconds.

df.to_hdf("sample.hdf", key="df", append=True)
df1 = pd.read_hdf("sample.hdf", "df", where='x in large_list')

Is there a faster way to slice in memory?

Andy L. :

If your intention is to slice a MultiIndex by an arbitrary list, using query will be much faster.

Create an arbitrary (shuffled) list of 5000 values drawn from 5000 to 9999:

np.random.seed(0)
large_list =  np.random.choice(list(range(5000, 10000)), 5000, replace=False)

In [2245]: large_list
Out[2245]: array([5398, 8833, 9836, ..., 6653, 7607, 7732])

x = df.query('x in @large_list')
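
As a small self-contained sketch of the same idea: query can refer to a named index level as if it were a column, and the @ prefix pulls in a Python variable from the calling scope. The frame below is a scaled-down stand-in for the question's data; the names small_df and wanted are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Scaled-down stand-in for the question's frame: 100 x-values, 5 rows each.
small_df = pd.DataFrame(
    {
        "x": np.repeat(np.arange(100), 5),
        "y": np.tile(np.arange(5), 100),
        "val": np.random.default_rng(0).random(500),
    }
).set_index(["x", "y"])

# Arbitrary, unordered list of first-level labels to keep (hypothetical values).
wanted = [42, 7, 99, 13]

# "x" names an index level; "@wanted" refers to the local Python variable.
subset = small_df.query("x in @wanted")

# Show which first-level labels survived the filter.
print(subset.index.get_level_values("x").unique().tolist())
```

Every row whose first-level label appears in wanted is kept, so the result here has 4 x 5 = 20 rows.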

Compare the results:

In [2246]: y = df.loc[(large_list, slice(None)),:]
In [2249]: np.allclose(x, y)
Out[2249]: True
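
If query is not convenient, a boolean mask built from the index level is a closely related in-memory approach. This is an alternative sketch, not part of the original answer. The example below times .loc, query, and the mask on a reduced frame (names such as frame, keys, and the sizes are assumptions chosen for a quick run) and compares the results after sorting the index, so the check does not depend on the row order each method returns.

```python
import time

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n_x, n_y = 2000, 10  # reduced sizes so the sketch runs quickly

frame = pd.DataFrame(
    {
        "x": np.repeat(np.arange(n_x), n_y),
        "y": np.tile(np.arange(n_y), n_x),
        "val": rng.random(n_x * n_y),
    }
).set_index(["x", "y"])

# Shuffled first-level labels from the upper half of the x range.
keys = rng.choice(np.arange(n_x // 2, n_x), size=n_x // 2, replace=False)

# Label-based selection, as in the question.
t0 = time.perf_counter()
by_loc = frame.loc[(keys, slice(None)), :]
t_loc = time.perf_counter() - t0

# query on the named index level, as in the answer.
t0 = time.perf_counter()
by_query = frame.query("x in @keys")
t_query = time.perf_counter() - t0

# Boolean mask from the index level (alternative approach).
t0 = time.perf_counter()
by_mask = frame[frame.index.get_level_values("x").isin(keys)]
t_mask = time.perf_counter() - t0

print(f"loc:   {t_loc:.4f} s")
print(f"query: {t_query:.4f} s")
print(f"mask:  {t_mask:.4f} s")

# Order-independent comparison of the three results.
print(by_loc.sort_index().equals(by_query.sort_index()))
print(by_mask.sort_index().equals(by_query.sort_index()))
```

Timings will vary with the pandas version and the machine, so treat the printed numbers as a rough guide only.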
