I retrieve a CSV file from the Nasdaq website with a few columns (Ticker, MarketCap, ...). I use read_csv from pandas to get a dataframe. My problem is that I can't convert the MarketCap column into a number. This is what the MarketCap column looks like:
MarketCap
$5.54B
$526.85M
$28.41M
nan
nan
Ideally I would want to drop the $ sign and convert B into 1'000'000'000 and M into 1'000'000. The replace/to_replace functions in pandas don't seem to work here. I would like to update my dataframe as follows:
MarketCap
5'540'000'000
526'850'000
28'410'000
nan
nan
(I used ' as a thousands separator just for clarity.) I don't care about the nan values, so these can be dropped/ignored for now.
I tried to use the replace method from pandas as follows:
df['MarketCap'].replace(to_replace=['B', 'M'], value=['*1000000000', '*1000000'], inplace=True)
Unfortunately, since the column holds strings, the above doesn't apply the multiplication.
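A minimal sketch of why the attempt has no numeric effect, assuming a small hand-built Series in place of the real Nasdaq data (the variable names are illustrative only). Without regex=True, Series.replace only matches whole cell values, so nothing changes; with regex=True it substitutes text, which still leaves strings rather than evaluated products.

```python
import pandas as pd

# Hypothetical sample standing in for the MarketCap column.
s = pd.Series(["$5.54B", "$526.85M"])

# Default behaviour: 'B' and 'M' must equal the whole cell, so no cell matches.
unchanged = s.replace(to_replace=["B", "M"], value=["*1000000000", "*1000000"])

# With regex=True the letters are substituted inside each string,
# but the result is still text, not a computed number.
substituted = s.replace(
    to_replace=["B", "M"], value=["*1000000000", "*1000000"], regex=True
)

print(unchanged.tolist())
print(substituted.tolist())
```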
Use Series.str.strip with Series.str.extract, then multiply the first extracted column, converted to floats, by the second column mapped to multipliers with Series.map:
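# Drop the leading '$', then split into the numeric part (column 0) and the suffix (column 1)
df1 = df['MarketCap'].str.strip('$').str.extract(r'(\d+\.\d+)([BM]+)')
# Multiply the number by the multiplier for its suffix; NaN rows stay NaN
df['MarketCap'] = df1[0].astype(float) * df1[1].map({'B': 1000000000, 'M': 1000000})
print(df)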
      MarketCap
0  5.540000e+09
1  5.268500e+08
2  2.841000e+07
3           NaN
4           NaN
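For reference, here is a self-contained sketch of the same strip/extract/map approach that can be run without the Nasdaq file. The helper name parse_market_cap, the sample data, and the extra K and T suffixes are assumptions for illustration and are not part of the original question. The pattern is also loosened so that values without a decimal part, such as $3B, are parsed.

```python
import numpy as np
import pandas as pd

# Hypothetical multiplier table; only B and M appear in the question,
# K and T are assumed extensions.
MULTIPLIERS = {"K": 1e3, "M": 1e6, "B": 1e9, "T": 1e12}


def parse_market_cap(col: pd.Series) -> pd.Series:
    """Convert strings such as '$5.54B' to floats; anything unparseable becomes NaN."""
    # Column 0 holds the number, column 1 the single-letter suffix.
    parts = col.str.strip("$").str.extract(r"^(\d+(?:\.\d+)?)([KMBT])$")
    return parts[0].astype(float) * parts[1].map(MULTIPLIERS)


# Sample data standing in for the result of read_csv.
df = pd.DataFrame({"MarketCap": ["$5.54B", "$526.85M", "$28.41M", np.nan, "$3B"]})
df["MarketCap"] = parse_market_cap(df["MarketCap"])

# Optional: whole numbers with missing values kept, using the nullable Int64 dtype.
df["MarketCapInt"] = df["MarketCap"].round().astype("Int64")
print(df)
```

The round() call before the Int64 conversion guards against tiny floating-point errors from the multiplication; rows that were nan in the input stay missing in both columns.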