Wrong x-values in pyplot of pandas dataframe after converting indices to integers. How can I get the correct values?

smcintyre247 :

SOLVED: I have a pandas dataframe (df) that contains data on the number of Haitians who immigrated to Canada from 1980 to 2013, with the years as the index, so

>>> len(df)
34
>>> df.index
Index(['1980', '1981', '1982', '1983', '1984', '1985', '1986', '1987', '1988',
   '1989', '1990', '1991', '1992', '1993', '1994', '1995', '1996', '1997',
   '1998', '1999', '2000', '2001', '2002', '2003', '2004', '2005', '2006',
   '2007', '2008', '2009', '2010', '2011', '2012', '2013'],
  dtype='object')

I want to convert the index to integers to make plotting easier, so I wrote

>>> df.index = df.index.map(int)
>>> df.index
Int64Index([1980, 1981, 1982, 1983, 1984, 1985, 1986, 1987, 1988, 1989, 1990,
        1991, 1992, 1993, 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001,
        2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012,
        2013],
       dtype='int64')
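
As a quick check of the conversion step, here is a minimal sketch on a standalone index built from the same year strings. The `years` variable is a stand-in for `df.index`, not the actual dataframe from the question, and `astype(int)` is shown as an equivalent alternative to `map(int)`.

```python
import pandas as pd

# Stand-in for df.index: the years 1980-2013 stored as strings.
years = pd.Index([str(y) for y in range(1980, 2014)])

# Two equivalent ways to obtain an integer index from the strings.
as_int_map = years.map(int)
as_int_astype = years.astype(int)

# Both hold plain integers that can be used as numeric x-values.
print(as_int_map[:3].tolist(), as_int_astype[-3:].tolist())
```

Either result can be assigned back with `df.index = ...`, as in the question.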

The conversion seemed to work fine, but when I try to plot my data the x-values are wrong.

>>> df.plot()
>>> plt.title("Immigration from Haiti")
>>> plt.xlabel("Year")
>>> plt.ylabel("# of Immigrants")
>>> plt.text(2000, 6000, '2010 Earthquake-->')
>>> plt.show()

I have no clue where these x-values are coming from, but they are not the index values that I intended to use. How do I generate this plot with the correct x-values?

I know that I can leave the index values as strings and use the position of the index for the added text in the plot (i.e. skip the conversion to integers and use plt.text(20, 6000, '2010 Earthquake-->') ), but I'd much rather use the actual year. Can you please tell me how to do this correctly and what I'm doing wrong?


Here's the full code for anyone who wants it. I am still curious why the tick labels were automatically offset. This is for an EdX course on data visualization and the .xlsx data is safe. The dataframe that I was calling df above is actually named "haiti" in this code.

import numpy as np
import pandas as pd
import matplotlib as mpl
import matplotlib.pyplot as plt

#import data
df_can = pd.read_excel(
        "https://s3-api.us-geo.objectstorage.softlayer.net/cf-courses-data/CognitiveClass/DV0101EN/labs/Data_Files/Canada.xlsx",
        sheet_name="Canada by Citizenship", skiprows=range(20), skipfooter=2) #sheet_name/skipfooter replace the older sheetname/skip_footer keywords

#pre-process data a bit
df_can.columns.tolist()
df_can.index.tolist()
df_can.drop(["AREA", "REG", "DEV", "Type", "Coverage"], axis=1, inplace=True)
df_can.rename(columns={'OdName':'Country', 'AreaName':'Continent', 'RegName':'Region'}, inplace=True) #rename some columns to be more intuitive
df_can["Total"] = df_can.sum(axis=1, numeric_only=True) #add total # of immigrants column (sum only the numeric year columns)
df_can.set_index("Country", inplace=True) #change index from number to Country
df_can.columns = list(map(str,df_can.columns)) #convert years to string to avoid confusion
years = list(map(str,range(1980,2014))) #useful for plotting later

#line plot with mpl
mpl.style.use('ggplot')
haiti = df_can.loc['Haiti',years]
haiti.index = haiti.index.map(int)

haiti.plot(kind="line",figsize=(14,8))
plt.title('Immigration Trend of Top 5 Countries')
plt.ylabel('Number of Immigrants')
plt.xlabel('Years')
plt.text(2005, 6000, '2010 Earthquake---------->')
plt.show()

Sheldore :

You did not provide working code, so only you can test whether this works, but you can turn off the offset in either of the following ways. This has been explained in detail elsewhere.

Way 1

fig, ax = plt.subplots()

df.plot(ax=ax)
ax.ticklabel_format(useOffset=False)

Way 2

ax = df.plot()
ax.ticklabel_format(useOffset=False)
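
Since the original spreadsheet is not needed to try this, here is a minimal self-contained sketch that combines the fix with the annotation from the question. The series values are made-up placeholder numbers, not the real immigration data, and the `Agg` backend is an assumption chosen so the script runs without a display. Whether matplotlib applies an offset at all depends on its version and on the tick range, so the call may simply act as a safeguard.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no window is needed
import matplotlib.pyplot as plt
import pandas as pd

# Placeholder data (NOT the real figures): one value per year, 1980-2013.
years = list(range(1980, 2014))
values = [1000 + 150 * i for i in range(len(years))]
haiti = pd.Series(values, index=years, name="Haiti")

fig, ax = plt.subplots(figsize=(14, 8))
haiti.plot(kind="line", ax=ax)

# Show tick labels as plain numbers instead of small values plus an offset.
ax.ticklabel_format(useOffset=False)

ax.set_title("Immigration from Haiti")
ax.set_xlabel("Year")
ax.set_ylabel("# of Immigrants")
# With an integer index, the text position is given in actual years.
ax.text(2000, 6000, "2010 Earthquake-->")

# Render in memory; call plt.show() instead when working interactively.
fig.canvas.draw()
```

Working on the `ax` object returned by `plt.subplots()` keeps the formatter call and the labels attached to the same axes that pandas drew on.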
