I am getting an error caused by the timestamp column in my CSV file:
ValueError: could not convert string to float: '2020-02-21 22:00:00'
It is raised by the marked line in this script:
import numpy as np
import pandas as pd
import matplotlib.pylab as plt
from datetime import datetime
from statsmodels.tools.eval_measures import rmse
from sklearn.preprocessing import MinMaxScaler
from keras.preprocessing.sequence import TimeseriesGenerator
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.layers import Dropout
import warnings
warnings.filterwarnings("ignore")
"Import dataset"
df = pd.read_csv('fx_intraday_1min_GBP_USD.csv')
train, test = df[:-3], df[-3:]
scaler = MinMaxScaler()
scaler.fit(train)  # <----------- This line
train = scaler.transform(train)
test = scaler.transform(test)
n_input = 3
n_features = 4
generator = TimeseriesGenerator(train, train, length=n_input, batch_size=6)
model = Sequential()
model.add(LSTM(200, activation='relu', input_shape=(n_input, n_features)))
model.add(Dropout(0.15))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit_generator(generator, epochs=180)
How can I convert the timestamp column (preferably when reading the csv) to a float?
Link to the dataset: https://www.alphavantage.co/query?function=FX_INTRADAY&from_symbol=GBP&to_symbol=USD&interval=1min&apikey=OF7SE183CNQLT9DW&datatype=csv
Performing Conversion On CSV Input Columns While Reading In The Data
It turns out that date_parser is probably not the parameter you want here. The converters parameter is closer to what we need. If we specify a conversion function for the 'timestamp' column like so:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
... converters={'timestamp':
... lambda t: pd.Timestamp(t).timestamp()})
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
The timestamp column is now converted to float values, as you required. The converters parameter takes a dictionary with the column name as the key and the callback as the value. You could also use the column number as the key, but the name is clearer.
This strategy can be applied to other columns by providing callback functions to do any sort of conversion compatible with pandas. It's not limited to just this datetime to float case.
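To illustrate converters on more than one column, here is a minimal sketch that reads a small in-memory CSV instead of the downloaded file. The three sample rows and the helper names to_epoch_seconds and scale_price are my own assumptions for illustration and are not part of your script.

```python
import io

import pandas as pd

# Hypothetical three-row sample in the same layout as the question's CSV.
csv_text = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
2020-02-21 21:58:00,1.2956,1.2958,1.2954,1.2957
"""


def to_epoch_seconds(text):
    """Parse a timestamp string and return POSIX seconds as a float."""
    return pd.Timestamp(text).timestamp()


def scale_price(text):
    """Hypothetical converter: express a price in units of 0.0001."""
    return round(float(text) * 10_000, 1)


# Each converter receives the raw cell text of its column as a string.
sample = pd.read_csv(
    io.StringIO(csv_text),
    converters={"timestamp": to_epoch_seconds, "close": scale_price},
)
print(sample.dtypes)
```

Columns without a converter (open, high, low) are parsed the usual way.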
[side note: You may want to confirm that the machine learning package you're using expects these float values to be POSIX timestamps.]
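One quick way to check what these floats mean is to round-trip a single value through the standard library. As far as I know, pandas treats a timezone-naive Timestamp as UTC when computing .timestamp(), so the sketch below interprets the float as UTC. Whether your data really is in UTC is an assumption you would need to verify against the data source.

```python
from datetime import datetime, timezone

import pandas as pd

stamp = pd.Timestamp("2020-02-21 22:00:00")  # timezone-naive
seconds = stamp.timestamp()                  # POSIX seconds as a float

# Convert the float back to a datetime, interpreting it as UTC.
round_trip = datetime.fromtimestamp(seconds, tz=timezone.utc)

print(seconds)     # 1582322400.0
print(round_trip)  # 2020-02-21 22:00:00+00:00
```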
read_csv() seems to understand the date_parser parameter only as a way to control how text is parsed into datetime objects. Trying to use it to create a column of floats generally produced some strange results.
date_parser could be useful if the timestamp data spans more than one column or is in some unusual format. The callback can receive the text from one or more columns for processing. The parse_dates parameter may need to be supplied together with date_parser to indicate which columns to apply the callback to. parse_dates is just a list of the column names or indices. An example of usage:
df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
date_parser=lambda t: pd.Timestamp(t),
parse_dates=['timestamp'])
pd.read_csv() with no date/time parameters produces a timestamp column of type object. Simply specifying which column is the timestamp using parse_dates, with no other parameters, fixes that:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv',
...                  parse_dates=['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
The date_parser parameter was unnecessary in this case. I think this last example may be all that your script needed.
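A related option, sketched here under assumptions, is to parse the timestamp and also make it the index with index_col, so that only the four price columns remain as values. That would line up with n_features = 4 in your script, but I have not run it against your model, and the three sample rows below are hypothetical.

```python
import io

import pandas as pd

# Hypothetical sample in the same layout as the question's CSV.
csv_text = """timestamp,open,high,low,close
2020-02-21 22:00:00,1.2953,1.2964,1.2953,1.2964
2020-02-21 21:59:00,1.2955,1.2957,1.2952,1.2957
2020-02-21 21:58:00,1.2956,1.2958,1.2954,1.2957
"""

# Parse the timestamp column as datetimes and use it as the row index.
prices = pd.read_csv(
    io.StringIO(csv_text),
    parse_dates=["timestamp"],
    index_col="timestamp",
)

# Only the numeric columns are left as values, so this array is all floats.
features = prices.to_numpy()
print(features.shape)
```

An all-float array like features is the kind of input a scaler can fit without tripping over timestamp strings.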
Pandas provides some of its own date/time classes and functions. Here's an example of pd.Timestamp and of converting it to a NumPy-compatible timestamp:
>>> pd.Timestamp('2020-02-21 22:00:00')
Timestamp('2020-02-21 22:00:00')
>>> pd.Timestamp('2020-02-21 22:00:00').asm8
numpy.datetime64('2020-02-21T22:00:00.000000000')
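If you want a float from the NumPy value, one approach is to subtract the Unix epoch and divide by a one-second timedelta. This is only a sketch; the variable names are mine.

```python
import numpy as np
import pandas as pd

stamp64 = pd.Timestamp("2020-02-21 22:00:00").asm8  # numpy.datetime64
epoch = np.datetime64("1970-01-01T00:00:00")

# Dividing one timedelta64 by another yields a plain float.
seconds = (stamp64 - epoch) / np.timedelta64(1, "s")
print(seconds)
```

Dividing by np.timedelta64(1, "s") keeps the result in seconds regardless of the resolution the datetime64 value is stored in.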
Conversion of DataFrame Columns After Reading in CSV
As another user suggested, you can also convert the contents of a column after loading, using pd.to_datetime(). Here's an example:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>> df.dtypes
timestamp object
open float64
high float64
low float64
close float64
dtype: object
>>> df['timestamp'] = pd.to_datetime(df['timestamp'])
>>> df.dtypes
timestamp datetime64[ns]
open float64
high float64
low float64
close float64
dtype: object
>>>
>>> df['timestamp'] = df['timestamp'].apply(lambda t: t.timestamp())
>>> df
timestamp open high low close
0 1.582322e+09 1.2953 1.2964 1.2953 1.2964
1 1.582322e+09 1.2955 1.2957 1.2952 1.2957
2 1.582322e+09 1.2956 1.2958 1.2954 1.2957
3 1.582322e+09 1.2957 1.2958 1.2954 1.2957
4 1.582322e+09 1.2957 1.2958 1.2955 1.2956
.. ... ... ... ... ...
95 1.582317e+09 1.2966 1.2967 1.2964 1.2965
96 1.582317e+09 1.2967 1.2968 1.2965 1.2966
97 1.582317e+09 1.2965 1.2967 1.2964 1.2966
98 1.582317e+09 1.2964 1.2967 1.2962 1.2966
99 1.582316e+09 1.2963 1.2965 1.2961 1.2964
[100 rows x 5 columns]
Or, to do it all in one shot without pd.to_datetime(), it can be implemented as follows. This last method uses the same lambda as the first example at the top of this answer:
>>> df = pd.read_csv('~/Downloads/fx_intraday_1min_GBP_USD.csv')
>>>
>>> df['timestamp'] = df['timestamp'].apply(
...     lambda t: pd.Timestamp(t).timestamp())
>>>
This last method is very versatile because it uses the apply() method of the column (a pandas Series). The function provided as a parameter is called on each element of the column. The resulting values can then be assigned back to the same column, to another data frame column, or to a new column appended to the data frame.
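For larger files, a vectorized alternative to apply() may be worth considering. The sketch below subtracts the Unix epoch from the whole column and divides by one second. The two-row frame is a hypothetical stand-in for your data.

```python
import pandas as pd

# Hypothetical stand-in for the timestamp column of the CSV.
frame = pd.DataFrame(
    {"timestamp": ["2020-02-21 22:00:00", "2020-02-21 21:59:00"]}
)

# Parse the strings once, then convert the whole column in one expression.
parsed = pd.to_datetime(frame["timestamp"])
frame["timestamp"] = (parsed - pd.Timestamp("1970-01-01")) / pd.Timedelta(seconds=1)

print(frame["timestamp"].tolist())  # [1582322400.0, 1582322340.0]
```

This avoids calling a Python function once per row, which tends to matter more as the column grows.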
So... There it is. Two examples on how to convert timestamp strings in a pandas column to float values. I've learned a few things trying to answer this question. Thank you for that @GGr.