Extract list element in pandas series and convert to datetime

Study Astrophysics :

The series I am handling looks like this:

qa_answers['date_of_birth']


1                 []
2                 []
...
2600    [1988/11/23]
2601     [1992/7/15]
2602    [1993/11/8"]
2603    [1997/08/31]
2604     [1971/2/11]
2605    [1979/11/1"]
2606     [1993/9/19]
2607    [1985/01/12]
2608    [1977/11/3"]
2609     [1981/7/2"]
2610     [1952/4/9"]
2611     [1991/8/20]
2612     [1993/1/31]
Name: date_of_birth, dtype: object

This problem consists of two parts:

  1. I want to convert the type of the series (object) to datetime.
  2. When I tried to use to_datetime, I got the error below.

qa_answers['date_of_birth'] = pd.to_datetime(qa_answers['date_of_birth'], errors='coerce')

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-147-96dff0351764> in <module>()
     28 qa_answers['date_of_birth2']= qa_answers['answers'].str.findall(dob2)
     29 qa_answers['date_of_birth'] = qa_answers['date_of_birth1'] + qa_answers['date_of_birth2']
---> 30 qa_answers['date_of_birth'] = pd.to_datetime(qa_answers['date_of_birth'],errors='coerce')
     31 
     32 

4 frames
/usr/local/lib/python3.6/dist-packages/pandas/core/algorithms.py in unique(values)
    403 
    404     table = htable(len(values))
--> 405     uniques = table.unique(values)
    406     uniques = _reconstruct_data(uniques, dtype, original)
    407     return uniques

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.unique()

pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()

TypeError: unhashable type: 'list'

So I guess I should extract the element from each list first. How can I do that?

P.S. Could you also give some tips for removing the stray " character that appears in some of the elements?

Serge Ballesta :

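to_datetime fails because each cell holds a list rather than a string. You must first replace each non-empty list with its first element, strip the stray quote from it, and replace each empty list with an empty string: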

df.date_of_birth.apply(lambda x: x[0].replace('"', '') if len(x) > 0 else '')

gives:

1                 
2 
...                
2600    1988/11/23
2601     1992/7/15
2602     1993/11/8
2603    1997/08/31
2604     1971/2/11
2605     1979/11/1
2606     1993/9/19
2607    1985/01/12
2608     1977/11/3
2609      1981/7/2
2610      1952/4/9
2611     1991/8/20
2612     1993/1/31
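
As an alternative to apply, the same extraction can be sketched with the vectorized .str accessor, which also indexes into list elements. This is only a sketch on made-up sample data (the names s, first and cleaned are hypothetical). Unlike the lambda above, empty lists become NaN rather than an empty string:

```python
import pandas as pd

# Hypothetical sample mirroring the question's data: lists of 0 or 1 strings
s = pd.Series([[], ['1988/11/23'], ['1993/11/8"'], ['1992/7/15']],
              name='date_of_birth')

# .str[0] takes the first element of each list; empty lists yield NaN
first = s.str[0]

# Drop the stray double quote; regex=False treats '"' as a literal string
cleaned = first.str.replace('"', '', regex=False)

print(cleaned)
```

Either form of missing value (NaN here, '' in the apply version) should end up as NaT once passed to pd.to_datetime.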

Then you can easily convert that to a datetime column:

pd.to_datetime(df.date_of_birth.apply(lambda x: x[0].replace('"', '') if len(x) > 0 else ''))

This gives the following; the empty strings become NaT:

1             NaT
2             NaT
2600   1988-11-23
2601   1992-07-15
2602   1993-11-08
2603   1997-08-31
2604   1971-02-11
2605   1979-11-01
2606   1993-09-19
2607   1985-01-12
2608   1977-11-03
2609   1981-07-02
2610   1952-04-09
2611   1991-08-20
2612   1993-01-31
Name: date_of_birth, dtype: datetime64[ns]
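
Putting both steps together, a minimal end-to-end sketch might look like the following. It assumes a small made-up frame named qa_answers as in the question, and the helper first_clean is a hypothetical name introduced only for illustration. errors='coerce' is kept from the question so that anything unparseable becomes NaT instead of raising:

```python
import pandas as pd

# Hypothetical frame mirroring the question's column of lists
qa_answers = pd.DataFrame({
    'date_of_birth': [[], ['1988/11/23'], ['1993/11/8"'], ['1992/7/15']]
})

def first_clean(value):
    """Return the first list element without stray quotes, or '' if empty."""
    return value[0].replace('"', '') if len(value) > 0 else ''

# Extract and clean, then parse; errors='coerce' maps unparseable values to NaT
cleaned = qa_answers['date_of_birth'].apply(first_clean)
qa_answers['date_of_birth'] = pd.to_datetime(cleaned, errors='coerce')

print(qa_answers['date_of_birth'])
```

Assigning the result back to the column replaces the lists with datetime values, so later code can use the .dt accessor on it.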
