Processing character and date data in pandas data frames

Earlier we learned how to process strings and regular expressions, but only for a single string or a list of strings. This section covers how to work with character variables in a data frame.

It also explains how to extract the year, month, and day of the week from a date variable, and how to calculate the time difference between two dates.
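As a minimal sketch of these date operations, the following uses a small made-up data frame. The column names `birthday` and `start_work` follow the example in this post, but the values and the new column names are invented for illustration. It reads the year, month, and weekday of a datetime column through the `.dt` accessor and subtracts two datetime columns to get a time difference.

```python
import pandas as pd

# Hypothetical sample data; the values are invented for illustration only.
dates = pd.DataFrame({
    'birthday': ['1990/03/15', '1985/11/02'],
    'start_work': ['2012/07/01', '2008/09/15'],
})

# Parse the text columns into datetime columns.
dates['birthday'] = pd.to_datetime(dates['birthday'], format='%Y/%m/%d')
dates['start_work'] = pd.to_datetime(dates['start_work'], format='%Y/%m/%d')

# Date components are reached through the .dt accessor.
dates['birth_year'] = dates['birthday'].dt.year
dates['birth_month'] = dates['birthday'].dt.month
dates['birth_weekday'] = dates['birthday'].dt.day_name()

# Subtracting two datetime columns gives a timedelta column;
# .dt.days turns it into a whole number of days.
dates['days_before_work'] = (dates['start_work'] - dates['birthday']).dt.days

print(dates)
```

The difference is expressed in days here because a timedelta has no fixed notion of a year; the example later in this post approximates age by subtracting calendar years instead.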

Examples are as follows:

 

For the data above, try to answer these questions about character and date types before looking at the code below:

1. How do you change the data types of the two fields birthday (date of birth) and tel (phone number)?

2. How do you add two new fields, age and length of service, based on the two fields birthday (date of birth) and start_work (date work started)?

3. How do you hide the middle four digits of the phone number in tel?

4. How do you extract each person's profession from the information in the other field?

The code is as follows:

import pandas as pd
# read the data
df = pd.read_excel(r'd:\data_test03.xlsx')
# check the data type of each variable
df.dtypes
# convert birthday to a date variable
df.birthday = pd.to_datetime(df.birthday, format='%Y/%m/%d')
# convert tel to a string variable
df.tel = df.tel.astype('str')
# add the two new fields age and workage (length of service)
# pd.Timestamp.today() is used because pd.datetime has been removed from recent pandas versions
df['age'] = pd.Timestamp.today().year - df.birthday.dt.year
df['workage'] = pd.Timestamp.today().year - df.start_work.dt.year
# hide the middle four digits of the phone number
df.tel = df.tel.apply(func=lambda x: x.replace(x[3:7], 'XXXX'))
# extract the domain name of the email address
df['email_domain'] = df.email.apply(func=lambda x: x.split('@')[1])
# extract each person's profession
# Pay close attention here: the colon and comma in the pattern are the full-width Chinese characters used in the table.
# When I first debugged this, I typed the comma after (.*?) in English mode, so every match came back empty;
# after changing it to a Chinese comma the results displayed normally.
df['profession'] = df.other.str.findall('专业：(.*?)，')
# drop birthday, start_work and other
df.drop(['birthday', 'start_work', 'other'], axis=1, inplace=True)
df

Output:

 

 

1. pd.to_datetime(values to convert, format=...): converts a field to a date type.

2. pd.Timestamp.today().year and pd.Timestamp.now().year: return the current year.

3. The astype method: converts a field's type, for example to a string with astype('str').

4. date_field.dt.year: the .dt accessor must be added to reach the components of a date field.

5. Deleting fields: df.drop([], axis=1, inplace=True)

    df is the name of the data frame, and the list [] holds the names of the fields to delete.

    axis=1 means that columns are deleted; it has to be given because by default drop deletes rows.

    inplace=True means that the original data frame itself is modified.

6. The apply() method of a Series: apply(func=...) applies the function to each element.

7. Using .str on a Series lets its elements be handled as strings, so string methods such as findall can be called.
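The notes above can be tried without the Excel file. The sketch below builds a small made-up data frame (the rows are invented; only the column names `tel`, `email`, and `other` come from the example) and applies `astype`, `apply`, the `.str` accessor, and `drop`. Two choices differ from the code in this post and are assumptions of this sketch: the phone number is masked by position rather than with `replace`, since `replace` would also change any other run of the same four digits, and `str.extract` is used instead of `str.findall` so that each row yields a plain string rather than a list. The text in `other` uses full-width Chinese punctuation to mirror the situation described in the code comment.

```python
import pandas as pd

# Hypothetical sample rows; only the column names follow the example.
staff = pd.DataFrame({
    'tel': [13812345678, 13698761234],
    'email': ['zhang@example.com', 'li@example.org'],
    'other': ['学历：本科，专业：统计学，爱好：篮球',
              '学历：硕士，专业：计算机，爱好：跑步'],
})

# astype: convert the numeric phone numbers to strings.
staff['tel'] = staff['tel'].astype(str)

# apply: mask the 4th to 7th digits by position.
staff['tel'] = staff['tel'].apply(lambda x: x[:3] + 'XXXX' + x[7:])

# .str accessor: string methods applied to every element of a Series.
staff['email_domain'] = staff['email'].str.split('@').str[1]

# The pattern uses the full-width colon and comma found in the data.
staff['profession'] = staff['other'].str.extract('专业：(.*?)，', expand=False)

# drop: axis=1 targets columns rather than rows.
staff = staff.drop(['other'], axis=1)

print(staff)
```

Assigning the result of `drop` back to `staff` has the same effect as `inplace=True` and makes the change to the data frame explicit.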

 

 


Origin www.cnblogs.com/tinglele527/p/11906085.html