Pandas text data processing and time series

Text data
Pandas provides a set of string methods, reached through the .str accessor, that operate conveniently on string data. Most importantly, these methods skip NaN values, so missing entries stay missing instead of raising an error. Almost all of the methods below mirror Python's built-in string methods. Some of them also support regular expressions, such as replace() below.
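As a minimal sketch of the two points above (NaN handling and regular-expression support), the following uses a small made-up Series; the sample values and variable names are illustrative only. regex=True is passed explicitly because the default for str.replace() has differed between pandas versions.

```python
import numpy as np
import pandas as pd

# Illustrative data: one entry is missing.
s = pd.Series(["apple pie", np.nan, "banana 42"])

# Missing values pass through string methods untouched.
upper = s.str.upper()

# Remove runs of digits with a regular expression.
# regex=True is stated explicitly rather than relying on the default.
no_digits = s.str.replace(r"\d+", "", regex=True)

print(upper.tolist())
print(no_digits.tolist())
```

The missing entry is still missing in both results; only the real strings are transformed.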

Function name Description
lower() Converts each string in the Series/Index to lowercase.
upper() Converts each string in the Series/Index to uppercase.
len() Computes the length of each string.
strip() Removes whitespace (including newlines) from both ends of each string in the Series/Index.
split(' ') Splits each string on the given pattern.
cat(sep=' ') Concatenates the Series/Index elements using the given separator.
get_dummies() Returns a DataFrame of one-hot encoded values.
contains(pattern) Returns True for each element that contains the substring or pattern, otherwise False.
replace(a, b) Replaces value a with value b.
repeat(value) Repeats each element the specified number of times.
count(pattern) Returns the number of times the pattern occurs in each element.
startswith(pattern) Returns True for each element in the Series/Index that starts with the pattern.
endswith(pattern) Returns True for each element in the Series/Index that ends with the pattern.
find(pattern) Returns the position of the first occurrence of the pattern in each element, or -1 if it is absent.
findall(pattern) Returns a list of all occurrences of the pattern in each element.
swapcase() Swaps the case of each letter.
islower() Checks whether all characters in each string in the Series/Index are lowercase and returns a boolean value.
isupper() Checks whether all characters in each string in the Series/Index are uppercase and returns a boolean value.
isnumeric() Checks whether all characters in each string in the Series/Index are numeric and returns a boolean value.
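To make a few of the methods listed above concrete, here is a short sketch on a made-up Series. The data and variable names are assumptions for illustration; each call goes through the .str accessor.

```python
import pandas as pd

# Illustrative data: comma-separated tokens, one with stray whitespace.
s = pd.Series(["  a,b ", "b,c", "A,B"])

clean = s.str.strip()                      # remove surrounding whitespace
parts = clean.str.split(",")               # each element becomes a list of tokens
lengths = clean.str.len()                  # number of characters per element
joined = clean.str.cat(sep="|")            # one string for the whole Series
has_b = clean.str.contains("b")            # case-sensitive by default
n_commas = clean.str.count(",")            # matches per element
first_b = clean.str.find("b")              # -1 where "b" does not occur
dummies = clean.str.get_dummies(sep=",")   # one indicator column per token

print(joined)
print(first_b.tolist())
print(dummies)
```

Note that cat() collapses the whole Series into a single string, while the other methods return one result per element.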

Use the methods above as needed. The one thing to note is that, after selecting a column, you must go through the .str accessor to call them; they are not available on the Series directly.
For example:

import pandas as pd

df = pd.DataFrame({'name': ['jack', 'MIKE']})
# Convert every name to uppercase via the .str accessor
df['name'] = df['name'].str.upper()
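The requirement to go through .str can be sketched as follows. This is an illustrative example with made-up data: it chains two string methods and then shows that calling a string method directly on the Series raises AttributeError. The flag name direct_call_works is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({"name": [" jack ", "MIKE"]})

# Each .str call returns a new Series, so a further string method needs .str again.
df["name"] = df["name"].str.strip().str.lower()

# Calling the string method on the Series itself fails: a Series has no upper().
try:
    df["name"].upper()
    direct_call_works = True
except AttributeError:
    direct_call_works = False

print(df["name"].tolist())
print(direct_call_works)
```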

Time series
Generating a time range
A time range can be generated with pd.date_range(start=None, end=None, periods=None, freq='D').

  • Combining start, end, and freq generates a time index at frequency freq covering the range from start to end
  • Combining start, periods, and freq generates a time index of periods entries at frequency freq, beginning at start
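The two combinations above can be sketched side by side. The dates are arbitrary sample values chosen so that both calls produce the same five-day index.

```python
import pandas as pd

# start + end + freq: every day from start through end (both endpoints included)
by_range = pd.date_range(start="2020-05-01", end="2020-05-05", freq="D")

# start + periods + freq: exactly 5 timestamps, one day apart, beginning at start
by_count = pd.date_range(start="2020-05-01", periods=5, freq="D")

print(by_range)
print(by_range.equals(by_count))
```

Which form to use depends on what is known in advance: the end date or the number of entries.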

Possible values of freq:
[Table: Parameter | Description (contents not preserved)]

Code demonstration:

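import numpy as np
import pandas as pd

# Five consecutive days starting 2020-05-14
index1 = pd.date_range('2020-05-14', freq="D", periods=5)
df = pd.DataFrame(np.random.rand(5), index=index1)
# Five consecutive minutes starting 2020-06-16 12:45
# ("min" replaces the older "T" alias, which is deprecated in recent pandas)
index2 = pd.date_range('2020-6-16 12:45', freq="min", periods=5)
ndf = pd.DataFrame(np.random.rand(5), index=index2)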
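Beyond daily and minute frequencies, a handful of other freq aliases can be sketched as below. The selection of aliases and the start date are assumptions made for illustration, not a reconstruction of the original table; see the pandas documentation for the full list.

```python
import pandas as pd

start = "2020-05-14"  # a Thursday

daily = pd.date_range(start, periods=3, freq="D")         # calendar days
business = pd.date_range(start, periods=3, freq="B")      # weekdays only
weekly = pd.date_range(start, periods=3, freq="W")        # weekly, anchored on Sundays
month_start = pd.date_range(start, periods=3, freq="MS")  # first day of each month
minutely = pd.date_range(start, periods=3, freq="min")    # every minute

for name, idx in [("D", daily), ("B", business), ("W", weekly),
                  ("MS", month_start), ("min", minutely)]:
    print(name, list(idx.astype(str)))
```

Anchored frequencies such as "W" and "MS" begin at the first matching date on or after start, so the first entry may be later than start itself.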


Origin blog.csdn.net/qq_44091773/article/details/106078855