String data
pandas provides a set of string functions that make it convenient to operate on string data. Most importantly, these functions skip NaN values instead of failing on them. Almost all of the methods below have the same names as Python's built-in string methods. Some of them also support regular expressions, such as replace() below; it is worth experimenting with a few patterns of your own.
Function name | Description |
---|---|
lower() | Convert the string in Series/Index to lowercase. |
upper() | Convert the string in Series/Index to uppercase. |
len() | Calculate the length of each string. |
strip() | Remove whitespace (including newlines) from both sides of each string in the Series/Index. |
split(' ') | Split each string on the given pattern. |
cat(sep=' ') | Concatenate the Series/Index elements using the given separator. |
get_dummies() | Return a DataFrame of one-hot encoded values. |
contains(pattern) | Return a boolean for each element: True if the element contains the pattern, otherwise False. |
replace(a,b) | Replace value a with value b. |
repeat(value) | Repeat each element the specified number of times. |
count(pattern) | Return the number of occurrences of the pattern in each element. |
startswith(pattern) | Return True for each element in the Series/Index that starts with the pattern. |
endswith(pattern) | Return True for each element in the Series/Index that ends with the pattern. |
find(pattern) | Return the position of the first occurrence of the substring in each element, or -1 if it is not found. |
findall(pattern) | Return a list of all occurrences of the pattern in each element. |
swapcase() | Swap the case of each letter (uppercase to lowercase and vice versa). |
islower() | Check whether all characters in each string in the Series/Index are lowercase, and return a boolean value. |
isupper() | Check whether all characters in each string in the Series/Index are uppercase, and return a boolean value. |
isnumeric() | Check whether all characters in each string in the Series/Index are numeric, and return a boolean value. |
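As a rough illustration of a few methods from the table, the sketch below applies them through the .str accessor to a small made-up Series (the values and the variable names are assumptions for this example). It also shows that a missing value is passed through as NaN rather than raising an error, and that replace() treats its first argument as a regular expression when regex=True is given.

```python
import numpy as np
import pandas as pd

# Made-up sample data; the NaN entry shows how missing values are handled.
s = pd.Series(['  Jack ', 'MIKE', np.nan, 'tom_01'])

stripped = s.str.strip()        # remove surrounding whitespace
lowered = stripped.str.lower()  # lowercase every string, NaN stays missing
lengths = stripped.str.len()    # length of each string, NaN stays missing

# na=False reports missing values as False instead of propagating them.
has_k = lowered.str.contains('k', na=False)

# regex=True treats the first argument as a regular expression:
# every run of digits is replaced with '#'.
masked = lowered.str.replace(r'\d+', '#', regex=True)

print(lowered.tolist())
print(has_k.tolist())
print(masked.tolist())
```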
Use the methods above as needed. The one thing to note is that, after selecting a column, you must go through the .str accessor so that the method is applied to each string; otherwise it has no effect.
For example:
import pandas as pd

df = pd.DataFrame({'name': ['jack', 'MIKE']})
# Convert every name to uppercase via the .str accessor
df['name'] = df['name'].str.upper()
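To see why the .str accessor matters, the following sketch (reusing the same two sample names; the variable names are assumptions) first calls upper() directly on the column, which fails because a Series has no such method, and then goes through .str.

```python
import pandas as pd

df = pd.DataFrame({'name': ['jack', 'MIKE']})

try:
    # A Series has no upper() method of its own, so this raises AttributeError.
    df['name'].upper()
    direct_call_worked = True
except AttributeError:
    direct_call_worked = False

# Going through .str applies upper() to each string in the column.
upper_names = df['name'].str.upper()

print(direct_call_worked)    # False
print(upper_names.tolist())  # ['JACK', 'MIKE']
```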
Time series
A time range can be generated with pd.date_range(start=None, end=None, periods=None, freq='D').
- Combining start, end and freq generates a time index with frequency freq covering the range from start to end.
- Combining start, periods and freq generates a time index of periods entries with frequency freq, beginning at start.
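The two combinations can be compared side by side. This is a minimal sketch with arbitrarily chosen dates: the first call fixes both endpoints, the second fixes the start and the number of entries, and here they happen to describe the same five days.

```python
import pandas as pd

# start + end + freq: every day from start to end, both endpoints included.
by_end = pd.date_range(start='2020-05-14', end='2020-05-18', freq='D')

# start + periods + freq: exactly 5 daily timestamps beginning at start.
by_periods = pd.date_range(start='2020-05-14', periods=5, freq='D')

print(by_end.equals(by_periods))  # True
```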
Possible values of freq:
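As a partial illustration, the sketch below tries a few aliases from the pandas offset-alias documentation: 'D' (calendar day), 'W' (weekly) and 'min' (minute). A number in front of an alias acts as a multiplier. The start date is arbitrary and the list is far from exhaustive.

```python
import pandas as pd

start = '2020-05-14'  # arbitrary start date, chosen only for illustration

daily = pd.date_range(start, periods=3, freq='D')           # one step per calendar day
every_2_days = pd.date_range(start, periods=3, freq='2D')   # a leading number multiplies the step
weekly = pd.date_range(start, periods=3, freq='W')          # weekly, anchored on Sundays by default
minutely = pd.date_range(start, periods=3, freq='min')      # one step per minute

for name, idx in [('D', daily), ('2D', every_2_days), ('W', weekly), ('min', minutely)]:
    print(name, list(idx.strftime('%Y-%m-%d %H:%M')))
```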
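Code demonstration:
import numpy as np
import pandas as pd

# Five consecutive days starting on 2020-05-14
index1 = pd.date_range('2020-05-14', freq='D', periods=5)
df = pd.DataFrame(np.random.rand(5), index=index1)

# Five consecutive minutes starting at 2020-06-16 12:45
# ('min' is the minute alias; the older 'T' is deprecated in recent pandas)
index2 = pd.date_range('2020-06-16 12:45', freq='min', periods=5)
ndf = pd.DataFrame(np.random.rand(5), index=index2)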