Big Data project - Internet precision marketing - data cleansing

Cleaning the data with Kettle:

 1. Create a new transformation to remove duplicate records from the handset sales information table.
  Requirements: remove all spaces in this field to facilitate subsequent aggregate statistics, unify the letter case, and remove all special characters (punctuation marks) in this field.

      Here I chose Sort rows followed by Unique rows to deduplicate; Unique rows (HashSet) can also be used. Then use String operations to remove the spaces and unify the case, and use Replace in string with a regular expression to remove the special characters.
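
For readers who want to check the logic outside Kettle, the same cleaning can be sketched in Python. This is only an illustration: the function names, the sample rows, and the choice of lower case as the unified case are assumptions, not part of the original transformation.

```python
import re
from itertools import groupby


def normalize_field(value: str) -> str:
    """Remove all whitespace, unify case, and strip special characters."""
    no_spaces = re.sub(r"\s+", "", value)
    # Assumption: lower case is the unified case (the post does not say which).
    lowered = no_spaces.lower()
    # Drop everything that is not a letter or digit (underscore included).
    return re.sub(r"[^\w]|_", "", lowered)


def dedupe_sorted(rows):
    """Sort the rows, then keep one row from each run of identical rows."""
    return [key for key, _ in groupby(sorted(rows))]


# Hypothetical sample rows: (model, quantity)
rows = [("Huawei P30", 3), ("iPhone X", 1), ("Huawei P30", 3)]
unique_rows = dedupe_sorted(rows)
cleaned = [(normalize_field(model), qty) for model, qty in unique_rows]
```

Sorting first matters here: like a sort step followed by a unique step, `groupby` only merges duplicates that sit next to each other.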

 2. Create a new transformation to remove duplicate records from the user comments information table.

  This is similar to the deduplication above.
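
As a rough illustration of hash-based deduplication, which needs no prior sort, here is a minimal Python sketch. The function name and the sample comment rows are hypothetical.

```python
def dedupe_hashed(rows):
    """Keep the first occurrence of each row, preserving input order."""
    seen = set()
    unique = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            unique.append(row)
    return unique


# Hypothetical sample rows: (user_id, comment)
comments = [("u1", "good"), ("u2", "bad"), ("u1", "good")]
unique_comments = dedupe_hashed(comments)
```

The trade-off is memory: every distinct row is held in the set, whereas the sort-based approach only compares neighbouring rows.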

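3. Create a new transformation to process the birth date field of the user information table (converting 2019年5月20日, that is May 20, 2019, to 2019-5-20).
Here I again use a regular expression: replace the year and month markers (enter "(年|月)" as the pattern) with "-", and replace the day marker "日" with an empty string.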
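
The two replacements can be checked with a small Python sketch. It assumes the source dates are written in the Chinese form 2019年5月20日; the function name is hypothetical.

```python
import re


def convert_birth_date(value: str) -> str:
    """Turn a date like 2019年5月20日 into 2019-5-20."""
    # Replace the year and month markers with a dash.
    with_dashes = re.sub(r"(年|月)", "-", value)
    # Replace the day marker with an empty string.
    return with_dashes.replace("日", "")


result = convert_birth_date("2019年5月20日")
```

Note that this keeps the month and day unpadded (2019-5-20 rather than 2019-05-20), matching the target format stated above.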

 


Origin www.cnblogs.com/zhaochenguang/p/11484272.html