A Data Cleaning Pass

Copyright notice: this is an original article by the blogger and may not be reproduced without the blogger's permission. https://blog.csdn.net/CoolScript/article/details/88167350

Check whether province, city, and district are empty

-- Replace NULL names with the placeholder '0' and reset the matching id to 0.
UPDATE all_tables t SET t.area = '0', t.area_id = 0 WHERE t.area IS NULL OR t.area = '0';
UPDATE all_tables t SET t.city = '0', t.city_id = 0 WHERE t.city IS NULL OR t.city = '0';
UPDATE all_tables t SET t.province = '0', t.province_id = 0 WHERE t.province IS NULL OR t.province = '0';

Check whether city and province match

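-- The first two digits of a six-digit division code identify the province,
-- so a city whose code starts with different digits belongs to another province.
SELECT DISTINCT city, city_id FROM all_blueky_handled_data_not_insert WHERE city_id != 0 AND LEFT(province_id, 2) != LEFT(city_id, 2);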
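
As a cross-check outside the database, the comparison of leading digits can be sketched in Python. This is only an illustration, assuming six-digit division codes in which the first two digits identify the province. The function name `city_matches_province`, its `prefix_len` parameter, and the sample rows are hypothetical and not part of the original workflow.

```python
def city_matches_province(province_id: int, city_id: int, prefix_len: int = 2) -> bool:
    """Return True when city_id is unset (0) or shares its leading digits with province_id."""
    if city_id == 0:
        return True  # 0 is the placeholder for "no city", so there is nothing to compare
    return str(province_id)[:prefix_len] == str(city_id)[:prefix_len]


# Hypothetical sample rows: (province_id, city_id)
rows = [
    (110000, 110100),  # Beijing province code with a Beijing city code
    (110000, 130100),  # Beijing province code with a Hebei city code
    (440000, 0),       # city not set
]
mismatches = [(p, c) for p, c in rows if not city_matches_province(p, c)]
print(mismatches)
```

Comparing only one leading digit would let the second row through, because 11 and 13 both start with 1.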

Validate email addresses and phone numbers

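-- Delete rows where neither the email nor the phone number is usable.
DELETE FROM all_tables WHERE email NOT LIKE '%@%' AND phone NOT REGEXP '^1[3456789][0-9]{9}$';
-- For the remaining rows, replace whichever field is invalid with the placeholder '0'.
UPDATE all_tables SET email = '0' WHERE email NOT LIKE '%@%';
UPDATE all_tables SET phone = '0' WHERE phone NOT REGEXP '^1[3456789][0-9]{9}$';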
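
The three statements above amount to one rule per row, which can be sketched in Python to try it on sample values first. The helper `clean_contact` is a hypothetical name, and the sketch assumes both fields are plain strings (it does not model SQL NULL handling).

```python
import re

# Same pattern as the SQL: 11 digits, starting with 1, second digit 3 to 9.
PHONE_RE = re.compile(r"1[3456789][0-9]{9}")


def clean_contact(email: str, phone: str):
    """Return (email, phone) with invalid values replaced by '0', or None to drop the row."""
    email_ok = "@" in email                      # equivalent to LIKE '%@%'
    phone_ok = PHONE_RE.fullmatch(phone) is not None
    if not email_ok and not phone_ok:
        return None                              # neither field usable: delete the row
    return (email if email_ok else "0", phone if phone_ok else "0")


print(clean_contact("none", "13800138000"))
print(clean_contact("none", "12345"))
```

Note that the email test only checks for an `@` character, so it filters obvious junk rather than validating addresses.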

Remove names that are not Chinese

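-- LENGTH counts bytes and CHAR_LENGTH counts characters. In a multibyte
-- character set such as utf8mb4 the two are equal only when the name
-- contains no multibyte (for example Chinese) characters.
DELETE FROM f_610100 WHERE LENGTH(name) = CHAR_LENGTH(name);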
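
The byte-length versus character-length comparison can be reproduced in Python to see which names would survive. This is a minimal sketch assuming UTF-8 encoding; `has_multibyte_char` and the sample names are hypothetical.

```python
def has_multibyte_char(name: str) -> bool:
    """True when the UTF-8 byte length differs from the character count."""
    return len(name.encode("utf-8")) != len(name)


names = ["张伟", "Tom", "李Lee", "12345"]
kept = [n for n in names if has_multibyte_char(n)]
print(kept)
```

The check keeps any name with at least one multibyte character, so a mixed name such as "李Lee" or an accented Latin name also passes. It is a coarse filter, not a strict test for Chinese text.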

Deduplicate

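-- Keep the row with the smallest id in each (phone, email) group.
-- The derived table temp is needed because MySQL does not allow a subquery
-- to select directly from the table being deleted from.
DELETE FROM f_110100
WHERE id NOT IN (
    SELECT temp.min_id
    FROM (SELECT MIN(id) AS min_id FROM f_110100 GROUP BY phone, email) AS temp
);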
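
To see which rows such a statement keeps before running it on real data, the same keep-the-smallest-id idea can be tried on an in-memory SQLite table from Python. This is a sketch only: the table `people` and its rows are made up, and SQLite is used as a stand-in, so the extra derived table that MySQL requires is not needed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, phone TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO people (id, phone, email) VALUES (?, ?, ?)",
    [
        (1, "13800138000", "a@example.com"),
        (2, "13800138000", "a@example.com"),  # exact duplicate of id 1
        (3, "13900139000", "b@example.com"),
        (4, "13800138000", "c@example.com"),  # same phone, different email: kept
    ],
)

# Delete every row that is not the smallest id of its (phone, email) group.
conn.execute(
    "DELETE FROM people WHERE id NOT IN "
    "(SELECT MIN(id) FROM people GROUP BY phone, email)"
)
remaining_ids = [row[0] for row in conn.execute("SELECT id FROM people ORDER BY id")]
print(remaining_ids)
```

Because the grouping is on the pair (phone, email), two rows count as duplicates only when both values match.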

Handle city_id for the direct-administered municipalities

UPDATE all_blueky_handled_data_not_insert SET city_id = 110100 WHERE city_id = 110000;
UPDATE all_blueky_handled_data_not_insert SET city_id = 120100 WHERE city_id = 120000;
UPDATE all_blueky_handled_data_not_insert SET city_id = 310100 WHERE city_id = 310000;
UPDATE all_blueky_handled_data_not_insert SET city_id = 500100 WHERE city_id = 500000;
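
The four updates form a fixed lookup from a province-level code to a city-level code for Beijing, Tianjin, Shanghai, and Chongqing. A minimal Python sketch of the same mapping follows; the names `MUNICIPALITY_CITY_IDS` and `normalize_city_id` are hypothetical.

```python
# Province-level code -> city-level code, taken from the UPDATE statements above.
MUNICIPALITY_CITY_IDS = {
    110000: 110100,  # Beijing
    120000: 120100,  # Tianjin
    310000: 310100,  # Shanghai
    500000: 500100,  # Chongqing
}


def normalize_city_id(city_id: int) -> int:
    """Map a municipality's province-level code to its city-level code; leave others unchanged."""
    return MUNICIPALITY_CITY_IDS.get(city_id, city_id)


print([normalize_city_id(c) for c in (110000, 440100, 500000)])
```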
