A summary of a data-analysis optimization

Business Background:
    A table holds 100 million rows of personal information (city, company, school, etc.), stored as JSON in text fields. We need to analyze this data, compute frequency statistics over these values, and extract the TOP 100.

    The first implementation fetched 10,000 rows per batch and then updated the counts one row at a time after analysis, which turned out to be far too slow. The following improvements were made, and are summarized here:
 
1. For query-then-update operations, consider MySQL's ON DUPLICATE KEY UPDATE.
   Note that the table must have a primary key or a unique index for it to take effect.
 
2. When a large number of rows need updating, update them in batches, e.g.:

INSERT INTO sina_user_count(name,type,count) VALUES (?,?,?),(?,?,?),(?,?,?)...
ON DUPLICATE KEY UPDATE count=count+VALUES(count)
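The placeholder list in the VALUES clause above grows with the batch size, so it has to be generated per batch. A minimal sketch of such a helper (the table and column names are taken from the statement above; the class and method names are mine):

```java
// Builds the multi-row upsert statement for a given batch size.
public class SqlBatch {
    public static String buildBatchUpsertSql(int rows) {
        StringBuilder sb = new StringBuilder(
                "INSERT INTO sina_user_count(name,type,count) VALUES ");
        for (int i = 0; i < rows; i++) {
            if (i > 0) sb.append(",");
            sb.append("(?,?,?)"); // one placeholder group per row
        }
        sb.append(" ON DUPLICATE KEY UPDATE count=count+VALUES(count)");
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildBatchUpsertSql(3));
    }
}
```

The parameter list passed alongside the SQL must then contain exactly 3 × rows values, in row order.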
 
3. Do not splice values into the SQL string; use the "?" placeholder instead. The stored content contains special characters, so concatenated SQL will raise exceptions (and is also open to SQL injection).
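To illustrate why splicing fails, take a hypothetical company name containing a single quote: concatenating it directly produces a statement with an unbalanced quote that cannot parse, whereas a "?" placeholder keeps the value out of the SQL text entirely. A small demonstration (names and values are illustrative):

```java
// Shows how string-spliced SQL breaks on special characters.
public class SpliceDemo {
    // BAD: the value is embedded directly into the SQL text
    public static String splice(String name) {
        return "INSERT INTO sina_user_count(name,type,count) VALUES ('"
                + name + "',1,1)";
    }

    // Counts single-quote characters in the statement
    public static long quoteCount(String sql) {
        return sql.chars().filter(c -> c == '\'').count();
    }

    public static void main(String[] args) {
        String bad = splice("O'Brien & Co.");
        // An odd number of quotes means the statement is malformed;
        // with a "?" placeholder the quote never touches the SQL text.
        System.out.println(quoteCount(bad));
    }
}
```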
 
4. Remember to clear the cached data inside the loop; forgetting to do so was a bug in the first version.
The code snippet is as follows:
public void analysis() {
        final long countId = 1;

        Map<String, Integer> comps = new HashMap<String, Integer>(1000);

        // The max processed id is persisted in the count column of table_count
        Map<String, Object> row = this.findForMap("select * from table_count where id=?", countId);
        Long maxId = (Long) row.get("count");

        List<Map<String, Object>> list = this.findForList("select * from table where id> ? limit ?", maxId, 10000);

        while (list != null && !list.isEmpty()) {
            for (Map<String, Object> map : list) {
                maxId = (Long) map.get("id");
                // Get the json information
                String careers = (String) map.get("careers");
                if (careers != null) {
                    try {
                        JSONArray array = JSONArray.fromObject(careers);
                        for (int i = 0; i < array.size(); i++) {
                            JSONObject obj = array.getJSONObject(i);
                            count(comps, obj.getString("company"));
                        }
                    } catch (Exception e) {
                        logger.error(e.getMessage());
                    }
                }
            } // close the for loop here, so the update runs once per batch, not once per row

            // Flush the counts for this batch
            logger.info(">>>>>>>>>>>>>>>>> update start");
            updateCount(comps, 1);
            logger.info(">>>>>>>>>>>>>>>>> update end");
            // Persist the max id and fetch the next batch
            this.update("update_sina_user_count_max", maxId, countId);
            logger.info(">>>>>>>>>>>>>>>>> maxId=" + maxId);
            list = this.findForList("query_sina_user_1", maxId, 10000);
            logger.info(">>>>>>>>>>>>>>>>> list.size=" + list.size());
        }

    }

    private void count(Map<String, Integer> map, String key) {
        if (DataUtil.isEmpty(key) || key.length() < 2)
            return;
        if (map.containsKey(key)) {
            Integer val = map.get(key);
            map.put(key, ++val);
        } else {
            map.put(key, 1);
        }
    }

    private void updateCount(Map<String, Integer> map, int type) {
        StringBuilder content = new StringBuilder();
        List<Object> params = Lists.newArrayList();
        int i = 0;
        for (String name : map.keySet()) {
            Integer count = map.get(name);
            content.append("(?,?,?)");
            params.add(name);
            params.add(type);
            params.add(count);
            if (++i < 10000) {
                content.append(",");
            } else {
                // Flush a full batch of 10,000 rows in a single statement
                String sql = "INSERT INTO table_count(name,type,count) VALUES " + content.toString()
                        + " ON DUPLICATE KEY UPDATE count=count+VALUES(count)";
                logger.info(">>>>>>>>>>>>>>>>>inner:" + i + ",type:" + type);
                this.getJdbc(0).update(sql, params.toArray());
                content = new StringBuilder();
                i = 0;
                params.clear();
            }

        }
        if (!params.isEmpty()) {
            // content still ends with a trailing ",", so the literal
            // ('最大ID',0,0) row ("最大ID" means "max ID") completes the VALUES list
            String sql = "INSERT INTO table_count(name,type,count) VALUES " + content.toString()
                    + "('最大ID',0,0) ON DUPLICATE KEY UPDATE count=count+VALUES(count)";
            logger.info(">>>>>>>>>>>>>>>>>outer:" + i + ",type:" + type);
            this.update(sql, params.toArray());
        }
        // The first version lacked this call, so the map grew without bound
        // across batches; a low-level mistake
        map.clear();
    }
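As an aside, on Java 8+ the count() helper above can be folded into a single Map.merge call. A sketch (the DataUtil.isEmpty guard is approximated here with a plain null/blank check):

```java
import java.util.HashMap;
import java.util.Map;

public class Counter {
    // Increments the counter for key, ignoring null, blank, or
    // single-character keys, same as the guard in count() above.
    public static void count(Map<String, Integer> map, String key) {
        if (key == null || key.trim().isEmpty() || key.length() < 2)
            return;
        // merge() inserts 1 for a new key, otherwise sums with the old value
        map.merge(key, 1, Integer::sum);
    }

    public static void main(String[] args) {
        Map<String, Integer> comps = new HashMap<>();
        count(comps, "Sina");
        count(comps, "Sina");
        count(comps, "X"); // too short, ignored
        System.out.println(comps); // prints {Sina=2}
    }
}
```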
 
