Insights into the "playbook" behind major epidemic data reports

During the epidemic (or rather, the stay-at-home period), many amateur experts emerged in the data analysis field, each doing what they could with data. Some tuned the parameters of simulation programs to model the spread of the virus and underline how much staying indoors helps to contain it. Some used natural language processing and word cloud tools to visualize how the hot words in daily news evolved. Others gave live tutorials on crawling case data in real time for further analysis.

These data modeling and data engineering skills are certainly valuable, but we also found that descriptive statistical analysis, which anyone can pick up, can deliver just as much insight and value.

 

Seven methods of data analysis

As early as January 21, when public attention to the epidemic started to climb, someone ran a "slightly rough" correlation analysis between the publicly reported case counts of provinces and cities and the Spring Festival travel migration data from previous years. Based on the preliminarily verified positive correlation, the analysis pointed out that the epidemic might be underestimated in some Hubei cities with close ties to Wuhan, and that major cities outside Hubei Province should strengthen early warning at airports and railway stations. This analysis made full use of trend monitoring, horizontal comparison, and dimension breakdown.
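
A correlation check of this kind can be sketched in a few lines of Python. This is only an illustration: the `pearson` helper, the variable names, and all the numbers are hypothetical, and the original analysis may well have used a different method or tool.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical numbers, for illustration only (not real epidemic data):
# each city's share of outbound travel from the source city, in percent,
# and that city's publicly reported case count.
migration_share = [13.8, 6.5, 5.2, 3.1, 1.4, 0.9]
reported_cases = [40, 22, 18, 9, 5, 2]

r = pearson(migration_share, reported_cases)
print(f"Pearson r = {r:.2f}")
```

A coefficient close to 1 would support the "more travel, more cases" reading. Cities whose reported cases look low for their travel share are then the ones worth a closer look.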

Another example is the science explainer from the "Paperclip" video channel, whose derivation from a small amount of data was impressive.

⁃ First, the author held that the epidemic was developing very differently inside and outside Hubei Province. The diagnostic workload in Hubei was heavy, so its figures were likely to lag. He therefore looked at Hubei and non-Hubei data separately, in a "split view".

⁃ Next, he considered the fatality rate obtained by dividing total deaths by total confirmed cases to be inaccurate, because a rapidly growing number of confirmed cases (the denominator) dilutes the ratio. He therefore chose an approach as close to a "cohort" as possible.

⁃ Furthermore, according to the literature, the average time from a confirmed-case report to a death report is eight days, so the new deaths of the last 3 days most likely come from the cases newly confirmed eight days earlier. Within such a "cohort", the fatality rate outside Hubei Province comes out at about 1.1%. If, for the time being, the fatality rate inside Hubei Province is assumed to be at a similar level, the number of infected people can then be back-calculated.

⁃ Based on current information, the fatality rate in Hubei Province is likely to be higher than in other regions, so the resulting estimate, at a magnitude in the thousands, may run high. Even so, it is very close to the data later disclosed by the CDC. The quality of this analysis comes from a reasonable "dimension split" of the data and from applying the "cohort" idea.
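
The difference between the naive ratio and the lag-based "cohort" ratio can be sketched as follows. This is a minimal illustration, not the video author's actual calculation: the function names and the daily series are hypothetical, and only the 8-day lag and 3-day window come from the text above.

```python
def naive_fatality_rate(new_confirmed, new_deaths):
    """Cumulative deaths divided by cumulative confirmed cases."""
    return sum(new_deaths) / sum(new_confirmed)


def lagged_fatality_rate(new_confirmed, new_deaths, lag_days=8, window=3):
    """Deaths in the last `window` days divided by the cases confirmed
    `lag_days` earlier, approximating a cohort of patients."""
    if len(new_confirmed) != len(new_deaths):
        raise ValueError("both series must cover the same days")
    n = len(new_deaths)
    if n < lag_days + window:
        raise ValueError("not enough days of data for this lag and window")
    recent_deaths = sum(new_deaths[n - window:])
    cohort_end = n - lag_days
    cohort_cases = sum(new_confirmed[cohort_end - window:cohort_end])
    if cohort_cases == 0:
        raise ValueError("no confirmed cases in the matching cohort window")
    return recent_deaths / cohort_cases


# Hypothetical daily series (oldest first), not real epidemic data.
new_confirmed = [10, 20, 30, 50, 100, 100, 200, 200, 300, 300, 400, 400]
new_deaths = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1]

print(f"naive:  {naive_fatality_rate(new_confirmed, new_deaths):.2%}")
print(f"lagged: {lagged_fatality_rate(new_confirmed, new_deaths):.2%}")
```

While confirmed cases are growing quickly, the naive ratio is pulled down by recent cases whose outcome is not yet known. The lagged version compares deaths only with the cases that could plausibly have produced them.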

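Then there are the epidemic data reports we check every day. Take DXY (丁香园) as an example: its data reports use plain data analysis that the public can understand to carefully interpret the epidemic figures released by the National Health Commission and local health commissions, helping everyone build an accurate understanding of the epidemic. It is good at:

- Quantifying growth with period-over-period rates instead of eyeballing trends

- Giving well-reasoned explanations for data fluctuations (such as newly confirmed cases surging by ten thousand in a single day, or the fatality rate falling and then gradually rising again)

- Comparing against data from major outbreaks such as SARS, MERS, and H7N9 to understand the characteristics of this epidemic

- Splitting key indicators by province/city so that conclusions are clearer

- Going beyond macro indicators to analyze infections among special groups (such as the elderly and medical staff) in detail

- Ensuring that indicator calculation and interpretation are professional, and promptly correcting erroneous charts in circulation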
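
The first point in the list, quantifying growth period over period rather than judging a curve by eye, can be illustrated with a short Python sketch. The function name and the sample figures are hypothetical, not taken from any real report.

```python
def day_over_day_growth(values):
    """Return (current - previous) / previous for each consecutive pair.

    A pair whose previous value is 0 yields None, since the rate is undefined.
    """
    rates = []
    for prev, cur in zip(values, values[1:]):
        rates.append(None if prev == 0 else (cur - prev) / prev)
    return rates


# Hypothetical cumulative confirmed counts on four consecutive days.
cumulative_confirmed = [100, 150, 210, 273]

for day, rate in enumerate(day_over_day_growth(cumulative_confirmed), start=2):
    label = "n/a" if rate is None else f"{rate:+.1%}"
    print(f"day {day}: {label}")
```

In this made-up series the absolute increase grows every day (50, 60, 63) while the growth rate falls. That kind of slowdown is easy to miss on a chart.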

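It is not hard to see that the excellent data interpretations we see every day rest on seven methods of data analysis:

  • Trend monitoring: indicators are defined correctly, and historical calculation standards stay consistent

  • Horizontal comparison: reference objects are comparable, and data is collected broadly

  • Dimension breakdown: dimensions are split sensibly, and conclusions guide action

  • Process breakdown: the business logic is clear, and indicators characterize each conversion step

  • Factor breakdown: the relevant factors are laid out, and the data captures the full picture

  • Segment insight: segments neither overlap nor leave gaps, and each group is profiled in depth

  • Case inspection: data is collected at the finest granularity, and multiple sources are linked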

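The troika that produces high-value data analysis

It is now the end of February, and most people have gradually returned to work. So, back in our own business, how can we do data monitoring better?

Ideas alone are far from enough for data analysis. Understanding of the specific business, the quality of data collection, and the flexibility of analysis tools are the troika that lets data analysis produce value efficiently. Only with business understanding can you ask the right questions and plan your data requirements. At collection time, make sure as far as possible that the data is comprehensive, consistently defined, and granular enough to be split. At analysis time, you need flexible tools to carry out all the ways you want to work the data, and business understanding again to back up the interpretation. Only then does data analysis truly deliver value.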

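First, business understanding and data collection are essential prerequisites for data analysis and data-driven operations. The indicator system is an important bridge between the two, and also an important concrete deliverable and carrier. If you work in a data-related role, we strongly recommend taking the lead in learning from each business team, and even from management, what their business goals are and what questions they want the data to answer, so that you avoid becoming a passive, soulless "SQL Boy".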

If you work in product, operations, or a business role, it is worth thinking this question through once more. Although "core indicators = industry characteristics * operational stage * corporate strategy", the first two factors are general rules. Companies in the same industry and at the same stage of development will still tailor their core indicators, because their business-model strengths and development priorities differ. To some extent, therefore, "corporate strategy" outweighs the first two factors: a core indicator not only monitors, it also guides, representing the direction of strategic decisions and business goals.

Next, once the core indicators are clear, the day-to-day indicators should be organized by level and category. This helps with managing and using the data, and it also produces a comprehensive specification for the event-tracking (instrumentation) work, so that collection is accurate and consistent. The general principle is core indicators at the strategic management level, sub-indicators at the business-line level, and process indicators at the business execution level. There are no strict rules for the specific breakdown; a few common methods are:

- Following a DuPont-style decomposition tree, keeping the formula relationships between indicators as clear as possible

- Combining the user lifecycle with the subject of analysis, matching different analytical perspectives and settling on appropriate dimensions

- Or dividing directly by business line / team responsibility, which makes collecting requirements more convenient
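
As a rough illustration of the first method, a DuPont-style tree can be modeled so that every parent indicator is the product of its children, which keeps the formula relationship explicit. The `Metric` class, the indicator names, and the numbers below are all hypothetical. Real indicator trees may also mix in sums or ratios rather than only products.

```python
from dataclasses import dataclass, field
from math import prod
from typing import List, Optional


@dataclass
class Metric:
    """A node in a multiplicative indicator tree."""
    name: str
    value: Optional[float] = None  # set on leaf metrics only
    children: List["Metric"] = field(default_factory=list)

    def evaluate(self) -> float:
        """Leaf: return its own value. Parent: product of its children."""
        if not self.children:
            if self.value is None:
                raise ValueError(f"leaf metric {self.name!r} has no value")
            return self.value
        return prod(child.evaluate() for child in self.children)


# Hypothetical indicator names and numbers, for illustration only.
orders = Metric("orders", children=[
    Metric("visitors", 20_000),
    Metric("conversion_rate", 0.05),
])
revenue = Metric("revenue", children=[
    orders,
    Metric("average_order_value", 80.0),
])

print(f"{orders.name} = {orders.evaluate():.0f}")
print(f"{revenue.name} = {revenue.evaluate():.0f}")
```

With this structure, a change in the top-level indicator can be traced down the tree to the leaf that moved. The same tree also lists the leaf values that event tracking has to collect.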

Origin www.cnblogs.com/umengplus/p/12412057.html