Big Data Applications in the Media Industry - "Enterprise Big Data Practice Route", Part 2

Author: Alibaba Cloud MVP Qi Jun

This article is the text version of the "MVP Time" video course "Understand the Enterprise Big Data Practice Route in Four Lessons". For the video, see the MVP Time course home page.


Previous episode recap:
Status and pain points of enterprise big data - "Enterprise Big Data Practice Route", Part 1
Next episode:
Analyzing the characteristics of enterprise data - "Enterprise Big Data Practice Route", Part 3

In the previous episode we discussed the current state and pain points of enterprise big data. In this episode we take the media industry as an example and analyze in depth how big data is applied there.
First, look at the media industry workflow:


13492518-57729d7159c67b8a.png

A journalist produces a piece of content, which may take the form of text, video, and so on. Whether you are a reporter, an editor, an art editor for video, or a television reporter or editor, the content you produce is stored in a business database. Business databases usually sit in the organization's own machine room, and the media industry is no different: content goes into the organization's own small machine room, in its own business database.
Once stored, the data is sent out through distribution channels: websites, print media channels, mobile channels, TV and video channels, and so on. What goes out reaches viewers, readers, and listeners, who may or may not give feedback, but in the end the data is tallied by staff, that is, by statisticians. Print media, unlike electronic media, has no concept of ratings or read counts, but it does have circulation figures, which can be found in the organization's circulation system.
The data is also scattered across different systems: subscriptions have a subscription system, the website has a CMS, video has a dedicated video-on-demand system, the app has its own client, and new media outlets such as Toutiao have their own clients. Our statisticians recombine most of this information and compile it into our own internal publications, such as in-house newspapers and magazines, for internal distribution. This process requires producing a statistical report.

1. Reading a report in the newspaper
2. Watching a news item on TV
3. Watching a video on a website: how many minutes it runs and what it is about

These three things may look different, but inside the news organization they share a single starting point: all three pieces of data extend from one news lead. The information that spreads out is collected back, and the three threads are associated and bound together. This serves two purposes.
First, revenue analysis. With this association and binding, you can see which kinds of news and which kinds of articles readers like most, or which kinds of content attract better advertisers.
This revenue analysis is done in reverse, after our statisticians have drawn up the report.
Second, performance appraisal. Performance appraisal mainly covers our reporters, editors, art editors, and television editorial staff. It is linked to year-end awards and also bears on monthly and quarterly bonuses.
This is the traditional media workflow from beginning to end, through to performance appraisal: an overview of the whole process.

The workflow in flowchart form:

13492518-35ae8b5ad3f0d358.png

Tracking the propagation path manually: the path here means, for example, that content spreads to WeChat, to a particular website, to electronic publications, to television, and to other kinds of new media.
A superficial overview of the propagation effect through analysis: why superficial? Manual tracking cannot work well, first because of timeliness. The moment at which the statistics are taken matters, and each platform's propagation path develops differently. For example, an article is placed on Toutiao, and at 15:00 today you take the statistics: the read count may be 1,500. You record that number and move on. But by tomorrow night, users with a particular label, people who like this type of article, may read it in a batch at around 9 o'clock, producing explosive growth to 20,000 or even 50,000 reads. The earlier figure of 1,500 is then extremely inaccurate. This is the first problem, the one caused by timeliness.
The second problem is that there are too many communication channels and limited manpower, so people essentially cannot achieve 100% coverage. Most statistics are identified by eye, recorded, and then aggregated in Excel. Slightly more advanced, semi-automated approaches use crawlers to grab read counts, comment data from electronic editions, and so on. Inevitably, channel restrictions and restrictions on reposts and forwards mean the collected data is not comprehensive. This is the biggest obstacle to measuring propagation results.

Summary:
1) Timeliness
2) Channel coverage
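
To make the two problems concrete, here is a minimal Python sketch. It is an illustration only: the channel names, dates, and every figure other than the 1,500 and 20,000 reads mentioned above are hypothetical, and `reads_as_of` and `total_reads` are names invented for this example.

```python
from datetime import datetime

# Hypothetical cumulative read counts per channel: (observation time, total reads).
# Each list is assumed to be sorted by time. The Toutiao figures echo the
# example in the text; the other channels and numbers are made up.
observations = {
    "toutiao": [(datetime(2019, 1, 1, 15, 0), 1_500),
                (datetime(2019, 1, 2, 21, 0), 20_000)],
    "wechat": [(datetime(2019, 1, 1, 15, 0), 800),
               (datetime(2019, 1, 2, 21, 0), 900)],
    "website": [(datetime(2019, 1, 2, 21, 0), 300)],  # never checked by hand
}

def reads_as_of(channel_obs, cutoff):
    """Latest cumulative count observed at or before the cutoff (0 if none)."""
    seen = [count for ts, count in channel_obs if ts <= cutoff]
    return seen[-1] if seen else 0

def total_reads(obs, channels, cutoff):
    """Sum the latest known counts over the given channels."""
    return sum(reads_as_of(obs[c], cutoff) for c in channels)

# Manual tracking: one snapshot at 15:00, covering only two channels.
manual = total_reads(observations, ["toutiao", "wechat"],
                     datetime(2019, 1, 1, 15, 0))

# Continuous collection: every channel, latest figures the next evening.
continuous = total_reads(observations, list(observations),
                         datetime(2019, 1, 2, 21, 0))

print(manual, continuous)
```

The gap between the two totals comes from both effects at once: the early snapshot misses later growth (timeliness), and the channel nobody checked is missing entirely (coverage).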

Generating revenue through paid content or value-added services: for example, readers pay to view the content directly, or buy a publication in order to read it. Alternatively, the content is free, but revenue comes from value-added services such as advertising or PR and branding work for others.
Crude performance appraisal: because the preceding steps are so coarse, the appraisal cannot be comprehensive. Staff may be treated unfairly or overpaid, and rewards clearly do not match the value delivered. Performance is judged by intuitive feel, which produces a crude appraisal.
Looking at the illustrated version and the flowchart version above: apart from the lack of precision, does this process actually cause errors? The media industry used to think that making do had no adverse consequences. But in our era competitors are running ahead, and we have to run forward too. Let us analyze this; from my point of view, I will tell media industry customers where the problem lies.
The picture below shows what a typical media workflow really looks like.

13492518-316d92b72c7cc9f4.png

The first half of the figure is the same as what we discussed above, but starting from the communication channels, your data begins to run out of control.

13492518-b0e0b32e48181561.png

In this process a person is doing mechanical work, and doing it inaccurately. The subsequent operations, statistics, revenue analysis, and performance appraisal all depend on those statistics; because the data is not sound, every downstream step runs into problems. The biggest problem is that the manual step blocks proper data collection, so the data runs away, out of control. Human work is affected by many objective and subjective factors and is prone to a series of mistakes, which are often the hardest to control and manage.
Placing an unreliable manual step in front of the three most important links of the workflow makes the entire data environment very insecure. This is a conclusion drawn from working through processes and problems with customers, and it is the situation and status quo of traditional media.
What does the situation look like after the transformation?
Look at the picture below, a cutting-edge media workflow:

13492518-aafbcdde6b6fef26.png

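Content is still created by reporters and editors, but creation is now targeted or weighted according to recent hot topics and reader preferences. After creation comes intelligent proofreading and intelligent typesetting. The media industry follows a "three reviews, three proofreads" process to prevent major oversights such as wrong characters, writing errors, and spelling mistakes. These problems can be handled by fairly intelligent proofreading tools or processes and a fairly intelligent typesetting process. Only once they are resolved is reasonably standardized content placed in the content library, then disseminated and distributed to the different channels.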

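Distribution now becomes personalized recommendation ("a thousand faces for a thousand people"), pushed to the corresponding readers. This model is much like Toutiao, or a news version of Taobao: everyone sees different content, and the more each person reads, the greater the differences become. Everyone's reading habits and behavior differ; as reading time accumulates, the system identifies the topics you are interested in and recommends the information you like.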
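
The idea of learning interests from accumulated reading time can be sketched as follows. This is a deliberately simple illustration, not how Toutiao or any real recommender works: the tag-weight scoring, the function names `interest_profile` and `recommend`, and all sample data are assumptions made for this example.

```python
from collections import defaultdict

def interest_profile(reading_log):
    """Sum reading seconds per tag and normalize to weights that add up to 1."""
    seconds = defaultdict(float)
    for tags, secs in reading_log:
        for tag in tags:
            seconds[tag] += secs
    total = sum(seconds.values())
    return {tag: s / total for tag, s in seconds.items()} if total else {}

def recommend(profile, candidates, k=2):
    """Rank candidate articles by the summed weight of their tags."""
    def score(article):
        return sum(profile.get(tag, 0.0) for tag in article["tags"])
    return sorted(candidates, key=score, reverse=True)[:k]

# Hypothetical reading history: (tags of the article, seconds spent reading).
reading_log = [(["sports"], 300), (["finance"], 60), (["sports", "local"], 240)]

# Hypothetical candidate articles.
candidates = [
    {"id": "a1", "tags": ["finance"]},
    {"id": "a2", "tags": ["sports"]},
    {"id": "a3", "tags": ["local", "sports"]},
]

profile = interest_profile(reading_log)
top = recommend(profile, candidates)
print([article["id"] for article in top])
```

Two readers with different reading logs get different profiles and therefore different rankings of the same candidates, which is the "everyone sees different content" effect described above.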

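Once information is pushed to readers, all kinds of data are generated. First, propagation data: a video or audio clip, for example, spreads to different channels (Toutiao, NetEase News, websites, and apps), and this propagation needs to be recorded. Second, reading data: who read which article at what time also needs to be recorded. Third, behavior data: actions taken while reading or watching. The most common are comments, likes, and bullet comments on videos, which are fairly basic. Deeper signals include shares, how many minutes were watched, and skipped lines.
These three kinds of data are aggregated into our big data pool, and then we move to the next step: generating revenue through paid content or value-added services. This step does not change. Whether or not you use big data, it works the same way and the methods are fixed. What does change is that a decision report can now be built from elements such as the creator, the reputation of the work, and the reputation of the subject matter.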
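
One way to picture the three kinds of data is as three record types collected in a single pool. The sketch below is only a possible shape for such records: the class names, field names, and sample values are all hypothetical and are not taken from any real system.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class PropagationRecord:
    content_id: str
    channel: str              # e.g. "toutiao", "netease_news", "website", "app"
    published_at: datetime

@dataclass(frozen=True)
class ReadRecord:
    content_id: str
    reader_id: str
    read_at: datetime

@dataclass(frozen=True)
class BehaviorRecord:
    content_id: str
    reader_id: str
    action: str               # e.g. "comment", "like", "bullet_comment", "share"
    at: datetime
    seconds_viewed: float = 0.0

def summarize(pool, content_id):
    """Count how many records of each kind the pool holds for one content item."""
    return Counter(type(r).__name__ for r in pool if r.content_id == content_id)

# A tiny in-memory stand-in for the "big data pool", with made-up records.
now = datetime(2019, 1, 1, 9, 0)
pool = [
    PropagationRecord("v1", "toutiao", now),
    PropagationRecord("v1", "app", now),
    ReadRecord("v1", "reader-42", now),
    BehaviorRecord("v1", "reader-42", "like", now),
    BehaviorRecord("v1", "reader-42", "share", now, seconds_viewed=95.0),
    ReadRecord("v2", "reader-7", now),
]

summary = summarize(pool, "v1")
print(dict(summary))
```

Keeping a shared `content_id` on every record is what lets the propagation, reading, and behavior data be joined back to a single piece of content later.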

In the last session we talked about the most important issue: the data used for decision-making is rough and messy, so it cannot provide strong support. This step solves that problem by producing a decision report from elements such as the creator, the reputation of the work, and the reputation of the subject matter. The report speaks fairly directly to management or decision-makers: how many pieces of content a given creator produced in the last 30 days, how each piece was read, and, broken down further, which regions or which labeled groups of readers prefer which type of content from that creator.
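
A per-creator report of this kind could be computed roughly as follows. This is a minimal sketch under stated assumptions: the row layout, the function name `creator_report`, the choice to apply the 30-day window to the publication date, and all sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta

def creator_report(reads, creator, today, days=30):
    """Summarize one creator's content published in the last `days` days.

    Each row in `reads` represents one read and carries the creator,
    the content id, the publication date, and the reader's region.
    """
    since = today - timedelta(days=days)
    per_piece = defaultdict(int)
    per_region = defaultdict(int)
    for row in reads:
        if row["creator"] == creator and since <= row["published"] <= today:
            per_piece[row["content_id"]] += 1
            per_region[row["region"]] += 1
    return {
        "pieces": len(per_piece),
        "reads_per_piece": dict(per_piece),
        "reads_by_region": dict(per_region),
    }

# Hypothetical read rows.
reads = [
    {"creator": "li", "content_id": "c1", "published": date(2019, 6, 10), "region": "Hangzhou"},
    {"creator": "li", "content_id": "c1", "published": date(2019, 6, 10), "region": "Hangzhou"},
    {"creator": "li", "content_id": "c1", "published": date(2019, 6, 10), "region": "Shanghai"},
    {"creator": "li", "content_id": "c2", "published": date(2019, 6, 20), "region": "Hangzhou"},
    {"creator": "li", "content_id": "c0", "published": date(2019, 4, 1), "region": "Hangzhou"},
    {"creator": "wang", "content_id": "c9", "published": date(2019, 6, 15), "region": "Beijing"},
]

report = creator_report(reads, "li", today=date(2019, 6, 30))
print(report)
```

The same grouping could be done by reader label instead of region; only the key used for the second counter would change.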

Work reputation concerns a specific piece of content, such as a film or television work: a data report brings together all the relevant information about that work across every dimension and presents it clearly. In the same way, each subject-matter category gets its own report across all dimensions, produced through data analysis.

Once you have the decision report, you can use it to drive content creation. For example, if readers in Hangzhou prefer a certain creator's content, the data report will surface that information. The report can also be used for a more refined performance appraisal.


Origin blog.csdn.net/weixin_33721344/article/details/90908911