How to play with big data

[Editor's note] This article is by Andrew Oliver, founder of the big data consulting firm Mammoth Data, and introduces eight types of projects well suited to big data. It was compiled and presented by OneAPM, a domestic ITOM management platform. The text follows.

For the past 12 months, I have been in the trenches of big data. Well, actually, most of the time I sit next to people smarter than me, watch how they dig through the data, and then simplify what they do so I can report it to management.

Very few IT projects are truly unique, and those that sound special usually turn out to be much like the rest. But you are in luck today, because I have decided to share the eight major types of projects I have come across in the past 12 months.

 

1. Explore the transaction cycle

Companies that do e-commerce take it for granted that, with a few tools installed, they can track a web visitor's transaction from sale to payment. But many companies deal with datasets that go far beyond page conversion rates, and those datasets come primarily from resellers.

Each reseller provides a different dataset in a different format. Fundamentally, of course, this is a core ETL/data integration project with a BI/visualization front end. However, for many companies, truly understanding the life cycle of a deal (start, progress, and end) is harder than it seems. You need to integrate a lot of CRM data, website analytics data, and financial data before you can say with certainty, "Yes, PPC (pay-per-click) leads to a transaction, but 40% of customers don't even make it from the first transaction to payment, and then..."
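As a minimal sketch of the integration step described above, the following maps two reseller feeds in different formats onto one common schema and then counts how many deals reach each stage of the life cycle. The feed contents, field names, and the three stage names are all hypothetical, made up for illustration; real CRM, analytics, and finance feeds would be far messier.

```python
import csv
import io
import json

# Hypothetical reseller feeds: one CSV, one JSON, with different field names.
RESELLER_A_CSV = """order_id,stage,amount
A-1,lead,0
A-1,sale,120.0
A-1,payment,120.0
A-2,lead,0
A-2,sale,80.0
"""

RESELLER_B_JSON = json.dumps([
    {"id": "B-1", "status": "LEAD"},
    {"id": "B-1", "status": "SALE"},
    {"id": "B-2", "status": "LEAD"},
])

# Assumed deal life cycle, in order.
STAGES = ["lead", "sale", "payment"]


def from_reseller_a(text):
    """Normalize reseller A's CSV rows to {deal_id, stage}."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"deal_id": row["order_id"], "stage": row["stage"]}


def from_reseller_b(text):
    """Normalize reseller B's JSON records to {deal_id, stage}."""
    for rec in json.loads(text):
        yield {"deal_id": rec["id"], "stage": rec["status"].lower()}


def funnel(events):
    """Count the distinct deals that reached each stage."""
    reached = {stage: set() for stage in STAGES}
    for ev in events:
        if ev["stage"] in reached:
            reached[ev["stage"]].add(ev["deal_id"])
    return {stage: len(ids) for stage, ids in reached.items()}


events = list(from_reseller_a(RESELLER_A_CSV)) + list(from_reseller_b(RESELLER_B_JSON))
counts = funnel(events)
print(counts)
```

Once every feed is reduced to the same two fields, the drop-off between stages is a simple comparison of counts. In practice most of the effort goes into the per-reseller normalizers.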

 

2. Mining potential customers

A lot of companies want to know what you are doing so that they can sell you products based on your activity. For example, you might have an app on your phone that provides telemetry data, so the company knows where you are in the mall. With this big data, they can predict your purchase needs at any moment.

 

3. Measure marketing effectiveness

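Marketers care about results: they want to know exactly what to do and how those actions affect their KPIs. In essence, this is yet another BI project, and it often involves a great deal of change data capture (CDC) and ETL data integration work. The actual KPIs they measure vary widely, and sometimes involve databases in tools such as Kylin or Greenplum. Other cases may fall into the next category: social media.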
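To give a rough idea of what change data capture produces, here is a small sketch that compares two snapshots of a table and reports inserted, updated, and deleted rows. Production CDC tools usually read the database's transaction log instead; snapshot comparison is a simpler stand-in used here only for illustration. The campaign table and its fields are hypothetical.

```python
def diff_snapshots(old, new):
    """Return (inserted, updated, deleted) between two {key: row} snapshots."""
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {
        k: new[k]
        for k in new.keys() & old.keys()
        if new[k] != old[k]
    }
    return inserted, updated, deleted


# Hypothetical campaign table keyed by campaign id.
yesterday = {
    1: {"name": "spring-sale", "clicks": 100},
    2: {"name": "newsletter", "clicks": 40},
}
today = {
    1: {"name": "spring-sale", "clicks": 180},  # changed
    3: {"name": "retargeting", "clicks": 5},    # new
}

inserted, updated, deleted = diff_snapshots(yesterday, today)
print(inserted, updated, deleted)
```

Only the three change sets need to flow into the downstream ETL job, rather than the full table each time.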

 

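4. Measure social media buzz

The public usually talks about you (or your company) on public or semi-public social networks. There you can pick up a lot of useful information, such as how people see your brand and whether your marketing campaigns are working. If the US Geological Survey can detect earthquakes and their magnitude through Twitter, then you can use such platforms to learn how your newly launched ad campaign is doing. As more specialized social platforms appear, data collection for some vertical industries extends far beyond Twitter and Facebook.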

 

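5. Focus on log files

Whether for intrusion detection or to pass a security audit, you need to capture and collect log files and make them searchable. Splunk has undoubtedly made a fortune in this area. Of course, big data offers other, more flexible options.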
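The core idea of making collected logs searchable can be sketched with a tiny in-memory inverted index. This is only an illustration of the technique, not how Splunk or any particular product works; the `LogIndex` class, its tokenizer, and the sample log lines are all made up for the example.

```python
import re
from collections import defaultdict


class LogIndex:
    """Toy inverted index: token -> set of line numbers."""

    def __init__(self):
        self.lines = []
        self.index = defaultdict(set)

    def add(self, line):
        """Store a log line and index each lowercase token in it."""
        line_no = len(self.lines)
        self.lines.append(line)
        for token in re.findall(r"[a-z0-9_.]+", line.lower()):
            self.index[token].add(line_no)

    def search(self, *terms):
        """Return lines containing every term, in arrival order."""
        if not terms:
            return []
        hits = None
        for term in terms:
            ids = self.index.get(term.lower(), set())
            hits = ids if hits is None else hits & ids
        return [self.lines[i] for i in sorted(hits)]


logs = LogIndex()
logs.add("2024-01-01 sshd Failed password for root from 10.0.0.5")
logs.add("2024-01-01 sshd Accepted password for alice from 10.0.0.7")
logs.add("2024-01-02 nginx GET /index.html 200")

failed = logs.search("failed", "password")
print(failed)
```

A query such as `logs.search("10.0.0.5")` then finds every line mentioning that address without scanning the full log, which is what makes audits and intrusion investigations practical at volume.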

 

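6. Because you don't want to buy Teradata!

The days when Teradata ruled alone are over. Big data is moving from the edge to the core, and Apache Kylin's database is open to everyone. Thanks to Impala, HAWQ, and Greenplum, MPP distributed systems have also become more important. Tools that are expensive, single-purpose, and incompatible with other data analytics have less and less room to grow, to say nothing of private clouds that depend on a single vendor.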

 

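7. The ever-enduring ETL

ETL (Extract-Transform-Load) is probably still the most common Hadoop workload today, and I would venture that ETL is the most common non-streaming workload for Spark. Incidentally, hundreds of startups have already popped up claiming they can handle this kind of work.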

 

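8. Capture sensor data first, then figure out what to do with it

Whether it is the power grid, manufacturing, water pumps, or a car driven by a seasoned driver, everything is sending us information, and all of it needs to be captured. Some people have even figured out what to do with this data. But capturing the data in time is the most important step, because many people find that capturing data is technically not that easy.

In addition, I often urge people to think about data analysis from the very beginning of a big data project. Why? Because designing in advance and sizing the data flows is far easier than rethinking the whole layout once the data is already there. Sometimes, though, you still have to chew it over carefully and plan for the best.

Over the past year I have seen quite a few other project types, but most use cases fall into one of the eight above. Do any of you veterans have something to add?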
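The "capture first, process later" approach can be sketched as an append-only file of raw readings that is replayed once someone decides what to do with the data. This is a minimal single-machine illustration; the sensor names, fields, and file layout are hypothetical, and a real deployment would more likely write to a distributed log or message queue.

```python
import json
import os
import tempfile
import time


def capture(path, reading):
    """Append one raw reading, untouched, with its arrival time."""
    record = {"received_at": time.time(), "raw": reading}
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def replay(path):
    """Yield the raw readings back in the order they were captured."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield json.loads(line)["raw"]


path = os.path.join(tempfile.mkdtemp(), "sensors.jsonl")

# Capture everything as it arrives, whatever its shape.
capture(path, {"sensor": "pump-1", "pressure_kpa": 310})
capture(path, {"sensor": "pump-1", "pressure_kpa": 295})
capture(path, {"sensor": "meter-7", "kwh": 12.5})

# Later, once a question exists, replay the raw data to answer it.
pressures = [r["pressure_kpa"] for r in replay(path) if "pressure_kpa" in r]
print(pressures)
```

Because nothing is filtered or reshaped at capture time, new questions can be asked of old data without having to collect it again.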

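OneAPM provides end-to-end Java application performance solutions. We support all common Java frameworks and application servers, helping you quickly find system bottlenecks and pinpoint the root cause of exceptions. Deploy in minutes and try it right away; Java monitoring has never been this simple. To read more technical articles, visit the OneAPM official technical blog.

This article is reposted from the OneAPM official blog.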
