Data visualization and Protovis with ElasticSearch

The most important purpose search engine, ah, not surprisingly, is the search. It you pass a request, then it returns the result of the correlation in accordance with your string matching. We can create all kinds of request structures according to their content, experiment with different analyzers, the search engine will be trying to provide the best results.

However, a modern full-text search engine can do more than this. Because its core is based on a query matching documents for efficient and highly optimized data structure - inverted index. It can also perform complex aggregation operation for our data, we call here facets. (Bad translation, later this word are kept in English)

facets usual purpose is to provide a user to navigate or search terms. When you search online store "Camera", you can choose a different manufacturer, price range, or specific functions to customize conditions, which should look at what is the point of the link, rather than by modifying the long list of query syntax.
One of the few Facet search request can be strong to the end user the ability to open the way to see Moritz Stefaner test "Elastic Lists", perhaps you have more inspiration.

However, in addition to the link and check boxes, in fact, we can do more. For example, we use the data to draw, and that's what we talk about in this article.

Real-time dashboard
in almost all of the analysis, monitoring and data mining services, sooner or later you will come across such a demand: "We want a dashboard!." Because we all love the instrument panel, probably because is really useful, because it may simply beautiful - this time, we do not write any OLAP implementation, with facets can be done is to force a very nice analysis engine.

The screenshot below is taken from a social media monitoring applications. This application not only to search and data mining ES, data aggregation is also provided via an interactive dashboard.
When the user data in depth, add a keyword, use a custom query, all figures will be updated in real time, this is the facet of the polymerization work. Dashboard is not good data on a regular basis to calculate the static snapshot, but a truly interactive tool for data exploration.

In this article, we will learn how to get the data from the ES, and then how to create these charts.

Pie relationship polymerization (terms facet) in
the first graph, we do use relatively simple in ES termsfacet. This facet will return a field of the most common vocabulary and its count value.

First, let's insert some data.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
curl -X DELETE "http://localhost:9200/dashboard"
curl -X POST "http://localhost:9200/dashboard/article" -d '
             { "title" : "One",
               "tags"  : ["ruby", "java", "search"]}
'
curl -X POST "http://localhost:9200/dashboard/article" -d '
             { "title" : "Two",
               "tags"  : ["java", "search"] }
'
curl -X POST "http://localhost:9200/dashboard/article" -d '
             { "title" : "Three",
               "tags"  : ["erlang", "search"] }
'
curl -X POST "http://localhost:9200/dashboard/article" -d '
             { "title" : "Four",
               "tags"  : ["search"] }
'
curl -X POST "http://localhost:9200/dashboard/_refresh"

You have seen, we store the tag number of articles, each article can more labels, data is sent in JSON format, which is the ES document format.

Now that you know the top ten label document, we only need a simple request:

1
2
3
4
5
6
7
8
curl -X POST "http://localhost:9200/dashboard/_search?pretty=true" -d '
{
    "query" : { "match_all" : {} },
    "facets" : {
        "tags" : { "terms" : {"field" : "tags", "size" : 10} }
    }
}
'

You see, I accept all the documents, and then define a terms facet is called "tags". This request returns data look like the following:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
{
    "took" : 2,
    // ... snip ...
    "hits" : {
        "total" : 4,
        // ... snip ...
    },
    "facets" : {
        "tags" : {
            "_type" : "terms",
            "missing" : 1,
            "terms" : [
                { "term" : "search", "count" : 4 },
                { "term" : "java",   "count" : 2 },
                { "term" : "ruby",   "count" : 1 },
                { "term" : "erlang", "count" : 1 }
            ]
        }
    }
}

JSON is part of the facets of our concern, especially facets.tags.terms array. It tells us that there are four articles hit the search tag, label two java, etc ....... (Of course, we should probably add a size parameter to the request to skip the previous results)

This ratio is the most suitable type of data visualization scheme is a pie chart, or a variant thereof: donuts pie.
We will use data visualization tools Protovis a JavaScript. Protovis is 100% open source, you can imagine it is RoR data visualization aspects. And other similar tools in stark contrast, it does not come with a set of icons for you to type "select." But it defines a set of primitives and a flexible DSL, so you can very easily create custom visualization. Create a pie chart is very simple.

Because ES returns JSON data, we can load it via Ajax calls. Do not forget you can clone or download the full source code examples.

First you need an HTML file to hold the icon and then load the data from the ES in:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
<!DOCTYPE html>
<html>
<head>
    <title>ElasticSearch Terms Facet Donut Chart</title>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <!-- Load JS libraries -->
    <script src="jquery-1.5.1.min.js"></script>
    <script src="protovis-r3.2.js"></script>
    <script src="donut.js"></script>
    <script>
        $( function() { load_data(); });
        var load_data = function() {
            $.ajax({   url: 'http://localhost:9200/dashboard/article/_search?pretty=true'
                     , type: 'POST'
                     , data : JSON.stringify({
                           "query" : { "match_all" : {} },
                           "facets" : {
                               "tags" : {
                                   "terms" : {
                                       "field" : "tags",
                                       "size"  : "10"
                                   }
                               }
                           }
                       })
                     , dataType : 'json'
                     , processData: false
                     , success: function(json, statusText, xhr) {
                           return display_chart(json);
                       }
                     , error: function(xhr, message, error) {
                           console.error("Error while loading data from ElasticSearch", message);
                           throw(error);
                       }
            });
            var display_chart = function(json) {
                Donut().data(json.facets.tags.terms).draw();
            };
        };
    </script>
</head>
<body>
  <!-- Placeholder for the chart -->
  <div id="chart"></div>
</body>
</html>

文档加载后,我们通过Ajax收到和之前curl测试中一样的facet。在jQuery的Ajaxcallback里我们通过封装的display_chart()把返回的JSON传给Donut()函数.

Donut()函数及注释如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
var Donut = function(dom_id) {
    if ('undefined' == typeof dom_id)  {                // Set the default DOM element ID to bind
        dom_id = 'chart';
    }
    var data = function(json) {                         // Set the data for the chart
        this.data = json;
        return this;
    };
    var draw = function() {
        var entries = this.data.sort( function(a, b) {  // Sort the data by term names, so the
            return a.term < b.term ? -1 : 1;            // color scheme for wedges is preserved
        }),                                             // with any order
        values  = pv.map(entries, function(e) {         // Create an array holding just the counts
            return e.count;
        });
        // console.log('Drawing', entries, values);
        var w = 200,                                    // Dimensions and color scheme for the chart
            h = 200,
            colors = pv.Colors.category10().range();
        var vis = new pv.Panel()                        // Create the basis panel
            .width(w)
            .height(h)
            .margin(0, 0, 0, 0);
        vis.add(pv.Wedge)                               // Create the "wedges" of the chart
            .def("active", -1)                          // Auxiliary variable to hold mouse over state
            .data( pv.normalize(values) )               // Pass the normalized data to Protovis
            .left(w/3)                                  // Set-up chart position and dimension
            .top(w/3)
            .outerRadius(w/3)
            .innerRadius(15)                            // Create a "donut hole" in the center
            .angle( function(d) {                       // Compute the "width" of the wedge
                return d * 2 * Math.PI;
             })
            .strokeStyle("#fff")                        // Add white stroke
            .event("mouseover", function() {            // On "mouse over", set the "wedge" as active
                this.active(this.index);
                this.cursor('pointer');
                return this.root.render();
             })
            .event("mouseout",  function() {            // On "mouse out", clear the active state
                this.active(-1);
                return this.root.render();
            })
            .event("mousedown", function(d) {           // On "mouse down", perform action,
                var term = entries[this.index].term;    // such as filtering the results...
                return (alert("Filter the results by '"+term+"'"));
            })
            .anchor("right").add(pv.Dot)                // Add the left part of he "inline" label,
                                                        // displayed inside the donut "hole"
            .visible( function() {                      // The label is visible when its wedge is active
                return this.parent.children[0]
                       .active() == this.index;
            })
            .fillStyle("#222")
            .lineWidth(0)
            .radius(14)
            .anchor("center").add(pv.Bar)               // Add the middle part of the label
            .fillStyle("#222")
            .width(function(d) {                        // Compute width:
                return (d*100).toFixed(1)               // add pixels for percents
                              .toString().length*4 +
                       10 +                             // add pixels for glyphs (%, etc)
                       entries[this.index]              // add pixels for letters (very rough)
                           .term.length*9;
            })
            .height(28)
            .top((w/3)-14)
            .anchor("right").add(pv.Dot)                // Add the right part of the label
            .fillStyle("#222")
            .lineWidth(0)
            .radius(14)
            .parent.children[2].anchor("left")          // Add the text to label
                   .add(pv.Label)
            .left((w/3)-7)
            .text(function(d) {                         // Combine the text for label
                return (d*100).toFixed(1) + "%" +
                       ' ' + entries[this.index].term +
                       ' (' + values[this.index] + ')';
            })
            .textStyle("#fff")
            .root.canvas(dom_id)                        // Bind the chart to DOM element
            .render();                                  // And render it.
    };
    return {                                            // Create the public API
        data   : data,
        draw   : draw
    };
};

现在你们看到了,一个简单的JSON数据转换,我们就可以创建出丰富的有吸引力的关于我们文章标签分布的可视化图标。完整的例子在这里。

当你使用完全不同的请求,比如显示某个特定作者的文章,或者特定日期内发表的文章,整个可视化都照样正常工作,代码是可以重用的。

日期直方图(date histogram facets)时间线
Protovis让创建另一种常见的可视化类型也非常容易:时间线。任何类型的数据,只要和特定日期相关的,比如文章发表,事件发生,目标达成,都可以被可视化成时间线。
好了,让我们往索引里存一些带有发表日期的文章吧:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
curl -X DELETE "http://localhost:9200/dashboard"
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "1",  "published" : "2011-01-01" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "2",  "published" : "2011-01-02" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "3",  "published" : "2011-01-02" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "4",  "published" : "2011-01-03" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "5",  "published" : "2011-01-04" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "6",  "published" : "2011-01-04" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "7",  "published" : "2011-01-04" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "8",  "published" : "2011-01-04" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "9",  "published" : "2011-01-10" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "10", "published" : "2011-01-12" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "11", "published" : "2011-01-13" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "12", "published" : "2011-01-14" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "13", "published" : "2011-01-14" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "14", "published" : "2011-01-15" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "15", "published" : "2011-01-20" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "16", "published" : "2011-01-20" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "17", "published" : "2011-01-21" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "18", "published" : "2011-01-22" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "19", "published" : "2011-01-23" }'
curl -X POST "http://localhost:9200/dashboard/article" -d '{ "t" : "20", "published" : "2011-01-24" }'
curl -X POST "http://localhost:9200/dashboard/_refresh"

我们用ES的date histogram facet来获取文章发表的频率。

1
2
3
4
5
6
7
8
9
10
11
12
13
curl -X POST "http://localhost:9200/dashboard/_search?pretty=true" -d '
{
    "query" : { "match_all" : {} },
    "facets" : {
        "published_on" : {
            "date_histogram" : {
                "field"    : "published",
                "interval" : "day"
            }
        }
    }
}
'

注意我们是怎么设置间隔为天的。这个很容易就可以替换成周,月 ,或者年。

请求会返回像下面这样的JSON:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
{
    "took" : 2,
    // ... snip ...
    "hits" : {
        "total" : 4,
        // ... snip ...
    },
    "facets" : {
        "published" : {
            "_type" : "histogram",
            "entries" : [
                { "time" : 1293840000000, "count" : 1 },
                { "time" : 1293926400000, "count" : 2 }
                // ... snip ...
            ]
        }
    }
}

我们要注意的是facets.published.entries数组,和上面的例子一样。同样需要一个HTML页来容纳图标和加载数据。机制既然一样,代码就直接看这里吧。

既然已经有了JSON数据,用protovis创建时间线就很简单了,用一个自定义的area chart即可。

完整带注释的Timeline()函数如下:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
var Timeline = function(dom_id) {
    if ('undefined' == typeof dom_id) {                 // Set the default DOM element ID to bind
        dom_id = 'chart';
    }
    var data = function(json) {                         // Set the data for the chart
        this.data = json;
        return this;
    };
    var draw = function() {
        var entries = this.data;                        // Set-up the data
            entries.push({                              // Add the last "blank" entry for proper
              count : entries[entries.length-1].count   // timeline ending
            });
        // console.log('Drawing, ', entries);
        var w = 600,                                    // Set-up dimensions and scales for the chart
            h = 100,
            max = pv.max(entries, function(d) {return d.count;}),
            x = pv.Scale.linear(0, entries.length-1).range(0, w),
            y = pv.Scale.linear(0, max).range(0, h);
        var vis = new pv.Panel()                        // Create the basis panel
            .width(w)
            .height(h)
            .bottom(20)
            .left(20)
            .right(40)
            .top(40);
         vis.add(pv.Label)                              // Add the chart legend at top left
            .top(-20)
            .text(function() {
                 var first = new Date(entries[0].time);
                 var last  = new Date(entries[entries.length-2].time);
                 return "Articles published between " +
                     [ first.getDate(),
                       first.getMonth() + 1,
                       first.getFullYear()
                     ].join("/") +
                     " and " +
                     [ last.getDate(),
                       last.getMonth() + 1,
                       last.getFullYear()
                     ].join("/");
             })
            .textStyle("#B1B1B1")
         vis.add(pv.Rule)                               // Add the X-ticks
            .data(entries)
            .visible(function(d) {return d.time;})
            .left(function() { return x(this.index); })
            .bottom(-15)
            .height(15)
            .strokeStyle("#33A3E1")
            .anchor("right").add(pv.Label)              // Add the tick label (DD/MM)
            .text(function(d) {
                 var date = new Date(d.time);
                 return [
                     date.getDate(),
                     date.getMonth() + 1
                 ].join('/');
             })
            .textStyle("#2C90C8")
            .textMargin("5")
         vis.add(pv.Rule)                               // Add the Y-ticks
            .data(y.ticks(max))                         // Compute tick levels based on the "max" value
            .bottom(y)
            .strokeStyle("#eee")
            .anchor("left").add(pv.Label)
                .text(y.tickFormat)
                .textStyle("#c0c0c0")
        vis.add(pv.Panel)                               // Add container panel for the chart
           .add(pv.Area)                                // Add the area segments for each entry
           .def("active", -1)                           // Auxiliary variable to hold mouse state
           .data(entries)                               // Pass the data to Protovis
           .bottom(0)
           .left(function(d) {return x(this.index);})   // Compute x-axis based on scale
           .height(function(d) {return y(d.count);})    // Compute y-axis based on scale
           .interpolate('cardinal')                     // Make the chart curve smooth
           .segmented(true)                             // Divide into "segments" (for interactivity)
           .fillStyle("#79D0F3")
           .event("mouseover", function() {             // On "mouse over", set segment as active
               this.active(this.index);
               return this.root.render();
           })
           .event("mouseout",  function() {             // On "mouse out", clear the active state
               this.active(-1);
               return this.root.render();
           })
           .event("mousedown", function(d) {            // On "mouse down", perform action,
               var time = entries[this.index].time;     // eg filtering the results...
               return (alert("Timestamp: '"+time+"'"));
           })
           .anchor("top").add(pv.Line)                  // Add thick stroke to the chart
           .lineWidth(3)
           .strokeStyle('#33A3E1')
           .anchor("top").add(pv.Dot)                   // Add the circle "label" displaying
                                                        // the count for this day
           .visible( function() {                       // The label is only visible when
               return this.parent.children[0]           // its segment is active
                          .active() == this.index;
            })
           .left(function(d) { return x(this.index); })
           .bottom(function(d) { return y(d.count); })
           .fillStyle("#33A3E1")
           .lineWidth(0)
           .radius(14)
           .anchor("center").add(pv.Label)             // Add text to the label
           .text(function(d) {return d.count;})
           .textStyle("#E7EFF4")
           .root.canvas(dom_id)                        // Bind the chart to DOM element
           .render();                                  // And render it.
    };
    return {                                            // Create the public API
        data   : data,
        draw   : draw
    };
};

完整示例代码在这里。不过先去下载protovis提供的关于area的原始文档,然后观察当你修改interpolate(‘cardinal’)成interpolate(‘step-after’)后发生了什么。对于多个facet,画叠加的区域图,添加交互性,然后完全定制可视化应该都不是什么问题了。

重要的是注意,这个图表完全是根据你传递给ES的请求做出的响应,使得你有可能做到简单立刻的完成某项指标的可视化需求。比如“显示这个作者在这个主题上最近三个月的出版频率”。只需要提交这样的请求就够了:

1
author:John AND topic:Search AND published:[2011-03-01 TO 2011-05-31]

总结
当你需要为复杂的自定义查询做一个丰富的交互式的数据可视化时,使用ES的facets应该是最容易的办法之一,你只需要传递ES的JSON响应给Protovis这样的工具就好了。

通过模仿本文中的方法和代码,你可以在几小时内给你的数据跑通一个示例。

REFERENECE : http://www.chepoo.com/elasticsearch-protovis.html | IT技术精华网 

 

转载于:https://www.cnblogs.com/zhangchenliang/p/4202515.html

Guess you like

Origin blog.csdn.net/weixin_33682719/article/details/93495507