Monitoring Microservices on Kubernetes with Netsil

Kubernetes is the king of container orchestration and scheduling. It has beaten competitors such as Docker Swarm and Apache Mesos and opened up a bright future in which microservices can self-heal, scale automatically, and be federated across zones, regions, and even cloud providers. In this new era of cloud-native applications, it is increasingly important to have a simple way to see how services interact with each other. That is a different thing from searching a haystack for a needle to find the specific cause of a performance problem.

We spent some time researching Netsil and packaging its solution as a native Kubernetes Deployment. Netsil's application, Application Operations Center (AOC), helps users observe and collect analytics for microservice applications running across Kubernetes clusters. The service is agnostic to what it monitors because it observes the network to determine how an application is actually operating. Over time, and in real time, it learns and discovers the user's environment, helping users build SLA metrics, alerts, and more.

Let's get started

First you need a Kubernetes cluster. I use Stackpoint.io to create one quickly on any major provider such as AWS, GCE, or Azure. Make sure to choose a large enough configuration for the master node: this is where all the collectors send their data, which can be expensive in network, CPU, and memory. Worker nodes can be any size, as long as they meet the needs of your microservice application. In my example I use a larger instance size because I will be pushing multiple services into this environment.

In our example, the cluster is built from three n1-standard-4 instances and exposed through the HAProxy Ingress Controller, which automatically discovers and registers the AOC service when it is deployed. We were able to reach the AOC dashboard through the cluster's public VIP.

Before you start

An empty cluster that runs only the Kubernetes services is not much to look at, so first install some more services using Sock Shop, a microservices reference application developed by Weaveworks. This helps to simulate a real environment. Sock Shop uses 14 different services, a level of complexity that many enterprise applications will reach. We will add AOC to the environment afterwards.

More details are available on the Sock Shop project page. After cloning the repo, pushing it to the environment takes a single command:

kubectl apply -f deploy/kubernetes/manifests

Then check that the Pods are online:

$ kubectl get pods --namespace=default
NAME                           READY   STATUS    RESTARTS   AGE
cart-3694116665-eccpp          1/1     Running   0          55m
cart-db-2305146297-u30g8       1/1     Running   0          55m
catalogue-11453786-lkslj       1/1     Running   0          55m
catalogue-db-393939662-bn7uc   1/1     Running   0          55m
front-end-382083024001e6t      1/1     Running   0          55m
orders-3498886496-z8jun        1/1     Running   0          55m
orders-db-1775353731-u7dmf     1/1     Running   0          55m
payment-3012088042-vbfhw       1/1     Running   0          55m
queue-master-936560853-ocmxi   1/1     Running   0          55m
rabbitmq-18974476212ij04       1/1     Running   0          55m
shipping-1232389217-b278a      1/1     Running   0          55m
spc-balancer-biilo             1/1     Running   0          1h
user-3090014237196pv           1/1     Running   0          55m
user-db-1338754314-exyou       1/1     Running   0          55m
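
Reading a long Pod listing by eye gets error-prone as the number of services grows. As a rough sketch, a small shell filter can print only the Pods that still need attention. The function name not_ready_pods is made up for this example, and the filter assumes the five-column NAME READY STATUS RESTARTS AGE layout of the kubectl get pods output:

```shell
# Print the name of every Pod that is not fully up.
# Reads `kubectl get pods` output on stdin and expects the columns
# NAME READY STATUS RESTARTS AGE (an assumption about the layout).
not_ready_pods() {
  awk '$1 != "NAME" && NF >= 3 {
    split($2, ready, "/")   # READY looks like "1/1": ready/total
    if ($3 != "Running" || ready[1] != ready[2]) print $1
  }'
}
```

Piping the listing through it, as in kubectl get pods --namespace=default | not_ready_pods, should print nothing once every Pod is Running with all of its containers ready.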

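Start observing

We now have a running Kubernetes 1.4 cluster with the Sock Shop application installed, so let's find out what is going on in the environment. Would we know when sock buyers run into problems?

Before deploying AOC, run the following command on all hosts. It works around a known race condition between Flannel and kube-proxy.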

iptables -t nat -I POSTROUTING -o flannel.1 -s host-private-ip -j MASQUERADE
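
The host-private-ip part of that command is a placeholder and has to be filled in separately on every host. A minimal sketch of a helper that does the substitution and rejects anything that does not look like an IPv4 address follows. The name masquerade_rule is hypothetical, and the helper only prints the command so it can be reviewed before being run as root:

```shell
# Print the MASQUERADE rule from the article for one host's private IP.
# Only prints the command; it does not touch iptables itself.
masquerade_rule() {
  ip=$1
  # Shape check only: four dot-separated groups of 1-3 digits.
  # It does not verify that each group is in the range 0-255.
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not a dotted-quad IPv4 address: $ip" >&2
    return 1
  fi
  echo "iptables -t nat -I POSTROUTING -o flannel.1 -s $ip -j MASQUERADE"
}
```

For instance, masquerade_rule 10.136.0.5 prints the rule for a host whose private IP is 10.136.0.5 (an address invented for illustration), while passing the literal placeholder returns an error instead of a broken rule.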

Replace host-private-ip with the private IP of each host. When that is done, clone the AOC Kubernetes repo from GitHub:

git clone https://github.com/netsil/netsil-kube.git

Then push it into Kubernetes with a single command:

kubectl apply -f netsil.yml

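Make sure the Pods and the Service are online. The AOC container may take some time to come up, but the collectors start anyway and their queued data is pushed in once it is ready, because they have already begun discovering your environment.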

$ kubectl get po,svc --namespace=netsil
NAME              READY   STATUS    RESTARTS   AGE
collector-7wpaa   1/1     Running   0          1h
collector-9o6k4   1/1     Running   0          1h
collector-rzekv   1/1     Running   0          4m
netsil-vjf5f      1/1     Running   0          1h
NAME     CLUSTER-IP       EXTERNAL-IP   PORT(S)                              AGE
netsil   10.200.126.143   <nodes>       443/TCP,2001/TCP,2003/TCP,2003/UDP   1h

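The AOC topology has two main components. The first is a Pod that runs as part of a Replication Controller with a single replica; it runs the AOC dashboard and the data collection platform. The second is a DaemonSet of AOC collectors, which tells Kubernetes to run a Pod with a collector container on every node in the environment. These collectors are configured to send their information to the AOC Pod.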
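
Because the DaemonSet is meant to place one collector on every node, a quick sanity check is to compare the number of running collectors with the number of nodes. The sketch below assumes that collector Pod names start with collector-, as in the listing from this deployment, and that the input is plain kubectl get pods output. The function name running_collectors is made up for this example:

```shell
# Count collector Pods whose STATUS column is Running.
# Reads `kubectl get pods --namespace=netsil` output on stdin.
# Assumes collector Pod names begin with "collector-".
running_collectors() {
  awk '$1 ~ /^collector-/ && $3 == "Running" { n++ } END { print n + 0 }'
}
```

The result of kubectl get pods --namespace=netsil | running_collectors can then be compared with kubectl get nodes --no-headers | wc -l. In the three-node cluster used here both numbers should be 3; on clusters where some nodes are excluded from scheduling the counts may legitimately differ.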

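Generating traffic

We will use more of Sock Shop's tooling to simulate shopping on the site. This lets us see how AOC learns the traffic patterns and our overall topology.

You need to know the front-end IP address and port on which Sock Shop is listening: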

docker run weaveworksdemos/load-test -h $frontend-ip[:$port] -r 100 -c 2
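
Note that $frontend-ip[:$port] is placeholder notation rather than literal shell: the port is optional, and a shell would not expand $frontend-ip as a single variable. As a small sketch, a helper can assemble the command from a host and an optional port. The name load_test_cmd is hypothetical, and the -r 100 -c 2 values are simply copied from the command above:

```shell
# Print the load-test command for a front-end host and optional port.
# Only prints the command; it does not start Docker.
load_test_cmd() {
  host=$1
  port=${2:-}
  target=$host
  if [ -n "$port" ]; then
    target="$host:$port"
  fi
  echo "docker run weaveworksdemos/load-test -h $target -r 100 -c 2"
}
```

Calling load_test_cmd 10.0.0.7 30001 (both values invented for illustration) prints the command with -h 10.0.0.7:30001, and leaving out the second argument omits the port.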

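As load-test runs, you can start to see AOC light up as data comes in.

Because AOC is deployed as a DaemonSet, if any Pod is destroyed and rescheduled elsewhere, AOC keeps observing the topology and changes along with Kubernetes.

One thing I really like about AOC is that deployments are organized by service. I can observe the environment in real time, drill into different metrics, and build service-level alerts for the things that could affect customers. So when the environment turns red, as in the figure below, I get an alert and know that a service is in a critical state, such as the credit card and address endpoints in Sock Shop.

I can even drill into the dashboard to see which Pods and containers are under the most pressure. In this example, the container under the most network pressure is the flannel Pod. This shows us which service is busiest and can help us rethink the configuration or the way deployments are distributed across Kubernetes.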

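Summary

Netsil's AOC is a great tool that lets users observe their environment in real time and keeps up as usage patterns change. Users can dig into historical data and add alerts. The application scales automatically as more nodes are added: when a new node comes online a collector starts on it, so users get all of the node's data from the moment it comes online until it is destroyed.

If you want to use Application Operations Center in your own Kubernetes environment, just download the manifests from the netsil-kube repo cloned earlier. You can learn more about Netsil and Application Operations Center at http://netsil.com.

Original article: Microservice Monitoring in Kubernetes with Netsil (translated by 崔婧雯)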

 
