Data visualization (3) Programmatic drawing based on Graphviz

Foreword

I have mentioned force-directed graphs in several earlier articles, such as "A new generation of network traffic monitoring with Ntopng: visualization and architecture analysis" and "Data visualization (1): OmniGraffle, a drawing guide for a thinking tool | 201601". At the end of that article I also complained about the pitfalls of OmniGraffle's automatic layout button. In this article I try to fill that hole.

The basis for OmniGraffle's automatic layout is the Graphviz engine. Graphviz (Graph Visualization Software) is an open source toolkit started by AT&T Labs that draws graphs from DOT scripts; the file extension is usually .gv or .dot. DOT is a plain-text graph description language, and the Graphviz command-line tools convert a DOT description into a variety of output formats, including PostScript, PDF, SVG, PNG, and annotated text. DOT itself is very primitive and offers a very simple way to describe graphs, which also means it can be used from a command-line terminal or called from other programming languages (Graphviz is also available as a library). This is critical: developers of Graphviz-based applications do not have to master complex layout algorithms, but can focus on the business logic and hand the final graph object over to the drawing engine.

Interestingly, both Graphviz (Mac version) and OmniGraffle have won Apple Design Awards.

Before diving into Graphviz and its related derivative applications, it is necessary to understand some basic theory - Graph Theory.

1. Background knowledge: Graph theory

  • The Seven Bridges of Konigsberg

The city of Konigsberg in East Prussia (today's Kaliningrad, Russia) lies on both banks of the Pregolya River, with two small islands in the middle of the river. Seven bridges connect the islands and the two banks. If every bridge may be crossed only once, is there a walk that crosses all of them?

Many mathematicians tried to solve problems of this kind, which later developed into graph theory. The first important document in the history of graph theory is Leonhard Euler's paper "The Seven Bridges of Konigsberg", published in 1736 at the St. Petersburg Academy of Sciences. The paper proves that no route satisfying the conditions exists for the seven bridges, and at the same time poses and solves the general one-stroke drawing problem. The bridge-crossing problem can be abstracted as a combination of points and lines on a plane: each bridge is a line, and each land area connected by bridges is a point. A point where an odd number of lines meet is called an odd vertex, and a point where an even number of lines meet is called an even vertex. The rule for deciding whether a connected river-and-bridge graph can be traversed in one go is: if there are more than two odd vertices, no such route exists; and a graph with n odd vertices needs at least n/2 strokes to draw. In Konigsberg all four land areas are odd vertices (one touches five bridges and the other three touch three each), so no route exists.
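The odd-vertex rule is easy to check mechanically. The following is a minimal Python sketch, not taken from Euler's paper: the land-area names (`north`, `south`, `kneiphof`, `lomse`) and the function names are assumptions made for illustration, and the rule is only valid when the graph is connected.

```python
from collections import Counter

# The seven bridges as an edge list (a multigraph: repeated pairs are parallel bridges).
# The land-area names are illustrative labels chosen for this sketch.
BRIDGES = [
    ("north", "kneiphof"), ("north", "kneiphof"),
    ("south", "kneiphof"), ("south", "kneiphof"),
    ("north", "lomse"), ("south", "lomse"),
    ("kneiphof", "lomse"),
]


def odd_vertices(edges):
    """Return the sorted list of vertices that touch an odd number of edges."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_single_stroke_route(edges):
    """Euler's rule for a connected graph: a route exists iff there are 0 or 2 odd vertices."""
    return len(odd_vertices(edges)) in (0, 2)


def min_strokes(edges):
    """Minimum strokes for a connected graph with n odd vertices: max(1, n / 2)."""
    return max(1, len(odd_vertices(edges)) // 2)


if __name__ == "__main__":
    print(odd_vertices(BRIDGES))
    print(has_single_stroke_route(BRIDGES))
    print(min_strokes(BRIDGES))
```

For the seven bridges all four vertices are odd, so the check fails and at least two strokes are needed.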

1. Classic application scenarios

  • Path problem (Königsberg seven bridge problem), minimum spanning tree problem, Steiner tree
  • Network flow and matching problems: maximum flow problem, minimum cut problem, maximum flow minimum cut theorem, minimum cost maximum flow problem, maximum matching on bipartite graphs and arbitrary graphs, maximum weight matching for weighted bipartite graphs
  • Covering problems: maximum clique, maximum independent set, minimum covering set, minimum dominating set

2. Classic algorithms

  • Dijkstra's Algorithm (DA)
  • Kruskal's Algorithm (KA)
  • Prim's Algorithm (PA)
  • Topological Sort Algorithm (TSA)
  • Critical Path Algorithm (CPA)
  • Breadth First Search (BFS)
  • Depth First Search Algorithm (DFS)
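As one concrete instance from this list, here is a minimal sketch of topological sorting using Kahn's algorithm, which is one common way to implement it and is the kind of ordering that dependency graphs rely on. The function name `topological_sort` and the edge-list input format are assumptions made for this example.

```python
from collections import deque


def topological_sort(edges):
    """Kahn's algorithm: order vertices so every edge (u, v) has u before v.

    Raises ValueError if the directed graph contains a cycle.
    """
    successors = {}
    indegree = {}
    for u, v in edges:
        successors.setdefault(u, []).append(v)
        successors.setdefault(v, [])
        indegree[v] = indegree.get(v, 0) + 1
        indegree.setdefault(u, 0)

    # Start from vertices with no incoming edges; sorted() keeps the result deterministic.
    ready = deque(sorted(v for v, d in indegree.items() if d == 0))
    order = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                ready.append(v)

    # Any vertex left unprocessed sits on a cycle.
    if len(order) != len(indegree):
        raise ValueError("graph has a cycle; no topological order exists")
    return order
```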

2. A Concise Guide to Graphviz

1. Graphviz layout

In general, Graphviz supports two types of graphs: undirected graphs (graph, with "--" between nodes) and directed graphs (digraph, with "->" between nodes). Both nodes and edges have their own attributes, such as shape, color, fill mode, font, and style. The main layout engines and companion tools are as follows:

  • dot: the default layout, mainly used for directed graphs;
  • neato: based on the spring model, also known as force-based or energy-minimized layout;
  • twopi: radial layout;
  • circo: circular layout;
  • fdp: force-directed layout for undirected graphs;
  • dotty: a GUI program for viewing and editing graphs;
  • lefty: a programmable widget that displays DOT graphs and lets the user operate on the graph with the mouse.
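To make the graph/digraph distinction concrete, here is a small Python sketch that emits DOT text for either kind and builds the matching command line. The helper names `quote`, `to_dot` and `layout_command` are hypothetical; only the `graph`/`digraph` keywords, the `--`/`->` edge operators and the `-K`, `-T` and `-o` options of the `dot` command come from Graphviz itself.

```python
# Layout engines from the list above (dotty and lefty are viewers, not engines).
ENGINES = ("dot", "neato", "twopi", "circo", "fdp")


def quote(name):
    """Wrap a node name in double quotes, escaping embedded double quotes."""
    return '"' + str(name).replace('"', '\\"') + '"'


def to_dot(edges, directed=True, name="G"):
    """Render an edge list as DOT text: 'digraph' with '->' or 'graph' with '--'."""
    keyword = "digraph" if directed else "graph"
    arrow = "->" if directed else "--"
    lines = [f"{keyword} {name} {{"]
    for src, dst in edges:
        lines.append(f"  {quote(src)} {arrow} {quote(dst)};")
    lines.append("}")
    return "\n".join(lines)


def layout_command(engine, source, target, fmt="png"):
    """Build the argument list for rendering `source` with the chosen layout engine."""
    if engine not in ENGINES:
        raise ValueError(f"unknown layout engine: {engine}")
    return ["dot", f"-K{engine}", f"-T{fmt}", source, "-o", target]


if __name__ == "__main__":
    print(to_dot([("a", "b"), ("b", "c")], directed=False))
    print(" ".join(layout_command("neato", "demo.dot", "demo.png")))
```

If Graphviz is installed, the list returned by `layout_command` can be passed to `subprocess.run`; the sketch itself only builds text and does not call Graphviz.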

2. Hello World!

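$ brew install graphviz
$ dot -Tpng demo.dot -o demo.png
// demo.dot
digraph demo {
  label="Catecholamine anabolic pathway";

  // Main synthesis chain
  Tyrosine -> "L-DOPA" -> Dopamine -> Norepinephrine -> Epinephrine;

  // Where each product is synthesized
  Hypothalamus -> Dopamine;
  "Sympathetic neuron" -> Norepinephrine;
  "Adrenal medulla" -> {Norepinephrine Epinephrine};

  Tyrosine [color=green];
  Dopamine [color=red];
  Epinephrine [color=red];

  Hypothalamus [shape=box];
  "Sympathetic neuron" [shape=box];
  "Adrenal medulla" [shape=box];
}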

Catecholamine anabolic pathways - dot layout

3. twopi radial layout

## The default layout is dot; -K selects another layout engine
$ dot -Ktwopi -Tpng demo.dot -o demo.png

Catecholamine anabolic pathways - twopi radial layout

3. Application scenarios

1. The field of software engineering

In software engineering, typical uses are the analysis of data structures in complex systems and the management of software package dependencies. For example, the internal structure of the Linux kernel is very complex; conceptually it consists of five main subsystems: the process scheduler, memory management, the virtual file system, the network interface and inter-process communication. These modules interact with each other through function calls and shared data structures. When upgrading kernel versions or applications, it is very important to figure out the dependencies between modules.

The lsmod command displays the status of the modules currently loaded into the kernel; the "Used by" column lists the modules that depend on each module. After obtaining the dependency information from lsmod, a little processing converts it into a graph, and the whole graph generation process can be automated with a program.

$ lsmod
Module          Used by
vboxdrv         vboxnetadp,vboxnetflt,vboxpci
nf_reject_ipv4  ipt_REJECT
ebtables        ebtable_filter
ip6_tables      ip6table_filter
ip6_udp_tunnel  vxlan
udp_tunnel      vxlan
xor             btrfs
raid6_pq        btrfs
nf_nat_masquerade_ipv4       ipt_MASQUERADE
xfrm_algo        xfrm_user
nf_defrag_ipv4        nf_conntrack_ipv4

......

digraph kernel{
        vboxdrv->{vboxnetadp vboxnetflt vboxpci};
        nf_reject_ipv4->ipt_REJECT;
        ebtables->ebtable_filter;
        ip6_tables->ip6table_filter;
        ip6_udp_tunnel->vxlan;
        udp_tunnel->vxlan;
        xor->btrfs;
        raid6_pq->btrfs;
        nf_nat_masquerade_ipv4->ipt_MASQUERADE;
        xfrm_algo->xfrm_user;
        nf_defrag_ipv4->nf_conntrack_ipv4;

        ......
}

Package Dependency Case - Linux Kernel 1

Package Dependency Case - Linux Kernel 2

Package Dependency Case - Linux Kernel 3
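The "simple processing" step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `lsmod_to_dot` is hypothetical, and the input is assumed to be the trimmed two-column form shown above (module name, then a comma-separated "Used by" list), not the full lsmod output with its size and use-count columns.

```python
def lsmod_to_dot(text, name="kernel"):
    """Convert trimmed two-column lsmod text (module, comma-separated users) to DOT."""
    lines = [f"digraph {name} {{"]
    for row in text.splitlines():
        fields = row.split()
        # Keep only "module users" rows; this skips blanks, the header and "......".
        if len(fields) != 2 or fields[0] == "Module":
            continue
        module, users = fields
        for user in users.split(","):
            if user:
                # Edge direction follows the article: module -> module that uses it.
                lines.append(f'  "{module}" -> "{user}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = """Module          Used by
vboxdrv         vboxnetadp,vboxnetflt,vboxpci
xor             btrfs
raid6_pq        btrfs
"""
    print(lsmod_to_dot(sample))
```

The resulting text can be saved as a .dot file and rendered with the dot command shown earlier.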

PlantUML, an open source project based on Graphviz, supports the rapid drawing of various UML graphics: sequence diagrams, use case diagrams, class diagrams, activity diagrams, component diagrams, state diagrams, object diagrams, etc.

@startuml
scale 600 width

[*] -> State1
State1 --> State2 : Succeeded
State1 --> [*] : Aborted
State2 --> State3 : Succeeded
State2 --> [*] : Aborted
state State3 {
  state "Accumulate Enough Data\nLong State Name" as long1
  long1 : Just a test
  [*] --> long1
  long1 --> long1 : New Data
  long1 --> ProcessData : Enough Data
}
State3 --> State3 : Failed
State3 --> [*] : Succeeded / Save Result
State3 --> [*] : Aborted

@enduml

2. The field of communication engineering

  • nwdiag is a Python-based tool that generates network diagrams from DOT-like scripts
  • Tracing network routes with GIS information

Network topology case

pip install nwdiag
nwdiag simple.diag
nwdiag -Tsvg simple.diag
nwdiag {
  network dmz {
      address = "210.x.x.x/24"

      web01 [address = "210.x.x.1"];
      web02 [address = "210.x.x.2"];
  }
  network internal {
      address = "172.x.x.x/24";

      web01 [address = "172.x.x.1"];
      web02 [address = "172.x.x.2"];
      db01;
      db02;
  }
}

Traceroute case

[root@li1437-101 ~]# traceroute www.google.com
traceroute to www.google.com (216.58.216.36), 30 hops max, 60 byte packets
 1  23.92.24.2 (23.92.24.2)  0.704 ms  0.736 ms 23.92.24.3 (23.92.24.3)  0.575 ms
 2  173.230.159.16 (173.230.159.16)  0.910 ms 173.230.159.14 (173.230.159.14)  2.265 ms
 		173.230.159.0 (173.230.159.0)  0.731 ms
 3  as15169.sfmix.org (206.197.187.50)  4.039 ms eqixsj-google-gige.google.com (206.223.116.21)  0.718 ms
 		as15169.sfmix.org (206.197.187.50)  3.944 ms
 4  108.170.242.227 (108.170.242.227)  4.902 ms
 		108.170.242.226 (108.170.242.226)  3.003 ms
 		108.170.243.2 (108.170.243.2)  3.064 ms
 5  216.239.47.37 (216.239.47.37)  4.836 ms 64.233.174.91 (64.233.174.91)  1.476 ms  1.447 ms
 6  216.239.54.22 (216.239.54.22)  12.464 ms  29.292 ms 64.233.174.204 (64.233.174.204)  9.032 ms
 7  209.85.245.172 (209.85.245.172)  10.633 ms
    108.170.230.130 (108.170.230.130)  20.010 ms
 		108.170.230.124 (108.170.230.124)  8.988 ms
10  lax02s22-in-f4.1e100.net (216.58.216.36)  10.358 ms  10.383 ms  10.301 ms
digraph {
    label="Google Trace Sample";
    "23.92.24.2" [label="23.92.24.2 \n Fremont,California \n location:37.5670,-121.9829"] ;
    as15169 [label="as15169.sfmix.org \n San Francisco \n Metropolitan Internet Exchange"];
    "108.170.242.227" [label="108.170.242.227 \n California \n location:37.4192,-122.0574"];
    lax02s22 [label="lax02s22-in-f4.1e100.net \n Los_Angeles,California \n location:46.07305,-100.546"];
    "23.92.24.2" -> as15169 -> "108.170.242.227"  -> lax02s22;
}
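The step from traceroute output to a DOT chain can also be scripted. The sketch below is a hedged illustration, not the author's tooling: the function name `traceroute_to_dot` and the regular expression are assumptions, it keeps only the first responder of each numbered hop, and it does not add the location text, which has to come from a separate GeoIP lookup.

```python
import re

# A numbered hop line: hop number, host name, then the IPv4 address in parentheses.
HOP = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+(?:\.\d+){3})\)")


def traceroute_to_dot(text, title="trace"):
    """Chain the first responder of every numbered hop into a DOT digraph."""
    hops = []
    for line in text.splitlines():
        match = HOP.match(line)
        if match:  # continuation lines without a hop number are ignored
            hops.append((match.group(2), match.group(3)))

    lines = ["digraph {", f'  label="{title}";']
    for host, ip in hops:
        # Show the host name above the address when the two differ.
        label = ip if host == ip else f"{host}\\n{ip}"
        lines.append(f'  "{ip}" [label="{label}"];')
    if len(hops) > 1:
        lines.append("  " + " -> ".join(f'"{ip}"' for _, ip in hops) + ";")
    lines.append("}")
    return "\n".join(lines)
```

Taking only the first responder per hop is a simplification; a hop that answers from several routers could instead be drawn as several parallel nodes.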

3. The field of social engineering

  • Decision tree: the chain of contempt among social groups
  • Analysis of complex character relationship chain ("Dream of Red Mansions", "Game of Thrones")

Contempt chain case: the real estate market in the marriage market (dot layout)

Contempt chain case: the real estate market in the marriage market (circo circular layout)

Note: to use the grouping feature, the subgraph name must start with "cluster"; otherwise it is not recognized as a group and no enclosing box is drawn.

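digraph family {
  label ="A Dream of Red Mansions character relationships: main characters";

  subgraph cluster_imperial{
      label ="Imperial family";
      bgcolor="mintcream";
      node [ color="lightyellow", style="filled"];

      BeijingWang [label = "Prince of Beijing",shape="Mrecord"];
      YizhongshunWang [label = "Prince Yizhongshun",shape="Mrecord"];

      JiaYuanchun [label = "Jia Yuanchun (eldest daughter)\n Fengzao Palace Shangshu · Consort Xiande",shape="Mrecord"];
  }

  subgraph cluster_ningguo{
      label ="Duke of Ningguo (West Mansion)";
      bgcolor="mintcream";
      node [ color="green", style="filled"];

      JiaYan [label = "Jia Yan \n Duke of Ningguo"];

      JiaDaihua[label = "Jia Daihua \n Title: First-Class General Shenwei \n Post (military): Military Commissioner of the Capital Garrison",shape="Mrecord"];
      JiaYan -> JiaDaihua[label = "son"];
      ......
    }
    ......
}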

"A Dream of Red Mansions" character relationship spectrum · main characters
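When the groups come from data rather than being typed by hand, the "cluster" prefix is easy to forget, so it helps to add it in one place. The following Python sketch shows one way to do that; the function names `cluster` and `grouped_digraph` and the input layout are assumptions made for this example.

```python
def cluster(name, label, nodes, indent="  "):
    """Render one group; the 'cluster_' prefix is what makes Graphviz draw the box."""
    lines = [f"{indent}subgraph cluster_{name} {{", f'{indent}  label="{label}";']
    for node in nodes:
        lines.append(f'{indent}  "{node}";')
    lines.append(f"{indent}}}")
    return lines


def grouped_digraph(groups, edges, name="G"):
    """Build a digraph from {group_name: (label, [nodes])} plus a list of edges."""
    lines = [f"digraph {name} {{"]
    for group_name, (label, nodes) in groups.items():
        lines.extend(cluster(group_name, label, nodes))
    for src, dst in edges:
        lines.append(f'  "{src}" -> "{dst}";')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    groups = {"ningguo": ("Duke of Ningguo", ["Jia Yan", "Jia Daihua"])}
    print(grouped_digraph(groups, [("Jia Yan", "Jia Daihua")], name="family"))
```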


Original article on RiboseYim's Blog: http://riboseyim.github.io/2017/09/15/Visualization-Graphviz/
