Using a radix tree for IP address matching

1 Radix tree concept

A radix tree (also known as a radix trie or compact trie) is a space-optimized trie for storing associative arrays whose keys are sequences of characters (for example, strings or byte arrays). Each node represents a common prefix of one or more keys, and the edges between nodes represent extensions of that prefix. Because prefixes are shared, radix trees can be more space-efficient than conventional tries, especially for datasets in which many keys share a common prefix. Note: some sources distinguish between a radix tree and a trie. I think the two are essentially the same; the only difference is that a radix tree merges chains of single-child nodes into one edge to save space, as shown in the figure below. That is just a form of compression and does not change the essence of the problem.

[Figure: radix tree example]
  • A radix tree works as a prefix tree in which every node stores a string prefix. A lookup starts at the root and compares the remaining part of the key with the prefix on each node until it finds a matching node or runs out of branches.

  • The advantage of a radix tree is efficient storage and lookup. It is more compact than a plain trie, which makes it a good choice for memory-sensitive applications.

  • Radix trees are widely used in network security, routing, network management, dictionary management and similar fields, where they store and query information such as IP addresses and rule sets.
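The compression described above can be made concrete with a small string radix tree. The following is only a minimal sketch: the class name StringRadixTree and its methods are made up for illustration and do not come from any library. When a new key shares part of an edge label with an existing key, the edge is split at the end of the common prefix.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class: a compressed trie over strings.
final class StringRadixTree {
    private static final class Node {
        String label;       // characters on the edge leading into this node
        boolean terminal;   // true if a stored key ends exactly here
        final Map<Character, Node> children = new HashMap<>();

        Node(String label) {
            this.label = label;
        }
    }

    private final Node root = new Node("");

    void insert(String key) {
        Node node = root;
        String rest = key;
        while (true) {
            if (rest.isEmpty()) {
                node.terminal = true;
                return;
            }
            Node child = node.children.get(rest.charAt(0));
            if (child == null) {
                // No edge starts with this character: hang the whole remainder on one edge.
                Node leaf = new Node(rest);
                leaf.terminal = true;
                node.children.put(rest.charAt(0), leaf);
                return;
            }
            int common = commonPrefixLength(rest, child.label);
            if (common < child.label.length()) {
                // Split the edge: the shared prefix gets its own node.
                Node mid = new Node(child.label.substring(0, common));
                child.label = child.label.substring(common);
                mid.children.put(child.label.charAt(0), child);
                node.children.put(mid.label.charAt(0), mid);
                child = mid;
            }
            rest = rest.substring(common);
            node = child;
        }
    }

    boolean contains(String key) {
        Node node = root;
        String rest = key;
        while (!rest.isEmpty()) {
            Node child = node.children.get(rest.charAt(0));
            if (child == null || !rest.startsWith(child.label)) {
                return false;
            }
            rest = rest.substring(child.label.length());
            node = child;
        }
        return node.terminal;
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }
}
```

After inserting "romane", "romanus" and "romulus", the tree holds a single "rom" edge below the root that branches into "an" and "ulus", so the shared prefix is stored only once.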

2 Radix tree implementation of CIDR matching

2.1 Initial simple version

When building an IDS or a router, we often need to match IP addresses against CIDR ranges. A radix tree is a good fit for this. A Java implementation follows:

import java.util.*;

class RadixNode {
    int value;
    Map<Integer, RadixNode> children;

    public RadixNode() {
        children = new HashMap<>();
    }
}

class RadixTree {
    private RadixNode root;

    public RadixTree() {
        root = new RadixNode();
    }

    public void insert(String cidr) {
        // Split the CIDR into the IP part and the prefix length
        String[] parts = cidr.split("/");
        String[] ipParts = parts[0].split("\\.");

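        // Number of network bits (the prefix length)
        int prefixLength = Integer.parseInt(parts[1]);
        // Build the mask; a shift by 32 is a no-op on an int in Java, so handle /0 explicitly
        int mask = prefixLength == 0 ? 0 : 0xffffffff << (32 - prefixLength);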

        RadixNode node = root;
        for (int i = 0; i < ipParts.length; i++) {
            // Each element of ipParts holds one octet of the address
            int b = Integer.parseInt(ipParts[i]);
            int shift = 24 - 8 * i;
            // Shift each octet into its position in the 32-bit address; a better approach would be to apply the mask here as well
            int key = (b & 0xff) << shift;

            RadixNode child = node.children.get(key);
            if (child == null) {
                child = new RadixNode();
                node.children.put(key, child);
            }
            node = child;
        }
        node.value = mask;
    }

    public int match(String ip) {
        String[] ipParts = ip.split("\\.");

        RadixNode node = root;
        for (int i = 0; i < ipParts.length; i++) {
            int b = Integer.parseInt(ipParts[i]);
            int shift = 24 - 8 * i;
            int key = (b & 0xff) << shift;

            RadixNode child = node.children.get(key);
            if (child == null) {
                return 0;
            }
            node = child;
        }
        return node.value;
    }
}

This is a simple way of handling CIDR. The IPv4 address is split on the dots, each octet is stored in the tree as a key, and the leaf node holds the mask as its value. The code still has some problems:

  1. A lookup only succeeds when the queried address equals the stored network address octet for octet. The prefix length is stored as a mask but never applied during the walk, so 192.168.2.3 does not match 192.168.2.0/24.

  2. Giving every node its own HashMap for its children wastes memory.

Test code:

RadixTree tree = new RadixTree();
tree.insert("192.168.2.0/24");
tree.insert("192.168.0.0/16");

int mask = tree.match("192.168.2.3");
// Prints 0: 192.168.2.3 is not an octet-for-octet match for either entry.
System.out.println(Integer.toBinaryString(mask));

The resulting tree looks roughly like this:
[Figure]
The yellow nodes are leaf nodes, and the stored value is shown in brackets.
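The first problem listed above comes down to never applying the mask. As a point of comparison, here is a minimal sketch of a plain mask check with no tree at all. The class and method names (CidrMaskCheck, toInt, mask, contains) are made up for this illustration, and input validation is left out.

```java
// Hypothetical illustration class: checks CIDR membership with a bit mask only.
final class CidrMaskCheck {
    // Packs a dotted-decimal IPv4 address into a 32-bit int (no validation).
    static int toInt(String ip) {
        int value = 0;
        for (String part : ip.split("\\.")) {
            value = (value << 8) | (Integer.parseInt(part) & 0xff);
        }
        return value;
    }

    // Builds the network mask; a shift by 32 is a no-op on an int, so /0 is special-cased.
    static int mask(int prefixLength) {
        return prefixLength == 0 ? 0 : 0xffffffff << (32 - prefixLength);
    }

    // An address belongs to a CIDR when both agree on all masked bits.
    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        int m = mask(Integer.parseInt(parts[1]));
        return (toInt(parts[0]) & m) == (toInt(ip) & m);
    }

    public static void main(String[] args) {
        System.out.println(contains("192.168.2.0/24", "192.168.2.3")); // true
        System.out.println(contains("192.168.2.0/24", "192.168.3.3")); // false
    }
}
```

This check costs one comparison per CIDR, so it only scales to a handful of ranges. The tree versions exist to avoid scanning every range on each lookup.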

2.2 Optimized memory usage version

The version above stores child nodes in a HashMap, which takes a lot of memory when there are many IPs. A simple change is to store them in arrays instead, with the aim of reducing memory usage.

import java.util.*;

class RadixNode {
    int value;
    RadixNode[] children;

    public RadixNode() {
        // One slot per possible octet value (0-255)
        children = new RadixNode[256];
    }
}

class RadixTree {
    private RadixNode root;

    public RadixTree() {
        root = new RadixNode();
    }

    public void insert(String cidr) {
        String[] parts = cidr.split("/");
        String[] ipParts = parts[0].split("\\.");
        int prefixLength = Integer.parseInt(parts[1]);
        // A shift by 32 is a no-op on an int in Java, so handle /0 explicitly.
        int mask = prefixLength == 0 ? 0 : 0xffffffff << (32 - prefixLength);

        RadixNode node = root;
        for (int i = 0; i < ipParts.length; i++) {
            // The octet value (0-255) indexes directly into the 256-slot child array.
            int key = Integer.parseInt(ipParts[i]) & 0xff;

            RadixNode child = node.children[key];
            if (child == null) {
                child = new RadixNode();
                node.children[key] = child;
            }
            node = child;
        }
        node.value = mask;
    }

    public int match(String ip) {
        String[] ipParts = ip.split("\\.");

        RadixNode node = root;
        for (int i = 0; i < ipParts.length; i++) {
            // Same indexing as in insert: one array slot per octet value.
            int key = Integer.parseInt(ipParts[i]) & 0xff;

            RadixNode child = node.children[key];
            if (child == null) {
                return 0;
            }
            node = child;
        }
        return node.value;
    }
}

This version only replaces the HashMap with an array, but every node allocates 256 slots. When only a few CIDRs are stored, most of those slots stay empty, so it may use as much memory as the HashMap version, or more.
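To get a feeling for how sparse those arrays are, the following sketch counts the nodes that an octet-per-level tree would create for the two CIDRs used in the earlier test. It is a back-of-the-envelope estimate, not a measurement: the class name OctetTrieFootprint is made up, and it counts reference slots only, ignoring object headers and JVM specifics.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class: estimates how many 256-slot nodes get allocated.
final class OctetTrieFootprint {
    // Every distinct octet path (for example "192.168.") corresponds to one node;
    // the empty path stands for the root.
    static int countNodes(String... cidrs) {
        Set<String> paths = new HashSet<>();
        paths.add("");
        for (String cidr : cidrs) {
            String[] octets = cidr.split("/")[0].split("\\.");
            StringBuilder path = new StringBuilder();
            for (String octet : octets) {
                path.append(octet).append('.');
                paths.add(path.toString());
            }
        }
        return paths.size();
    }

    public static void main(String[] args) {
        int nodes = countNodes("192.168.2.0/24", "192.168.0.0/16");
        System.out.println("nodes: " + nodes);                        // 7
        System.out.println("child slots allocated: " + nodes * 256);  // 1792
        System.out.println("child slots in use: " + (nodes - 1));     // 6
    }
}
```

For these two entries, 7 nodes allocate 1792 child slots and only 6 of them point anywhere, which is the sparsity the paragraph above warns about.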

2.3 Further optimize the memory usage version

We can treat the IP as a sequence of bytes and branch on each bit into a left or right subtree, which further reduces the memory used per node. The code is as follows:

class RadixNode {
    int value;
    RadixNode[] children;

    public RadixNode() {
        children = new RadixNode[2];
    }
}

class RadixTree {
    private RadixNode root;

    public RadixTree() {
        root = new RadixNode();
    }

    public void insert(String cidr) {
        String[] parts = cidr.split("/");
        String[] ipParts = parts[0].split("\\.");
        int prefixLength = Integer.parseInt(parts[1]);
        // A shift by 32 is a no-op on an int in Java, so handle /0 explicitly.
        int mask = prefixLength == 0 ? 0 : 0xffffffff << (32 - prefixLength);

        RadixNode node = root;
        for (int i = 0; i < 32; i++) {
            int b = Integer.parseInt(ipParts[i / 8]);
            int key = (b >> (7 - i % 8) & 1);

            RadixNode child = node.children[key];
            if (child == null) {
                child = new RadixNode();
                node.children[key] = child;
            }
            node = child;
        }
        node.value = mask;
    }

    public int match(String ip) {
        String[] ipParts = ip.split("\\.");

        RadixNode node = root;
        for (int i = 0; i < 32; i++) {
            int b = Integer.parseInt(ipParts[i / 8]);
            int key = (b >> (7 - i % 8) & 1);

            RadixNode child = node.children[key];
            if (child == null) {
                return 0;
            }
            node = child;
        }
        return node.value;
    }
}

The core code lies in:

for (int i = 0; i < 32; i++) {
    int b = Integer.parseInt(ipParts[i / 8]);
    int key = (b >> (7 - i % 8) & 1);

    RadixNode child = node.children[key];
    if (child == null) {
        return 0;
    }
    node = child;
}

For IPv4, we can treat the address as a four-byte integer, 4*8 = 32 bits in total. Each bit is obtained in two steps. First, int b = Integer.parseInt(ipParts[i / 8]); selects the octet that contains bit i. Then int key = (b >> (7 - i % 8) & 1); shifts that octet right by (7 - i % 8) bits and ANDs the result with 1, which yields either 0 or 1 and decides which of the two subtrees to follow. Note that insert in this version still walks all 32 bits regardless of prefixLength, so a lookup only succeeds for an address identical to the stored network address. The versions in section 3 walk only prefixLength bits on insert.
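As a worked example of the bit extraction, the sketch below applies the same two expressions to every bit of 192.168.2.3 and collects the keys into a string. The class name BitWalkDemo and the method bits are made up for this illustration.

```java
// Hypothetical illustration class: lists the 32 branch decisions for one address.
final class BitWalkDemo {
    static String bits(String ip) {
        String[] ipParts = ip.split("\\.");
        StringBuilder sb = new StringBuilder(32);
        for (int i = 0; i < 32; i++) {
            int b = Integer.parseInt(ipParts[i / 8]);   // octet containing bit i
            int key = (b >> (7 - i % 8)) & 1;           // bit i, most significant first
            sb.append(key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 192 = 11000000, 168 = 10101000, 2 = 00000010, 3 = 00000011
        System.out.println(bits("192.168.2.3")); // 11000000101010000000001000000011
    }
}
```

Read from left to right, the string is the path through the tree: each 0 goes to children[0] and each 1 goes to children[1].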

3 Radix tree applications

3.1 Radix tree usage in Suricata

Suricata is an open source intrusion detection system (IDS) that uses radix trees for IP address matching in rule processing. In Suricata, radix trees are used to store IP addresses and CIDR ranges for efficient lookup during rule matching. This enables Suricata to quickly determine whether a given IP address or network traffic matches a particular rule, reducing the time it takes for a rule to match.

In Suricata, radix trees are used in conjunction with other techniques such as pattern matching and rule grouping to provide a comprehensive and efficient rule matching process. The use of radix trees enables Suricata to process large numbers of rules in real time, making it a popular choice for network security administrators who need a fast and reliable IDS.

Suricata handles IPv6 in rules similarly to how it handles IPv4, by using radix trees for efficient lookups. Radix trees in Suricata are capable of storing both IPv4 and IPv6 addresses, and the tree is constructed in the same way for both types of addresses.

A simple imitation of Suricata's radix tree approach:

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RadixTree2 {
    private RadixTreeNode root;

    public RadixTree2() {
        root = new RadixTreeNode();
    }

    public void insert(String cidr) {
        String[] parts = cidr.split("/");
        String ipAddress = parts[0];
        int prefixLength = Integer.parseInt(parts[1]);
        byte[] address = null;
        if (!ipAddress.contains(":")) {
            address = ip4AddressToBytes(ipAddress);
        }else {
            address = ip6AddressToBytes(ipAddress);
        }

        RadixTreeNode node = root;
        for (int i = 0; i < prefixLength; i++) {
            int index = (address[i / 8] >> (7 - (i % 8))) & 1;
            if (!node.children.containsKey(index)) {
                node.children.put(index, new RadixTreeNode());
            }
            node = node.children.get(index);
        }
        node.cidrs.add(cidr);
    }

    public List<String> match(String ipAddress) {
        byte[] address = null;
        if (!ipAddress.contains(":")) {
            address = ip4AddressToBytes(ipAddress);
        }else {
            address = ip6AddressToBytes(ipAddress);
        }
        // Collect the CIDRs stored on every node along the path, so that a
        // shorter prefix still matches when a longer one does not.
        List<String> matches = new ArrayList<>(root.cidrs);
        RadixTreeNode node = root;
        for (int i = 0; i < address.length * 8; i++) {
            int index = (address[i / 8] >> (7 - (i % 8))) & 1;
            if (!node.children.containsKey(index)) {
                break;
            }
            node = node.children.get(index);
            matches.addAll(node.cidrs);
        }
        return matches;
    }

    private byte[] ip4AddressToBytes(String ipAddress) {
        String[] parts = ipAddress.split("\\.");
        if (parts.length != 4) {
            // Returning an empty array here would fail later with an index error.
            throw new IllegalArgumentException("Invalid IPv4 address format");
        }
        byte[] address = new byte[4];
        for (int i = 0; i < 4; i++) {
            address[i] = (byte) Integer.parseInt(parts[i]);
        }
        return address;
    }

    private byte[] ip6AddressToBytes(String ipAddress) {
        String[] parts = ipAddress.split(":");
        if (parts.length == 8) {
            byte[] address = new byte[16];
            for (int i = 0; i < 8; i++) {
                int x = Integer.parseInt(parts[i], 16);
                address[i * 2] = (byte) (x >> 8);
                address[i * 2 + 1] = (byte) x;
            }
            return address;
        } else if (parts.length == 4) {
            byte[] address = new byte[4];
            for (int i = 0; i < 4; i++) {
                address[i] = (byte) Integer.parseInt(parts[i]);
            }
            return address;
        } else {
            throw new IllegalArgumentException("Invalid IP address format");
        }
    }

    private class RadixTreeNode {
        Map<Integer, RadixTreeNode> children;
        List<String> cidrs;

        public RadixTreeNode() {
            children = new HashMap<>();
            cidrs = new ArrayList<>();
        }
    }

    public static void main(String[] args) {
        RadixTree2 tree = new RadixTree2();

        // IPv4 CIDRs
        tree.insert("192.168.0.0/24");
        tree.insert("192.168.2.0/24");

        // IPv6 CIDRs
        tree.insert("2001:0db8:85a3:0000:0000:8a2e:0370:7334/64");
        tree.insert("2001:0db8:85a3:0000:0000:8a2e:0370:7336/64");

        // IPv4 match
        List<String> result = tree.match("192.168.2.3");
        System.out.println("IPv4 match: " + result);

        // IPv6 match
        result = tree.match("2001:0db8:85a3:0000:0000:8a2e:0370:7336");
        System.out.println("IPv6 match: " + result);
    }
}

The IPv4 and IPv6 address conversion is written by hand, and the compressed IPv6 form (::) is not supported. Each node stores its CIDRs in a List, and child nodes are kept in a HashMap.
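One way around the missing compressed form, assuming the standard java.net.InetAddress class is acceptable in your environment, is to let it parse the literal. The sketch below shows what it returns for a compressed IPv6 address and for an IPv4 address. The class name AddressBytesDemo and its helpers are made up for this illustration.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration class: converts address literals to raw bytes.
final class AddressBytesDemo {
    // Returns 4 bytes for an IPv4 literal and 16 bytes for an IPv6 literal.
    // Note: getByName would attempt a DNS lookup if given a host name instead of a literal.
    static byte[] toBytes(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP address: " + literal, e);
        }
    }

    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(hex(toBytes("2001:db8::1")));  // 20010db8000000000000000000000001
        System.out.println(hex(toBytes("192.168.2.3")));  // c0a80203
    }
}
```

The resulting byte array can be walked bit by bit exactly like the hand-parsed one.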

3.2 Code improved version

Here the library class InetAddress converts IPv4 and IPv6 addresses into byte arrays, and the HashMap is replaced with an array:

import java.net.InetAddress;

public class RadixTree3 {
    private Node root;

    public RadixTree3() {
        root = new Node();
    }

    public void insert(String cidr) throws Exception {
        String[] parts = cidr.split("/");
        String ipAddress = parts[0];
        int prefixLength = Integer.parseInt(parts[1]);
        byte[] ipBytes = InetAddress.getByName(ipAddress).getAddress();

        Node node = root;
        for (int i = 0; i < prefixLength; i++) {
            int bit = (ipBytes[i / 8] >> (7 - i % 8)) & 1;
            if (node.children[bit] == null) {
                node.children[bit] = new Node();
            }
            node = node.children[bit];
        }
        node.isLeaf = true;
        node.prefixLength = prefixLength;
    }

    public boolean search(String ipAddress) throws Exception {
        byte[] ipBytes = InetAddress.getByName(ipAddress).getAddress();
        Node node = root;
        // The address matches as soon as the walk reaches a node that ends a
        // CIDR, even if a longer prefix further down does not match.
        if (node.isLeaf) {
            return true;
        }
        for (int i = 0; i < (ipBytes.length * 8); i++) {
            int bit = (ipBytes[i / 8] >> (7 - i % 8)) & 1;
            if (node.children[bit] == null) {
                return false;
            }
            node = node.children[bit];
            if (node.isLeaf) {
                return true;
            }
        }
        return false;
    }


    private static class Node {
        private Node[] children;
        private boolean isLeaf;
        private int prefixLength;

        public Node() {
            children = new Node[2];
            isLeaf = false;
        }
    }

    public static void main(String[] args) throws Exception {
        RadixTree3 tree = new RadixTree3();
        tree.insert("192.168.0.0/16");
        tree.insert("192.168.2.0/24");

        System.out.println(tree.search("192.168.2.3")); // true
        System.out.println(tree.search("192.168.3.3")); // true (inside 192.168.0.0/16)
        System.out.println(tree.search("10.0.0.1"));    // false

        tree.insert("2001:0db8:85a3:0000:0000:8a2e:0370:7334/64");
        System.out.println(tree.search("2001:0db8:85a3:0000:0000:8a2e:0370:7334"));  // true
    }
}

The code is not complicated. A two-element array holds the 0 and 1 branches for each bit. No data is stored in the tree; the code only decides whether an address matches. If the lookup reaches a node marked isLeaf, the address matches; otherwise it does not. In production applications it is easy to turn this into a radix tree that carries data, and both storage and query performance are quite good. Note that IPv4 and IPv6 prefixes share a single root here, so a production version would likely keep one tree per address family.

The overall principle is to convert the IP into a byte array and branch on each bit: a 0 bit selects index 0 of the array, which can be seen as the left subtree, and a 1 bit selects index 1, the right subtree, as shown in the figure below:
[Figure]
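The remark above about turning this into a tree that carries data could be sketched as follows. This is one possible design, not Suricata's: the class name PrefixTable and its methods put and lookup are made up for illustration, it keeps a separate root for IPv4 and IPv6, and a lookup returns the value of the longest matching prefix.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical illustration class: binary radix tree that maps CIDRs to values.
final class PrefixTable<V> {
    private static final class Node<V> {
        @SuppressWarnings("unchecked")
        final Node<V>[] children = new Node[2];
        V value;
        boolean hasValue;
    }

    private final Node<V> v4Root = new Node<>();
    private final Node<V> v6Root = new Node<>();

    void put(String cidr, V value) {
        String[] parts = cidr.split("/");
        byte[] addr = parse(parts[0]);
        int prefixLength = Integer.parseInt(parts[1]);
        if (prefixLength < 0 || prefixLength > addr.length * 8) {
            throw new IllegalArgumentException("Invalid prefix length: " + cidr);
        }
        Node<V> node = rootFor(addr);
        for (int i = 0; i < prefixLength; i++) {
            int bit = (addr[i / 8] >> (7 - i % 8)) & 1;
            if (node.children[bit] == null) {
                node.children[bit] = new Node<>();
            }
            node = node.children[bit];
        }
        node.value = value;
        node.hasValue = true;
    }

    // Returns the value of the longest stored prefix containing ip, or null if none does.
    V lookup(String ip) {
        byte[] addr = parse(ip);
        Node<V> node = rootFor(addr);
        V best = node.hasValue ? node.value : null;
        for (int i = 0; i < addr.length * 8; i++) {
            int bit = (addr[i / 8] >> (7 - i % 8)) & 1;
            node = node.children[bit];
            if (node == null) {
                break;
            }
            if (node.hasValue) {
                best = node.value;   // a deeper node means a longer prefix
            }
        }
        return best;
    }

    private Node<V> rootFor(byte[] addr) {
        return addr.length == 4 ? v4Root : v6Root;
    }

    private static byte[] parse(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid IP address: " + literal, e);
        }
    }

    public static void main(String[] args) {
        PrefixTable<String> table = new PrefixTable<>();
        table.put("192.168.0.0/16", "lan");
        table.put("192.168.2.0/24", "dmz");
        System.out.println(table.lookup("192.168.2.3")); // dmz
        System.out.println(table.lookup("192.168.3.3")); // lan
        System.out.println(table.lookup("10.0.0.1"));    // null
    }
}
```

Remembering the last value seen during the walk is what gives longest-prefix semantics: 192.168.2.3 passes the /16 node first and then overwrites the result at the /24 node.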

4 Summary

For general string lookups, data structures such as red-black trees and hash tables meet most needs. A radix tree can save memory by sharing prefixes. It also performs better for prefix searches, because when the search fails at some node you can backtrack to a parent node and continue from there.

Origin blog.csdn.net/mseaspring/article/details/129002215