Machine Learning for Computer Systems and Networking: A Survey --- Summary Reading

Summary:

Machine learning (ML) has become the de-facto approach for various scientific domains such as computer vision and natural language processing. Despite recent breakthroughs, machine learning has only recently made its way into the fundamental challenges in computer systems and networking. This article attempts to shed light on recent literature that appeals for machine learning-based solutions to traditional problems in computer systems and networking. To this end, we first introduce a taxonomy based on a set of major research problem domains. Then, we present a comprehensive review per domain, where we compare the traditional approaches against the machine learning-based ones. Finally, we discuss the general limitations of machine learning for computer systems and networking, including lack of training data, training overhead, real-time performance, and explainability, and reveal future research directions targeting these limitations.

CCS Concepts: • General and reference → Surveys and overviews; • Computer systems organization; • Networks;

1 INTRODUCTION

Revolutionary research in machine learning (ML) has significantly disrupted the scientific community by contributing solutions to long-lived challenges. Thanks to the continuous advancements in computing resources (e.g., cloud data centers) and performance capabilities of processing units (e.g., accelerators like GPUs and TPUs), ML, particularly its rather computation-expensive subset, namely deep learning (DL), has gained traction [120, 131]. In general, ML has established dominance in vision tasks such as image classification and object recognition [86], with many more to follow [58, 156]. Other remarkable examples where ML is thriving include speech recognition [52] and machine translation [155]. ML is also prevailing in a plethora of specialized tasks where prior work has been far from yielding notable outcomes [2, 141]. For instance, it was not until recently that top professional Go players were beaten by a deep reinforcement learning (DRL) agent [141].

Considering this unprecedented growth of ML in various classification/control tasks, one begs the question: how can we apply ML to other domains that have long suffered from the sub-optimal performance that traditional solutions can offer at their best? One prominent example is the domain of computer systems and networking, where parameter tuning and performance optimization largely rely on domain expertise and highly-engineered heuristics. Essentially, it is of great interest to answer whether it is time to make machines learn to optimize their performance by themselves automatically. Putting it into perspective, there is a multitude of challenges for which ML can prove beneficial due to its innate ability to capture complex properties and extract valuable information that no human, even the domain expert, can master.

We observe two general challenges in the current research practice of applying ML in computer systems and networking. First, there is no consensus on a common guideline for using ML in computer systems and networking (e.g., when ML would be preferable over traditional approaches), with research efforts, so far, scattered in different research areas. The lack of a holistic view makes it difficult for researchers to gain insights or borrow ideas from related areas. Second, there have not been any recent efforts that showcase how to select an appropriate ML technique for each distinct problem in computer systems and networking. In particular, we observe that in some cases, only a certain ML algorithm is suitable for a given problem, while there also exist problems that can be tackled through a variety of ML techniques, and it is nontrivial to choose the best one. The above challenges constitute major obstacles for researchers to capture and evaluate recent work sufficiently when probing for a new research direction or optimizing an existing approach.

In this article, we tackle these challenges by providing a comprehensive horizontal overview to the community. We focus on the research areas of computer systems and networking, which share a similar flavor and have recently seen promising results through the use of ML. Instead of diving into one specific, vertical domain, we seek to provide a cross-cutting view of the broad landscape of the computer systems and networking field. Specifically, we make the following contributions:

— We present a taxonomy for ML for computer systems and networking, where representative works are classified according to the taxonomy.

— For each research domain covered in the taxonomy, we summarize traditional approaches and their limitations, and discuss ML-based approaches together with their pros and cons.

— We discuss the common limitations of ML when applied in computer systems and networking in general and reveal future directions to explore.

With these contributions, we expect to make the following impact: (1) introducing new researchers without domain expertise to the broad field of ML for systems and networking, and (2) bringing awareness to researchers in certain domains about the developments of applying ML to problems in neighboring domains, enabling them to share and borrow ideas from each other.

Related surveys. There has not been a survey that satisfies the objectives we aim at achieving in this article. Most of the related surveys are domain-specific, focusing on a narrow vertical overview. For example, Zhang et al. present a survey on leveraging DL in mobile and wireless networking [195]; hence, we will skip these areas in this survey. There are also surveys focusing on areas like compiler autotuning [6], edge computing [37], and the Internet of Things [63]. While targeting different levels of concerns, these surveys can facilitate domain experts to gain a deep understanding of all the technical details when applying ML to problems from the specific domain. However, they miss the opportunity to show the broad research landscape of using ML in the general computer systems and networking field. We aim at bridging such a gap in this work.

The closest work to ours is [174]. Focusing on networking, this survey provides an overview of ML applied to networking problems but ignores the computer systems part. Besides, the article was published almost four years ago. Considering that significant progress has been made in recent years, we believe it is time to revisit this topic.

2 TAXONOMY

We first categorize the existing work on ML for computer systems and networking. Figure 1 presents a taxonomy from two angles: the problem space and the solution space. The problem space covers fundamental problems that have been extensively studied in the traditional computer systems and networking research areas. The solution space is constructed based on our experience, where the most important feature dimensions of ML-based solutions are included. We highlight over 150 articles that propose ML-based solutions for the problems falling in the taxonomy. The article selection is based on a comprehensive approach to cover as broadly as possible the articles in each of the selected research domains. For each of the covered domains, we selectively pick the more notable works and provide more elaboration on their contributions.

2.1 Problem Space
In the problem space, we focus our attention on representative research problems from both the
computer systems and networking communities. These problems are selected based on the following principles: (1) The problem should be fundamental in the considered domain, not a niche area that requires heavy background knowledge to understand. (2) There should still be active research efforts made on addressing the problem. (3) There should be considerable research on applying ML to tackle the problem. For computer systems, we will look at three problems: memory/cache management, cluster resource scheduling, and query optimization in databases. For computer networking, we will focus on four problems: packet classification, network routing, congestion control, and video streaming. This categorization helps the readers (1) dive into the specific topics of interest directly, and (2) obtain an overview of the other problems in the neighboring fields that have benefited from ML as well. Here, we provide a brief introduction to these problems:

Computer systems. A computer system is broadly defined as a set of integrated computing devices that take, generate, process, and store data and information. Generally speaking, a computer system is built with at least one computing device. In the literature, both single-device computer systems and distributed systems consisting of groups of computing devices have been extensively explored. The research goals of computer systems include performance, energy efficiency, reliability, and security. In this survey, we focus on three fundamental problems in computer systems, each representing one level of the system abstractions:

— Memory/cache management is a representative decision-making problem domain at the
level of single-device operating systems. The main problems include memory prefetching
from DRAM to CPU cache and page scheduling from disk to DRAM.
— Cluster resource scheduling is a core task at the level of distributed computing infrastructure,
which concerns the allocation of cluster resources to computing jobs in a distributed setting,
meeting set goals including resource efficiency, job completion time, among others.
— Query optimization is a central problem in databases—a representative application in systems. Given a query, the problem is to find the most efficient query execution plan.
Computer networking. A computer network is an interconnection of multiple computing devices (a.k.a. hosts) where data can be sent/received among these connected devices. Apart from the hosts, computer networks involve devices that are responsible for forwarding data between hosts, which are called network devices, including routers and switches. Computer networking is a long-lasting research domain where we have seen a significant number of artifacts and control mechanisms. In particular, we will look at the following four fundamental problems in networking, ranging from the packet level to the connection level and the application level:
— Packet classification is a basic networking functionality in almost all network devices. The
problem of network packet classification is to decide the category of packets according to
some predefined criteria with high efficiency (high speed, low resource footprint).
— Network routing concerns finding the best path for delivering packets on a network, given
some performance metrics such as latency.
— Congestion control is a network mechanism in the transport layer to provide connection
oriented services based on best-effort network delivery.
— Video streaming is one of the most popular network applications, which is mostly based on the concept of adaptive bitrate (ABR) nowadays. ABR aims at choosing the most suitable bitrate for delivering video segments under dynamic network conditions.

2.2 Solution Space
Existing work of applying ML for computer systems and networking problems can also be viewed
from the angle of the solution space, namely which learning paradigm/algorithm is applied.
Learning paradigm. There are generally three types of learning paradigms, namely supervised learning, unsupervised learning, and reinforcement learning (RL), and all of them have been applied to some of the problems we cover in this survey. We refer readers not familiar with these paradigms to a general introduction in [87].
Environment. There are generally two types of environments our considered problems can be in: centralized and distributed. A centralized environment involves a single entity where decision making is based on global information, while distributed environments involve multiple, possibly coordinated, autonomous entities. While it is natural to have a distributed solution in a distributed environment, distributed learning is generally more difficult than centralized learning, mainly due to the limitations in coordination. Federated learning (FL) is a distributed learning technique, where the client workers perform the training and communicate with a central server to share the trained model instead of the raw data [7]. Multi-agent techniques [121] are other examples of distributed learning. In this survey, we categorize the solutions based on whether they are centralized or distributed, but we do not go further into the details of the learning technique (e.g., FL). For more information about FL systems, we refer the reader to recently published surveys such as [190].
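To make the FL setup concrete, the sketch below (our illustration, not taken from the survey or any particular FL framework) runs FedAvg-style rounds on a toy linear-regression task: each client performs local SGD on its private data, and the server only averages model parameters, never seeing raw data. The names `local_sgd` and `fedavg_round` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, epochs=5):
    """One client's local training: plain SGD on a linear-regression loss."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # gradient of the mean squared error
        w -= lr * grad
    return w

def fedavg_round(global_w, client_data):
    """One communication round: clients train locally, the server averages weights."""
    client_ws = [local_sgd(global_w, X, y) for X, y in client_data]
    sizes = np.array([len(y) for _, y in client_data], dtype=float)
    # Weighted average by local dataset size; raw data never leaves the clients.
    return np.average(client_ws, axis=0, weights=sizes)

rng = np.random.default_rng(0)
true_w = np.array([2.0, -1.0])
clients = []
for _ in range(3):                               # three clients holding private data
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(10):                              # ten FedAvg rounds
    w = fedavg_round(w, clients)
print("learned weights:", w)                     # should approach [2, -1]
```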

Temporality. Learning can also be divided into two fashions with respect to temporality: offline and online. Offline learning requires pre-training an ML model with existing data in advance, and the trained model is applied in decision making without being trained on further input experience. Online learning involves the continuous learning of ML models, where at inference time, the model is also updated after experiencing the given input. Depending on the scenario, a model may first be trained offline and then re-trained online.
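As a minimal illustration of this distinction (our sketch, not tied to any system surveyed here), the model below is first fit offline on logged data and then kept up to date with a single SGD step per new observation at inference time; the linear model and learning rate are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
true_w = np.array([1.0, 0.5, -2.0])

# Offline phase: pre-train on a fixed set of logged (x, y) observations.
X_hist = rng.normal(size=(200, 3))
y_hist = X_hist @ true_w + 0.05 * rng.normal(size=200)
w, *_ = np.linalg.lstsq(X_hist, y_hist, rcond=None)    # closed-form offline fit

def predict_and_update(w, x, y_true, lr=0.01):
    """Online phase: serve a prediction, then take one SGD step on the new sample."""
    y_hat = x @ w
    w = w - lr * 2 * (y_hat - y_true) * x
    return y_hat, w

for _ in range(100):                          # stream of fresh inputs at inference time
    x = rng.normal(size=3)
    y = x @ true_w + 0.05 * rng.normal()
    _, w = predict_and_update(w, x, y)
```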

2.3 Classification of Selected Works
Before we dive into each of the problem domains in the following sections, we provide a cross-cutting view for all the fields, covering the major works and showing how they can be classified with respect to the solution space described above. Such a view is provided in Table 1.

3 MEMORY/CACHE MANAGEMENT
Typical state-of-the-art computer systems utilize multi-layered memory devices and involve several complex memory management operations. Despite the technological advancements, i.e., the exponential reduction of storage cost over the decades and the inverse expansion in size, storage systems remain the untamed stallion of performance bottlenecks in every computer system. Figure 2 depicts the major storage-related problems in existing computer systems. In the grand scheme of memory operations, retrieving an entry from the CPU cache is a matter of nanoseconds, but the latency grows by several orders of magnitude when a cache miss occurs and the entry must be fetched from DRAM or a disk. Over the years, a significant amount of work has been dedicated to tackling the inefficiencies and induced latency of traversing the various levels of the memory hierarchy. In several cases, sophisticated mechanisms have been proposed that concern preemptive actions. In other words, there is ongoing research on how to prefetch data or instructions from DRAM to the CPU cache and schedule hot pages from the disk to DRAM. At the networked system level, complex cache admission and invalidation policies for content delivery networks (CDNs) have also been explored with the aim of performance optimization in delivering large content such as video data. The common challenge in all these storage-related problems consists in the prediction of the data pattern in these systems, which has become increasingly challenging due to the growing complexity in how applications access their data.

3.1 Traditional Approaches and Limitations
A significant amount of research has been focused on mitigating memory-induced bottlenecks.
One example is memory prefetching, which preloads memory content into the register or cache
to avoid slow memory access. There are generally two ways to implement a memory prefetcher:
software-based and hardware-based. Software-based prefetchers use explicit instructions for
prefetching. While offering flexibility, software-based prefetchers suffer from increased code
footprint and latency, and low accuracy. Therefore, mainstream memory prefetchers are implemented in hardware integrated in the CPU. State-of-the-art hardware prefetchers typically rely on the CPU's memory access pattern and compute a corresponding delta based on the access pattern for prefetching [67, 116, 123, 140, 144, 192]. However, prefetchers of this type become sub-optimal when memory accesses are highly irregular. On the other hand, prefetchers based on pattern history perform much better at capturing irregularities but are more expensive to integrate [194].
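To ground the delta-based idea, here is a toy, software-level sketch (our simplification, not a faithful model of any cited hardware prefetcher): it tracks recent address deltas per instruction pointer and issues a prefetch when the last two deltas agree.

```python
from collections import defaultdict, deque

class DeltaPrefetcher:
    """Toy delta prefetcher: a per-PC history of address deltas drives the next prefetch."""

    def __init__(self, history_len=4):
        self.last_addr = {}                                    # last address seen per PC
        self.deltas = defaultdict(lambda: deque(maxlen=history_len))

    def access(self, pc, addr):
        prefetch = None
        if pc in self.last_addr:
            self.deltas[pc].append(addr - self.last_addr[pc])
            hist = self.deltas[pc]
            # Naive policy: if the last two deltas agree, assume a stride and prefetch.
            if len(hist) >= 2 and hist[-1] == hist[-2]:
                prefetch = addr + hist[-1]
        self.last_addr[pc] = addr
        return prefetch

pf = DeltaPrefetcher()
for addr in range(0x1000, 0x1000 + 5 * 64, 64):                # regular 64-byte stride
    print(hex(addr), "->", pf.access(pc=0x400123, addr=addr))
```

Irregular access streams break this simple stride assumption, which is exactly the gap the learned prefetchers discussed in Section 3.2 try to close.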
Page scheduling aims at improving performance by keeping pages that are frequently accessed close to the computing units, e.g., in DRAM. Page scheduling exhibits high complexity and vast research efforts have attempted to address it thoroughly [22, 38, 73, 114, 139, 181, 182]. Common approaches to page scheduling usually involve system-level integration, for example, in the operating system or during compilation. The current state of the art leverages history information to predict future memory accesses. Yet, the performance bottleneck still exists [36].
Towards caching on a larger scale, CDNs focus on optimizing the latency for content requested by users [80]. To do so, CDNs utilize cache admission and eviction policies that fetch and remove objects from the cache, respectively. Whenever a requested object is not in the cache and has to be fetched, the user suffers from degraded performance [80]. Extensive research has been conducted on admission/eviction schemes for cache-miss optimization [23, 39, 80, 102].

Limitations of traditional approaches: Rule-based solutions are often sub-optimal since the data pattern is too complex to specify with simple rules. On the other hand, sophisticated solutions that explore the deep spatial and temporal correlations in the data struggle to make their way into real-world systems, since they are expensive to implement and have poor generality when facing different applications. Such a contradiction has led to almost stagnant developments in memory prefetching and caching solutions.
3.2 ML-based Approaches
As mentioned above, data pattern prediction is of paramount importance for prefetching and caching algorithms. ML, by its nature, is a powerful tool for exploring hidden patterns in irregular data. Thus, the application of ML to these storage-related problems is well justified.
3.2.1 Memory Prefetchers. A notable work on memory prefetchers with neural networks is introduced in [194]. The authors target pattern history-based prefetchers, more specifically the variable length delta prefetcher (VLDP) [140]. The authors leverage long short-term memory (LSTM), a recurrent neural network (RNN)-based learning algorithm, to predict the memory addresses of upcoming memory accesses. The authors integrate the LSTM neural network at the last cache level to make predictions in a bounded environment, namely the OS page [194]. Implementation and evaluation are conducted in an offline manner, where the prefetcher is tested on accuracy and coverage. Another work formulates prefetching as a classification problem and proposes two variants based on an embedding LSTM and a hybrid approach, respectively [57]. With the embedding LSTM, a small constant number of predictions per time step is performed in both a local and a global context. In the hybrid approach, k-means clustering is used to partition the memory space into regions, and then a neural network is used for inference in each region. Srivastava et al. target the limitations of prior ML-based approaches and present a more robust solution for integrating a learning-based predictor into current system architectures. Similar to previous works, the authors use an LSTM to predict future memory accesses. Besides building a model with high accuracy, the authors employ model compression to increase the inference speed and achieve substantial performance gains in execution. Further, they propose to learn a policy online to retrain the model when the accuracy drops below a predefined threshold [149]. This enables the proposed solution to adjust to real-world environments where a specialized approach might not fit the mould. Overall, the results are promising against traditional prefetchers and unfold a new step towards learned memory management systems.
3.2.2 Page Scheduling.
Kleio leverages LSTM neural networks to predict page access counts in applications that heavily impact the system's performance [36]. Kleio identifies crucial pages that will increase the application's performance and trains an LSTM network for each of them. This approach significantly boosts accuracy, as each network is able to naturally capture the problem space of page scheduling, and the neural networks come with a greatly reduced range of output values, which contributes to overall better predictions. Additionally, pages that are not crucial for the system's performance fall back to the existing history-based page scheduler [36]. Evaluation indicates that Kleio's neural network predictions sharply enhance the application's performance, while accuracy indicators expose severe limitations of history-based schedulers.
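A rough sketch of this hybrid design follows (our simplification: the per-page learned models, their features, and the history-based fallback are placeholders rather than Kleio's actual components).

```python
def predict_access_count(page, history, learned_models, recent_window=8):
    """Per-page learned model for critical pages, history average for everything else."""
    if page in learned_models:                          # pages deemed performance-critical
        return learned_models[page].predict(history)    # hypothetical per-page predictor
    recent = history[-recent_window:]                   # history-based fallback
    return sum(recent) / max(len(recent), 1)

def schedule_pages(pages, histories, learned_models, dram_slots):
    """Place the pages predicted to be hottest into the fast tier (DRAM)."""
    scored = [(predict_access_count(p, histories[p], learned_models), p) for p in pages]
    scored.sort(reverse=True)
    return [p for _, p in scored[:dram_slots]]

histories = {"A": [5, 7, 9, 11], "B": [1, 0, 2, 1], "C": [4, 4, 4, 4]}
print(schedule_pages(["A", "B", "C"], histories, learned_models={}, dram_slots=2))
```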
3.2.3 Cache Admission and Eviction in CDNs.
Recently, two remarkable contributions [12, 80] leverage ML techniques for cache admission policies in CDNs. Berger proposes a supervised learning scheme based on optimal caching decisions (OPT) [12]. The proposed scheme, LFO, learns a caching policy that maps features to those of OPT, essentially predicting whether an object should be admitted to the cache [12]. LFO achieves high accuracy with negligible delay, constituting a feasible alternative for production. While Berger advocates against RL for cache admission due to increased complexity and slow reward convergence [12], RL-Cache leverages RL to optimize directly for hit rate [80]. Based on a complete set of features, RL-Cache trains a neural network that decides, upon an object request, whether the object is to be admitted to the cache or not [80]. RL-Cache trains on request traces from production CDNs and employs LRU as its eviction policy. Additionally, RL-Cache is optimized for deployment, considering that periodic training can occur in a different location, relieving the content server. RL-Cache competes seriously with state-of-the-art schemes and constitutes an interesting direction for further research.
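To make the admission/eviction split concrete, the toy cache below (our illustration, not the LFO or RL-Cache implementation) pairs LRU eviction with a pluggable admission function; in the learned schemes, that function would be a trained model scoring each requested object.

```python
from collections import OrderedDict

class AdmissionLRUCache:
    """LRU cache whose admission decision is delegated to a pluggable policy."""

    def __init__(self, capacity, admit_fn):
        self.capacity = capacity
        self.admit_fn = admit_fn          # in RL-Cache/LFO terms: a learned admit/reject model
        self.store = OrderedDict()        # object id -> size, ordered by recency

    def request(self, key, size):
        if key in self.store:             # hit: refresh recency
            self.store.move_to_end(key)
            return True
        if self.admit_fn(key, size):      # miss: the admission policy decides
            while len(self.store) >= self.capacity:
                self.store.popitem(last=False)    # evict the least recently used object
            self.store[key] = size
        return False

# Hypothetical stand-in for a learned admission model: admit only small objects.
cache = AdmissionLRUCache(capacity=2, admit_fn=lambda key, size: size <= 100)
for key, size in [("a", 50), ("b", 500), ("a", 50), ("c", 80), ("d", 10), ("a", 50)]:
    print(key, "hit" if cache.request(key, size) else "miss")
```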
Fedchenko et al. take a different approach to content caching with ML in [45]. Instead of utilizing LSTM neural networks for the prediction of sequences or time series as explained above, a simple feed-forward neural network is used to predict the most popular entries. However, the performance advancements are insignificant when compared with existing policies [149], while the authors report that treating the problem as classification, similarly to [57], is a direction worth exploring [45].

3.3 Discussion on ML-based Approaches

A common theme among most existing work is the use of RNNs, and in particular LSTM neural networks. This stands as the de-facto consideration when it comes to memory-related challenges. The ability of RNNs to preserve state is what makes them powerful in problems involving predictions on sequences of data or data in a time series. This is in clear contrast to traditional approaches, which suffer from poor predictions on complex data patterns. Meanwhile, ML-based approaches have become more accessible due to various AutoML solutions and tools. Another similar trait lies in the selection of learning algorithms the authors have to make. We observe that most approaches rely on supervised learning and making predictions. Considering the nature of the problems they are targeting, this also comes naturally. Yet, we observe that despite the fitting-the-mould type of approach that researchers follow at the infant stage when results are not significant, many authors attempt to solve the problem with an unorthodox methodology. For instance, RL-Cache leverages RL to construct a cache admission policy rather than following a statistical estimation approach [80]. However, this does not always translate to successful solutions, albeit it certainly denotes a pattern in how researchers apply learned solutions to traditional and emerging challenges. Overall, ML-based approaches have demonstrated their clear benefits for problems in memory systems when facing complex data patterns and provide multiple easy-to-generalize techniques to tackle these problems from different angles. However, if the underlying data pattern is simple and easy to obtain, using ML-based approaches would be overkill.

4 CLUSTER RESOURCE SCHEDULING

Resource scheduling concerns the problem of mapping resource demands to computing resources, meeting set goals on resource utilization, response time, fairness, and affinity/anti-affinity constraints. Cloud-based solutions nowadays dominate the computing landscape, providing high scalability, availability, and cost efficiency. Scheduling in the cloud environment goes beyond a single or multi-core computing node and needs to deal with a multitude of physical nodes, sometimes also equipped with heterogeneous domain-specific accelerators. The scope of cloud resource scheduling can be within a single cloud data center or across geo-distributed cloud data centers. Cloud resource schedulers are typically built with a monolithic, two-level, or shared-state architecture. Monolithic schedulers, e.g., YARN, use a single, centralized scheduling algorithm for all jobs in the system. This makes them hard to scale and inflexible to support sophisticated scheduling policies. Two-level schedulers like Mesos [59] and Hadoop-on-Demand introduce a single active resource allocator to offer resources to scheduling frameworks and rely on these individual frameworks to perform fine-grained task scheduling. While being more scalable, the conservative resource visibility and locking make it hard to implement scheduling policies such as preemption and gang scheduling that require a global view of the overall resources. Shared-state schedulers such as Omega [138] aim at addressing the problems of monolithic and two-level schedulers by allowing for lock-free control. Schedulers following such designs operate completely in parallel and employ optimistic concurrency control to mediate clashes between schedulers using concepts like transactions [138], achieving both flexibility in policy customization and scalability. Adopting one of these architectures, many works have been done on scheduling algorithm design. The heterogeneity of resources, together with the diversity of applications that impose different resource requirements, has rendered the resource scheduling problem a grand challenge for cloud computing, especially when scalability is of paramount importance.

4.1 Traditional Approaches and Limitations
Existing scheduling algorithms generally fall into one of three categories: centralized, distributed, and hybrid. Centralized schedulers have been extensively studied, where the scheduler maintains a global view of the whole data center and applies a centralized algorithm for scheduling [13, 49, 51, 65, 161, 163, 169]. For example, Quincy [65] and Firmament [51] transform the scheduling problem into a min-cost max-flow (MCMF) problem and use existing MCMF solvers to make scheduling decisions. Considering multiple resources including CPU, memory, disk, and network, schedulers like Tetris adapt heuristics for multi-dimensional bin packing problems to scheduling. Tetrisched takes explicit constraints with jobs as input and employs a constraint solver to optimize job placement [163]. Due to the global resource visibility, centralized schedulers normally produce efficient scheduling decisions, but require special treatments to achieve high scalability.
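As a toy illustration of the flow formulation (not the actual graph construction used by Quincy or Firmament), tasks and machines can be wired into a small min-cost max-flow network in which edge costs encode placement preferences such as data locality; the numbers below are made up.

```python
import networkx as nx

G = nx.DiGraph()
tasks, machines = ["t1", "t2", "t3"], ["m1", "m2"]

for t in tasks:
    G.add_edge("src", t, capacity=1, weight=0)           # each task must be placed once
for m in machines:
    G.add_edge(m, "sink", capacity=2, weight=0)          # each machine can hold two tasks

# Made-up placement costs, e.g., lower where a task's input data is local.
costs = {("t1", "m1"): 1, ("t1", "m2"): 5,
         ("t2", "m1"): 4, ("t2", "m2"): 1,
         ("t3", "m1"): 2, ("t3", "m2"): 2}
for (t, m), c in costs.items():
    G.add_edge(t, m, capacity=1, weight=c)

flow = nx.max_flow_min_cost(G, "src", "sink")             # min-cost max-flow solver
placement = {t: m for t in tasks for m in machines if flow[t].get(m, 0) == 1}
print(placement)                                          # e.g., {'t1': 'm1', 't2': 'm2', 't3': 'm1'}
```

The real systems build much richer flow networks, but the underlying optimization machinery is the same.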

Distributed schedulers make stateless scheduling decisions without any central coordination, aiming to achieve high scalability and low latency [126, 132]. For example, Sparrow [126] employs multiple schedulers to assign tasks to servers using a variant of the power-of-two-choices load balancing technique. Each of the servers maintains a local task queue and adopts the FIFO queuing principle to process the tasks that have been assigned to it by the schedulers. Fully distributed schedulers are known for their high scalability but may make poor decisions in many cases due to limited visibility into the overall resource usage.

Hybrid schedulers perform scheduling in a distributed manner with partial information about the global status of the data center [14, 27–29, 74]. Hawk schedules long-running jobs with a centralized scheduler while using a fully distributed scheduler for short jobs [29]. Mercury introduces a programmatic interface to enable a full spectrum of scheduling from centralized to distributed, allowing applications to conduct tradeoffs between scheduling quality and overhead [74]. Recently, a number of resource schedulers have also been proposed targeting DL workloads [103, 122, 183]. These schedulers are domain-specific, leveraging application-specific knowledge such as early feedback, heterogeneity, and intra-job predictability to improve cluster efficiency and reduce latency.

4.2 ML-based Approaches
Limited attention has been paid to applying ML techniques in general cluster resource scheduling. Paragon [30] and Quasar [31] propose heterogeneity- and interference-aware scheduling, employing techniques from recommender systems, such as collaborative filtering, to match workloads to machine types while reducing performance interference. DeepRM is one of the earliest works that leverage DRL to pack tasks with multi-dimensional resource demands onto servers [106]. It translates the task scheduling problem into an RL problem and employs a standard policy-gradient RL algorithm to solve it. Although it only supports static workloads and single-task jobs, DeepRM demonstrates the possibility and big potential of applying ML in cluster resource scheduling. Another attempt is on device placement optimization for TensorFlow computation graphs [118]. In particular, an RNN policy network is used to scan through all nodes for state embedding and is trained to predict the placement of operations in a computational graph, optimizing job completion time using policy gradient methods. While being effective, training the RNN is expensive when the state is large, leading to scalability issues and requiring human experts' involvement for manually grouping operations in the computational graph. In a follow-up work, a two-level hierarchical network is used, where the first level is used for grouping and the second for operation placement [117]. The network is trained end-to-end, thus requiring no human experts' involvement.
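A generic policy-gradient (REINFORCE) skeleton in the spirit of DeepRM is sketched below; the state encoding, action space, environment, and reward are placeholders rather than DeepRM's actual formulation. The actual DeepRM setup adds, among other things, a baseline to reduce gradient variance.

```python
import torch
import torch.nn as nn

state_dim, num_actions = 64, 11        # placeholder sizes for cluster state and job slots
policy = nn.Sequential(nn.Linear(state_dim, 128), nn.ReLU(), nn.Linear(128, num_actions))
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)

def run_episode(env_step, horizon=50):
    """Roll out the scheduling policy; env_step is a placeholder environment function."""
    state = torch.zeros(state_dim)
    log_probs, rewards = [], []
    for _ in range(horizon):
        dist = torch.distributions.Categorical(logits=policy(state))
        action = dist.sample()                      # which job slot to schedule next
        log_probs.append(dist.log_prob(action))
        state, reward = env_step(state, action.item())
        rewards.append(reward)
    return log_probs, rewards

def reinforce_update(log_probs, rewards, gamma=0.99):
    """REINFORCE: weight each log-probability by the discounted return that follows it."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.insert(0, g)
    loss = -(torch.stack(log_probs) * torch.tensor(returns)).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

def dummy_env(state, action):
    """Placeholder environment: unchanged state, random reward (illustration only)."""
    return state, float(torch.randn(()))

for _ in range(10):
    reinforce_update(*run_episode(dummy_env))
```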

More recent works choose to use directed acyclic graphs (DAGs) to describe jobs and employ ML methods for scheduling DAGs. Among them, Decima proposes new representations for jobs' dependency graphs, scalable RL models, and new RL training methods [108]. Decima encodes job stages and their dependencies as DAGs and adopts a scalable network architecture as a combination of a graph neural network (GNN) and a policy network, learning workload-specific solutions. Decima also supports continuous streaming job arrivals through novel training methods. Similarly, Lachesis proposes a learning algorithm for distributed DAG scheduling over a heterogeneous set of clusters and executors, differing from each other in computation and communication capabilities [101]. The scheduling process is divided into two phases: (1) the task selection phase, where a learning algorithm using modified graph convolutional networks is used to select the next task, and (2) the executor selection phase, where a heuristic search algorithm is applied to assign the selected task to an appropriate cluster and to decide whether the task should be duplicated over multiple clusters. Such a hybrid solution has been shown to provide significant performance gains compared with Decima.
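The DAG embedding idea can be sketched roughly as follows (our simplification of Decima-style message passing: real implementations use learned aggregation networks, summaries at the job and global levels, and executor-level actions).

```python
import torch
import torch.nn as nn

class DagScorer(nn.Module):
    """Embed each DAG node from its features plus aggregated child embeddings, then score it."""

    def __init__(self, feat_dim=4, hidden_dim=32):
        super().__init__()
        self.hidden_dim = hidden_dim
        self.node_fn = nn.Sequential(nn.Linear(feat_dim + hidden_dim, hidden_dim), nn.ReLU())
        self.score_fn = nn.Linear(hidden_dim, 1)

    def forward(self, order, feats, children):
        """order lists nodes with children before parents (reverse topological order)."""
        emb = {}
        for node in order:
            kids = children.get(node, [])
            agg = (torch.stack([emb[c] for c in kids]).sum(dim=0)
                   if kids else torch.zeros(self.hidden_dim))
            emb[node] = self.node_fn(torch.cat([feats[node], agg]))
        return {n: self.score_fn(e) for n, e in emb.items()}   # higher score = schedule sooner

# Toy DAG: stage "c" depends on stages "a" and "b".
scorer = DagScorer()
feats = {n: torch.randn(4) for n in "abc"}
scores = scorer(order=["a", "b", "c"], feats=feats, children={"c": ["a", "b"]})
```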

ML approaches have also been attempted in GPU cluster scheduling. DL2 applies a DL technique for scheduling DL training jobs [128]. It employs an offline supervised learning mechanism at the beginning, with an online training mechanism at run time. The solution does not depend on explicit modeling of workloads and provides a way for transitioning from the offline to the online mechanism whenever the latter outperforms the former.

4.3 Discussion on ML-based Approaches

While a significant amount of research has been conducted on cluster scheduling, only a little of it focuses on applying learning techniques to fine-grained task scheduling. This can be explained in part by the complexity of modeling cluster scheduling problems in a learning framework, but also by the fact that the workloads related to the training of the scheduler need to be scheduled themselves, possibly over the same set of resources, and that such training can be very costly. These factors all increase the complexity and should be included in the performance analysis and pros/cons studies of any ML-based cluster scheduling approach, which has so far been widely ignored. Still, cluster scheduling can benefit from ML-based approaches, e.g., in heterogeneous settings or when workload information is not known a priori.

5 QUERY OPTIMIZATION IN DATABASE SYSTEMS

Involving either well-structured relational systems with SQL support or non-tabular systems (e.g., NoSQL) and in-memory stores, database systems are always the epicenter of any meaningful transaction. Yet, state-of-the-art database management systems (DBMS), despite being carefully designed, remain the performance bottleneck for plenty of applications in a broad spectrum of scenarios.

A key factor in the performance of DBMS is query optimization—determining the most efficient way to execute a given query considering all possible query plans. Over the years, several query optimizers at different levels (e.g., execution plan optimization, optimal selection of index structures) have been proposed. Most solutions rely on hand-tuned heuristics or statistical estimations. Surprisingly, case studies on query optimization reveal that the performance gains can be limited or that such optimization can even have a detrimental effect on system performance, especially in the presence of estimation errors [90, 91].

5.1 Traditional Approaches and Limitations

Over the years, significant research efforts have been devoted to the optimization of DBMS and particularly to the query optimizer—a crucial component causally related to query execution performance. Traditional query optimizers take as input an SQL query and generate an efficient query execution plan, or as advertised, an optimal plan. Optimizers are typically composed of sub-components, e.g., cardinality estimators and cost models, and typically involve a great deal of statistical estimation and heuristics. The main body of literature on query optimizers typically focuses on the direct optimization of a distinct component so that it performs better and collectively yields better results. Elaborate articles on query optimization have been published over time [18]. Nevertheless, query processing and optimization remain a continually active research domain.

An early work on query optimizers is LEO [151]. LEO gradually updates cardinality estimates and statistics, in a process described as learning, for future use in producing optimized query execution plans. LEO utilizes a feedback loop to process historical query information and adjust cost models appropriately for enhanced performance [151]. CORDS is another significant work on query optimizers, which reduces query execution time by exploring statistical dependencies between columns prior to query execution [64]. Another work, Eddies, supports adaptive query processing by reordering operators in a query execution plan dynamically [8]. Other contributions focus on the optimal selection of index structures [19, 53, 166]. More recent studies attempt to answer whether query optimizers have reached their peak in terms of optimal performance, or suffer from limitations and potential performance degradation, and how to mitigate them [91, 129].
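The feedback idea behind LEO can be sketched in a few lines: after a query runs, compare the optimizer's estimate with the observed cardinality and fold the error ratio into a correction factor applied to future estimates. The per-predicate correction table and the smoothing constant below are illustrative assumptions, not LEO's actual implementation.

# Feedback-driven cardinality adjustment, loosely in the spirit of LEO (illustrative only).
correction = {}                                  # predicate signature -> learned factor

def adjusted_estimate(signature, optimizer_estimate):
    return optimizer_estimate * correction.get(signature, 1.0)

def record_execution(signature, optimizer_estimate, actual_rows, smoothing=0.5):
    # Blend the observed error ratio into the stored correction factor.
    observed_ratio = actual_rows / max(optimizer_estimate, 1.0)
    old = correction.get(signature, 1.0)
    correction[signature] = (1 - smoothing) * old + smoothing * observed_ratio

record_execution("orders.status = 'open'", optimizer_estimate=1000, actual_rows=12000)
print(adjusted_estimate("orders.status = 'open'", 1000))   # future estimates are scaled up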

It is generally true that ML models are more capable of capturing complex intuitions regarding data schemas and patterns. Thus, leveraging ML for query optimization seems a natural fit, as the exploratory nature of ML can assist in building complex estimation and cost models. Furthermore, the innate ability of ML to adapt to different environments via continuous learning can be a beneficial factor for execution plan optimization.

5.2 ML-based Approaches

5.2.1 Index Structure Optimization. Kraska et al. introduce a revolutionary approach where they investigate the overhauling of existing indexes with learned ones, based on deep learning [84]. The authors leverage a hybrid mixture of ML models to replace optimized B-trees, point indices (i.e., hash-maps), and Bloom filters. The hybrid mixture, namely the Recursive Model Index (RMI), is composed of neural networks in a layered architecture and is capable of predicting index positions for B-trees and hash-maps. The simplest structure of zero hidden layers resembles linear regression, an inexpensive and fast model. Inference over the models is done in a recursive fashion, where the top-layer neural network points to the next, and the same process is repeated until one of the base models predicts the actual index position.
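The recursive lookup can be illustrated with a minimal two-stage learned index over a sorted array: a root model routes a key to one of several leaf models, the chosen leaf predicts a position, and a small local search around the prediction finds the exact slot. The stage sizes, the plain linear-regression models, and the fixed search window are assumptions of this sketch; the paper's RMI tunes model types per stage and tracks per-leaf error bounds.

import bisect
import numpy as np

keys = np.sort(np.random.default_rng(2).uniform(0, 1e6, size=100_000))
positions = np.arange(len(keys))

def fit_linear(x, y):
    a, b = np.polyfit(x, y, deg=1)
    return lambda k: a * k + b

N_LEAVES = 64
root = fit_linear(keys, positions)                       # stage 1: route keys to leaf models
buckets = np.clip((root(keys) * N_LEAVES / len(keys)).astype(int), 0, N_LEAVES - 1)
leaves = [fit_linear(keys[buckets == i], positions[buckets == i])
          if np.any(buckets == i) else root for i in range(N_LEAVES)]

def lookup(key, err=256):
    # A real RMI stores per-leaf error bounds; a fixed window is used here for brevity.
    leaf = leaves[int(np.clip(root(key) * N_LEAVES / len(keys), 0, N_LEAVES - 1))]
    guess = int(np.clip(leaf(key), 0, len(keys) - 1))
    lo, hi = max(0, guess - err), min(len(keys), guess + err)
    return bisect.bisect_left(keys[lo:hi].tolist(), key) + lo

query = keys[12345]
print(lookup(query), 12345)                              # predicted position matches the true one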

For Bloom filters, RMI does not fit. Thus, a more complex neural network with a sigmoid activation function is proposed, which approximates the probability that a key exists in the database. Comparisons with state-of-the-art approaches highlight ML as a strong rival: there are cases where the reported execution speed is significantly higher and the memory footprint is sharply reduced. The novelty of the approach is a key step towards the automation of data structures and the optimization of DBMS, as it sets the pace for further exploration and exploitation of ML in a domain where performance is critical. ALEX extends the approach of Kraska et al. to provide support for write operations and dynamic workflows [33].
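A toy version of the learned Bloom filter idea is sketched below: a score model flags likely members, and any member the model misses is placed in a conventional backup structure so false negatives cannot occur, while false positives remain possible. The hand-made score function, the threshold, and the use of a plain set instead of a real backup Bloom filter are all simplifications of this sketch.

# Toy learned Bloom filter: score model + backup set, so false negatives are impossible.
members = {"alice@example.com", "bob@example.com", "dave@legacy.net"}

def score(key):
    # Stand-in for the sigmoid-output neural model described in the paper.
    return 1.0 if key.endswith("@example.com") else 0.0

THRESHOLD = 0.5
backup = {k for k in members if score(k) < THRESHOLD}    # members the model misses

def maybe_contains(key):
    return score(key) >= THRESHOLD or key in backup

print(maybe_contains("dave@legacy.net"))    # True, caught by the backup set
print(maybe_contains("eve@example.com"))    # True, a false positive from the model
print(maybe_contains("eve@other.org"))      # False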

5.2.2 Cardinality Estimation. 

A slightly different attempt at alleviating wrongly predicted cost models and query cardinalities is presented in [125]. The authors seek to overcome the simplifying assumptions about the underlying data structures and patterns, and the respective cost estimations, which are employed through hand-tuned heuristics. To do so, a deep neural network (DNN) is utilized that learns to output the cardinality of an input query, step by step, through division into smaller sub-queries. Moving forward, the authors utilize the estimated cardinalities to produce an optimal policy via Q-learning to generate query plans. The learning agent selects query operations based on the sub-queries, which incrementally result in a complete query plan.

Kipf et al. focus on the prediction of join-crossing data correlations towards mitigating the drawbacks of current sampling-based cardinality estimation in query optimization [79]. The authors treat the problem with supervised learning by utilizing a DNN, defined as a multi-set convolutional network (MSCN), which in turn is built from fully connected multi-layer neural networks. The MSCN learns to predict query cardinalities on unseen queries. Using unique samples, the model learns to generalize well to a variety of cases. More specifically, in a tough scenario with zero-tuples, where traditional sampling-based optimizers suffer, MSCN is able to provide a better estimation. Yang et al. take an unsupervised approach to cardinality and selectivity estimation with Naru [187]. Naru utilizes deep auto-regressive models to provide high-accuracy selectivity estimators produced in an agnostic fashion, i.e., without relying on assumptions, heuristics, specific data structures, or previously executed queries.
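The set-based featurization idea can be sketched as follows: encode each table, join, and predicate as a vector, average each set into a fixed-size summary, and feed the concatenation to a small network that outputs a (log-scale) cardinality. The encodings, the averaging of raw vectors, and the untrained random weights are placeholders of this sketch; the actual MSCN applies per-element networks before pooling and is trained on query samples.

import numpy as np

rng = np.random.default_rng(3)
D_IN, D_HID = 8, 16
W1 = rng.normal(scale=0.1, size=(D_HID, 3 * D_IN))
W2 = rng.normal(scale=0.1, size=(1, D_HID))

def avg_pool(vectors):
    return np.mean(vectors, axis=0) if vectors else np.zeros(D_IN)

def predict_log_cardinality(table_vecs, join_vecs, predicate_vecs):
    x = np.concatenate([avg_pool(table_vecs), avg_pool(join_vecs), avg_pool(predicate_vecs)])
    h = np.maximum(W1 @ x, 0.0)                  # ReLU hidden layer
    return float(W2 @ h)                         # log-scale cardinality estimate

# A hypothetical two-table join with one range predicate, encoded as random placeholders.
tables = [rng.normal(size=D_IN), rng.normal(size=D_IN)]
joins = [rng.normal(size=D_IN)]
predicates = [rng.normal(size=D_IN)]
print(predict_log_cardinality(tables, joins, predicates))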

5.2.3 Join Ordering. 

A more recent work called SkinnerDB targets adaptive query processing through optimization of join ordering [162]. SkinnerDB is based on a well-known RL algorithm, UCT [81]. Its novelty lies in the fact that learning is done in real time, during query execution, by slicing query execution into small time slices. It proceeds to select a near-optimal join order based on a quality measure, the regret-bounded ratio between the anticipated execution time and the time of an optimal join order [162]. SkinnerC, perhaps the most impactful variation of SkinnerDB, is able to outperform MonetDB, a specialized database engine for analytics, in single-threaded mode, due to its ability to achieve sharply reduced execution times on costly queries.
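The bandit intuition behind time-sliced join ordering can be sketched as follows: treat candidate join orders as arms, run the query for one small time slice under the chosen order, and use a UCB rule to balance exploration and exploitation across slices. The fixed candidate set and the simulated progress function run_slice are placeholders; SkinnerDB actually runs UCT over the full space of join trees with formal regret bounds.

import math
import random

random.seed(4)
orders = [("A", "B", "C"), ("B", "A", "C"), ("C", "B", "A")]   # candidate join orders (assumed)
pulls = {o: 0 for o in orders}
reward_sum = {o: 0.0 for o in orders}

def run_slice(order):
    # Placeholder: pretend some orders make more progress per time slice than others.
    base = {("A", "B", "C"): 0.9, ("B", "A", "C"): 0.5, ("C", "B", "A"): 0.2}[order]
    return max(0.0, min(1.0, random.gauss(base, 0.1)))

for t in range(1, 201):
    def ucb(o):
        if pulls[o] == 0:
            return float("inf")
        return reward_sum[o] / pulls[o] + math.sqrt(2 * math.log(t) / pulls[o])
    choice = max(orders, key=ucb)
    pulls[choice] += 1
    reward_sum[choice] += run_slice(choice)       # reward: progress made in this time slice

print(max(orders, key=lambda o: reward_sum[o] / max(pulls[o], 1)))   # the order it settles on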

ReJOIN aims at addressing the difficulties in join order selection with a DRL approach [109]. ReJOIN formulates the RL minimization objective as the selection of the cheapest join ordering with respect to the optimizer's cost model. Each incoming query represents a discrete time step on which the agent is trained. For the task, ReJOIN employs a neural network trained with a policy gradient method. ReJOIN performs comparably to or even better than the optimizer of PostgreSQL. The evaluation is conducted on a dataset specifically designed for measuring the performance of query optimizers, namely the join order benchmark (JOB) [91].

Another notable attempt is DQ [85], a DRL optimizer that exploits Q-learning in a DNN architecture to learn from a sampled cost model to select the best query plan, in terms of an optimal join sequence. To mitigate failures in cardinality estimation regarding the cost model, DQ initially converges on samples collected from the optimizer's cost model [85]. Then, the weights of the neural network are stored and DQ trains again on samples gathered from real execution runs. In terms of execution, the relative Q-function is utilized to obtain the optimal join operation. Extensive evaluation indicates that DQ is remarkably more effective than ReJOIN, and over a wider scope of JOB queries. Further, DQ is able to scale by incorporating more features in an effort to achieve more accurate join cost prediction.

In adaptive query processing, an early approach picks up from Eddies [8] and leverages RL to train optimal eddy routing policies [164]. The authors' focus is on join and conjunctive selection queries. Additionally, the proposed framework incorporates various join operators and constraints from the state-of-the-art literature. Overall, the results indicate the significance of learning an optimal query execution plan and of fast reactions to changes. This work can be considered an infant step towards learned query optimizers.

5.2.4 End-to-end Query Optimization. Different from the above works that focus on distinct components of query optimizers, Neo builds an end-to-end query optimizer based on ML [110]. Neo, short for Neural Optimizer, utilizes different DL models to replace each of the components of a common optimizer. Neo relies on prior knowledge to kick-start but continues learning when new queries arrive. This approach makes Neo robust to dynamic environments, regarding unforeseen queries, albeit Neo cannot generalize to schema and data changes. Neo's contributions are manifold. Besides adaptation to changes, Neo is able to decide between three different common operations, namely join ordering, index selection, and physical operator selection [110]. Moreover, Neo integrates easily with current execution engines and users can specify their optimization objectives. Evaluations show that Neo outperforms simple optimizers and exhibits comparable performance to long-lived commercial ones.

Around the same time as Neo, SageDB conceptualizes the vision for a DBMS where crucial components, including the query optimizer, are substituted with learned ones [83]. Overall, the article describes the design of such a system and how all components would tie together in a complete solution. Two approaches that concern the optimization of queries in a distributed setting are Lube [172] and its sequel, Turbo [173]. Both techniques leverage ML models that concern query execution in clusters. More specifically, Lube targets the minimization of query response times by identifying and resolving bottlenecks [172]. Turbo, on the other hand, aims at optimizing query execution plans dynamically [173].

5.3 Discussion on ML-based Approaches

Despite decades of active research on DBMS and query optimization, it remains a fact that performance is far from optimal [90, 91, 129]. Yet, the pivotal role of databases in modern systems calls for further scrutiny. Similar to the problems in memory systems, query optimization in databases also heavily relies on the prediction of data patterns, where ML-based approaches have demonstrated clear benefits over traditional approaches in complex scenarios. We witness the quest to yield better performance by leveraging ML in query optimization schemes. Moreover, we observe how multi-faceted those approaches are. For instance, ReJOIN [109] and DQ [85] utilize DRL to tackle the selection of optimal join sequences, while Kipf et al. focus on cardinality estimation through supervised learning [79]. Additionally, as traditional approaches suffer and modern computational units migrate to more distributed settings, we notice significantly broader approaches that target query processing accordingly (e.g., Turbo). More interestingly, we are perceiving the progress of research and how it essentially dissolves into unified schemes as recent works conceptualize end-to-end solutions (e.g., Neo and SageDB) by leveraging ML.

6 NETWORK PACKET CLASSIFICATION

Packet classification is a crucial and fundamental networking task, which enables a variety of network services. Typical examples of such network services include traffic engineering (e.g., flow scheduling and load balancing), access control, and firewalls [55, 158]. A high-level overview of packet classification is depicted in Figure 3. Given a collection of rules, packet classification matches a packet to one of the given rules. Rule matching is based on certain criteria typically applied to the fields in the packet header, such as source and destination IP addresses, protocol type (often including flags), and source and destination port numbers. Matching conditions include prefix-based matching, range-based matching, and exact matching. Considering the ever-increasing network traffic, packet classification dictates the need for high performance in terms of classification speed and memory efficiency. These traits need to be accompanied by a high level of classification accuracy, since mismatches can result in serious network issues such as security breaches.
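For concreteness, a minimal sketch of classifying a 5-tuple against a prioritized rule list is given below, with prefix matching on addresses, range matching on ports, and exact matching on the protocol. The rule layout and the sample rules are illustrative assumptions; production classifiers avoid this linear scan precisely because it does not scale.

import ipaddress

# Each rule: (priority, src_prefix, dst_prefix, proto_or_None, (sport_lo, sport_hi), (dport_lo, dport_hi))
RULES = [
    (10, "10.0.0.0/8", "0.0.0.0/0", 6, (0, 65535), (80, 80)),         # TCP from 10/8 to port 80
    (20, "0.0.0.0/0", "192.168.1.0/24", 17, (0, 65535), (0, 65535)),  # any UDP to the subnet
    (99, "0.0.0.0/0", "0.0.0.0/0", None, (0, 65535), (0, 65535)),     # default rule
]

def classify(src, dst, proto, sport, dport):
    best = None
    for rule in RULES:
        prio, sp, dp, p, srange, drange = rule
        if (ipaddress.ip_address(src) in ipaddress.ip_network(sp)        # prefix match
                and ipaddress.ip_address(dst) in ipaddress.ip_network(dp)
                and (p is None or p == proto)                            # exact match
                and srange[0] <= sport <= srange[1]                      # range match
                and drange[0] <= dport <= drange[1]):
            if best is None or prio < best[0]:
                best = rule
    return best

print(classify("10.1.2.3", "93.184.216.34", 6, 40000, 80))               # hits the TCP/80 rule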

6.1 Traditional Approaches and Limitations

The solution space for packet classification can be generally divided into hardware- and software-based approaches. Hardware-based approaches typically leverage ternary content addressable memories (TCAMs) and are considered the standard in industrial high-performance routers and middleboxes. TCAM is a specialized type of high-speed memory, which stores matching rules as a massive array of fixed-width entries [89] and is able to perform multi-rule matching in constant time. Early work also extends TCAMs to increase lookup performance and reduce power consumption by utilizing a special storage block that is indexed before resolving to subsequent lookups [146]. Although the use of TCAMs significantly boosts classification speed, these solutions have inherent limitations including poor scalability (e.g., in range expansion), high cost, and high power consumption [89].

On the other hand, software-based approaches offer greater scalability but suffer performance-wise in general. A representative family of software-based approaches is based on the tuple space introduced in [148]. These approaches partition rules into tuple categories and leverage hashing keys for accessing the tuple space of a particular filter [158]. While yielding fast queries, the hashing induces non-deterministic speeds on look-ups or updates [54]. Another family of software-based approaches is based on decomposition. A noteworthy work in this family is DCFL [159], which takes a distributed approach to filter searching. In particular, independent search engines match the filter fields and aggregate the results in an arbitrary order [158]. However, this technique mandates multiple table accesses, thus impacting performance [167].

Most software-based packet classification approaches are based on decision trees. The idea is to classify packets by traversing a set of pre-built decision trees and selecting the highest-priority rule among all matched rules in one or more decision trees. To reduce the classification time and memory footprint, decision trees are optimized to have small depths and sizes based on hand-tuned heuristics like node cutting or rule partitioning [54, 142]. EffiCuts, which builds on its predecessors HiCuts [54] and HyperCuts [142], significantly reduces memory footprint by employing four heuristics: separable trees, selective tree merging, equi-dense cuts, and node co-location [167]. A more recent work, CutSplit, optimizes decision trees on the premise of reducing rule overlapping, using unoptimized yet faster first-stage cuttings, and applying effective pre-cutting and post-splitting actions [94]. Another work leverages decision trees and TCAMs in a hybrid approach [82].
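The cut-based idea can be sketched with a tiny hand-built tree over a single 8-bit header field: an internal node cuts the field range into equal-width intervals, and a leaf holds the few rules left to check linearly. The node layout and the two-level tree below are assumptions for illustration; HiCuts, HyperCuts, EffiCuts, and CutSplit build and shape such trees automatically with the heuristics described above.

# Toy cut-based decision tree over one 8-bit field (hand-built, illustrative).
# Internal node: ("cut", field_index, interval_width, [children...])
# Leaf node:     ("leaf", [rules...]) where each rule is (priority, lo, hi).
tree = ("cut", 0, 64, [                       # cut field 0 into four intervals of width 64
    ("leaf", [(1, 0, 63)]),
    ("leaf", [(2, 64, 100)]),
    ("leaf", []),
    ("leaf", [(3, 200, 255)]),
])

def classify(packet, node=tree):
    if node[0] == "cut":
        _, field, width, children = node
        return classify(packet, children[packet[field] // width])
    _, rules = node
    matches = [r for r in rules if r[1] <= packet[0] <= r[2]]
    return min(matches)[0] if matches else None   # lowest number = highest priority

print(classify([70]))     # lands in the second interval and matches rule 2
print(classify([150]))    # no rule matches -> None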

Limitations of traditional approaches: Current hardware- and software-based solutions pose strong limitations to effective packet classification. As discussed, hardware approaches fall short in terms of scalability and exhibit significant monetary and power costs, while software solutions rely on hand-tuned heuristics. Heuristics can be either too general to exploit the characteristics of a given rule set or too specific to achieve good performance on other rule sets. In addition, the lack of a specific, global optimization objective in the heuristic design can result in sub-optimal performance. Finally, the incorporation of different heuristics into a single solution can incrementally increase the overall complexity of the approach, hindering optimization due to the difficulty in understanding them.

6.2 ML-based Approaches

ML for packet classification typically replaces the classifier with a model pre-trained with supervised learning. However, with the recent advances in DRL, the solution space of packet classification approaches broadens. In general, there are three categories: (1) using supervised learning to replace the classifier with a trained model, (2) using RL agents to generate suitable decision trees at runtime according to the given set of rules, and (3) leveraging unsupervised learning to cluster unforeseen traffic.

Approach (1) fits naturally since packet classification is by definition a classification task. These approaches commonly utilize a traditional supervised learning setting where information concerning incoming traffic is known a priori and traffic is classified into distinct labeled sets. Traditional supervised learning proposals for packet classification have also been reviewed extensively in [124]. Yet, the first remarkable work in this direction that leverages DL for traffic classification was only recently introduced with Deep Packet [100]. This work leverages convolutional neural networks (CNNs) to construct a traffic classifier that is able to characterize traffic and identify applications without being given advanced intelligence (i.e., hand-tuned features). Despite the promising results, the accuracy requirement of packet classification renders the neural network-based approach impractical. This is because neural networks cannot guarantee the correct matching of rules. Moreover, the size of the neural network has to be big enough to handle a large set of rules. Thus, achieving high performance is very unlikely without hardware accelerators like GPUs [97]. Further, supervised learning schemes are generally limited by design, as supervised learning necessitates that certain information be known in advance [130].

Approach (2) learns at the meta-level: we learn to generate appropriate decision trees and use the resulting decision trees for actual packet classification. This way, ML methods are out of the critical path, so performance is no longer an issue. NeuroCuts is, to the best of our knowledge, the first work that employs a DRL method for decision tree generation [97]. NeuroCuts takes a DRL approach in a multi-agent learning setting by utilizing an actor-critic algorithm based on Proximal Policy Optimization (PPO) [137]. An agent executes an action in each discrete time step with the goal of maximizing reward. The action depends on the observed environment state. Following the same footsteps, Jamil et al. introduce a classification engine that leverages DRL to generate an optimized decision tree [70]. In detail, the derived tree concentrates the essential bits for rule classification into a compact structure that can be traversed in a single memory access [70]. Then, the outcome of the traversal of the generated tree is utilized in the original tree to classify packets. This results in a lower memory footprint and higher packet classification speed.

Finally, following approach (3), Qin et al. leverage an unsupervised learning scheme to mitigate the drawbacks of prior supervised learning solutions [130]. As mentioned in their work, existing supervised approaches fail to adjust to network changes as unforeseen traffic arrives and classification performance deteriorates. Besides, the authors advocate in favor of link patterns as a crucial property of network knowledge, while most approaches utilize only packet-related features. They propose a novel combinatorial model that considers both sources of information (packet and link patterns) in a clustering setting [130]. The approach is evaluated against several baselines of supervised and clustering algorithms and is able to outperform all of them, building a strong case for traffic classification with unsupervised learning.
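The clustering idea can be sketched by concatenating packet-level statistics with link-pattern features and grouping flows with a simple k-means loop. The synthetic features, the choice of k, and the use of plain k-means are assumptions of this sketch; the model in [130] is a dedicated combinatorial formulation rather than vanilla k-means.

import numpy as np

rng = np.random.default_rng(5)
# Rows: flows. Columns 0-2: packet statistics (e.g., mean size, inter-arrival time, duration);
# columns 3-4: link-pattern features (e.g., degrees of the endpoints in the flow graph).
X = np.vstack([rng.normal(loc=m, scale=0.3, size=(50, 5))
               for m in ([0, 0, 0, 0, 0], [3, 3, 0, 1, 1], [0, 3, 3, 2, 0])])

def kmeans(X, k, iters=50):
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - centers[None]) ** 2).sum(-1), axis=1)
        centers = np.vstack([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                             for j in range(k)])
    return labels

print(np.bincount(kmeans(X, k=3)))      # roughly 50 flows per discovered cluster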

6.3 Discussion on ML-based Approaches

Recent works in packet classification provide us with several useful insights. First, we observe that technological advancements in ML drive changes in the way researchers now approach the challenge of packet classification. For instance, typical solutions that used to solely focus on training packet classifiers have now diverged to more radical, unorthodox approaches. Second, we can see that significant effort has been put towards leveraging DRL-based solutions, perhaps the most recently advanced and trending research domain of the past few years. Third, we observe that ML often entails more performance metrics than common approaches. For example, classification accuracy is a critical metric in packet classification with supervised learning, whereas hardware-based approaches (e.g., TCAMs) do not impose such constraints. Finally, we can deduce that as the scope of work widens, more and more works that target similar directions will be explored and proposed. For example, Li et al. propose a novel way of caching rules into memory with LSTM neural networks, which can be directly exploited for packet classification [93]. Overall, ML-based approaches address the limitations of traditional approaches by being more generalizable, being able to incorporate complex optimization goals, and reducing the design complexity. However, they still fall short for critical scenarios due to the lack of guarantees in results and explainability. In scenarios where accuracy is of critical importance, traditional approaches would still be preferable.

7 NETWORK ROUTING

Traffic engineering (TE) is the process of optimizing performance in traffic delivery [175]. Perhaps the most fundamental task of TE is routing optimization, a path selection process that takes place between or across networks. More specifically, packet routing concerns the selection of a path from a source to a destination node through neighboring nodes. In each traversed node, routing aims at answering the question of which adjacent node is the optimal node to send the packet to. Common objectives of packet routing involve optimal time to reach the destination, maximization of throughput, and minimum packet loss. Routing should also be applicable to a broad variety of network topologies.

7.1 Traditional Approaches and Limitations

Routing is a broad research subject with a plethora of differing solutions and approaches proposed over the years. Routing is commonly differentiated into intra- and inter-domain routing. The former concerns packets being sent over the same autonomous system (AS), in contrast with the latter, which regards sending packets between ASes. Routing can also be classified based on enforcement mechanisms, on whether it concerns offline or online schemes, and furthermore on the type of traffic per se [175]. Based on this wide taxonomy, which only expands with emergent network topologies, there are distinct types of proposed solutions, as well as research that typically considers a more fine-grained domain of routing optimization. As our interest lies mostly in computer systems, we concentrate mainly on intra-domain traffic engineering. A comprehensive survey that covers routing optimization in a coarse-grained manner for traditional networks and topologies is presented in [175]. For surveys on routing in wireless sensor networks and ad-hoc mobile networks, we refer the readers to [5].

Regarding intra-domain traffic engineering, open shortest path first (OSPF) solutions are prevalent, favoring simplicity but often suffering in performance. OSPF solutions cope well with scalability as network growth has reached an all-time high, but have pitfalls when it comes to resource utilization [115]. As in the case of packet classification, common OSPF proposals are based on hand-tuned heuristics [147]. Moreover, most of the existing literature that aims at mitigating the performance limitations of OSPF approaches has rarely seen actual implementation [115].
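For reference, the classical baseline that learning-based routing is usually compared against is shortest-path routing over administratively assigned link weights, as in OSPF. The toy topology and costs below are assumptions; the point is only that the chosen path is fully determined by static weights, with no notion of current load.

import heapq

# Toy topology: node -> list of (neighbor, OSPF link cost).
GRAPH = {
    "A": [("B", 1), ("C", 4)],
    "B": [("A", 1), ("C", 1), ("D", 5)],
    "C": [("A", 4), ("B", 1), ("D", 1)],
    "D": [("B", 5), ("C", 1)],
}

def shortest_path(src, dst):
    dist, prev, pq = {src: 0}, {}, [(0, src)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                               # stale queue entry
        for v, w in GRAPH[u]:
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(pq, (d + w, v))
    path, node = [dst], dst
    while node != src:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[dst]

print(shortest_path("A", "D"))                     # (['A', 'B', 'C', 'D'], 3)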

In addition, we find it necessary to add some notes about software-defined networking (SDN). Conventional, non-SDN network devices embed dedicated software to implement networking logic, such as packet routing, and are characterized by long and costly evolution cycles. SDN reduces networking devices to programmable flow-forwarding machines, with the networking logic now running at a logically centralized controller, adding more flexibility in programming networking behaviors [112]. SDN can therefore be seen as a way to implement routing decisions at network devices, but it does not change the nature of the routing problem, which now needs to be solved by the controller. We do not go further into such implementation details and refer the readers to published surveys in this area, e.g., [42].

Limitations of traditional approaches: The main challenge of network routing consists in the ever-increasing dynamics of networks, including the traffic loads as well as the network characteristics (e.g., topology, throughput, latency, and reliability), and the multi-faceted optimization goals (ultimately reflecting the user quality of experience), which are hard to capture in a simple formula that handcrafted heuristics can optimize. The implementation complexity of network routing optimization is also a practical concern.

7.2 ML-based Approaches

The first work involving ML in the challenge of traffic engineering dates back to 1994, namely Q-Routing [15]. Leveraging a fundamental RL algorithm, Boyan et al. propose a learning-based approach to tackle the problem. Q-Routing derives from Q-learning [177] and is able to generate an efficient policy with a minimization objective—the total time to deliver a packet from source to destination. By conducting a series of experiments on different network topologies and dynamic networks, Q-Routing exhibits significant performance gains over static approaches, especially on congested network links. More importantly, Q-Routing established RL as a natural fit for the problem and paved the way for research on learned systems for traffic engineering.
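The core update of Q-Routing can be sketched in a few lines: each node x keeps Q(d, y), an estimate of the time to deliver a packet destined for d when forwarding via neighbor y, and updates it from the observed per-hop delay plus the neighbor's own best remaining-time estimate. The topology, the random per-hop delays, and the learning rate below are illustrative assumptions.

import random

random.seed(6)
NEIGHBORS = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
# Q[x][(d, y)]: node x's estimated delivery time to destination d when forwarding via neighbor y.
Q = {x: {(d, y): 10.0 for d in NEIGHBORS for y in NEIGHBORS[x]} for x in NEIGHBORS}

def forward(x, dest):
    return min(NEIGHBORS[x], key=lambda y: Q[x][(dest, y)])

def q_routing_update(x, dest, y, observed_delay, lr=0.5):
    # Neighbor y contributes its best remaining-time estimate for dest (zero if y is dest).
    t_neighbor = 0.0 if y == dest else min(Q[y][(dest, z)] for z in NEIGHBORS[y])
    target = observed_delay + t_neighbor
    Q[x][(dest, y)] += lr * (target - Q[x][(dest, y)])

# Simulate deliveries from A to D with noisy per-hop delays.
for _ in range(200):
    node = "A"
    while node != "D":
        nxt = forward(node, "D")
        q_routing_update(node, "D", nxt, observed_delay=random.uniform(0.5, 1.5))
        node = nxt

print(forward("A", "D"), {k: round(v, 2) for k, v in Q["A"].items() if k[0] == "D"})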

Since the introduction of Q-Routing, several works have emerged that utilize alternative RL methods or study various network topologies and scopes of application. A comprehensive survey of these solutions w.r.t. all known network topologies, i.e., from static to dynamic vehicular and ad-hoc networks, is provided in [104]. However, it is clear from the survey that when it comes to implementation, the tremendous state and action space quickly becomes a hefty burden on learning and thus results in sub-optimal solutions. The application of DL to network traffic control and essentially routing optimization has been studied in [44, 105, 165]. In detail, Fadlullah et al. give a high-level overview of a supervised learning scheme with deep belief networks (DBNs) [26], which predicts on a per-node basis the next hop (i.e., router) to deliver to; the receiving node then does the same, and so forth [44]. In this approach, each node is solely responsible for outputting the next node, and the full delivery path is uncovered in a hop-by-hop manner. Interestingly enough, Mao et al. take a similar supervised approach with Deep Belief Architectures (DBAs) in their work [165]. Even though both approaches are novel and interesting first steps, they are applied in a constrained static setting, naturally raising the question of how supervised learning can be scaled and applied efficiently to dynamic and large network topologies.

Mao et al. bring RL back to the table in [165]. Initially, the authors evaluate a supervised learning scheme with varying DNN architectures. By observing past Demand Matrices (DMs), the neural network learns to predict the DM, which is then leveraged to calculate the optimal routing strategy for the next epoch. The evaluation results, however, show that supervised learning is not a suitable approach for dynamic settings. The authors therefore employ DRL and replace the prediction of DMs with learning a good mapping policy directly. The design also shifts to a more constrained setting, focusing on destination-based routing strategies. The agent's reward is now based on max-link-utilization, and the algorithm of choice is Trust Region Policy Optimization (TRPO) [136]. As the large action space can cripple the learning process, the number of output parameters is reduced by shifting learning to per-edge weights. While this DRL-based mechanism yields better results than the supervised solution, the improvement is not significant enough to alter the routing domain as it is.
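
The following sketch illustrates how a max-link-utilization reward could be computed once an agent has emitted per-edge weights. The topology, capacities, and demand matrix are made-up assumptions, and networkx shortest-path routing stands in for the destination-based routing used in [165]; this is an illustrative reward function, not the authors' implementation.

```python
# Sketch: turn per-edge weights (the agent's action) into routes and compute
# the reward as negative maximum link utilization. Topology, capacities, and
# the demand matrix below are illustrative only.
import networkx as nx

G = nx.DiGraph()
capacity = {}
for u, v, cap in [("s", "a", 10), ("a", "t", 10), ("s", "b", 5), ("b", "t", 5)]:
    G.add_edge(u, v)
    capacity[(u, v)] = cap

def reward_from_weights(weights, demands):
    """weights: {(u, v): w} emitted by the agent; demands: {(src, dst): volume}."""
    nx.set_edge_attributes(G, weights, "weight")
    load = {e: 0.0 for e in G.edges}
    for (src, dst), volume in demands.items():
        path = nx.shortest_path(G, src, dst, weight="weight")
        for u, v in zip(path, path[1:]):
            load[(u, v)] += volume
    max_util = max(load[e] / capacity[e] for e in G.edges)
    return -max_util  # the agent is rewarded for keeping the busiest link idle

print(reward_from_weights({("s", "a"): 1, ("a", "t"): 1, ("s", "b"): 1, ("b", "t"): 1},
                          {("s", "t"): 6}))
```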

To mitigate the risk of state space explosion and the significant overhead of globally updating a single agent, a distributed, multi-agent approach is applied in [191]. In particular, You et al. pick up where [15] left off. As a first step, they upgrade the original Q-Routing contribution by exchanging the Q-table with a DNN, namely deep Q-Routing (DQR), while leaving the remainder of the algorithm pristine. Although the algorithm itself remains largely unchanged, the authors differentiate their proposal by specifying a multi-agent learning environment. As such, every network node holds its own agent, and each agent is able to make local decisions deriving from an individual routing policy.
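
A compact sketch of this multi-agent arrangement is shown below: every node owns its own agent and picks the next hop from purely local estimates. For brevity a per-agent table stands in for the small DNN used in DQR; the topology, exploration rate, and class names are illustrative assumptions.

```python
# Sketch of a multi-agent setup in the spirit of deep Q-Routing [191]: each node
# owns its own agent and makes local next-hop decisions. A table stands in for
# the per-agent DNN here; names and the toy topology are illustrative only.
import random
from collections import defaultdict

class NodeAgent:
    def __init__(self, node, neighbors, epsilon=0.1):
        self.node = node
        self.neighbors = neighbors
        self.epsilon = epsilon
        self.q = defaultdict(float)   # q[(next_hop, destination)] -> estimated cost

    def act(self, destination):
        # Epsilon-greedy local decision, using only this node's own estimates.
        if random.random() < self.epsilon:
            return random.choice(self.neighbors)
        return min(self.neighbors, key=lambda y: self.q[(y, destination)])

topology = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
agents = {node: NodeAgent(node, nbrs) for node, nbrs in topology.items()}
next_hop = agents["A"].act(destination="D")   # decision made entirely at node A
```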

The limited success of the proposed ML and DL schemes in revolutionizing the routing domain led Varela et al. to argue that simply applying state-of-the-art algorithms and techniques is not sufficient when it comes to networking challenges [153]. Instead, Varela et al. shift their focus to feature engineering and further outline that a complete yet simple state representation might be the key to overcoming the hurdles [153]. Furthermore, they propose a DRL scheme that integrates telemetry information alongside path-level statistics to provide a more accurate representation for the purpose of learning [153]. Reportedly, their proposed scheme generalizes better to various network configurations.

Other recent works also base their ideas on feature engineering and argue that achieving generalization is the key to success [99, 134], especially when dealing with network dynamics. Rusek et al. take a supervised learning approach with GNNs, introducing RouteNet, which aims at generalizing over any given network topology by making meaningful predictions on performance metrics [134]. Using these predictions, it is able to select a suitable routing scheme that abides by the environment constraints. Meanwhile, DRL-R contributes a novel combinatorial model that integrates several network metrics to address shortcomings of existing DRL schemes [99].

7.3 Discussion on ML-based Approaches

While a lot of effort has been put into the fundamental challenge of routing optimization, we are confident to say that the task is far from complete and continues to remain an active research domain. Purely from an ML perspective, we can obtain several useful insights for aspiring scholars and researchers. For one, while routing optimization fits naturally to RL-based approaches, this does not imply that achieving optimal results is purely a matter of the learning paradigm; we saw several works attempting to leverage supervised learning techniques for that matter. On the other hand, taking into consideration that most DL approaches were initiated with supervised learning and then shifted to RL, evidence might suggest otherwise. Besides, we observe great potential in applying distributed, multi-agent, DRL-based approaches, as they can effectively mitigate the risk of state and action space explosion and improve the generalization properties of the learned algorithms. These traits make them a better fit to address routing problems in large and dynamic environments such as carrier networks.

8 CONGESTION CONTROL

Congestion control can be characterized as a remedy for crowded networks [66]. It concerns actions taken in response to network changes to avoid collisions and thus network collapse. Network changes in this domain typically refer to changes in the traffic pattern or configuration that result in packet losses. A typical action to avoid collapse is for the sender to decrease its sending rate, e.g., by decreasing its congestion window. TCP, the de facto network transport protocol that the Internet has relied on for decades, suffers from many limitations. With TCP being architecturally designed in the 1980s, it is natural that the original specification reflects network behaviors observed at the time [178]. It is also the case that emerging ad-hoc and wireless networks are hampered by the lack of flexibility inherent in TCP [9]. Moreover, it has been shown that the congestion control scheme of TCP is often the root cause of degraded performance. Interesting literature has displayed the symptoms of current congestion control mechanisms, such as bufferbloat [48] and data-center incast [20].
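
As a baseline for the learned schemes discussed below, the following is a simplified additive-increase/multiplicative-decrease (AIMD) loop in the spirit of classic TCP congestion avoidance; the constants and event trace are textbook-style assumptions, not tied to any particular implementation.

```python
# Simplified additive-increase/multiplicative-decrease (AIMD) loop in the
# spirit of classic TCP congestion avoidance; constants are textbook values.
cwnd = 10.0  # congestion window, in segments

def on_ack():
    # Additive increase: grow by roughly one segment per round-trip time.
    global cwnd
    cwnd += 1.0 / cwnd

def on_loss():
    # Multiplicative decrease: back off when congestion (packet loss) is detected.
    global cwnd
    cwnd = max(1.0, cwnd / 2.0)

for event in ["ack"] * 20 + ["loss"] + ["ack"] * 5:
    on_ack() if event == "ack" else on_loss()
print(round(cwnd, 2))
```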

Congestion control has drawn a lot of research attention in the past decades and is still a very active domain. Various techniques have been proposed, mainly relying on heuristics designed by human experts. In particular, the IETF has proposed a series of guidelines to aid network designers in mitigating TCP's innate drawbacks [75]. These mechanisms usually apply end-to-end techniques, altering TCP by tuning the congestion window as a means to achieve better performance. This is done based on a number of factors and often on simplifying or constraining assumptions about network conditions. Placing significant approaches in chronological order yields quite a long list of literature: Vegas [16], NewReno [61], FAST TCP [157], CUBIC [56], and BBR [17]. Other, more recent approaches focusing on subsets of congestion control such as short flows or data centers are found in [92, 119, 180, 193]. Extensions of congestion control to multipath scenarios, i.e., multipath TCP [46], have been explored in various studies, notably [77] and [127]. As it is not feasible to cite all the related work, we refer the readers to [88] for further discussion of congestion control in TCP and its many different variants.

In the congestion control scenario, ML makes it possible to set clear and direct optimization objectives in place of the rather implicit goals of the current setting. Additionally, with ML we can build online learning control algorithms that are able to adjust to constantly changing network conditions. To progress incrementally towards this goal, existing domain knowledge can be incorporated to arrive at better learning solutions.

To the best of our knowledge, Remy [179] is the first to utilize ML for learning a congestion control algorithm. Remy formulates an offline learning environment, in a decentralized manner, similar to the POMDPs we have already seen in Section 7.2. Agents sit on the endpoints and, at every time step, decide between sending and abstaining. Remy is trained under millions of sampled network configurations, through which it is able to arrive at an optimal control strategy within a few hours of training. RemyCC, the resulting control algorithm, is deployed on the current TCP stack, and the evaluation suggests that it is able to outperform several state-of-the-art solutions. However, Remy does not manage to escape the pitfall of underlying assumptions. The training samples that Remy builds upon place constraints on RemyCC due to the assumed network configurations under which they are sampled. This limits Remy's state-action space and can heavily affect performance when those conditions are not met.

In contrast to Remy, PCC Allegro [34] attempts to tackle the inherent limitations of predefined assumptions about the network by conducting micro-experiments. In each experiment, an appropriate control action is taken, based on which the learner optimizes towards a "high throughput, low loss" objective named utility. Allegro then learns empirically to make better decisions by adjusting towards control actions that yield higher utility. Following Remy's example, each sender in Allegro makes decisions locally based on the outcome of the micro-experiments [34]. The Allegro scheme does not make assumptions about network configurations, which translates to sharply greater performance in real-network scenarios. Despite the effort, Allegro's convergence speed and aggressiveness towards TCP make it prohibitive for deployment [35].
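
The sketch below illustrates the micro-experiment idea behind the PCC line of work (Allegro and, later, Vivace): probe two nearby sending rates, compare their measured utilities, and step toward the better one. The utility function, constants, and the pretend link saturation point are placeholders, not the exact utility defined in [34] or [35].

```python
# Sketch of PCC-style online rate control [34, 35]: probe two nearby sending
# rates, measure a utility for each, and step toward the better one. The
# utility below is a placeholder, not the exact PCC utility function.
def measured_utility(rate):
    # Stand-in for a micro-experiment: reward throughput, penalize loss.
    loss = max(0.0, (rate - 100.0) / rate)     # pretend the link saturates at 100
    return rate * (1.0 - loss) - 50.0 * loss

def rate_control_step(rate, epsilon=0.05, step=0.5):
    u_low = measured_utility(rate * (1 - epsilon))
    u_high = measured_utility(rate * (1 + epsilon))
    gradient = (u_high - u_low) / (2 * epsilon * rate)
    return max(1.0, rate + step * gradient)

rate = 40.0
for _ in range(30):
    rate = rate_control_step(rate)
print(round(rate, 1))   # the loop gradually climbs toward higher-utility rates
```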

In an attempt to eliminate Allegro's limitations, Dong et al. introduce Vivace [35]. Vivace replaces the two key components of Allegro: (1) the utility function and (2) the rate-control learning algorithm. For the utility function, Vivace integrates RTT estimations via linear regression to penalize high latency and loss [35]. Through this latency-aware utility, Dong et al. show that Vivace can achieve fairness while mitigating bufferbloat and remaining TCP-friendly. For the rate-control algorithm, Vivace employs gradient-ascent-based no-regret online optimization [199]. The no-regret property translates to a minimum performance guarantee. Further, Vivace's rate-control scheme enables faster convergence and subsequently faster reaction to network changes [35].

Pantheon was initiated as a playground with a focus on congestion control, where researchers can benchmark their proposals against other state-of-the-art literature and evaluate performance on shared metrics and measurements [186]. Pantheon leverages transparency by keeping a public record of results. Besides the shared platform for knowledge, the authors introduce Indigo, an offline neural-network-based approach to congestion control. Indigo utilizes an LSTM RNN [60] and trains it with an imitation learning algorithm called DAgger [133]. Indigo employs data generated from optimal solutions that display correct mappings from state to action. Based on this training, Indigo is able to adjust its congestion window once an ACK is received [186]. Indigo exhibits performance comparable to other schemes on Pantheon's platform.

Aurora employs DRL to extend Allegro and Vivace [71]. Similar to these solutions, it controls the sending rate per time step, but the learning scheme incorporates into its observed history both the latency gradient and latency ratio from [34, 35], respectively, as well as the sending ratio, specified as the ratio of packets sent to packets acknowledged by the receiver. The reward setting praises throughput and penalizes latency and packet loss, with packet loss referring to packets that have not been acknowledged. The RL agent is trained with PPO, an algorithm first introduced in Section 6.2. With a relatively simple neural network configuration, Aurora is able to generalize well to a good mapping of sending rates outside of its environment scope. This establishes Aurora as a robust solution that can be applied to dynamic networks exhibiting unpredictable traffic conditions. Despite this robustness, Aurora performs only comparably to other state-of-the-art schemes.
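
The following is a sketch of the kind of observation vector and reward described for Aurora: a short history of (latency gradient, latency ratio, sending ratio) statistics and a linear reward favoring throughput while penalizing latency and loss. The history length, padding values, and coefficients a, b, c are illustrative placeholders, not the values used in [71].

```python
# Sketch of an Aurora-style [71] observation and reward. The agent observes a
# short history of (latency gradient, latency ratio, sending ratio) and gets a
# reward favoring throughput and penalizing latency and loss. History length
# and coefficients are illustrative placeholders.
from collections import deque

HISTORY = 10
history = deque(maxlen=HISTORY)   # most recent monitor-interval statistics

def observe(latency_gradient, latency_ratio, sending_ratio):
    history.append((latency_gradient, latency_ratio, sending_ratio))
    # Flattened, fixed-length observation handed to the policy network.
    padded = list(history) + [(0.0, 1.0, 1.0)] * (HISTORY - len(history))
    return [x for triple in padded for x in triple]

def reward(throughput, latency, loss_rate, a=10.0, b=1000.0, c=2000.0):
    return a * throughput - b * latency - c * loss_rate

obs = observe(latency_gradient=0.01, latency_ratio=1.05, sending_ratio=1.02)
r = reward(throughput=0.8, latency=0.05, loss_rate=0.01)
```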

The most recent studies tackle issues related to generalization and convergence speed. Specifically, a practical, hybrid approach is proposed in [1], which combines classic congestion control techniques with DRL techniques, improving generalization toward unseen scenarios. The idea is to have two levels of control: fine-grained control using classic TCP algorithms, e.g., BBR, to adjust the congestion window, and hence the sending rate, of a user; and coarse-grained control using DRL to calculate and enforce a new congestion window periodically, observing environment statistics. The proposed solution therefore has more predictable performance and better convergence properties, showing how learning from an expert, e.g., the BBR algorithm, can improve performance in terms of convergence speed, adaptation to newly seen network conditions, and average throughput [41].
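
A minimal sketch of this two-level idea is shown below: a classic per-ACK rule performs fine-grained adjustment, while a (stubbed) learning agent periodically overrides the congestion window based on observed statistics. The stub agent, interval length, and update rules are assumptions made for illustration, not the design of [1].

```python
# Sketch of a two-level control loop in the spirit of [1]: a classic per-ACK
# rule does fine-grained adjustment, while a coarse-grained (stubbed) RL agent
# periodically overrides the congestion window. All constants are illustrative.
class StubAgent:
    def next_cwnd(self, stats):
        # Placeholder for a DRL policy: nudge cwnd based on observed loss.
        return stats["cwnd"] * (0.8 if stats["loss_rate"] > 0.01 else 1.2)

def run(acks_per_interval=50, intervals=4):
    cwnd, agent = 10.0, StubAgent()
    for _ in range(intervals):
        losses = 0
        for _ in range(acks_per_interval):        # fine-grained, classic control
            cwnd += 1.0 / cwnd                    # e.g., additive increase
        stats = {"cwnd": cwnd, "loss_rate": losses / acks_per_interval}
        cwnd = agent.next_cwnd(stats)             # coarse-grained, learned control
    return cwnd

print(round(run(), 1))
```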

Applying DRL to multipath scenarios is also getting a boost. Notably, a centralized solution is proposed in [184], with a single agent trained to perform congestion control for all the MPTCP flows in the network. Such centralized solutions, however, are not scalable, as they require a global view of all available resources and active MPTCP flows in the network. A distributed solution is proposed in [95], where multiple MPTCP agents, each running at a sender node, learn a set of congestion rules that enable them to take appropriate actions by observing the environment. The learning is performed in an asynchronous manner, where each node requires only local environment and state information. Further, as the state is defined in a continuous, high-dimensional space, tile coding methods are applied to discretize the state dimensions, addressing the scalability issue. The proposed solution, however, relies on offline learning and hence has limited generalization capabilities. In contrast, an online convex optimization is explored in [50], which extends PCC to multipath settings, showing through theoretical analysis and experimental evaluation that the proposed online-learning solution is scalable, can significantly outperform traditional solutions, and better adjusts to changes in the network conditions. However, no comparison among these DRL methods is provided.
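
For readers unfamiliar with tile coding, the following is a minimal sketch of the technique: several slightly offset coarse grids (tilings) each map a continuous state to one active tile, which together give a finer joint resolution than any single grid. The state features, number of tilings, and grid sizes are illustrative assumptions.

```python
# Minimal tile-coding sketch: a continuous state (e.g., normalized RTT and
# throughput) is mapped to one active tile per tiling; the offsets between
# tilings give a finer joint resolution than any single coarse grid.
def tile_indices(state, n_tilings=4, tiles_per_dim=8):
    indices = []
    for t in range(n_tilings):
        offset = t / (n_tilings * tiles_per_dim)      # shift each tiling slightly
        coords = tuple(min(tiles_per_dim - 1, int((s + offset) * tiles_per_dim))
                       for s in state)
        indices.append((t, coords))                    # one active tile per tiling
    return indices

# Example: a two-dimensional state with both features normalized to [0, 1).
print(tile_indices(state=(0.37, 0.82)))
```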

Finally, DeePCCI [135] proposes a novel classification scheme for identifying congestion control variants using DL. The authors solely regard the packet arrival times of a flow as their input, arguing that congestion control is strongly associated with packet timings. Additionally, fewer features directly translate to easier adaptation and applicability to other congestion control schemes. The classifier is trained and evaluated against CUBIC, RENO, and BBR with generated labeled data, as necessitated for supervised learning.

8.3 Discussion on ML-based Approaches

As one of the oldest and most established areas in networking, congestion control has attracted rich research attention in both the traditional and ML-based domains, with many articles published recently that adopt DL techniques. These studies show the capability of DL techniques, and specifically DRL techniques, to overcome the limitations of traditional solutions. By learning adaptive mechanisms, these ML-based solutions are able to adjust to constantly changing network conditions and hence better utilize the resources. The majority of these solutions, however, focus on centralized or offline learning, with only very few tackling online learning in a distributed setting, which is the case for TCP. More work should be done in this direction. Besides, most of these studies focus on proposing solutions that outperform current mechanisms, without really changing the objectives and the goals.

9 ADAPTIVE VIDEO STREAMING

As multimedia services such as video-on-demand and live streaming have witnessed tremendous growth in the past two decades, video delivery nowadays holds a dominant share of overall network traffic on the Internet [170]. Current video streaming services are mostly based on ABR, which splits a video into small chunks (a few seconds long) that are pre-encoded at various bitrates and streams each chunk with a suitable bitrate based on the real-time network condition. ABR is behind many mainstream HTTP-based video streaming protocols, including Microsoft Smooth Streaming (MSS), Apple's HTTP Live Streaming (HLS), and more recently Dynamic Adaptive Streaming over HTTP (DASH), standardized by MPEG [143].

An overview of adaptive video streaming is depicted in Figure 4. The bitrate selection for each chunk is dictated by an ABR algorithm running on the client side, which takes real-time throughput estimations and/or local buffer occupancy as input and optimizes for the quality of experience, defined as a combination of metrics including the re-buffering ratio (the percentage of time the video playback is stalled because of a drained buffer), average bitrate, bitrate variability (to improve playback smoothness), and sometimes also the startup delay (the time between the user clicking and playback starting). Considering that the network status suffers from high dynamics, designing a good ABR algorithm is non-trivial. This has been confirmed in an early measurement study which shows significant inefficiencies of commercial and open-source ABR algorithms [3]. As a result, many new ideas have been explored to improve adaptive video streaming. Here, we focus on the advancements in client-side ABR algorithms.
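
To make the optimization target concrete, the following is one common linear QoE formulation combining bitrate utility, rebuffering time, and bitrate switches; variations of it appear throughout the ABR literature. The weights mu and tau and the bitrate utility q() are placeholders, as papers differ in the exact values and shapes used.

```python
# One common linear QoE formulation for a session of N chunks, combining
# bitrate utility, rebuffering time, and bitrate switches. The weights mu and
# tau are placeholders; papers differ in the exact values and utility q().
def qoe(bitrates_kbps, rebuffer_seconds, mu=4.3, tau=1.0):
    q = lambda r: r / 1000.0                      # simple linear bitrate utility
    quality = sum(q(r) for r in bitrates_kbps)
    rebuffer_penalty = mu * sum(rebuffer_seconds)
    smoothness_penalty = tau * sum(abs(q(b) - q(a))
                                   for a, b in zip(bitrates_kbps, bitrates_kbps[1:]))
    return quality - rebuffer_penalty - smoothness_penalty

# Example: three chunks at 1500, 3000, and 3000 kbps with one short stall.
print(round(qoe([1500, 3000, 3000], [0.0, 0.5, 0.0]), 2))
```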

9.1 Traditional Approaches and Limitations

Early ABR algorithms can be generally categorized into two families: rate-based and buffer-based (including hybrid ones). Rate-based ABR algorithms typically rely on estimating network throughput based on past chunk download information [72, 96, 171, 200]. The ABR algorithm then selects the highest possible bitrate that can be supported by the predicted network throughput. To reduce prediction variability, ABR algorithms usually smooth out the predictions. For example, FESTIVE uses the experienced throughputs of the past 20 samples to predict the throughput for the next chunk and adopts the harmonic mean to reduce the bias of outliers [72]. Noticing that the measured TCP throughput may not reflect precisely the available real network throughput, PANDA proposes a "probe and adapt" method, similar to TCP's congestion control but at the chunk granularity, to stress-test the real network throughput [96]. SQUAD takes running estimates of the network throughput that acknowledge the impact of the underlying TCP control loop and takes the time scale into account to improve smoothness and reliability; it then uses a spectrum-based adaptation algorithm for the bitrate selection [171].
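
A worked example of a FESTIVE-style harmonic-mean estimate follows; the sample throughput values are made up. The harmonic mean damps the effect of occasional throughput spikes more strongly than the arithmetic mean would.

```python
# FESTIVE-style throughput estimate [72]: harmonic mean of the last 20 observed
# per-chunk throughputs. The harmonic mean damps occasional outlier spikes more
# than the arithmetic mean does. Sample values below are made up.
def harmonic_mean_estimate(samples_kbps, window=20):
    recent = samples_kbps[-window:]
    return len(recent) / sum(1.0 / s for s in recent)

samples = [2500, 2400, 2600, 9000, 2500]        # one transient spike
print(round(harmonic_mean_estimate(samples)))   # ~2920 kbps, vs 3800 arithmetic
```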

Buffer-based ABR algorithms leverage the buffer occupancy as an implicit feedback signal for bitrate adaptation [62, 160], often also in combination with throughput prediction [189]. Huang et al. advocate for a pure buffer-based design, which incorporates bandwidth estimation only when needed (e.g., during the session startup). They model the dynamic relationship between the buffer occupancy and bitrate selection and propose a buffer-based ABR algorithm called BBA [62]. Similarly, BOLA is solely buffer-based: the bitrate adaptation problem is formulated as a utility maximization problem and solved by an online control algorithm based on Lyapunov optimization techniques [145]. Model predictive control (MPC) is a hybrid approach that integrates both the throughput and the buffer occupancy signals [50]. More specifically, MPC models bitrate selection as a stochastic optimal control problem with a moving look-ahead horizon and leverages MPC to perform bitrate selection. To reduce the high computational cost, FastMPC proposes to use a table enumeration approach instead of solving a complex optimization problem as in MPC. ABMA+ pre-computes a buffer map, which defines the capacity of the playout buffer required under a given segment download condition to meet a predefined rebuffering threshold, and uses the map to make bitrate adaptation decisions [10].
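
The sketch below illustrates a BBA-style buffer-to-bitrate mapping: below a reservoir the lowest bitrate is used, above the reservoir plus a cushion the highest, and in between the bitrate grows with buffer occupancy. The bitrate ladder, thresholds, and the linear interpolation are illustrative assumptions rather than the exact mapping used in [62].

```python
# Sketch of a BBA-style [62] buffer-to-bitrate map: play the lowest bitrate
# below a reservoir, the highest above reservoir + cushion, and interpolate
# in between. The ladder and thresholds are illustrative only.
LADDER_KBPS = [300, 750, 1200, 1850, 2850, 4300]
RESERVOIR_S, CUSHION_S = 5.0, 10.0

def select_bitrate(buffer_seconds):
    if buffer_seconds <= RESERVOIR_S:
        return LADDER_KBPS[0]
    if buffer_seconds >= RESERVOIR_S + CUSHION_S:
        return LADDER_KBPS[-1]
    fraction = (buffer_seconds - RESERVOIR_S) / CUSHION_S
    index = int(fraction * (len(LADDER_KBPS) - 1))
    return LADDER_KBPS[index]

print([select_bitrate(b) for b in (2.0, 7.0, 12.0, 16.0)])
```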

Limitations of traditional approaches: ABR algorithms typically rely on accurate bandwidth estimation, which is hard to achieve with simple heuristics or statistical methods. Also, ABR algorithms make adaptation decisions based on the bandwidth estimation, with a complex relationship between the two, rendering simple heuristic approaches that aim to generalize to all scenarios ineffective.

As discussed above, ABR algorithms typically involve bandwidth estimation, complex control logic, or both, and we can leverage existing ML methods for each: the bandwidth estimation problem can be treated as a general regression problem, while the control problem can be treated as a decision-making problem. In fact, existing work on applying ML in adaptive video streaming can be generally divided along these two lines. We will discuss these two lines separately.

The first research line aims at achieving better accuracy in bandwidth estimation. Bandwidth estimation is a general and well-studied problem that is present in many Internet applications [68, 69]. Existing ML-based approaches for bandwidth estimation mainly focus on using methods like the Kalman filter [40] or neural networks [43, 78] to perform the prediction. In the context of adaptive video streaming, CS2P aims at improving bitrate adaptation by adopting data-driven approaches to throughput prediction [154]. The authors make two important observations: (1) there are similarities in the throughput pattern across video streaming sessions; (2) the throughput variability within a video streaming session exhibits a stateful nature. Based on the first observation, the authors cluster similar sessions and use the clustering result to predict the initial throughput of a session. Using the second observation, they propose a Hidden-Markov-Model-based method that exploits the stateful nature of throughput variability to predict the throughput.

The second line of research focuses on leveraging ML techniques, RL in particular, for adaptation decision making. By penalizing the detrimental factors that negatively affect the optimization objectives, RL can learn from experience an optimal strategy for bitrate selection and replace existing heuristic-based schemes that struggle to generalize to dynamic networks. RL can exploit the low-level system signals that are essential for ABR algorithm design but are hard to model and account for due to their inherent complexity. Besides, RL-based approaches are typically more flexible and can generalize to different network conditions.

Claeys et al. try to replace the heuristics used in the HTTP adaptive streaming client with an adaptive Q-learning-based algorithm [25]. Q-learning is a model-free RL algorithm that can make bitrate adaptation decisions by calculating the Q-value representing the quality (i.e., QoE) of decisions under varying network conditions. While showing promising results, this initial attempt is bounded by an explosive state-action space that slows convergence, resulting in slow responses to network variations. In a follow-up work [24], the authors apply several optimizations using a variant of Frequency Adjusted (FA) Q-learning. The new method alters the Q-value calculation, leading to quicker adaptation to network fluctuations. In addition, the authors significantly reduce the environment state parameters and manage to achieve faster convergence.
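
The following is a minimal tabular Q-learning update for bitrate adaptation with a coarsely discretized (buffer, throughput) state. The discretization, bitrate ladder, reward, and hyperparameters are placeholders, far simpler than the formulations used in [24, 25].

```python
# Minimal tabular Q-learning update for bitrate adaptation. The coarse state
# discretization and the reward are placeholders, much simpler than the
# formulations used in [24, 25].
import random
from collections import defaultdict

BITRATES = [300, 750, 1200, 1850, 2850, 4300]          # kbps
Q = defaultdict(float)                                  # Q[(state, action)]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1

def discretize(buffer_s, throughput_kbps):
    return (int(buffer_s // 5), int(throughput_kbps // 1000))

def choose_action(state):
    if random.random() < EPSILON:
        return random.randrange(len(BITRATES))
    return max(range(len(BITRATES)), key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state):
    best_next = max(Q[(next_state, a)] for a in range(len(BITRATES)))
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])

# One illustrative step: high buffer and throughput, a mid bitrate was chosen.
s = discretize(buffer_s=12.0, throughput_kbps=3200)
a = choose_action(s)
update(s, a, reward=BITRATES[a] / 1000.0, next_state=discretize(11.0, 3100))
```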

Several studies formulate the bitrate adaptation problem as a Markov Decision Process (MDP) with a long-term reward defined as a combination of video quality, quality fluctuations, and re-buffering events [21, 47, 107, 198]. As one example, mDASH proposes a greedy algorithm to solve the MDP, which results in suboptimal adaptation decisions but is efficient and lightweight [198]. Chiariotti et al. [21] propose a learning-based approach leveraging RL to solve the MDP. To boost learning speed, the proposed approach utilizes Post-Decision States (PDSs) [111]. In combination with off-policy learning and a softmax policy, the proposed approach is able to converge fast enough to react to network changes.

To combat the innate limitations of Q-learning, D-DASH leverages DL: instead of enumerating all the Q-values, a DNN is used to approximate them [47]. Trading off performance against convergence speed, D-DASH employs four variations of DNN architectures and is tested in several environments. Comparable to the state of the art and fairly fast, D-DASH sets the pace for more DL approaches in upcoming research. Another DL-based approach is Pensieve [107], which differentiates itself by utilizing a DNN trained with the state-of-the-art A3C algorithm [120]. Pensieve is trained offline in a multi-agent setting that speeds up learning on a plethora of network traces. Besides evaluations in a simulated environment, Pensieve is also tested under real network conditions. Pensieve is able to outperform baseline ABR algorithms on shared QoE metrics. Moreover, Pensieve's ability to incorporate throughput history is a key aspect.

The above list of ABR algorithms is far from complete; for a comprehensive survey of existing ABR algorithms please refer to [11]. Apart from designing new ABR algorithms, researchers have also explored how to tune a given ABR algorithm under various network conditions. For example, Oboe is such a method that auto-tunes ABR algorithms according to the stationarity of the network throughput [4]. In [168], the authors propose to leverage RL to configure the parameters of existing ABR algorithms, achieving better bandwidth awareness.

9.3 Discussion on ML-based Approaches

Adaptive video streaming is a well-formulated problem that has attracted tremendous research efforts in the past decade. Both traditional approaches based on heuristics or control theory and ML-based approaches have been explored. It is evident that DRL-based approaches have great potential for bitrate adaptation due to their innate advantage of capturing complex variability in network bandwidth [107]. Choosing the right data set and allowing for enough training seem critical to the performance of DRL-based approaches, and it is unclear if solutions like Pensieve can be generalized to arbitrary network environments with varying conditions. Overall, DRL-based approaches are more capable of capturing the network dynamics and the complex relationship between bandwidth estimation and adaptation decisions than traditional heuristics-based approaches. Yet, DRL-based approaches require a lot of data and resources to train an accurate model, which can be an intimidating factor for many video streaming service providers. In such cases, traditional approaches based on control theory might be a better option.

10 DISCUSSION AND FUTURE DIRECTIONS

Despite all the praise and the recent remarkable publications regarding the application of ML in computer systems, ML is not a panacea and should not be treated as such. There are still challenges that lie ahead before deploying learned systems in real-world scenarios. So far, we have witnessed mostly the beneficial progress of the reviewed applications, and although this is sufficient to intrigue researchers to investigate further, we also ought to raise awareness when it comes to the integration of ML techniques into complex systems. That being said, in this section we aim at outlining and discussing currently known limitations in the literature we have reviewed, while offering approaches that might assist future research.

Explainability. Some recent studies [32, 197] try to shed light on the limitations of ML systems. Interestingly, both address ML-based solutions as "black-box approaches" and focus on the lack of interpretability of ML models. Especially in DNNs with multiple hidden layers, we cannot sufficiently capture, or better yet rationalize, the logic behind the decision-making process of these complex models and architectures. As further discussed in [197], this leads to several ambiguities and trust issues, especially when it comes to debugging such complex structures. On the other hand, simpler and more intuitive models suffer from deteriorated performance and poor generalization [32]. Moreover, DNNs are subject to unreliable predictions when the input does not match the expectations assumed during training [197]. Specific research questions to explore include: (1) How can we come up with clear guidelines (e.g., for determining DNN architectures and hyper-parameters) for the design of ML-based solutions? (2) How can we open up the black box of DNNs and incorporate domain expertise in the decision-making process?

Training overhead. Besides explainability, training times and the associated costs are another significant drawback. For instance, as we have seen in [188], where each video is trained separately for efficient delivery, approximately ten minutes of training are required per minute of video, or, in terms of cost, 0.23 dollars. Putting it into perspective, consider the well-known YouTube platform. YouTube, the de facto platform for video streaming, is estimated to have around 500 hours of video uploaded every minute [150]. Given such training times, it would be impossible to handle the training for all videos at such tremendous growth. In terms of cost efficiency, consider that only a relatively small fraction of the total uploads will convert into revenue; hence, we can deduce that supporting such a large-scale content-aware delivery system is practically impossible. The following research questions would be interesting to explore: (1) Can we train DNNs with just a small amount of data? (2) How can transfer learning be applied to reuse DNNs in scenarios that are similar in structure but different in detail?

Lack of training data. Closely related to training time is the concern of training data. Because sufficient and effective training necessitates large volumes of data, many of the existing studies rely on generated samples or existing datasets to train their models. For instance, both Wrangler [185] and CODA [196] train on datasets composed of traces from production-level clusters. On the other hand, MSCN [79] generates training data from sampled queries based on “schema and data information”. This results in two major concerns. First, it has to be ensured that training samples are sufficient and cover the whole problem space [76]; otherwise, we might end up with a solution that does not generalize well to real-world scenarios. Second, as mentioned in [197], adversarial inputs can be the root cause of degraded performance. The following research questions need to be answered: (1) Can we build a common training ground for DNN training without leaking sensitive data to the public? (2) How to achieve scalable incremental training so the system can learn on the fly over time?

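As a toy illustration of research question (2), the sketch below keeps updating a model with small batches of freshly arriving traces via incremental (online) training rather than retraining from scratch; the synthetic data stream and the straggler-style labels are placeholders, not the actual Wrangler or CODA setups.

```python
# Hedged sketch: incremental training on a stream of freshly collected traces.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(random_state=0)
classes = np.array([0, 1])  # e.g., "normal" vs. "straggler" task (illustrative labels)

for day in range(7):  # pretend one mini-batch of fresh traces arrives per day
    X_batch = rng.normal(size=(256, 16))                  # placeholder trace features
    y_batch = (X_batch[:, 0] + 0.1 * rng.normal(size=256) > 0).astype(int)
    model.partial_fit(X_batch, y_batch, classes=classes)  # updates weights in place
    acc = model.score(X_batch, y_batch)
    print(f"day {day}: batch accuracy after update = {acc:.2f}")
```
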
Energy efficiency. Training a large DNN can also have substantial environmental impacts. A recent study [152] shows that the carbon emission of training a BERT model on current GPUs is roughly equivalent to that of a trans-American flight. To reduce the carbon emission of such training, one can improve the hardware design or the algorithm so as to reduce complexity or training time. Harvesting green energy resources, such as solar cells or wind turbines, for training is another way to make training greener and hence more sustainable. However, this requires advances in distributed learning technologies, where the training of a large DNN can be distributed across the network, closer to the edges where green energy resources are available. Despite some advances in the field of distributed learning, e.g., federated learning (FL) [113], its green-learning aspect has not yet been covered. We identify the following specific research questions: (1) How to design more energy-efficient ML methods (for both training and inference)? (2) How to use distributed learning techniques to avoid concentrating bulk energy consumption in central locations?

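As a toy illustration of the distributed-learning direction, the sketch below implements federated averaging (in the spirit of FedAvg [113]) in plain NumPy, so that client data never leave their (possibly green-powered) location; all sizes, data, and hyper-parameters are invented and say nothing about actual energy savings.

```python
# Hedged sketch of federated averaging over a toy linear-regression task.
import numpy as np

rng = np.random.default_rng(0)
true_w = rng.normal(size=4)

def local_sgd(w, X, y, lr=0.05, epochs=5):
    """A few steps of local least-squares SGD on one client's private data."""
    w = w.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Each client holds its own data locally; only model weights travel.
clients = []
for _ in range(5):
    X = rng.normal(size=(100, 4))
    clients.append((X, X @ true_w + 0.1 * rng.normal(size=100)))

global_w = np.zeros(4)
for rnd in range(20):
    local_ws = [local_sgd(global_w, X, y) for X, y in clients]
    global_w = np.mean(local_ws, axis=0)   # FedAvg: average the client updates

print("recovered weights:", np.round(global_w, 2))
print("true weights     :", np.round(true_w, 2))
```
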
Real-time performance. A final implication of ML approaches is inference time. Real-time applications require fast inference from ML models, as in the case of autonomous driving, where hardware accelerators are employed so the system can make safety-critical decisions locally [98]. This hurdle also constitutes the main idea and contribution of [176], where supervised learning is applied to enhance vision analytics tasks in cloud operations. Extending to other domains where speed is crucial, e.g., routing and congestion control, it becomes clear how slow model inference, especially with DNNs, can significantly reduce the performance gains. The result can be an overall performance merely comparable with traditional solutions, in which case an operator will prefer the traditional solution for its simplicity and better understanding [197]. Specific research questions include: (1) How to make ML-based methods lightweight when applied on the critical path of computer systems and networks? (2) Can we use a lightweight method on the critical path while leveraging ML-based methods on the non-critical path?

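A minimal sketch of research question (2), under our own assumptions: the critical path executes only a cheap rule on every request, while an expensive model, represented here by a placeholder function, refreshes that rule off the critical path; the timings, thresholds, and naming are invented for illustration.

```python
# Hedged sketch: cheap rule on the critical path, ML refresh off the critical path.
import time

def expensive_ml_inference(history):
    """Placeholder for a slow DNN: derives a new decision threshold from history."""
    time.sleep(0.05)                       # pretend this is a costly forward pass
    recent = history[-100:]
    return sum(recent) / max(len(recent), 1)

class FastPath:
    def __init__(self):
        self.threshold = 0.5               # lightweight rule used on every request
        self.history = []

    def decide(self, signal):
        """Critical path: a single comparison, no ML inference here."""
        self.history.append(signal)
        return "aggressive" if signal > self.threshold else "conservative"

    def refresh(self):
        """Non-critical path: run the ML model occasionally, e.g., in the background."""
        self.threshold = expensive_ml_inference(self.history)

fp = FastPath()
for i in range(1, 1001):
    fp.decide(i % 7 / 7)                   # per-request decision stays cheap
    if i % 250 == 0:                       # refresh the rule out of band, every so often
        fp.refresh()
print(f"learned threshold after refreshes: {fp.threshold:.3f}")
```
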
Taking all the aforementioned limitations into account, it is safe to suggest that ML for computer systems and networking is still at an infant stage. Even though we have witnessed remarkable progress, we need to tread lightly when deploying systems whose operation depends heavily on ML approaches, since we might end up with a solution that is worse than the one we are seeking to replace. While it is unclear how to integrate existing domain knowledge [32], it is argued that doing so is the key to overcoming current drawbacks [197]; the authors suggest that common limitations, in particular transferability, robustness, and training overhead, would thereby be essentially mitigated. Furthermore, recent work by Kazak et al. proposes Verily, a novel system to verify DRL systems [76]. It is a first step towards formal verification of DRL models, and the authors' contribution aims at verifying that learned systems deliver what they advocate. Verily has already been evaluated on Pensieve [107], DeepRM [106], and Aurora [71], and constitutes a first step towards alleviating current limitations.

11 SUMMARY

In this survey, we summarize research on ML for computer systems and networking. Whether it concerns achievements or limitations, we attempt to present an overall picture that exhibits current progress, shows how ML can blend into various contexts and settings, and, above all, motivates researchers to conceptualize and utilize ML in their field of research. We first formulate a taxonomy divided into distinct areas and sub-areas of expertise, in a quest to familiarize the reader with the greater picture. We then discuss each sub-domain separately: we give a short description of the problem, present the traditional approaches to date, and discuss their limitations. Next, we present the state of the art of ML-based approaches and analyze the significance of their results, their limitations, and how far we are from an actual implementation in real systems. We conclude the survey by discussing future directions to explore.
