Extensible Access Control for Operating Systems

Access control is the cornerstone of operating system security, and current operating systems deploy many access control models: Unix and Windows NT multi-user security; type enforcement in SELinux; anti-malware products; the application sandboxes in Apple OS X, Apple iOS, and Google Android; and application-oriented systems such as Capsicum in FreeBSD. This diversity is the result of a remarkable shift.

At the heart of this diversity is security localization: adapting the operating system security model to site-local or product-specific requirements. The shift is driven by three changes: ubiquitous Internet connectivity; the migration from dedicated embedded operating systems to general-purpose ones in search of more sophisticated software stacks; and the move from multi-user computing to single-user devices with complex application models. It is supported by extensible access control frameworks, which allow operating system kernels to adapt more easily to new security requirements.

One such extensible kernel access control framework is TrustedBSD's Mandatory Access Control (MAC) framework, whose development began in 2000 and which has shipped in the open source FreeBSD operating system since 2003.

1. Extensible access control in operating system design

Embedded and mobile operating systems have changed dramatically over the past 20 years: devices now have the CPU power to run a general-purpose operating system, and they operate in networked environments, with mature software stacks and third-party applications, where they are exposed to malicious activity. Vendors build on existing open source operating systems rather than starting from scratch. This provides mature application frameworks and complex networking stacks, both areas of weakness for traditional embedded operating systems.

This trend materialized in 2007, when Google's Android, based on Linux, and Apple's iOS, based in part on Mach and FreeBSD, transformed the smartphone market.

All of these environments focus on security and reliability. As third-party applications are deployed across these systems, sandboxing becomes critical, first to prevent bricking and second to limit malware. Consequently, the role of operating system security has shifted from protecting multiple users from one another to protecting an individual operator or user from untrusted applications. Embedded devices, mobile phones, and tablets are now meeting points: the interests of many different stakeholders, such as consumers, phone providers, application authors, and online services, must be mediated with the help of operating systems designed for a different place and time.

Operating system developers must meet device vendors' demands, which range from hardening routers and firewalls to sandboxing mobile apps. Operating system vendors have also observed the fate of the historical "trusted operating systems", whose mandatory access control schemes suffered from poor usability, performance, and maintainability and, most importantly, from a lack of end-user demand. Likewise, the viability of many promising new security models is uncertain, suggesting that no single access control model can meet all needs.

This practical reality of security localization directly motivates extensible access control. The need for a reference monitor, that is, a self-contained, non-bypassable, verifiable point of centralized access control, has been clear for more than 20 years. In the early 1990s, this concept was combined with encapsulation, appearing in the Generalized Framework for Access Control (GFAC), Rule Set Based Access Control (RSBAC), and the Flask security architecture. It was not until the early 2000s that mainstream OS vendors adopted these approaches, for example in FreeBSD's MAC framework and Linux Security Modules (LSM). In these cases, a key concern is supporting third-party security models without committing to a fixed policy, as earlier trusted systems did.

2. MAC framework

Proposed in 1999, the MAC framework appeared in FreeBSD 5.0 in 2003 as an "experimental feature", excluded from compilation by default, but available to early adopters. FreeBSD 8.0 in 2009 compiled the framework into the default kernel as a "production feature".

The MAC framework provides a logical solution for extending the kernel's access control. An extension infrastructure can represent many different policies, offers better maintainability, and is supported by the operating system vendor. Like device drivers and virtual file system (VFS) modules, policies are compiled into the kernel or into loadable modules and implement well-defined kernel programming interfaces (KPIs). Policies can augment access control decisions and rely on common infrastructure, such as object labeling, to avoid direct kernel modification and code duplication. They can enforce access control on a wide range of object types, from files to network interfaces, and they integrate with the kernel's concurrency model.

2.1 Mandatory Policy

MAC is a security model in which policy enforcement limits the interaction of all system users. Unlike discretionary access control schemes, such as filesystem access control lists (ACLs), which allow object owners to secure (or share) objects themselves, MAC enforces system-wide security invariants regardless of user preferences.

Early mandatory policies focused on information flow, which requires enforcement in the kernel. Multi-level security (MLS) protects confidentiality by labeling users and data with sensitivity levels and restricting flow between them. The Biba integrity policy is the logical dual of MLS and protects integrity. These models maintain security labels on subjects and objects, holding confidentiality or integrity information, and control actions that could cause information to be upgraded or downgraded. SRI's Provably Secure Operating System (PSOS) design included object type enforcement, complementing capability protection. This evolved into type enforcement (TE) and domain and type enforcement (DTE); both models were influential, and type enforcement was deployed in SELinux and McAfee's firewalls. Both are flexible and fine-grained, labeling subjects and objects with symbolic domains and types. Administrator-controlled rules authorize interactions and transitions between domains. In addition, there is a class of product-specific hardening policies that take a less principled approach and control services directly rather than through an abstract model.
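The duality between MLS and Biba can be made concrete with a minimal sketch. It assumes a label that is a single ordered integer level (real MLS labels also carry compartments), and the names `level_label`, `dominates`, and the four check functions are hypothetical, not taken from any real policy module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical label: one ordered level (compartments are omitted). */
struct level_label {
    int level;
};

/* True when label a is at least as high as label b. */
bool dominates(const struct level_label *a, const struct level_label *b)
{
    assert(a != NULL && b != NULL);
    return a->level >= b->level;
}

/* MLS protects confidentiality: no read up, no write down. */
bool mls_can_read(const struct level_label *subj, const struct level_label *obj)
{
    return dominates(subj, obj);
}

bool mls_can_write(const struct level_label *subj, const struct level_label *obj)
{
    return dominates(obj, subj);
}

/* Biba protects integrity and is the dual: no read down, no write up. */
bool biba_can_read(const struct level_label *subj, const struct level_label *obj)
{
    return dominates(obj, subj);
}

bool biba_can_write(const struct level_label *subj, const struct level_label *obj)
{
    return dominates(subj, obj);
}
```

The two policies use the same dominance relation with the arguments swapped, which is why both can share one labeling infrastructure.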

Before extensible access control, two techniques were in common use:

Direct kernel modification was used on most trusted systems, whether the extension came from the operating system vendor or from a third party. Tracking upstream OS development is problematic: extensions cannot rely on public, and therefore more stable, application programming interfaces (APIs) and kernel programming interfaces (KPIs), let alone application binary interfaces (ABIs) and kernel binary interfaces (KBIs). Upstream changes often trigger design and source code conflicts with security extensions. Assurance also suffers, since the burden of proving correctness falls squarely on the extension author.
System call interposition is widely used in antivirus systems and has been used in security extension products and research systems in the past. Kernel concurrency is a particular challenge: exploitable race conditions readily arise between the wrapper and the kernel.
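The race between a wrapper and the kernel can be illustrated with a single-threaded simulation. This is only a sketch: `user_buffer`, `wrapper_allows`, `racy_open`, and `swap_path` are hypothetical names, and the `race_hook` callback stands in for a concurrent thread that rewrites user memory between the wrapper's check and the kernel's own read.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define PATH_MAX_LEN 64

/* Stand-in for a pathname buffer in user memory that another thread can rewrite. */
struct user_buffer {
    char path[PATH_MAX_LEN];
};

/* Hypothetical wrapper policy: only paths under /tmp/ are allowed. */
bool wrapper_allows(const char *path)
{
    return strncmp(path, "/tmp/", 5) == 0;
}

/*
 * Racy interposition: the wrapper validates the user buffer, then the
 * "kernel" reads the same buffer again. race_hook models a concurrent
 * thread running between the two reads; pass NULL for no interference.
 */
bool racy_open(struct user_buffer *ub, void (*race_hook)(struct user_buffer *),
    char *opened, size_t opened_len)
{
    assert(ub != NULL && opened != NULL && opened_len > 0);
    if (!wrapper_allows(ub->path))              /* time of check */
        return false;
    if (race_hook != NULL)
        race_hook(ub);                          /* user memory changes here */
    strncpy(opened, ub->path, opened_len - 1);  /* time of use */
    opened[opened_len - 1] = '\0';
    return true;
}

/* Simulated attacker: swaps in a path the wrapper would have refused. */
void swap_path(struct user_buffer *ub)
{
    strcpy(ub->path, "/etc/passwd");
}
```

Calling `racy_open` with `swap_path` as the hook succeeds even though the path finally used would have failed the check, which is the class of bug that checks placed inside the kernel, under the kernel's own locks, are meant to avoid.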

2.2 Design principles of the MAC framework

The MAC framework's dual goals, supporting extensible access control and encouraging participation by upstream and downstream vendors, motivated several design principles:

Do not commit to a specific access control policy. There is no consensus on a single policy or even a policy language; instead, policy models are captured in C code.

Avoid policy-specific kernel intrusions. Encapsulate internals behind policy-agnostic interfaces. This leads naturally to an object-based design, in which access control checks are expressed in terms of subjects, objects, and methods.

Provide policy-agnostic infrastructure. This addresses common requirements beyond access control, such as labeling and tracing.

Support multiple simultaneously loaded policies. In this way different aspects of policy can be expressed independently, possibly by different vendors. Composition must be predictable, deterministic, and ideally sensible.

Impose structure that supports assurance arguments. This is achieved through a reference monitor that separates policy from mechanism, and through well-defined KPI semantics (e.g. locking).

Design for a concurrent kernel. Policies must not only behave correctly, but also scale as well as the functionality they protect.
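Predictable, deterministic composition can be sketched as a fold over the results returned by each loaded policy. The precedence order below is an assumption chosen for illustration (an "object not visible" error outranks "access denied", which outranks "not privileged"), and `compose_errors` and `compose_all` are hypothetical names rather than the framework's own functions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Combine two policy results. Assumed precedence, highest first:
 * ENOENT, EACCES, EPERM, any other error, then success (0).
 */
int compose_errors(int e1, int e2)
{
    static const int precedence[] = { ENOENT, EACCES, EPERM };
    size_t i;

    for (i = 0; i < sizeof(precedence) / sizeof(precedence[0]); i++) {
        if (e1 == precedence[i] || e2 == precedence[i])
            return precedence[i];
    }
    return e1 != 0 ? e1 : e2;
}

/* Fold the results of all loaded policies; any single denial denies the request. */
int compose_all(const int *results, size_t n)
{
    int error = 0;
    size_t i;

    assert(results != NULL || n == 0);
    for (i = 0; i < n; i++)
        error = compose_errors(error, results[i]);
    return error;
}
```

Because the result depends on a fixed precedence rather than on load order, two policies that return EPERM and EACCES compose to the same answer whichever was loaded first.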

2.3 Architecture of the MAC framework

As shown in the figure below, the MAC framework is a thin layer that links kernel services, policies, and security-aware applications. Control passes from kernel consumers to the framework and on to policies through about 250 entry points (object types × methods):

Kernel service entry points allow subsystems (such as VFS) to engage the reference monitor framework for relevant events and access control decisions.
Policy entry points connect the framework and policies, adding explicit label parameters relative to the corresponding kernel service entry point. They are complemented by policy lifecycle events and library functions. Policies implement only the entry points they need.
Applications manage labels (for example on processes and files) using the label management API.
DTrace probes allow tracing and profiling of entry points.

Collectively, these interfaces allow policies to enforce kernel access control in a maintainable manner.
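The relationship between the two kinds of entry point can be sketched as a dispatch loop. Everything here is a simplified assumption: `policy_ops`, `framework_vnode_check_write`, and the one-field `label` are hypothetical, and the loop returns on the first denial instead of composing all results.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct label { int integrity; };   /* hypothetical label contents */
struct cred  { struct label label; };
struct vnode { struct label label; };

/* Policy entry points take explicit label arguments; unimplemented ones stay NULL. */
struct policy_ops {
    int (*vnode_check_write)(const struct cred *cred, const struct vnode *vp,
        const struct label *vplabel);
    int (*vnode_check_read)(const struct cred *cred, const struct vnode *vp,
        const struct label *vplabel);
};

/* Kernel service entry point: takes no label argument; the framework supplies it. */
int framework_vnode_check_write(const struct policy_ops *const *policies, size_t n,
    const struct cred *cred, const struct vnode *vp)
{
    size_t i;
    int error;

    assert(cred != NULL && vp != NULL);
    for (i = 0; i < n; i++) {
        if (policies[i]->vnode_check_write == NULL)
            continue;              /* this policy does not mediate writes */
        error = policies[i]->vnode_check_write(cred, vp, &vp->label);
        if (error != 0)
            return error;          /* simplification: first denial wins */
    }
    return 0;
}

/* Example policy: refuse writes to objects of higher integrity than the subject. */
int example_check_write(const struct cred *cred, const struct vnode *vp,
    const struct label *vplabel)
{
    (void)vp;
    return cred->label.integrity >= vplabel->integrity ? 0 : EACCES;
}
```

A policy that leaves `vnode_check_write` unset is skipped entirely, which is how a policy can implement only the entry points it needs.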

2.3.1 Entry point call

To understand how these layers interact, individual file write checks can be traced through the kernel.

The code below shows vn_write, a VFS function that implements the write and writev system calls. The mac_vnode_check_write kernel service entry point authorizes a write to the vnode (vp) against two subject credentials: fp->f_cred, which opened the file, and active_cred, which initiated the write operation.

static int
vn_write(struct file *fp, struct uio *uio,
    struct ucred *active_cred, int flags,
    struct thread *td)
{
    ...
    vn_lock(vp, lock_flags | LK_RETRY);
    ...
#ifdef MAC
    error = mac_vnode_check_write(active_cred,
        fp->f_cred, vp);
    if (error == 0)
#endif
        error = VOP_WRITE(vp, uio, ioflag,
            fp->f_cred);
    ...
    VOP_UNLOCK(vp, 0);
    ...
    return (error);
}

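Policies can implement either capability semantics, by checking the credential that opened the file (fp->f_cred), or revocation semantics, by checking the credential that is performing the write (active_cred).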
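The difference between the two semantics can be sketched with a toy credential. The `ucred` structure here is a hypothetical one-field stand-in, not the kernel's structure, and the two check functions are illustrative names.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical credential: one flag says whether writing is permitted. */
struct ucred {
    bool may_write;
};

/* Capability semantics: rights are fixed when the file is opened (file_cred). */
int check_write_capability(const struct ucred *active_cred,
    const struct ucred *file_cred)
{
    (void)active_cred;
    assert(file_cred != NULL);
    return file_cred->may_write ? 0 : EACCES;
}

/* Revocation semantics: rights are re-evaluated on every write (active_cred). */
int check_write_revocation(const struct ucred *active_cred,
    const struct ucred *file_cred)
{
    (void)file_cred;
    assert(active_cred != NULL);
    return active_cred->may_write ? 0 : EACCES;
}
```

If a process loses write permission after opening a file, the capability-style check still allows the write through the existing descriptor, while the revocation-style check refuses it.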

The vnode lock (vp->v_lock) is held from check to use, protecting label state and preventing time-of-check-to-time-of-use race conditions.

The parameters excluded from the entry point are as important as those included. For example, vn_write's data pointer (uio) is omitted because the data resides in user memory and cannot be inspected without exposure to races against concurrent writes. Similar design choices throughout the framework rule out checks that the kernel synchronization model cannot support safely.

Where possible, kernel subsystems are treated as implementing labeled objects, with policy enforced through control over their method calls. This approach fits naturally with the object-oriented structure of the kernel. Once the objects are identified, entry points must be placed with care: the finer the KPI granularity, the more expressive policies can be, at the cost of policy complexity. The fewer the entry point calls, the easier they are to verify; too few, however, leave protection insufficient. Entry point design must also balance placing checks deep enough that the object type is known against minimizing the number of enforcement points for a given level of abstraction.

2.3.2 Kernel Object Tags

Many access control policies label subjects and objects to support access control decisions, for example with integrity or confidentiality levels. The MAC framework provides policy-agnostic labeling facilities for kernel objects, label management system calls, and persistent storage for file labels. Policies control label semantics: not only the bytes stored, but also the memory model. For example, a policy might store per-instance, reference-counted, or global data.

The framework represents label storage with struct label, which is opaque to both kernel services and policies. For example, Biba assigns low integrity to a newly created socket, which inherits that property from a low-integrity process, whereas the partition policy labels only processes and not other objects. If the object type supports a metadata scheme (such as mbuf tags, which hold per-packet metadata), that is used; otherwise, a label pointer is added to a core structure (such as the vnode). Policies can borrow existing object locks to protect label data if the synchronization model supports it.
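One way to picture opaque, policy-agnostic label storage is a fixed array of per-policy slots reached only through accessors. This is a sketch under stated assumptions: the slot limit, the `slots` field, and the functions `label_slot_alloc`, `label_init`, `label_set`, and `label_get` are hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MAX_POLICY_SLOTS 4   /* assumed limit for this sketch */

/* Opaque to kernel services: each loaded policy owns one slot. */
struct label {
    intptr_t slots[MAX_POLICY_SLOTS];
};

static int next_slot;   /* next unassigned slot index */

/* Called when a policy registers and wants label storage; returns -1 if full. */
int label_slot_alloc(void)
{
    if (next_slot >= MAX_POLICY_SLOTS)
        return -1;
    return next_slot++;
}

void label_init(struct label *l)
{
    assert(l != NULL);
    memset(l, 0, sizeof(*l));
}

/* Accessors keep the layout private, so it can change without breaking policies. */
void label_set(struct label *l, int slot, intptr_t value)
{
    assert(l != NULL && slot >= 0 && slot < MAX_POLICY_SLOTS);
    l->slots[slot] = value;
}

intptr_t label_get(const struct label *l, int slot)
{
    assert(l != NULL && slot >= 0 && slot < MAX_POLICY_SLOTS);
    return l->slots[slot];
}
```

Each slot holds an `intptr_t`, so a policy can store either a small value inline or a pointer to its own per-instance, reference-counted, or global data, which is what lets the policy rather than the framework choose the memory model.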

3. Products based on the MAC framework

The following table describes the policies in commercial or open source products derived from FreeBSD. A number of factors contributed to the success of this transition:

The need for new access control was pressing. The classic Unix model could not meet the needs of ISPs, firewalls, and smartphones. At the same time, the threat of attack became ubiquitous with widespread networking and strong financial incentives.
The structural argument for a framework is sound. Access control extensibility is the preferred way to support security localization, adapting to varied needs.
No single policy model became dominant. Therefore, many models must be supported.
Increased hardware performance raised the tolerance for security overhead. This is true even in consumer products and embedded devices.
Open source technology transition works. FreeBSD provides not only a forum for collaborative research and development, but also a path into commercial products.

The framework has grown considerably since 2003 with contributions from multiple companies.

3.1 FreeBSD

FreeBSD is an open source operating system for building online services, appliances, and embedded devices. FreeBSD or its components can be found in data centers, integrated products, and embedded and mobile devices (such as Juniper switches and Apple iPhones). Its origins can be traced back to the Berkeley Software Distribution (BSD), developed in the 1970s and 1980s. BSD originated many central Unix technologies, including the Fast File System (FFS) and the Berkeley TCP/IP stack and sockets API. The BSD license and its variants (MIT, CMU, ISC, Apache) encourage technology transfer by allowing unrestricted commercial use. FreeBSD's diverse consumers make it an ideal target for security localization.

The MAC framework is a complex piece of software. Although the framework itself has only 8,500 lines of code and the reference policies only 15,000 lines, it is integrated with a kernel of several million lines. Moving to production depended on several factors, including growing confidence in the integration and community feedback on design, compatibility, and performance. When first released as part of FreeBSD 5.0, the framework was marked "experimental", which had several implications:

Enabling it required recompiling the kernel.
The documentation flagged it as potentially incomplete, unstable, or unsafe, and therefore unsupported.
Stability guarantees for programming and binary interfaces (API, KPI, ABI, and KBI) were waived, allowing changes without formal deprecation.

Shipping the framework in a release was critical to gaining users who could help validate and improve the approach, while retaining the flexibility to make changes. Before the framework could be considered production ready, two issues had to be resolved:

The binary compatibility impact on the kernel, policies, and other modules had to be better understood.
Performance had to be analyzed and optimized in response to community review.

3.1.1 KPI and KBI resilience

FreeBSD requires that kernel modules compiled against a given release continue to work with subsequent minor releases in the same series. The goal was to avoid breaking the kernel binary interface (KBI) for consumer subsystems and to provide similar binary compatibility for policy modules. Making label storage opaque to both subsystems and policies was a major area of improvement: it keeps details of kernel data structures from being encoded into policies, and it leaves room to change the label implementation, because policies that need only label access do not depend on the layout of other kernel structures.

3.1.2 Performance optimization

Many FreeBSD deployments are very performance sensitive and require minimal overhead, especially when the framework is disabled. Since sites choose policies based on local security-performance tradeoffs, it is desirable that policies incur only the performance cost of the functionality they actually use. In FreeBSD 5.0, however, the overhead was measurable, which was a barrier to enabling the framework by default.

3.1.3 Label allocation tradeoffs

Even with the framework compiled out, the bloat from adding labels to kernel data structures (especially packet mbufs) incurred a significant allocation-time cost. In FreeBSD 5.1, inline mbuf labels were replaced with pointers, which reduced the cost for non-MAC kernels but increased allocation cost and indirection overhead for MAC-enabled kernels.

Label allocation cost is more noticeable when the framework is enabled, and it is unnecessary for policies that do not use labels. The effect is most noticeable for network packets, so a per-policy flag was introduced in FreeBSD 5.1 to request packet labels. In 8.0, this approach was generalized so that labels are allocated only for object types for which at least one loaded policy defines an initialization entry point. This effectively removes the labeling cost when no policy needs it, restores performance proportionality, and satisfies the general case. However, McAfee Sidewinder Firewall, a commercial product that uses packet labeling, saw enough overhead to bypass the label abstraction in favor of direct structure modification.
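The rule "allocate a label only if some loaded policy initializes labels for that object type" can be sketched with a bitmask that is recomputed whenever the policy set changes. The object type list, the `policy` structure, and the function names are assumptions made for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum object_type { OBJ_VNODE, OBJ_MBUF, OBJ_SOCKET, OBJ_TYPE_COUNT };

/* A policy implements a label-init entry point only for the types it labels. */
struct policy {
    void (*init_label[OBJ_TYPE_COUNT])(void *label);
};

static unsigned labeled_types;   /* bit i set: some policy labels object type i */

/* Recompute the mask whenever the set of loaded policies changes. */
void update_labeled_types(const struct policy *const *policies, size_t n)
{
    size_t i;
    int t;

    labeled_types = 0;
    for (i = 0; i < n; i++) {
        assert(policies[i] != NULL);
        for (t = 0; t < OBJ_TYPE_COUNT; t++) {
            if (policies[i]->init_label[t] != NULL)
                labeled_types |= 1u << t;
        }
    }
}

/* Object allocation paths consult this before paying for a label. */
bool needs_label(enum object_type type)
{
    return (labeled_types & (1u << type)) != 0;
}

/* Placeholder initializer for the usage example. */
void noop_init(void *label)
{
    (void)label;
}
```

With only a vnode-labeling policy loaded, `needs_label(OBJ_MBUF)` is false, so the per-packet allocation path pays nothing for labels.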

3.1.4 Minimize synchronization overhead

With the framework compiled in, the lock-protected reference count operations performed on each entry point invocation are readily measurable for frequent operations such as per-packet delivery checks. As multi-core hardware becomes more common, lock contention also becomes significant.

Starting with FreeBSD 5.2, policies are divided into static and dynamic sets, which helps embedded systems with fixed configurations. Static policies are compiled in or loaded at boot time and cannot be unloaded thereafter, so they require no synchronization. Loading or unloading dynamic policies after boot still requires multiple lock operations.

In FreeBSD 8.0, synchronization was optimized further so that the MAC framework could be included in the default kernel. This work benefited from continued improvements in kernel scalability, specifically read-mostly locks, which avoid cache line migration during read-only acquisition at the cost of more expensive exclusive acquisition, and which are ideal for an infrequently changing list of policies.

3.2 OS X and iOS

Apple released OS X Leopard for the desktop in 2007 and iPhone OS 2 for the iPhone and iPod Touch in 2008, both of which use the MAC framework as their reference monitor framework. OS X Snow Leopard ships with three MAC policies:

Sandbox. Provides policy-driven sandboxing for risky components that handle untrusted data, such as network services and video codecs.
Quarantine. For downloaded files, supports showing the user a dialog that names the originating website.
Time Machine safety. Protects the integrity of Time Machine backups.

In OS X Mountain Lion, sandboxing is mandatory for apps distributed through Apple's App Store. Apple's iOS 2.0 shipped with two policies: Sandbox and one additional policy, Apple Mobile File Integrity (AMFI). AMFI works with the code signing facility to terminate applications whose digital signatures are invalidated at run time; the restriction is relaxed for debugging during application development.

Together, these policies support system integrity and provide strong isolation between applications to protect the privacy of data. Both OS X and iOS depart significantly from the design expectations of the MAC framework and required significant adaptation.

3.2.1 XNU Prototype

Apple began beta testing OS X in 2000, and the promise of a commercial desktop operating system with an open source kernel was hard to ignore. The XNU kernel is a complex amalgamation of Carnegie Mellon University's Mach microkernel, FreeBSD 5.0, some newer FreeBSD elements, and numerous features developed by Apple. Given this shared foundation, the MAC framework's approach, and even its code, could be reused.

Although not a microkernel, XNU adopts many elements of Mach, including its scheduler, interprocess communication (IPC) model, and VM system. FreeBSD's process model, IPC, network stack, and VFS are grafted onto Mach, providing a rich POSIX programming model. Apple-developed kernel components in the first version of OS X included the I/O Kit device driver framework, network kernel extensions, and the HFS+ file system; the list has grown over time. From 2003 to 2007, the increasingly mature MAC framework was ported to OS X.

3.2.2 Adapting to OS X

The MAC framework rests on a detailed analysis of the FreeBSD kernel and is tightly integrated with low-level memory management and synchronization, as well as with higher-level services such as the file system, IPC, and network stack. While the adaptation to OS X could rely heavily on the FreeBSD components used by Apple, fundamental changes were required to reflect the differences between FreeBSD and XNU.

The first step was to integrate the MAC framework with the closely aligned BSD process model, file system, and network stack. High-level architectural alignment made some adaptations easy, but discrepancies also surfaced. For example, FreeBSD's Unix file system treats directories as specialized file objects, while HFS+ treats directories and object attribute structures (disk catalog entries) as first-class objects. This required changes to both the framework and XNU.

Next, coverage was extended to include Mach tasks and IPC. Each XNU process links a Mach task (scheduling, VM) with a FreeBSD process, raising the question: is the MAC framework part of Mach or of BSD? While architecturally useful, the Mach-BSD boundary in XNU is artificial, with references often crossing layers, so the MAC framework must serve both. Modifications to BSD process labels are mapped to the corresponding Mach task labels.

Mach ports are another case where microkernel origins conflict with the monolithic kernel premise of the MAC framework. Unlike BSD IPC objects, whose namespaces are managed by the kernel, Mach ports rely on userspace namespaces managed by launchd (e.g. for desktop IPC). A userspace label handle abstraction, similar to the kernel label structure, is used for this purpose.

3.2.3 Adoption by Apple

Apple is one of the world's largest suppliers of desktop systems and was one of the first to deploy a Unix-like system in smartphones. Pervasive networking and the presence of malicious actors brought it an explosion of use cases and new security requirements. However, Apple's adoption of the MAC framework was not a foregone conclusion: competing technologies were also considered, and all were subject to similar concerns about future product direction and performance. Alternatives included techniques based on system call interposition and Apple's Kauth (Kernel Authorization), an authorization framework aimed at antivirus vendors. Apple was persuaded by the arguments about the unreliability of system call interposition and ended up using two technologies: Kauth for third-party antivirus vendors, and the more expressive and capable MAC framework for its own sandbox technology.

Sandbox policy

Since Apple's OS X and iOS policy modules are not open source, their implementation cannot be examined here, but reference documentation exists for the Sandbox policy, which is used by OS X components and by third-party applications such as Google's Chrome web browser. Sandbox allows applications to voluntarily restrict their access to resources such as the file system, IPC namespaces, and the network. A process's sandbox profile is stored in its process label.

Bytecode-compiled policies can be set through a public API or the sandbox-exec helper program. Applications can choose from several Apple-defined policies (listed below) or define custom policies. Some applications use the predefined policies; video codecs, for example, use a profile that limits them to IPC with the host process.

The common.sb profile used by Chrome illustrates the key Sandbox constructs: coarse control of the sysctl kernel management interface and of shared memory, and fine-grained regular expression matching of file paths. Path-based control is the highlight of the Sandbox policy, and it matches the programming model better than the file labels used in Biba, MLS, and TE.

Path-based schemes are difficult to implement on the VFS model, in which a file may have one or more names (hard links). Unlike FreeBSD, HFS+ implements parent pointers for files and ensures that the name cache always contains the information needed to compute an unambiguous path for any file in use.

While Sandbox works well with many OS X services, many third-party applications carry a strong assumption of ambient authority, that is, the ability to access any object on the system. With the iPhone, Apple broke this assumption: apps execute in isolation from system services and from each other. This model, now also present in OS X, helps protect device integrity, and increasingly end-user data, against misbehaving applications.
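Fine-grained path matching of this kind can be sketched with the POSIX regular expression interface. This is not the Sandbox implementation: the `path_rule` structure, the first-match-wins ordering, and the default-deny fallback are assumptions made for illustration, and patterns are recompiled on every call for brevity, whereas a real policy would compile them once.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stddef.h>

/* One hypothetical rule: a POSIX extended regex over the path plus a verdict. */
struct path_rule {
    const char *pattern;
    bool allow;
};

/* First matching rule wins; with no match the request is denied by default. */
bool path_allowed(const struct path_rule *rules, size_t n, const char *path)
{
    size_t i;

    assert(rules != NULL && path != NULL);
    for (i = 0; i < n; i++) {
        regex_t re;
        bool matched;

        if (regcomp(&re, rules[i].pattern, REG_EXTENDED | REG_NOSUB) != 0)
            return false;          /* fail closed on a malformed rule */
        matched = (regexec(&re, path, 0, NULL, 0) == 0);
        regfree(&re);
        if (matched)
            return rules[i].allow;
    }
    return false;
}
```

Because the decision depends on the name used to reach a file rather than on a label stored with it, a rule set like this needs no changes to the file system's on-disk metadata, at the price of depending on unambiguous path resolution.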

Performance optimization

OS X and iOS adopted the MAC framework before the performance optimizations of FreeBSD 8.0, so Apple had to carry out its own optimizations based on product-specific constraints.

As with the FreeBSD optimizations, these usually focus on framework entry and label overhead. By default, labels for some object types are compiled out of the kernel; for other types, such as vnodes, policies may optionally request label allocation, which accommodates the sparse use of labels by OS X policies.

In FreeBSD, framework and synchronization optimizations assume that sites using access control extensions are willing to pay some additional overhead. In OS X, it is assumed that most machines use Sandbox, but that it is applied only selectively to high-risk processes. To this end, each process carries a mask, set by policies, indicating which object types require enforcement. As sandboxing becomes pervasive, as it is on iOS, optimization shifts toward the globally applied case.
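The per-process mask can be sketched as a bit test in front of the policy call. The object types, the `proc` structure, the `policy_check` stub, and its hard-coded verdict are all hypothetical; the counter exists only to show when the slow path runs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

enum object_type { OBJ_VNODE, OBJ_SOCKET, OBJ_SYSCTL };

struct proc {
    unsigned enforce_mask;   /* bit i set: checks on object type i apply to this process */
};

static unsigned long policy_calls;   /* counts how often the slow path runs */

/* Hypothetical policy decision, standing in for a full sandbox evaluation. */
static int policy_check(const struct proc *p, enum object_type type)
{
    (void)p;
    policy_calls++;
    return type == OBJ_SOCKET ? EPERM : 0;
}

/* Policies mark a process when they sandbox it. */
void proc_require_enforcement(struct proc *p, enum object_type type)
{
    assert(p != NULL);
    p->enforce_mask |= 1u << type;
}

/* Fast path: processes without the bit set skip the policy call entirely. */
int check_access(const struct proc *p, enum object_type type)
{
    assert(p != NULL);
    if ((p->enforce_mask & (1u << type)) == 0)
        return 0;
    return policy_check(p, type);
}
```

An unsandboxed process pays only for one mask test per check, which suits a system where sandboxing is applied to a minority of high-risk processes.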

4. Thoughts on Extensible Access Control

The MAC framework has been the basis for many localized security instances, allowing local access control policies to be combined with the still popular discretionary access control model. Open source deployment is a winning strategy, providing a forum for collaborative improvements, access to early adopters, and a path to a wide range of products.

Industry adoption of an open source foundation for appliances and embedded devices is well established:

Security localization in devices is already widespread.
The importance of multiprocessing has only increased.
Security label abstractions have proven useful beyond their MAC roots.
The lack of consensus on access control policy continues.

However, the MAC framework still needs to be improved and extended to solve several unanticipated problems:

Revisiting the structure of operating system privileges.
The importance of digital signatures when applying access controls to third-party applications.
The tension between name-based and label-based access control.

4.1 New Design Principles

Extensive practical experience with the MAC framework suggests several new design principles:

Policy authors determine their own performance, functionality, and assurance tradeoffs. Policies may not require heavyweight infrastructure such as labels, and can then achieve better performance.
Traceability is a key design concern.
The stability of programming and binary interfaces is critical. Sustainability of APIs, ABIs, KPIs, and KBIs is often overlooked because prototypes are often one-offs with no further support obligations.
Controlling operating system privilege is important for policies that augment, rather than merely complement, DAC.
Application authors are first-class principals. Both Apple's App Store and Juniper's SDK use application signatures and certificates as policy inputs.
Applications themselves require flexible access control to support application isolation.

4.2 Domain-Specific Policy Model

Why is there no consensus on how to express operating system policy? Clearly, proponents of each policy model believe that it captures the key concerns in system design. There are two explanations. First, policy models aim to capture the principle of least privilege in different forms (such as information flow versus system privilege), which makes their approaches complementary. Second, different models address different domains through multidimensional tradeoffs, including expressiveness, assurance, performance, management complexity, implementation complexity, compatibility, and maintainability. What consensus exists is therefore a consensus in favor of domain-specific policy models.

4.3 The value of extensibility

Significant design enhancements were needed. Does this confirm or refute the hypothesis of access control extensibility? It seems appropriate to compare with similar frameworks such as VFS and device drivers, both of which are frequently extended to accommodate new requirements, such as changes in distributed file systems or improvements in power management. Managing the upstream-downstream relationships of large source code bases is a strong motivating factor. The deployment of the MAC framework appears to confirm the more general argument that access control extensibility is a key aspect of contemporary operating system design.

5. Summary

This article has reviewed the background and challenges of access control extensibility and framework design, and observed how several products deploy security policies in practice, including FreeBSD, Juniper's Junos, and Apple's OS X and iOS. While access control extensibility is key to these projects, they also required considerable changes to the framework itself. The closing sections discussed how the framework meets the requirements of each product, as well as the continued evolution of operating system security.
