Detailed explanation of cgroup abstraction layer in Android

Source code based on: Android R

0. Preface

In the previous post, "Principles of App Freezer in Android", we saw that enabling the freezer, freezing, and unfreezing all go through the cgroup mechanism.

This article will introduce the basic information and usage of the cgroup abstraction layer in Android.

1. Introduction to cgroups

cgroups (short for control groups) is a Linux kernel mechanism that limits the resources used by a process or a group of processes, giving fine-grained control over resources such as CPU and memory. Docker, the increasingly popular lightweight container platform, relies on the resource-limiting capability of cgroups to control CPU, memory, and other resources.

cgroups defines a subsystem for each controllable resource. Typical subsystems are as follows:

  • cpu: limits the CPU usage of processes;
  • cpuacct: reports CPU usage statistics for the processes in a cgroup;
  • cpuset: assigns dedicated CPU nodes or memory nodes to the processes in a cgroup;
  • memory: limits the memory usage of processes;
  • blkio: limits the block device I/O of processes;
  • devices: controls which devices processes can access;
  • freezer: suspends or resumes the processes in a cgroup;
  • net_cls: tags the network packets of processes in a cgroup so that the tc (traffic control) module can act on them;
  • ns: lets processes in different cgroups use different namespaces;

Android uses cgroups to control and account for the allocation of system resources such as CPU and memory, and supports both cgroup v1 and cgroup v2 of the Linux kernel.
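To see which cgroups a process belongs to, one can read /proc/<pid>/cgroup, where each line has the form hierarchy-ID:controller-list:cgroup-path. The following is a minimal sketch of parsing one such line; the struct and function names are made up for illustration and are not part of Android. A v2 entry shows up with hierarchy ID 0 and an empty controller list.

```cpp
#include <cstdlib>
#include <string>

// One line of /proc/<pid>/cgroup: "hierarchy-ID:controller-list:cgroup-path".
// (Hypothetical helper type, not an Android class.)
struct CgroupEntry {
    long hierarchy_id = -1;   // 0 for the cgroup v2 unified hierarchy
    std::string controllers;  // comma-separated for v1, empty for v2
    std::string path;         // path relative to the hierarchy's mount point
    bool is_v2() const { return hierarchy_id == 0 && controllers.empty(); }
};

// Parses a single line into *entry; returns false if the line is malformed.
bool ParseCgroupLine(const std::string& line, CgroupEntry* entry) {
    const std::string::size_type first = line.find(':');
    if (first == std::string::npos || first == 0) return false;
    const std::string::size_type second = line.find(':', first + 1);
    if (second == std::string::npos) return false;

    const std::string id_text = line.substr(0, first);
    char* end = nullptr;
    const long id = std::strtol(id_text.c_str(), &end, 10);
    if (end == id_text.c_str() || *end != '\0') return false;  // not a number

    entry->hierarchy_id = id;
    entry->controllers = line.substr(first + 1, second - first - 1);
    entry->path = line.substr(second + 1);
    return true;
}
```

Feeding it each line of /proc/self/cgroup gives a quick view of whether a controller is attached through a v1 hierarchy or through the unified v2 hierarchy.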

2. Introduction to Android cgroup abstraction layer

In Android Q (10) and later, the cgroup abstraction layer is used through task profiles. A task profile describes a set (or sets) of restrictions to apply to a thread or process. The system then selects one or more appropriate cgroups as specified by the task profile. This indirection allows the underlying cgroup feature set to change without affecting higher software layers.

In Android P (9) and earlier, the available cgroups, their mount points, and their versions are set in init.rc. Although this information can be changed, the Android framework assumes (based on init.rc) that a specific set of cgroups exists at specific locations, with a specific version and subgroup hierarchy. This limits the ability to choose the next cgroup version and to change the cgroup hierarchy to take advantage of new features.

In Android Q (10) and later, cgroups are used together with task profiles:

  • cgroup configuration: developers describe the cgroup setup in the cgroups.json file, which defines the set of cgroups together with their mount points and attributes. All cgroups are mounted during the early-init phase.
  • task profiles: these provide an abstraction that separates the required functionality from the details of its implementation. The Android framework applies the task profiles described in task_profiles.json to a process or thread through the SetTaskProfiles and SetProcessProfiles interfaces (these APIs are unique to Android R and later).

3. cgroups.json

File path: system/core/libprocessgroup/profiles/cgroups.json

{
  "Cgroups": [
    {
      "Controller": "blkio",
      "Path": "/dev/blkio",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "cpu",
      "Path": "/dev/cpuctl",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "cpuacct",
      "Path": "/acct",
      "Mode": "0555"
    },
    {
      "Controller": "cpuset",
      "Path": "/dev/cpuset",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    },
    {
      "Controller": "memory",
      "Path": "/dev/memcg",
      "Mode": "0700",
      "UID": "root",
      "GID": "system"
    },
    {
      "Controller": "schedtune",
      "Path": "/dev/stune",
      "Mode": "0755",
      "UID": "system",
      "GID": "system"
    }
  ],
  "Cgroups2": {
    "Path": "/sys/fs/cgroup",
    "Mode": "0755",
    "UID": "system",
    "GID": "system",
    "Controllers": [
      {
        "Controller": "freezer",
        "Path": "freezer",
        "Mode": "0755",
        "UID": "system",
        "GID": "system"
      }
    ]
  }
}

cgroup v1 and cgroup v2 are described slightly differently.

For cgroup v1, each entry can contain the following fields (the cpuacct entry above, for example, omits UID and GID):

  • Controller: the name of the cgroup subsystem; task profiles refer to the controller by this name;
  • Path: the mount path; file names used in task profiles are resolved relative to this path;
  • Mode: the access mode (permissions) of the files in the Path directory;
  • UID: the user ID of the owner of the files in the Path directory;
  • GID: the group ID of the owner of the files in the Path directory;

For cgroup v2, the fields are basically the same as for v1. The difference is that the sub-cgroups are defined under Controllers and are all mounted beneath the same root directory, so the Path of each child is relative to the root Path. For example, the freezer controller here has its Path set to freezer, which means that a freezer directory is created under the root Path, giving /sys/fs/cgroup/freezer.

In addition, there may be more than one cgroups.json file:

/system/core/libprocessgroup/profiles/cgroups.json   // default file
/system/core/libprocessgroup/profiles/cgroups_<API level>.json   // API-level files; R has none, S has many
/vendor/xxx/cgroups.json   // vendor-defined file

These three kinds of files are loaded in the order default -> API level -> vendor, so later files can override earlier ones: whenever a later file defines a Controller with the same name as an earlier one, the earlier definition is overwritten.
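The override rule can be pictured as a map keyed by Controller name that is filled in load order. This is only an illustrative sketch: ControllerDesc and MergeDescriptors are hypothetical names, not the types libprocessgroup actually uses, and JSON parsing is left out.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical in-memory form of one cgroups.json entry.
struct ControllerDesc {
    std::string controller;
    std::string path;
    std::string mode;
};

// Merges descriptor lists given in load order (default, API level, vendor).
// A later entry with the same Controller name replaces the earlier one.
std::map<std::string, ControllerDesc> MergeDescriptors(
        const std::vector<std::vector<ControllerDesc>>& files_in_load_order) {
    std::map<std::string, ControllerDesc> merged;
    for (const auto& file : files_in_load_order) {
        for (const auto& desc : file) {
            merged[desc.controller] = desc;  // overwrite any earlier definition
        }
    }
    return merged;
}
```

With a default file that defines cpu and memory and a vendor file that redefines only memory, the merged result keeps the default cpu entry and the vendor memory entry.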

4. task profiles

File path: system/core/libprocessgroup/profiles/task_profiles.json

{
  "Attributes": [
    {
      "Name": "MemSoftLimit",
      "Controller": "memory",
      "File": "memory.soft_limit_in_bytes"
    },
    {
      "Name": "MemSwappiness",
      "Controller": "memory",
      "File": "memory.swappiness"
    },
    {
      "Name": "FreezerState",
      "Controller": "freezer",
      "File": "cgroup.freeze"
    }
  ],

  "Profiles": [
    {
      "Name": "Frozen",
      "Actions": [
        {
          "Name": "JoinCgroup",
          "Params":
          {
            "Controller": "freezer",
            "Path": ""
          }
        }
      ]
    },
    {
      "Name": "TimerSlackHigh",
      "Actions": [
        {
          "Name": "SetTimerSlack",
          "Params":
          {
            "Slack": "40000000"
          }
        }
      ]
    },
    {
      "Name": "PerfBoost",
      "Actions": [
        {
          "Name": "SetClamps",
          "Params":
          {
            "Boost": "50%",
            "Clamp": "0"
          }
        }
      ]
    },
    {
      "Name": "HighMemoryUsage",
      "Actions": [
        {
          "Name": "SetAttribute",
          "Params":
          {
            "Name": "MemSoftLimit",
            "Value": "512MB"
          }
        },
        {
          "Name": "SetAttribute",
          "Params":
          {
            "Name": "MemSwappiness",
            "Value": "100"
          }
        }
      ]
    },
    {
      "Name": "FreezerEnabled",
      "Actions": [
        {
          "Name": "SetAttribute",
          "Params":
          {
            "Name": "FreezerState",
            "Value": "1"
          }
        }
      ]
    }
  ],

  "AggregateProfiles": [
    {
      "Name": "SCHED_SP_DEFAULT",
      "Profiles": [ "TimerSlackNormal" ]
    },
    {
      "Name": "SCHED_SP_BACKGROUND",
      "Profiles": [ "HighEnergySaving", "LowIoPriority", "TimerSlackHigh" ]
    },
    {
      "Name": "SCHED_SP_FOREGROUND",
      "Profiles": [ "HighPerformance", "HighIoPriority", "TimerSlackNormal" ]
    },
    {
      "Name": "SCHED_SP_TOP_APP",
      "Profiles": [ "MaxPerformance", "MaxIoPriority", "TimerSlackNormal" ]
    },
    ...
  ]
}

The whole file is a single JSON object made up of three sections:

  • Attributes
  • Profiles
  • AggregateProfiles

In addition, there is more than one task_profiles.json file:

system/core/libprocessgroup/profiles/task_profiles.json     // default
system/core/libprocessgroup/profiles/task_profiles_<API level>.json  // API-level files; R has none, S has many
vendor/xxx/task_profiles.json   // vendor configuration

The loading and overriding order is the same as for cgroups.json, and entries are matched by Name: if two files define an item with the same name, the later definition overwrites the earlier one.

4.1 Attributes section

Attributes name specific files inside cgroups.

Attributes are referenced from task profile definitions. Outside of task profiles, use them only when the framework needs direct access to those files and the access cannot be expressed through the task profiles abstraction. In all other cases, use task profiles, which better separate the desired behavior from its implementation details.

Each item in Attributes contains:

  • Name: the name of the attribute; profiles refer to the attribute by this name;
  • Controller: the cgroup controller the file belongs to, given as the Controller value of an entry in cgroups.json;
  • File: a specific file in the directory of that cgroup controller;

As above:

  "Attributes": [
    {
      "Name": "FreezerState",
      "Controller": "freezer",
      "File": "cgroup.freeze"
    }
  ],

This attribute uses the cgroup whose Controller is freezer. From Section 3 we know that this controller is described in the cgroup v2 format and that its path is /sys/fs/cgroup/freezer/. The attribute therefore refers to the cgroup.freeze file under this directory.

In the code, each Attribute is managed through the ProfileAttribute class:

system/core/libprocessgroup/task_profiles.h

class ProfileAttribute {
  public:
    ProfileAttribute(const CgroupController& controller, const std::string& file_name)
        : controller_(controller), file_name_(file_name) {}

    const CgroupController* controller() const { return &controller_; }
    const std::string& file_name() const { return file_name_; }
    void Reset(const CgroupController& controller, const std::string& file_name);

    bool GetPathForTask(int tid, std::string* path) const;

  private:
    CgroupController controller_;
    std::string file_name_;
};

4.2 Profiles section

Each definition includes:

  • Name: Specified profile name;
  • Actions: Lists the set of actions that need to be executed when the profile is applied. Each action includes:
    • Name: the action category that needs to be executed;
    • Params: A collection of parameters required by the action;

Let's take a look at the optional categories of Name in Actions and their Params configuration:

Action         Parameter    Description
SetTimerSlack  Slack        Timer slack, in ns
SetAttribute   Name         Name of an attribute defined in Attributes
               Value        Value to write to the file the attribute refers to
WriteFile      FilePath     Path of the file
               Value        Value to write to the file
JoinCgroup     Controller   Name of the cgroup controller in cgroups.json
               Path         Subgroup path in the cgroup hierarchy

 

4.2.1 SetTimerSlack

SetTimerSlack has a single parameter, Slack, which corresponds to the /proc/PID/timerslack_ns node. Timer slack is an alignment strategy that Linux uses to reduce power consumption: it lets the kernel group nearby timer expirations together so that scattered timers do not wake the CPU too often. The value affects the timers of the process, for example the wake-up times of select, epoll_wait, sleep, and similar APIs.

The /proc/PID/timerslack_ns node is supported in Linux 4.6 and later.
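For the calling thread, the same value can also be set and read back with prctl(), which is the fallback path used by the Android code in this section. A minimal Linux-only sketch follows; the two wrapper function names are hypothetical, while PR_SET_TIMERSLACK and PR_GET_TIMERSLACK are standard prctl options.

```cpp
#include <sys/prctl.h>

// Sets the timer slack of the calling thread, in nanoseconds.
// Returns true on success. (Hypothetical wrapper name.)
bool SetOwnTimerSlack(unsigned long slack_ns) {
    return prctl(PR_SET_TIMERSLACK, slack_ns) == 0;
}

// Returns the calling thread's current timer slack in nanoseconds,
// or -1 if the call fails. (Hypothetical wrapper name.)
long GetOwnTimerSlack() {
    return prctl(PR_GET_TIMERSLACK);
}
```

Calling SetOwnTimerSlack(40000000UL) applies the same 40 ms slack that the TimerSlackHigh profile configures. Unlike the /proc node, prctl() can only change the slack of the thread that calls it.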

For details, see: https://cloud.tencent.com/developer/article/1836285

In the code, this action is implemented by the SetTimerSlackAction class:

system/core/libprocessgroup/task_profiles.cpp

bool SetTimerSlackAction::ExecuteForTask(int tid) const {
    static bool sys_supports_timerslack = IsTimerSlackSupported(tid);

    if (sys_supports_timerslack) {
        auto file = StringPrintf("/proc/%d/timerslack_ns", tid);
        if (!WriteStringToFile(std::to_string(slack_), file)) {
            if (errno == ENOENT) {
                // This happens when process is already dead
                return true;
            }
            PLOG(ERROR) << "set_timerslack_ns write failed";
        }
    }

    // TODO: Remove when /proc/<tid>/timerslack_ns interface is backported.
    if (tid == 0 || tid == GetThreadId()) {
        if (prctl(PR_SET_TIMERSLACK, slack_) == -1) {
            PLOG(ERROR) << "set_timerslack_ns prctl failed";
        }
    }

    return true;
}

4.2.2 SetAttribute

SetAttribute ties in with the Attributes section of task_profiles.json and corresponds to SetAttributeAction.

SetAttribute has two parameters: Name is the name of a previously defined Attribute, and Value is the value to write to the cgroup file that the Attribute refers to.

In the code, the SetAttribute action is implemented by the SetAttributeAction class:

system/core/libprocessgroup/task_profiles.cpp

bool SetAttributeAction::ExecuteForTask(int tid) const {
    std::string path;

    if (!attribute_->GetPathForTask(tid, &path)) {
        LOG(ERROR) << "Failed to find cgroup for tid " << tid;
        return false;
    }

    if (!WriteStringToFile(value_, path)) {
        PLOG(ERROR) << "Failed to write '" << value_ << "' to " << path;
        return false;
    }

    return true;
}

The class has a member, attribute_, that points to a ProfileAttribute.

As the code shows, the action first resolves the file path for the task through the attribute's GetPathForTask() and then writes the value to that file.
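The resolve-then-write pattern can be sketched outside Android as follows. SimpleAttribute and WriteAttribute are hypothetical names used only for illustration, and the path is built by simply joining the controller directory and the file name; the real ProfileAttribute::GetPathForTask() does more work to locate the task's cgroup, which this sketch skips.

```cpp
#include <fstream>
#include <string>

// Hypothetical, simplified stand-in for ProfileAttribute: a controller
// directory plus the name of a file inside it.
struct SimpleAttribute {
    std::string controller_path;  // e.g. "/sys/fs/cgroup/freezer"
    std::string file_name;        // e.g. "cgroup.freeze"

    std::string FilePath() const { return controller_path + "/" + file_name; }
};

// Writes value to the attribute's file, replacing its previous content.
// Returns false if the file cannot be opened or the write fails.
bool WriteAttribute(const SimpleAttribute& attr, const std::string& value) {
    std::ofstream out(attr.FilePath().c_str(), std::ios::out | std::ios::trunc);
    if (!out.is_open()) return false;
    out << value;
    out.flush();
    return out.good();
}
```

For the FreezerEnabled profile shown earlier, the equivalent call would be WriteAttribute({"/sys/fs/cgroup/freezer", "cgroup.freeze"}, "1"), which needs the appropriate permissions on a real device.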

 

4.2.3 JoinCgroup

JoinCgroup has two parameters, Controller and Path. Controller names the cgroup subsystem, and Path is the path of the sub-cgroup under that subsystem. A process or thread to which this profile is applied is added to that sub-cgroup and becomes subject to its resource restrictions.

In the code, this action is implemented by the SetCgroupAction class.

For example, from the file above:

{
  "Attributes": [
    ...
  ],

  "Profiles": [
    {
      "Name": "Frozen",
      "Actions": [
        {
          "Name": "JoinCgroup",
          "Params":
          {
            "Controller": "freezer",
            "Path": ""
          }
        }
      ]
    }
  ],

  "AggregateProfiles": [
    ...
  ]
}

The profile configured here is named Frozen, the cgroup Controller it uses is freezer, and its Path is empty.

In other words, applying this profile adds the process to the cgroup at the root of the freezer controller, /sys/fs/cgroup/freezer/. To see how it is used, we can follow the call chain: CachedAppOptimizer calls Process.setProcessFrozen(), which in turn calls the JNI function android_os_Process_setProcessFrozen():

frameworks/base/core/jni/android_util_Process.cpp

void android_os_Process_setProcessFrozen(
        JNIEnv *env, jobject clazz, jint pid, jint uid, jboolean freeze)
{
    bool success = true;

    if (freeze) {
        success = SetProcessProfiles(uid, pid, {"Frozen"});
    } else {
        success = SetProcessProfiles(uid, pid, {"Unfrozen"});
    }

    if (!success) {
        signalExceptionForGroupError(env, EINVAL, pid);
    }
}

When a process is frozen or unfrozen, SetProcessProfiles() is called. The profile's action here is a SetCgroupAction, so its ExecuteForProcess() is what finally runs:

system/core/libprocessgroup/task_profiles.cpp

bool SetCgroupAction::ExecuteForProcess(uid_t uid, pid_t pid) const {
    std::string procs_path = controller()->GetProcsFilePath(path_, uid, pid);
    unique_fd tmp_fd(TEMP_FAILURE_RETRY(open(procs_path.c_str(), O_WRONLY | O_CLOEXEC)));
    if (tmp_fd < 0) {
        PLOG(WARNING) << "Failed to open " << procs_path;
        return false;
    }
    if (!AddTidToCgroup(pid, tmp_fd)) {
        LOG(ERROR) << "Failed to add task into cgroup";
        return false;
    }

    return true;
}

The function first obtains the path of the file to write through the controller's GetProcsFilePath() interface, passing in the Path configured for the profile:

system/core/libprocessgroup/cgroup_map.cpp

std::string CgroupController::GetProcsFilePath(const std::string& rel_path, uid_t uid,
                                               pid_t pid) const {
    std::string proc_path(path());
    proc_path.append("/").append(rel_path);
    proc_path = regex_replace(proc_path, std::regex("<uid>"), std::to_string(uid));
    proc_path = regex_replace(proc_path, std::regex("<pid>"), std::to_string(pid));

    return proc_path.append(CGROUP_PROCS_FILE);
}

The file that is finally written is CGROUP_PROCS_FILE, that is, the cgroup.procs file.
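To make the path construction concrete, here is a standalone sketch of the same steps that can be compiled outside Android. BuildProcsPath and kProcsFile are hypothetical names, and the value "/cgroup.procs" with a leading slash is an assumption about how CGROUP_PROCS_FILE is defined.

```cpp
#include <regex>
#include <string>
#include <sys/types.h>

// Assumption: stands in for CGROUP_PROCS_FILE, including the leading slash.
static const char kProcsFile[] = "/cgroup.procs";

// Joins the controller path and the profile's relative path, substitutes the
// <uid> and <pid> placeholders, and appends the procs file name.
std::string BuildProcsPath(const std::string& controller_path,
                           const std::string& rel_path, uid_t uid, pid_t pid) {
    std::string proc_path(controller_path);
    proc_path.append("/").append(rel_path);
    proc_path = std::regex_replace(proc_path, std::regex("<uid>"), std::to_string(uid));
    proc_path = std::regex_replace(proc_path, std::regex("<pid>"), std::to_string(pid));
    return proc_path.append(kProcsFile);
}
```

Under that assumption, an empty Path such as the one in the Frozen profile yields /sys/fs/cgroup/freezer//cgroup.procs; the doubled slash is harmless on Linux. A relative path with placeholders, for example uid_<uid>/pid_<pid>, is expanded with the uid and pid passed in.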

4.3 AggregateProfiles section

In Android 12 or later, the task_profiles.json file also contains an AggregateProfiles section.

One or more profile aliases are defined here, consisting of the following content:

  • Name: Specify the name of aggregate profile;
  • Profiles: A collection of profile names contained in this aggregate profile;

When an aggregate profile is applied, all profiles contained in it will be automatically applied.

As above:

  "AggregateProfiles": [
    {
      "Name": "SCHED_SP_FOREGROUND",
      "Profiles": [ "HighPerformance", "HighIoPriority", "TimerSlackNormal" ]
    },
    ...
  ]

When the SCHED_SP_FOREGROUND aggregate profile is applied, all the profiles it contains (HighPerformance, HighIoPriority, TimerSlackNormal) are applied.

In addition, an aggregate profile can contain individual profiles or other aggregate profiles, as long as there is no recursion (a profile that includes itself).
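One way to picture nested aggregate profiles and the no-recursion rule is a depth-first expansion that tracks the aggregates currently being expanded. This is only a sketch of the idea, not the libprocessgroup implementation; AggregateMap, Expand, and FlattenProfile are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>
#include <vector>

// Aggregate profile name -> names of the profiles it contains.
using AggregateMap = std::map<std::string, std::vector<std::string>>;

// Appends the individual profiles reachable from name to out. in_progress
// holds the aggregates currently being expanded, so that an aggregate that
// (directly or indirectly) includes itself is detected.
static bool Expand(const AggregateMap& aggregates, const std::string& name,
                   std::set<std::string>& in_progress, std::vector<std::string>& out) {
    auto it = aggregates.find(name);
    if (it == aggregates.end()) {  // not an aggregate: an individual profile
        out.push_back(name);
        return true;
    }
    if (!in_progress.insert(name).second) return false;  // recursion found
    for (const auto& member : it->second) {
        if (!Expand(aggregates, member, in_progress, out)) return false;
    }
    in_progress.erase(name);
    return true;
}

// Flattens a profile name into individual profile names, in order.
// Returns false if the definitions are recursive.
bool FlattenProfile(const AggregateMap& aggregates, const std::string& name,
                    std::vector<std::string>* out) {
    std::set<std::string> in_progress;
    out->clear();
    return Expand(aggregates, name, in_progress, *out);
}
```

Because a name is removed from in_progress once its expansion finishes, two aggregates that share a member are accepted; only a genuine cycle is rejected.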

5. cgroups initialization

In the second stage of init startup, SetupCgroupsAction is queued:

system/core/init/init.cpp

int SecondStageMain(int argc, char** argv) {
    ...
    am.QueueBuiltinAction(SetupCgroupsAction, "SetupCgroups");
    ...
}

system/core/init/init.cpp

static Result<void> SetupCgroupsAction(const BuiltinArguments&) {
    // Have to create <CGROUPS_RC_DIR> using make_dir function
    // for appropriate sepolicy to be set for it
    make_dir(android::base::Dirname(CGROUPS_RC_PATH), 0711);
    if (!CgroupSetup()) {
        return ErrnoError() << "Failed to setup cgroups";
    }

    return {};
}

SetupCgroupsAction first creates the directory that holds the CGROUPS_RC_PATH file, /dev/cgroup_info/cgroup.rc.

CgroupSetup() then writes the information from the cgroups.json files into cgroup.rc so that task profiles can read the controller information.

 

6. Task profiles

From the code we can see that the TaskProfiles class starts parsing task_profiles.json as soon as it is constructed:

system/core/libprocessgroup/task_profiles.cpp

TaskProfiles::TaskProfiles() {
    // load system task profiles
    if (!Load(CgroupMap::GetInstance(), TASK_PROFILE_DB_FILE)) {
        LOG(ERROR) << "Loading " << TASK_PROFILE_DB_FILE << " for [" << getpid() << "] failed";
    }

    // load vendor task profiles if the file exists
    if (!access(TASK_PROFILE_DB_VENDOR_FILE, F_OK) &&
        !Load(CgroupMap::GetInstance(), TASK_PROFILE_DB_VENDOR_FILE)) {
        LOG(ERROR) << "Loading " << TASK_PROFILE_DB_VENDOR_FILE << " for [" << getpid()
                   << "] failed";
    }
}

Load() is used to parse two files:

  • TASK_PROFILE_DB_FILE (/etc/task_profiles.json)
  • TASK_PROFILE_DB_VENDOR_FILE (/vendor/etc/task_profiles.json)

Load() parses the Attributes, Profiles, and AggregateProfiles sections of task_profiles.json in turn; we will not analyze it further here. Once the task profiles have been parsed, the system applies a profile through SetProcessProfiles() or SetTaskProfiles().

6.1 SetProcessProfiles()

system/core/libprocessgroup/processgroup.cpp

bool SetProcessProfiles(uid_t uid, pid_t pid, const std::vector<std::string>& profiles) {
    return TaskProfiles::GetInstance().SetProcessProfiles(uid, pid, profiles);
}

This is a global function that calls TaskProfiles::SetProcessProfiles() through the TaskProfiles singleton:

system/core/libprocessgroup/task_profiles.cpp

bool TaskProfiles::SetProcessProfiles(uid_t uid, pid_t pid,
                                      const std::vector<std::string>& profiles) {
    for (const auto& name : profiles) {
        TaskProfile* profile = GetProfile(name);
        if (profile != nullptr) {
            if (!profile->ExecuteForProcess(uid, pid)) {
                PLOG(WARNING) << "Failed to apply " << name << " process profile";
            }
        } else {
            PLOG(WARNING) << "Failed to find " << name << "process profile";
        }
    }
    return true;
}

Each name in profiles is looked up to find the corresponding profile, and its ExecuteForProcess() function is then called, as shown in Section 4.2.3 above. For the Frozen profile, the action that ends up running is SetCgroupAction.

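The process is roughly as follows: SetProcessProfiles() -> TaskProfiles::SetProcessProfiles() -> TaskProfile::ExecuteForProcess() -> SetCgroupAction::ExecuteForProcess() -> write the pid to cgroup.procs.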
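The lookup-then-execute pattern can be sketched in a few lines of standalone C++. ProfileRegistry and ProcessAction are hypothetical names for illustration only; in this sketch an unknown profile is skipped, and a profile stops at its first failing action, which is a choice made here rather than a statement about libprocessgroup.

```cpp
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical action type: something that can be applied to a process.
using ProcessAction = std::function<bool(int uid, int pid)>;

class ProfileRegistry {
  public:
    // Registers a profile under name with the actions it should run.
    void Add(const std::string& name, std::vector<ProcessAction> actions) {
        profiles_[name] = std::move(actions);
    }

    // Applies each named profile in order. Returns how many profiles were
    // found and had all of their actions succeed.
    int Apply(int uid, int pid, const std::vector<std::string>& names) const {
        int applied = 0;
        for (const auto& name : names) {
            auto it = profiles_.find(name);
            if (it == profiles_.end()) continue;  // unknown profile: skip it
            bool ok = true;
            for (const auto& action : it->second) {
                if (!action(uid, pid)) {  // stop this profile at first failure
                    ok = false;
                    break;
                }
            }
            if (ok) ++applied;
        }
        return applied;
    }

  private:
    std::map<std::string, std::vector<ProcessAction>> profiles_;
};
```

A "Frozen" entry in such a registry would hold a single action that writes the pid into the freezer cgroup's cgroup.procs file.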

6.2 SetTaskProfiles()

system/core/libprocessgroup/processgroup.cpp

bool SetTaskProfiles(int tid, const std::vector<std::string>& profiles, bool use_fd_cache) {
    return TaskProfiles::GetInstance().SetTaskProfiles(tid, profiles, use_fd_cache);
}

The specific process is the same as the SetProcessProfiles() function, and the ExecuteForTask() function of the profile action is ultimately called.

 

 

At this point, the cgroup abstraction layer in Android has been roughly described. The code logic is quite clear. The main kernel code will be analyzed in detail in a later post. Here is a summary:

  • All cgroup subsystems are configured through cgroups.json, and the Controller names are used later in Attributes and Profiles. There may be several cgroups.json files; depending on the loading order, later definitions override earlier ones;
  • All behavior is configured through task_profiles.json, which builds on the subsystems defined in cgroups.json to define Attributes, Profiles, and AggregateProfiles. These files also have a loading order, with later definitions overriding earlier ones;
  • cgroups.json is parsed in the second stage of init;
  • The system creates a singleton TaskProfiles instance to manage all profiles, and each profile keeps its own actions;
  • Profiles are applied to a process through SetProcessProfiles();
  • Profiles are applied to a thread through SetTaskProfiles();

 

 

 

Origin blog.csdn.net/jingerppp/article/details/131854291