Android shader compilation principle

Author: tmaczhang

1 What is shader compilation jank?

A shader is a piece of code that runs on the GPU (graphics processing unit). When the Skia graphics backend that Flutter uses for rendering sees a new sequence of drawing commands for the first time, it sometimes generates and compiles a custom GPU shader for that sequence. This lets that sequence, and potentially similar ones, render as fast as possible.

Unfortunately, Skia generates and compiles the shader in sequence with the frame workload, so the frame has to wait for it. Compilation can take up to a few hundred milliseconds, whereas on a 60 fps display a smooth frame must be drawn within 16 milliseconds. As a result, a single compilation can cause dozens of frames to be missed, dropping the frame rate from 60 fps to 6 fps. This is the so-called compilation jank. Once compilation is complete, the animation should be smooth.
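
As a rough illustration of this arithmetic, the sketch below computes the per-frame time budget and the number of whole frames that fit inside a stall of a given length. The helper names are made up for this example; they are not Flutter or Skia APIs.

```cpp
#include <cmath>

// Time budget for a single frame, in milliseconds, at the given refresh rate.
double frameBudgetMs(double fps) {
    return 1000.0 / fps;
}

// Number of whole frame intervals that elapse while the render thread is
// blocked for `stallMs` milliseconds. Hypothetical helper, for illustration.
int missedFrames(double stallMs, double fps) {
    if (stallMs <= 0.0 || fps <= 0.0) {
        return 0;
    }
    // stallMs / (1000 / fps), written as a product to limit rounding error.
    return static_cast<int>(std::floor(stallMs * fps / 1000.0));
}
```

With these helpers, a 100 ms compile on a 60 fps display spans 6 frame intervals, and a 300 ms compile spans 18.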

Impeller, on the other hand, generates and compiles all the necessary shaders when the Flutter engine is built. As a result, applications running on Impeller already have all the shaders they need, and those shaders can be used without causing stutter in animations.

For more definitive evidence of shader compilation jank, turn on --trace-skia and look for GrGLProgramBuilder::finalize in the trace file. The screenshot below shows a sample timeline trace.

How to warm up with SkSL

As of the 1.20 release, Flutter provides a command-line tool that lets application developers collect the shaders end users might need, in SkSL (Skia Shading Language) format. The SkSL shaders can then be packaged into the application and warmed up (precompiled) when the end user first opens it, which reduces compilation jank in later animations.

In Flutter, the SkSL shaders are packaged into the application and precompiled in advance, trading space for time to improve performance. Is it possible to do the same on Android?

2 Shader compilation and use in Android

2.1 Original shader logic

Each application keeps two shader cache files in its code_cache directory:

/data/user_de/0/tv.danmaku.bili/code_cache # ls -l
total 56
-r-------- 1 u0_a206 u0_a206_cache 40556 2023-06-30 15:32 com.android.opengl.shaders_cache
-r-------- 1 u0_a206 u0_a206_cache 13304 2023-06-30 15:32 com.android.skia.shaders_cache

frameworks/base/graphics/java/android/graphics/HardwareRenderer.java

/**
 * Name of the file that holds the shaders cache.
 */
private static final String CACHE_PATH_SHADERS = "com.android.opengl.shaders_cache";
private static final String CACHE_PATH_SKIASHADERS = "com.android.skia.shaders_cache";



/**
 * Sets the directory to use as a persistent storage for threaded rendering
 * resources.
 *
 * @param cacheDir A directory the current process can write to
 * @hide
 */
public static void setupDiskCache(File cacheDir) {
    setupShadersDiskCache(new File(cacheDir, CACHE_PATH_SHADERS).getAbsolutePath(),
            new File(cacheDir, CACHE_PATH_SKIASHADERS).getAbsolutePath());
}

The JNI implementation hands the first path to the EGL blob cache and the second to HWUI's Skia ShaderCache:

static void android_view_ThreadedRenderer_setupShadersDiskCache(JNIEnv* env, jobject clazz,
        jstring diskCachePath, jstring skiaDiskCachePath) {
    const char* cacheArray = env->GetStringUTFChars(diskCachePath, NULL);
    android::egl_set_cache_filename(cacheArray);
    env->ReleaseStringUTFChars(diskCachePath, cacheArray);

    const char* skiaCacheArray = env->GetStringUTFChars(skiaDiskCachePath, NULL);
    uirenderer::skiapipeline::ShaderCache::get().setFilename(skiaCacheArray);
    env->ReleaseStringUTFChars(skiaDiskCachePath, skiaCacheArray);
}

2.2 Introduction to Skia

When the render thread is initialized, these cache paths are set up and passed down to native code. So how do the shaders get saved there? To answer that, we need to introduce today's protagonist: the Skia library.

In the Android source tree, Skia is located at external/skia/.

Official description: SkSL is Skia's shading language. SkRuntimeEffect is a Skia C++ object that can be used to create SkShader, SkColorFilter, and SkBlender objects whose behavior is controlled by SkSL code. You can try out SkSL at https://shaders.skia.org/. The syntax is very similar to GLSL, but there are important differences to keep in mind when using SkSL effects in your Skia application. Most of these differences come from one fundamental fact: with a GPU shading language, you are programming a stage of the GPU pipeline; with SkSL, you are programming a stage of the Skia pipeline.

float f(vec3 p) {
    p.z -= iTime * 10.;
    float a = p.z * .1;
    p.xy *= mat2(cos(a), sin(a), -sin(a), cos(a));
    return .1 - length(cos(p.xy) + sin(p.yz));
}

half4 main(vec2 fragcoord) { 
    vec3 d = .5 - fragcoord.xy1 / iResolution.y;
    vec3 p=vec3(0);
    for (int i = 0; i < 32; i++) {
      p += f(p) * d;
    }
    return ((sin(p) + vec3(2, 5, 9)) / length(p)).xyz1;
}

Shader and Program are two important concepts. Rendering with shaders requires at least one vertex shader object, one fragment shader object, and one program object. The easiest way to understand shader objects and program objects is to compare them to a C compiler and linker. There are six basic steps from shader creation to program linking: create the shader, load the shader source, compile the shader, create the program, attach the shaders to the program, and link the program. After that, the program is ready to use.

On Android, once the shader has been compiled and linked, the result is ultimately stored in the cache files in the directory shown above.

2.3 Compile and link process

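When the application starts and Skia builds a GL program that is not yet cached, it stores the result through storeShaderInCache: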

external/skia/src/gpu/gl/builders/GrGLProgramBuilder.cpp

void GrGLProgramBuilder::storeShaderInCache(const SkSL::Program::Inputs& inputs, GrGLuint programID,
                                            const std::string shaders[], bool isSkSL,
                                            SkSL::Program::Settings* settings) {
    if (!this->gpu()->getContext()->priv().getPersistentCache()) {
        return;
    }
    sk_sp<SkData> key = SkData::MakeWithoutCopy(this->desc().asKey(), this->desc().keyLength());
    SkString description = GrProgramDesc::Describe(fProgramInfo, *fGpu->caps());
    if (fGpu->glCaps().programBinarySupport()) {
        // binary cache
        GrGLsizei length = 0;
        GL_CALL(GetProgramiv(programID, GL_PROGRAM_BINARY_LENGTH, &length));
        if (length > 0) {
            SkBinaryWriteBuffer writer;
            writer.writeInt(GrPersistentCacheUtils::GetCurrentVersion());
            writer.writeUInt(kGLPB_Tag);

            writer.writePad32(&inputs, sizeof(inputs));

            SkAutoSMalloc<2048> binary(length);
            GrGLenum binaryFormat;
            GL_CALL(GetProgramBinary(programID, length, &length, &binaryFormat, binary.get()));

            writer.writeUInt(binaryFormat);
            writer.writeInt(length);
            writer.writePad32(binary.get(), length);

            auto data = writer.snapshotAsData();
            this->gpu()->getContext()->priv().getPersistentCache()->store(*key, *data, description);
        }
    } else {
        // source cache, plus metadata to allow for a complete precompile
        GrPersistentCacheUtils::ShaderMetadata meta;
        meta.fSettings = settings;
        meta.fHasCustomColorOutput = fFS.hasCustomColorOutput();
        meta.fHasSecondaryColorOutput = fFS.hasSecondaryOutput();
        for (auto attr : this->geometryProcessor().vertexAttributes()) {
            meta.fAttributeNames.emplace_back(attr.name());
        }
        for (auto attr : this->geometryProcessor().instanceAttributes()) {
            meta.fAttributeNames.emplace_back(attr.name());
        }

        auto data = GrPersistentCacheUtils::PackCachedShaders(isSkSL ? kSKSL_Tag : kGLSL_Tag,
                                                              shaders, &inputs, 1, &meta);
        this->gpu()->getContext()->priv().getPersistentCache()->store(*key, *data, description);
    }
}

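Note that there are two storage formats here. The first branch, used when the driver supports program binaries, stores the compiled program binary. The second branch stores the shader source (SkSL or GLSL) together with the metadata needed for a complete precompile.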
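
To make the binary branch easier to picture, here is a simplified sketch of the record layout it writes: version, tag, binary format, length, then the binary padded to a 4-byte boundary. This is not Skia code. The tag value and function names are hypothetical, and the real record also embeds the SkSL::Program::Inputs struct, which is left out here.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical tag value for this example; Skia defines its own kGLPB_Tag.
constexpr uint32_t kExampleBinaryTag = 0x50424C47u;

// Append a 32-bit value in host byte order.
void writeU32(std::vector<uint8_t>& out, uint32_t value) {
    uint8_t bytes[4];
    std::memcpy(bytes, &value, sizeof(bytes));
    out.insert(out.end(), bytes, bytes + sizeof(bytes));
}

// Append raw bytes, then zero-pad up to the next 4-byte boundary.
// Assumes `out` was 4-byte aligned before the call.
void writePad32(std::vector<uint8_t>& out, const uint8_t* data, size_t len) {
    out.insert(out.end(), data, data + len);
    while (out.size() % 4 != 0) {
        out.push_back(0);
    }
}

// Pack one simplified "program binary" cache record.
std::vector<uint8_t> packBinaryRecord(uint32_t version, uint32_t binaryFormat,
                                      const std::vector<uint8_t>& binary) {
    std::vector<uint8_t> out;
    writeU32(out, version);            // cache format version
    writeU32(out, kExampleBinaryTag);  // marks the entry as a program binary
    writeU32(out, binaryFormat);       // driver-specific binary format
    writeU32(out, static_cast<uint32_t>(binary.size()));
    writePad32(out, binary.data(), binary.size());
    return out;
}
```

A reader of such a record can check the tag first to decide whether the payload is a driver binary or shader source, which is why both branches write a tag near the start of the entry.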

frameworks/base/libs/hwui/pipeline/skia/ShaderCache.cpp

void ShaderCache::store(const SkData& key, const SkData& data, const SkString& /*description*/) {
    ATRACE_NAME("ShaderCache::store");
    std::lock_guard<std::mutex> lock(mMutex);
    mNumShadersCachedInRam++;
    ATRACE_FORMAT("HWUI RAM cache: %d shaders", mNumShadersCachedInRam);

    if (!mInitialized) {
        return;
    }

    size_t valueSize = data.size();
    size_t keySize = key.size();
    if (keySize == 0 || valueSize == 0 || valueSize >= maxValueSize) {
        ALOGW("ShaderCache::store: sizes %d %d not allowed", (int)keySize, (int)valueSize);
        return;
    }

    const void* value = data.data();

    BlobCache* bc = getBlobCacheLocked();
    if (mInStoreVkPipelineInProgress) {
        if (mOldPipelineCacheSize == -1) {
            // Record the initial pipeline cache size stored in the file.
            mOldPipelineCacheSize = bc->get(key.data(), keySize, nullptr, 0);
        }
        if (mNewPipelineCacheSize != -1 && mNewPipelineCacheSize == valueSize) {
            // There has not been change in pipeline cache size. Stop trying to save.
            mTryToStorePipelineCache = false;
            return;
        }
        mNewPipelineCacheSize = valueSize;
    } else {
        mCacheDirty = true;
        // If there are new shaders compiled, we probably have new pipeline state too.
        // Store pipeline cache on the next flush.
        mNewPipelineCacheSize = -1;
        mTryToStorePipelineCache = true;
    }
    set(bc, key.data(), keySize, value, valueSize);

    if (!mSavePending && mDeferredSaveDelayMs > 0) {
        mSavePending = true;
        std::thread deferredSaveThread([this]() {
            usleep(mDeferredSaveDelayMs * 1000);  // milliseconds to microseconds
            std::lock_guard<std::mutex> lock(mMutex);
            // Store file on disk if there a new shader or Vulkan pipeline cache size changed.
            if (mCacheDirty || mNewPipelineCacheSize != mOldPipelineCacheSize) {
                saveToDiskLocked();
                mOldPipelineCacheSize = mNewPipelineCacheSize;
                mTryToStorePipelineCache = false;
                mCacheDirty = false;
            }
            mSavePending = false;
        });
        deferredSaveThread.detach();
    }
}

Finally, saveToDiskLocked saves the cache to the local path, that is, /data/user_de/0/${packagename}/code_cache/com.android.skia.shaders_cache.

frameworks/native/opengl/libs/EGL/FileBlobCache.cpp

void FileBlobCache::writeToFile() {
    if (mFilename.length() > 0) {
        size_t cacheSize = getFlattenedSize();
        size_t headerSize = cacheFileHeaderSize;
        const char* fname = mFilename.c_str();

        // Try to create the file with no permissions so we can write it
        // without anyone trying to read it.
        int fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0);
        if (fd == -1) {
            if (errno == EEXIST) {
                // The file exists, delete it and try again.
                if (unlink(fname) == -1) {
                    // No point in retrying if the unlink failed.
                    ALOGE("error unlinking cache file %s: %s (%d)", fname,
                            strerror(errno), errno);
                    return;
                }
                // Retry now that we've unlinked the file.
                fd = open(fname, O_CREAT | O_EXCL | O_RDWR, 0);
            }
            if (fd == -1) {
                ALOGE("error creating cache file %s: %s (%d)", fname,
                        strerror(errno), errno);
                return;
            }
        }

        size_t fileSize = headerSize + cacheSize;

        uint8_t* buf = new uint8_t [fileSize];
        if (!buf) {
            ALOGE("error allocating buffer for cache contents: %s (%d)",
                    strerror(errno), errno);
            close(fd);
            unlink(fname);
            return;
        }

        int err = flatten(buf + headerSize, cacheSize);
        if (err < 0) {
            ALOGE("error writing cache contents: %s (%d)", strerror(-err),
                    -err);
            delete [] buf;
            close(fd);
            unlink(fname);
            return;
        }

        // Write the file magic and CRC
        memcpy(buf, cacheFileMagic, 4);
        uint32_t* crc = reinterpret_cast<uint32_t*>(buf + 4);
        *crc = crc32c(buf + headerSize, cacheSize);

        if (write(fd, buf, fileSize) == -1) {
            ALOGE("error writing cache file: %s (%d)", strerror(errno),
                    errno);
            delete [] buf;
            close(fd);
            unlink(fname);
            return;
        }

        delete [] buf;
        fchmod(fd, S_IRUSR);
        close(fd);
    }
}

In the end, the data is written to the file by FileBlobCache::writeToFile.
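
The file layout produced by writeToFile above is a 4-byte magic, a 4-byte CRC of the contents, and then the flattened cache contents. The sketch below builds and validates an image with that layout. It is not the AOSP implementation: the magic value "EGL$", the 8-byte header size, and the exact CRC-32C variant are assumptions made for illustration, and the function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

constexpr char kMagic[4] = {'E', 'G', 'L', '$'};  // assumed magic value
constexpr size_t kHeaderSize = 8;                 // 4-byte magic + 4-byte CRC

// Bitwise CRC-32C (Castagnoli). Illustrative only; the AOSP helper may
// differ in its initial value or final XOR.
uint32_t crc32c(const uint8_t* buf, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; ++i) {
        crc ^= buf[i];
        for (int bit = 0; bit < 8; ++bit) {
            crc = (crc & 1u) ? (crc >> 1) ^ 0x82F63B78u : (crc >> 1);
        }
    }
    return ~crc;
}

// Build a file image: magic, CRC of the payload, then the payload itself.
std::vector<uint8_t> buildCacheFile(const std::vector<uint8_t>& payload) {
    std::vector<uint8_t> file(kHeaderSize + payload.size());
    std::memcpy(file.data(), kMagic, sizeof(kMagic));
    const uint32_t crc = crc32c(payload.data(), payload.size());
    std::memcpy(file.data() + 4, &crc, sizeof(crc));
    if (!payload.empty()) {
        std::memcpy(file.data() + kHeaderSize, payload.data(), payload.size());
    }
    return file;
}

// Return true if the image has the expected magic and a matching CRC.
bool isValidCacheFile(const std::vector<uint8_t>& file) {
    if (file.size() < kHeaderSize) {
        return false;
    }
    if (std::memcmp(file.data(), kMagic, sizeof(kMagic)) != 0) {
        return false;
    }
    uint32_t stored = 0;
    std::memcpy(&stored, file.data() + 4, sizeof(stored));
    return stored == crc32c(file.data() + kHeaderSize, file.size() - kHeaderSize);
}
```

The checksum is what lets a loader reject a truncated or corrupted cache file and fall back to compiling from scratch instead of feeding bad data to the driver.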

2.4 Shader file usage principle

When the render thread is created, the shader cache file is read into memory. Later, when the application draws and a Program has to be created, Skia calls builder.fCached = persistentCache->load(*key) to look the program up in the shader cache. On a cache hit, Skia reuses the stored result instead of repeating the full compile-and-link work. This is how space is traded for time.

external/skia/src/gpu/gl/builders/GrGLProgramBuilder.cpp

sk_sp<GrGLProgram> GrGLProgramBuilder::CreateProgram(
                                               GrDirectContext* dContext,
                                               const GrProgramDesc& desc,
                                               const GrProgramInfo& programInfo,
                                               const GrGLPrecompiledProgram* precompiledProgram) {
    TRACE_EVENT0_ALWAYS("skia.shaders", "shader_compile");
    GrAutoLocaleSetter als("C");

    GrGLGpu* glGpu = static_cast<GrGLGpu*>(dContext->priv().getGpu());

    // create a builder.  This will be handed off to effects so they can use it to add
    // uniforms, varyings, textures, etc
    GrGLProgramBuilder builder(glGpu, desc, programInfo);

    auto persistentCache = dContext->priv().getPersistentCache();
    if (persistentCache && !precompiledProgram) {
        sk_sp<SkData> key = SkData::MakeWithoutCopy(desc.asKey(), desc.keyLength());
        builder.fCached = persistentCache->load(*key);
        // the eventual end goal is to completely skip emitAndInstallProcs on a cache hit, but it's
        // doing necessary setup in addition to generating the SkSL code. Currently we are only able
        // to skip the SkSL->GLSL step on a cache hit.
    }
    if (!builder.emitAndInstallProcs()) {
        return nullptr;
    }
    return builder.finalize(precompiledProgram);
}




sk_sp<GrGLProgram> GrGLProgramBuilder::finalize(const GrGLPrecompiledProgram* precompiledProgram) {
    TRACE_EVENT0("skia.shaders", TRACE_FUNC);
    // logic omitted
    bool cached = fCached.get() != nullptr;
    if (precompiledProgram) {
        // logic omitted
    } else if (cached) {
        TRACE_EVENT0_ALWAYS("skia.shaders", "cache_hit");
        SkReadBuffer reader(fCached->data(), fCached->size());
        // logic omitted
    }
    // logic omitted
}
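
The lookup-before-compile idea in this section can be reduced to a small sketch. It is not Skia code: ProgramCache and its members are hypothetical, the blob is just a string, and a real cache hit in Skia still performs some setup work, as the comment in CreateProgram notes.

```cpp
#include <functional>
#include <map>
#include <string>

// Hypothetical persistent cache: maps a program key to a stored blob.
class ProgramCache {
public:
    // Return the cached blob for `key`. On a miss, call `compile` once,
    // remember the result, and return it.
    const std::string& getOrCompile(const std::string& key,
                                    const std::function<std::string()>& compile) {
        auto it = mBlobs.find(key);
        if (it != mBlobs.end()) {
            ++mHits;  // cache hit: the expensive compile is skipped
            return it->second;
        }
        ++mMisses;  // cache miss: compile now and store for next time
        return mBlobs.emplace(key, compile()).first->second;
    }

    int hits() const { return mHits; }
    int misses() const { return mMisses; }

private:
    std::map<std::string, std::string> mBlobs;
    int mHits = 0;
    int mMisses = 0;
};
```

The first request for a key pays the compile cost, and every later request for the same key is served from the map. The on-disk cache extends the same benefit across process restarts.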

3 Shader preloading in Android

As described earlier, the shader cache file is created the first time a shader is used, so that first use always has to go through the six steps of compiling and linking. In a trace, this shows up as the render thread taking too long, which causes jank.

We also saw that Flutter can collect SkSL in advance, package it into the APK, and preload it to reduce stuttering. Can Android do the same?

The answer is obviously yes. Let's look at how Flutter does it:

size_t PersistentCache::PrecompileKnownSkSLs(GrDirectContext* context) const {
  // clang-tidy has trouble reasoning about some of the complicated array and
  // pointer-arithmetic code in rapidjson.
  // NOLINTNEXTLINE(clang-analyzer-cplusplus.PlacementNew)
  auto known_sksls = LoadSkSLs();
  // A trace must be present even if no precompilations have been completed.
  FML_TRACE_EVENT("flutter", "PersistentCache::PrecompileKnownSkSLs", "count",
                  known_sksls.size());

  if (context == nullptr) {
    return 0;
  }

  size_t precompiled_count = 0;
  for (const auto& sksl : known_sksls) {
    TRACE_EVENT0("flutter", "PrecompilingSkSL");
    if (context->precompileShader(*sksl.key, *sksl.value)) {
      precompiled_count++;
    }
  }

  FML_TRACE_COUNTER("flutter", "PersistentCache::PrecompiledSkSLs",
                    reinterpret_cast<int64_t>(this),  // Trace Counter ID
                    "Successful", precompiled_count);
  return precompiled_count;
}

Following the same approach, the steps on Android would be:

(1) Collect the SkSL shader files. Source code or binary? This question is left to the reader.

(2) Package them into the APK.

(3) During initialization, load the packaged SkSL, call the shader precompile interface to compile it into memory, and save the result locally.
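
Step (3) can be sketched along the lines of Flutter's PrecompileKnownSkSLs shown above. This is a minimal sketch under stated assumptions, not a working Android integration: BundledShader and precompileBundledShaders are hypothetical names, and the callback stands in for an engine call such as GrDirectContext::precompileShader.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// One shader entry shipped with the application: cache key plus shader data.
struct BundledShader {
    std::string key;
    std::string data;
};

// Signature of the precompile hook. Hypothetical; it stands in for the
// engine's real precompile interface.
using PrecompileFn = std::function<bool(const std::string& key, const std::string& data)>;

// Feed every bundled entry to the precompile hook and count the successes.
size_t precompileBundledShaders(const std::vector<BundledShader>& shaders,
                                const PrecompileFn& precompile) {
    if (!precompile) {
        return 0;  // no rendering context available yet
    }
    size_t precompiledCount = 0;
    for (const auto& shader : shaders) {
        if (precompile(shader.key, shader.data)) {
            ++precompiledCount;
        }
    }
    return precompiledCount;
}
```

Returning the count, as Flutter does, makes it easy to log or trace how many of the bundled shaders were actually accepted on a given device.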


Origin blog.csdn.net/weixin_61845324/article/details/131494349