Head pose estimation - Android

Overview

First, Dlib detects the facial landmarks in the current frame. The landmarks of a standard 3D face model are then fitted to them by rotating and translating the model: the difference between the model's landmarks and the landmarks obtained by Dlib is computed, Ceres minimizes this difference iteratively, and the optimal rotation and translation parameters are obtained at the end.
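Written out, this fitting is a non-linear least-squares problem. In the sketch below, p_i is the i-th Dlib landmark, X_i the i-th model landmark, N the number of landmarks, and π stands for whatever 3D-to-2D mapping the project's landmarks_3d_to_2d performs. Splitting the pose parameters into rotation angles θ and a translation t, applied as "rotate, then translate", is an assumption about how the parameter vector is laid out. The factor 1/2 is the convention Ceres uses for the cost it reports.

```latex
\min_{\theta,\,\mathbf{t}} \; \frac{1}{2} \sum_{i=1}^{N} r_i^2,
\qquad
r_i = \left\lVert \mathbf{p}_i - \pi\!\left( R(\theta)\,\mathbf{X}_i + \mathbf{t} \right) \right\rVert_2
```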

The Android version works on the same principle as the C++ version: Head pose estimation - OpenCV / Dlib / Ceres.

This post describes the problems encountered while porting it.

Environment

System environment: Ubuntu 18.04

Java environment: JRE 1.8.0

Languages: C++ (Clang), Java

Build tools: Android Studio 3.4.1, plus:

  • CMake 3.10.2
  • LLDB
  • NDK 20.0

All of these tools can be installed through the SDK Manager in Android Studio.

Third-party tools

Dlib: used to detect facial landmarks
Ceres: used for non-linear optimization

Source code

https://github.com/Great-Keith/head-pose-estimation/tree/master/android/landmark-fitting

Preparation

Android bindings for the third-party libraries

Dlib

We use the ready-made interface available on GitHub: https://github.com/tzutalin/dlib-android

That project also provides a sample app: https://github.com/tzutalin/dlib-android-app/

Our app is built on top of this sample app.

Ceres

For how to use it, see my earlier post: Using Ceres Solver on the Android platform.

Integrating Dlib and Ceres gives us the basic framework of our app: https://github.com/Great-Keith/dlib-android-app

Adding front camera support

Adding a camera switch button

The original dlib-android-app sample only uses the rear camera, which makes testing on yourself inconvenient, so we modify the code to add a button that switches between the front and rear cameras.

First, in the camera view layout res/layout/camera_connection_fragment.xml, add a Switch button in the upper-right corner.

Next, looking at how the app is implemented: it creates its own CameraConnectionFragment class to replace the original Fragment and performs the camera operations there. That class's setUpCameraOutputs method handles camera selection; it iterates over all cameras available on the device and prefers the rear camera.

We add a boolean parameter b to this method to select the camera:

if (b) {
    // Use the rear camera only
    // If facing back camera or facing external camera exist, we won't use facing front camera
    if (num_facing_back_camera != null && num_facing_back_camera > 0) {
        // Skip front cameras (when a rear camera exists)
        if (facing != null && facing == CameraCharacteristics.LENS_FACING_FRONT) {
            continue;
        }
    }
} else {
    // Use the front camera only
    if (num_facing_front_camera != null && num_facing_front_camera > 0) {
        // Skip rear cameras (when a front camera exists)
        if (facing != null && facing == CameraCharacteristics.LENS_FACING_BACK) {
            continue;
        }
    }
}
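The selection rule above can be checked without a device. The following pure-Java sketch replays it over a list of lens-facing values. CameraPicker, pick, and the FRONT/BACK constants are hypothetical stand-ins for the loop in setUpCameraOutputs and the CameraCharacteristics.LENS_FACING_* values; the real method also configures the chosen camera, which is omitted here.

```java
import java.util.List;

/** Hypothetical, device-free sketch of the camera choice in setUpCameraOutputs. */
final class CameraPicker {
    static final int FRONT = 0; // stand-in for CameraCharacteristics.LENS_FACING_FRONT
    static final int BACK = 1;  // stand-in for CameraCharacteristics.LENS_FACING_BACK

    /** Returns the index of the first usable camera, or -1 if there is none. */
    static int pick(List<Integer> facings, boolean useBack) {
        long backCount = facings.stream().filter(f -> f == BACK).count();
        long frontCount = facings.stream().filter(f -> f == FRONT).count();
        for (int i = 0; i < facings.size(); i++) {
            int facing = facings.get(i);
            // Rear requested: skip front cameras, but only if a rear camera exists
            if (useBack && backCount > 0 && facing == FRONT) continue;
            // Front requested: skip rear cameras, but only if a front camera exists
            if (!useBack && frontCount > 0 && facing == BACK) continue;
            return i;
        }
        return -1;
    }
}
```

With a rear and a front camera, pick(..., true) returns the rear one and pick(..., false) the front one; on a device with only a rear camera, requesting the front camera falls back to the rear one instead of finding nothing.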

Then, during initialization, attach a listener to our Switch button:

        switchBtn = view.findViewById(R.id.cameraSwitch);
        switchBtn.setOnCheckedChangeListener(new CompoundButton.OnCheckedChangeListener() {
            @Override
            public void onCheckedChanged(CompoundButton compoundButton, boolean b) {
                closeCamera();
                openCamera(textureView.getWidth(), textureView.getHeight(), b);
            }
        });

[NOTE] openCamera passes the parameter b on to setUpCameraOutputs.

Fixing the inverted front camera preview

After this change, running the program shows that the preview window is upside down, so the preview image has to be flipped.

Find OnGetImageListener, the listener class that processes the images captured by the camera. Its drawResizedBitmap function handles each captured frame; before the final draw, add the flip to the matrix:

/* If using front camera, matrix should rotate 180 */
if(!switchBtn.isChecked()) {
    matrix.postTranslate(-dst.getWidth() / 2.0f, -dst.getHeight() / 2.0f);
    matrix.postRotate(180);
    matrix.postTranslate(dst.getWidth() / 2.0f, dst.getHeight() / 2.0f);
}
final Canvas canvas = new Canvas(dst);
canvas.drawBitmap(src, matrix, null);
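The three post* calls amount to a 180° rotation about the centre of dst. Assuming matrix already maps src into dst coordinates, each post* call is applied after it, so the rotation happens in dst space, which is why dst's width and height are used. The pure-Java sketch below applies the same three steps to a single point by hand (Rotate180 and apply are hypothetical names; android.graphics.Matrix is not used so that it runs off-device).

```java
/** Hypothetical helper reproducing translate(-c), rotate(180), translate(+c) on one point. */
final class Rotate180 {
    static double[] apply(double x, double y, double w, double h) {
        // 1. Move the bitmap centre (w/2, h/2) to the origin
        double tx = x - w / 2.0;
        double ty = y - h / 2.0;
        // 2. Rotate by 180 degrees: (u, v) -> (-u, -v)
        double rx = -tx;
        double ry = -ty;
        // 3. Move the origin back to the centre; overall p -> 2c - p = (w - x, h - y)
        return new double[] { rx + w / 2.0, ry + h / 2.0 };
    }
}
```

For a 640×480 frame, the top-left corner (0, 0) lands on (640, 480) and the centre stays fixed, i.e. the image is flipped both horizontally and vertically.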

[NOTE] There is no direct way to get the CameraId inside this class to tell whether the front or the rear camera is in use, so we rely on the state of the Switch button instead. It might be possible to check this through the Camera class, but that class is deprecated as of API 21.

Main part

Still in the camera listener, we can see where the landmark data obtained by dlib is drawn:

mInferenceHandler.post(
                new Runnable() {
                    @Override
                    public void run() {
                        // ...

                        long startTime = System.currentTimeMillis();
                        List<VisionDetRet> results;
                        synchronized (OnGetImageListener.this) {
                            results = mFaceDet.detect(mCroppedBitmap);
                        }
                        long endTime = System.currentTimeMillis();
                        mTransparentTitleView.setText("Time cost: " + String.valueOf((endTime - startTime) / 1000f) + " sec");
                        // Draw on bitmap
                        if (results != null) {
                            for (final VisionDetRet ret : results) {
                                // Draw the face box and landmarks
                                // ...
                            }
                        }
                        mWindow.setRGBBitmap(mCroppedBitmap);
                        mIsComputing = false;
                    }
                });

We add the optimization inside the for loop that draws the face box and landmarks.

First, copy the landmarks into a Point array that is passed as an argument.

/* Transform landmarks to array, which is needed by JNI */
Point[] tmp = landmarks.toArray(new Point[0]);

With double x[] initialized, we call our CeresSolver class to run the optimization; the optimal solution is written back into x.

CeresSolver.solve(x, tmp);

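Finally, we call two more methods: one transforms the 3D model landmarks with the solution, and the other maps the resulting 3D points to 2D.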

Point3f[] points3f = CeresSolver.transform(x);
Point[] points2d = CeresSolver.transformTo2d(points3f);

[NOTE] 2D points in the project use android.graphics.Point (corresponding to dlib::point in the C++ version), while 3D points use our own Point3f class (corresponding to dlib::vector<double, 3> in the C++ version).

In short, what we actually need to implement is a utility class, CeresSolver, that provides Ceres support. It is described below.

The CeresSolver class and its JNI interface

Initialization

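We need to read the 3D coordinates of the standard model's landmarks, which are stored in landmarks.txt. In the Android project this file goes in the assets directory, and it is loaded during CameraActivity's onCreate (AssetManager.open throws IOException, so the call has to be wrapped in a try/catch):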

CeresSolver.init(getResources().getAssets().open("landmarks.txt"));

The initialization works as follows:

public static void init(InputStream in) {
    // Each line of landmarks.txt holds one 3D point: "x y z"
    try (BufferedReader bufReader = new BufferedReader(new InputStreamReader(in))) {
        String line;
        int i = 0;
        while ((line = bufReader.readLine()) != null && i < modelLandmarks.length) {
            line = line.trim();
            if (line.isEmpty()) continue;            // skip blank lines
            String[] nums = line.split("\\s+");      // tolerate repeated spaces
            modelLandmarks[i++] = new Point3f(Double.parseDouble(nums[0]),
                                              Double.parseDouble(nums[1]),
                                              Double.parseDouble(nums[2]));
        }
    } catch (Exception e) {
        Log.e(TAG, "Loading model landmarks from file failed.", e);
        return;                                       // don't pass incomplete data to JNI
    }
    Log.i(TAG, "Loading model landmarks from file succeeded.");
    init_();
}

init_ is a JNI function that copies the modelLandmarks data read by the CeresSolver class into the native variable model_landmarks, and also looks up some jmethodID and jfieldID values in advance.

[NOTE] The Java-side modelLandmarks could also be read on demand through a jmethodID or jfieldID, but I am not yet sure how the two approaches compare in efficiency.

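[NOTE] Reading these values ahead of time in the cpp file and caching them in static variables caused problems: some cached JNI values became invalid, which I first attributed to Java's garbage collection. Cached jfieldID values generally kept working, but a cached jclass did not, so the jclass could not be initialized in advance. The underlying reason is that FindClass returns a local reference, which is only valid until the native method returns; a jclass that must survive across calls has to be promoted with NewGlobalRef (and released with DeleteGlobalRef), whereas jfieldID and jmethodID are not references and stay valid as long as the class remains loaded.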

Solving the least-squares problem

As in the C++ version, define the CostFunctor in advance:

struct CostFunctor {
public:
    explicit CostFunctor(JNIEnv *_env, jobjectArray _shape) : env(_env), shape(_shape) {}

    bool operator()(const double* const x, double* residual) const {
        /* Init landmarks to be transformed */
        fitting_landmarks.clear();
        for (auto &model_landmark : model_landmarks)
            fitting_landmarks.push_back(model_landmark);
        transform(fitting_landmarks, x);
        std::vector<Point2d> model_landmarks_2d;
        landmarks_3d_to_2d(fitting_landmarks, model_landmarks_2d);

        /* Calculate the energy (Euclidean distance between the two points) */
        for (unsigned long i = 0; i < LANDMARK_NUM; i++) {
            jobject point = env->GetObjectArrayElement(shape, static_cast<jsize>(i));
            /* Keep the differences as double so the sub-pixel part is not truncated */
            double dx = env->GetIntField(point, getX2d) - model_landmarks_2d.at(i).x;
            double dy = env->GetIntField(point, getY2d) - model_landmarks_2d.at(i).y;
            /* Ceres calls this functor many times per solve, so free each local ref */
            env->DeleteLocalRef(point);
            residual[i] = sqrt(dx * dx + dy * dy);
        }
        return true;
    }
private:
    JNIEnv *env;
    jobjectArray shape; /* 2D landmark coordinates detected by dlib */
};

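This is essentially the same as in C++. The only difference is that shape uses the JNI type jobjectArray directly, and reading its elements requires JNI calls, so the JNIEnv has to be passed in when the functor is constructed. Because a JNIEnv pointer is only valid on the thread it belongs to, Ceres should evaluate the functor on the calling thread, as it does with the default Solver::Options::num_threads = 1.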

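The rest of the calling code is basically the same as in C++; every JNI function has to convert types when parameters are passed in and results are passed out.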
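The residual loop in CostFunctor can be reproduced in plain Java, which makes it easy to check the numbers off-device. In this sketch, LandmarkResiduals is a hypothetical helper rather than part of the project: detected points are integer pixel coordinates like android.graphics.Point, projected points are doubles, and cost() returns the value Ceres would report for these residuals with no loss function.

```java
/** Hypothetical helper mirroring the residual loop of CostFunctor. */
final class LandmarkResiduals {
    /** residual[i] = Euclidean distance between detected[i] and projected[i]. */
    static double[] compute(int[][] detected, double[][] projected) {
        if (detected.length != projected.length) {
            throw new IllegalArgumentException("landmark counts differ");
        }
        double[] residual = new double[detected.length];
        for (int i = 0; i < detected.length; i++) {
            // Differences stay double; an integer type would drop the sub-pixel part
            double dx = detected[i][0] - projected[i][0];
            double dy = detected[i][1] - projected[i][1];
            residual[i] = Math.hypot(dx, dy);
        }
        return residual;
    }

    /** Cost as Ceres reports it for a plain residual block: 0.5 * sum(r_i^2). */
    static double cost(double[] residual) {
        double sum = 0;
        for (double r : residual) sum += r * r;
        return 0.5 * sum;
    }
}
```

For example, a detected point (1, 0) against a projected point (0.5, 0) gives a residual of 0.5, whereas storing the difference in a long, as the original loop did, would have produced 0.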

Coordinate transformation

This covers rotating and translating the 3D points and converting 3D points to 2D, as in the C++ version.

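Additional JNI interfaces have to be provided for the Java classes, mainly for calling jobject methods and accessing fields. Alternatively, these methods could be implemented in Java, which I suspect would be more efficient. See the source code for details; it is thoroughly commented.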
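A Java-side version of the transforms might look like the minimal sketch below, which rests on explicit assumptions: x[0..2] are rotation angles in radians about the x, y and z axes, applied in that order; x[3..5] is the translation; and the 2D mapping is a plain orthographic projection that drops z. The project's own transform and landmarks_3d_to_2d may use a different parameterization or projection, and Vec3/PoseTransform are hypothetical names rather than the project's Point3f and CeresSolver.

```java
/** Hypothetical 3D point; the project's own class is Point3f. */
final class Vec3 {
    final double x, y, z;
    Vec3(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
}

/** Hypothetical pose transform; parameter layout and projection are assumptions. */
final class PoseTransform {
    /** Rotates p about x, then y, then z (angles x[0..2]) and adds t = x[3..5]. */
    static Vec3 transform(Vec3 p, double[] x) {
        double cx = Math.cos(x[0]), sx = Math.sin(x[0]);
        double cy = Math.cos(x[1]), sy = Math.sin(x[1]);
        double cz = Math.cos(x[2]), sz = Math.sin(x[2]);
        // Rotation about the x axis
        double x1 = p.x;
        double y1 = cx * p.y - sx * p.z;
        double z1 = sx * p.y + cx * p.z;
        // Rotation about the y axis
        double x2 = cy * x1 + sy * z1;
        double y2 = y1;
        double z2 = -sy * x1 + cy * z1;
        // Rotation about the z axis
        double x3 = cz * x2 - sz * y2;
        double y3 = sz * x2 + cz * y2;
        double z3 = z2;
        // Translation
        return new Vec3(x3 + x[3], y3 + x[4], z3 + x[5]);
    }

    /** Orthographic projection (assumption): keep x and y, drop z. */
    static double[] project(Vec3 p) {
        return new double[] { p.x, p.y };
    }
}
```

Applying transform to every model landmark and then project yields the 2D points that the residuals are compared against.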

Printing information (Debug)

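An Android project produces a lot of output, which makes debugging fairly difficult, so log statements have to be used deliberately to get the information you need. In Java code, android.util.Log can be used for output, which can be viewed in Logcat or the Run window. For example: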

Log.i(TAG, String.format("After Solve x: %f %f %f %f %f %f", 
                          x[0], x[1], x[2], x[3], x[4], x[5]));

In the JNI cpp file, define the following macros for output:

#include <android/log.h>

#define TAG        "CERES-CPP"
#define LOGD(...)  __android_log_print(ANDROID_LOG_DEBUG, TAG,__VA_ARGS__)
#define LOGI(...)  __android_log_print(ANDROID_LOG_INFO,  TAG,__VA_ARGS__)
#define LOGW(...)  __android_log_print(ANDROID_LOG_WARN,  TAG,__VA_ARGS__)
#define LOGE(...)  __android_log_print(ANDROID_LOG_ERROR, TAG,__VA_ARGS__)
#define LOGF(...)  __android_log_print(ANDROID_LOG_FATAL, TAG,__VA_ARGS__)

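Using these macros requires linking the log library in CMakeLists.txt, i.e. adding log to the target_link_libraries call of the native library.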

Test results

Open the camera view and switch between the cameras.

[Figure: operation]

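As you can see, right after the camera opens the solved points are very chaotic, because the initial values are not set well; after a while the solution settles into a normal state.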

Real-time result

[Figure: work]

Summary

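Since the overall logic had already been implemented in C++, replicating it was not difficult. The main difficulty was JNI: having never worked with the NDK, I spent a lot of time porting Ceres to Android, and summarized that process in the post Using Ceres Solver on the Android platform. Once that part was done, the rest went quickly, but many JNI features are closely tied to Java and still need more exploration.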

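Possible further improvements

  • Choosing better initial values;
  • Removing the pedestrian detection module from the app;
  • Optimizing the least-squares solve with Ceres;
  • Handling the display differences between the front and rear cameras;
  • Improving the interface to make it more extensible.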


Originally published at: www.cnblogs.com/bemfoo/p/11329441.html