Integrate Huawei's hand keypoint detection service to easily recognize sign language letters

Introduction

Huawei ML Kit provides a hand keypoint detection service that can be used for sign language recognition. The service identifies 21 keypoints of the hand; by comparing the direction of each finger against the rules of the sign language alphabet, the corresponding letter can be found.

Application scenario

Sign language is usually used by people with hearing or speech impairments. It is a collection of hand gestures and movements used in daily interaction.

With ML Kit, we can build a smart sign language alphabet recognizer that works like an aid: it translates gestures into words or sentences, or words and sentences into gestures.

What I tried here is the American Sign Language (ASL) fingerspelling alphabet, whose gestures are classified based on the positions of the joints, fingers and wrist. As a demo, I will try to spell out the word "HELLO" from gestures.


Development steps

1. Prepare

For detailed preparation steps, please refer to Huawei Developer Alliance:

https://developer.huawei.com/consumer/cn/doc/development/HMS-Guides/ml-process-4

Here are the key development steps.

1.1 Enable ML Kit

In Huawei Developer AppGallery Connect, select Develop > Manage APIs and make sure ML Kit is activated.

1.2 Configure the Maven repository address in the project-level build.gradle
buildscript {
    repositories {
        ...
        maven { url 'https://developer.huawei.com/repo/' }
    }
    dependencies {
        ...
        classpath 'com.huawei.agconnect:agcp:1.3.1.301'
    }
}
allprojects {
    repositories {
        ...
        maven { url 'https://developer.huawei.com/repo/' }
    }
}
1.3 Integrate the SDK: add the plugins to the header of the app-level build.gradle file and declare the SDK dependencies.
apply plugin: 'com.android.application'
apply plugin: 'com.huawei.agconnect'

dependencies {
    // Import the base SDK.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint:2.0.2.300'
    // Import the hand keypoint detection model package.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint-model:2.0.2.300'
}
1.4 Add the following statement to the AndroidManifest.xml file:
<meta-data
    android:name="com.huawei.hms.ml.DEPENDENCY"
    android:value="handkeypoint" />
1.5 Apply for camera permission and local file read permission
<!--Camera permission-->
 <uses-permission android:name="android.permission.CAMERA" />
 <!--Read permission-->
 <uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
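
Note that on Android 6.0+ the camera permission is a "dangerous" permission, so it must also be granted at runtime. A minimal sketch, assuming this method lives in an AppCompatActivity (the request code is an arbitrary value):

import android.Manifest
import android.content.pm.PackageManager
import androidx.core.app.ActivityCompat
import androidx.core.content.ContextCompat

private const val CAMERA_PERMISSION_CODE = 1

// Request the camera permission at runtime if it has not been granted yet.
private fun requestCameraPermissionIfNeeded() {
    if (ContextCompat.checkSelfPermission(this, Manifest.permission.CAMERA)
            != PackageManager.PERMISSION_GRANTED) {
        ActivityCompat.requestPermissions(
            this, arrayOf(Manifest.permission.CAMERA), CAMERA_PERMISSION_CODE)
    }
}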

2. Code Development

2.1 Create a SurfaceView for the camera preview and another SurfaceView for drawing the results.

For now we only display the results in the UI; you could also extend this with TTS to read the results aloud (see the sketch after the callback code below).

mSurfaceHolderCamera.addCallback(surfaceHolderCallback)

private val surfaceHolderCallback = object : SurfaceHolder.Callback {
    override fun surfaceCreated(holder: SurfaceHolder) {
        createAnalyzer()
    }
    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        prepareLensEngine(width, height)
        mLensEngine.run(holder)
    }
    override fun surfaceDestroyed(holder: SurfaceHolder) {
        mLensEngine.release()
    }
}
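
For the optional read-aloud extension mentioned above, a minimal TTS sketch using Android's framework TextToSpeech API (the class and utterance id are illustrative, not part of ML Kit):

import android.content.Context
import android.speech.tts.TextToSpeech
import java.util.Locale

// Wraps TextToSpeech so detected letters/words can be spoken as they appear.
class ResultSpeaker(context: Context) : TextToSpeech.OnInitListener {
    private val tts = TextToSpeech(context, this)
    private var ready = false

    override fun onInit(status: Int) {
        if (status == TextToSpeech.SUCCESS) {
            tts.setLanguage(Locale.US)
            ready = true
        }
    }

    fun speak(text: String) {
        if (ready) tts.speak(text, TextToSpeech.QUEUE_FLUSH, null, "signResult")
    }
}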
2.2 Create a hand key point analyzer
// Create an MLHandKeypointAnalyzer with MLHandKeypointAnalyzerSetting.
val settings = MLHandKeypointAnalyzerSetting.Factory()
        .setSceneType(MLHandKeypointAnalyzerSetting.TYPE_ALL)
        // Set the maximum number of hand regions that can be detected in an image.
        // Up to 10 hand regions can be detected by default.
        .setMaxHandResults(2)
        .create()

mAnalyzer = MLHandKeypointAnalyzerFactory.getInstance().getHandKeypointAnalyzer(settings)
mAnalyzer.setTransactor(mHandKeyPointTransactor)
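
Aside: besides the camera-stream mode used here (via setTransactor), the analyzer can also analyse a single still image asynchronously. A minimal sketch of that pattern, assuming a Bitmap input (verify the method signatures against your SDK version):

import android.graphics.Bitmap
import com.huawei.hms.mlsdk.common.MLFrame

// Analyse one still image instead of a camera stream.
fun analyseStillImage(bitmap: Bitmap) {
    val frame = MLFrame.fromBitmap(bitmap)
    mAnalyzer.asyncAnalyseFrame(frame)
        .addOnSuccessListener { hands ->
            // hands is a List<MLHandKeypoints>, one entry per detected hand.
        }
        .addOnFailureListener { e -> e.printStackTrace() }
}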
2.3 Create the recognition result processing class "HandKeypointTransactor", which implements the MLAnalyzer.MLTransactor<T> interface, and override its "transactResult" method to obtain the detection results and implement specific services.
class HandKeyPointTransactor(surfaceHolder: SurfaceHolder? = null) : MLAnalyzer.MLTransactor<MLHandKeypoints> {

    // lastCharacter, displayText, canvas, paddingLeft and paddingTop are fields of this class (declarations omitted).

    override fun transactResult(result: MLAnalyzer.Result<MLHandKeypoints>?) {

        // Map the detected keypoints to a letter (see section 2.7).
        val foundCharacter = findTheCharacterResult(result)

        // Append the letter only when it changes, to avoid duplicates.
        if (foundCharacter.isNotEmpty() && foundCharacter != lastCharacter) {
            lastCharacter = foundCharacter
            displayText.append(lastCharacter)
        }

        // Draw the accumulated text onto the result surface at (paddingLeft, paddingTop).
        canvas.drawText(displayText.toString(), paddingLeft, paddingTop, Paint().also {
            it.style = Paint.Style.FILL
            it.color = Color.YELLOW
        })
    }

    // MLTransactor also requires destroy(), used to release detection resources.
    override fun destroy() {
    }
}
2.4 Create LensEngine
LensEngine lensEngine = new LensEngine.Creator(getApplicationContext(), analyzer)
        .setLensType(LensEngine.BACK_LENS)
        .applyDisplayDimension(width, height) // adjust width and height depending on the orientation
        .applyFps(5f)
        .enableAutomaticFocus(true)
        .create();
2.5 Run LensEngine
private val surfaceHolderCallback = object : SurfaceHolder.Callback {

    // Run the LensEngine in surfaceChanged().
    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        createLensEngine(width, height)
        mLensEngine.run(holder)
    }
}
2.6 Stop the analyzer and release detection resources
fun stopAnalyzer() {
    mAnalyzer.stop()
}
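
In practice this is typically wired into the activity lifecycle; a small sketch, assuming the analyzer and LensEngine are held by the activity:

override fun onDestroy() {
    super.onDestroy()
    // Stop keypoint detection and free the camera.
    stopAnalyzer()
    mLensEngine.release()
}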
2.7 Process transactResult() to detect characters

You can use the transactResult method in the HandKeypointTransactor class to obtain the detection results and implement specific services. Besides the coordinates of each hand keypoint, the detection result includes confidence values for the palm and for each keypoint. Recognition errors of the palm and hand can be filtered out based on these confidence values. In practical applications, the threshold can be set flexibly according to the tolerance of misrecognition.
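
For instance, a minimal confidence filter might look like this (the 0.7f threshold is an illustrative assumption, and the property accesses follow the SDK's getters; verify them against your SDK version):

import com.huawei.hms.mlsdk.common.MLAnalyzer
import com.huawei.hms.mlsdk.handkeypoint.MLHandKeypoint
import com.huawei.hms.mlsdk.handkeypoint.MLHandKeypoints

private const val MIN_CONFIDENCE = 0.7f  // assumed threshold; tune per application

// Keep only hands and keypoints whose confidence clears the threshold.
fun filterKeypoints(result: MLAnalyzer.Result<MLHandKeypoints>?): List<MLHandKeypoint> {
    val hands = result?.analyseList ?: return emptyList()
    val keypoints = mutableListOf<MLHandKeypoint>()
    for (i in 0 until hands.size()) {
        val hand = hands.valueAt(i)
        if (hand.score < MIN_CONFIDENCE) continue  // skip low-confidence palms
        hand.handKeypoints.filterTo(keypoints) { it.score >= MIN_CONFIDENCE }
    }
    return keypoints
}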

2.7.1 Find the direction of the finger:

Let us first project each finger's keypoints onto the X and Y axes and compute a slope (direction) along each axis separately.

private const val X_COORDINATE = 0
private const val Y_COORDINATE = 1

Treat each of the five fingers as a vector. At any moment, the direction of a finger can be classified as up, down, up-down, down-up, or undefined.

enum class FingerDirection {
    VECTOR_UP, VECTOR_DOWN, VECTOR_UP_DOWN, VECTOR_DOWN_UP, VECTOR_UNDEFINED
}

enum class Finger {
    THUMB, FIRST_FINGER, MIDDLE_FINGER, RING_FINGER, LITTLE_FINGER
}

First, separate the keypoints in the result into per-finger keypoint arrays, like this:

var firstFinger = arrayListOf<MLHandKeypoint>()
var middleFinger = arrayListOf<MLHandKeypoint>()
var ringFinger = arrayListOf<MLHandKeypoint>()
var littleFinger = arrayListOf<MLHandKeypoint>()
var thumb = arrayListOf<MLHandKeypoint>()
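
One possible way to do the split, as a sketch: it assumes the 21 keypoints arrive in the common wrist-first ordering (index 0 = wrist, 1-4 = thumb, 5-8 = forefinger, 9-12 = middle, 13-16 = ring, 17-20 = little finger). In production, bucket by MLHandKeypoint.getType() rather than relying on list order.

import com.huawei.hms.mlsdk.handkeypoint.MLHandKeypoint

// Split a full hand's keypoints into per-finger lists (assumed ordering).
fun splitIntoFingers(points: List<MLHandKeypoint>): Map<Finger, List<MLHandKeypoint>> {
    require(points.size == 21) { "Expected a full hand of 21 keypoints" }
    return mapOf(
        Finger.THUMB to points.subList(1, 5),
        Finger.FIRST_FINGER to points.subList(5, 9),
        Finger.MIDDLE_FINGER to points.subList(9, 13),
        Finger.RING_FINGER to points.subList(13, 17),
        Finger.LITTLE_FINGER to points.subList(17, 21)
    )
}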

Each keypoint on a finger corresponds to a joint of that finger, and a finger's slope can be computed from how far each joint's coordinate deviates from the finger's average position.

For example, take two sample keypoint sets for the letter H:

int[] datapointSampleH1 = {623, 497, 377, 312,   348, 234, 162, 90,   377, 204, 126, 54,   383, 306, 413, 491,   455, 348, 419, 521};
int[] datapointSampleH2 = {595, 463, 374, 343,   368, 223, 147, 78,   381, 217, 110, 40,   412, 311, 444, 526,   450, 406, 488, 532};

Use the average of the finger coordinates to calculate the vector

// For the forefinger, the X samples are 623, 497, 377, 312.
double avgFingerPosition = (datapointSampleH1[0] + datapointSampleH1[1] + datapointSampleH1[2] + datapointSampleH1[3]) / 4.0;
// Find the average and subtract it from the X value of the keypoint.
double diff = datapointSampleH1[position] - avgFingerPosition;
// The vector is either positive or negative, representing the direction.
int vector = (int) ((diff * 100) / avgFingerPosition);
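
Plugging in the forefinger sample above: avgFingerPosition = (623 + 497 + 377 + 312) / 4 = 452.25; for position 0, diff = 623 - 452.25 = 170.75, so vector = (170.75 × 100) / 452.25 ≈ 37. The positive sign indicates a direction along the positive X axis.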

The resulting vector is positive or negative: positive means the finger points along the positive X axis, negative means the opposite. Map all the letters to vectors this way; once all the vectors are in hand, we can use them in code.


Using the vector directions above, we can classify each finger's direction into the FingerDirection enumeration defined earlier:

private fun getSlope(keyPoints: MutableList<MLHandKeypoint>, coordinate: Int): FingerDirection {
    when (coordinate) {
        X_COORDINATE -> {
            if (keyPoints[0].pointX > keyPoints[3].pointX && keyPoints[0].pointX > keyPoints[2].pointX)
                return FingerDirection.VECTOR_DOWN
            if (keyPoints[0].pointX > keyPoints[1].pointX && keyPoints[3].pointX > keyPoints[2].pointX)
                return FingerDirection.VECTOR_DOWN_UP
            if (keyPoints[0].pointX < keyPoints[1].pointX && keyPoints[3].pointX < keyPoints[2].pointX)
                return FingerDirection.VECTOR_UP_DOWN
            if (keyPoints[0].pointX < keyPoints[3].pointX && keyPoints[0].pointX < keyPoints[2].pointX)
                return FingerDirection.VECTOR_UP
        }
        Y_COORDINATE -> {
            if (keyPoints[0].pointY > keyPoints[1].pointY && keyPoints[2].pointY > keyPoints[1].pointY && keyPoints[3].pointY > keyPoints[2].pointY)
                return FingerDirection.VECTOR_UP_DOWN
            if (keyPoints[0].pointY > keyPoints[3].pointY && keyPoints[0].pointY > keyPoints[2].pointY)
                return FingerDirection.VECTOR_UP
            if (keyPoints[0].pointY < keyPoints[1].pointY && keyPoints[3].pointY < keyPoints[2].pointY)
                return FingerDirection.VECTOR_DOWN_UP
            if (keyPoints[0].pointY < keyPoints[3].pointY && keyPoints[0].pointY < keyPoints[2].pointY)
                return FingerDirection.VECTOR_DOWN
        }
    }
    return FingerDirection.VECTOR_UNDEFINED
}

Get the direction of each finger and store it in an array.

xDirections[Finger.FIRST_FINGER] = getSlope(firstFinger, X_COORDINATE)
yDirections[Finger.FIRST_FINGER] = getSlope(firstFinger, Y_COORDINATE)


2.7.2 Find the character from the finger direction:

For now we target only the word "HELLO", which needs the letters H, E, L and O, each with its corresponding X-axis and Y-axis direction vectors.

Assumptions:

  1. The hand is always held vertical.

  2. The palm and wrist are parallel to the phone, i.e. at 90 degrees to the X axis.

  3. Each posture is held for at least 3 seconds to record a character.

Now use the character-to-vector map to find the string:

// Alphabet H
if (xDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_DOWN_UP
        && xDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_DOWN_UP
        && xDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_DOWN
        && xDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_DOWN
        && xDirections[Finger.THUMB] == FingerDirection.VECTOR_DOWN)
    return "H"

// Alphabet E
if (yDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && xDirections[Finger.THUMB] == FingerDirection.VECTOR_DOWN)
    return "E"

// Alphabet L
if (yDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_UP
        && yDirections[Finger.THUMB] == FingerDirection.VECTOR_UP)
    return "L"

// Alphabet O
if (xDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP
        && xDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP
        && yDirections[Finger.THUMB] == FingerDirection.VECTOR_UP)
    return "O"

3. Screenshots and results


4. More tips and tricks

  1. When expanded to all 26 letters, the error rate grows. To scan more accurately, sample the gesture over 2-3 seconds and pick the most likely character in that window; this reduces alphabet misrecognition (see the sketch after this list).

  2. To support all hand orientations, add 8 or more directions in the XY plane. This first requires the angle of each finger and the corresponding finger vector.
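
A minimal sketch of the 2-3 second majority-vote idea from tip 1 (class and names are illustrative, not part of ML Kit): collect the character detected in each frame and commit the most frequent one once the window is full.

// At 5 fps (the LensEngine rate above), 15 frames cover about 3 seconds.
class CharacterStabilizer(private val windowSize: Int = 15) {
    private val window = ArrayDeque<String>()

    // Feed one per-frame detection; returns a character once stable, else null.
    fun feed(candidate: String): String? {
        window.addLast(candidate)
        if (window.size < windowSize) return null
        val best = window.groupingBy { it }.eachCount().maxByOrNull { it.value }!!.key
        window.clear()
        return best
    }
}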

Summary

This attempt is a powerful coordinate-based technique. After generating a vector map it can be expanded to all 26 letters, and the directions can likewise be expanded to all 8 directions, giving 26 letters × 8 directions × 5 fingers = 1040 vectors. To handle this better, we could replace the raw vectors with the first derivative of each finger's trajectory to simplify the calculation.

Instead of crafting vectors by hand, we could also train an image classification model and integrate it as a custom model. This experiment was done to check the feasibility of building on the hand keypoint feature of Huawei ML Kit.

For more details, please refer to:

Official website of Huawei Developer Alliance: https://developer.huawei.com/consumer/cn/hms

Obtain development guidance documents: https://developer.huawei.com/consumer/cn/doc/development

To participate in developer discussions, please go to the Reddit community: https://www.reddit.com/r/HMSCore/

To download the demo and sample code, please go to GitHub: https://github.com/HMS-Core

To solve integration problems, please go to Stack Overflow: https://stackoverflow.com/questions/tagged/huawei-mobile-services?tab=Newest


Original link:
https://developer.huawei.com/consumer/cn/forum/topic/0204423958265820665?fid=18
Author: timer
