Introduction
Huawei ML Kit provides a hand key point recognition service that can be used for sign language recognition. The service identifies 21 key points of the hand and finds the sign language alphabet letter by comparing the direction of each finger against the sign language rules.
Application scenario
Sign language is usually used by people with hearing or speech impairments. It is a collection of gestures, including movements and postures used in daily interactions.
We can use ML Kit to build a smart sign language alphabet recognizer that works as an aid, translating gestures into words or sentences, or words or sentences into gestures.
What I tried here is the American Sign Language alphabet, whose gestures are classified based on the positions of the joints, fingers, and wrist. Next, I will try to recognize the word "HELLO" from gestures.
Development steps
1. Prepare
For detailed preparation steps, please refer to Huawei Developer Alliance:
https://developer.huawei.com/consumer/cn/doc/development/HMS-Guides/ml-process-4
Here are the key development steps.
1.1 Start ML Kit
In Huawei Developer AppGallery Connect, select Develop > Manage APIs and make sure ML Kit is activated.
1.2 Configure the Maven repository address in the project-level gradle file
buildscript {
    repositories {
        ...
        maven { url 'https://developer.huawei.com/repo/' }
    }
    dependencies {
        ...
        classpath 'com.huawei.agconnect:agcp:1.3.1.301'
    }
}
allprojects {
    repositories {
        ...
        maven { url 'https://developer.huawei.com/repo/' }
    }
}
1.3 Integrate the SDK: add the plugin configuration to the header of the app-level gradle file, then add the dependencies.
apply plugin: 'com.android.application'
apply plugin: 'com.huawei.agconnect'

dependencies {
    // Import the base SDK.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint:2.0.2.300'
    // Import the hand keypoint detection model package.
    implementation 'com.huawei.hms:ml-computer-vision-handkeypoint-model:2.0.2.300'
}
1.4 Add the following statement to the AndroidManifest.xml file
<meta-data
    android:name="com.huawei.hms.ml.DEPENDENCY"
    android:value="handkeypoint" />
1.5 Apply for camera permission and local file read permission
<!--Camera permission-->
<uses-permission android:name="android.permission.CAMERA" />
<!--Read permission-->
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
2. Code Development
2.1 Create a SurfaceView for the camera preview and another SurfaceView for the results.
Currently we only display the results in the UI; you could also extend this with TTS to read the recognized results aloud.
mSurfaceHolderCamera.addCallback(surfaceHolderCallback)

private val surfaceHolderCallback = object : SurfaceHolder.Callback {
    override fun surfaceCreated(holder: SurfaceHolder) {
        createAnalyzer()
    }
    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        prepareLensEngine(width, height)
        mLensEngine.run(holder)
    }
    override fun surfaceDestroyed(holder: SurfaceHolder) {
        mLensEngine.release()
    }
}
2.2 Create a hand key point analyzer
// Create an MLHandKeypointAnalyzer with MLHandKeypointAnalyzerSetting.
// setMaxHandResults sets the maximum number of hand regions that can be
// detected within an image; up to 10 hand regions can be detected by default.
val settings = MLHandKeypointAnalyzerSetting.Factory()
    .setSceneType(MLHandKeypointAnalyzerSetting.TYPE_ALL)
    .setMaxHandResults(2)
    .create()
mAnalyzer = MLHandKeypointAnalyzerFactory.getInstance().getHandKeypointAnalyzer(settings)
mAnalyzer.setTransactor(mHandKeyPointTransactor)
2.3 Create the recognition result processing class HandKeypointTransactor, which implements the MLAnalyzer.MLTransactor<T> interface, and use its transactResult method to obtain the detection results and implement specific services.
class HandKeyPointTransactor(surfaceHolder: SurfaceHolder? = null) : MLAnalyzer.MLTransactor<MLHandKeypoints> {
    override fun transactResult(result: MLAnalyzer.Result<MLHandKeypoints>?) {
        val foundCharacter = findTheCharacterResult(result)
        if (foundCharacter.isNotEmpty() && foundCharacter != lastCharacter) {
            lastCharacter = foundCharacter
            displayText.append(lastCharacter)
        }
        canvas.drawText(displayText.toString(), paddingLeft, paddingRight, Paint().also {
            it.style = Paint.Style.FILL
            it.color = Color.YELLOW
        })
    }

    // Required by the MLTransactor interface; release resources here if needed.
    override fun destroy() {}
}
2.4 Create LensEngine
LensEngine lensEngine = new LensEngine.Creator(getApplicationContext(), analyzer)
        .setLensType(LensEngine.BACK_LENS)
        .applyDisplayDimension(width, height) // adjust width and height depending on the orientation
        .applyFps(5f)
        .enableAutomaticFocus(true)
        .create();
2.5 Run LensEngine
private val surfaceHolderCallback = object : SurfaceHolder.Callback {
    // run the LensEngine in surfaceChanged()
    override fun surfaceChanged(holder: SurfaceHolder, format: Int, width: Int, height: Int) {
        createLensEngine(width, height)
        mLensEngine.run(holder)
    }
}
2.6 Stop the analyzer and release detection resources
fun stopAnalyzer() {
    mAnalyzer.stop()
}
2.7 Process transactResult() to detect characters
You can use the transactResult method in the HandKeypointTransactor class to obtain the detection results and implement specific services. In addition to the coordinates of each hand key point, the detection result includes a confidence value for the palm and for each key point. Palm and key point recognition errors can be filtered out based on the confidence values. In practical applications, the threshold can be set flexibly according to how much misrecognition the application can tolerate.
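As a sketch of such confidence-based filtering, assuming each detected key point carries a confidence score in [0, 1] as the ML Kit result does (the `KeyPoint` class here is a hypothetical stand-in, not the SDK type):

```kotlin
// Hypothetical stand-in for an ML Kit hand key point: coordinates plus a confidence score.
data class KeyPoint(val x: Float, val y: Float, val score: Float)

// Keep only key points whose confidence clears the threshold; tune the
// threshold according to how much misrecognition the application tolerates.
fun filterByConfidence(points: List<KeyPoint>, threshold: Float = 0.6f): List<KeyPoint> =
    points.filter { it.score >= threshold }

fun main() {
    val points = listOf(
        KeyPoint(623f, 497f, 0.91f),
        KeyPoint(377f, 312f, 0.42f),  // low-confidence point, likely a recognition error
        KeyPoint(348f, 234f, 0.77f)
    )
    println(filterByConfidence(points).size)  // 2
}
```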
2.7.1 Find the direction of the finger:
Let us first define constants for the two coordinate axes along which the finger slopes will be evaluated.
private const val X_COORDINATE = 0
private const val Y_COORDINATE = 1
We model each of the five fingers as a vector; at any time, the direction of a finger can be classified as up, down, down-up, up-down, or undefined.
enum class FingerDirection {
VECTOR_UP, VECTOR_DOWN, VECTOR_UP_DOWN, VECTOR_DOWN_UP, VECTOR_UNDEFINED
}
enum class Finger {
THUMB, FIRST_FINGER, MIDDLE_FINGER, RING_FINGER, LITTLE_FINGER
}
First, separate the corresponding key points from the result into key point arrays for the different fingers, like this:
var firstFinger = arrayListOf<MLHandKeypoint>()
var middleFinger = arrayListOf<MLHandKeypoint>()
var ringFinger = arrayListOf<MLHandKeypoint>()
var littleFinger = arrayListOf<MLHandKeypoint>()
var thumb = arrayListOf<MLHandKeypoint>()
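The exact grouping should use each key point's type constant from the SDK. Purely as an illustration, assuming the 21 points arrive wrist-first with four points per finger (an assumed layout, not confirmed by the SDK documentation), the split could look like:

```kotlin
// Assumed layout: index 0 = wrist, then four points per finger
// (thumb 1-4, first/index finger 5-8, middle 9-12, ring 13-16, little 17-20).
// A real implementation should group by each key point's type constant instead.
fun <T> splitIntoFingers(points: List<T>): Map<String, List<T>> {
    require(points.size == 21) { "expected 21 hand key points" }
    return mapOf(
        "thumb" to points.subList(1, 5),
        "firstFinger" to points.subList(5, 9),
        "middleFinger" to points.subList(9, 13),
        "ringFinger" to points.subList(13, 17),
        "littleFinger" to points.subList(17, 21)
    )
}

fun main() {
    // Using plain indices in place of MLHandKeypoint objects for the demo
    val fingers = splitIntoFingers((0 until 21).toList())
    println(fingers["thumb"])         // [1, 2, 3, 4]
    println(fingers["littleFinger"])  // [17, 18, 19, 20]
}
```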
Each key point on a finger corresponds to a joint of that finger, and the slope can be calculated from the difference between each joint's coordinate and the finger's average coordinate, looked up from the key point coordinates.
For example, take two sample key point sets for the letter H:
int[] datapointSampleH1 = {623, 497, 377, 312, 348, 234, 162, 90, 377, 204, 126, 54, 383, 306, 413, 491, 455, 348, 419, 521};
int[] datapointSampleH2 = {595, 463, 374, 343, 368, 223, 147, 78, 381, 217, 110, 40, 412, 311, 444, 526, 450, 406, 488, 532};
Use the average of the finger coordinates to calculate the vector:
// For the forefinger key points - 623, 497, 377, 312
double avgFingerPosition = (datapoints[0].getX() + datapoints[1].getX() + datapoints[2].getX() + datapoints[3].getX()) / 4;
// Find the average and subtract it from the x value of each key point
double diff = datapoints[position].getX() - avgFingerPosition;
// The vector is either positive or negative, representing the direction
int vector = (int) ((diff * 100) / avgFingerPosition);
The resulting vector is either positive or negative: positive means the point lies in the positive direction of the X axis, and negative means the opposite. Map all the letters to vectors in this way; once all the vectors are known, we can use them in the program.
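The calculation above can be sketched as a small self-contained function, using the first four values of datapointSampleH1 as the forefinger's x coordinates:

```kotlin
// Compute the direction vector of one key point relative to the finger's
// average x position. A negative result means the point lies in the
// negative X direction from the average.
fun fingerVector(xs: List<Double>, position: Int): Int {
    val avg = xs.average()
    val diff = xs[position] - avg
    return ((diff * 100) / avg).toInt()
}

fun main() {
    // Forefinger x values taken from datapointSampleH1: 623, 497, 377, 312
    val foreFingerX = listOf(623.0, 497.0, 377.0, 312.0)
    println(fingerVector(foreFingerX, 0))  // 37: first joint lies right of the average
    println(fingerVector(foreFingerX, 3))  // -31: fingertip lies left of the average
}
```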
Using the vector directions above, we can classify each vector into one of the finger direction enumeration values defined earlier:
private fun getSlope(keyPoints: MutableList<MLHandKeypoint>, coordinate: Int): FingerDirection {
    when (coordinate) {
        X_COORDINATE -> {
            if (keyPoints[0].pointX > keyPoints[3].pointX && keyPoints[0].pointX > keyPoints[2].pointX)
                return FingerDirection.VECTOR_DOWN
            if (keyPoints[0].pointX > keyPoints[1].pointX && keyPoints[3].pointX > keyPoints[2].pointX)
                return FingerDirection.VECTOR_DOWN_UP
            if (keyPoints[0].pointX < keyPoints[1].pointX && keyPoints[3].pointX < keyPoints[2].pointX)
                return FingerDirection.VECTOR_UP_DOWN
            if (keyPoints[0].pointX < keyPoints[3].pointX && keyPoints[0].pointX < keyPoints[2].pointX)
                return FingerDirection.VECTOR_UP
        }
        Y_COORDINATE -> {
            if (keyPoints[0].pointY > keyPoints[1].pointY && keyPoints[2].pointY > keyPoints[1].pointY && keyPoints[3].pointY > keyPoints[2].pointY)
                return FingerDirection.VECTOR_UP_DOWN
            if (keyPoints[0].pointY > keyPoints[3].pointY && keyPoints[0].pointY > keyPoints[2].pointY)
                return FingerDirection.VECTOR_UP
            if (keyPoints[0].pointY < keyPoints[1].pointY && keyPoints[3].pointY < keyPoints[2].pointY)
                return FingerDirection.VECTOR_DOWN_UP
            if (keyPoints[0].pointY < keyPoints[3].pointY && keyPoints[0].pointY < keyPoints[2].pointY)
                return FingerDirection.VECTOR_DOWN
        }
    }
    return FingerDirection.VECTOR_UNDEFINED
}
Get the direction of each finger and store it in an array.
xDirections[Finger.FIRST_FINGER] = getSlope(firstFinger, X_COORDINATE)
yDirections[Finger.FIRST_FINGER] = getSlope(firstFinger, Y_COORDINATE)
2.7.2 Find the character from the finger direction:
For now we target only the word "HELLO", which needs the letters H, E, L, and O. Their corresponding X-axis and Y-axis vectors are shown in the figure.
Assumptions:
- The direction of the hand is always vertical.
- The palm and wrist are parallel to the phone, that is, at 90 degrees to the X axis.
- The posture is held for at least 3 seconds to record a character.
Start using the character map vectors to find the string:
// Alphabet H
if (xDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_DOWN_UP
        && xDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_DOWN_UP
        && xDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_DOWN
        && xDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_DOWN
        && xDirections[Finger.THUMB] == FingerDirection.VECTOR_DOWN)
    return "H"

// Alphabet E
if (yDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && xDirections[Finger.THUMB] == FingerDirection.VECTOR_DOWN)
    return "E"

// Alphabet L
if (yDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.MIDDLE_FINGER] == FingerDirection.VECTOR_UP_DOWN
        && yDirections[Finger.FIRST_FINGER] == FingerDirection.VECTOR_UP
        && yDirections[Finger.THUMB] == FingerDirection.VECTOR_UP)
    return "L"

// Alphabet O
if (xDirections[Finger.LITTLE_FINGER] == FingerDirection.VECTOR_UP
        && xDirections[Finger.RING_FINGER] == FingerDirection.VECTOR_UP
        && yDirections[Finger.THUMB] == FingerDirection.VECTOR_UP)
    return "O"
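Per the assumption that a posture is held for about 3 seconds, the per-frame results can be debounced before a letter is appended. This sketch is not from the original code; it accepts a character only after it has been seen in N consecutive frames, which filters out transient misdetections:

```kotlin
// Accepts a character only after it has been stable for `requiredFrames`
// consecutive detections, then appends it once to the output string.
class CharacterDebouncer(private val requiredFrames: Int = 15) {
    private val output = StringBuilder()
    private var candidate: Char? = null
    private var streak = 0

    fun offer(detected: Char) {
        if (detected == candidate) streak++ else { candidate = detected; streak = 1 }
        if (streak == requiredFrames) output.append(detected)  // append exactly once per streak
    }

    fun text(): String = output.toString()
}

fun main() {
    val debouncer = CharacterDebouncer(requiredFrames = 3)
    // Noisy per-frame detections: the single-frame flicker of 'E' is not recorded
    "HHEHHH".forEach { debouncer.offer(it) }
    println(debouncer.text())  // H
}
```

At 5 fps (the LensEngine rate above), requiredFrames = 15 corresponds to roughly 3 seconds of holding the same posture.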
3. Screen and results
4. More tips and tricks
- When expanded to all 26 letters, the error rate grows. To scan more accurately, sample detections over 2-3 seconds and pick the most likely character; this reduces alphabet recognition errors.
- To support all orientations, add 8 or more directions on the X and Y axes. This first requires the angle of each finger and the corresponding finger vector.
Summary
This attempt is a powerful coordinate technique. After generating a vector map it can be expanded to all 26 letters, and the directions can also be expanded to all 8 directions, giving 26 letters × 8 directions × 5 fingers = 1040 vectors. To handle this better, we could use the first derivative of the finger key points instead of the vectors to simplify the calculation.
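As a sketch of that first-derivative idea (my own interpretation, not code from the original): the signs of successive coordinate differences directly encode a finger's direction, so no per-letter vector table or averaging is needed to detect whether a finger is straight or bent:

```kotlin
// Classify a finger's shape from the signs of successive coordinate differences.
// All-negative differences = monotonically decreasing, all-positive = increasing;
// a sign change indicates a bent finger (the up-down / down-up shapes).
fun classifyBySigns(coords: List<Double>): String {
    val diffs = coords.zipWithNext { a, b -> b - a }  // discrete first derivative
    return when {
        diffs.all { it < 0 } -> "DECREASING"
        diffs.all { it > 0 } -> "INCREASING"
        else -> "BENT"
    }
}

fun main() {
    // Forefinger x values from datapointSampleH1 decrease monotonically
    println(classifyBySigns(listOf(623.0, 497.0, 377.0, 312.0)))  // DECREASING
    // Ring finger x values change direction partway: a bent finger
    println(classifyBySigns(listOf(383.0, 306.0, 413.0, 491.0)))  // BENT
}
```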
Alternatively, instead of creating vectors, we could use image classification with a custom trained model. This exercise was done to check the feasibility of using the hand key point feature in Huawei ML Kit.
For more details, please refer to:
Official website of Huawei Developer Alliance:https://developer.huawei.com/consumer/cn/hms
Obtain development guidance documents:https://developer.huawei.com/consumer/cn/doc/development
To participate in developer discussions, please go to the Reddit community:https://www.reddit.com/r/HMSCore/
To download the demo and sample code, please go to Github:https://github.com/HMS-Core
To solve integration problems, please go to Stack Overflow:https://stackoverflow.com/questions/tagged/huawei-mobile-services?tab=Newest
Original link:
https://developer.huawei.com/consumer/cn/forum/topic/0204423958265820665?fid=18
Author: timer