JNI/NDK Development Guide (4) - String Processing

Chapter 3 showed that the primitive types in JNI correspond one-to-one with the primitive types in Java. Here are the JNI primitive type definitions:

typedef unsigned char   jboolean;
typedef unsigned short  jchar;
typedef short           jshort;
typedef float           jfloat;
typedef double          jdouble;
typedef int jint;
#ifdef _LP64 /* 64-bit Solaris */
typedef long jlong;
#else
typedef long long jlong;
#endif

typedef signed char jbyte;

The primitive types are easy to understand: each one is an existing C/C++ type given a new name with typedef, and native code can use values of these types directly.
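As a quick sanity check, the widths these typedefs are expected to have can be verified with a small function. The sketch below deliberately does not include jni.h. The `x_`-prefixed names and both function names are hypothetical stand-ins that mirror the definitions above, and the expected sizes are the Java primitive widths (boolean and byte 1 byte, char and short 2, int and float 4, long and double 8), which is an assumption that holds on common desktop and Android ABIs.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-ins mirroring the jni.h typedefs shown above. */
typedef unsigned char  x_jboolean;
typedef unsigned short x_jchar;
typedef short          x_jshort;
typedef float          x_jfloat;
typedef double         x_jdouble;
typedef int            x_jint;
typedef long long      x_jlong;
typedef signed char    x_jbyte;

/* Returns 1 when every stand-in has the width of its Java counterpart. */
int jni_widths_match_java(void)
{
    return CHAR_BIT == 8
        && sizeof(x_jboolean) == 1
        && sizeof(x_jbyte)    == 1
        && sizeof(x_jchar)    == 2
        && sizeof(x_jshort)   == 2
        && sizeof(x_jint)     == 4
        && sizeof(x_jfloat)   == 4
        && sizeof(x_jlong)    == 8
        && sizeof(x_jdouble)  == 8;
}

/* Aborts (in builds with assertions enabled) if the assumption does not hold. */
void require_java_widths(void)
{
    assert(jni_widths_match_java());
}
```

Calling `require_java_widths()` once at start-up is one way to catch an unusual platform early.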

JNI passes every Java object to a native method as an opaque C pointer to an internal JVM data structure whose memory layout is hidden from native code. Native code can manipulate these structures only through the JNI functions in the function table that the JNIEnv pointer refers to. In the Chapter 3 example, the jstring that corresponds to java.lang.String could not be used directly the way a primitive value can, because it is a reference type in Java; native code has to go through JNI functions such as GetStringUTFChars to read the string's contents.

Let's take a look at an example:

Sample.java:

package com.study.jnilearn;

public class Sample {
	
	public native static String sayHello(String text);

	public static void main(String[] args) {
		String text = sayHello("yangxin");
		System.out.println("Java str: " + text);
	}
	
	static {
		System.loadLibrary("Sample");
	}
}

com_study_jnilearn_Sample.h and Sample.c:

/* DO NOT EDIT THIS FILE - it is machine generated */
#include <jni.h>
/* Header for class com_study_jnilearn_Sample */

#ifndef _Included_com_study_jnilearn_Sample
#define _Included_com_study_jnilearn_Sample
#ifdef __cplusplus
extern "C" {
#endif
/*
 * Class:     com_study_jnilearn_Sample
 * Method:    sayHello
 * Signature: (Ljava/lang/String;)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL Java_com_study_jnilearn_Sample_sayHello
  (JNIEnv *, jclass, jstring);

#ifdef __cplusplus
}
#endif
#endif

// Sample.c
#include <stdio.h>
#include "com_study_jnilearn_Sample.h"
/*
 * Class:     com_study_jnilearn_Sample
 * Method:    sayHello
 * Signature: (Ljava/lang/String;)Ljava/lang/String;
 */
JNIEXPORT jstring JNICALL Java_com_study_jnilearn_Sample_sayHello
  (JNIEnv *env, jclass cls, jstring j_str)
{
	const char *c_str = NULL;
	char buff[128] = {0};
	jboolean isCopy;	// set to JNI_TRUE if c_str is a copy of the original string, JNI_FALSE if it points directly at the original
	c_str = (*env)->GetStringUTFChars(env, j_str, &isCopy);
	if(c_str == NULL)	// out of memory: an OutOfMemoryError is already pending
	{
		return NULL;
	}
	printf("isCopy:%d\n",isCopy);	// isCopy is only meaningful when the call succeeded
	printf("C_str: %s \n", c_str);
	snprintf(buff, sizeof(buff), "hello %s", c_str);	// bounded write: long input is truncated instead of overflowing buff
	(*env)->ReleaseStringUTFChars(env, j_str, c_str);
	return (*env)->NewStringUTF(env,buff);
}

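Running the program prints the isCopy flag and "C_str: yangxin" from the native code, and "Java str: hello yangxin" from the Java side.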

Walkthrough of the example:

1> Accessing the string

The sayHello function receives a jstring parameter (j_str in the C code, text on the Java side). A jstring refers to a string object inside the JVM, which is not the same thing as a C-style char* string, so it cannot be used as an ordinary C string. You must use the appropriate JNI functions to access the string data inside the JVM.

GetStringUTFChars(env, j_str, &isCopy) parameter description:

env: JNIEnv function table pointer

j_str: jstring type (a string pointer that Java passes to native code)

**isCopy:** An output flag that is set to JNI_TRUE or JNI_FALSE. JNI_TRUE means the function returned a copy of the source string and allocated memory for that copy. JNI_FALSE means the returned pointer refers directly to the source string inside the JVM, so writing through it would modify the original. Never do that, because it breaks the rule that Java strings are immutable. In practice we rarely care which value comes back, and this parameter is usually passed as NULL.

Java strings are stored as Unicode (UTF-16) characters, while C/C++ code normally works with byte-oriented, NUL-terminated strings, so native code must use the appropriate JNI functions to convert a jstring to a C-style string. JNI supports conversion between Unicode and UTF-8 (strictly, the "modified UTF-8" variant defined by the JNI specification). GetStringUTFChars converts a jstring, which refers to a Unicode character sequence inside the JVM, into a C string in UTF-8 format. In the example above, sayHello uses GetStringUTFChars to read the string content held inside the JVM.
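To make the conversion concrete, here is a minimal pure-C sketch of how 16-bit Unicode units map to modified UTF-8 bytes. It is only a model of the encoding rules, not the JVM's implementation, and `encode_modified_utf8` is a hypothetical helper name. The rules assumed are those of modified UTF-8: U+0000 becomes the two bytes 0xC0 0x80, and every surrogate is encoded on its own as three bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Encodes `len` UTF-16 code units from `src` as modified UTF-8 into `out`
 * (capacity `cap` bytes), appends a terminating '\0', and returns the number
 * of bytes written before the terminator. Returns (size_t)-1 if `out` is too
 * small. Hypothetical helper for illustration only.
 */
size_t encode_modified_utf8(const uint16_t *src, size_t len, char *out, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        uint16_t u = src[i];
        if (u != 0 && u < 0x80) {           /* 1 byte: U+0001..U+007F */
            if (n + 1 >= cap) return (size_t)-1;
            out[n++] = (char)u;
        } else if (u < 0x800) {             /* 2 bytes: U+0000 and U+0080..U+07FF */
            if (n + 2 >= cap) return (size_t)-1;
            out[n++] = (char)(0xC0 | (u >> 6));
            out[n++] = (char)(0x80 | (u & 0x3F));
        } else {                            /* 3 bytes: the rest, including each surrogate */
            if (n + 3 >= cap) return (size_t)-1;
            out[n++] = (char)(0xE0 | (u >> 12));
            out[n++] = (char)(0x80 | ((u >> 6) & 0x3F));
            out[n++] = (char)(0x80 | (u & 0x3F));
        }
    }
    if (n >= cap) return (size_t)-1;        /* no room for the terminator */
    out[n] = '\0';
    return n;
}
```

Because U+0000 never produces a zero byte, the result contains no embedded '\0', which is why the output of GetStringUTFChars can be treated as an ordinary C string.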

2> Exception checking

Always check the result after calling GetStringUTFChars. The JVM needs to allocate memory for the new string, and when there is not enough memory the call fails: GetStringUTFChars returns NULL and an OutOfMemoryError is left pending. Exception handling in JNI differs from exception handling in Java. In Java, an exception that is not caught interrupts the normal flow of the program immediately. In native code, a pending exception does not change the control flow: execution simply continues with the next statement, so any later operation on the string would be very dangerous. That is why the example uses a return statement to skip the remaining code and leave the function at once.

3> Releasing the string

When GetStringUTFChars obtains a string from the JVM, it typically allocates a new block of memory that holds a copy of the source string for native code to read. Since memory has been allocated, it is good practice to release it as soon as you are finished with it: calling ReleaseStringUTFChars tells the JVM that the memory is no longer in use and can be reclaimed. Note: these two functions are used as a pair. Whenever you call GetXXX you must call the matching ReleaseXXX, and the names follow a pattern: apart from the Get or Release prefix, the rest of the name is identical.

4> Create a string

NewStringUTF constructs a new java.lang.String object from a C string; the UTF-8 bytes are converted automatically to the Unicode representation that Java uses. If the JVM cannot allocate enough memory to construct the java.lang.String, NewStringUTF returns NULL and an OutOfMemoryError is left pending. In this example we do not need to check the return value: if NewStringUTF fails, the native method returns NULL and the OutOfMemoryError is thrown in Sample.main as soon as the native call returns. If NewStringUTF succeeds, it returns a JNI reference to the newly created java.lang.String object.
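The C string handed to NewStringUTF is built in a fixed-size buffer, so the formatting step is worth bounding. The sketch below isolates that step in plain C using the standard snprintf. `format_greeting` is a hypothetical helper name, and the "hello %s" format is taken from the example above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Writes "hello <name>" into `out` (capacity `cap` bytes, always
 * NUL-terminated). Returns 1 if the whole greeting fit, 0 if it was
 * truncated or formatting failed. Hypothetical helper for illustration.
 */
int format_greeting(char *out, size_t cap, const char *name)
{
    int needed;
    assert(out != NULL && name != NULL && cap > 0);
    needed = snprintf(out, cap, "hello %s", name);
    return needed >= 0 && (size_t)needed < cap;
}
```

A native method could call such a helper on its local buffer and decide what to do when it returns 0 (for example, return NULL to Java) before calling NewStringUTF.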

Other string processing functions:

**1> GetStringChars and ReleaseStringChars:** This pair is similar to Get/ReleaseStringUTFChars, but it gets and releases the string as Unicode (UTF-16) characters of type jchar, whereas Get/ReleaseStringUTFChars work with UTF-8 encoded strings.

**2> GetStringLength:** UTF-8 C strings end with '\0', but the Unicode character data of a jstring is not guaranteed to. To get the length of the Unicode string that a jstring refers to, counted in 16-bit jchar units, use this function.

**3> GetStringUTFLength:** Returns the length in bytes of the string's UTF-8 encoding. The same value can be obtained by calling the standard C function strlen on the result of GetStringUTFChars.
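The two length functions count different things: GetStringLength counts 16-bit units, GetStringUTFLength counts bytes of (modified) UTF-8. The pure-C sketch below models the byte count for a given sequence of 16-bit units. `modified_utf8_length` is a hypothetical helper, and the per-unit sizes assume the modified UTF-8 rules (1 byte for U+0001..U+007F, 2 bytes for U+0000 and U+0080..U+07FF, 3 bytes otherwise).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Returns how many bytes `count` UTF-16 code units occupy when encoded as
 * modified UTF-8 (terminator not included). Hypothetical helper that models
 * the value a UTF-8 length query would report.
 */
size_t modified_utf8_length(const uint16_t *units, size_t count)
{
    size_t bytes = 0;
    assert(units != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        uint16_t u = units[i];
        if (u != 0 && u < 0x80) {
            bytes += 1;
        } else if (u < 0x800) {
            bytes += 2;
        } else {
            bytes += 3;
        }
    }
    return bytes;
}
```

For the three units 'a', U+00E9 and U+4E2D, the unit count is 3 while the byte count is 6, so a buffer sized from GetStringLength alone can be too small for UTF-8 data.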

**4> GetStringCritical and ReleaseStringCritical:** These increase the likelihood that the JVM returns a direct pointer to the source string instead of a copy.

Get/ReleaseStringChars and Get/ReleaseStringUTFChars may allocate memory and copy the source string. If a string is large, say about 1 MB, and you only need to read it and print it, that copy makes these two pairs a poor fit. Get/ReleaseStringCritical is more appropriate here because it is more likely to return a pointer directly to the source string. The price is a strict restriction: between the two calls, native code must not call other JNI functions or any native function that may block the current thread or make it wait for another thread in the JVM. The pointer obtained through GetStringCritical may point straight at the JVM's internal string, and while it is held the garbage collector may be suspended. If another thread triggers a GC in the meantime, that thread blocks until the pointer is released, and if the current thread is itself waiting on that thread, neither can make progress. So the code between GetStringCritical and ReleaseStringCritical must not make blocking calls or allocate new objects in the JVM; otherwise the JVM may deadlock.

You must also check whether the return value is NULL because of insufficient memory. GetStringCritical may still copy the data, particularly when the JVM stores the string in non-contiguous memory and has to copy everything in order to return a pointer to one contiguous block. The following code demonstrates the correct usage of this pair of functions:

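JNIEXPORT jstring JNICALL Java_com_study_jnilearn_Sample_sayHello
  (JNIEnv *env, jclass cls, jstring j_str)
{
	const jchar* c_str= NULL;
	char buff[128] = "hello ";
	char* pBuff = buff + 6;
	jsize len = (*env)->GetStringLength(env,j_str);	// number of 16-bit units; queried before entering the critical region
	jsize i;
	/*
	 * The code between GetStringCritical and ReleaseStringCritical is a critical region.
	 * Inside it, never call other JNI functions or any native code that may block the
	 * current thread or make it wait. Otherwise the garbage collector stays suspended
	 * for the whole region, and any thread that triggers a GC is paused as well.
	 * Those threads cannot continue until the current thread leaves the region and the GC is re-enabled.
	 */
	c_str = (*env)->GetStringCritical(env,j_str,NULL);	// more likely to return a direct pointer to the source string
	if (c_str == NULL)	// NULL means the JVM had to copy the string and ran out of memory
	{
		return NULL;
	}
	// The jchar data is not guaranteed to be NUL-terminated, so loop on the length and leave room for '\0'
	for (i = 0; i < len && pBuff < buff + sizeof(buff) - 1; i++)
	{
		// jchar is 16 bits wide: copy ASCII as is, replace everything else with '?'
		*pBuff++ = (c_str[i] != 0 && c_str[i] < 0x80) ? (char)c_str[i] : '?';
	}
	(*env)->ReleaseStringCritical(env,j_str,c_str);	// pass the pointer exactly as returned by GetStringCritical
	return (*env)->NewStringUTF(env,buff);
}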

JNI has no Get/ReleaseStringUTFCritical functions. The JVM represents strings internally in Unicode, so producing UTF-8 requires an encoding conversion that would almost certainly force the JVM to copy the data, which defeats the purpose of a critical function.
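Because the 16-bit character data of a string is not guaranteed to be NUL-terminated, a copy loop over it is safer when driven by the length than by a terminator. The pure-C sketch below shows such a loop on its own, with no JNI calls inside, which is also the shape code in a critical region has to have. `copy_units_as_ascii` is a hypothetical helper, and replacing non-ASCII units with '?' is a simplification chosen for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Copies up to `len` UTF-16 code units from `src` into `dst` as single bytes.
 * Units outside U+0001..U+007F are replaced with '?'. Stops early when `dst`
 * (capacity `cap` bytes) is full, always NUL-terminates, and returns the
 * number of characters written. Hypothetical helper for illustration.
 */
size_t copy_units_as_ascii(const uint16_t *src, size_t len, char *dst, size_t cap)
{
    size_t n = 0;
    assert(src != NULL && dst != NULL && cap > 0);
    while (n < len && n + 1 < cap) {
        uint16_t u = src[n];
        dst[n] = (u != 0 && u < 0x80) ? (char)u : '?';
        n++;
    }
    dst[n] = '\0';
    return n;
}
```

The return value tells the caller whether the input was truncated (it is smaller than `len` in that case).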

**5> GetStringRegion and GetStringUTFRegion:** These get the content within a specified range of the string, as Unicode and UTF-8 respectively. They copy the source string into a buffer that the caller has allocated in advance. The following code reimplements the sayHello function using GetStringUTFRegion:

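JNIEXPORT jstring JNICALL Java_com_study_jnilearn_Sample_sayHello
  (JNIEnv *env, jclass cls, jstring j_str)
{
	jsize len = (*env)->GetStringLength(env,j_str);	// length of the Unicode string in 16-bit units
	char buff[128] = "hello ";	// the rest of the array is zero-filled, so the result stays NUL-terminated
	char* pBuff = buff + 6;
	printf("str_len:%d\n",len);
	// Each 16-bit unit needs at most 3 bytes in UTF-8; reject input that might not fit after "hello "
	if ((size_t)len * 3 + 6 + 1 > sizeof(buff))
	{
		return NULL;	// Java receives null: this fixed buffer is too small for the input
	}
	// Copy the JVM string into the C buffer as UTF-8; this function allocates no memory internally
	(*env)->GetStringUTFRegion(env,j_str,0,len,pBuff);
	return (*env)->NewStringUTF(env,buff);
}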

GetStringUTFRegion performs a bounds check and throws StringIndexOutOfBoundsException if the requested range is out of bounds. It is similar to GetStringUTFChars, except that it does not allocate memory internally and therefore never throws an out-of-memory exception. The caller is responsible for making the destination buffer large enough.

Note: Because GetStringUTFRegion and GetStringRegion do not allocate memory internally, JNI provides no ReleaseStringUTFRegion or ReleaseStringRegion functions.
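Two small calculations sit behind a safe call to the Region functions: whether the requested range lies inside the string, and how large the destination buffer must be. The pure-C sketch below models both. The helper names are hypothetical, the range check is only a model of the kind of check described above (the JVM's own code is not shown in this article), and the sizing assumes at most 3 bytes of UTF-8 per 16-bit unit plus one byte for a terminator.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Returns 1 if the range of `len` units starting at `start` lies inside a
 * string of `str_len` units, 0 otherwise. Written to avoid overflow in
 * start + len. Hypothetical model of the range check.
 */
int region_in_bounds(long str_len, long start, long len)
{
    assert(str_len >= 0);
    return start >= 0 && len >= 0 && start <= str_len && len <= str_len - start;
}

/*
 * Worst-case buffer size in bytes for `len` 16-bit units converted to
 * UTF-8, including room for a terminating '\0'. Hypothetical helper.
 */
size_t utf_region_buffer_size(size_t len)
{
    return len * 3 + 1;
}
```

For the 7-unit string "yangxin", the full range (0, 7) is in bounds and the worst-case buffer is 22 bytes, well within the 128-byte array used in the example.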

String manipulation summary:

1. For small strings, GetStringRegion and GetStringUTFRegion are the best choice: the buffer can be allocated ahead of time (for example as a fixed-size array on the C stack), so no out-of-memory exception can occur. They are also a good fit when you need only part of a string, because they take a start index and a length. Copying a small string costs very little.

2. Be very careful with GetStringCritical and ReleaseStringCritical. While holding a pointer obtained from GetStringCritical, make sure native code does not allocate new objects in the JVM or make any blocking call that could deadlock the system.

3. To get the Unicode characters of a string and its length, use GetStringChars and GetStringLength.

4. To get the length of the string in UTF-8, use GetStringUTFLength.

5. To create a java.lang.String (Unicode) from a UTF-8 C string, use NewStringUTF.

6. To convert a Java string to a C/C++ string, use GetStringUTFChars.

7. GetStringUTFChars, GetStringChars and GetStringCritical may allocate memory internally; after getting a string with any of them, you must call the corresponding ReleaseXXX function to release it.

Sample code download address: https://code.csdn.net/xyang81/jnilearn

 

Origin blog.csdn.net/shangsongwww/article/details/122493134