Pitfall notes: problems encountered when calling a C++ DLL from C# (LoadLibrary)

What happened: Two projects, one on .NET Core and one on .NET Framework, had been loading DLLs with LoadLibraryA. Then, seemingly out of nowhere, users began reporting that a feature was not working.

The previous approach:
call LoadLibraryA
The revised approach:
call LoadLibrary

Personal summary: I did not clearly understand the concepts of wide characters and multibyte characters, or the difference between LoadLibrary, LoadLibraryA, and LoadLibraryW, and so I wrote the calling code incorrectly.

Knowledge summary:

  1. LoadLibrary is not a function of its own. In the Windows headers it is a macro that expands to either the ANSI version (LoadLibraryA) or the Unicode version (LoadLibraryW). The choice is made at compile time, depending on whether the UNICODE macro is defined.

  2. Windows uses Unicode (UTF-16) internally, and C++ projects typically define UNICODE, so in such projects LoadLibrary resolves to LoadLibraryW, the wide-character version.

  3. If your DLL path contains only ASCII characters, you can safely use LoadLibraryA. Otherwise, use the wide-character version: LoadLibraryA interprets the path in the system's ANSI code page, so characters that the code page cannot represent are lost and the load can fail.
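To see why a non-ASCII path can break with the ANSI version, here is a small sketch that round-trips a path through a single-byte encoding. It uses Encoding.ASCII as a stand-in for the system ANSI code page. That is an assumption made so the result is the same on every machine; the real code page depends on the Windows locale. The class name and the path are hypothetical.

```csharp
using System;
using System.Text;

public static class AnsiRoundTrip
{
    // Converts the text to bytes in the given encoding and back again.
    // This only approximates what happens to a string handed to an "A" API;
    // the real conversion uses the system ANSI code page.
    public static string RoundTrip(string text, Encoding encoding)
    {
        byte[] bytes = encoding.GetBytes(text);
        return encoding.GetString(bytes);
    }

    // True when the text comes back unchanged after the conversion.
    public static bool Survives(string text, Encoding encoding)
    {
        return RoundTrip(text, encoding) == text;
    }
}
```

With this helper, AnsiRoundTrip.RoundTrip(@"C:\插件\native.dll", Encoding.ASCII) returns C:\??\native.dll, a path that no longer points at the file, while the same path survives a round trip through Encoding.Unicode (UTF-16).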

If you really need to call these APIs from C#, you can declare them like this:

using System;
using System.Runtime.InteropServices;

public static class Win32Native {
    // CharSet.Auto lets the runtime pick LoadLibraryA or LoadLibraryW for the platform.
    [DllImport("kernel32.dll", CharSet = CharSet.Auto, SetLastError = true)]
    public static extern IntPtr LoadLibrary(string libname);

    // FreeLibrary takes no string argument, so no CharSet is needed.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool FreeLibrary(IntPtr hModule);
}

In this example, CharSet = CharSet.Auto instructs the runtime to choose between LoadLibraryA and LoadLibraryW automatically, based on the target platform. On current versions of Windows this selects LoadLibraryW.

4. In .NET Core and .NET Framework, strings are Unicode by default. The string type stores text as UTF-16, and each char is a 16-bit (2-byte) code unit. In C++ terms, .NET strings are wide-character strings by default.

This behavior does not depend on the type of project (WPF, ASP.NET, console application, and so on); it is a property of the C# language and the .NET runtime. Unlike a C++ project, a C# project has no setting that switches between wide and multibyte characters.
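The 16-bit unit can be observed directly. The following is a minimal sketch (the class and method names are made up for illustration) that compares the number of UTF-16 code units in a string, which is what Length reports, with the number of Unicode code points.

```csharp
using System;

public static class StringUnits
{
    // Number of UTF-16 code units in the string (this is what Length reports).
    public static int CodeUnits(string s)
    {
        return s.Length;
    }

    // Number of Unicode code points, counting a surrogate pair as one.
    public static int CodePoints(string s)
    {
        int count = 0;
        for (int i = 0; i < s.Length; i++)
        {
            // Skip the low half of a well-formed surrogate pair.
            if (char.IsHighSurrogate(s[i]) && i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++;
            }
            count++;
        }
        return count;
    }
}
```

Here sizeof(char) is 2, and StringUnits.CodeUnits("中") is 1. For a character outside the Basic Multilingual Plane, such as "\U0001F600", CodeUnits returns 2 while CodePoints returns 1, because .NET stores that character as two 16-bit units.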

5. In .NET Core or .NET Framework, loading and using a C++ DLL requires P/Invoke (Platform Invocation Services) to call unmanaged code. The LoadLibrary function can be declared as follows:

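using System;
using System.Runtime.InteropServices;

public static class Win32Native {
    // CharSet.Unicode makes the runtime bind to LoadLibraryW and pass the path as UTF-16.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    public static extern IntPtr LoadLibrary(string libname);

    // FreeLibrary takes no string argument, so no CharSet is needed.
    [DllImport("kernel32.dll", SetLastError = true)]
    public static extern bool FreeLibrary(IntPtr hModule);
}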

Here, DllImport is the attribute .NET uses to declare a function exported by an unmanaged DLL. CharSet = CharSet.Unicode tells the runtime that you want the Unicode version of the function (i.e. LoadLibraryW), which matches the Unicode strings .NET uses by default. If CharSet is omitted, C# defaults to CharSet.Ansi and the same declaration binds to LoadLibraryA.

Therefore, in .NET you should declare LoadLibrary with CharSet = CharSet.Unicode and let the runtime pick the entry point, rather than calling LoadLibraryA or LoadLibraryW by name.
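As a usage sketch, the declaration can be wrapped so that a failed load is reported instead of silently returning a null handle. The wrapper class NativeLoader and its method names are hypothetical and not part of any library, and the code only works on Windows, since it calls kernel32.dll.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.InteropServices;

public static class NativeLoader
{
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    private static extern IntPtr LoadLibrary(string libname);

    [DllImport("kernel32.dll", SetLastError = true)]
    private static extern bool FreeLibrary(IntPtr hModule);

    // Loads the DLL and throws with the Win32 error code if loading fails.
    public static IntPtr Load(string path)
    {
        IntPtr handle = LoadLibrary(path);
        if (handle == IntPtr.Zero)
        {
            // Read the error immediately, before another call can overwrite it.
            int error = Marshal.GetLastWin32Error();
            throw new Win32Exception(error, "LoadLibrary failed for: " + path);
        }
        return handle;
    }

    // Releases a handle obtained from Load; returns false if the call fails.
    public static bool Free(IntPtr handle)
    {
        return FreeLibrary(handle);
    }
}
```

A call such as NativeLoader.Load(@"C:\插件\native.dll") (a made-up path) then either returns a valid handle or throws a Win32Exception carrying the error code, which makes user reports like the one described at the top much easier to diagnose.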

6. Wide character encoding and multibyte encoding are two ways of representing characters. Both exist to support multilingual text, because different languages may need different numbers of bytes per character.

Wide character encoding: each unit of text is wider than one byte. On Windows, a wide character (wchar_t) is 16 bits and holds UTF-16. UTF-16 is often treated as fixed-length because most commonly used characters take exactly 2 bytes, but characters outside the Basic Multilingual Plane take 4 bytes (a surrogate pair). UTF-32 is truly fixed-length at 4 bytes per character. Fixed-size units are easy to process and parse, but they waste space for characters that would fit in fewer bytes.

Multibyte encoding: a variable-length scheme in which a character is represented by one or more bytes. UTF-8, for example, uses 1 to 4 bytes per character. The advantage is space efficiency, since each character takes only as many bytes as it needs. The drawback is that processing and parsing are more complicated, because the length of each character varies.

In a multilingual programming environment, UTF-8 is usually the preferred encoding because it is ASCII-compatible and handles characters from all languages efficiently.
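The size differences are easy to check. Here is a minimal sketch (the helper name is made up for illustration) that reports how many bytes the same text needs in UTF-8, UTF-16, and UTF-32. Note that UTF-16 needs 4 bytes for characters outside the Basic Multilingual Plane, so only UTF-32 is strictly fixed-length.

```csharp
using System;
using System.Text;

public static class EncodingSizes
{
    // Returns the byte counts of the text in UTF-8, UTF-16 and UTF-32
    // (without a byte order mark) as a readable line.
    public static string Describe(string text)
    {
        int utf8 = Encoding.UTF8.GetByteCount(text);
        int utf16 = Encoding.Unicode.GetByteCount(text); // Encoding.Unicode is UTF-16 LE
        int utf32 = Encoding.UTF32.GetByteCount(text);
        return string.Format("UTF-8: {0}, UTF-16: {1}, UTF-32: {2}", utf8, utf16, utf32);
    }
}
```

EncodingSizes.Describe("A") gives "UTF-8: 1, UTF-16: 2, UTF-32: 4", Describe("中") gives "UTF-8: 3, UTF-16: 2, UTF-32: 4", and Describe("\U0001F600") gives "UTF-8: 4, UTF-16: 4, UTF-32: 4".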

If anything in this article is wrong, corrections are welcome!

Origin blog.csdn.net/weixin_38428126/article/details/131409609