(I) Introduction to character encoding

1.1 ASCII code

As we know, all information inside a computer is ultimately represented as a string of binary digits. Each binary digit (bit) has two states, 0 and 1, so eight bits can be combined into 256 states; a group of eight bits is called a byte. In other words, one byte can represent 256 different states, and if each state corresponds to one symbol, that gives 256 symbols, from 00000000 to 11111111.
In the 1960s, the United States developed a character encoding that standardized the relationship between English characters and bits. It is called ASCII and is still in use today.
ASCII defines 128 characters in total. For example, the space character "SPACE" is 32 (binary 00100000) and the capital letter A is 65 (binary 01000001). These 128 symbols (including 33 control characters that cannot be printed: codes 0 to 31 and 127) use only the lower 7 bits of a byte; the highest bit is always 0.
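The bit patterns above can be checked with a few lines of C. This is a minimal sketch; the helper names byte_to_binary, is_ascii and print_code are made up for this illustration and are not part of the article's project.

```c
#include <stdio.h>

/* Write the 8 bits of one byte into out as '0'/'1' characters, highest bit
 * first. out must hold 9 chars (8 digits plus the terminating NUL). */
void byte_to_binary(unsigned char byte, char out[9])
{
    for (int bit = 7; bit >= 0; bit--)
        out[7 - bit] = ((byte >> bit) & 1) ? '1' : '0';
    out[8] = '\0';
}

/* A byte is ASCII when its highest bit is 0, i.e. its value is 0-127. */
int is_ascii(unsigned char byte)
{
    return (byte & 0x80) == 0;
}

/* Print one line with the decimal value, the bit pattern and the ASCII check. */
void print_code(unsigned char byte)
{
    char bits[9];
    byte_to_binary(byte, bits);
    printf("%d = %s (%s)\n", byte, bits, is_ascii(byte) ? "ASCII" : "not ASCII");
}
```

Calling print_code(' ') and print_code('A') shows the patterns 00100000 and 01000001 quoted in the text, while print_code(130) reports a byte outside the ASCII range.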

1.2 Non-ASCII encoding

128 symbols are enough to encode English, but not enough to represent other languages. For example, French has diacritical marks above some letters, which ASCII cannot represent. Some European countries therefore decided to use the idle highest bit of the byte to encode new symbols. For example, é in French is encoded as 130 (binary 10000010). As a result, the encodings used by European countries can represent up to 256 symbols.
However, this creates a new problem. Different countries use different letters, so even though they all use 256-symbol encodings, the same code stands for different letters. For example, 130 represents é in the French encoding, the letter Gimel (ג) in the Hebrew encoding, and yet another symbol in the Russian encoding.

NOTE:
In all of these encodings, however, the codes 0-127 represent the same symbols; only the range 128-255 differs.
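The ambiguity of byte 130 can be sketched in C. The article does not name the encodings involved; the sketch below assumes the DOS code pages 437 (Western European), 862 (Hebrew) and 866 (Cyrillic), and tabulates only byte 130. The enum and function names are hypothetical.

```c
/* Hypothetical identifiers for three single-byte code pages (an assumption:
 * the article only speaks of "French", "Hebrew" and "Russian" encodings). */
enum codepage { CP437_LATIN, CP862_HEBREW, CP866_CYRILLIC };

/* Decode one byte to a Unicode code point under the given code page.
 * Only bytes 0-127 and byte 130 are handled in this sketch; any other
 * byte yields U+FFFD (the replacement character). */
unsigned int decode_byte(unsigned char byte, enum codepage cp)
{
    if (byte < 0x80)
        return byte;                 /* 0-127: identical to ASCII everywhere */

    if (byte == 130) {
        switch (cp) {
        case CP437_LATIN:    return 0x00E9;  /* é */
        case CP862_HEBREW:   return 0x05D2;  /* Hebrew letter Gimel */
        case CP866_CYRILLIC: return 0x0412;  /* Cyrillic capital letter Ve */
        }
    }
    return 0xFFFD;                   /* not tabulated in this sketch */
}
```

The same byte value decodes to three different characters, while any byte below 128 decodes identically regardless of the code page argument.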
The scripts of Asian countries use even more symbols; there are roughly 100,000 Chinese characters. One byte can represent only 256 symbols, which is far from enough, so multiple bytes must be used to express one symbol. For example, the common encoding for Simplified Chinese is GB2312, which uses two bytes per character and can therefore represent, in theory, up to 256 x 256 = 65536 symbols.
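A practical consequence of a two-byte encoding is that byte count and character count differ. The sketch below counts the characters in a byte string, assuming the GBK byte ranges (lead byte 0x81-0xFE, trail byte 0x40-0xFE except 0x7F), which are a superset of GB2312; the function name gbk_char_count is hypothetical.

```c
#include <stddef.h>

/* Count the characters in a GBK-encoded byte string.
 * Bytes 0x00-0x7F are single-byte ASCII characters; a lead byte in
 * 0x81-0xFE followed by a trail byte in 0x40-0xFE (except 0x7F) forms one
 * two-byte character. Returns -1 on a truncated or invalid sequence. */
int gbk_char_count(const unsigned char *s, size_t len)
{
    int count = 0;
    size_t i = 0;

    while (i < len) {
        if (s[i] < 0x80) {
            i += 1;                              /* ASCII */
        } else if (s[i] >= 0x81 && s[i] <= 0xFE && i + 1 < len &&
                   s[i + 1] >= 0x40 && s[i + 1] <= 0xFE && s[i + 1] != 0x7F) {
            i += 2;                              /* two-byte character */
        } else {
            return -1;                           /* invalid or truncated */
        }
        count++;
    }
    return count;
}
```

For the four bytes 'A', 0xB0, 0xA1, 'b' the function reports three characters, because 0xB0 0xA1 together form a single Chinese character.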

The above is quoted from tge7618291's blog: https://blog.csdn.net/tge7618291/article/details/7599902

(II) Transcoding

Simplified Chinese text is encoded in GBK (GB2312) format. To display simplified characters, first convert the GBK code to Unicode, then convert the Unicode value to UTF-8, and only then can it be rendered.
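On systems that ship iconv, the same GBK to UTF-8 conversion can be done in one step. This is only a sketch of an alternative, not the method this article uses: it assumes the platform's iconv supports the "GBK" charset name, and the wrapper name gbk_to_utf8 is hypothetical.

```c
#include <iconv.h>
#include <stddef.h>

/* Convert a GBK byte string to a NUL-terminated UTF-8 string with iconv.
 * Returns the number of UTF-8 bytes written (excluding the NUL), or -1 if
 * iconv has no GBK support, the input is invalid, or out is too small. */
long gbk_to_utf8(const char *in, size_t in_len, char *out, size_t out_size)
{
    if (in == NULL || out == NULL || out_size == 0)
        return -1;

    iconv_t cd = iconv_open("UTF-8", "GBK");   /* (to, from) */
    if (cd == (iconv_t)-1)
        return -1;

    char *src = (char *)in;        /* iconv takes char **, input is not modified */
    char *dst = out;
    size_t src_left = in_len;
    size_t dst_left = out_size - 1;            /* keep one byte for the NUL */

    size_t rc = iconv(cd, &src, &src_left, &dst, &dst_left);
    iconv_close(cd);
    if (rc == (size_t)-1)
        return -1;

    *dst = '\0';
    return (long)(dst - out);
}
```

The table-based functions shown in this section do the same work in two explicit steps and do not depend on iconv being available on the target.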
GBK to Unicode:

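/* Look up the Unicode (UCS-2) value of a two-byte GBK character.
 * ch is the lead byte (0x81-0xFE), cl is the trail byte (0x40-0xFE).
 * Returns 0x1fff if the byte pair is out of range. */
unsigned short zz_gbk2uni(unsigned char ch, unsigned char cl)
{
    ch -= 0x81;   /* row index; wraps to a large value if ch < 0x81 */
    cl -= 0x40;   /* column index; wraps to a large value if cl < 0x40 */
    return (ch<=0x7d && cl<=0xbe) ? mb_gb2uni_table[ch*0xbf+cl] : 0x1fff;
}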
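The index arithmetic in zz_gbk2uni can be worked through by hand. For the GBK byte pair 0xB0 0xA1, the row is 0xB0 - 0x81 = 47, the column is 0xA1 - 0x40 = 97, and each row holds 0xBF = 191 entries, so the table index is 47 * 191 + 97 = 9074. The sketch below isolates that calculation; gbk_table_index is a hypothetical name, and the lookup table mb_gb2uni_table itself is not reproduced here.

```c
/* Compute the flat table index that zz_gbk2uni uses for a GBK byte pair.
 * Rows are lead bytes 0x81-0xFE (126 rows), columns are trail bytes
 * 0x40-0xFE (191 columns). Returns -1 if either byte is out of range. */
int gbk_table_index(unsigned char hi, unsigned char lo)
{
    if (hi < 0x81 || hi > 0xFE || lo < 0x40 || lo > 0xFE)
        return -1;
    return (hi - 0x81) * 0xBF + (lo - 0x40);
}
```

The largest valid index is 125 * 191 + 190 = 24065, so a table laid out this way needs 126 * 191 = 24066 entries.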

Unicode to UTF-8:

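/*****************************************************************************
 * Convert the Unicode (UCS-2 or UCS-4) code of one character to UTF-8.
 *
 * Parameters:
 *    unic     Unicode code value of the character
 *    pOutput  pointer to the output buffer that receives the UTF-8 bytes
 *    outSize  size of the pOutput buffer
 *
 * Return value:
 *    The number of bytes in the UTF-8 encoding of the character,
 *    or 0 on error.
 *
 * Notes:
 *     1. UTF-8 has no byte-order issue, but Unicode (UCS-2/UCS-4) does;
 *        byte order is either big endian or little endian;
 *        Intel processors are little endian, and little endian is used here
 *        (the low-order byte is stored at the lower address).
 *     2. Make sure the pOutput buffer has at least 6 bytes of space!
 ****************************************************************************/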
int enc_unicode_to_utf8_one(unsigned long unic, unsigned char *pOutput,  
        int outSize)  
{  
    if(pOutput == NULL)
        return 0;  

    if(outSize < 6)
      return 0;  

    if ( unic <= 0x0000007F )  
    {  
        // * U-00000000 - U-0000007F:  0xxxxxxx  
        *pOutput     = (unic & 0x7F);  
        return 1;  
    }  
    else if ( unic >= 0x00000080 && unic <= 0x000007FF )  
    {  
        // * U-00000080 - U-000007FF:  110xxxxx 10xxxxxx  
        *(pOutput+1) = (unic & 0x3F) | 0x80;  
        *pOutput     = ((unic >> 6) & 0x1F) | 0xC0;  
        return 2;  
    }  
    else if ( unic >= 0x00000800 && unic <= 0x0000FFFF )  
    {  
        // * U-00000800 - U-0000FFFF:  1110xxxx 10xxxxxx 10xxxxxx  
        *(pOutput+2) = (unic & 0x3F) | 0x80;  
        *(pOutput+1) = ((unic >>  6) & 0x3F) | 0x80;  
        *pOutput     = ((unic >> 12) & 0x0F) | 0xE0;  
        return 3;  
    }  
    else if ( unic >= 0x00010000 && unic <= 0x001FFFFF )  
    {  
        // * U-00010000 - U-001FFFFF:  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx  
        *(pOutput+3) = (unic & 0x3F) | 0x80;  
        *(pOutput+2) = ((unic >>  6) & 0x3F) | 0x80;  
        *(pOutput+1) = ((unic >> 12) & 0x3F) | 0x80;  
        *pOutput     = ((unic >> 18) & 0x07) | 0xF0;  
        return 4;  
    }  
    else if ( unic >= 0x00200000 && unic <= 0x03FFFFFF )  
    {  
        // * U-00200000 - U-03FFFFFF:  111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx  
        *(pOutput+4) = (unic & 0x3F) | 0x80;  
        *(pOutput+3) = ((unic >>  6) & 0x3F) | 0x80;  
        *(pOutput+2) = ((unic >> 12) & 0x3F) | 0x80;  
        *(pOutput+1) = ((unic >> 18) & 0x3F) | 0x80;  
        *pOutput     = ((unic >> 24) & 0x03) | 0xF8;  
        return 5;  
    }  
    else if ( unic >= 0x04000000 && unic <= 0x7FFFFFFF )  
    {  
        // * U-04000000 - U-7FFFFFFF:  1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx  
        *(pOutput+5) = (unic & 0x3F) | 0x80;  
        *(pOutput+4) = ((unic >>  6) & 0x3F) | 0x80;  
        *(pOutput+3) = ((unic >> 12) & 0x3F) | 0x80;  
        *(pOutput+2) = ((unic >> 18) & 0x3F) | 0x80;  
        *(pOutput+1) = ((unic >> 24) & 0x3F) | 0x80;  
        *pOutput     = ((unic >> 30) & 0x01) | 0xFC;  
        return 6;  
    }  
  
    return 0;  
}
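The function above follows the original UTF-8 scheme with sequences of up to 6 bytes. The current definition of UTF-8 (RFC 3629) stops at U+10FFFF, which needs at most 4 bytes, and excludes the surrogate range U+D800 to U+DFFF. A minimal sketch of an encoder with those limits follows; utf8_encode_strict is a hypothetical name, not part of this project.

```c
#include <stddef.h>

/* Encode one code point as UTF-8 under the RFC 3629 limits.
 * Returns the number of bytes written (1-4), or 0 if cp is a surrogate,
 * lies above U+10FFFF, or does not fit into out_size bytes. */
size_t utf8_encode_strict(unsigned long cp, unsigned char *out, size_t out_size)
{
    if (out == NULL)
        return 0;
    if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        return 0;

    if (cp <= 0x7F) {                       /* 0xxxxxxx */
        if (out_size < 1) return 0;
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp <= 0x7FF) {                      /* 110xxxxx 10xxxxxx */
        if (out_size < 2) return 0;
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp <= 0xFFFF) {                     /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (out_size < 3) return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (out_size < 4) return 0;             /* 11110xxx + three 10xxxxxx */
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

For example, U+4E2D (中) encodes to the three bytes E4 B8 AD. Since zz_gbk2uni returns a 16-bit value, the characters handled in this article never need more than 3 bytes either way.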

(III) Generating images of Chinese text

The approach is similar to the one introduced in the previous article, except that a Chinese character occupies two bytes; as long as each pair of bytes is processed as one character, the text displays correctly.

/************************************************************
*Copyright (C),lcb0281at163.com lcb0281atgmail.com
*BlogAddr: caibiao-lee.blog.csdn.net
*FileName: debug_font_osd.c
*Description: generate images from characters and text
*Date:     2020-02-03
*Author:   Caibiao Lee
*Version:  V1.0
*Others:
*History:
***********************************************************/
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <time.h>   /* time(), localtime_r() used in CreateTimeBmpPicture */
#include "SDL/SDL.h"
#include "SDL/SDL_ttf.h"
#include "debug_font_osd.h"

#define CHINESET_STRING "阿标在学习中"

#define FONT_PATH       "./font/hisi_osd.ttf"

int string_to_bmp(char *pu8Str)
{
    SDL_PixelFormat *fmt;
    TTF_Font *font;  
    SDL_Surface *text, *temp;  

    if (TTF_Init() < 0 ) 
    {  
        fprintf(stderr, "Couldn't initialize TTF: %s\n",SDL_GetError());  
        SDL_Quit();
        return -1;
    }  

    font = TTF_OpenFont(FONT_PATH, 80); 
    if ( font == NULL ) 
    {  
        /* Report the size and path that were actually requested. */
        fprintf(stderr, "Couldn't load %d pt font from %s: %s\n",80,FONT_PATH, SDL_GetError());  
        TTF_Quit();
        return -1;
    }  

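    SDL_Color forecol = { 0xff, 0xff, 0xff, 0xff };  
    text = TTF_RenderUTF8_Solid(font, pu8Str, forecol);
    if (text == NULL)
    {
        fprintf(stderr, "Couldn't render text: %s\n", SDL_GetError());
        TTF_CloseFont(font);
        TTF_Quit();
        return -1;
    }

    /* Describe the target pixel format: 16 bits per pixel. */
    fmt = (SDL_PixelFormat*)malloc(sizeof(SDL_PixelFormat));
    if (fmt == NULL)
    {
        SDL_FreeSurface(text);
        TTF_CloseFont(font);
        TTF_Quit();
        return -1;
    }
    memset(fmt,0,sizeof(SDL_PixelFormat));
    fmt->BitsPerPixel = 16;
    fmt->BytesPerPixel = 2;
    fmt->colorkey = 0xffffffff;
    fmt->alpha = 0xff;

    temp = SDL_ConvertSurface(text,fmt,0);
    free(fmt);   /* the converted surface has its own format; avoid a leak */
    if (temp != NULL)
    {
        SDL_SaveBMP(temp, "save.bmp"); 
        SDL_FreeSurface(temp);
    }

    SDL_FreeSurface(text);  
    TTF_CloseFont(font);  
    TTF_Quit();  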

    return 0;
}

int CreateTimeBmpPicture(void)
{
    time_t     l_stTime;
    struct tm  l_stTm;
    struct tm *l_pstTm=&l_stTm;
    char s8Contenx[128]={0};

    time(&l_stTime);
    localtime_r(&l_stTime,l_pstTm); 
    snprintf(s8Contenx,sizeof(s8Contenx), "20%02d-%02d-%02d-%02d:%02d:%02d",\
        (l_pstTm->tm_year-100), (1+l_pstTm->tm_mon), l_pstTm->tm_mday,\
            l_pstTm->tm_hour, l_pstTm->tm_min, l_pstTm->tm_sec);

    printf("string: %s \n",s8Contenx);
    return string_to_bmp(s8Contenx);   /* the function is declared int, so return a value */
}

int CreateChinesePicture(void)
{
    int  i = 0;
    int  l_s32Len = 0;
    char l_arrs8Str[64] = {0};
    char l_arrs8UTFBuf[64] = {0};
    char l_arrss8Contenx[64] = {0};
    unsigned short usUnicode=0;
    
    unsigned int usUtfLen=0;
    unsigned int u32ContenxOffest=0; 
    
    snprintf(l_arrs8Str,sizeof(l_arrs8Str),"%s",CHINESET_STRING);
    
    l_s32Len = strlen(l_arrs8Str);

    printf(" len = %d \n",l_s32Len);

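    /* The loop treats every two bytes as one GBK character, so this source
     * file must be saved in GBK encoding and the string must not contain
     * single-byte (ASCII) characters. */
    for(i=0;i+1<l_s32Len;i+=2)
    {
        /* Read the lead and trail bytes separately: using i++ twice in one
         * argument list has no defined evaluation order. */
        unsigned char ucHigh = (unsigned char)l_arrs8Str[i];
        unsigned char ucLow  = (unsigned char)l_arrs8Str[i+1];

        usUnicode=zz_gbk2uni(ucHigh,ucLow);
        usUtfLen= enc_unicode_to_utf8_one(usUnicode,(unsigned char *)l_arrs8UTFBuf,sizeof(l_arrs8UTFBuf));
        /* enc_unicode_to_utf8_one returns 0 on error; also keep room for the final '\0'. */
        if(usUtfLen==0 || u32ContenxOffest+usUtfLen>=sizeof(l_arrss8Contenx))
        {
            printf("%s %d out len error \n",__FUNCTION__,__LINE__);
            break;
        }

        memcpy(&l_arrss8Contenx[u32ContenxOffest],l_arrs8UTFBuf,usUtfLen);
        
        u32ContenxOffest+=usUtfLen;
    }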

    string_to_bmp(l_arrss8Contenx);

    return 0;

}

int main(void)
{
    printf("hello world \n");
    //CreateTimeBmpPicture();

    CreateChinesePicture();
    return 0;
}
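CreateTimeBmpPicture builds its timestamp by prefixing a literal "20" to tm_year - 100, which only works for the years 2000 to 2099. A sketch of the same format using the standard strftime function is shown below; format_osd_time is a hypothetical helper name and is not part of the listing above.

```c
#include <stddef.h>
#include <time.h>

/* Format a broken-down time as YYYY-MM-DD-HH:MM:SS, the same layout that
 * CreateTimeBmpPicture produces. Returns the number of characters written
 * (19 on success), or 0 if the buffer is too small. */
size_t format_osd_time(const struct tm *t, char *buf, size_t size)
{
    return strftime(buf, size, "%Y-%m-%d-%H:%M:%S", t);
}
```

The resulting string can be passed to string_to_bmp in the same way as s8Contenx.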

Project file structure:

biao@ubuntu:~/nfs/OSD/font$ tree -L 2
.
├── bin
│   └── objs
├── debug_font_osd.c
├── debug_font_osd.h
├── font
│   ├── hisi_osd.ttf
│   └── hisi_osd.ttf_df
├── GBK_To_Unicode.c
├── GBK_To_Unicode.h
├── inc
│   ├── freetype2
│   ├── ft2build.h
│   └── SDL
├── lib
│   ├── libfreetype.a
│   ├── libfreetype.so
│   ├── libfreetype.so.6
│   ├── libSDL-1.2.so.0
│   ├── libSDL.a
│   ├── libSDLmain.a
│   ├── libSDL.so
│   ├── libSDL_ttf-2.0.so.0
│   ├── libSDL_ttf.a
│   ├── libSDL_ttf.so
│   └── pkgconfig
├── Makefile
├── save.bmp
└── test

8 directories, 20 files
biao@ubuntu:~/nfs/OSD/font$

The resulting picture is displayed as:

If you need to add Chinese characters to a video stream, you can refer to the previous blog post.

The project from the previous article, which generates the time image, can be obtained from the following:

GitHub: freetype_SDL_Dl_ttf_debug

CSDN: freetype_SDL_Dl_ttf_debug.tar.gz

The complete project is available at the address given in the "Catalog Preface" article.

The first article in this column, "Catalog Preface", lists the complete table of contents of the column; reading the articles in that order will help your understanding.

Author: li_wen01


HiSilicon Multimedia (MPP) Development (6): Region Management (REGION & OSD Chinese character display)
