Getting the real format of an image file with front-end JS

As mentioned in the previous blog post, mainstream browsers currently support seven image formats: jpg, png, gif, bmp, ico, webp and svg. The first six are bitmap (raster) formats, and svg is the only vector format. For details, see the introduction to the basics of images in the previous article.

Each image format has its own strengths, weaknesses and data structure. The purpose of this post is to determine an image's real format from its binary data.

The common way to determine the image format

When building an image upload feature in front-end development, the usual way to check the image format is to look at the file extension, as in the following code:

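// `input` is the <input type="file"> element
input.addEventListener('change', (e) => {
  const file = e.target.files[0]
  if (!file) return // no file selected
  // Use the text after the last '.' as the format (a name without a dot yields the whole name)
  const format = file.name.substring(file.name.lastIndexOf('.') + 1).toLowerCase()
}, false)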

The code above listens for the change event of the upload control, takes the name of the selected file, extracts the text after the last dot, and treats that extension as the image format.
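The same extension logic can be pulled into a small pure function, which also makes one edge case visible: a file name with no dot. A minimal sketch; `getExtension` is a hypothetical helper name, not part of the original code:

```javascript
// Hypothetical helper: returns the lower-cased text after the last '.',
// or an empty string when the file name has no extension.
function getExtension(fileName) {
  const dotIndex = fileName.lastIndexOf('.')
  // Without this guard, lastIndexOf returns -1 and substring(0)
  // would return the whole file name as the "extension".
  if (dotIndex === -1) return ''
  return fileName.substring(dotIndex + 1).toLowerCase()
}

console.log(getExtension('photo.PNG'))      // png
console.log(getExtension('archive.tar.gz')) // gz
console.log(getExtension('README'))         // '' (empty string)
```

Either way, the result only reflects the file name, never the file content.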

Most developers will recognize this code, and many projects check image formats exactly this way. However, it stops working as soon as someone deliberately changes the file extension.

A gif image has a bit depth of at most 8. Yet if we rename a 32-bit png file so that its extension is gif, the file can still be opened normally:

As shown in the figure above, after the png file's extension is changed to gif, the system's file information reports the format as gif, but the bit depth is still 32: in essence, the image is still a png.

In this situation the extension alone is no longer a reliable way to determine the image format, and we need another way to get the file's real format. This relies on handling binary data in the front end; see the introduction to front-end binary data and the related APIs.

What happens when the extension is changed:

  • Between bitmap formats, the extensions can be swapped freely and the image can still be used normally.
  • If an animated gif is renamed to another bitmap format, the animation is lost and it becomes a static image.
  • If a bitmap file is renamed with the svg extension, the image becomes invalid and cannot be used.
  • If an svg file is renamed with a bitmap extension, the image likewise cannot be used.

A brief look at image data

Images in different formats store different data, and each format has its own data structure.

Based on these structures, we can determine an image's real format by inspecting its data in a typed array.

  • In the jpg format, the first two bytes are always 0xFFD8, and the third byte is usually 0xFF.
  • In the png format, the first 8 bytes are the PNG file signature, which reliably identifies the file as PNG.
  • In the bmp format, the first 14 bytes hold the file header, and its first two bytes store the file type: BM.
  • In the webp format, if we skip the first 8 bytes, the next 4 bytes hold the file type: WEBP.
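To make the webp offset concrete: a WebP file is a RIFF container, so bytes 0 to 3 are the ASCII tag RIFF, bytes 4 to 7 hold the little-endian chunk size, and bytes 8 to 11 are WEBP. The sketch below reads a run of bytes as ASCII text; `readAscii` and the sample header bytes are illustrative assumptions, not real file data or code from the original article.

```javascript
// Hypothetical helper: read `length` bytes starting at `offset` as ASCII text.
// Stops early if the data is shorter than requested.
function readAscii(bytes, offset, length) {
  let text = ''
  for (let i = offset; i < offset + length && i < bytes.length; i++) {
    text += String.fromCharCode(bytes[i])
  }
  return text
}

// Illustrative first 12 bytes of a WebP file: "RIFF", a made-up size, "WEBP".
const webpHeader = new Uint8Array([
  82, 73, 70, 70, // "RIFF"
  36, 0, 0, 0,    // chunk size (little-endian), value chosen for illustration
  87, 69, 66, 80  // "WEBP"
])

console.log(readAscii(webpHeader, 0, 4)) // RIFF
console.log(readAscii(webpHeader, 8, 4)) // WEBP

// The same helper reads the two-byte bmp file type field.
console.log(readAscii(new Uint8Array([66, 77]), 0, 2)) // BM
```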

To check the different bitmap formats, you can use the identifying bytes listed in the following table:

Format  Bytes  Values (decimal)                  Offset
jpg     3      255, 216, 255                     0
png     8      137, 80, 78, 71, 13, 10, 26, 10   0
gif     3      71, 73, 70                        0
webp    4      87, 69, 66, 80                    8
ico     4      0, 0, 1, 0                        0
bmp     2      66, 77                            0

An offset of 0 means the bytes are read from the very beginning of the file; webp's offset of 8 means skipping the first 8 bytes and reading the 4-byte identifier that follows.

The table covers the bitmap formats supported by current browsers together with their identifying bytes. Reading the corresponding bytes from a file and comparing them with the table gives the file's real format.

Among these formats, the bytes checked for bmp, gif and webp each spell out a readable signature: BM and WEBP were mentioned above, and the gif signature is GIF. The values can be converted with character-encoding methods such as String.fromCharCode. For more on character encodings, see the article on ASCII, Unicode, UTF-8, UTF-16 and other encodings that front-end developers need to understand.

// bmp
String.fromCharCode(66) // B
String.fromCharCode(77) // M

// gif
String.fromCharCode(71) // G
String.fromCharCode(73) // I
String.fromCharCode(70) // F

// webp
String.fromCharCode(87) // W
String.fromCharCode(69) // E
String.fromCharCode(66) // B
String.fromCharCode(80) // P
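The table's decimal values for the text-based signatures are simply the character codes of the signature strings, so they can also be derived rather than typed by hand. A small sketch; `toCodes` is a hypothetical helper name, not part of the original code:

```javascript
// Hypothetical helper: convert an ASCII signature string to its byte values.
const toCodes = (signature) => Array.from(signature, (ch) => ch.charCodeAt(0))

console.log(toCodes('BM'))   // values: 66, 77
console.log(toCodes('GIF'))  // values: 71, 73, 70
console.log(toCodes('WEBP')) // values: 87, 69, 66, 80

// The opposite direction: String.fromCharCode accepts several codes at once.
console.log(String.fromCharCode(71, 73, 70)) // GIF
```

This only works for signatures that are plain text. png's 8-byte signature mixes the text PNG (80, 78, 71) with non-text bytes (137, 13, 10, 26, 10), and the jpg and ico signatures are not text at all, so those are best kept as raw numbers.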

The gif signature is usually checked together with the version number, so the first 6 bytes are compared: 'G', 'I', 'F', '8', '7' or '9', 'a'. The fifth byte is either 7 or 9, marking the two versions of the format: the 1987 version (GIF87a) and the 1989 version (GIF89a).
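Since the version is part of those 6 bytes, a gif check can read all six and report the version at the same time. A minimal sketch; `getGifVersion` is a hypothetical helper, and it treats anything other than GIF87a or GIF89a as not a gif:

```javascript
// Hypothetical helper: returns '87a' or '89a' for a GIF header, otherwise null.
function getGifVersion(bytes) {
  if (bytes.length < 6) return null
  // Decode the first 6 bytes as text, e.g. "GIF89a"
  const header = String.fromCharCode(...bytes.slice(0, 6))
  if (header === 'GIF87a') return '87a'
  if (header === 'GIF89a') return '89a'
  return null
}

console.log(getGifVersion(new Uint8Array([71, 73, 70, 56, 57, 97]))) // 89a
console.log(getGifVersion(new Uint8Array([66, 77, 0, 0, 0, 0])))     // null
```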

Reading the real image format with JS

With that background on front-end binary data, image files can be read like any other file through Web API objects:

const reader = new FileReader()
reader.onload = () => {
  const imgArrayBuffer = reader.result
  const imgUint8Array = new Uint8Array(imgArrayBuffer)
}
reader.readAsArrayBuffer(file)

The code above reads the file with a FileReader object as an ArrayBuffer, which can then be wrapped in a typed array for processing.
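Signature checks only need the first few bytes, so reading the whole file is not strictly necessary. As an alternative to FileReader (swapped in here, not the article's method), Blob.prototype.slice together with Blob.prototype.arrayBuffer() returns a Promise for just the header. `readHeaderBytes` and its 16-byte default are assumptions; an svg check that searches further into the file for <svg may need a larger slice.

```javascript
// Alternative to FileReader (an assumption, not the article's method):
// read only the first `length` bytes of a Blob/File as a Uint8Array.
async function readHeaderBytes(blob, length = 16) {
  // slice() clamps to the blob size; arrayBuffer() resolves with the raw bytes.
  const buffer = await blob.slice(0, length).arrayBuffer()
  return new Uint8Array(buffer)
}

// Usage inside an async change handler, where `file` is the selected File:
// const imgUint8Array = await readHeaderBytes(file)
```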

Once we have the file's data as a Uint8Array, we can compare it with the identifying bytes from the table above. Taking bmp, jpg and webp as examples:

imgUint8Array[0] === 66 && imgUint8Array[1] === 77 // bmp format
imgUint8Array[0] === 255 && imgUint8Array[1] === 216 && imgUint8Array[2] === 255 // jpg format
imgUint8Array[8] === 87 && imgUint8Array[9] === 69 && imgUint8Array[10] === 66 && imgUint8Array[11] === 80 // webp format

With this approach we can read the real format of an image. Part of the detection code is as follows:

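// Identifying byte values for each format (see the table above)
const IMAGEFORMATS = [
  { ext: 'png', data: [137, 80, 78, 71, 13, 10, 26, 10] },
  { ext: 'jpg', data: [255, 216, 255] },
  { ext: 'gif', data: [71, 73, 70] },
  { ext: 'ico', data: [0, 0, 1, 0] },
  { ext: 'bmp', data: [66, 77] },
  { ext: 'webp', data: [87, 69, 66, 80], offset: 8 }
]

// Helper (implementation sketched here, not shown in the original):
// true if `data` appears in `uint8Array` starting at `offset` (default 0)
function isEqualFormatPrefix(uint8Array, data, offset = 0) {
  if (uint8Array.length < offset + data.length) return false
  return data.every((value, i) => uint8Array[offset + i] === value)
}

// Check the file's bytes against each format's identifying values in turn
// (getImageFormat is a wrapper name added here so that `return` is valid)
function getImageFormat(imgUint8Array) {
  for (let i = 0; i < IMAGEFORMATS.length; i++) {
    const { data, offset, ext } = IMAGEFORMATS[i]
    if (isEqualFormatPrefix(imgUint8Array, data, offset)) {
      return ext
    }
  }
  return null // no bitmap signature matched
}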

However, the approach above only covers bitmaps. svg images are a bit more involved and need to be handled separately.

Detecting the svg format

An svg image is a vector graphic whose data is usually written in the XML markup language. So after reading the image data, we look for the <svg marker: if the data contains it, the file can be roughly judged to be an svg image.

The <svg marker consists of 4 characters whose character codes are 60, 115, 118 and 103. Next, we check whether the image data contains the same bytes:

imgUint8Array[0] === 60 && imgUint8Array[1] === 115 && imgUint8Array[2] === 118 && imgUint8Array[3] === 103 // svg format

The code above is only a simple check for the svg format.

However, a typical svg file begins with the XML declaration <?xml rather than <svg. In that case we first check for the <?xm prefix and then look for the <svg tag:

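// Runs inside the detection function; isEqualFormatPrefix and isHasSignCodes are helpers not shown here
if (isEqualFormatPrefix(imgUint8Array, [60, 63, 120, 109], 0)) { // starts with <?xm
  if (isHasSignCodes(imgUint8Array, [60, 115, 118, 103])) { // contains the <svg tag
    return 'svg'
  }
}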
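The two svg checks can be combined into one self-contained sketch. The helper names `startsWithAt`, `containsSequence` and `looksLikeSvg` are hypothetical stand-ins for the article's isEqualFormatPrefix and isHasSignCodes, whose implementations are not shown. Like the original, this is only a plausibility check: it does not handle a byte order mark, leading whitespace, comments or a DOCTYPE before <svg.

```javascript
// Hypothetical helper: true if `pattern` occurs in `bytes` starting at `offset`.
function startsWithAt(bytes, pattern, offset = 0) {
  if (offset + pattern.length > bytes.length) return false
  return pattern.every((value, i) => bytes[offset + i] === value)
}

// Hypothetical helper: true if `pattern` occurs anywhere in `bytes` (linear scan).
function containsSequence(bytes, pattern) {
  for (let i = 0; i + pattern.length <= bytes.length; i++) {
    if (startsWithAt(bytes, pattern, i)) return true
  }
  return false
}

const SVG_TAG = [60, 115, 118, 103] // "<svg"
const XML_DECL = [60, 63, 120, 109] // "<?xm"

// Rough svg check: starts with <svg, or starts with <?xm and contains <svg.
function looksLikeSvg(bytes) {
  if (startsWithAt(bytes, SVG_TAG)) return true
  return startsWithAt(bytes, XML_DECL) && containsSequence(bytes, SVG_TAG)
}

// Example: encode a small XML document to bytes and check it.
const sample = new TextEncoder().encode('<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"></svg>')
console.log(looksLikeSvg(sample)) // true
```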

Note: this svg check relies only on XML markup characters, so it only catches files that were forged by changing the extension. A forged file that simply contains a <svg tag will still get past it.

Summary

Among the image formats supported by browsers, every bitmap format (everything except svg) can have its real format determined fairly reliably from the image's binary data. This prevents renamed or forged files from slipping past the check and causing unexpected errors and other problems.

The complete JavaScript code for getting the real format of an image file is available at: Get the complete code of the real format of the picture file.


Origin blog.csdn.net/jimojianghu/article/details/127902857