How does the front end render HTML strings safely?

Dynamically generating and rendering HTML strings is a common requirement in modern web applications. However, rendering HTML strings incorrectly can lead to security vulnerabilities such as cross-site scripting (XSS). In order to ensure the security of the application, we need to take some measures to render HTML strings in a safe environment. This article will introduce some best practices for safely rendering HTML strings to help you effectively avoid potential security risks.

Table of contents

Common rendering methods
- HTML
- React
- Vue
- Angular
HTML Sanitizer API
- What is it?
- How to use it
- Customization
- Browser support
Third-party libraries
- DOMPurify
- js-xss
- sanitize-html

1. Common rendering methods

Let's first look at how to render HTML strings in HTML, React, Vue, and Angular.

HTML

To render an HTML string with plain HTML and JavaScript, you can either assign it to an element's innerHTML property, or create element nodes and attach them with the appendChild() method.

Using the innerHTML property: Get the target element and assign the HTML string to its innerHTML property. For example:

<div id="targetElement"></div>

<script>
  const htmlString = "<h1>Hello, World!</h1>";
  document.getElementById("targetElement").innerHTML = htmlString;
</script>

This renders <h1>Hello, World!</h1> inside <div id="targetElement"></div>.

Creating element nodes and using appendChild(): You can create an element node with document.createElement() and add nodes to a parent element with appendChild(). For example:

<div id="targetElement"></div>

<script>
  const htmlString = "<h1>Hello, World!</h1>";
  const parentElement = document.getElementById("targetElement");
  const tempElement = document.createElement("div");
  tempElement.innerHTML = htmlString;

  while (tempElement.firstChild) {
    parentElement.appendChild(tempElement.firstChild);
  }
</script>

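This also renders <h1>Hello, World!</h1> inside <div id="targetElement"></div>. Because the string is still parsed through innerHTML on the temporary element, this approach is no safer than the first one.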
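When a string only needs to be displayed, and not interpreted as markup, assigning it to textContent avoids HTML parsing altogether. The following is a minimal sketch of that idea; renderAsText is a hypothetical helper name, and element stands for any DOM element:

```javascript
// Hypothetical helper: shows a string as literal text instead of parsing it as HTML.
// `element` is expected to be a DOM element (anything with a textContent property).
function renderAsText(element, value) {
  // textContent never parses markup, so tags in `value` are displayed, not executed.
  element.textContent = String(value);
  return element;
}

// In a browser, for example:
// renderAsText(document.getElementById("targetElement"), "<h1>Hello, World!</h1>");
```

With this helper the page would show the characters <h1>Hello, World!</h1> as text rather than a heading, so it only fits cases where no markup is wanted.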

React

In React, HTML strings can be rendered with the dangerouslySetInnerHTML attribute. As its name says, it carries security risks: the HTML is not escaped, which may cause XSS problems, so use it with caution.

import React from 'react';

const MyComponent = () => {
  const htmlString = '<p>Hello, <strong>React</strong>!</p>';

  return (
    <div dangerouslySetInnerHTML={{ __html: htmlString }} />
  );
};

export default MyComponent;

Here the HTML string to be rendered is stored in the htmlString variable and passed to the __html property of the dangerouslySetInnerHTML attribute. React inserts that string as HTML content into the rendered component.

Vue

In Vue, HTML strings can be rendered with the v-html directive. As with dangerouslySetInnerHTML in React, you need to be careful when using v-html.

<template>
  <div v-html="htmlString"></div>
</template>

<script>
export default {
  data() {
    return {
      htmlString: '<p>Hello, <strong>Vue</strong>!</p>',
    };
  },
};
</script>

Here the HTML string to be rendered is stored in htmlString and bound through the v-html directive to the element that should render it (here, a <div>). Vue parses htmlString as HTML and inserts it into that element.

Angular

In Angular, HTML strings can be rendered with the [innerHTML] property binding.

<div [innerHTML]="htmlString"></div>

Here the HTML string to be rendered is stored in a variable named htmlString and bound to the [innerHTML] property. Angular parses htmlString as HTML and inserts it into the corresponding DOM node.

As with other frameworks, take special care when using the [innerHTML] property binding. Make sure the rendered HTML string is reliable and safe, and avoid taking HTML strings directly from user input or untrusted sources, to prevent security issues such as XSS attacks.

In addition, Angular provides built-in security mechanisms to help protect applications. Values bound to [innerHTML] are sanitized automatically, and the built-in DomSanitizer service can be used to control how an HTML string is treated before it is bound. For example:

import { Component } from '@angular/core';
import { DomSanitizer, SafeHtml } from '@angular/platform-browser';

@Component({
  selector: 'app-example',
  template: `
    <div [innerHTML]="getSafeHtml()"></div>
  `,
})
export class ExampleComponent {
  htmlString: string = '<p>Hello, <strong>Angular</strong>!</p>';

  constructor(private sanitizer: DomSanitizer) {}

  getSafeHtml(): SafeHtml {
    // Marks the string as trusted: Angular will NOT sanitize it.
    return this.sanitizer.bypassSecurityTrustHtml(this.htmlString);
  }
}

Here DomSanitizer and SafeHtml, a built-in Angular service and type, are imported first. The component then calls the bypassSecurityTrustHtml() method of DomSanitizer and binds the returned SafeHtml object to the [innerHTML] property. Note that bypassSecurityTrustHtml() does not escape or validate the string: it marks the value as trusted, so Angular skips the sanitization it would otherwise apply to [innerHTML].

By default, Angular sanitizes values bound to [innerHTML] and removes content it considers unsafe, which reduces potential security risks. DomSanitizer should be used to bypass that protection only when the markup is known to be safe.

Note: when using DomSanitizer in this way, make sure to operate only on trusted and validated HTML strings, and never on strings taken directly from user input or untrusted sources. Otherwise the application is exposed to security issues such as XSS attacks.

2. HTML Sanitizer API

As the examples above show, rendering HTML strings carries security risks both in common frameworks and in plain HTML. When user-supplied or untrusted HTML strings are rendered directly into an application, this can lead to security vulnerabilities such as cross-site scripting (XSS). Proper security measures are therefore needed when processing and rendering HTML strings.
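To make the risk concrete, the sketch below escapes the characters that are significant in HTML so that an untrusted string is displayed literally instead of being parsed. escapeHtml and HTML_ESCAPES are hypothetical names, and the approach is only suitable for text and quoted attribute values; it does not help when some markup has to be kept, which is what sanitization is for.

```javascript
// Hypothetical helper: escapes the five characters that are significant in HTML
// so that untrusted text is displayed literally rather than parsed as markup.
const HTML_ESCAPES = {
  "&": "&amp;",
  "<": "&lt;",
  ">": "&gt;",
  '"': "&quot;",
  "'": "&#39;",
};

function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, (ch) => HTML_ESCAPES[ch]);
}

const payload = '<img src="x" onerror="alert(1)">';
console.log(escapeHtml(payload));
// prints: &lt;img src=&quot;x&quot; onerror=&quot;alert(1)&quot;&gt;
```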

So is there a way in HTML that allows us to render HTML strings safely? Yes, it is the HTML Sanitizer API. However, this API is still experimental and should not be used in a production environment until all major browsers support it. Let's take a look at how this API is used to prepare for the general availability of this API in the future.

What is it?

The HTML Sanitizer API was first announced in a draft specification in early 2021. It provides native browser support for removing malicious code from HTML that is dynamically added to a website. The HTML Sanitizer API can be used to sanitize unsafe HTML strings and Document or DocumentFragment objects before they are inserted into the DOM.

The main goals of building a separate API for cleaning are to:

Reduce the attack surface for cross-site scripting attacks in web applications.
Guarantee the safety of HTML output in the current user agent.
Improve the usability of the cleaner and make it more convenient to use.

The HTML Sanitizer API emerged to provide a convenient and safe way to process and sanitize HTML to reduce potential security risks and improve user agent security.

The Sanitizer API brings a series of new functions for the sanitization process of strings:

Sanitization of user input: The main function of this API is to accept strings and convert them into a safer form. The converted strings do not accidentally execute JavaScript, which helps protect your application from cross-site scripting attacks.
Browser maintenance: The sanitizer is built into the browser and is updated when bugs or new attack vectors are discovered. You therefore get a built-in sanitizer without importing any external library.
Safety and ease of use: Moving sanitization into the browser makes it easier, safer and faster. Since the browser already has a powerful and safe parser, it knows how to handle every active element in the DOM. External parsers written in JavaScript can be expensive by comparison and can quickly become outdated.

How to use it

Using the Sanitizer API is as simple as instantiating the Sanitizer class with the Sanitizer() constructor and configuring the instance.

For data cleansing, the API provides three basic methods. Let's see how and when to use them.

Sanitize a string using an implicit context

Element.setHTML() is used to parse and sanitize a string and immediately insert it into the DOM. This works when the target DOM element is known and the HTML content exists as a string.

const $div = document.querySelector('div');
const user_input = `<em>Hello There</em><img src="" onerror=alert(0)>`;
const sanitizer = new Sanitizer() // Our Sanitizer

$div.setHTML(user_input, sanitizer); // <div><em>Hello There</em><img src=""></div>

Here the goal is to insert the HTML in user_input into the target element, that is, to achieve the same effect as target.innerHTML = value but without the risk of XSS.

Sanitize a string using a given context

Sanitizer.sanitizeFor() is used to parse, sanitize and prepare a string for later addition to the DOM. This method works best when the HTML content exists as a string and the type of the target DOM element is known (for example div or span).

const user_input = `<em>Hello There</em><img src="" onerror=alert(0)>`
const sanitizer = new Sanitizer()

sanitizer.sanitizeFor("div", user_input) // HTMLDivElement <div>

The first argument to Sanitizer.sanitizeFor() describes the type of node the result is intended for.

When using the sanitizeFor() method, the result of parsing an HTML string depends on the context (the element) it is in. For example, an HTML string containing a <td> element is kept if it is inserted into a <table> element, but the <td> is removed if it is inserted into a <div> element. Therefore, the tag name of the target element must be passed as an argument to Sanitizer.sanitizeFor().

sanitizeFor(element, input)

You can also read .innerHTML on the returned element to get the sanitized result as a string.

sanitizer.sanitizeFor("div", user_input).innerHTML // <em>Hello There</em><img src="">

Sanitize a node

When you already have a user-controlled DocumentFragment, the Sanitizer.sanitize() method can be used to sanitize the DOM tree node.

const sanitizer = new Sanitizer()
const $userDiv = ...;
$div.replaceChildren(sanitizer.sanitize($userDiv));

Among other things, the Sanitizer API modifies HTML strings by removing and filtering attributes and tags. For example, the Sanitizer API will:

Remove certain tags (script, marquee, head, frame, menu, object, etc.) and keep content tags.
Remove most attributes. Only href on <a> tags and colspan on <td> and <th> tags are kept; other attributes are removed.
Filter out strings that may cause script execution.

Customization

By default, a Sanitizer instance is only used to prevent XSS attacks. In some cases, however, a sanitizer with a custom configuration may be required. Next, let's take a look at how to customize the Sanitizer API.

If you want to create a custom sanitizer configuration, just create a configuration object and pass it to the constructor when initializing the Sanitizer API.

const config = {
  allowElements: [],
  blockElements: [],
  dropElements: [],
  allowAttributes: {},
  dropAttributes: {},
  allowCustomElements: true,
  allowComments: true
};
// The sanitization result is customized by the configuration
new Sanitizer(config)

The following configuration parameters define how the sanitizer should handle sanitization results for a given element.

allowElements: Specifies the elements in the input that the sanitizer should keep.
blockElements: Specifies the elements that the sanitizer should remove from the input while keeping their children.
dropElements: Specifies the elements that the sanitizer should remove from the input together with their children.

const str = `hello <b><i>there</i></b>`

new Sanitizer().sanitizeFor("div", str)
// <div>hello <b><i>there</i></b></div>

new Sanitizer({ allowElements: ["b"] }).sanitizeFor("div", str)
// <div>hello <b>there</b></div>

new Sanitizer({ blockElements: ["b"] }).sanitizeFor("div", str)
// <div>hello <i>there</i></div>

new Sanitizer({ allowElements: [] }).sanitizeFor("div", str)
// <div>hello there</div>

Use the allowAttributes and dropAttributes parameters to define which attributes are allowed or removed.

const str = `<span id=foo class=bar style="color: red">hello there</span>`

new Sanitizer().sanitizeFor("div", str)
// <div><span id="foo" class="bar" style="color: red">hello there</span></div>

new Sanitizer({ allowAttributes: { "style": ["span"] } }).sanitizeFor("div", str)
// <div><span style="color: red">hello there</span></div>

new Sanitizer({ dropAttributes: { "id": ["span"] } }).sanitizeFor("div", str)
// <div><span class="bar" style="color: red">hello there</span></div>

The allowCustomElements parameter allows or denies the use of custom elements.

const str = `<elem>hello there</elem>`

new Sanitizer().sanitizeFor("div", str);
// <div></div>

new Sanitizer({
  allowCustomElements: true,
  allowElements: ["div", "elem"]
}).sanitizeFor("div", str);
// <div><elem>hello there</elem></div>

NOTE: If a Sanitizer is created without any arguments and without an explicitly defined configuration, the default configuration values are applied.

Browser support

Currently, browser support for the Sanitizer API is limited, and the specification is still a work in progress. The API is still experimental, so watch for changes before using it in production.
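Given the limited support, code that wants to try the native API needs feature detection and a fallback. The following is a minimal sketch under that assumption: setHtmlSafely is a hypothetical helper name, and fallbackSanitize is a caller-supplied function (for example, a wrapper around a third-party sanitizer). Because the specification is still changing, only the one-argument form of setHTML() is used here.

```javascript
// Hypothetical helper: prefers the native setHTML() when the browser provides it,
// and otherwise falls back to a sanitizer function supplied by the caller.
function setHtmlSafely(element, html, fallbackSanitize) {
  if (typeof element.setHTML === "function") {
    // Native path: the browser parses and sanitizes with its default configuration.
    element.setHTML(html);
    return "native";
  }
  // Fallback path: sanitize first, then assign the cleaned string.
  element.innerHTML = fallbackSanitize(html);
  return "fallback";
}

// In a browser, for example:
// setHtmlSafely(document.querySelector("div"), userInput, (dirty) => mySanitizer(dirty));
```

The return value only reports which path was taken, which can be useful for logging while browser support is still uneven.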

3. Third-party libraries

At this point we know that the native Sanitizer API is not yet usable in production, and that the commonly used front-end frameworks offer little or no help in rendering untrusted HTML safely. In actual development, we can use existing third-party libraries to sanitize HTML before rendering it. Here are a few commonly used ones.

DOMPurify

DOMPurify is a popular JavaScript library for HTML sanitization and protection against cross-site scripting (XSS) in the browser environment. It protects web pages from XSS attacks by removing malicious code and filtering dangerous tags and attributes. DOMPurify uses a strict parsing and validation strategy, and provides configurable options for developers to customize according to their needs. It can be easily integrated into existing web applications and is widely regarded as a safe and reliable HTML sanitization solution.

DOMPurify can be used by the following steps:

First, install the DOMPurify library. It can be installed by running the following command:

npm install dompurify

In the component file that needs to be used, introduce the DOMPurify library:

import DOMPurify from 'dompurify';

At the appropriate place in the component, use DOMPurify to sanitize the HTML string. The following takes React as an example:

import React from 'react';
import DOMPurify from 'dompurify';

const MyComponent = () => {
  const userInput = '<script>alert("XSS");</script><p>Hello, World!</p>';
  // Sanitize the untrusted string before handing it to React
  const cleanedHtml = DOMPurify.sanitize(userInput);

  return <div dangerouslySetInnerHTML={{ __html: cleanedHtml }}></div>;
};

Here the sanitized HTML content is passed to the dangerouslySetInnerHTML prop of the React element, so that only safe HTML is displayed.
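One way to keep this pattern consistent is to build the dangerouslySetInnerHTML value in a single helper, so that no component passes an unsanitized string by accident. The following is a minimal sketch; createSafeMarkup is a hypothetical name, and the sanitizer is injected as a plain function so the helper does not depend on any particular library.

```javascript
// Hypothetical helper: the one place that builds the object expected by
// dangerouslySetInnerHTML.
// `sanitize` is any function that maps an untrusted string to a cleaned string,
// for example (dirty) => DOMPurify.sanitize(dirty).
function createSafeMarkup(sanitize, dirtyHtml) {
  if (typeof sanitize !== "function") {
    throw new TypeError("createSafeMarkup requires a sanitize function");
  }
  return { __html: sanitize(String(dirtyHtml)) };
}

// Usage in a component (JSX shown as a comment so the block runs as plain JavaScript):
// <div dangerouslySetInnerHTML={createSafeMarkup((dirty) => DOMPurify.sanitize(dirty), userInput)} />
```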

DOMPurify provides several options and configurations that can be used to customize DOMPurify's behavior:

import DOMPurify from 'dompurify';

// Tags and attributes to allow in addition to DOMPurify's defaults
const myCustomTags = ['custom-tag'];
const myCustomAttributes = ['data-custom-attr'];

// Custom options
const myOptions = {
  ADD_TAGS: myCustomTags,       // extends the default tag allow-list
  ADD_ATTR: myCustomAttributes, // extends the default attribute allow-list
};

const userInput = '<script>alert("XSS");</script><p>Hello, World!</p><custom-tag data-custom-attr="custom-value">Custom Content</custom-tag>';

const cleanedHtml = DOMPurify.sanitize(userInput, myOptions);

console.log(cleanedHtml);
// Output: <p>Hello, World!</p><custom-tag data-custom-attr="custom-value">Custom Content</custom-tag>

Here myCustomTags lists a custom tag named custom-tag, and myCustomAttributes lists a custom attribute named data-custom-attr. The custom options object myOptions passes them through ADD_TAGS and ADD_ATTR, which extend DOMPurify's default allow-lists rather than replacing them (ALLOWED_TAGS and ALLOWED_ATTR replace the defaults instead). Finally, DOMPurify.sanitize() is called with the user-entered HTML and myOptions, and DOMPurify filters and sanitizes according to the custom rules.

You can define your own whitelist of allowed tags and attributes as needed and use it in the custom options to adjust DOMPurify's behavior.

js-xss

js-xss is a JavaScript library for preventing and filtering cross-site scripting attacks (XSS). It provides a set of methods and functions that can sanitize and escape user-entered HTML content to ensure that HTML rendered in the browser environment is safe.

The js-xss library uses the concept of whitelist filters to defend against XSS attacks. It defines a set of allowed HTML tags and attributes, and also provides some options and configurations to customize the filtering rules. Using js-xss, you can sanitize the HTML content submitted by users, remove or escape all potentially dangerous codes, and only keep safe HTML tags and attributes.
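The whitelist idea itself can be illustrated independently of any library. The sketch below is not a sanitizer and is not how js-xss is implemented; it only shows allow-list semantics on data that is assumed to be already parsed into a tag name and an attribute map. filterAttributes and its parameters are hypothetical names.

```javascript
// Illustration only: NOT a sanitizer. It shows allow-list semantics on data that
// has already been parsed into a tag name and an attribute map.
// `allowList` maps a tag name to the attribute names permitted on that tag.
function filterAttributes(tagName, attributes, allowList) {
  const isListed = Object.prototype.hasOwnProperty.call(allowList, tagName);
  if (!isListed) {
    return null; // the tag is not on the allow-list at all
  }
  const allowedNames = allowList[tagName];
  const kept = {};
  for (const [name, value] of Object.entries(attributes)) {
    if (allowedNames.includes(name)) {
      kept[name] = value; // attribute explicitly allowed for this tag
    }
  }
  return kept;
}

const allowList = { a: ["href", "title"], p: [] };

// Only href survives; onclick is not on the list for <a>.
console.log(filterAttributes("a", { href: "/docs", onclick: "steal()" }, allowList));

// script is not listed, so the whole tag is rejected (null).
console.log(filterAttributes("script", {}, allowList));
```

The key property is that anything not explicitly listed is rejected, which is why allow-lists tend to hold up better than block-lists when new attack vectors appear.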

You can use js-xss through the following steps:

Install the js-xss library: Install the js-xss library through npm or yarn.

npm install xss

Import the js-xss library: Import the js-xss library in the React component file.

import xss from 'xss';

Use js-xss to filter HTML content: where HTML needs to be filtered, call the js-xss method to purify HTML.

import React from 'react';
import xss from 'xss';

const MyComponent = () => {
  const userInput = '<script>alert("XSS");</script><p>Hello, World!</p>';
  const cleanedHtml = xss(userInput);

  return <div dangerouslySetInnerHTML={{ __html: cleanedHtml }} />;
};

export default MyComponent;

Here the dangerouslySetInnerHTML attribute is used in the MyComponent component to render the HTML content. By calling the xss() function with the user-entered HTML, we filter and sanitize it, and set the result as the component's content.

The js-xss library provides some options and configurations, which can be used to define custom filtering rules:

import xss from 'xss';

// Custom whitelist filter rules
const myCustomWhiteList = {
  a: ['href', 'title', 'target'], // only allow the 'href', 'title' and 'target' attributes on 'a' tags
  p: [], // allow 'p' tags without any attributes
  img: ['src', 'alt'], // only allow the 'src' and 'alt' attributes on 'img' tags
};

// Custom options
const myOptions = {
  whiteList: myCustomWhiteList, // use the custom whitelist filter rules
  stripIgnoreTagBody: ['script'], // remove 'script' tags together with their content
};

const userInput = '<script>alert("XSS");</script><p>Hello, World!</p><a href="https://example.com" target="_blank">Example</a>';

const cleanedHtml = xss(userInput, myOptions);

console.log(cleanedHtml);
// Output: <p>Hello, World!</p><a href="https://example.com" target="_blank">Example</a>

A custom whitelist filter rule, myCustomWhiteList, is defined here and passed in the options object myOptions. By default js-xss escapes tags that are not in the whitelist instead of deleting them, so stripIgnoreTagBody is also set to remove the script tag together with its content. Then, when the xss() function is called with the user-entered HTML and the custom options, the js-xss library filters and sanitizes according to the custom rules.

sanitize-html

sanitize-html is a JavaScript library for sanitizing and filtering HTML code. It is designed to remove potentially malicious or insecure content, as well as protect applications from security vulnerabilities such as cross-site scripting (XSS). It provides a simple and flexible way to sanitize user-entered HTML code to ensure that only safe tags, attributes and styles remain and that it does not contain any malicious code or potentially dangerous content.

sanitize-html uses a whitelist (configuration option) to define allowed tags, attributes and styles, and filters and deletes all content that is not in the whitelist. It can also handle mismatched tags, tag nesting issues, and other HTML-related issues.

Sanitize-html can be used by the following steps:

Install the sanitize-html library in the project:

npm install sanitize-html

Introduce the sanitize-html library in the component:

import sanitizeHtml from 'sanitize-html';

Use the sanitizeHtml function in the component to sanitize and filter HTML code. For example, you could store user-entered HTML in a component's state or props, and apply the sanitizeHtml function on render:

import React from 'react';
import sanitizeHtml from 'sanitize-html';

function MyComponent() {
  const userInput = '<script>alert("XSS");</script><p>Hello, World!</p>';
  const cleanedHtml = sanitizeHtml(userInput);

  return (
    <div>
      <div dangerouslySetInnerHTML={{ __html: cleanedHtml }}></div>
    </div>
  );
}

Here the HTML code entered by the user is defined inside the component, and the sanitizeHtml function is used to sanitize it. The sanitized HTML code is then rendered onto the page using the dangerouslySetInnerHTML attribute.

To customize sanitize-html, pass a configuration object as the second parameter of the sanitizeHtml function. The configuration object can contain a series of options that define the filtering rules and the allowed HTML tags and attributes.

import sanitizeHtml from 'sanitize-html';

const customConfig = {
  // Allowed tags, including the tag names produced by transformTags below
  allowedTags: ['b', 'i', 'u', 'a', 'strong', 'em'],
  allowedAttributes: {
    a: ['href'] // allowed attributes on a tags
  },
  allowedSchemes: ['http', 'https'], // allowed URL schemes
  allowedClasses: {
    strong: ['bold', 'highlight'], // allowed classes on strong tags (converted from b)
    em: ['italic'] // allowed classes on em tags (converted from i)
  },
  transformTags: {
    b: 'strong', // convert b tags to strong tags
    i: 'em' // convert i tags to em tags
  },
  nonTextTags: ['style', 'script', 'textarea', 'noscript'] // tags whose text content is discarded
};

const userInput = '<b class="bold">Hello</b> <i class="italic">World</i> <a href="https://example.com">Link</a>';

const cleanedHtml = sanitizeHtml(userInput, customConfig);

A configuration object named customConfig is created here, containing some custom filtering rules and options. It defines the allowed tags, allowed attributes, allowed URL schemes, allowed CSS class names, tag conversion rules, and tags whose text content is discarded.

Then, the user-entered HTML code is passed to the sanitizeHtml function as the first parameter, and customConfig is passed as the second parameter. The sanitizeHtml function filters and sanitizes the HTML code according to the rules defined in the configuration object, and returns the sanitized HTML code.
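The idea behind an allowed-schemes option can be sketched with the standard URL class. This is an illustration of the concept rather than how sanitize-html implements it; hasAllowedScheme is a hypothetical name, and the base URL used to resolve relative links is an assumed placeholder.

```javascript
// Hypothetical helper: returns true when `href` resolves to a URL whose scheme is
// on the allow-list. Relative URLs are resolved against `base` (an assumed value).
function hasAllowedScheme(href, allowedSchemes = ["http", "https"], base = "https://example.com/") {
  let url;
  try {
    url = new URL(href, base);
  } catch {
    return false; // values that cannot be parsed are rejected
  }
  // URL.protocol includes the trailing colon, e.g. "https:".
  return allowedSchemes.includes(url.protocol.slice(0, -1));
}

console.log(hasAllowedScheme("https://example.com/page")); // true
console.log(hasAllowedScheme("javascript:alert(1)"));      // false
console.log(hasAllowedScheme("/relative/path"));           // true
```

Parsing the value with URL, instead of matching a prefix by hand, means that mixed-case or whitespace-padded schemes are normalized before the comparison.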