localeCompare for comparing strings

Many people compare strings simply with the > or < operators.
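
To make the pitfall concrete, here is a small sketch of what the relational operators actually do with strings: they compare UTF-16 code unit values, so neither language rules nor numeric value play any role. The variable names are only for illustration.

```javascript
// Relational operators compare strings by UTF-16 code unit, not by language rules.
const upperBeforeLower = 'B' < 'a'; // true: 'B' is 66, 'a' is 97
const digitsAsText = '10' < '9';    // true: '1' (49) comes before '9' (57)
const accentLast = 'z' < 'ä';       // true: 'ä' is 228, after every ASCII letter

console.log(upperBeforeLower, digitsAsText, accentLast); // true true true

// Array.prototype.sort without a comparator behaves the same way.
const sorted = ['b', 'a', 'B', 'ä'].sort();
console.log(sorted); // [ 'B', 'a', 'b', 'ä' ]
```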

Some people use localeCompare, but do you really know how to use this prototype method?

Here is an example that compares numbers stored in strings:

// Comparing '10' with '9' returns a value less than 0
'10'.localeCompare('9') // -1

For cases like this, localeCompare accepts additional arguments that change how the comparison is made:

// locales argument
'10'.localeCompare('9', 'zh-u-kn-true') // 1
'a'.localeCompare('A', 'zh-u-kf-upper') // 1
'a'.localeCompare('A', 'zh-u-kf-lower') // -1

// Several keys can follow the u in a locale, for example zh-u-kf-lower-kn-true

// options argument
'10'.localeCompare('9', 'zh', { numeric: true }) // 1
'a'.localeCompare('A', 'zh', { caseFirst: 'upper' }) // 1
'a'.localeCompare('A', 'zh', { caseFirst: 'lower' }) // -1

kn specifies whether numeric ordering should be used, so that "1" < "2" < "10". Possible values are "true" and "false".

kf specifies whether upper-case or lower-case letters sort first. Possible values are "upper", "lower", or "false".

Never depend on a specific return value such as -1 or 1. The exact negative and positive numbers returned vary between browsers (and between versions of the same browser), because the ECMAScript specification only requires the result to be negative, zero, or positive and does not prescribe a specific value. Some browsers may return -2 or 2, or some other negative or positive value.

So when we check the result, we should only test whether it is greater than 0, less than 0, or equal to 0.
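
One way to follow this advice is sketched below: either branch on the sign directly, or normalise the result with Math.sign. The helper name compareSign is made up for this example and is not part of any API.

```javascript
// Hypothetical helper: reduce any comparator result to -1, 0 or 1,
// so the exact value an engine returns never matters.
const compareSign = (a, b, locales, options) =>
  Math.sign(a.localeCompare(b, locales, options));

const result = 'a'.localeCompare('b', 'en');
if (result < 0) {
  console.log('"a" sorts before "b"');
} else if (result > 0) {
  console.log('"a" sorts after "b"');
} else {
  console.log('"a" and "b" are equivalent');
}

console.log(compareSign('a', 'b', 'en')); // -1
console.log(compareSign('b', 'a', 'en')); // 1
console.log(compareSign('a', 'a', 'en')); // 0
```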

The locales argument must be a string holding a BCP 47 language tag, or an array of such tags. If locales is omitted or undefined, the runtime's default locale is used.

A BCP 47 language tag identifies a language, optionally narrowed to a script and a region. In its most common form it contains, in this order, a language code, a script code, and a country code, all separated by hyphens. For example:

  • "hi": Hindi (primary language).
  • "de-AT": German as used in Austria (primary language with country code).
  • "zh-Hans-CN": Simplified Chinese (primary language with script and country codes) used in China.

"u" stands for Unicode. A Unicode extension can be used to request a Collator, NumberFormat, or DateTimeFormat object with custom locale-specific behavior. For example:

  • "de-DE-u-co-phonebk": Use the German phonebook sort variant, which expands umlauts into character pairs: ä → ae, ö → oe, ü → ue.
  • "th-TH-u-nu-thai": Use Thai numeric representation in number format (๐, ๑, ๒, ๓, ๔, ๕, ๖, ๗, ๘, ๙)
  • "ja-JP-u-ca-japanese": Use the Japanese calendar in date and time formatting, so that 2013 is expressed as Heisei 25.
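
To see how such tags break down, the sketch below parses a few of them with Intl.Locale, which only reads the tag and does not need any locale data. It assumes a runtime that ships Intl.Locale (recent browsers and Node.js do); the variable names are illustrative.

```javascript
// Language, script and region subtags.
const chinese = new Intl.Locale('zh-Hans-CN');
console.log(chinese.language, chinese.script, chinese.region); // zh Hans CN

// Canonical casing is restored for you.
const canonical = Intl.getCanonicalLocales('ZH-hans-cn');
console.log(canonical); // [ 'zh-Hans-CN' ]

// Keys after "-u-" are Unicode extension keys.
const phonebook = new Intl.Locale('de-DE-u-co-phonebk');
const thaiDigits = new Intl.Locale('th-TH-u-nu-thai');
const japaneseCalendar = new Intl.Locale('ja-JP-u-ca-japanese');
console.log(phonebook.collation);        // phonebk
console.log(thaiDigits.numberingSystem); // thai
console.log(japaneseCalendar.calendar);  // japanese

// Several extension keys can be combined in one tag.
const combined = new Intl.Locale('zh-u-kf-lower-kn-true');
console.log(combined.baseName, combined.caseFirst, combined.numeric); // zh lower true
```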

If you prefer to use the third parameter, options, you can pass undefined as the locales argument to use the runtime's default locale.
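
A short sketch of that call shape, with made-up variable names. Only the sign of each result is checked, since the exact number is engine-dependent.

```javascript
// undefined as the locales argument falls back to the runtime's default locale.
const numericResult = '10'.localeCompare('9', undefined, { numeric: true });
const textResult = '10'.localeCompare('9');

console.log(numericResult > 0); // true: 10 is greater than 9 as a number
console.log(textResult < 0);    // true: '1' sorts before '9' as text

// The same options work inside a sort comparator.
const naturalOrder = ['10', '9', '1'].sort((a, b) =>
  a.localeCompare(b, undefined, { numeric: true })
);
console.log(naturalOrder); // [ '1', '9', '10' ]
```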

options is an object that supports some or all of the following properties:

localeMatcher

The locale matching algorithm to use. Possible values are "lookup" and "best fit"; the default is "best fit". For more information, please refer to the Intl page.

usage

Specifies whether the comparison is for sorting or for searching. Possible values are "sort" and "search"; the default is "sort".

sensitivity

Specifies which differences between the strings should lead to a non-zero result. Possible values are:

  • "base": Only strings that differ in base letters compare as unequal. For example: a ≠ b, a = á, a = A.
  • "accent": Only strings that differ in base letters or accents compare as unequal. For example: a ≠ b, a ≠ á, a = A.
  • "case": Only strings that differ in base letters or case compare as unequal. For example: a ≠ b, a = á, a ≠ A.
  • "variant": Strings that differ in base letters, accents and other diacritic marks, or case compare as unequal. Other differences may also be taken into account. For example: a ≠ b, a ≠ á, a ≠ A.

    The default is "variant" for usage "sort"; for usage "search" it depends on the locale.
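
The four sensitivity levels can be checked with a few lines of code. This is only a sketch: isEqual is a helper invented for the example, and the 'en' locale is an arbitrary choice.

```javascript
// Hypothetical helper: true when a and b compare as equal under the given sensitivity.
const isEqual = (a, b, sensitivity) =>
  a.localeCompare(b, 'en', { sensitivity }) === 0;

const summary = {};
for (const sensitivity of ['base', 'accent', 'case', 'variant']) {
  summary[sensitivity] = {
    'a = b': isEqual('a', 'b', sensitivity),
    'a = á': isEqual('a', 'á', sensitivity),
    'a = A': isEqual('a', 'A', sensitivity),
  };
}

console.table(summary);
// base:    a = b false, a = á true,  a = A true
// accent:  a = b false, a = á false, a = A true
// case:    a = b false, a = á true,  a = A false
// variant: a = b false, a = á false, a = A false
```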

ignorePunctuation

Specifies whether punctuation should be ignored. Possible values are true and false; the default is false.

numeric

Specifies whether numeric collation should be used, so that "1" < "2" < "10". Possible values are true and false; the default is false. This option can be set through an options property or through the Unicode extension key kn; if both are provided, the options property takes precedence. Implementations are not required to support this property.

caseFirst

Specifies whether upper case or lower case should sort first. Possible values are "upper", "lower", or "false" (use the locale's default); the default is "false". This option can be set through an options property or through the Unicode extension key kf; if both are provided, the options property takes precedence. Implementations are not required to support this property.
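
The precedence rule can be observed through resolvedOptions. The sketch below assumes an engine that supports numeric collation (support is optional, as noted above); the variable names are illustrative.

```javascript
// The extension key alone turns numeric collation on.
const fromExtension = new Intl.Collator('en-u-kn-true');
// An explicit options property overrides the extension key.
const overridden = new Intl.Collator('en-u-kn-true', { numeric: false });

const extensionNumeric = fromExtension.resolvedOptions().numeric;
const overriddenNumeric = overridden.resolvedOptions().numeric;
console.log(extensionNumeric, overriddenNumeric); // true false

// Sort copies of the data so both results can be compared side by side.
const data = ['10', '9', '1'];
const numericOrder = [...data].sort(fromExtension.compare);
const textOrder = [...data].sort(overridden.compare);
console.log(numericOrder); // [ '1', '9', '10' ]
console.log(textOrder);    // [ '1', '10', '9' ]
```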

Knowing this, we can sort data that mixes several kinds of content (letters, digits, Chinese characters) by splitting each string into pieces and comparing the pieces one by one.

const a = 'abc12嘻嘻3AB哈哈C456abcABC789'
a.split(/([a-z]+|[0-9]+|[A-Z]+)/).filter(item => !!item)
// ['abc', '12', '嘻嘻', '3', 'AB', '哈哈', 'C', '456', 'abc', 'ABC', '789']
// After splitting, the pieces can be compared one by one when sorting
const arr = ["a-10-张三", "钱5", "A-9-赵5", '孙八', "a-10","a-9-李四", "A-10", "钱六", "A-10-张三", "a-10-李四", "A-10-李四"];

const toSeparate = (str) => {
  if (typeof str !== 'string') return
  const regex = /([a-z]+|[0-9]+|[A-Z]+)/
  return str.split(regex).filter(item => !!item)
}

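// In my view, pieces of different kinds should just use the default comparison; there is no need to pick a comparison per type
// If you want finer distinctions, add your own rules
const compare = new Intl.Collator('zh-u-kn-true-kf-lower').compare // lower-case letters sort before upper-case letters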

arr.sort((one, other) => {
  const oneSeparate = toSeparate(one)
  const otherSeparate = toSeparate(other)

  while (oneSeparate.length && otherSeparate.length) {
    if (compare(oneSeparate[0], otherSeparate[0]) === 0) {
      oneSeparate.shift()
      otherSeparate.shift()
    } else {
      return compare(oneSeparate[0], otherSeparate[0])
    }
  }
  // Reaching this point means at least one of the two has run out of pieces
  if (oneSeparate[0] === otherSeparate[0]) return 0
  return oneSeparate[0] ? 1 : -1 // the one with pieces left sorts after the one that ran out
});

console.log(arr);
// ['钱5', '钱六', '孙八', 'a-9-李四', 'a-10', 'a-10-李四', 'a-10-张三', 'A-9-赵5', 'A-10', 'A-10-李四', 'A-10-张三']
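
As a possible variation on the comparator above, the sketch below walks both piece lists by index instead of calling shift, and calls the collator only once per pair of pieces. It should order strings the same way; the names splitChunks and compareChunks, and the ASCII sample data, are made up for this example.

```javascript
const collator = new Intl.Collator('zh-u-kn-true-kf-lower');

// Split into runs of lower-case letters, digits, upper-case letters and everything else.
const splitChunks = (str) =>
  String(str).split(/([a-z]+|[0-9]+|[A-Z]+)/).filter(Boolean);

const compareChunks = (one, other) => {
  const a = splitChunks(one);
  const b = splitChunks(other);
  const shared = Math.min(a.length, b.length);
  for (let i = 0; i < shared; i++) {
    const result = collator.compare(a[i], b[i]);
    if (result !== 0) return result; // first differing piece decides
  }
  // All shared pieces are equal: the string with fewer pieces sorts first.
  return a.length - b.length;
};

const sample = ['a-10', 'a-9', 'a-10-x', 'a-2'];
const sortedSample = [...sample].sort(compareChunks);
console.log(sortedSample); // [ 'a-2', 'a-9', 'a-10', 'a-10-x' ]
```

Because the comparator never mutates its inputs, it can also be reused outside of sort, for example to check whether two identifiers are equivalent.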

Origin blog.csdn.net/weixin_42335036/article/details/125366705