8 ways to de-duplicate List collection elements, as whole objects and by attributes (part 6 of the Java basics summary series)

Recently I have been writing articles about Java basics, but I don't want to present the knowledge points the way a textbook does, because that adds little value. There is a huge amount of basic knowledge; working out how to summarize it, and how to weigh the pros, cons, and use cases of each approach, is what turns knowledge into insight. So I want to cut through the Java fundamentals and summarize them as a whole.

  • Summarize 5 ways to create and write files in java
  • Summarize the 6 methods of reading data from files in java
  • Summarize the 4 methods of creating folders in java and their advantages and disadvantages
  • Summarize 7 ways to delete files or folders in java
  • Summarize 5 ways to copy and cut files in java

For example, I have written the above articles before. If you are interested in the Java basics summary series, you can follow my blog (the address is given at the end of this article).

1. Summary of this article

In this article I want to cover 8 methods for de-duplicating the elements of a List collection. In fact, by combining them flexibly you may end up with more than 8 permutations, perhaps as many as 18.

  • 4 methods for de-duplicating element objects as a whole
  • 4 methods for de-duplicating by object attributes

To support the tests below, we first initialize some data.

import org.junit.jupiter.api.BeforeEach;

import java.util.ArrayList;
import java.util.List;

public class ListRmDuplicate {
  private List<String> list;
  private List<Player> playerList;

  @BeforeEach
  public void setup() {
    list = new ArrayList<>();
    list.add("kobe");
    list.add("james");
    list.add("curry");
    list.add("zimug");
    list.add("zimug");

    playerList = new ArrayList<>();
    playerList.add(new Player("kobe", "10000"));  // long live Kobe
    playerList.add(new Player("james", "32"));
    playerList.add(new Player("curry", "30"));
    playerList.add(new Player("zimug", "27"));    // note: duplicate name
    playerList.add(new Player("zimug", "18"));    // note: duplicate name and age
    playerList.add(new Player("zimug", "18"));    // note: duplicate name and age
  }
}

The Player object is a plain Java object with two member variables, name and age. It implements a parameterized constructor, toString, equals, hashCode, and getter/setter methods.
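The original article does not show the Player class itself. A minimal sketch of what it might look like, assuming plain hand-written getters/setters rather than Lombok, is given below.

import java.util.Objects;

public class Player {
  private String name;
  private String age;  // kept as a String to match the test data above

  public Player(String name, String age) {
    this.name = name;
    this.age = age;
  }

  public String getName() { return name; }
  public void setName(String name) { this.name = name; }
  public String getAge() { return age; }
  public void setAge(String age) { this.age = age; }

  @Override
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || getClass() != o.getClass()) return false;
    Player player = (Player) o;
    return Objects.equals(name, player.name) && Objects.equals(age, player.age);
  }

  @Override
  public int hashCode() {
    return Objects.hash(name, age);
  }

  @Override
  public String toString() {
    return "Player{name='" + name + "', age='" + age + "'}";
  }
}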

2. Overall de-duplication of collection elements

The following four methods de-duplicate a List<String> by treating each collection element as a whole. If your List holds custom objects instead, those objects must implement equals and hashCode (as the Player sketch above does); the de-duplication code itself is the same as for List<String>.

The first method

This is the method everyone thinks of first: put the List data into a Set. Because the Set data structure de-duplicates by itself, converting the Set back into a List gives the de-duplicated result. Note that this method changes the original order of the List elements: a HashSet is unordered, and a TreeSet sorts by its comparator rather than by the original List order.

@Test
void testRemove1()  {
  /*Set<String> set = new HashSet<>(list);
  List<String> newList = new ArrayList<>(set);*/

  // de-duplicate and sort (strings are sorted alphabetically; objects are sorted by their Comparable implementation)
  //List<String> newList = new ArrayList<>(new TreeSet<>(list));

  // shorthand version
  List<String> newList = new ArrayList<>(new HashSet<>(list));

  System.out.println("Deduplicated collection: " + newList);
}

The console print results are as follows:

Deduplicated collection: [kobe, james, zimug, curry]

The second method

This one is quite simple to use: first convert the collection into a stream with stream(), then remove the duplicates with distinct(), and finally collect the stream back into a List.

@Test
void testRemove2()  {
  List<String> newList = list.stream().distinct().collect(Collectors.toList());

  System.out.println("Deduplicated collection: " + newList);
}

The console print results are as follows:

Deduplicated collection: [kobe, james, curry, zimug]

The third method

This method uses set.add(T), which returns false if the element T already exists in the set. We use that return value to decide whether an element is a duplicate; only non-duplicates are added to a new newList, which becomes the final de-duplicated result.

// three collections: list, newList and set; the original element order is preserved
@Test
void testRemove3()  {

  Set<String> set = new HashSet<>();
  List<String> newList = new ArrayList<>();
  for (String str : list) {
    if (set.add(str)) { // add returns false for duplicates
      newList.add(str);
    }
  }
  System.out.println("Deduplicated collection: " + newList);

}

The console print result is consistent with the second method.

The fourth method

This method abandons the Set-based approach altogether. When adding data to a new List, it uses newList.contains(T) to check whether the element is already present and only adds it if it is not, which achieves the de-duplication effect.

// only list and newList are used here; the original element order is preserved
@Test
void testRemove4() {

  List<String> newList = new ArrayList<>();
  for (String cd : list) {
    if (!newList.contains(cd)) {  // explicitly check whether the element is already present
      newList.add(cd);
    }
  }
  System.out.println("Deduplicated collection: " + newList);

}

The console print result is consistent with the second method.

3. De-duplication by collection element object attributes

In practice, de-duplicating whole element objects comes up relatively rarely; more often we are asked to de-duplicate by certain attributes of the element objects. At this point, please go back and look at the playerList initialization data constructed above, and pay special attention to which elements are fully duplicated and which only share some member variables.

The first method passes a Comparator to the TreeSet. If we want to de-duplicate by the Player's name attribute, we compare names inside the Comparator. Two ways of writing the Comparator are shown below:

  • Lambda expression: (o1, o2) -> o1.getName().compareTo(o2.getName())
  • Method reference: Comparator.comparing(Player::getName)
@Test
void testRemove5() {
  //Set<Player> playerSet = new TreeSet<>((o1, o2) -> o1.getName().compareTo(o2.getName()));
  Set<Player> playerSet = new TreeSet<>(Comparator.comparing(Player::getName));
  playerSet.addAll(playerList);

  /*new ArrayList<>(playerSet).forEach(player->{
    System.out.println(player.toString());
  });*/
  // print the de-duplicated result
  new ArrayList<>(playerSet).forEach(System.out::println);
}

The output is as follows. The three players named zimug are duplicates by name, so two of them are removed. However, because a TreeSet is used, the elements are reordered rather than keeping the original List order.

Player{name='curry', age='30'}
Player{name='james', age='32'}
Player{name='kobe', age='10000'}
Player{name='zimug', age='27'}

The second method

This method appears in many articles online as a way to show off, but in my opinion it is like taking off your pants to fart (an unnecessary detour). Still, since everyone mentions it, leaving it out might look as if I couldn't do it. Why do I call it an unnecessary detour?

  • First use stream() to convert the list collection into a stream
  • Then use collect with toCollection to convert the stream back into a collection
  • The rest is the same as the first method

Aren't the first two steps just a pointless round trip? Judge for yourself; for practical use it adds little, but as an exercise in learning the Stream API this kind of example is still worthwhile.

@Test
void testRemove6() {
  List<Player> newList = playerList.stream().collect(Collectors
          .collectingAndThen(
                  Collectors.toCollection(() -> new TreeSet<>(Comparator.comparing(Player::getName))),
                  ArrayList::new));

  newList.forEach(System.out::println);
}

The console printout is the same as the first method.

The third method

This is also the method the author recommends. At first glance it seems to require more code, but in practice it is quite simple to apply.

Predicate is usually translated as "predicate" (as a noun it describes something, as a verb it asserts something). A predicate restricts the subject: in "a bird that likes to sing", the clause "that likes to sing" limits which birds we mean. Here we use it to filter, that is, to limit the range of elements, so "predicate" feels like the right name to me. Use whatever term you find reasonable and easy to remember.

  • First, we define a Predicate to filter with; the filtering condition is distinctByKey. Elements for which the predicate returns true are kept, and those for which it returns false are filtered out.
  • Our requirement is to filter out duplicate elements, and the de-duplication logic is implemented with a map's putIfAbsent. putIfAbsent adds a key-value pair: if the map does not yet contain a value for the key, it adds the pair and returns null; if a value already exists, it keeps the original value and returns it.
  • So if putIfAbsent returns null, the element was added successfully (it is not a duplicate) and the predicate returns true; if it returns an existing value, the check value == null is false and the duplicate element is filtered out by distinctByKey (see the small demo after this list).
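As a quick standalone illustration of the putIfAbsent semantics the predicate relies on (my own example, not from the original article; the method name is hypothetical):

@Test
void putIfAbsentDemo() {  // hypothetical test, only to illustrate putIfAbsent
  Map<Object, Boolean> seen = new ConcurrentHashMap<>();
  // key is absent: the pair is stored and null is returned, so the element would be kept
  System.out.println(seen.putIfAbsent("zimug", Boolean.TRUE)); // prints: null
  // key is already present: the existing value is returned, so the element would be filtered out
  System.out.println(seen.putIfAbsent("zimug", Boolean.TRUE)); // prints: true
}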

Although this method seems to add code, the distinctByKey predicate only needs to be defined once and can then be reused indefinitely.

@Test
void testRemove7() {
  List<Player> newList = new ArrayList<>();
  playerList.stream().filter(distinctByKey(p -> p.getName()))  // filter keeps the elements for which the predicate is true
          .forEach(newList::add);

  newList.forEach(System.out::println);
}

static <T> Predicate<T> distinctByKey(Function<? super T, ?> keyExtractor) {
  Map<Object,Boolean> seen = new ConcurrentHashMap<>();
  // putIfAbsent adds the key-value pair and returns null if the key was absent; otherwise it keeps and returns the existing value.
  // A null return value therefore means the element is not a duplicate (null == null is true).
  return t -> seen.putIfAbsent(keyExtractor.apply(t), Boolean.TRUE) == null;
}

The output is as follows. The three players named zimug are duplicates by name, so two of them are removed, and this time the original order of the List is preserved.

Player{name='kobe', age='10000'}
Player{name='james', age='32'}
Player{name='curry', age='30'}
Player{name='zimug', age='27'}

The fourth method

The fourth method is not really a new method. The examples above all de-duplicate by a single object attribute; if we want to de-duplicate by several attributes, we just need to adapt the three methods above. I only reworked one of them here; the others follow the same principle, namely concatenating the attributes to compare into a single String and comparing that (a sketch of adapting the third method is given after the code below).

@Test
void testRemove8() {
  Set<Player> playerSet = new TreeSet<>(Comparator.comparing(o -> (o.getName() + "" + o.getAge())));

  playerSet.addAll(playerList);

  new ArrayList<>(playerSet).forEach(System.out::println);
}
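For completeness, here is one possible way (my own sketch, not part of the original article) to adapt the third method's distinctByKey predicate to multiple attributes, by building the key from both name and age; the test name and the "#" separator are arbitrary choices.

@Test
void testRemove8ByPredicate() {  // hypothetical name; adapts method three to multiple attributes
  List<Player> newList = new ArrayList<>();
  // build the de-duplication key from name and age, joined with an arbitrary separator
  playerList.stream()
          .filter(distinctByKey(p -> p.getName() + "#" + p.getAge()))
          .forEach(newList::add);

  newList.forEach(System.out::println);
}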

Welcome to follow my blog, which has many curated collections.

  • If you reproduce this article, please credit the source (you must include a link, not just the text): zimug's blog (字母哥博客).

If you find it helpful, please like and share! Your support is my inexhaustible source of motivation. In addition, the author has recently published more high-quality content and looks forward to your attention.

Origin blog.csdn.net/hanxiaotongtong/article/details/108442705