How to efficiently convert a multidimensional collection of 2 different types into a multidimensional List

Ouissal :

I have the following HashSet, which needs to have TreeSets as elements:

HashSet<TreeSet<Integer>> hash=new HashSet<TreeSet<Integer>>(); 

I want to store ordered elements in the TreeSets; ordering is important, hence the choice of a TreeSet (the elements need to be ordered, though not necessarily sorted). However, I must return:

List<List<Integer>>

What is the most efficient way, in terms of performance, to convert my hash into a List of Lists of Integers?

Thank you

Andronicus :

You still need to iterate over the collection and convert the inner sets into lists. If performance is of the greatest importance, however, consider using a third-party library that provides some optimizations.
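In plain Java, that iterate-and-copy can be written as a single stream pipeline (a minimal sketch; the variable names are illustrative, not from the question):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

public class SetToList {
    public static void main(String[] args) {
        Set<TreeSet<Integer>> hash = new HashSet<>();
        hash.add(new TreeSet<>(List.of(3, 1, 2)));

        // Each TreeSet iterates in ascending order, so the inner
        // lists come out sorted; the outer order is unspecified
        // because HashSet has no defined iteration order.
        List<List<Integer>> result = hash.stream()
                .<List<Integer>>map(ArrayList::new)
                .collect(Collectors.toList());

        System.out.println(result); // [[1, 2, 3]]
    }
}
```

This is just a more compact equivalent of the explicit loop used in the benchmarks below; it does the same per-element copying work.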

1. Benchmarks

I have written a couple of simple benchmarks using JMH to compare these libraries with the vanilla Java solution:

import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.TreeSet;

import org.apache.commons.collections4.CollectionUtils;
import org.eclipse.collections.impl.list.mutable.FastList;
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

public class Set2ListTest {

    private static final int SMALL_SET_SIZE = 10;
    private static final int LARGE_SET_SIZE = 1000;

    public static void main(String[] args) throws RunnerException {
        Options options = new OptionsBuilder()
                .include(Set2ListTest.class.getSimpleName())
                .forks(1)
                .threads(8)
                .warmupIterations(1)
                .measurementIterations(1)
                .build();
        new Runner(options).run();
    }

    @State(Scope.Benchmark)
    public static class Provider {

        Set<Set<Integer>> smallSet = new HashSet<>(SMALL_SET_SIZE);
        Set<Set<Integer>> largeSet = new HashSet<>(LARGE_SET_SIZE);

        @Setup
        public void setup() {
            fillSet(smallSet, SMALL_SET_SIZE);
            fillSet(largeSet, LARGE_SET_SIZE);
        }

        private void fillSet(Set<Set<Integer>> set, int count) {
            Random random = new Random();
            for (int i = 0; i < count; i++) {
                Set<Integer> innerSet = new TreeSet<>();
                for (int j = 0; j < count; j++) {
                    innerSet.add(random.nextInt(Integer.MAX_VALUE));
                }
                set.add(innerSet);
            }
        }

    }

    @Benchmark
    public void small_plainJava(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(SMALL_SET_SIZE);
        for (Set<Integer> set : provider.smallSet) {
            list.add(new ArrayList<>(set));
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void large_plainJava(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(LARGE_SET_SIZE);
        for (Set<Integer> set : provider.largeSet) {
            list.add(new ArrayList<>(set));
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void small_guava(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(SMALL_SET_SIZE);
        for (Set<Integer> set : provider.smallSet) {
            list.add(com.google.common.collect.Lists.newArrayList(set));
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void large_guava(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(LARGE_SET_SIZE);
        for (Set<Integer> set : provider.largeSet) {
            list.add(com.google.common.collect.Lists.newArrayList(set));
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void small_commons(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(SMALL_SET_SIZE);
        for (Set<Integer> set : provider.smallSet) {
            List<Integer> innerList = new ArrayList<>(SMALL_SET_SIZE);
            CollectionUtils.addAll(innerList, set);
            list.add(innerList);
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void large_commons(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = new ArrayList<>(LARGE_SET_SIZE);
        for (Set<Integer> set : provider.largeSet) {
            List<Integer> innerList = new ArrayList<>(LARGE_SET_SIZE);
            CollectionUtils.addAll(innerList, set);
            list.add(innerList);
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void small_eclipse(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = FastList.newList();
        for (Set<Integer> set : provider.smallSet) {
            list.add(FastList.newList(set));
        }
        blackhole.consume(list);
    }

    @Benchmark
    public void large_eclipse(Provider provider, Blackhole blackhole) {
        List<List<Integer>> list = FastList.newList();
        for (Set<Integer> set : provider.largeSet) {
            list.add(FastList.newList(set));
        }
        blackhole.consume(list);
    }

}

2. Results

The results are as follows:

Benchmark                      Mode  Cnt        Score   Error  Units
Set2ListTest.large_commons    thrpt           183.205          ops/s
Set2ListTest.large_eclipse    thrpt           219.068          ops/s
Set2ListTest.large_guava      thrpt           178.478          ops/s
Set2ListTest.large_plainJava  thrpt           148.058          ops/s
Set2ListTest.small_commons    thrpt       2140560.198          ops/s
Set2ListTest.small_eclipse    thrpt       2619862.935          ops/s
Set2ListTest.small_guava      thrpt       2720692.868          ops/s
Set2ListTest.small_plainJava  thrpt       2472748.566          ops/s

3. Discussion

Two different sizes of sets were used to measure performance: 10 sets of size 10 (100 elements) and 1,000 sets of size 1,000 (1,000,000 elements).

For the small sets, Guava outperformed the other tools, with Commons Collections the slowest.

In the case of the larger sets, Eclipse Collections provides the best results, while plain Java is the slowest.

The size of the collections clearly has a great impact on performance and can change which of these libraries is the best choice. I don't know the size of your data; my goal is not to hand you the best collections library but to give you a tool to find one.

4. Additional notes

I do not consider myself a specialist in any of these collection frameworks, and there might be a better way to use them.

If those results are not satisfactory, you can use primitive collections so that much less memory has to be allocated; this can result in a significant performance boost.
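For example, even without any library, the Integer boxing can be avoided in plain Java by copying each set into an int[] rather than a List<Integer> (a minimal sketch with illustrative names):

```java
import java.util.List;
import java.util.TreeSet;

public class PrimitiveCopy {
    public static void main(String[] args) {
        TreeSet<Integer> set = new TreeSet<>(List.of(5, 1, 3));

        // mapToInt unboxes each Integer once; the resulting int[]
        // is a contiguous block of memory with no per-element
        // object headers or references.
        int[] values = set.stream().mapToInt(Integer::intValue).toArray();

        System.out.println(values.length); // 3
        System.out.println(values[0]);     // 1
    }
}
```

If you need a list API rather than a raw array, libraries such as Eclipse Collections also ship dedicated primitive collections (e.g. IntList).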

Note that if you want to experiment with the benchmarks provided, it's important to compare the implementations against each other, not against the results I got, because those may be hardware-dependent.
