How to partition a collection based on summing of item properties, up to a given limit?

JRL :

How can I chunk a collection into N chunks, based on summing one of the fields of each item in the collection, up to a given max value?

E.g. given the following:

class FileObject { public long sizeInBytes; }
Collection<FileObject> files;
long MAX_SIZE_THRESHOLD = 1024 * 1024 * 100; // 100 MB

I'd like to transform files into a Collection<Collection<FileObject>> with the smallest number of inner collections, such that for each inner collection the sum of its elements' sizeInBytes is less than MAX_SIZE_THRESHOLD.

Taking this further, in addition to the above requirement, if FileObject is extended to contain a timestamp, I'd also like to partition the result by year, month, and day.

E.g.

class FileObject { public long sizeInBytes; public long modifiedDate; }

I'd like the end result to look like:

Map<Integer, Map<Integer, Map<Integer, Collection<FileObject>>>>

where the keys in the maps are: year, month, and day (corresponding to the FileObject's modifiedDate), and the Collection contains all files within that year, month, day, and where the sum of the sizeInBytes of each file is less than MAX_SIZE_THRESHOLD.

Can both operations be done without explicit loops, using functional constructs from the Stream API or another library? Can both be done in a single statement?

user_3380739 :

You can try StreamEx.collapse(...) from the StreamEx library. Here is some sample code:

final long MAX_SIZE_THRESHOLD = 12; // small value, for testing only

// Create sample file objects with random sizes for testing.
Collection<FileObject> files =
    new Random().longs(0, 1000).limit(50).mapToObj(n -> new FileObject(n % 15, n))
    .collect(Collectors.toList());

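// The solution: collapse adjacent files while their running total stays within the threshold.
// The predicate keeps state in 'remaining', so use it on a sequential stream only.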
final MutableLong remaining = MutableLong.of(MAX_SIZE_THRESHOLD);

List<List<FileObject>> result = StreamEx.of(files).collapse((a, b) -> {
  if (b.sizeInBytes <= remaining.value() - a.sizeInBytes) {
    remaining.subtract(a.sizeInBytes);
    return true;
  } else {
    remaining.setValue(MAX_SIZE_THRESHOLD);
    return false;
  }
}, Collectors.toList()).toList();

result.forEach(System.out::println);
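
For comparison, the same greedy rule (keep adding adjacent items while the running total stays at or below the limit) can be sketched with a plain loop and no extra library. This is only a minimal sketch: the GreedyChunker class and its chunk method are hypothetical names, not part of StreamEx or the JDK. Note that filling consecutive chunks this way does not guarantee the smallest possible number of chunks; that is a bin-packing problem.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.ToLongFunction;

class GreedyChunker {
    // Splits items into consecutive chunks whose summed weight stays <= limit.
    // An item heavier than the limit ends up alone in its own chunk.
    static <T> List<List<T>> chunk(List<T> items, ToLongFunction<? super T> weight, long limit) {
        List<List<T>> chunks = new ArrayList<>();
        List<T> current = new ArrayList<>();
        long sum = 0;
        for (T item : items) {
            long w = weight.applyAsLong(item);
            // Close the current chunk if adding this item would exceed the limit.
            if (!current.isEmpty() && sum + w > limit) {
                chunks.add(current);
                current = new ArrayList<>();
                sum = 0;
            }
            current.add(item);
            sum += w;
        }
        if (!current.isEmpty()) {
            chunks.add(current);
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Long> sizes = Arrays.asList(5L, 4L, 3L, 9L, 2L, 12L, 1L);
        // prints [[5, 4, 3], [9, 2], [12], [1]]
        System.out.println(chunk(sizes, Long::longValue, 12));
    }
}
```

With FileObject, the call would look like chunk(new ArrayList<>(files), f -> f.sizeInBytes, MAX_SIZE_THRESHOLD).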

And here is a solution using nested groupingBy for part 2 of your question:

// import static java.util.stream.Collectors.*
Map<Integer, Map<Integer, Map<Integer, List<FileObject>>>> result2 = files.stream()
    .filter(f -> f.sizeInBytes < MAX_SIZE_THRESHOLD)
    .collect(groupingBy(f -> f.getYear(),
        groupingBy(f -> f.getMonth(),
            groupingBy(f -> f.getDay(), toList()))));

result2.entrySet().forEach(System.out::println);

Finally, here is the FileObject class I used for testing:

static class FileObject {
  public long sizeInBytes;
  public long modifiedDate;

  public FileObject(long sizeInBytes, long modifiedDate) {
    this.sizeInBytes = sizeInBytes;
    this.modifiedDate = modifiedDate;
  }

  public int getYear() {
    return (int) (modifiedDate / 100); // fake value, for testing only
  }

  public int getMonth() {
    return (int) ((modifiedDate % 100) / 10); // fake value, for testing only
  }

  public int getDay() {
    return (int) (modifiedDate % 10); // fake value, for testing only
  }

  @Override
  public String toString() {
    return sizeInBytes + "-" + modifiedDate;
  }
}
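
The getters above only fake a date for the test data. If modifiedDate holds milliseconds since the Unix epoch (an assumption; the question does not say what the long represents), the real year, month, and day could be derived with java.time roughly as follows. The ModifiedDates class and toDate method are hypothetical names, and UTC is an arbitrary choice of time zone.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;

class ModifiedDates {
    // Assumes the timestamp is milliseconds since the Unix epoch, interpreted in UTC.
    static LocalDate toDate(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).toLocalDate();
    }

    public static void main(String[] args) {
        LocalDate d = toDate(86_400_000L); // exactly one day after the epoch
        // prints 1970-1-2
        System.out.println(d.getYear() + "-" + d.getMonthValue() + "-" + d.getDayOfMonth());
    }
}
```

getYear(), getMonth(), and getDay() in FileObject could then delegate to toDate(modifiedDate).getYear(), getMonthValue(), and getDayOfMonth().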

Updated based on the comments:

You will need Collectors.collectingAndThen, so that each day's list is chunked after the grouping.

Function<List<FileObject>, List<List<FileObject>>> finisher = fileObjs -> {
  MutableLong remaining2 = MutableLong.of(MAX_SIZE_THRESHOLD);
  return StreamEx.of(fileObjs).collapse((a, b) -> {
    if (b.sizeInBytes <= remaining2.value() - a.sizeInBytes) {
      remaining2.subtract(a.sizeInBytes);
      return true;
    } else {
      remaining2.setValue(MAX_SIZE_THRESHOLD);
      return false;
    }
  }, toList()).toList();
};

Map<Integer, Map<Integer, Map<Integer, List<List<FileObject>>>>> result4 = files.stream()
    .collect(groupingBy(f -> f.getYear(),
        groupingBy(f -> f.getMonth(), 
            groupingBy(f -> f.getDay(), collectingAndThen(toList(), finisher)))));

And the result type should be Map<Integer, Map<Integer, Map<Integer, List<List<FileObject>>>>>, not Map<Integer, Map<Integer, Map<Integer, List<FileObject>>>>.
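
The same group-then-chunk shape can also be sketched with the JDK alone, replacing the collapse call with a small loop-based helper. This is a rough sketch under stated assumptions: Item is a hypothetical stand-in for FileObject with the date parts already extracted, and GroupThenChunk, chunk, and group are illustrative names only.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import static java.util.stream.Collectors.collectingAndThen;
import static java.util.stream.Collectors.groupingBy;
import static java.util.stream.Collectors.toList;

class GroupThenChunk {
    // Hypothetical stand-in for FileObject, with the date parts precomputed.
    static class Item {
        final long size;
        final int year, month, day;

        Item(long size, int year, int month, int day) {
            this.size = size; this.year = year; this.month = month; this.day = day;
        }

        @Override
        public String toString() { return Long.toString(size); }
    }

    // Greedy split into consecutive chunks whose summed size is <= limit.
    static List<List<Item>> chunk(List<Item> items, long limit) {
        List<List<Item>> chunks = new ArrayList<>();
        List<Item> current = new ArrayList<>();
        long sum = 0;
        for (Item item : items) {
            if (!current.isEmpty() && sum + item.size > limit) {
                chunks.add(current);
                current = new ArrayList<>();
                sum = 0;
            }
            current.add(item);
            sum += item.size;
        }
        if (!current.isEmpty()) chunks.add(current);
        return chunks;
    }

    // Groups by year, month, and day, then chunks each day's list.
    static Map<Integer, Map<Integer, Map<Integer, List<List<Item>>>>> group(List<Item> items, long limit) {
        Function<List<Item>, List<List<Item>>> finisher = list -> chunk(list, limit);
        return items.stream().collect(
            groupingBy(i -> i.year,
                groupingBy(i -> i.month,
                    groupingBy(i -> i.day, collectingAndThen(toList(), finisher)))));
    }

    public static void main(String[] args) {
        List<Item> items = Arrays.asList(
            new Item(5, 2020, 1, 1), new Item(8, 2020, 1, 1),
            new Item(4, 2020, 1, 1), new Item(7, 2020, 1, 2));
        // With a limit of 12, day 1 splits into [5] and [8, 4]; day 2 holds [7].
        System.out.println(group(items, 12));
    }
}
```

Because the chunking runs inside the finisher, the running sum restarts for every day, which is what the nested map requires.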

By the way, if you don't want to write the finisher Function yourself (I don't :-)), try my library, Abacus-Util:

Function<List<FileObject>, List<List<FileObject>>> finisher2 = fileObjs -> Seq.of(fileObjs)
    .split(MutableLong.of(0), (f, sizeSum) -> sizeSum.addAndGet(f.sizeInBytes) <= MAX_SIZE_THRESHOLD,
        sizeSum -> sizeSum.setValue(0));

// import static com.landawn.abacus.util.stream.Collectors.MoreCollectors.*;
StreamEx.of(files)
    .toMap(f -> f.getYear(),
        groupingBy(f -> f.getMonth(),
            groupingBy(f -> f.getDay(), collectingAndThen(toList(), finisher2))));
