Description of feature
The idea of add_to_collection is that you load the whole dataset into memory, inject a new dataset, and then write the result back out to disk. At some point, though, a shard becomes too big, in which case we probably want some sort of "split" option: past a certain size threshold, the shard gets split into two (or n) shards.
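A minimal sketch of what this could look like, assuming pickle-serialized shards and a byte-size threshold (the function signature, file-naming scheme, and max_bytes parameter are all hypothetical, not an existing API):

```python
import pickle
from pathlib import Path


def write_shard(path: Path, records: list) -> None:
    """Serialize records and write them to disk in one shot."""
    path.write_bytes(pickle.dumps(records))


def add_to_collection(shard_path, new_records, max_bytes=1 << 20):
    """Load the whole shard into memory, inject new records, and write
    the result back to disk. If the serialized shard exceeds max_bytes,
    split it into two shards instead. Returns the resulting shard paths."""
    shard_path = Path(shard_path)
    records = pickle.loads(shard_path.read_bytes())
    records.extend(new_records)
    blob = pickle.dumps(records)
    if len(blob) <= max_bytes:
        # Still under the threshold: single in-memory -> disk write.
        shard_path.write_bytes(blob)
        return [shard_path]
    # Past the threshold: split the records in half and write each half
    # as its own shard (this could recurse for an n-way split).
    mid = len(records) // 2
    left = shard_path.parent / f"{shard_path.stem}.0{shard_path.suffix}"
    right = shard_path.parent / f"{shard_path.stem}.1{shard_path.suffix}"
    write_shard(left, records[:mid])
    write_shard(right, records[mid:])
    shard_path.unlink()
    return [left, right]
```

Splitting by record count rather than by serialized size keeps the sketch simple; a real implementation might instead pick the split point so each output shard lands under max_bytes.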
An alternative would be to handle everything truly lazily, but I think we'd lose I/O efficiency: a single in-memory -> disk write is going to be faster than writing iteratively.