Description
Problem
We keep a cache of the `.tex` and `.dvi` files when rendering with an external LaTeX process, but we put no controls on the size of that cache. We have anecdotal reports (#4880 (comment)) that if this cache gets too big it becomes its own bottleneck (I assume the problem is that we end up with too many files in a single folder for the file system).
There are two hard problems in computer science:
- naming things
- cache invalidation
- off-by-one bugs
Proposed solution
- The files are already content-addressed. One solution is to go with the nested-folder approach (like git does internally), where the first 8 characters of the hash become 4 levels of 2-character folder names (or whatever tree width / depth makes sense); see the first sketch after this list.
- Pro: it avoids any filesystem-related slowdown due to too many files in a directory, with no new state or API (no, it does not need to be configurable)!
- Con: the cache still takes unbounded disk space
- Set a maximum number of files or amount of disk space (or both) that can be used, and then cull files by some algorithm (random? oldest on disk? do we want to track enough to do LRU or LFU?); see the second sketch below.
- Pro: solves the unbounded cache problem!
- Con: we will have to add some API to control this, maybe track some extra state, and we might be opening up a whole new vector for inter-process race conditions (process A: "I need to clean up!" process B: "oh, I need that file!" process A: deletes that file. process B: 💥)
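
A minimal sketch of what the nested-folder layout could look like (the names `cache_path` and `CACHE_DIR` are hypothetical, sha256 is just one possible hash, and the 4×2-character split is only one choice of tree width / depth):

```python
import hashlib
from pathlib import Path

CACHE_DIR = Path("~/.cache/tex_render").expanduser()  # hypothetical cache root


def cache_path(tex_source: str, suffix: str = ".tex") -> Path:
    """Map content to a nested path, e.g. ab/cd/ef/12/abcdef12....tex."""
    digest = hashlib.sha256(tex_source.encode("utf-8")).hexdigest()
    # Split the first 8 hex characters into 4 two-character directory levels
    # so no single directory accumulates an unbounded number of entries.
    subdirs = [digest[i:i + 2] for i in range(0, 8, 2)]
    path = CACHE_DIR.joinpath(*subdirs)
    path.mkdir(parents=True, exist_ok=True)
    return path / (digest + suffix)
```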
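
And a rough sketch of the culling side, assuming a size budget with an oldest-on-disk policy (`cull_cache` and `max_bytes` are illustrative, not an existing API). It deliberately swallows `OSError` so it stays robust against the inter-process races mentioned above; a real LRU / LFU policy would need extra bookkeeping beyond mtime.

```python
import os
from pathlib import Path


def cull_cache(cache_dir: Path, max_bytes: int = 100 * 1024 * 1024) -> None:
    """Delete the oldest cached files until the cache fits under max_bytes.

    Uses st_mtime as a cheap "oldest on disk" heuristic.
    """
    entries = []
    for path in cache_dir.rglob("*"):
        try:
            st = path.stat()
        except OSError:
            continue  # another process may have removed it already
        if path.is_file():
            entries.append((st.st_mtime, st.st_size, path))
    total = sum(size for _, size, _ in entries)
    # Evict oldest-first until we are back under the budget.
    for _, size, path in sorted(entries):
        if total <= max_bytes:
            break
        try:
            path.unlink()
        except OSError:
            pass  # tolerate races: the file may already be gone
        else:
            total -= size
```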
I am labeling this as a good first issue because, while there may be some new API, it should be well contained to how we manage a cache (and because it is a cache, we should already be robust to it going away under us). It is medium difficulty, though, because it will require thinking through the consequences of the caching algorithm and would be best done by someone who has at least worked with (and preferably implemented / maintained) a similar on-disk caching system.