LATENESS/GC downstream of recursive views #5492
Hi all, I've been absolutely loving Feldera, thank you for an amazing project. I have a question about GC which I'd appreciate a gut check on. The TL;DR of my question is: would it theoretically be possible to add LATENESS GC to views which are downstream of (not inside) recursive queries, if I am confident my application assumptions are safe?

I have an application that ingests a large volume of payload records associated with nodes in a tree/forest. The node graph itself is relatively small and can be retained indefinitely, but the payloads are extremely large and must be garbage-collected to keep memory reasonably bounded. Each node has a numeric height (incrementing coordinate along parent edges) and at most one parent, with no cycles.

Separately, there is an input stream of finalization updates that effectively advances the “finalized height” monotonically over time. It does this by writing

The rough shape of the SQL is something like this:

Since Feldera allows LATENESS annotations on views, my original assumption was that the above would work and allow the circuit to discard the finalized_payloads to an external sink. However, after digging in, my understanding now is that compiler validation disables GC for all views which have any dependency on a recursive view whatsoever.

My question is: is that an overly conservative decision on the compiler's part? If I try to remove the compiler GC validation or insert
Replies: 4 comments
I ran the compiler on this program and it did produce a GC operator for the join with node_payload.
Note that GC only works for incremental circuits; you need to supply the -i flag to the compiler.
Derp, I tried to make a simpler example for readability but didn't check that my issue still reproduced 🤦 I must have something else going wrong in my actual example that I wrongly attributed to the JOIN. That's awesome news for me, thank you so much!

The finalized_payloads view is not an input to any other computation, so the lateness annotation on it has no effect: lateness acts downstream of a view.
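To make the "lateness acts downstream" point concrete, here is a toy Python model of the idea. It is not Feldera code and does not reflect how the compiler or runtime implements GC. The class name `DownstreamState`, its methods, and the integer lateness bound are all hypothetical, chosen only to mirror the height-based setup described in the question. The sketch assumes that a lateness bound lets a consumer of a view forget any row whose height has fallen below the highest height seen minus the bound.

```python
from dataclasses import dataclass, field


@dataclass
class DownstreamState:
    """Toy model (hypothetical): state kept by an operator that consumes a view."""

    lateness: int                 # assumed bound, in height units
    max_height_seen: int = -1     # highest height observed so far
    payloads: dict[int, list[str]] = field(default_factory=dict)

    def waterline(self) -> int:
        # Heights below this value are assumed never to be updated again.
        return self.max_height_seen - self.lateness

    def insert(self, height: int, payload: str) -> bool:
        """Store a row unless it is already below the waterline."""
        if height < self.waterline():
            return False  # row violates the lateness promise; ignore it
        self.payloads.setdefault(height, []).append(payload)
        self.max_height_seen = max(self.max_height_seen, height)
        self.collect()
        return True

    def collect(self) -> int:
        """Discard state below the waterline; return how many heights were freed."""
        limit = self.waterline()
        stale = [h for h in self.payloads if h < limit]
        for h in stale:
            del self.payloads[h]
        return len(stale)

    def retained_heights(self) -> list[int]:
        return sorted(self.payloads)


if __name__ == "__main__":
    state = DownstreamState(lateness=2)
    for height in range(1, 6):
        state.insert(height, f"payload-{height}")
    print(state.retained_heights())
```

In this model the memory saving exists only inside the consumer, which is the sense in which a view with nothing downstream of it gains nothing from a lateness annotation.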