Description
Based on discussion with DE (https://docs.google.com/document/d/1S2Ij3FikNGdN8ZwZKcwpAfNmGQoNtah9AwNilxqhC3k/edit), it has been decided to move forward with the static stream config option as the preferred approach.
However, there are unknown performance issues that need to be investigated to ensure that the workflow will scale. This work is necessary whether the static or the dynamic approach is used, though the two would have slightly different risk profiles.
For example, how will filtering impact performance (especially for dashboarding) with Parquet / Iceberg?
Acceptance Criteria
- Documented set of usage scenarios covering scaling of storage, requests, etc.
- Posit performance implications for each
- Propose mitigation for most likely scenarios
Final status
- An extreme case for unified instrument volumes was suggested.
- Some dummy queries were provided.
- An existing stand-in Hive Parquet table with a suitable structure and volume was selected.
- Dummy filtering queries were executed against the stand-in table.
The results are summarized at T366627#10015322. The comparison of the results against a 'baseline' dummy 'per-instrument' table was flawed, but the query times against the unified instrument table were not.
Note that Presto query times can vary wildly, as the Presto engine is a shared environment.
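Because of that variance, a single timing is a weak signal. One way to get a more stable figure is to repeat each query and report the minimum, median and maximum. The following is a minimal sketch of that idea, not the method used for this task: `time_query` and `fake_query` are hypothetical names, and `fake_query` is a placeholder that only sleeps, standing in for whatever client call would actually submit a query to Presto.

```python
import statistics
import time
from typing import Callable, Dict


def time_query(run_query: Callable[[], object], repeats: int = 5) -> Dict[str, float]:
    """Run `run_query` several times and summarize wall-clock durations (seconds)."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()  # assumed to block until the query has finished
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "median": statistics.median(durations),
        "max": max(durations),
    }


def fake_query() -> None:
    # Hypothetical placeholder: replace with a real call that runs the query.
    time.sleep(0.01)


summary = time_query(fake_query, repeats=3)
print(summary)
```

The median is less sensitive than the mean to an occasional slow run caused by other load on the shared engine.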
The dummy queries filtering for a small instrument in a large unified instrument table can take around 5 seconds.
In T366627#10030263, it was noted that this is fine.
As ~5 seconds is acceptable for now, and things will only improve, we will call this task done and not do further work to compare query times.