[Garbage Collector] Umbrella: known issues of garbage collector · Issue #26120 · kubernetes/kubernetes

@caesarxuchao

Description

This is an umbrella issue for TODOs and known problems of the garbage collector that we have to solve before graduating GC to beta.

_Functionalities_

  1. [optional]
    This race can cause DeleteOptions.OrphanDependents=true to fail to orphan all dependents. We expect this race to be rare. Details: there is no guarantee on the ordering of events across different resources, so it's possible that the GC observes the orphan request (an Update event of the owner resource) before observing the creation/update of the dependents. Consequently, those dependents will not be orphaned. [GarbageCollector] monitor the watch latency #30483 adds a monitor for the watch latency; because the latency is small, this race is rare.
    Considered solutions:
    a. Let the GC wait for a short period (e.g., 1 min) before carrying out the orphaning procedure. This won't thoroughly solve the problem, and it will make deletion slow.
    b. Let the user supply a resourceVersion, and before carrying out the orphaning procedure, have the GC wait until it has observed an event of the dependent resource with a larger resourceVersion. The problem is that the GC is a client, and a client should treat resourceVersion as opaque.
  2. [optional] Need a discovery mechanism to determine what resources GC should manage. For example, GC needs to determine if it should watch for extensions/v1beta1/job or batch/v1/job. (edit: this use case doesn't matter, because the ownerRef will only point to one version of the object, the other version of the object will just be a shadow)
  3. [DONE] According to the controllerRef proposal Proposal for ControllerReference #25256, "GarbageCollector will remove ControllerRef from objects that no longer points to existing controllers".
  4. [DONE] Update at least one controller to use GC. Now replicaset controller and replicationcontroller manager use GC ([GarbageCollector] Let the RC manager set/remove ControllerRef #27600)
  5. [DONE] [update] we have foreground gc now. Expose the progress of garbage collection. See [RFC][GarbageCollector] expose the progress of garbage collection #29891.
  6. [Fixing, see GC: Fix re-adoption race when orphaning dependents. #42938] The design doc says that before orphaning dependents, the GC should wait for the owner's controller to bump the owner's ObservedGeneration, which signals that the owner's controller has observed the deletion of the owner and will stop adoption. Otherwise the GC's orphaning process races with the owner controller's adoption process, resulting in the deletion of the dependents. We haven't implemented this yet. We expect this race to be rare, because currently only the replicaset and replication controllers do adoption, and it's triggered by updates to the RC or by the 10-minute resync. We have an e2e test that orphans 100 pods controlled by an RC, and it has never hit this race.
  7. [Done for new resources. For old resources 200 is returned for compatibility] API server should return 202 if a deletion request is not completed synchronously. (API server should return 202 if the operation is asynchronous #33196)
  8. [Tracked in Garbage collector should support non-core APIs #44507] Support non-core APIs, whether registered via ThirdParty Resource or via kube-aggregator.
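Item 3 above says the GC removes ControllerRefs that no longer point to existing controllers. The core of that step can be sketched as follows; this is a minimal, hypothetical Go sketch with a stand-in `OwnerReference` type and an in-memory set of live UIDs, not the actual kube-controller-manager code:

```go
package main

import "fmt"

// OwnerReference is a pared-down stand-in for the real
// metav1.OwnerReference type; only the fields this sketch needs.
type OwnerReference struct {
	UID  string
	Name string
}

// removeDanglingOwnerRefs keeps only the ownerReferences whose owner
// UID is still in the live-object set. UIDs are unique and never
// reused, so a reference to a missing UID can never become valid
// again and is safe to drop.
func removeDanglingOwnerRefs(refs []OwnerReference, liveUIDs map[string]bool) []OwnerReference {
	kept := make([]OwnerReference, 0, len(refs))
	for _, ref := range refs {
		if liveUIDs[ref.UID] {
			kept = append(kept, ref)
		}
	}
	return kept
}

func main() {
	live := map[string]bool{"uid-rs-1": true}
	refs := []OwnerReference{
		{UID: "uid-rs-1", Name: "frontend"},
		{UID: "uid-rs-2", Name: "deleted-rs"}, // owner no longer exists
	}
	fmt.Println(removeDanglingOwnerRefs(refs, live))
	// → [{uid-rs-1 frontend}]
}
```

In the real controller this filtering is followed by a PATCH of the object's metadata; the sketch only shows the decision of which references survive.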

_Performance_

  1. [Mechanism is there, need numbers] Benchmark the average queuing time (eventQueue and dirtyQueue) ([GarbageCollector] measure latency #28387)
  2. [Done] Improvement: update the List and Watch to only store the TypeMeta and ObjectMeta ([GarbageCollector] only store typeMeta and objectMeta in the gc store #28480)
  3. [Done] [GarbageCollector] add absent owner cache #31167. Cache known-deleted UIDs. The GC contacts the API server to check whether an owner exists when processing its dependents. If the owner doesn't exist according to the API server, it won't exist in the future either, because it's impossible for a user to predict the UID of a future object and put it in an ownerRef. Such a cache is very useful in the RC-Pods case, because the GC checks for the existence of the RC for every pod it created. The cache helps the API server as well, because a GET request with JSON media type is expensive for the API server.
  4. [RFC] In the API server, support resourceVersion as a delete precondition. This would make the Get() in processItem() unnecessary.
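The absent-owner cache in item 3 can be illustrated with a small Go sketch. The type and the GET counter here are hypothetical simplifications for illustration; the real cache added in #31167 is more involved. The key property is that a negative answer is permanent (UIDs are never reused), so a cache hit skips the API-server GET entirely:

```go
package main

import "fmt"

// absentOwnerCache remembers owner UIDs that the API server has
// already reported as missing. Because UIDs are never reused, a
// negative answer never becomes stale.
type absentOwnerCache struct {
	absent map[string]bool
	gets   int // counts simulated API-server GETs, for illustration
}

func newAbsentOwnerCache() *absentOwnerCache {
	return &absentOwnerCache{absent: make(map[string]bool)}
}

// ownerExists consults the cache first and only falls back to the
// (simulated) API-server lookup on a miss. A confirmed-absent owner
// is recorded so later dependents of the same owner skip the lookup.
func (c *absentOwnerCache) ownerExists(uid string, apiLookup func(string) bool) bool {
	if c.absent[uid] {
		return false // known-deleted: no API call needed
	}
	c.gets++
	if apiLookup(uid) {
		return true
	}
	c.absent[uid] = true
	return false
}

func main() {
	c := newAbsentOwnerCache()
	api := func(uid string) bool { return false } // owner already deleted
	// 100 pods all pointing at the same deleted RC: only one GET.
	for i := 0; i < 100; i++ {
		c.ownerExists("uid-rc-gone", api)
	}
	fmt.Println(c.gets) // → 1
}
```

This is exactly the RC-Pods case described above: without the cache, processing N orphan pods costs N GETs; with it, one.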

References:

@lavalamp @gmarek @derekwaynecarr @kubernetes/sig-api-machinery

Labels

lifecycle/rotten — Denotes an issue or PR that has aged beyond stale and will be auto-closed.
sig/api-machinery — Categorizes an issue or PR as relevant to SIG API Machinery.
