-
Notifications
You must be signed in to change notification settings - Fork 635
Add last_deletion_at
property to datasets
#5853
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis update introduces a new Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant Dataset
participant SampleCollection
participant Sample
participant ODM/Document
User->>Dataset: Delete sample(s)
Dataset->>ODM/Document: _update_last_deletion_at()
ODM/Document->>Dataset: Update last_deletion_at timestamp
Dataset->>SampleCollection: Reflect updated last_deletion_at
User->>Dataset: Query last_deletion_at
Dataset-->>User: Return timestamp of last sample deletion
Possibly related PRs
Suggested labels
Suggested reviewers
Poem
Tip ⚡️ Faster reviews with caching
Enjoy the performance boost—your workflow just got faster. 📜 Recent review detailsConfiguration used: .coderabbit.yaml 📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
⏰ Context from checks skipped due to timeout of 90000ms (12)
✨ Finishing Touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
docs/source/user_guide/using_datasets.rst (2)
993-996
: Add documentation forlast_deletion_at
property
The new lines clearly describe theDataset.last_deletion_at
property. It might help to add a brief note reminding users thatlast_modified_at
is intentionally unchanged on sample deletions to emphasize the separation of concerns.
2123-2130
: Introduce “Builtin datetime fields” section
This new subsection nicely consolidates all automatic timestamp fields. Please verify that this heading is included in the page’s navigation or table of contents so readers can easily locate it. You may also consider adding an explicit cross-reference from the “Dates and datetimes” section to this subsection for better discoverability.fiftyone/core/dataset.py (1)
737-742
: Consider expanding thelast_deletion_at
docstringWhile the current docstring is clear, it might be helpful to provide a bit more context about which operations cause this property to update (similar to the detailed explanation in the
last_modified_at
docstring).@property def last_deletion_at(self): - """The datetime that a sample was last deleted from the dataset, or - ``None`` if no samples have been deleted. + """The datetime that a sample was last deleted from the dataset, or + ``None`` if no samples have been deleted. + + This property is updated whenever samples or frames are deleted from the + dataset via methods like :meth:`delete_samples`, :meth:`delete_frames`, + :meth:`clear`, or :meth:`clear_frames`. """ return self._doc.last_deletion_at
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (8)
docs/source/user_guide/using_datasets.rst
(2 hunks)fiftyone/core/collections.py
(6 hunks)fiftyone/core/dataset.py
(4 hunks)fiftyone/core/odm/dataset.py
(1 hunks)fiftyone/core/odm/document.py
(1 hunks)tests/unittests/dataset_tests.py
(2 hunks)tests/unittests/video_tests.py
(3 hunks)tests/unittests/view_tests.py
(1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (4)
fiftyone/core/odm/dataset.py (2)
fiftyone/core/dataset.py (1)
last_deletion_at
(738-742)fiftyone/core/fields.py (1)
DateTimeField
(915-944)
fiftyone/core/odm/document.py (5)
fiftyone/core/dataset.py (3)
_update_last_deletion_at
(8299-8300)last_deletion_at
(738-742)save
(4293-4299)fiftyone/core/odm/mixins.py (1)
save
(1870-1871)fiftyone/core/collections.py (1)
save
(144-197)fiftyone/core/sample.py (2)
save
(544-546)save
(735-743)fiftyone/core/frame.py (3)
save
(485-487)save
(1042-1044)save
(1092-1099)
tests/unittests/dataset_tests.py (2)
fiftyone/core/dataset.py (4)
last_deletion_at
(738-742)created_at
(713-715)clear
(5004-5010)delete_samples
(3945-3964)fiftyone/core/view.py (2)
keep
(1320-1330)clear
(1297-1306)
fiftyone/core/dataset.py (1)
fiftyone/core/odm/document.py (1)
_update_last_deletion_at
(810-815)
⏰ Context from checks skipped due to timeout of 90000ms (10)
- GitHub Check: test-windows / test-python (windows-latest, 3.12)
- GitHub Check: test-windows / test-python (windows-latest, 3.11)
- GitHub Check: test-windows / test-python (windows-latest, 3.10)
- GitHub Check: test-windows / test-python (windows-latest, 3.9)
- GitHub Check: test / test-python (ubuntu-latest-m, 3.10)
- GitHub Check: e2e / test-e2e
- GitHub Check: test / test-app
- GitHub Check: build / build
- GitHub Check: lint / eslint
- GitHub Check: build
🔇 Additional comments (17)
fiftyone/core/odm/dataset.py (1)
901-901
: Added new DateTimeField for tracking sample deletions.Adding
last_deletion_at
to the dataset document creates a dedicated timestamp for tracking when samples were deleted, separate from general modifications. This is a good improvement that allows for more precise tracking of dataset state changes.fiftyone/core/odm/document.py (1)
810-816
: Added method to update the deletion timestamp.This method maintains consistency with the existing
_update_last_modified_at
implementation pattern. It properly defaults to the current UTC time when no timestamp is provided and saves the document with thevirtual=True
flag to persist the change.tests/unittests/dataset_tests.py (1)
620-659
: The newtest_last_deletion_at
test comprehensively validates deletion timestamp tracking.This test method thoroughly verifies the
last_deletion_at
property behavior by:
- Confirming it's initially
None
after adding samples- Verifying it's set and strictly increases after each deletion operation
- Testing different deletion paths (keep, clear, delete_samples)
- Validating dataset size after each operation
The test aligns well with the PR objectives of tracking deletion timestamps separately from other dataset modifications.
tests/unittests/video_tests.py (2)
746-796
: Tests correctly verify that frame deletions updatelast_modified_at
but notlast_deletion_at
The test has been appropriately renamed from
test_last_modified_at_deletions
totest_last_modified_at_frame_deletions1
for clarity. The test now correctly verifies that:
- The
last_deletion_at
property is initiallyNone
- Frame deletions update the maximum
last_modified_at
timestamp- After reloading the dataset,
last_deletion_at
remainsNone
This properly validates that deletions at the frame level only affect the
last_modified_at
property and not thelast_deletion_at
property.
798-850
: Tests sample-level frame deletions correctlyThe test has been appropriately renamed from
test_last_modified_at_deletions_frames
totest_last_modified_at_frame_deletions2
for clarity. This test verifies the same behavior as the previous test but at the sample level:
- The
last_deletion_at
property is initiallyNone
- Deleting frames from a sample updates the sample's
last_modified_at
property- After reloading the dataset,
last_deletion_at
remainsNone
This correctly validates that frame deletions performed on individual samples only affect the
last_modified_at
property and not the newlast_deletion_at
property.tests/unittests/view_tests.py (1)
5057-5143
: Tests for_is_full_collection()
enhancements look comprehensive and well-structured.The added test class
FullCollectionTests
properly validates the enhanced behavior of the_is_full_collection()
method that was updated as part of this PR. The tests cover important scenarios including:
- Regular dataset views with and without filtering
- Generated views like patches
- Grouped datasets with different slice selections
These tests directly align with the PR objective to update the
_is_full_collection()
method to recognize full views created bydataset.view()
and full generated views liketo_patches()
.fiftyone/core/dataset.py (5)
719-734
: Clear explanation oflast_modified_at
behaviorThe updated docstring provides excellent clarity on when this property is incremented and when it is not. The distinction between dataset-level metadata changes and sample-level changes is well-explained, and the guidance on how to determine when samples were last added, edited, or deleted is helpful.
737-742
: Good addition oflast_deletion_at
propertyThis new property provides a cleaner way to track sample deletions separately from other dataset modifications, which aligns with the PR's objective of improving timestamp semantics.
5038-5038
: Appropriate update to use_update_last_deletion_at
in_clear
methodThis change properly updates the new
last_deletion_at
timestamp rather thanlast_modified_at
when samples are deleted, ensuring consistent behavior with the updated property semantics.
8299-8301
: Well-implemented_update_last_deletion_at
methodThis method correctly updates the document's
last_deletion_at
timestamp and saves the changes. It follows the same pattern as the existing timestamp update methods.
8770-8770
: Proper initialization oflast_deletion_at
during cloningSetting
last_deletion_at
toNone
when cloning a dataset is appropriate as the new dataset hasn't had any samples deleted yet.fiftyone/core/collections.py (6)
604-609
: Documentation updated to reflect both modification and deletion time tracking.The docstring has been updated to clarify that the
sync_last_modified_at
method now takes into account both modification and deletion timestamps of samples, which aligns with the introduction of the newlast_deletion_at
property.
668-677
: Implementation updated to consider both modification and deletion timestamps.The
_sync_dataset_last_modified_at
method now uses the new_none_max
helper function to calculate the maximum between the dataset'slast_deletion_at
and the maximum samplelast_modified_at
, ensuring that both kinds of timestamps are properly considered when updating the dataset'slast_modified_at
.
8509-8531
: Performance optimization added for themin
method on datetime fields.This is a nice optimization that allows the method to use a more efficient
_min()
implementation when operating on common datetime fields with no expression and when the collection represents the full dataset. This should improve performance for these common operations.🧰 Tools
🪛 Ruff (0.8.2)
8529-8531: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
8610-8632
: Performance optimization added for themax
method on datetime fields.Similar to the optimization for
min
, this change improves performance when calculating the maximum value of datetime fields on full collections. The implementation correctly mirrors themin
method's optimization.🧰 Tools
🪛 Ruff (0.8.2)
8630-8632: Do not assign a
lambda
expression, use adef
Rewrite
make
as adef
(E731)
11070-11089
: Enhanced_is_full_collection
detection with support for full views.The
_is_full_collection
method has been extended to recognize additional cases where a collection represents the full dataset, including empty-stage views. This improvement supports the optimizations in themin
andmax
methods and aligns with the PR objective of better handling full views like those created bydataset.view()
.
12523-12529
: Added utility function_none_max
for handling None values in max operations.This helper function safely computes the maximum of multiple arguments while ignoring
None
values, which is useful for the updated logic in_sync_dataset_last_modified_at
. The implementation is concise and correctly handles the case when all arguments areNone
by returning the specified default.
Couldn't the same be said for insertion? @brimoor |
Looks alright to me but wondering if we should also have |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
requesting changes merely to get an answer to insertion question
@swheaton good question. Here's my thinking:
Deletion is special because it does not reliably change the min/max statistic of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
got it thanks. looks good to me then
Change log
Dataset.last_deletion_at
property that is automatically updated when samples are deletedDataset.last_modified_at
is no longer automatically updated when samples are deleted_is_full_collection()
to include full views, egdataset.view()
and full generated views liketo_patches()
Motivation
#5723 added automatic incrementing of
Dataset.last_modified_at
when samples are deleted. This was released infiftyone==1.5.0
.After reflecting on this implementation in the context of use cases like voxel51/fiftyone-plugins#122, I strongly believe that deletion times should be tracked separately from other dataset metadata changes.
For example, changing a dataset's description and creating a new saved view both increment
Dataset.last_modified_at
, but these have no effect on whether a savedto_patches()
view needs to be reloaded.As penance for my decision to change course from how deletion is tracked in
fiftyone==1.5.0
, I've added previously non-existent documentation on automatic datetime fields to the user guide 🙇Summary by CodeRabbit
New Features
Bug Fixes
Tests
Documentation