E544 fix(tasker): make task claiming atomic with PostgreSQL SKIP LOCKED to prevent race conditions by Nirvik30 · Pull Request #24277 · calcom/cal.com · GitHub
[go: up one dir, main page]

Skip to content

Conversation

Nirvik30
Copy link
@Nirvik30 Nirvik30 commented Oct 5, 2025

This PR fixes a race condition in task claiming by making the process atomic.

Summary of Changes

  • Added a claimedAt field to the Task model (with a composite index on [claimedAt, scheduledAt, succeededAt]).
  • Replaced findMany() with transaction-based claiming using PostgreSQL SELECT FOR UPDATE SKIP LOCKED.
  • Implemented stale claim detection: tasks claimed more than 5 minutes ago are released and can be reclaimed.
  • Updated retry() to clear claimedAt, allowing tasks to be picked up again.

Benefits

  • Prevents duplicate task processing by multiple workers.
  • Ensures only one worker can claim a task at a time.
  • Handles worker crashes or failures gracefully by reclaiming stale tasks.
  • Improves concurrency with database-level guarantees.

Notes

  • Requires a database migration to add the claimedAt column before deploying.

Fixes #24186

@Nirvik30 Nirvik30 requested a review from a team as a code owner October 5, 2025 12:44
@CLAassistant
Copy link
CLAassistant commented Oct 5, 2025

CLA assistant check
All committers have signed the CLA.

Copy link
vercel bot commented Oct 5, 2025

@Nirvik30 is attempting to deploy a commit to the cal Team on Vercel.

A member of the Team first needs to authorize it.

Copy link
Contributor
github-actions bot commented Oct 5, 2025

Hey there and thank you for opening this pull request! 👋🏼

We require pull request titles to follow the Conventional Commits specification and it looks like your proposed title needs to be adjusted.

Details:

Unknown release type "Fix" found in pull request title "Fix: Atomic task claiming with PostgreSQL SKIP LOCKED to prevent race conditions". 

Available types:
 - feat: A new feature
 - fix: A bug fix
 - docs: Documentation only changes
 - style: Changes that do not affect the meaning of the code (white-space, formatting, missing semi-colons, etc)
 - refactor: A code change that neither fixes a bug nor adds a feature
 - perf: A code change that improves performance
 - test: Adding missing tests or correcting existing tests
 - build: Changes that affect the build system or external dependencies (example scopes: gulp, broccoli, npm)
 - ci: Changes to our CI configuration files and scripts (example scopes: Travis, Circle, BrowserStack, SauceLabs)
 - chore: Other changes that don't modify src or test files
 - revert: Reverts a previous commit

@github-actions github-actions bot added the 🐛 bug Something isn't working label Oct 5, 2025
@graphite-app graphite-app bot added the community Created by Linear-GitHub Sync label Oct 5, 2025
@graphite-app graphite-app bot requested a review from a team October 5, 2025 12:44
Copy link
Contributor
coderabbitai bot commented Oct 5, 2025

Walkthrough

  • repository.ts: getNextBatch now uses a DB transaction with SELECT FOR UPDATE SKIP LOCKED to atomically claim up to 1000 tasks. It filters tasks not succeeded, scheduled in the past, and either unclaimed (claimedAt null) or stale (claimedAt older than 5 minutes), then sets claimedAt to now before returning. Returns an empty array if none found. makeWhereUpcomingTasks updated to include the stale-claim OR condition. Retry flow now resets claimedAt to null after incrementing attempts and setting lastError/lastFailedAttemptAt.
  • schema.prisma: Added Task.claimedAt DateTime? and an index on [claimedAt, scheduledAt, succeededAt].

Pre-merge checks and finishing touches

✅ Passed checks (5 passed)
Check name Status Explanation
Linked Issues Check ✅ Passed The changes fully address issue #24186 by using a database transaction with SELECT FOR UPDATE SKIP LOCKED for atomic claiming, preventing overlapping batches, adding stale‐claim detection to reclaim stuck tasks, and resetting claimedAt on retry to allow reprocessing, thus ensuring at‐most‐once semantics under concurrency.
Out of Scope Changes Check ✅ Passed All modifications—including the new claimedAt column, composite index, updated getNextBatch transaction logic, stale claim conditions, and retry adjustments—are directly tied to the atomic task claiming objectives with no unrelated code changes.
Description Check ✅ Passed The description clearly outlines the relevant changes, including adding the claimedAt field and composite index, replacing findMany with transaction-based SELECT FOR UPDATE SKIP LOCKED, implementing stale claim detection, and updating retry logic to clear claimedAt.
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.
Title Check ✅ Passed The title clearly and concisely summarizes the primary change of introducing atomic task claiming using PostgreSQL SKIP LOCKED to prevent race conditions in the tasker module. It specifies the scope (tasker), the nature of the fix (atomic claiming), and the mechanism (SKIP LOCKED), making it immediately understandable to reviewers. This accurately reflects the main objective and matches the implemented changes.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@dosubot dosubot bot added the ❗️ migrations contains migration files label Oct 5, 2025
Copy link
Contributor
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (1)
packages/features/tasker/repository.ts (1)

116-116: Keep in-memory claimedAt consistent with the DB write

We lock, claim, and persist claimedAt = now, but the array we return still reflects the pre-update rows (often null). Any downstream logging/metrics that inspect the returned payload will see claimedAt unset even though the DB row is already claimed. Map the results before returning so the in-memory view matches what we just wrote, without another query.

-      return tasks;
+      return tasks.map((task) => ({
+        ...task,
+        claimedAt: now,
+      }));
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a4be822 and c6093e2.

📒 Files selected for processing (2)
  • packages/features/tasker/repository.ts (3 hunks)
  • packages/prisma/schema.prisma (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/review.mdc)

**/*.ts: For Prisma queries, only select data you need; never use include, always use select
Ensure the credential.key field is never returned from tRPC endpoints or APIs

Files:

  • packages/features/tasker/repository.ts
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/review.mdc)

Flag excessive Day.js use in performance-critical code; prefer native Date or Day.js .utc() in hot paths like loops

Files:

  • packages/features/tasker/repository.ts
**/*.{ts,tsx,js,jsx}

⚙️ CodeRabbit configuration file

Flag default exports and encourage named exports. Named exports provide better tree-shaking, easier refactoring, and clearer imports. Exempt main components like pages, layouts, and components that serve as the primary export of a module.

Files:

  • packages/features/tasker/repository.ts
⏰ Context from checks skipped due to timeout of 180000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Tests / Unit
  • GitHub Check: Linters / lint
  • GitHub Check: Type check / check-types
  • GitHub Check: Codacy Static Code Analysis

@Nirvik30 Nirvik30 changed the title Fix: Atomic task claiming with PostgreSQL SKIP LOCKED to prevent race conditions fix(tasker): make task claiming atomic with PostgreSQL SKIP LOCKED to prevent race conditions Oct 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 bug Something isn't working community Created by Linear-GitHub Sync ❗️ migrations contains migration files size/M
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Race Condition in Task Claiming (getNextBatch) Leading to Duplicate Processing
2 participants
0