E534 Fix stale PID race in ConsumerRegistry.unregister_name/1 by alco · Pull Request #3979 · electric-sql/electric · GitHub
[go: up one dir, main page]

Skip to content

Fix stale PID race in ConsumerRegistry.unregister_name/1#3979

Draft
alco wants to merge 8 commits intorefactor/static-logger-stringsfrom
alco/stale-consumer-registry
Draft

Fix stale PID race in ConsumerRegistry.unregister_name/1#3979
alco wants to merge 8 commits intorefactor/static-logger-stringsfrom
alco/stale-consumer-registry

Conversation

@alco
Copy link
Member
@alco alco commented Mar 9, 2026

Summary

  • Replace the no-op unregister_name/1 with an atomic :ets.match_delete/2 that removes the ETS entry only if the pid matches the calling (dying) process
  • A naive lookup-then-delete approach is still racey because the entry can be overwritten between the two ETS calls; match_delete is atomic
  • Add unit tests covering: basic cleanup, skip-if-different-pid, and the full race scenario (old process unregisters after replacement has registered)

Closes #3864

Test plan

  • mix test test/electric/shapes/consumer_registry_test.exs — 16 tests pass (3 new)
  • Full test suite passes in CI

🤖 Generated with Claude Code

@alco alco added the claude label Mar 9, 2026
@pkg-pr-new
Copy link
pkg-pr-new bot commented Mar 9, 2026

Open in StackBlitz

npm i https://pkg.pr.new/@electric-sql/react@3979
npm i https://pkg.pr.new/@electric-sql/client@3979
npm i https://pkg.pr.new/@electric-sql/y-electric@3979

commit: ebdb61c

@codecov
Copy link
codecov bot commented Mar 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 88.74%. Comparing base (c613cc5) to head (c622202).
✅ All tests successful. No failed tests found.

Additional details and impacted files
@@                       Coverage Diff                       @@
##           refactor/static-logger-strings    #3979   +/-   ##
===============================================================
  Coverage                           88.74%   88.74%           
===============================================================
  Files                                  25       25           
  Lines                                2425     2425           
  Branches                              609      613    +4     
===============================================================
  Hits                                 2152     2152           
  Misses                                271      271           
  Partials                                2        2           
Flag Coverage Δ
packages/experimental 87.73% <ø> (ø)
packages/react-hooks 86.48% <ø> (ø)
packages/start 82.83% <ø> (ø)
packages/typescript-client 93.93% <ø> (ø)
packages/y-electric 56.05% <ø> (ø)
typescript 88.74% <ø> (ø)
unit-tests 88.74% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@blacksmith-sh
Copy link
Contributor
blacksmith-sh bot commented Mar 9, 2026

Found 3 test failures on Blacksmith runners:

Failures

Test View Logs
at line 42 in shell electric/at line 42 in shell electric View Logs
at line 45 in shell electric_1/at line 45 in shell electric_1 View Logs
at line 97 in shell electric/at line 97 in shell electric View Logs

Fix in Cursor

alco and others added 7 commits March 9, 2026 13:28
…3964) (#3968)

This reverts commit cea4c13.

Possibly temporarily. Since this change is likely to have adverse
effects on the memory footprint, we've decided to test things out in
Cloud without it first.
## Summary

Add [TanStack Intent](https://github.com/TanStack/intent) skills to
`@electric-sql/client` and `@electric-sql/y-electric` so AI coding
agents can work with Electric correctly out of the box. Ships 9 skills
covering the full developer journey from shape setup to production
deployment.

## Approach

Skills were generated using the `@tanstack/intent scaffold` workflow:
1. **Domain Discovery** — analyzed source code, docs, and maintainer
interview to map 5 domains and 35 failure modes
2. **Tree Generator** — chose flat structure across 2 packages (8 skills
in typescript-client, 1 in y-electric)
3. **Skill Generation** — wrote SKILL.md files with setup patterns, code
examples verified against source, and wrong/correct mistake pairs

Each skill targets a specific task (not a concept), with code examples
validated against the current source at `v1.5.10`.

### Skills

| Skill | Package | Type | Key coverage |
|-------|---------|------|-------------|
| `electric-shapes` | client | core | ShapeStream/Shape config, onError
contract, WHERE clauses, type parsers |
| `electric-proxy-auth` | client | core | Server proxy setup,
ELECTRIC_PROTOCOL_QUERY_PARAMS, secret injection |
| `electric-schema-shapes` | client | core | Schema + shape co-design,
REPLICA IDENTITY FULL, replica modes |
| `electric-debugging` | client | lifecycle | Error retry behavior,
MissingHeadersError, fast-loop detection |
| `electric-deployment` | client | lifecycle | Docker Compose, env vars,
health checks, replication slot cleanup |
| `electric-new-feature` | client | lifecycle | Full 5-step setup:
schema → proxy → collection → queries → mutations |
| `electric-orm` | client | composition | Drizzle/Prisma txid patterns,
migration with REPLICA IDENTITY |
| `electric-postgres-security` | client | security | Pre-deploy
checklist: roles, grants, WAL, connection, publication |
| `electric-yjs` | y-electric | composition | ElectricProvider, BYTEA
parsing, resume state, awareness |

### Non-goals

- No framework-specific skills (TanStack DB handles React/Vue/etc.)
- No multi-shape-sync skill (experimental feature, may be removed)
- No runtime code changes

### CI Workflows

- `validate-skills.yml` — validates SKILL.md files on PRs touching
`skills/`
- `check-skills.yml` — checks for stale skills on release, opens review
PR
- `notify-intent.yml` — notifies TanStack/intent when source or docs
change

## Verification

```bash
cd packages/typescript-client && npx @tanstack/intent validate skills/
cd packages/y-electric && npx @tanstack/intent validate skills/
```

Both pass with no errors or warnings.

## Files changed

- `packages/typescript-client/skills/` — 8 SKILL.md files + 2 reference
files (WHERE clause syntax, type parsers)
- `packages/y-electric/skills/` — 1 SKILL.md file
- `packages/*/package.json` — added `skills` + `bin` to `files`,
`@tanstack/intent` devDependency, `bin` entry
- `packages/*/bin/intent.mjs` — intent CLI shim
- `.github/workflows/` — 3 new workflow files
- `.gitignore` — added `_artifacts`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This PR was opened by the [Changesets
release](https://github.com/changesets/action) GitHub action. When
you're ready to do a release, you can merge this and the packages will
be published to npm automatically. If you're not ready to do a release
yet, that's fine, whenever you add more changesets to main, this PR will
be updated.


# Releases
## @electric-sql/experimental@6.0.12

### Patch Changes

-   Updated dependencies [d1e08b8]
    -   @electric-sql/client@1.5.12

## @electric-sql/react@1.0.41

### Patch Changes

-   Updated dependencies [d1e08b8]
    -   @electric-sql/client@1.5.12

## @electric-sql/client@1.5.12

### Patch Changes

- d1e08b8: Add TanStack Intent skills for AI agent guidance. Ships 9
skills covering shapes, proxy auth, schema design, debugging,
deployment, new feature setup, ORM integration, Postgres security, and
Yjs collaboration.

## @electric-sql/y-electric@0.1.38

### Patch Changes

- d1e08b8: Add TanStack Intent skills for AI agent guidance. Ships 9
skills covering shapes, proxy auth, schema design, debugging,
deployment, new feature setup, ORM integration, Postgres security, and
Yjs collaboration.
-   Updated dependencies [d1e08b8]
    -   @electric-sql/client@1.5.12

## expo-db-electric-starter@1.0.13

### Patch Changes

-   Updated dependencies [d1e08b8]
    -   @electric-sql/client@1.5.12

## @core/sync-service@1.4.13

### Patch Changes

- 8daa822: Index `= ANY(array_field)` and `IN (const_list)` WHERE clause
expressions for O(1) shape filtering. ANY clauses reuse the
InclusionIndex (via single-element array containment), and IN clauses
reuse the EqualityIndex (registering each value separately). At 1000
concurrent shapes, fan-out latency improves by 6x (ANY) and 15x (IN)
compared to the previous linear scan.

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
https://deploy-preview-3973--electric-next.netlify.app/blog/2026/03/06/agent-skills-now-shipping

## Summary

- Announces that Electric, TanStack DB, and Durable Streams npm packages
now ship with agent skills
- Covers why versioned skills matter (training data staleness, new
libraries absent from models)
- Points readers to the kpb repo to try it out and the feedback skill to
report issues

## Test plan

- [ ] Preview the blog post on the website
- [ ] Verify hero image renders correctly
- [ ] Check all links resolve

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary

- Announces that Electric, TanStack DB, and Durable Streams npm packages
now ship with agent skills via the TanStack Intent system
- Covers why versioned skills matter (training data staleness, new
libraries absent from models)
- Points readers to the Playbook repo to try it out and the feedback
skill to report issues

## Test plan

- [ ] Preview the blog post on the website
- [ ] Verify hero image renders correctly
- [ ] Check all links resolve (no dead links)

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
## Summary
- Adds a `CLAUDE.md` symlink pointing to `AGENTS.md` in
`packages/sync-service` so that Claude Code automatically loads the
sync-service guidelines into context.

## Test plan
- [x] Verify symlink resolves correctly: `readlink
packages/sync-service/CLAUDE.md` → `AGENTS.md`

🤖 Generated with [Claude Code](https://claude.com/claude-code)
The previous no-op implementation left stale PIDs in the ETS table.
A naive lookup-then-delete fix would still be racey because the entry
could be overwritten between the lookup and delete. Using
:ets.match_delete/2 atomically deletes the entry only if the pid
matches the calling (dying) process.

Closes #3864

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@alco alco force-pushed the alco/stale-consumer-registry branch from dbb2f9f to 44ce5d6 Compare March 9, 2026 12:28
@claude
Copy link
claude bot commented Mar 9, 2026

Claude Code Review

Summary

This PR fixes the long-standing race in ConsumerRegistry.unregister_name/1 by replacing the deliberate no-op with an atomic :ets.match_delete/2 call that only removes the ETS entry if it still belongs to the dying process. It also simplifies consumer initialization by removing the redundant :shape pass-through from ShapeCache and directly fetching from ShapeStatus. Three new tests validate the race scenario end-to-end.

What is Working Well

  • Choosing match_delete over lookup+delete: The issue proposed a two-step lookup/delete which still has a TOCTOU window. The PR improves on this by using :ets.match_delete/2 which is atomic -- no window for a replacement consumer entry to be accidentally removed.
  • self() correctness: unregister_name/1 is called in the dying process context by OTP cleanup machinery, so self() correctly identifies the dying PID. The comment makes this non-obvious but accurate decision explicit.
  • Double-removal safety: For the {:shutdown, :cleanup} and {:shutdown, :suspend} paths, remove_consumer already deletes the ETS entry before unregister_name fires. The match_delete on a missing key is a no-op, so no double-removal crash is possible.
  • Consumer init simplification: Removing the :shape pass-through from ShapeCache.start_opts and always fetching fresh from ShapeStatus is cleaner. The old Map.get_lazy fallback was essentially always used (the cached value was redundant), so the simplification is safe.
  • Test quality: The three new tests cover the key scenarios well, especially the race scenario test that simulates the exact OTP lifecycle sequence (old process registers, replacement overwrites, old process calls unregister).

Issues Found

Important (Should Fix)

1. Error message regression in consumer.ex:124

File: packages/sync-service/lib/electric/shapes/consumer.ex:124

Issue: The old fetch_shape_by_handle! raised ArgumentError with the shape handle in the message. The new bare {:ok, shape} = produces a MatchError: no match of right hand side value: :error -- the shape handle is absent from the error message.

Impact: If a consumer races with shape deletion (shape removed between consumer spawn and handle_continue execution), the crash log is harder to diagnose. Crash reports in Sentry/telemetry aggregate on the error term, not the GenServer state.

Suggested fix: Explicitly handle the :error case, e.g.:

case ShapeCache.ShapeStatus.fetch_shape_by_handle(stack_id, shape_handle) do
  {:ok, shape} ->
    state = State.initialize_shape(state, shape, config)
    # ... continue init ...
  :error ->
    Logger.warning("Shape handle not found during consumer init, stopping",
      shape_handle: shape_handle)
    {:stop, :normal, state}
end

Or at minimum preserve a descriptive raise so the handle appears in crash logs.

2. Missing changeset entry for the ConsumerRegistry race fix

Issue: The 1.4.13 changelog entry (from the base branch) mentions the ANY/IN WHERE clause index fix, but the ConsumerRegistry race fix from this PR has no corresponding entry. When this branch reaches main, the fix will be invisible in the release history.

Suggested fix: Add a .changeset/*.md file documenting the fix so it appears in the next release CHANGELOG.

Suggestions (Nice to Have)

3. Issue #3864 code path #3 not addressed

The linked issue identified three affected code paths. This PR addresses paths #1 (stale PID blocking await_snapshot_start) and #2 (silent message loss via stale PIDs in publish/2). Code path #3 -- restore_shape_and_dependencies/4 trusting stale dead PIDs without a Process.alive? check -- is untouched. PR #3975 handles runtime dead-PID recovery, but it is worth noting in the PR description whether #3 is intentionally deferred.

4. Unrelated files inflate the diff

The diff includes GitHub Actions workflows (check-skills.yml, notify-intent.yml, validate-skills.yml), SKILL.md files, bin/intent.mjs, and package version bumps for react-hooks, experimental, y-electric, and typescript-client -- all from the base branch (refactor/static-logger-strings). A note in the PR description pointing reviewers to the sync-service changes would reduce review noise.

Issue Conformance

Issue #3864 is well-specified with root cause analysis, three affected code paths, and a proposed fix. This PR implements a strictly better version of the proposed fix (atomic match_delete vs. the TOCTOU-prone lookup+delete in the issue comment). Scope is appropriately narrow.

Previous Review Status

No previous review.


Review iteration: 1 | 2026-03-09

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants

0