8000 Avoid processing a single commit multiple times by smortex · Pull Request #1058 · github-changelog-generator/github-changelog-generator · GitHub
[go: up one dir, main page]

Skip to content

Avoid processing a single commit multiple times #1058

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

smortex
Copy link
Contributor
@smortex smortex commented May 31, 2025

When listing the commits that appear in a tag, some commits are added multiple time to the resulting Set. As a Set cannot contain duplicate, this is only a waste of computing resources, but with large repositories with a lot of branches and merges, this can require a tremendous amount of time.

To illustrate the issue, consider the following git history with 5 commits:

A---B---C---D
     \ /
      E

Assuming A is the root commit, and we call commits_in_tag(D). Let's inspect the value of shas and queue at the beginning of the loop, and what is pushed to these variables on each iteration:

  1. shas = {}; queue = [D] #=> shas << D; queue << C
  2. shas = {D}; queue = [C] #=> shas << C; queue << [B, E]
  3. shas = {D, C}; queue = [B, E] #=> shas << B; queue << A
  4. shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B
  5. shas = {D, C, B, E}; queue = [A, B] #=> shas << A
  6. shas = {D, C, B, E, A}; queue = [B] #=> shas << B; queue << A
  7. shas = {D, C, B, E, A}; queue = [A] #=> shas << A
  8. shas = {D, C, B, E, A}; queue = [] #=> exit loop

As you can see above, 6 and 7 attempt to add B and A to the set, but they where already added at step 3 and 5, so these action are no-op.

This change checks if the current commit is already present in shas, and if so skip to the next queued commit. The above example becomes:

  1. shas = {}; queue = [D] #=> shas << D; queue << C
  2. shas = {D}; queue = [C] #=> shas << C; queue << [B, E]
  3. shas = {D, C}; queue = [B, E] #=> shas << B; queue << A
  4. shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B
  5. shas = {D, C, B, E}; queue = [A, B] #=> shas << A
  6. shas = {D, C, B, E, A}; queue = [B] #=> next
  7. shas = {D, C, B, E, A}; queue = [] #=> exit loop

While we where previously unable to generate the ChangeLog of one of our projects in a reasonable amount of time (Ctrl+C after 15+ minutes of waiting at 100% CPU), this change allows us to generate it in ~45s on my laptop.

@smortex
Copy link
Contributor Author
smortex commented May 31, 2025

While we where previously unable to generate the ChangeLog of one of our projects in a reasonable amount of time (Ctrl+C after 15+ minutes of waiting at 100% CPU), this change allows us to generate it in ~45s on my laptop.

The repo in question: https://github.com/OpenVoxProject/openvox-agent

bundle exec bin/github_changelog_generator --user OpenVoxProject --project openvox-agent

When listing the commits that appear in a tag, some commits are added
multiple time to the resulting Set.  As a Set cannot contain duplicate,
this is only a waste of computing resources, but with large repositories
with a lot of branches and merges, this can require a tremendous amount
of time.

To illustrate the issue, consider the following git history with 5
commits:

```
A---B---C---D
     \ /
      E
```

Assuming A is the root commit, and we call `commits_in_tag(D)`.  Let's
inspect the value of `shas` and `queue` at the beginning of the loop,
and what is pushed to these variables on each iteration:

1. `shas = {}; queue = [D] #=> shas << D; queue << C`
2. `shas = {D}; queue = [C] #=> shas << C; queue << [B, E]`
3. `shas = {D, C}; queue = [B, E] #=> shas << B; queue << A`
4. `shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B`
5. `shas = {D, C, B, E}; queue = [A, B] #=> shas << A`
6. `shas = {D, C, B, E, A}; queue = [B] #=> shas << B; queue << A`
7. `shas = {D, C, B, E, A}; queue = [A] #=> shas << A`
8. `shas = {D, C, B, E, A}; queue = [] #=> exit loop`

As you can see above, 6 and 7 attempt to add B and A to the set, but
they where already added at step 3 and 5, so these action are no-op.

This change checks if the current commit is already present in `shas`,
and if so skip to the next queued commit.  The above example becomes:

1. `shas = {}; queue = [D] #=> shas << D; queue << C`
2. `shas = {D}; queue = [C] #=> shas << C; queue << [B, E]`
3. `shas = {D, C}; queue = [B, E] #=> shas << B; queue << A`
4. `shas = {D, C, B}; queue = [E, A] #=> shas << E; queue << B`
5. `shas = {D, C, B, E}; queue = [A, B] #=> shas << A`
6. `shas = {D, C, B, E, A}; queue = [B] #=> next`
7. `shas = {D, C, B, E, A}; queue = [] #=> exit loop`

While we where previously unable to generate the ChangeLog of one of our
projects in a reasonable amount of time (Ctrl+C after 15+ minutes of
waiting at 100% CPU), this change allows us to generate it in ~45s on my
laptop.
@smortex smortex force-pushed the avoid-processing-a-single-commit-multiple-time branch from 4fa93b7 to a0344af Compare May 31, 2025 07:19
smortex added a commit to OpenVoxProject/puppet that referenced this pull request Jun 3, 2025
For the record, the changelog was generated with [this commit of
github-changelog-generator](github-changelog-generator/github-changelog-generator#1058).
While not stricktly necessary, it happens that other non-released merged
PR where not added to the ChangeLog with the latest release.
smortex added a commit to OpenVoxProject/puppet that referenced this pull request Jun 3, 2025
For the record, the changelog was generated with [this commit of
github-changelog-generator](github-changelog-generator/github-changelog-generator#1058).
It happens that other non-released merged PR where not added to the
ChangeLog with the latest release.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants
0