8000 Parallelize Git operations by BinaryMuse · Pull Request #518 · atom/github · GitHub
[go: up one dir, main page]

Skip to content
This repository was archived by the owner on Dec 15, 2022. It is now read-only.

Conversation

BinaryMuse
Copy link
Contributor
@BinaryMuse BinaryMuse commented Feb 15, 2017

This PR introduces a new AsyncQueue that has the ability to parallelize tasks. These parallelized tasks won't block other operations at the queue level, though of course JS semantics may still cause the underlying operations to block one another.

The queue also has the ability to add tasks that are non-parallelizable; currently, this is used to ensure that writes always execute alone, waiting for the queue to drain before they run, and running to completion before normal parallel queue semantics are resumed.

Additionally, a new segment in the timings view has been added called nexttick; it is colored yellow and shows the time after prepare has run but before the actual call to exec runs. In particular, this can help show time growing in a stepwise fashion waiting for the JS event loop to finish with the previous task invocation before the next one can start, indicating we can save some time by optimizing this or making it fully async:

github_package_timings_view


TODO:

  • Variable names
  • Convert exec args to an option object

@BinaryMuse BinaryMuse requested review from smashwilson and removed request for smashwilson February 15, 2017 08:08
Copy link
Contributor
@smashwilson smashwilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall summary:

mind blown

}

execute() {
return this.fn.call(undefined).then(this.resolve, this.reject);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason to clobber this with .call() rather than just calling it directly with this.fn().then(...)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I specifically don't want the Task to be the context in the function.

this.commandQueue.shift();
constructor(options = {}) {
this.parallelism = options.parallelism || 1;
this.nonParallelizableOperation = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR is a great exercise in spelling different forms of "parallelizable" 😂

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or saying it out loud 😏 @kuychaco

constructor(options = {}) {
this.parallelism = options.parallelism || 1;
this.nonParallelizableOperation = false;
this.threadsInUse = 0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

threadsInUse feels like a misleading name to me because there aren't really "threads" in play. How about tasksInProgress?

try {
await task.execute();
} catch (err) {
// nothing
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this swallow any important errors... ? I'm assuming that we expect most task failures to manifest as Promise rejections 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Promise inside the Task — the one returned from getPromise — will be rejected, but we don't want the task failure itself to affect the control flow of the queue.


// Execute a command and read the output using the embedded Git environment
exec(args, stdin = null, useGitPromptServer = false) {
exec(args, stdin = null, useGitPromptServer = false, writeOperation = false) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yah, I can see why you want to turn this into an options map. 👍

constructor(workingDir) {
this.workingDir = workingDir;
this.commandQueue = new AsyncQueue();
this.commandQueue = new AsyncQueue({parallelism: 5});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, maybe inject as a package config option?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this is temporary. I plan on using the number of CPUs or other similar things to determine this number.

return stdout;
return new Promise(resolve => {
timingMarker.mark('nexttick');
setImmediate(() => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a specific reason to use setImmediate vs. setTimeout(..., 0) or process.nextTick for this? I can never remember what the implications of those are.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had used nextTick at first but switched to setImmediate. I'll need to look into it to see what the difference is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use setImmediate if you want to queue the function behind whatever I/O event callbacks that are already in the event queue. Use process.nextTick to effectively queue the function at the head of the event queue so that it executes immediately after the current function completes.

So in a case where you're trying to break up a long running, CPU-bound job using recursion, you would now want to use setImmediate rather than process.nextTick to queue the next iteration as otherwise any I/O event callbacks wouldn't get the chance to run between iterations.

I'm not sure it matters a ton here but I'm going to stick with setImmediate for now.

class Task {
constructor(fn, parallel = true) {
this.fn = fn;
this.parallel = parallel;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about the terminology here: would it be clearer to invert this flag and have a Task be "serial" or "exclusive" rather than "parallel"? Or would that lead to a bunch of double negatives elsewhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha yeah @kuychaco and I debated a bit on this. I think she was right. We'll make it better.

}
}

describe.only('AsyncQueue', function() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔥 .only( 😉

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😡

Copy link
Contributor
@smashwilson smashwilson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

aheadCount: null,
behindCount: null,
remoteName: await repository.getRemoteForBranch(this.branchName),
remoteName: await dataPromises.remoteName,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 Neat.


// Give tests that rely on filesystem event delivery lots of breathing room.
until.setDefaultTimeout(5000);
until.setDefaultTimeout(3000);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm will this cause problems on AppVeyor?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hope not. But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

We can keep an eye on it and go from there.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.

Ohhh that's what's been happening. Good catch.

@BinaryMuse BinaryMuse merged commit 55fc31f into master Feb 15, 2017
@BinaryMuse BinaryMuse deleted the mkt-ku-parallelize-git-ops branch February 15, 2017 21:21
*/
async getStatusesForChangedFiles() {
const output = await this.exec(['status', '--untracked-files=all', '-z']);
const output = await this.exec(['status', '--untracked-files=all', '-z'], {writeOperation: true});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lots of these operations seem to be writeOperation: true unnecessarily. Is there some context here I'm missing?

Copy link
Contributor Author
@BinaryMuse BinaryMuse Feb 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@shiftkey These are all status calls and can create index.locks (as far as I understand). We're converting some of them to nodegit calls in #524

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0