-
Notifications
You must be signed in to change notification settings - Fork 406
Parallelize Git operations #518
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lib/async-queue.js
Outdated
} | ||
|
||
execute() { | ||
return this.fn.call(undefined).then(this.resolve, this.reject); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason to clobber this
with .call()
rather than just calling it directly with this.fn().then(...)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I specifically don't want the Task
to be the context in the function.
this.commandQueue.shift(); | ||
constructor(options = {}) { | ||
this.parallelism = options.parallelism || 1; | ||
this.nonParallelizableOperation = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR is a great exercise in spelling different forms of "parallelizable" 😂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or saying it out loud 😏 @kuychaco
lib/async-queue.js
Outdated
constructor(options = {}) { | ||
this.parallelism = options.parallelism || 1; | ||
this.nonParallelizableOperation = false; | ||
this.threadsInUse = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
threadsInUse
feels like a misleading name to me because there aren't really "threads" in play. How about tasksInProgress
?
lib/async-queue.js
Outdated
try { | ||
await task.execute(); | ||
} catch (err) { | ||
// nothing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this swallow any important errors... ? I'm assuming that we expect most task failures to manifest as Promise rejections 🤔
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Promise
inside the Task
— the one returned from getPromise
— will be rejected, but we don't want the task failure itself to affect the control flow of the queue.
lib/git-shell-out-strategy.js
Outdated
|
||
// Execute a command and read the output using the embedded Git environment | ||
exec(args, stdin = null, useGitPromptServer = false) { | ||
exec(args, stdin = null, useGitPromptServer = false, writeOperation = false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yah, I can see why you want to turn this into an options map. 👍
lib/git-shell-out-strategy.js
Outdated
constructor(workingDir) { | ||
this.workingDir = workingDir; | ||
this.commandQueue = new AsyncQueue(); | ||
this.commandQueue = new AsyncQueue({parallelism: 5}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, maybe inject as a package config option?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this is temporary. I plan on using the number of CPUs or other similar things to determine this number.
return stdout; | ||
return new Promise(resolve => { | ||
timingMarker.mark('nexttick'); | ||
setImmediate(() => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason to use setImmediate
vs. setTimeout(..., 0)
or process.nextTick
for this? I can never remember what the implications of those are.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had used nextTick
at first but switched to setImmediate
. I'll need to look into it to see what the difference is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use
setImmediate
if you want to queue the function behind whatever I/O event callbacks that are already in the event queue. Useprocess.nextTick
to effectively queue the function at the head of the event queue so that it executes immediately after the current function completes.So in a case where you're trying to break up a long running, CPU-bound job using recursion, you would now want to use
setImmediate
rather thanprocess.nextTick
to queue the next iteration as otherwise any I/O event callbacks wouldn't get the chance to run between iterations.
I'm not sure it matters a ton here but I'm going to stick with setImmediate
for now.
class Task { | ||
constructor(fn, parallel = true) { | ||
this.fn = fn; | ||
this.parallel = parallel; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about the terminology here: would it be clearer to invert this flag and have a Task be "serial" or "exclusive" rather than "parallel"? Or would that lead to a bunch of double negatives elsewhere.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Haha yeah @kuychaco and I debated a bit on this. I think she was right. We'll make it better.
test/async-queue.test.js
Outdated
} | ||
} | ||
|
||
describe.only('AsyncQueue', function() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔥 .only(
😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😡
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
aheadCount: null, | ||
behindCount: null, | ||
remoteName: await repository.getRemoteForBranch(this.branchName), | ||
remoteName: await dataPromises.remoteName, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 Neat.
|
||
// Give tests that rely on filesystem event delivery lots of breathing room. | ||
until.setDefaultTimeout(5000); | ||
until.setDefaultTimeout(3000); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm will this cause problems on AppVeyor?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I hope not. But it has to be lower than the mocha timeout or else test-until
can never timeout with an appropriate message because the whole test times out instead.
We can keep an eye on it and go from there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But it has to be lower than the mocha timeout or else test-until can never timeout with an appropriate message because the whole test times out instead.
Ohhh that's what's been happening. Good catch.
*/ | ||
async getStatusesForChangedFiles() { | ||
const output = await this.exec(['status', '--untracked-files=all', '-z']); | ||
const output = await this.exec(['status', '--untracked-files=all', '-z'], {writeOperation: true}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lots of these operations seem to be writeOperation: true
unnecessarily. Is there some context here I'm missing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces a new
AsyncQueue
that has the ability to parallelize tasks. These parallelized tasks won't block other operations at the queue level, though of course JS semantics may still cause the underlying operations to block one another.The queue also has the ability to add tasks that are non-parallelizable; currently, this is used to ensure that writes always execute alone, waiting for the queue to drain before they run, and running to completion before normal parallel queue semantics are resumed.
Additionally, a new segment in the timings view has been added called
nexttick
; it is colored yellow and shows the time afterprepare
has run but before the actual call toexec
runs. In particular, this can help show time growing in a stepwise fashion waiting for the JS event loop to finish with the previous task invocation before the next one can start, indicating we can save some time by optimizing this or making it fully async:TODO:
exec
args to an option object