Windows: do not use parallelism on container restart #39733
Conversation
@thaJeztah @ddebroy PTAL. It would be nice to get this into the cherry-pick process, as we have seen Windows container hosts freezing on reboot from time to time, and the only fix is to manually hard-reset them. AFAIU it happens because they are not able to handle kernel messages; the last message in the event logs varies.
Force-pushed from 419a5fe to f4d41a3
SGTM (don't know if 1 is the right value to use, but makes sense to me from the description)
ping @jhowardmsft @ddebroy @vikramhh ptal
I actually also tested with the number of CPUs, but the result was that restore speed is very close to the same as with 1, while the server is more "stuck" during the restore. One issue with this is that Line 504 in 9b25c0f actually finishes while some processes inside the container are still starting, so the optimal solution would probably be to modify the code to keep track of how many containers have not completely started yet, but that would probably also need changes on the hcsshim side (=> not sure, I have not investigated). That is why I ended up proposing this kind of simple solution, which hopefully is good enough for most of the use cases.
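(For reference, the restore path quoted later in this review bounds parallel startup with a weighted semaphore from golang.org/x/sync. Below is a minimal, self-contained sketch of that pattern; restoreContainer and the container IDs are hypothetical placeholders, not the real daemon code.)

package main

import (
	"context"
	"fmt"
	"sync"

	"golang.org/x/sync/semaphore"
)

// restoreContainer is a hypothetical stand-in for the real per-container restore work.
func restoreContainer(id string) {
	fmt.Println("restoring", id)
}

func main() {
	containers := []string{"web-1", "web-2", "db-1", "db-2"}

	// With parallelLimit = 1 the restores run strictly one at a time,
	// which is the behaviour originally proposed in this PR for Windows.
	parallelLimit := 1
	sem := semaphore.NewWeighted(int64(parallelLimit))

	var group sync.WaitGroup
	for _, id := range containers {
		group.Add(1)
		go func(id string) {
			defer group.Done()
			// Acquire blocks until a slot is free; this is what caps restore parallelism.
			if err := sem.Acquire(context.Background(), 1); err != nil {
				return
			}
			defer sem.Release(1)
			restoreContainer(id)
		}(id)
	}
	group.Wait()
}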
Is it possible to make this a configurable option for the engine rather than hardcoding it? That way, depending on the workloads and environment, administrators can tune the value to exactly what they want. It can default to 1 if that is desirable.
@ddebroy sure, it is possible, but I was wondering why it has not been implemented earlier? 🤔 Also, we would then need to decide on a name for that option. Any suggestions?
Maybe something like
@ddebroy if we want to make it configurable, I'd keep that separate. Would an administrator know what value to put there? I think we should have more metrics as to what limit would be realistic (on a beefy machine). If (even on a powerful machine) (basically, I'd like to avoid adding bells and whistles that nobody uses)
Also, forgot to write that, but a configurable option would be a new feature, and not something we would backport to current releases; it looks like this patch may be addressing an issue that was introduced in 19.03, so it could be a candidate for backporting.
I repeated the same 100-container restore test on a couple of different combinations:

4 CPU - default limit
time="2019-08-14T11:22:14.260679800+03:00" level=info msg="Starting up"
...
time="2019-08-14T11:23:11.326345700+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T11:23:11.326345700+03:00") - (get-date "2019-08-14T11:22:14.260679800+03:00")
TotalSeconds : 57,0656659

4 CPU - limit to 1
time="2019-08-14T11:31:09.337321500+03:00" level=info msg="Starting up"
...
time="2019-08-14T11:32:09.525747200+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T11:32:09.525747200+03:00") - (get-date "2019-08-14T11:31:09.337321500+03:00")
TotalSeconds : 60,1884257

8 CPU - default limit
time="2019-08-14T11:12:23.265451500+03:00" level=info msg="Starting up"
...
time="2019-08-14T11:12:56.014299000+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T11:12:56.014299000+03:00") - (get-date "2019-08-14T11:12:23.265451500+03:00")
TotalSeconds : 32,7488475

8 CPU - limit to 1
time="2019-08-14T11:16:39.268269800+03:00" level=info msg="Starting up"
...
time="2019-08-14T11:17:44.633734300+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T11:17:44.633734300+03:00") - (get-date "2019-08-14T11:16:39.268269800+03:00")
TotalSeconds : 65,3654645

36 CPU - default limit
time="2019-08-14T10:41:41.025922400+03:00" level=info msg="Starting up"
...
time="2019-08-14T10:42:15.217742700+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T10:42:15.217742700+03:00") - (get-date "2019-08-14T10:41:41.025922400+03:00")
TotalSeconds : 34,1918203

36 CPU - limit to 1
time="2019-08-14T10:57:17.434044600+03:00" level=info msg="Starting up"
...
time="2019-08-14T10:58:27.619699700+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T10:58:27.619699700+03:00") - (get-date "2019-08-14T10:57:17.434044600+03:00")
TotalSeconds : 70,1856551

So it looks like those bigger machines can actually handle a bigger limit, but I can try to see whether I am able to come up with some nice rule for it. It is also interesting that even with 4 CPUs these tests were now a lot faster after I switched to another environment (maybe because of faster hard drives, etc...).
Force-pushed from f4d41a3 to 64a5ba8
@thaJeztah @ddebroy did more testing and ended up changing the logic so that it now uses the value
Logs of those startup test comparisons are below. As you can see, the more CPUs the server has, the less this change affects startup speed, and we still avoid the hanging that I am trying to fix here.

4 CPU - default limit
time="2019-08-14T14:49:42.549725900+03:00" level=info msg="Starting up"
...
time="2019-08-14T14:50:50.048546700+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T14:50:50.048546700+03:00") - (get-date "2019-08-14T14:49:42.549725900+03:00")
TotalSeconds : 67,4988208

4 CPU - limit to 1
time="2019-08-14T14:25:43.905063900+03:00" level=info msg="Starting up"
...
time="2019-08-14T14:27:14.485654600+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T14:27:14.485654600+03:00") - (get-date "2019-08-14T14:25:43.905063900+03:00")
TotalSeconds : 90,5805907

5 CPU - default limit
time="2019-08-14T12:22:17.792513300+03:00" level=info msg="Starting up"
...
time="2019-08-14T12:22:53.407527000+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T12:22:53.407527000+03:00") - (get-date "2019-08-14T12:22:17.792513300+03:00")
TotalSeconds : 35,6150137

5 CPU - limit to 5
time="2019-08-14T15:20:23.314785800+03:00" level=info msg="Starting up"
...
time="2019-08-14T15:21:22.962476000+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T15:21:22.962476000+03:00") - (get-date "2019-08-14T15:20:23.314785800+03:00")
TotalSeconds : 59,6476902

6 CPU - default limit
time="2019-08-14T13:06:55.115185700+03:00" level=info msg="Starting up"
...
time="2019-08-14T13:07:40.747497400+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T13:07:40.747497400+03:00") - (get-date "2019-08-14T13:06:55.115185700+03:00")
TotalSeconds : 45,6323117

6 CPU - limit to 6
time="2019-08-14T16:15:29.737757900+03:00" level=info msg="Starting up"
...
time="2019-08-14T16:16:21.564104300+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T16:16:21.564104300+03:00") - (get-date "2019-08-14T16:15:29.737757900+03:00")
TotalSeconds : 51,8263464

7 CPU - default limit
time="2019-08-14T14:13:20.532105100+03:00" level=info msg="Starting up"
...
time="2019-08-14T14:14:05.378632900+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T14:14:05.378632900+03:00") - (get-date "2019-08-14T14:13:20.532105100+03:00")
TotalSeconds : 44,8465278

7 CPU - limit to 7
time="2019-08-14T16:00:29.633611200+03:00" level=info msg="Starting up"
...
time="2019-08-14T16:01:17.411028500+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T16:01:17.411028500+03:00") - (get-date "2019-08-14T16:00:29.633611200+03:00")
TotalSeconds : 47,7774173

16 CPU - default limit
time="2019-08-14T14:43:18.720012600+03:00" level=info msg="Starting up"
...
time="2019-08-14T14:44:00.060481100+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T14:44:00.060481100+03:00") - (get-date "2019-08-14T14:43:18.720012600+03:00")
TotalSeconds : 41,3404685

16 CPU - limit to 16
time="2019-08-14T14:31:52.016665600+03:00" level=info msg="Starting up"
...
time="2019-08-14T14:32:38.578530700+03:00" level=info msg="API listen on //./pipe/docker_engine"
(get-date "2019-08-14T14:32:38.578530700+03:00") - (get-date "2019-08-14T14:31:52.016665600+03:00")
TotalSeconds : 46,5618651
ping @jhowardmsft @ddebroy @vikramhh ptal
@olljanat - thanks for sharing the extensive comparison data above. #38301 talks about container numbers that are orders of magnitude more than what you used [20K vs. 100]. Any way to be sure that 100 is indeed representative of typical workloads and not "too small"? IIUC the main reason for this change is to mitigate the negative impact on User Experience. For Windows Server SKUs, that would typically not be an issue. It could be possible to scope this change to desktop only, but at least based on the data you have shared, it looks like it would currently make less than a 10% difference. Are there scenarios where we would care about that perf regression? LGTM otherwise
That is a good question, and I definitely would like to know if someone from Microsoft has tested where those limits actually are. We are using ~20 containers per node and, based on my experience, I would not recommend anyone use more than 50 containers per node. So 100 is about double that "limit".
That is not true. User Experience is just the easiest way to demonstrate what happens. The real issue is that under that load the server is not able to handle other processes (e.g. monitoring shows that the server is down, VMware says that guest tools are not running, RDP connections to the server do not work, etc...). We have seen that happening on production server(s) every week: after scheduled reboots they get frozen and the only way to fix them is to power off + power on (randomly on some of the servers). Btw, I did some more testing with the Moby master version on a 36-core machine. When I restart 200 containers, it actually completes in two minutes and CPU usage does not exceed 40%. Then for the next two minutes the server just idles. But then this happens: So this PR will definitely not fix the whole issue; it is more about making the situation a little less bad.
In general I would say that
@jterry75 all those tests were done with Windows Server 2019. I will try with that number and update the PR.
Force-pushed from 64a5ba8 to 2d72c4e
LGTM
Force-pushed from 2d72c4e to b4ad1a5
Updated based on @jterry75's suggestion. Here is also a calculation of the parallelism value for different CPU counts:

// Requires the "fmt" and "math" imports.
for i := 1; i <= 64; i++ {
	fmt.Printf("CPU count: %v Parallelism: %v\n", i, int(math.Max(1, math.Floor(float64(i)*.8))))
}
CPU count: 1 Parallelism: 1
CPU count: 2 Parallelism: 1
CPU count: 3 Parallelism: 2
CPU count: 4 Parallelism: 3
CPU count: 5 Parallelism: 4
CPU count: 6 Parallelism: 4
CPU count: 7 Parallelism: 5
CPU count: 8 Parallelism: 6
CPU count: 9 Parallelism: 7
CPU count: 10 Parallelism: 8
CPU count: 11 Parallelism: 8
CPU count: 12 Parallelism: 9
CPU count: 13 Parallelism: 10
CPU count: 14 Parallelism: 11
CPU count: 15 Parallelism: 12
CPU count: 16 Parallelism: 12
CPU count: 17 Parallelism: 13
CPU count: 18 Parallelism: 14
CPU count: 19 Parallelism: 15
CPU count: 20 Parallelism: 16
CPU count: 21 Parallelism: 16
CPU count: 22 Parallelism: 17
CPU count: 23 Parallelism: 18
CPU count: 24 Parallelism: 19
CPU count: 25 Parallelism: 20
CPU count: 26 Parallelism: 20
CPU count: 27 Parallelism: 21
CPU count: 28 Parallelism: 22
CPU count: 29 Parallelism: 23
CPU count: 30 Parallelism: 24
CPU count: 31 Parallelism: 24
CPU count: 32 Parallelism: 25
CPU count: 33 Parallelism: 26
CPU count: 34 Parallelism: 27
CPU count: 35 Parallelism: 28
CPU count: 36 Parallelism: 28
CPU count: 37 Parallelism: 29
CPU count: 38 Parallelism: 30
CPU count: 39 Parallelism: 31
CPU count: 40 Parallelism: 32
CPU count: 41 Parallelism: 32
CPU count: 42 Parallelism: 33
CPU count: 43 Parallelism: 34
CPU count: 44 Parallelism: 35
CPU count: 45 Parallelism: 36
CPU count: 46 Parallelism: 36
CPU count: 47 Parallelism: 37
CPU count: 48 Parallelism: 38
CPU count: 49 Parallelism: 39
CPU count: 50 Parallelism: 40
CPU count: 51 Parallelism: 40
CPU count: 52 Parallelism: 41
CPU count: 53 Parallelism: 42
CPU count: 54 Parallelism: 43
CPU count: 55 Parallelism: 44
CPU count: 56 Parallelism: 44
CPU count: 57 Parallelism: 45
CPU count: 58 Parallelism: 46
CPU count: 59 Parallelism: 47
CPU count: 60 Parallelism: 48
CPU count: 61 Parallelism: 48
CPU count: 62 Parallelism: 49
CPU count: 63 Parallelism: 50
CPU count: 64 Parallelism: 51
ping @olljanat unit test failed
Hmm. Looks like RS1 failed because of a build server issue. Is there some way to re-run just that one?
Ah, yes, the new Jenkins does not (yet) have an option to trigger using labels 😞 (also, it's all or nothing, so we can't re-run a single stage because Jenkins declarative pipelines don't support that). I see tests passed on RS1, so it should be OK to ignore; everything else is green.
can you re-run it?
daemon/daemon_windows.go
Outdated
We should change this function to return an int64 (looks like we always convert the value);
Lines 223 to 227 in 32688a4
parallelLimit := adjustParallelLimit(len(dir), 128*runtime.NumCPU())
// Re-used for all parallel startup jobs.
var group sync.WaitGroup
sem := semaphore.NewWeighted(int64(parallelLimit))
Lines 576 to 579 in 32688a4
parallelLimit := adjustParallelLimit(len(daemon.List()), 128*runtime.NumCPU())
var group sync.WaitGroup
sem := semaphore.NewWeighted(int64(parallelLimit))
(doesn't have to be done in this PR, but just noticed it)
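(A rough sketch of what that follow-up could look like; the clamping rule used here is an assumption for illustration, not the exact upstream logic.)

package main

import (
	"fmt"
	"runtime"
)

// adjustParallelLimit sketched with the suggested int64 return type, so both
// call sites could pass the result straight to semaphore.NewWeighted without
// the int64() conversion. The clamping rule (never more slots than jobs,
// never fewer than one) is assumed for illustration only.
func adjustParallelLimit(numJobs, limit int) int64 {
	if limit > numJobs {
		limit = numJobs
	}
	if limit < 1 {
		limit = 1
	}
	return int64(limit)
}

func main() {
	fmt.Println(adjustParallelLimit(10, 128*runtime.NumCPU()))
}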
I suggest that this be done in another PR
@coolljt0725 I can re-run it from Jenkins, but (see my comment above) it actually passed. Also left a comment/question on the code change (so let me wait for @olljanat to reply first) 🤗
Force-pushed from f976dfe to 464876c
LGTM, thanks!
oh! I think it's missing an import;
Force-pushed from 464876c to 7a9ab8d
😵 ... fixed
@thaJeztah ping, is this OK now?
LGTM
Force-pushed from 7a9ab8d to a0b9ba0
Signed-off-by: Olli Janatuinen <olli.janatuinen@gmail.com>
Force-pushed from a0b9ba0 to 447a840
Fixed typo and rebased. It now also passes the tests.
ping @thaJeztah @tonistiigi @jterry75 PTAL |
This LGTM (not a maintainer)
let's merge 👍 |
@thaJeztah nice, and can we get this one into the cherry-pick process too?
opened docker-archive#423
- What I did
#38301 set the container restart/restore task parallelism limit to 128*NumCPU, which is a good limit for Linux containers, especially when they are built correctly by following the one-process-per-container rule.
However, Windows containers are much heavier; for example, the Windows Server 2019 base image
mcr.microsoft.com/windows/servercore:ltsc2019
itself includes ~20 system processes, which causes restoring to generate such a high load on the server that it cannot respond to anything else until the restore is completed.
- How I did it
Disabled restore parallelism on the Windows platform.
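For illustration, here is a minimal sketch of the CPU-based limit that the review above converged on (the helper name windowsRestoreParallelism is hypothetical; the 0.8 factor comes from the calculation posted earlier in this conversation):

package main

import (
	"fmt"
	"math"
	"runtime"
)

// windowsRestoreParallelism is a hypothetical helper showing the rule discussed
// in this review: use roughly 80% of the CPUs, but never less than 1, instead
// of the Linux default of 128*NumCPU.
func windowsRestoreParallelism() int {
	return int(math.Max(1, math.Floor(float64(runtime.NumCPU())*0.8)))
}

func main() {
	fmt.Println("restore parallelism limit:", windowsRestoreParallelism())
}

With this rule, single- and dual-CPU machines fall back to a limit of 1, while larger hosts keep roughly 20% of their CPUs free for other work during restore.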
- How to verify it
I created 100 containers with restart policy:
On my 4 CPU test machine, they take about 8 minutes to restart both with and without this change.
However, there is a big difference in how the server is able to respond to other commands.
Without this change, CPU load is constantly at 100% and even typing text into Notepad takes a long time:

After this change, the server still uses all of the CPU it has, but it still responds to user input.

- A picture of a cute animal (not mandatory but encouraged)
