As Q rises, our performance falls (sec/rel goes up). By the time we get to Q=200M, performance will be half of what it was at Q=20M.
That doesn't fully explain the performance difference, though, since we're only at 40M now (we started at 8M). It's possible that your current instance is sharing the machine with something that uses hyperthreads more; to my knowledge the "CPU time" that CADO records knows nothing about HT, so running 8 HT threads on a 4-core will "look" 75% slower but actually be 25% faster (percentages not verified). Finally, you may be on an older architecture. Compare the performance of the machine "9800" (a 9800 processor, current generation) to "Supercomputer" (Haswell i7-5820K), "TheMachine" (Ivy Bridge Xeon, running only a few more threads than cores), or "z600" (Sandy Bridge Xeon, running ~20 threads on 12 cores). My guess is your speed difference is 20% from larger Q values, 80% from different architecture / sharing the machine.
Workunits doubled in size around Q=20M. That's why all the "unlucky" badges are from people who began running before we changed WU size. |
[QUOTE=lukerichards;519251][code]
host               # of workunits  Relations             CPU-days  Last workunit submitted
instance-1         970             9236551 (2.8% total)  45.6      [COLOR="Red"]2019-06-05[/COLOR] 23:27:08,078
lukerichards-pre1  175             3180220 (0.9% total)  21.9      2019-06-13 08:18:30,440
[/code][/QUOTE]
The first hasn't reported anything in a week. |
[QUOTE=RichD;519266]The first hasn't reported anything in a week.[/QUOTE]
No, it isn't running any more. |
[QUOTE=RichD;519266]The first hasn't reported anything in a week.[/QUOTE]
He changed from an always-on instance to an at-will 24-hr-max instance. Much cheaper; we're speculating about why it's also slower. |
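For a rough sense of how much slower, here is a minimal sketch that turns the two rows of the quoted stats table into relations per CPU-day. It assumes instance-1 is the earlier always-on instance and lukerichards-pre1 is the newer one (the thread implies this but doesn't say so outright), and it takes the CPU-days column at face value, which may itself be skewed by hyperthreading. The names `hosts` and `rels_per_cpu_day` are made up for the illustration.

```python
# Figures copied from the stats table quoted earlier in the thread.
hosts = {
    "instance-1": {"relations": 9236551, "cpu_days": 45.6},
    "lukerichards-pre1": {"relations": 3180220, "cpu_days": 21.9},
}


def rels_per_cpu_day(relations, cpu_days):
    """Relations found per reported CPU-day (hypothetical helper)."""
    return relations / cpu_days


rates = {name: rels_per_cpu_day(**row) for name, row in hosts.items()}

# Ratio of the newer host's rate to the older host's rate.
ratio = rates["lukerichards-pre1"] / rates["instance-1"]

for name, rate in rates.items():
    print(f"{name}: {rate:,.0f} relations per CPU-day")
print(f"lukerichards-pre1 / instance-1 = {ratio:.2f}")
```

On these numbers the newer host reports a bit over 70% of the older host's relations per CPU-day; how much of that gap is real depends on how the CPU time was counted.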
Have tweaked some settings and using a clientid of lukerichards-pre1-test1 for a day or two. Will probably check in on Sunday to see how that has fared.
|
Does anyone know whether it is quicker to run individual instances rather than multiple threads (call it 8x with -t 1 instead of once with -t 8)? TIA
|
The siever is designed to be multi-threaded, and the documentation explains that CADO defaults to -t 2 because there is no penalty.
I think there is a small penalty for running, say, 20 threads rather than 2 or 4, and I think that penalty is related to the size of the factor base/size of the job. I wouldn't run 4-threaded sievers for a C110, and my guess is that on this C207 multiple 4- or 8-threaded instances would be a bit faster than a single 16+ threaded instance. Fivemack did some testing on a C193 in the CADO thread, but I didn't see clear evidence of perfect scaling to megathreads; then again, he's smarter than I am, so perhaps he's convinced and this job is fine up to 16 or 20 threads per job. It may be that they're the only ones making full use of HT and are also running older architecture, but I note that Vebis and birch4 are the slowest relations-per-cpuday on the stats, and both are running 16+ threads per instance. |
Another question: how do I run the client without having to keep the terminal window open? This is very important :wink:
|
From stats I’m also one of the slowest rels/sec running 8 threads per instance. Maybe I’ll try 6 or even stay with 4 since I suppose I’m reaching bandwidth limit. BTW, ETA will soon drop, stay tuned.
Edit: Not sure what happened, double posted. Upper one can be deleted. |
I use "screen" before the cado invocation, and then ctrl-a, ctrl-d to detach it from the terminal window.
8 threads on a HT 4-core machine is likely faster than 6 or 4; rel/CPU-sec appears slower, but rel/wall-clock-sec is faster. Seth detailed this early on: using HT and 8 threads means about a 20-25% reduction in wall clock time (while CPU-time appears to rise by 75% because the CADO timer just counts threads * wall clock, I think) |
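To make the CPU-time versus wall-clock point concrete, here is a minimal sketch of the arithmetic. It assumes the model guessed at above, that the reported CPU time is simply threads * wall clock; that is this thread's assumption, not confirmed CADO behaviour, and the function names are hypothetical.

```python
def apparent_cpu_time(threads, wall_clock):
    """Assumed model from the thread: reported CPU time = threads * wall clock."""
    return threads * wall_clock


def apparent_cpu_rise(base_threads, ht_threads, wall_reduction):
    """Fractional rise in reported CPU time when going from base_threads to
    ht_threads, given a fractional reduction in wall-clock time."""
    base_wall = 1.0  # normalise the baseline wall-clock time
    ht_wall = base_wall * (1.0 - wall_reduction)
    base_cpu = apparent_cpu_time(base_threads, base_wall)
    ht_cpu = apparent_cpu_time(ht_threads, ht_wall)
    return ht_cpu / base_cpu - 1.0


# 4 threads on 4 cores versus 8 HT threads on the same 4 cores.
for reduction in (0.20, 0.25):
    rise = apparent_cpu_rise(4, 8, reduction)
    print(f"wall clock -{reduction:.0%} -> reported CPU time +{rise:.0%}")
```

Under this model a 20-25% wall-clock reduction shows up as a 50-60% rise in reported CPU time; a 75% rise would correspond to only a 12.5% wall-clock gain. The percentages quoted in the thread are unverified, so treat them as rough figures. Either way the job finishes sooner even though rel/CPU-sec looks worse.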
[QUOTE=VBCurtis;519398]I use "screen" before the cado invocation, and then ctrl-a, ctrl-d to detach it from the terminal window.
8 threads on a HT 4-core machine is likely faster than 6 or 4; rel/CPU-sec appears slower, but rel/wall-clock-sec is faster. Seth detailed this early on: using HT and 8 threads means about a 20-25% reduction in wall clock time (while CPU-time appears to rise by 75% because the CADO timer just counts threads * wall clock, I think)[/QUOTE]
Would it be better for me to use wall clock? Or to guess at the number of cores? Or to remove rels/CPU-sec? Also 15%! |