#12 |
"Oliver"
Sep 2017
Porta Westfalica, DE
2·3·181 Posts |
Yes, VBITS is set in the make command of msieve. MPI may sometimes help with data locality, but it might also induce overhead, so it can go either way. Yes, octa-channel is a hardware feature; whether you can use it depends on whether this is your own machine. If you own it, you can install eight DIMMs in the appropriate slots, and this will enable octa-channel memory.
What do you mean by msieve sleeping a lot? Going with your estimation of 92.5 % efficiency, I would say that this is normal for more than 16 threads. |
#13 |
Aug 2020
79*6581e-4;3*2539e-3
3×193 Posts |
OK, having looked at old logs, I can say this is below-average speed but nothing strange. A 15M matrix took 140 hours, a 13M matrix 130 hours. So the 800 h is fine; as I said, the 7401P is not that fast. As long as the matrix size is OK, I don't think any further optimization makes sense.
"Sleeping a lot" as in every few seconds they all go to "S" and then switch back. The htop average is 18.5 of 20 cores utilized. |
#15 |
"Curtis"
Feb 2005
Riverside, CA
3·1,789 Posts |
32M is close enough to my guess of 30M to consider the matrix a typical size / normal for this size of job.
800 hours sounds high; I bet if you tried the list of ideas in the last few posts you'd find 10-30% more speed. The good news is that when you do find the time to test ideas, the fastest invocation will be fastest on all future jobs. For (poor) comparison: I'm running my first job on a new 5950X, 16 cores with 2-channel RAM. A 35M matrix is taking 420 hr, so I would do a 32M matrix in around 360 hr on an otherwise-idle CPU. My cores are much faster than that EPYC's, but I would still expect the EPYC to do better than the clock-speed comparison suggests, thanks to all its memory bandwidth. |
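The 35M-to-32M estimate above can be reproduced with a rough scaling rule. The sketch below assumes linear algebra time grows with the square of the matrix dimension on the same machine, which is an assumption rather than something stated in the thread; the function name `scale_la_hours` is made up for illustration.

```python
def scale_la_hours(known_hours, known_size_m, target_size_m, exponent=2.0):
    """Estimate LA time for another matrix size on the same machine.

    Assumption: runtime is proportional to size**exponent, with
    exponent=2 as a rough guess (iteration count and per-iteration
    cost both grow about linearly with the matrix dimension).
    """
    ratio = target_size_m / known_size_m
    return known_hours * ratio ** exponent


# Figures from the post: a 35M matrix at 420 hr, scaled down to 32M.
estimate = scale_la_hours(420, 35, 32)
print(f"Estimated hours for a 32M matrix: {estimate:.0f}")
```

With the quadratic assumption the estimate comes out a little below the "around 360 hr" quoted above, so it is only a ballpark. It does not account for thread count, memory bandwidth, or matrix density.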
#16 |
"Carlos Pinho"
Oct 2011
Milton Keynes, UK
2⁶·79 Posts |
I recall msieve not fully utilising a thread or a core; maybe the "sleeping" comes from this. Some threads/cores are waiting for others to finish.
|
#17 | |
Aug 2020
79*6581e-4;3*2539e-3
3·193 Posts |
![]() Quote:
Regarding HT, if the threads aren't fully utilizing the physical cores, wouldn't that be an ideal situation for HT? If I try something, can I just close msieve with Ctrl+C and restart with -ncr? Will I lose everything not checkpointed, i.e. would it make sense to wait for a checkpoint? |
|
#18 |
"Curtis"
Feb 2005
Riverside, CA
3×1,789 Posts |
The more threads you assign to msieve, the more waiting there is. Break a task into 20 pieces instead of 10, and everyone has to wait for the slowest slice to finish before anyone can proceed; each piece is half as big, but there is a lot more waiting. One experiments to find how to reduce this waiting (such as assigning cores with taskset). Hyperthreads aren't always bad for msieve, just usually; they introduce more variability into how long each slice takes, so more waiting happens.
Yes to using -ncr. You don't lose work when closing via ctrl-c; msieve writes a checkpoint upon closure. |
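The waiting effect described above can be illustrated with a toy simulation. This is not a model of msieve's actual scheduling: it only assumes that work is split evenly, that each slice takes a randomly varying amount of time, and that all workers wait at a barrier for the slowest one. The function name and the `jitter` parameter are made up for illustration.

```python
import random


def barrier_efficiency(workers, rounds=10_000, jitter=0.2, seed=1):
    """Average fraction of time workers spend computing rather than waiting.

    Toy model: each round splits one unit of work evenly, and every slice
    takes (1/workers) * (1 + u), with u drawn uniformly from [0, jitter].
    All workers then wait at a barrier for the slowest slice.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(rounds):
        slices = [(1.0 / workers) * (1.0 + rng.uniform(0.0, jitter))
                  for _ in range(workers)]
        round_time = max(slices)   # barrier: everyone waits for the slowest
        busy = sum(slices)         # time actually spent computing
        total += busy / (workers * round_time)
    return total / rounds


for n in (1, 10, 20):
    print(n, "workers:", round(barrier_efficiency(n), 3))
```

In this model a larger `jitter` stands in for the extra variability that hyperthreads introduce: the more the slice times spread out, the longer everyone waits for the slowest slice.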
#19 |
Jul 2003
So Cal
3×811 Posts |
Note, though, that you can't restart if you change VBITS. You would have to start the linear algebra from the beginning.
|
#20 |
Aug 2020
79*6581e-4;3*2539e-3
3·193 Posts |
Thanks, that's what I figured, so I didn't do that but just assigned cores. ETA went down for a while and now it's up again. I also tried assigning the range of physical cores, but that didn't do anything noticeable.
But actually, I'm not sure there's really that much to gain. Even with 24 cores vs. 10 cores, the 7401P is still 10-15% slower than an i9-10900K, and I'm only using 20/24. And a recent LA by swellman with 8 threads took 400+ hours for a 17M matrix. That would correspond to nearly 1600 hours for 31M, if I'm not mistaken. I can offer the matrix for benchmarking later on, if someone's interested. |
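For anyone repeating the core-assignment experiment, one way to find "one logical CPU per physical core" on Linux is to read each CPU's `thread_siblings_list` file under `/sys/devices/system/cpu/cpuN/topology/`. The sketch below only does the parsing step on strings in that format; the helper names and the sample topology are hypothetical, and the numbering on a real 7401P may look different.

```python
def parse_cpu_list(text):
    """Parse a Linux CPU list such as "0,24" or "0-1" into a sorted list."""
    cpus = set()
    for part in text.strip().split(","):
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        elif part:
            cpus.add(int(part))
    return sorted(cpus)


def one_cpu_per_core(siblings):
    """Pick the lowest-numbered logical CPU of every physical core.

    `siblings` maps a logical CPU number to the contents of its
    thread_siblings_list file.
    """
    chosen = set()
    for cpu in sorted(siblings):
        chosen.add(parse_cpu_list(siblings[cpu])[0])
    return sorted(chosen)


# Hypothetical 2-core, 4-thread layout, for illustration only.
sample = {0: "0,2", 1: "1,3", 2: "0,2", 3: "1,3"}
print(",".join(str(c) for c in one_cpu_per_core(sample)))
```

The comma-separated result is in the form `taskset -c` accepts as a CPU list, and the same set could be passed to Python's `os.sched_setaffinity` on Linux.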
#21 | |
Jun 2012
3625₁₀ Posts |
![]() Quote:
Not sure you can gain much more performance out of your machine, though I do sincerely hope I’m wrong. |
|
#22 | |
Aug 2020
79*6581e-4;3*2539e-3
3·193 Posts |
It's finally done, LA took 800 hours.
I noticed the machine is generally not very effective if you use a lot of threads for one task. As an example, LLR:
Single-threaded = 4.5 ms / iter
2 threads = 2.9 ms / iter
20 threads = 1.2 ms / iter
The cores start to see only 70-90 % utilization once a lot of threads are involved. I still have it until end of August, so I guess I'll use it for sieving.
Last fiddled with by bur on 2022-07-19 at 06:50 Reason: what's it with the line breaks added all the time... |
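The LLR timings above translate directly into speedup and per-thread efficiency. A minimal sketch, using only the three figures quoted in the post (the function name is made up for illustration):

```python
def parallel_efficiency(t1_ms, tn_ms, threads):
    """Return (speedup, efficiency) relative to the single-threaded time."""
    speedup = t1_ms / tn_ms
    return speedup, speedup / threads


single_ms = 4.5  # ms per iteration, single-threaded
for threads, ms in [(2, 2.9), (20, 1.2)]:
    speedup, eff = parallel_efficiency(single_ms, ms, threads)
    print(f"{threads} threads: speedup {speedup:.2f}x, efficiency {eff:.0%}")
```

Efficiency here means speedup divided by thread count, which is a different measure from the 70-90 % core utilization shown by htop: a core can look busy while contributing little extra throughput.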
|