mersenneforum.org  

2018-08-05, 20:05   #1
mackerel

Feb 2016
UK

3·5·29 Posts
LLR multi-thread observation

Over on PrimeGrid there is an upcoming challenge running PPS tasks, which are currently 192k FFT size. After another post, I decided to look at scaling. What I found interesting was that on very different CPUs, when running one task on 4 threads, the run times were very close.

i7-6700k, 4.0 GHz all core clock, 695s
i5-6600k, 3.6 GHz all core clock, 695s
i5-4570S, 3.2 GHz all core clock, 708s

I assume the threaded work is done in gwnum, not by LLR itself, so I was wondering if there is something that might explain this observation? If it were doing the same thing as fast as it can, I would have expected some clock scaling (even ignoring architecture). Since the run time doesn't seem to scale with clock, is there some other mechanism for dividing the work that is a limiting factor?
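
To put numbers on the "expected some clock scaling" point, here is a minimal sketch that takes the three run times listed above and estimates what each CPU would need if run time were simply inversely proportional to clock speed. The i7-6700k is used as the reference, and architecture differences are ignored, as in the post. The function name `clock_scaled_time` is made up for illustration and is not part of LLR or gwnum.

```python
# Run times reported above:
# (CPU, all-core clock in GHz, seconds for one task on 4 threads)
runs = [
    ("i7-6700k", 4.0, 695),
    ("i5-6600k", 3.6, 695),
    ("i5-4570S", 3.2, 708),
]


def clock_scaled_time(ref_clock_ghz, ref_seconds, clock_ghz):
    """Run time expected if time were inversely proportional to clock speed.

    Assumption: identical work per clock cycle on every CPU
    (architecture differences are ignored).
    """
    return ref_seconds * ref_clock_ghz / clock_ghz


# Use the first entry (i7-6700k) as the reference point.
_, ref_clock, ref_seconds = runs[0]

estimates = {}
for name, clock, seconds in runs:
    expected = clock_scaled_time(ref_clock, ref_seconds, clock)
    estimates[name] = expected
    print(f"{name}: observed {seconds} s, clock-scaled estimate {expected:.0f} s")
```

Under that simple assumption the 3.6 GHz and 3.2 GHz machines would be expected to take roughly 772 s and 869 s, against the observed 695 s and 708 s, which is the gap being asked about.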

BTW the throughput of running such a small task with so many threads wasn't great, but it was still interesting as it made the slowest CPU above look like it was scaling better (or at least less badly) with more threads than the faster ones.
2018-08-06, 03:55   #2
ATH
Einyen

Dec 2003
Denmark

2·1,579 Posts

George has said a few times that the benefit of multithreading is very small at very small FFT sizes:

http://www.mersenneforum.org/showpos...4&postcount=35

http://www.mersenneforum.org/showpos...0&postcount=37

http://www.mersenneforum.org/showpos...50&postcount=6


Try running 1 instance for each core?
2018-08-06, 07:32   #3
mackerel

Feb 2016
UK

435₁₀ Posts

1 per core gave the highest throughput. I know the performance sucks with one task on all threads, but the interesting point was that the run times were about the same on different hardware, and I was looking to understand that. Intuitively, you'd still expect higher clocks to run faster at a similar level of inefficiency. If it isn't limited by CPU clock, what is it being limited by?
2018-08-06, 07:52   #4
pepi37

Dec 2011
After milion nines:)

1,451 Posts

Quote:
Originally Posted by mackerel
1 per core was highest throughput. I know the performance sucks with one task on all threads, but the interesting point was that they were about the same on different hardware, and was looking to understand that. Intuitively, you'd still expect higher clocks to run faster at a similar level of inefficiency. If it isn't limited by CPU clock, what is it being limited by?

"1 per core was highest throughput."
That is true only up to a point. Beyond that point, performance goes down.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.