#144 | |
Sep 2016
23₈ Posts |
|
|
#145 |
"Kieren"
Jul 2011
In My Own Galaxy!
23656₈ Posts |
Forgive the 'corny' smiley.
I learn things, as well, even though the code stuff is mostly beyond me. I try to pick up an impression of the significance from the context of the discussions. Then, as a hardware nut, there are vicarious thrills in the whole undertaking. |
#146 |
"David"
Jul 2015
Ohio
11×47 Posts |
One other random benchmark that I would not expect to perform particularly well: I ran mfakto on the CPU with VectorSize=4 (the best performer in my testing) and got 56.57 GHz-days/day at 71 bits. Given that mfakto is optimized for GPUs, and I have no idea how much effort, if any, Intel spent optimizing their OpenCL implementation for KNL, this is really not too shabby.
|
#147 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
10100000100100₂ Posts |
That is weak. But my impression is that it would be a waste of this system to run TF on it, when an average graphics card is 5-10 times cheaper and 10-20 times faster...
|
#148 | |
"David"
Jul 2015
Ohio
1005₈ Posts |
mfakto is not the most optimized CPU TF program. I included this benchmark just as a curiosity. I am not aware of an easy TF benchmark for prime95, but I am pretty confident it will outperform mfakto quite handily on a CPU. |
|
#149 | |
Sep 2016
10011₂ Posts |
If you guys want tests on different GPUs, I can do that too. I have T10 Teslas, M2060 Fermis, and K20 Keplers. I'll have quite a few P100s as soon as someone takes my credit card.
Last fiddled with by xathor on 2016-09-17 at 04:00 |
|
#150 | |
"David"
Jul 2015
Ohio
11×47 Posts |
For reference, 113.1 GHz-days/day TF is my rough calculation from mprime using the physical cores, best achieved with 16 workers of 4 threads each at 100% utilization. It's a bit more difficult to get TF working against the hyperthreaded cores, but 64 workers with HT came out to 196 GHz-days/day.
Last fiddled with by airsquirrels on 2016-09-17 at 04:25 |
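To put the three TF figures quoted in this thread side by side, here is a rough sketch that just computes the ratios. The numbers are the ones posted above; the labels and the `speedup` helper are made up for illustration, and the comparison assumes all three runs were at a comparable bit level, which the posts do not confirm (only the mfakto run is stated to be at 71 bits).

```python
# Trial-factoring throughput figures quoted in the thread, in GHz-days/day.
# The dictionary keys are illustrative labels, not official configuration names.
tf_rates = {
    "mfakto on CPU (VectorSize=4)": 56.57,
    "mprime, physical cores (16 workers x 4 threads)": 113.1,
    "mprime, with HT (64 workers)": 196.0,
}


def speedup(rate, baseline):
    """Return how many times faster `rate` is than `baseline`."""
    return rate / baseline


baseline = tf_rates["mfakto on CPU (VectorSize=4)"]
for label, rate in tf_rates.items():
    # Print each figure alongside its ratio to the mfakto-on-CPU number.
    print(f"{label}: {rate:.2f} GHz-days/day, {speedup(rate, baseline):.2f}x mfakto")
```

On these figures, mprime on the physical cores is about twice the mfakto-on-CPU number, and using the hyperthreads adds roughly another 70% on top of that.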
|
#151 | |
Sep 2016
19 Posts |
So I have that KNL box sitting in my office just running mprime. Are there any settings I should be using to be as useful to the community as possible? |
|
#152 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2²×7×367 Posts |
...which is still a third of a $250 Radeon R9 card, or half of a $150 GTX 580 card. (Yeah, I read that you compare it with CPUs only, but I can't resist making my point that you should not compare apples with dragon fruits.)
We want to see what this beast can do with its huge registers and FFTs, i.e. LL testing, or even P-1. Which means new developments. Quite eager here to see how Ernst's program performs. |
#153 | |
∂²ω=0
Sep 2002
República de California
2DEB₁₆ Posts |
Thanks, but your numbers still differ widely from mine, now in the opposite direction: I got roughly the same 10 ms/iter @4096 using 32 threads (half as many as phys-cores) and 64. Those times are slightly less than half the ones you posted for 64 threads. Was your system running other stuff at the same time?
Re. TF: I spent some months last year multithreading my Mfactor TF code and adding an option to permit more than 16 distinct (k mod) passes, in preparation for manycore. (I also added CUDA support, but my GPU sieve is still slow; the result is ~1/2 the speed of mfaktc overall.) Each thread does its own sieving, so scaling to lots of cores should be good. Will try a build of that on the KNL tomorrow and report results. |
|
#154 |
Romulan Interpreter
"name field"
Jun 2011
Thailand
2²·7·367 Posts |
Yes, at the time, and I went back right now and read it again. (Lots of numbers; it can count from 0 to 255, so it can play minesweeper.) That post does not say much besides the fact that it scales quite well. The number of iterations, without the attached FFT size, gives no indication of the performance. Anyhow, am I very optimistic when I say that I expect a 20-fold performance increase from the current P95/mlucas to the "tuned for phi" P95/mlucas? 10-fold? 5-fold? If so, I won't pay much attention to the current benchmarks. They mean nothing when I ask "what can this beast do", as opposed to "what is it doing right now". I will wait until I see the "can do".
(Of course, not my work... it is easy to criticize others' work, so don't pay much attention to me!)
Last fiddled with by LaurV on 2016-09-17 at 10:19 |