mersenneforum.org  

Old 2016-09-17, 01:43   #144
xathor
 
Sep 2016

23₈ Posts

Quote:
Originally Posted by ewmayer View Post
How many threads are those with? Are you using the -nthread flag to control the thread count for those (and on your KNL)? Without that flag, Mlucas will use as many threads as the number of virtual cores it detects on the system. (This seems to be OS-dependent - on my Debian-running Haswell quad with HT enabled at boot that number is 4, on my dual-core Broadwell NUC under Ubuntu it is again 4, i.e. 2x the number of physical cores on the latter system.)
For each, I specified the exact number of physical cores the machine has.
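A minimal sketch of the logical-versus-physical distinction discussed here, assuming a Linux system whose /proc/cpuinfo carries "physical id" and "core id" fields (x86 kernels usually do, other platforms may not). The function name count_cores and the sample text are made up for illustration and are not part of Mlucas or mprime.

```python
# Hypothetical helper: count logical CPUs and distinct physical cores
# from text in the format of Linux's /proc/cpuinfo.

# Made-up sample: one socket, two cores, two hardware threads per core.
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 1

processor : 2
physical id : 0
core id : 0

processor : 3
physical id : 0
core id : 1
"""


def count_cores(cpuinfo_text):
    """Return (logical_cpus, physical_cores) for cpuinfo-style text.

    physical_cores is 0 when the text lacks the "physical id" and
    "core id" fields, in which case another source is needed.
    """
    logical = 0
    physical = set()
    # Each logical CPU is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            if ":" in line:
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        if "processor" in fields:
            logical += 1
            if "physical id" in fields and "core id" in fields:
                # Hyperthread siblings share the same (socket, core) pair.
                physical.add((fields["physical id"], fields["core id"]))
    return logical, len(physical)


logical, physical = count_cores(SAMPLE)
print(f"logical CPUs: {logical}, physical cores: {physical}")
```

On a real machine one could feed it the contents of /proc/cpuinfo and hand the second number to -nthread. Whether that is actually the fastest thread count on a given box is something only a benchmark can settle.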
Old 2016-09-17, 02:00   #145
kladner
 
"Kieren"
Jul 2011
In My Own Galaxy!

23656₈ Posts

Quote:
Originally Posted by airsquirrels View Post
Glad we can provide some entertainment!

.....
Forgive the 'corny' smiley.
I learn things, as well, even though the code stuff is mostly beyond me. I try to pick up an impression of the significance from the context of the discussions. Then, as a hardware nut, there are vicarious thrills in the whole undertaking.
Old 2016-09-17, 02:43   #146
airsquirrels
 
"David"
Jul 2015
Ohio

11×47 Posts

One other random benchmark that I would not expect to perform particularly well: I ran mfakto on the CPU with VectorSize=4 (best performance in my testing) and got 56.57 GHz-days/day at 71 bits. Given that mfakto is optimized for GPUs and I have no idea how much effort, if any, Intel spent optimizing their OpenCL implementation for KNL, this is really not too shabby.
Old 2016-09-17, 03:21   #147
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

10100000100100₂ Posts

That is weak. But my impression is that it would be a waste of this system to run TF on it, when an average graphics card is 5-10 times cheaper and 10-20 times faster....
Old 2016-09-17, 03:48   #148
airsquirrels
 
"David"
Jul 2015
Ohio

1005₈ Posts

Quote:
Originally Posted by LaurV View Post
That is weak. But my impression is that it would be a waste of this system to run TF on it, when an average graphics card is 5-10 times cheaper and 10-20 times faster....
For reference, this is about the same as a dual E5-2658 v2 system.

mfakto is not the most optimized CPU TF program. My inclusion of this benchmark was just a curiosity. I am not aware of an easy TF benchmark for prime95, but I am pretty confident it will outperform mfakto quite handily on a CPU.
Old 2016-09-17, 03:59   #149
xathor
 
Sep 2016

10011₂ Posts

Quote:
Originally Posted by LaurV View Post
That is weak. But my impression is that it would be a waste of this system to run TF on it, when an average graphics card is 5-10 times cheaper and 10-20 times faster....
I think the only advantage that KNL has is the AVX-512 VPUs... it's hampered by the overall low clock speeds of each core.

If you guys want tests on different GPUs I can do that too. I have T10 Teslas, M2060 Fermis and K20 Keplers. I'll have quite a few P100s as soon as someone takes my credit card.

Last fiddled with by xathor on 2016-09-17 at 04:00
Old 2016-09-17, 04:24   #150
airsquirrels
 
"David"
Jul 2015
Ohio

11×47 Posts

Quote:
Originally Posted by xathor View Post
I think the only advantage that KNL has is the AVX-512 VPUs... it's hampered by the overall low clock speeds of each core.

If you guys want tests on different GPUs I can do that too. I have T10 Teslas, M2060 Fermis and K20 Keplers. I'll have quite a few P100s as soon as someone takes my credit card.
I will be pretty eager to see how the P100s do.

For reference, 113.1 GHz-days/day of TF is my rough calculation from mprime using the physical cores - best achieved with 16 workers of 4 threads each at 100% utilization.

It's a bit more difficult to get TF working against the hyperthreaded cores, but 64 workers with HT came out to 196 GHz-days/day.

Last fiddled with by airsquirrels on 2016-09-17 at 04:25
Old 2016-09-17, 05:08   #151
xathor
 
Sep 2016

19 Posts

Quote:
Originally Posted by airsquirrels View Post
I will be pretty eager to see how the P100s do.

For reference, 113.1 GHz-days/day of TF is my rough calculation from mprime using the physical cores - best achieved with 16 workers of 4 threads each at 100% utilization.

It's a bit more difficult to get TF working against the hyperthreaded cores, but 64 workers with HT came out to 196 GHz-days/day.
I'm also pretty eager for the P100s. I'm going to purchase one for testing as soon as possible, then probably 3 more. I'm also *hopefully* going to purchase around ten GTX 1080s.

So I have that KNL box sitting in my office just running mprime. Are there any settings I should be using to make it as useful to the community as possible?
Old 2016-09-17, 07:13   #152
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

2²×7×367 Posts

Quote:
Originally Posted by airsquirrels View Post
196 GHz-days/day
...which is still a third of a $250 Radeon R9 card, or half of a $150 GTX 580 card. (Yeah, I read that you are comparing it with CPUs only, but I can't resist making my point: you should not compare apples with dragon fruits.)

We want to see what this beast can do with its huge registers and FFTs.... i.e. LL testing, or even P-1. Which means new developments. Quite eager here to see how Ernst's program performs.
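To put rough numbers on that comparison, here is a small arithmetic sketch that uses only figures quoted in this thread. The GPU throughputs are implied by the "a third" and "half" remarks rather than measured, and the variable and function names are illustrative.

```python
# Figures quoted in the thread (GHz-days/day of trial factoring on the KNL).
KNL_TF_PHYSICAL = 113.1  # mprime, 16 workers x 4 threads, physical cores
KNL_TF_HT = 196.0        # mprime, 64 workers using the hyperthreads

# Claimed in the thread: (how many times the KNL's HT figure the card
# reaches, card price in USD). Implied values, not measurements.
GPU_CLAIMS = {
    "Radeon R9": (3, 250),
    "GTX 580": (2, 150),
}


def implied_gpu_figures(knl_tf, claims):
    """Return {card: (implied GHz-days/day, GHz-days/day per dollar)}."""
    figures = {}
    for card, (multiple, price_usd) in claims.items():
        throughput = knl_tf * multiple
        figures[card] = (throughput, throughput / price_usd)
    return figures


# Gain from running TF on the hyperthreads instead of physical cores only.
ht_gain = KNL_TF_HT / KNL_TF_PHYSICAL
figures = implied_gpu_figures(KNL_TF_HT, GPU_CLAIMS)

print(f"HT gain on the KNL: {ht_gain:.2f}x")
for card, (throughput, per_dollar) in figures.items():
    print(f"{card}: ~{throughput:.0f} GHz-days/day, "
          f"~{per_dollar:.2f} GHz-days/day per dollar")
```

No price for the KNL system is given in the thread, so the same per-dollar figure cannot be computed for it here.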
Old 2016-09-17, 08:30   #153
ewmayer
2ω=0
 
Sep 2002
República de California

2DEB₁₆ Posts

Quote:
Originally Posted by xathor View Post
For each, I specified the exact number of physical cores the machine has.
Thanks - but your numbers still differ widely from mine, now in the opposite direction: I got roughly the same 10 ms/iter @4096 using 32 threads (half as many as phys-cores) and 64. Those times are slightly less than half the ones you posted for 64 threads. Was your system running other stuff at the same time?

Quote:
Originally Posted by LaurV View Post
We want to see what this beast can do with its huge registers and FFTs.... i.e. LL testing, or even P-1. Which means new developments. Quite eager here to see how Ernst's program performs.
Did you see my post #139?

Re. TF: I spent some months last year multithreading my Mfactor TF code and adding an option to permit more than 16 distinct (k mod) passes, in preparation for manycore. (I also added CUDA support, but my GPU sieve is still slow; the result is ~1/2 the speed of mfaktc overall.) Each thread does its own sieving, so scaling to lots of cores should be good. Will try a build of that on the KNL tomorrow and report results.
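For readers following the "(k mod) passes" remark: candidate factors of 2^p - 1 have the form q = 2kp + 1 with q ≡ ±1 (mod 8), and a common trick is to drop whole residue classes of k that force q to be divisible by a small prime. The sketch below assumes a modulus of 60 with small primes 3 and 5, which leaves 16 classes for any prime p > 5. That this is where the 16 passes come from is an assumption, since the post does not say. The function names are illustrative and unrelated to the actual Mfactor or mfaktc code.

```python
from math import gcd

# Assumed class modulus: 60 = 4 * 3 * 5, so q mod 8, q mod 3 and q mod 5
# are each determined by k mod 60 (for a fixed odd prime p).
MODULUS = 60


def surviving_k_classes(p):
    """Residues k mod 60 for which q = 2*k*p + 1 may divide 2**p - 1.

    A class is dropped when it forces q to be something other than
    +-1 mod 8, or forces q to be divisible by 3 or 5. Assumes p is a
    prime larger than 5.
    """
    classes = []
    for k in range(MODULUS):
        q = 2 * k * p + 1
        if q % 8 in (1, 7) and gcd(q, 15) == 1:
            classes.append(k)
    return classes


def trial_factor(p, k_max):
    """Return the smallest q = 2*k*p + 1 (1 <= k <= k_max) dividing 2**p - 1.

    Only k in a surviving class is tested; returns None if nothing is found.
    """
    classes = set(surviving_k_classes(p))
    for k in range(1, k_max + 1):
        if k % MODULUS not in classes:
            continue  # this whole residue class cannot contain a factor
        q = 2 * k * p + 1
        # q divides 2**p - 1 exactly when 2**p is 1 modulo q.
        if pow(2, p, q) == 1:
            return q
    return None


print(len(surviving_k_classes(29)), "of", MODULUS, "classes survive for p = 29")
print("factor of 2**29 - 1:", trial_factor(29, 10))
```

Because the classes are independent of one another, each one can be sieved and tested by a different thread, which fits the "each thread does its own sieving" description. A larger modulus that includes more small primes gives more and finer passes.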
Old 2016-09-17, 10:16   #154
LaurV
Romulan Interpreter
 
"name field"
Jun 2011
Thailand

2²·7·367 Posts

Quote:
Originally Posted by ewmayer View Post
Did you see my post #139?
Yes, at the time, and I went back just now and read it again. (Lots of numbers, it can count from 0 to 255, so it can play minesweeper.)
That post does not say much beyond the fact that it scales quite well. The number of iterations, without the associated FFT size, gives no indication of the performance. Anyhow, am I too optimistic when I say that I expect a 20-fold performance increase from the current P95/mlucas to the "tuned for phi" P95/mlucas? 10-fold? 5-fold? If so, I won't pay much attention to the current benchmarks. They mean nothing when I ask "what can this beast do", as opposed to "what is it doing right now". I will wait until I see the "can do".
(of course, not my work... it is easy to criticize others' work - don't pay much attention to me!)

Last fiddled with by LaurV on 2016-09-17 at 10:19

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
