mersenneforum.org  

Old 2016-09-14, 21:50   #78
airsquirrels
 
"David"
Jul 2015
Ohio
11·47 Posts

Quote:
Originally Posted by Madpoo View Post
I had him add these to the "prime.txt" file. Of course, adjust the min/max FFT size if desired, but I saw little benefit in testing out anything smaller than 2M since those are no longer being used.

Code:
MinBenchFFT=2048
MaxBenchFFT=5120
BenchHyperthreads=0
BenchMultithreads=1
OnlyBenchThroughput=1
OnlyBenchMaxCPUs=0
I will run a bunch of benchmarks this evening. What I can say is that the system uses the HBM in Flat mode by default, with the Quadrant cluster config. That means these benchmarks are using the 6-channel DDR4 without any HBM. My quick benchmarks match those of the other user in that configuration. I tried launching mprime with numactl -m 1 to use the HBM instead; however, performance on the 2048K FFT was only 2ms better. I have the system in HBM cache mode right now and am running a full set of benchmarks.
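
For anyone following along: in Flat mode the HBM normally shows up as a NUMA node that has memory but no CPUs, which is why numactl -m 1 is the node to aim at here. Below is a minimal Python sketch of one way to check which node that is. It assumes the usual Linux sysfs layout under /sys/devices/system/node; the function name and the sysfs_root parameter are made up for illustration.

```python
import os
import re


def memory_only_nodes(sysfs_root="/sys/devices/system/node"):
    """Return the sorted IDs of NUMA nodes that expose no CPUs.

    Assumption: a memory-only node (such as HBM in Flat mode) has an
    empty `cpulist` file in its sysfs node directory.
    """
    nodes = []
    for entry in os.listdir(sysfs_root):
        match = re.fullmatch(r"node(\d+)", entry)
        if not match:
            continue  # skip non-node entries such as "online"
        cpulist_path = os.path.join(sysfs_root, entry, "cpulist")
        try:
            with open(cpulist_path) as handle:
                cpus = handle.read().strip()
        except OSError:
            continue  # node directory without a readable cpulist
        if not cpus:
            nodes.append(int(match.group(1)))
    return sorted(nodes)


if __name__ == "__main__" and os.path.isdir("/sys/devices/system/node"):
    print("memory-only NUMA nodes:", memory_only_nodes())
```

Whatever node ID it reports would be the one to pass to numactl -m. This is only a sanity check on the topology, not a measurement of whether the allocations actually land in HBM.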

Last fiddled with by airsquirrels on 2016-09-14 at 21:51
Old 2016-09-14, 23:55   #79
Chuck
 
May 2011
Orange Park, FL
1101110110₂ Posts

Could we see a few pictures of the new hardware please?
Old 2016-09-15, 00:07   #80
airsquirrels
 
"David"
Jul 2015
Ohio
11×47 Posts

Sure, except, how does one post pictures on this forum? It only appears to allow externally hosted links?

I've had an interesting observation about prime95 while running the benchmarks on this thread. Despite currently benchmarking at 46 threads on 46 CPUs, there are only exactly 20 threads at 100% in htop (the other 236 are at 0 most of the time). It has been this way since the 20 thread mark....

I have been able to load all of the physical cores to 100% with a test program...

Last fiddled with by airsquirrels on 2016-09-15 at 00:08
Old 2016-09-15, 00:15   #81
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013
3·977 Posts

Quote:
Originally Posted by airsquirrels View Post
Sure, except, how does one post pictures on this forum? It only appears to allow externally hosted links?
Yeah, that's my own frustration with the forum: no embedded pictures. It does protect against broken images, though.

It might be easiest to upload them all to an imgur gallery and just link that.
Old 2016-09-15, 00:23   #82
ewmayer
2ω=0
 
Sep 2002
República de California
2²×3²×17×19 Posts

David, if the pics are not huge you can just attach them to a forum post.

Did you get the ssh pubkey I e-mailed you? (I assumed the e-mail address you linked to PayPal for the funding here was the same as your actual e-mail, or something.)
Old 2016-09-15, 00:58   #83
airsquirrels
 
"David"
Jul 2015
Ohio
11·47 Posts

*grumbles about imgur* - I've uploaded a few shots of the hardware here.

I have received several of the ssh keys from the other developers, and I've also provisioned the external IP address, firewall and SSH. I should be able to get accounts set up tomorrow.

I have verified that the benchmarks we have so far are far from telling us how much power this beast has. I can confirm that the prime95 benchmark is not creating the number of threads it claims to be. At 58 threads on 58 CPUs we actually have only 7 active threads. At 59 CPUs we have only 6 active threads.... (verified from /proc/<pid>/task). By the 61 mark I only have 4 active threads. 62->3 threads, 63 claimed -> 2, and only one thread at 64. Once the full benchmark completes I will test with just 2048K to monitor this behavior across the entire

One thing that appears to be behaving differently is the benchmarking code that starts busy threads on all cores during benchmarking (to prevent turbo boost from affecting results). When a benchmark starts, all the cores do actually kick off threads, but most of the threads exit immediately. I compared this to my dual E5-2698 v3, which kept all 32 cores at 100% with active threads during the entire benchmark process.
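
For reference, the thread check described above can be sketched in Python roughly as follows. It relies on the per-thread entries Linux exposes under /proc/<pid>/task; the function name and the proc_root parameter are made up for illustration, and this is not necessarily the exact procedure used for the numbers in this post.

```python
import os
from collections import Counter


def thread_states(pid, proc_root="/proc"):
    """Count the threads of `pid` by scheduler state (R, S, D, ...)."""
    states = Counter()
    task_dir = os.path.join(proc_root, str(pid), "task")
    for tid in os.listdir(task_dir):
        try:
            with open(os.path.join(task_dir, tid, "stat")) as handle:
                stat = handle.read()
        except OSError:
            continue  # thread exited between listdir() and open()
        # The command name sits in parentheses and may itself contain
        # spaces or ')', so split on the last ')' to reach the state field.
        fields = stat.rsplit(")", 1)[-1].split()
        if fields:
            states[fields[0]] += 1
    return states


if __name__ == "__main__" and os.path.isdir("/proc/self/task"):
    # Example: inspect this Python process itself.
    print(dict(thread_states(os.getpid())))
```

State R means running or runnable at the moment of the read, so a single call is only a snapshot; sampling it a few times during a benchmark pass gives a fairer picture of how many threads are really busy.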
Attached Thumbnails: IMG_5597.JPG (689.3 KB), IMG_5598.JPG (938.1 KB), IMG_5600.JPG (881.2 KB)
Old 2016-09-15, 01:19   #84
ewmayer
2ω=0
 
Sep 2002
República de California
2²×3²×17×19 Posts

Sweet - so roughly ATX-case-sized, just with a Ferrari F1 racing engine under the hood.
Old 2016-09-15, 01:31   #85
airsquirrels
 
"David"
Jul 2015
Ohio
11·47 Posts

Quote:
Originally Posted by ewmayer View Post
Sweet - so roughly ATX-case-sized, just with a Ferrari F1 racing engine under the hood.
Exactly!

Also, I believe I was incorrect on the threads issue during benchmarking - what is actually happening is that the benchmark itself finishes very quickly, and the threads still running are the busy threads waiting to finish their busy loop. It turns out the benchmarks take a lot longer than they really need to because of this.
Old 2016-09-15, 01:40   #86
Mysticial
 
Sep 2016
2²×83 Posts

Oh that's pretty.

Is there any plan in the future to see if there's any remote chance that it will run Windows? I read somewhere that there's a 4 processor group/256 core limitation. But that shouldn't be a problem with exactly 256 logical cores. I'm actually more curious about whether Windows 10 has XSAVE support for AVX512.
Old 2016-09-15, 02:50   #87
airsquirrels
 
"David"
Jul 2015
Ohio
11×47 Posts

I can't seem to get the benchmark code in prime95 to actually obey my affinity settings, so I am not sure how well it is actually representing this chip.

In actual testing with a set of 2240K double checks using the HBM as cache, I have tried:

1 worker, 64 threads (all showing <35% utilization in htop)
4 workers, 16 threads each (all showing <60% utilization in htop)

I settled for now on 32 workers with 2 threads each, with affinity set so that each pair shares its L2 cache. In this configuration the average FFT time is 23.5ms/iteration and each core shows 100% utilization, for a net output of 183.1 GHz-days/day, or roughly 366.3 GFLOPS.
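
To put the 23.5ms figure in perspective, here is a rough back-of-the-envelope sketch in Python. The worker count and iteration time are the ones quoted above; the exponent is a hypothetical value picked purely for illustration, and the helper names are made up. It assumes the per-iteration time stays constant for the whole test.

```python
def aggregate_iterations_per_second(workers, ms_per_iteration):
    """Total iterations per second summed across all workers."""
    return workers * 1000.0 / ms_per_iteration


def days_per_test(exponent, ms_per_iteration):
    """Wall-clock days for one Lucas-Lehmer test of 2**exponent - 1.

    An LL test needs exponent - 2 squarings; the per-iteration time is
    assumed to be constant over the whole run.
    """
    seconds = (exponent - 2) * ms_per_iteration / 1000.0
    return seconds / 86400.0


# Figures from the post: 32 workers at 23.5 ms per iteration.
WORKERS = 32
MS_PER_ITER = 23.5
# Hypothetical exponent, chosen only for illustration; not from the post.
EXAMPLE_EXPONENT = 42_000_000

if __name__ == "__main__":
    rate = aggregate_iterations_per_second(WORKERS, MS_PER_ITER)
    days = days_per_test(EXAMPLE_EXPONENT, MS_PER_ITER)
    print(f"{rate:.1f} iterations/s in total")
    print(f"{days:.1f} days per test, with {WORKERS} tests in flight")
```

The point of the many-workers layout is that each individual test is slow, but 32 of them progress at once, so total throughput is what matters.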

It appears that 32 or 64 workers generate the greatest total throughput and become CPU limited with plenty of memory bandwidth available. Running a smaller number of workers with more threads, however, quickly gets choked up and barely uses the CPU cores, which is the opposite of what I would expect based on experience with normal Xeons.

Interestingly enough, htop will show mostly red in the many-threads-few-workers cases, suggesting that kernel calls are the source of the bottleneck.

Last fiddled with by airsquirrels on 2016-09-15 at 02:54
Old 2016-09-15, 02:54   #88
Mark Rose
 
"/X\(‘-‘)/X\"
Jan 2013
5563₈ Posts

How are the results with hyperthreading?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.