#1
"Simon Josefsson"
Jan 2020
Stockholm
21₁₆ Posts
Hi. I'm trying to get mlucas 20.1 working on my Raptor Computing Systems Talos II Lite with a POWER9 18-core CPU.
The initial build fails: Quote:
The next problem seems harder to solve though: Quote:
Just to get something going, I changed p09 to p08, and it builds eventually and seems to work: Quote:
What more testing can I do here? I'm running an LL DC on it now; we'll see if it finishes correctly.
#2
Aug 2002
23·29·37 Posts
Tell us more about your computer!
#3
"Simon Josefsson"
Jan 2020
Stockholm
3×11 Posts
Ok I'll bite :) Here is some information:
https://wiki.raptorcs.com/wiki/Talos_II
https://wiki.raptorcs.com/wiki/POWER9

My 18-core, 4-threads-per-core CPU appears to be doing 11 msec/iter with -cpu 0:63:2, which was the fastest config I could find after experimenting. I was expecting 0:72:4 or 0:72:2 or even 0:72 to be faster, but for some reason 0:63:2 won. I even considered 0:31:4 and various other settings. It draws about 90-130W if I can trust the builtin power meter.

Maybe further compiler flag optimizations will help, but I'm fairly happy now and my main worry is whether I can trust the results. I'm not particularly fond of randomly changing FFT source code without understanding it. I'll leave this machine doing LL DCs for a while to see if there are any heating problems.
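
The -cpu settings mentioned in this post can be read as lo:hi:stride ranges of logical CPU indices. The Python sketch below illustrates that reading only; it assumes an inclusive upper bound and a default stride of 1, the helper name `expand_cpu_spec` is hypothetical, and it is not Mlucas's own argument parser.

```python
def expand_cpu_spec(spec: str) -> list[int]:
    """Expand a 'lo[:hi[:stride]]' string into logical CPU indices.

    Assumption: hi is inclusive and stride defaults to 1. This mirrors how
    the -cpu settings in the post are read here, not the Mlucas source.
    """
    parts = [int(p) for p in spec.split(":")]
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"expected lo[:hi[:stride]], got {spec!r}")
    lo = parts[0]
    hi = parts[1] if len(parts) > 1 else lo
    stride = parts[2] if len(parts) > 2 else 1
    if stride < 1 or hi < lo:
        raise ValueError(f"invalid range in {spec!r}")
    return list(range(lo, hi + 1, stride))


if __name__ == "__main__":
    for spec in ("0:63:2", "0:31:4"):
        cpus = expand_cpu_spec(spec)
        print(f"-cpu {spec}: {len(cpus)} threads, first {cpus[0]}, last {cpus[-1]}")
```

Under that reading, 0:63:2 selects every other logical CPU from 0 to 62, i.e. 32 threads.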
#4
Aug 2002
23·29·37 Posts
We considered building one a few months ago. Getting info from Raptor was very difficult.
How did the build go? Was there anything weird? What is the "BIOS" like? What OS are you using? Is this a hobby computer or do you have specialized work to do with it?
#5
"Simon Josefsson"
Jan 2020
Stockholm
100001₂ Posts
Quote:
Everything except the price has been uncomplicated. The only "big" problem I had was realizing that the Debian netinst image sends its output to the serial port and not to VGA as Ubuntu and Fedora do. Once I got help to realize that, it was quickly resolved though. There is source code for all of the BIOS, and especially the OpenBMC is nice. It uses petitboot to load the OS.

I bought it as an experiment to see if this platform is stable enough to deploy some of my production services on; it will be used for experiments, and I've offered a VM on it as a Guix build server. Having mlucas running in the background at all times is a good way to "burn in" the machine I think...

/Simon
#6
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest
3×5×503 Posts
Quote:
Code:
20.1
2048 msec/iter = 7.45 ROE[avg,max] = [0.188009655, 0.250000000] radices = 32 8 16 16 16 0 0 0 0 0
2304 msec/iter = 8.41 ROE[avg,max] = [0.187239248, 0.250000000] radices = 144 32 16 16 0 0 0 0 0 0
2560 msec/iter = 8.34 ROE[avg,max] = [0.219140625, 0.281250000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 9.77 ROE[avg,max] = [0.235196521, 0.312500000] radices = 176 16 16 32 0 0 0 0 0 0
3072 msec/iter = 9.66 ROE[avg,max] = [0.180298339, 0.250000000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 10.45 ROE[avg,max] = [0.188625155, 0.234375000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 10.97 ROE[avg,max] = [0.189514034, 0.250000000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 10.71 ROE[avg,max] = [0.237338918, 0.312500000] radices = 240 8 8 8 16 0 0 0 0 0
4096 msec/iter = 10.58 ROE[avg,max] = [0.199521603, 0.265625000] radices = 256 32 16 16 0 0 0 0 0 0
Last fiddled with by kriesel on 2021-10-18 at 14:37
#7
"Simon Josefsson"
Jan 2020
Stockholm
33₁₀ Posts
Quote:
Yes, 3.25M, so these systems appear comparable for FFT in msec/iter:
Code:
3328 msec/iter = 10.78 ROE[avg,max] = [0.204429768, 0.281250000] radices = 208 32 16 16 0 0 0 0 0 0
Last fiddled with by jas on 2021-10-18 at 15:39
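
For a rough sense of what a per-iteration timing like this means for a whole test: a Lucas-Lehmer test of 2^p - 1 performs p - 2 squaring iterations, so the wall-clock time is roughly (p - 2) times the iteration time. The sketch below only does that arithmetic. The exponent is a round placeholder chosen for illustration (it is not taken from this thread and is not a real assignment), the function name is hypothetical, and startup, checkpointing and error-check overhead are ignored.

```python
def ll_runtime_days(exponent: int, msec_per_iter: float) -> float:
    """Estimate wall-clock days for one LL test of 2**exponent - 1."""
    iterations = exponent - 2                    # an LL test needs p - 2 squarings
    seconds = iterations * msec_per_iter / 1000.0
    return seconds / 86400.0                     # 86400 seconds per day


if __name__ == "__main__":
    placeholder_exponent = 60_000_002            # illustrative only, not from the thread
    days = ll_runtime_days(placeholder_exponent, 10.78)
    print(f"about {days:.2f} days at 10.78 msec/iter")
```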
#8
∂2ω=0
Sep 2002
República de California
5×2,351 Posts
Hi, Simon - sorry, late to the thread. Been busy with Mlucas v20.1 bugfix work, only had time for my already-subscribed threads this past week.
Yes, it's been a while since anyone built the latest release on PowerPC or POWER. You are building in non-SIMD mode, which is one of the standard prerelease build-and-test paths, but building in that mode on PowerPC triggers the preprocessor flag PFETCH=1 in prefetch.h, whereas on non-PPC PFETCH remains undef'd, which explains why the bug you hit was not seen in my own non-SIMD builds.

You seem to have guessed from the use of the undeclared var p09 that it's just used as a prefetch offset, so changing it to p08 (or commenting it out) does not affect correctness of the computed results. The proper fix is to replace p09 by p08+p01 - I must've shortened the list of declared double-array offsets from the original p01-p19 to p01,p02,p03,p04,p08,p12,p16 sometime in the past few years, and multiples like p05 and p09 are now computed by summing 2 of those. When I force -DPFETCH=1 in a non-SIMD build on my x86 macbook, the radix20_main_carry_loop.h error is the only one I see.

Interestingly, in the POWER section of prefetch.h, PFETCH remains unset, implying that on your platform one or more of the following (snip from platform.h) is predefined:
Code:
#elif(defined(__ppc__) || defined(__powerpc__) || defined(__PPC__) || defined(__powerc) || defined(__ppc64__))
Code:
/* IBM Power: Note that I found that only __xlc__ was properly defined on the power5 I used for my tests.
We Deliberately put this #elif ahead of the PowerPC one because the Power/AIX compiler also defines some PowerPC flags: */
#elif(defined(_POWER) || defined(_ARCH_PWR))

Re. SIMD and NUMA-ness, found some info here: https://www.olcf.ornl.gov/wp-content...PUs_Walkup.pdf

Throughput would clearly be hugely improved by using assembly to target the SIMD, but porting all the asm to a new architecture is always a major half-year-ish effort; it needs a significant likely user base to make it tempting.

Most important thing near-term is to properly target your CPU's multithread and NUMA topology - I plan to add support for the freeware hwloc library next year in order to automate this sort of thing, but for now we have to figure it out ourselves. Starting with the /proc/cpuinfo file and other related documentation you can find, can you answer the following questions?

1. What is the logical core numbering convention? It is likely either like Intel's (where on an N-physical-core CPU with 4 threads/core, physical core 0 gets threads 0,N,2N,3N; physical core 1 gets threads 1,N+1,2N+1,3N+1, etc) or like AMD's (where on an N-physical-core CPU with 4 threads/core, physical core 0 gets threads 0-3, etc).
2. What can you tell us about the NUMA domains on the CPU?

Lastly, a good starting point for the total-throughput-maximization procedure is - with nothing else of consequence running - doing a basic set of 1-thread self-tests, './Mlucas -s m -cpu 0 >& test.log'. If you would be so kind as to attach zipped copies of /proc/cpuinfo and the test.log and mlucas.cfg files resulting from the 1-thread self-test, that would be great.

Thanks,
-Ernst
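
The two logical-core numbering conventions described in question 1 can be written out as a small Python sketch. The function name and the labels "strided" and "contiguous" are hypothetical, introduced here for illustration; which convention the POWER9 in this thread actually follows is exactly the open question, and the sketch does not answer it.

```python
def threads_of_core(core: int, n_cores: int, threads_per_core: int,
                    convention: str) -> list[int]:
    """Return the logical thread IDs that belong to one physical core.

    'strided'    : core c gets c, c+N, c+2N, ...  (the "like Intel's" case)
    'contiguous' : core c gets c*T .. c*T+T-1     (the "like AMD's" case)
    """
    if not 0 <= core < n_cores:
        raise ValueError(f"core {core} out of range for {n_cores} cores")
    if convention == "strided":
        return [core + k * n_cores for k in range(threads_per_core)]
    if convention == "contiguous":
        base = core * threads_per_core
        return list(range(base, base + threads_per_core))
    raise ValueError(f"unknown convention {convention!r}")


if __name__ == "__main__":
    # 18 physical cores with 4 threads each, as reported earlier in the thread.
    for convention in ("strided", "contiguous"):
        for core in (0, 1):
            ids = threads_of_core(core, 18, 4, convention)
            print(f"{convention:10s} core {core}: threads {ids}")
```

Knowing which mapping applies tells you whether a given -cpu stride puts one thread on each physical core or stacks several threads on the same core.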
#9
"Simon Josefsson"
Jan 2020
Stockholm
33₁₀ Posts
Quote:
FWIW, it finished the LL DC correctly: https://www.mersenne.org/report_expo...1906697&full=1
/Simon
#10
"Simon Josefsson"
Jan 2020
Stockholm
100001₂ Posts
Quote:
Quote:
Quote:
The OS is vanilla Debian 11 Bullseye with GCC 10.2.1, so a fairly "normal" setup. As for the other stuff, I can't really answer it right now, but I'll read your post and try to understand what to investigate and how to report it. I can set up SSH access to it if you want.

cpuinfo: https://gist.github.com/jas4711/a999...7f10b84cff3eef

/Simon
Last fiddled with by jas on 2021-10-27 at 16:10 Reason: add cpuinfo
#11
"Simon Josefsson"
Jan 2020
Stockholm
21₁₆ Posts
Quote:
Running it takes around 38 minutes. Below is the mlucas.cfg and I put test.log here: https://gist.github.com/jas4711/100d...68592ae56e9a12
Code:
2048 msec/iter = 75.75 ROE[avg,max] = [0.161886161, 0.187500000] radices = 1024 32 32 0 0 0 0 0 0 0
2304 msec/iter = 91.21 ROE[avg,max] = [0.158895438, 0.187500000] radices = 36 32 32 32 0 0 0 0 0 0
2560 msec/iter = 108.66 ROE[avg,max] = [0.188839286, 0.250000000] radices = 40 32 32 32 0 0 0 0 0 0
2816 msec/iter = 114.73 ROE[avg,max] = [0.170926339, 0.218750000] radices = 44 32 32 32 0 0 0 0 0 0
3072 msec/iter = 121.73 ROE[avg,max] = [0.175083705, 0.218750000] radices = 48 32 32 32 0 0 0 0 0 0
3328 msec/iter = 137.32 ROE[avg,max] = [0.242410714, 0.312500000] radices = 52 32 32 32 0 0 0 0 0 0
3584 msec/iter = 144.97 ROE[avg,max] = [0.229241071, 0.281250000] radices = 56 32 32 32 0 0 0 0 0 0
3840 msec/iter = 157.38 ROE[avg,max] = [0.169998605, 0.203125000] radices = 60 32 32 32 0 0 0 0 0 0
4096 msec/iter = 163.65 ROE[avg,max] = [0.233258929, 0.281250000] radices = 128 32 32 16 0 0 0 0 0 0
4608 msec/iter = 183.77 ROE[avg,max] = [0.174515206, 0.218750000] radices = 144 32 32 16 0 0 0 0 0 0
5120 msec/iter = 234.53 ROE[avg,max] = [0.234598214, 0.281250000] radices = 160 32 32 16 0 0 0 0 0 0
5632 msec/iter = 228.97 ROE[avg,max] = [0.181417411, 0.218750000] radices = 176 32 32 16 0 0 0 0 0 0
6144 msec/iter = 252.87 ROE[avg,max] = [0.209709821, 0.250000000] radices = 192 32 32 16 0 0 0 0 0 0
6656 msec/iter = 272.22 ROE[avg,max] = [0.177399554, 0.187500000] radices = 208 32 32 16 0 0 0 0 0 0
7168 msec/iter = 291.60 ROE[avg,max] = [0.181417411, 0.218750000] radices = 224 32 32 16 0 0 0 0 0 0
7680 msec/iter = 352.16 ROE[avg,max] = [0.186188616, 0.218750000] radices = 240 32 32 16 0 0 0 0 0 0
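
Lines in this format are easy to compare programmatically. The Python sketch below parses one mlucas.cfg timing line and compares the 1-thread 3328K result above with the 10.78 msec/iter figure posted earlier in the thread for the same FFT length, which appears to come from the multithreaded run. The regular expression is inferred from the lines shown in this thread rather than from any Mlucas format specification, and the function name is hypothetical.

```python
import re

# Pattern inferred from the cfg lines quoted in this thread (an assumption,
# not an official description of the mlucas.cfg format).
CFG_LINE = re.compile(
    r"^(?P<fft_k>\d+)\s+msec/iter\s*=\s*(?P<msec>[0-9.]+)\s+"
    r"ROE\[avg,max\]\s*=\s*\[\s*(?P<roe_avg>[0-9.]+),\s*(?P<roe_max>[0-9.]+)\]\s+"
    r"radices\s*=\s*(?P<radices>[0-9 ]+)$"
)


def parse_cfg_line(line: str) -> dict:
    """Parse one timing line into a dict; zero-padding radices are dropped."""
    m = CFG_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized cfg line: {line!r}")
    return {
        "fft_k": int(m["fft_k"]),
        "msec_per_iter": float(m["msec"]),
        "roe_avg": float(m["roe_avg"]),
        "roe_max": float(m["roe_max"]),
        "radices": [int(r) for r in m["radices"].split() if int(r) != 0],
    }


if __name__ == "__main__":
    one_thread = parse_cfg_line(
        "3328 msec/iter = 137.32 ROE[avg,max] = [0.242410714, 0.312500000] "
        "radices = 52 32 32 32 0 0 0 0 0 0"
    )
    earlier = parse_cfg_line(
        "3328 msec/iter = 10.78 ROE[avg,max] = [0.204429768, 0.281250000] "
        "radices = 208 32 16 16 0 0 0 0 0 0"
    )
    ratio = one_thread["msec_per_iter"] / earlier["msec_per_iter"]
    print(f"{one_thread['fft_k']}K: 1-thread is {ratio:.1f}x slower per iteration")
```

The two runs chose different radix sets, so the ratio is only a rough indication of multithreaded scaling.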