[QUOTE=kriesel;518526]I read through the commit listings back to mid January, and saw Preda had acknowledged there numerous contributions made by several individuals. A crude summary follows:[CODE]
valeriob01 -w argument; readme.md work; description of cmd line arguments & updates, display of parameters; primenet.py date & time; makefile fix
k3ack3r fix some msys2 warnings; update makefile
chengsun fix alignment violation causing OUT_OF_RESOURCES error on NVIDIA GPUs
sillygitter add -iters argument
gwoltman allow making small test kernels; new X2 definition; fft8 cleanup + documentation; new sq macro; overhaul/comment fft5/fft10 macros; improved pairSq and pairMul; faster 6m fft using new fft12 middle; new 5.5m fft using new fft11 middle; increased precision of fft11 constants; inline X2; fft7 middle step; shorter multiply chains in middle
[/CODE]Thanks to you all![/QUOTE] Thanks, though the readme.md work is incomplete. The argument listing has been growing and I haven't had time to follow. |
[QUOTE=kriesel;518470]Latest makefile seems to get the strip right on Windows, requires specifying the target as gpuowl-win.exe.[CODE]
$ make gpuowl-win.exe
cat head.txt gpuowl.cl tail.txt > gpuowl-wrap.cpp
echo \"`git describe --long --dirty --always`\" > version.new
diff -q -N version.new version.inc >/dev/null || mv version.new version.inc
echo Version: `cat version.inc`
Version: "v6.5-75-g4902439-dirty"
g++ -MT Pm1Plan.o -MMD -MP -MF .d/Pm1Plan.Td -Wall -O2 -std=c++17 -c -o Pm1Plan.o Pm1Plan.cpp
g++ -MT GmpUtil.o -MMD -MP -MF .d/GmpUtil.Td -Wall -O2 -std=c++17 -c -o GmpUtil.o GmpUtil.cpp
g++ -MT Worktodo.o -MMD -MP -MF .d/Worktodo.Td -Wall -O2 -std=c++17 -c -o Worktodo.o Worktodo.cpp
g++ -MT common.o -MMD -MP -MF .d/common.Td -Wall -O2 -std=c++17 -c -o common.o common.cpp
g++ -MT main.o -MMD -MP -MF .d/main.Td -Wall -O2 -std=c++17 -c -o main.o main.cpp
g++ -MT Gpu.o -MMD -MP -MF .d/Gpu.Td -Wall -O2 -std=c++17 -c -o Gpu.o Gpu.cpp
g++ -MT clwrap.o -MMD -MP -MF .d/clwrap.Td -Wall -O2 -std=c++17 -c -o clwrap.o clwrap.cpp
g++ -MT Task.o -MMD -MP -MF .d/Task.Td -Wall -O2 -std=c++17 -c -o Task.o Task.cpp
g++ -MT checkpoint.o -MMD -MP -MF .d/checkpoint.Td -Wall -O2 -std=c++17 -c -o checkpoint.o checkpoint.cpp
g++ -MT timeutil.o -MMD -MP -MF .d/timeutil.Td -Wall -O2 -std=c++17 -c -o timeutil.o timeutil.cpp
g++ -MT Args.o -MMD -MP -MF .d/Args.Td -Wall -O2 -std=c++17 -c -o Args.o Args.cpp
g++ -MT state.o -MMD -MP -MF .d/state.Td -Wall -O2 -std=c++17 -c -o state.o state.cpp
g++ -MT Signal.o -MMD -MP -MF .d/Signal.Td -Wall -O2 -std=c++17 -c -o Signal.o Signal.cpp
g++ -MT FFTConfig.o -MMD -MP -MF .d/FFTConfig.Td -Wall -O2 -std=c++17 -c -o FFTConfig.o FFTConfig.cpp
g++ -MT clpp.o -MMD -MP -MF .d/clpp.Td -Wall -O2 -std=c++17 -c -o clpp.o clpp.cpp
g++ -MT gpuowl-wrap.o -MMD -MP -MF .d/gpuowl-wrap.Td -Wall -O2 -std=c++17 -c -o gpuowl-wrap.o gpuowl-wrap.cpp
g++ -o gpuowl-win.exe Pm1Plan.o GmpUtil.o Worktodo.o common.o main.o Gpu.o clwrap.o Task.o checkpoint.o timeutil.o Args.o state.o Signal.o FFTConfig.o clpp.o gpuowl-wrap.o -lstdc++fs -lOpenCL -lgmp -pthread -L/opt/rocm/opencl/lib/x86_64 -L/opt/amdgpu-pro/lib/x86_64-linux-gnu -L/c/Windows/System32 -L. -static
strip gpuowl-win.exe
[/CODE]What does it mean that it's labeled dirty? Perhaps that the conversion to u32 is not complete? [CODE]
>gpuowl-win -prp 3321928097
2019-06-03 14:22:00 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:22:00 Exception St12out_of_range: stol
2019-06-03 14:22:00 Bye
>gpuowl-win -prp 2147483659
2019-06-03 14:28:16 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:28:17 Exception St12out_of_range: stol
2019-06-03 14:28:17 Bye
>gpuowl-win -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 gpuowl v6.5-75-g4902439-dirty
2019-06-03 14:29:52 Note: no config.txt file found
2019-06-03 14:29:52 config: -prp 2147483647 -use FMA_X2
2019-06-03 14:29:52 2147483647 FFT 147456K: Width 512x8, Height 256x8, Middle 9; 14.22 bits/word
2019-06-03 14:29:52 using long carry kernels
2019-06-03 14:30:00 OpenCL args "-DEXP=2147483647u -DWIDTH=4096u -DSMALL_HEIGHT=2048u -DMIDDLE=9u -DWEIGHT_STEP=0xd.b745787f2c4cp-3 -DIWEIGHT_STEP=0x9.550d2c9e837e8p-4 -DWEIGHT_BIGSTEP=0x8.b95c1e3ea8bd8p-3 -DIWEIGHT_BIGSTEP=0xe.ac0c6e7dd2438p-4 -DFMA_X2=1 -DFMA_X2=1 -I. -cl-fast-relaxed-math -cl-std=CL2.0"
2019-06-03 14:30:04 OpenCL compilation in 4704 ms
2019-06-03 14:30:28 2147483647.owl not found, starting from the beginning.
2019-06-03 14:42:03 2147483647 OK 2000 0.00%; 162.835 ms/sq; ETA 4047d 06:30; fb12c8169932aa03 (check 172.72s)
[/CODE](Above was on an RX480. 2147483647 < 2[SUP]31[/SUP] < 2147483659; log10(2[SUP]3321928097[/SUP]-1) > 10[SUP]9[/SUP])[/QUOTE] The -dirty version tag means you have local modifications. You can drop them with "git stash" before upgrading. |
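The `Exception St12out_of_range: stol` lines in the quoted log are consistent with `std::stol` returning a `long`, which is 32 bits wide on Windows toolchains such as MinGW/MSYS2, so any exponent above 2[SUP]31[/SUP]-1 = 2147483647 fails at parse time. The following is only a sketch of one way around that, not GpuOwl's actual code: `parseExponent` is a hypothetical helper that parses through `std::stoull` (at least 64 bits on every platform) and then range-checks against `uint32_t`.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>
#include <string>

// Hypothetical helper (not part of GpuOwl): parse a decimal exponent into a u32
// without depending on how wide `long` is on the current platform.
std::uint32_t parseExponent(const std::string& text) {
  // Reject a leading '-' explicitly: std::stoull would otherwise wrap negative input.
  if (text.empty() || text[0] == '-') {
    throw std::invalid_argument("exponent must be a non-negative integer: " + text);
  }
  std::size_t used = 0;
  const unsigned long long value = std::stoull(text, &used, 10);
  if (used != text.size()) {
    throw std::invalid_argument("trailing characters in exponent: " + text);
  }
  if (value > std::numeric_limits<std::uint32_t>::max()) {
    throw std::out_of_range("exponent does not fit in 32 bits: " + text);
  }
  return static_cast<std::uint32_t>(value);
}
```

With this sketch both 2147483659 and 3321928097 parse, since each is below 2[SUP]32[/SUP]. Whether an FFT size exists for such exponents is a separate question that the parser does not answer.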
After a recent commit, it seems it's now fine to upgrade to ROCm 2.5 (which will be released soon). I.e. the performance degradation that appeared in ROCm 2.3 has been worked around in GpuOwl.
|
[QUOTE=preda;518606]After a recent commit, it seems it's now fine to upgrade to ROCm 2.5 (which will be released soon). I.e. the performance degradation that appeared in ROCm 2.3 has been worked around in GpuOwl.[/QUOTE]
332299993 is not prime, with ROCm 2.4: [CODE]
2019-06-06 17:13:37 RadeonVII 332299993 332290000 100.00%; 3447 us/sq; ETA 0d 00:01; f6bea554d5dd44f0
2019-06-06 17:14:12 RadeonVII CC 332299992 / 332299993, e7ad0dddd78cd94c
2019-06-06 17:14:14 RadeonVII 332299993 OK 332300000 100.00%; 3500 us/sq; ETA 0d 00:00; 5254b6ede6bf9ca1 (check 1.86s)
[/CODE] [URL]https://www.mersenne.org/report_exponent/?exp_lo=332299993&full=1[/URL] |
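For a rough sense of scale, the per-squaring times in these logs convert to total run time by multiplying by the number of squarings, which for a PRP test is about the exponent itself. This is a back-of-the-envelope sketch; `estimatedDays` is a hypothetical helper, and it assumes a steady iteration time with no check overhead or restarts.

```cpp
#include <cstdint>

// Hypothetical helper: wall-clock days needed for `squarings` modular squarings
// at a constant `microsPerSquaring` (ignores error-check overhead and restarts).
double estimatedDays(std::uint64_t squarings, double microsPerSquaring) {
  const double seconds = static_cast<double>(squarings) * microsPerSquaring * 1e-6;
  return seconds / 86400.0;  // 86400 seconds per day
}
```

At 3447 us/sq, the 332299993 test above works out to a little over 13 days on the Radeon VII. The 162.835 ms/sq reported for 2147483647 on the RX480 works out to roughly 4047 days, in line with the ETA printed in that log.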
I'm a bit of a GPU computing newbie, but was hoping to get some use out of a VERY old GPU I had laying around -- it's an ATI Radeon 4650 HD.
I was able to get mfakto up and running and generate about 25-30 GHz-day/day output but it's slowing my Prime95 output a bit, so on pause for now. I was wondering if there is any way to get gpuOwL running and attempt a PRP test on this old card? It seems to support OpenCL 1.1 so not sure if this meets the minimum specs. Any help would be greatly appreciated! |
[QUOTE=mnd9;519452]I'm a bit of a GPU computing newbie, but was hoping to get some use out of a VERY old GPU I had laying around -- it's an ATI Radeon 4650 HD.
I was able to get mfakto up and running and generate about 25-30 GHz-day/day output but it's slowing my Prime95 output a bit, so on pause for now. I was wondering if there is any way to get gpuOwL running and attempt a PRP test on this old card? It seems to support OpenCL 1.1 so not sure if this meets the minimum specs. Any help would be greatly appreciated![/QUOTE] IIRC, the minimum required for gpuOwL was OpenCL 1.2. |
[QUOTE=ET_;519459]IIRC, the minimum required for gpuOwL was OpenCL 1.2.[/QUOTE]
I see -- is there another software package to implement LL/PRP on AMD GPUs or am I only able to do TF work on this card? |
[QUOTE=mnd9;519462]I see -- is there another software package to implement LL/PRP on AMD GPUs or am I only able to do TF work on this card?[/QUOTE]
See [URL]http://www.mersenneforum.org/showpost.php?p=488291&postcount=2[/URL] [URL]https://www.mersenneforum.org/showthread.php?t=23401[/URL] and [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL] You could try cllucas, but your gpu is old and slow, and cllucas is about half the speed of gpuowl for the same hardware and parameters, and lacks the Gerbicz error check or Jacobi LL check. The same 86M primality test that would take 3.8 days on an RX480 with gpuowl may (if it successfully runs) take around 4.5 months on your gpu. I never did get an answer to [URL]https://mersenneforum.org/showpost.php?p=463096&postcount=425[/URL] re what OpenCL level cllucas requires. Maybe this is the justification you're looking for to upgrade your gpu. Either way, try tuning your mfakto.ini gpu sieving parameters. Welcome to the hunt. |
[QUOTE=kriesel;519470]See [URL]http://www.mersenneforum.org/showpost.php?p=488291&postcount=2[/URL]
[URL]https://www.mersenneforum.org/showthread.php?t=23401[/URL] and [URL]https://www.mersenneforum.org/showpost.php?p=488535&postcount=2[/URL] You could try cllucas, but your gpu is old and slow, and cllucas is about half the speed of gpuowl for the same hardware and parameters, and lacks the Gerbicz error check or Jacobi LL check. The same 86M primality test that would take 3.8 days on an RX480 with gpuowl may (if it successfully runs) take around 4.5 months on your gpu. I never did get an answer to [URL]https://mersenneforum.org/showpost.php?p=463096&postcount=425[/URL] re what OpenCL level cllucas requires. Maybe this is the justification you're looking for to upgrade your gpu. Either way, try tuning your mfakto.ini gpu sieving parameters. Welcome to the hunt.[/QUOTE] Thanks so much for your detailed response -- I may put GPU computing on pause for now until I get a better card, as you suggested, since it's hurting my Prime95 throughput for not much gain.
On a somewhat unrelated topic, I recently discovered a very basic mistake I had made when building my PC years back; correcting it literally doubled my Prime95 output. I had erroneously installed my RAM DIMMs in single channel mode by placing them in adjacent slots. When I moved the sticks to slots #1 and #3, my mobo instantly switched to dual channel mode, and my ms/iter literally halved!! Apparently memory bandwidth was really bottlenecking my throughput.
Does anyone have a sense of whether adding additional RAM always contributes to Prime95 throughput? E.g. I currently have 2 x 8GB DDR3 memory @ 1333 MHz -- would adding another 2 sticks help? Is it safe to assume memory bandwidth is always limiting? Is there an easy way to test whether my current limiting factor for throughput is CPU or memory bandwidth? |
[QUOTE=mnd9;519506]...
Does anyone have a sense of whether adding additional RAM always contributes to Prime95 throughput? E.g. I currently have 2 x 8GB DDR3 memory @ 1333 MHz -- would adding another 2 sticks help? Is it safe to assume memory bandwidth is always limiting? Is there an easy way to test whether my current limiting factor for throughput is CPU or memory bandwidth?[/QUOTE] Adding more RAM will not help in a meaningful way (maybe single-digit percent improvements, if anything); swapping to faster RAM like DDR3-1600 should. What doubled your speed was going from single channel to dual channel, and desktop motherboards top out at dual channel. HEDT platforms range from triple to hex channel, and server platforms go from quad to eight channel. Intel has a 12-channel platform, but it costs more than my car and is more like dual hex channel anyway. Memory bandwidth isn't always limiting, but for any modern Intel desktop quad core or better it's probably pushing the limits of whatever speeds a motherboard of that era supports. |
[QUOTE=mnd9;519506]Thanks so much for your detailed response -- I may put GPU computing on pause for now until I get a better card, as you suggested, since it's hurting my Prime95 throughput for not much gain.[/QUOTE]mfakto on a discrete gpu, and configured for gpu sieving, should have very little effect on prime95 throughput. Check whether your mfakto.ini is configured to use CPU sieving instead of GPU sieving of trial factor candidates. Also check whether all your cooling fans are working and system ventilation is adequate.
|