[QUOTE=kriesel;536893]CUDALucas still has its place;
faster on a few gpu models than gpuowl; will run on older NVIDIA gpus that are entirely incapable of running gpuowl because they don't support the OpenCL level gpuowl requires; relatively current gpuowl versions don't do LL so can't do LLDC (although v0.5 and v0.6 gpuowl can with 4M fft). It would be great if CUDALucas had the Jacobi check.[/QUOTE] Ha, ha, re. your note about running on older nVidia GPUs: in preparation for my recent upgrade of my deskside Haswell system to put a Radeon 7 in the PCI3 slot, I first removed an ancient gtx430 card from the PCI2 slot. Mike/Xyzzy had gifted me that ~5 years ago to play with CUDA development work. I actually got as far as working TF code, but never did get the sieving stuff optimized for the nVidia architecture, so it was spending way more time there than it should, and overall speed was about 1/2 that of mfaktc. Anyhow, I still have the card and could re-install it in PCI2. That would leave a mere 1" gap between it and the underbelly fan array of the R7, so I would need to make sure it wasn't significantly impeding airflow to the latter. Just curious - do you have any sense how fast - and I use the term very loosely :) - this card would be at LL and TF? Probably not worth it on a work-per-watt basis, but the experiment might be useful for seeing whether *some* kind of GPU - e.g. a newer used model of one of the ones known to be good choices for GIMPS work - could go into PCI2 without hurting the R7 throughput. (A second R7 is not an option even if it could be adapted to go into PCI2: my PS has only 2 8-pin power connectors, and the current R7 uses them both.) |
[QUOTE=ewmayer;536909]I first removed an ancient gtx430 ... do you have any sense how fast - and I use the term very loosely :) - this card would be at LL and TF? Probably not worth it on a work-per-watt basis[/QUOTE]Probably slower than any of these:
[URL]https://www.mersenne.ca/mfaktc.php[/URL] (enter gtx 4 in the model search box) [URL]https://www.mersenne.ca/cudalucas.php[/URL] (ditto) If you want a more sure answer, find the NVIDIA spec pages for the GTX430 and one or more of the models listed on Heinrich's pages, and compute an estimate by proportion. Since my test GTX480 could not run gpuowl, the GTX430's chances are slim. It would look pretty weak compared to a Radeon VII. Or in mfaktc, compared to a GTX 1xxx or RTX. Try it on. Send James some benchmarks. Understandable if you don't regard its space-heater value as sufficient to run for long, in California. |
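The "estimate by proportion" suggestion can be sketched as a one-line scaling rule. This is only a rough illustration: the function name and the idea of scaling by a single spec figure (for example a GFLOPS rating taken from a spec page) are assumptions, and the numbers in the usage line are arbitrary placeholders, not measurements of any card.

```python
def estimate_by_proportion(ref_throughput: float, ref_spec: float, target_spec: float) -> float:
    """Scale a benchmarked card's throughput by the ratio of one spec figure.

    ref_throughput: measured rate of a card listed on the benchmark page
    ref_spec:       that card's spec figure (same unit as target_spec)
    target_spec:    the unlisted card's spec figure
    """
    if ref_spec <= 0:
        raise ValueError("reference spec must be positive")
    return ref_throughput * target_spec / ref_spec


# Placeholder numbers only: a reference card rated 4.0 units that benchmarks
# at 100.0 suggests roughly 25.0 for a card rated 1.0 units.
print(estimate_by_proportion(100.0, 4.0, 1.0))  # 25.0
```

Real throughput also depends on architecture and memory bandwidth, so an actual benchmark is still the better answer.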
gpuowl-win build 6.11-142-gf54af2e
Stumbled across the new commit while bringing up a new Colab account. Haven't run it other than -h. I defer to Mihai as to what this offers beyond v6.11-134.
It still eats a whole cpu core on Windows 10 during P-1 with -yield option included. At least for a while after startup. |
10M test p-1 reliably missed known factor
The first run of the 10M exponent was with optimized -use flags on gpuowl-win v6.11-134. After that it was just -use NO_ASM. Different -use, different bounds, never found the known factor although all the bounds tried should have been adequate. The 50M test p-1 would not run with the numerous -use options previously in place for a different fft length, so those options were temporarily removed.
The same Radeon VII gpu has run error free in PRP GEC since a clock reduction Dec 15 2019. (1250 MHz gpu, 880 MHz memory; runs P-1 at 70-100W; hot spot currently 71C) It also missed the known factor on 4444091. And missed the known factors of 2000081. And that of 15000031. Everything I tried below 20M failed. [CODE]{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:17:56 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000}
{"exponent":"50001781", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:31:40 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":2883584, "B1":100000, "B2":5000000, "factors":["4392938042637898431087689"]}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:33:03 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000}
{"exponent":"24000577", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:43:41 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1310720, "B1":300000, "factors":["13504596665207"]}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:46:42 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":40000, "B2":600000}
{"exponent":"10000831", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-09 23:48:38 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":120000, "B2":2200000}
{"exponent":"4444091", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:02:00 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":229376, "B1":15015, "B2":90000}
{"exponent":"61012769", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:05:19 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":3670016, "B1":20000, "B2":2000000, "factors":["2018028590362685212673"]}
{"exponent":"2000081", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:19:58 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "B2":300300}
{"exponent":"2000081", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:24:03 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "B2":30030}
{"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 00:55:47 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}
{"exponent":"20000023", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 01:05:52 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1179648, "B1":240000, "factors":["60100040564410724460091241"]}[/CODE] |
[QUOTE=preda;533571]ROCm exposes a per-GPU unique_id, e.g.:
[CODE]cat /sys/class/drm/card0/device/unique_id
3044212172dc768c[/CODE]This id is a property of the GPU itself, and does not depend on the system or PCIe slot. So moving a GPU to a different slot, or to a different system, preserves the UID. I added a way to specify the GPU to run on by using this unique id: ./gpuowl -uid 3044212172dc768c This can be used instead of -device (-d), which specifies the device by position in the list of devices. The advantage is that the identity of the GPU is preserved when swapping PCIe slots. Combining -uid with -cpu makes it possible to associate a stable symbolic name with an actual GPU. [/QUOTE] The Windows driver does not support this, yielding an empty id:[CODE]2020-02-09 19:00:02 roa/radeonvii Bye
2020-02-09 19:00:06 config: -device 1 -user kriesel -cpu roa/radeonvii -yield -maxAlloc 16000 -use NO_ASM
2020-02-09 19:00:06 device 1, unique id ''[/CODE] |
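For launcher scripts that want to cope with both situations (a ROCm system that exposes unique_id, and a driver that reports nothing), here is a minimal sketch. It assumes only the sysfs path quoted above and the -uid / -device options mentioned in the post; the helper names `read_gpu_uid` and `device_args` are made up for illustration and are not part of gpuowl.

```python
from pathlib import Path
from typing import List, Optional


def read_gpu_uid(card: str = "card0", sysfs_root: str = "/sys/class/drm") -> Optional[str]:
    """Return the GPU unique id from sysfs, or None if it is missing or empty."""
    path = Path(sysfs_root) / card / "device" / "unique_id"
    try:
        uid = path.read_text().strip()
    except OSError:
        # File absent or unreadable (e.g. no ROCm, or not Linux at all).
        return None
    return uid or None


def device_args(uid: Optional[str], fallback_index: int) -> List[str]:
    """Prefer -uid when an id is available, else fall back to positional -device."""
    if uid:
        return ["-uid", uid]
    return ["-device", str(fallback_index)]


print(device_args(read_gpu_uid(), 0))
```

On a system without the sysfs entry this falls back to `['-device', '0']`, which matches the behaviour seen on Windows where the id comes back empty.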
Thank you for the bug report! investigating..
The first failed case, 10000831, is fixed by using "-use ORIG_SLOWTRIG". Could you please check whether there are any failures with -use ORIG_SLOWTRIG? In parallel we'll be looking for a better fix. A faster way to repro the problem is e.g. "gpuowl -prp 10000831", which fails GEC. Note: if re-running the P-1s, be sure to delete the savefiles from the previous runs (or run in a new location). |
@Ken An attempted fix is in the most recent commit [url]https://github.com/preda/gpuowl/commit/6146b6d49716011d0340f5a670653c12ef4f417c[/url]
, which is supposed to fix the issues without requiring -use ORIG_SLOWTRIG |
[QUOTE=preda;537214]@Ken An attempted fix is in the most recent commit [URL]https://github.com/preda/gpuowl/commit/6146b6d49716011d0340f5a670653c12ef4f417c[/URL], which is supposed to fix the issues without requiring -use ORIG_SLOWTRIG[/QUOTE] Thanks for the quick action on this. The issue extends to at least 19M. [CODE]{"exponent":"18000137", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 17:58:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":220000, "B2":5060000}
{"exponent":"19000013", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 19:03:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":230000, "B2":9520000}[/CODE]Will try a new commit later. |
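When many of these result lines pile up, a small script can list which exponents never produced a factor. The following is a sketch, assuming one JSON object per line with the "exponent" and "status" fields seen in the results above; the function names are made up, and the sample records are abbreviated copies of lines from this thread.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List


def summarize(lines: Iterable[str]) -> Dict[int, List[str]]:
    """Group result records by exponent, collecting the status of each run."""
    by_exponent = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        rec = json.loads(line)
        by_exponent[int(rec["exponent"])].append(rec["status"])
    return dict(by_exponent)


def missed(summary: Dict[int, List[str]]) -> List[int]:
    """Exponents for which no run reported status 'F' (factor found)."""
    return sorted(p for p, statuses in summary.items() if "F" not in statuses)


# Abbreviated records (fields trimmed) taken from the results in this thread.
sample = [
    '{"exponent":"18000137", "worktype":"PM1", "status":"NF", "B1":220000, "B2":5060000}',
    '{"exponent":"19000013", "worktype":"PM1", "status":"NF", "B1":230000, "B2":9520000}',
    '{"exponent":"20000023", "worktype":"PM1", "status":"F", "B1":240000}',
]
print(missed(summarize(sample)))  # [18000137, 19000013]
```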
Hi Ken, running with -use ORIG_SLOWTRIG I find factors (in stage1 already) for both 18000137 and 19000013. Do you still see a problem with -use ORIG_SLOWTRIG? |
[QUOTE=preda;537256]Hi Ken, running with -use ORIG_SLOWTRIG I find factors (in stage1 already) for both 18000137 and 19000013. Do you still see a problem with -use ORIG_SLOWTRIG?[/QUOTE]Win10 x64, gpuowl v6.11-134
In config.txt:[CODE]-device 1 -user kriesel -cpu roa/radeonvii -yield -maxAlloc 16000 -use NO_ASM,ORIG_SLOWTRIG[/CODE]In worktodo:[CODE]B1=15015,B2=300300;PFactor=0,1,2,2000081,-1,61,2
B1=3000;PFactor=0,1,2,4444091,-1,64,2
B1=30000,B2=500000;PFactor=0,1,2,10000831,-1,70,2
B1=180000,B2=3780000;PFactor=0,1,2,15000031,-1,66,2
B1=220000,B2=5060000;PFactor=0,1,2,18000137,-1,35,2
B1=230000,B2=9520000;PFactor=0,1,2,19000013,-1,53,2[/CODE]Results (15M missed factor): [CODE]{"exponent":"2000081", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:25:34 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":131072, "B1":15015, "factors":["2700109974025273"]}
{"exponent":"4444091", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:25:47 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":229376, "B1":15015, "factors":["1809798096458971047321927127"]}
{"exponent":"10000831", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:26:15 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":524288, "B1":30000, "B2":500000, "factors":["646560662529991467527"]}
{"exponent":"15000031", "worktype":"PM1", "status":"NF", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:30:07 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":786432, "B1":180000, "B2":3780000}
{"exponent":"18000137", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:32:35 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":220000, "factors":["2479169845866581244380961527"]}
{"exponent":"19000013", "worktype":"PM1", "status":"F", "program":{"name":"gpuowl", "version":"v6.11-134-g1e0ce1d"}, "timestamp":"2020-02-10 21:35:57 UTC", "user":"kriesel", "computer":"roa/radeonvii", "fft-length":1048576, "B1":230000, "factors":["4674003199"]}[/CODE]I don't know why, but the 768k fft for 15M took 444us/it in P1 while the 1024k for 18M, 19M took 310us/it in P1. I reran the 15M with a 3980000 B2 but it still missed. |
I ran [CODE]B1=180000,B2=3780000;PFactor=0,1,2,15000031,-1,66,2[/CODE] with a freshly checked-out master, with and without ORIG_SLOWTRIG, as well as with an old version (predating the new sin/cos code), and none of those found a factor. So there may be some other issue. I'll keep checking. |
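Whether a given set of bounds "should have been adequate" for a known factor can be checked offline. Any factor f of 2^p - 1 has the form f = 2kp + 1, and P-1 finds f in stage 1 when k is B1-smooth, or in stage 2 when k has exactly one prime between B1 and B2. The sketch below applies that rule to the 19000013 factor reported above, where k = 123 = 3 * 41. The function names are made up, the prime-power handling is a simplified model rather than gpuowl's exact behaviour, and the trial-division factoring is only practical when k is small or smooth.

```python
from typing import Dict, Optional


def divides_mersenne(p: int, f: int) -> bool:
    """True if f divides 2**p - 1."""
    return pow(2, p, f) == 1


def cofactor_k(p: int, f: int) -> int:
    """Return k such that f = 2*k*p + 1, or raise if f lacks that form."""
    k, r = divmod(f - 1, 2 * p)
    if r != 0:
        raise ValueError("f is not of the form 2*k*p + 1")
    return k


def factorize(n: int) -> Dict[int, int]:
    """Trial division: {prime: exponent}. Slow if n has a large prime factor."""
    out: Dict[int, int] = {}
    d = 2
    while d * d <= n:
        while n % d == 0:
            out[d] = out.get(d, 0) + 1
            n //= d
        d += 1
    if n > 1:
        out[n] = out.get(n, 0) + 1
    return out


def pm1_stage(k: int, b1: int, b2: Optional[int] = None) -> Optional[str]:
    """Which stage should find the factor under the simplified smoothness model."""
    fac = factorize(k)
    if not fac:
        return "stage 1"  # k == 1
    largest = max(fac)
    if any(q ** e > b1 for q, e in fac.items() if q != largest):
        return None
    if largest ** fac[largest] <= b1:
        return "stage 1"
    if b2 is not None and fac[largest] == 1 and largest <= b2:
        return "stage 2"
    return None


p, f = 19000013, 4674003199
k = cofactor_k(p, f)
print(divides_mersenne(p, f), k, factorize(k), pm1_stage(k, b1=230000))
```

Running the same check on the known factor of 15000031 would show whether B1=180000, B2=3780000 really covers it, which helps separate a bounds problem from a computation error.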