mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GpuOwl (https://www.mersenneforum.org/forumdisplay.php?f=171)
-   -   gpuOwL: an OpenCL program for Mersenne primality testing (https://www.mersenneforum.org/showthread.php?t=22204)

paulunderwood 2020-05-05 04:16

different fft sizes
 
I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?

Prime95 2020-05-05 04:41

[QUOTE=paulunderwood;544616]I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?[/QUOTE]

One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.
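As a back-of-envelope sanity check (my own arithmetic, not from the post): that limit puts the bit density at roughly 18.5 bits per FFT word, which is in the usual range for double-precision floating-point FFT multiplication:

[CODE]# Rough bits-per-word check for a 106.6M exponent at the 5.5M (5632K) FFT length.
# The ~18.5 bits/word ceiling is a common rule of thumb for DP FFTs, not a gpuowl spec.
p = 106_600_000          # exponent limit quoted above
n = 5632 * 1024          # 5.5M FFT length in words
print(f"{p / n:.2f} bits per word")
[/CODE]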

paulunderwood 2020-05-05 06:37

[QUOTE=Prime95;544617]One would presume so. BTW, the latest commit supports exponents up to 106.6M in the 5.5M FFT.[/QUOTE]

[CODE]./gpuowl
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
[/CODE]

I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?

paulunderwood 2020-05-05 07:36

[QUOTE=paulunderwood;544620][CODE]./gpuowl
./gpuowl: /usr/lib/x86_64-linux-gnu/libstdc++.so.6: version `GLIBCXX_3.4.26' not found (required by ./gpuowl)
[/CODE]

I don't have GLIBCXX_3.4.26 on my Debian Buster -- is there a work-around?[/QUOTE]

I have two compilers. I think I installed gcc 9 manually for building from source, and 8 is the native one. Anyway, for my purposes I hard-wired g++-8 into the makefile and all is hunky-dory now.
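For reference, the workaround amounts to pointing the build at the distribution's own compiler, whose libstdc++ matches the system one. A minimal sketch of the makefile change (the variable name is the usual Make convention; check the actual gpuowl Makefile):

[CODE]# Use the native g++ 8 (whose libstdc++ the system provides)
# instead of a manually installed gcc 9.
CXX = g++-8
[/CODE]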

kriesel 2020-05-05 18:50

[QUOTE=kriesel;544376]Quick update / recap on that[/QUOTE]
For what it's worth, this matching LL DC with shift 0 was performed with gpuowl v6.11-264 on the RX 480 in the same system that was previously producing frequent GEC errors on RX 550s.
[url]https://www.mersenne.org/report_exponent/?exp_lo=55487503&full=1[/url]

ewmayer 2020-05-05 20:24

[QUOTE=paulunderwood;544616]I have been running two instances at 5632K. Now I have a new set of assignments at 6144K. Running two different FFT sizes has two effects: the smaller runs faster and the larger runs much slower. It is more efficient to run one instance. Will this imbalance be redressed when I have equal 6144K assignments?[/QUOTE]

Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: two jobs at the same FFT length have different run data (e.g. DWT weights) but the same memory profile and GPU resource usage.

[b]Edit:[/b] I tried the above three 2-jobs scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.
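For what it's worth, those numbers can be checked against the n*log(n) model directly; a quick sanity-check script of my own (not part of the run):

[CODE]import math

def cost(n):
    # relative per-iteration cost of an autosquaring at FFT length n
    return n * math.log(n)

predicted = cost(6144 * 1024) / cost(5632 * 1024)   # model: ~1.097x
observed = 1615 / 1470                               # case 3 vs case 1 timings above
print(f"predicted slowdown {predicted:.3f}x, observed {observed:.3f}x")

# Throughput for two jobs at t us/iter each is 2e6/t iter/sec:
print(int(2e6 / 1470), "iter/sec at 5632K,", int(2e6 / 1615), "iter/sec at 6144K")
[/CODE]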

paulunderwood 2020-05-05 21:59

[QUOTE=ewmayer;544664]Paul, what expo(s) are you running that need 6144K? I'd be interested to see your per-job timings for the following 2-job setups:

1. Both @5632K;
2. One each @5632K and @6144K;
3. Both @6144K.

Presumably you already know the timing for the first 2 ... you could temporarily move one of your queued-up 6144K assignments to top of the worktodo file for the current 5632K run to get both @6144K.

If the slowdown for 2 compared to 1 and 3 really is as bad as you describe, I wonder if it's something to do with context-switching on the GPU between tasks that have different memory mappings: 2 jobs at same FFT length have different run data and e.g. DWT weights but have the same memory profile and GPU resources usage.

[b]Edit:[/b] I tried the above three 2-jobs scenarios on my own Radeon7, using expos ~107M to trigger the 6M FFT length. Here are the per-iteration timings:

1. Both @5632K: 1470 us/iter for each, total throughput 1360 iter/sec;
2. One each @5632K,@6144K: 1530,1546 us/iter resp., total throughput 1300 iter/sec;
3. Both @6144K: 1615 us/iter for each, total throughput 1238 iter/sec.

So no anomalous slowdowns for me at any of these combos, and the per-iteration timings hew very closely to what one would expect based on an n*log(n) per-autosquaring scaling.[/QUOTE]

1. Both @5632K ---> 1489us/it each
2. One each @5632K and @6144K ---> the latter was ~2300us/it (very slow); the former 1125us/it

At the moment (with the latest commit) they are running at ~1200us/it (103.9M) and ~1800us/it (104.9M). They were running at the average earlier, until I restarted them.

It is my last 103.9M exponent.

paulunderwood 2020-05-07 05:49

Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.

xx005fs 2020-05-07 16:56

PM1 Result not understood
 
It seems that PrimeNet is unable to understand factored P-1 results from gpuowl.

[CODE]{"status":"F", "exponent":"98141611", "worktype":"PM1", "B1":"750000", "B2":"15000000", "fft-length":"5767168", "factors":"["****"]", "program":{"name":"gpuowl", "version":"v6.11-258-gb92cdfd"}, "computer":"TITAN V-0", "aid":"******", "timestamp":"2020-05-06 07:29:29 UTC"}[/CODE]
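A likely culprit (my reading, not confirmed against the PrimeNet parser): the "factors" field wraps a JSON array inside a quoted string, which makes the whole line invalid JSON. A quick illustration with a made-up factor value in place of the redacted one:

[CODE]import json

# The reported line nests an array inside a quoted string (factor value made up):
bad = '{"factors":"["12345"]"}'
try:
    json.loads(bad)
    parsed = True
except json.JSONDecodeError:
    parsed = False
print("quoted-array form parses:", parsed)

# With the field emitted as a real array, it parses fine:
good = '{"factors":["12345"]}'
print("array form factors:", json.loads(good)["factors"])
[/CODE]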

S485122 2020-05-07 18:23

[QUOTE=paulunderwood;544778]Was ~1200us/it (103.9M) and ~1800us/it (104.9M).

Now 1440us/it each -- both at 104.9M.[/QUOTE]"us" ? Usually it is capitalised as "US", but it is not a unit (AFAIK.) Or do you (and preceding posters) mean µs ?

Jacob

paulunderwood 2020-05-07 18:35

[QUOTE=S485122;544825]"us" ? Usually it is capitalised as "US", but it is not a unit (AFAIK.) Or do you (and preceding posters) mean µs ?

Jacob[/QUOTE]

Yes, I meant µs. But how do I generate mu with the keyboard easily?

nvm: I found [URL="https://www.maketecheasier.com/quickly-type-special-characters-linux/"]this on how to do it in Gnome[/URL] without having to remember and use unicodes. Thanks for prompting me!

