[QUOTE=SELROC;515840]Experimenting with -block sizes for a 332M exponent:
1. The GEC time with block 400 is ~2.11 sec. 2. The GEC time with block 1000 is ~4.25 sec. The GEC time varies with block size.[/QUOTE] Yes, because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2 = 160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2 = 1M iterations.
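The arithmetic described above can be sketched in a few lines (plain Python, not GpuOwl code): with block size B, a check adds roughly B extra iterations and runs every B^2 iterations, so the steady-state check overhead is about 1/B.

```python
# Sketch of the GEC scheduling arithmetic above (not GpuOwl code).
# With block size B, a check adds ~B extra iterations and runs every B^2
# iterations, so the steady-state overhead fraction is B / B^2 = 1/B.

def check_interval(block):
    """Iterations between consecutive Gerbicz checks."""
    return block * block

def overhead_fraction(block):
    """Fraction of total work spent on the checks themselves."""
    return block / check_interval(block)  # == 1 / block

for block in (400, 1000, 2000):
    print(f"block={block}: check every {check_interval(block):,} iterations, "
          f"overhead ~{overhead_fraction(block):.3%}")
```

This also shows why the per-check wall time grows linearly with the block size, consistent with the ~2.11 s vs ~4.25 s timings quoted above.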
[QUOTE=preda;515866]Yes, because a check involves doing "block-size" additional iterations. E.g. with block=400, 400 additional iterations are done every 400^2 = 160K iterations, while with block=1000, 1000 additional iterations are done every 1000^2 = 1M iterations.[/QUOTE]
That is, if I use a block of 2000, the GEC should take ~8-9 seconds. There is a trade-off to apply here: do I want long, infrequent checks or short, frequent checks? When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because it falls in between.
[QUOTE=SELROC;515914]That is, if I use a block of 2000, the GEC should take ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks? When using a block of 1000 and a log of 20000, sometimes gpuowl fails to display the OK... (check ...) output, probably because it falls in between.[/QUOTE] The check is done at block-size^2 (squared). So if block=1000, the check is done every 1M. With a log of 20000, you will hit every 1M. But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)
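The alignment question above comes down to simple arithmetic. A hedged sketch (not GpuOwl code) that counts how many check points, which fall at multiples of block^2, land exactly on a multiple of the -log interval:

```python
# Counting how often a Gerbicz check point (a multiple of block^2) lands
# exactly on a -log multiple. Plain arithmetic, not GpuOwl code.

def checks_hitting_log(block, log_interval, total_iters):
    """Return (checks that coincide with a log point, total checks)."""
    step = block * block
    checks = range(step, total_iters + 1, step)
    hits = sum(1 for it in checks if it % log_interval == 0)
    return hits, len(checks)

# block=1000, log=20000: every check (multiple of 1M) is also a log multiple.
print(checks_hitting_log(1000, 20_000, 10_000_000))   # -> (10, 10)
# block=400, log=100K: checks every 160K align only at multiples of 800K.
print(checks_hitting_log(400, 100_000, 10_000_000))   # -> (12, 62)
```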
[QUOTE=preda;515923]The check is done at block-size^2 (squared). So if block=1000, the check is done every 1M. With a log of 20000, you will hit every 1M.
But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)[/QUOTE] OK, so if you confirm that the check is displayed, I may have missed it. Experimenting further...
[QUOTE=preda;515923]The check is done at block-size^2 (squared). So if block=1000, the check is done every 1M. With a log of 20000, you will hit every 1M.
But what happens, for example, with a block of 400 and a log of 100K? The check will be done every 160K, and will be displayed correctly even if it doesn't hit a 'log' multiple of 100K. (At least that's the plan.)[/QUOTE] PS: The Mersenne number [url]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/url] is composite. The computation took about 15 days 10 hours on a Radeon VII.
[QUOTE=SELROC;515931]PS: The Mersenne number [url]https://www.mersenne.org/report_exponent/?exp_lo=332252533&full=1[/url] is composite.
The computation took about 15 days 10 hours on a Radeon VII.[/QUOTE] Impressive speed. IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?
[QUOTE=Prime95;515935]Impressive speed.
IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?[/QUOTE] I am currently using ROCm 2.3, which has a performance regression. I bet that with ROCm 2.4 (if they fix the issue) the ETA for 332M will be around 13-14 days.
[QUOTE=Prime95;515935]Impressive speed.
IMO, far too little P-1 factoring was done. Were these bounds chosen to be optimal for all-Radeon testing? Might this indicate that prime95 should be used for P-1 prior to GPU PRP testing?[/QUOTE] At this time GpuOwl supports P-1, so should I do P-1 before PRP?
[QUOTE=SELROC;515914]That is, if I use a block of 2000, the GEC should take ~8-9 seconds. There is a trade-off to apply here.
Do I want long, infrequent checks or short, frequent checks?[/QUOTE] Trade-off is a good word, because a larger block size L means more work at rollbacks, since you need to redo up to L^2 iterations. For a card/CPU that isn't very faulty it isn't easy to choose the best L, but for example, if you have 0.2 rollbacks per exponent (so basically one per 5 tests) at p ~ 1e8, then your optimal value is L = 1000. Interestingly, the exact formula is L = (2*p/#rollback)^(1/3), where #rollback is the average number of rollbacks per test (this could even be higher than 1 for a faulty card). PS: Don't choose L > sqrt(p), because then you'd never make a check, though this also depends on your implementation.
[QUOTE=SELROC;515937]At this time GpuOwl supports P-1, so do I better do p-1 before PRP ?[/QUOTE]
Yes, P-1 is highly recommended. It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the GPU do the PRP. Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds. IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary. For reference, prime95 would use these bounds (assuming 1.8GB of memory):
Saving 1 LL/PRP test: B1=1,115,000, B2=14,495,000
Saving 2 LL/PRP tests: B1=2,395,000, B2=35,326,000
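For background on why the B1/B2 bounds above matter: P-1 stage 1 finds a prime factor q of N whenever every prime power dividing q-1 is at most B1, so deeper bounds catch more factors. A toy stage 1 in Python (illustrative only; the tiny N and base are assumptions chosen for demonstration, and prime95/GpuOwl are vastly more sophisticated):

```python
# Toy P-1 stage 1 (illustration of why the B1 bound matters; not how prime95
# or GpuOwl implement it). A prime factor q of N is found when every prime
# power dividing q-1 is <= B1. The small N below is an assumed example:
# N = 13 * 103, where 13-1 = 2^2 * 3 is 10-smooth but 103-1 = 2*3*17 is not.
from math import gcd

def is_prime(n):
    return n > 1 and all(n % d for d in range(2, int(n**0.5) + 1))

def pminus1_stage1(N, B1, base=3):
    x = base
    for q in range(2, B1 + 1):
        if is_prime(q):
            qpow = q
            while qpow * q <= B1:      # largest power of q not exceeding B1
                qpow *= q
            x = pow(x, qpow, N)        # x = base^(product of prime powers)
    g = gcd(x - 1, N)
    return g if 1 < g < N else None    # None: nothing found, or trivial gcd

print(pminus1_stage1(13 * 103, B1=10))  # -> 13 (12 = 2^2 * 3 is B1-smooth)
```

Raising B1 to 17 would make both factors smooth at once, so the gcd becomes N itself and no nontrivial factor is reported, which is why bound selection is a genuine optimization problem.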
[QUOTE=Prime95;515949]Yes, P-1 is highly recommended.
It's a question of "relative" performance. That is, if on your machine GpuOwl is 5x faster than prime95 at PRP but only 2x faster at P-1, then GpuOwl should be doing PRP 100% of the time. Use prime95 on your CPU to do all your P-1 work prior to having the the GPU do the PRP. Your P-1 bounds are "suspicious" in that when prime95 is used to double-check your result years down the road, I think prime95 will want to redo the P-1 to higher bounds. IIUC, Preda has a somewhat different view on optimal P-1 bounds in that he believes Gerbicz error checking in the initial PRP test means double-checking is not necessary. For reference, prime95 would use these bounds (assuming 1.8GB of memory). Saving 1 LL/PRP test: B1=1,115,000 B2=14,495,000 Saving 2 LL/PRP tests: B1=2,395,000 B2=35,326,000[/QUOTE] The ETA for 332M exponent is 2 months on Radeon RX580 and 15 days on Radeon VII. I am using primenet .py to get assignments and return results, gpuowl runs in parallel, and the worktodo.txt file is checked every 2 hours. The assignment type is 153. To do P-1, do I have to get a different assignment type ? |