[QUOTE=flashjh;358204]This is not true... see [URL="http://www.mersenneforum.org/showthread.php?p=40816#post40816"]here[/URL].
[/QUOTE] That is very true. Read what I wrote. Say I ran some exponent to B1=100. How do YOU extend it to B1=1000? |
[QUOTE=flashjh;358204]This is not true...[/QUOTE]I think what [i]LaurV[/i] meant is that B1 can't be extended [u]without having the stage1 savefile[/u]. If you do have the savefile then sure you can extend stage1 no problem. And if the PrimeNet server would save all the stage1 savefiles then any user could extend B1 done by any other user, but that would take up too much server storage and bandwidth.
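To illustrate why the stage 1 savefile is what makes extending B1 cheap, here is a toy Python sketch of P-1 stage 1 on a Mersenne number. It is not the code or the savefile format used by Prime95 or CUDAPm1; the function names and the idea that a "savefile" is just the stage 1 residue plus the old B1 are assumptions made for illustration. With the saved residue you only apply the extra prime powers between the old and new bounds; without it you have to redo the whole exponentiation.

```python
from math import gcd

def primes_up_to(n):
    """Sieve of Eratosthenes: all primes <= n."""
    if n < 2:
        return []
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]

def max_power(q, bound):
    """Largest k with q**k <= bound (0 if q > bound)."""
    k, power = 0, q
    while power <= bound:
        k += 1
        power *= q
    return k

def extend_stage1(x, p, b1_old, b1_new):
    """Take a saved stage 1 residue x (done to b1_old) up to b1_new.

    Only the prime powers missing from the old bound are applied,
    which is why the saved residue is required.
    """
    n = (1 << p) - 1
    for q in primes_up_to(b1_new):
        extra = max_power(q, b1_new) - max_power(q, b1_old)
        if extra:
            x = pow(x, q ** extra, n)
    return x

def stage1(p, b1, base=3):
    """Stage 1 from scratch: base^(2*p*E) mod 2^p-1, E = product of prime powers <= b1."""
    n = (1 << p) - 1
    x = pow(base, 2 * p, n)  # factors of 2^p-1 have the form 2*k*p+1
    return extend_stage1(x, p, 0, b1)

def factor_from_residue(x, p):
    """gcd(x - 1, 2^p - 1); a value strictly between 1 and 2^p-1 is a factor."""
    return gcd(x - 1, (1 << p) - 1)

# The "savefile" here is simply the residue plus the bound it was run to.
p = 67
saved = stage1(p, 100)
extended = extend_stage1(saved, p, 100, 1000)
assert extended == stage1(p, 1000)  # same residue as starting over
```

The extension touches only the exponent difference between the two bounds, so the work already done to B1=100 is not repeated.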
|
Yes indeed, you explained it better. I was referring to the fact that if some user wants to extend B1 (for an expo assigned by primenet or gpu72, for which he didn't do any work before, like the many on mersenne.ca with insufficient P-1 done), he currently needs to do everything from scratch. That is because storing checkpoint files on the server is costly (not so much to store, now that there are cheap big HDDs, as to download; the traffic would be overkill).
If you look at how many P-1 "extensions" were done, especially for small exponents (I just [URL="http://www.mersenne.org/report_exponent/?exp_lo=219647&exp_hi=&B1=Get+status"]did one recently[/URL]), a lot of resources were wasted by doing stage 1 from scratch, for some expos 5 or 6 times! Anyhow, although what Jerry said about P95 is also true, my current problem wasn't P95, but cudapm1. |
Right. I see I overlooked what you meant. I just wanted to point out that P95 can extend B1, so the code for such work exists, even if it can't be used in cudapm1. Extending B2 would be awesome!
|
How efficient are GPU's at P-1? Compared to trial factoring, for example?
|
[QUOTE=TheMawn;358217]How efficient are GPU's at P-1? Compared to trial factoring, for example?[/QUOTE]I'm sure someone will give a better answer, but:
CUDAPm1 is based on CudaLucas so the relative performance charts on my [url=http://www.mersenne.ca/cudalucas.php]CudaLucas page[/url] compared to my [url=http://www.mersenne.ca/mfaktc.php]mfaktc page[/url] should be vaguely applicable. The latest version of CUDAPm1 does [url=http://www.mersenneforum.org/showpost.php?p=354013&postcount=375]include a benchmark[/url] but so far only one person has sent me any data from it so I don't want to read too much into such a small sample size. |
for a 560
[code]
Iteration 32000 M11802799, 0xb8423c5eaf567790, n = 648K, CUDAPm1 v0.10 err = 0. 7080 (0:01 real, 2.5918 ms/iter, ETA 8:17)
Iteration 33000 M11802799, 0x22fb44273d4c946e, n = 648K, CUDAPm1 v0.10 err = 0. 6982 (0:03 real, 2.3335 ms/iter, ETA 7:25)
Iteration 34000 M11802799, 0x50efa92a42ce2b4b, n = 648K, CUDAPm1 v0.10 err = 0. 7031 (0:02 real, 2.3368 ms/iter, ETA 7:23)
Iteration 35000 M11802799, 0xeb03cb8632c5b33b, n = 648K, CUDAPm1 v0.10 err = 0. 6836 (0:02 real, 2.3391 ms/iter, ETA 7:21)
Iteration 36000 M11802799, 0xdd08a619769a545f, n = 648K, CUDAPm1 v0.10 err = 0. 7031 (0:03 real, 2.3192 ms/iter, ETA 7:15)
Iteration 37000 M11802799, 0xd874a96b5ddd0ff7, n = 648K, CUDAPm1 v0.10 err = 0. 6641 (0:02 real, 2.3400 ms/iter, ETA 7:17)
[/code]
and stage 2
[code]
Transforms: 2052 M11802943, 0xde1309e648ca422a, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:03 real, 1.1989 ms/tran, ETA 0:07)
Transforms: 2148 M11802943, 0xe642ce5b422dd69d, n = 648K, CUDAPm1 v0.10 err = 0.07031 (0:02 real, 1.2181 ms/tran, ETA 0:05)
Transforms: 2046 M11802943, 0x894a405d75167ee1, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:03 real, 1.1928 ms/tran, ETA 0:02)
Transforms: 2092 M11802943, 0xfcb58d1a13c3410f, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:02 real, 1.2029 ms/tran, ETA 0:00)
[/code] |
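For anyone who wants to compare cards from screen output like the above, a small Python sketch along these lines could average the per-iteration and per-transform timings. It assumes the log keeps the "N ms/iter" and "N ms/tran" wording shown in this post; the function name is made up for illustration and nothing here is part of CUDAPm1.

```python
import re
from statistics import mean

# Matches timing fields such as "2.3192 ms/iter" or "1.2029 ms/tran".
TIMING = re.compile(r"([\d.]+) ms/(iter|tran)")

def mean_timings(log_text):
    """Return the mean ms per unit, keyed by "iter" (stage 1) and "tran" (stage 2)."""
    samples = {}
    for value, unit in TIMING.findall(log_text):
        samples.setdefault(unit, []).append(float(value))
    return {unit: mean(values) for unit, values in samples.items()}

# Four lines taken from the output posted above.
log = """
Iteration 36000 M11802799, 0xdd08a619769a545f, n = 648K, CUDAPm1 v0.10 err = 0. 7031 (0:03 real, 2.3192 ms/iter, ETA 7:15)
Iteration 37000 M11802799, 0xd874a96b5ddd0ff7, n = 648K, CUDAPm1 v0.10 err = 0. 6641 (0:02 real, 2.3400 ms/iter, ETA 7:17)
Transforms: 2046 M11802943, 0x894a405d75167ee1, n = 648K, CUDAPm1 v0.10 err = 0 .06641 (0:03 real, 1.1928 ms/tran, ETA 0:02)
Transforms: 2092 M11802943, 0xfcb58d1a13c3410f, n = 648K, CUDAPm1 v0.10 err = 0 .06641 (0:02 real, 1.2029 ms/tran, ETA 0:00)
"""

for unit, avg in mean_timings(log).items():
    print(f"{unit}: {avg:.4f} ms")
```

A handful of lines from one card is still a tiny sample, so treat any average as a rough indication only.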
[QUOTE=firejuggler;358231]for a 560[/QUOTE]The benchmark invoked by the linked code snippet generates a benchmark file, and [i]owftheevil[/i] was even kind enough to suggest in the screen output that people email it to me. :smile:[code]./CUDAPm1 -cufftbench 1 8192 1[/code]
|
sorry
[code]
CUDAPm1 v0.10
CUFFT bench start = 1 end = 8192 distance = 1
CUFFT_Z2Z size= 1024 time= 0.008226 msec
[/code]
Better? The timing varies between 0.007559 msec and 0.008309 msec. |
Better, yes, but if you (and anyone else who cares to share) could email me the whole file that'd be great, I can try and establish some expected-performance numbers once I have a decent sample size.
|
1 Attachment(s)
Earlier version only gave me the one line I posted earlier.
Now here is a file that might help. Sent by mail too. The first version of the file was generated while I was running Msieve_gpu, so I just reran it. |