How about a global source-code search and replace from "CUDALucas" to "CUDAPm1"?
It's in the ini file too. |
Thanks. Done, except for the ini file. It needs a complete rewrite anyway.
|
I have a few questions:
1. Has stage 2 saving been implemented?
2. Does this program output Prime95-style timestamps?
3. Considering that GPUs are much faster than CPUs, would it be reasonable to use larger B1 and B2 values?
Sorry if any of them have already been covered. |
[QUOTE=ixfd64;358173]I have a few questions:
1. Has stage 2 saving been implemented?
2. Does this program output Prime95-style timestamps?
3. Considering that GPUs are much faster than CPUs, would it be reasonable to use larger B1 and B2 values? Sorry if any of them have already been covered.[/QUOTE]
1. Yes, it has, and resuming works very nicely (REALLY NICELY!) for both stage 1 and stage 2, but [U]be careful[/U] and DO NOT DELETE the stage [B][U]1[/U][/B] save files when resuming. I don't know whether it is a bug or intended, but if you want to resume stage 2 and the stage 1 checkpoints are not found, the program (is a little bit stupid :razz: and it) will redo stage 1 from scratch, even if you have the stage 2 checkpoints (so that, in theory, you no longer need the stage 1 checkpoints). I found this by mistake, because normally I keep all the stage 1 files, in the big hope that some time in the future we will be able to EXTEND the B1 limit, which, in my opinion, is more important than resuming stage 2. As it is now, you can't extend B1 without redoing all the stage 1 work from scratch. (edit: which is the case for Prime95 too, and it is a pity, because a lot of contributors wasted time to redo the P-1 stage 1 when they wanted to extend the limits. The main problem is that keeping P-1 huge checkpoint files on the server will take too much space and it will generate too much traffic). But well... regardless of my dreams, resuming both stage 1 and stage 2 works very nicely in the current implementation.
2. No. But you don't need them anyhow: manual reports can't parse them, so you would have to edit your result files by hand, which is not recommended...
3. You can still specify your own B1 and B2 on the command line; just create a small batch file. I do low exponents (under 1M) to high limits this way.
The developers promised (see a few posts above) a fully rewritten ini file :razz:; we are waiting for the day when B1, B2, b (the base; sometimes specifying a base other than 3 might help, see my pari implementation), e, d, etc. can all be specified in the ini file. Dreaming on...
BTW, with the same batch files and command lines for testing, the same hardware, the same limits, the same everything, the new version shows "e=2" in the result files, where the old version used to show "e=6". There seems to be no difference in how it works. Stupid question: why?
Also (@owftheevil): the residues appended to the file names (you know my old fixed idea... I can't get it out of my head... :razz:) are wrong. Compare them with the displayed values. They are one "-c" step behind: they show not the residue reached after the computation, but the residue the computation started from. I found this the hard way. I was running one test with "-c 1000", because I didn't want to wait between outputs (it was just for testing), and then I realized that the disk space was being consumed too fast (due to the many checkpoints), so I switched to -c 10k for the second test (the comparison, witness, DC, whatever you want to call it). Of course, in the meantime I had deleted all checkpoints from the first run that were not multiples of 10k... Big mistake! In the end, the remaining files carried the residues from iterations 9000, 19000, 29000, and so on, and their names did not match those of the second run, which had the residues from 10k, 20k, etc., but attached to the wrong files (i.e. the 20k checkpoint carried the 10k residue, the 30k checkpoint carried the 20k residue, and so on; whereas among the files remaining from the first run after the deletion, the 10k checkpoint carried the 9k residue, the 20k checkpoint carried the 19k residue, and so on). Interestingly enough, the contents of the files (compared while ignoring the file names) were identical, and correct.
And to finish on a positive note: I really REALLY love the FFT and especially the thread tuning mechanism. BRILLIANT! You have to try it and see it to believe it. On a GTX 580 you can gain about 10% or more performance from thread tuning alone, without touching the clocks or FFT lengths. |
Thanks for the detailed reply. I hope these shortcomings will soon be eliminated.
Timestamps aren't [I]that[/I] essential, but they can be quite useful for investigating interruptions. For example, if I let the program run unsupervised for a few days and come back to see that it had crashed, the timestamp could quickly help me determine when the crash occurred. In any case, implementing timestamps probably wouldn't require more than a few lines of code. :smile: |
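For what it's worth, the "few lines of code" estimate looks about right. A minimal sketch, written in Python purely for illustration (CUDAPm1 itself is C/CUDA, and where such a call would sit in its output loop is not shown here); the bracketed "[Mon DD HH:MM:SS]" layout is an assumed format, not necessarily Prime95's exact one:

```python
from datetime import datetime
from typing import Optional


def stamp(line: str, now: Optional[datetime] = None) -> str:
    """Prefix one progress line with a wall-clock timestamp.

    The "[Mon DD HH:MM:SS]" layout is an assumed format for illustration,
    not necessarily the exact one Prime95 prints.
    """
    now = now or datetime.now()
    return f"[{now.strftime('%b %d %H:%M:%S')}] {line}"


# Example: stamp a line in the style of the CUDAPm1 progress output.
print(stamp("Iteration 32000 M11802799, 0xb8423c5eaf567790, n = 648K"))
```

Passing `now` explicitly makes the function deterministic, which is handy when reprocessing an old log.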
I use the date/time of the checkpoint files for that. Also (see a few posts above), having a batch file like
[CODE]
:label
cudapm1
goto label
[/CODE]
is very helpful, as the program may often crash due to a memory-allocation bug in the CUDA 5.5 drivers [edit: on my system this seems to happen only on the card that drives the monitors, and not on the other cards, even if they do PhysX too; SLI is disabled, and they are not connected to monitors]. With the right ini file and a looping batch file as above, the program restarts and resumes properly (already tried). |
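The batch loop restarts unconditionally and forever. A slightly more defensive variant of the same idea, sketched in Python: it assumes, hypothetically, that a clean finish returns exit code 0 and a crash returns non-zero (not documented CUDAPm1 behaviour), and it caps the number of restarts so a card that keeps crashing at the same point doesn't loop indefinitely. The cap of 100 is arbitrary:

```python
import subprocess
import sys


def run_with_restarts(cmd: list, max_restarts: int = 100) -> int:
    """Launch cmd, relaunching it after every non-zero exit, up to max_restarts times.

    Returns how many times cmd was launched. Assumes the program resumes from
    its own checkpoint files on each launch, which is what the batch loop relies on.
    """
    launches = 0
    while True:
        launches += 1
        result = subprocess.run(cmd)
        if result.returncode == 0 or launches > max_restarts:
            return launches


if __name__ == "__main__" and len(sys.argv) > 1:
    # e.g.  python restart.py cudapm1 69010547
    run_with_restarts(sys.argv[1:])
```

Unlike the batch file, this stops after a successful exit; drop the `returncode == 0` check to get the original always-restart behaviour.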
@owftheevil re. extending B1: If it helps, I have a pari implementation of the algorithm, with all the calculation of the power differences for the small primes, etc.
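To illustrate why extending B1 only needs the saved stage 1 residue plus those power differences, here is a small Python sketch (not LaurV's pari code, nor CUDAPm1's or Prime95's implementation). It assumes the usual Mersenne stage 1 form x = 3^(2p·E) mod M_p, where E is the product of the largest powers q^k ≤ B1 of each prime q ≤ B1. Because exponents compose, raising a saved residue to the missing powers gives exactly what a fresh run with the larger B1 would produce:

```python
from math import gcd


def primes_upto(n: int) -> list:
    """Primes <= n by a simple sieve of Eratosthenes (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def max_power(q: int, bound: int) -> int:
    """Largest k with q**k <= bound (0 if q > bound)."""
    k, v = 0, q
    while v <= bound:
        k, v = k + 1, v * q
    return k


def stage1(p: int, b1: int, base: int = 3) -> int:
    """Stage 1 from scratch: base^(2p * E) mod M_p, E = product of q^max_power(q, b1)."""
    n = (1 << p) - 1
    x = pow(base, 2 * p, n)
    for q in primes_upto(b1):
        x = pow(x, q ** max_power(q, b1), n)
    return x


def extend_stage1(x: int, p: int, b1_old: int, b1_new: int) -> int:
    """Continue a saved stage 1 residue x from b1_old to b1_new.

    Only the power differences are applied: small primes get their extra
    exponent, primes in (b1_old, b1_new] get their full power.
    """
    n = (1 << p) - 1
    for q in primes_upto(b1_new):
        extra = max_power(q, b1_new) - max_power(q, b1_old)
        if extra:
            x = pow(x, q ** extra, n)
    return x


p = 67
saved = stage1(p, 100)                          # plays the role of the B1=100 savefile
extended = extend_stage1(saved, p, 100, 3000)
assert extended == stage1(p, 3000)              # identical to a fresh B1=3000 run
print(gcd(extended - 1, (1 << p) - 1))          # the stage 1 gcd
```

M67 = 193707721 × 761838257287 makes a handy check: 193707721 − 1 = 2³·3³·5·67·2677 divides 2p·E once B1 ≥ 2677, so the printed gcd is a multiple of 193707721.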
|
[QUOTE=LaurV;358176](edit: which is the case for Prime95 too, and it is a pity, because a lot of contributors wasted time to redo the P-1 stage 1 when they wanted to extend the limits. The main problem is that keeping P-1 huge checkpoint files on the server will take too much space and it will generate too much traffic). [/QUOTE]
This is not true... see [URL="http://www.mersenneforum.org/showthread.php?p=40816#post40816"]here[/URL]. Also, from the P95 undoc.txt file: [CODE]By default P-1 work does not delete the save files when the work unit completes. This lets you run P-1 to a higher bound at a later date. You can force the program to delete save files by adding this line to prime.txt:
KeepPminus1SaveFiles=0[/CODE]I have experimented with many settings. To extend B1 while stage 1 is still in progress, just increase B1. If you know you'll want to increase B1 later, set B1=B2, then save a copy of the file(s) once the run completes, noting the B1 value reached. When you want to increase B1, just set B1=B2 to a value above the current B1 and P95 will continue. If B1 is very high, it may look as though it's not working, but just be patient. (To see the progress, stop the worker and restart it; you'll see it goes quite fast to get back to where it left off.) This allows you to reach a certain B1, run a B2 value, and then raise B1 and do another B2 run. You can also continue to raise B1 on another system (or from another folder) and then run B2 again. |
[QUOTE=flashjh;358204]This is not true... see [URL="http://www.mersenneforum.org/showthread.php?p=40816#post40816"]here[/URL].
[/QUOTE] That is very true. Read what I wrote. I did some exponent to B1=100. How do YOU extend it to B1=1000? |
[QUOTE=flashjh;358204]This is not true...[/QUOTE]I think what [i]LaurV[/i] meant is that B1 can't be extended [u]without having the stage1 savefile[/u]. If you do have the savefile then sure you can extend stage1 no problem. And if the PrimeNet server would save all the stage1 savefiles then any user could extend B1 done by any other user, but that would take up too much server storage and bandwidth.
|
Yes indeed, you explained it better. I was referring to the fact that if some user wants to extend some B1 (for an expo assigned by PrimeNet or GPU72 on which he didn't do any work before; there are many on mersenne.ca with insufficient P-1 done), he currently needs to do everything from scratch. That is because storing checkpoint files on the server is costly (not so much to store, as big HDDs are cheap now, but to download: the traffic would be overkill).
If you look at how many P-1 "extensions" were done, especially for small exponents (I just [URL="http://www.mersenne.org/report_exponent/?exp_lo=219647&exp_hi=&B1=Get+status"]did one recently[/URL]), a lot of resources were wasted by redoing stage 1 from scratch, for some expos 5 or 6 times! Anyhow, although what Jerry said about P95 is also true, my current problem wasn't P95, but cudapm1. |
Right. I see I overlooked what you meant. I just wanted to point out that P95 can extend B1, so the code for such work exists, even if it can't be used in cudapm1. Extending B2 would be awesome!
|
How efficient are GPUs at P-1? Compared to trial factoring, for example?
|
[QUOTE=TheMawn;358217]How efficient are GPU's at P-1? Compared to trial factoring, for example?[/QUOTE]I'm sure someone will give a better answer, but:
CUDAPm1 is based on CudaLucas, so the relative performance charts on my [url=http://www.mersenne.ca/cudalucas.php]CudaLucas page[/url] compared to my [url=http://www.mersenne.ca/mfaktc.php]mfaktc page[/url] should be vaguely applicable. The latest version of CUDAPm1 does [url=http://www.mersenneforum.org/showpost.php?p=354013&postcount=375]include a benchmark[/url], but so far only one person has sent me any data from it, so I don't want to read too much into such a small sample size. |
for a 560
[code]
Iteration 32000 M11802799, 0xb8423c5eaf567790, n = 648K, CUDAPm1 v0.10 err = 0.07080 (0:01 real, 2.5918 ms/iter, ETA 8:17)
Iteration 33000 M11802799, 0x22fb44273d4c946e, n = 648K, CUDAPm1 v0.10 err = 0.06982 (0:03 real, 2.3335 ms/iter, ETA 7:25)
Iteration 34000 M11802799, 0x50efa92a42ce2b4b, n = 648K, CUDAPm1 v0.10 err = 0.07031 (0:02 real, 2.3368 ms/iter, ETA 7:23)
Iteration 35000 M11802799, 0xeb03cb8632c5b33b, n = 648K, CUDAPm1 v0.10 err = 0.06836 (0:02 real, 2.3391 ms/iter, ETA 7:21)
Iteration 36000 M11802799, 0xdd08a619769a545f, n = 648K, CUDAPm1 v0.10 err = 0.07031 (0:03 real, 2.3192 ms/iter, ETA 7:15)
Iteration 37000 M11802799, 0xd874a96b5ddd0ff7, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:02 real, 2.3400 ms/iter, ETA 7:17)
[/code] and stage 2 [code]
Transforms: 2052 M11802943, 0xde1309e648ca422a, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:03 real, 1.1989 ms/tran, ETA 0:07)
Transforms: 2148 M11802943, 0xe642ce5b422dd69d, n = 648K, CUDAPm1 v0.10 err = 0.07031 (0:02 real, 1.2181 ms/tran, ETA 0:05)
Transforms: 2046 M11802943, 0x894a405d75167ee1, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:03 real, 1.1928 ms/tran, ETA 0:02)
Transforms: 2092 M11802943, 0xfcb58d1a13c3410f, n = 648K, CUDAPm1 v0.10 err = 0.06641 (0:02 real, 1.2029 ms/tran, ETA 0:00)
[/code] |
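The ms/iter figures above turn into a stage 1 run time once you know how many squarings stage 1 needs: roughly the bit length of E, the product of the largest prime powers ≤ B1, which is about B1/ln 2 ≈ 1.44·B1. A rough Python estimator, assuming one squaring per iteration and ignoring the gcd and any other overhead (the B1 and ms/iter values plugged in are only examples):

```python
import math


def primes_upto(n: int) -> list:
    """Primes <= n by a simple sieve of Eratosthenes (n >= 2)."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(range(i * i, n + 1, i)))
    return [i for i, is_prime in enumerate(sieve) if is_prime]


def max_power(q: int, bound: int) -> int:
    """Largest k with q**k <= bound."""
    k, v = 0, q
    while v <= bound:
        k, v = k + 1, v * q
    return k


def stage1_squarings(b1: int) -> int:
    """Estimated bit length of E = product of the largest prime powers q^k <= b1."""
    bits = sum(max_power(q, b1) * math.log2(q) for q in primes_upto(b1))
    return math.ceil(bits)


def stage1_hours(b1: int, ms_per_iter: float) -> float:
    """Rough stage 1 wall time, assuming one squaring per iteration and no overhead."""
    return stage1_squarings(b1) * ms_per_iter / 1000 / 3600


for b1 in (100_000, 640_000):
    print(b1, stage1_squarings(b1), round(stage1_hours(b1, 2.34), 2))
```

Summing logarithms instead of multiplying out E keeps this fast even for large B1.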
[QUOTE=firejuggler;358231]for a 560[/QUOTE]The benchmark invoked by the linked code snippet generates a benchmark file, and [i]owftheevil[/i] was even kind enough to suggest in the screen output that people email it to me. :smile:[code]./CUDAPm1 -cufftbench 1 8192 1[/code]
|
Sorry.
[code]
CUDAPm1 v0.10
CUFFT bench start = 1 end = 8192 distance = 1
CUFFT_Z2Z size= 1024 time= 0.008226 msec
[/code] Better? Timing varies from 0.007559 msec to 0.008309 msec. |
Better, yes, but if you (and anyone else who cares to share) could email me the whole file that'd be great, I can try and establish some expected-performance numbers once I have a decent sample size.
|
Earlier version only gave me the one line I posted earlier.
Now here is a file that might help. Sent by mail too. The first version of the file was made while I was running Msieve_gpu, so I just reran it. |
GTX 580, emailed also.
|
[QUOTE=LaurV;358209]Yes indeed, you explained it better. I was referring to the fact that if some user wants to extend some B1 (for an expo assigned by primenet or gpu72, for which he didn't do any work before, like there are many on mersenne.ca with insufficient P-1 done), he currently need to do everything from scratch. That is because storing checkpoint files on the server is costly (not as much to store - now there are cheap big HDDs - as to download, the trafic will be overkill).[/QUOTE]
It doesn't seem that expensive to me. I'm P-1ing M63970349 using a 3456K FFT, which results in a save file of about 13.5 MB. My internet connection at home could download or upload that in 2.16 seconds (unlimited symmetrical 50/50 Mbps can be had for $115/month, or 175/175 for $211). Alternatively, I could rent a server with 20 TB/month of traffic for $100/month, which would allow uploading or downloading over 1 million save files. |
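Mark's figures are easy to re-derive. A back-of-the-envelope sketch in Python, using decimal units (1 MB = 8 Mbit, 1 TB = 10^6 MB) and ignoring protocol overhead, both of which are simplifications:

```python
def transfer_seconds(size_mb: float, link_mbps: float) -> float:
    """Seconds to move size_mb megabytes over a link_mbps megabit/s link (no overhead)."""
    return size_mb * 8 / link_mbps


def files_per_month(quota_tb: float, size_mb: float) -> float:
    """Save files of size_mb that fit in a monthly quota of quota_tb terabytes."""
    return quota_tb * 1_000_000 / size_mb


print(transfer_seconds(13.5, 50))         # 2.16 s on the 50 Mbps link
print(transfer_seconds(13.5, 1))          # 108.0 s on a 1 Mbps upstream
print(round(files_per_month(20, 13.5)))   # 1481481 files per 20 TB month
```

The same 13.5 MB file that takes about 2 s on a 50 Mbps link takes close to two minutes on a 1 Mbps upstream.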
[QUOTE=Mark Rose;358288]My internet connection at home could download or upload that in 2.16 seconds (unlimited 50/50 Mbps symmetrical can be had for $115/month, or 175/175 for $211)[/QUOTE]You are most fortunate. Many of us would consider 1Mbps upstream "very good".
|
[QUOTE=James Heinrich;358290]You are most fortunate. Many of us would consider 1Mbps upstream "very good".[/QUOTE]
I suppose fast internet is one benefit of living in Hogtown. Personally, I'd rather be out in the country! Though I'd probably start scheming to trench fiber... I do remember the days of having to use a 2400 bits per second connection a mere 14 years ago. My parents took the computer away, but I assembled another out of free antique parts. Downloading JPEGs was a luxury. |
[QUOTE=James Heinrich;358258]Better, yes, but if you (and anyone else who cares to share) could email me the whole file that'd be great, I can try and establish some expected-performance numbers once I have a decent sample size.[/QUOTE]
here you are for 580 |
FWIW, here's another GTX 560, CUDAPm1 v0.20.
|
Trouble with cudapm1. I have a couple of 69M exponents reserved from gpu72, for which stage 1 was done, but stage 2 crashes without any error. Two of my cards wasted the last 20 hours or so, retrying stage 2 (the batch loop, remember?).
When I noticed (in fact, I saw fewer lines in the result file than expected), I stopped the batch and tried different values for -e2 and -d2, and even "UnusedMem=" in the ini file, with or without "M" at the end; the result is the same: some memory-related crash (I assume, because the test works on cards with 3G and fails on all cards with 1536M of memory). I have not reported stage 1 yet, and I am keeping the s1 checkpoint files in case someone else wants to try (8MB checkpoint file size, for a 4096K FFT). |
[QUOTE=LaurV;358915]Trouble with cudapm1. I have a couple of 69M exponents reserved from gpu72, for which stage 1 was done, but stage 2 crashes without any error. Two of my cards wasted the last 20 hours or so, retrying stage 2 (the batch loop, remember?).
When I saw the thing (in fact, I saw less lines in the result file than expected), I stopped the batch and tried with different parameters for -e2, -d2, and even "UnusedMem=" in the ini file, with or without "M" at the end, the result is the same: some memory-related crash (i assume, because the test works for cards with 3G, and fails in all cards with 1536M of memory). I did not report the "stage 1" yet, and still keep the s1 checkpoint files, if someone else is going to try (8MB checkpoint file size, for a 4096k FFT)[/QUOTE]
Two weeks on vacation, and I've lost you guys :sad: Is there any documentation on how to use the benchmark files with CudaP-1? Luigi |
Quoting post 404,
[QUOTE=James Heinrich;358234]The benchmark invoked by the linked code snippet generates a benchmark file, and [i]owftheevil[/i] was even kind enough to suggest in the screen output that people email it to me. :smile:[code]./CUDAPm1 -cufftbench 1 8192 1[/code][/QUOTE] |
[QUOTE=firejuggler;358944]Quoting post 404,[/QUOTE]
Thanks :smile: :bow: Luigi |
[QUOTE=LaurV;358915]Trouble with cudapm1. I have a couple of 69M exponents reserved from gpu72, for which stage 1 was done, but stage 2 crashes without any error. Two of my cards wasted the last 20 hours or so, retrying stage 2 (the batch loop, remember?).
When I saw the thing (in fact, I saw less lines in the result file than expected), I stopped the batch and tried with different parameters for -e2, -d2, and even "UnusedMem=" in the ini file, with or without "M" at the end, the result is the same: some memory-related crash (i assume, because the test works for cards with 3G, and fails in all cards with 1536M of memory). I did not report the "stage 1" yet, and still keep the s1 checkpoint files, if someone else is going to try (8MB checkpoint file size, for a 4096k FFT)[/QUOTE] How far past stage1 gcd does it get? What is the reported available memory and estimate of memory it will use (beginning of stage 1)? Are you doing stage 1 and stage 2 on different cards? |
[QUOTE=owftheevil;358998]How far past stage1 gcd does it get? What is the reported available memory and estimate of memory it will use (beginning of stage 1)? Are you doing stage 1 and stage 2 on different cards?[/QUOTE]
I had the same behavior. It turned out that the outside temperature was too hot... :smile: Luigi |
Sorry for the lack of details. I will get home in the evening and come back with some output snippets. I still have the final stage 1 files. To confirm: the problem is not related to temperature, to my cards, to my computer, etc. I can reproduce it exactly on different cards (all 580s with 1536M RAM) and different computers. The problem is related to 69M exponents: the program just crashes without giving any error, immediately after the stage 2 init finishes. I tried the -e2 and -d2 switches with different values, and I [B][U]also[/U][/B] tried putting "UnusedMem=" in the ini file. The number of primes and the memory used vary every time (from 1 prime to 27 primes, from 308M to 1370M used; the card says ~1400M free), a sign that the switches [B][U]are working[/U][/B]. I also reran stage 1 from earlier checkpoints, which finishes without problems, does the gcd, inits stage 2, BOOM!
You know me from the past: I have always been the first to blame heat, dust, whatever, when other people had problems, but this time it is neither a hardware nor a temperature problem. I have an [URL="http://www.mersenneforum.org/showthread.php?t=16829"]external water cooler[/URL] (I mean, outside the house!), with 4 liters of water, two pumps and 12 fans, and at night the outside temperature drops to +16 at this time of year. The installation is designed for the tremendously hot Thai Aprils, when the outside temperature averages 40°C. If I start all the fans NOW, at this time of year, there is no way the temperatures go over 50C. Trust me, this time there is a bug in the program. |
Once stage 2 has begun, it would be a big mess to allow the various parameters e, d, nrp to change. So in fact, regardless of any command line parameters, if stage 2 has already been initialized, e, d, nrp, b1, and b2 are taken from the savefile. If stage 2 is initialized with 3GB of available memory and then resumed with only 1.5GB available, there will be problems. If this is the case, delete the stage 2 savefile, and everything should work fine. If not, I'll need all the information you can give me. Thanks for pointing out this bug.
|
I am back.
Here is the stuff: [CODE]
Microsoft Windows [Version 6.1.7601]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

e:\cpm1_1>
e:\cpm1_1>cudapm1 69010547
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 1 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1564
memClockRate (MHz) 2004
totalGlobalMem zu
totalConstMem zu
l2CacheSize 786432
sharedMemPerBlock zu
regsPerBlock 32768
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 1432M of 1536M GPU memory free.
Using threads: norm1 64, mult 64, norm2 32.
No stage 2 checkpoint.
Using up to 1312M GPU memory.
Selected B1=640000, B2=14400000, 3.7% chance of finding a factor
Using B1 = 640000 from savefile.
Continuing stage 2 from a partial result of M69010547 fft length = 4096K
Starting stage 2.
Using b1 = 640000, b2 = 14400000, d = 2310, e = 2, nrp = 30
Zeros: 644312, Ones: 737128, Pairs: 145269
Processing 1 - 30 of 480 relative primes.
Inititalizing pass... )
Quitting, estimated time spent = 0:01

e:\cpm1_1>cudapm1 69010547 -e2 6 -d2 30
CUDAPm1 v0.20
Warning: Couldn't parse ini file option UnusedMem; using default.
------- DEVICE 1 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1564
memClockRate (MHz) 2004
totalGlobalMem zu
totalConstMem zu
l2CacheSize 786432
sharedMemPerBlock zu
regsPerBlock 32768
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 1432M of 1536M GPU memory free.
Using threads: norm1 64, mult 64, norm2 32.
No stage 2 checkpoint.
Using up to 1312M GPU memory.
Selected B1=640000, B2=14400000, 3.7% chance of finding a factor
Using B1 = 640000 from savefile.
Continuing stage 2 from a partial result of M69010547 fft length = 4096K
Starting stage 2.
Using b1 = 640000, b2 = 14400000, d = 30, e = 6, nrp = 8
Zeros: 898832, Ones: 746888, Pairs: 135479
Processing 1 - 8 of 8 relative primes.
Inititalizing pass... )
Quitting, estimated time spent = 0:01

e:\cpm1_1>
[/CODE] Note that there is no stage 2 checkpoint file, and none is created during the crash. The only file is the last stage 1 checkpoint. Same story with the undocumented ini file option, setting the unused memory to 1200M (note that the memory used in this case is much less, anything between roughly 300M and 800M): [CODE]
e:\cpm1_1>cudapm1 69010547
CUDAPm1 v0.20
------- DEVICE 1 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1564
memClockRate (MHz) 2004
totalGlobalMem zu
totalConstMem zu
l2CacheSize 786432
sharedMemPerBlock zu
regsPerBlock 32768
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 1432M of 1536M GPU memory free.
Using threads: norm1 64, mult 64, norm2 32.
No stage 2 checkpoint.
Using up to 608M GPU memory.
Selected B1=495000, B2=2475000, 2.48% chance of finding a factor
Using B1 = 640000 from savefile.
Continuing stage 2 from a partial result of M69010547 fft length = 4096K
Starting stage 2.
Using b1 = 640000, b2 = 2475000, d = 210, e = 2, nrp = 1
Zeros: 100255, Ones: 109505, Pairs: 19827
Processing 1 - 1 of 48 relative primes.
Inititalizing pass... done.
transforms: 168, err = 0.02612, (0.56 real, 3.3270 ms/tran, ETA NA)
Quitting, estimated time spent = 0:00

e:\cpm1_1>cudapm1 69010547 -e2 2 -d2 30
CUDAPm1 v0.20
------- DEVICE 1 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1564
memClockRate (MHz) 2004
totalGlobalMem zu
totalConstMem zu
l2CacheSize 786432
sharedMemPerBlock zu
regsPerBlock 32768
warpSize 32
memPitch zu
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment zu
deviceOverlap 1
CUDA reports 1432M of 1536M GPU memory free.
Using threads: norm1 64, mult 64, norm2 32.
No stage 2 checkpoint.
Using up to 352M GPU memory.
Selected B1=495000, B2=2475000, 2.48% chance of finding a factor
Using B1 = 640000 from savefile.
Continuing stage 2 from a partial result of M69010547 fft length = 4096K
Starting stage 2.
Using b1 = 640000, b2 = 2475000, d = 30, e = 2, nrp = 1
Zeros: 132682, Ones: 111990, Pairs: 17316
Processing 1 - 1 of 8 relative primes.
Inititalizing pass... done.
transforms: 168, err = 0.02441, (0.56 real, 3.3294 ms/tran, ETA NA)
Quitting, estimated time spent = 0:00

e:\cpm1_1>
[/CODE] |
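The "of 480 / of 48 / of 8 relative primes" counts in these logs match Euler's totient of d: φ(2310) = 480, φ(210) = 48, φ(30) = 8. nrp is how many of them are handled per pass, and judging from the memory figures in these logs, fewer per pass means less GPU memory. A quick Python check; the pass count ceil(φ(d)/nrp) is our reading of the "Processing 1 - nrp of φ(d)" output, not something taken from the CUDAPm1 source:

```python
from math import ceil, gcd


def relative_primes(d: int) -> list:
    """Residues r in [1, d) that are coprime to d; there are phi(d) of them."""
    return [r for r in range(1, d) if gcd(r, d) == 1]


def stage2_passes(d: int, nrp: int) -> int:
    """Passes needed if nrp relative primes are processed per pass (our reading of the log)."""
    return ceil(len(relative_primes(d)) / nrp)


# The (d, nrp) pairs from the four runs above.
for d, nrp in [(2310, 30), (30, 8), (210, 1), (30, 1)]:
    print(f"d = {d}, nrp = {nrp}: {len(relative_primes(d))} relative primes, "
          f"{stage2_passes(d, nrp)} passes")
```

For example, the first run (d = 2310, nrp = 30) would need 16 passes of 30 relative primes each.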
(Post size limit; I had to delete some stuff.)
For now I solved the problem very simply: I reported stage 1 only (I am not going to do stage 2), moved all the unstarted 69M exponents into Prime95's worktodo, and brought all the 64M exponents from Prime95 over to cudapm1. Below 69M (FFT 3584K and smaller) everything works fine. I also tried FFT 3600K; it still works. I tried 4096K with all the thread combinations I could think of: not working. But it works on cards with 3G of memory. So it is related to some allocation. edit2: still keeping the stage 1 files |
LaurV, please edit the threads.txt file, changing the norm1 threads to 128, e.g. from
[CODE]4096 64 64 32 [/CODE] to [CODE]4096 128 64 32[/CODE] and try again. The error you are getting is a round-off error (maybe I could add a line that actually tells you this?). The thread-optimizing function probably has a <= where it needs a < when checking whether the thread sizes are acceptable. |
:party:
You are my man! Tell me when you come to northern Thailand so I can fill the refrigerators with beer! It is working. Plain and simple... And I am sure I tried many different thread combinations too (on the command line only, however, not in the thread tuning file!). BTW, regarding the tuning file, I just wrote a "tuning batch" with a pari/gp for loop, so if someone is tired of tuning all the FFT sizes by hand, run this batch and keep the "GeForce GTX blabla threads.txt" file. You may have a few FFT sizes that are not in the list, but there won't be more than a few (depending on your card). Saves you a lot of typing... [CODE]
cudapm1 -cufftbench 4 4 6
cudapm1 -cufftbench 5 5 6
cudapm1 -cufftbench 14 14 6
cudapm1 -cufftbench 32 32 6
cudapm1 -cufftbench 36 36 6
cudapm1 -cufftbench 40 40 6
cudapm1 -cufftbench 64 64 6
cudapm1 -cufftbench 80 80 6
cudapm1 -cufftbench 96 96 6
cudapm1 -cufftbench 98 98 6
cudapm1 -cufftbench 128 128 6
cudapm1 -cufftbench 144 144 6
cudapm1 -cufftbench 160 160 6
cudapm1 -cufftbench 162 162 6
cudapm1 -cufftbench 192 192 6
cudapm1 -cufftbench 224 224 6
cudapm1 -cufftbench 256 256 6
cudapm1 -cufftbench 288 288 6
cudapm1 -cufftbench 320 320 6
cudapm1 -cufftbench 324 324 6
cudapm1 -cufftbench 336 336 6
cudapm1 -cufftbench 384 384 6
cudapm1 -cufftbench 392 392 6
cudapm1 -cufftbench 400 400 6
cudapm1 -cufftbench 448 448 6
cudapm1 -cufftbench 512 512 6
cudapm1 -cufftbench 576 576 6
cudapm1 -cufftbench 640 640 6
cudapm1 -cufftbench 648 648 6
cudapm1 -cufftbench 672 672 6
cudapm1 -cufftbench 720 720 6
cudapm1 -cufftbench 768 768 6
cudapm1 -cufftbench 784 784 6
cudapm1 -cufftbench 800 800 6
cudapm1 -cufftbench 864 864 6
cudapm1 -cufftbench 896 896 6
cudapm1 -cufftbench 1024 1024 6
cudapm1 -cufftbench 1152 1152 6
cudapm1 -cufftbench 1176 1176 6
cudapm1 -cufftbench 1280 1280 6
cudapm1 -cufftbench 1296 1296 6
cudapm1 -cufftbench 1344 1344 6
cudapm1 -cufftbench 1440 1440 6
cudapm1 -cufftbench 1512 1512 6
cudapm1 -cufftbench 1536 1536 6
cudapm1 -cufftbench 1568 1568 6
cudapm1 -cufftbench 1600 1600 6
cudapm1 -cufftbench 1728 1728 6
cudapm1 -cufftbench 1792 1792 6
cudapm1 -cufftbench 2048 2048 6
cudapm1 -cufftbench 2240 2240 6
cudapm1 -cufftbench 2304 2304 6
cudapm1 -cufftbench 2352 2352 6
cudapm1 -cufftbench 2592 2592 6
cudapm1 -cufftbench 2688 2688 6
cudapm1 -cufftbench 2880 2880 6
cudapm1 -cufftbench 3024 3024 6
cudapm1 -cufftbench 3136 3136 6
cudapm1 -cufftbench 3150 3150 6
cudapm1 -cufftbench 3200 3200 6
cudapm1 -cufftbench 3360 3360 6
cudapm1 -cufftbench 3456 3456 6
cudapm1 -cufftbench 3584 3584 6
cudapm1 -cufftbench 3600 3600 6
cudapm1 -cufftbench 4096 4096 6
cudapm1 -cufftbench 4320 4320 6
cudapm1 -cufftbench 4608 4608 6
cudapm1 -cufftbench 4704 4704 6
cudapm1 -cufftbench 5040 5040 6
cudapm1 -cufftbench 5184 5184 6
cudapm1 -cufftbench 5292 5292 6
cudapm1 -cufftbench 5400 5400 6
cudapm1 -cufftbench 5600 5600 6
cudapm1 -cufftbench 5670 5670 6
cudapm1 -cufftbench 5760 5760 6
cudapm1 -cufftbench 6144 6144 6
cudapm1 -cufftbench 6272 6272 6
cudapm1 -cufftbench 6480 6480 6
cudapm1 -cufftbench 6720 6720 6
cudapm1 -cufftbench 6912 6912 6
cudapm1 -cufftbench 7056 7056 6
cudapm1 -cufftbench 7168 7168 6
cudapm1 -cufftbench 7200 7200 6
cudapm1 -cufftbench 7776 7776 6
cudapm1 -cufftbench 8064 8064 6
cudapm1 -cufftbench 8192 8192 6
[/CODE](hehehe) edit: reserving more 69M exponents... :smile: At first I was stupid and tried to bring back the ones I had moved to Prime95, until a light bulb turned on in my head... |
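LaurV generated his list with a pari/gp for loop; an equivalent Python sketch is below. Only the first few FFT lengths from his list are included (extend the tuple with the sizes your card supports), and the passes value 6 is simply copied from his commands. Generating the lines in a loop also guarantees that the start and end values always match:

```python
# First few FFT lengths (in K) from the batch above; extend with the rest of the list.
FFT_SIZES_K = (4, 5, 14, 32, 36, 40, 64, 80, 96, 98, 128)


def tuning_commands(sizes, passes: int = 6, exe: str = "cudapm1") -> list:
    """One 'cudapm1 -cufftbench N N passes' thread-tuning run per FFT length N."""
    return [f"{exe} -cufftbench {n} {n} {passes}" for n in sizes]


if __name__ == "__main__":
    # Redirect to a batch file, e.g.  python make_tuning.py > tune.bat
    print("\n".join(tuning_commands(FFT_SIZES_K)))
```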
I'm glad it was that simple. Thanks again for your input. I'll get a fix up as soon as my lack of home internet and Windows adjusting to new hardware allow.
|
The code with the fix for what turned out to be a threads issue in one of the kernels is now up on SourceForge, as is a CUDA 5.5-linked Windows executable.
[URL="https://sourceforge.net/projects/cudapm1"]https://sourceforge.net/projects/cudapm1[/URL] |
CudaPm1 v0.20
I just downloaded and recompiled CudaPm1 v0.20 svn 51.
Before testing an exponent, I ran [code]./CudaPm1 -cufftbench 1 8192 1[/code] which gave me a file named "[FONT="Courier New"]GeForce GTX 580 fft.txt[/FONT]". I then ran the program on M67,231.XXX with the default INI file. Everything looked fine. I also noticed this message in the output: [code]
No GeForce GTX 580 threads.txt file found. Using default thread sizes.
For optimal thread selection, please run
./CUDAPm1 -cufftbench 4096 4096 r
for some small r, 0 < r < 6 e.g.
Using threads: norm1 256, mult 128, norm2 128.
[/code] I ran the program with r = 4 and got the following output: [code]
CUDA bench, testing various thread sizes for fft 4096K, doing 4 passes.
fft size = 4096K, ave time = 6.0904 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 32
fft size = 4096K, ave time = 6.0979 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 64
fft size = 4096K, ave time = 6.1026 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 128
fft size = 4096K, ave time = 6.1052 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 256
fft size = 4096K, ave time = 6.1038 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 512
fft size = 4096K, ave time = 6.1022 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 1024
fft size = 4096K, ave time = 6.0837 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 32
fft size = 4096K, ave time = 6.0929 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 64
fft size = 4096K, ave time = 6.0960 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 128
fft size = 4096K, ave time = 6.1007 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 256
fft size = 4096K, ave time = 6.0981 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 512
fft size = 4096K, ave time = 6.0974 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 1024
fft size = 4096K, ave time = 6.1389 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 32
fft size = 4096K, ave time = 6.1475 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 64
fft size = 4096K, ave time = 6.1538 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 128
fft size = 4096K, ave time = 6.1559 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 256
fft size = 4096K, ave time = 6.1564 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 512
fft size = 4096K, ave time = 6.1548 msec, Norm1 threads 64, Mult threads 128, Norm2 threads 1024
fft size = 4096K, ave time = 6.1958 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 32
fft size = 4096K, ave time = 6.2042 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 64
fft size = 4096K, ave time = 6.2083 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 128
fft size = 4096K, ave time = 6.2126 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 256
fft size = 4096K, ave time = 6.2120 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 512
fft size = 4096K, ave time = 6.2096 msec, Norm1 threads 64, Mult threads 256, Norm2 threads 1024
fft size = 4096K, ave time = 6.2603 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 32
fft size = 4096K, ave time = 6.2692 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 64
fft size = 4096K, ave time = 6.2705 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 128
fft size = 4096K, ave time = 6.2746 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 256
fft size = 4096K, ave time = 6.2736 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 512
fft size = 4096K, ave time = 6.2739 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 1024
CUDAPm1.cu(2163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED
[/code] It seems that the best timing for my system was [code]fft size = 4096K, ave time = 6.0837 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 32[/code] I also tried modifying the "[FONT="Courier New"]Threads=[/FONT]" parameter in the INI file: the best results were obtained with [COLOR="Red"]128[/COLOR].
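Picking the fastest combination out of a long bench listing by eye is tedious, so a small Python helper can do it. This is a sketch that assumes the exact line format shown in the output above, and it is not part of CUDAPm1. Since it only looks at complete lines, it also works on a run that died partway, like this one:

```python
import re

# Matches the per-combination lines printed by "-cufftbench N N r" as shown above.
BENCH_LINE = re.compile(
    r"fft size = (\d+)K, ave time = ([\d.]+) msec, "
    r"Norm1 threads (\d+), Mult threads (\d+), Norm2 threads (\d+)"
)


def best_threads(output: str):
    """Return (ave_ms, norm1, mult, norm2) for the fastest line, or None if none parsed."""
    rows = [
        (float(m.group(2)), int(m.group(3)), int(m.group(4)), int(m.group(5)))
        for m in BENCH_LINE.finditer(output)
    ]
    return min(rows) if rows else None


sample = """\
fft size = 4096K, ave time = 6.0904 msec, Norm1 threads 64, Mult threads 32, Norm2 threads 32
fft size = 4096K, ave time = 6.0837 msec, Norm1 threads 64, Mult threads 64, Norm2 threads 32
fft size = 4096K, ave time = 6.2603 msec, Norm1 threads 64, Mult threads 512, Norm2 threads 32
CUDAPm1.cu(2163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED
"""
print(best_threads(sample))  # (6.0837, 64, 64, 32)
```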
As the program still failed to recognize the file created by the [FONT="Courier New"]-cufftbench[/FONT] option, I renamed it as asked by the executable, and got this message:
[code]
Using threads: norm1 75846319, mult 6, norm2 0.
over specifications Grid = 174762
try increasing mult threads (6) or decreasing FFT length (4096K)
[/code]
I rolled back that change. My current configuration is the one shown in the second CODE box, with Threads=256 in the INI file, giving 6.3266 ms/iter. What am I missing? :help: Luigi |
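The numbers in that message are self-consistent if the mult kernel's grid is sized as FFT length / (4 × mult threads). 4096K is 4,194,304 elements, and 4,194,304 / (4 × 6) rounds down to 174,762. That is far above the 65,535 maximum grid dimension of a compute capability 2.x card such as the GTX 580. The divisor 4 is inferred from these numbers only, not taken from the CUDAPm1 source. Treat this C sketch, with its hypothetical names, as an illustration rather than the program's actual check.

```c
#include <assert.h>

/* Hypothetical: grid size for the mult kernel, using the divisor 4 inferred
   from the "Grid = 174762" message, NOT taken from the CUDAPm1 source. */
long mult_grid_size(long fft_len, int mult_threads)
{
    assert(fft_len > 0 && mult_threads > 0);
    return fft_len / (4L * mult_threads);
}

/* Returns 1 if that grid would exceed the device's maximum grid dimension
   (65535 on compute capability 2.x cards such as the GTX 580). */
int grid_over_spec(long fft_len, int mult_threads, long max_grid_dim)
{
    return mult_grid_size(fft_len, mult_threads) > max_grid_dim;
}
```

With a sane mult value such as 64, the same FFT gives a grid of 16,384, comfortably within the limit.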
[QUOTE=ET_;359999]As the program still failed to recognize the file created by the [FONT="Courier New"]-cufftbench[/FONT] option, I renamed it as asked by the executable[/QUOTE]Note that the "<gpu> fft.txt" and "<gpu> threads.txt" files are distinct from each other.
<gpu> fft.txt should look something like[code]
Device GeForce GTX 670
Compatibility 3.0
clockRate (MHz) 980
memClockRate (MHz) 3004
fft max exp ms/iter
4 85933 0.0697
16 333803 0.1153
32 657719 0.1306
36 738083 0.1618
48 978041 0.1635
... skip a whole bunch of fft lines ...
28800 511382147 76.5273
32768 580225813 79.6749
[/code]Whereas "<gpu> threads.txt" should be quite short (and more cryptic), mine looks like:[code]
17496 256 64 512 45.9160
3456 256 128 32 8.0790
[/code] I suspect it didn't make a "<gpu> threads.txt" file for you because it appears to have failed partway through the process:[quote]CUDAPm1.cu(2163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/quote] |
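Judging from James's example and the bench output, each threads.txt line appears to hold five columns: the FFT length in K, the norm1, mult and norm2 thread counts, and the measured time. That column order would also explain the odd "norm1 75846319, mult 6, norm2 0" message after the fft.txt file was renamed. A row such as "4096 75846319 6.xxxx" puts the maximum exponent in the norm1 slot and the integer part of ms/iter in the mult slot.

Here is a hypothetical C validator for such lines. It assumes that column order, and it assumes valid thread counts are powers of two from 32 to 1024. Neither assumption is confirmed by the CUDAPm1 source.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed threads.txt layout: fft_K norm1 mult norm2 ms */
struct thread_cfg {
    int fft_k, norm1, mult, norm2;
    double ms;
};

/* Assumed rule: a power of two between one warp (32) and 1024 threads. */
static int is_valid_threads(int t)
{
    return t >= 32 && t <= 1024 && (t & (t - 1)) == 0;
}

/* Returns 1 if the line has all five fields and plausible thread counts. */
int parse_threads_line(const char *line, struct thread_cfg *c)
{
    assert(line != NULL && c != NULL);
    if (sscanf(line, "%d %d %d %d %lf",
               &c->fft_k, &c->norm1, &c->mult, &c->norm2, &c->ms) != 5)
        return 0;
    return is_valid_threads(c->norm1) && is_valid_threads(c->mult) &&
           is_valid_threads(c->norm2);
}
```

Fed an fft.txt row, the parser stops at the decimal point of the timing field after three fields, so the line is rejected instead of producing absurd thread counts.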
[QUOTE=James Heinrich;360002]Note that the "<gpu> fft.txt" and "<gpu> threads.txt" files are distinct from each other.
<gpu> fft.txt should look something like[code]
Device GeForce GTX 670
Compatibility 3.0
clockRate (MHz) 980
memClockRate (MHz) 3004
fft max exp ms/iter
4 85933 0.0697
16 333803 0.1153
32 657719 0.1306
36 738083 0.1618
48 978041 0.1635
... skip a whole bunch of fft lines ...
28800 511382147 76.5273
32768 580225813 79.6749
[/code]Whereas "<gpu> threads.txt" should be quite short (and more cryptic), mine looks like:[code]
17496 256 64 512 45.9160
3456 256 128 32 8.0790
[/code] I suspect it didn't make a "<gpu> threads.txt" file for you because it appears to have failed partway through the process:[/QUOTE] Thanks James. :bow: From what you said, I assume that there should be 2 distinct files: the first created by -cufftbench 1 8192 1, the second by -cufftbench 4096 4096 4. I'll try to modify the [COLOR="Red"]r[/COLOR] parameter of the second bench run and see if it suffices. Luigi |
[QUOTE=ET_;360003]Thanks James. :bow:
From what you said, I assume that there should be 2 distinct files: the first created by -cufftbench 1 8192 1, the second by -cufftbench 4096 4096 4. I'll try to modify the [COLOR="Red"]r[/COLOR] parameter of the second bench run and see if it suffices. Luigi[/QUOTE] Sadly, I always get "[FONT="Courier New"][COLOR="Red"]CUDAPm1.cu(2163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/COLOR][/FONT]" with r between 1 and 5 and Threads=128 or 256. Hints? Luigi |
Does it always fail at the same place in the test?
Also try putting [CODE]cutilSafeThreadSync();[/CODE] after the cufft call on line 2161 and after the square call on 2162. That will at least tell us what is failing. |
[QUOTE=owftheevil;360013]Does it always fail at the same place in the test?
Also try putting [CODE]cutilSafeThreadSync();[/CODE] after the cufft call on line 2161 and after the square call on 2162. That will at least tell us what is failing.[/QUOTE] Yes, it always fails at the same place. Added the line in the 2 places you asked. A new result: [code] CUDAPm1.cu(2165) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED [/code] Added a new sync after line 2165: same error. Luigi |
Sorry, I jumped too quickly on the safecall stuff. More is needed. Let me think a bit.
|
[QUOTE=owftheevil;360017]Sorry, I jumped too quickly on the safecall stuff. More is needed. Let me think a bit.[/QUOTE]
No hurry. I'm actually playing with Threads=128 and the program is working: I just tried to squeeze some more juice from it. I'll be quietly waiting for your thoughts, thank you. Luigi :smile: |
Could you try this little snippet after the square call on 2162?
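[CODE]
cudaThreadSynchronize(); /* wait for the kernels; deprecated alias of cudaDeviceSynchronize() since CUDA 4.0 */
{
    cudaError_t error = cudaGetLastError(); /* last launch or execution error, if any */
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        exit(2);
    }
}
[/CODE] |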
[QUOTE=owftheevil;360021]Could you try this little snippet after the square call on 2162?
[CODE]
cudaThreadSynchronize(); /* wait for the kernels; deprecated alias of cudaDeviceSynchronize() since CUDA 4.0 */
{
    cudaError_t error = cudaGetLastError(); /* last launch or execution error, if any */
    if (error != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(error));
        exit(2);
    }
}
[/CODE][/QUOTE] The error is:
[code]
CUDA error: too many resources requested for launch
[/code]
while the environment is:
[code]
------- DEVICE 0 -------
name GeForce GTX 580
Compatibility 2.0
clockRate (MHz) 1594
memClockRate (MHz) 2025
totalGlobalMem 1610285056
totalConstMem 65536
l2CacheSize 786432
sharedMemPerBlock 49152
regsPerBlock 32768
warpSize 32
memPitch 2147483647
maxThreadsPerBlock 1024
maxThreadsPerMP 1536
multiProcessorCount 16
maxThreadsDim[3] 1024,1024,64
maxGridSize[3] 65535,65535,65535
textureAlignment 512
deviceOverlap 1
[/code]
HTH... :smile: thanks. Luigi |
Sorry for the delay... I was dining.
|
Thanks for getting back with that. The only thing I can think of right now is that somehow, either t2 or the threads array have messed up values. I'll look at it over the weekend and get back on Monday.
|
[QUOTE=owftheevil;360032]Thanks for getting back with that. The only thing I can think of right now is that somehow, either t2 or the threads array have messed up values. I'll look at it over the weekend and get back on Monday.[/QUOTE]
Thanks :bow: I should add that I am using Linux_64, driver 304.88. CUDA version info:
[code]
CUDA version info
binary compiled for CUDA 4.10
CUDA runtime version 4.10
CUDA driver version 5.0
[/code]
Luigi |
[QUOTE=ET_;360005]Sadly, I always get "[FONT="Courier New"][COLOR="Red"]CUDAPm1.cu(2163) : cufftSafeCall() CUFFT error 6: CUFFT_EXEC_FAILED[/COLOR][/FONT]" with r between 1 and 5 and Threads=128 or 256.[/QUOTE]I just tried running the FFT benchmark (CudaPm1 -cufftbench 1 8192 1) on my new GTX 580, and I also got a failure:[code]...
fft size = 3645K, ave time = 6.4376 msec, max-ave = 0.00000
fft size = 3675K, ave time = 6.9818 msec, max-ave = 0.00000
fft size = 3750K, ave time = 6.7061 msec, max-ave = 0.00000
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(2279) : cudaSafeCall() Runtime API error 30: unknown error.[/code]The screen went black for a second as the NVIDIA drivers recovered from the crash.
Win7, drivers v331.82, GTX 580 3GB.
The line number is slightly different, but this is the 24-Sep-2013 Windows binary, if that helps.
[i]edit: but a second attempt at running the same command, with no changes, resulted in success.[/i] :cmd: |
Owftheevil has said that some errors like this one are caused by a problem in the nVidia drivers starting with the 3xx series. While I have 64 bit drivers going back to 285.62, I have assumed that it is not worth trying to install anything that old as they are probably not compatible with current CUDA libraries.
|
[QUOTE=James Heinrich;360071]I just tried running the FFT benchmark (CudaPm1 -cufftbench 1 8192 1) on my new GTX 580, and I also got a failure:[code]...
fft size = 3645K, ave time = 6.4376 msec, max-ave = 0.00000
fft size = 3675K, ave time = 6.9818 msec, max-ave = 0.00000
fft size = 3750K, ave time = 6.7061 msec, max-ave = 0.00000
C:/Users/filbert/Documents/Visual Studio 2010/Projects/CUDAPm1/CUDAPm1.cu(2279) : cudaSafeCall() Runtime API error 30: unknown error.[/code]The screen went black for a second as the NVIDIA drivers recovered from the crash.
Win7, drivers v331.82, GTX 580 3GB.
The line number is slightly different, but this is the 24-Sep-2013 Windows binary, if that helps.
[i]edit: but a second attempt at running the same command, with no changes, resulted in success.[/i] :cmd:[/QUOTE] I got the error while running [COLOR="Red"]Cudapm1 -cufftbench 4096 4096 3[/COLOR] and reaching [COLOR="Red"]Mult threads 1024[/COLOR]. My run of [COLOR="SeaGreen"]Cudapm1 -cufftbench 1 8192 1[/COLOR] ran smoothly :smile: Luigi |
Revision 52, up at SourceForge now, has a partial fix. I haven't tested this; power was off due to a snowstorm this weekend. It might not even compile. But barring any stupid mistakes, it should allow you to run that benchmark. Looks like 4.1 is not as good at optimizing register use as 5.5. It will still fail in stage 2 if you try to test with mult threads = 1024, but I will wait until I can test it to make all the other necessary changes.
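Some context on the "too many resources requested for launch" error. The GTX 580 listing above reports regsPerBlock 32768 and maxThreadsPerBlock 1024. A 1024-thread block therefore only launches if the kernel needs at most 32768 / 1024 = 32 registers per thread. If the CUDA 4.1 compiler allocates even one register more than that, the Mult threads 1024 case fails. That fits the remark that 4.1 optimizes register use less well than 5.5.

Below is a simplified C sketch of that check. It ignores the hardware's register allocation granularity, so real limits can be somewhat lower, and the function names are hypothetical.

```c
#include <assert.h>

/* Largest block size, rounded down to whole warps, whose registers fit in
   regs_per_block. Simplified: ignores register allocation granularity. */
int max_threads_for_regs(int regs_per_block, int regs_per_thread, int warp_size)
{
    assert(regs_per_thread > 0 && warp_size > 0);
    int t = regs_per_block / regs_per_thread;
    return t - t % warp_size;
}

/* 1 if a block of `threads` threads fits both the thread and register limits. */
int launch_fits(int threads, int regs_per_thread, int regs_per_block,
                int max_threads_per_block, int warp_size)
{
    return threads <= max_threads_per_block &&
           threads <= max_threads_for_regs(regs_per_block, regs_per_thread, warp_size);
}
```

You can print the actual per-kernel register count at build time with nvcc's `--ptxas-options=-v`.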
|
[QUOTE=owftheevil;360263]Revision 52, up at SourceForge now, has a partial fix. I haven't tested this; power was off due to a snowstorm this weekend. It might not even compile. But barring any stupid mistakes, it should allow you to run that benchmark. Looks like 4.1 is not as good at optimizing register use as 5.5. It will still fail in stage 2 if you try to test with mult threads = 1024, but I will wait until I can test it to make all the other necessary changes.[/QUOTE]
Same error. :no: I'm planning on updating to 5.5 (although I'd rather wait for 6.0...) Luigi |
[QUOTE=ET_;360278]Same error. :no:
I'm planning on updating to 5.5 (although I'd rather wait for 6.0...) Luigi[/QUOTE] It's going to be a while before 6 comes out. |
[QUOTE=flashjh;360282]It's going to be a while before 6 comes out.[/QUOTE]
Thanks for the hint. Now I have no reason to wait further... Luigi |
Speed test on a GTX 750 Ti
[code]
Device GeForce GTX 750 Ti
Compatibility 5.0
clockRate (MHz) 1110
memClockRate (MHz) 2700
fft max exp ms/iter
4 85933 0.0725
8 169409 0.1211
16 333803 0.1354
18 374587 0.1556
20 415253 0.1585
25 516589 0.1904
28 577177 0.1980
32 657719 0.1988
36 738083 0.2152
40 818239 0.2225
48 978041 0.2775
50 1017889 0.2837
56 1137271 0.2938
64 1296011 0.3271
72 1454273 0.3766
80 1612249 0.4374
81 1631969 0.4650
84 1691093 0.4911
90 1809193 0.5352
96 1927129 0.5400
100 2005673 0.5505
112 2240863 0.5863
128 2553659 0.6611
135 2690201 0.7673
144 2865601 0.7755
160 3176779 0.8483
168 3332107 0.9370
180 3564823 1.0058
200 3951977 1.0640
216 4261051 1.1387
224 4415431 1.1826
225 4434721 1.2089
256 5031737 1.2806
288 5646379 1.4322
320 6259537 1.7545
324 6336103 1.7849
360 7024163 1.9349
392 7634537 2.0081
400 7786967 2.1736
432 8395997 2.2762
448 8700169 2.3520
450 8738161 2.4762
512 9914521 2.4840
576 11125619 3.0138
588 11352347 3.4336
640 12333809 3.4786
648 12484649 3.4864
720 13840423 3.8105
729 14009689 3.9701
800 15343429 4.1448
864 16543493 4.5113
896 17142793 4.7358
900 17217653 5.1151
1024 19535569 5.1737
1080 20580341 5.9243
1152 21921901 6.0684
1280 24302527 6.8055
1296 24599717 7.0482
1344 25490893 7.6906
1350 25602229 7.7762
1440 27271147 7.9282
1512 28604657 8.2026
1568 29640913 8.3392
1600 30232693 8.4548
1728 32597297 9.1322
1792 33778141 9.5213
1800 33925711 10.2388
2048 38492887 10.4488
2304 43194913 11.8395
2560 47885689 13.6211
2592 48471289 14.2237
2688 50227213 15.4735
2880 53735041 16.1879
2916 54392209 16.8134
3072 57237889 17.1524
3136 58404433 17.1638
3200 59570449 18.1494
3240 60298969 18.7758
3584 66556463 19.2476
4096 75846319 21.2812
4608 85111207 25.4287
4800 88579669 28.4094
5120 94353877 28.4263
5184 95507747 29.3381
5376 98967641 32.0367
5600 103000823 32.5258
5760 105879517 33.4296
5832 107174381 33.9940
6048 111056879 35.1400
6144 112781477 35.5020
6272 115080019 35.8715
6400 117377567 38.3167
6912 126558077 38.6704
7168 131142761 40.1696
7200 131715607 41.7772
8192 149447533 44.0675
[/code]
The strange thing is that the mem speed is half of what it is supposed to be. |
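Each fft.txt row gives an FFT length, the largest exponent that length can handle, and the measured time per iteration, so choosing an FFT for an exponent is a table lookup. Below is a minimal C sketch using a few rows from the GTX 750 Ti table above. The names are hypothetical, and whether CUDAPm1 selects exactly this way is an assumption.

An exponent in the M67xxxxxx range lies just beyond the 3584K limit of 66,556,463. It therefore needs 4096K, which matches the n=4096K in the M67xxxxxx result line later in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* One row of "<gpu> fft.txt": FFT length in K, largest exponent it can
   handle, and measured ms per iteration. */
struct fft_row {
    int fft_k;
    long max_exp;
    double ms;
};

/* Returns the fastest row whose max_exp covers exponent p, or NULL if none. */
const struct fft_row *pick_fft(const struct fft_row *rows, size_t n, long p)
{
    const struct fft_row *best = NULL;
    assert(rows != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (rows[i].max_exp >= p && (best == NULL || rows[i].ms < best->ms))
            best = &rows[i];
    return best;
}
```

Choosing the fastest eligible row, rather than the first one, does not depend on the file being sorted or on times increasing with FFT length.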
[QUOTE=firejuggler;371177]strange thing is that the mem speed is half of what it is supposed to be.[/QUOTE]Not uncommon to see that. There's a subtle distinction between the clock frequency of the memory and the rate of data transfers. In the good old days there was one transaction per clock cycle. Then they invented [url=http://en.wikipedia.org/wiki/Double_data_rate]DDR = Double Data Rate[/url], where data is transferred twice per clock cycle. Utilities often report the memory clock frequency (for modern GDDR5 video cards that's usually in the 2.5-3.0GHz range), whereas marketing materials report the number of transactions per second (double the clock rate), usually mislabeled as "GHz" (billion cycles per second) rather than GT/s (billion transactions per second).
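The conventional peak-bandwidth calculation applies that factor of two explicitly. Here is a small C sketch with a hypothetical function name.

For the GTX 980 figures posted later in this thread (3505 MHz, 256-bit bus), it gives about 224 GB/s. For the GTX 750 Ti above, 2700 MHz on a 128-bit bus gives 86.4 GB/s. The 128-bit bus width is the card's published spec; it does not appear in the listing.

```c
#include <assert.h>

/* Peak memory bandwidth in GB/s from the memory clock the driver reports
   (MHz) and the bus width (bits). The factor 2 is the DDR "two transfers
   per clock" described above. */
double peak_bandwidth_gbs(double mem_clock_mhz, int bus_width_bits)
{
    assert(mem_clock_mhz > 0.0 && bus_width_bits > 0);
    double transfers_per_sec = 2.0 * mem_clock_mhz * 1e6;
    double bytes_per_transfer = bus_width_bits / 8.0;
    return transfers_per_sec * bytes_per_transfer / 1e9;
}
```

So a reported 2700 MHz is 5400 MT/s. That is the "double" figure on the box, and it does not mean the card is running at half speed.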
|
Is there a way to specify the B1 and B2 values manually?
Currently I get a manual assignment, put it in the worktodo txt file and run the program (using 0.20). The software decides on the B1 and B2 values. Adolf |
One way to increase the limits for all assignments, but still let the program calculate them optimally for each exponent, is to specify a higher number of "LL tests saved": substitute the default "1" or "2" at the end of the line with "3"... "9" (it can be higher, but that is not effective, and higher values are generally a waste of time).
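As an illustration, assume the worktodo line uses the Prime95-style layout PFactor=AID,k,b,n,c,tf_bits,tests_saved, so the last field is the tests-saved count. Bumping that value is then a one-field edit. Below is a hypothetical C helper that keeps everything up to the last comma and writes a new count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Rewrites the last comma-separated field of a worktodo line (assumed to be
   the "LL tests saved" value) to `saved`. The line should not end in '\n'.
   Returns 0 on success, -1 if there is no comma or `out` is too small.
   e.g. "PFactor=N/A,1,2,67231001,-1,73,2" -> "...,73,3" (made-up line) */
int set_tests_saved(char *out, size_t outsz, const char *line, int saved)
{
    assert(out != NULL && line != NULL);
    const char *last = strrchr(line, ',');
    if (last == NULL)
        return -1;
    size_t prefix = (size_t)(last - line) + 1;   /* keep the comma */
    int n = snprintf(out, outsz, "%.*s%d", (int)prefix, line, saved);
    return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```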
|
Yes, for example
[CODE]./CUDAPm1 -b1 value -b2 value exponent[/CODE] |
When submitting results from cuda p-1, I'm getting this:
Found 1 lines to process. processing: P-1 no-factor for M68xxxxxx (B1=645,000, B2=15,802,500, E=4) Error: Missing checksum. Correct the problem or email results to [email]woltman@alum.mit.edu[/email]. Is there anything we can do to fix this? -- Craig |
[QUOTE=nucleon;382377]Error: Missing checksum.
Is there anything we can do to fix this?[/QUOTE]I have fixed the manual results form to accept CUDAPm1 results without checksum, as it was supposed to be. I guess there's potentially also the possibility that George could share his checksum-generating code with the CUDAPm1 authors (either explicitly or wrapped inside a closed-source DLL or similar) in which case CUDAPm1 could generate the correct checksums on its own, but that's a whole other area of discussion. |
Thanks.
I have my titan working on P-1, I'd hate to stop it. :) |
Still no good.
Submitting: M67xxxxxx found no factor (P-1, B1=635000, B2=14446250, e=2, n=4096K CUDAPm1 v0.20) And I get: Found 1 lines to process. processing: P-1 no-factor for M67xxxxxx (B1=635,000, B2=14,446,250, E=2) Error: Missing checksum. Correct the problem or email results to [email]woltman@alum.mit.edu[/email]. -- Craig |
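The result line itself already states the method and the bounds, so a parser does not need to guess them. Below is a hypothetical C sketch that pulls them out of the parenthesized part. It assumes the exact layout shown above. This is not the server's code, which is not written in C.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

struct pm1_result {
    unsigned long b1, b2;
    int e;        /* value printed as "e=" (echoed by the server as "E=") */
    int fft_k;    /* FFT length in K, from "n=...K" */
};

/* Extracts the bounds from the "(P-1, B1=..., B2=..., e=..., n=...K" part
   of a CUDAPm1 result line. Returns 1 on success, 0 if it is not a P-1 line. */
int parse_pm1_result(const char *line, struct pm1_result *r)
{
    assert(line != NULL && r != NULL);
    const char *p = strstr(line, "(P-1,");
    if (p == NULL)
        return 0;
    return sscanf(p, "(P-1, B1=%lu, B2=%lu, e=%d, n=%dK",
                  &r->b1, &r->b2, &r->e, &r->fft_k) == 4;
}
```

Keying on the "(P-1," marker means a line without it is reported as "not P-1" rather than being assigned a method by guesswork.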
[QUOTE=nucleon;382390]Still no good.[/QUOTE]Sorry, I put the checksum code in the wrong place. Please try again?
|
It's working for me now. Thanks.
|
It is not working for me. When a factor is found I get the answer:
[COLOR=darkgreen]processing: P-1 factor 888024044817831733817 for [URL="http://www.mersenne.org/report_exponent/?exp_lo=2297327&full=1"][COLOR=#000080]M2297327[/COLOR][/URL] (B1=615,000, B2=12,000,000, E=12)[/COLOR] [COLOR=red]Insufficient information for accurate CPU credit. For stats purposes, assuming factor was found using ECM with B1 = 50000. CPU credit is 0.0908 GHz-days.[/COLOR] |
[QUOTE=ugonabuj;382395]It is not working for me. When a factor is found I get the answer:
[COLOR=darkgreen]processing: P-1 factor 888024044817831733817 for [URL="http://www.mersenne.org/report_exponent/?exp_lo=2297327&full=1"][COLOR=#000080]M2297327[/COLOR][/URL] (B1=615,000, B2=12,000,000, E=12)[/COLOR] [COLOR=red]Insufficient information for accurate CPU credit. For stats purposes, assuming factor was found using ECM with B1 = 50000. CPU credit is 0.0908 GHz-days.[/COLOR][/QUOTE]That's very interesting. That shouldn't happen. Investigating... edit: and found the problem. Turns out I was still calling the method-unknown factor recording function rather than the found-by-PM1 function. Sorry about that. |
When I send it to your site it is working perfectly. In fact it has never
worked for CUDAPm1 from the start in May 2013. Userid:Hbendtz |
It's a long-standing problem with mersenne.org's manual results parser which was just upgraded within the past 48 hours. Apparently I'm still finding a few bugs.
The old code ignored anything on the result line other than the exponent+factor, then made some broad assumptions about how factors were found. In this example, the rule that applied was "exponent < 16M, therefore must be ECM". But, the bug should be fixed now, please let me know if you see that message again, because you shouldn't. :smile: [i]edit: There is a probable plan to update the previously-recorded factor results that were falsely recorded as ECM or P-1 when in fact they were P-1 or TF, but that won't happen for a few days at the earliest.[/i] |
I'm happy.
Everything working aok now :) |
The PrimeNet server is still kicking my head (repeatedly).
Now when I report a factor for a P-1 assigned exponent, the bastard answers: processing: P-1 factor 6272775095469249097847 for M3406181 (B1=1,000,000, B2=20,000,000, E=12) Result type inappropriate for the assignment type. Processing result but not deleting assignment. CPU credit is 0.3099 GHz-days. If I no longer have the AID for that exponent, I can't unreserve it; it doesn't show in my account's assignments, but it is counted. HBendtz |
This has happened to me 5 times in the past, and it happens when I find a factor using cudapm1.
In my assignment list, there is no P-1 listed. But on my summary page I have 5 P-1 listed under workload. As I have factored those 5 exponents, I'm not too worried, even though the avg days is almost 100 now. |
PrimeNet tries hard to give credit to the right user. Checking the [url=http://www.mersenne.org/report_exponent/default.php?exp_lo=3406181&full=1]exponent history for M3406181[/url] you can see that the P-1 factor was indeed credited to "HBendtz".
I just had another look at the code, and the "result type inappropriate" message should not have been shown in your case (it was being incorrectly shown for P-1 and ECM factors). I have fixed that now. If the exponent was assigned to you, it would show up under [url=http://www.mersenne.org/workload/]My Account > Assignments[/url] when logged in and you could unreserve or check on the status of the assignment. If it's not there, then it's not assigned to you. |
[QUOTE=houding;382493]In my assignment list, there is no P-1 listed.
But on my summary page I have 5 P-1 listed under workload.[/QUOTE]Could you please tell me these exponents so I can investigate what's happening, please? |
[QUOTE=James Heinrich;382495]Could you please tell me these exponents so I can investigate what's happening, please?[/QUOTE]
I had to go digging through my results page. Luckily all of these were manual testing. 66402887 71020303 69804067 69454157 69453871 As you will see in exponent status, when I submitted the factor, on the same day/moment is says expired as well. Just in case things look fishy - my forum name is houding, my primenet name is AdolfNor. |
At a quick glance things seem as they should be, but perhaps I'm not looking in the right section.
Can you please PM or [url=mailto:james@mersenne.ca]email me[/url] a screenshot of where you see these exponents showing up on your list? |
No James, it is not shown in [URL="http://www.mersenne.org/workload/"][COLOR=#000080]My Account > Assignments[/COLOR][/URL], but it is counted
in my total number of assignments. When I look in "Manual Testing > Extensions" it is shown, but that does not help: there I can only extend the time. HBendtz |
[QUOTE=ugonabuj;382498]No James, it is not shown in [URL="http://www.mersenne.org/workload/"][COLOR=#000080]My Account > Assignments[/COLOR][/URL], but it is counted
in my total number of assignments. When I look in "Manual Testing > Extensions" it is shown, but that does not help: there I can only extend the time. HBendtz[/QUOTE] If I may please just spend a little bit of hard-earned experience to speak some reasonable advice... Some mean to be serious. But some others just want to seriously take the piss to cause a problem for those who actually think. Welcome to reality.... |
[QUOTE=James Heinrich;382497]At a quick glance things seem as they should be, but perhaps I'm not looking in the right section.
Can you please PM or [EMAIL="james@mersenne.ca"]email me[/EMAIL] a screenshot of where you see these exponents showing up on your list?[/QUOTE] James: It looks like $done is properly set to TRUE. I'm guessing the global $t_assigned is not set properly for manual_post_processing to delete the assignment row. |
[QUOTE=chalsall;382506]Welcome to reality....[/QUOTE]I'm not sure I quite follow what your above statement was supposed to mean.
But [i]houding[/i] and [i]ugonabuj[/i] have indeed pointed out an actual inconsistency in what's displayed in the assignments page. I've got George looking into it. It may just be a display issue or it may involve something deeper in relation to submitting PM1-factor results after an assignment expires. |
[QUOTE=Prime95;382509]James: It looks like $done is properly set to TRUE. I'm guessing the global $t_assigned is not set properly for manual_post_processing to delete the assignment row.[/QUOTE]The 5 examples quoted by [I]houding[/I] were all submitted on the old manual_results form. It's possible that whatever bug was there has already been fixed with the new manual_results form. It is, of course, also possible that the bug still exists. If someone submits a new P-1 factor and finds that the assignment still shows up on their Extension or Workload lists, please let me know.
|
[QUOTE=James Heinrich;382512]The 5 examples quoted by [i]houding[/i] were all submitted on the old manual_results form.[/QUOTE]
I'll manually repair the database |
Thank you James for fixing the scripts for P-1 manual reporting.
It is really working now. HBendtz |
Thank you James, George.
I will do a few pm1's with cudapm1 and hopefully find a factor or 2. Will let you know what I find. |
Cudapm1 for linux
I have been on SourceForge, and found release 0.20 of CudaPm1 executable for Windows.
Is there a place where some Linux executables or sources can be found? Luigi |
What parameters fit your needs, i.e., CUDA version, cc 2.0, 3.5, etc.? I don't have any posted, but if you can't or don't want to build it yourself, I would be happy to post some.
|
[QUOTE=owftheevil;395800]What parameters fit your needs, i.e., CUDA version, cc 2.0, 3.5, etc.? I don't have any posted, but if you can't or don't want to build it yourself, I would be happy to post some.[/QUOTE]
I will as soon as my new environment is set up (a new 980 on its way), thanks :bow: |
[QUOTE=owftheevil;395800]What parameters fit your needs, i.e., CUDA version, cc 2.0, 3.5, etc.? I don't have any posted, but if you can't or don't want to build it yourself, I would be happy to post some.[/QUOTE]
Linux Ubuntu 14.04 LTS 64 bit
[code]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 6.50
CUDA device info
name GeForce GTX 980
compute capability 5.2
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 16
CUDA cores per MP 128
CUDA cores - total 2048
clock rate (CUDA cores) 1342MHz
memory clock rate: 3505MHz
memory bus width: 256 bit
[/code]
Thanks :-) Luigi |
[QUOTE=ET_;395994]Linux Ubuntu 14.04 LTS 64 bit
[code]
CUDA version info
binary compiled for CUDA 6.50
CUDA runtime version 6.50
CUDA driver version 6.50
CUDA device info
name GeForce GTX 980
compute capability 5.2
max threads per block 1024
max shared memory per MP 98304 byte
number of multiprocessors 16
CUDA cores per MP 128
CUDA cores - total 2048
clock rate (CUDA cores) 1342MHz
memory clock rate: 3505MHz
memory bus width: 256 bit
[/code] Thanks :-) Luigi[/QUOTE] I can try to build CUDAPm1 myself; I just need the right source and makefile :smile: Luigi |
With subversion:
[CODE]svn checkout svn://svn.code.sf.net/p/cudapm1/code/trunk cudapm1-code[/CODE]
or via http:
[URL="http://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/#"]http://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL]
The readme is obsolete. Alter the makefile as you did for CUDALucas. To run with assignments in a worktodo.txt, no command-line parameters are needed. |
[QUOTE=owftheevil;401413]With subversion:
[CODE]svn checkout svn://svn.code.sf.net/p/cudapm1/code/trunk cudapm1-code[/CODE]
or via http:
[URL="http://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/#"]http://sourceforge.net/p/cudapm1/code/HEAD/tree/trunk/[/URL]
The readme is obsolete. Alter the makefile as you did for CUDALucas. To run with assignments in a worktodo.txt, no command-line parameters are needed.[/QUOTE] Thank you! :-) |
I noticed today that when my screen goes blank my iteration times triple. I am guessing that this is a known feature on windows. Is there an easy way around this?
|
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.