![]() |
Each core adds more overall performance, there's never a case where X cores does more total work than X+1 cores, but the performance of each core drops the more loaded the CPU is.
So the answer to your question would be: "4 cores" |
[QUOTE=James Heinrich;266298]Each core adds more overall performance, there's never a case where X cores does more total work than X+1 cores, but the performance of each core drops the more loaded the CPU is."[/QUOTE]
mfaktc performance drops 50% on loading 4th core. If all cores suffer the same penalty then 4/4*0.5 is less then 3/4. But Prime95 performance does not seem to drop off as badly on addition of the 4th core... I'll need to do some tests. As it is 8800 GTS performs at 3/4 of GTX 465 :-( |
Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.
|
Yes. Even with 64 bit mfaktc.
|
[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]
I do not know how to check but it was said that CUDALucas maxes out GPU. Maybe I'll just run CUDALucas on 465 (+4 Prime95 workers) and run mfaktc on my two 8800? |
1 Attachment(s)
[QUOTE=Christenson;265657]10200 = 27D8....you sure you have the right return-type declared for cudaStreamCreate?
If you are just trying to run mfaktc, I'd be inclined to ignore the "I can't build it" problem. What do you hope to do with the modification?[/QUOTE] I figured this one out - toolkit mismatch. I have also modified mfaktc so it no longer needs atomics and compiles for any cuda compute capability under CUDA 2.2/3.1/3.2. I haven't tried 4.0 but I do not see why it would have a problem with that. Anyway here's modified mfaktc: |
aspen: why didn't you change the version string? Seems that you did [B]alot[/B] more changes than just the removal of the atomics...
Oliver |
aspen: your changes seem to screw something up. :sad:
On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest! Performance is half of the expected value (no async CPU/GPU computation?). [CODE] running a simple selftest... ERROR: selftest failed for M49635893! expected result: 000F300E 00B13196 00D84F67 reported result: 001DAC4B 001DAC50 001DAC55 reported result: 001DAC57 001DAC5D 001DAC5F reported result: 001DAC61 001DAC67 001DAC6B reported result: 001DAC70 001DAC73 001DAC7A reported result: 001DAC7E 001DAC84 001DAC8A reported result: 001DAC8C 001DAC8E 001DAC99 reported result: 001DAC9A 001DAC9E 001DACA0 reported result: 001DACA2 001DACA3 001DACA5 reported result: 001DACAF 001DACB0 001DACB5 reported result: 001DACB6 001DACBD 001DACBE Selftest statistics number of tests 31 successfull tests 30 wrong factor reported 1 selftest FAILED! [/CODE] [COLOR="Red"][SIZE="4"]I don't recommend to run aspens version until this if fixed![/SIZE][/COLOR] Oliver |
Hi Oliver:
I've been putting my time into parse.c ... gone through 1 re-write, need another to get it organized with a parse_line function that returns as a structure with both the data found and the original line. I really don't have time to check over apsen's changes right now, as work has gotten rather rough....I'm supposed to be doing something I never have done before, with few resources and little support. |
[QUOTE=TheJudger;266725]aspen: your changes seem to screw something up. :sad:
On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest! Performance is half of the expected value (no async CPU/GPU computation?). [COLOR="Red"][SIZE="4"]I don't recommend to run aspens version until this if fixed![/SIZE][/COLOR] Oliver[/QUOTE] Sorry, It wasn't really meant for general consumption That's why I did not post the executable. I was hoping for your to take a look at it. We could transfer this to private conversation. For me all tests (including long one) come up fine. I did have problem in the interim so maybe I need to check if I posted the right version. Also I haven't tested with CUDA 4.0... The idea is simple give each thread it's own chunk of memory to write the results so there's no need to have shared variable. I did have to rearrange the code a little bit to make it possible but I tried to keep it so it's easy to do diff. It could use a little straightening otherwise. |
[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]
Someone else says you are dead on, James. Here's why: right now mfaktc is sieving on the CPU...so you will either need a very hot CPU core or two cores to reach full potential on a hot GPU card.... |
1 Attachment(s)
[QUOTE=TheJudger;266725]aspen: your changes seem to screw something up. :sad:
[/QUOTE] I originally tried to follow the changes since version 0.8 as it was the last stock version that worked on my machine so it is more probable that I broke something unintentionally. I've now got clean mkfaktc-0.17 and reapplied my changes directly to it. I've made them almost minimal (with the exception of enabling it to compile under CUDA 2.2) so it should be easy to do diff. I do not see any test failures on it so please test it on your system to see if you still experience them. Also, the slowdown was due to synchronous memory copy in the main loop but on my machine it was not as noticeable. I was loosing less then 10% of performance so I was going to look into it later. I have now reworked it too and performance is back on par with stock build. Please check if I did it the right way. |
Hi aspen,
[QUOTE=apsen;266731]The idea is simple give each thread it's own chunk of memory to write the results so there's no need to have shared variable.[/QUOTE] Each thread or each stream? I think it is the latter case and than it won't work reliable. Old GPUs can't even run concurrent kernels so the behavior is the same as without own chunks of memory. [QUOTE=apsen;266751]I've now got clean mkfaktc-0.17 and reapplied my changes directly to it. I've made them almost minimal (with the exception of enabling it to compile under CUDA 2.2) so it should be easy to do diff. [/QUOTE] May I know the reason why you're still on CUDA 2.2 (there might be good ones). Oliver |
[QUOTE=TheJudger;266768]
Each thread or each stream? I think it is the latter case and than it won't work reliable. Old GPUs can't even run concurrent kernels so the behavior is the same as without own chunks of memory. [/QUOTE] I meant for each thread but I guess I indeed did it for each stream :-( I guess I would need to go back and fix it. [QUOTE=TheJudger;266768] May I know the reason why you're still on CUDA 2.2 (there might be good ones). [/QUOTE] The driver on the machine does not support 3.1 or higher and if I upgrade the performance of stock mfaktc-0.8 drops significantly. So while I'm changing mfaktc I need it to test on 2.2. When it's ready I'll compile it for newer CUDA and upgrade the driver. It was late yesterday so I didn't really do it this time. Is there a problem with 2.2? |
[QUOTE=apsen;266778]Is there a problem with 2.2?[/QUOTE]
I don't know any problems related to mfaktc (expect that is doesn't compile as it is now). As CUDA 2.2 is past I don't have any real plans for supporting it in mfaktc. If the needed changes are trivial and have not side effect I might try it anyway. Oliver |
1 Attachment(s)
[QUOTE=TheJudger;266790]As CUDA 2.2 is past I don't have any real plans for supporting it in mfaktc. If the needed changes are trivial and have not side effect I might try it anyway.
[/QUOTE] The only difference is the absence of __launch_bounds__. You could see I've just put conditional macro to do away with it if we compile under CUDA 2.2. I don't care about supporting 2.2 either I just need it until we could make the current version of mkfaktc work with sm_10. [QUOTE=apsen;266778]I guess I would need to go back and fix it. [/QUOTE] I've made the changes but I wouldn't be able test it until later today. But maybe you'd be willing to take a look at it before then to see if my understanding is right. (If you are on Germany time it will be past midnight for you before I get a chance to test.) |
[QUOTE=TheJudger;265318]
use of atomic instructions for access to the results array (this needs CC >=1.1) [/QUOTE] :geek: Just to make sure I understand it right: just blindly replacing atomics with unprotected access to d_RES might result in the problem only when we find more then one factor per class (tf_class_* call) and even then it will report that at least one factor has been found but the factor(s) itself may be scrambled by simultaneous attempt to store them in the result array. So if the program reports no factors found - it will be true. Is this correct? |
correct!
|
Is CPUStreams configuration parameter basically the length of sieve queue?
|
yes!
btw.: pleases change the version string in your modified code to something unique. e.g. "0.17-ap1" Oliver |
Hi Eric,
[QUOTE=Christenson;266730]Hi Oliver: I've been putting my time into parse.c ... gone through 1 re-write, need another to get it organized with a parse_line function that returns as a structure with both the data found and the original line.[/QUOTE] feel free to sent me your stuff (even if it is not finished). Oliver |
[QUOTE=TheJudger;266839]
btw.: pleases change the version string in your modified code to something unique. e.g. "0.17-ap1" Oliver[/QUOTE] Ok. But would you be willing to incorporate my changes in your code once it goes through testing? |
[QUOTE=apsen;266858]Ok. But would you be willing to incorporate my changes in your code once it goes through testing?[/QUOTE]
G80 support: yes, at least as "official patch" or "special G80 version", depending how/if the changes affect the code / performance. I assume that a 1% slowdown on non-G80 GPUs has a bigger impact than the ability to run on G80 GPUs. G80 isn't that hot today. Even entry level Fermis are faster. CUDA < 2.3 support: unlikely, how many people need this? Main reason: needs testing. Cleanups: very welcome. I know that the code is messy. :blush: It is grown and some parts have been changed / expanded many times. Performance improvements: [B]YES[/B] Oliver |
[QUOTE=TheJudger;266863]CUDA < 2.3 support: unlikely, how many people need this? Main reason: needs testing.[/QUOTE]
Do not worry about it - I'll remove it myself eventually :-) [QUOTE=TheJudger;266863] Cleanups: very welcome. I know that the code is messy. :blush: It is grown and some parts have been changed / expanded many times. [/QUOTE] Are you sure you know what you are asking for? :bounce: Actually the reason my fist code had so many changes is that I tried to make sense of the code so I was trying to make it more readable. And all those #ifdef inside the code were getting to me. Sorry, but with you prompting it I could not resist. :redface: [QUOTE=TheJudger;266863]Performance improvements: [B]YES[/B] [/QUOTE] I might be able to cook something here but this is my first encounter with GPU computing. :unsure: |
1 Attachment(s)
Ok. Here it goes. I think this one is ready for more extensive testing as it passes all the selftests multiple times on different machines.
I've included Win64 executable compiled with CUDA 4.0 for multiple compute capabilities. I'd be interested if anyone could run the selftests and any other tests they could think of. Also I'm interested to know if there any noticeable performance differences with stock mfaktc-0.17. |
[QUOTE=apsen;266804]:geek:
Just to make sure I understand it right: just blindly replacing atomics with unprotected access to d_RES might result in the problem only when we find more then one factor per class (tf_class_* call) and even then it will report that at least one factor has been found but the factor(s) itself may be scrambled by simultaneous attempt to store them in the result array. So if the program reports no factors found - it will be true. Is this correct?[/QUOTE] Actually, as far as I understood the code, the problem only appears if multiple factors are found within the same grid of a class. Therefore, reducing the grid size will reduce the risk even more (when was the last time anyone had 2 factors in the same class?). For ATI's HD4xxx GPUs I'm limiting the GridSize to 2, and issue a warning "GPU does not support atomics. There is a small chance mfakto can report wrong factors; if it does, then the exponent has multiple factors in the tested range." And yes, I know it is not the exponent itself having the factors ... |
Why not simply(?) use CPU based code to check the factors found and if one proves to be wrong use CPU based code to re-search the grid/class? The time used to do this should be negligible.
|
[QUOTE=apsen;266891]I've included Win64 executable compiled with CUDA 4.0 for multiple compute capabilities. I'd be interested if anyone could run the selftests and any other tests they could think of.[/QUOTE]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?
|
1 Attachment(s)
[QUOTE=James Heinrich;266911]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?[/QUOTE]
Sorry. |
1 Attachment(s)
[QUOTE=James Heinrich;266911]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?[/QUOTE]
Actually, could you use this version: |
[QUOTE=apsen;266891]I'd be interested if anyone could run the selftests and any other tests they could think of. Also I'm interested to know if there any noticeable performance differences with stock mfaktc-0.17.[/QUOTE]Selftest (for mfaktc17apsen.cuda40.sm_multi.win64) works fine here (Win7x64, i7-920, 8800GT):[code]Selftest statistics
number of tests 4914 successfull tests 4914 selftest PASSED![/code]Speed (in my very, very brief testing) seemed identical to stock. edit: you posted the new version after I'd run my tests |
[QUOTE=apsen;266866]Are you sure you know what you are asking for? :bounce:
Actually the reason my fist code had so many changes is that I tried to make sense of the code so I was trying to make it more readable. And all those #ifdef inside the code were getting to me. Sorry, but with you prompting it I could not resist. :redface: [/QUOTE] No problem. I saw your modifications with the #ifdefs, I'll try to remove/replace some of them in my official version (e.g. optional parameters for debuging, etc.) Oliver |
[QUOTE=James Heinrich;266936]Speed (in my very, very brief testing) seemed identical to stock.[/QUOTE]
No surprise here. The difference is mostly memory usage. The stock one uses 32 integers total to hold results. My version uses 32 integers per thread to avoid using of atomics. It could store up to 10 factors per thread but it is folded down to original amount on output. I did not modify output to minimize diffs with stock versions but it could be easily done later. [QUOTE=James Heinrich;266936]edit: you posted the new version after I'd run my tests[/QUOTE] I did not realize threadId is unique only within block originally so I had to fix that. mfaktc17apsen could have problems if two threads with the same id in different blocks find factors at the same time. mfaktc171apsen fixes that. BTW at what point the program could be deemed good enough to produce? What is the usual standard? |
[QUOTE=TheJudger;266940]I'll try to remove/replace some of them in my official version (e.g. optional parameters for debuging, etc.)
[/QUOTE] Oliver, should I continue posting code here or should I pm them to you. My current code has bare minimum of the changes against 0.17 right now so it should be easy to merge them in. I also have some other ideas and also could help with making it more readable. |
I ran a quick test looking for multiple factors, seemed to work as expected:[code]got assignment: exp=999903251 bit_min=1 bit_max=67
tf(999903251, 1, 67, ...); k_min = 0 k_max = 73794115801 Using GPU kernel "71bit_mul24" class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait 1/4620 | 2.98M | 0.114s | 26.16M/s | 200000 | n.a. | 371us Result[00]: M999903251 has a factor: 1999806503 1884/4620 | 2.98M | 0.115s | 25.93M/s | 166923 | n.a. | 395us Result[00]: M999903251 has a factor: 2347679090418412249 3153/4620 | 3.10M | 0.116s | 26.69M/s | 143668 | n.a. | 834us Result[00]: M999903251 has a factor: 135710897421958333087 4609/4620 | 2.98M | 0.111s | 26.86M/s | 161503 | n.a. | 631us found 3 factor(s) for M999903251 from 2^ 1 to 2^67 [mfaktc 0.17.1-Win !!!TEST!!! apsen 71bit_mul24] tf(): total time spent: 2m 48.483s[/code] |
[QUOTE=James Heinrich;266951]I ran a quick test looking for multiple factors, seemed to work as expected:[code]got assignment: exp=999903251 bit_min=1 bit_max=67
tf(999903251, 1, 67, ...); k_min = 0 k_max = 73794115801 Using GPU kernel "71bit_mul24" class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait 1/4620 | 2.98M | 0.114s | 26.16M/s | 200000 | n.a. | 371us Result[00]: M999903251 has a factor: 1999806503 1884/4620 | 2.98M | 0.115s | 25.93M/s | 166923 | n.a. | 395us Result[00]: M999903251 has a factor: 2347679090418412249 3153/4620 | 3.10M | 0.116s | 26.69M/s | 143668 | n.a. | 834us Result[00]: M999903251 has a factor: 135710897421958333087 4609/4620 | 2.98M | 0.111s | 26.86M/s | 161503 | n.a. | 631us found 3 factor(s) for M999903251 from 2^ 1 to 2^67 [mfaktc 0.17.1-Win !!!TEST!!! apsen 71bit_mul24] tf(): total time spent: 2m 48.483s[/code][/QUOTE] I wonder if you know the case where multiple factors will fall within the same class... |
[QUOTE=apsen;266953]I wonder if you know the case where multiple factors will fall within the same class...[/QUOTE]I haven't come across any multiple-factor exponents with even the same number of bits or digits in different factors. But, I should be able to dig around in the PrimeNet list of known factors and come up with a list of "close" factor pairs. I don't know how close the two factors have to be to be in the same class...?
|
[QUOTE=James Heinrich;266957] I don't know how close the two factors have to be to be in the same class...?[/QUOTE]
The single assignment is divided into 4620 classes. I'm not sure how the sieving works but I would guess that it assigns candidates sequentially... To be honest I doubt anyone would be able to trigger any bugs related to absence of atomics intentionally. For example difference between my last two versions (assuming candidates are fed sequentially) deals with odds of two factors in the same assignment (and if you allow mfaktc to split them that would mean at the same bit level) fall close enough to fall into 1/4620 of the whole range but further apart then 1/4620/block_count and the streams processing them run at the same time and then multiply this probability by 1/threads_per_block. Does anyone care to estimate a number to that. Of course there could be some other bug unrelated to the use of atomics that makes it fail it every single time but at least we could exclude the later part (every single time, that is) due to the fact that we have run successful selftests. Was this confusing enough? :-) |
Methinks that if you want to excite this problem, the easiest way is going to be to construct something....
|
[QUOTE=James Heinrich;266957]I should be able to dig around in the PrimeNet list of known factors and come up with a list of "close" factor pairs.[/QUOTE]I just tested this list of known multi-factor exponents, and they all found multiple factors real close together, but never in the same class, almost always 1 class apart (with a suspiciously large number of them found in classes 3 and 4?):[code]Factor=1008857,1,23
Factor=1040597,1,23 Factor=1048213,1,27 Factor=1053809,1,27 Factor=1153967,1,25 Factor=1198247,1,25 Factor=1305607,1,28 Factor=1308497,1,27 Factor=1357351,1,25 Factor=1380949,1,25 Factor=1398701,1,24 Factor=1443257,1,24 Factor=1485017,1,24 Factor=1490717,1,24 Factor=1516421,1,24 Factor=1518071,1,27 Factor=1519277,1,24 Factor=1527857,1,24 Factor=1564421,1,24 Factor=1566811,1,25 Factor=1588757,1,24 Factor=1604297,1,24 Factor=1613057,1,24 Factor=1655897,1,24 Factor=1677787,1,25 Factor=1678877,1,24 Factor=1684937,1,24 Factor=1715717,1,24 Factor=1719547,1,25 Factor=1735421,1,24[/code]I guess I'll dig up some more to try... |
[QUOTE=Christenson;266967]Methinks that if you want to excite this problem, the easiest way is going to be to construct something....[/QUOTE]
I'm not aware of any problem with mkfaktc171apsen code so I do not know what could be constructed. mkfaktc17apsen does have a problem. And if we recompile the code with custom number of classes and threads based on chosen exponent and run it with custom number of streams and sieve level we should be able to guaranty that specific candidates will get processed by conflicting threads. But then we would still have to deal with uncertainty of thread scheduling... It is probably easier to prove the problem mathematically then to actually trigger it... |
[QUOTE=James Heinrich;266969]I just tested this list of known multi-factor exponents, and they all found multiple factors real close together, but never in the same class, almost always 1 class apart (with a suspiciously large number of them found in classes 3 and 4?):
... I guess I'll dig up some more to try...[/QUOTE] Thanks, I appreciate the effort. If you'd like I could recompile the program without MORE_CLASSES option. This would reduce the number of classes to 420. You could also play with sieve level to force specific candidates to go into different threads. |
[QUOTE=apsen;266971]If you'd like I could recompile the program without MORE_CLASSES option. This would reduce the number of classes to 420.[/QUOTE]Sounds like a plan. And then you should be able to quickly test it yourself, since I've given you a list of likely assignments that could probably find 2 factors in the same class.
|
[QUOTE=apsen;266970]I'm not aware of any problem with mkfaktc171apsen code so I do not know what could be constructed.
mkfaktc17apsen does have a problem. And if we recompile the code with custom number of classes and threads based on chosen exponent and run it with custom number of streams and sieve level we should be able to guaranty that specific candidates will get processed by conflicting threads. But then we would still have to deal with uncertainty of thread scheduling... It is probably easier to prove the problem mathematically then to actually trigger it...[/QUOTE] It often is easier to prove a problem exists than reliably trigger it...but the construction I would use would work like this: Notice that we are a) sieving for candidates of a certain form, and b) running them through a large, relatively slow expression in parallel. Now, I'm willing to temporarily introduce a "bug" in the program to determine whether a problem exists. What if the exponent had several factors [e.g. look up the exponent "god" was using, or look through the operation Billion Digits web page] and, in addition, the CPU sieving process just "got lucky" and included them all in the same class, preferably one right after another? This is the statistically unlikely construction we are searching for. However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [url]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/url] Therefore, detecting the problem of a collision somewhat after not quite reporting correct results isn't an unreasonable course of action. The user will be able to re-run a more careful, if slower, TF program over a very narrow range quickly and get the correct results. Something very similar happens when factors are reported to the server and the server verifies them with much less effort than that required to find them in the first place. |
[QUOTE=Christenson;266975]What if the exponent had several factors [e.g. look up the exponent "god" was using[/quote]Since that was me for a while, let me provide a short list (they're easy to find if you know how/where to look :)[code]M800000041 has a factor: 32000001641
M800000041 has a factor: 5357430980168323663 found 2 factor(s) for M800000041 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800000759 has a factor: 1600001519 M800000759 has a factor: 6400006073 M800000759 has a factor: 10240019438409224887 found 3 factor(s) for M800000759 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800002139 has a factor: 14400038503 M800002139 has a factor: 13677324569648791 found 2 factor(s) for M800002139 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800002649 has a factor: 24000079471 M800002649 has a factor: 30400100663 found 2 factor(s) for M800002649 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800002699 has a factor: 14400048583 M800002699 has a factor: 67425000050326478281 found 2 factor(s) for M800002699 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800003401 has a factor: 12800054417 M800003401 has a factor: 64733252796071023 M800003401 has a factor: 18730039702542776497 found 3 factor(s) for M800003401 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800003999 has a factor: 6400031993 M800003999 has a factor: 5133878758848163631 found 2 factor(s) for M800003999 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800004311 has a factor: 1600008623 M800004311 has a factor: 9256351487895290111 found 2 factor(s) for M800004311 from 2^31 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800004847 has a factor: 8000048471 M800004847 has a factor: 1380197213419202209 found 2 factor(s) for M800004847 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800005403 has a factor: 33600226927 M800005403 has a factor: 18992736271326281 found 2 factor(s) for M800005403 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800007251 has a factor: 1600014503 M800007251 has a factor: 85002820037327063 found 2 factor(s) for M800007251 from 2^31 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800008757 has a factor: 6400070057 M800008757 has a factor: 164187461134123649273 M800008757 has a factor: 1940039026319738167 M800008757 has a factor: 26502690101897 found 4 factor(s) for M800008757 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M800009557 has a factor: 56000668991 M800009557 has a factor: 573355047504912077761 M800009557 has a factor: 12495915504747291689 found 3 factor(s) for M800009557 from 2^36 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24] M862436279 has a factor: 1724872559 M862436279 has a factor: 509958343703150891009 found 2 factor(s) for M862436279 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862437671 has a factor: 1724875343 M862437671 has a factor: 986547654080931473 found 2 factor(s) for M862437671 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862439531 has a factor: 1724879063 M862439531 has a factor: 541488560349621103 found 2 factor(s) for M862439531 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862439663 has a factor: 1724879327 M862439663 has a factor: 8697849796780030151 found 2 factor(s) for M862439663 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862446323 has a factor: 1724892647 M862446323 has a factor: 1166882518344965239417 found 2 factor(s) for M862446323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862449803 has a factor: 1724899607 M862449803 has a factor: 79380523255673039 found 2 factor(s) for M862449803 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862452323 has a factor: 1724904647 M862452323 has a factor: 103494278761 M862452323 has a factor: 26768470919200553 M862452323 has a factor: 178517762372762302367 M862452323 has a factor: 50544926613807686257 found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862458959 has a factor: 1724917919 M862458959 has a factor: 206990150161 M862458959 has a factor: 357041019069209634959 found 3 factor(s) for M862458959 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862460411 has a factor: 1724920823 M862460411 has a factor: 510435025768812467849 M862460411 has a factor: 23636726266737125671 found 3 factor(s) for M862460411 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862464059 has a factor: 1724928119 M862464059 has a factor: 10397394163320570727 found 2 factor(s) for M862464059 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862465931 has a factor: 1724931863 M862465931 has a factor: 99918198394020824183 found 2 factor(s) for M862465931 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862469711 has a factor: 1724939423 M862469711 has a factor: 33635373938280024473 found 2 factor(s) for M862469711 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862472651 has a factor: 1724945303 M862472651 has a factor: 19055217184234607 found 2 factor(s) for M862472651 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862479851 has a factor: 1724959703 M862479851 has a factor: 413990328481 M862479851 has a factor: 714116634061458201143 found 3 factor(s) for M862479851 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862483991 has a factor: 1724967983 M862483991 has a factor: 146587766718190017313 found 2 factor(s) for M862483991 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862484663 has a factor: 1724969327 M862484663 has a factor: 81555017960518513 found 2 factor(s) for M862484663 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M862486043 has a factor: 1724972087 M862486043 has a factor: 5708644866648421057 found 2 factor(s) for M862486043 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719908379 has a factor: 1439816759 M719908379 has a factor: 232789913230194889 found 2 factor(s) for M719908379 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719910071 has a factor: 1439820143 M719910071 has a factor: 2069092085162217007 found 2 factor(s) for M719910071 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719911859 has a factor: 1439823719 M719911859 has a factor: 4394341987337 found 2 factor(s) for M719911859 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719913983 has a factor: 1439827967 M719913983 has a factor: 40274894059742185439 found 2 factor(s) for M719913983 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719916143 has a factor: 1439832287 M719916143 has a factor: 173954498024165765407 found 2 factor(s) for M719916143 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719918123 has a factor: 1439836247 M719918123 has a factor: 14421511271742750433 found 2 factor(s) for M719918123 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719918183 has a factor: 1439836367 M719918183 has a factor: 113245578616121692073 found 2 factor(s) for M719918183 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719918723 has a factor: 1439837447 M719918723 has a factor: 48533437337878127 M719918723 has a factor: 17853201058829377 found 3 factor(s) for M719918723 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719920559 has a factor: 1439841119 M719920559 has a factor: 17887233584232404713 found 2 factor(s) for M719920559 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719923511 has a factor: 1439847023 M719923511 has a factor: 443649962529022924553 found 2 factor(s) for M719923511 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719923703 has a factor: 1439847407 M719923703 has a factor: 4730478123306174401 found 2 factor(s) for M719923703 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719928551 has a factor: 1439857103 M719928551 has a factor: 1170603823927 M719928551 has a factor: 684751402141039 found 3 factor(s) for M719928551 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719930999 has a factor: 1439861999 M719930999 has a factor: 235880898999359710631 found 2 factor(s) for M719930999 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719935823 has a factor: 1439871647 M719935823 has a factor: 21156990157931970833 found 2 factor(s) for M719935823 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719938343 has a factor: 1439876687 M719938343 has a factor: 715511442128893001 found 2 factor(s) for M719938343 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719940659 has a factor: 1439881319 M719940659 has a factor: 1276097697840137 found 2 factor(s) for M719940659 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719941571 has a factor: 1439883143 M719941571 has a factor: 1058879983444807 M719941571 has a factor: 1798414044359 M719941571 has a factor: 135867062928147456463 found 4 factor(s) for M719941571 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719941643 has a factor: 1439883287 M719941643 has a factor: 5910824560626593 found 2 factor(s) for M719941643 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24] M719943923 has a factor: 1439887847 M719943923 has a factor: 84048328604913973369 M719943923 has a factor: 25820857442168282359 found 3 factor(s) for M719943923 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code] [QUOTE=Christenson;266975]finding two within a bit level is definitely rare, though not unknown[/QUOTE]I haven't looked through other ranges, but there are 705 exponents between M1,000,000 and M2,000,000 that have 2 (or more) factors of the same bitsize. |
I found an interesting issue, possibly bug, in mfaktc:[code]
M862452323 has a factor: 1724904647 M862452323 has a factor: 103494278761 M862452323 has a factor: 26768470919200553 M862452323 has a factor: 178517762372762302367 M862452323 has a factor: 50544926613807686257 found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]The problem? 1724904647 * 103494278761 = 178517762372762302367 So it really only found 4 factors, but [i]also[/i] reported a composite factor of the two smallest factors. How did this happen? edit: found another example:[code] M999916103 has a factor: 1999832207 M999916103 has a factor: 113990435743 M999916103 has a factor: 12648938702951 M999916103 has a factor: 227961744688815374801 found 4 factor(s) for M999916103 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code] 227961744688815374801 = 1999832207 * 113990435743 edit2: another example:[code] M800000759 has a factor: 1600001519 M800000759 has a factor: 6400006073 M800000759 has a factor: 10240019438409224887 found 3 factor(s) for M800000759 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24][/code] 10240019438409224887 = 1600001519 * 6400006073 |
If you're looking for Mersenne numbers with multiple factors in a given range, I have (at least) one.
M43363759 is divisible by 566645733218532866731538439737694884449681 (138.7 bits). 566645733218532866731538439737694884449681 = 692113095422672631017 (69.2 bits) * 818718410280162380393 (69.5 bits) ...If that's not what you're looking for, then pretend this post doesn't exist. :wink: It just so happens that I found the composite factor with P-1, but obviously a TF from 69 to 70 would've found 'em both. |
[QUOTE=KingKurly;266981]It just so happens that I found the composite factor with P-1, but obviously a TF from 69 to 70 would've found 'em both.[/QUOTE]I took a look through the top-10 P-1 factors on my site, which are all composite, and found two more that break into same-bit prime factors:[code]M24742121 : 555087078799770339769608793045705436436591887 = 20563047713721235292639 (74.1 bits) * 26994397257045406913233 (74.5 bits)
M2098193 : 60994061286916913266355388473561822515085761 = 6546629382726350366497 (72.4 bits) * 9316864866041320842913 (72.9 bits)[/code] |
[QUOTE=James Heinrich;266984]I took a look through the top-10 P-1 factors on my site, which are all composite, and found two more that break into same-bit prime factors:[/QUOTE]
Another one: [code]For M1997293: 33579362377488332973816211929918617758917702673 (154.5 bits) = 156319993046900879655881 (77.0 bits) * 214811693136484951130633 (77.5 bits) [/code] |
1 Attachment(s)
[QUOTE=James Heinrich;266974]Sounds like a plan. And then you should be able to quickly test it yourself, since I've given you a list of likely assignments that could probably find 2 factors in the same class.[/QUOTE]
Here's version without MORE_CLASSESS. So far I've tried only a couple of exponents on your list and haven't been lucky to have factors in the same class. |
[QUOTE=Christenson;266975]b) running them through a large, relatively slow expression in parallel. [/QUOTE]
That actually makes it harder to trigger. The atomics deal only with storing thread local data to shared memory. That's only four bytes for each factor, seven lines of c code: [CODE] index=RES[0]++; if(index<10) /* limit to 10 factors per thread */ { RES[index*3 + 1]=f.d2; RES[index*3 + 2]=f.d1; RES[index*3 + 3]=f.d0; } [/CODE] It's very negligible time comparing with the rest of computation. [QUOTE=Christenson;266975] Now, I'm willing to temporarily introduce a "bug" in the program to determine whether a problem exists. What if the exponent had several factors [e.g. look up the exponent "god" was using, or look through the operation Billion Digits web page] and, in addition, the CPU sieving process just "got lucky" and included them all in the same class, preferably one right after another? This is the statistically unlikely construction we are searching for. [/QUOTE] That's quite a lot of work for the bug I believe I've fixed. :smile: Besides I haven't looked into the sieving process so I do not know how to modify it. [QUOTE=Christenson;266975] However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [url]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/url] Therefore, detecting the problem of a collision somewhat after not quite reporting correct results isn't an unreasonable course of action. The user will be able to re-run a more careful, if slower, TF program over a very narrow range quickly and get the correct results. Something very similar happens when factors are reported to the server and the server verifies them with much less effort than that required to find them in the first place.[/QUOTE] It looks like the approach Bdot is taking with mfakto. BTW I wonder if he's still reading this thread? Maybe he'll borrow my solution? :smile: |
I've run through the entire list again, and also played with upper bit range, NumStreams, CPUStreams, GridSize, and Stages; all to no effect -- I found plenty of factors in adjacent classes, but none on that list came both in the same class. Perhaps tomorrow I can generate another list of potentials, but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.
|
[QUOTE=James Heinrich;266989] but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.[/QUOTE]
To be in the same class they would have the same result when the mod 120 of the factors is calculates - this would give you 16 classes. It would be mod 840 if you are using 96 classes and mod 1155 if you are using 960 classes. |
[QUOTE=James Heinrich;266989]I've run through the entire list again, and also played with upper bit range, NumStreams, CPUStreams, GridSize, and Stages; all to no effect -- I found plenty of factors in adjacent classes, but none on that list came both in the same class. Perhaps tomorrow I can generate another list of potentials, but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.[/QUOTE]
mfaktc TF sieving isn't terribly smart...it just sieves and creates long lists of reasonably "probable" primes of the right form and size, in order on the CPU....and then calculates the mersenne number modulo the "probable" primes on the GPU. I'm not at all surprised if it's not smart enough to eliminate previously found large cofactors. If we are going to prove out Apsen's method, someone (I'm not available this week, have mfaktc-primenet to work on) will need to induce the problem in a way similar to what I suggested or just do it on some simpler, faster calculation. We need either bdot or Oliver for this. If it were mine right now, I would want to make an array of atomically writeable storage with one element for each thread/potential trial factor, and write a value in each element when the TF expression was complete, depending on success or failure -- and then count the number of success expressions there to ensure it was the same as the factors reported. I'd want to do this on the GPU, as it would eat quite a bit of CPU bus bandwidth. I do have this dream of moving sieving off the CPU in mfaktc....it would significantly improve my multi-core LL performance...so your effort to understand how the sieving works in enough detail to make it screw up creatively won't be wasted. |
[QUOTE=Christenson;266975]
However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [URL]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/URL] [/QUOTE] Expected number of 70 bit factors is 1/70 Probability of 2 factors is e^(-1/70)/9800 (Poisson) David |
Selftest passed OK. Core i7-970 and GTX 580
[FONT=Fixedsys]Result[00]: M3321928619 has a factor: 38814612911305349835664385407[/FONT] [FONT=Fixedsys]found 1 factor(s) for M3321928619 from 2^94 to 2^95 [mfaktc 0.17.1-Win !!!TEST!![/FONT] [FONT=Fixedsys]! apsen 95bit_mul32][/FONT] [FONT=Fixedsys]selftest for M3321928619 passed![/FONT] [FONT=Fixedsys]tf(): total time spent: 0.212s[/FONT] [FONT=Fixedsys]Selftest statistics[/FONT] [FONT=Fixedsys]number of tests 4914[/FONT] [FONT=Fixedsys]successfull tests 4914[/FONT] [FONT=Fixedsys]selftest PASSED![/FONT] |
[QUOTE=Christenson;266992]
I do have this dream of moving sieving off the CPU in mfaktc....it would significantly improve my multi-core LL performance...so your effort to understand how the sieving works in enough detail to make it screw up creatively won't be wasted.[/QUOTE] :smile: I'll see how much time I could spend on it. |
[QUOTE=apsen;266988]
[QUOTE=Christenson;266975] b) running them through a large, relatively slow expression in parallel. [/QUOTE] That actually makes it harder to trigger. The atomics deal only with storing thread local data to shared memory. That's only four bytes for each factor, seven lines of c code: [CODE] index=RES[0]++; if(index<10) /* limit to 10 factors per thread */ { RES[index*3 + 1]=f.d2; RES[index*3 + 2]=f.d1; RES[index*3 + 3]=f.d0; } [/CODE] It's very negligible time comparing with the rest of computation. [/QUOTE] ! But we could [CODE] [COLOR="Red"]__syncthreads();[/COLOR] index=RES[0]++; if(index<10) /* limit to 10 factors per thread */ { RES[index*3 + 1]=f.d2; RES[index*3 + 2]=f.d1; RES[index*3 + 3]=f.d0; } [/CODE] to try threads go for the shared memory at the same time once we figure the way to heard the factors right. |
Oliver, did you get my message?
|
[QUOTE=apsen;267179]
[CODE] [COLOR="Red"]__syncthreads();[/COLOR] index=RES[0]++; if(index<10) /* limit to 10 factors per thread */ { RES[index*3 + 1]=f.d2; RES[index*3 + 2]=f.d1; RES[index*3 + 3]=f.d0; } [/CODE] [/QUOTE] The chance for a lost update is higher than without the update... (I did some expericments in the past). There is another issue in this variant: not all threads execute the __syncthreads() which has AFAIK undefined behavior. [QUOTE=apsen;267203]Oliver, did you get my message?[/QUOTE] Yepp, had YABT (Yet Another Business Trip). Oliver |
Hi James,
[QUOTE=James Heinrich;266979]I found an interesting issue, possibly bug, in mfaktc:[code] M862452323 has a factor: 1724904647 M862452323 has a factor: 103494278761 M862452323 has a factor: 26768470919200553 M862452323 has a factor: 178517762372762302367 M862452323 has a factor: 50544926613807686257 found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]The problem? 1724904647 * 103494278761 = 178517762372762302367 So it really only found 4 factors, but [i]also[/i] reported a composite factor of the two smallest factors. How did this happen?[/QUOTE] This is known (at least to me). And technically it is correct. 178517762372762302367 is a factor of M862452323. It is a composite factor, not a prime factor. Oliver |
That multiple factor thing might have been avoided with an absurdly high setting for SievePrimes (and code updates to make sure that setting actually worked), wouldn't it?
Not that we wouldn't all grow old waiting for the CPU to do the sieving at the same time! |
[QUOTE=TheJudger;267248]This is known (at least to me).
And technically it is correct. 178517762372762302367 is a factor of M862452323. It is a composite factor, not a prime factor.[/QUOTE]Of course. You could claim that all these are factors, and it would be true, although not neccesarily useful :smile:[code]M862452323 has a factor: 1724904647 M862452323 has a factor: 103494278761 M862452323 has a factor: 26768470919200553 M862452323 has a factor: 50544926613807686257 M862452323 has a factor: 178517762372762302367 M862452323 has a factor: 46173059881613395394669791 M862452323 has a factor: 2770383591317463939447354833 M862452323 has a factor: 87185178798430852389017336279 M862452323 has a factor: 5231110724923700473126386687577 M862452323 has a factor: 1353010398174837130096359433084900121 M862452323 has a factor: 4778647530636042401407668963299608951 M862452323 has a factor: 9023167198392429666531803015720504470319 M862452323 has a factor: 2333813923251096884171353943910429764243762287 M862452323 has a factor: 140028835315238199557965845958940579302616630081[/code]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor? I know PrimeNet will break it down when submitted, but I still think I'd rather know that I found 2 small factors rather than one larger one. |
[QUOTE=James Heinrich;267255]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor? I know PrimeNet will break it down when submitted, but I still think I'd rather know that I found 2 small factors rather than one larger one.[/QUOTE]
Use alpertron or yafu; don't mess with the single minded pursuit of speed by distracting the programmers. |
[QUOTE=TheJudger;267246]The chance for a lost update is higher than without the update...
[/QUOTE] That was related to the talks about how to increase the chance of memory access related problems in special test version if we want to test for the presence of said problems. [QUOTE=TheJudger;267246]There is another issue in this variant: not all threads execute the __syncthreads() which has AFAIK undefined behavior. [/QUOTE] That was just to illustrate the idea but good point. Need to keep that in mind. |
[QUOTE=apsen;266988]
It looks like the approach Bdot is taking with mfakto. BTW I wonder if he's still reading this thread? Maybe he'll borrow my solution? :smile:[/QUOTE] I do follow the thread, just did not have a lot of time recently ... What problem is it that you want to see in mfaktc/o? Just what happens when multiple threads write to the same buffer? For this test you don't need mfaktc - a small kernel like this [code] f=get_global_id(0); index=RES[0]++; if(index<10) /* limit to 10 factors */ { RES[index*3 + 1]=f; RES[index*3 + 2]=f; RES[index*3 + 3]=f; } [/code]will show. Now run that kernel in one, two, five, twenty ... threads and check the RES buffer. No need to actually test any factor ... For the problem to occur in real life, the factors need to be in the same class (we talked about that sufficiently). And they also need to be close together (in the same grid, not just the same bitlevel). mfaktc/o tests up to 2^20 resp. 2^21 factor candidates in one run. Someone with better mathematical background than I have could maybe calculate the chances: [LIST][*]we're typically searching factors above 2^64, more often above 2^68[*]the two factors must be in the same class out of 4620 classes[*]they must not be more than 2^20 (2^21) of our "probably prime" factor candidates apart. (At SievePrimes=200.000, grid size 2^21, "last k" - "first k" of a grid is ~ 5.3E10)[/LIST]@apsen: I don't think that copying 32 bytes per FC from the GPU to the CPU after a grid was tested is a good idea. Currently mfaktc copies 4 byte per FC to the GPU and 32 bytes back in TOTAL. Your approach (as of[URL="http://mersenneforum.org/showpost.php?p=266751&postcount=1079"] this post[/URL]) increases the required bandwidth about 9 times. Probably the comment " /* limit to 10 factors per thread */" was misleading :smile: : A single thread will only test one candidate, and as such will have at most one result. So the code could be changed to just double the required bandwitdh (4 bytes in, 4 bytes out per FC). Still I'm afraid that would be a major performance hit, at least for OpenCL where there is no transfer-hiding support yet. I'll see that I can test the above mini-kernel and verify that for all possible thread numbers at least RES[0] is non-zero. Then we're (almost) certain that we always have some indication that there is at least one factor in the unlikely event above. Unfortunately my Windows-PC has lost its PSU. The good thing about that: Now there is a running version of mfakto for Linux :smile:. |
I tried it. And I have good and bad news :smile:
Good: RES[0] was always at least 1, so we'll always have notification that a factor was found. Also good: The "factors" that appeared in the buffer were not scrambled - they either appeared correctly or not at all - except for the last factor that was incomplete a few times. So we'll always have at least one valid factor. Even when increasing RES[0] in more than 1000 threads, the number of factors reported was always between 1 and 10. When less than 10 threads wrote their "factors", the result was consistently one factor found, the first factor was the only one reported and it was complete. Therefore we can assume that without atomics we will simply miss the second factor and never know there is a "twin". Now the bad news: atomics don't seem to work reliably in OpenCL. I will seek help with AMD. When using atomic_inc, the reported number of factors was around 80% of the expected. And worst: when only two or three "factors" are written to the result buffer, only the first one arrives - just as without atomics. |
Bdot:
From a pragmatics standpoint, where factors are found are reasonably rare (most bit levels don't find any), can we accept that the presence of a factor is a sign to re-run the bit level or class more carefully, possibly on the host CPU instead, since it doesn't happen often? Not that I don't think atomics are worthwhile....I am just thinking that this may be the least price to pay in the absence of atomics. |
[QUOTE=Christenson;267653]Bdot:
From a pragmatics standpoint, where factors are found are reasonably rare (most bit levels don't find any), can we accept that the presence of a factor is a sign to re-run the bit level or class more carefully, possibly on the host CPU instead, since it doesn't happen often? Not that I don't think atomics are worthwhile....I am just thinking that this may be the least price to pay in the absence of atomics.[/QUOTE] Well, I don't think people agree to re-run their tests on the CPU whenever a factor was found ... For the course of GIMPS (finding the next mersenne prime) this has no effect - we have one factor, so the mersenne number is not prime. Of course it would be better to have both factors. I currently think the "problem" can be ignored. We will use atomics when available, and I will see that the OpenCL-atomics get fixed. Without (proper) atomics, we'll exclude the exponent from further prime-hunting with just one of two factors found. As most mersenne numbers have many more factors anyway, that should not be a big deal - am I wrong here? |
I always seem to be TF'ing the mersenne numbers with one or no factors in a given bitlevel.
As for re-running a little bit of TF on the CPU, don't forget that a whole lot of sieving goes on on my CPU to support mfaktc at the moment. Therefore, as long as the process is automatic, I think it will go largely unnoticed. Particularly if only one class is re-run, it won't take long, just a few minutes out of the many days mfaktc/mfakto runs. |
While it's (almost?) guaranteed that there will be another factor, the chance of it being in even the same bitlevel is low; in the same class very low (I haven't been able to find an example). If one factor is found and reported that is sufficient for GIMPS, and not much different from running mfaktc with [i]StopAfterFactor=2[/i] -- there might be another factor in the bitlevel that won't be found.
Decades from now, someone will undoubtedly run through the list of exponents that have only a small factor and use their 100-billion-exaflop GPU to TF them all to 2^100 in a short time, so I wouldn't worry too much about a "missed" factor, since we know almost for certain that there are still unfound factors. My math skills are admittedly atrocious, but isn't the chance of twin factors in the same class something like (1/70)^2 * (1/4620)^2 = about 1:100,000,000,000 ? |
[QUOTE=James Heinrich;267686]My math skills are admittedly atrocious, but isn't the chance of twin factors in the same class something like (1/70)^2 * (1/4620)^2 = about 1:100,000,000,000 ?[/QUOTE]
Quite obviously, if there are twin factors, the chance of them being in the same class is 1 in 4620. Furthermore, you don't always search single bit levels, though I have to admit that the multi-bitlevel case is a rare one unless you really aim for it. |
[QUOTE=ckdo;267692]Quite obviously, if there are twin factors, the chance of them being in the same class is 1 in 4620.[/QUOTE]Again, I admit I don't know much here, but isn't that kind of like saying "if there are twin factors in the same class, the chance of them being in the same class is 1 in 1"?
I figured that there's already a limited chance of there being multiple factors per bit-level, around (1 / <bitlevel>)^<factors in bitlevel> --- am I mistaken on this? That for 2^70 TF, you'd have about 1:70 chance of 1 factor, 1:4900 chance of 2 factors, 1:343000 chance of 3 factors, etc? :unsure: |
Would it be hard to extend mfaktc to larger exponents? It could be very useful for people who wish to search for factors of double Mersenne numbers.
|
that question has been answered a few page earlier : too much rewrite.
|
What's the form of a double-mersenne number?
|
MM2,MM3,MM7, MM127 etc
|
[QUOTE=firejuggler;267720]MM2,MM3,MM7, MM127 etc[/QUOTE]
three of those examples are also triple mersenne numbers as well. |
meaning 2^(2^2-1)-1, 2^(2^3-1)-1, 2^(2^7-1) -1 .... 2^(2^127-1)-1, right....with that last number having 10^40 or so digits....just a wee bit bigger than the 10^9 digits some of us play with right now....
with 127 bits as the minimum factor length, too... Yes, I think you'd destroy mfaktc if you tried to make it do more than the first few double mersennes....but it wouldn't be a bad place to start if you wanted to do that job....I'm simply not up to it right now. I could see building a separate version for each double-mersenne exponent. |
[QUOTE=James Heinrich;267694]
I figured that there's already a limited chance of there being multiple factors per bit-level, around (1 / <bitlevel>)^<factors in bitlevel> --- am I mistaken on this? That for 2^70 TF, you'd have about 1:70 chance of 1 factor, 1:4900 chance of 2 factors, 1:343000 chance of 3 factors, etc? :unsure:[/QUOTE] Between 2[SUP]69[/SUP] and 2[SUP]70[/SUP] Expected number of factors E = ln(70/69) ~ 1/69.50 [CODE] [U]factors[/U] [U]probability[/U] 0 69/70 1 69/70 * E 2 69/70 * E[SUP]2[/SUP]/2! 3 69/70 * E[SUP]3[/SUP]/3! 4 69/70 * E[SUP]4[/SUP]/4! ........ [/CODE] Note that the sum of the probabilities is 69/70 * e[SUP]E[/SUP] = 1 David |
[QUOTE=ixfd64;267716]Would it be hard to extend mfaktc to larger exponents? It could be very useful for people who wish to search for factors of double Mersenne numbers.[/QUOTE]
Not really "hard" but alot of work. The exponent is represented in a single 32bit unsigned integer in mfaktc. Main task in the host code: extend the per class sieve initializations for bigger exponents. This part is not really performance critical as long as the job is "big enough" (runtime per class > than a few seconds). Simple approach: use libgmp. The GPU code "just needs bigger numbers", too. The bad news are that you have to rewrite the complete set of functions (add, sub, mul, div, mod, ...) for bigger numbers. [QUOTE=James Heinrich;267255]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor?[/QUOTE] Simple approach: use libgmp on the CPU to check if factors are composite or prime. Cons:[LIST][*]mfaktc depends on (another) external library[*]makes it even harder (imposible?) to release a closed source version of mfaktc (with automated primenet support)[/LIST] Currently I say: no fix planned. Oliver |
[QUOTE=TheJudger;267892][LIST][*]mfaktc depends on (another) external library[*]makes it even harder (imposible?) to release a closed source version of mfaktc (with automated primenet support)[/LIST][/QUOTE]Certainly not impossible; possibly harder.
GMP is licensed under the LGPL, meaning that you don't have to release your source code. What you must do is release the compiled version of your code in such a way that it can be linked with GMP by those to whom you distribute it. So, for instance, you could ship a library containing all your secret material, a copy of the GMP library and a trivial C program source which reads something like: [code] /* my_source.c */ main (int argc, char **argv) { my_main (argc, argv); } [/CODE] and a Makefile which reads something like: [code] all: gcc -o mfaktc source.c mfaktc.a -lgmp [/code] Paul |
[QUOTE=science_man_88]
[QUOTE= firejuggler] [I]MM2,MM3,MM7, MM127 etc[/I] [/QUOTE] three of those examples are also triple mersenne numbers as well. [/QUOTE] That's the Catalan sequence. MM5, MM13, MM17, MM19, MM31, etc, are also DMN, in the strict way (mersenne with prime exponents). MM9, MM11, MM15, MM23, etc, are also DMN, but they can't be prime, as their exponents are not prime. As the OP is asking about "finding factors" of such numbers, I assume he is not talking about "prime" DMN. |
[QUOTE=xilman;267937]Certainly not impossible; possibly harder.
GMP is licensed under the LGPL, meaning that you don't have to release your source code. What you must do is release the compiled version of your code in such a way that it can be linked with GMP by those to whom you distribute it.[/QUOTE] I think it's even simpler - just dynamically link to libgmp.so and ship a mfaktc binary. The binary isn't under any restrictions at all at that point. I'd worry more about Windows compatibility and cost/effort considerations. |
I'm trying to compile the source in Ubuntu. When I run make in the src folder, I get an error saying that it doesn't support gcc versions > 4.5 , and I have 4.5.2 . Is there a way to get around this? Do I have old code?
|
Hi!
This is not a matter of the mfaktc version. Nvidia doesn't support gcc >= 4.5.0 in their toolkit. You could try to install an older gcc or try to fool the nvcc... :sad: Oliver |
[QUOTE=Dubslow;268875]I'm trying to compile the source in Ubuntu. When I run make in the src folder, I get an error saying that it doesn't support gcc versions > 4.5 , and I have 4.5.2 . Is there a way to get around this? Do I have old code?[/QUOTE]I've just started looking into exactly the same problem but to build under Fedora 15.
There is a lot of chat on the CUDA and Linux fora about this issue. The summary seems to be that the dependency of gcc is for some debugging structures generated by old gcc and understood by Nvidia's tools. Somewhere in the code (perhaps scripts or Makefiles) there is a #error statement (or statements) which can be safely removed and everything then builds. Sorry not to be more precise but the material above should give you enough to start looking for the details. Alternatively, you can wait until I've managed to get CUDA reinstalled on my system after its upgrade. Paul |
It's possible to download, compile and install gcc 4.4.6 quite easily. You might need to install a few additional libraries (MPFR, etc) to compile the compiler.
Download and unpack the gcc 4.4.6 source archive into a directory. Create a second directory gcc-4.4.6-bin. cd into it. ../gcc-4.4.6/configure --enable-languages=c,c++ --prefix=/opt/gcc-4.4.6 If the configure script complains about missing libraries install them with "sudo apt-get install libXYZ-dev". make sudo make install Add the following option to the list of options for nvcc: Either --compiler-bindir /opt/gcc-4.4.6/bin or -ccbin /opt/gcc-4.4.6/bin Note: Replace /opt/gcc-4.4.6 with a target directory of your choice. I usually use /opt/gcc-x.y.z for this purpose. This makes it quite easy to "uninstall" the compiler via rm -rf /opt/gcc-x.y.z |
[QUOTE=xilman;268883]I've just started looking into exactly the same problem but to build under Fedora 15.
There is a lot of chat on the CUDA and Linux fora about this issue. The summary seems to be that the dependency of gcc is for some debugging structures generated by old gcc and understood by Nvidia's tools. Somewhere in the code (perhaps scripts or Makefiles) there is a #error statement (or statements) which can be safely removed and everything then builds. Sorry not to be more precise but the material above should give you enough to start looking for the details. Alternatively, you can wait until I've managed to get CUDA reinstalled on my system after its upgrade.[/QUOTE]Miserable failure here :sad: I can't even install the device driver because the installer checks for elderly versions of the kernel and various crude hacks failed to work. I've posted to the Cuda-on-Linux forum at Nvidia and hope someone there will be able to help. Paul |
This is a little bit wierd to me. I had no difficulty with installing the whole nine yards earlier this year under xubuntu 10.10, 64-bit version.
|
[QUOTE=xilman;268894]Miserable failure here :sad:
I can't even install the device driver because the installer checks for elderly versions of the kernel and various crude hacks failed to work. I've posted to the Cuda-on-Linux forum at Nvidia and hope someone there will be able to help. Paul[/QUOTE] Have you tried to install the 275.21 or the 280.13 drivers? The kernel version check has been modified after the release of 270.41.19. |
[QUOTE=Ralf Recker;268901]Have you tried to install the 275.21 or the 280.13 drivers? The kernel version check has been modified after the release of 270.41.19.[/QUOTE]No, but thanks for the tip. I'll give it a try.
The standard download for V4 appears to be the 270.40.xx release and I hadn't come across the 280.13 version. Paul |
[QUOTE=Christenson;268895]This is a little bit wierd to me. I had no difficulty with installing the whole nine yards earlier this year under xubuntu 10.10, 64-bit version.[/QUOTE]
I bet because 10.10 had an older version of gcc. I'm using 11.04. |
[QUOTE=xilman;268902]No, but thanks for the tip. I'll give it a try.
The standard download for V4 appears to be the 270.40.xx release and I hadn't come across the 280.13 version. Paul[/QUOTE]AFAICT, 280 is not yet available for 64-bit Linux. About to give the 275 beta a try. Paul |
[QUOTE=xilman;269059]AFAICT, 280 is not yet available for 64-bit Linux. About to give the 275 beta a try.
Paul[/QUOTE] [B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B] [SIZE=1] [/SIZE][B][SIZE=1]Version:[/SIZE][/B] [B][SIZE=1]280.13 Certified[/SIZE][/B] [B][SIZE=1]Release Date:[/SIZE][/B] [B][SIZE=1]2011.08.01[/SIZE][/B] [B][SIZE=1]Operating System:[/SIZE][/B] [B][SIZE=1]Linux 64-bit[/SIZE][/B] [B][SIZE=1]Language:[/SIZE][/B] [B][SIZE=1]English (U.S.)[/SIZE][/B] [B][SIZE=1]File Size:[/SIZE][/B] [B][SIZE=1]52.4 MB [/SIZE][/B] [URL]http://www.nvidia.com/object/linux-display-amd64-280.13-driver.html[/URL] Deutsche Version [B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B] [B][SIZE=1] Version:[/SIZE][/B] [B][SIZE=1]280.13 Certified[/SIZE][/B] [B][SIZE=1]Freigabedatum:[/SIZE][/B] [B][SIZE=1]2011.08.01[/SIZE][/B] [B][SIZE=1]Betriebssystem:[/SIZE][/B] [B][SIZE=1]Linux 64-bit[/SIZE][/B] [B][SIZE=1]Sprache:[/SIZE][/B] [B][SIZE=1]Deutsch[/SIZE][/B] [B][SIZE=1]Dateigröße:[/SIZE][/B] [B][SIZE=1]52.4 MB [/SIZE] [/B] [URL]http://www.nvidia.de/object/linux-display-amd64-280.13-driver-de.html[/URL] |
[QUOTE=moebius;269062][B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B]
[SIZE=1] [/SIZE][B][SIZE=1]Version:[/SIZE][/B] [B][SIZE=1]280.13 Certified[/SIZE][/B][/QUOTE] LHQL to the rescue! :big grin: |
[QUOTE=ckdo;269065]LHQL to the rescue! :big grin:[/QUOTE]
Does it work with CUDA 3.X (especially 3.0)? Having a GTX275 I can't access to 2.0 capabilities delivered by CUDA 4.0. Luigi |
| All times are UTC. The time now is 13:00. |
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.