mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   mfaktc: a CUDA program for Mersenne prefactoring (https://www.mersenneforum.org/showthread.php?t=12827)

James Heinrich 2011-07-13 17:18

Each core adds more overall performance, there's never a case where X cores does more total work than X+1 cores, but the performance of each core drops the more loaded the CPU is.

So the answer to your question would be: "4 cores"

apsen 2011-07-13 18:26

[QUOTE=James Heinrich;266298]Each core adds more overall performance, there's never a case where X cores does more total work than X+1 cores, but the performance of each core drops the more loaded the CPU is."[/QUOTE]

mfaktc performance drops 50% on loading 4th core. If all cores suffer the same penalty then 4/4*0.5 is less then 3/4. But Prime95 performance does not seem to drop off as badly on addition of the 4th core... I'll need to do some tests. As it is 8800 GTS performs at 3/4 of GTX 465 :-(

James Heinrich 2011-07-13 19:10

Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.

Karl M Johnson 2011-07-13 19:47

Yes. Even with 64 bit mfaktc.

apsen 2011-07-13 20:12

[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]

I do not know how to check but it was said that CUDALucas maxes out GPU. Maybe I'll just run CUDALucas on 465 (+4 Prime95 workers) and run mfaktc on my two 8800?

apsen 2011-07-17 17:55

1 Attachment(s)
[QUOTE=Christenson;265657]10200 = 27D8....you sure you have the right return-type declared for cudaStreamCreate?

If you are just trying to run mfaktc, I'd be inclined to ignore the "I can't build it" problem. What do you hope to do with the modification?[/QUOTE]

I figured this one out - toolkit mismatch.

I have also modified mfaktc so it no longer needs atomics and compiles for any cuda compute capability under CUDA 2.2/3.1/3.2. I haven't tried 4.0 but I do not see why it would have a problem with that.

Anyway here's modified mfaktc:

TheJudger 2011-07-17 22:57

aspen: why didn't you change the version string? Seems that you did [B]alot[/B] more changes than just the removal of the atomics...

Oliver

TheJudger 2011-07-17 23:38

aspen: your changes seem to screw something up. :sad:

On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest!
Performance is half of the expected value (no async CPU/GPU computation?).

[CODE]
running a simple selftest...
ERROR: selftest failed for M49635893!
expected result: 000F300E 00B13196 00D84F67
reported result: 001DAC4B 001DAC50 001DAC55
reported result: 001DAC57 001DAC5D 001DAC5F
reported result: 001DAC61 001DAC67 001DAC6B
reported result: 001DAC70 001DAC73 001DAC7A
reported result: 001DAC7E 001DAC84 001DAC8A
reported result: 001DAC8C 001DAC8E 001DAC99
reported result: 001DAC9A 001DAC9E 001DACA0
reported result: 001DACA2 001DACA3 001DACA5
reported result: 001DACAF 001DACB0 001DACB5
reported result: 001DACB6 001DACBD 001DACBE
Selftest statistics
number of tests 31
successfull tests 30
wrong factor reported 1

selftest FAILED!
[/CODE]

[COLOR="Red"][SIZE="4"]I don't recommend to run aspens version until this if fixed![/SIZE][/COLOR]

Oliver

Christenson 2011-07-18 01:49

Hi Oliver:

I've been putting my time into parse.c ... gone through 1 re-write, need another to get it organized with a parse_line function that returns as a structure with both the data found and the original line.

I really don't have time to check over apsen's changes right now, as work has gotten rather rough....I'm supposed to be doing something I never have done before, with few resources and little support.

apsen 2011-07-18 01:51

[QUOTE=TheJudger;266725]aspen: your changes seem to screw something up. :sad:

On my GTX 8800 (CUDA 4.0) it [B]sometimes[/B] fails the short selftest!
Performance is half of the expected value (no async CPU/GPU computation?).

[COLOR="Red"][SIZE="4"]I don't recommend to run aspens version until this if fixed![/SIZE][/COLOR]

Oliver[/QUOTE]

Sorry, It wasn't really meant for general consumption That's why I did not post the executable. I was hoping for your to take a look at it. We could transfer this to private conversation.

For me all tests (including long one) come up fine. I did have problem in the interim so maybe I need to check if I posted the right version. Also I haven't tested with CUDA 4.0...

The idea is simple give each thread it's own chunk of memory to write the results so there's no need to have shared variable.

I did have to rearrange the code a little bit to make it possible but I tried to keep it so it's easy to do diff. It could use a little straightening otherwise.

Christenson 2011-07-18 01:51

[QUOTE=James Heinrich;266306]Someone else will likely correct me, but I believe a GTX 465 needs more than 1 instance to show its potential. I'd try 2 instances of mfaktc and 2 Prime95 workers and see what your overall throughput is like.[/QUOTE]

Someone else says you are dead on, James. Here's why: right now mfaktc is sieving on the CPU...so you will either need a very hot CPU core or two cores to reach full potential on a hot GPU card....

apsen 2011-07-18 05:49

1 Attachment(s)
[QUOTE=TheJudger;266725]aspen: your changes seem to screw something up. :sad:
[/QUOTE]

I originally tried to follow the changes since version 0.8 as it was the last stock version that worked on my machine so it is more probable that I broke something unintentionally.

I've now got clean mkfaktc-0.17 and reapplied my changes directly to it. I've made them almost minimal (with the exception of enabling it to compile under CUDA 2.2) so it should be easy to do diff.

I do not see any test failures on it so please test it on your system to see if you still experience them.

Also, the slowdown was due to synchronous memory copy in the main loop but on my machine it was not as noticeable. I was loosing less then 10% of performance so I was going to look into it later. I have now reworked it too and performance is back on par with stock build. Please check if I did it the right way.

TheJudger 2011-07-18 10:00

Hi aspen,

[QUOTE=apsen;266731]The idea is simple give each thread it's own chunk of memory to write the results so there's no need to have shared variable.[/QUOTE]
Each thread or each stream? I think it is the latter case and than it won't work reliable. Old GPUs can't even run concurrent kernels so the behavior is the same as without own chunks of memory.

[QUOTE=apsen;266751]I've now got clean mkfaktc-0.17 and reapplied my changes directly to it. I've made them almost minimal (with the exception of enabling it to compile under CUDA 2.2) so it should be easy to do diff.
[/QUOTE]

May I know the reason why you're still on CUDA 2.2 (there might be good ones).

Oliver

apsen 2011-07-18 12:08

[QUOTE=TheJudger;266768]
Each thread or each stream? I think it is the latter case and than it won't work reliable. Old GPUs can't even run concurrent kernels so the behavior is the same as without own chunks of memory.
[/QUOTE]
I meant for each thread but I guess I indeed did it for each stream :-( I guess I would need to go back and fix it.


[QUOTE=TheJudger;266768]
May I know the reason why you're still on CUDA 2.2 (there might be good ones).
[/QUOTE]

The driver on the machine does not support 3.1 or higher and if I upgrade the performance of stock mfaktc-0.8 drops significantly. So while I'm changing mfaktc I need it to test on 2.2. When it's ready I'll compile it for newer CUDA and upgrade the driver.

It was late yesterday so I didn't really do it this time.

Is there a problem with 2.2?

TheJudger 2011-07-18 14:01

[QUOTE=apsen;266778]Is there a problem with 2.2?[/QUOTE]

I don't know any problems related to mfaktc (expect that is doesn't compile as it is now). As CUDA 2.2 is past I don't have any real plans for supporting it in mfaktc. If the needed changes are trivial and have not side effect I might try it anyway.

Oliver

apsen 2011-07-18 14:43

1 Attachment(s)
[QUOTE=TheJudger;266790]As CUDA 2.2 is past I don't have any real plans for supporting it in mfaktc. If the needed changes are trivial and have not side effect I might try it anyway.
[/QUOTE]

The only difference is the absence of __launch_bounds__. You could see I've just put conditional macro to do away with it if we compile under CUDA 2.2. I don't care about supporting 2.2 either I just need it until we could make the current version of mkfaktc work with sm_10.


[QUOTE=apsen;266778]I guess I would need to go back and fix it.
[/QUOTE]

I've made the changes but I wouldn't be able test it until later today. But maybe you'd be willing to take a look at it before then to see if my understanding is right. (If you are on Germany time it will be past midnight for you before I get a chance to test.)

apsen 2011-07-18 15:57

[QUOTE=TheJudger;265318]
use of atomic instructions for access to the results array (this needs CC >=1.1)
[/QUOTE]
:geek:
Just to make sure I understand it right: just blindly replacing atomics with unprotected access to d_RES might result in the problem only when we find more then one factor per class (tf_class_* call) and even then it will report that at least one factor has been found but the factor(s) itself may be scrambled by simultaneous attempt to store them in the result array. So if the program reports no factors found - it will be true. Is this correct?

TheJudger 2011-07-18 16:32

correct!

apsen 2011-07-18 18:24

Is CPUStreams configuration parameter basically the length of sieve queue?

TheJudger 2011-07-18 19:38

yes!

btw.: pleases change the version string in your modified code to something unique.
e.g. "0.17-ap1"

Oliver

TheJudger 2011-07-18 20:11

Hi Eric,

[QUOTE=Christenson;266730]Hi Oliver:

I've been putting my time into parse.c ... gone through 1 re-write, need another to get it organized with a parse_line function that returns as a structure with both the data found and the original line.[/QUOTE]

feel free to sent me your stuff (even if it is not finished).

Oliver

apsen 2011-07-18 22:03

[QUOTE=TheJudger;266839]
btw.: pleases change the version string in your modified code to something unique.
e.g. "0.17-ap1"
Oliver[/QUOTE]

Ok. But would you be willing to incorporate my changes in your code once it goes through testing?

TheJudger 2011-07-18 22:25

[QUOTE=apsen;266858]Ok. But would you be willing to incorporate my changes in your code once it goes through testing?[/QUOTE]

G80 support: yes, at least as "official patch" or "special G80 version", depending how/if the changes affect the code / performance. I assume that a 1% slowdown on non-G80 GPUs has a bigger impact than the ability to run on G80 GPUs. G80 isn't that hot today. Even entry level Fermis are faster.

CUDA < 2.3 support: unlikely, how many people need this? Main reason: needs testing.

Cleanups: very welcome. I know that the code is messy. :blush: It is grown and some parts have been changed / expanded many times.

Performance improvements: [B]YES[/B]

Oliver

apsen 2011-07-18 23:04

[QUOTE=TheJudger;266863]CUDA < 2.3 support: unlikely, how many people need this? Main reason: needs testing.[/QUOTE]

Do not worry about it - I'll remove it myself eventually :-)

[QUOTE=TheJudger;266863]
Cleanups: very welcome. I know that the code is messy. :blush: It is grown and some parts have been changed / expanded many times.
[/QUOTE]

Are you sure you know what you are asking for? :bounce:
Actually the reason my fist code had so many changes is that I tried to make sense of the code so I was trying to make it more readable. And all those #ifdef inside the code were getting to me. Sorry, but with you prompting it I could not resist.
:redface:

[QUOTE=TheJudger;266863]Performance improvements: [B]YES[/B]
[/QUOTE]

I might be able to cook something here but this is my first encounter with GPU computing. :unsure:

apsen 2011-07-19 03:19

1 Attachment(s)
Ok. Here it goes. I think this one is ready for more extensive testing as it passes all the selftests multiple times on different machines.

I've included Win64 executable compiled with CUDA 4.0 for multiple compute capabilities. I'd be interested if anyone could run the selftests and any other tests they could think of. Also I'm interested to know if there any noticeable performance differences with stock mfaktc-0.17.

Bdot 2011-07-19 08:55

[QUOTE=apsen;266804]:geek:
Just to make sure I understand it right: just blindly replacing atomics with unprotected access to d_RES might result in the problem only when we find more then one factor per class (tf_class_* call) and even then it will report that at least one factor has been found but the factor(s) itself may be scrambled by simultaneous attempt to store them in the result array. So if the program reports no factors found - it will be true. Is this correct?[/QUOTE]

Actually, as far as I understood the code, the problem only appears if multiple factors are found within the same grid of a class. Therefore, reducing the grid size will reduce the risk even more (when was the last time anyone had 2 factors in the same class?).

For ATI's HD4xxx GPUs I'm limiting the GridSize to 2, and issue a warning "GPU does not support atomics. There is a small chance mfakto can report wrong factors; if it does, then the exponent has multiple factors in the tested range."
And yes, I know it is not the exponent itself having the factors ...

ckdo 2011-07-19 09:32

Why not simply(?) use CPU based code to check the factors found and if one proves to be wrong use CPU based code to re-search the grid/class? The time used to do this should be negligible.

James Heinrich 2011-07-19 11:35

[QUOTE=apsen;266891]I've included Win64 executable compiled with CUDA 4.0 for multiple compute capabilities. I'd be interested if anyone could run the selftests and any other tests they could think of.[/QUOTE]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?

apsen 2011-07-19 14:07

1 Attachment(s)
[QUOTE=James Heinrich;266911]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?[/QUOTE]

Sorry.

apsen 2011-07-19 14:50

1 Attachment(s)
[QUOTE=James Heinrich;266911]I would, but you didn't bundle cudart64_40_17.dll -- can you post a copy of that please to make it easier for testers?[/QUOTE]

Actually, could you use this version:

James Heinrich 2011-07-19 16:57

[QUOTE=apsen;266891]I'd be interested if anyone could run the selftests and any other tests they could think of. Also I'm interested to know if there any noticeable performance differences with stock mfaktc-0.17.[/QUOTE]Selftest (for mfaktc17apsen.cuda40.sm_multi.win64) works fine here (Win7x64, i7-920, 8800GT):[code]Selftest statistics
number of tests 4914
successfull tests 4914

selftest PASSED![/code]Speed (in my very, very brief testing) seemed identical to stock.

edit: you posted the new version after I'd run my tests

TheJudger 2011-07-19 17:29

[QUOTE=apsen;266866]Are you sure you know what you are asking for? :bounce:

Actually the reason my fist code had so many changes is that I tried to make sense of the code so I was trying to make it more readable. And all those #ifdef inside the code were getting to me. Sorry, but with you prompting it I could not resist.
:redface: [/QUOTE]

No problem. I saw your modifications with the #ifdefs, I'll try to remove/replace some of them in my official version (e.g. optional parameters for debuging, etc.)

Oliver

apsen 2011-07-19 17:41

[QUOTE=James Heinrich;266936]Speed (in my very, very brief testing) seemed identical to stock.[/QUOTE]
No surprise here. The difference is mostly memory usage. The stock one uses 32 integers total to hold results. My version uses 32 integers per thread to avoid using of atomics. It could store up to 10 factors per thread but it is folded down to original amount on output. I did not modify output to minimize diffs with stock versions but it could be easily done later.

[QUOTE=James Heinrich;266936]edit: you posted the new version after I'd run my tests[/QUOTE]
I did not realize threadId is unique only within block originally so I had to fix that. mfaktc17apsen could have problems if two threads with the same id in different blocks find factors at the same time. mfaktc171apsen fixes that.

BTW at what point the program could be deemed good enough to produce? What is the usual standard?

apsen 2011-07-19 17:50

[QUOTE=TheJudger;266940]I'll try to remove/replace some of them in my official version (e.g. optional parameters for debuging, etc.)
[/QUOTE]

Oliver, should I continue posting code here or should I pm them to you. My current code has bare minimum of the changes against 0.17 right now so it should be easy to merge them in. I also have some other ideas and also could help with making it more readable.

James Heinrich 2011-07-19 18:27

I ran a quick test looking for multiple factors, seemed to work as expected:[code]got assignment: exp=999903251 bit_min=1 bit_max=67
tf(999903251, 1, 67, ...);
k_min = 0
k_max = 73794115801
Using GPU kernel "71bit_mul24"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
1/4620 | 2.98M | 0.114s | 26.16M/s | 200000 | n.a. | 371us
Result[00]: M999903251 has a factor: 1999806503
1884/4620 | 2.98M | 0.115s | 25.93M/s | 166923 | n.a. | 395us
Result[00]: M999903251 has a factor: 2347679090418412249
3153/4620 | 3.10M | 0.116s | 26.69M/s | 143668 | n.a. | 834us
Result[00]: M999903251 has a factor: 135710897421958333087
4609/4620 | 2.98M | 0.111s | 26.86M/s | 161503 | n.a. | 631us
found 3 factor(s) for M999903251 from 2^ 1 to 2^67 [mfaktc 0.17.1-Win !!!TEST!!!
apsen 71bit_mul24]
tf(): total time spent: 2m 48.483s[/code]

apsen 2011-07-19 19:30

[QUOTE=James Heinrich;266951]I ran a quick test looking for multiple factors, seemed to work as expected:[code]got assignment: exp=999903251 bit_min=1 bit_max=67
tf(999903251, 1, 67, ...);
k_min = 0
k_max = 73794115801
Using GPU kernel "71bit_mul24"
class | candidates | time | avg. rate | SievePrimes | ETA | avg. wait
1/4620 | 2.98M | 0.114s | 26.16M/s | 200000 | n.a. | 371us
Result[00]: M999903251 has a factor: 1999806503
1884/4620 | 2.98M | 0.115s | 25.93M/s | 166923 | n.a. | 395us
Result[00]: M999903251 has a factor: 2347679090418412249
3153/4620 | 3.10M | 0.116s | 26.69M/s | 143668 | n.a. | 834us
Result[00]: M999903251 has a factor: 135710897421958333087
4609/4620 | 2.98M | 0.111s | 26.86M/s | 161503 | n.a. | 631us
found 3 factor(s) for M999903251 from 2^ 1 to 2^67 [mfaktc 0.17.1-Win !!!TEST!!!
apsen 71bit_mul24]
tf(): total time spent: 2m 48.483s[/code][/QUOTE]

I wonder if you know the case where multiple factors will fall within the same class...

James Heinrich 2011-07-19 19:42

[QUOTE=apsen;266953]I wonder if you know the case where multiple factors will fall within the same class...[/QUOTE]I haven't come across any multiple-factor exponents with even the same number of bits or digits in different factors. But, I should be able to dig around in the PrimeNet list of known factors and come up with a list of "close" factor pairs. I don't know how close the two factors have to be to be in the same class...?

apsen 2011-07-19 20:21

[QUOTE=James Heinrich;266957] I don't know how close the two factors have to be to be in the same class...?[/QUOTE]
The single assignment is divided into 4620 classes. I'm not sure how the sieving works but I would guess that it assigns candidates sequentially...

To be honest I doubt anyone would be able to trigger any bugs related to absence of atomics intentionally.

For example difference between my last two versions (assuming candidates are fed sequentially) deals with odds of two factors in the same assignment (and if you allow mfaktc to split them that would mean at the same bit level) fall close enough to fall into 1/4620 of the whole range but further apart then 1/4620/block_count and the streams processing them run at the same time and then multiply this probability by 1/threads_per_block.

Does anyone care to estimate a number to that.

Of course there could be some other bug unrelated to the use of atomics that makes it fail it every single time but at least we could exclude the later part (every single time, that is) due to the fact that we have run successful selftests. Was this confusing enough? :-)

Christenson 2011-07-19 21:54

Methinks that if you want to excite this problem, the easiest way is going to be to construct something....

James Heinrich 2011-07-19 22:25

[QUOTE=James Heinrich;266957]I should be able to dig around in the PrimeNet list of known factors and come up with a list of "close" factor pairs.[/QUOTE]I just tested this list of known multi-factor exponents, and they all found multiple factors real close together, but never in the same class, almost always 1 class apart (with a suspiciously large number of them found in classes 3 and 4?):[code]Factor=1008857,1,23
Factor=1040597,1,23
Factor=1048213,1,27
Factor=1053809,1,27
Factor=1153967,1,25
Factor=1198247,1,25
Factor=1305607,1,28
Factor=1308497,1,27
Factor=1357351,1,25
Factor=1380949,1,25
Factor=1398701,1,24
Factor=1443257,1,24
Factor=1485017,1,24
Factor=1490717,1,24
Factor=1516421,1,24
Factor=1518071,1,27
Factor=1519277,1,24
Factor=1527857,1,24
Factor=1564421,1,24
Factor=1566811,1,25
Factor=1588757,1,24
Factor=1604297,1,24
Factor=1613057,1,24
Factor=1655897,1,24
Factor=1677787,1,25
Factor=1678877,1,24
Factor=1684937,1,24
Factor=1715717,1,24
Factor=1719547,1,25
Factor=1735421,1,24[/code]I guess I'll dig up some more to try...

apsen 2011-07-19 22:41

[QUOTE=Christenson;266967]Methinks that if you want to excite this problem, the easiest way is going to be to construct something....[/QUOTE]

I'm not aware of any problem with mkfaktc171apsen code so I do not know what could be constructed.

mkfaktc17apsen does have a problem. And if we recompile the code with custom number of classes and threads based on chosen exponent and run it with custom number of streams and sieve level we should be able to guaranty that specific candidates will get processed by conflicting threads. But then we would still have to deal with uncertainty of thread scheduling...
It is probably easier to prove the problem mathematically then to actually trigger it...

apsen 2011-07-19 22:58

[QUOTE=James Heinrich;266969]I just tested this list of known multi-factor exponents, and they all found multiple factors real close together, but never in the same class, almost always 1 class apart (with a suspiciously large number of them found in classes 3 and 4?):
...
I guess I'll dig up some more to try...[/QUOTE]

Thanks, I appreciate the effort.

If you'd like I could recompile the program without MORE_CLASSES option. This
would reduce the number of classes to 420.
You could also play with sieve level to force specific candidates to go into different threads.

James Heinrich 2011-07-19 23:57

[QUOTE=apsen;266971]If you'd like I could recompile the program without MORE_CLASSES option. This would reduce the number of classes to 420.[/QUOTE]Sounds like a plan. And then you should be able to quickly test it yourself, since I've given you a list of likely assignments that could probably find 2 factors in the same class.

Christenson 2011-07-20 00:12

[QUOTE=apsen;266970]I'm not aware of any problem with mkfaktc171apsen code so I do not know what could be constructed.

mkfaktc17apsen does have a problem. And if we recompile the code with custom number of classes and threads based on chosen exponent and run it with custom number of streams and sieve level we should be able to guaranty that specific candidates will get processed by conflicting threads. But then we would still have to deal with uncertainty of thread scheduling...
It is probably easier to prove the problem mathematically then to actually trigger it...[/QUOTE]

It often is easier to prove a problem exists than reliably trigger it...but the construction I would use would work like this:
Notice that we are a) sieving for candidates of a certain form, and b) running them through a large, relatively slow expression in parallel.

Now, I'm willing to temporarily introduce a "bug" in the program to determine whether a problem exists. What if the exponent had several factors [e.g. look up the exponent "god" was using, or look through the operation Billion Digits web page] and, in addition, the CPU sieving process just "got lucky" and included them all in the same class, preferably one right after another? This is the statistically unlikely construction we are searching for.

However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [url]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/url] Therefore, detecting the problem of a collision somewhat after not quite reporting correct results isn't an unreasonable course of action. The user will be able to re-run a more careful, if slower, TF program over a very narrow range quickly and get the correct results. Something very similar happens when factors are reported to the server and the server verifies them with much less effort than that required to find them in the first place.

James Heinrich 2011-07-20 00:25

[QUOTE=Christenson;266975]What if the exponent had several factors [e.g. look up the exponent "god" was using[/quote]Since that was me for a while, let me provide a short list (they're easy to find if you know how/where to look :)[code]M800000041 has a factor: 32000001641
M800000041 has a factor: 5357430980168323663
found 2 factor(s) for M800000041 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800000759 has a factor: 1600001519
M800000759 has a factor: 6400006073
M800000759 has a factor: 10240019438409224887
found 3 factor(s) for M800000759 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800002139 has a factor: 14400038503
M800002139 has a factor: 13677324569648791
found 2 factor(s) for M800002139 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800002649 has a factor: 24000079471
M800002649 has a factor: 30400100663
found 2 factor(s) for M800002649 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800002699 has a factor: 14400048583
M800002699 has a factor: 67425000050326478281
found 2 factor(s) for M800002699 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800003401 has a factor: 12800054417
M800003401 has a factor: 64733252796071023
M800003401 has a factor: 18730039702542776497
found 3 factor(s) for M800003401 from 2^34 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800003999 has a factor: 6400031993
M800003999 has a factor: 5133878758848163631
found 2 factor(s) for M800003999 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800004311 has a factor: 1600008623
M800004311 has a factor: 9256351487895290111
found 2 factor(s) for M800004311 from 2^31 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800004847 has a factor: 8000048471
M800004847 has a factor: 1380197213419202209
found 2 factor(s) for M800004847 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800005403 has a factor: 33600226927
M800005403 has a factor: 18992736271326281
found 2 factor(s) for M800005403 from 2^35 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800007251 has a factor: 1600014503
M800007251 has a factor: 85002820037327063
found 2 factor(s) for M800007251 from 2^31 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800008757 has a factor: 6400070057
M800008757 has a factor: 164187461134123649273
M800008757 has a factor: 1940039026319738167
M800008757 has a factor: 26502690101897
found 4 factor(s) for M800008757 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M800009557 has a factor: 56000668991
M800009557 has a factor: 573355047504912077761
M800009557 has a factor: 12495915504747291689
found 3 factor(s) for M800009557 from 2^36 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862436279 has a factor: 1724872559
M862436279 has a factor: 509958343703150891009
found 2 factor(s) for M862436279 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862437671 has a factor: 1724875343
M862437671 has a factor: 986547654080931473
found 2 factor(s) for M862437671 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862439531 has a factor: 1724879063
M862439531 has a factor: 541488560349621103
found 2 factor(s) for M862439531 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862439663 has a factor: 1724879327
M862439663 has a factor: 8697849796780030151
found 2 factor(s) for M862439663 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862446323 has a factor: 1724892647
M862446323 has a factor: 1166882518344965239417
found 2 factor(s) for M862446323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862449803 has a factor: 1724899607
M862449803 has a factor: 79380523255673039
found 2 factor(s) for M862449803 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862452323 has a factor: 1724904647
M862452323 has a factor: 103494278761
M862452323 has a factor: 26768470919200553
M862452323 has a factor: 178517762372762302367
M862452323 has a factor: 50544926613807686257
found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862458959 has a factor: 1724917919
M862458959 has a factor: 206990150161
M862458959 has a factor: 357041019069209634959
found 3 factor(s) for M862458959 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862460411 has a factor: 1724920823
M862460411 has a factor: 510435025768812467849
M862460411 has a factor: 23636726266737125671
found 3 factor(s) for M862460411 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862464059 has a factor: 1724928119
M862464059 has a factor: 10397394163320570727
found 2 factor(s) for M862464059 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862465931 has a factor: 1724931863
M862465931 has a factor: 99918198394020824183
found 2 factor(s) for M862465931 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862469711 has a factor: 1724939423
M862469711 has a factor: 33635373938280024473
found 2 factor(s) for M862469711 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862472651 has a factor: 1724945303
M862472651 has a factor: 19055217184234607
found 2 factor(s) for M862472651 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862479851 has a factor: 1724959703
M862479851 has a factor: 413990328481
M862479851 has a factor: 714116634061458201143
found 3 factor(s) for M862479851 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862483991 has a factor: 1724967983
M862483991 has a factor: 146587766718190017313
found 2 factor(s) for M862483991 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862484663 has a factor: 1724969327
M862484663 has a factor: 81555017960518513
found 2 factor(s) for M862484663 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M862486043 has a factor: 1724972087
M862486043 has a factor: 5708644866648421057
found 2 factor(s) for M862486043 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719908379 has a factor: 1439816759
M719908379 has a factor: 232789913230194889
found 2 factor(s) for M719908379 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719910071 has a factor: 1439820143
M719910071 has a factor: 2069092085162217007
found 2 factor(s) for M719910071 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719911859 has a factor: 1439823719
M719911859 has a factor: 4394341987337
found 2 factor(s) for M719911859 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719913983 has a factor: 1439827967
M719913983 has a factor: 40274894059742185439
found 2 factor(s) for M719913983 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719916143 has a factor: 1439832287
M719916143 has a factor: 173954498024165765407
found 2 factor(s) for M719916143 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719918123 has a factor: 1439836247
M719918123 has a factor: 14421511271742750433
found 2 factor(s) for M719918123 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719918183 has a factor: 1439836367
M719918183 has a factor: 113245578616121692073
found 2 factor(s) for M719918183 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719918723 has a factor: 1439837447
M719918723 has a factor: 48533437337878127
M719918723 has a factor: 17853201058829377
found 3 factor(s) for M719918723 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719920559 has a factor: 1439841119
M719920559 has a factor: 17887233584232404713
found 2 factor(s) for M719920559 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719923511 has a factor: 1439847023
M719923511 has a factor: 443649962529022924553
found 2 factor(s) for M719923511 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719923703 has a factor: 1439847407
M719923703 has a factor: 4730478123306174401
found 2 factor(s) for M719923703 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719928551 has a factor: 1439857103
M719928551 has a factor: 1170603823927
M719928551 has a factor: 684751402141039
found 3 factor(s) for M719928551 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719930999 has a factor: 1439861999
M719930999 has a factor: 235880898999359710631
found 2 factor(s) for M719930999 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719935823 has a factor: 1439871647
M719935823 has a factor: 21156990157931970833
found 2 factor(s) for M719935823 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719938343 has a factor: 1439876687
M719938343 has a factor: 715511442128893001
found 2 factor(s) for M719938343 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719940659 has a factor: 1439881319
M719940659 has a factor: 1276097697840137
found 2 factor(s) for M719940659 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719941571 has a factor: 1439883143
M719941571 has a factor: 1058879983444807
M719941571 has a factor: 1798414044359
M719941571 has a factor: 135867062928147456463
found 4 factor(s) for M719941571 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719941643 has a factor: 1439883287
M719941643 has a factor: 5910824560626593
found 2 factor(s) for M719941643 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24]

M719943923 has a factor: 1439887847
M719943923 has a factor: 84048328604913973369
M719943923 has a factor: 25820857442168282359
found 3 factor(s) for M719943923 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]

[QUOTE=Christenson;266975]finding two within a bit level is definitely rare, though not unknown[/QUOTE]I haven't looked through other ranges, but there are 705 exponents between M1,000,000 and M2,000,000 that have 2 (or more) factors of the same bitsize.

James Heinrich 2011-07-20 00:33

I found an interesting issue, possibly bug, in mfaktc:[code]
M862452323 has a factor: 1724904647
M862452323 has a factor: 103494278761
M862452323 has a factor: 26768470919200553
M862452323 has a factor: 178517762372762302367
M862452323 has a factor: 50544926613807686257
found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]The problem?
1724904647 * 103494278761 = 178517762372762302367
So it really only found 4 factors, but [i]also[/i] reported a composite factor of the two smallest factors.

How did this happen?


edit: found another example:[code]
M999916103 has a factor: 1999832207
M999916103 has a factor: 113990435743
M999916103 has a factor: 12648938702951
M999916103 has a factor: 227961744688815374801
found 4 factor(s) for M999916103 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]
227961744688815374801 = 1999832207 * 113990435743


edit2: another example:[code]
M800000759 has a factor: 1600001519
M800000759 has a factor: 6400006073
M800000759 has a factor: 10240019438409224887
found 3 factor(s) for M800000759 from 2^33 to 2^69 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]
10240019438409224887 = 1600001519 * 6400006073

KingKurly 2011-07-20 00:45

If you're looking for Mersenne numbers with multiple factors in a given range, I have (at least) one.

M43363759 is divisible by 566645733218532866731538439737694884449681 (138.7 bits).

566645733218532866731538439737694884449681 = 692113095422672631017 (69.2 bits) * 818718410280162380393 (69.5 bits)

...If that's not what you're looking for, then pretend this post doesn't exist. :wink:

It just so happens that I found the composite factor with P-1, but obviously a TF from 69 to 70 would've found 'em both.

James Heinrich 2011-07-20 01:11

[QUOTE=KingKurly;266981]It just so happens that I found the composite factor with P-1, but obviously a TF from 69 to 70 would've found 'em both.[/QUOTE]I took a look through the top-10 P-1 factors on my site, which are all composite, and found two more that break into same-bit prime factors:[code]M24742121 : 555087078799770339769608793045705436436591887 = 20563047713721235292639 (74.1 bits) * 26994397257045406913233 (74.5 bits)
M2098193 : 60994061286916913266355388473561822515085761 = 6546629382726350366497 (72.4 bits) * 9316864866041320842913 (72.9 bits)[/code]

KingKurly 2011-07-20 01:49

[QUOTE=James Heinrich;266984]I took a look through the top-10 P-1 factors on my site, which are all composite, and found two more that break into same-bit prime factors:[/QUOTE]
Another one:

[code]For M1997293:
33579362377488332973816211929918617758917702673 (154.5 bits) = 156319993046900879655881 (77.0 bits) * 214811693136484951130633 (77.5 bits)
[/code]

apsen 2011-07-20 01:52

1 Attachment(s)
[QUOTE=James Heinrich;266974]Sounds like a plan. And then you should be able to quickly test it yourself, since I've given you a list of likely assignments that could probably find 2 factors in the same class.[/QUOTE]

Here's version without MORE_CLASSESS. So far I've tried only a couple of exponents on your list and haven't been lucky to have factors in the same class.

apsen 2011-07-20 02:33

[QUOTE=Christenson;266975]b) running them through a large, relatively slow expression in parallel. [/QUOTE]

That actually makes it harder to trigger. The atomics deal only with storing thread local data to shared memory. That's only four bytes for each factor, seven lines of c code:

[CODE]
index=RES[0]++;
if(index<10) /* limit to 10 factors per thread */
{
RES[index*3 + 1]=f.d2;
RES[index*3 + 2]=f.d1;
RES[index*3 + 3]=f.d0;
}
[/CODE]

It's very negligible time comparing with the rest of computation.


[QUOTE=Christenson;266975]
Now, I'm willing to temporarily introduce a "bug" in the program to determine whether a problem exists. What if the exponent had several factors [e.g. look up the exponent "god" was using, or look through the operation Billion Digits web page] and, in addition, the CPU sieving process just "got lucky" and included them all in the same class, preferably one right after another? This is the statistically unlikely construction we are searching for.
[/QUOTE]

That's quite a lot of work for the bug I believe I've fixed. :smile:
Besides I haven't looked into the sieving process so I do not know how to modify it.

[QUOTE=Christenson;266975]
However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [url]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/url] Therefore, detecting the problem of a collision somewhat after not quite reporting correct results isn't an unreasonable course of action. The user will be able to re-run a more careful, if slower, TF program over a very narrow range quickly and get the correct results. Something very similar happens when factors are reported to the server and the server verifies them with much less effort than that required to find them in the first place.[/QUOTE]

It looks like the approach Bdot is taking with mfakto. BTW I wonder if he's still reading this thread? Maybe he'll borrow my solution? :smile:

James Heinrich 2011-07-20 02:33

I've run through the entire list again, and also played with upper bit range, NumStreams, CPUStreams, GridSize, and Stages; all to no effect -- I found plenty of factors in adjacent classes, but none on that list came both in the same class. Perhaps tomorrow I can generate another list of potentials, but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.

gjmccrac 2011-07-20 02:50

[QUOTE=James Heinrich;266989] but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.[/QUOTE]

To be in the same class they would have the same result when the mod 120 of the factors is calculates - this would give you 16 classes. It would be mod 840 if you are using 96 classes and mod 1155 if you are using 960 classes.

Christenson 2011-07-20 02:56

[QUOTE=James Heinrich;266989]I've run through the entire list again, and also played with upper bit range, NumStreams, CPUStreams, GridSize, and Stages; all to no effect -- I found plenty of factors in adjacent classes, but none on that list came both in the same class. Perhaps tomorrow I can generate another list of potentials, but maybe it would help if I knew more what I was looking for -- obviously simply having two factors of similar size isn't enough to put them in the same class with any regularity.[/QUOTE]


mfaktc TF sieving isn't terribly smart...it just sieves and creates long lists of reasonably "probable" primes of the right form and size, in order on the CPU....and then calculates the mersenne number modulo the "probable" primes on the GPU. I'm not at all surprised if it's not smart enough to eliminate previously found large cofactors.

If we are going to prove out Apsen's method, someone (I'm not available this week, have mfaktc-primenet to work on) will need to induce the problem in a way similar to what I suggested or just do it on some simpler, faster calculation. We need either bdot or Oliver for this.

If it were mine right now, I would want to make an array of atomically writeable storage with one element for each thread/potential trial factor, and write a value in each element when the TF expression was complete, depending on success or failure -- and then count the number of success expressions there to ensure it was the same as the factors reported. I'd want to do this on the GPU, as it would eat quite a bit of CPU bus bandwidth.

I do have this dream of moving sieving off the CPU in mfaktc....it would significantly improve my multi-core LL performance...so your effort to understand how the sieving works in enough detail to make it screw up creatively won't be wasted.

davieddy 2011-07-20 10:19

[QUOTE=Christenson;266975]
However, I'd also like to point out that in my practical mfaktc work, actually finding a factor is relatively rare, and finding two within a bit level is definitely rare, though not unknown: [URL]http://www.mersenneforum.org/showpost.php?p=265202&postcount=179[/URL] [/QUOTE]

Expected number of 70 bit factors is 1/70

Probability of 2 factors is e^(-1/70)/9800 (Poisson)

David

Chuck 2011-07-21 02:49

Selftest passed OK. Core i7-970 and GTX 580

[FONT=Fixedsys]Result[00]: M3321928619 has a factor: 38814612911305349835664385407[/FONT]
[FONT=Fixedsys]found 1 factor(s) for M3321928619 from 2^94 to 2^95 [mfaktc 0.17.1-Win !!!TEST!![/FONT]
[FONT=Fixedsys]! apsen 95bit_mul32][/FONT]
[FONT=Fixedsys]selftest for M3321928619 passed![/FONT]
[FONT=Fixedsys]tf(): total time spent: 0.212s[/FONT]
[FONT=Fixedsys]Selftest statistics[/FONT]
[FONT=Fixedsys]number of tests 4914[/FONT]
[FONT=Fixedsys]successfull tests 4914[/FONT]
[FONT=Fixedsys]selftest PASSED![/FONT]

apsen 2011-07-21 21:37

[QUOTE=Christenson;266992]
I do have this dream of moving sieving off the CPU in mfaktc....it would significantly improve my multi-core LL performance...so your effort to understand how the sieving works in enough detail to make it screw up creatively won't be wasted.[/QUOTE]

:smile: I'll see how much time I could spend on it.

apsen 2011-07-21 22:04

[QUOTE=apsen;266988]
[QUOTE=Christenson;266975]
b) running them through a large, relatively slow expression in parallel.
[/QUOTE]

That actually makes it harder to trigger. The atomics deal only with storing thread local data to shared memory. That's only four bytes for each factor, seven lines of c code:

[CODE]
index=RES[0]++;
if(index<10) /* limit to 10 factors per thread */
{
RES[index*3 + 1]=f.d2;
RES[index*3 + 2]=f.d1;
RES[index*3 + 3]=f.d0;
}
[/CODE]

It's very negligible time comparing with the rest of computation.
[/QUOTE]

! But we could

[CODE]
[COLOR="Red"]__syncthreads();[/COLOR]
index=RES[0]++;
if(index<10) /* limit to 10 factors per thread */
{
RES[index*3 + 1]=f.d2;
RES[index*3 + 2]=f.d1;
RES[index*3 + 3]=f.d0;
}
[/CODE]

to try threads go for the shared memory at the same time once we figure the way to heard the factors right.

apsen 2011-07-22 01:42

Oliver, did you get my message?

TheJudger 2011-07-22 11:08

[QUOTE=apsen;267179]
[CODE]
[COLOR="Red"]__syncthreads();[/COLOR]
index=RES[0]++;
if(index<10) /* limit to 10 factors per thread */
{
RES[index*3 + 1]=f.d2;
RES[index*3 + 2]=f.d1;
RES[index*3 + 3]=f.d0;
}
[/CODE]
[/QUOTE]
The chance for a lost update is higher than without the update... (I did some expericments in the past). There is another issue in this variant: not all threads execute the __syncthreads() which has AFAIK undefined behavior.

[QUOTE=apsen;267203]Oliver, did you get my message?[/QUOTE]
Yepp, had YABT (Yet Another Business Trip).

Oliver

TheJudger 2011-07-22 11:10

Hi James,

[QUOTE=James Heinrich;266979]I found an interesting issue, possibly bug, in mfaktc:[code]
M862452323 has a factor: 1724904647
M862452323 has a factor: 103494278761
M862452323 has a factor: 26768470919200553
M862452323 has a factor: 178517762372762302367
M862452323 has a factor: 50544926613807686257
found 5 factor(s) for M862452323 from 2^ 1 to 2^70 [mfaktc 0.18-pre1-Win 71bit_mul24][/code]The problem?
1724904647 * 103494278761 = 178517762372762302367
So it really only found 4 factors, but [i]also[/i] reported a composite factor of the two smallest factors.

How did this happen?[/QUOTE]

This is known (at least to me).
And technically it is correct. 178517762372762302367 is a factor of M862452323. It is a composite factor, not a prime factor.

Oliver

Christenson 2011-07-22 11:49

That multiple factor thing might have been avoided with an absurdly high setting for SievePrimes (and code updates to make sure that setting actually worked), wouldn't it?

Not that we wouldn't all grow old waiting for the CPU to do the sieving at the same time!

James Heinrich 2011-07-22 12:04

[QUOTE=TheJudger;267248]This is known (at least to me).
And technically it is correct. 178517762372762302367 is a factor of M862452323. It is a composite factor, not a prime factor.[/QUOTE]Of course. You could claim that all these are factors, and it would be true, although not neccesarily useful :smile:[code]M862452323 has a factor: 1724904647
M862452323 has a factor: 103494278761
M862452323 has a factor: 26768470919200553
M862452323 has a factor: 50544926613807686257
M862452323 has a factor: 178517762372762302367
M862452323 has a factor: 46173059881613395394669791
M862452323 has a factor: 2770383591317463939447354833
M862452323 has a factor: 87185178798430852389017336279
M862452323 has a factor: 5231110724923700473126386687577
M862452323 has a factor: 1353010398174837130096359433084900121
M862452323 has a factor: 4778647530636042401407668963299608951
M862452323 has a factor: 9023167198392429666531803015720504470319
M862452323 has a factor: 2333813923251096884171353943910429764243762287
M862452323 has a factor: 140028835315238199557965845958940579302616630081[/code]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor? I know PrimeNet will break it down when submitted, but I still think I'd rather know that I found 2 small factors rather than one larger one.

wblipp 2011-07-22 12:34

[QUOTE=James Heinrich;267255]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor? I know PrimeNet will break it down when submitted, but I still think I'd rather know that I found 2 small factors rather than one larger one.[/QUOTE]

Use alpertron or yafu; don't mess with the single minded pursuit of speed by distracting the programmers.

apsen 2011-07-22 14:31

[QUOTE=TheJudger;267246]The chance for a lost update is higher than without the update...
[/QUOTE]

That was related to the talks about how to increase the chance of memory access related problems in special test version if we want to test for the presence of said problems.

[QUOTE=TheJudger;267246]There is another issue in this variant: not all threads execute the __syncthreads() which has AFAIK undefined behavior.
[/QUOTE]

That was just to illustrate the idea but good point. Need to keep that in mind.

Bdot 2011-07-25 23:49

[QUOTE=apsen;266988]
It looks like the approach Bdot is taking with mfakto. BTW I wonder if he's still reading this thread? Maybe he'll borrow my solution? :smile:[/QUOTE]

I do follow the thread, just did not have a lot of time recently ...

What problem is it that you want to see in mfaktc/o? Just what happens when multiple threads write to the same buffer? For this test you don't need mfaktc - a small kernel like this
[code]
f=get_global_id(0);
index=RES[0]++;
if(index<10) /* limit to 10 factors */
{
RES[index*3 + 1]=f;
RES[index*3 + 2]=f;
RES[index*3 + 3]=f;
}
[/code]will show. Now run that kernel in one, two, five, twenty ... threads and check the RES buffer. No need to actually test any factor ...

For the problem to occur in real life, the factors need to be in the same class (we talked about that sufficiently). And they also need to be close together (in the same grid, not just the same bitlevel). mfaktc/o tests up to 2^20 resp. 2^21 factor candidates in one run. Someone with better mathematical background than I have could maybe calculate the chances:
[LIST][*]we're typically searching factors above 2^64, more often above 2^68[*]the two factors must be in the same class out of 4620 classes[*]they must not be more than 2^20 (2^21) of our "probably prime" factor candidates apart. (At SievePrimes=200.000, grid size 2^21, "last k" - "first k" of a grid is ~ 5.3E10)[/LIST]@apsen: I don't think that copying 32 bytes per FC from the GPU to the CPU after a grid was tested is a good idea. Currently mfaktc copies 4 byte per FC to the GPU and 32 bytes back in TOTAL. Your approach (as of[URL="http://mersenneforum.org/showpost.php?p=266751&postcount=1079"] this post[/URL]) increases the required bandwidth about 9 times.

Probably the comment " /* limit to 10 factors per thread */" was misleading :smile: : A single thread will only test one candidate, and as such will have at most one result. So the code could be changed to just double the required bandwitdh (4 bytes in, 4 bytes out per FC). Still I'm afraid that would be a major performance hit, at least for OpenCL where there is no transfer-hiding support yet.

I'll see that I can test the above mini-kernel and verify that for all possible thread numbers at least RES[0] is non-zero. Then we're (almost) certain that we always have some indication that there is at least one factor in the unlikely event above.


Unfortunately my Windows-PC has lost its PSU. The good thing about that: Now there is a running version of mfakto for Linux :smile:.

Bdot 2011-07-26 21:32

I tried it. And I have good and bad news :smile:

Good: RES[0] was always at least 1, so we'll always have notification that a factor was found.
Also good: The "factors" that appeared in the buffer were not scrambled - they either appeared correctly or not at all - except for the last factor that was incomplete a few times. So we'll always have at least one valid factor.

Even when increasing RES[0] in more than 1000 threads, the number of factors reported was always between 1 and 10. When less than 10 threads wrote their "factors", the result was consistently one factor found, the first factor was the only one reported and it was complete. Therefore we can assume that without atomics we will simply miss the second factor and never know there is a "twin".

Now the bad news: atomics don't seem to work reliably in OpenCL. I will seek help with AMD. When using atomic_inc, the reported number of factors was around 80% of the expected. And worst: when only two or three "factors" are written to the result buffer, only the first one arrives - just as without atomics.

Christenson 2011-07-27 03:38

Bdot:
From a pragmatics standpoint, where factors are found are reasonably rare (most bit levels don't find any), can we accept that the presence of a factor is a sign to re-run the bit level or class more carefully, possibly on the host CPU instead, since it doesn't happen often?

Not that I don't think atomics are worthwhile....I am just thinking that this may be the least price to pay in the absence of atomics.

Bdot 2011-07-27 07:25

[QUOTE=Christenson;267653]Bdot:
From a pragmatics standpoint, where factors are found are reasonably rare (most bit levels don't find any), can we accept that the presence of a factor is a sign to re-run the bit level or class more carefully, possibly on the host CPU instead, since it doesn't happen often?

Not that I don't think atomics are worthwhile....I am just thinking that this may be the least price to pay in the absence of atomics.[/QUOTE]

Well, I don't think people agree to re-run their tests on the CPU whenever a factor was found ... For the course of GIMPS (finding the next mersenne prime) this has no effect - we have one factor, so the mersenne number is not prime. Of course it would be better to have both factors.

I currently think the "problem" can be ignored. We will use atomics when available, and I will see that the OpenCL-atomics get fixed. Without (proper) atomics, we'll exclude the exponent from further prime-hunting with just one of two factors found. As most mersenne numbers have many more factors anyway, that should not be a big deal - am I wrong here?

Christenson 2011-07-27 12:41

I always seem to be TF'ing the mersenne numbers with one or no factors in a given bitlevel.

As for re-running a little bit of TF on the CPU, don't forget that a whole lot of sieving goes on on my CPU to support mfaktc at the moment. Therefore, as long as the process is automatic, I think it will go largely unnoticed. Particularly if only one class is re-run, it won't take long, just a few minutes out of the many days mfaktc/mfakto runs.

James Heinrich 2011-07-27 13:28

While it's (almost?) guaranteed that there will be another factor, the chance of it being in even the same bitlevel is low; in the same class very low (I haven't been able to find an example). If one factor is found and reported that is sufficient for GIMPS, and not much different from running mfaktc with [i]StopAfterFactor=2[/i] -- there might be another factor in the bitlevel that won't be found.

Decades from now, someone will undoubtedly run through the list of exponents that have only a small factor and use their 100-billion-exaflop GPU to TF them all to 2^100 in a short time, so I wouldn't worry too much about a "missed" factor, since we know almost for certain that there are still unfound factors.

My math skills are admittedly atrocious, but isn't the chance of twin factors in the same class something like (1/70)^2 * (1/4620)^2 = about 1:100,000,000,000 ?

ckdo 2011-07-27 16:14

[QUOTE=James Heinrich;267686]My math skills are admittedly atrocious, but isn't the chance of twin factors in the same class something like (1/70)^2 * (1/4620)^2 = about 1:100,000,000,000 ?[/QUOTE]

Quite obviously, if there are twin factors, the chance of them being in the same class is 1 in 4620.

Furthermore, you don't always search single bit levels, though I have to admit that the multi-bitlevel case is a rare one unless you really aim for it.

James Heinrich 2011-07-27 16:43

[QUOTE=ckdo;267692]Quite obviously, if there are twin factors, the chance of them being in the same class is 1 in 4620.[/QUOTE]Again, I admit I don't know much here, but isn't that kind of like saying "if there are twin factors in the same class, the chance of them being in the same class is 1 in 1"?

I figured that there's already a limited chance of there being multiple factors per bit-level, around (1 / <bitlevel>)^<factors in bitlevel> --- am I mistaken on this? That for 2^70 TF, you'd have about 1:70 chance of 1 factor, 1:4900 chance of 2 factors, 1:343000 chance of 3 factors, etc? :unsure:

ixfd64 2011-07-27 22:41

Would it be hard to extend mfaktc to larger exponents? It could be very useful for people who wish to search for factors of double Mersenne numbers.

firejuggler 2011-07-27 22:50

that question has been answered a few page earlier : too much rewrite.

Christenson 2011-07-27 23:49

What's the form of a double-mersenne number?

firejuggler 2011-07-27 23:54

MM2,MM3,MM7, MM127 etc

science_man_88 2011-07-28 01:30

[QUOTE=firejuggler;267720]MM2,MM3,MM7, MM127 etc[/QUOTE]


three of those examples are also triple mersenne numbers as well.

Christenson 2011-07-28 01:46

meaning 2^(2^2-1)-1, 2^(2^3-1)-1, 2^(2^7-1) -1 .... 2^(2^127-1)-1, right....with that last number having 10^40 or so digits....just a wee bit bigger than the 10^9 digits some of us play with right now....
with 127 bits as the minimum factor length, too...

Yes, I think you'd destroy mfaktc if you tried to make it do more than the first few double mersennes....but it wouldn't be a bad place to start if you wanted to do that job....I'm simply not up to it right now. I could see building a separate version for each double-mersenne exponent.

davieddy 2011-07-28 09:37

[QUOTE=James Heinrich;267694]
I figured that there's already a limited chance of there being multiple factors per bit-level, around (1 / <bitlevel>)^<factors in bitlevel> --- am I mistaken on this? That for 2^70 TF, you'd have about 1:70 chance of 1 factor, 1:4900 chance of 2 factors, 1:343000 chance of 3 factors, etc? :unsure:[/QUOTE]

Between 2[SUP]69[/SUP] and 2[SUP]70[/SUP]

Expected number of factors E = ln(70/69) ~ 1/69.50
[CODE]
[U]factors[/U] [U]probability[/U]

0 69/70
1 69/70 * E
2 69/70 * E[SUP]2[/SUP]/2!
3 69/70 * E[SUP]3[/SUP]/3!
4 69/70 * E[SUP]4[/SUP]/4!
........
[/CODE]

Note that the sum of the probabilities is 69/70 * e[SUP]E[/SUP] = 1

David

TheJudger 2011-07-29 20:38

[QUOTE=ixfd64;267716]Would it be hard to extend mfaktc to larger exponents? It could be very useful for people who wish to search for factors of double Mersenne numbers.[/QUOTE]

Not really "hard" but alot of work. The exponent is represented in a single 32bit unsigned integer in mfaktc. Main task in the host code: extend the per class sieve initializations for bigger exponents. This part is not really performance critical as long as the job is "big enough" (runtime per class > than a few seconds). Simple approach: use libgmp. The GPU code "just needs bigger numbers", too. The bad news are that you have to rewrite the complete set of functions (add, sub, mul, div, mod, ...) for bigger numbers.

[QUOTE=James Heinrich;267255]Would it be worth a few extra lines of code to try and factor the found factor to break it down into prime factors on the chance that it happens to be a composite factor?[/QUOTE]

Simple approach: use libgmp on the CPU to check if factors are composite or prime. Cons:[LIST][*]mfaktc depends on (another) external library[*]makes it even harder (imposible?) to release a closed source version of mfaktc (with automated primenet support)[/LIST]
Currently I say: no fix planned.

Oliver

xilman 2011-07-30 09:00

[QUOTE=TheJudger;267892][LIST][*]mfaktc depends on (another) external library[*]makes it even harder (imposible?) to release a closed source version of mfaktc (with automated primenet support)[/LIST][/QUOTE]Certainly not impossible; possibly harder.

GMP is licensed under the LGPL, meaning that you don't have to release your source code. What you must do is release the compiled version of your code in such a way that it can be linked with GMP by those to whom you distribute it. So, for instance, you could ship a library containing all your secret material, a copy of the GMP library and a trivial C program source which reads something like:
[code]
/* my_source.c */

main (int argc, char **argv)
{
my_main (argc, argv);
}
[/CODE]
and a Makefile which reads something like:
[code]
all:
gcc -o mfaktc source.c mfaktc.a -lgmp
[/code]

Paul

LaurV 2011-08-01 11:27

[QUOTE=science_man_88]
[QUOTE= firejuggler]
[I]MM2,MM3,MM7, MM127 etc[/I]
[/QUOTE]
three of those examples are also triple mersenne numbers as well.
[/QUOTE]

That's the Catalan sequence. MM5, MM13, MM17, MM19, MM31, etc, are also DMN, in the strict way (mersenne with prime exponents). MM9, MM11, MM15, MM23, etc, are also DMN, but they can't be prime, as their exponents are not prime. As the OP is asking about "finding factors" of such numbers, I assume he is not talking about "prime" DMN.

kjaget 2011-08-01 19:50

[QUOTE=xilman;267937]Certainly not impossible; possibly harder.

GMP is licensed under the LGPL, meaning that you don't have to release your source code. What you must do is release the compiled version of your code in such a way that it can be linked with GMP by those to whom you distribute it.[/QUOTE]

I think it's even simpler - just dynamically link to libgmp.so and ship a mfaktc binary. The binary isn't under any restrictions at all at that point.

I'd worry more about Windows compatibility and cost/effort considerations.

Dubslow 2011-08-11 03:48

I'm trying to compile the source in Ubuntu. When I run make in the src folder, I get an error saying that it doesn't support gcc versions > 4.5 , and I have 4.5.2 . Is there a way to get around this? Do I have old code?

TheJudger 2011-08-11 08:17

Hi!

This is not a matter of the mfaktc version.
Nvidia doesn't support gcc >= 4.5.0 in their toolkit. You could try to install an older gcc or try to fool the nvcc... :sad:

Oliver

xilman 2011-08-11 08:39

[QUOTE=Dubslow;268875]I'm trying to compile the source in Ubuntu. When I run make in the src folder, I get an error saying that it doesn't support gcc versions > 4.5 , and I have 4.5.2 . Is there a way to get around this? Do I have old code?[/QUOTE]I've just started looking into exactly the same problem but to build under Fedora 15.

There is a lot of chat on the CUDA and Linux fora about this issue. The summary seems to be that the dependency of gcc is for some debugging structures generated by old gcc and understood by Nvidia's tools. Somewhere in the code (perhaps scripts or Makefiles) there is a #error statement (or statements) which can be safely removed and everything then builds.

Sorry not to be more precise but the material above should give you enough to start looking for the details. Alternatively, you can wait until I've managed to get CUDA reinstalled on my system after its upgrade.


Paul

Ralf Recker 2011-08-11 09:27

It's possible to download, compile and install gcc 4.4.6 quite easily. You might need to install a few additional libraries (MPFR, etc) to compile the compiler.

Download and unpack the gcc 4.4.6 source archive into a directory. Create a second directory gcc-4.4.6-bin. cd into it.

../gcc-4.4.6/configure --enable-languages=c,c++ --prefix=/opt/gcc-4.4.6

If the configure script complains about missing libraries install them with "sudo apt-get install libXYZ-dev".

make
sudo make install

Add the following option to the list of options for nvcc:

Either --compiler-bindir /opt/gcc-4.4.6/bin or -ccbin /opt/gcc-4.4.6/bin

Note: Replace /opt/gcc-4.4.6 with a target directory of your choice. I usually use /opt/gcc-x.y.z for this purpose. This makes it quite easy to "uninstall" the compiler via rm -rf /opt/gcc-x.y.z

xilman 2011-08-11 11:03

[QUOTE=xilman;268883]I've just started looking into exactly the same problem but to build under Fedora 15.

There is a lot of chat on the CUDA and Linux fora about this issue. The summary seems to be that the dependency of gcc is for some debugging structures generated by old gcc and understood by Nvidia's tools. Somewhere in the code (perhaps scripts or Makefiles) there is a #error statement (or statements) which can be safely removed and everything then builds.

Sorry not to be more precise but the material above should give you enough to start looking for the details. Alternatively, you can wait until I've managed to get CUDA reinstalled on my system after its upgrade.[/QUOTE]Miserable failure here :sad:

I can't even install the device driver because the installer checks for elderly versions of the kernel and various crude hacks failed to work.

I've posted to the Cuda-on-Linux forum at Nvidia and hope someone there will be able to help.

Paul

Christenson 2011-08-11 11:18

This is a little bit wierd to me. I had no difficulty with installing the whole nine yards earlier this year under xubuntu 10.10, 64-bit version.

Ralf Recker 2011-08-11 12:45

[QUOTE=xilman;268894]Miserable failure here :sad:

I can't even install the device driver because the installer checks for elderly versions of the kernel and various crude hacks failed to work.

I've posted to the Cuda-on-Linux forum at Nvidia and hope someone there will be able to help.

Paul[/QUOTE]
Have you tried to install the 275.21 or the 280.13 drivers? The kernel version check has been modified after the release of 270.41.19.

xilman 2011-08-11 15:08

[QUOTE=Ralf Recker;268901]Have you tried to install the 275.21 or the 280.13 drivers? The kernel version check has been modified after the release of 270.41.19.[/QUOTE]No, but thanks for the tip. I'll give it a try.

The standard download for V4 appears to be the 270.40.xx release and I hadn't come across the 280.13 version.

Paul

Dubslow 2011-08-11 16:55

[QUOTE=Christenson;268895]This is a little bit wierd to me. I had no difficulty with installing the whole nine yards earlier this year under xubuntu 10.10, 64-bit version.[/QUOTE]

I bet because 10.10 had an older version of gcc. I'm using 11.04.

xilman 2011-08-14 10:15

[QUOTE=xilman;268902]No, but thanks for the tip. I'll give it a try.

The standard download for V4 appears to be the 270.40.xx release and I hadn't come across the 280.13 version.

Paul[/QUOTE]AFAICT, 280 is not yet available for 64-bit Linux. About to give the 275 beta a try.

Paul

moebius 2011-08-14 10:43

[QUOTE=xilman;269059]AFAICT, 280 is not yet available for 64-bit Linux. About to give the 275 beta a try.

Paul[/QUOTE]

[B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B]
[SIZE=1]
[/SIZE][B][SIZE=1]Version:[/SIZE][/B]

[B][SIZE=1]280.13 Certified[/SIZE][/B]

[B][SIZE=1]Release Date:[/SIZE][/B]

[B][SIZE=1]2011.08.01[/SIZE][/B]

[B][SIZE=1]Operating System:[/SIZE][/B]

[B][SIZE=1]Linux 64-bit[/SIZE][/B]

[B][SIZE=1]Language:[/SIZE][/B]

[B][SIZE=1]English (U.S.)[/SIZE][/B]

[B][SIZE=1]File Size:[/SIZE][/B]

[B][SIZE=1]52.4 MB [/SIZE][/B]

[URL]http://www.nvidia.com/object/linux-display-amd64-280.13-driver.html[/URL]

Deutsche Version

[B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B]
[B][SIZE=1]
Version:[/SIZE][/B]

[B][SIZE=1]280.13 Certified[/SIZE][/B]

[B][SIZE=1]Freigabedatum:[/SIZE][/B]

[B][SIZE=1]2011.08.01[/SIZE][/B]

[B][SIZE=1]Betriebssystem:[/SIZE][/B]

[B][SIZE=1]Linux 64-bit[/SIZE][/B]

[B][SIZE=1]Sprache:[/SIZE][/B]

[B][SIZE=1]Deutsch[/SIZE][/B]

[B][SIZE=1]Dateigröße:[/SIZE][/B]

[B][SIZE=1]52.4 MB [/SIZE] [/B]

[URL]http://www.nvidia.de/object/linux-display-amd64-280.13-driver-de.html[/URL]

ckdo 2011-08-14 12:57

[QUOTE=moebius;269062][B][SIZE=1]Linux x64 (AMD64/EM64T) Display Driver[/SIZE][/B]
[SIZE=1]
[/SIZE][B][SIZE=1]Version:[/SIZE][/B]

[B][SIZE=1]280.13 Certified[/SIZE][/B][/QUOTE]

LHQL to the rescue! :big grin:

ET_ 2011-08-14 14:24

[QUOTE=ckdo;269065]LHQL to the rescue! :big grin:[/QUOTE]

Does it work with CUDA 3.X (especially 3.0)?

Having a GTX275 I can't access to 2.0 capabilities delivered by CUDA 4.0.

Luigi


All times are UTC. The time now is 13:00.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.