I use 0.13 x32 instead of x64 and can't find any difference, because I only sieve on the GPU. Performance with CPU sieving: I don't know.
|
[QUOTE=kracker;344685]Why not?[/QUOTE]
I was asking in case the new changes might lead to a problem or conflict of one sort or another. But then I thought, what the heck, let's see what happens, and the result shows up two posts above this one. I decided not to try the one remaining idea, setting SieveOnGPU=0, because if possible I'd like to do the sieving on the GPU so as to leave Prime95 unaffected. Unless the expected throughput gain for mfakto from that setting might outweigh the loss to Prime95... Rodrigo |
[QUOTE=NormanRKN;344687]I use 0.13 x32 instead of x64 and can't find any difference, because I only sieve on the GPU. Performance with CPU sieving: I don't know.[/QUOTE]
I might try that. My earlier results with x32, though, were better than with x64. Rodrigo |
[QUOTE=Jayder;343478]Will the "best fit" GPUSievePrimes value remain constant? Or will it change as you change bit level, exponent, or possibly kernel? For example, if a GPUSievePrimes value of 52000 works best on a 332M exponent going from 2^69 to 2^70, will 52000 also be the best for a 65M exponent and 2^73 to 2^74? What about an 8M exponent and 2^60 to 2^61? Etc.
Are there any good strategies for finding the best value, other than intelligent trial and error? I searched for an answer in the mfaktc thread, but it is kind of a massive thread. I also considered experimenting to find out for myself, but that would take a very long time on my slow GPU. Hopefully I am not missing an answer that is staring me in the face.[/QUOTE] [QUOTE=Bdot;343640]I have not tested that yet, but here's my theory: Sieving runs at a constant speed regardless of kernel, factor size, etc. Its speed depends only on the GPUSieve* parameters. The optimal choice of GPUSievePrimes depends on finding the best balance between the run time of the sieve kernel and that of the trial factoring kernel. Better (i.e. longer) sieving means less (i.e. shorter) trial factoring, in a non-linear relationship. Therefore, anything that changes the speed of trial factoring will also change the optimal GPUSievePrimes. If testing takes longer (because a slower kernel needs to be used, or because the exponent has more bits), then GPUSievePrimes can be a bit higher. However, the differences should be rather small, and so are the achievable improvements. I'll test that and add more details to my theoretical explanation soon. I'm also working on an automatic optimizer for these and other variables, so that no more manual change-and-test cycles will be needed.[/QUOTE] I finally tested a bit on a system with an i7 2600 @ 3.4 GHz + HD7850 @ 975 MHz, Win7, 2 of 4 cores busy with prime95, Catalyst 13.4 (yes, the one that keeps one CPU core busy for mfakto even when sieving on the GPU).
[LIST][*]On this system, 32 or 64 bit makes almost no difference. If anything, 64-bit tends to be ~0.05% faster.[*]GPUSievePrimes peaked at 110k for both 333M and 63M exponents, for bit levels from 73 -> 74 up to 80 -> 81[*]Going 72 -> 73 bits, peak GPUSievePrimes is at 100k[*]The performance differences between GPUSievePrimes 100k and 110k (as an example) were consistent, but small (~0.2%).[*]Stopping prime95 adds ~1% performance to mfakto.[/LIST]There are still a lot more tests to be done, but a few conclusions: [LIST][*]being a bit off the optimum for GPUSievePrimes has very little effect[*]the optimum does not measurably depend on the exponent size or the factor size, as long as the same kernel is used[*]the optimum does depend on the selected kernel: "smaller" kernels are faster and should be used with slightly lower GPUSievePrimes[/LIST]I'm not sure to what extent these results apply to other systems. I think they should be similar on all systems ... however, LaurV reported that much lower GPUSievePrimes values were optimal for him, so there may be differences. LaurV, could you do me a favor and run the following performance test for me, if you have some time: [LIST][*]switch to CPU sieving: SieveOnGPU=0, SievePrimes=2000, SievePrimesMin=2000[*]download the [URL="http://mersenneforum.org/mfakto/mfakto-0.13/specialVersions_x64.zip"]mfakto special versions[/URL] and put them in the mfakto directory[*]On an otherwise idle machine (yes, please stop prime95 et al.), run mfakto-pi -st > st-pi.log[*]Send me (or post) st-pi.log[/LIST]The "big" GCN cards have a different SP:DP ratio. mfakto does not use DP, but 32-bit integer math uses the same H/W units. Maybe this is why the 79xx behaves a bit differently again ... |
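To illustrate Bdot's reasoning about the sieve/trial-factoring trade-off, here is a toy model in Python. Everything in it is a hypothetical assumption, not mfakto's code or measured data: sieving cost grows linearly with the number of sieve primes n (loosely analogous to GPUSievePrimes), each sieve prime p removes a fraction 1/p of the remaining candidates, and trial factoring is paid only for the survivors. The constants `tf_cost` and `sieve_cost` are made up. Under these assumptions, a slower TF kernel (larger `tf_cost`) never lowers the optimal n, and landing 10% off the optimum changes the total cost only marginally, which matches the shape of the observations above.

```python
def odd_primes_below(limit):
    """Odd primes below `limit`, via a plain sieve of Eratosthenes."""
    is_prime = bytearray([1]) * limit
    is_prime[0:2] = b"\x00\x00"
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            # Clear every multiple of i starting at i*i.
            is_prime[i * i::i] = bytearray(len(range(i * i, limit, i)))
    return [i for i in range(3, limit) if is_prime[i]]


def survivor_fractions(primes):
    """survivors[n] = fraction of candidates left after sieving with the first n primes
    (hypothetical model: each prime p removes a fraction 1/p of what is left)."""
    survivors = [1.0]
    for p in primes:
        survivors.append(survivors[-1] * (1.0 - 1.0 / p))
    return survivors


def toy_cost(n, tf_cost, sieve_cost, survivors):
    """Hypothetical cost per candidate: sieving grows linearly with the number of
    sieve primes n; trial factoring is paid only for candidates that survive."""
    return sieve_cost * n + tf_cost * survivors[n]


def best_sieve_primes(tf_cost, sieve_cost, survivors):
    """Number of sieve primes that minimises toy_cost (first minimum if tied)."""
    return min(range(len(survivors)),
               key=lambda n: toy_cost(n, tf_cost, sieve_cost, survivors))


primes = odd_primes_below(2_000_000)
survivors = survivor_fractions(primes)
n_fast = best_sieve_primes(1.0, 1e-6, survivors)  # faster TF kernel
n_slow = best_sieve_primes(2.0, 1e-6, survivors)  # TF kernel twice as slow
print(n_fast, n_slow)
```

The absolute numbers printed mean nothing for real hardware; only the direction (`n_slow >= n_fast`) and the flatness near the minimum carry over as intuition.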
I've been trying to figure out whether it is possible to solve problems such as [URL="http://en.wikipedia.org/wiki/RSA_Factoring_Challenge"]RSA Factoring[/URL] with OpenCL using readily available software. Would I be able to use this software in its current form to factor large semiprimes? I have a system with three HD 7970s on which I would like to attempt a small RSA number. Is there another program that will solve these problems with OpenCL? All of the factoring projects I can find seem to be CPU-based.
|
BUG WARNING
[QUOTE=Rodrigo;344682]Lowered GPUSieveSize from the default 64 to 48, then tested the same exponent.
[/QUOTE] This made me wonder whether non-power-of-two values for GPUSieveSize are allowed at all. Yes, they are. But thinking a bit more about the dependencies, I think I found a bug in mfakto and mfaktc: when using GPUSieveProcessSize=24 with a GPUSieveSize that is not divisible by 3, some FCs (factor candidates) may go untested. (GPUSieveSize * 1024) must be divisible by GPUSieveProcessSize. The worst case would be GPUSieveProcessSize=24 and GPUSieveSize=4, in which case about 1 in 256 FCs would go untested. The typical setting of GPUSieveProcessSize=24 and GPUSieveSize=64 leaves 1 in 4096 FCs untested; GPUSieveProcessSize=24 and GPUSieveSize=128 leaves about 1 in 16384. Unfortunately, this is something the selftest cannot cover without increasing the selftest runtime by at least a factor of 10. How many of you use GPU sieving with GPUSieveProcessSize=24? What do we want to do with those tests? |
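The ratios quoted above follow from a simple remainder calculation. A minimal sketch, assuming (as the (GPUSieveSize * 1024) formula implies) that GPUSieveSize is in Mbits and GPUSieveProcessSize is in Kbits, and that the leftover bits that do not fill a whole process block are simply never handed to a trial-factoring kernel. The function name is hypothetical, not part of mfakto.

```python
from fractions import Fraction


def untested_fraction(gpu_sieve_size, gpu_sieve_process_size):
    """Fraction of factor candidates skipped under the assumption above.

    gpu_sieve_size: GPUSieveSize (assumed Mbits).
    gpu_sieve_process_size: GPUSieveProcessSize (assumed Kbits).
    """
    sieve_kbits = gpu_sieve_size * 1024
    leftover_kbits = sieve_kbits % gpu_sieve_process_size  # bits never processed
    return Fraction(leftover_kbits, sieve_kbits)


for size in (4, 64, 128, 63, 126):
    print(size, untested_fraction(size, 24))
# 4 1/256
# 64 1/4096
# 128 1/16384
# 63 0
# 126 0
```

The same arithmetic shows why only GPUSieveProcessSize=24 is affected: 1024 is a power of two, so any process size that is itself a power of two (8, 16, 32) always divides it.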
Right now, I have GPUSieveSize=64 and GPUSieveProcessSize=24.
|
GPUSieveProcessSize=24, yes I use that :alex:
|
[QUOTE=NormanRKN;345005]GPUSieveProcessSize=24, yes I use that :alex:[/QUOTE]
Then please switch to GPUSieveSize=63 or GPUSieveSize=126 (or whatever multiple of 3 you like). |
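Picking a safe value can be automated with the same divisibility rule. This is a hypothetical helper, not an mfakto feature: it walks down from the current GPUSieveSize to the largest value whose size in Kbits is an exact multiple of GPUSieveProcessSize. It does not check whatever minimum or maximum mfakto enforces for GPUSieveSize.

```python
def largest_valid_sieve_size(current_size, gpu_sieve_process_size):
    """Largest GPUSieveSize <= current_size such that (GPUSieveSize * 1024)
    is an exact multiple of GPUSieveProcessSize."""
    for size in range(current_size, 0, -1):
        if (size * 1024) % gpu_sieve_process_size == 0:
            return size
    raise ValueError("no valid GPUSieveSize at or below current_size")


print(largest_valid_sieve_size(64, 24))   # 63
print(largest_valid_sieve_size(128, 24))  # 126
print(largest_valid_sieve_size(64, 32))   # 64, already valid
```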
[QUOTE=Bdot;344964]LaurV, could you do me a favor and run the following performance test for me, if you have some time:
[LIST][*]switch to CPU sieving: SieveOnGPU=0, SievePrimes=2000, SievePrimesMin=2000[*]download the [URL="http://mersenneforum.org/mfakto/mfakto-0.13/specialVersions_x64.zip"]mfakto special versions[/URL] and put them in the mfakto directory[*]On an otherwise idle machine (yes, please stop prime95 et al.), run mfakto-pi -st > st-pi.log[*]Send me (or post) st-pi.log[/LIST][/QUOTE] Sure, this is a very well-formulated request; I understood exactly what I am being asked to do, hehe. I will do it and post the results as soon as I get home tonight (about 7 hours from now). I already feel in debt to the community and to you. You (and a few others here) are doing a wonderful job maintaining all these tools. |
[QUOTE=zink;344967]I've been trying to figure out whether it is possible to solve problems such as [URL="http://en.wikipedia.org/wiki/RSA_Factoring_Challenge"]RSA Factoring[/URL] with OpenCL using readily available software. Would I be able to use this software in its current form to factor large semiprimes? I have a system with three HD 7970s on which I would like to attempt a small RSA number. Is there another program that will solve these problems with OpenCL? All of the factoring projects I can find seem to be CPU-based.[/QUOTE]
mfakto is not usable as a general factorization tool. It depends on special properties of Mersenne numbers. It could be modified to allow other special sieving tasks (like Wagstaff, Fermat, ...), but even that has not been done yet. |
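For context on what "special properties" means here: for a prime exponent p, every factor q of M_p = 2^p - 1 has the form q = 2kp + 1 and satisfies q ≡ 1 or 7 (mod 8), and q divides M_p exactly when 2^p ≡ 1 (mod q). The minimal Python sketch below uses only these standard facts with plain trial division over k. It is not mfakto's GPU implementation (no sieving, no classes, no multi-word kernels), and the function name is hypothetical. A general RSA semiprime offers no comparable restriction on the form of its factors, which is why this approach does not carry over.

```python
def smallest_mersenne_factor(p, max_k):
    """Smallest factor q = 2*k*p + 1 (1 <= k <= max_k) of M_p = 2**p - 1, or None.

    Assumes p is prime. Uses two properties of factors of Mersenne numbers:
    every factor has the form 2*k*p + 1, and every factor is 1 or 7 mod 8.
    """
    for k in range(1, max_k + 1):
        q = 2 * k * p + 1
        if q % 8 not in (1, 7):
            continue  # q cannot divide M_p, so skip it without testing
        if pow(2, p, q) == 1:  # 2**p == 1 (mod q)  <=>  q divides 2**p - 1
            return q
    return None


print(smallest_mersenne_factor(11, 10))    # 23, since 2047 = 23 * 89
print(smallest_mersenne_factor(29, 10))    # 233
print(smallest_mersenne_factor(31, 1000))  # None, M31 is prime
```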