mersenneforum.org  

Old 2012-01-30, 00:30   #364
KyleAskine
 
Oct 2011
Maryland

2·5·29 Posts
Default

Quote:
Originally Posted by kracker View Post
+1 Here also.
Yes, it works fantastic on Win 7. But my Linux box bombed when I installed it. I had to roll back.
Old 2012-01-30, 14:57   #365
bcp19
 
Oct 2011

7×97 Posts
Default

Is there a... dunno the right 'word'... a changeover in mfakto around 29.504-29.505M? I just noticed that my GPU is now taking 51-55 minutes to complete 29.505M exps, when it had been taking 43 minutes for 29.503M ones.

Code:
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.718s |  99.82M/s |        5000 |  0m00s |    6169us
no factor for M29504119 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 40.808s
got assignment: exp=29504159 bit_min=68 bit_max=69
tf(29504159, 68, 69, ...);
 k_min = 5001801697200 -  k_max = 10003603396367
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4612/4620 |    271.32M |  2.727s |  99.49M/s |        5000 |  0m00s |    6275us
no factor for M29504159 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 41.343s
got assignment: exp=29504177 bit_min=68 bit_max=69
tf(29504177, 68, 69, ...);
 k_min = 5001798643380 -  k_max = 10003597293337
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4612/4620 |    271.32M |  2.727s |  99.49M/s |        5000 |  0m00s |    6264us
no factor for M29504177 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 41.141s
got assignment: exp=29504227 bit_min=68 bit_max=69
tf(29504227, 68, 69, ...);
 k_min = 5001790165680 -  k_max = 10003580340517
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.715s |  99.93M/s |        5000 |  0m00s |    6165us
no factor for M29504227 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 41.050s
got assignment: exp=29504269 bit_min=68 bit_max=69
tf(29504269, 68, 69, ...);
 k_min = 5001783046260 -  k_max = 10003566100192
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4616/4620 |    271.32M |  2.727s |  99.49M/s |        5000 |  0m00s |    6235us
no factor for M29504269 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 41.435s
got assignment: exp=29504351 bit_min=68 bit_max=69
tf(29504351, 68, 69, ...);
 k_min = 5001769144680 -  k_max = 10003538297770
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4609/4620 |    271.32M |  2.820s |  96.21M/s |        5000 |  0m00s |    6629us
no factor for M29504351 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 48m 29.711s
got assignment: exp=29504383 bit_min=68 bit_max=69
tf(29504383, 68, 69, ...);
 k_min = 5001763720800 -  k_max = 10003527448086
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4613/4620 |    271.32M |  2.787s |  97.35M/s |        5000 |  0m00s |    6448us
no factor for M29504383 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 52m 37.465s
got assignment: exp=29504399 bit_min=68 bit_max=69
tf(29504399, 68, 69, ...);
 k_min = 5001761008860 -  k_max = 10003522023253
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.789s |  97.28M/s |        5000 |  0m00s |    6405us
no factor for M29504399 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 52m 11.283s
got assignment: exp=29504443 bit_min=68 bit_max=69
tf(29504443, 68, 69, ...);
 k_min = 5001753552180 -  k_max = 10003507104992
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.832s |  95.80M/s |        5000 |  0m00s |    6659us
no factor for M29504443 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 56m 20.353s
got assignment: exp=29504507 bit_min=68 bit_max=69
tf(29504507, 68, 69, ...);
 k_min = 5001742699800 -  k_max = 10003485405784
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.824s |  96.08M/s |        5000 |  0m00s |    6597us
no factor for M29504507 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 55m 39.322s
got assignment: exp=29504509 bit_min=68 bit_max=69
tf(29504509, 68, 69, ...);
 k_min = 5001742362540 -  k_max = 10003484727685
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4619/4620 |    271.32M |  2.823s |  96.11M/s |        5000 |  0m00s |    6633us
no factor for M29504509 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 51m 11.275s
got assignment: exp=29504569 bit_min=68 bit_max=69
tf(29504569, 68, 69, ...);
 k_min = 5001732189300 -  k_max = 10003464384765
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4616/4620 |    271.32M |  2.721s |  99.71M/s |        5000 |  0m00s |    6085us
no factor for M29504569 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 52m 12.215s
got assignment: exp=29504669 bit_min=68 bit_max=69
tf(29504669, 68, 69, ...);
 k_min = 5001715238520 -  k_max = 10003430480082
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4612/4620 |    271.32M |  2.818s |  96.28M/s |        5000 |  0m00s |    6580us
no factor for M29504669 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 54m 25.430s
got assignment: exp=29504677 bit_min=68 bit_max=69
tf(29504677, 68, 69, ...);
 k_min = 5001713880240 -  k_max = 10003427767718
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4619/4620 |    271.32M |  2.822s |  96.14M/s |        5000 |  0m00s |    6607us
no factor for M29504677 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 53m 29.768s
got assignment: exp=29504693 bit_min=68 bit_max=69
tf(29504693, 68, 69, ...);
 k_min = 5001711168300 -  k_max = 10003422342993
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4615/4620 |    271.32M |  2.846s |  95.33M/s |        5000 |  0m00s |    6649us
no factor for M29504693 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 51m 17.654s
got assignment: exp=29504773 bit_min=68 bit_max=69
tf(29504773, 68, 69, ...);
 k_min = 5001697608600 -  k_max = 10003395219456
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4616/4620 |    271.32M |  2.817s |  96.31M/s |        5000 |  0m00s |    6581us
no factor for M29504773 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 53m 33.822s
got assignment: exp=29504801 bit_min=68 bit_max=69
tf(29504801, 68, 69, ...);
 k_min = 5001692859240 -  k_max = 10003385726253
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4615/4620 |    271.32M |  2.757s |  98.41M/s |        5000 |  0m00s |    6250us
no factor for M29504801 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 53m 46.318s
got assignment: exp=29504863 bit_min=68 bit_max=69
tf(29504863, 68, 69, ...);
 k_min = 5001682348740 -  k_max = 10003364705653
Using GPU kernel "mfakto_cl_barrett79"
    class | candidates |    time | avg. rate | SievePrimes |    ETA | avg. wait
4617/4620 |    271.32M |  2.831s |  95.84M/s |        5000 |  0m00s |    6653us
no factor for M29504863 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 51m 21.457s
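To see where the slowdown starts, the per-exponent totals can be pulled out of a log like the one above. This is only a sketch: the sample text is copied from three entries of the log, `total_times` is a made-up helper name (not part of mfakto), and the regular expressions assume the "Nm S.SSSs" total format shown here. Runs longer than an hour may print differently.

```python
import re

# Three result/total pairs copied from the log above.
LOG = """\
no factor for M29504269 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 43m 41.435s
no factor for M29504351 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 48m 29.711s
no factor for M29504383 from 2^68 to 2^69 [mfakto 0.09-Win mfakto_cl_barrett79]
tf(): total time spent: 52m 37.465s
"""

RESULT = re.compile(r"no factor for M(\d+) ")
TOTAL = re.compile(r"tf\(\): total time spent:\s*(\d+)m\s*([\d.]+)s")


def total_times(log_text):
    """Return [(exponent, seconds)], pairing each result line with the next total."""
    rows = []
    exponent = None
    for line in log_text.splitlines():
        result = RESULT.search(line)
        total = TOTAL.search(line)
        if result:
            exponent = int(result.group(1))
        elif total and exponent is not None:
            rows.append((exponent, int(total.group(1)) * 60 + float(total.group(2))))
            exponent = None
    return rows


for exponent, secs in total_times(LOG):
    print(f"M{exponent}: {secs / 60:.1f} min")
```

On the full log this puts the first slow result at M29504351.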
Old 2012-01-31, 15:45   #366
bcp19
 
Oct 2011

7×97 Posts
Default

mfakto just moved up to 29.507M exps and has dropped back down to 43 min per exp. Interesting/weird little bump in that small a range.
Old 2012-01-31, 21:05   #367
KyleAskine
 
Oct 2011
Maryland

2×5×29 Posts
Default

Quote:
Originally Posted by KyleAskine View Post
Yes, it works fantastic on Win 7. But my Linux box bombed when I installed it. I had to roll back.
I think this was an installation issue. It now works.
Old 2012-02-01, 00:09   #368
Bdot
 
Nov 2010
Germany

3×199 Posts
Default

Quote:
Originally Posted by bcp19 View Post
mfakto just moved up to 29.507M exps and has dropped back down to 43 min per exp. Interesting/weird little bump in that small a range.
I bet if you do the range again, it will be fast. I'd blame Windows and all the background tasks it is performing (indexer, backup, Wupdate, defender / virus scanning ...). Even if it is none of that I'd try blaming Windows ;-)
Old 2012-02-01, 00:23   #369
Bdot
 
Nov 2010
Germany

1001010101₂ Posts
Default

Quote:
Originally Posted by KyleAskine View Post
I think this was an installation issue. It now works.
I have 12.1 running for 2 days on my Linux box now. No aborts nor other issues. So I think it was some old library still on the system, or old values in LD_LIBRARY_PATH.
Old 2012-02-01, 00:56   #370
bcp19
 
Oct 2011

1010100111₂ Posts
Default

Quote:
Originally Posted by Bdot View Post
I bet if you do the range again, it will be fast. I'd blame Windows and all the background tasks it is performing (indexer, backup, Wupdate, defender / virus scanning ...). Even if it is none of that I'd try blaming Windows ;-)
I can't imagine what process would run for 16+ hours impacting core 4 without affecting other items. Core 1 running 332M LL stayed at .168ms/iter, Core 2 still averaged 20 min 41-46sec on mfaktc, core 3 running 45M LL stayed at .017 ms/iter. So, something impacted Core 4? M/s stayed at 97-99. Odd thing to me was I caught 2 exps ending, and after seeing the ~54min post, the first class printed said 43min to go.
Old 2012-02-01, 12:49   #371
Bdot
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by bcp19 View Post
I can't imagine what process would run for 16+ hours impacting core 4 without affecting other items. Core 1 running 332M LL stayed at .168ms/iter, Core 2 still averaged 20 min 41-46sec on mfaktc, core 3 running 45M LL stayed at .017 ms/iter. So, something impacted Core 4? M/s stayed at 97-99. Odd thing to me was I caught 2 exps ending, and after seeing the ~54min post, the first class printed said 43min to go.
Even after thinking about this a little more, I have no explanation. I thought about the difference in the exponents. The barrett kernels need a tiny bit longer to process a "1" than a "0" in the binary representation of the exponent. Usually, the first 7 bits are preprocessed on the host, so they don't count. Of the remaining bits, M29504269 has just 8 ones and M29504399 has 10. I doubt this would really be measurable, and it certainly cannot account for +25% runtime.
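The two bit counts are easy to check. A minimal sketch, assuming only what the post says (the leading 7 bits of the exponent are handled on the host and every later "1" bit costs the kernel a little extra); `ones_after_prefix` is a hypothetical helper name, not an mfakto function:

```python
def ones_after_prefix(exponent, host_bits=7):
    """Count the "1" bits of `exponent` after its first `host_bits` binary digits."""
    bits = bin(exponent)[2:]          # binary digits, most significant first
    return bits[host_bits:].count("1")


for p in (29504269, 29504399):
    print(f"M{p}: {bin(p)[2:]} -> {ones_after_prefix(p)} ones after the first 7 bits")
```

Both exponents are 25 bits long, so 18 bits are left for the kernel in each case and the two runs differ by only two "1" bits.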
mfakto 0.09 still wrote checkpoints after each class. You have 11 min = 660 s more runtime. That is 660/960 ≈ 0.7 s more per class. As the reported times per class do not fluctuate by that much, the delay is most likely in the host code. If you don't have any task specifically pinned to core #4, and the other tasks are not affected, this really just leaves disk access as the culprit. Which mfaktc version are you running? If it is older than 0.18, then mfaktc would also write checkpoints after every class, so it should also be delayed by ~0.7 s per class ... but if you did not switch to the latest mfakto version, then I assume you also did not switch to the latest mfaktc version. And if mfaktc < 0.18 was not affected, I'm at the end of my knowledge/guesswork.
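The per-class estimate can be redone with figures from the posted log. This is a rough sketch, not a measurement: it uses 960 classes per exponent as in the calculation above, takes one fast run (M29504269) and one slow run (M29504399) from the log, and treats the single class time printed for each run as typical of the whole run, which is an assumption. `overhead_per_class` is a made-up helper name.

```python
CLASSES = 960  # classes processed per exponent, the divisor used above


def to_seconds(minutes, seconds):
    return minutes * 60 + seconds


def overhead_per_class(total_seconds, class_seconds, classes=CLASSES):
    """Wall time per class that the reported per-class time does not explain."""
    return total_seconds / classes - class_seconds


# (total run time, per-class time printed for that run), both from the log
fast = (to_seconds(43, 41.435), 2.727)   # M29504269
slow = (to_seconds(52, 11.283), 2.789)   # M29504399

print(f"estimate from 11 min extra: {660 / CLASSES:.4f} s per class")
print(f"fast run unexplained: {overhead_per_class(*fast):.3f} s per class")
print(f"slow run unexplained: {overhead_per_class(*slow):.3f} s per class")
```

In the fast run the printed class time accounts for almost the whole total, while the slow run leaves roughly half a second per class unaccounted for. That fits the reading that the extra time is spent outside the timed per-class work.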
BTW, both indexing and virus scan can take forever if you have lots of files. On a dev machine with some GB of subversion repositories (~500k files including the .svn ones), it did not finish within one day - I had to disable them.
Old 2012-02-01, 19:13   #372
bcp19
 
Oct 2011

1010100111₂ Posts
Default

Quote:
Originally Posted by Bdot View Post
Even after thinking about this a little more, I have no explanation. I thought about the difference in the exponents. The barrett kernels need a tiny bit longer to process a "1" than a "0" in the binary representation of the exponent. Usually, the first 7 bits are preprocessed on the host, so they don't count. Of the remaining bits, M29504269 has just 8 ones and M29504399 has 10. I doubt this would really be measurable, and it certainly cannot account for +25% runtime.
mfakto 0.09 still wrote checkpoints after each class. You have 11 min = 660 s more runtime. That is 660/960 ≈ 0.7 s more per class. As the reported times per class do not fluctuate by that much, the delay is most likely in the host code. If you don't have any task specifically pinned to core #4, and the other tasks are not affected, this really just leaves disk access as the culprit. Which mfaktc version are you running? If it is older than 0.18, then mfaktc would also write checkpoints after every class, so it should also be delayed by ~0.7 s per class ... but if you did not switch to the latest mfakto version, then I assume you also did not switch to the latest mfaktc version. And if mfaktc < 0.18 was not affected, I'm at the end of my knowledge/guesswork.
BTW, both indexing and virus scan can take forever if you have lots of files. On a dev machine with some GB of subversion repositories (~500k files including the .svn ones), it did not finish within one day - I had to disable them.
I'm running .18 on mfaktc and .09 on mfakto. Disk access should not be a factor since it is being run from a ramdisk and I highly doubt my ram has a latency of .7s.
Old 2012-02-01, 19:20   #373
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9782₁₀ Posts
Default

Quote:
Originally Posted by bcp19 View Post
I'm running .18 on mfaktc and .09 on mfakto. Disk access should not be a factor since it is being run from a ramdisk and I highly doubt my ram has a latency of .7s.
Things which make you go "Hmmmm... That's unusual..." are things which should be investigated.

Perhaps these exponents should be run again by another mfakto worker (using the same code) to see if the same behaviour is observed.

Last fiddled with by chalsall on 2012-02-01 at 19:24 Reason: Important that the experiment be re-run with the same code.
Old 2012-02-01, 23:31   #374
Bdot
 
Nov 2010
Germany

1001010101₂ Posts
Default

Quote:
Originally Posted by chalsall View Post
Things which make you go "Hmmmm... That's unusual..." are things which should be investigated.

Perhaps these exponents should be run again by another mfakto worker (using the same code) to see if the same behaviour is observed.
Yes, you're right - if it is reproducible at all.

bcp19, could you please rerun one of the slow exponents, just to make sure it is something in mfakto? If it is slow again, then I'd like to know which Windows version you're running, and which Catalyst version, so I can set up the same ...

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.