#331
"Oliver"
Mar 2005
Germany
11·101 Posts
Hi Ludovic,
Quote:
Code:
./mfaktc.exe -st
Quote:
Oliver

#332
Oct 2002
France
3³×5 Posts
So, roughly, it will be tested in 60 hours, i.e. 2 1/2 days.
Code:
tf(3321931967, 76, 77, ...);
 k_min = 11372578438620
 k_max = 22745156877535
Using GPU kernel "95bit_mul32"
class 0: tested 5293211648 candidates in 8622263ms (613900/sec) (avg. wait: 1657638usec)
avg. wait > 500usec, increasing SievePrimes to 27000
class 4: tested 5257560064 candidates in 8564600ms (613871/sec) (avg. wait: 1656671usec)
avg. wait > 500usec, increasing SievePrimes to 29000
class 9: tested 5225054208 candidates in 8509699ms (614011/sec) (avg. wait: 1653730usec)
avg. wait > 500usec, increasing SievePrimes to 31000
class 12: tested 5195694080 candidates in 8461917ms (614009/sec) (avg. wait: 1654384usec)
avg. wait > 500usec, increasing SievePrimes to 33000
class 24: tested 5168431104 candidates in 8417560ms (614005/sec) (avg. wait: 1655990usec)
avg. wait > 500usec, increasing SievePrimes to 35000
class 25: tested 5142216704 candidates in 8374795ms (614011/sec) (avg. wait: 1655011usec)
avg. wait > 500usec, increasing SievePrimes to 37000
class 28: tested 5118099456 candidates in 8335604ms (614004/sec) (avg. wait: 1653319usec)
avg. wait > 500usec, increasing SievePrimes to 39000
class 33: tested 5096079360 candidates in 8299648ms (614011/sec) (avg. wait: 1651130usec)
avg. wait > 500usec, increasing SievePrimes to 41000
class 37: tested 5074059264 candidates in 8263817ms (614009/sec) (avg. wait: 1652407usec)
avg. wait > 500usec, increasing SievePrimes to 43000
class 40: tested 5054136320 candidates in 8231369ms (614009/sec) (avg. wait: 1651757usec)
avg. wait > 500usec, increasing SievePrimes to 45000
class 45: tested 5035261952 candidates in 8201827ms (613919/sec) (avg. wait: 1647743usec)
avg. wait > 500usec, increasing SievePrimes to 47000
class 49: tested 5017436160 candidates in 8176214ms (613662/sec) (avg. wait: 1642147usec)
avg. wait > 500usec, increasing SievePrimes to 49000
class 52: tested 4999610368 candidates in 8142560ms (614009/sec) (avg. wait: 1635144usec)
avg. wait > 500usec, increasing SievePrimes to 51000
Code:
Compiletime Options
  THREADS_PER_GRID_MAX 1048576
  THREADS_PER_BLOCK    256
  SIEVE_SIZE_LIMIT     32kiB
  SIEVE_SIZE           230945bits
  VERBOSE_TIMING       disabled
  MORE_CLASSES         disabled
Runtime Options
  SievePrimes          25000
  SievePrimesAdjust    1
  NumStreams           3
  WorkFile             worktodo.txt
  Checkpoints          enabled
  Stages               enabled
  StopAfterFactor      bitlevel
CUDA device info
  name:                      Quadro NVS 140M
  compute capability:        1.1
  maximum threads per block: 512
  number of multiprocessors: 2 (16 shader cores)
  clock rate:                800MHz
Automatic parameters
  threads per grid: 1048576
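The k_min and k_max in the log header follow from the factor form q = 2kp + 1: they bracket the k whose q lies between 2^76 and 2^77. A minimal sketch, assuming (inferred from the logged numbers, not taken from mfaktc's source) that k_min is rounded down to a multiple of the 420 classes while k_max is not; `k_range` is a hypothetical helper name.

```python
def k_range(p: int, bit_min: int, bit_max: int, num_classes: int = 420) -> tuple[int, int]:
    """Return (k_min, k_max) for factor candidates q = 2*k*p + 1 of 2^p - 1
    between 2^bit_min and 2^bit_max.

    Assumption (inferred from the log): k_min is aligned down to a multiple of
    num_classes, so the first few candidates may lie slightly below 2^bit_min.
    """
    k_min = (1 << bit_min) // (2 * p)
    k_min -= k_min % num_classes  # align to a class boundary
    k_max = (1 << bit_max) // (2 * p)  # largest k with 2*k*p + 1 < 2^bit_max
    return k_min, k_max


k_min, k_max = k_range(3321931967, 76, 77)
print(f"k_min = {k_min} k_max = {k_max}")
# k_min = 11372578438620 k_max = 22745156877535
```

Both values match the ones mfaktc printed for tf(3321931967, 76, 77, ...).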

#333
"Oliver"
Mar 2005
Germany
11·101 Posts
Hi Ludovic,
for your next run (if you do any) I recommend that you edit mfaktc.ini and set SievePrimes=100000. Your CPU is easily capable of feeding your GPU with factor candidates fast enough.
You had problems with mfaktc 0.09 (cudaStreamCreate() failed); how often have you tried it? I've noticed on my system that driver 256.35 sometimes isn't capable of running even the simplest examples from the SDK. In this case I shut down my X server, reload the nvidia kernel module and start X again. The code for memory/stream allocation is unchanged between 0.09 and 0.10 (except for a printf() change).
---
Luigi, any idea how Ludovic's timings compare to a recent CPU running factor5?
Oliver
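Why set SievePrimes=100000 up front? In Ludovic's log, every class with an average wait above 500 usec raised SievePrimes by 2000 (25000, 27000, ..., 51000). A rough simulation, assuming that fixed +2000 step per class (read off the log; mfaktc's real adjustment rule may differ) and treating 100000 as the target, counts the classes spent ramping up. `classes_until_target` is a hypothetical name.

```python
def classes_until_target(start: int, target: int, step: int = 2000) -> int:
    """Count classes needed for SievePrimes to climb from start to target,
    assuming each class raises it by `step` (as seen in the log while the
    average wait stays above 500 usec)."""
    sieve_primes = start
    classes = 0
    while sieve_primes < target:
        sieve_primes = min(sieve_primes + step, target)
        classes += 1
    return classes


print(classes_until_target(25000, 51000))   # 13, the classes shown in the log
print(classes_until_target(25000, 100000))  # 38
```

Under that assumption, roughly 38 classes would run with less sieving than the recommended setting.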

#334
Banned
"Luigi"
Aug 2002
Team Italia
61·79 Posts
Quote:
[Done] About 3.25 days on an i5-750 @ 2.67 GHz with 4 cores used.
Luigi
Last fiddled with by ET_ on 2010-07-29 at 13:56

#335
"Oliver"
Mar 2005
Germany
457₁₆ Posts
Hi Luigi, Ludovic,
so when I compare those numbers I would say that it is perhaps not worth running mfaktc on this GPU.
Once SievePrimes reaches 100000 I assume about 8000000 ms per class. That is 8000 seconds, or a bit over 2 hours, for one class: 8000 s * 96 (classes) = 768000 s = ~213 h = ~8.9 days!
OK, it uses just one CPU core (at unknown speed), but compared to your i5-750 this isn't much faster than a single core of an i7.
Oliver
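The estimate above is a plain product of time per class and number of classes. A small sketch of that arithmetic, assuming 96 classes per bit level (the 420-class build) and a constant time per class once SievePrimes has settled; `estimate_run` is a hypothetical name.

```python
SECONDS_PER_HOUR = 3600
SECONDS_PER_DAY = 86400


def estimate_run(seconds_per_class: float, classes: int = 96) -> tuple[float, float]:
    """Return (hours, days) for one bit level, assuming a constant time per class."""
    total_seconds = seconds_per_class * classes
    return total_seconds / SECONDS_PER_HOUR, total_seconds / SECONDS_PER_DAY


hours, days = estimate_run(8000)  # assumed time per class at SievePrimes=100000
print(f"{hours:.0f} h = {days:.1f} days")  # 213 h = 8.9 days
```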

#336
Oct 2002
France
3³·5 Posts
Hi,
I run mfaktc on a laptop with a Core 2 Duo: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz. It's not an i7 quad core, so 2 1/2 days on one core, with the other core free for other crunching, is not bad.
I was looking for factor5 for 32-bit Linux but didn't find it, so I tried mfaktc. I will let mfaktc finish its task; then I will try SievePrimes=100000 as you suggest, to see if there is any improvement.
About the difference between the 2 versions, I think it's the compilation option in the Makefile (the -arch).
Ludovic
Last fiddled with by Aillas on 2010-07-29 at 14:42

#337
Oct 2002
France
3³·5 Posts
Hi,
I must be missing something, or I don't understand how classes work. I thought I would finish my exponent this weekend, but this morning I found this:
Code:
class 273: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)
Code:
class 96: tested 5118099456 candidates in ....
In one of your posts, you said:
Quote:
Code:
If MORE_CLASSES is defined then the whole TF process is split into 4620 (4 * 3*5*7*11) classes. Otherwise it will be split into 420 (4 * 3*5*7) classes. With 4620 the siever runs a bit more efficiently at the cost of 10 times more sieve initializations. This allows SIEVE_PRIMES to be increased a little bit further. This starts to become useful on my system for e.g. TF M66xxxxxx from 2^66 to
Code:
class 420: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)
(420-273) * 7700 sec = 314 hours = 13 days (??)
Thanks
Ludovic

#338
Banned
"Luigi"
Aug 2002
Team Italia
12D3₁₆ Posts
Quote:

#339
"Oliver"
Mar 2005
Germany
11×101 Posts
Hi Aillas,
Quote:
You're running without MORE_CLASSES, so there are 420 classes, but most of them can be eliminated entirely; 96 classes remain. So take the time for one class and multiply by 96. See post #335, where I predicted 8.9 days. With your latest timing (7693 seconds per class) you're at 7693 s * 96 classes = 8.55 days.
You can see that the time needed per class goes down over time on your system, because SievePrimes increases all the way up to 100k. This removes more candidates by sieving; you can see that the number of candidates tested per class goes down from the start.
Next time you can start directly with SievePrimes=100000 in mfaktc.ini. You're totally GPU-bound (very high average wait), so your CPU can easily handle sieving that much.
Oliver
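The 96 out of 420 can be reproduced with the usual argument: any factor q = 2kp + 1 of 2^p - 1 is 1 or 7 mod 8, and a class is useless if every candidate in it is divisible by 3, 5 or 7 (or 11 with MORE_CLASSES). That leaves 2·2·4·6 = 96 of 420 classes, and 960 of 4620. This is a sketch of that reasoning, not mfaktc's own code; it assumes k_min is a multiple of the class count (as the logged k_min = 11372578438620 is), so class c simply means k ≡ c (mod 420). `surviving_classes` is a hypothetical name.

```python
def surviving_classes(p: int, num_classes: int = 420) -> list[int]:
    """Classes c (k = c mod num_classes) that may contain factors q = 2*k*p + 1 of 2^p - 1.

    Assumes num_classes is a multiple of 4 (e.g. 420 or 4620) and that k_min
    is a multiple of num_classes.
    """
    small_primes = [r for r in (3, 5, 7, 11) if num_classes % r == 0]
    survivors = []
    for c in range(num_classes):
        # Every candidate in class c has the same residue as q mod 8, 3, 5, 7 (and 11)
        q = 2 * c * p + 1
        if q % 8 not in (1, 7):
            continue  # factors of Mersenne numbers are +-1 mod 8
        if any(q % r == 0 for r in small_primes):
            continue  # every candidate in this class has a small factor
        survivors.append(c)
    return survivors


classes = surviving_classes(3321931967)
print(len(classes))   # 96
print(classes[:13])   # [0, 4, 9, 12, 24, 25, 28, 33, 37, 40, 45, 49, 52]
print(len(surviving_classes(3321931967, 4620)))  # 960
```

The first 13 surviving classes are exactly the class numbers in Ludovic's log, and class 273 is one of the 96: the class numbers run up to 419, but only 96 classes are actually tested.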

#340
Oct 2002
France
3³×5 Posts
Thanks for the explanation.
Back to crunching now. Just do it

#341
"Oliver"
Mar 2005
Germany
11·101 Posts
Hello,
just some GPU benchmarks for those who are interested. Values are the raw GPU speed, without paying attention to the sieve performance. On GTX 4x0 you'll usually need two instances of mfaktc to utilize the GPU 100%. Percentages are the speed compared to the 71bit kernel.
Slightly factory-overclocked GTX 275 (1458 MHz SP clock, reference 1404 MHz):
Code:
0.09-pre5
kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
 71bit |  80.8M/s                |  62.3M/s
 75bit |  62.5M/s   77.4%        |  48.2M/s   77.4%
 95bit |  52.2M/s   64.6%        |  40.3M/s   64.7%
Code:
0.10-pre7
kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
 71bit |  85.2M/s                |  65.8M/s
 75bit | 145.2M/s  170.4%        | 112.2M/s  170.5%
 95bit | 120.2M/s  141.1%        |  92.8M/s  141.0%
Code:
0.10-pre7
kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
 71bit | 102.7M/s                |  79.7M/s
 75bit | 183.8M/s  179.0%        | 143.8M/s  180.4%
 95bit | 155.4M/s  151.3%        | 121.2M/s  152.1%
My GTX 275 feels kind of slow these days.
Oliver
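The percentages in these tables are each kernel's throughput divided by the 71bit kernel's throughput for the same exponent and range. A tiny helper (the name `relative_speeds` is illustrative) reproduces them, here for the 0.10-pre7 run on the GTX 275:

```python
def relative_speeds(base: float, others: dict[str, float]) -> dict[str, float]:
    """Express each kernel's speed as a percentage of the base (71bit) kernel, to one decimal."""
    return {name: round(100 * speed / base, 1) for name, speed in others.items()}


# M66362159 from 2^64 to 2^67, speeds in M/s
print(relative_speeds(85.2, {"75bit": 145.2, "95bit": 120.2}))
# {'75bit': 170.4, '95bit': 141.1}
```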