Hi Ludovic,
[QUOTE=Aillas;223167]This is the standard behavior. So it's ok for me. I didn't want to run the program many days for nothing.[/QUOTE]
You can (and should) run the built-in selftest:
[CODE]./mfaktc.exe -st[/CODE]
This might remove your checkpoint file, so perhaps make a copy of the mfaktc.ckp file first.
[QUOTE=Aillas;223167]Now, I'm curious how many days it will take to sieve 3321931967 from 76 to 77 bit on a Quatro 140 M.[/QUOTE]
Once it finishes the first class you can extrapolate the total runtime. In the standard configuration there are 96 classes (with MORE_CLASSES enabled in params.h there are 960 classes), so just multiply the time for the first class by 96. I assume that you left the defaults in mfaktc.ini. I think SievePrimes will increase from class to class on your system, up to 100000, and the time per class will go down a little bit.

Oliver
So roughly, it will be tested in about 60 hours, 2 1/2 days.
[CODE]tf(3321931967, 76, 77, ...);
k_min = 11372578438620
k_max = 22745156877535
Using GPU kernel "95bit_mul32"
class 0: tested 5293211648 candidates in 8622263ms (613900/sec) (avg. wait: 1657638usec)
avg. wait > 500usec, increasing SievePrimes to 27000
class 4: tested 5257560064 candidates in 8564600ms (613871/sec) (avg. wait: 1656671usec)
avg. wait > 500usec, increasing SievePrimes to 29000
class 9: tested 5225054208 candidates in 8509699ms (614011/sec) (avg. wait: 1653730usec)
avg. wait > 500usec, increasing SievePrimes to 31000
class 12: tested 5195694080 candidates in 8461917ms (614009/sec) (avg. wait: 1654384usec)
avg. wait > 500usec, increasing SievePrimes to 33000
class 24: tested 5168431104 candidates in 8417560ms (614005/sec) (avg. wait: 1655990usec)
avg. wait > 500usec, increasing SievePrimes to 35000
class 25: tested 5142216704 candidates in 8374795ms (614011/sec) (avg. wait: 1655011usec)
avg. wait > 500usec, increasing SievePrimes to 37000
class 28: tested 5118099456 candidates in 8335604ms (614004/sec) (avg. wait: 1653319usec)
avg. wait > 500usec, increasing SievePrimes to 39000
class 33: tested 5096079360 candidates in 8299648ms (614011/sec) (avg. wait: 1651130usec)
avg. wait > 500usec, increasing SievePrimes to 41000
class 37: tested 5074059264 candidates in 8263817ms (614009/sec) (avg. wait: 1652407usec)
avg. wait > 500usec, increasing SievePrimes to 43000
class 40: tested 5054136320 candidates in 8231369ms (614009/sec) (avg. wait: 1651757usec)
avg. wait > 500usec, increasing SievePrimes to 45000
class 45: tested 5035261952 candidates in 8201827ms (613919/sec) (avg. wait: 1647743usec)
avg. wait > 500usec, increasing SievePrimes to 47000
class 49: tested 5017436160 candidates in 8176214ms (613662/sec) (avg. wait: 1642147usec)
avg. wait > 500usec, increasing SievePrimes to 49000
class 52: tested 4999610368 candidates in 8142560ms (614009/sec) (avg. wait: 1635144usec)
avg. wait > 500usec, increasing SievePrimes to 51000
[/CODE]
The config:
[CODE]Compiletime Options
  THREADS_PER_GRID_MAX       1048576
  THREADS_PER_BLOCK          256
  SIEVE_SIZE_LIMIT           32kiB
  SIEVE_SIZE                 230945bits
  VERBOSE_TIMING             disabled
  MORE_CLASSES               disabled
Runtime Options
  SievePrimes                25000
  SievePrimesAdjust          1
  NumStreams                 3
  WorkFile                   worktodo.txt
  Checkpoints                enabled
  Stages                     enabled
  StopAfterFactor            bitlevel
CUDA device info
  name:                      Quadro NVS 140M
  compute capability:        1.1
  maximum threads per block: 512
  number of multiprocessors: 2 (16 shader cores)
  clock rate:                800MHz
Automatic parameters
  threads per grid:          1048576
[/CODE]
Hi Ludovic,
for your next run (if you do any) I recommend that you edit mfaktc.ini and set SievePrimes=100000. Your CPU is easily capable of feeding your GPU with factor candidates fast enough.

You had problems with mfaktc 0.09 (cudaStreamCreate() failed); how often have you tried it? I've noticed on my system that the 256.35 driver sometimes isn't capable of running even the simplest examples from the SDK. In this case I shut down my X server, reload the nvidia kernel module and start X again. The code for memory/stream allocation is unchanged between 0.09 and 0.10 (except a printf() change).

---

Luigi, any idea how Ludovic's timings compare to a recent CPU running factor5?

Oliver
[QUOTE=TheJudger;223230]Hi Ludovic,
for your next run (if you do any) I recommend that you edit mfaktc.ini and set SievePrimes=100000. Your CPU is easily capable of feeding your GPU with factor candidates fast enough. You had problems with mfaktc 0.09 (cudaStreamCreate() failed); how often have you tried it? I've noticed on my system that the 256.35 driver sometimes isn't capable of running even the simplest examples from the SDK. In this case I shut down my X server, reload the nvidia kernel module and start X again. The code for memory/stream allocation is unchanged between 0.09 and 0.10 (except a printf() change). --- Luigi, any idea how Ludovic's timings compare to a recent CPU running factor5? Oliver[/QUOTE]
I'm running some benchmarks just now, I'll tell you soon.

[Done] About 3.25 days on an i5-750 @ 2.67 GHz with 4 cores used.

Luigi
Hi Luigi, Ludovic,
so when I compare those numbers I would say that it is perhaps not worth running mfaktc on this GPU. :sad:

Once SievePrimes reaches 100000 I assume about 8000000 ms per class. This is 8000 seconds, or a bit over 2 hours, for one class: 8000s * 96 classes = 768000s = ~213h = ~8.9 days! OK, it is just one CPU core (at unknown speed), but compared to your i5-750 this isn't much faster than a single core of an i7.

Oliver
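For anyone wanting to redo the extrapolation with their own class times, the arithmetic in the post above can be sketched as follows (the numbers are taken from the post; Python is used purely as a calculator):

```python
# Writing out the estimate above: roughly 8,000,000 ms per class once
# SievePrimes reaches 100000, times the 96 classes of the default build.
time_per_class_s = 8000.0  # 8000000 ms, from the post above
classes = 96               # default build (MORE_CLASSES disabled)

total_s = time_per_class_s * classes
print(f"{total_s:.0f} s = {total_s / 3600:.1f} h = {total_s / 86400:.1f} days")
# 768000 s = 213.3 h = 8.9 days
```

Substitute your own per-class time (and 960 classes if MORE_CLASSES is enabled) to get an estimate for your hardware.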
Hi,
I run mfaktc on a laptop with a Core 2 Duo: Intel(R) Core(TM)2 Duo CPU T7500 @ 2.20GHz. It's not an i7 quad core, so 2 1/2 days with one core, while the other is free for other crunching, isn't bad :smile:

I was looking for factor5 for Linux 32bit but couldn't find it, so I tried mfaktc. I will let mfaktc finish its task; I will then try with SievePrimes=100000 as you suggest, to see if there is any improvement.

About the difference between the two versions, I think it's about the compilation options in the Makefile (the -arch flag).

Ludovic
Hi,
I must be missing something, or I don't understand how classes work. I thought I would finish my exponent this weekend, but found this this morning:
[CODE]class 273: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)[/CODE]
When you talked about 96 classes, I thought the program would end when something like the following was displayed:
[CODE]class 96: tested 5118099456 candidates in ....[/CODE]
So I read this thread a little more carefully and found something about MORE_CLASSES. In one of your posts, you said:
[QUOTE]Once it finishes the first class you can extrapolate the total runtime. In standard configuration there are 96 classes (and with MORE_CLASSES in params.h enabled there are 960 classes).[/QUOTE]
I opened params.h:
[CODE]If MORE_CLASSES is defined then the whole TF process is split into 4620 (4 * 3*5*7*11) classes. Otherwise it will be split into 420 (4 * 3*5*7) classes. With 4620 the siever runs a bit more efficiently at the cost of 10 times more sieve initializations. This will allow to increase SIEVE_PRIMES a little bit further. This starts to become useful on my system for e.g. TF M66xxxxxx from 2^66 to[/CODE]
So, does that mean I will finish my exponent when I reach class 420?
[CODE]class 420: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)[/CODE]
If so, the time remaining to finish this exponent should be: (420-273) * 7700 sec = 314 hours = 13 days (??)

Thanks
Ludovic
[QUOTE=Aillas;223688]Hi,
I must be missing something, or I don't understand how classes work. I thought I would finish my exponent this weekend, but found this this morning:
[CODE]class 273: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)[/CODE]
When you talked about 96 classes, I thought the program would end when something like the following was displayed:
[CODE]class 96: tested 5118099456 candidates in ....[/CODE]
So I read this thread a little more carefully and found something about MORE_CLASSES. In one of your posts, you said: I opened params.h:
[CODE]If MORE_CLASSES is defined then the whole TF process is split into 4620 (4 * 3*5*7*11) classes. Otherwise it will be split into 420 (4 * 3*5*7) classes. With 4620 the siever runs a bit more efficiently at the cost of 10 times more sieve initializations. This will allow to increase SIEVE_PRIMES a little bit further. This starts to become useful on my system for e.g. TF M66xxxxxx from 2^66 to[/CODE]
So, does that mean I will finish my exponent when I reach class 420?
[CODE]class 420: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)[/CODE]
If so, the time remaining to finish this exponent should be: (420-273) * 7700 sec = 314 hours = 13 days (??)

Thanks
Ludovic[/QUOTE]
I think so. You need to go through the whole cycle of 420 classes, of which only 96 are actually tested. You'll notice that the class number does not advance by 1 per iteration.
Hi Aillas,
[QUOTE=Aillas;223688]Hi, I must be missing something, or I don't understand how classes work. I thought I would finish my exponent this weekend, but found this this morning:
[CODE]class 273: tested 4722786304 candidates in 7693152ms (613894/sec) (avg. wait: 1608724usec)[/CODE][/QUOTE]
Maybe my description was not detailed enough, sorry. You're running without MORE_CLASSES, so there are 420 classes, but most of them can be removed entirely; 96 classes remain. So take the time of one class and multiply by 96. See post #335, where I predicted 8.9 days. With your latest timing (7693 seconds per class) you're at 7693s * 96 classes = 8.55 days.

The time needed per class goes down over time on your system because SievePrimes increases all the way up to 100k. This removes more candidates by sieving; you can see that the number of candidates tested per class goes down from the start. Next time you can start directly with SievePrimes=100000 in mfaktc.ini. You're totally GPU-bound (very high average wait), so your CPU easily handles sieving that much.

Oliver
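For the curious, the 96-of-420 figure can be checked directly: any factor q of a Mersenne number M_p has the form q = 2kp+1 with q ≡ ±1 (mod 8), and a class k mod 420 in which every candidate is divisible by 3, 5 or 7 can be skipped entirely. A short sketch using Ludovic's exponent (the snippet is illustrative, not actual mfaktc code):

```python
# Count how many of the 420 classes (k mod 420, with 420 = 4*3*5*7)
# can contain a factor q = 2*k*p + 1 of M_p. A factor of a Mersenne
# number satisfies q = +/-1 (mod 8), and classes in which every
# candidate is divisible by 3, 5 or 7 are eliminated up front.
p = 3321931967  # Ludovic's exponent

survivors = [k for k in range(420)
             if (2 * k * p + 1) % 8 in (1, 7)
             and all((2 * k * p + 1) % small != 0 for small in (3, 5, 7))]

print(len(survivors))        # 96 classes remain
print(273 in survivors)      # True: class 273 from Ludovic's log is one of them
```

Checking k mod 420 suffices because 420 is a multiple of 3, 5, 7 and of the period of q mod 8, so every candidate in a class shares these residues.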
Thanks for the explanation.
Going back to crunching now. Just do it :smile:
Hello,
just some GPU benchmarks for those who are interested. Values are the [B]raw[/B] GPU speed, without paying attention to sieve performance. On a GTX 4x0 you'll usually need two instances of mfaktc to utilize the GPU 100%. Percentages are the speed compared to the 71bit kernel.

Slightly factory-overclocked GTX 275 (1458 MHz SP clock, reference 1404 MHz):
[CODE]kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
71bit  |  80.8M/s                |  62.3M/s        0.09-pre5
75bit  |  62.5M/s  77.4%         |  48.2M/s  77.4%
95bit  |  52.2M/s  64.6%         |  40.3M/s  64.7%
[/CODE]
Impressively factory-overclocked GTX 460 (1600 MHz SP clock, reference 1350 MHz):
[CODE]kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
71bit  |  85.2M/s                |  65.8M/s        0.10-pre7
75bit  | 145.2M/s 170.4%         | 112.2M/s 170.5%
95bit  | 120.2M/s 141.1%         |  92.8M/s 141.0%
[/CODE]
Stock GTX 470:
[CODE]kernel | M66362159, 2^64 to 2^67 | M3321932839, 2^50 to 2^71
-------+-------------------------+--------------------------
71bit  | 102.7M/s                |  79.7M/s        0.10-pre7
75bit  | 183.8M/s 179.0%         | 143.8M/s 180.4%
95bit  | 155.4M/s 151.3%         | 121.2M/s 152.1%
[/CODE]
In this comparison the GTX 470 gives the most bang for the buck, while the GTX 460 has the highest performance per watt. :smile: My GTX 275 feels kind of slow these days :sad:

Oliver