mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   GPU Computing (https://www.mersenneforum.org/forumdisplay.php?f=92)
-   -   What SIEVE_PRIMES are you using? (https://www.mersenneforum.org/showthread.php?t=16991)

Prime95 2012-07-20 19:16

What SIEVE_PRIMES are you using?
 
I'm working on GPU sieving and I'd like to know what sieve_primes setting most mfaktc users are using.

I think mfaktc can auto-adjust this value. If so, please report the sieve_primes value mfaktc thinks is optimal.

My goal is to create a GPU sieve that is as fast as the CPU sieve for most users. To do this, the GPU sieve must be fast enough so that it can sieve deeper than the CPU -- where the time saved by reduced TF candidates equals the time spent doing GPU sieving.

This may not be possible, but it is good to have goals!
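[Editor's note] That break-even can be sketched numerically. The toy model below uses Mertens' theorem (the density of candidates surviving a sieve by all primes up to P is proportional to 1/ln(P), so deepening the limit keeps a fraction ln(P_old)/ln(P_new) of them); all the specific numbers are illustrative assumptions, not measurements from any particular card:

```python
import math

def survivor_ratio(p_old, p_new):
    """Fraction of TF candidates left when the sieve limit is deepened
    from p_old to p_new (Mertens: survivor density ~ 1/ln(P))."""
    return math.log(p_old) / math.log(p_new)

# Illustrative assumptions only: a CPU sieve to 50k, a GPU sieve to 900k,
# 700M candidates per class, 200M candidates tested per second.
tf_rate = 200e6
candidates = 700e6
kept = survivor_ratio(50_000, 900_000)
saved = candidates * (1 - kept) / tf_rate  # TF seconds saved per class

print(f"candidates kept: {kept:.3f}")
print(f"TF time saved per class: {saved:.2f} s")
```

Under these made-up numbers the GPU sieve pays for itself whenever its extra sieving time per class stays below `saved`.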

pinhodecarlos 2012-07-20 19:37

You need to think of it as a specific energy ratio, like (kWh/sieve_depth)_GPU vs (kWh/sieve_depth)_CPU_all_cores.

kladner 2012-07-20 22:03

1 Attachment(s)
The upper instance is working on a low 50M. The lower instance is a low 60M.
This is on a GTX 460 running at 810 MHz being fed by 2 cores of a Phenom II x6 1090T running at 3.5 GHz.

Xyzzy 2012-07-20 23:52

We use 5,000 per instance of mfaktc, with each instance locked to one core. This uses 9-12% of each core on our (underclocked to 3GHz) 2500K processors. The remaining cycles per box are used with Prime95 running a P-1 test with four worker threads. The GPUs (GTX 570) are almost saturated with three instances so the fourth instance is there as a backup in case one instance fails and to fully utilize the GPU. TF from 71 to 72 bits takes around three hours per core and P-1 results are around two per day per box.

We can get the same TF performance, without the P-1 work, by underclocking the CPUs to 2GHz. Our room temperature is 83°F with just TF and 88°F with TF and P-1.

Using ~25,000 per instance of mfaktc improves 71 to 72 bit times by 15 minutes or so, which is not an efficient use of cycles in our opinion.

Dubslow 2012-07-21 00:18

With a 460 and one core of a 2600K, 5000 was barely low enough to saturate it.

Are you starting with rcv's sieve or from scratch?

TObject 2012-07-21 01:10

1 Attachment(s)
GTX 580, 797MHz on Core i7 920, 2.67GHz

Prime95 2012-07-21 02:24

[QUOTE=Dubslow;305302]Are you starting with rcv's sieve or from scratch?[/QUOTE]

I started from bsquared's code as well as many ideas from rcv's code.

At this point, I think I can make a GPU sieve as fast as a SIEVE_PRIMES=5000 CPU sieve. I've got a few more ideas I'm working on.

The responses are interesting in that the 3 replies have SIEVE_PRIMES anywhere from 5000 to 40000. Also interesting is that Kladner's 460 reports 100M/sec, yet my 460 reports it can do up to 205M/sec.

Dubslow 2012-07-21 02:43

Kladner runs two instances, so between them they get 200M/s. That's also roughly what I got with mine. The main reason is he's using a Phenom II, whose cores are individually slower than our SB cores. (At least I presume your 460 is in your SB.)

SP on its own mostly varies with CPU/GPU throughput ratios. For all but the slowest cards or fastest chips, though, SP has to be as low as possible (5000 was the lower bound set by TheJudger, though rcv et al. demonstrated that SP below 2000 runs even faster).

kladner 2012-07-21 02:53

1 Attachment(s)
[QUOTE=Dubslow;305331]Kladner runs two instances, so between them they get 200M/s. That's also roughly what I got with mine. The main reason is he's using a Phenom II, whose cores are individually slower than our SB cores. (At least I presume your 460 is in your SB.)

SP on its own mostly varies with CPU/GPU throughput ratios. For all but the slowest cards or fastest chips, though, SP has to be as low as possible (5000 was the lower bound set by TheJudger, though rcv et al. demonstrated that SP below 2000 runs even faster).[/QUOTE]

Correct on my setup, and the reasons for it. I did have the idea that SP adjustment was driven by CPU wait time in current versions of mfaktc. I'm running v 0.18.

EDIT: The attached is my night setup with SP @ 19000 (auto-adjusting), NumStreams=5, and Priority=High. The machine is virtually unusable. I have to shut mfaktc down first thing in the AM and reset NumStreams back to 3 and Priority back to Low.

EDIT2: These are the same exponents as the previous post.

LaurV 2012-07-21 05:17

GTX580 x2 or x3, with an i7-2600K, SievePrimes set on [B]auto-adjust, 25000[/B], all threads locked to physical cores if fewer than 4 threads, or to logical cores if more than 4 threads.

With two instances of mfaktc per card and 2 cards, the numbers stabilize around 5800-6500, more or less. GPU occupancy ~87-90%. Raising the CPU clock to around 4.4 GHz increases the number to 7000-8000 or more (but it is unstable, rising and falling), and GPU occupancy gets close to 96-98%.

With two cards and 3 instances per card, or 3 cards and 2 instances per card, the numbers go down fast to 5000 and stay there. GPU occupancy goes to 92-97% and stays there (for 2 cards; for 3 cards it is lower, about 70%). There is no way to max out the cards with this setup. The CPU is the bottleneck, even if I clock it over 4.5 GHz for short periods of time. I can't leave the clock that high; even with a water-cooled Asus Maximus Extreme, I saw a couple of blue screens. Depending on the weather (the radiator is outside the house), the CPU is usually clocked between a minimum of 3.6 GHz during the day and 4.4 GHz during the night.

With 3 cards I can run a total of 8 instances, mapped to logical cores; the active (video) card gets only two instances, and I can still watch videos (which I never do, in fact, but low-res video, like DVD-rip quality, is playable with VLC). The GPUs stabilize somewhere around 70% average (the one with 2 instances is lower; I tried 3 instances per card, but then the computer can't be used anymore and there is no improvement in output, and locking the ninth instance to any number of CPUs won't help). Of course, SievePrimes goes down fast to 5000 and stays there, no matter what I do with the CPU clock.

Sometimes I would like mfaktc to let me adjust the number below 5000, as the system's CPU is not just bottlenecked but totally "strangled". P95 never runs together with mfaktc, as the performance of both becomes really lousy. When I do TF with one card only, for example (when the other two are doing CudaLucas, or are busy with my fantasies or my daily job), then the remaining free CPU cores (if any) do aliqueit and yafu.

LaurV 2012-07-21 05:20

Summary: tl;dr version:
SievePrimes is 5000, and it would be very nice if you implemented GPU sieving in mfaktc. :smile:

Chuck 2012-07-21 12:53

1 Attachment(s)
[QUOTE=Prime95;305283]I'm working on GPU sieving and I'd like to know what sieve_primes setting most mfaktc users are using. [/QUOTE]

5000. Two instances of mfaktc on GTX-580 with core-i7 970. No affinities.

Jaxon 2012-07-21 13:06

1 Attachment(s)
~20,000, running 3 instances of mfaktc on a GTX 580. My CPU is an i5-3570K running at 4100MHz. GPU use is near 100%.

ATH 2012-07-21 13:39

GeForce GTX 460 running on a Q9450 (Yorkfield):

1 instance of mfaktc 80M exponent 71-72 bits:
SievePrimes auto adjusts to 5000. avg rate 125M/s. GPU load 73-75%. CPU load 25% (1 full core)

2 instances of mfaktc 80M exponent 71-72 bits:
SievePrimes auto adjusts to ~ 17000-19000. avg rate 2 x 85M/s. GPU load 95-99%. CPU load 50% (2 cores)

kjaget 2012-07-23 15:12

As with everyone else, it depends on the number of instances of mfaktc.

I have a GTX 560 Ti 448 (basically a 570) with an overclocked i5-750.

1 or 2 instances don't max out the GPU even at sieve primes = 5000. Going from 1 to 2 instances doubles my effective throughput.

With 3 instances I can max out the GPU, so I only get a ~27% speedup; sieve primes are in the 10,000-12,000 range.

Going to 4 instances gives me a ~10% speedup, sieve primes go up to ~3x,xxx (from memory, might not be 100% accurate).

For your entertainment, here's how the number of candidates is influenced by sieve primes on a really slow GPU. Candidates are in millions, time is run time per class, and rate is M candidates/sec. This is from a 49M exponent from 70 to 71 bits. I only ran 1 class per sieve-primes value, so there's some noise in the timings, but the candidate counts should be correct. It looks like a ~5-6% reduction in candidates every time you double sieve primes, at least in the 5-200K range that mfaktc uses.

[code]
Sieve Primes candidates time rate
5,000 643.83 79.133 8.14
5,625 636.49 78.161 8.14
6,328 628.62 77.283 8.13
7,119 621.81 76.271 8.15
7,500 618.14 75.288 8.21
8,008 614.47 75.369 8.15
8,437 611.32 74.453 8.21
9,009 607.65 74.525 8.15
9,491 604.5 73.627 8.21
10,135 600.83 73.66 8.16
10,677 597.69 72.794 8.21
11,401 594.02 72.872 8.15
12,011 590.87 71.963 8.21
12,826 587.2 72.067 8.15
13,512 584.58 71.202 8.21
14,429 580.91 71.366 8.14
15,201 578.29 70.436 8.21
16,232 574.62 70.002 8.21
17,101 572 69.666 8.21
18,261 568.85 69.897 8.14
19,238 566.23 70.685 8.01
20,000 564.13 68.713 8.21
20,543 562.56 68.4 8.22
21,642 559.94 68.201 8.21
22,500 558.37 68.01 8.21
23,110 556.79 67.644 8.23
24,347 554.17 67.5 8.21
25,998 551.03 67.644 8.15
27,339 548.41 66.802 8.21
29,247 545.78 66.935 8.15
30,813 543.16 66.158 8.21
32,902 540.02 66.301 8.14
34,664 537.4 65.458 8.21
37,014 534.77 65.492 8.17
38,997 532.15 64.818 8.21
41,640 529.53 64.805 8.17
43,871 526.91 64.181 8.21
46,845 524.29 66.124 7.93
49,354 522.19 63.611 8.21
50,000 521.14 63.479 8.21
52,700 519.05 63.399 8.19
55,523 516.95 64.691 7.99
56,250 516.42 62.906 8.21
59,287 514.33 63.179 8.14
62,463 512.23 62.4 8.21
63,281 511.18 62.267 8.21
66,697 509.08 62.493 8.15
70,270 506.99 61.759 8.21
71,191 506.46 61.692 8.21
75,000 504.37 61.443 8.21
75,034 504.37 61.935 8.14
79,053 502.27 61.185 8.21
80,089 501.74 61.122 8.21
84,413 499.65 61.352 8.14
88,934 498.07 60.674 8.21
90,100 497.55 60.612 8.21
94,964 495.45 60.851 8.14
100,000 493.36 60.711 8.13
100,050 493.36 60.102 8.21
101,362 492.83 60.043 8.21
106,834 490.73 60.273 8.14
112,500 489.16 59.756 8.19
112,556 488.64 59.528 8.21
114,032 488.64 59.53 8.21
120,188 486.54 59.694 8.15
126,562 484.44 59.048 8.20
126,625 484.44 59.021 8.21
128,286 483.92 58.958 8.21
135,211 482.34 59.352 8.13
142,382 480.25 58.512 8.21
142,453 480.25 58.509 8.21
144,321 479.72 60.202 7.97
152,112 478.15 60.392 7.92
160,179 476.05 57.998 8.21
160,259 476.05 57.998 8.21
162,361 475.53 57.939 8.21
171,126 473.96 58.494 8.10
180,201 472.38 57.555 8.21
180,291 472.38 59.309 7.96
182,656 471.86 57.493 8.21
192,516 469.76 57.5 8.17
200,000 468.71 57.606 8.14
[/code]
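[Editor's note] The ~5-6% figure is close to what Mertens' theorem predicts: survivor density after sieving by all primes up to P scales as 1/ln(P), so doubling the prime count n keeps roughly ln(p_n)/ln(p_2n) of the candidates. A quick check, as a standalone sketch using a plain sieve rather than mfaktc's code, and treating SievePrimes as a simple count of primes:

```python
import math

def nth_prime(n):
    """n-th prime (1-indexed) via a plain sieve of Eratosthenes."""
    # Rosser's bound: p_n < n(ln n + ln ln n) for n >= 6.
    limit = int(n * (math.log(n) + math.log(math.log(n)))) + 10
    sieve = bytearray([1]) * (limit + 1)
    sieve[0] = sieve[1] = 0
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = bytearray(len(sieve[i * i :: i]))
    count = 0
    for p, flag in enumerate(sieve):
        count += flag
        if flag and count == n:
            return p

for n in (5_000, 100_000):
    kept = math.log(nth_prime(n)) / math.log(nth_prime(2 * n))
    print(f"{n} -> {2*n}: keep {kept:.3f} ({(1 - kept) * 100:.1f}% fewer candidates)")
```

With kjaget's numbers, 643.83M candidates at SP 5000 times the predicted 0.934 gives ~601M, which matches the 600.83M measured at SP 10,135 above.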

Bdot 2012-07-23 20:00

One HD5770 with 3 instances at a fixed SP of 55,000, consuming a little over 2 cores of a Phenom X4 955 (3.2GHz).

Another HD5770 with 3 instances at auto-adjust, SP ~100,000 with 3 full cores of a Xeon hex-core. As my GPUs deliver only about 150-160M/s I can trade the 3rd core's 2.6 CPU-GHz-days against 10-12 GPU-TF-GHz-days. With just 2 instances SP would settle at ~30,000.

An NV Quadro2000, 2x ~95,000 auto-adjust, 2 cores of a Xeon-hex.

BTW, I'm working on porting rcv's code to OpenCL.

bcp19 2012-07-25 01:24

[QUOTE=Prime95;305283]I'm working on GPU sieving and I'd like to know what sieve_primes setting most mfaktc users are using.

I think mfaktc can auto-adjust this value. If so, please report the sieve_primes value mfaktc thinks is optimal.

My goal is to create a GPU sieve that is as fast as the CPU sieve for most users. To do this, the GPU sieve must be fast enough so that it can sieve deeper than the CPU -- where the time saved by reduced TF candidates equals the time spent doing GPU sieving.

This may not be possible, but it is good to have goals![/QUOTE]

I have 6 machines TFing using 7 cards, and each is set up differently:
-   2500K/480, 2 instances, SP=5000; will up it to 3 instances after my 100M LL finishes, probably giving SP=20,000-30,000.
-   2400/480, 3 instances, SP=11000 (when it had the 560 Ti, SP=33000).
-   Q8200/560 Ti, 3 instances, SP=14000.
-   Q6600/560, 3 instances, SP=9500.
-   Q8200/550 Ti, 3 instances, SP=40000.
-   Phenom II 1055T/460, 3 instances, SP=35000.
-   1055T/6770, 2 instances, SP=5000.

Prime95 2012-07-25 02:31

Thanks for all the data -- I was not expecting quite that much variety!

Just an update on where I stand. I've built a test program that emulates the work required by an mfaktc GPU sieve. 95% of this code would be usable in a real mfaktc implementation.

On my 460, the GPU sieve reaches the point of diminishing returns at roughly 71265 primes (first prime 13, last prime 899851). The GPU sieve is fast enough that it achieves as much throughput as a CPU sieve using 14177 primes (first prime 13, last prime 153877).

I've got a few tweaks in mind that might improve things a tad.

Of course, this is all theory. A real implementation might run into roadblocks that I had not considered.


[QUOTE=Bdot;305648]BTW, I'm on rcv's code to port it to OpenCL.[/QUOTE]

PM me if you'd like a copy of my test code. It is a blending of erato.cu from the "second dumbest cuda" thread and rcv's ideas as well as several ideas of my own.

kracker 2012-07-25 16:08

1 Attachment(s)
I don't run mfaktc since I don't have an Nvidia card... but I have mfakto.
Damn, wish the attachment limit file size was just a bit bigger...:smile:

Xyzzy 2012-07-25 18:54

1 Attachment(s)
[QUOTE]Damn, wish the attachment limit file size was just a bit bigger...:smile:[/QUOTE]Just use a lower compression setting. The attached file at 123kb is very close to your original.

kracker 2012-07-25 19:07

[QUOTE=Xyzzy;305950]Just use a lower compression setting. The attached file at 123kb is very close to your original.[/QUOTE]

Ah I see. Thanks! :smile:

TObject 2012-07-25 19:28

Also try GIF; it beats JPEG when there are a lot of continuous tones in the picture.

CRGreathouse 2012-07-25 21:22

[QUOTE=TObject;305964]Also try GIF; it beats JPEG when there are[B]n't[/B] a lot of continuous tones in the picture.[/QUOTE]

Fixed that for you. Actually, PNG is better still in this case...

kracker 2012-07-26 01:33

[QUOTE=CRGreathouse;305975]Fixed that for you. Actually, PNG is better still in this case...[/QUOTE]

PNG has excellent quality, but it tends to get big.

patrik 2012-07-28 10:26

I run two instances of mfaktc on a GTX 560, with four workers of mprime also running (on a Core i5 2500K). Then it runs with SievePrimes=5000. When I stop mprime, SievePrimes slowly increases to about 25000, until I start mprime again. (One instance of mfaktc shown below.)
[CODE] class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
96/4620 | 721.16M | 6.546s | 1h42m | 110.17M/s | 5000 | 0.91%
101/4620 | 721.16M | 6.542s | 1h42m | 110.24M/s | 5000 | 0.93%
105/4620 | 721.16M | 6.578s | 1h42m | 109.63M/s | 5000 | 0.92%
116/4620 | 721.16M | 6.466s | 1h40m | 111.53M/s | 5000 | 0.80%
117/4620 | 721.16M | 6.557s | 1h42m | 109.98M/s | 5000 | 0.98%
129/4620 | 721.16M | 6.460s | 1h40m | 111.63M/s | 5000 | 9.90%
137/4620 | 712.90M | 6.326s | 1h38m | 112.69M/s | 5625 | 32.87%
140/4620 | 704.64M | 6.217s | 1h36m | 113.34M/s | 6328 | 31.02%
141/4620 | 696.39M | 6.290s | 1h37m | 110.71M/s | 7119 | 31.54%
144/4620 | 688.13M | 6.155s | 1h35m | 111.80M/s | 8008 | 28.21%
149/4620 | 680.79M | 6.117s | 1h34m | 111.29M/s | 9009 | 26.61%
152/4620 | 672.53M | 5.984s | 1h32m | 112.39M/s | 10135 | 24.15%
156/4620 | 665.19M | 5.870s | 1h30m | 113.32M/s | 11401 | 21.52%
161/4620 | 657.85M | 5.831s | 1h30m | 112.82M/s | 12826 | 19.00%
165/4620 | 650.51M | 5.730s | 1h28m | 113.53M/s | 14429 | 15.82%
176/4620 | 644.09M | 5.683s | 1h27m | 113.34M/s | 16232 | 13.40%
177/4620 | 636.75M | 5.632s | 1h26m | 113.06M/s | 18261 | 11.72%
180/4620 | 630.33M | 5.557s | 1h25m | 113.43M/s | 20543 | 9.00%
185/4620 | 623.90M | 5.544s | 1h25m | 112.54M/s | 23110 | 6.57%
189/4620 | 617.48M | 5.413s | 1h23m | 114.07M/s | 25998 | 1.93%
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
192/4620 | 624.82M | 5.570s | 1h25m | 112.18M/s | 22748 | 7.19%
200/4620 | 618.40M | 5.477s | 1h23m | 112.91M/s | 25591 | 3.15%
201/4620 | 618.40M | 5.430s | 1h22m | 113.89M/s | 25591 | 2.21%
204/4620 | 618.40M | 5.495s | 1h23m | 112.54M/s | 25591 | 3.62%
212/4620 | 618.40M | 5.449s | 1h23m | 113.49M/s | 25591 | 2.70%
221/4620 | 618.40M | 5.436s | 1h22m | 113.76M/s | 25591 | 2.46%
224/4620 | 618.40M | 5.452s | 1h22m | 113.43M/s | 25591 | 2.77%
225/4620 | 618.40M | 5.437s | 1h22m | 113.74M/s | 25591 | 2.39%
236/4620 | 618.40M | 5.420s | 1h22m | 114.10M/s | 25591 | 2.39%
240/4620 | 618.40M | 5.447s | 1h22m | 113.53M/s | 25591 | 2.78%
245/4620 | 618.40M | 5.468s | 1h22m | 113.09M/s | 25591 | 3.17%
249/4620 | 618.40M | 5.476s | 1h22m | 112.93M/s | 25591 | 3.40%
257/4620 | 618.40M | 5.430s | 1h22m | 113.89M/s | 25591 | 2.60%
260/4620 | 618.40M | 5.481s | 1h22m | 112.83M/s | 25591 | 3.31%
261/4620 | 618.40M | 5.466s | 1h22m | 113.14M/s | 25591 | 3.12%
264/4620 | 618.40M | 5.416s | 1h21m | 114.18M/s | 25591 | 2.27%
269/4620 | 618.40M | 5.431s | 1h21m | 113.86M/s | 25591 | 2.61%
276/4620 | 618.40M | 5.423s | 1h21m | 114.03M/s | 25591 | 2.69%
281/4620 | 618.40M | 5.460s | 1h21m | 113.26M/s | 25591 | 3.15%
284/4620 | 618.40M | 5.442s | 1h21m | 113.63M/s | 25591 | 2.76%
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
297/4620 | 618.40M | 5.424s | 1h21m | 114.01M/s | 25591 | 2.47%
305/4620 | 618.40M | 5.502s | 1h22m | 112.40M/s | 25591 | 3.83%
309/4620 | 618.40M | 5.458s | 1h21m | 113.30M/s | 25591 | 2.87%
312/4620 | 618.40M | 5.515s | 1h22m | 112.13M/s | 25591 | 2.78%
317/4620 | 618.40M | 5.782s | 1h26m | 106.95M/s | 25591 | 0.57%
320/4620 | 625.74M | 8.418s | 2h05m | 74.33M/s | 22392 | 0.60%
324/4620 | 633.08M | 7.911s | 1h57m | 80.02M/s | 19593 | 0.62%
332/4620 | 640.42M | 7.762s | 1h55m | 82.51M/s | 17143 | 0.63%
336/4620 | 648.68M | 7.675s | 1h53m | 84.52M/s | 15000 | 0.75%
341/4620 | 656.93M | 7.385s | 1h49m | 88.96M/s | 13125 | 0.59%
344/4620 | 665.19M | 7.307s | 1h48m | 91.03M/s | 11484 | 0.74%
345/4620 | 673.45M | 7.173s | 1h46m | 93.89M/s | 10048 | 0.75%
357/4620 | 681.71M | 7.105s | 1h45m | 95.95M/s | 8792 | 0.77%
360/4620 | 690.88M | 6.871s | 1h41m | 100.55M/s | 7693 | 0.65%
365/4620 | 700.06M | 6.856s | 1h41m | 102.11M/s | 6731 | 1.24%
369/4620 | 709.23M | 6.672s | 1h38m | 106.30M/s | 5889 | 0.63%
372/4620 | 719.32M | 6.563s | 1h36m | 109.60M/s | 5152 | 1.13%
380/4620 | 721.16M | 6.514s | 1h35m | 110.71M/s | 5000 | 0.83%
381/4620 | 721.16M | 6.499s | 1h35m | 110.96M/s | 5000 | 0.86%
389/4620 | 721.16M | 6.524s | 1h35m | 110.54M/s | 5000 | 1.48%
class | candidates | time | ETA | avg. rate | SievePrimes | CPU wait
392/4620 | 721.16M | 6.544s | 1h35m | 110.20M/s | 5000 | 2.11%
396/4620 | 721.16M | 6.516s | 1h35m | 110.67M/s | 5000 | 1.22%
401/4620 | 721.16M | 6.520s | 1h35m | 110.61M/s | 5000 | 0.86%
[/CODE]
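[Editor's note] The swings above show the auto-adjustment reacting to the CPU-wait column: lots of idle CPU time means spare cycles for deeper sieving, so SievePrimes is raised; near-zero wait means sieving is the bottleneck, so it falls back. A toy controller in that spirit (the target band and the 1.125 step factor are guesses inferred from the log's 5000 -> 5625 -> 6328 -> ... progression, not mfaktc's actual constants):

```python
SP_MIN, SP_MAX = 5_000, 200_000   # the range mfaktc reportedly uses

def adjust_sieve_primes(sp, cpu_wait_pct, target=2.0, step=1.125):
    """One adjustment step: deepen the sieve while the CPU thread has
    idle time to spare, back off when the GPU would starve."""
    if cpu_wait_pct > target:
        return min(SP_MAX, int(sp * step))
    if cpu_wait_pct < target / 2:
        return max(SP_MIN, int(sp / step))
    return sp

# Replay the first few CPU-wait readings after mprime stops (from the log):
sp = 5_000
for wait in (32.87, 31.02, 31.54, 28.21, 26.61):
    sp = adjust_sieve_primes(sp, wait)
print(sp)   # climbs 5000 -> 5625 -> 6328 -> 7119 -> 8008 -> 9009
```

The geometric step keeps the adjustment fast when far from equilibrium yet gentle near it, which matches the smooth ramp in the log.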

NormanRKN 2012-07-28 11:00

It seems to be a memory-bus bottleneck. I know that problem; I see it here too with an i7. Try reducing the number of mprime cores step by step and see what mfaktc does. Maybe monitor the GPU with GPU-Z or another performance tool.

Norman

bcp19 2012-07-28 13:25

[QUOTE=NormanRKN;306285]It seems to be a memory-bus bottleneck. I know that problem; I see it here too with an i7. Try reducing the number of mprime cores step by step and see what mfaktc does. Maybe monitor the GPU with GPU-Z or another performance tool.

Norman[/QUOTE]
It's not a function of memory; it's the sharing of cores. Since there is less CPU processing available, a lower SP is used.

henryzz 2012-07-28 22:17

[QUOTE=bcp19;306290]It's not a function of memory; it's the sharing of cores. Since there is less CPU processing available, a lower SP is used.[/QUOTE]
Yes, but I am assuming that prime95 is running at a lower priority, so there shouldn't be much difference CPU-wise.

bcp19 2012-07-29 00:49

[QUOTE=henryzz;306317]Yes, but I am assuming that prime95 is running at a lower priority, so there shouldn't be much difference CPU-wise.[/QUOTE]
The simple fact that M/s climbs when mprime stops shows that the GPU is not running at its full potential when both are running. Admittedly it's very minor (~1%), but it is still there.

When I first had my 2500K running, I noticed 1 core almost exactly matched a 560, so I had a second core run mfakto on a 5770, sharing with P95 (mainly because without P95, SP was 200,000 and CPU wait was over 15%). The timings showed P95 was using over 20% of the core, and the SP balanced around 60,000.

James Heinrich 2012-07-31 23:51

1 Attachment(s)
I actually have SievePrimes set to auto-adjust in the range 2000-10000, with a default of 5000; it rarely strays much from 5000 unless I'm doing a bunch of other work that spills onto mfaktc's cores, and even then only maybe down to 4500 or up to 6000.

Two instances of mfaktc, one GTX 570, locked to the first two physical cores of i7-3930K @4125MHz (the last 4 cores are doing P-1). GPU usage averages 95%.

kladner 2012-08-01 01:44

[QUOTE=James Heinrich;306573]I actually have SievePrimes set to auto-adjust in the range 2000-10000, with a default of 5000.......[/QUOTE]

Pardon, James: is this possible in mfaktc v0.18? If so, how?

Apologies if this has been covered before.:confused:

James Heinrich 2012-08-01 01:54

[QUOTE=kladner;306582]Pardon, James. is this possible in mfaktc v 0.18?[/QUOTE]No.[quote=mfaktc changelog]version 0.19-pre9 (2012-07-08)
... other stuff ...
- SievePrimesMin is lowered to 2000 (usually not very usefull but requested quiet often)[/quote]

kladner 2012-08-01 03:25

Thanks!

LaurV 2012-08-01 03:37

[QUOTE=James Heinrich;306583]2000 requested quiet often[/QUOTE]
Hm... I [strike]"quiet"[/strike] "quite" need this... When can we have it? Can it include George's barrett77 code? Will the "low expos" version be merged with it or still be a separate trust-based distribution?
Thanks!

NormanRKN 2012-08-01 10:33

2000? Oh, very low.
By the way, is it possible to use more than one CPU core for one instance? E.g., cores 0+1, or cores 2+3, or all cores?

Dubslow 2012-08-01 10:36

[QUOTE=NormanRKN;306619]
By the way, is it possible to use more than one CPU core for one instance? E.g., cores 0+1, or cores 2+3, or all cores?[/QUOTE]

No. The sieving code is not multithreaded. (You can set the affinity for the process to multiple cores, but it won't actually go any faster.)

LaurV 2012-08-01 15:04

[QUOTE=Dubslow;306620]The sieving code is not multithreaded[/QUOTE]

Correct. Why should it be? Our problem (those of us wanting a lower SievePrimes) is that the CPU is heavily suffocated. If you have a system where the balance is in favor of the CPU (i.e. more CPU power, less GPU power), then you have plenty of ways to make it run as you want: you can use more instances, set the affinities of several instances to the same core, whatever, so as to max out the GPUs; and if some CPU resources are still free, you can play with P-1, aliquots, whatever, to max out both. But in a system where more GPU power is available, a few instances of mfaktc will immediately deplete the CPU power. In my system, where I have two GTX 580s (factory overclocked), there is no way to max out both of them without overclocking my i7-2600K to over 4.4 GHz, and in that case the computer can't be used for anything else. To avoid that, and still have both GPU and CPU maxed out, I have to run some CudaLucas too...

It would be more convenient for me to "mfaktc with the GPU only", i.e. to have the possibility of setting SievePrimes lower, letting the GPU do more exponentiation and freeing the CPU by sieving less (close to nothing, if possible). In that case I could use the CPU for other things.

kladner 2012-08-01 16:09

Sieve Primes changes with BIOS update
 
1 Attachment(s)
Yesterday, I realized that I had not updated my system BIOS in some time. I went to the Asus site and got the latest and flashed it.

When I got all the BIOS settings back to their usual values and booted Windows and mfaktc, I noticed that SP was behaving a little differently. I've now switched the BIOS back and forth a couple of times and came up with the attached. The older BIOS is above the red line, the newer below. The summary is that the older runs at SP 28125 while the newer stayed at 25000.

Can someone help me interpret this in terms of which BIOS is more advantageous if this difference is significant in the first place?

EDIT: Sorry for the poor legibility of the attachment. I had to downsize it and use heavy JPEG to meet the upload limit.

EDIT2: These are the same three exponents in both sets. Also, I hope it is not [I]too [/I]OT to post this here.

bcp19 2012-08-01 21:08

[QUOTE=kladner;306628]Yesterday, I realized that I had not updated my system BIOS in some time. I went to the Asus site and got the latest and flashed it.

When I got all the BIOS settings back to their usual values and booted Windows and mfaktc, I noticed that SP was behaving a little differently. I've now switched the BIOS back and forth a couple of times and came up with the attached. The older BIOS is above the red line, the newer below. The summary is that the older runs at SP 28125 while the newer stayed at 25000.

Can someone help me interpret this in terms of which BIOS is more advantageous if this difference is significant in the first place?

EDIT: Sorry for the poor legibility of the attachment. I had to downsize it and use heavy JPEG to meet the upload limit.

EDIT2: These are the same three exponents in both sets. Also, I hope it is not [I]too [/I]OT to post this here.[/QUOTE]
In technical terms, the older BIOS should be more productive, since the CPU is sieving deeper. About all you could do is time some full runs to see if there is much difference (which may fall within the normal variance between runs).

kladner 2012-08-01 22:54

Thanks Pete,

There's more going on than I realized. I'm still gathering information to describe it. I'll start a new thread when I have it together.

EDIT: Suffice it to say that the BIOS may not have had a causal relationship with the speed change.

LaurV 2012-08-02 02:51

[QUOTE=kladner;306667]say that the BIOS may not have had a causal relationship with the speed change.[/QUOTE]
That was what I wanted to say, but wasn't sure how to put it. I have Asus Maximus mobos with multiple BIOSes; you use a switch on the case to select which BIOS you want to launch (very good for guys who overclock or do a lot of hardware experiments, as I do). What I want to say is that, in spite of the fact that their settings are totally different, there is no difference in how mfaktc runs under the different BIOSes, besides the difference given by overclocking: a faster CPU sieves faster.

On the other hand, SievePrimes stabilizing higher (or lower) has nothing to do with the overall speed; I mean, one cannot say which one is better or faster just by looking at those screens. Say I raise the CPU clock in the BIOS, overclocking a few percent; then SievePrimes may get higher (or not) and the speed of TF will increase. Now assume I go into the BIOS and make strange settings for the PCIe slots, insert some wait states, reduce apertures, whatever, making the GPU or the communication slower: the effect on SievePrimes would be the same (the CPU will have time to sieve more), but the "global" effect will be a drop in mfaktc output.

The only way to say which one is better, as bcp suggested already, is to take the stopwatch and count... There are more things involved; sometimes you can get a few percent more output, but the system will generate a lot more heat and waste a lot more energy, raising the electricity bill. You can overclock and have an apparently FASTER output rate, like 20%, 30%, whoaaaa, but if the system becomes unstable and you have to repeat one test in 5 (CudaLucas), or you miss factors (mfaktc), then you will get the same output in the end, only you will have to pay more money for electricity, and you don't help the project. You should play around a while, then select the run mode which is most STABLE. If the speed comes with it, well, that is a bonus. What I want to say is that after thousands of experiments and burned fingers, I reached the conclusion that the fastest way is not always the best.

kladner 2012-08-02 03:46

[QUOTE=LaurV;306684]That was what I wanted to say, but wasn't sure how to put it,.....[/QUOTE]

I lost track of how many tweaks I had in place at various times. When I did the initial BIOS change I also cut back all OCs, etc. I'm not sure if there was still some blip, but now I am reproducing the results I had earlier... with NumStreams=5 and mfaktc Priority=Normal or higher.

In the meantime, I ran through various BIOS and nVidia driver versions. Currently, I have a rather old BIOS (though not the oldest [on the MB CD]) and GeForce drivers 285.62. The other variable I overlooked was setting NumStreams=5 instead of 3, combined with Normal or High priority. This gives great results on a Windows machine as long as you don't need to use it for much else.



Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.