mersenneforum.org  

2011-10-12, 05:16   #23
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Quote:
Originally Posted by Chuck
I am using the EVGA Precision utility, which allows overclocking and also displays GPU usage statistics. You could also use MSI Afterburner, which has a different skin but performs the same functions.

Chuck

Or GPU-Z.

2011-10-12, 05:22   #24
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Quote:
Originally Posted by NBtarheel_33
How much CPU should typically be allocated to each instance of mfaktc? I was working with an 8-core Nehalem Xeon system @ 2.66 GHz with 2 Fermi GPUs. At first, I had Prime95 TFing on all 8 cores, as well as 2 instances of mfaktc going on the GPUs. I noticed that three of the CPU cores were slower than the others, so I stopped their workers, and ran 5 TF CPU cores, and two instances of mfaktc. Things picked up nicely at this point. I also experimented with a third instance of mfaktc; it slowed the GPUs down into the 65-70M/sec range.

If we have X GPUs, should we run exactly X copies of mfaktc, or does it make more efficient use of the GPUs to run more?

It's not mfaktc threads per GPU that matters, it's CPU cores per GPU; it's just a coincidence that each mfaktc thread uses one CPU core.

To rephrase: one CPU core may or may not be enough to saturate the GPU. If it is not, put a second core to work on the same GPU; to actually do this, you run a second mfaktc instance. So figure out how many cores are necessary to saturate the GPU, then run that number of mfaktc instances.

Now, the other MAJOR piece of advice: please Please PLEASE set each mfaktc instance to run on a specific core. When I ran P95 on three cores alongside one mfaktc, without setting mfaktc to a specific core, the P95 workers got about 80% efficiency and the fourth core got 50%. I figured out (with help on the forum) how to pin it, and both P95 and mfaktc performance went up dramatically. Neither the Windows nor the Linux scheduler is designed to handle such specific cases. In Windows, you can set the "affinity" of each process on the Processes tab of Task Manager by right-clicking the process you want and choosing "Set Affinity...". Please try this before putting more than one CPU core per GPU. I'd think one Nehalem core would be plenty for one GPU, unless it's something like a 590.

Note: You'll need to know which cores P95 is using to avoid setting the wrong affinity.
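
For anyone who would rather script this than click through Task Manager: as far as I know, both the Windows `start /affinity` command and the Linux `taskset` tool take the affinity as a hexadecimal bitmask with one bit per logical core. Below is a minimal C sketch of working out which mask is left for mfaktc once you know the cores P95 is using. The function names and the core numbering in the example are made up for illustration; nothing here comes from mfaktc or Prime95.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Build a bitmask with one bit set per listed core (core 0 = bit 0).
 * Hypothetical helper, not part of mfaktc or Prime95. */
uint32_t core_mask(const int *cores, int count)
{
    uint32_t mask = 0;
    for (int i = 0; i < count; i++) {
        assert(cores[i] >= 0 && cores[i] < 32);  /* sketch handles up to 32 cores */
        mask |= (uint32_t)1 << cores[i];
    }
    return mask;
}

/* Cores that exist on the machine but are not claimed by P95. */
uint32_t free_core_mask(int total_cores, uint32_t p95_mask)
{
    assert(total_cores >= 1 && total_cores <= 32);
    uint32_t all = (total_cores == 32) ? UINT32_MAX
                                       : (((uint32_t)1 << total_cores) - 1);
    return all & ~p95_mask;
}

/* Lowest-numbered core in a mask, or -1 if the mask is empty. */
int lowest_core(uint32_t mask)
{
    for (int c = 0; c < 32; c++)
        if (mask & ((uint32_t)1 << c))
            return c;
    return -1;
}

/* Print the mask in the hexadecimal form the command-line tools expect. */
void print_affinity_hint(uint32_t mask)
{
    printf("affinity mask: 0x%X\n", (unsigned)mask);
}
```

On a quad-core with P95 pinned to cores 0, 1 and 2, `free_core_mask(4, core_mask(...))` comes out as 0x8, i.e. only the fourth core. That is the value you would then hand to something like `start /affinity 8 <mfaktc executable>` on Windows or `taskset 0x8 <mfaktc executable>` on Linux.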

Last fiddled with by Dubslow on 2011-10-12 at 05:23

2011-10-12, 14:35   #25
Chuck
May 2011
Orange Park, FL
2^5·29 Posts

Quote:
Originally Posted by Dubslow
Or GPU-Z

Right, that's a better choice since it is a monitoring utility. I have it on my desktop but forgot about it, since I haven't used it for a few months.

Chuck

2011-10-12, 14:45   #26
Chuck
May 2011
Orange Park, FL
3A0₁₆ Posts

Quote:
Originally Posted by Dubslow
It's not mfaktc threads per GPU that matters, it's CPU cores per GPU; it's just a coincidence that each mfaktc thread uses one CPU core.

To rephrase: one CPU core may or may not be enough to saturate the GPU. If it is not, put a second core to work on the same GPU; to actually do this, you run a second mfaktc instance. So figure out how many cores are necessary to saturate the GPU, then run that number of mfaktc instances.

Now, the other MAJOR piece of advice: please Please PLEASE set each mfaktc instance to run on a specific core. When I ran P95 on three cores alongside one mfaktc, without setting mfaktc to a specific core, the P95 workers got about 80% efficiency and the fourth core got 50%. I figured out (with help on the forum) how to pin it, and both P95 and mfaktc performance went up dramatically. Neither the Windows nor the Linux scheduler is designed to handle such specific cases. In Windows, you can set the "affinity" of each process on the Processes tab of Task Manager by right-clicking the process you want and choosing "Set Affinity...". Please try this before putting more than one CPU core per GPU. I'd think one Nehalem core would be plenty for one GPU, unless it's something like a 590.

Note: You'll need to know which cores P95 is using to avoid setting the wrong affinity.

I have tried fiddling with the CPU affinities, but every change I have made has ended up slowing things down.

2011-10-12, 22:42   #27
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts

Hmm. My gut-reaction guess is that without an affinity set, the scheduler gives mfaktc as much time on as many cores as it can use, whereas setting the affinity limits mfaktc to the amount of work that one core can do. That would mean mfaktc needs more than one core's worth of CPU to keep up with the 580, and without the affinities set, the scheduler lets it have that time. Have you tried setting mfaktc's affinity to two or three cores?

Let me try that again after reviewing your hardware: you have two separate instances of mfaktc to saturate just one 580, and four simultaneous P95 threads? Okay. Do you use the "best cpu" setting in P95, or do you set it to use specific CPUs? If you set mfaktc affinities without P95 doing the same, the scheduler could still create interference between the two. If P95 is already set, then...

Try one mfaktc instance with its affinity set to two cores (obviously the ones not in use by P95). If that doesn't help, try the two instances you currently use, and set each of them to one core. If that doesn't help either, then I'm not really sure. What CPU affinity settings have you already tried?

2011-10-13, 03:31   #28
Christenson
Dec 2010
Monticello
5·359 Posts

Quote:
Originally Posted by Chuck
I have tried fiddling with the CPU affinities, but every change I have made has ended up slowing things down.

Some things to be aware of, though you may already know them:
1) P95 tries to run at the lowest priority, and has some intelligence about cores and threads.
2) mfaktc, as currently written, is one instance, one thread. When it starts communicating automatically, it will gain a communications thread, but it still is not really aware of processes or threads except on the GPU side. So each instance runs at whatever priority it inherits from the window it starts in, unless you change it in Task Manager.
3) mfaktc is modelled as a sieve on the CPU feeding factor candidates to be tested on the GPU. That means that increasing CPU performance to feed more or better factor candidates takes CPU away from P95.
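
For readers new to the split described in point 3, here is a toy, CPU-only C sketch of the idea. It generates candidates q = 2kp + 1 (every factor of 2^p - 1 has this form), discards those that cannot be factors (q must be ±1 mod 8, and candidates with small prime divisors are dropped), and tests the survivors by checking whether 2^p ≡ 1 (mod q). In mfaktc that last test is the part that runs on the GPU. The function names, the tiny list of sieve primes, and the restriction to q below 2^32 are simplifications made up for this sketch; none of it is taken from mfaktc's source.

```c
#include <assert.h>
#include <stdint.h>

/* 2^p mod q by square-and-multiply; q < 2^32 keeps products inside 64 bits. */
static uint64_t pow2_mod(uint32_t p, uint64_t q)
{
    assert(q > 1 && q < ((uint64_t)1 << 32));
    uint64_t result = 1, base = 2 % q;
    while (p) {
        if (p & 1)
            result = result * base % q;
        base = base * base % q;
        p >>= 1;
    }
    return result;
}

/* "Sieve" step: cheap tests that reject most candidates before the
 * expensive modular exponentiation. */
static int survives_sieve(uint64_t q)
{
    static const uint64_t small[] = {3, 5, 7, 11, 13};
    if (q % 8 != 1 && q % 8 != 7)
        return 0;
    for (int i = 0; i < 5; i++)
        if (q % small[i] == 0 && q != small[i])
            return 0;
    return 1;
}

/* Smallest q = 2*k*p + 1 with 1 <= k <= k_max that divides 2^p - 1,
 * or 0 if there is none in that range. */
uint64_t find_factor(uint32_t p, uint32_t k_max)
{
    assert(p >= 2);
    for (uint64_t k = 1; k <= k_max; k++) {
        uint64_t q = 2 * k * p + 1;
        if (q >= ((uint64_t)1 << 32))
            break;                      /* outside this sketch's range */
        if (!survives_sieve(q))
            continue;                   /* rejected cheaply on the "CPU side" */
        if (pow2_mod(p, q) == 1)
            return q;                   /* q divides 2^p - 1 */
    }
    return 0;
}
```

For example, `find_factor(11, 10)` returns 23, since 2^11 - 1 = 2047 = 23 × 89. The more candidates the sieve step throws away, the less work is left for the exponentiation step, which is why a faster or deeper sieve costs CPU time that P95 would otherwise get.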

2012-07-16, 23:08   #29
TObject
Feb 2012
3^4×5 Posts

Quote:
Originally Posted by Dubslow
Now, the other MAJOR piece of advice: please Please PLEASE set each mfaktc instance to run on a specific core. When I ran P95 on three cores alongside one mfaktc, without setting mfaktc to a specific core, the P95 workers got about 80% efficiency and the fourth core got 50%. I figured out (with help on the forum) how to pin it, and both P95 and mfaktc performance went up dramatically. Neither the Windows nor the Linux scheduler is designed to handle such specific cases. In Windows, you can set the "affinity" of each process on the Processes tab of Task Manager by right-clicking the process you want and choosing "Set Affinity...".

This needs to be placed somewhere in the GPU FAQ. I just came across this piece of advice, tried it, and it is indeed very important.

2012-07-27, 09:52   #30
NormanRKN
Jul 2012
Saarland / Germany
2^2·17 Posts

2^?
What is the max exponent I can use in mfaktc?

2012-07-27, 10:18   #31
LaurV
Romulan Interpreter
"name field"
Jun 2011
Thailand
41·251 Posts

Bigger than you need. Theoretically unlimited. As the exponent goes higher, fewer candidates have to be tested for each bit level, and the amount of work that needs to be done for the same bit level is lower. Practically, there might be a hardware cap somewhere at 64, or 80, or 96 bits, who cares? It would not make much sense to go over 10G.

The range below 1G is handled by PrimeNet, the higher ranges by OBD projects. Apart from strange experiments, there will be absolutely NO NEED to go higher than 10G.

A better question would be "what is the maximum bit level" for a factor. That is somewhere around 92 or 96 bits, again bigger than one needs at current hardware speeds.
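
To put a number on "fewer candidates for each bit level": any factor of 2^p - 1 has the form q = 2kp + 1, so the candidates between 2^(b-1) and 2^b correspond to k running from about 2^(b-1)/(2p) to 2^b/(2p), which is roughly 2^(b-2)/p values before any sieving. A rough C sketch of that count follows. The function name is illustrative only, and it deliberately ignores sieving and the q mod 8 condition.

```c
#include <assert.h>

/* Approximate number of k values with 2^(bits-1) < 2*k*p + 1 < 2^bits,
 * i.e. the raw factor candidates in one bit level before sieving.
 * Uses repeated doubling so no math library is needed. */
double raw_candidates(double p, int bits)
{
    assert(p > 0 && bits >= 2);
    double two_pow = 1.0;                 /* becomes 2^(bits-2) */
    for (int i = 0; i < bits - 2; i++)
        two_pow *= 2.0;
    return two_pow / p;                   /* (2^bits - 2^(bits-1)) / (2p) */
}
```

Doubling the exponent halves the count for a given bit level, and each extra bit level doubles it, which matches the point above that the same bit level is cheaper at higher exponents.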

2012-07-27, 10:21   #32
axn
Jun 2003
2^3·683 Posts
I think, as currently designed, mfaktc has a hard limit of a 32-bit exponent (2^32-1). There might be other internal limitations that reduce this even further. Try running mfaktc with 4294967291, the largest 32-bit prime, and see if it accepts it.
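
If you want to double-check that 4294967291 (that is, 2^32 - 5) really is the largest prime that fits in 32 bits, plain trial division is enough. A small C sketch, with function names made up for this post:

```c
#include <assert.h>
#include <stdint.h>

/* Trial division by odd numbers up to sqrt(n); cheap for one 32-bit value. */
int is_prime_u32(uint32_t n)
{
    if (n < 2)
        return 0;
    if (n % 2 == 0)
        return n == 2;
    for (uint32_t d = 3; (uint64_t)d * d <= n; d += 2)
        if (n % d == 0)
            return 0;
    return 1;
}

/* Walk down from 2^32 - 1 until a prime is found. */
uint32_t largest_prime_u32(void)
{
    uint32_t n = UINT32_MAX;
    while (!is_prime_u32(n))
        n--;
    return n;
}

/* Confirms the value quoted in the post. */
void check_quoted_value(void)
{
    assert(largest_prime_u32() == 4294967291u);
}
```

Whether mfaktc actually accepts that exponent is a separate question; this only confirms the number itself.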

2012-07-27, 10:26   #33
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Quote:
Originally Posted by NormanRKN
2^?
What is the max exponent I can use in mfaktc?

Quote:
Originally Posted by LaurV
Bigger than you need. Theoretically unlimited. As the exponent goes higher, fewer candidates have to be tested for each bit level, and the amount of work that needs to be done for the same bit level is lower. Practically, there might be a hardware cap somewhere at 64, or 80, or 96 bits, who cares? It would not make much sense to go over 10G.

The range below 1G is handled by PrimeNet, the higher ranges by OBD projects. Apart from strange experiments, there will be absolutely NO NEED to go higher than 10G.

A better question would be "what is the maximum bit level" for a factor. That is somewhere around 92 or 96 bits, again bigger than one needs at current hardware speeds.

Quote:
Originally Posted by axn
I think, as currently designed, mfaktc has a hard limit of a 32-bit exponent (2^32-1). There might be other internal limitations that reduce this even further. Try running mfaktc with 4294967291, the largest 32-bit prime, and see if it accepts it.

I found a limit on the size of k in the valid assignment checker.

Code:
int ret = 1;
  
       if(exp < 1000000)      {ret = 0; if(verbosity >= 1)printf("WARNING: exponents < 1000000 are not supported!\n");}
  else if(!isprime(exp))      {ret = 0; if(verbosity >= 1)printf("WARNING: exponent is not prime!\n");}
  else if(bit_min < 1 )       {ret = 0; if(verbosity >= 1)printf("WARNING: bit_min < 1 doesn't make sense!\n");}
  else if(bit_min > 94)       {ret = 0; if(verbosity >= 1)printf("WARNING: bit_min > 94 is not supported!\n");}
  else if(bit_min >= bit_max) {ret = 0; if(verbosity >= 1)printf("WARNING: bit_min >= bit_max doesn't make sense!\n");}
  else if(bit_max > 95)       {ret = 0; if(verbosity >= 1)printf("WARNING: bit_max > 95 is not supported!\n");}
  else if(((double)(bit_max-1) - (log((double)exp) / log(2.0F))) > 63.9F) /* this leave enough room so k_min/k_max won't overflow in tf_XX() */
                              {ret = 0; if(verbosity >= 1)printf("WARNING: k_max > 2^63.9 is not supported!\n");}
With those limits, the maximum exponent is somewhere around 1.15081×10^9, or just north of 1G, or just over a quarter of axn's limit.

http://www.wolframalpha.com/input/?i...1.15081*10%5E9
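
For anyone who wants to see where 1.15081×10^9 comes from, here is the arithmetic, reading the last check in the quoted code literally (p stands for exp, and log(exp)/log(2.0) is the base-2 logarithm). An assignment passes that check when:

```latex
\[
(\mathtt{bit\_max} - 1) - \log_2 p \le 63.9
\quad\Longleftrightarrow\quad
p \ge 2^{\,\mathtt{bit\_max} - 64.9}
\]
\[
\mathtt{bit\_max} = 95:\qquad
p \ge 2^{30.1} = 2^{30}\cdot 2^{0.1} \approx 1.15081 \times 10^{9}
\]
```

So 1.15081×10^9 is the exponent at which this check binds for the largest allowed bit_max of 95, and the threshold halves for each bit that bit_max is lowered. Note the direction of the inequality: taken on its own, this condition rejects exponents below the threshold for a given bit_max rather than above it, so it may be worth testing a larger exponent directly before treating the figure as a hard ceiling.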

Edit: It's probably not that hard to modify the code to go higher, but I'll let axn/TheJudger/other knowledgeable persons speak to that.

Last fiddled with by Dubslow on 2012-07-27 at 10:30
All times are UTC.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
