#771
Mar 2010
6338 Posts
From what I've heard, 4.0 is total bullshift, except inline ptx assembly support.
#772
"Mike"
Aug 2002
2·23·179 Posts
No, but we will now!
#773
Dec 2010
Monticello
5×359 Posts
But I need to be careful, or I'm gonna get myself signed up to audit for this problem and maybe rewrite the readme.txt a bit.
#774
"Oliver"
Mar 2005
Germany
11×101 Posts
#775
Mar 2003
Melbourne
5×103 Posts
#776
"Mike"
Aug 2002
8234₁₀ Posts
We raised "SievePrimes" but the GPU load dropped dramatically. The CPU usage did not change (the core tied to the instance was already at 100%), which makes sense if each instance is tied to one core. We expected system memory use to rise, but it remained stable; 2GB (!) in a box would be very usable, with plenty of headroom.

If we ran two instances, could we tie two cores to each instance? At this point we would rather have (if they sold them) a silly fast dual core CPU than a moderately fast (3.3GHz) quad core. (Again, we are not going to overclock.) Turbo mode (3.7GHz) never kicks in for us. We doubt the GTX 580 would be dramatically ($150) better. For fun tonight we are going to play with "SievePrimes" to see if we can keep each instance/core from running flat out. We still have not decided whether to run 2 or 3 instances. We did, however, turn in 740 GHz-days of work today, which we think represents more than a day but less than two days of work.
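As a rough cross-check of the "more than a day but less than two" estimate, using the throughput figure quoted a few posts down (one GHz-day per 2 min 40 s, i.e. 160 seconds), 740 GHz-days works out to about 1.37 days of wall time:

```shell
# Sanity check: 740 GHz-days at one GHz-day per 160 seconds
# (the rate reported in post #779), expressed in days of wall time.
awk 'BEGIN { printf "%.2f days\n", 740 * 160 / 86400 }'
```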
#777
"Oliver"
Mar 2005
Germany
11×101 Posts
No, one core per instance (no multithreading in the CPU part).

Oliver
#778
Oct 2010
191 Posts
mfaktc, built against CUDA 4.0, runs fine on my 64-bit Linux box.
Code:
mfaktc v0.16p1

Compiletime options
  THREADS_PER_GRID_MAX      1048576
  THREADS_PER_BLOCK         256
  SIEVE_SIZE_LIMIT          32kiB
  SIEVE_SIZE                193154bits
  SIEVE_SPLIT               250
  VERBOSE_TIMING            disabled
  MORE_CLASSES              enabled

Runtime options
  SievePrimes               25000
  SievePrimesAdjust         1
  NumStreams                3
  CPUStreams                3
  WorkFile                  worktodo.txt
  Checkpoints               enabled
  Stages                    enabled
  StopAfterFactor           bitlevel
  PrintMode                 0

CUDA device info
  name                      GeForce GTX 470
  compute capability        2.0
  maximum threads per block 1024
  number of multiprocessors 14 (448 shader cores)
  clock rate                1215MHz

CUDA version info
  binary compiled for CUDA  4.0
  CUDA driver version       4.0
  CUDA runtime version      4.0

Automatic parameters
  threads per grid          917504

running a simple selftest...
Selftest statistics
  number of tests           31
  successfull tests         31

selftest PASSED!

Code:
Selftest statistics
  number of tests           4914
  successfull tests         4914

selftest PASSED!

Last fiddled with by Ralf Recker on 2011-04-23 at 08:36. Reason: Added mfaktc output
#779
"Mike"
Aug 2002
10000000101010₂ Posts
For fun, we measured the throughput of our current system. It turns out to be 1 GHz-day every 2 minutes and 40 seconds.

Question: If we wanted to run all of the instances in one directory, is there a way to specify individual "worktodo.txt" and "results.txt" files per instance?
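Until someone who knows this build answers: the output in post #778 shows a per-instance "WorkFile" option, but it is not clear whether results files can be redirected too, so the safe workaround is one working directory per instance. A sketch (the binary name and paths are assumptions, and the launch line is only echoed here, not executed):

```shell
# Workaround sketch: give each instance its own directory so each one
# reads its own worktodo.txt and writes its own results.txt.
# "mfaktc-win-64.exe" is a hypothetical path, not verified.
for i in 1 2; do
  mkdir -p "instance$i"
  echo "would launch: cd instance$i && ../mfaktc-win-64.exe"
done
```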
#780
"Mike"
Aug 2002
2×23×179 Posts
We are still stuck on Windows, so we were mucking about and remembered the "start" command and its "/affinity" option. (In Linux we never give processor or core affinity much thought.)

Anyways, with and without core affinity we see two different profiles:

1 - Two cores pegged and at ~65°C; two cores idle and at ~56°C.
2 - All four cores share the load, all at ~60°C.

We have attached two images from Task Manager.

Questions:

1 - Is pinning individual cores better than spreading the load across all of them? Which two cores should one choose? Is it a problem that the pinned cores run hotter?
2 - Without an instance tied to a core, is a lot of efficiency lost to context switching?
#781
Mar 2003
Melbourne
1003₈ Posts
To preempt another question: how does affinity work in Windows? I had trouble finding suitable help on the topic. By trial and error I found the affinity value to be a hexadecimal bitmask. If you number your CPU's cores 0, 1, 2, 3, the mask is 1 to run on core 0, 2 to run on core 1, then 4, 8, and so on. The mask can be set to 3 for affinity to both core 0 and core 1, but mfaktc can only take advantage of one core at a time. If you have a hyper-threaded CPU with 4 real cores and 4 virtual cores, the masks become 1, 4, 10, 40 (hex) to put the processes on different real cores; mfaktc needs a real core.

Also, I run mfaktc at low priority so the system and any other processes aren't affected, and I install bash from Cygwin to get a unix-y shell. The bash script for me becomes:

Code:
#!/bin/bash
AFFINITY=`cat limit.affinity`
cmd.exe /C start /low /affinity $AFFINITY ./mfaktc-win-64.exe

-- Craig
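Those hex masks can also be computed rather than memorized. A small helper (hypothetical, not part of mfaktc or the script above) that follows the rule just described, bit N for a plain quad core and bit 2N for the real cores of a 4-core/8-thread HT CPU:

```shell
# Hypothetical helper: print the hex affinity mask for a given core.
#   ht=0: plain quad core,   core N -> bit N   (masks 1, 2, 4, 8)
#   ht=1: HT CPU, real core N -> bit 2N        (masks 1, 4, 10, 40)
mask_for_core () {
  local core=$1 ht=$2
  if [ "$ht" -eq 1 ]; then
    printf '%x\n' $(( 1 << (2 * core) ))
  else
    printf '%x\n' $(( 1 << core ))
  fi
}

mask_for_core 3 0   # plain quad core, core 3 -> 8
mask_for_core 3 1   # HT CPU, real core 3 -> 40
```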
Similar Threads

Thread | Thread Starter | Forum | Replies | Last Post
mfakto: an OpenCL program for Mersenne prefactoring | Bdot | GPU Computing | 1676 | 2021-06-30 21:23
The P-1 factoring CUDA program | firejuggler | GPU Computing | 753 | 2020-12-12 18:07
gr-mfaktc: a CUDA program for generalized repunits prefactoring | MrRepunit | GPU Computing | 32 | 2020-11-11 19:56
mfaktc 0.21 - CUDA runtime wrong | keisentraut | Software | 2 | 2020-08-18 07:03
World's second-dumbest CUDA program | fivemack | Programming | 112 | 2015-02-12 22:51