mersenneforum.org  

2011-09-15, 13:23   #1200
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
3·2,141 Posts

Quote:
They both run with the same Sieve Prime setting. (And auto-adjusting is on.)
This does not occur in Windows. With the same affinities and Prime95 running 1-6, I still get around 165-170 with mfaktc.
The CPU numbering isn't necessarily the same in Windows and Linux; it's entirely possible that Windows has the pairing of hyperthreads as (12)(34)(56)(78) whilst, on my i7/920, I know that Linux has (18)(27)(35)(46).

Could you post the output of 'grep apicid /proc/cpuinfo'?

2011-09-16, 05:25   #1201
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

Hmm... Since I booted back to Windows to get back to the really fast rates, suddenly Windows only gets 75-80M/s... I didn't change anything!!!! What the hell?!?!? :(
Also, in Windows, it increases Sieve Primes massively until the average wait is down to a decent number; that doesn't happen in Linux with 75-80M/s rate. So confused x(
EDIT: Let me clarify. Currently, mfaktc-lin is running at ~90M/s with Sieve Primes=5000 (adjust=1), and avg. wait is <100 us. Most recently in Windows, mfaktc started at 90M/s, 5000 SP, and 3000+ us avg. wait, then shifted to 75M/s, 15000+ SP, and 300-400 us avg. wait after enough classes completed. I don't know why or how this behavior changed from the 170+ M/s; as far as I know, I didn't change anything.

Imma boot to Linux now and run that command. On the other hand, I remember testing it, and it seemed pretty clear to me that it was (12)(34)(56)(78), even though MPrime detects (15)(26)(37)(48). Huh.

This computer hates me.

EDIT:
Code:
apicid		: 0
initial apicid	: 0
apicid		: 2
initial apicid	: 2
apicid		: 4
initial apicid	: 4
apicid		: 6
initial apicid	: 6
apicid		: 1
initial apicid	: 1
apicid		: 3
initial apicid	: 3
apicid		: 5
initial apicid	: 5
apicid		: 7
initial apicid	: 7
Could you please explain what exactly APIC id is?

Also also, each OS seems incapable of reading the checkpoint files of the other. Should I be converting those too?

Last fiddled with by Dubslow on 2011-09-16 at 05:45

2011-09-16, 10:25   #1202
TheJudger
"Oliver"
Mar 2005
Germany
11×101 Posts

Hi,

Quote:
Originally Posted by Dubslow
Also also, each OS seems incapable of reading the checkpoint files of the other. Should I be converting those too?
This is not a CR/CRLF/LF problem. For safety reasons, mfaktc refuses to load checkpoint files written by a version other than itself, and the Windows and Linux versions have different version strings.

I might change this in the future but not now.
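
To illustrate the kind of check described above, here is a minimal Python sketch. It is not mfaktc's actual code or file format: the `version;last_class` layout, the `Checkpoint` type and the version strings are all hypothetical, chosen only to show a loader that rejects files written by a build with a different version string.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Checkpoint:
    version: str     # version string of the build that wrote the file
    last_class: int  # last class that was fully processed


def parse_checkpoint(text: str) -> Checkpoint:
    """Parse the hypothetical 'version;last_class' layout (not mfaktc's real one)."""
    version, last_class = text.strip().split(";")
    return Checkpoint(version=version, last_class=int(last_class))


def load_checkpoint(text: str, running_version: str) -> Optional[Checkpoint]:
    """Return the checkpoint only if the same version string wrote it."""
    checkpoint = parse_checkpoint(text)
    if checkpoint.version != running_version:
        # Written by a different build (e.g. the other OS): refuse to resume.
        return None
    return checkpoint
```

With such a check, a file tagged `demo-linux` would be ignored by a build that identifies itself as `demo-windows`, so the assignment would restart from the first class instead of resuming.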

Oliver

2011-09-16, 10:52   #1203
fivemack
(loop (#_fork))
Feb 2006
Cambridge, England
1100100010111₂ Posts

Quote:
Originally Posted by Dubslow
Code:
apicid		: 0
initial apicid	: 0
apicid		: 2
initial apicid	: 2
apicid		: 4
initial apicid	: 4
apicid		: 6
initial apicid	: 6
apicid		: 1
initial apicid	: 1
apicid		: 3
initial apicid	: 3
apicid		: 5
initial apicid	: 5
apicid		: 7
initial apicid	: 7
Could you please explain what exactly APIC id is?
It's a hardware identifier for the processors; the two hyperthreads on processor n have IDs 2n and 2n+1, so what you're seeing is (15)(26)(37)(48), as mprime has picked up. I've no idea how Linux assigns the processor numbers that it uses ... I'm reasonably confident that, on an idle single-chip machine, a job running four threads will get them on four distinct processors rather than having two run as hyperthreads on the same processor, but on more complicated systems it doesn't seem to work as well as I would hope.
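
The pairing can be read off mechanically. The Python sketch below applies the 2n / 2n+1 rule to the pasted grep output. It makes three assumptions: the `apicid` lines appear in logical-CPU order, CPUs are numbered from 1 to match the (15)(26) notation used in this thread (Linux itself counts from 0), and the 2n / 2n+1 rule holds for this particular chip, which is not guaranteed for every processor. The function name `pair_hyperthreads` is made up for the example.

```python
import re
from collections import defaultdict
from typing import List, Tuple


def pair_hyperthreads(grep_output: str) -> List[Tuple[int, ...]]:
    """Group logical CPUs (numbered from 1) that share a physical core."""
    # Match only lines that start with "apicid"; "initial apicid" is skipped.
    apic_ids = [
        int(match.group(1))
        for match in re.finditer(r"^apicid\s*:\s*(\d+)", grep_output, re.MULTILINE)
    ]
    cores = defaultdict(list)
    for cpu_number, apic_id in enumerate(apic_ids, start=1):
        # Assumed rule from the post: hyperthreads of core n have IDs 2n and 2n+1.
        cores[apic_id // 2].append(cpu_number)
    return [tuple(cpus) for _, cpus in sorted(cores.items())]


if __name__ == "__main__":
    posted_ids = [0, 2, 4, 6, 1, 3, 5, 7]  # order of the apicid lines quoted above
    text = "".join(f"apicid\t\t: {i}\ninitial apicid\t: {i}\n" for i in posted_ids)
    print(pair_hyperthreads(text))  # [(1, 5), (2, 6), (3, 7), (4, 8)]
```

On a real machine the text could come from reading `/proc/cpuinfo` directly, since the regular expression ignores every line that does not begin with `apicid`.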

2011-09-16, 14:59   #1204
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C35₁₆ Posts

Urrgggg....
Now Windows is back to 165-170M/s again and I still haven't changed anything...
In fact, I haven't even shut it down in any way since the last time it was at 75M/s...
Thanks fivemack, I'll look into modifying my MPrime settings.

2011-09-16, 21:14   #1205
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

RRRRRRRAAAAAAAAAGGGGGGGGGEEEEEEEEEE

mfaktc was going at 165M/s. I closed and immediately reopened it, and then it went down to 70M/s again. WHAT CHANGEDa
sdfnva'jub;gvsopofIZJX
IOHNS
io'j;JUB;;kbguxhdvc

2011-09-16, 21:51   #1206
TheJudger
"Oliver"
Mar 2005
Germany
457₁₆ Posts

Dubslow: just to be sure
  • precompiled binary or self-compiled?
  • settings in mfaktc.ini (if not default)?

Did you check whether your GPU runs in the highest performance mode? On Windows you can use GPU-Z to check the actual GPU speed (Tab: Sensors). On Linux you can start nvidia-settings to view the actual performance mode.

Oliver

2011-09-17, 09:09   #1207
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts

Precompiled, 1.7. I couldn't compile anything in Windows for the life of me. I think I had 4 streams, SP=5000. Note: normally I keep SPAdjust=1, but when it's at 90M/s, the avg. wait skyrockets, so the SP goes up, which winds up reducing the avg. rate to 75M/s. So I decided to keep SP fixed at 5000 until I can get back up to the previous level, i.e. 165-170M/s.
Code:
SievePrimes=5000

# Set this to 1 to enable automatically adjustments of SievePrimes during
# runtime based on the "average wait times".

SievePrimesAdjust=0

# Set the number of CUDA streams / data sets used by mfaktc.
# NumStreams must be >= 1. In this case mfaktc can process one stream /
# data set on the GPU while the GPU can preprocess the other one. When
# NumStreams is >= 2 than the time needed to upload (CPU->GPU transfer)
# the data sets can be hidden (if the hardware supports it (most GPUs are
# supporting this)).
# On Linux systems 2 or 3 seems a good numbers. There are comments that
# Windows systems need a greater number of streams.
# A greater number increases the memory consumed by mfaktc (host and GPU
# memory). The current limit for the number of streams is 10!

NumStreams=4

# Set the number of data sets which can be preprocessed on CPU. This allows
# to tolerate more jitter on runtime of preprocessing and GPU stream
# runtime.

CPUStreams=4

# The GridSize affects the number of threads per grid.
# Depending on the number of multiprocessors of your GPU, too, the
# automatic parameter threads per grid is set to:
#   GridSize = 0:  65536 < threads per grid <=  131072
#   GridSize = 1: 131072 < threads per grid <=  262144
#   GridSize = 2: 262144 < threads per grid <=  524288
#   GridSize = 3: 524288 < threads per grid <= 1048576 (default)
# A smaller GridSize has more overhead than a bigger GridSize for long
# running jobs. For really small jobs there can be a small benefit on
# computation time if the GridSize is small. A smaller GridSize directly
# reduces the runtime per kernel launch and might result in a better
# interactivity one older GPUs.

GridSize=3

# WorkFile: the name of the file which contains the factoring assignments.
# e.g.
# worktodo.ini (Prime95 v24 and earlier)
# worktodo.txt (Prime95 v25 and newer)

WorkFile=worktodo.txt

# Checkpoints = 0: disable checkpoints
# Checkpoints = 1: enable checkpoints
# Checkpoints are needed for resume capability, after a class is finished a
# checkpoint file is written. When mfaktc is interrupted during the run and
# restarted later it will begin at the last processed class.

Checkpoints=1

# Allow to split an assignment into multiple bit ranges.
# 0 = disabled
# 1 = enabled
# Enabled Stages make only sense when StopAfterFactor is 1 or 2.

Stages=0

# possible values for StopAfterFactor:
# 0: Do not stop the current assignment after a factor was found.
# 1: When a factor was found for the current assignment stop after the
#    current bitlevel. This makes only sense when Stages is enabled.
# 2: When a factor was found for the current assignment stop after the
#    current class.

StopAfterFactor=2

# possible values for PrintMode:
# 0: print a new line for each finished class
# 1: overwrite the current line (more compact output)

PrintMode=1

# allow the CPU to sleep if nothing can be preprocessed?
# 0: Do not sleep if the CPU must wait for the GPU
# 1: The CPU can sleep for a short time if it has to wait for the GPU

AllowSleep=0
The frustrating thing is that I didn't change anything.

I do know it's not related to GPU clock rate. (I use MSI Afterburner in Windows.) When it's at 165M/s, changing the clock rate directly affects the avg. rate. I just tested the 90M/s half rate now: I verified it was somewhat OC'ed. I killed mfaktc, reset the clock to stock, restarted mfaktc, got exactly the same avg. rate (90M/s), then upped the rate back to slight OC (with mfaktc running this time) and the avg. rate didn't budge. (I have previously tested that avg. rate changes with clock rate changes, while mfaktc is running.) This seems to indicate to me that it's something funky with the program. GPU load is 86-87%, same as it was when running at 165M/s (I got that reading sometime last week.)

EDIT: Oh my god Windows/Linux I swear there's no extra lines in mfaktc.cfg or whatever it's called. lol, that's what you guys said :P:)
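
For anyone who wants to sanity-check a pasted mfaktc.ini like the one above, a small Python sketch follows. It only encodes limits that the file's own comments state (NumStreams between 1 and 10, GridSize 0 to 3, the 0/1 switches, and the Stages / StopAfterFactor pairing). The function names are invented for the example, and treating a missing Stages or StopAfterFactor as 0 is an assumption, not documented mfaktc behaviour.

```python
from typing import Dict, List


def parse_ini(text: str) -> Dict[str, str]:
    """Collect key=value pairs, skipping blank lines, comments and stray text."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


def check_settings(settings: Dict[str, str]) -> List[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    allowed = {
        "NumStreams": range(1, 11),      # ">= 1", "current limit ... is 10"
        "GridSize": range(0, 4),         # documented values 0..3
        "Checkpoints": range(0, 2),
        "Stages": range(0, 2),
        "StopAfterFactor": range(0, 3),
        "PrintMode": range(0, 2),
    }
    problems = []
    for key, valid in allowed.items():
        if key in settings and int(settings[key]) not in valid:
            problems.append(f"{key}={settings[key]} is outside {valid.start}..{valid.stop - 1}")
    # Assumption: a missing key counts as 0 for the cross-check below.
    stages = int(settings.get("Stages", 0))
    stop = int(settings.get("StopAfterFactor", 0))
    if stages == 1 and stop == 0:
        problems.append("Stages=1 only makes sense with StopAfterFactor 1 or 2")
    if stop == 1 and stages == 0:
        problems.append("StopAfterFactor=1 only makes sense with Stages=1")
    return problems
```

Run against the values posted here (NumStreams=4, GridSize=3, Stages=0, StopAfterFactor=2, and so on), this flags nothing, which fits the point that the slowdown is not caused by the settings.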

Last fiddled with by Dubslow on 2011-09-17 at 09:13

2011-09-17, 09:21   #1208
Dubslow
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3·29·83 Posts

Hmm, just rebooted, now it's back to 170M/s, though as I said a few posts back, it's not just the rebooting...
How can I restart the graphics driver without rebooting?
Also, GPU-Z now shows 75-76% load, not the 85% I said earlier.

2011-09-17, 09:50   #1209
Ralf Recker
Oct 2010
191 Posts

Quote:
Originally Posted by Dubslow
I do know it's not related to GPU clock rate. (I use MSI Afterburner in Windows.) When it's at 165M/s, changing the clock rate directly affects the avg. rate. I just tested the 90M/s half rate now: I verified it was somewhat OC'ed. I killed mfaktc, reset the clock to stock, restarted mfaktc, got exactly the same avg. rate (90M/s), then upped the rate back to slight OC (with mfaktc running this time) and the avg. rate didn't budge.
This is probably the driver downclock bug.

2011-09-17, 14:34   #1210
Karl M Johnson
Mar 2010
3·137 Posts

Quote:
Originally Posted by Ralf Recker
This is probably the driver downclock bug.
Yes, exiting a CUDA application incorrectly causes the top performance level's core clock to get stuck at 410 MHz.
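
One rough way to spot a stuck clock from a script is sketched below in Python. It assumes a driver whose `nvidia-smi` supports `--query-gpu` with the `clocks.sm` and `clocks.max.sm` fields, which may not be available for every card or driver release. The 90% threshold and the function names are arbitrary choices for illustration, and the 800 MHz maximum in the usage note is a made-up figure, not the clock of any card in this thread.

```python
import subprocess
from typing import Tuple

# Assumed query: current and maximum SM clock in MHz, one CSV line per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=clocks.sm,clocks.max.sm",
    "--format=csv,noheader,nounits",
]


def parse_clocks(csv_line: str) -> Tuple[int, int]:
    """Turn a line such as '410, 800' into (current_mhz, max_mhz)."""
    current, maximum = (int(field.strip()) for field in csv_line.split(","))
    return current, maximum


def looks_downclocked(current_mhz: int, max_mhz: int, min_fraction: float = 0.9) -> bool:
    """True if the current clock is well below the maximum (threshold is arbitrary)."""
    return current_mhz < min_fraction * max_mhz


def read_clocks() -> Tuple[int, int]:
    """Query the first GPU; needs nvidia-smi on PATH and a supporting driver."""
    output = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
    return parse_clocks(output.splitlines()[0])
```

The check is only meaningful while mfaktc is running, because an idle GPU lowers its clocks on purpose. For example, `looks_downclocked(*parse_clocks("410, 800"))` returns True.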

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.