mersenneforum.org  

Old 2010-07-10, 14:01   #309
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Hi,

I just re-read the CUDA documentation about streams... it is a bad idea to hope that the independent streams are executed in a specific order.

I'll try to fix this in 0.10.

Oliver
Old 2010-07-10, 17:41   #310
Karl M Johnson
 
Mar 2010

3·137 Posts

Quote:
Originally Posted by ckdo View Post
ERROR: cudaGetLastError() returned 8: invalid device function
A simple "--gpu-architecture sm_10" flag while compiling fixes that.
Old 2010-07-10, 19:27   #311
Ethan (EO)
 
"Ethan O'Connor"
Oct 2002
GIMPS since Jan 1996

2²·23 Posts

Quote:
Originally Posted by TheJudger View Post
Hi Ethan,

my first guess is that you modify those h_ktabs before they are uploaded completely to the GPU.
Ah, good call! In fact I was always writing the next h_ktab directly on top of the one most recently queued for transfer, so this would be a very likely occurrence. The failures are non-deterministic, which also points to the same problem.
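
One way to picture a fix, as a minimal sketch and not mfaktc's actual code: keep several host buffers and refuse to refill one until its upload has been reported complete. The plain C model below only tracks the bookkeeping and makes no CUDA calls. `ktab_ring`, `NUM_KTABS` and the three functions are hypothetical names, and the buffer count of 3 is an assumption. In a real CUDA program the "complete" step would correspond to something like `cudaStreamSynchronize()` returning for the stream that issued the `cudaMemcpyAsync()`.

```c
#include <stddef.h>

#define NUM_KTABS 3  /* assumption: one host buffer per stream */

/* Bookkeeping for a ring of host buffers (hypothetical model, no CUDA calls). */
typedef struct {
    int in_flight[NUM_KTABS];  /* 1 while an upload from this buffer is pending */
    size_t next;               /* buffer to try on the next acquire */
} ktab_ring;

/* Return the index of a buffer that is safe to overwrite,
 * or -1 if the next buffer in the ring is still being uploaded. */
int ktab_ring_acquire(ktab_ring *r)
{
    size_t i = r->next;
    if (r->in_flight[i])
        return -1;             /* caller must wait for the transfer first */
    r->next = (i + 1) % NUM_KTABS;
    return (int)i;
}

/* Mark buffer i as queued for upload (right after the async copy is issued). */
void ktab_ring_submit(ktab_ring *r, int i)
{
    r->in_flight[i] = 1;
}

/* Mark buffer i as reusable (after the transfer is known to have finished). */
void ktab_ring_complete(ktab_ring *r, int i)
{
    r->in_flight[i] = 0;
}
```

With this kind of bookkeeping, writing the next h_ktab on top of one that is still queued for transfer shows up as a -1 from `ktab_ring_acquire()` instead of as silent, non-deterministic corruption.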
Old 2010-07-13, 00:04   #312
msft
 
Jul 2009
Tokyo

2×5×61 Posts
Benchmark in GTX 460

Hi TheJudger,
Quote:
Originally Posted by TheJudger View Post
Here is mfaktc 0.09!
My GTX 460 is the cheaper version:
Inno3D N460-1DDN-G5GW
GDDR5 768MB (192bit-bus)
core clock:675MHz
mem clock:3600MHz (real clock:900MHz)

Thank you,
Attached Files
File Type: gz log.txt.gz (1.3 KB, 88 views)
Old 2010-07-13, 09:01   #313
TheJudger
 
"Oliver"
Mar 2005
Germany

2127₈ Posts

Hi msft,

thank you for the benchmark.

I can see that the GTX 460 is at least as fast as my GTX 275 for the 75-bit kernel. It still seems to be CPU-limited (SievePrimes too high).
Can you rerun it with the mfaktc.ini in the current directory (it seems to be missing in your run, mfaktc.ini contains the runtime parameters)?

Thank you!

Oliver
Old 2010-07-13, 09:31   #314
msft
 
Jul 2009
Tokyo

1001100010₂ Posts

Quote:
Originally Posted by TheJudger View Post
Can you rerun it with the mfaktc.ini in the current directory (it seems to be missing in your run, mfaktc.ini contains the runtime parameters)?
rerun.
Attached Files
File Type: gz mfaktc.log.gz (1.2 KB, 84 views)
Old 2010-07-13, 10:14   #315
TheJudger
 
"Oliver"
Mar 2005
Germany

457₁₆ Posts

Thank you, msft!

So for the 75-bit kernel we have
~175% of the performance of my (mildly factory-overclocked) GTX 275
~55% of the performance of a GTX 480

The 1GiB variant of the GTX 460 shouldn't change those timings; memory bandwidth isn't very important for mfaktc.
There are already some GTX 460 cards with a 1600MHz shader clock rate (1350MHz default).

You made my decision easy... I'll upgrade to a GTX 460!
According to reviews on the web, the GTX 460 consumes less power than my GTX 275 and generates less noise.

Oliver
Old 2010-07-13, 11:17   #316
Karl M Johnson
 
Mar 2010

3·137 Posts

msft, I wonder why your GTX 460 has only 224 unified shaders.
Should be 336.
Could mfaktc be wrong?
Old 2010-07-13, 11:50   #317
Aillas
 
Oct 2002
France

3³×5 Posts
nvidia Quadro doesn't work

Hi,

I compiled mfaktc 0.09 (with CUDA 3.1), but the program doesn't run.

Config:
Ubuntu 10.04
nvidia driver 256.35
CUDA 3.1
GPU: NVIDIA Quadro NVS 140M

When I run mfakt -st, here is the result:

Code:
Compiletime Options
  THREADS_PER_GRID    983040
  THREADS_PER_BLOCK   256
  SIEVE_SIZE_LIMIT    32kiB
  SIEVE_SIZE          230945bits
  VERBOSE_TIMING      disabled
  MORE_CLASSES        disabled

Runtime Options
  SievePrimes         25000
  SievePrimesAdjust   1
  NumStreams          3
  WorkFile            worktodo.txt
  Checkpoints         enabled

CUDA device info
  name:                      Quadro NVS 140M
  compute capabilities:      1.1
  maximum threads per block: 512
  number of multiprocessors: 2 (16 shader cores)
  clock rate:                800MHz

cudaStreamCreate() failed
Any suggestions?

PS: I also tried the compilation option --gpu-architecture=sm_10.

Thanks
Old 2010-07-13, 11:53   #318
msft
 
Jul 2009
Tokyo

2·5·61 Posts

Quote:
Originally Posted by Karl M Johnson View Post
Could mfaktc be wrong?
Maybe.
Quote:
if(deviceinfo.major == 1)i=8; /* device with compute capability 1.x have 8 shader cores per multiprocessor */
else if(deviceinfo.major == 2)i=32; /* assuming 32 shader cores per multiprocessor for compute capability 2.x */
printf(" number of multiprocessors: %d (%d shader cores)\n", deviceinfo.multiProcessorCount, deviceinfo.multiProcessorCount * i);
Need "48"...
Old 2010-07-13, 12:23   #319
TheJudger
 
"Oliver"
Mar 2005
Germany

11×101 Posts

Hi Aillas,

Quote:
Originally Posted by Aillas View Post
Code:
Ubuntu 10.04
nvidia driver 256.35
CUDA 3.1
GPU: NVIDIA Quadro NVS 140M
...
cudaStreamCreate() failed
Any suggestion ?
no, not really.
Did you try the examples from the CUDA SDK?
---
hi msft, Karl

Quote:
Originally Posted by msft View Post
Maybe.

Need "48"...
Yes, I need to adjust this. But it is only a cosmetic error: I calculate the number of shader cores only for display, and nothing depends on this calculation.
I thought that some users might feel uncomfortable with the number of multiprocessors ("Hey, my GPU has e.g. 256 cores, why does mfaktc only show 32 (multiprocessors)?").
The calculation of shader cores was easy before Fermi: just multiply the number of multiprocessors by 8.
Now Nvidia has other configurations, too...
32 cores per multiprocessor (compute capability 2.0 / GTX 465/470/480)
48 cores per multiprocessor (compute capability 2.1 / GTX 460)
But Nvidia doesn't disclose the number of cores per multiprocessor before they launch the products... But again, it is just a cosmetic issue!
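
A possible shape for the adjusted display code, as a sketch only and not the change that went into mfaktc: look up the cores per multiprocessor from both the major and the minor compute capability, and print no core count when the combination is unknown. The function names `cores_per_multiprocessor` and `print_multiprocessors` are made up for illustration; the per-multiprocessor values are the ones listed above.

```c
#include <stdio.h>

/* Shader cores per multiprocessor for a given compute capability.
 * Values as quoted in this thread; returns 0 for unknown combinations. */
int cores_per_multiprocessor(int major, int minor)
{
    if (major == 1)
        return 8;                   /* all compute capability 1.x devices */
    if (major == 2 && minor == 0)
        return 32;                  /* GTX 465/470/480 */
    if (major == 2 && minor == 1)
        return 48;                  /* GTX 460 */
    return 0;                       /* not known at the time of writing */
}

/* Print the multiprocessor line; omit the core count if it is unknown. */
void print_multiprocessors(int mp_count, int major, int minor)
{
    int per_mp = cores_per_multiprocessor(major, minor);

    if (per_mp > 0)
        printf("  number of multiprocessors: %d (%d shader cores)\n",
               mp_count, mp_count * per_mp);
    else
        printf("  number of multiprocessors: %d (shader cores unknown)\n",
               mp_count);
}
```

For a GTX 460 reporting 7 multiprocessors this gives 7 × 48 = 336 shader cores, where the 32-per-multiprocessor assumption gave the 224 that Karl noticed.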

Oliver
All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.