mersenneforum.org  


Old 2015-07-25, 22:26   #1310
glebm
 
"Gleb Mazovetskiy"
Jul 2015
London, UK

22 Posts
Default

Fixing int sign change warnings had no effect. Since GPU sieve works, I set GWDEBUG in gpusieve.cl and got lots of output like this:
Code:
sloppy_mod_p: p doesn't match pinv!! p = 713509, pinv = 6018 (should be 6019)
When running without any arguments (a simple selftest) I get:
Code:
ERROR: selftest failed for M1031831 (cl_barrett15_69_gs)
  no factor found
Selftest statistics
  number of tests           30
  successful tests          29
  no factor found           1

selftest FAILED!
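The numbers in the debug line can be checked by hand. The sketch below assumes that pinv is meant to approximate 2^32 / p; that is a guess based on the values in the message, not something confirmed from gpusieve.cl, and the helper names are made up for illustration. It shows that the quotient for p = 713509 lies just below 6019.5, so plain flooring gives 6019, while flooring after subtracting one half gives 6018. Those are the two values in the message.

```python
# Assumption: pinv approximates 2**32 / p. This is inferred from the values in
# the debug message, not taken from the mfakto source.

def floor_inverse(p: int, bits: int = 32) -> int:
    """floor(2**bits / p), computed with exact integer arithmetic."""
    return (1 << bits) // p


def floor_after_minus_half(p: int, bits: int = 32) -> int:
    """floor(2**bits / p - 1/2), computed with exact integer arithmetic."""
    return ((2 << bits) - p) // (2 * p)


p = 713509
remainder = (1 << 32) % p

print(floor_inverse(p))           # 6019
print(floor_after_minus_half(p))  # 6018
print(remainder, p / 2)           # the remainder is just under p / 2
```

Because the remainder is only slightly smaller than p / 2, two rounding conventions that usually agree differ by one for this prime. Whether that is what actually trips the check is an open question.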
Old 2015-08-07, 04:09   #1311
maxa
 
Jul 2015

116 Posts
Default Fiji GPU

When I try to run mfakto on an R9 Fury X it tells me:

Code:
 
Select device - Get device info - Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Fiji (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto1.ini to select a GPU type yourself to avoid this
 warning.
OpenCL device info
  name                      Fiji (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (1800.8) (1800.8 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 64 (4096 compute elements)
  clock rate                1050MHz
Automatic parameters
  threads per grid          256
  optimizing kernels for    GCN
Started a simple selftest ...
######### testcase 4/17 (M50863909[69-70]) #########
mfakto will exit once the current test is finished.
and then hangs in testcase 4/17.
Is there any setting I can change to make it work or does it require an updated version of mfakto?
Old 2015-08-09, 19:23   #1312
Bdot
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by glebm
The latest source from GitHub, compiled with Visual Studio 2013 and App SDK 3.0 Beta and running on Tonga (R9 380), fails both selftests (0.14 works).
Version 0.14 is currently the latest stable one. Many of the features for 0.15 are not finished yet, so the version from GitHub cannot be used right now (and that is not because of the unsigned warnings).

I need to find some time to fix the new features ...
Old 2015-08-09, 19:30   #1313
Bdot
 
Nov 2010
Germany

1125₈ Posts
Default

Quote:
Originally Posted by maxa
When I try to run mfakto on an R9 Fury X it tells me:

Code:
 
Select device - Get device info - Loading binary kernel file mfakto_Kernels.elf
Compiling kernels.
WARNING: Unknown GPU name, assuming GCN. Please post the device name "Fiji (Advanced Micro Devices, Inc.)" to http://www.mersenneforum.org/showthread.php?t=15646 to have it added to mfakto. Set GPUType in mfakto1.ini to select a GPU type yourself to avoid this
 warning.
OpenCL device info
  name                      Fiji (Advanced Micro Devices, Inc.)
  device (driver) version   OpenCL 2.0 AMD-APP (1800.8) (1800.8 (VM))
  maximum threads per block 256
  maximum threads per grid  16777216
  number of multiprocessors 64 (4096 compute elements)
  clock rate                1050MHz
Automatic parameters
  threads per grid          256
  optimizing kernels for    GCN
Started a simple selftest ...
######### testcase 4/17 (M50863909[69-70]) #########
mfakto will exit once the current test is finished.
and then hangs in testcase 4/17.
Is there any setting I can change to make it work or does it require an updated version of mfakto?
I'll add Fiji to the list of known GCN chips. It would be good to have some performance test so I know how to best use the chip, but I guess for that the other issue needs to be fixed first:

Could you please run
Code:
 mfakto -i mfakto1.ini -st2
This should show exactly which kernel it hangs in.

And maybe give the perftestmfakto.cmd from http://mersenneforum.org/mfakto/mfak...o-0.15pre5.zip a chance - if it does not stop right away, the output might be helpful.
Old 2015-08-09, 19:54   #1314
Bdot
 
Nov 2010
Germany

3·199 Posts
Default

Quote:
Originally Posted by Ethan (EO)
I've pulled the card now, but did manage to run the 0.15pre5 benchmarking script first; results attached.
Thank you for the HD6950 Cayman results.
Code:
Resulting speed for M66362159:
bit_min - bit_max  GHz-days/day  kernelname
     60 -      69       209.504  cl_barrett15_69_gs  
     69 -      70       196.994  cl_barrett15_71_gs  
     70 -      74       185.303  cl_barrett15_74_gs  
     74 -      77       162.719  cl_barrett32_77_gs  
     77 -      88       147.887  cl_barrett32_88_gs  
     88 -      92       117.277  cl_barrett32_92_gs
They confirm that the burnt electricity is probably not worth it - there are way faster and more efficient cards these days. The 205 GHz-days/day that are listed on http://www.mersenne.ca/mfaktc.php for the 6950 are probably a bit optimistic and can only be achieved when factoring up to 69 bits.
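The comparison with the listed 205 GHz-days/day figure can be checked directly against the benchmark rows above. A small sketch, with the rows copied from the output and the variable names invented for illustration:

```python
# Figure listed for the HD 6950, as quoted in the post (GHz-days/day).
LISTED = 205.0

# (bit_min, bit_max, GHz-days/day, kernel name), copied from the output above.
RESULTS = [
    (60, 69, 209.504, "cl_barrett15_69_gs"),
    (69, 70, 196.994, "cl_barrett15_71_gs"),
    (70, 74, 185.303, "cl_barrett15_74_gs"),
    (74, 77, 162.719, "cl_barrett32_77_gs"),
    (77, 88, 147.887, "cl_barrett32_88_gs"),
    (88, 92, 117.277, "cl_barrett32_92_gs"),
]

# Kernels whose measured rate reaches the listed figure.
meets_listed = [name for (_, _, rate, name) in RESULTS if rate >= LISTED]

for bit_min, bit_max, rate, name in RESULTS:
    print(f"{bit_min:>2}-{bit_max:<2} {name:<20} {rate:8.3f}  "
          f"{rate / LISTED:.0%} of listed")

print("reaches the listed figure:", meets_listed)
```

Only the 60 to 69 bit range reaches the listed figure in this run; every deeper range falls below it.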
Old 2015-08-22, 14:03   #1315
airsquirrels
 
"David"
Jul 2015
Ohio

11·47 Posts
Default

I am seeing something interesting and unexpected with mfakto and multiple GPUs. On one much older host system, the PCIe links are negotiating down to 5GT/s x16 for the primary card and 5GT/s x4 for the secondary. Surprisingly, this has a pretty massive effect on mfakto's performance. That is strange because mfakto should not need much bandwidth to the cards at all and, as far as I understand the code, should only be making one (trivial) call for each class.

These are Fury X cards, which normally get around 1000 GHz-days/day at 8GT/s x16 PCIe speeds.

I see a few different tiers of speed that directly correlate with the negotiated PCIe speed, all factoring with cl_barrett15_73_gs_2.

PCIe speed     GHz-days/day
8GT/s x16      1000
5GT/s x16       890
5GT/s x8        690
5GT/s x4        480
2.5GT/s x4      190

Increasing the number of streams did not seem to have any appreciable effect.
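
For scale, the nominal bandwidth of each link in the table above can be worked out from the transfer rate, lane count, and line encoding. The sketch below is not part of mfakto; the function name and the tuple layout are made up for illustration. It relies on the standard PCIe encodings (8b/10b at 2.5 and 5 GT/s, 128b/130b at 8 GT/s) and pairs each nominal bandwidth with the throughput reported above.

```python
def link_bandwidth_gb_s(gt_per_s: float, lanes: int) -> float:
    """Nominal one-direction PCIe bandwidth in GB/s, before protocol overhead."""
    # 2.5 and 5 GT/s links use 8b/10b encoding; 8 GT/s links use 128b/130b.
    efficiency = 128 / 130 if gt_per_s >= 8.0 else 8 / 10
    return gt_per_s * lanes * efficiency / 8  # divide by 8: bits to bytes


# (GT/s, lanes, reported GHz-days/day), copied from the table above.
REPORTED = [
    (8.0, 16, 1000),
    (5.0, 16, 890),
    (5.0, 8, 690),
    (5.0, 4, 480),
    (2.5, 4, 190),
]

best = REPORTED[0][2]
for gt_per_s, lanes, ghz_days in REPORTED:
    bandwidth = link_bandwidth_gb_s(gt_per_s, lanes)
    print(f"{gt_per_s:>4} GT/s x{lanes:<2}  {bandwidth:6.2f} GB/s  "
          f"{ghz_days:>5} GHz-days/day  {ghz_days / best:.0%} of best")
```

On these numbers, going from x16 to x8 at 5 GT/s halves the nominal bandwidth but costs only about 22% of the throughput, so the slowdown is not simply proportional to link bandwidth.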

Last fiddled with by airsquirrels on 2015-08-22 at 14:03
Old 2015-08-22, 17:35   #1316
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3²·29·37 Posts
Default

That is strange! I always have a mixture of x8 and x16 cards and don't see any difference in output for my gtx580s...
Are you sure you are calling mfaktX with the appropriate "-d gpu_number"?

Last fiddled with by LaurV on 2015-08-22 at 17:36
Old 2015-08-22, 19:01   #1317
airsquirrels
 
"David"
Jul 2015
Ohio

1000000101₂ Posts
Default

Yes, I can verify that the correct card is initialized, and the performance matches the PCIe link speed even if the other card is idle. With my slower cards this is not noticeable until 5GT/s x4 or even 2.5GT/s x4, but the Fury X sees the penalty much earlier. I might try Windows instead of Debian to see whether the Catalyst driver version matters.

I have switched the two cards and the performance definitely follows the slot. This is true in two different systems as well, so that rules out the motherboard/chipset. I will say both of these systems have AMD CPUs, but my Intel systems are all much faster overall, with plenty of PCIe lanes.

I did notice that we actually do multiple kernel schedulings per class as it works through the K range, but adjusting the sieve size parameters changed performance an order of magnitude less than the PCIe slot did.
Old 2015-08-22, 19:21   #1318
chalsall
If I May
 
"Chris Halsall"
Sep 2002
Barbados

9,767 Posts
Default

Quote:
Originally Posted by airsquirrels
I have switched the two cards and the performance definitely follows the slot. This is true in two different systems as well, so that rules out the motherboard/chipset.
"The most exciting phrase to hear in science, the one that heralds new discoveries, is not 'Eureka!' but 'That's funny...'" -- Isaac Asimov
Old 2015-08-23, 05:53   #1319
axn
 
Jun 2003

11732₈ Posts
Default

Quote:
Originally Posted by airsquirrels
Yes, I can verify that the correct card is initialized, and the performance matches the PCIe link speed even if the other card is idle. With my slower cards this is not noticeable until 5GT/s x4 or even 2.5GT/s x4, but the Fury X sees the penalty much earlier. I might try Windows instead of Debian to see whether the Catalyst driver version matters.

I have switched the two cards and the performance definitely follows the slot. This is true in two different systems as well, so that rules out the motherboard/chipset. I will say both of these systems have AMD CPUs, but my Intel systems are all much faster overall, with plenty of PCIe lanes.

I did notice that we actually do multiple kernel schedulings per class as it works through the K range, but adjusting the sieve size parameters changed performance an order of magnitude less than the PCIe slot did.
I guess it doesn't hurt to ask. Are you using GPU Sieve or CPU Sieve?
Old 2015-08-23, 14:37   #1320
LaurV
Romulan Interpreter
 
Jun 2011
Thailand

3²×29×37 Posts
Default

Quote:
Originally Posted by axn
I guess it doesn't hurt to ask. Are you using GPU Sieve or CPU Sieve?
Bingo, good question! (As usual, axn gets right to the point!)

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.