mersenneforum.org  

Old 2019-10-04, 09:21   #2795
GhettoChild
 
"Ghetto_Child"
Jul 2014
Montreal, QC, Canada

41 Posts

I changed GPU clock/power settings during a test and corrupted the results. I didn't back up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the last good residue output? The only save file I have is from right when the corruption started, so it always produces suspicious identical residues and eventually an illegal residue error.
Old 2019-10-05, 00:27   #2796
Prime95
P90 years forever!
 
 
Aug 2002
Yeehaw, FL

2×43×83 Posts

Quote:
Originally Posted by GhettoChild View Post
I changed GPU clock/power settings during a test and corrupted the results. I didn't back up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the last good residue output? The only save file I have is from right when the corruption started, so it always produces suspicious identical residues and eventually an illegal residue error.
I'm afraid you are out of luck. Do you use an Internet backup service that might have a save file prior to the corruption?

This might be a good time to look into gpuowl. It is virtually immune to hardware errors. It runs PRP tests instead of LL tests, so it cannot be used to double-check earlier LL results.
Old 2019-10-05, 13:19   #2797
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

31·149 Posts

Quote:
Originally Posted by GhettoChild View Post
I changed GPU clock/power settings during a test and corrupted the results. I didn't back up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the last good residue output? The only save file I have is from right when the corruption started, so it always produces suspicious identical residues and eventually an illegal residue error.
In CUDALucas.ini:
Code:
# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder"
# option. This is a binary option; set to 1 to activate, 0 to de-activate.

SaveAllCheckpoints=1

# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.

SaveFolder=savefiles
If SaveAllCheckpoints was set to 1 for the exponent run, there would be lots of earlier save files to revert to and try to continue from, in the savefiles directory.

If you don't have savefiles, all you have are the cx and tx files for the exponent x, plus anything you can find in system backups, the recycle bin, or manually made copies from before a change. The c file is the most recent checkpoint; the t file is the one preceding it. It sounds like you've already gone through the drill of copying the c and t files, then attempting to resume. Sometimes the t file is good and the c file needs to be removed or renamed out of the way. Sometimes both are bad before a problem is found; that situation is what SaveAllCheckpoints is for. The downside of saving all checkpoints is that it fills a lot of disk space over time.
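A minimal sketch of the "copy the c and t files first" drill, in C. The c<exponent> and t<exponent> names match the sprintf calls in CUDALucas.cu; the helper names copy_file and backup_checkpoints and the .bak suffix are my own assumptions, not part of CUDALucas.

```c
#include <stdio.h>

/* Copy src to dst byte for byte. Returns 0 on success, -1 on any failure. */
int copy_file(const char *src, const char *dst)
{
    FILE *in = fopen(src, "rb");
    if (!in) return -1;
    FILE *out = fopen(dst, "wb");
    if (!out) { fclose(in); return -1; }

    char buf[65536];
    size_t n;
    int rc = 0;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0) {
        if (fwrite(buf, 1, n, out) != n) { rc = -1; break; }
    }
    if (ferror(in)) rc = -1;
    fclose(in);
    if (fclose(out) != 0) rc = -1;
    return rc;
}

/* Hypothetical helper: copy c<q> and t<q> in the current directory to
 * c<q>.bak and t<q>.bak before trying to resume. Returns how many of
 * the two files were copied (0, 1 or 2). */
int backup_checkpoints(int q)
{
    const char prefixes[2] = { 'c', 't' };
    char src[32], dst[40];
    int copied = 0;

    for (int i = 0; i < 2; i++) {
        snprintf(src, sizeof src, "%c%d", prefixes[i], q);
        snprintf(dst, sizeof dst, "%s.bak", src);
        if (copy_file(src, dst) == 0) copied++;
    }
    return copied;
}
```

Run it from the CUDALucas working directory with the program stopped. If the resume attempt goes wrong, the .bak copies can be renamed back to c<q> or t<q>.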

CUDALucas is good but does not have even the Jacobi check with its 50% probability of detecting an error. CUDALucas will run on even old NVIDIA gpus with CUDA compute capability as low as 2.0.

GpuOwl PRP3 has the superior Gerbicz check which has almost 100% error detection and rollback / resume from last known good state. While GpuOwl was originally developed for AMD gpus, V6.5 and later will run on some NVIDIA gpus, but not the older ones. (CUDA compute 2.x, 3.0 fail to run gpuowl in my experience.) GpuOwl V6.5 keeps checkpoints at every 20M iterations. GpuOwl was switched at V6.8 to not saving checkpoint files every 20M iterations, so it now keeps only x.owl and x-prev.owl, analogous to CUDALucas cx and tx. https://www.mersenneforum.org/showpo...83&postcount=7
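For anyone wondering how the Gerbicz check catches errors, here is a toy sketch in C. It is not GpuOwl's code: the function name gerbicz_toy, its parameters, and the tiny modulus are assumptions for illustration only. The idea is that while repeatedly squaring u (starting from 3) modulo N = 2^p - 1, you also keep a running product d of every L-th value of u. After each block of L squarings, d * u must equal 3 * d^(2^L) mod N; a mismatch means something went wrong since the last good check.

```c
#include <stdint.h>

/* Modular multiply; safe without 128-bit math because n < 2^31 here. */
static uint64_t mulmod(uint64_t a, uint64_t b, uint64_t n)
{
    return (a * b) % n;
}

/* Toy Gerbicz check on the squaring chain u -> u^2 mod (2^p - 1), u0 = 3.
 * p must be between 3 and 31 so that products fit in 64 bits.
 * L is the block length, blocks the number of blocks to run.
 * corrupt_at: iteration number (1-based) at which to inject an error by
 * adding 1 to u; pass 0 for a clean run.
 * Returns 1 if every block check passes, 0 at the first mismatch. */
int gerbicz_toy(unsigned p, unsigned L, unsigned blocks, long corrupt_at)
{
    const uint64_t n = (1ULL << p) - 1;
    const uint64_t u0 = 3 % n;
    uint64_t u = u0;   /* current value of the squaring chain */
    uint64_t d = u0;   /* running product of u at block boundaries */
    long iter = 0;

    for (unsigned t = 0; t < blocks; t++) {
        for (unsigned i = 0; i < L; i++) {
            u = mulmod(u, u, n);
            iter++;
            if (iter == corrupt_at) u = (u + 1) % n;  /* simulated hardware error */
        }
        /* Two ways to get the next product; they agree only if u is right. */
        uint64_t d_next = mulmod(d, u, n);
        uint64_t check = d;
        for (unsigned i = 0; i < L; i++) check = mulmod(check, check, n);
        check = mulmod(check, u0, n);

        if (check != d_next) return 0;  /* error detected: roll back here */
        d = d_next;
    }
    return 1;
}
```

With p = 7, L = 2 and two blocks, a clean run returns 1, while injecting an error at iteration 1 or 3 returns 0. A real implementation performs the expensive check far less often than once per block and works with multi-million-bit numbers, but the identity being tested is the same.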

Last fiddled with by kriesel on 2019-10-05 at 13:33
Old 2019-11-14, 12:56   #2798
ATH
Einyen
 
 
Dec 2003
Denmark

B98₁₆ Posts

I got a Tesla P100 on Google Colab and I compiled CUDALucas again. I can run -cufftbench without problems.

But if I run -threadbench with any range, or -r 0, or -r 1, or just start CUDALucas on an exponent, I get:
*** buffer overflow detected ***

-threadbench runs all the way through the test and fails at the very end without creating the *threads*.txt file.


It is compiled for Compute Capability 6.0, which is what the P100 uses.

Last fiddled with by ATH on 2019-11-14 at 13:33
Old 2019-11-26, 06:13   #2799
wfgarnett3
 
 
"William Garnett III"
Oct 2002
Bensalem, PA

2×43 Posts
CUDA10.1 and CUDA9.2 versions slower than CUDA8.0 on my setup

The CUDA10.1 and CUDA9.2 Windows x64 builds of CUDALucas 2.06 (with their respective libraries), from the official SourceForge link at the bottom of this post, run slower on my GPU than the CUDA8.0 build with its libraries.

For the 57593359 exponent I am manually testing for mersenne.org the CUDA8.0 version is about 10.7ms per iteration while the CUDA9.2 AND CUDA10.1 versions are about 12.3ms per iteration. (all are 3136 FFT with same CUDALucas.ini file)
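To put those per-iteration times in perspective, here is a small C sketch of the arithmetic. It assumes only the standard fact that an LL test of 2^q - 1 takes q - 2 squarings; the function names ll_runtime_days and slowdown_percent are made up for this example, and the estimate ignores startup and checkpoint overhead.

```c
/* Estimated wall-clock days for a full LL test of exponent q at a
 * steady per-iteration time in milliseconds (q - 2 iterations). */
double ll_runtime_days(int q, double ms_per_iter)
{
    return (double)(q - 2) * ms_per_iter / 1000.0 / 86400.0;
}

/* How much slower other_ms is than base_ms, in percent. */
double slowdown_percent(double base_ms, double other_ms)
{
    return (other_ms / base_ms - 1.0) * 100.0;
}
```

For 57593359 this works out to roughly 7.1 days at 10.7 ms per iteration versus roughly 8.2 days at 12.3 ms, a slowdown of about 15%.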

The previous 2.05.1 CUDA8.0 build of CUDALucas (with its libraries), which I was using from your website until yesterday, has the same 10.7 ms per-iteration time as the 2.06 CUDA8.0 build, so something is up with the CUDA9.2 and CUDA10.1 builds of CUDALucas 2.06 on my setup.

The info about my GPU setup is listed below.

Can someone tell me why there is a significant slowdown with these newer versions?

EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5)
Part number: 02G-P4-6152-KR

Dell Desktop Tower with Windows 10
Intel i3-4150 @ 3.5GHz
Memory: 8.00 GB

http://sourceforge.net/projects/cudalucas/files/
Old 2019-11-26, 07:25   #2800
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

31×149 Posts

Quote:
Originally Posted by wfgarnett3 View Post
Can someone tell me why there is a significant slow down with these newer versions?

EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5)
Newer software isn't always better or faster for a given card. Its function is to support newly introduced cards so they sell and the company makes money. CUDA 8 was introduced with the GTX10xx family. Later CUDA versions, later card models. Which CUDA version is fastest within usable limits on a given card varies with fft length. See https://www.mersenneforum.org/showpo...47&postcount=8 for an example.
Old 2020-03-21, 20:25   #2801
saviourz
 
Mar 2020

32 Posts

I am compiling CUDALucas 2.06 from SourceForge (https://sourceforge.net/projects/cudalucas/files/). After running make, I used ./CUDALucas -r 1 to test whether the build is reliable.

Unfortunately, every residue came out as [0000000000000].

My OS is Ubuntu 18.04, and I have already changed the CUDA path in the makefile and set --generate-code arch=compute_60,code=sm_60.

Message shown at the top of the ./CUDALucas output:
binary compiled for CUDA 10.20
CUDA runtime version 10.20
CUDA driver version 10.20

GPU type is Tesla V100-PCIE and driver version is 440.33.01.

I have read all related posts to my question but none of them can solve my problem. What's more, I set worktodo.txt as "Test=79437629" and got
"Illegal residue: 0x0000000000000000. See mersenneforum.org for help.".

Thanks in advance for any replies, and sorry if I posted in the wrong place.

P.S. I changed the CUDA version to 9.1 and used the pre-compiled Linux CUDALucas from https://download.mersenne.ca/CUDALucas/old. The output looks good: no zero residues appear during the run.

But I am still trying to figure out my compilation problem. Open to any advice!
Old 2020-03-22, 00:00   #2802
saviourz
 
Mar 2020

32 Posts

Finally, I compiled a working executable on a Tesla M6 with CUDA 10.20. More details are in this thread: https://www.mersenneforum.org/showth...418#post540418 .
Old 2020-05-26, 13:06   #2803
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

1001000001011₂ Posts

Given a cudalucas save file or several for a very large exponent, each at a round number of many millions of iterations, that presumably does not have the res64 encoded into the file name, and no corresponding log file, is there a straightforward way of obtaining the res64s from the save files, at the round numbers of iterations? (Asking for a friend who's trying to do me a favor, but probably does not want to run millions of iterations again to get to the next round numbers.) Opening some much smaller exponents' interim save files in a text editor, there's nothing human readable there, no ASCII header record or footer.

Looking at old logs, I see CUDALucas does not output the stored res64 of a save file when resumed. (Gpuowl does that, which is a nice feature.)

Looking at old source code, I see in CUDALucas.cu,
Code:
void write_checkpoint(unsigned *x_packed, int q, unsigned long long residue)
{
  FILE *fPtr;
  char chkpnt_cfn[32];
  char chkpnt_tfn[32];
  int end = (q + 31) / 32;

  sprintf (chkpnt_cfn, "c%d", q);
  sprintf (chkpnt_tfn, "t%d", q);
  (void) unlink (chkpnt_tfn);
  (void) rename (chkpnt_cfn, chkpnt_tfn);
  fPtr = fopen (chkpnt_cfn, "wb");
  if (!fPtr)
  {
    fprintf(stderr, "Couldn't write checkpoint.\n");
    return;
  }
  x_packed[end + 8] = magic_number(x_packed, q);
  x_packed[end + 9] = checkpoint_checksum((char*) x_packed, 4 * (end + 9));
  fwrite (x_packed, 1, sizeof (unsigned) * (end + 10), fPtr);
  fclose (fPtr);
  if (g_sf > 0)            // save all checkpoint files
  {
    char chkpnt_sfn[64];
    char test[64];
#ifndef _MSC_VER
    sprintf (chkpnt_sfn, "%s/s" "%d.%d.%016llx", g_folder, q, x_packed[end + 2] - 1, residue);
    sprintf (test, "%s/%s", g_folder, ".empty.txt");
#else
    sprintf (chkpnt_sfn, "%s\\s" "%d.%d.%016llx.cls", g_folder, q, x_packed[end + 2] - 1, residue);
    sprintf (test, "%s\\%s", g_folder, ".empty.txt");
#endif
    fPtr = NULL;
    fPtr = fopen (test, "r");
    if(!fPtr)
    {
#ifndef _MSC_VER
      mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
      if (mkdir (g_folder, mode) != 0) fprintf (stderr, "mkdir: cannot create directory `%s': File exists\n", g_folder);
#else
      if (_mkdir (g_folder) != 0) fprintf (stderr, "mkdir: cannot create directory `%s': File exists\n", g_folder);
#endif
      fPtr = fopen(test, "w");
      if(fPtr) fclose(fPtr);
    }
    else fclose(fPtr);

    fPtr = fopen (chkpnt_sfn, "wb");
    if (!fPtr) return;
    fwrite (x_packed, 1, sizeof (unsigned) * (((q + 31) / 32) + 10), fPtr);
    fclose (fPtr);
  }
}
So apparently the res64 could be dug out with a hex editor near EOF. Getting byte offset and order right in interpreting the data would be critical. Maybe practice on a tiny exponent with a known ending residue from console output.
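For anyone poking at a checkpoint with a hex editor, the write_checkpoint excerpt does pin down the file size and where the trailer words sit. Here is a sketch of that bookkeeping in C. The struct and function names are my own, and it assumes unsigned is 4 bytes, as on the usual CUDALucas platforms. What the other trailer words hold is not visible in this excerpt, so they are left unnamed.

```c
#include <stddef.h>

/* Word offsets (in 32-bit words from the start of the file) derived
 * from write_checkpoint(): end = (q + 31) / 32 packed data words,
 * followed by 10 trailer words. */
typedef struct {
    size_t data_words;     /* end */
    size_t total_words;    /* end + 10 */
    size_t file_bytes;     /* 4 * (end + 10), assuming 4-byte unsigned */
    size_t counter_word;   /* end + 2: this value minus 1 goes into the savefile name */
    size_t magic_word;     /* end + 8: magic_number(x_packed, q) */
    size_t checksum_word;  /* end + 9: checkpoint_checksum(...) */
} cl_layout;

/* Hypothetical helper: compute the layout for exponent q. */
cl_layout cl_checkpoint_layout(int q)
{
    cl_layout lay;
    lay.data_words    = (size_t)(q + 31) / 32;
    lay.total_words   = lay.data_words + 10;
    lay.file_bytes    = 4 * lay.total_words;
    lay.counter_word  = lay.data_words + 2;
    lay.magic_word    = lay.data_words + 8;
    lay.checksum_word = lay.data_words + 9;
    return lay;
}
```

As a sanity check, a checkpoint for exponent 57593359 should be 7199212 bytes under these assumptions. A file of any other size is probably truncated or belongs to a different exponent.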


edit: oops, no, I think that sprintf just builds the s<exponent>.<iteration>.<res64 in hex>.cls filename for storage in the savefiles subdirectory. I need to dig further for the regular checkpoint files, their contents, and the makeup of x_packed.
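If the files were written with SaveAllCheckpoints on, the res64 is already in the file name, so it can be read back without opening the file. A minimal C sketch, mirroring the "s%d.%d.%016llx" format string in write_checkpoint; the function name parse_savefile_name is an assumption, and it expects the bare file name without the folder part.

```c
#include <stdio.h>

/* Parse a savefile name of the form s<exponent>.<iteration>.<res64 hex>
 * (with an optional .cls suffix, as written on Windows).
 * Returns 0 and fills q, iter and res64 on success, -1 otherwise. */
int parse_savefile_name(const char *name, int *q, int *iter,
                        unsigned long long *res64)
{
    /* %16llx reads at most the 16 hex digits of the residue, so a
     * trailing ".cls" is simply left unread. */
    if (sscanf(name, "s%d.%d.%16llx", q, iter, res64) == 3)
        return 0;
    return -1;
}
```

Regular c<exponent> and t<exponent> checkpoints do not match this pattern, so the function returns -1 for them; for those the residue has to come from the file contents or a log.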

Last fiddled with by kriesel on 2020-05-26 at 13:43
Old 2020-05-28, 17:08   #2804
LaurV
Romulan Interpreter
 
 
Jun 2011
Thailand

3·2,957 Posts

Yes. Rename it to cXXX (c followed by the exponent), put it in the CUDALucas folder, and run CUDALucas on it for 20 seconds with the checkpoint interval set to 2k or so. It will produce a properly named checkpoint file in seconds. Why do you always take the most complicated path to crack it?
Digging out the res64 with a hex editor is shift-dependent. Two files with different shift have different content. But there was a tool to extract the res64 from a file, I used it in the past.

Last fiddled with by LaurV on 2020-05-28 at 17:12
Old 2020-05-30, 21:04   #2805
kriesel
 
 
"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

120B₁₆ Posts

Quote:
Originally Posted by LaurV View Post
Yes. Rename it cXXX blah blah and put it in culu folder and run culu on it for 20 seconds with checkpoint set to 2k or so. It will produce a proper named checkpoint file in seconds.
Digging out the res64 with a hex editor is shift-dependent. Two files with different shift have different content. But there was a tool to extract the res64 from a file, I used it in the past.
I don't have the checkpoint files myself, and for a large exponent they are large and numerous.

Running them onward from their current state, a multiple of 10M iterations, would be nontrivial even if I had them, and I don't.
What is the tool you used, and where can I, and the holder of the checkpoint files in question, find it?

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.