mersenneforum.org CUDALucas (a.k.a. MaclucasFFTW/CUDA 2.3/CUFFTW)

 2019-10-04, 09:21 #2795 GhettoChild   "Ghetto_Child" Jul 2014 Montreal, QC, Canada 41 Posts I changed GPU clock/power settings during a test and corrupted the results. I hadn't backed up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the good residue output? The only save file I have is from right when the corruption started, so it always results in suspicious identical residues until eventually an illegal residue error.
2019-10-05, 00:27   #2796
Prime95
P90 years forever!

Aug 2002
Yeehaw, FL

1110001111110₂ Posts

Quote:
 Originally Posted by GhettoChild I changed GPU clock/power settings during a test and corrupted the results. I hadn't backed up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the good residue output? The only save file I have is from right when the corruption started, so it always results in suspicious identical residues until eventually an illegal residue error.
I'm afraid you are out of luck. Do you use an Internet backup service that might have a save file prior to the corruption?

This might be a good time to look into gpuowl. It is virtually immune to hardware errors. It does PRP tests instead of LL tests so it is not good for double-checking.

2019-10-05, 13:19   #2797
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11353₈ Posts

Quote:
 Originally Posted by GhettoChild I changed GPU clock/power settings during a test and corrupted the results. I hadn't backed up the work files beforehand, so when I stopped the program some of the corruption was saved. I did save the screen output with the last 3-5 good residue results. How can I restart the test from close to the good residue output? The only save file I have is from right when the corruption started, so it always results in suspicious identical residues until eventually an illegal residue error.
In CUDALucas.ini:
Code:
# SaveAllCheckpoints is the same as the -s option. When active, CUDALucas will
# save each checkpoint separately in the folder specified in the "SaveFolder"
# option. This is a binary option; set to 1 to activate, 0 to de-activate.

SaveAllCheckpoints=1

# This option is the name of the folder where the separate checkpoint files are
# saved. This option is only checked if SaveAllCheckpoints is activated.

SaveFolder=savefiles
If SaveAllCheckpoints was set to 1 for the exponent run, there would be lots of earlier save files to revert to and try to continue from, in the savefiles directory.
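For scale, each CUDALucas save file holds about one 4-byte word per 32 bits of the exponent, roughly q/8 bytes for exponent q. A back-of-envelope sketch (the exponent 57593359 and the one-retained-file-per-million-iterations rate are illustrative assumptions, not settings from this thread):

```python
# Rough disk footprint of SaveAllCheckpoints=1; q and the retention
# interval below are illustrative assumptions.
q = 57593359                              # hypothetical exponent
file_bytes = 4 * ((q + 31) // 32 + 10)    # ~q/8 bytes per save file
files = q // 1_000_000                    # e.g. one retained s-file per 1M iterations
total_gb = files * file_bytes / 1e9
print(f"{file_bytes / 1e6:.1f} MB per file, ~{total_gb:.2f} GB for {files} files")
```

So a single wavefront-size exponent costs a few hundred megabytes over a full run at that retention rate, which is the disk-space trade-off mentioned below.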

If you don't have savefiles, all you have are the cx and tx files for exponent x, plus anything you can find in system backups, the recycle bin, or manually made copies from before a change. The c file is the most recent checkpoint; the t file is the one preceding it. It sounds like you've already gone through the drill of copying the c and t files and then attempting to resume. Sometimes the t file is good and the c file needs to be removed or renamed out of the way. Sometimes both are bad before a problem is found; that situation is what SaveAllCheckpoints is for. The downside of saving all checkpoints is that it fills a lot of disk space over time.
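The c/t recovery drill described above amounts to a few file operations. A minimal Python sketch on dummy files in a scratch directory (57593359 is a hypothetical exponent; real save files are binary):

```python
# Sketch of the recovery drill: keep a spare copy of the t-file, then move
# the suspect c-file aside so a restarted CUDALucas falls back to the t-file.
import os, shutil, tempfile

os.chdir(tempfile.mkdtemp())
for name in ("c57593359", "t57593359"):          # dummy stand-ins
    with open(name, "wb") as f:
        f.write(b"checkpoint data")

shutil.copy("t57593359", "t57593359.keep")       # backup before any resume attempt
os.rename("c57593359", "c57593359.bad")          # move suspect c-file out of the way
print(sorted(os.listdir(".")))
```

Copy-before-resume matters because a resumed run overwrites these files on its next checkpoint.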

CUDALucas is good, but it lacks even the Jacobi check, which has a 50% probability of detecting an error. CUDALucas will run on even old NVIDIA GPUs, with CUDA compute capability as low as 2.0.
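For context on that 50% figure: on an error-free LL run, the Jacobi symbol (s_i − 2 | M_p) equals −1 at every iteration, while a randomly corrupted residue satisfies that only about half the time. A toy Python sketch of the idea (not CUDALucas or Prime95 code; a tiny p for speed):

```python
def jacobi(a, n):
    """Jacobi symbol (a/n) for odd n > 0, via the standard binary algorithm."""
    a %= n
    result = 1
    while a:
        while a % 2 == 0:           # pull out factors of two
            a //= 2
            if n % 8 in (3, 5):
                result = -result
        a, n = n, a                 # reciprocity step for Jacobi symbols
        if a % 4 == 3 and n % 4 == 3:
            result = -result
        a %= n
    return result if n == 1 else 0

p = 7
M = 2**p - 1                        # M7 = 127, a tiny Mersenne prime
s = 4                               # Lucas-Lehmer: s0 = 4, s_{i+1} = s_i^2 - 2
for i in range(1, p - 2):
    s = (s * s - 2) % M
    assert jacobi(s - 2, M) == -1   # holds at every correct iteration
print("Jacobi check held at every iteration")
```

Since the check is cheap relative to an iteration, it can be run only occasionally, which is how it is used in practice.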

GpuOwl PRP3 has the superior Gerbicz check, which provides nearly 100% error detection plus rollback / resume from the last known good state. While GpuOwl was originally developed for AMD GPUs, V6.5 and later will run on some NVIDIA GPUs, but not the older ones (CUDA compute capability 2.x and 3.0 fail to run gpuowl in my experience). GpuOwl V6.5 keeps a checkpoint at every 20M iterations. GpuOwl switched at V6.8 to not saving checkpoint files every 20M iterations, so it now keeps only x.owl and x-prev.owl, analogous to CUDALucas's cx and tx. https://www.mersenneforum.org/showpo...83&postcount=7
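The Gerbicz check mentioned above rests on a simple identity: in a base-3 PRP test, if u_t is the residue after t blocks of B squarings and d_t is the running product of the u's, then d_{t+1} = 3 · d_t^(2^B) mod N, so one cheap modular exponentiation cross-checks an entire block of squarings. A toy Python sketch under small, hypothetical parameters (not GpuOwl's actual code):

```python
# Toy Gerbicz check for a base-3 PRP test; tiny numbers so it runs instantly.
p = 127                 # hypothetical exponent
N = 2**p - 1
B = 10                  # block size (real tests use far larger blocks)

u = 3                   # u_0 = 3^(2^0) mod N
d = u                   # checksum d_0 = u_0
for t in range(5):      # a few blocks
    for _ in range(B):  # B squarings advance u by one block
        u = u * u % N
    d_next = d * u % N  # d_{t+1} = d_t * u_{t+1}
    # the check: d_{t+1} must equal 3 * d_t^(2^B) mod N
    assert d_next == 3 * pow(d, 2**B, N) % N
    d = d_next
print("Gerbicz check passed for all blocks")
```

Any hardware flip in u during a block breaks the identity with overwhelming probability, which is why the program can roll back to the last state that passed.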

Last fiddled with by kriesel on 2019-10-05 at 13:33

 2019-11-14, 12:56 #2798 ATH Einyen     Dec 2003 Denmark 3007₁₀ Posts I got a Tesla P100 on Google Colab and compiled CUDALucas again. I can run -cufftbench without problems, but if I run -threadbench with any range, or -r 0 or -r 1, or just start CUDALucas on an exponent, I get: *** buffer overflow detected *** -threadbench runs all the way through the test and fails at the very end, without creating the *threads*.txt file. The binary is compiled for Compute Capability 6.0, which the P100 uses. Last fiddled with by ATH on 2019-11-14 at 13:33
 2019-11-26, 06:13 #2799 wfgarnett3     "William Garnett III" Oct 2002 Bensalem, PA 2×43 Posts CUDA10.1 and CUDA9.2 versions slower than CUDA8.0 on my setup The CUDA10.1 and CUDA9.2 -Windows-x64.exe versions of CUDALucas2.06 (with respective libraries), from the official SourceForge link listed at the bottom below, run slower on my GPU than the CUDA8.0 version with respective libraries. For the 57593359 exponent I am manually testing for mersenne.org, the CUDA8.0 version takes about 10.7 ms per iteration, while the CUDA9.2 and CUDA10.1 versions take about 12.3 ms per iteration (all at the 3136 FFT length, with the same CUDALucas.ini file). The previous version, 2.05.1_CUDA8.0 CUDALucas (with respective libraries), which I used from your website before yesterday, has the same per-iteration time of 10.7 ms as the 2.06_CUDA8.0 version, so something is up with the CUDA9.2 and CUDA10.1 versions of 2.06 CUDALucas on my setup. The info about my GPU setup is listed below. Can someone tell me why there is a significant slowdown with these newer versions? EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5) Part number: 02G-P4-6152-KR Dell Desktop Tower with Windows 10 Intel i3-4150 @ 3.5GHz Memory: 8.00 GB http://sourceforge.net/projects/cudalucas/files/
2019-11-26, 07:25   #2800
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

11353₈ Posts

Quote:
 Originally Posted by wfgarnett3 Can someone tell me why there is a significant slow down with these newer versions? EVGA GeForce GTX 1050 SC GAMING (2GB GDDR5)
Newer software isn't always better or faster for a given card. Its function is to support newly introduced cards, so they sell and the company makes money. CUDA 8 was introduced with the GTX 10xx family; later CUDA versions came with later card models. Which CUDA version is fastest within usable limits on a given card varies with FFT length. See https://www.mersenneforum.org/showpo...47&postcount=8 for an example.
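For a sense of what that per-iteration gap costs, the figures in the post above convert to whole-test time like this (a back-of-envelope sketch; an LL test of M57593359 needs about p − 2 ≈ 57.6M iterations):

```python
# Convert the reported ms/iteration into approximate total LL test time.
p = 57593359
for ms in (10.7, 12.3):                    # CUDA 8.0 vs CUDA 9.2/10.1 timings
    days = p * ms / 1000 / 86400           # iterations * seconds per iteration, in days
    print(f"{ms} ms/iter -> {days:.1f} days")
```

So the roughly 15% slowdown adds about a day to every primality test on this card, which is why picking the fastest CUDA build matters.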

 2020-03-21, 20:25 #2801 saviourz   Mar 2020 916 Posts I am compiling CUDALucas 2.06 from the sourceforge (https://sourceforge.net/projects/cudalucas/files/). After ran makefile, I used $./CUDALucas -r 1 to test it's reliable or not. Unfortunately, I got all residue [0000000000000]. My OS is Ubuntu 18.04 and I have already changed CUDA path in makefile and also --generate-code arch=compute_60, code=sm_60. Message showed on the top of ./CUDALucas: binary compiled for CUDA 10.20 CUDA runtime version 10.20 CUDA driver version 10.20 GPU type is Tesla V100-PCIE and driver version is 440.33.01. I have read all related posts to my question but none of them can solve my problem. What's more, I set worktodo.txt as "Test=79437629" and got "Illegal residue: 0x0000000000000000. See mersenneforum.org for help.". Thanks in advance for replies and sorry if I posted in wrong place. ps.I changed CUDA version to 9.1 and used Linux pre-compiled CUDALucas from https://download.mersenne.ca/CUDALucas/old. The output seems good. No 0 residue appears in the middle of the looping. But I am still trying to figure out my compilation problem. Open to any advice! 2020-03-22, 00:00 #2802 saviourz Mar 2020 32 Posts Quote:  Originally Posted by saviourz I am compiling CUDALucas 2.06 from the sourceforge (https://sourceforge.net/projects/cudalucas/files/). After ran makefile, I used$./CUDALucas -r 1 to test it's reliable or not. Unfortunately, I got all residue [0000000000000]. My OS is Ubuntu 18.04 and I have already changed CUDA path in makefile and also --generate-code arch=compute_60, code=sm_60. Message showed on the top of ./CUDALucas: binary compiled for CUDA 10.20 CUDA runtime version 10.20 CUDA driver version 10.20 GPU type is Tesla V100-PCIE and driver version is 440.33.01. I have read all related posts to my question but none of them can solve my problem. What's more, I set worktodo.txt as "Test=79437629" and got "Illegal residue: 0x0000000000000000. See mersenneforum.org for help.". 
Thanks in advance for replies and sorry if I posted in wrong place. ps.I changed CUDA version to 9.1 and used Linux pre-compiled CUDALucas from https://download.mersenne.ca/CUDALucas/old. The output seems good. No 0 residue appears in the middle of the looping. But I am still trying to figure out my compilation problem. Open to any advice!

Finally, I compiled an executable on a Tesla M6 with CUDA 10.20. More details in this thread: https://www.mersenneforum.org/showth...418#post540418 .

 2020-05-26, 13:06 #2803 kriesel     "TF79LL86GIMPS96gpu17" Mar 2017 US midwest 1001011101011₂ Posts Given a CUDALucas save file (or several) for a very large exponent, each at a round number of many millions of iterations, that presumably does not have the res64 encoded into the file name, and no corresponding log file, is there a straightforward way of obtaining the res64s from the save files at those round iteration counts? (Asking for a friend who's trying to do me a favor, but probably does not want to run millions of iterations again to get to the next round numbers.) Opening some much smaller exponents' interim save files in a text editor, there's nothing human-readable there, no ASCII header record or footer. Looking at old logs, I see CUDALucas does not output the stored res64 of a save file when resumed. (Gpuowl does, which is a nice feature.) Looking at old source code, I see in CUDALucas.cu,
Code:
void write_checkpoint(unsigned *x_packed, int q, unsigned long long residue)
{
  FILE *fPtr;
  char chkpnt_cfn[32];
  char chkpnt_tfn[32];
  int end = (q + 31) / 32;

  sprintf (chkpnt_cfn, "c%d", q);
  sprintf (chkpnt_tfn, "t%d", q);
  (void) unlink (chkpnt_tfn);
  (void) rename (chkpnt_cfn, chkpnt_tfn);
  fPtr = fopen (chkpnt_cfn, "wb");
  if (!fPtr)
  {
    fprintf (stderr, "Couldn't write checkpoint.\n");
    return;
  }
  x_packed[end + 8] = magic_number(x_packed, q);
  x_packed[end + 9] = checkpoint_checksum((char*) x_packed, 4 * (end + 9));
  fwrite (x_packed, 1, sizeof (unsigned) * (end + 10), fPtr);
  fclose (fPtr);
  if (g_sf > 0) // save all checkpoint files
  {
    char chkpnt_sfn[64];
    char test[64];
#ifndef _MSC_VER
    sprintf (chkpnt_sfn, "%s/s" "%d.%d.%016llx", g_folder, q, x_packed[end + 2] - 1, residue);
    sprintf (test, "%s/%s", g_folder, ".empty.txt");
#else
    sprintf (chkpnt_sfn, "%s\\s" "%d.%d.%016llx.cls", g_folder, q, x_packed[end + 2] - 1, residue);
    sprintf (test, "%s\\%s", g_folder, ".empty.txt");
#endif
    fPtr = NULL;
    fPtr = fopen (test, "r");
    if (!fPtr)
    {
#ifndef _MSC_VER
      mode_t mode = S_IRWXU | S_IRGRP | S_IXGRP | S_IROTH | S_IXOTH;
      if (mkdir (g_folder, mode) != 0)
        fprintf (stderr, "mkdir: cannot create directory %s': File exists\n", g_folder);
#else
      if (_mkdir (g_folder) != 0)
        fprintf (stderr, "mkdir: cannot create directory %s': File exists\n", g_folder);
#endif
      fPtr = fopen (test, "w");
      if (fPtr) fclose (fPtr);
    }
    else fclose (fPtr);
    fPtr = fopen (chkpnt_sfn, "wb");
    if (!fPtr) return;
    fwrite (x_packed, 1, sizeof (unsigned) * (((q + 31) / 32) + 10), fPtr);
    fclose (fPtr);
  }
}
So apparently the res64 could be dug out with a hex editor near EOF. Getting the byte offset and byte order right in interpreting the data would be critical. Maybe practice on a tiny exponent with a known ending residue from console output.

Edit: oops, no, I think that sprintf produces the s..cls filename for storage in the savefiles subdirectory. I need to dig further for the regular checkpoint files and their contents, and the makeup of x_packed.

Last fiddled with by kriesel on 2020-05-26 at 13:43
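One takeaway from the sprintf calls in write_checkpoint above: with SaveAllCheckpoints on, each s-file name already embeds the exponent, iteration, and interim res64 (s<q>.<iteration>.<res64 hex>, plus .cls on Windows), so for those files the res64 can be recovered by parsing the filename rather than hex-editing the contents. A small Python sketch (the example filename is hypothetical):

```python
# Parse exponent, iteration, and res64 out of a CUDALucas s-file name,
# per the sprintf format "s%d.%d.%016llx[.cls]" in write_checkpoint.
import re

def res64_from_savefile(name):
    m = re.match(r"s(\d+)\.(\d+)\.([0-9a-fA-F]{16})(\.cls)?$", name)
    if not m:
        return None   # not an s-file; c/t file names carry no res64
    return int(m.group(1)), int(m.group(2)), m.group(3).upper()

print(res64_from_savefile("s57593359.20000000.0123456789abcdef.cls"))
```

This does not help for bare c/t checkpoint files, which keep the residue only in their binary payload.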
 2020-05-28, 17:08 #2804 LaurV Romulan Interpreter     Jun 2011 Thailand 2⁴×571 Posts Yes. Rename it cXXX blah blah, put it in the culu folder, and run culu on it for 20 seconds with the checkpoint interval set to 2k or so. It will produce a properly named checkpoint file in seconds. Why do you always take the most complicated path to crack it? Digging out the res64 with a hex editor is shift-dependent; two files with different shift have different content. But there was a tool to extract the res64 from a file, I used it in the past. Last fiddled with by LaurV on 2020-05-28 at 17:12
2020-05-30, 21:04   #2805
kriesel

"TF79LL86GIMPS96gpu17"
Mar 2017
US midwest

29·167 Posts

Quote:
 Originally Posted by LaurV Yes. Rename it cXXX blah blah and put it in culu folder and run culu on it for 20 seconds with checkpoint set to 2k or so. It will produce a proper named checkpoint file in seconds. Digging out the res64 with a hex editor is shift-dependent. Two files with different shift have different content. But there was a tool to extract the res64 from a file, I used it in the past.
I don't have the checkpoint files, and they are large and numerous, for a large exponent.

Running them from their current state, a multiple of 10M iterations, would be nontrivial if I had them, and I don't.
What is the tool you used, and where can I, and the holder of the checkpoint files in question, find it?

