![]() |
[QUOTE=TheJudger;508134]Correct but not complete. The first Barrett kernel in mfaktc was BARRETT92; all other kernels are stripped-down versions.
From BARRETT92 we get BARRETT79 first (fixed inverse, multibit in a single stage possible, a bit faster). From BARRETT92 we also get BARRETT88 and BARRETT87 by (re)moving interim correction steps and some other "tricks" (loss of accuracy in interim steps; small example: 22 mod 10 yields 12 instead of 2). Trading accuracy for speed. The same "tricks" lead from BARRETT79 to BARRETT77 and BARRETT76. "-pre" versions aren't released into the wild and are not intended for productive use. Removed old stuff (CC 1.x code, CUDA compatibility < 6.5 dropped, minor changes and bugfixes). Oliver[/QUOTE]Thanks for the review, the follow-up on Barretts, and the clarification on -pre versions. I'm looking forward to updates and any bug fixes or enhancements whenever they're ready for field testing. (I can throw a variety of GPU models at it, from CC 2.0 up to a GTX 1080 Ti.) One other thing: there are _gs variations of lots of kernels. What does that _gs mean?
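The accuracy-for-speed trade described above can be illustrated with a minimal sketch (plain Python for clarity, not mfaktc's CUDA code; the 16-bit fixed-point inverse is an arbitrary choice for illustration). Barrett reduction replaces division by a multiplication with a precomputed inverse; dropping the final correction step can leave a residue that is congruent to, but larger than, the true remainder, analogous to the "22 mod 10 yields 12" example:

```python
# Illustrative sketch of Barrett reduction (NOT mfaktc code).
# Precompute m = floor(2**K / n); then q = (x*m) >> K approximates
# floor(x / n), and r = x - q*n is congruent to x mod n but may not
# be fully reduced if the final correction step is skipped.

K = 16  # precision of the fixed-point inverse (illustrative)

def barrett_no_correction(x, n, m):
    """Fast variant: result is congruent to x mod n, may exceed n."""
    q = (x * m) >> K   # approximates floor(x / n), never overestimates
    return x - q * n   # may be >= n

def barrett_corrected(x, n, m):
    """Accurate variant: adds the correction step the fast kernels drop."""
    r = barrett_no_correction(x, n, m)
    while r >= n:
        r -= n         # the interim correction step
    return r

n = 10
m = (1 << K) // n                       # floor(65536 / 10) = 6553
print(barrett_no_correction(20, n, m))  # -> 10, congruent to 0 mod 10 but not reduced
print(barrett_corrected(20, n, m))      # -> 0, fully reduced
```

For trial factoring this sloppiness is harmless in interim steps, because only the final comparison against zero has to be exact; that is why the cheaper kernels can afford it.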
[QUOTE=kriesel;508137]One other thing: there are _gs variations of lots of kernels. What does that _gs mean?[/QUOTE]
[B]G[/B]PU [B]s[/B]ieve
[QUOTE=kriesel;507954]Concepts in GIMPS trial factoring (TF) (note, sort of mfaktc oriented, more so toward the end)[/QUOTE]This would make a good entry for the wiki.
[QUOTE=Uncwilly;508271]This would make a good entry for the wiki.[/QUOTE]
It's going in one of my reference threads.
[QUOTE=Uncwilly;508271]This would make a good entry for the wiki.[/QUOTE]
Feel free to link to [url]https://www.mersenneforum.org/showpost.php?p=508523&postcount=6[/url] from the wiki.
Ugh... I'm trying to compile mfaktc on Windows now. Instead of Visual Studio 2012, I got Visual Studio 2017 (Community). File paths are all over the place, and it really took a while to find all the extra bits needed so that the compile job would run through. But it seems that installing C++/CLI support, then finding and running vcvars64.bat, finally did the trick.
I already installed MinGW earlier for other purposes and thus had GNU make. Also installed CUDA Toolkit 10.0.130. Still, after a successful compile, the executable gives this error (I've also included the last bits of info printed by the program): [CODE]CUDA version info
  binary compiled for CUDA   10.0
  CUDA runtime version       10.0
  CUDA driver version        10.0

CUDA device info
  name                       GeForce RTX 2060
  compute capability         7.5
  max threads per block      1024
  max shared memory per MP   65536 byte
  number of multiprocessors  30
  clock rate (CUDA cores)    1830MHz
  memory clock rate:         7001MHz
  memory bus width:          192 bit

Automatic parameters
  threads per grid           983040
  GPUSievePrimes (adjusted)  82486
  GPUsieve minimum exponent  1055144

running a simple selftest...
ERROR: cudaGetLastError() returned 8: invalid device function[/CODE] Which is strange, since I added this in the Makefile: [CODE]NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing[/CODE] And it seems to generate 7.5 code during the compilation process. The same thing also happens if I replace code=sm_75 with code=compute_75 to enable just-in-time compilation. It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that, too, but not right now :smile:
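As background on this error: "invalid device function" usually means the fat binary contains no code the device can use, either native code (code=sm_XY) matching the device's compute capability, or PTX (code=compute_XY) with XY low enough that the driver can JIT-compile it. A simplified model of those standard CUDA fat-binary rules (my own sketch, not CUDA or mfaktc code; in the post above the flags were actually right and the culprit turned out to be the toolchain, as the follow-up shows):

```python
# Simplified model of CUDA fat-binary kernel selection (assumption:
# standard rules - exact/compatible native code within the same major
# compute capability, or JIT-compilable PTX of an equal or lower CC).

def kernel_loads(embedded, device_cc):
    """embedded: list like [("sm", 75), ("compute", 60)];
    device_cc: e.g. 75 for compute capability 7.5."""
    for kind, cc in embedded:
        # Native cubins are forward-compatible only within one major CC.
        if kind == "sm" and cc // 10 == device_cc // 10 and cc <= device_cc:
            return True
        # PTX of an equal or lower CC can be JIT-compiled by the driver.
        if kind == "compute" and cc <= device_cc:
            return True
    return False  # -> cudaGetLastError() returns "invalid device function"

print(kernel_loads([("sm", 60)], 75))                   # False: Pascal-only binary on Turing
print(kernel_loads([("sm", 60), ("compute", 60)], 75))  # True: PTX fallback via JIT
```

This is why the Makefile grows one `--generate-code` line per target architecture: each adds another entry to the list the driver searches.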
[QUOTE=nomead;508640]It shouldn't be because of VS 2017 / VS 2012 differences, but who knows? Maybe I'll try that, too, but not right now :smile:[/QUOTE]
How wrong can I be? First of all, I *had* to try it now on VS 2012. And now, it works! Even though the NVCC compiler shows warnings like this: [CODE]support for this version of Microsoft Visual Studio has been deprecated!
Only the versions between 2013 and 2017 (inclusive) are supported![/CODE]
Compilation notes
The basic outline is documented in the mfaktc README.txt, but here are the specific steps I had to take to make it work. Let's forget about Visual Studio 2017 for the moment and concentrate on Visual Studio 2012. All installation packages listed here are available for free; even though a Microsoft account is needed for downloading VS2012 Express, it's free to use. And I'm running on Windows 7 64-bit.
First, I got 64-bit MinGW (originally for other reasons, but it includes GNU make) from [URL="https://nuwen.net/mingw.html"]https://nuwen.net/mingw.html[/URL] From there, mingw-16.1-without-git.exe is enough for our purposes. Install that somewhere.

Then, Visual Studio 2012 Express for Windows Desktop: [URL="https://my.visualstudio.com/Downloads?q=visual%20studio%202012%20express"]https://my.visualstudio.com/Downloads?q=visual%20studio%202012%20express[/URL] Log in, or create an account and then log in. The one marked "Visual Studio Express 2012" only works on Windows 8 (and up, maybe?) but the "for Windows Desktop" one also works on Windows 7. I got the installer EXE and then ran it.

Finally, CUDA Toolkit 10.0: [URL="https://developer.nvidia.com/cuda-downloads"]https://developer.nvidia.com/cuda-downloads[/URL] Download and install.

Then prepare Makefile.win. First of all, you need to change CUDA_DIR to point to where your CUDA Toolkit was installed. For me this was [CODE]CUDA_DIR = "C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v10.0"[/CODE] After that, add code generation for the cards you're planning to use. For example, [CODE]NVCCFLAGS += --generate-code arch=compute_60,code=sm_60 # CC 6.0 Pascal / GTX10xx
NVCCFLAGS += --generate-code arch=compute_70,code=sm_70 # CC 7.0 Volta / Titan V
NVCCFLAGS += --generate-code arch=compute_75,code=sm_75 # CC 7.5 Turing / RTX20xx, GTX16xx[/CODE]

Then there was a problem with NVCC that needed a fix. It expects to find vcvars64.bat in a certain place, and it seems that the VS2012 Express installer doesn't put it there. Go to C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin and see if the subfolder amd64 exists there, with vcvars64.bat inside it. If not, copy the subfolder x86_amd64 and its contents to amd64, and rename the copied vcvarsx86_amd64.bat to vcvars64.bat.

Finally, time to start compiling. Start a command prompt window. Go to the root folder of where you installed MinGW and run set_distro_paths.bat from there. Then go to wherever that vcvars64.bat is and run it. Then go to the mfaktc-0.21 source folder and run [CODE]make -f Makefile.win[/CODE] Wait a while... (it seems to take a whole lot longer than Linux gcc + nvcc). Done! If you want to compile other versions (more/fewer classes, Wagstaff), these can be set by editing params.h and then recompiling.
TF concepts updated
Some of the existing points have been refined or expanded, and I've added several additional points recently. It's now up to 40 entries. It's at [URL]https://www.mersenneforum.org/showpost.php?p=508523&postcount=6[/URL]
Another poke at the internals. I was factoring a few exponents in mfaktc where the bit depth was 76-77 (among others). I wondered why the barrett87_mul32_gs kernel was chosen instead of barrett77_mul32_gs. Then I looked at kernel_benchmarks.txt in the source directory. Okay, tests were done back in the CUDA 5.5 days, and the freshest card used was a Tesla K20m, three (and a half) generations old by now. That got me wondering, again: have things changed? Well, of course, I HAD to do some benchmarking of my own on Turing, and at least there, yes they have. Not by much, but now barrett77 is faster than barrett87 by about 1%.
Exponent tested: 66362159, bit depth 68-69 (the same as in kernel_benchmarks.txt), fewer classes, debug RAW_GPU_BENCH mode on (disables sieving, so the GHz-d/d numbers are low because of that), CUDA 10.1 and an RTX 2080 locked at 1800 MHz: [CODE]                    time       GHz-d/day
barrett76_mul32_gs  02:15.827  572.49
barrett77_mul32_gs  02:24.794  537.04
barrett87_mul32_gs  02:26.262  531.65
barrett88_mul32_gs  02:30.296  517.38
barrett79_mul32_gs  02:45.376  470.20
barrett92_mul32_gs  02:56.342  440.96
75bit_mul32_gs      04:54.998  263.60
95bit_mul32_gs      06:04.134  213.55[/CODE] There is a selection table in mfaktc.c that only checks for compute capability 1.x (where the speed order was 76 -> 77 -> 87 -> 88 -> 79 -> 92), and all the rest get 76 -> 87 -> 88 -> 77 -> 79 -> 92. So the barrett77_mul32_gs kernel is in effect never selected on anything newer than the GTX 2xx series. It's a small difference, it only affects this single bit depth, and separate benchmarks should be run on every architecture to see if there are any changes there as well. A lot of work, so is it worth it? I'd like to think yes, since GPU72 is now factoring over 76 bits, and every little bit of extra performance should help.
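The effect of that per-architecture speed order can be sketched like this (a hypothetical simplification of the selection logic described above, not mfaktc.c itself; real mfaktc also considers lower bit bounds and sieving mode):

```python
# Hypothetical sketch of a kernel-selection table: walk a fastest-first
# preference order and pick the first kernel whose bit limit covers the
# requested factor depth. Orders taken from the benchmark discussion:
# CC 1.x had 76 -> 77 -> 87 -> ..., everything newer gets 76 -> 87 -> ...

ORDER_CC1X  = ["barrett76", "barrett77", "barrett87",
               "barrett88", "barrett79", "barrett92"]
ORDER_OTHER = ["barrett76", "barrett87", "barrett88",
               "barrett77", "barrett79", "barrett92"]

# Maximum factor size (bits) each kernel handles, per the kernel names.
MAX_BITS = {"barrett76": 76, "barrett77": 77, "barrett79": 79,
            "barrett87": 87, "barrett88": 88, "barrett92": 92}

def select_kernel(bits_to, cc_major):
    order = ORDER_CC1X if cc_major == 1 else ORDER_OTHER
    for kernel in order:
        if MAX_BITS[kernel] >= bits_to:
            return kernel  # first (fastest) kernel that covers the depth
    raise ValueError("no kernel covers this bit depth")

print(select_kernel(77, 7))  # Turing: barrett87 wins, barrett77 never picked
print(select_kernel(77, 1))  # CC 1.x: barrett77 is reachable
```

This makes the complaint concrete: on any CC >= 2 device, barrett87 sits ahead of barrett77 in the order, so at a 77-bit depth barrett77 is unreachable even where it benchmarks faster.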
It would be worth it to test numbers around 90M, 100M, and 110M, too.
Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.