mersenneforum.org mtsieve enhancements

2022-04-04, 19:55 #122 rogue "Mark" Apr 2003 Between here and the 6,653 Posts

Now that I finally have an Apple M1, I am starting the process of getting mtsieve to build on it. So far I have only worked on mfsieve, the CPU-only multi-factorial siever. For a range that I have been sieving, a single core on my 10-core i9 iMac is less than half as fast as a single core on the 8-core M1 MacBook Pro. I was not expecting the M1 to be that much faster.

On the downside, the GPU code in mfsievecl on that i9 is at least 50x faster than the CPU code on that same i9. In order to compare "apples to apples" on the GPU I will need to use Metal instead of OpenCL for the GPU kernel. That will be a bit more work since I have not worked with Metal. I don't think that Metal will be too terrible to work with, but having it build with Metal on OS X and OpenCL on other platforms could require a lot of effort, and it adds yet another code path to support.

If anyone has an M1 and wants to do some sieving (CPU only), send me a PM and I will make it a priority to get that program running on OS X. My first focus will probably be to get as many programs running on the CPU as I can before tackling Metal, as that is mostly busy work. I think that some of the sieving programs require AVX or x86 ASM for the main loop, so those are going to have lower priority. My goal is "no ARM ASM" for the M1 ports. I see nothing which makes that impossible. Of course if anyone wants to pitch in with the effort, send me a PM.

There appears to be a bug in Apple's OpenCL driver. The multi-sequence kernel used by srsieve2cl crashes immediately, but works fine on other platforms. Apple won't fix it since they want everyone to use Metal. I haven't had issues with any other kernels on OS X, so the problem could reside between the keyboard and the chair. I have no idea how easy or difficult it would be to fix that issue if I wanted to get it to work.
2022-04-08, 18:50 #123 rogue "Mark" Apr 2003 Between here and the 14775₈ Posts

Apparently the OpenCL framework exists on the M1, as I can compile and link programs with it. But the kernels do not work correctly, and not in a predictable way, at least not that I have been able to figure out yet. I could submit a bug report to Apple, but I would not expect Apple to fix it. I suspect it is only there for backward compatibility, and if your code doesn't work, then you have to switch to Metal. If I get the same bad results with Metal, then I will submit a bug report to Apple.
2022-04-19, 18:58 #124 rogue "Mark" Apr 2003 Between here and the 6,653 Posts

Support for Metal is requiring changes to the framework. The areas affected the most are the makefile, the GPU kernel code, how the GPU workers create the kernels, and the GpuWorker classes.

It appears that the Metal kernel and OpenCL kernel have few differences. I should be able to write a single kernel that can be compiled for both Metal and OpenCL. This means that updating kernels to support both should be very easy.

I have modified the makefile so that it can convert the kernel source into a header that can be included by the GpuWorker. This was a manual process previously. The makefile can also create a metallib file on OS X. The application does not use that library, but building it is a quick way for me to identify syntax bugs in the kernel that I otherwise would only discover at runtime.

The KernelArgument class is gone. This is due to how Metal manages arguments, as the M1 shares memory between the GPU and CPU. This means that the Kernel class has new methods to add arguments and is completely responsible for managing the CPU and GPU memory needed by the GpuWorkers. The key is that the GpuWorkers are mostly "agnostic" regarding OpenCL or Metal.

In short, lots of interesting things, but I haven't tested anything yet. My biggest fear is that I cannot use Metal in the way that I think I can use it. That is for later this week. If I can get mfsieve to build and run with both OpenCL and Metal (with the correct results), then migrating the other GPU sievers to support both should be fairly easy. My second biggest fear is getting incorrect results and trying to figure out the root cause.
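
The "kernel source into a header" step described in this post can be pictured with a small sketch. This is not the mtsieve makefile logic: the helper names `EscapeLine` and `KernelSourceToHeader`, and the choice of a null-terminated array with one string per source line, are assumptions made only for illustration.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical helper: escape one line of kernel source so that it can
// sit inside a C++ string literal. Carriage returns are dropped.
static std::string EscapeLine(const std::string &line)
{
   std::string out;

   for (char c : line)
   {
      if (c == '\r')
         continue;
      if (c == '\\' || c == '"')
         out += '\\';
      out += c;
   }

   return out;
}

// Hypothetical helper: wrap kernel source text in header text that
// defines a null-terminated array of C strings, one per source line.
std::string KernelSourceToHeader(const std::string &source, const std::string &arrayName)
{
   assert(!arrayName.empty());

   std::istringstream in(source);
   std::ostringstream out;
   std::string line;

   out << "const char *" << arrayName << "[] = {\n";
   while (std::getline(in, line))
      out << "\"" << EscapeLine(line) << "\\n\",\n";
   out << "0\n};\n";

   return out.str();
}
```

The generated text can be written to a `.h` file and included by whichever class builds the kernel, so a syntax change in the kernel source only has to be made in one place.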
2022-04-21, 21:22 #125 rogue "Mark" Apr 2003 Between here and the 6,653 Posts

Well, this is annoying. I discovered that g++ in msys2 produces a buggy version of xyyxsieve when compiled with -O3. It works fine with -O2, and -O3 only impacts the AVX code. I will modify the makefile so that Windows uses -O2 to compile some of the xyyx source files. Fortunately it only affects two source files, but it is very annoying. gcwsieve also uses the same AVX routines, but it has no problems with -O3, so definitely a compiler bug. Since I am in the middle of refactoring a lot of code, I cannot submit a bug report at this time.

And yes, I updated to the latest g++ in msys2. This requires a few other changes to my code to be compliant with the newer compiler. Fortunately that isn't too painful to change.

At this time my focus is to get everything built on Windows and OS X (x86) using OpenCL, then commit. I have had to do a lot of refactoring to get as far as I have. Fortunately, refactoring the programs that use OpenCL takes about an hour each, assuming I don't try to bite off too much by doing more.

I do realize that porting some of these programs to ARM will be a lot more work, and I will not be porting some due to x86 routines that are specific to those sieves. pixsieve and afsieve are two examples. Fortunately those are not widely used so they can wait. It is more likely that I can remove the x86 code completely from them yet not lose speed, but that remains to be seen.
2022-05-04, 18:07 #126 rogue "Mark" Apr 2003 Between here and the 6653₁₀ Posts

I am close to finishing the first set of changes. Most of these changes support the refactoring of the GPU logic to support both OpenCL and Metal abstractly. In other words, the Worker won't know if the underlying kernel is running in OpenCL or Metal.

The only sieve that is broken right now is the multi-sequence GPU sieve for srsieve2cl. The single-sequence one works fine.

I did find a slowdown in the framework with the GPU kernel for single-sequence sieving in srsieve2cl. I've added a command line switch for that kernel that can improve the speed by 50% over the previous build. This same change will probably benefit gfndsievecl, but it will have to wait.

Some of the sieves (not GPU) will compile and run on the M1 out of the box since they don't rely on x86 asm. More will have such support (not GPU) with the upcoming release.

Once the current set of changes is working, I will commit all of my changes and post new Windows builds. Then comes the next fun part: the Metal support that started this whole thing.
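
The idea that a worker does not know which GPU backend it is running on can be sketched as follows. None of these names come from mtsieve: `GpuKernel`, `HostKernel`, and `RemaindersMod6` are hypothetical, and the stand-in backend runs on the CPU so that the sketch builds without OpenCL or Metal. A real backend would dispatch through the OpenCL or Metal API inside `Execute`.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Hypothetical interface: everything a worker needs from a kernel.
// The kernel owns the argument memory, as described in the posts above.
class GpuKernel
{
public:
   virtual ~GpuKernel() {}
   virtual uint64_t *AddArgument(size_t count) = 0;
   virtual void Execute(size_t workSize) = 0;
};

// Stand-in backend that runs a toy "kernel" on the CPU.
class HostKernel : public GpuKernel
{
public:
   uint64_t *AddArgument(size_t count) override
   {
      assert(count > 0);
      // A deque never relocates existing elements on push_back,
      // so pointers handed out earlier stay valid.
      iv_Buffers.push_back(std::vector<uint64_t>(count, 0));
      return iv_Buffers.back().data();
   }

   void Execute(size_t workSize) override
   {
      assert(iv_Buffers.size() == 2);
      assert(workSize <= iv_Buffers[0].size());
      assert(workSize <= iv_Buffers[1].size());

      // Toy computation: second argument = first argument mod 6.
      for (size_t i = 0; i < workSize; i++)
         iv_Buffers[1][i] = iv_Buffers[0][i] % 6;
   }

private:
   std::deque<std::vector<uint64_t>> iv_Buffers;
};

// Hypothetical worker: it only talks to the GpuKernel interface.
std::vector<uint64_t> RemaindersMod6(GpuKernel &kernel, const std::vector<uint64_t> &primes)
{
   uint64_t *in = kernel.AddArgument(primes.size());
   uint64_t *out = kernel.AddArgument(primes.size());

   std::copy(primes.begin(), primes.end(), in);
   kernel.Execute(primes.size());

   return std::vector<uint64_t>(out, out + primes.size());
}
```

Swapping `HostKernel` for an OpenCL-backed or Metal-backed class would not change `RemaindersMod6`, which is the property the post describes.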
2022-06-06, 10:33 #127 twobombs Jun 2022 2 Posts

Sorry to break into this thread; I found it via Google. I am looking for a setting in srsieve2cl to generate primes in the range from 1^19 and beyond. This is to generate a feed for a quantum algorithm called Shor's. I can go to 1^18, but beyond that I get a range error (see attachment). I must be doing something wrong, right? :) I need ranges from 56/64 bits (fp64) all the way to 4096 bits (BigINT range).
2022-06-13, 15:06   #128
rogue

"Mark"
Apr 2003
Between here and the

6,653 Posts

Quote:
 Originally Posted by twobombs Sorry to break into this thread; I found it via Google. I am looking for a setting in srsieve2cl to generate primes in the range from 1^19 and beyond. This is to generate a feed for a quantum algorithm called Shor's. I can go to 1^18, but beyond that I get a range error (see attachment). I must be doing something wrong, right? :) I need ranges from 56/64 bits (fp64) all the way to 4096 bits (BigINT range).
Sorry, but I didn't see this until now.

srsieve2cl is limited to 2^62. Technically I could probably raise to 2^63, but nobody is sieving that deeply so the mtsieve framework has no support for p > 2^63.

Last fiddled with by rogue on 2022-06-13 at 15:07
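
The numbers in this exchange line up if the "1^18" and "1^19" in the question are read as 10^18 and 10^19, which is an assumption about the intended notation. Under that reading 10^18 is below 2^62 and 10^19 is above it, which would explain the range error. A minimal sketch of the comparison, where `IsSupportedMaxPrime` is a hypothetical check and not the framework's actual validation code, and treating the limit as inclusive is also an assumption:

```cpp
#include <cassert>
#include <cstdint>

// Limit quoted in the reply above: 2^62 = 4611686018427387904.
constexpr uint64_t MAX_PRIME = 1ULL << 62;

constexpr uint64_t TEN_POW_18 = 1000000000000000000ULL;
constexpr uint64_t TEN_POW_19 = 10000000000000000000ULL;

// Both facts are checked at compile time.
static_assert(TEN_POW_18 < MAX_PRIME, "10^18 is below 2^62");
static_assert(TEN_POW_19 > MAX_PRIME, "10^19 is above 2^62");

// Hypothetical range check for a requested sieving range.
bool IsSupportedMaxPrime(uint64_t minPrime, uint64_t maxPrime)
{
   assert(minPrime <= maxPrime);   // caller passes an ordered range
   return maxPrime <= MAX_PRIME;
}
```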

2022-06-13, 15:15 #129 rogue "Mark" Apr 2003 Between here and the 1100111111101₂ Posts

Before I get to Metal support I decided to make more changes. Kim Walisch (the creator of primesieve) gave me a few hints on how to use his library more efficiently. Along with his changes I am making another big change. With this change the framework will adjust the CPU worksize "on the fly" in an effort to ensure that each "chunk" of work needs between 1 and 5 seconds to process. This adjustment will only occur after p > 1e5.

This provides two benefits. The first is that ^C will terminate CPU workers more quickly, so if you have an overly large worksize it will terminate within 5 seconds rather than you having to wait much longer. The second is that if you have a lot of workers, this should do a better job of ensuring that all workers have work. For those of you with 32 cores, I will be curious to see if the next release will do better at using all of those cores.
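
The "on the fly" adjustment can be sketched as below. Only the 1 to 5 second target and the p > 1e5 threshold come from the post; the function name `AdjustWorkSize` and the halve-or-double policy are assumptions, and the framework may well use a different rule.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical policy: keep the time per chunk between 1 and 5 seconds
// by halving or doubling the work size after each chunk is timed.
uint64_t AdjustWorkSize(uint64_t workSize, double chunkSeconds, uint64_t largestPrime)
{
   assert(workSize > 0);
   assert(chunkSeconds >= 0.0);

   // No adjustment until p > 1e5, as stated in the post.
   if (largestPrime <= 100000)
      return workSize;

   if (chunkSeconds > 5.0)
      workSize /= 2;                       // chunk too slow: shrink it
   else if (chunkSeconds < 1.0 && workSize <= UINT64_MAX / 2)
      workSize *= 2;                       // chunk too fast: grow it

   // Never let the work size drop to zero.
   return workSize > 0 ? workSize : 1;
}
```

Calling this after each timed chunk converges on a size inside the target window within a few iterations, which is what makes ^C responsive even when the initial worksize was far too large.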
2022-06-13, 19:16 #130 rogue "Mark" Apr 2003 Between here and the 6,653 Posts

I have committed my changes, but have not officially released yet. I have done some testing, but not a lot. I have only committed the code so that ryanp and others who build on Linux can run tests in their environments.
2022-06-13, 20:27   #131
ryanp

Jun 2012
Boulder, CO

2×199 Posts

Quote:
 Originally Posted by rogue I have committed my changes, but have not officially released yet. I have done some testing, but not a lot. I have only committed the code so that ryanp and others who build on linux can run tests on their environments.
Updated to r192. I am getting this; it looks like probably a simple fix to use bool instead?

Code:
$ make -j 16 srsieve2
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/App_cpu.o core/App.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/Worker_cpu.o core/Worker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/GenericWorker_cpu.o sierpinski_riesel/GenericWorker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithOneSequenceWorker_cpu.o sierpinski_riesel/CisOneWithOneSequenceWorker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithMultipleSequencesWorker_cpu.o sierpinski_riesel/CisOneWithMultipleSequencesWorker.cpp
core/App.cpp: In member function ‘void App::Sieve()’:
core/App.cpp:489:7: error: ‘boolean’ was not declared in this scope; did you mean ‘bool’?
  489 |       boolean gotNewWork = false;
      |       ^~~~~~~
      |       bool
core/App.cpp:514:13: error: ‘gotNewWork’ was not declared in this scope
  514 |             gotNewWork = true;
      |             ^~~~~~~~~~
core/App.cpp:519:12: error: ‘gotNewWork’ was not declared in this scope
  519 |       if (!gotNewWork)
      |           ^~~~~~~~~~
make: *** [makefile:237: core/App_cpu.o] Error 1

2022-06-13, 20:32   #132
rogue

"Mark"
Apr 2003
Between here and the

19FD₁₆ Posts

Quote:
 Originally Posted by ryanp Updated to r192. I am getting this; it looks like probably a simple fix to use bool instead?
Code:
$ make -j 16 srsieve2
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/App_cpu.o core/App.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o core/Worker_cpu.o core/Worker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/GenericWorker_cpu.o sierpinski_riesel/GenericWorker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithOneSequenceWorker_cpu.o sierpinski_riesel/CisOneWithOneSequenceWorker.cpp
g++ -Isieve -m64 -Wall -DUSE_X86 -std=c++11 -O3 -c -o sierpinski_riesel/CisOneWithMultipleSequencesWorker_cpu.o sierpinski_riesel/CisOneWithMultipleSequencesWorker.cpp
core/App.cpp: In member function ‘void App::Sieve()’:
core/App.cpp:489:7: error: ‘boolean’ was not declared in this scope; did you mean ‘bool’?
  489 |       boolean gotNewWork = false;
      |       ^~~~~~~
      |       bool
core/App.cpp:514:13: error: ‘gotNewWork’ was not declared in this scope
  514 |             gotNewWork = true;
      |             ^~~~~~~~~~
core/App.cpp:519:12: error: ‘gotNewWork’ was not declared in this scope
  519 |       if (!gotNewWork)
      |           ^~~~~~~~~~
make: *** [makefile:237: core/App_cpu.o] Error 1
Fixed. Don't know why it compiles for me. boolean must have been added to a more recent C++.
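
For readers hitting the same error: `boolean` is not a type defined by the C++ standard, while `bool` is a built-in keyword. One possible explanation for the code compiling on the Windows build, offered here as an assumption and not confirmed in the thread, is that a platform header supplies a `boolean` typedef there. A minimal sketch of the portable form follows; `AssignWork` is a hypothetical stand-in for the flag pattern visible in the build log, not the actual body of `App::Sieve()`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in: hand pending chunks to idle workers and report
// whether any worker received new work during this pass.
bool AssignWork(std::vector<int> &pendingChunks, int idleWorkers)
{
   assert(idleWorkers >= 0);

   bool gotNewWork = false;   // 'bool' is standard C++ on every platform

   while (idleWorkers > 0 && !pendingChunks.empty())
   {
      pendingChunks.pop_back();
      idleWorkers--;
      gotNewWork = true;
   }

   return gotNewWork;
}
```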
