#309 | |
"Mark"
Apr 2003
Between here and the
14145₈ Posts |
Quote:
GMP is only needed for gfndsieve and dmdsieve. It seems that the makefile has a mistake, since it lists the GMP library as a dependency for building gcwsieve, which appears to be what you started with. You can remove that dependency from the makefile. The OpenCL source files have the .cl extension. These have to be converted to .h files, which are then included in the GpuWorker.cpp class. This requires perl (I use ActivePerl). The command is "perl cltoh.pl file.cl > file.h". It is easier to edit the .cl files than the generated .h files. It might be easier to start with the OpenCL kernel for gfndsieve than the one for gcwsieve, as that kernel is much simpler. I don't know what experience you have with OpenCL. I'm not certain what you mean by "modified the FPU code". I'm hoping that you did not modify any .S sources. |
#310 | |
Jun 2003
3053₈ Posts |
Quote:
I would recommend you include https://github.com/GPUOpen-Libraries...L-SDK/releases & https://github.com/KhronosGroup/OpenCL-Headers with the mtsieve source, instead of using the AMD SDK, which would need to be installed. I am only using the FPU code in the worker file, not the AVX function. That is something to look into when I have time, to make the code even faster. I do not have experience with OpenCL, but the gcwsieve cw_kernel.cpp code seems straightforward. I can look into it. Using Perl might be more challenging. Do you know if anyone is using gcwsieve to sieve? It can be made faster by replacing multiplication with addition. Last fiddled with by Citrix on 2020-05-26 at 08:30 |
#311 | |
"Mark"
Apr 2003
Between here and the
14145₈ Posts |
Quote:
AVX512 is a bit harder to use if you want to get the full benefit. As for perl, it is just a command line tool I used to generate a header file from a .cl file. Nothing more than that. In theory it could be converted to a .sh or .bat script. This search uses gcwsieve for Generalized Woodalls. This search uses gcwsieve for Generalized Cullens. If you are thinking of an enhancement to gcwsieve, I suggest that you post in the other thread and include the math. |
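For readers without perl, the conversion described above (turning a .cl file into a header that can be #included) can be approximated in a few lines of C++. This is only a minimal sketch of the general idea: the function name `cl_to_header`, the variable name passed in, and the exact output layout are assumptions made for illustration, and the output may differ from what cltoh.pl actually generates.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper (not part of mtsieve): wraps every line of an OpenCL
// source string in a C string literal so the kernel can be #included.
// The output layout is an assumption; cltoh.pl may format things differently.
std::string cl_to_header(const std::string& cl_source, const std::string& var_name) {
    std::istringstream in(cl_source);
    std::ostringstream out;
    out << "static const char *" << var_name << " =\n";
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();  // tolerate CRLF files
        out << '"';
        for (char c : line) {
            if (c == '\\' || c == '"') out << '\\';  // escape for a C string literal
            out << c;
        }
        out << "\\n\"\n";  // keep the newline inside the literal
    }
    out << "\"\";\n";  // trailing empty literal keeps the output valid for empty input
    return out.str();
}
```

Reading file.cl into a string, passing it through a function like this, and writing the result to file.h would play the same role as the perl command, assuming the GPU worker only needs the kernel text as a string constant.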
#312 | |
"Mark"
Apr 2003
Between here and the
5·1,249 Posts |
Quote:
#313 | |
Jun 2003
3053₈ Posts |
Quote:
Though limited to 2^52.
Is there a faster way of doing this?
Code:
    avx_set_16a(tempinvs);
    avx_set_16b(powinvs);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(tempinvs);
    avx_set_16a(tempinvs);
    avx_set_16b(multinvs);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(tempinvs);
If I eliminate the first get step and the second set step, the program misses factors.
Are you able to write some code to calculate n/2 (mod p) for doubles and 64-bit integers? Currently I am having to use mulmod, which is extremely slow.
I will post the gcwsieve code/algorithm in the other thread.
Last fiddled with by Citrix on 2020-05-26 at 21:30 |
#314 | ||
"Mark"
Apr 2003
Between here and the
5·1,249 Posts |
I am impressed with how quickly you have picked up on the framework. It is one of the reasons why I wrote the framework in the first place. The idea is that it should be easily accessible to anyone with some basic knowledge of C and C++ and the math they need for the Worker class.
Quote:
Quote:
If the GPU is returning too many factors and you are concerned about overflowing a buffer, then you can call SetMinGpuPrime() from the App class. This will tell the framework to not use a GPU worker until it reaches the value passed to that method. This might require some experimentation on your part regarding that limit and the size of the factor buffer that the GPU will support. |
#315 | |
Jun 2003
1,579 Posts |
Quote:
Code:
    void TestPrimesAVX(void)
    {
        double __attribute__((aligned(32))) powinvs[AVX_ARRAY_SIZE];
        double __attribute__((aligned(32))) tempinvs[AVX_ARRAY_SIZE];
        double __attribute__((aligned(32))) dps[AVX_ARRAY_SIZE];

        // CODE MISSING HERE

        for (int i = 0; i < AVX_ARRAY_SIZE; i++)
        {
            // This code is really slow.
            // Faster to use mulmod
            // Any other tricks? Can we use bitshift etc.
            if (((uint64_t)tempinvs[i]) % 2 == 1) { tempinvs[i] = tempinvs[i] + dps[i]; }
            tempinvs[i] = tempinvs[i] / 2;

            if (((uint64_t)powinvs[i]) % 2 == 1) { powinvs[i] = powinvs[i] + dps[i]; }
            powinvs[i] = powinvs[i] / 2;

            if (((uint64_t)powinvs[i]) % 2 == 1) { powinvs[i] = powinvs[i] + dps[i]; }
            powinvs[i] = powinvs[i] / 2;
        }
    }
Anything you can help with? Last fiddled with by Citrix on 2020-05-26 at 23:17 |
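On the "can we use bitshift" question, one possible branch-free formulation of n/2 (mod p) is sketched below. It is not taken from mtsieve, and the function names `half_mod_u64` and `half_mod_double` are made up for illustration. It assumes p is odd and 0 <= n < p, and for the double version also that p < 2^52 (the limit mentioned earlier in the thread). Whether it actually beats the cast-and-branch loop or mulmod would need to be measured.

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: halve n modulo an odd p using a shift and a mask.
// Assumes p is odd and 0 <= n < p. When n is odd, (n + p) / 2 is rewritten as
// (n >> 1) + (p >> 1) + 1, so the result never exceeds p - 1 and the addition
// cannot overflow 64 bits.
inline uint64_t half_mod_u64(uint64_t n, uint64_t p) {
    uint64_t odd_mask = 0 - (n & 1);  // all ones if n is odd, zero otherwise
    return (n >> 1) + (((p >> 1) + 1) & odd_mask);
}

// Same idea for integer-valued doubles. Assumes p is odd and
// 0 <= n < p < 2^52, so n + p < 2^53 is still exactly representable.
inline double half_mod_double(double n, double p) {
    double odd = std::fmod(n, 2.0);  // 1.0 if n is odd, 0.0 otherwise
    return (n + odd * p) * 0.5;
}
```

For example, with p = 7 and n = 3 both versions return 5, and 2·5 = 10 ≡ 3 (mod 7). The double version replaces the uint64_t cast and the branch with an fmod and a multiply; a vectorised variant would need an AVX equivalent of the parity test, which is not shown here.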
#316 | |
"Mark"
Apr 2003
Between here and the
6245₁₀ Posts |
Quote:
#317 | |
Jun 2003
1,579 Posts |
Quote:
I looked into the code further. I was wrong before about get and set causing the error. This portion of the code gives the error. You can try it yourself.
Code:
    avx_set_16a(powinvs);
    avx_set_16b(multinvs2);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(powinvs);
The above works correctly. multinvs2 is 1/4 (mod p).
    avx_set_16a(powinvs);
    avx_set_16b(multinvs);
    avx_mulmod(dps, reciprocals);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(powinvs);
The above does not give the right answer. multinvs is 1/2 (mod p). They should be the same.
The following does work correctly:
    avx_set_16a(powinvs);
    avx_set_16b(multinvs);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(powinvs);
    avx_set_16a(powinvs);
    avx_set_16b(multinvs);
    avx_mulmod(dps, reciprocals);
    avx_get_16a(powinvs);
|
#318 |
"Mark"
Apr 2003
Between here and the
1100001100101₂ Posts |
avx_mulmod modifies the contents of ymm4-ymm7, so you cannot call it twice in a row and expect the result to be a*b*b mod p. You would have to call avx_set_16b or avx_set_1b after the first avx_mulmod, before calling avx_mulmod again.
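The calling pattern this implies can be illustrated with a toy scalar model. Everything in the sketch below (the `ToyRegs` struct and the `toy_*` functions) is hypothetical and is not mtsieve's AVX code. It only models the rule that the b operand is consumed by a multiply and has to be loaded again before the next one.

```cpp
#include <cstdint>

// Toy scalar stand-in for the register state; NOT mtsieve's implementation.
struct ToyRegs {
    uint64_t a = 0;        // models the "a" operand
    uint64_t b = 0;        // models the "b" operand (ymm4-ymm7 in the post)
    bool b_valid = false;  // becomes false once a multiply has clobbered b
};

void toy_set_a(ToyRegs& r, uint64_t v) { r.a = v; }
void toy_set_b(ToyRegs& r, uint64_t v) { r.b = v; r.b_valid = true; }

// Computes a = a * b mod p for small p (the product must fit in 64 bits).
// Returns false and leaves a untouched when b has not been reloaded.
bool toy_mulmod(ToyRegs& r, uint64_t p) {
    if (!r.b_valid) return false;
    r.a = (r.a * r.b) % p;
    r.b_valid = false;  // model the multiply overwriting the b registers
    return true;
}
```

With p = 101 and 1/2 ≡ 51 (mod 101), setting a = 10 and b = 51 and multiplying gives 5. A second multiply only succeeds after b is set to 51 again, and then gives 53, which matches 10 · 1/4 (mod 101) since 4 · 53 = 212 ≡ 10.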
#319 |
Jun 2003
1,579 Posts |
I was finally able to modify the GPU code. I had to do the Perl portion manually, because the script did not work for me: it goes into an infinite while loop. The code did compile.
However, I cannot get the code to run. I have tried the original gcwsievecl.exe from mtsieve and that does not work either, nor does afsievecl.exe. ppsieve (OpenCL) etc. works fine. Any thoughts on how to fix this? Thanks. Last fiddled with by Citrix on 2020-05-29 at 02:38 |