|
|
#34 |
|
"Mark"
Apr 2003
Between here and the
3×2,447 Posts |
Sorry about that. Too many things swimming in my brain.
|
|
|
|
|
|
#35 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
3·23·89 Posts |
Something worthwhile for mtsieve would be C versions of the asm functions so that it is portable to ARM CPUs. ARM asm versions would also be useful. One day I might do this, although finding the time is an issue.
|
|
|
|
|
|
#36 | |
|
"Mark"
Apr 2003
Between here and the
16255₈ Posts |
Quote:
It is my desire to buy an ARM-based MacBook in the future and to write the asm routines for it at that time. Of course others are welcome to do that as well. The downside of ARM is that the choices of programs to execute PRP tests are very limited. |
|
|
|
|
|
|
#37 | |
|
"Alexander"
Nov 2008
The Alamo City
991₁₀ Posts |
Quote:
|
|
|
|
|
|
|
#38 |
|
"Mark"
Apr 2003
Between here and the
3·2,447 Posts |
Cool. I suggest that you start with the fpu_mulmod function. That will likely be the easiest one to port. Most of the others can be built on top of that in one way or another. Next up would be the 4x version of an fpu routine, although I do not know what gains you can get on ARM by doing more than one mulmod concurrently, and I don't know how many is optimal. I suspect that ARM does not have an 80-bit fpu, so it will be limited to p < 2^52. I also do not know if ARM has any vector instructions such as SSE or AVX on x86. You will notice that Worker.h has some builtin checks for AVX compatibility. You will likely need to add something similar to control ARM code paths.
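For readers following along, here is a minimal C sketch of the kind of double-precision mulmod discussed above. It is not the mtsieve fpu_mulmod routine; the name mulmod_double, its signature, and the conservative bound p < 2^50 are assumptions made for this illustration. The idea is to estimate the quotient with a precomputed 1/p and then correct the remainder in integer arithmetic.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical portable mulmod: returns (a * b) mod p using double precision.
 * Assumes a, b < p and p < 2^50 (a conservative bound chosen for this sketch).
 * inv_p must be 1.0 / (double)p, computed once per prime by the caller.
 */
uint64_t mulmod_double(uint64_t a, uint64_t b, uint64_t p, double inv_p)
{
    assert(p > 1 && p < (UINT64_C(1) << 50));
    assert(a < p && b < p);

    /* Estimate floor(a*b/p); rounding may leave it off by a small amount. */
    uint64_t q = (uint64_t)((double)a * (double)b * inv_p);

    /* a*b and q*p both wrap modulo 2^64, but their true difference is
       small, so the wrapped subtraction still gives the signed remainder. */
    int64_t r = (int64_t)(a * b - q * p);
    int64_t sp = (int64_t)p;

    /* Correct the estimate back into the range [0, p). */
    while (r < 0)   r += sp;
    while (r >= sp) r -= sp;
    return (uint64_t)r;
}
```

A caller would compute inv_p once per prime, for example `mulmod_double(a, b, p, 1.0 / (double)p)`. The correction loops are deliberately defensive; how close p can safely get to 2^52 depends on rounding details that this sketch does not try to pin down.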
|
|
|
|
|
|
#39 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
1011111111101₂ Posts |
The issue will be moving beyond 53 bits on non-x86.
Has Montgomery multiplication been tried in mtsieve? It wouldn't be applicable in all sieves but it may be faster for powmods. |
|
|
|
|
|
#40 | |
|
"Alexander"
Nov 2008
The Alamo City
991 Posts |
Quote:
The x86_asm_ext folder is filled with Montgomery arithmetic routines inherited from the older sieve programs.
Last fiddled with by Happy5214 on 2020-08-03 at 15:40 |
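As background on the technique being asked about, here is a hedged, portable C sketch of Montgomery multiplication used for a powmod. It is not taken from the x86_asm_ext routines; the function names are hypothetical, R is fixed at 2^64, p is assumed odd and below 2^63, and the code relies on the `unsigned __int128` extension available in GCC and Clang on 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

typedef unsigned __int128 u128;   /* GCC/Clang extension on 64-bit targets */

/* Returns -p^(-1) mod 2^64 for odd p, via Newton iteration. */
static uint64_t mont_neg_inv(uint64_t p)
{
    uint64_t inv = p;                 /* p*p == 1 (mod 8): 3 correct bits */
    for (int i = 0; i < 5; i++)
        inv *= 2 - p * inv;           /* each step doubles the correct bits */
    return (uint64_t)0 - inv;
}

/* Montgomery reduction: returns t * 2^(-64) mod p for t < p * 2^64. */
static uint64_t mont_reduce(u128 t, uint64_t p, uint64_t np)
{
    uint64_t m = (uint64_t)t * np;    /* makes the low 64 bits of t + m*p zero */
    uint64_t r = (uint64_t)((t + (u128)m * p) >> 64);
    return r >= p ? r - p : r;
}

/* Product of two values that are already in Montgomery form. */
static uint64_t mont_mul(uint64_t a, uint64_t b, uint64_t p, uint64_t np)
{
    return mont_reduce((u128)a * b, p, np);
}

/* Hypothetical powmod: base^exp mod p for odd p with 1 < p < 2^63. */
uint64_t mont_powmod(uint64_t base, uint64_t exp, uint64_t p)
{
    assert((p & 1) && p > 1 && p < (UINT64_C(1) << 63));
    uint64_t np  = mont_neg_inv(p);
    uint64_t x   = (uint64_t)(((u128)(base % p) << 64) % p); /* base * R mod p */
    uint64_t acc = (uint64_t)(((u128)1 << 64) % p);          /* 1 * R mod p */
    while (exp) {
        if (exp & 1) acc = mont_mul(acc, x, p, np);
        x = mont_mul(x, x, p, np);
        exp >>= 1;
    }
    return mont_reduce(acc, p, np);   /* convert back out of Montgomery form */
}
```

The conversions into Montgomery form cost a 128-bit division each, which is why the approach tends to pay off for powmods, where many multiplications share one modulus, rather than for isolated mulmods. It also uses no floating point, so it is not bound by the 53-bit mantissa limit.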
|
|
|
|
|
|
#41 |
|
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
17FD₁₆ Posts |
|
|
|
|
|
|
#42 | |
|
"Mark"
Apr 2003
Between here and the
3·2,447 Posts |
Quote:
|
|
|
|
|
|
|
#43 |
|
"Alexander"
Nov 2008
The Alamo City
991 Posts |
The ODROID N2+ model sold out before I could get around to ordering it, so I bought the cheaper C4 instead. It hasn't shipped yet, so I'm still waiting for it. Meanwhile, I made an attempt at porting fpu_mulmod, and I think I came up with something. I've attached it in case anyone wants to test it for me. The ARM FPU registers don't form a stack like the x87 registers do, so I didn't sense a need to pre-load 1/p.
|
|
|
|
|
|
#44 | |
|
"Mark"
Apr 2003
Between here and the
1CAD₁₆ Posts |
Quote:
What's great is that not having an FPU "stack" makes coding for the FPU simpler even if other benefits of pipelining are not available. Getting mtsieve to build one versus the other will require placing the sources in a new folder and modifying the makefile. It will also require source changes so that the AVX logic in the C++ source is not compiled on ARM platforms. It might be as simple as an #ifdef ARM in those places. |
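A minimal sketch of what such a compile-time guard could look like follows. The macro names SIEVE_ARCH_ARM and SIEVE_USE_AVX and the function mulmod_backend are hypothetical and do not come from the mtsieve sources. The sketch keys off `__aarch64__`, `__arm__` and `__AVX__`, which compilers predefine; a bare `ARM` macro would instead have to be supplied by the makefile (for example with -DARM).

```c
#include <assert.h>

/* Architecture macros predefined by GCC/Clang (and MSVC's _M_ARM variants). */
#if defined(__aarch64__) || defined(__arm__) || defined(_M_ARM64) || defined(_M_ARM)
#define SIEVE_ARCH_ARM 1
#else
#define SIEVE_ARCH_ARM 0
#endif

/* __AVX__ is set by the compiler when AVX code generation is enabled. */
#if defined(__AVX__) && !SIEVE_ARCH_ARM
#define SIEVE_USE_AVX 1
#else
#define SIEVE_USE_AVX 0
#endif

/* Hypothetical helper reporting which code path was compiled in. */
const char *mulmod_backend(void)
{
    /* The two paths are mutually exclusive by construction. */
    assert(!(SIEVE_ARCH_ARM && SIEVE_USE_AVX));
#if SIEVE_USE_AVX
    return "avx";
#elif SIEVE_ARCH_ARM
    return "arm";
#else
    return "generic";
#endif
}
```

Centralising the decision in two macros means the AVX blocks in the C++ source can each be wrapped in a single `#if SIEVE_USE_AVX`, and an ARM code path can be added later without touching those guards again.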
|
|
|
|