#23
∂2ω=0
Sep 2002
República de California
2²×2,939 Posts
Quote:
Code:
/* Syntax here is GCC and SunStudio/MSVC, respectively: */
#if !defined(__LONG_MAX__) && !defined(LONG_MAX)
#include <limits.h>
#endif
#if !defined(__LONG_MAX__) && defined(LONG_MAX)
#define __LONG_MAX__ LONG_MAX
#endif
#ifdef __LONG_MAX__
#if __LONG_MAX__ == 2147483647L
#define OS_BITS 32
#elif __LONG_MAX__ == 9223372036854775807L
#define OS_BITS 64
#else
#error __LONG_MAX__ defined but value unrecognized!
#endif
#else
#error platform.h: failed to properly set OS_BITS!
#endif
I need the size of a pointer (rather than a long int per se) because the bitness of the OS (which is reflected therein) determines which version of many assembler macros to use. Because of the LLP64 oddity WraithX notes (on Win64, long stays 32 bits even though pointers are 64 bits), I can't use LONG_MAX as a proxy for pointer size. I can (and will) add preprocessor code that first checks whether __SIZEOF_POINTER__ is defined; I was simply hoping for a cleaner, more unified way to do this pointer-size check. Note that modifying the LONG_MAX-based check to use LONG_LONG_MAX (LLONG_MAX under MSVC) instead won't work, because, e.g., 32-bit Linux and 64-bit Linux both have LONG_LONG_MAX = 2^63-1.
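A minimal sketch of the __SIZEOF_POINTER__-first approach, assuming GCC/Clang-style predefines, MSVC's _WIN64/_WIN32, and an ILP32 or LP64 data model whenever the check falls through to LONG_MAX. OS_BITS matches the macro name in the quoted code; pointer_bits() is a hypothetical run-time cross-check, not part of the original platform.h.

```c
/* Sketch: prefer the pointer size itself; fall back to LONG_MAX only where
   long is ASSUMED to match pointer width (ILP32 / LP64). */
#include <limits.h>   /* LONG_MAX, CHAR_BIT */

#if defined(__SIZEOF_POINTER__)          /* predefined by recent GCC and Clang */
  #if __SIZEOF_POINTER__ == 4
    #define OS_BITS 32
  #elif __SIZEOF_POINTER__ == 8
    #define OS_BITS 64
  #else
    #error __SIZEOF_POINTER__ defined but value unrecognized!
  #endif
#elif defined(_WIN64)                    /* MSVC x64: LLP64, long is still 32 bits */
  #define OS_BITS 64
#elif defined(_WIN32)                    /* MSVC x86 */
  #define OS_BITS 32
#elif LONG_MAX == 2147483647L            /* assumed ILP32 */
  #define OS_BITS 32
#elif LONG_MAX == 9223372036854775807L   /* assumed LP64 */
  #define OS_BITS 64
#else
  #error platform.h: failed to properly set OS_BITS!
#endif

/* Hypothetical run-time cross-check: should agree with OS_BITS. */
int pointer_bits(void)
{
    return (int)(sizeof(void *) * CHAR_BIT);
}
```

Checking _WIN64 before LONG_MAX is what keeps LLP64 targets from being misclassified as 32-bit.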

#24
"Mark"
Apr 2003
Between here and the
1CAA₁₆ Posts
Punt!
Seriously, why not just use LONG_MAX and LP64 or other known preprocessor defines to determine the number of bits in a pointer? If someone needs to build on another platform, they can provide the defines your code needs to determine the number of bits in a pointer. In the end it will probably be easier than pulling your hair out over this.
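One way to read Mark's suggestion, sketched under the assumption that the compiler predefines _LP64 or __LP64__ on LP64 targets (GCC and Clang do) and _WIN64 on 64-bit Windows: wrap the detection in `#ifndef OS_BITS` so anyone porting can simply pass `-DOS_BITS=32` or `-DOS_BITS=64`. os_bits() is a hypothetical accessor added only so the result can be checked.

```c
/* Sketch: trust a few well-known predefines, and let the builder override
   everything from the command line, e.g. cc -DOS_BITS=64 ... */
#include <limits.h>   /* LONG_MAX, CHAR_BIT */

#ifndef OS_BITS
  #if defined(_LP64) || defined(__LP64__)   /* LP64 Unix-like targets */
    #define OS_BITS 64
  #elif defined(_WIN64)                     /* LLP64: LONG_MAX alone would say 32 */
    #define OS_BITS 64
  #elif LONG_MAX == 2147483647L             /* assumed ILP32 */
    #define OS_BITS 32
  #else
    #error "Unknown platform: rebuild with -DOS_BITS=32 or -DOS_BITS=64"
  #endif
#endif

/* Hypothetical accessor so the chosen value can be inspected at run time. */
int os_bits(void)
{
    return OS_BITS;
}
```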

#25
Mar 2006
22A₁₆ Posts
What if you switched out each occurrence of LONG_MAX with SIZE_MAX? If all your compilers are C99 or later, this should work (hopefully). Like so:
Code:
/* Syntax here is GCC (__SIZE_MAX__ predefine) and SunStudio/MSVC (SIZE_MAX), respectively. */
/* SIZE_MAX is declared in C99 <stdint.h>, not <limits.h>: */
#if !defined(__SIZE_MAX__) && !defined(SIZE_MAX)
#include <stdint.h>
#endif
#if !defined(__SIZE_MAX__) && defined(SIZE_MAX)
#define __SIZE_MAX__ SIZE_MAX
#endif
#ifdef __SIZE_MAX__
#if __SIZE_MAX__ == 0xffffffffu
#define OS_BITS 32
#elif __SIZE_MAX__ == 0xffffffffffffffffu
#define OS_BITS 64
#else
#error __SIZE_MAX__ defined but value unrecognized!
#endif
#else
#error platform.h: failed to properly set OS_BITS!
#endif
*edit* Don't use ULONG_MAX; that wouldn't work on LLP64 systems. But UINTPTR_MAX (also from <stdint.h>) should work on both LP64 and LLP64 systems.
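To make the LP64/LLP64 distinction concrete, here is a small run-time sketch; data_model() and ulong_matches_pointer() are hypothetical helpers. It assumes one of the three common flat-memory data models, on which SIZE_MAX and UINTPTR_MAX both track pointer width while ULONG_MAX does so only on ILP32 and LP64.

```c
#include <limits.h>   /* ULONG_MAX */
#include <stdint.h>   /* SIZE_MAX, UINTPTR_MAX */

/* Hypothetical helper: name the data model from type sizes. */
const char *data_model(void)
{
    if (sizeof(void *) == 4 && sizeof(long) == 4)
        return "ILP32";   /* e.g. 32-bit Linux or 32-bit Windows */
    if (sizeof(void *) == 8 && sizeof(long) == 8)
        return "LP64";    /* e.g. 64-bit Linux */
    if (sizeof(void *) == 8 && sizeof(long) == 4)
        return "LLP64";   /* 64-bit Windows: long stays 32 bits */
    return "unknown";
}

/* Nonzero when ULONG_MAX could stand in for pointer width on this target;
   this is 0 on LLP64, which is why ULONG_MAX is the wrong proxy there. */
int ulong_matches_pointer(void)
{
    return ULONG_MAX == UINTPTR_MAX;
}
```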

#26
∂2ω=0
Sep 2002
República de California
11756₁₀ Posts
I checked the list of predefines for both gcc 3.4 and 4.2 on one of our linux/amd64 systems at work, and neither of those versions defines SIZE_MAX.
Since augmenting my LONG_MAX predefine checks by first checking if __SIZEOF_POINTER__ is defined properly handles the Win64/mingw issue, gonna go with that for now - in the meantime, there is actual code that needs to get written. ;) Thanks for the help, everybody - I hope to be able to report an actual working mingw/Win64 SSE2-enabled build sometime next week.

#27
∂2ω=0
Sep 2002
República de California
2²·2,939 Posts
...and we have a successful 64-bit Windows build!
Some notes:
- I haven't played with various combos of compiler-optimization flags; I just used -O3, which was consistently fastest for me in previous Linux/gcc builds.
- FFT lengths of the form 13*2^n are not yet supported in the Mingw/gcc build (one more hefty ASM macro needs to be ported to GCC and then eyeball-optimized for 64-bit to support those); those are the entries marked "n/a" in the rightmost column of timings below.
- These comparative timings are on an unloaded (at least insofar as Win7 and our IT department's various
- System is a quad-core 2.67 GHz i7; this is running on a single CPU ...
I expect I'll need to cut at least one third off these timings to be competitive with Prime95, but we does what we can in the limited time available to us. :P
Code:
FFT length    LL-test timing (sec/iter)
(Kdoubles)    32-bit Visual Studio    64-bit mingw-w64/gcc-4.4
1024          0.030                   0.028
1152          0.038                   0.035
1280          0.038                   0.036
1408          0.042                   0.040
1536          0.043                   0.043
1664          0.068                   n/a
1792          0.055                   0.054
1920          0.058                   0.056
2048          0.062                   0.057
2304          0.075                   0.071
2560          0.081                   0.073
2816          0.094                   0.093
3072          0.094                   0.092
3328          0.138                   n/a
3584          0.118                   0.111
3840          0.134                   0.130
4096          0.125                   0.131
Last fiddled with by ewmayer on 2010-10-11 at 21:42
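The two n/a rows are exactly the 13·2^n lengths in the table: 1664K = 13·2^7 K and 3328K = 13·2^8 K, while every other length listed is 1, 3, 5, 7, 9, 11 or 15 times a power of two. A hypothetical predicate for picking out the not-yet-supported lengths might look like this:

```c
#include <stdbool.h>

/* Hypothetical helper: true if an FFT length (in Kdoubles) has the form
   13 * 2^n with n >= 0, i.e. one of the lengths the 64-bit Mingw/gcc
   build above does not yet support. */
bool is_13_times_pow2(unsigned k)
{
    if (k == 0 || k % 13 != 0)
        return false;
    k /= 13;                     /* k >= 1 here */
    return (k & (k - 1)) == 0;   /* a power of two has a single set bit */
}
```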

#28
Tribal Bullet
Oct 2004
DED₁₆ Posts
Small side note: at least in WinXP, if you change the path in your environment you don't have to reboot to make the changes take effect; just launch a new command prompt.
Good luck navigating the maze of pitfalls building on Windows; everything is a little bit different there...

#29
Just call me Henry
"David"
Sep 2007
Liverpool (GMT/BST)
6141₁₀ Posts
Quote:
>= XP

#30
∂2ω=0
Sep 2002
República de California
2²·2,939 Posts
Quote:
movaps xmm0,[ecx+edi]
In 64-bit GCC, I believe this should translate to the needed (base,index,scale) format as:
movaps (%%rcx,%%rdi,1),%%xmm0
Is that right? Also, VS allows register-content add or subtract in such addressing computations, e.g. movaps xmm0,[ecx-edi] is also legal. GCC does allow negative *constant* register-address offsets like -0x20(%%rcx), but is there a way to inline a subtract-second-register-contents in the above (base,index,scale) format, or does one have to either explicitly negate the contents of rdi followed by the above "movaps (%%rcx,%%rdi,1),%%xmm0", or explicitly do "sub %%rdi,%%rcx" followed by "movaps (%%rcx),%%xmm0"?
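For reference, a sketch of the (base,index,1) form in GCC extended asm, where literal register names take a doubled `%%`. load4_plus() is a hypothetical helper; rather than hardcoding rcx/rdi it lets the compiler pick registers through named operands. It assumes an x86-64 GCC/Clang target and a 16-byte-aligned target address (movaps faults otherwise), with a plain-C fallback elsewhere.

```c
#include <stddef.h>   /* ptrdiff_t */
#include <string.h>   /* memcpy (fallback path) */

/* Hypothetical helper: copy 4 floats from (char *)base + byte_off into out.
   AT&T "(base,index,1)" corresponds to Intel-syntax "[base+index]". */
void load4_plus(const float *base, ptrdiff_t byte_off, float out[4])
{
#if defined(__GNUC__) && defined(__x86_64__)
    __asm__ volatile(
        "movaps (%[b],%[o],1), %%xmm0\n\t"   /* aligned load from base + off */
        "movups %%xmm0, (%[d])\n\t"          /* unaligned store to out */
        :
        : [b] "r"(base), [o] "r"(byte_off), [d] "r"(out)
        : "xmm0", "memory");
#else
    memcpy(out, (const char *)base + byte_off, 4 * sizeof(float));
#endif
}
```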

#31
Undefined
"The unspeakable one"
Jun 2006
My evil lair
6,793 Posts
Quote:
[edit] I guess the shortest possible sequence might be:
Code:
not edi
movaps xmm0,[ecx+edi+1]
not edi
Last fiddled with by retina on 2010-10-14 at 06:16
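Retina's sequence relies on the two's-complement identity -x == ~x + 1, so [ecx+~edi+1] addresses [ecx-edi]. The sketch below uses hypothetical helper names and models addresses as uint64_t. It checks that identity at full 64-bit width, and also models the x86-64 rule that a 32-bit NOT of edi is zero-extended into rdi: the result is then only correct modulo 2^32, i.e. with 32-bit addressing, not with a 64-bit base register.

```c
#include <stdint.h>

/* base + ~x + 1 == base - x when the NOT is done at full register width. */
uint64_t addr_minus_via_not64(uint64_t base, uint64_t x)
{
    return base + ~x + 1;
}

/* Models "not edi" followed by a 64-bit address computation: the 32-bit
   NOT result is zero-extended, so the upper 32 bits are 0, not all ones. */
uint64_t addr_minus_via_not32(uint64_t base, uint32_t x)
{
    uint64_t zext = (uint32_t)~x;   /* zero-extension to 64 bits */
    return base + zext + 1;
}
```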

#32
Tribal Bullet
Oct 2004
5·23·31 Posts
I'm not sure that gcc inline asm accepts explicit scaled addressing modes when the scale factor is 1; I've always seen it expressed as
movaps (%%rcx,%%rdi),%%xmm0
though it can probably accept both formats. Retina: unlike all other 64-bit processors that have a 32-bit subset, x86_64 always zero-extends (rather than sign-extends) a 32-bit result into the full 64-bit register, clearing the high-order 32 bits. For Ernst's case you will likely need to perform a 64-bit NEG to get the correct behavior.
Last fiddled with by jasonp on 2010-10-17 at 17:42
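Following that suggestion, a sketch of the [base - index] load in GCC extended asm for x86-64: negate a 64-bit copy of the offset with negq, then use the ordinary (base,index,1) form. load4_minus() is a hypothetical helper; it assumes the target address is 16-byte aligned and falls back to plain C on other compilers or architectures.

```c
#include <stddef.h>   /* ptrdiff_t */
#include <string.h>   /* memcpy (fallback path) */

/* Hypothetical helper: copy 4 floats from (char *)base - byte_off into out,
   i.e. the Intel-syntax "[base-index]" that x86 addressing cannot encode. */
void load4_minus(const float *base, ptrdiff_t byte_off, float out[4])
{
#if defined(__GNUC__) && defined(__x86_64__)
    ptrdiff_t off = byte_off;                /* scratch copy, negated in place */
    __asm__ volatile(
        "negq %[o]\n\t"                      /* full 64-bit negate */
        "movaps (%[b],%[o],1), %%xmm0\n\t"   /* base + (-off) */
        "movups %%xmm0, (%[d])\n\t"
        : [o] "+r"(off)
        : [b] "r"(base), [d] "r"(out)
        : "xmm0", "memory");
#else
    memcpy(out, (const char *)base - byte_off, 4 * sizeof(float));
#endif
}
```

Binding the copy with "+r" tells the compiler the register is modified, so no second NEG is needed to restore it afterwards.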

#33
∂2ω=0
Sep 2002
República de California
26754₈ Posts
Retina, based on your comment (coupled with the fact that Visual Studio does seem to be diligent about not inducing ASM-inlining-related clobbers other than those the user specifies), I expected to see something like this in the assembler output when I generated it over the weekend:
Code:
add ecx,edi
movaps xmm0,[ecx]
sub ecx,edi
Code:
movaps xmm0, XMMWORD PTR [ecx+edi]
Quote: