mersenneforum.org  

Old 2010-10-08, 18:29   #23
ewmayer

Quote:
Originally Posted by rogue
I don't know if he is trying to determine the size of a long or the size of a long ptr, as they are not necessarily the same. Using LONG_MAX and LLONG_MAX should work in Visual Studio, MinGW, and other OSes if trying to determine the number of bits in a long. If long ptr is what he cares about, then LONG_MAX and LLONG_MAX will work everywhere except Windows, in which case he also has to include looking for WIN64. Without WIN64, long ptrs are 32 bits and with WIN64, long ptrs are 64 bits.
All previous versions of GCC I'd used up to the mingw-w64 version defined LONG_MAX (either with or without bracketing __) to be equal to 2^(bits in a pointer - 1) - 1. So what I previously had was this preprocessor check:
Code:
/* Syntax here is GCC and SunStudio/MSVC, respectively: */
#if !defined(__LONG_MAX__) && !defined(LONG_MAX)
    #include <limits.h>
#endif

#if !defined(__LONG_MAX__) &&  defined(LONG_MAX)
    #define __LONG_MAX__  LONG_MAX
#endif

#ifdef __LONG_MAX__
    #if __LONG_MAX__ == 2147483647L
        #define OS_BITS 32
    #elif __LONG_MAX__ == 9223372036854775807L
        #define OS_BITS 64
    #else
        #error  __LONG_MAX__ defined but value unrecognized!
    #endif
#else
    #error platform.h: failed to properly set OS_BITS!
#endif
(Rogue, note that the syntax "#if a == b" is perfectly fine, although enclosing the boolean clause in parentheses as you suggest does make it clearer.)

I need the size of a pointer (rather than a long int per se) because the bit-ness of the OS (which is reflected therein) determines which version of many assembler macros to use. Because of the oddity in LLP64 systems WraithX notes, I can't use LONG_MAX as a proxy for pointer size. I can (and will) use extra preprocessor code that first checks to see if __SIZEOF_POINTER__ is defined; I was simply hoping for a cleaner, more-unified way to do this pointer-size checking. Note that modifying the LONG_MAX-based check to instead use LONG_LONG_MAX (LLONG_MAX under MSVC) won't work, because e.g. 32-bit Linux and 64-bit Linux both have LONG_LONG_MAX = 2^63-1.
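To make the LP64/LLP64 oddity concrete, here is a minimal sketch that names the data model from the sizes of int, long and a pointer. The helper names data_model and this_platform are made up for illustration and are not part of the code discussed in this thread.

```c
#include <stddef.h>

/* Hypothetical helper: names the data model from the byte sizes of
   int, long and a pointer. */
const char *data_model(size_t int_sz, size_t long_sz, size_t ptr_sz)
{
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 4) return "ILP32";
    if (int_sz == 4 && long_sz == 4 && ptr_sz == 8) return "LLP64"; /* long stays 32-bit */
    if (int_sz == 4 && long_sz == 8 && ptr_sz == 8) return "LP64";  /* long tracks pointer */
    return "other";
}

/* Data model of the platform this file is compiled for. */
const char *this_platform(void)
{
    return data_model(sizeof(int), sizeof(long), sizeof(void *));
}
```

On an LLP64 system such as 64-bit Windows, long is 4 bytes while a pointer is 8, which is exactly the case where a LONG_MAX-based check reports 32 bits.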
Old 2010-10-08, 20:28   #24
rogue

Punt!

Seriously, why not just use LONG_MAX and LP64 or other known preprocessor defines to determine the number of bits in a pointer? If someone needs to build on another platform, then they can provide the necessary defines that your code will need to determine the number of bits in a pointer. In the end it will probably be easier than pulling your hair out over this.
Old 2010-10-08, 23:21   #25
WraithX

What if you switched out each occurrence of LONG_MAX with SIZE_MAX? If all your compilers are C99 or later, this should work (hopefully). Like so:
Code:
/* SIZE_MAX is declared in <stdint.h> (C99), not <limits.h>; some GCC versions also predefine __SIZE_MAX__: */
#if !defined(__SIZE_MAX__) && !defined(SIZE_MAX)
    #include <stdint.h>
#endif

#if !defined(__SIZE_MAX__) &&  defined(SIZE_MAX)
    #define __SIZE_MAX__  SIZE_MAX
#endif

#ifdef __SIZE_MAX__
    #if __SIZE_MAX__ == 0xffffffff
        #define OS_BITS 32
    #elif __SIZE_MAX__ == 0xffffffffffffffff
        #define OS_BITS 64
    #else
        #error  __SIZE_MAX__ defined but value unrecognized!
    #endif
#else
    #error platform.h: failed to properly set OS_BITS!
#endif
You may have to use ULONG_MAX, UINT_MAX and ULONG_LONG_MAX for the comparisons above (or maybe append ul and ull to the 0x numbers, respectively) to get the above to work. I'd be interested to hear if this works for you.

*edit* Don't use ULONG_MAX, that wouldn't work on LP64 systems. But UINT_MAX should work on both LP64 and LLP64 systems.
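A variant of the same idea, offered only as a sketch: C99's <stdint.h> can also provide UINTPTR_MAX, which describes the pointer-sized integer type directly. This is a swapped-in alternative to the SIZE_MAX check above, and it assumes the compiler ships <stdint.h> with the optional uintptr_t type; the function names are made up for illustration.

```c
#include <limits.h>
#include <stdint.h>

#if !defined(UINTPTR_MAX)
    #error UINTPTR_MAX not available, fall back to another check
#elif UINTPTR_MAX == 0xffffffffu
    #define OS_BITS 32
#elif UINTPTR_MAX == 0xffffffffffffffffu
    #define OS_BITS 64
#else
    #error UINTPTR_MAX defined but value unrecognized!
#endif

/* Pointer width as chosen by the preprocessor check above. */
int os_bits_from_uintptr(void) { return OS_BITS; }

/* Pointer width measured by the compiler, for cross-checking. */
int os_bits_measured(void) { return (int)(CHAR_BIT * sizeof(void *)); }
```

The two functions should agree on the usual ILP32, LP64 and LLP64 platforms; a mismatch would mean uintptr_t is wider than a pointer there and the check is not usable.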
Old 2010-10-09, 00:14   #26
ewmayer

I checked the list of predefines for both gcc 3.4 and 4.2 on one of our linux/amd64 systems at work, and neither of those versions defines SIZE_MAX.

Since augmenting my LONG_MAX predefine checks by first checking if __SIZEOF_POINTER__ is defined properly handles the Win64/mingw issue, gonna go with that for now - in the meantime, there is actual code that needs to get written. ;)
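For anyone following along, the augmented check could look roughly like the sketch below. It is not the actual platform.h: the ordering of the tests and the _WIN64/_WIN32 branches are assumptions added for compilers that lack __SIZEOF_POINTER__, and os_bits() is a made-up accessor.

```c
#include <assert.h>
#include <limits.h>

#if defined(__SIZEOF_POINTER__)          /* predefined by newer GCC versions */
    #if __SIZEOF_POINTER__ == 4
        #define OS_BITS 32
    #elif __SIZEOF_POINTER__ == 8
        #define OS_BITS 64
    #else
        #error __SIZEOF_POINTER__ defined but value unrecognized!
    #endif
#elif defined(_WIN64)                    /* assumption: LLP64, long is 32-bit */
    #define OS_BITS 64
#elif defined(_WIN32)
    #define OS_BITS 32
#elif LONG_MAX == 2147483647L            /* fallback: long tracks pointer size */
    #define OS_BITS 32
#elif LONG_MAX == 9223372036854775807L
    #define OS_BITS 64
#else
    #error platform.h: failed to properly set OS_BITS!
#endif

/* Returns OS_BITS after checking it against the real pointer size. */
int os_bits(void)
{
    assert(OS_BITS == (int)(CHAR_BIT * sizeof(void *)));
    return OS_BITS;
}
```

Checking __SIZEOF_POINTER__ first means the LONG_MAX comparison is only reached on compilers that do not predefine it, so the mingw-w64 case never hits the misleading 32-bit long.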

Thanks for the help, everybody - I hope to be able to report an actual working mingw/Win64 SSE2-enabled build sometime next week.
Old 2010-10-11, 21:37   #27
ewmayer

...and we have a successful 64-bit windows build!

Some Notes:

- I haven't played with various combos of compiler-optimization flags, just used -O3, which was consistently fastest for me in previous linux/gcc builds.

- FFT lengths of the form 13*2^n are not yet being supported in the Mingw/gcc build (one more hefty ASM macro needs to be ported to GCC and then eyeball-optimized for 64-bit to support those) - those are the "not available" annotated entries in the rightmost column of timings below.

- These comparative timings are on an unloaded system (at least insofar as Win7 and our IT department's various antivirus and system-monitoring tools allow).

- System is a quad-core 2.67 GHz i7; this is running on a single CPU ... I expect I'll need to cut at least one third off these timings to be competitive with Prime95, but we does what we can in the limited time available to us. :P
Code:
FFT length             LL-test Timing (sec/iter)
(Kdoubles)  32-bit visual studio   64-bit mingw-w64/gcc-4.4
      1024           0.030                  0.028
      1152           0.038                  0.035
      1280           0.038                  0.036
      1408           0.042                  0.040
      1536           0.043                  0.043
      1664           0.068                   n/a
      1792           0.055                  0.054
      1920           0.058                  0.056
      2048           0.062                  0.057
      2304           0.075                  0.071
      2560           0.081                  0.073
      2816           0.094                  0.093
      3072           0.094                  0.092
      3328           0.138                   n/a
      3584           0.118                  0.111
      3840           0.134                  0.130
      4096           0.125                  0.131

Old 2010-10-13, 11:13   #28
jasonp

Small side note: at least in winXP, if you change the path in your environment you don't have to reboot to make the changes take effect, only launch a new command prompt.

Good luck navigating the maze of pitfalls building on windows, everything is a little bit different there...
Old 2010-10-13, 17:59   #29
henryzz
I can confirm that for all Windows versions >= XP (server versions not tested).
Old 2010-10-13, 21:42   #30
ewmayer
Thanks ... currently having to do some disassembly-debug using a 64-bit linux build of the same code, since the mingw64 install only has gdb, not my preferred ddd. Which brings me to a small gcc inline-asm syntax question: The crash I'm debugging is in an assembler macro where I load to an xmm register via a (base + offset) address computation. In VS-style syntax, the load looks like this:

movaps xmm0,[ecx+edi]

In 64-bit GCC, I believe this should translate to the needed (base,index, scale) format as:

movaps (%%rcx,%%rdi,1),%%xmm0

Is that right? Also, VS allows register-content add or subtract in such addressing computations, e.g.

movaps xmm0,[ecx-edi]

is also legal. GCC does allow negative *constant* register-address offsets like -0x20(%%rcx), but is there a way to inline a subtract-second-register-contents in the above (base,index, scale) format, or does one have to either explicitly negate the contents of rdi followed by the above "movaps (%%rcx,%%rdi,1),%%xmm0", or explicitly do "sub %%rdi,%%rcx", followed by "movaps (%%rcx),%%xmm0"?
Old 2010-10-14, 06:09   #31
retina

Quote:
Originally Posted by ewmayer
Also, VS allows register-content add or subtract in such addressing computations, e.g.

movaps xmm0,[ecx-edi]
WTF? That is not supported by the CPU. VS must translate that to at least two instructions. And thus clobbering some register/memory/stack/flags that you might not expect? I'm interested to know what instructions it actually compiles to support that.

[edit]
I guess the shortest possible sequence might be:
Code:
not edi
movaps xmm0,[ecx+edi+1]
not edi
Doesn't affect the flags so I suppose it could do it. But in 64-bit mode the upper portion of RDI would be zeroed which might be holding something important. Certainly ugly and inefficient.
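The NOT trick relies on the two's-complement identity -x = ~x + 1, so ecx - edi equals ecx + ~edi + 1. A small C sketch of that arithmetic, plus a model of what a 32-bit NOT does to a 64-bit register; both function names are made up for illustration:

```c
#include <stdint.h>

/* base - index computed as base + ~index + 1 (mod 2^32), i.e. the
   address that [ecx+edi+1] yields after "not edi". */
uint32_t sub_via_not32(uint32_t base, uint32_t index)
{
    return base + ~index + 1u;
}

/* Models a 32-bit NOT applied to the low half of a 64-bit register:
   the low 32 bits are inverted and the upper 32 bits come back as zero. */
uint64_t not32_zero_extended(uint64_t reg)
{
    return (uint64_t)(uint32_t)~(uint32_t)reg;
}
```

The second function shows the hazard mentioned above: whatever was in the upper half of RDI is gone after the first NOT, and the second NOT does not bring it back.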

Old 2010-10-17, 17:23   #32
jasonp

I'm not sure that gcc inline asm accepts explicit scaled addressing modes when the scale factor is 1; I've always seen it expressed as

movaps (%%rcx,%%rdi),%%xmm0

though it can probably accept both formats.

Retina: unlike all other 64-bit processors that have a 32-bit subset, x86_64 always zero-extends (rather than sign-extends) the high-order 32 bits of a 64-bit register when doing 32-bit arithmetic. For Ernst's case you likely will need to perform a 64-bit NEG to get the correct behavior.
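A minimal sketch of the 64-bit NEG approach in GCC extended inline asm, under some assumptions: it uses movups rather than movaps so that it does not depend on 16-byte alignment, it negates a copy of the index so the caller's value is untouched, and the function names are hypothetical. On anything other than GCC-style compilers targeting x86_64 it falls back to plain C.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies 4 floats from (base - offset_bytes) into out. */
void load_base_minus_index(const float *base, intptr_t offset_bytes, float out[4])
{
#if defined(__GNUC__) && defined(__x86_64__)
    intptr_t idx = offset_bytes;
    __asm__ volatile(
        "neg %[idx]\n\t"                         /* 64-bit negate: idx = -idx  */
        "movups (%[base],%[idx],1), %%xmm0\n\t"  /* load from base + idx*1     */
        "movups %%xmm0, (%[out])\n\t"            /* store the 16 bytes to out  */
        : [idx] "+&r"(idx)
        : [base] "r"(base), [out] "r"(out)
        : "xmm0", "memory", "cc");
#else
    /* Portable fallback with the same effect. */
    memcpy(out, (const char *)base - offset_bytes, 4 * sizeof(float));
#endif
}

/* Loads buf[0..3] by addressing it as &buf[4] minus 16 bytes; returns element 1. */
float demo_load(void)
{
    float buf[8] = {1.0f, 2.0f, 3.0f, 4.0f, 5.0f, 6.0f, 7.0f, 8.0f};
    float out[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    load_base_minus_index(&buf[4], 16, out);
    return out[1];
}
```

The "cc" clobber is there because NEG modifies the flags, and the early-clobber modifier on idx keeps the compiler from placing it in the same register as base or out.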

Old 2010-10-18, 20:07   #33
ewmayer

Retina, based on your comment - coupled with the fact that Visual Studio does seem to be diligent about not inducing ASM-inlining-related clobbers other than the ones the user specifies - I expected to see something like this in the assembler output when I generated it over the weekend:
Code:
add ecx,edi
movaps xmm0,[ecx]
sub ecx,edi
...and thus was rather surprised to see this generated:
Code:
movaps    xmm0, XMMWORD PTR [ecx+edi]
Now the first thing that occurred to me on seeing this was that more addressing-code translation might be occurring in assembly of the above instruction ... but a little more digging seems to indicate this is in fact a valid addressing mode ... I've boldfaced the key snip:
Quote:
The offset part of the memory address can be specified either directly as a static value (called a displacement) or through an address computation made up of one or more of the following components:

* Displacement - An 8-, 16-, or 32-bit value.
* Base - The value in a general-purpose register.
* Index - The value in a general-purpose register except EBP.
* Scale factor - A value of 2, 4, or 8 that is multiplied by the index value.

An effective address is computed by:

Offset = Base + (Index * Scale) + displacement
My Comment: Jason, the above makes it sound like using scale = 1 is illegitimate, but I tried your sans-1 syntax and observed that the generated assembly has the 1 in there, i.e. the default scale = 1.
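The quoted formula, written out as a small C sketch for 32-bit addressing. effective_offset is a made-up name, and the wraparound modulo 2^32 is an assumption matching 32-bit address arithmetic.

```c
#include <stdint.h>

/* Mirrors Offset = Base + (Index * Scale) + Displacement, with the
   arithmetic wrapping modulo 2^32. Scale is 1, 2, 4 or 8; a scale of 1
   is what the assembler uses when none is written. */
uint32_t effective_offset(uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
    return base + index * scale + (uint32_t)disp;
}
```

For example, [ecx+edi] with ecx = 0x1000 and edi = 0x20 corresponds to effective_offset(0x1000, 0x20, 1, 0), and a negative constant offset such as -0x20 is simply a negative displacement.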
Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
