mersenneforum.org  

Old 2005-07-26, 12:27   #12
jasonp
Tribal Bullet
 
Oct 2004

3,541 Posts
Default

Quote:
Originally Posted by akruppa
The fastest way should be using the fistp opcode which uses the currently set rounding direction, which is round-to-nearest by default. Changing rounding modes in the fpu causes the pipeline to be flushed which imposes a huge penalty. This occurs for *each* type-cast to int as those are defined as truncate (round to zero)!
Back in the Pentium era FISTP took a long time; a faster alternative, especially in pipelined code, involved adding a magic constant that shifted the mantissa in a floating point register, then performing an ordinary 64-bit store to memory. The converted floating point value was then read as the lower 32 bits of the memory location. I believe the constant was 3*2^51 (that is, 1.5*2^52); the technique is probably documented in http://www.agner.org/assem/#optimize, which is a very cool reference in general.

I don't know about more modern processors; the big drawback to this method is that you do a store of one size to memory and a load of a different size to the same memory. That may cause big stalls, or it may not, because hardware designers know lots of people used to use this trick.

jasonp
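
A minimal C sketch of the magic-constant conversion described in this post, assuming IEEE 754 doubles, the default round-to-nearest mode, and an input whose rounded value fits in 32 bits. The function name `round_magic` is hypothetical. It uses 1.5 * 2^52 (that is, 3 * 2^51), the value usually quoted for double-precision stores, because sums of that size have a unit in the last place of exactly 1. The `volatile` store is only there to force the sum out to a 64-bit memory location, as the original trick did.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: round x to the nearest integer (ties to even)
   without FISTP or a cast. Assumes IEEE 754 doubles, round-to-nearest
   mode, and |x| < 2^31. */
static int32_t round_magic(double x)
{
    /* 1.5 * 2^52: adding it pushes the fraction bits of x off the end
       of the 52-bit mantissa, so the hardware does the rounding. */
    const double magic = 6755399441055744.0;
    volatile double stored = x + magic;   /* force a 64-bit store */
    double d = stored;
    uint64_t bits;
    uint32_t low;
    int32_t result;

    memcpy(&bits, &d, sizeof bits);       /* reread the stored bytes */
    low = (uint32_t)bits;                 /* lower 32 bits = rounded x */
    memcpy(&result, &low, sizeof result); /* reinterpret as signed */
    return result;
}
```

Because the rounding is done by the addition itself, ties go to the even integer: `round_magic(2.5)` gives 2 and `round_magic(3.5)` gives 4. Whether the mixed-size store and load stalls on a given processor is a separate question that this sketch does not measure.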
Old 2005-07-26, 14:34   #13
R.D. Silverman
 
Nov 2003

2²×5×373 Posts
Thumbs up

Quote:
Originally Posted by R.D. Silverman
No, I clearly don't need to pass a pointer. But your suggestion creates
two temporaries on the stack because the float instructions need memory
addresses as destinations. I am not sure if yours is slower or faster than mine.
Also, the fadd is not needed in the routine. If the routine only does 'fist',
you can call it with (a + .5) to get "ceil" or (a - .5) to get "floor".

Is the finit instruction really needed??

Thanks for the advice!!!
Would someone more knowledgeable than myself about coding FP in assembler
on the Pentium please tell me if the finit is NEEDED??

Thanks.
Old 2005-07-26, 16:34   #14
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

3²×5×107 Posts
Default

Quote:
Originally Posted by R.D. Silverman
Would someone more knowledgeable than myself about coding FP in assembler on the Pentium please tell me if the finit is NEEDED??

Thanks.
FINIT was designed to "reset" the 80287 math coprocessor and its brother, the 80387, back when PCs were built with a separate CPU and FPU.

My wild guess is that the FPU and the CPU in the chip of a Pentium-like processor are reset together at the same time.

Michael L. Schmit (Pentium processor optimization tools, AP-Professional, founder of Quantasm Corp.) seems to think the same: he never delved into the FINIT instruction in his book.
Well, he didn't find the Pentium bug either...

Just my 0.02 euro...

Luigi
Old 2005-07-26, 16:54   #15
R.D. Silverman
 
Nov 2003

1110100100100₂ Posts
Thumbs up

Quote:
Originally Posted by ET_
FINIT was designed to "reset" the 80287 math coprocessor and its brother, the 80387, back when PCs were built with a separate CPU and FPU.

My wild guess is that the FPU and the CPU in the chip of a Pentium-like processor are reset together at the same time.

Michael L. Schmit (Pentium processor optimization tools, AP-Professional, founder of Quantasm Corp.) seems to think the same: he never delved into the FINIT instruction in his book.
Well, he didn't find the Pentium bug either...

Just my 0.02 euro...

Luigi
What I meant was whether *not* resetting the FPU inside this routine
will cause problems elsewhere.
Old 2005-07-26, 17:26   #16
alpertron
 
Aug 2002
Buenos Aires, Argentina

2·683 Posts
Default

In a program I wrote in assembler using inline assembly in Visual C++, I had to use the instruction FINIT in order to restore the full precision of the coprocessor.

It appears that Visual C++ sets the precision to 53 bits. After I executed the FINIT instruction, the precision increased to 64 bits, which is what I needed in that program.
Old 2005-07-26, 17:46   #17
akruppa
 
"Nancy"
Aug 2002
Alexandria

2467₁₀ Posts
Default

According to the Intel Instruction Set Reference, FINIT initialises the FPU state to default values: round to nearest, 64 bit precision, all exceptions masked. Afaik it needs to be issued once per process before that process uses the FPU to get a well-defined state (not sure about that, though). In any case it should not be issued before every double->int conversion: FINIT is almost certainly a serialising instruction and would flush the pipeline again.

Alex
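
The defaults listed in this post can be read straight off the x87 control word. The following is a small C sketch that only decodes a 16-bit value; it does not touch the FPU. The helper names are hypothetical. 0x037F is the control word the Intel manuals give for the state after FINIT. 0x027F is the same word with 53-bit precision selected, which would be consistent with the Visual C++ behaviour reported earlier in the thread, though that is an assumption since nobody here quoted the actual value.

```c
#include <stdint.h>

/* Field layout of the 16-bit x87 control word. */
#define X87_EXC_MASKS  0x003Fu  /* bits 0-5: exception mask bits */
#define X87_PC_SHIFT   8        /* bits 8-9: precision control */
#define X87_RC_SHIFT   10       /* bits 10-11: rounding control */

/* Hypothetical helper: significand width selected by the PC field. */
static int x87_precision_bits(uint16_t cw)
{
    switch ((cw >> X87_PC_SHIFT) & 3u) {
    case 0:  return 24;   /* single precision */
    case 2:  return 53;   /* double precision */
    case 3:  return 64;   /* extended precision */
    default: return -1;   /* encoding 01 is reserved */
    }
}

/* Hypothetical helper: 0 = nearest, 1 = down, 2 = up, 3 = toward zero. */
static unsigned x87_rounding_mode(uint16_t cw)
{
    return (cw >> X87_RC_SHIFT) & 3u;
}

/* Hypothetical helper: nonzero if all six exceptions are masked. */
static int x87_all_exceptions_masked(uint16_t cw)
{
    return (cw & X87_EXC_MASKS) == X87_EXC_MASKS;
}
```

With these, `x87_precision_bits(0x037F)` is 64 and `x87_rounding_mode(0x037F)` is 0 (round to nearest), matching the description above, while `x87_precision_bits(0x027F)` is 53.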
Old 2005-07-26, 19:10   #18
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

11317₈ Posts
Default

In Agner Fog's sources I found this piece of assembly code that may interest you:

Code:
;          TRUNCATE.ASM                                          Agner Fog 2004

; © 2003 GNU General Public License www.gnu.org/copyleft/gpl.html

.686
.xmm
.model flat

extrn instrset:dword, InstructionSet:near

PublicAlias MACRO MangledName ; macro for giving a function alias public names
        MangledName label near
        public MangledName
ENDM

.code

; ********** Truncate function **********
; C++ prototype:
; extern "C" int Truncate (double x);

; This function converts a double precision floating point number to
; an integer, rounding towards zero.

; This function is faster than the default conversion method in C++.
; Uses SSE2 instruction set if possible.

; For the sake of speed, there is no special overflow check. In case of 
; overflow, you may get an exception or an invalid result.

Truncate PROC NEAR
PUBLIC Truncate
PublicAlias _Truncate               ; Underscore needed when called from Windows
        cmp     [instrset], 4       ; can we use XMM instructions?
        jl      NO_SSE2
        ; SSE2 (XMM) instruction set:
        cvttsd2si eax, [esp+4]      ; use truncation instruction
        ret

NO_SSE2:cmp     [instrset], 0
        jl      DETECT_INSTRUCTIONSET
        ; default instruction set:
        fld     qword ptr [esp+4]   ; x
        sub     esp, 12             ; space for local variables
        fist    dword ptr [esp]     ; rounded value
        fst     dword ptr [esp+4]   ; float value
        fisub   dword ptr [esp]     ; subtract rounded value
        fstp    dword ptr [esp+8]   ; difference
        pop     eax                 ; rounded value
        pop     ecx                 ; float value
        pop     edx                 ; difference (float)
        test    ecx, ecx            ; test sign of x
        js      SHORT NEGATIVE
        add     edx, 7FFFFFFFH      ; produce carry if difference < -0
        sbb     eax, 0              ; subtract 1 if x-round(x) < -0
        ret
NEGATIVE:
        xor     ecx, ecx
        test    edx, edx
        setg    cl                  ; 1 if difference > 0
        add     eax, ecx            ; add 1 if x-round(x) > 0
        ret

DETECT_INSTRUCTIONSET:              ; first time call. detect instruction set
        call    InstructionSet
        jmp     Truncate        
Truncate ENDP

END
The "useful instruction" is the following:

cvttsd2si eax, [esp+4] ; use truncation instruction


HTH

Luigi
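
For readers who do not want to trace the non-SSE2 path by hand, here is a plain C restatement of its correction step, offered as a sketch rather than as Agner Fog's code. The assembly rounds with FIST, computes x minus the rounded value, and then nudges the result back toward zero. The function name `truncate_from_rounded` and its second parameter are hypothetical; the caller is assumed to supply the round-to-nearest result. The assembly tests the sign bit of x, while this sketch compares against 0.0, which gives the same answer for every input, including negative zero.

```c
/* Hypothetical helper mirroring the non-SSE2 path: turn a value that
   was rounded to nearest into one that is rounded toward zero.
   `rounded` is what FIST would have stored for x. */
static int truncate_from_rounded(double x, int rounded)
{
    double diff = x - (double)rounded;   /* x - round(x) */

    if (x >= 0.0) {
        if (diff < 0.0)
            rounded -= 1;   /* rounding went up, past x */
    } else {
        if (diff > 0.0)
            rounded += 1;   /* rounding went down, past x */
    }
    return rounded;
}
```

For example, 2.7 rounds to 3 and is corrected to 2, while -2.7 rounds to -3 and is corrected to -2. Values that were already rounded toward zero, such as 2.2 to 2, pass through unchanged.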

Last fiddled with by ET_ on 2005-07-26 at 19:14
Old 2005-07-26, 20:56   #19
R.D. Silverman
 
Nov 2003

2²×5×373 Posts
Question

Quote:
Originally Posted by ET_

Michael L. Schmit (Pentium processor optimization tools, AP-Professional, founder of Quantasm Corp.) seems to think the same: he never delved into the FINIT instruction in his book.
Well, he didn't find the Pentium bug either...

Just my 0.02 euro...

Luigi
Hi,

What is the exact name of this book?
Old 2005-07-26, 22:48   #20
Prime95
P90 years forever!
 
Aug 2002
Yeehaw, FL

2×5³×71 Posts
Default

Quote:
Originally Posted by R.D. Silverman
Is the finit instruction really needed??
In short, no. FINIT, as has been pointed out, resets the rounding mode, precision control, etc. The C runtime library expects these things to be set a certain way: the default state of FINIT.

If you change the rounding mode, then you must set it back - either with FINIT or by manipulating the rounding control bits.
Old 2005-07-27, 15:48   #21
S78496
 
Nov 2002

11 Posts
Default

Quote:
Originally Posted by R.D. Silverman
Hi,

What is the exact name of this book?
Pentium Processor Optimization Tools. ISBN: 0126272301

Here's a link for further info about this book:

http://portal.acm.org/citation.cfm?id=188852
Old 2005-07-27, 19:08   #22
ET_
Banned
 
"Luigi"
Aug 2002
Team Italia

4815₁₀ Posts
Default

Quote:
Originally Posted by R.D. Silverman
Hi,

What is the exact name of this book?
Michael L. Schmit
"Pentium processor optimization tools"
AP Professional
ISBN 0-12-627230-1

Luigi

(whoops... already answered)

Last fiddled with by ET_ on 2005-07-27 at 19:16

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.