|
|
#12 |
|
Aug 2003
110000₂ Posts |
Hmm, I'm not familiar with MASM but you could try using GNU objcopy (part of binutils) to convert the object file. Messing around with objcopy isn't exactly my idea of fun :/.
It's too bad that NASM doesn't have an x86-64 port yet. According to a forum post, the developers want to fix bugs and release 0.99 first. I know that recent versions of GAS support Intel syntax when .intel_syntax is specified, but I doubt the support is anywhere near good enough to handle things like MASM macros. |
|
|
|
|
|
#13 | |
|
Apr 2003
Berlin, Germany
169₁₆ Posts |
Quote:
[code:1]; some thought up example in pseudocode (xmm8 holds -2.0,-2.0):
movapd xmm0,[v0]    ;load v0
movapd xmm1,xmm0    ;keep a copy of v0
addpd  xmm0,[v1]    ;res0:=v1+v0
movapd [res0],xmm0  ;store res0
mulpd  xmm1,xmm8    ;v0:=-2*v0
addpd  xmm1,[v1]    ;res1:=v1-2*v0
movapd [res1],xmm1  ;store res1[/code:1]
Obviously this should be mixed with other instructions. This way we do these calculations using only one explicit load; the implicit loads don't need any issue slots but have increased latency.
What I think is that the Opteron architecture is not perfectly suited to SSE2 but has been made compatible with it. Before this decision AMD wanted to add "Technical Floating Point" to the K8 architecture. That included instructions with 3 operands (2 sources, 1 destination) and also more registers than x87 - a number like 32 or 64. The dominance of SSE2 made it necessary to be compatible with it. |
|
|
|
|
|
|
#14 | |
|
Aug 2002
111₁₀ Posts |
Quote:
[code:1]res0 := v0 + v1;
res1 := v0 - v1;[/code:1]
The code I posted had an error in the third line (I changed the order of operands). It has to be:
[code:1]addpd xmm0, xmm1    ; r0 <- r0 + r1
mul_minustwo xmm1   ; r1 <- -2*r1
addpd xmm1, xmm0    ; r1 <- r0 - r1[/code:1]
So it is hard to work without r0 and r1 loaded in registers.
Guillermo |
|
|
|
|
|
|
#15 |
|
Apr 2003
Berlin, Germany
19² Posts |
What I wanted to say is that the negative constant allows us to use addpd instead of subpd. addpd (and mulpd) are commutative, so the operands can be swapped to leave the desired one in memory; with subpd that is not possible. Admittedly, this technique won't find many applications.
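To make the arithmetic concrete, here is a minimal sketch in C with SSE2 intrinsics that follows the corrected three-instruction sequence from post #14. The names `sum_and_diff` and `minus_two` are made up for illustration, and the sketch uses unaligned loads and stores so callers need not align their buffers, unlike the movapd in the thread. Whether the compiler folds a load into a memory operand of addpd is up to the compiler, so this only demonstrates the identity, not the instruction scheduling.

```c
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: res0 = v0 + v1 and res1 = v0 - v1 for two packed
 * doubles, using only add and multiply (no subtract). */
static void sum_and_diff(const double *v0, const double *v1,
                         double *res0, double *res1)
{
#if defined(__SSE2__)
    const __m128d minus_two = _mm_set1_pd(-2.0);  /* role of xmm8 in the thread */

    __m128d a = _mm_loadu_pd(v0);
    __m128d b = _mm_loadu_pd(v1);

    __m128d sum = _mm_add_pd(a, b);               /* v0 + v1 */
    _mm_storeu_pd(res0, sum);

    __m128d t = _mm_mul_pd(b, minus_two);         /* -2 * v1 */
    _mm_storeu_pd(res1, _mm_add_pd(t, sum));      /* (v0 + v1) - 2*v1 = v0 - v1 */
#else
    /* Scalar fallback with the same arithmetic for non-SSE2 builds. */
    for (int i = 0; i < 2; i++) {
        double sum = v0[i] + v1[i];
        res0[i] = sum;
        res1[i] = -2.0 * v1[i] + sum;
    }
#endif
}
```

Calling it with v0 = {1.5, -4.0} and v1 = {0.25, 3.0} should leave the sums in res0 and the differences in res1. Because the last step is an add, either of its inputs could in principle come from memory.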
|
|
|
|
|
|
#16 |
|
Apr 2003
Berlin, Germany
361₁₀ Posts |
Currently, reading the event counters is only possible at the highest privilege level. To make them readable from user-level software, we need a driver which has access to the counters. Such a driver is located here: http://user.it.uu.se/~mikpe/linux/perfctr/. It also supports AMD64's long mode.
I'm sure it would be useful for development. What do you all think? I made a small page which lists most of the performance monitor events for Opteron: http://optimizer.sourceforge.net/events.html I chose those which could be relevant for us. Now the question is whether and how we want to make this feature available on the Opteron box. It would need some kernel patching and the installation of a driver, so I assume it would require a reboot. |
|
|
|
|
|
#17 |
|
Aug 2002
2×3²×13×37 Posts |
Would making the accounts root level give them access to these counters?
|
|
|
|
|
|
#18 | |
|
Apr 2003
Berlin, Germany
19² Posts |
Quote:
I made a small test suite available in my home directory, in a subdirectory called "test". Run ./compile there. It will compile the 2 sources and execute them. If the instructions ("rdpmc" and "movl %cr4,%eax") don't work, you'll get 2 segmentation faults. Otherwise it will output "Success". |
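For anyone who wants to try the rdpmc half of this without crashing, here is a rough sketch of such a probe in C. It is not the code from the "test" directory: the function name `rdpmc_allowed` and the signal-handler approach are my own assumptions, and it needs GCC-style inline assembly on Linux/x86. It executes rdpmc for counter 0 and catches the fault that the CPU raises when user-level access is not enabled (the CR4.PCE bit).

```c
#define _POSIX_C_SOURCE 200809L
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static sigjmp_buf probe_env;

/* Jump back into the probe when the instruction faults. */
static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(probe_env, 1);
}

/* Hypothetical helper: returns 1 if RDPMC for counter 0 executes in user
 * mode, 0 if it faults, and -1 on non-x86 builds or if the signal
 * handlers cannot be installed. */
static int rdpmc_allowed(void)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    struct sigaction sa, old_segv, old_ill;
    volatile int result = 0;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);

    if (sigaction(SIGSEGV, &sa, &old_segv) != 0)
        return -1;
    if (sigaction(SIGILL, &sa, &old_ill) != 0) {
        sigaction(SIGSEGV, &old_segv, NULL);
        return -1;
    }

    /* savemask = 1 so the signal mask is restored after the handler jumps. */
    if (sigsetjmp(probe_env, 1) == 0) {
        uint32_t lo, hi;
        /* ECX selects the counter; the value comes back in EDX:EAX. */
        __asm__ volatile("rdpmc" : "=a"(lo), "=d"(hi) : "c"(0));
        (void)lo;
        (void)hi;
        result = 1;
    }

    /* Put the previous handlers back. */
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGILL, &old_ill, NULL);
    return result;
#else
    return -1;
#endif
}
```

A return value of 0 corresponds to the segmentation fault case described above. The probe only tells you whether the instruction is permitted; it says nothing about which event the counter is programmed to count.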
|
|
|
|
|
|
#19 | ||||
|
Apr 2003
Berlin, Germany
19² Posts |
I just noticed that I failed to answer some questions here.
Quote:
Quote:
Quote:
Here I quote my "useful documents" links, which could help in finding important docs: Quote:
|
||||
|
|
|
|
|
#20 |
|
Aug 2002
2×3²×13×37 Posts |
[code:1]opteron:/home/db/test # ./compile
./compile: line 3: 16943 Segmentation fault ./rdpmc
./compile: line 4: 16944 Segmentation fault ./movcr4[/code:1] |
|
|
|
|
|
#21 |
|
Apr 2003
Berlin, Germany
19² Posts |
Thanks. So we know that root privileges won't help.
It seems there is no way around kernel patching and using such a driver. :( I think it would also be possible to set the required bit in CR4 somewhere in the kernel. But how and where? I'll investigate further into that. |
|
|
|
|
|
#22 |
|
Apr 2003
Berlin, Germany
19² Posts |
MASM update:
Currently it looks like there is no way to produce an object file format that objcopy can convert. gcc doesn't understand the files either (as I expected). The following possibilities are left:
- modifying objcopy to read the generated obj files (requires analysing their new format)
- finding another assembler which understands MASM syntax and assembles AMD64 code, producing a usable elf64 object file
- finding or developing a MASM converter (the effort depends on how many features we have to support) |
|
|
|