mersenneforum.org  

Old 2003-09-10, 02:08   #12
aaronl
Aug 2003
110000₂ Posts

Hmm, I'm not familiar with MASM but you could try using GNU objcopy (part of binutils) to convert the object file. Messing around with objcopy isn't exactly my idea of fun :/.

It's too bad that NASM doesn't have an x86-64 port yet. According to a forum post, the developers want to fix bugs and release 0.99 first. I know that recent versions of GAS support Intel syntax when .intel_syntax is specified, but I doubt the support is anywhere near good enough to handle things like MASM macros.

Old 2003-09-10, 08:39   #13
Dresdenboy
Apr 2003
Berlin, Germany
169₁₆ Posts

Quote:
Originally Posted by gbvalor
It would be better to store minus two in an XMM register.
That would also let us operate on some values without loading them explicitly (if they are used less often).

[code:1]; made-up example in pseudocode (xmm8 holds -2.0, -2.0)
movapd xmm0, [v0]    ; load v0
addpd  xmm0, [v1]    ; res0 := v0 + v1
movapd [res0], xmm0  ; store res0
mulpd  xmm0, xmm8    ; xmm0 := -2*(v0 + v1)
addpd  xmm0, [v1]    ; res1 := v1 - 2*(v0 + v1)
movapd [res1], xmm0  ; store res1[/code:1]
Obviously this should be interleaved with other instructions.

This way we do these calculations using only one explicit load; the implicit loads (the memory operands of addpd) don't need any issue slots of their own but have increased latency.

What I think is that the Opteron architecture is not perfectly suited to SSE2 but has been made compatible with it. Before this decision, AMD wanted to add "Technical Floating Point" to the K8 architecture. That included instructions with 3 operands (2 sources, 1 destination) and also more registers than x87, a number like 32 or 64. The dominance of SSE2 made it necessary to be compatible with it.

Old 2003-09-10, 11:40   #14
gbvalor
Aug 2002
111₁₀ Posts

Quote:
That would also let us operate on some values without loading them explicitly (if they are used less often).
But this code is not what we need. We need
[code:1]res0 := v0 + v1;
res1 := v0 - v1;
[/code:1]

The code I posted had an error in the third line (I swapped the order of the operands). It has to be:
[code:1]addpd xmm0, xmm1; r0 <- r0 + r1
mul_minustwo xmm1 ; r1 <- -2*r1;
addpd xmm1, xmm0; r1 <- r0 - r1
[/code:1]

So it is hard to work without r0 and r1 loaded in registers.
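
To check the arithmetic of the three-instruction sequence above, here is a minimal Python sketch that models addpd and mulpd on pairs of doubles. The helper names (addpd, mulpd, sum_and_diff) and the sample values are made up for illustration; this only models the data flow, it is not the real assembly.

```python
# Model of a packed-double XMM register: a tuple of two Python floats.
MINUS_TWO = (-2.0, -2.0)  # the constant assumed to sit in a spare XMM register


def addpd(a, b):
    """Element-wise add of two packed doubles (models SSE2 addpd)."""
    return (a[0] + b[0], a[1] + b[1])


def mulpd(a, b):
    """Element-wise multiply of two packed doubles (models SSE2 mulpd)."""
    return (a[0] * b[0], a[1] * b[1])


def sum_and_diff(r0, r1):
    """Return (r0 + r1, r0 - r1) using only addpd and mulpd."""
    r0 = addpd(r0, r1)          # r0 <- r0 + r1
    r1 = mulpd(r1, MINUS_TWO)   # r1 <- -2*r1
    r1 = addpd(r1, r0)          # r1 <- (r0 + r1) - 2*r1 = r0 - r1
    return r0, r1


res0, res1 = sum_and_diff((3.0, 10.0), (1.0, 4.0))
print(res0, res1)  # (4.0, 14.0) (2.0, 6.0)
```

One caveat: the sum is rounded before the difference is formed, so for inputs where r0 + r1 is not exact the result can differ in the last bit from a direct subpd.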

Guillermo


Old 2003-09-10, 13:11   #15
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

What I wanted to say is that the negative constant lets us use addpd instead of subpd. For addpd (and mulpd) the operands can be swapped, leaving the desired one in memory, but not for subpd. Admittedly, this technique wouldn't apply in many places.

Old 2003-09-10, 18:52   #16
Dresdenboy
Apr 2003
Berlin, Germany
361₁₀ Posts

Currently, reading the event counters is only possible at the highest privilege level. To make them readable from user-level software, we need a driver which has access to the counters. Such a driver is located here: http://user.it.uu.se/~mikpe/linux/perfctr/. It also supports AMD64's long mode.

I'm sure it would be useful for development. What do you all think?

I made a small page which shows most of the performance monitor events for Opteron: http://optimizer.sourceforge.net/events.html

I chose those which could be relevant for us.

Now the question is whether and how we want to make this feature available on the Opteron box.

It would need some kernel patching and installation of a driver, and thus a reboot, I assume.

Old 2003-09-11, 02:18   #17
Xyzzy
Aug 2002
2×3²×13×37 Posts

Would making the accounts root level give them access to these counters?

Old 2003-09-11, 05:37   #18
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

Quote:
Originally Posted by Xyzzy
Would making the accounts root level give them access to these counters?
I don't have exact information on that topic. It depends on the privilege level the application runs at. If it's still level 3 (only the kernel runs at level 0), it won't help.

I made a small test suite available in my home directory, in a subdirectory called "test".
Run ./compile there. It will compile the 2 sources and execute them. If the instructions ("rdpmc" and "movl %cr4,%eax") don't work, you'll get 2 segmentation faults. Otherwise it will output "Success".

Old 2003-09-11, 06:39   #19
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

I just noticed that I failed to answer some questions here.

Quote:
Originally Posted by Prime95
As an aside, do you know if Microsoft MASM is going to support the extra XMM registers? Also, is it true that these extra registers are only available in 64-bit mode? If so, then MASM would have to output a whole new object file format, true?

Converting all that assembly code to some other syntax would be a horrendously tedious task.
I'm sure you know it already: MASM will support the additional registers, but they are only available in 64-bit mode. Its object file format would need some changes, I think, because it was designed for 32-bit machines and now has to carry information for the 64-bit long mode.

Quote:
Thanks for the MASM link, I'll play with it some. Already, I've noticed that some x86 instructions no longer exist. Like "push ebp" and "push OFFSET global_var". Looks like I'll have to download the x86-64 manual.
I think "push ebp" should be "push rbp" in that case, because the 32-bit register would lack the upper half of the pointer.

Quote:
I sure hope the MASM output can be turned into a linux compatible object file. Does anyone know what format the MASM object file is?
As already mentioned by aaronl, there should be a way to do this. Otherwise, converting the current asm code to AT&T syntax, or at least making it compatible with 64-bit mode, could be done by some small Perl scripts.
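
As a rough illustration of how small such a script could be, here is a sketch of one narrow rule: rewriting push/pop of a 32-bit register to the matching 64-bit register. It is written in Python rather than Perl, the names (REG64, widen_push_pop, convert) are hypothetical, and it only handles lines that begin with the instruction. Real MASM sources (macros, labels, "push OFFSET ...") would need much more than this.

```python
import re

# 32-bit general-purpose registers and their 64-bit counterparts.
REG64 = {
    "eax": "rax", "ebx": "rbx", "ecx": "rcx", "edx": "rdx",
    "esi": "rsi", "edi": "rdi", "ebp": "rbp", "esp": "rsp",
}

# Matches "push <reg32>" or "pop <reg32>" at the start of a line,
# keeping the indentation and anything after the register (e.g. a comment).
PUSH_POP = re.compile(
    r"^(?P<head>\s*(?:push|pop)\s+)(?P<reg>e(?:ax|bx|cx|dx|si|di|bp|sp))\b",
    re.IGNORECASE,
)


def widen_push_pop(line):
    """Rewrite push/pop of a 32-bit register to use the 64-bit register."""
    def repl(match):
        # The replacement register is always emitted in lower case.
        return match.group("head") + REG64[match.group("reg").lower()]
    return PUSH_POP.sub(repl, line)


def convert(source):
    """Apply the rule to every line of an assembly source string."""
    return "\n".join(widen_push_pop(line) for line in source.split("\n"))


print(convert("push ebp\nmov ebp, esp\npop ebp"))
```

Lines that don't match the pattern, such as "mov ebp, esp" or "push OFFSET global_var", pass through unchanged, so they would still have to be reviewed by hand.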

Here I quote my "useful documents" links, which could help finding important docs:
Quote:
Several docs and guides (I recommend the optimization manual and Tim Wilkens' presentation) :
http://www.amd.com/us-en/Processors/DevelopWithAMD/0,,30_2252_9044,00.html

Useful info regarding block prefetching:
http://cdrom.amd.com/devconn/events/

A lot of (mostly older) docs:
http://cdrom.amd.com/

Old 2003-09-11, 07:43   #20
Xyzzy
Aug 2002
2×3²×13×37 Posts

[code:1]opteron:/home/db/test # ./compile
./compile: line 3: 16943 Segmentation fault ./rdpmc
./compile: line 4: 16944 Segmentation fault ./movcr4[/code:1]

Old 2003-09-11, 09:08   #21
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

Thanks. So we know that root privileges won't help.

It seems there is no way around kernel patching and using such a driver. :(

I think it would also be possible to set the required bit in CR4 somewhere in the kernel. But how and where? I'll investigate further into that.
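
For reference, the bit in question is CR4.PCE (bit 8 of CR4). The rule the processor applies to RDPMC can be modelled in a few lines of Python. This is only a model of the privilege check with made-up names (CR4_PCE, rdpmc_allowed); it does not read any real register.

```python
CR4_PCE = 1 << 8  # CR4.PCE, the performance-monitoring counter enable bit


def rdpmc_allowed(cr4, cpl):
    """Return True if RDPMC would execute instead of faulting.

    cr4 is the value of the CR4 control register and cpl the current
    privilege level (0 = kernel, 3 = user programs, including root's).
    """
    return cpl == 0 or bool(cr4 & CR4_PCE)


print(rdpmc_allowed(0, 3))        # False: user code, PCE clear
print(rdpmc_allowed(CR4_PCE, 3))  # True: user code, PCE set by the kernel
```

This matches the result above: a root account still runs its programs at level 3, so with PCE clear the instruction faults no matter which user starts it.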

Old 2003-09-12, 21:17   #22
Dresdenboy
Apr 2003
Berlin, Germany
19² Posts

MASM update:

Currently it looks like there is no way to produce an object file format which can be converted by objcopy. Also, gcc doesn't understand the object files (as I expected).

Now the following possibilities are left:
- trying to modify objcopy to read the created obj files (needs analysis of their new format)
- looking for another assembler which understands MASM syntax and assembles AMD64 code, producing a usable ELF64 object file
- looking for or developing a MASM converter (depends on how many features we have to support)

All times are UTC.



Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
