mersenneforum.org  

Old 2005-07-30, 14:16   #34
xenon
 
Jul 2004

35₁₀ Posts
Default

Here is a comparison of assembly vs. C. It is only meant to give an estimate of the speed gain from using assembly, so I haven't commented the assembly code yet.

My method is to link the OBJ file with C code. The benchmarking code is written in C. I use Visual C++ 6.0, and I hope you have no problem compiling it. I haven't prepared my assembly code for others to read; I admit it is very messy. But you can try using the function

extern "C" __fastcall eegcd(int input, int modulus);

and link it together with inverse1.obj.
Don't trust my code yet. Check whether it produces any wrong results.
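
A minimal sketch of such a check, in plain C. It assumes the routine under test takes (input, modulus) and returns an int; the return type is a guess, since the declaration above omits it. The names modinv_fn, modinv_ref and count_bad_inverses are made up for this sketch and are not from the attached zip.

```c
#include <assert.h>

/* Assumed shape of the routine under test: returns x with (a * x) % p == 1.
   The int return type is an assumption, not something the post states. */
typedef int (*modinv_fn)(int a, int p);

/* Plain-C reference inverse (extended Euclid). Hypothetical helper.
   Returns 0 when gcd(a, p) != 1, i.e. when no inverse exists. */
int modinv_ref(int a, int p)
{
    int r0 = p, r1 = a % p;   /* remainder sequence */
    int t0 = 0, t1 = 1;       /* r_i == t_i * a (mod p) */
    while (r1 != 0) {
        int q  = r0 / r1;
        int r2 = r0 - q * r1;
        int t2 = t0 - q * t1;
        r0 = r1; r1 = r2;
        t0 = t1; t1 = t2;
    }
    if (r0 != 1) return 0;
    return t0 < 0 ? t0 + p : t0;
}

/* Counts the inputs a in [1, p-1] for which fn(a, p) is not a valid
   inverse, judged only by the definition a * x == 1 (mod p). */
long count_bad_inverses(modinv_fn fn, int p)
{
    long bad = 0;
    int a;
    assert(p > 1);
    for (a = 1; a < p; a++) {
        int x = fn(a, p);
        /* 64-bit product so a * x cannot overflow for 31-bit p;
           older compilers such as Visual C++ 6.0 may need __int64 here */
        long long prod = (long long)a * (long long)x;
        if (x <= 0 || x >= p || prod % p != 1) bad++;
    }
    return bad;
}
```

With the OBJ linked in, count_bad_inverses(eegcd, PRIME) should come back as 0 if the assembly agrees with the definition for every input. For a __fastcall routine the function-pointer type would presumably need the same calling convention in its declaration.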

eegcd-11 means the Euclidean algorithm with 11 repeated subtractions.

Code:
          eegcd-11  eegcd-10  eegcd-9  eegcd-8  eegcd-7  akruppa_C
 1048583       300       313      310      316      321        418
 2097169       315       326      324      332      333        422
 4194319       316       326      322      332      332        429
 8388617       322       331      328      337      338        437
16777259       333       343      340      350      350        457
33554467       345       355      352      363      363        475
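
One way to read the "repeated subtraction" labels: each Euclid step first tries to find the quotient by subtracting, and only falls back to a division when the quotient is larger than the subtraction limit. The sketch below is that reading written in plain C, not xenon's assembly. modinv_sub and max_sub are hypothetical names, and it assumes 0 < a < p with p small enough that the intermediate products fit in an int.

```c
#include <assert.h>

/* Hypothetical sketch: inverse of a modulo p by extended Euclid, where
   each quotient is first built from up to max_sub subtractions and a
   division is used only if that is not enough.
   Returns 0 when gcd(a, p) != 1. */
int modinv_sub(int a, int p, int max_sub)
{
    int r0 = p, r1 = a % p;   /* remainders */
    int t0 = 0, t1 = 1;       /* r_i == t_i * a (mod p) */

    assert(p > 1 && a > 0 && max_sub >= 0);

    while (r1 != 0) {
        int q = 0, tmp;

        /* small quotients: peel r1 off r0 one subtraction at a time */
        while (q < max_sub && r0 >= r1) {
            r0 -= r1;
            q++;
        }
        /* large quotient: finish the reduction with one division */
        if (r0 >= r1) {
            int rest = r0 / r1;
            r0 -= rest * r1;
            q += rest;
        }
        /* here r0 < r1; update the coefficient, then swap roles */
        t0 -= q * t1;
        tmp = r0; r0 = r1; r1 = tmp;
        tmp = t0; t0 = t1; t1 = tmp;
    }
    if (r0 != 1) return 0;
    return t0 < 0 ? t0 + p : t0;
}
```

modinv_sub(3, 7, 11) gives 5, since 3 * 5 = 15 = 2 * 7 + 1. Setting max_sub to 0 turns it into the ordinary division-only version, which could serve as a baseline when timing.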
Attached Files
File Type: zip xenon_asm1.zip (8.8 KB, 258 views)
Old 2005-07-31, 06:46   #35
TTn
 

23×317 Posts
Post 2 cents

From the looks of it I see a lot of repeated code.
.NET may be the way to go.
Old 2005-07-31, 14:10   #36
TTn
 

2·4,409 Posts
Question

xenon

I've compiled it with Visual Studio 6.0 (Visual C++ 6).
It hangs on launch, so I didn't get too far with it.
I'm not an expert in C++, but I have dabbled in it.
I am more familiar with .NET, though that is a love-hate relationship.
Old 2005-07-31, 14:49   #37
xenon
 
Jul 2004

23₁₆ Posts
Default

TTn

When you compile, you should see
... inverse3.cpp(19) : warning C4101: 'temp' : unreferenced local variable
This is expected.

I guess __fastcall didn't work. But for the same compiler, it shouldn't be a problem.
Try replacing
//ans = single_modinv_ak(i, PRIME);
ans = eegcd(i, PRIME);

with
ans = single_modinv_ak(i, PRIME);
//ans = eegcd(i, PRIME);

Also, if your computer is slower than 1 GHz, wait 10 seconds before killing the process.
If that doesn't help, create a new project and try the akruppa C code first. Then you can sort out the problem.
Old 2005-07-31, 17:59   #38
R.D. Silverman
 
 
"Bob Silverman"
Nov 2003
North of Boston

5·17·89 Posts
Thumbs up

Post the source!! If it truly is faster, the linkage problems can be fixed.
I will port it to Microsoft style ASM's. 475 to 345 is a noteworthy
improvement....
Old 2005-07-31, 19:04   #39
jasonp
Tribal Bullet
 
 
Oct 2004

5·23·31 Posts
Default

Quote:
Originally Posted by R.D. Silverman
Post the source!! If it truly is faster, the linkage problems can be fixed.
I will port it to Microsoft style ASM's. 475 to 345 is a noteworthy
improvement....
In the meantime, GGNFS includes a very similar modular inverse core in assembly language; go to www.sourceforge.net/projects/ggnfs and in the CVS sources look for branch_0/src/modinv1002.s

Unfortunately the code is in AT&T syntax and expects the modulus to be in a global location, so it's not plugin-compatible with the code we've been comparing.
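
The interface mismatch can in principle be papered over with a thin adapter. The sketch below only illustrates the shape of such a wrapper: g_modulus, modinv_global and modinv_two_arg are hypothetical names, and the brute-force body is a stand-in, not the GGNFS assembly.

```c
#include <assert.h>

/* Stand-in for a routine that reads its modulus from a global location.
   Name and body are hypothetical; a real assembly core would replace it. */
unsigned int g_modulus;

unsigned int modinv_global(unsigned int a)
{
    unsigned int x;
    /* brute-force search keeps the stand-in obviously correct */
    for (x = 1; x < g_modulus; x++) {
        if ((unsigned long long)a * x % g_modulus == 1) return x;
    }
    return 0;   /* no inverse exists */
}

/* Adapter with the two-argument (input, modulus) shape used by the
   routines compared in this thread. */
unsigned int modinv_two_arg(unsigned int a, unsigned int modulus)
{
    assert(modulus > 1);
    g_modulus = modulus;      /* publish the modulus where the core expects it */
    return modinv_global(a);
}
```

The extra store on every call costs something, and the shared global makes the wrapper unsafe to call from several threads at once, so timings taken through such an adapter would not be a like-for-like comparison.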

jasonp
Old 2005-07-31, 22:13   #40
akruppa
 
 
"Nancy"
Aug 2002
Alexandria

100110100011₂ Posts
Default

Afaik ggnfs is distributed under the GPL, so there may be a license problem with using that code bit - unless Bob wants to put his code under the GPL as well.

Alex
Old 2005-08-01, 00:12   #41
R.D. Silverman
 
 
"Bob Silverman"
Nov 2003
North of Boston

5×17×89 Posts
Thumbs up

Quote:
Originally Posted by akruppa
Afaik ggnfs is distributed under the GPL, so there may be a license problem with using that code bit - unless Bob wants to put his code under the GPL as well.

Alex
I'd be happy to put it under GPL if asked, but I see no need. I give the code
out to anyone who asks.
Old 2005-08-01, 00:23   #42
xenon
 
Jul 2004

100011₂ Posts
Default

Quote:
Originally Posted by R.D. Silverman
Post the source!! If it truly is faster, the linkage problems can be fixed.
I will port it to Microsoft style ASM's. 475 to 345 is a noteworthy
improvement....
The asm file is already in the asm folder, but it has no comments at all, so it is very difficult to read. For now, please help verify the speed. All I actually did was rewrite the given algorithm in assembly.

I only use MASM syntax, so you would be porting from Microsoft style to gcc style.

Last fiddled with by xenon on 2005-08-01 at 00:25
Old 2005-08-01, 03:33   #43
TTn
 

2²×809 Posts
Default

Quote:
I'd be happy to put it under GPL if asked, but I see no need. I give the code out to anyone who asks.
I believe your position is justified.
Old 2005-08-01, 08:21   #44
xilman
Bamboozled!
 
"𒉺𒌌𒇷𒆷𒀭"
May 2003
Down not across

2²×3×983 Posts
Default

Quote:
Originally Posted by R.D. Silverman
I'd be happy to put it under GPL if asked, but I see no need. I give the code
out to anyone who asks.
Your choice, but please take the decision having given it careful thought and with your eyes wide open.

The GPL is a kiss of death to many organizations, as it forces recipients to GPL their own material if they wish to redistribute anything containing GPL-ed components. That's why NFSNET has avoided any contributions of GPL-licensed code because it conflicts with the licensing requirements on the CWI material. NFSNET-written code is distributed under a BSD-like license.


Paul

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.
