mersenneforum.org  

Go Back   mersenneforum.org > Extra Stuff > Programming

Reply
 
Thread Tools
Old 2005-08-07, 01:09   #12
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22·5·373 Posts
Default

Quote:
Originally Posted by xenon
Code:
Example: q=1299721, r=30

|  30          1   |      |1441     -43276  |
|   1     -43324   |      |  30         1   |

      option 1                  option 2
The second one is shorter. But your code produces the first one. Am I correct?
Yes. My code produces the first one.
R.D. Silverman is offline   Reply With Quote
Old 2005-08-07, 01:54   #13
xenon
 
Jul 2004

5·7 Posts
Default

Then which one is better? What are the consequences of not having the "best" reduction?

Also, I see that there are 8 ways to present the result in v1 and v2.
1. as it is
2. negate v1
3. negate v2
4. negate both v1 and v2
5-8. as 1-4 but v1 and v2 swapped

So, how do you want it to be presented? The code from the other thread have the "make b positive" part, but is removed when you post this thread.

I have a suggestion that requires the calling code to be modified. Instead of passing two pointers v1 and v2, pass only v1. Make v1 and v2 contiguous.
Maybe you can do something like this:
int v1[4];
int *const v2 = v1+2;

Tell me if this is not feasible. For example, you have an array of many vectors and you need to choose two particular vector to be passed to lat_red.
xenon is offline   Reply With Quote
Old 2005-08-07, 04:15   #14
dsouza123
 
dsouza123's Avatar
 
Sep 2002

2·331 Posts
Default

MASM32 version x86 assembly code ALU and one division by FPU,
source and exe.

Worked with xenon's q and r values q=1299721, r=30
Attached Files
File Type: zip Lat_Red.zip (3.1 KB, 127 views)
dsouza123 is offline   Reply With Quote
Old 2005-08-07, 13:05   #15
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Thumbs up

Quote:
Originally Posted by xenon
Then which one is better? What are the consequences of not having the "best" reduction?

Also, I see that there are 8 ways to present the result in v1 and v2.
1. as it is
2. negate v1
3. negate v2
4. negate both v1 and v2
5-8. as 1-4 but v1 and v2 swapped

So, how do you want it to be presented? The code from the other thread have the "make b positive" part, but is removed when you post this thread.

I have a suggestion that requires the calling code to be modified. Instead of passing two pointers v1 and v2, pass only v1. Make v1 and v2 contiguous.
Maybe you can do something like this:
int v1[4];
int *const v2 = v1+2;

Tell me if this is not feasible. For example, you have an array of many vectors and you need to choose two particular vector to be passed to lat_red.
There does not seem to be a way to define "best". The objective is to
provide a reasonably good reduction as fast as possible. It would be
hard/impossible to determine if a "better reduction but slower" is truly
worthwhile. The "ideal" reduction would have all the coefficients
within a small multiple of sqrt(q), with a condition number as close to 1
as possible. This isn't always achievable, and when it is, it is sometimes
expensive to do it (i.e. G-S is much slower than Euclidean). One could
try a second G-S reduction following the Euclidean, but it probably isn't
worth the cost.

The signs do not matter.

With the improvements already presented, the routine now only takes 3-4%
of total run time. Doubling its speed again seems impossible and would only
save 1.5 to 2%.

Format of the inputs doesn't matter, but the parent code uses two separate
vectors in many places. There would be a cost to package the inputs, then
pull apart the outputs.

I am thinking of issuing a new challenge, small code, but unrelated to this.
R.D. Silverman is offline   Reply With Quote
Old 2005-08-07, 14:20   #16
dsouza123
 
dsouza123's Avatar
 
Sep 2002

2·331 Posts
Default

For the Lat_Red.asm from a previous post attachment.

Here are the timings with various number of subtracts
before doing the divide and two multiplies. q=1299721, r=30
Code:
cycles  description
159      0 sub _ only divide, subtract code commented out
160      1 sub + divide
168      2 sub + divide
176      3 sub + divide
184      4 sub + divide
192      5 sub + divide
200      6 sub + divide
208      7 sub + divide
245      8 sub + divide
256      9 sub + divide
276     10 sub + divide
527     60 sub + divide
It would take :
43324 subtracts on the first iteration and
30 subtracts on the second
if the divide + two multiplies are never done.

Timed on Athlon 1.2 Ghz Windows XP Pro

Can any of the t * calcs overflow an int ?

The attachment LR_cycle.zip
has the source and exe for cycle timings
with subtract 0, 1, 4, 5, 6, 60 times.

LatRed0.asm for 0 subtracts
...
LatRed60.asm for 60 subtracts
Attached Files
File Type: zip LR_CYCLE.ZIP (20.9 KB, 121 views)
dsouza123 is offline   Reply With Quote
Old 2005-08-08, 11:59   #17
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22×5×373 Posts
Thumbs up

Quote:
Originally Posted by dsouza123
For the Lat_Red.asm from a previous post attachment.

Here are the timings with various number of subtracts
before doing the divide and two multiplies. q=1299721, r=30
Code:
cycles  description
159      0 sub _ only divide, subtract code commented out
160      1 sub + divide
168      2 sub + divide
176      3 sub + divide
184      4 sub + divide
192      5 sub + divide
200      6 sub + divide
208      7 sub + divide
245      8 sub + divide
256      9 sub + divide
276     10 sub + divide
527     60 sub + divide
It would take :
43324 subtracts on the first iteration and
30 subtracts on the second
if the divide + two multiplies are never done.

Timed on Athlon 1.2 Ghz Windows XP Pro

Can any of the t * calcs overflow an int ?
Fascinating. I am very surprised that reducing the number of conditional
subtracts is a win! Your help is appreciated.
The (t*) mults can not overflow.
R.D. Silverman is offline   Reply With Quote
Old 2005-08-09, 02:09   #18
dsouza123
 
dsouza123's Avatar
 
Sep 2002

2×331 Posts
Default

Did some optimizing and used q = 524309, r = 999998 runs in 57 cycles
doesn't use the divide that follows the subtracts.

Issue v1 and v2 are swapped (row wise)
in comparison with the delphi version.
The four numbers are the same.

Code:
Asm
|      999998           1|  v1[0] v1[1]
|     -475689          -1|  v2[0] v2[1]

Delphi
|     -475689          -1|  v1[0] v1[1]
|      999998           1|  v2[0] v2[1]
Issue, if r == 1 (all q tested) it eventually causes a division by zero.
xenon mentioned it too.
Attached Files
File Type: zip LatRed6D.zip (3.7 KB, 129 views)
dsouza123 is offline   Reply With Quote
Old 2005-08-09, 13:14   #19
R.D. Silverman
 
R.D. Silverman's Avatar
 
Nov 2003

22·5·373 Posts
Thumbs up

Quote:
Originally Posted by dsouza123
Did some optimizing and used q = 524309, r = 999998 runs in 57 cycles
doesn't use the divide that follows the subtracts.

Issue v1 and v2 are swapped (row wise)
in comparison with the delphi version.
The four numbers are the same.

Code:
Asm
|      999998           1|  v1[0] v1[1]
|     -475689          -1|  v2[0] v2[1]

Delphi
|     -475689          -1|  v1[0] v1[1]
|      999998           1|  v2[0] v2[1]
Issue, if r == 1 (all q tested) it eventually causes a division by zero.
xenon mentioned it too.
Nice.
The case r=1 is handled separately.
Swapped rows do not matter.
R.D. Silverman is offline   Reply With Quote
Reply



Similar Threads
Thread Thread Starter Forum Replies Last Post
Anyone know enough about coding to do this? jrafanelli Software 2 2018-01-11 15:16
Python Coding Help? kelzo Programming 3 2016-11-27 05:16
Two possible challenges devarajkandadai Miscellaneous Math 1 2013-11-09 23:30
coding midlet for TF starrynte Programming 1 2008-12-30 22:31
The RSA Challenges - RIP BotXXX Factoring 16 2007-05-27 03:37

All times are UTC. The time now is 13:11.


Fri Jul 16 13:11:34 UTC 2021 up 49 days, 10:58, 2 users, load averages: 1.96, 2.06, 1.81

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.

This forum has received and complied with 0 (zero) government requests for information.

Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation.
A copy of the license is included in the FAQ.