![]() |
|
|
#12 | |
|
Nov 2003
22·5·373 Posts |
Quote:
|
|
|
|
|
|
|
#13 |
|
Jul 2004
5·7 Posts |
Then which one is better? What are the consequences of not having the "best" reduction?
Also, I see that there are 8 ways to present the result in v1 and v2. 1. as it is 2. negate v1 3. negate v2 4. negate both v1 and v2 5-8. as 1-4 but v1 and v2 swapped So, how do you want it to be presented? The code from the other thread have the "make b positive" part, but is removed when you post this thread. I have a suggestion that requires the calling code to be modified. Instead of passing two pointers v1 and v2, pass only v1. Make v1 and v2 contiguous. Maybe you can do something like this: int v1[4]; int *const v2 = v1+2; Tell me if this is not feasible. For example, you have an array of many vectors and you need to choose two particular vector to be passed to lat_red. |
|
|
|
|
|
#14 |
|
Sep 2002
2·331 Posts |
MASM32 version x86 assembly code ALU and one division by FPU,
source and exe. Worked with xenon's q and r values q=1299721, r=30 |
|
|
|
|
|
#15 | |
|
Nov 2003
22×5×373 Posts |
Quote:
provide a reasonably good reduction as fast as possible. It would be hard/impossible to determine if a "better reduction but slower" is truly worthwhile. The "ideal" reduction would have all the coefficients within a small multiple of sqrt(q), with a condition number as close to 1 as possible. This isn't always achievable, and when it is, it is sometimes expensive to do it (i.e. G-S is much slower than Euclidean). One could try a second G-S reduction following the Euclidean, but it probably isn't worth the cost. The signs do not matter. With the improvements already presented, the routine now only takes 3-4% of total run time. Doubling its speed again seems impossible and would only save 1.5 to 2%. Format of the inputs doesn't matter, but the parent code uses two separate vectors in many places. There would be a cost to package the inputs, then pull apart the outputs. I am thinking of issuing a new challenge, small code, but unrelated to this. |
|
|
|
|
|
|
#16 |
|
Sep 2002
2·331 Posts |
For the Lat_Red.asm from a previous post attachment.
Here are the timings with various number of subtracts before doing the divide and two multiplies. q=1299721, r=30 Code:
cycles description 159 0 sub _ only divide, subtract code commented out 160 1 sub + divide 168 2 sub + divide 176 3 sub + divide 184 4 sub + divide 192 5 sub + divide 200 6 sub + divide 208 7 sub + divide 245 8 sub + divide 256 9 sub + divide 276 10 sub + divide 527 60 sub + divide 43324 subtracts on the first iteration and 30 subtracts on the second if the divide + two multiplies are never done. Timed on Athlon 1.2 Ghz Windows XP Pro Can any of the t * calcs overflow an int ? The attachment LR_cycle.zip has the source and exe for cycle timings with subtract 0, 1, 4, 5, 6, 60 times. LatRed0.asm for 0 subtracts ... LatRed60.asm for 60 subtracts |
|
|
|
|
|
#17 | |
|
Nov 2003
22×5×373 Posts |
Quote:
subtracts is a win! Your help is appreciated. The (t*) mults can not overflow. |
|
|
|
|
|
|
#18 |
|
Sep 2002
2×331 Posts |
Did some optimizing and used q = 524309, r = 999998 runs in 57 cycles
doesn't use the divide that follows the subtracts. Issue v1 and v2 are swapped (row wise) in comparison with the delphi version. The four numbers are the same. Code:
Asm | 999998 1| v1[0] v1[1] | -475689 -1| v2[0] v2[1] Delphi | -475689 -1| v1[0] v1[1] | 999998 1| v2[0] v2[1] xenon mentioned it too. |
|
|
|
|
|
#19 | |
|
Nov 2003
22·5·373 Posts |
Quote:
The case r=1 is handled separately. Swapped rows do not matter. |
|
|
|
|
![]() |
Similar Threads
|
||||
| Thread | Thread Starter | Forum | Replies | Last Post |
| Anyone know enough about coding to do this? | jrafanelli | Software | 2 | 2018-01-11 15:16 |
| Python Coding Help? | kelzo | Programming | 3 | 2016-11-27 05:16 |
| Two possible challenges | devarajkandadai | Miscellaneous Math | 1 | 2013-11-09 23:30 |
| coding midlet for TF | starrynte | Programming | 1 | 2008-12-30 22:31 |
| The RSA Challenges - RIP | BotXXX | Factoring | 16 | 2007-05-27 03:37 |