![]() |
|
|
#342 | |
|
If I May
"Chris Halsall"
Sep 2002
Barbados
2×67×73 Posts |
Quote:
You have reproducible crashes in code you have the source code to. This is easy. Just wait until you have reported to you a "once a month" bug (read: not easily reproducible) in a piece of software you're responsible for....
|
|
|
|
|
|
|
#343 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
1C3516 Posts |
I can't reproduce the crash. Whatever it does for me, it doesn't crash. This code is a modification of other code already in use; I looked at the diff between them (in the same post you quoted), and that didn't reveal anything. The debugger was rather useless, though I'll run it through with -O0 and see if anything appears there.
Last fiddled with by Dubslow on 2012-06-04 at 00:24 |
|
|
|
|
|
#344 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Okay, being now at least a somewhat proficient programmer in C/Python, I'm interested now in learning how to write fast code, à la YAFU/Msieve/whatever Ernst does/Prime95, etc. (I realize that you guys have years and years of experience, but I've got to start somewhere
) To that end, I wrote a sieve of Ἐρατοσθένης in Python, to avoid messy complications like memory allocation/control etc., with the goal of finding algorithmic optimizations. I have these two different versions: one runs 1-2% faster than the other, but with some extra code (and 1-2% is hardly an improvement). Code:
def primes(depth=10000000):
if depth <= 1: return []
primes = [2]
if depth % 2 == 0: depth -= 1 # make depth odd
length = depth // 2
list = [True for i in range(length)] # 3, 5, 7... might be prime
i, n = 0, 3 # i=index, n=num=2*index+3
j = i + (n>>1)*n # the index of n^2
while j < length: # while n^2 < depth
if list[i]: # n is prime
primes.append(n)
while j < length:
list[j] = False
j += n
i, n = i+1, n+2
j = i + (n>>1)*n
# all necessary sieving is done, read the rest of the list
while i < length:
if list[i]:
primes.append(n)
i, n = i+1, n+2
return primes
def slow(depth=10000000):
if depth <= 1: return []
primes = [2]
if depth % 2 == 0: depth -= 1 # make depth odd
length = depth // 2
list = [True for i in range(length)] # 3, 5, 7... might be prime
i, n = 0, 3 # i=index, n=num=2*index+3
while i < length:
#print("checking", n)
if list[i]: # n is prime
#print(n, "is prime")
primes.append(n)
j = i + (n>>1)*n # sieve out the odd mulitples of n:
# n*n, (n+2)*n, (n+4)*n...
while j < length:
list[j] = False
j += n
i, n = i+1, n+2 # check the next odd
return primes
It can sieve the primes <= 10e6 in around 144ms on one core of my 2600K with nothing else running (which I'm sure is at least 1, probably >= 2 orders of magnitude slower than YAFU). (The slower one takes around 154 ms.) I also ran a profiler: Code:
Timer unit: 1e-06 s
File: sieve.py
Function: primes at line 3
Total time: 88.8523 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
3 @profile
4 def primes(depth=10000000):
5 1 4 4.0 0.0 if depth <= 1: return []
6 1 3 3.0 0.0 primes = [2]
7
8 1 5 5.0 0.0 if depth % 2 == 0: depth -= 1 # make depth odd
9 1 2 2.0 0.0 length = depth // 2
10 1 1174339 1174339.0 1.3 list = [True for i in range(length)] # 3, 5, 7... might be prime
11
12 1 4 4.0 0.0 i, n = 0, 3 # i=index, n=num=2*index+3
13 1 5 5.0 0.0 j = i + (n>>1)*n
14 1581 3204 2.0 0.0 while j < length: # while n^2 < depth
15 1580 3399 2.2 0.0 if list[i]: # n is prime
16 445 1790 4.0 0.0 primes.append(n)
17 8925589 17890912 2.0 20.1 while j < length:
18 8925144 18544686 2.1 20.9 list[j] = False
19 8925144 18512476 2.1 20.8 j += n
20 1580 3710 2.3 0.0 i, n = i+1, n+2
21 1580 4128 2.6 0.0 j = i + (n>>1)*n
22 # all necessary sieving is done, read the rest of the list
23 4998420 10202160 2.0 11.5 while i < length:
24 4998419 10139585 2.0 11.4 if list[i]:
25 664133 1509461 2.3 1.7 primes.append(n)
26 4998419 10862405 2.2 12.2 i, n = i+1, n+2
27
28 1 2 2.0 0.0 return primes
File: sieve.py
Function: slow at line 30
Total time: 87.4097 s
Line # Hits Time Per Hit % Time Line Contents
==============================================================
30 @profile
31 def slow(depth=10000000):
32 1 4 4.0 0.0 if depth <= 1: return []
33 1 2 2.0 0.0 primes = [2]
34
35 1 4 4.0 0.0 if depth % 2 == 0: depth -= 1 # make depth odd
36 1 2 2.0 0.0 length = depth // 2
37 1 1168725 1168725.0 1.3 list = [True for i in range(length)] # 3, 5, 7... might be prime
38
39 1 4 4.0 0.0 i, n = 0, 3 # i=index, n=num=2*index+3
40 5000000 9554915 1.9 10.9 while i < length:
41 #print("checking", n)
42 4999999 9654528 1.9 11.0 if list[i]: # n is prime
43 #print(n, "is prime")
44 664578 1445235 2.2 1.7 primes.append(n)
45 664578 1503640 2.3 1.7 j = i + (n>>1)*n # sieve out the odd mulitples of n:
46 # n*n, (n+2)*n, (n+4)*n...
47 9589722 18329467 1.9 21.0 while j < length:
48 8925144 17693395 2.0 20.2 list[j] = False
49 8925144 17703299 2.0 20.3 j += n
50 4999999 10356447 2.1 11.8 i, n = i+1, n+2 # check the next odd
51
52 1 2 2.0 0.0 return primes
Besides rewriting in C (presumably I'd use a bit array for the sieving list?), are there other ways to speed this up, or is this as far as I can get in an interpreted scripting language? |
|
|
|
|
|
#345 | |
|
"Ben"
Feb 2007
2·3·587 Posts |
Quote:
If googling doesn't get you far enough, let me know if you have more specific questions about any of these. |
|
|
|
|
|
|
#346 | ||||
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Oh jeez, this'll take a while
![]() Quote:
That's probably why the list initialization took so long. (And by bit array I think I meant bit packing.)Quote:
Pre-sieving? Hmm... Google google google... Quote:
Quote:
I'm in Winbloze right now for my monthly game fix, but perhaps later tonight or tomorrow I'll be able to tackle the Google.
|
||||
|
|
|
|
|
#347 |
|
Romulan Interpreter
Jun 2011
Thailand
227008 Posts |
say "or just a cleverer manner". Assume you want to store the numbers from 1 to 30, you may use a vector with 30 components for it. You can do this "cleverer" by storing only the odd numbers, so you would only need 15 bytes. Bit packing is using only four bytes to store the numbers by using a bit for each, or if you only store the odds, you need just 2 bytes, instead of using 30 bytes. "Wheeling" them would only need a byte (ONE byte), instead of 4 or 2, so you will only need a thousand bytes to store all primes from 0 to 30000. This by observing that you only have 8 possible primes in any 30 numbers, those are 1,7,11,13,17,19,23,29 (mod 30), so you need only one byte to store 30 numbers. Of course, addressing them (for clearing the bits when you sieve) become a bit more complicate now, and you need to find a compromise. Some guys came with really beautiful solutions for this.
|
|
|
|
|
|
#348 |
|
"Ben"
Feb 2007
352210 Posts |
I'd say that none of these are truly algorithmic improvements. It is still the sieve of Eratosthenes, after all, no matter how you represent the list of things you are striking off.
That said, wheel sieves can lower the asymptotic complexity a bit (search for Pritchard’s Wheel Sieve), and the segmented sieve and bit packing will drastically lower the space complexity. If you count asymptotic complexity improvements as algorithmic improvements then yes we have some of the latter. Manual loop unrolling might not be pretty, but some things you just gotta do yourself. I got significant (10%+) gains that way in both my CPU and GPU versions of the sieve. At rough count YAFU's sieve is > 2000 lines; at that point you stop caring about aesthetics :) |
|
|
|
|
|
#349 | ||||
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Quote:
![]() Quote:
Quote:
![]() Quote:
|
||||
|
|
|
|
|
#350 | |
|
"Gang aft agley"
Sep 2002
2×1,877 Posts |
Quote:
|
|
|
|
|
|
|
#351 |
|
Basketry That Evening!
"Bunslow the Bold"
Jun 2011
40<A<43 -89<O<-88
3×29×83 Posts |
Well, it seems I'm doing things backwards compared to Mr. Sam Kennedy.
![]() I had decided to learn some x86 assembly, when lo and behold one of my friends in ECE is taking a Computer Systems Engineering course in which x86 assembly is being taught -- so I'm sitting in on that and learning. I have some questions though. They seem to be using the GNU assembler, which as far as I can tell, uses some pretty weird conventions (besides the AT&T syntax), like '#' for comments, and ';' has the same meaning as in C, as opposed to marking a comment (not to mention C-sytle /* */ comments are allowable too, and the .string syntax, which I haven't seen elsewhere). So, in the interest of learning practical things, how do you guys (meaning I guess B^2, Jasonp, Prime95, only_human, EWMayer et al.) deal with the different syntaxes and formats in your programs? I looked at the Prime95 assembly once, and noticed that it uses ';' comments, yet I was able to compile it with gcc just fine -- how? I looked at YAFU, and saw the defines by BGladman, while in the one SIQS file I looked at, it seemed that only GNU syntax was used ("ASM_G"). Can anybody help shed some light on the complicated situation? Thanks ![]() (PS @B^2 I tried shifting the SoE to using a bytes array, but being in Python, it was actually slower. See the best answer to this question.) |
|
|
|
|
|
#352 |
|
"Ben"
Feb 2007
DC216 Posts |
YAFU is (and by extension, I am) perhaps strange in that only inline assembler is used. Typically the assembler blocks are small-ish, so it just seemed easier that way (especially within visual studio).
For any given block of inline assembly, I typically write it first in gcc syntax and then port it to the other one. Actually, usually there is a third port as well, using visual studio intrinsic functions. Compile time #defines control which version of any given inline assembly block is used. The file you reference is the only exception to that rule, where the bloody thing was just too complex to contemplate porting it to 2 other representations (non-supported compilers in that case see a generic C implementation). Last fiddled with by bsquared on 2013-01-18 at 21:44 |
|
|
|