mersenneforum.org

mersenneforum.org (https://www.mersenneforum.org/index.php)
-   PrimeNet (https://www.mersenneforum.org/forumdisplay.php?f=11)
-   -   Quick Question about assignments (https://www.mersenneforum.org/showthread.php?t=16104)

chalsall 2011-11-14 18:53

[QUOTE=Mr. P-1;278301]I disagree. This task lies right in AWK's sweet spot. Dublsow, however (no disrespect) didn't do it the AWK way.[/QUOTE]

As we quickly learn in computer science and the maths (and probably other domains as well)...

There is often more than one way to solve a problem.... :smile:

Dubslow 2011-11-14 19:01

To all of the above: I put that script together after about 30-60 minutes of going through [URL="http://www.grymoire.com/Unix/Awk.html"]here[/URL]. I don't know an ounce of programming besides. I literally know no perl, and the only awk I know is what I posted. My only experience in anything related is a novice level of bash, if that.

(Also I did think of a way to do what I wanted within what I know, but your method does look cleaner Mr. P-1.) But this will have to wait anyways, I need to go eat lunch. Also, see [url]http://www.mersenneforum.org/showthread.php?t=16228[/url] for why I'm doing this; there's around 2000 TF that needs to be done, and I'd like some help.

Dubslow 2011-11-15 03:00

[QUOTE=Mr. P-1;278301]
[code]#!/bin/awk -f

BEGIN {
FS="[ M^]"
ORS="\r\n"
}

$5 == expo {
bitb = $11
next
}

$5 ~ /[0-9]./ {
if ( expo ) print "Factor=" expo "," bita "," bitb
bita = $8
bitb = $11
expo=$5
}

END {
if ( expo ) print "Factor=" expo "," bita "," bitb
}[/code][/QUOTE]
In support of learning something: $5 == expo and $5 ~ regex are conditions attached to running the following functions? ( expo ) only tests if that variable is defined? What does it test? And how exactly does 'next' work? (I can guess it's purpose but want to be sure of details.)

Also, what's a hash?
Also also, how could this be modified to output to exponent order? It gets harder and harder to compare lines further and further away, and to order it you'd need to check the whole file...

LaurV 2011-11-15 03:43

If the end of the week is not too late, then send (PM) me a bunch of, say, 100 to 500 of them, in mfaktc input format. I am going to try to play with mfaktc for the first time in my life during the coming weekend and I think that is better to have some input to test, i.e. to do some useful work instead of playing around, but I don't want to be forced to figure out how to transform a list of numbers into a format that mfaktc would understand (I know nothing about mfaktc yet). You save me of this pain and I save you of a part of exponents.

LaurV 2011-11-15 04:09

[QUOTE=Dubslow;278403]
Also, what's a hash?
[/QUOTE]

Imagine you have a million nice colored little balls in a bag and you see a new one with a nice red and blue pattern in the shop, but you only want to buy it if you do not have one like that already. Then you have to go home and see if you have it already, but you can't search your bag, ball by ball, this will take ages, but you also can't take the ball with you to compare. And remembering the color's pattern is impossible for a normal man. Then you invent a method (function, or hash) to codify the color patterns, in a way that is easy for you to remember, and in a way that can also be "sorted" at a glance. For example you associate a digit to each color, red=1, orange=2, etc, a digit to each pattern, triangles=1, diamonds=2, leaves=3, etc. Then use 3 digits for colors, from the one most dominant to the one less dominant, two digits for patterns. Then the ball 32147 is easy to remember. Then you go home and look in your big bag and sort your balls in 10 small bags, all balls starting with 1 in the first bag, all starting with 2 in the second, etc. Then you repeat for each small bag: put 10 boxes inside of each bag, and sort in the first box all the balls which have the second digit 1, put in the second bag all the balls having the second digit 2, etc. Do this for each small bag, eventually go one step deeper adding 10 little boxes inside of each box (that is inside of each bag) and sort for the third digit.

Congratulation, you just made a hash table. Now it is quite easy to search for a ball: from the first three digits you know exactly which small box to look into, and in that box there are not so many balls, you can look to them one by one.

The hash variables are not different. They make "hashes" of the lines automatically, and for each line they read (this is the new ball you see on the shop), the value is hashed and compared to what you already read. If the hash match, then the value is checked, and if it matches, then the function (pattern) is applied. This makes sense because checking the hash is much faster then checking the value/pattern/function/etc effectively. The method is used for complicate data that you can not sort (if you can sort it, then a quick search or binary search will be much faster) or for which the comparison is complicate (as comparing the patterns, you better make hashes -functions- of patterns and compare the hashes -values of the fucntions). In fact, it is exactly same amount of work as you said, "checking the whole file", but this is done faster, optimal, and/or with only one read, etc, because somebody else did the work already (implemented the hash variables), hash tables are short and can be kept in memory, etc.

For details [URL="http://en.wikipedia.org/wiki/Hash_table"]see here[/URL].

Christenson 2011-11-15 04:21

Laurv:
The mfaktc parser isn't difficult. It's in C, and straightforward. But don't put too much work into it, as I have a re-write waiting for me to test it that is a little smarter...I set it up so there is exactly one interpreter of the lines in worktodo.txt, and so assignments are beginning to be handled as structs. The subtlety here is that worktodo has to be edited when an assignment is done, and anything skipped over in finding the assignment has to be ignored silently.

Definitions, from me (and Chalsall can correct them):
Hash := associative hash := store (key, value) pairs retrievable by key := content-addressable storage
Regular Expression (Regex): Powerful syntax for dissecting and re-assembling strings as needed.

Chalsall's perl script would have been shorter than Mr P-1's awk script. I don't do perl often enough to do it without the book, being a classical C programmer during the day. Also, a "real" text editor would be able to be programmed quickly to do the editing you need on a one-time basis. I personally use Codewright, (now from Borland, has a small following) but this is not open source. It's worth learning enough emacs or other editor to learn how to do this, because you need this kind of editing from time to time.

davieddy 2011-11-15 04:46

Skinning a cat
 
[QUOTE=chalsall;278304]As we quickly learn in computer science and the maths (and probably other domains as well)...

There is often more than one way to solve a problem.... :smile:[/QUOTE]
I related this anecdote to Ernst the other week, and can't
resist retelling it now.

About 15 years ago I had to decompress JPEGs on a 386.
I devized a fast inverse discrete cosine transform which used
only 11 multiplies. I had written it out in BASIC.
Knowing that a guy called Chen had done similar, we
checked it out. It was displayed in C, and it was identical
(verbatim) to mine.

In the way smart colleagues do, my mate Steve casually
remarked:
"Just goes to show there's only one way to skin a cat."

Happy Days:smile:

David

Dubslow 2011-11-15 07:06

Oh also Mr. P-1 my work that you assigned me a while ago was completed earlier today.

Mr. P-1 2011-11-15 16:33

[QUOTE=Dubslow;278403]In support of learning something: $5 == expo and $5 ~ regex are conditions attached to running the following functions?[/quote]

Basically yes. In the vernacular of AWK, these are "patterns", while the associated function is the corresponding "action".

[QUOTE]( expo ) only tests if that variable is defined? What does it test?[/QUOTE]

It actually tests if it evaluates to true. Undefined evaluates to false. 0 is false. "" is false. any other number or non-blank string is true.

[QUOTE]And how exactly does 'next' work? (I can guess it's purpose but want to be sure of details.)[/QUOTE]

Proceed to the next line of input (or END if there is no more input) do not parse any more patterns or actions for this line.

[QUOTE]Also, what's a hash?[/QUOTE]

Most programming languages (and all useful ones!) allow you define arrays of values, usually indexed by small integers. An associative array or hash is similar, except that it can be indexed by arbitrary values defined by the programmer.

[QUOTE]Also also, how could this be modified to output to exponent order? It gets harder and harder to compare lines further and further away, and to order it you'd need to check the whole file...[/QUOTE]

No, you use a hash indexed by the exponent. You don't compare lines. You check to see if current exponent is already an index in the hash.

Mr. P-1 2011-11-15 17:08

And here's the script so modifed. Admittedly asorta is a GAWK extension, so this would perhaps not be so easy with other versions.

[code]#! /bin/awk -f

BEGIN {
FS="[ M^]"
ORS="\r\n"
}

$5 in bita {
if (bita[$5] > $8) bita[$5] = $8
if (bitb[$5] < $11) bitb[$5] = $11
next
}

$5 ~ /[0-9]./ {
bita[$5] = $8
bitb[$5] = $11
}

END {
n = asorti(bita, expo)
for (i = 1; i <= n; i++) print "factor=" expo[i] "," bita[expo[i]] "," bitb[expo[i]]
}[/code]

I would be interested to see perl versions of both scripts.

chalsall 2011-11-15 17:39

[QUOTE=Mr. P-1;278500]Most programming languages (and all useful ones!) allow you define arrays of values, usually indexed by small integers. An associative array or hash is similar, except that it can be indexed by arbitrary values defined by the programmer[/QUOTE]

Indeed. Hash arrays (AKA Associative arrays) are an incredibly powerful tool in the programmer's tool chest.

If I can give a real world example (at least to we GIMPSers... :smile:)

Say you had a worktodo.txt file like (note the exponents are out of order, and the duplicate exponent "2"):

[CODE]Factor=N/A,11,7,11
Factor=N/A,2,1,2
Factor=N/A,7,5,7
Factor=N/A,5,4,5
Factor=N/A,2,3,4[/CODE]

But you preferred to do each "bit" level individually, and from the lowest exponent up. The following code could do the transform for you:

[CODE]#!/usr/bin/perl

while (<>) {
$_ =~ /Factor=([^,]*),(\d*),(\d*),(\d*)/;

$AID{$2} = $1;

if ($From{$2} > $3 || !defined($From{$2})) { $From{$2} = $3; }
if ($To{$2} < $4 || !defined($To{$2})) { $To{$2} = $4; }
}

sub sort_by_number { $a <=> $b; }

foreach $key (sort sort_by_number keys %AID) {
for ($cnt = $From{$key}; $cnt < $To{$key}; $cnt++) {
$NewTo = $cnt+1;
print "Factor=${AID{$key}},${key},${cnt},${NewTo}\n";
}
}[/CODE]

If you put this Perl code into a file called stepbystep.pl and made it "executable", you could then at the command line execute:

[CODE]./stepbystep.pl < worktodo.txt >new_worktodo.txt
[/CODE]

This would result in new_worktodo.txt containing:

[CODE]Factor=N/A,2,1,2
Factor=N/A,2,2,3
Factor=N/A,2,3,4
Factor=N/A,5,4,5
Factor=N/A,7,5,6
Factor=N/A,7,6,7
Factor=N/A,11,7,8
Factor=N/A,11,8,9
Factor=N/A,11,9,10
Factor=N/A,11,10,11
[/CODE]

Note that an Associated array (in Perl) is referenced by a % symbol (e.g. "%AID" in the above), but the value of the array based on the key is referenced by, in the above example, "$AID{$key}" (replace "$key" with whatever variable you'd like).

One of the reasons Associative arrays are so powerful, beyond the speed, is they are effectively "sparse arrays". As in, the memory consumed is a function of the number of elements, not the minimum and maximum values of the keys. In Perl, executing the two statements "$Array[1] = 1; $Array[1000000000]=1" will result in an "Out of memory error" on most computers because the interpreter will try to allocate an array with one billion rows only to store two values.

Does that help at all, or only confuse you more?


All times are UTC. The time now is 01:23.

Powered by vBulletin® Version 3.8.11
Copyright ©2000 - 2021, Jelsoft Enterprises Ltd.