mersenneforum.org

Programming > Equivalent code (https://www.mersenneforum.org/showthread.php?t=2252)

mephisto 2005-01-16 07:53

Timing for the code above (compiled, P4 2.8 GHz): about 0.9 seconds of CPU time, including library loading.

[CODE]> (time (run))
user time = 0.875
system time = 0.031
Elapsed time = 0:00:02
Allocation = 26411632 bytes standard / 7018 bytes fixlen
0 Page faults
DONE[/CODE]

mephisto 2005-01-20 02:01

Squeak!!!
 
Here's Squeak, an open source implementation of Smalltalk.
Smalltalk is probably best known for the fact that everything is an object: ints, booleans, methods, everything. Its designers were pioneers of object orientation (Simula came earlier) and apparently saw no reason to do it halfway.

[CODE]
input:=FileStream fileNamed: 'hrf3.txt'.
output:=FileStream newFileNamed: 'hrf3exp.txt'.
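[[input atEnd] whileFalse: [
    "copy everything before the first comma, then end the output line"
    output nextPutAll: (input upTo: $,); nextPut: Character cr.
    "skip the rest of the line: upTo: consumes through the CR, next presumably skips the LF"
    input upTo: (Character cr); next.]
] ensure: [output close. input close.].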
[/CODE]
Timing: about 5 seconds (interpreted mode only).
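If Smalltalk streams are unfamiliar: `upTo: $,` answers everything before the next comma and leaves the stream positioned just after it, so each pass of the loop copies the first field of a line and then discards the rest of that line. As written, the loop writes the first field of every line; unlike the XSL and Ruby versions in this thread, it does not skip consecutive duplicates. A rough Ruby analogue of the same stream-scanning loop, for comparison (the method name `first_fields_stream` is made up for illustration, and `\n` or `\r\n` line endings are assumed):

```ruby
require 'stringio'

# Hypothetical Ruby analogue of the Squeak loop above.
# IO#gets(',') reads up to and including the next comma, much like upTo: $,
# (upTo: drops the comma itself, so it is chomped off here).
def first_fields_stream(input, output)
  until input.eof?
    field = input.gets(',')
    output.puts field.chomp(',') # write the first field plus a newline
    input.gets                   # discard the rest of the line, through "\n"
  end
  output
end

# Usage with in-memory streams; File objects work the same way.
out = first_fields_stream(StringIO.new("1,a,b\r\n1,c\r\n2,d\r\n"), StringIO.new)
puts out.string # prints 1, 1 and 2 on separate lines
```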

mephisto 2005-01-23 16:25

XSL
 
The transformation can be done in XSL (Extensible Stylesheet Language) by first converting the plaintext file to XML.
Timing: 1.3s for text-to-xml conversion, ~14s for the XSL transform, with Java JRE 1.4.1 on a P4/2.66GHz.

The XML conversion is pretty simple. Plaintext hrf3.txt:
[code]
9176581,Norman,School,WW1,00000000
9348359,SW,odf62,WV2,80000000
...
[/code]
is converted to hrf3.xml:
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<file>
<line>9176581,Norman,School,WW1,00000000</line>
<line>9348359,SW,odf62,WV2,80000000</line>
...
</file>
[/code]
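Each input line simply becomes the text of one `<line>` element. One thing to watch: if a line can ever contain `&` or `<`, those characters must be escaped as entities, otherwise the result is not well-formed XML and the parser will reject it. A minimal sketch of the mapping in Ruby with that escaping (the method name `text_to_xml` is hypothetical, and the input is assumed to contain only ISO-8859-1 characters so that the declared encoding holds):

```ruby
# Characters that need escaping in XML element content
# ('>' is optional here but harmless to escape).
XML_ESCAPES = { '&' => '&amp;', '<' => '&lt;', '>' => '&gt;' }.freeze

# Hypothetical helper: wrap each plaintext line in a <line> element.
def text_to_xml(lines)
  body = lines.map do |line|
    "<line>#{line.chomp.gsub(/[&<>]/, XML_ESCAPES)}</line>\n"
  end
  %(<?xml version="1.0" encoding="ISO-8859-1"?>\n<file>\n) + body.join + "</file>\n"
end

puts text_to_xml(["9176581,Norman,School,WW1,00000000\n",
                  "9348359,SW,odf62,WV2,80000000\n"])
```

The gsub with a hash looks each matched character up in `XML_ESCAPES`, so all three replacements happen in a single pass.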

Here's a Java snippet to do the XML conversion:
[code]
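import java.io.*;
...
/** convert line-based plaintext file to xml format */
public static void textToXmlFile(String textFile, String targetFile) {
    BufferedReader in = null;
    BufferedWriter out = null;
    try {
        in = new BufferedReader(new FileReader(textFile));
        // write in the same encoding that the XML declaration announces
        out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(targetFile), "ISO-8859-1"));
        out.write("<?xml version='1.0' encoding='ISO-8859-1'?>");
        out.newLine();
        out.write("<file>");
        out.newLine();
        // one <line> element per input line
        for (String line = in.readLine(); line != null; line = in.readLine()) {
            out.write("<line>");
            out.write(escapeXml(line)); // '&' and '<' would break the XML
            out.write("</line>");
            out.newLine();
        }
        out.write("</file>");
        out.newLine();
    } catch (Exception e) {
        throw new RuntimeException("Error in read/write", e);
    } finally { // ensure files are closed
        try { if (in != null) in.close(); } catch (Exception e) {}
        try { if (out != null) out.close(); } catch (Exception e) {}
    }
}

/** escape the characters that are special in XML element content */
private static String escapeXml(String s) {
    StringBuffer sb = new StringBuffer(s.length());
    for (int i = 0; i < s.length(); i++) {
        char c = s.charAt(i);
        switch (c) {
            case '&': sb.append("&amp;"); break;
            case '<': sb.append("&lt;"); break;
            case '>': sb.append("&gt;"); break;
            default:  sb.append(c);
        }
    }
    return sb.toString();
}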
...
[/code]
Here's a Java snippet to run the XSL transform programmatically:
[code]
import java.io.File;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
...
/** Apply 'xslFile' transform to 'xmlFile', writing result to 'targetFile' */
public static void transformXmlFile(String xmlFile, String xslFile, String targetFile) {
    Source src = new StreamSource(new File(xmlFile));
    Source trans = new StreamSource(new File(xslFile));
    Result res = new StreamResult(new File(targetFile));
    try {
        // compile the stylesheet, then apply it to the XML input
        javax.xml.transform.Transformer t =
                TransformerFactory.newInstance().newTransformer(trans);
        t.transform(src, res);
    } catch (Exception e) {
        throw new RuntimeException("Error in transform", e);
    }
}
...
[/code]
Then we add a bit of code to run the whole thing:
[code]
...
public static void run() {
    textToXmlFile("hrf3.txt", "hrf3.xml");
    transformXmlFile("hrf3.xml", "hrf3-transform.xsl", "hrf3exp.txt");
}
...
[/code]

Finally, the XSL transform itself (hrf3-transform.xsl):
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="no" encoding="ISO-8859-1"/>
  <xsl:strip-space elements="file"/>

  <!-- for each line: emit the text before the first comma,
       unless the next line starts with the same value -->
  <xsl:template match="line/text()">
    <xsl:variable name="current" select="substring-before(self::text(), ',')"/>
    <xsl:variable name="next" select="substring-before(following::line[1], ',')"/>
    <xsl:if test="not($current=$next)">
      <xsl:value-of select="$current"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>

</xsl:transform>
[/code]
NOTE: In the XSL, testing on the 'next' line rather than the 'previous' one is essential for performance, due to the linked-list implementation of the XML parser. Testing on 'previous' gave a runtime of ~11 hours(!).
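Why comparing with the next line still gives the right answer: within a run of lines that share a first field, the 'next' test fires on the last line of the run and a 'previous' test would fire on the first, so either way each run yields exactly one output line (assuming every line has a non-empty first field). A small Ruby model of both rules on an in-memory array (method names are hypothetical); it reproduces only the output, not the processor's cost of walking the `following` and `preceding` axes, which is where the 11-hour difference came from:

```ruby
# Text before the first comma, like substring-before(., ',') in the XSL
# (which also returns "" when there is no comma).
def first_key(line)
  i = line.index(',')
  i ? line[0...i] : ''
end

# Rule used by the stylesheet: emit a key when it differs from the NEXT
# line's key. The last line has no successor, so it is compared with "".
def emit_if_differs_from_next(lines)
  keys = lines.map { |l| first_key(l) }
  keys.each_with_index.select { |k, i| k != (keys[i + 1] || '') }.map(&:first)
end

# Alternative rule: emit a key when it differs from the PREVIOUS line's key.
def emit_if_differs_from_previous(lines)
  keys = lines.map { |l| first_key(l) }
  keys.each_with_index.select { |k, i| i.zero? || k != keys[i - 1] }.map(&:first)
end

sample = ['9176581,Norman,School,WW1,00000000', '9348359,SW,odf62,WV2,80000000']
p emit_if_differs_from_next(sample)     # prints ["9176581", "9348359"]
p emit_if_differs_from_previous(sample) # prints ["9176581", "9348359"]
```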

The XSL transform can also be run by inserting a 'stylesheet' tag in the XML file and opening it in a browser:
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="hrf3-transform.xsl"?>
<file>
<line>9176581,Norman,School,WW1,00000000</line>
...
[/code]
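Since the XML file is generated by a program anyway, the processing instruction can be added during generation. A sketch in Ruby that inserts it directly after the XML declaration (the method name `add_stylesheet_pi` is hypothetical; it assumes the document begins with an `<?xml ...?>` declaration and that the `href` value needs no escaping):

```ruby
# Hypothetical helper: insert an xml-stylesheet processing instruction right
# after the XML declaration, so a browser applies the XSL when it opens the file.
def add_stylesheet_pi(xml, href)
  decl_end = xml.index('?>')
  unless xml.start_with?('<?xml') && decl_end
    raise ArgumentError, 'expected the document to start with an XML declaration'
  end
  split_at = decl_end + 2 # position just past the declaration's "?>"
  pi = %(<?xml-stylesheet type="text/xsl" href="#{href}"?>)
  xml[0, split_at] + "\n" + pi + xml[split_at..-1]
end

xml = %(<?xml version="1.0" encoding="ISO-8859-1"?>\n<file>\n</file>\n)
puts add_stylesheet_pi(xml, 'hrf3-transform.xsl')
```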

Although this approach is somewhat complex, its advantages are that[list][*]the text-to-XML conversion is generic enough to apply to any line-based plaintext format[*]transformation details are specified in a plaintext file that does not need recompiling and can be tested for validity in most browsers[/list]
In the real world, something like this might be worth considering if[list][*]formats change frequently[*]the code is part of a larger system, so recompiling, redistributing and reinstalling everything every x weeks is too much of a hassle[/list]

mephisto 2005-10-08 05:10

Ruby
 
Here's Ruby, a kind of cross between Smalltalk, Python and Perl.

Standard programming style:
[CODE]input=File.new 'hrf3.txt' # defaults to 'read'
output=File.new 'hrf3exp.txt', 'w'
prev=nil
begin # block necessary for exception handling
  while line=input.gets
    curr=line[0..(line=~/,/)-1] # text before the first comma
    output.puts curr if prev!=curr # skip consecutive duplicates
    prev=curr
  end
ensure
  input.close; output.close # close files even if exception
end
[/CODE]
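`line=~/,/` returns the index of the first comma, so `line[0..(line=~/,/)-1]` is the text before it. Two edge cases are worth knowing about: a line with no comma (for example a blank last line) makes `=~` return `nil`, and `nil-1` raises `NoMethodError`; a line that starts with a comma gives the range `0..-1`, which is the whole line. If the input might contain such lines, an anchored regex avoids both. A sketch (the helper name `first_field` is hypothetical):

```ruby
# Hypothetical helper: everything before the first comma. The pattern always
# matches (possibly the empty string), so it never returns nil; a line without
# a comma yields the whole line minus its line ending.
def first_field(line)
  line[/\A[^,\r\n]*/]
end

first_field("9176581,Norman,School,WW1,00000000\n") # => "9176581"
first_field(",no key\n")                            # => ""
first_field("\n")                                   # => ""
```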

Using shortcuts:
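[CODE]File.open('hrf3exp.txt', 'w') { |output| # block form closes the output file
  prev,curr=nil
  IO.foreach('hrf3.txt') { |line|
    output.puts curr if prev!=(curr=line[0..(line=~/,/)-1])
    prev=curr
  } # IO.foreach closes the input file automatically
} # output is closed when the File.open block ends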
[/CODE]
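Another way to express "drop consecutive duplicates" in Ruby is `Enumerable#chunk_while`, which groups adjacent elements while the block returns true; taking the first element of each group gives one key per run. This is a swapped-in technique rather than what the post uses: it needs Ruby 2.3 or later (which postdates this 2005 post), it holds all keys in memory instead of streaming line by line, and the method name `dedupe_keys` is made up for illustration:

```ruby
# Hypothetical helper: first field of every line, with runs of equal
# consecutive keys collapsed to one.
def dedupe_keys(lines)
  keys = lines.map { |line| line[/\A[^,\r\n]*/] } # text before the first comma
  groups = keys.chunk_while { |a, b| a == b }     # adjacent equal keys together
  groups.map(&:first)                             # one key per group
end

# Usage against the files from this thread (not run here):
# File.open('hrf3exp.txt', 'w') { |f| f.puts dedupe_keys(File.foreach('hrf3.txt')) }
```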

Running time: ~5 seconds on a Pentium M 2.0 GHz. (The file has grown since the first post; it is currently 4 MB compressed, 12 MB uncompressed.)

According to [URL=http://www.informit.com/articles/article.asp?p=18225&seqNum=2&rl=1]this page[/URL], Ruby tends to be quicker than Python and slower than Perl.

I've been using Ruby for about four hours, but I like it so far. Readable syntax without being too verbose, and you can be brief if you want to :)


All times are UTC.