Timing for the code above (compiled, P4 2.8 GHz): about 0.9 seconds, including library loading.
[CODE]
> (time (run))
user time    =      0.875
system time  =      0.031
Elapsed time =   0:00:02
Allocation   = 26411632 bytes standard / 7018 bytes fixlen
0 Page faults
DONE
[/CODE]
Squeak!!!
Here's Squeak, an open source implementation of Smalltalk.
Smalltalk is probably best known for the fact that everything is an object: ints, booleans, methods, everything. Its designers more or less invented object orientation, and probably saw no reason to do it halfway.
[CODE]
input := FileStream fileNamed: 'hrf3.txt'.
output := FileStream newFileNamed: 'hrf3exp.txt'.
[[input atEnd] whileFalse: [
    output nextPutAll: (input upTo: $,); nextPut: Character cr.
    input upTo: (Character cr); next.]
] ensure: [output close. input close.].
[/CODE]
Timing: about 5 sec, in interpreted mode only.
XSL
The transformation can be done in XSL (Extensible Stylesheet Language) by first converting the plaintext file to XML.
Timing: 1.3s for text-to-xml conversion, ~14s for the XSL transform, with Java JRE 1.4.1 on a P4/2.66GHz.

The XML conversion is pretty simple. Plaintext hrf3.txt:
[code]
9176581,Norman,School,WW1,00000000
9348359,SW,odf62,WV2,80000000
...
[/code]
is converted to hrf3.xml:
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<file>
<line>9176581,Norman,School,WW1,00000000</line>
<line>9348359,SW,odf62,WV2,80000000</line>
...
</file>
[/code]
Here's a Java snippet to do the XML conversion:
[code]
import java.io.*;
...
/** convert line-based plaintext file to xml format */
public static void textToXmlFile(String textFile, String targetFile) {
    BufferedReader in=null;
    BufferedWriter out=null;
    try {
        in=new BufferedReader(new FileReader(textFile));
        out=new BufferedWriter(new FileWriter(targetFile));
        out.write("<?xml version='1.0' encoding='ISO-8859-1'?>");
        out.newLine();
        out.write("<file>");
        out.newLine();
        for (String line=in.readLine(); line!=null; line=in.readLine()) {
            out.write("<line>");
            out.write(line);   // written as-is: assumes the line contains no '<' or '&'
            out.write("</line>");
            out.newLine();
        }
        out.write("</file>");
        out.newLine();
    } catch (Exception e) {
        throw new RuntimeException("Error in read/write", e);
    } finally {
        // ensure files are closed
        try { if (in!=null) in.close(); } catch (Exception e) {}
        try { if (out!=null) out.close(); } catch (Exception e) {}
    }
}
...
[/code]
Here's a Java snippet to run the XSL transform programmatically:
[code]
import java.io.File;
import javax.xml.transform.*;
import javax.xml.transform.stream.*;
...
/** Apply 'xslFile' transform to 'xmlFile', writing result to 'targetFile' */
public static void transformXmlFile(String xmlFile, String xslFile, String targetFile) {
    Source src=new StreamSource(new File(xmlFile));
    Source trans=new StreamSource(new File(xslFile));
    Result res=new StreamResult(new File(targetFile));
    javax.xml.transform.Transformer t=null;
    try {
        t=TransformerFactory.newInstance().newTransformer(trans);
        t.transform(src, res);
    } catch (Exception e) {
        throw new RuntimeException("Error in transform: ", e);
    }
}
...
[/code]
Then we add a bit of code to run the whole thing:
[code]
...
public static void run() {
    textToXmlFile("hrf3.txt", "hrf3.xml");
    transformXmlFile("hrf3.xml", "hrf3-transform.xsl", "hrf3exp.txt");
}
...
[/code]
Finally, the XSL transform itself (hrf3-transform.xsl):
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:transform version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text" indent="no" encoding="ISO-8859-1"/>
  <xsl:strip-space elements="file"/>
  <xsl:template match="line/text()">
    <xsl:variable name="current" select="substring-before(self::text(), ',')"/>
    <xsl:variable name="next" select="substring-before(following::line[1], ',')"/>
    <xsl:if test="not($current=$next)">
      <xsl:value-of select="$current"/><xsl:text>&#10;</xsl:text>
    </xsl:if>
  </xsl:template>
</xsl:transform>
[/code]
NOTE: Testing on 'next' rather than 'previous' in the XSL is mandatory in practice, due to the linked-list implementation of the XML parser. Testing on 'previous' gave a runtime of ~11 hours(!).

The XSL transform can also be run by inserting an 'xml-stylesheet' processing instruction in the XML file and opening it in a browser:
[code]
<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet type="text/xsl" href="hrf3-transform.xsl"?>
<file>
<line>9176581,Norman,School,WW1,00000000</line>
...
[/code]
Although somewhat complex, the advantages of this approach are that
[list]
[*]the text-to-xml conversion is generic enough to apply to any line-based plaintext format
[*]transformation details are specified in a plaintext file that does not need recompiling and can be tested for validity in most browsers
[/list]
In the real world, something like this might be worth considering if
[list]
[*]formats change frequently
[*]the code is part of a larger system, so recompiling, redistributing and reinstalling everything every x weeks is too much of a hassle
[/list]
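For readers less familiar with XPath axes, the stylesheet's test (print an ID only when the next line's ID differs) can be sketched in Ruby. This is only an illustration, not the code that was timed, and the method names ids_by_lookahead and ids_by_lookbehind are made up here. It also suggests why switching from 'previous' to 'next' is safe: both tests keep one ID per run of equal IDs, so they should differ only in speed.

```ruby
# Hypothetical helpers illustrating the two equivalent tests.

# Keep an ID when it differs from the ID on the NEXT line
# (the test the stylesheet uses; the last line is always kept).
def ids_by_lookahead(lines)
  ids = lines.map { |l| l.split(',', 2).first }
  ids.each_with_index.select { |id, i| id != ids[i + 1] }.map(&:first)
end

# Keep an ID when it differs from the ID on the PREVIOUS line
# (the first line is always kept).
def ids_by_lookbehind(lines)
  ids = lines.map { |l| l.split(',', 2).first }
  ids.each_with_index.select { |id, i| i.zero? || id != ids[i - 1] }.map(&:first)
end
```

Both methods take an array of comma-separated lines and return the array of IDs that would be written out, one per run of consecutive equal IDs.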
Ruby
Here's Ruby, a kind of cross between Smalltalk, Python and Perl.
Standard programming style:
[CODE]
input=File.new 'hrf3.txt'              # defaults to 'read'
output=File.new 'hrf3exp.txt', 'w'
prev=nil
begin                                  # block necessary for exception handling
  while line=input.gets
    curr=line[0..(line=~/,/)-1]        # everything before the first comma
    output.puts curr if prev!=curr
    prev=curr
  end
ensure
  input.close; output.close            # close files even if exception
end
[/CODE]
Using shortcuts:
[CODE]
output=File.new 'hrf3exp.txt', 'w'
prev,curr=nil
IO.foreach('hrf3.txt') { |line|
  output.puts curr if prev!=(curr=line[0..(line=~/,/)-1])
  prev=curr
}                                      # IO.foreach closes the input file automatically
[/CODE]
Running time: ~5 seconds on a Pentium M 2.0 GHz. (The size of the file has increased since the first post; it's currently 4 MB compressed, 12 MB uncompressed.)

According to [URL=http://www.informit.com/articles/article.asp?p=18225&seqNum=2&rl=1]this page[/URL], Ruby tends to be quicker than Python and slower than Perl.

I've been using Ruby for about four hours, but I like it so far. Readable syntax without being too verbose, and you can be brief if you want to :)
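The shortcut version closes the input for you, but the output file is never closed explicitly. Here is a minimal sketch of a variant that closes both by using the block form of File.open. The method name extract_ids and its parameters are my own, not from the posts above. It takes the ID with split instead of the regex index, so a line without a comma yields the whole line rather than raising an error.

```ruby
# Hypothetical helper: writes the first comma-separated field of each line
# of in_path to out_path, skipping a field equal to the previous line's.
def extract_ids(in_path, out_path)
  File.open(out_path, 'w') do |output|        # output closed when the block exits
    prev = nil
    File.foreach(in_path) do |line|           # input closed automatically
      curr = line.chomp.split(',', 2).first   # text before the first comma
      output.puts curr if curr != prev
      prev = curr
    end
  end
end
```

With the file names used in this thread, the call would be extract_ids('hrf3.txt', 'hrf3exp.txt').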