[QUOTE=preda;466117]@Madpoo,
JSON sounds good to me. Just one observation: the manual result entry form is currently line-oriented (one result per line). How do you see that working with JSON? Anyway, from my side, generating the result in any format you prefer shouldn't be hard. Parsing it on the server side is harder, so maybe you and James should decide what looks OK to you and I'll do that.[/QUOTE] Fortunately, JSON doesn't actually need line feeds anywhere (it'd be one long ugly string, but oh well). If I recall, some of the apps I work with that emit JSON do so without any pesky line feeds. That makes it interesting if I want to debug something, since I have to paste it into something like Notepad++ and pretty-print it (JSON viewer plugin or whatever).

I wonder if there ought not be a checksum too just as a sanity check for people doing funky cut/paste. It may or may not be surprising to know that there have been times when someone manually re-typed their results text rather than cut/paste, with some interesting results. Rare, but it's happened. :smile:

An example of just the basic info would be something like:

[CODE]{"result":{"exponent":"74207281","worktype":"LL","residue64":"0","username":"madpoo","cpuname":"bigboy","program":"Prime95 v29.2 x64","assignmentid":"12345"}}[/CODE]

Nicely formatted, that comes out as:

[CODE]{
  "result": {
    "exponent": "74207281",
    "worktype": "LL",
    "residue64": "0",
    "username": "madpoo",
    "cpuname": "bigboy",
    "program": "Prime95 v29.2 x64",
    "assignmentid": "12345"
  }
}[/CODE]

The beauty is that the fields can then be rearranged, shuffled, etc., and who cares, as long as it can be deserialized okay. No horrendous regex to parse whatever changes whoever compiled CUDALucas felt like making that day. LOL

And... extensible.
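As a sketch of that round trip, here is what serializing and parsing such a record might look like using Python's standard json module. The field names are taken from the example above; this is purely illustrative, not Primenet's actual code.

```python
import json

# Build the result record with the fields from the example above.
record = {
    "result": {
        "exponent": "74207281",
        "worktype": "LL",
        "residue64": "0",
        "username": "madpoo",
        "cpuname": "bigboy",
        "program": "Prime95 v29.2 x64",
        "assignmentid": "12345",
    }
}

# Serialize to a single line: no line feeds anywhere.
line = json.dumps(record, separators=(",", ":"))

# The server side can deserialize it regardless of field order.
parsed = json.loads(line)
print(parsed["result"]["exponent"])  # → 74207281
```

Because the parser reads the structure rather than positions, reordering or adding fields does not break existing consumers, which is the extensibility point above.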
[QUOTE=Madpoo;466149]I wonder if there ought not be a checksum too just as a sanity check for people doing funky cut/paste.[/QUOTE]
Yes, please! [QUOTE=Madpoo;466149]An example of just the basic info would be something like: [CODE]{"result":{"exponent":"74207281","worktype":"LL","residue64":"0","username":"madpoo","cpuname":"bigboy","program":"Prime95 v29.2 x64","assignmentid":"12345"}}[/CODE] Nicely formatted, that comes out as: [CODE]{
  "result": {
    "exponent": "74207281",
    "worktype": "LL",
    "residue64": "0",
    "username": "madpoo",
    "cpuname": "bigboy",
    "program": "Prime95 v29.2 x64",
    "assignmentid": "12345"
  }
}[/CODE][/QUOTE] I think you should drop the quotes on the exponent. :smile: Otherwise it looks good to me.
[QUOTE=Madpoo;466149]
I wonder if there ought not be a checksum too just as a sanity check for people doing funky cut/paste. It may or may not be surprising to know that there have been times when someone manually re-typed their results text rather than cut/paste, with some interesting results. Rare, but it's happened. :smile: [/QUOTE] Should the check digit encompass residue *and* exponent? (otherwise, maybe the residue is transcribed correctly but the exponent isn't) Should the check digit become part of the residue (one additional digit on the residue), or be expressed as a separate field in JSON? Proposals for simple algorithms for hexadecimal check digit?
[QUOTE=preda;466164]Should the check digit encompass residue *and* exponent? (otherwise, maybe the residue is transcribed correctly but the exponent isn't)
Should the check digit become part of the residue (one additional digit on the residue), or be expressed as a separate field in JSON? Proposals for simple algorithms for hexadecimal check digit?[/QUOTE] In UPC barcodes they use: [QUOTE="http://www.av1611.org/666/barcode.html"]Check digit: Also called the "self-check" digit. The check digit is on the outside right of the bar code. The check digit is an "old-programmer's trick" to validate that the other digits (number system character, manufacturer code, and product code) were read correctly. The check digit is red on the "Anatomy of a Barcode".

How the computer calculates the check digit:

1. Add all the odd digits. In our "Anatomy of a Barcode" we would add 0 (yes, you include the number system character digit) + 2 + 4 + 6 + 8 + 0 = 20.
2. Multiply the sum of step 1 by 3. Our example would be 20 x 3 = 60.
3. Add all the even numbers. In our "Anatomy of a Barcode" we would add 1 + 3 + 5 + 7 + 9 = 25. You do not include the 5 (the check digit) because that's what you are calculating.
4. Now add the result from step 2 and step 3. 60 + 25 = 85.
5. The check digit is the number needed to bring the result of step 4 up to a multiple of 10. 85 + 5 = 90, so 5 is the check digit in our example.

Equivalently, the check digit is (10 − (85 mod 10)) mod 10 = (10 − 5) mod 10 = 5.[/QUOTE]
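The quoted procedure can be sketched in a few lines. This is just an illustration of the UPC-A scheme as described above, not a proposal for the Primenet format.

```python
def upc_check_digit(digits):
    """Compute the UPC-A check digit for the first 11 digits.

    Digits in odd positions (1st, 3rd, ...) are weighted by 3, those in
    even positions by 1; the check digit brings the total to a multiple of 10.
    """
    odd_sum = sum(digits[0::2])   # 1st, 3rd, 5th, ... digits
    even_sum = sum(digits[1::2])  # 2nd, 4th, 6th, ... digits
    total = odd_sum * 3 + even_sum
    return (10 - total % 10) % 10

# The example from the quoted page: 0 1 2 3 4 5 6 7 8 9 0 → check digit 5
print(upc_check_digit([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]))  # → 5
```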
[QUOTE=preda;466164]Should the check digit encompass residue *and* exponent? (otherwise, maybe the residue is transcribed correctly but the exponent isn't)
Should the check digit become part of the residue (one additional digit on the residue), or be expressed as a separate field in JSON? Proposals for simple algorithms for hexadecimal check digit?[/QUOTE] I don't imagine it has to be too fancy, just good enough to detect fat-fingering or weird copy/paste. The important parts are the exponent and residue... for the rest of it, if there's a typo or something, then really, who cares; that's the other guy's problem. :smile:

CRC-32 seems like overkill, but considering it's just 4 more bytes during transmission and it's discarded once checked, I wouldn't quibble. CRC-16 may be just fine, but for only saving a couple of bytes I just wouldn't care. Message digests like MD5 would be fine too, but then it's even more overkill... 16 bytes. If you were trying to verify a huge document or file, that's one thing, but for just validating a few values...

I saw science_man_88's comment about how barcodes do it. Simpler checksums, like the last digit of a credit card number, would be okay, but I think the odds of a false positive are higher if you got just the right combination of fat-fingering, transposed hexits, or whatever.

Yeah, CRC-32 or even CRC-16 seems okay to me... there are built-in functions to handle it, and it's pretty sturdy for a basic check. I note that PHP has a crc32 function... custom functions for CRC-16 or CRC-8 are easy enough, but it should be easy for everyone, not just on the Primenet side. CRC-32 functions are probably built into just about everything, so that'd be my first pick.

You could probably compute the CRC across all the values of the JSON doc... not just the important ones, but all of them, just because you could, and why not. :smile: Who knows what someone may find terribly important years from now and would wonder "why weren't they checking those things for transmission problems?"
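One way to sketch "CRC across all the values" is to checksum a canonical serialization of the record and attach the CRC as an extra field. The field name `checksum` and the canonicalization rule (sorted keys, no whitespace) are assumptions for illustration, using Python's `zlib.crc32`:

```python
import json
import zlib

def attach_crc32(record):
    """Attach a CRC-32 of the canonically serialized record as a new field.

    Canonical form: sorted keys, no whitespace, so sender and receiver
    serialize identical bytes regardless of original field order.
    """
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    crc = zlib.crc32(canonical.encode("utf-8"))
    return {**record, "checksum": "%08X" % crc}

def verify_crc32(record):
    """Recompute the CRC over everything except the checksum field."""
    stripped = {k: v for k, v in record.items() if k != "checksum"}
    canonical = json.dumps(stripped, sort_keys=True, separators=(",", ":"))
    return "%08X" % zlib.crc32(canonical.encode("utf-8")) == record["checksum"]

result = {"exponent": "74207281", "worktype": "LL", "residue64": "0"}
stamped = attach_crc32(result)
print(verify_crc32(stamped))   # → True
stamped["residue64"] = "1"     # simulate a transcription error
print(verify_crc32(stamped))   # → False
```

Because every field feeds the CRC, a typo anywhere in the record is flagged, which matches the "check all of them, why not" approach.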
[QUOTE=preda;466164]Should the check digit encompass residue *and* exponent? (otherwise, maybe the residue is transcribed correctly but the exponent isn't) Should the check digit become part of the residue (one additional digit on the residue), or be expressed as a separate field in JSON? Proposals for simple algorithms for hexadecimal check digit?[/QUOTE]
I suggest: a CRC that detects digit transpositions and other changes; at least 16 bits in size; applied to the entire result message, excluding the CRC value field itself; with the CRC value on the right as the last field. That gives substantial protection of the data and message integrity, at relatively low cost in computation and message length. Almost any dropped, transposed, or altered characters are detected (even white space). Try it out at [URL]https://www.lammertbies.nl/comm/info/crc-calculation.html[/URL]

A simple CRC could also be computed and stored with full-length interim residues when writing them to disk, and checked when they are read from disk before use of the residue. That's a separate implementation choice for the developers.

A question that arises: will the choice of checksum details or actual implementation code be made public, or limited to trusted developers, or limited to some other subset of the community, to make faking results or other error harder and less likely? There may be license considerations here too.

Handy overviews are also available at [URL]https://en.wikipedia.org/wiki/Checksum[/URL] and [URL]https://en.wikipedia.org/wiki/Cyclic_redundancy_check[/URL]
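That whole-message scheme could be sketched as follows. The CRC-16/ARC variant (which appears to be the plain "CRC-16" on the calculator linked above) and the `crc16` field name are assumptions for illustration only.

```python
def crc16_arc(data: bytes) -> int:
    """Bitwise CRC-16/ARC: reflected polynomial 0xA001, initial value 0."""
    crc = 0x0000
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0xA001 if crc & 1 else crc >> 1
    return crc

# CRC the entire message so far, then append the CRC as the last field.
message = '{"exponent":"74207281","worktype":"LL","residue64":"0"'
stamped = message + ',"crc16":"%04X"}' % crc16_arc(message.encode("ascii"))

# Standard check value: CRC-16/ARC of "123456789" is 0xBB3D.
print("%04X" % crc16_arc(b"123456789"))  # → BB3D
```

The receiver recomputes the CRC over everything to the left of the `crc16` field and compares, so any dropped, transposed, or altered character anywhere in the message (including white space) changes the value.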
[QUOTE=Madpoo;466196]I don't imagine it has to be too fancy, just good enough to detect fat fingering or weird copy/paste.
The important parts are exponent and residue... the rest of it, if there's a typo or something then really who cares, that's the other guy's problem. :smile: CRC32 seems like overkill but considering it's just 4 more bytes during transmission and it's then discarded once checked, I wouldn't quibble. CRC-16 may be just fine but for only saving a couple bytes I just wouldn't care. Message digests like MD5 would be fine too but then it's even more overkill... 16 bytes. If you were trying to verify a huge document or file, that's one thing, but for just validating a few values... I saw science_man_88's comment about how barcodes do it. Simpler checksums, like the last digit of a credit card #, would be okay, but I think the odds are higher of a false positive if you got just the right combination of someone fat fingering things, transposing hexits, or whatever. Yeah, CRC-32 or even CRC-16 seem okay to me... built in functions to handle it and pretty sturdy for a basic check. I note that PHP has a crc32 function... custom funcs for crc-16 or crc-8 are easy enough but it should be easy enough for everyone, not just on the Primenet side. I think crc32 functions are probably built in to just about everything so that'd be my first pick. You could probably compute CRC across all the values of the json doc... not just the important ones, but all of them just because you could, and why not. :smile: Who knows what someone else may find terribly important years from now and would wonder "why weren't they checking those things for transmission problems?"[/QUOTE] CRC-32 can be tricky because there are many variants of it. For instance, SSE4.2 has CRC-32C acceleration, while things like gzip and PNG use plain CRC-32. PHP supports CRC-32 and CRC-32b.

If anything is to be hashed, I would use md5 for simplicity. Isn't JSON field validation enough, though?
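The variant problem is easy to demonstrate: plain CRC-32 (the gzip/PNG polynomial, as in Python's `zlib.crc32`) and CRC-32C (Castagnoli, the one accelerated by SSE4.2) give different results for the same input. A bitwise sketch of CRC-32C for comparison:

```python
import zlib

def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
    return crc ^ 0xFFFFFFFF

# Standard check values for the test input "123456789":
print("%08X" % zlib.crc32(b"123456789"))  # plain CRC-32 → CBF43926
print("%08X" % crc32c(b"123456789"))      # CRC-32C      → E3069283
```

So if CRC-32 is chosen, the spec would need to name the exact variant (polynomial, initial value, final XOR, reflection), or the applications and Primenet will silently disagree.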
[QUOTE=Mark Rose;466211]Isn't JSON field validation enough though?[/QUOTE]
No, because you could very easily have valid JSON with invalid values inside.
[QUOTE=CRGreathouse;466214]No, because you could very easily have valid JSON with invalid values inside.[/QUOTE]
That's what I meant by field validation: the values of the fields, not the presence of the fields :smile:
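Value-level validation might look like the sketch below (field names from the earlier example; the specific range and worktype checks are assumptions). Note that it accepts any well-formed residue, so a transposed-but-still-valid hex string passes, which is the gap the checksum discussion is aimed at.

```python
import re

def validate_result(result: dict) -> list:
    """Check field values, not just field presence. Returns a list of errors."""
    errors = []
    # Exponent: positive integer; the upper bound here is illustrative.
    if not str(result.get("exponent", "")).isdigit():
        errors.append("exponent is not a number")
    elif not (2 <= int(result["exponent"]) < 1_000_000_000):
        errors.append("exponent out of range")
    # Residue: up to 16 hex digits.
    if not re.fullmatch(r"[0-9A-Fa-f]{1,16}", result.get("residue64", "")):
        errors.append("residue64 is not a 64-bit hex value")
    # Worktype: one of the known test types (this list is an assumption).
    if result.get("worktype") not in {"LL", "PRP", "TF", "P-1"}:
        errors.append("unknown worktype")
    return errors

print(validate_result({"exponent": "74207281", "worktype": "LL",
                       "residue64": "0"}))    # → []
print(validate_result({"exponent": "74207281", "worktype": "LL",
                       "residue64": "XYZ"}))  # flags the bad residue
```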
[QUOTE=Mark Rose;466215]That's what I meant by field validation: the values of the fields, not the presence of the fields :smile:[/QUOTE]
I'm confused, then -- how does that differ from what was being discussed? :blush:
[QUOTE=Mark Rose;466211]CRC-32 can be tricky because there are many variants of it. For instance, SSE4.2 has CRC-32c acceleration, while things like gzip and PNG use plain CRC-32. PHP supports CRC-32 and CRC-32b.
If anything is to be hashed, I would use md5 for simplicity. Isn't JSON field validation enough, though?[/QUOTE] Developers and Primenet need to agree on a specific standard, yes. On the message-source end, implementation is typically in C, and code already exists somewhere that could be added in, for CRCs of various lengths and standards, and probably for other choices too. (Careful: some published implementations are incorrect.) There are at least 7 applications in active use, so multiple implementations to do there; one Primenet, so one implementation there. Spiders can take the CRC or whatever check-code value from the application and pass it on, so it seems to me the submission spiders are unaffected or lightly affected. They, along with Primenet, are also less numerous than the applications. That, to me, argues for making the application (source) implementation easy, so it will happen.

Not sure what you mean by JSON field validation. "Enough" is a judgment call that will vary. There are always trade-offs.

If you mean a simple checksum only on the residue, only on the exponent, or one for each: it doesn't catch the error of someone transcribing results and in the process transposing digits, or an actual cheater copying and pasting others' results with his UID. (I have no data on cheat rate or suspects. I think it is small to nonexistent. If it takes effort, it is less likely.)

If you mean a CRC or other check field by field: that makes for longer records than doing one for the whole message, even at twice the length, because a whole-message check needs one JSON field identifier and its punctuation rather than two or more sets.

If you mean only structuring it as JSON, with no CRC or checksum (which seems unlikely): that catches even less error.

If going to the trouble of doing a checksum at all, why not be thorough, and run the whole message into one CRC, perhaps all the way up through the CRC's own JSON field identifier?
It would flag even an accidental copy/paste error (missing the first or perhaps last character in the copy), such as copy/paste into the manual submission web form, as much GPU output is done. The 16-bit size feels right to me; 32 bits is overkill, 8 bits a bit too light (silent-miss rates: 8-bit ~0.39%, 16-bit ~15 ppm, 32-bit ~233 ppt).

One check per message is mostly more concise than one per individual value:

[CODE]exponent ECS: F residue RCS: F ...         (4-bit check per field)
exponent ECS: FF residue RCS: FF ...       (8-bit check per field)
exponent residue ... CRC: FF               (one 8-bit CRC)
exponent residue ... CRC: FFFF             (one 16-bit CRC)
exponent residue ... CRC: FFFFFFFF         (one 32-bit CRC)[/CODE]

MD5 is rather large, at 128 bits. It seems to me excessively long compared to the size of the whole message. Expressed in hex, it's another 32+ characters tacked onto a message that is typically less than 192 bytes otherwise, with the unwieldy 32-character AID prominent in the total length.

Note that some users (many?) want to be able to read the results records on screen as single lines without line wrap, in a font big enough to be readable. Fixed-width fonts and single-line records of a length that doesn't wrap make scanning vertically through a results file by eye much easier. Recast into single-line form, a (purposely mangled) Prime95-format result, for example, does not fit on one line even at 1600x900 resolution in the smallest available Windows FixedSys font:

[CODE]UID: "Kriesel/system-name", exponent: "M234745067", result: "is not prime", Res64: "46ECF1E803DDA820", Wd4: "E986021D,15615940,00000000", AID: "915C453965A8DF2B1ADE0FBB36CD0787", MD5: "0123456789ABCDEF0123456789ABCDEF"[/CODE]

(And this doesn't include the full-blown JSON punctuation, other syntax, absent fields and identifiers, etc.; see Madpoo's earlier layout example, adding dozens more characters.) In mostly 32-character chunks:

[CODE]UID: "Kriesel/system-name", expo
nent: "M234745067", result: "is not prime", Res64: "46ECF1E803DD
A820", Wd4: "E986021D,15615940,0
0000000", AID: "915C453965A8DF2B
1ADE0FBB36CD0787", MD5: "0123456789ABCDEF0123456789ABCDEF"[/CODE]

I prefer fully addressing the message integrity question once. Doing checksums or CRCs on individual fields, or less, feels to me like a half measure, and more likely to be revisited later.