[Gambas-user] Binary compare of files?

Doriano Blengino doriano.blengino at ...1909...
Fri Oct 17 18:21:42 CEST 2008


Kari Laine wrote:
> On Fri, Oct 17, 2008 at 5:44 PM, Kari Laine <klaine8 at ...626...> wrote:
>   
>>>> A little misunderstanding of MD5.
>>>> You know for _*_sure_*_ that if the sums differ, the files are not equal.
>>>>
>>>> You may _*_assume_*_ they are the same if the sums are equal.
>>>>
>>>>         
> OK, I should have looked around myself. I found a command, sha512sum, which
> seems to calculate longer fingerprints. Is this better at avoiding
> collisions?
> What about calculating both md5sum and sha512sum for a file - do collisions
> happen in different places with different methods?
>   
This is the kind of thing I like to talk about :-).

Assuming a sound checksum method (one that spreads its results evenly),
the only thing that matters is the length of the checksum result.
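
To put numbers on "length of the result", here is a small sketch that
prints the output length of a few common algorithms. Python and its
standard hashlib module are my choice for illustration, and the helper
name digest_bits is made up for this example.

```python
import hashlib

def digest_bits(name: str) -> int:
    """Return the output length, in bits, of a hashlib algorithm."""
    return hashlib.new(name).digest_size * 8

# Compare the fingerprint lengths behind md5sum, sha1sum, sha256sum, sha512sum.
for name in ("md5", "sha1", "sha256", "sha512"):
    print(f"{name:8s} {digest_bits(name):4d} bits")
```

So sha512sum gives a fingerprint four times as long as md5sum does.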

Think of a checksum like the one used in old XMODEM-like protocols: a
single byte to verify that a block of data (128 bytes in XMODEM itself)
arrived intact. If you use a single byte to decide whether two files are
equal, you have 1 chance in 256 of a mistake. Good.
If you use two bytes, you have 1 chance in 65536 of failing.
With 32 bits, as in the CRC-32 used by Ethernet (the TCP and IP checksums
themselves are only 16 bits), the chance is one in a bit more than
4 billion.
I think 32 bits could be enough for you - it is more likely that all
the copies of your backed-up file get lost than that the algorithm says
two files are equal when they are not. Well, perhaps 64 bits would do
better?
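
The arithmetic above can be sketched in a few lines of Python (my choice
of language; the function names are invented for this example). The
first function is the 1-in-2^bits chance for one pair of different
files. The second goes a step beyond the text above: it uses the usual
"birthday" approximation for the chance that any two files in a whole
collection share a checksum, assuming an evenly distributed checksum.

```python
import math

def pair_collision_probability(bits: int) -> float:
    """Chance that two different files get the same checksum of `bits` bits."""
    return 1.0 / 2 ** bits

def any_collision_probability(bits: int, n_files: int) -> float:
    """Birthday approximation: chance that at least two of n_files collide."""
    pairs = n_files * (n_files - 1) / 2
    # 1 - exp(-pairs / 2**bits), computed without losing tiny values.
    return -math.expm1(-pairs / 2 ** bits)

for bits in (8, 16, 32, 64, 128):
    print(bits,
          pair_collision_probability(bits),
          any_collision_probability(bits, 100_000))
```

For a single pair, 32 bits really is about one in 4 billion. Under the
same assumptions, though, a collection of 100,000 different files is
more likely than not to contain some colliding pair at 32 bits, while
at 64 bits and above the chance stays tiny.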

SHA being better than MD5 matters when a malicious programmer (hacker?)
wants to create a file that "seems" to be another one but is not. The
hacker does this to break security, and has (and uses) a lot of
computational power.
Randomness does not have all this power, and is not trying to fool you...

So, choose the algorithm which gives you as many bytes as possible. Well,
some algorithms are oriented more towards checksumming, others towards
security (being difficult to break); the checksumming ones are probably
faster; perhaps checksumming is all you need (MD5 was designed as a
cryptographic hash, but it still works fine for detecting accidental
differences). - Just my opinion -
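
A minimal sketch of how this could be put to work, again in Python as an
illustration only (file_digest and same_content are hypothetical helper
names, not an existing API). It follows the rule quoted at the top:
different digests mean the files certainly differ, equal digests mean
they are assumed equal, and an optional byte-by-byte comparison with the
standard filecmp module removes even that assumption.

```python
import filecmp
import hashlib

def file_digest(path: str, algorithm: str = "sha512",
                chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def same_content(path_a: str, path_b: str, confirm: bool = True) -> bool:
    """Compare two files by digest, optionally confirming byte by byte."""
    if file_digest(path_a) != file_digest(path_b):
        return False  # different digests: the files certainly differ
    if confirm:
        # Equal digests: rule out a collision with a real binary compare.
        return filecmp.cmp(path_a, path_b, shallow=False)
    return True  # equal digests: assumed equal
```

For comparing just two files once, the byte-by-byte compare alone is
enough; the digests pay off when the same file has to be checked against
many others, because each file is then read only once.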

Cheers,
Doriano




