[Gambas-user] Binary compare of files?

Ron_1st ronstk at ...239...
Fri Oct 17 20:11:38 CEST 2008


On Friday 17 October 2008, Kari Laine wrote:
> On Fri, Oct 17, 2008 at 5:44 PM, Kari Laine <klaine8 at ...626...> wrote:
> 
> > On Fri, Oct 17, 2008 at 12:28 PM, Stefano Palmeri <rospolosco at ...152...>wrote:
> >
> >> On Friday 17 October 2008 10:28:28, Ron_1st wrote:
> >> > On Friday 17 October 2008, Stefano Palmeri wrote:
> >> > > If you only want to know if two files are identical,
> >> > > you could use md5sum.
> >> > >
> >> > > Ciao,
> >> > >
> >> > > Stefano
> >> >
> >> > A little misunderstanding of MD5 there.
> >> > You know for _*_sure_*_ that if the sums differ, the files are not equal.
> >> >
> >> > You may only _*_assume_*_ they are the same if the sums are equal.
> >> >
> >> > As a first test, to decide whether you need to investigate further
> >> > whether two files are 100% identical, it is helpful.
> >> > The real test can then be done with, e.g., a shell call to the diff
> >> > or cmp command.
> >> >
> >> > Best regards
> >> > Ron_1st
> >> >
> >>
> >> Thanks Ron! I didn't know about this MD5 issue ("collision").
> >>
> >>
> >> http://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
> >>
> >> I've always believed that md5sum was 100% safe.
> >> There's always something to learn.
> >>
> >> Ciao,
> >>
> >> Stefano
> >>
> >
> > Hi, thanks for the comments,
> >
> > I am writing a backup program. I have been collecting things (files) for
> > years (15 years) now and have ended up with an uncontrolled pile of hard
> > disks containing backups, backups of backups, partial backups with
> > different names, and so on. To clean up this mess, one feature of my
> > backup program is that it should not back up a file twice, even if it
> > appears under a different name. That is why I turned to MD5. But this
> > collision issue is not good. Are there better checksumming algorithms
> > with longer fingerprints? Any ideas how to do this?
> >
> > I cannot use a file compare, because when a particular disk is backed up
> > later, the colliding file may no longer be mounted on the machine, or may
> > be mounted in a different directory. So I need to store enough
> > information in the database to make that decision without access to the
> > colliding file. Size might help. The name cannot be used, because I have
> > ended up with a lot of duplicate files under different names.
> >
> > I am now going to test this collision question. I will checksum 5 TB of
> > files and see how many collisions I get. If it is fewer than 10 files,
> > then I don't care.
> >
> >
> > Best Regards
> > Kari Laine
> >
> >
> >
> >
> OK, I should have looked around myself. I found a command, sha512sum,
> which seems to calculate longer fingerprints. Is it better at avoiding
> collisions?
> How about calculating both the md5sum and the sha512sum for a file - do
> collisions occur for different inputs with different methods?
> 
> Thankful for any advice.
> 
> 
> Best Regards
> Kari Laine

If there is no problem keeping both, I would use both.
The reason is very simple.

If a manual check on files is ever needed, md5sum comes installed as
standard almost everywhere, is easy to use for testing, and is relatively
quick.
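
For illustration, here is a minimal sketch in Python (Python rather than
Gambas, purely to keep the example short) of the chunked hashing that
md5sum performs; the file path is only a placeholder:

    import hashlib

    def md5_of_file(path):
        """Hex MD5 digest of a file, read in 1 MiB chunks so large
        backup files never have to fit in memory."""
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    print(md5_of_file("/path/to/some/file"))  # same hex string md5sum prints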

If the MD5 sums are equal, you can check with SHA to verify that the files
do not differ in any respect, at the cost of more time per test.
For SHA you may have to install a package first, although on recent systems
sha512sum ships as part of GNU coreutils.
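
A sketch of that two-step idea, again in Python for illustration; hashlib
provides SHA-512 with no extra package, and both digests can be computed in
a single pass, together with the file size Kari mentioned:

    import hashlib
    import os

    def fingerprints(path):
        """Size plus MD5 and SHA-512 hex digests, computed in a single
        pass over the file, ready to be stored in the database."""
        md5, sha = hashlib.md5(), hashlib.sha512()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                md5.update(chunk)
                sha.update(chunk)
        return os.path.getsize(path), md5.hexdigest(), sha.hexdigest()

    # Treat two files as duplicates only if size, MD5 and SHA-512 all agree.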

These methods never give you a 100% guarantee that two files are identical,
but they are safe as an indication that two files are different.
The only 100% certain test is a byte-for-byte compare, which is not
something you can store in the database. :)
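
For completeness, a byte-for-byte compare is short to write as well; this
sketch mirrors what cmp does, and Python's standard
filecmp.cmp(a, b, shallow=False) would do the same job:

    def files_equal(path_a, path_b):
        """Byte-for-byte comparison: the only 100% certain test."""
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            while True:
                a = fa.read(1 << 20)
                b = fb.read(1 << 20)
                if a != b:   # content differs, or one file ended early
                    return False
                if not a:    # both ended at the same point: identical
                    return True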

I am curious what the test with 5 TB will show. Note, however, that it is
not the number of bytes that matters but the number of files involved: the
chance of an accidental collision grows with the number of digests you
compare (the "birthday" effect).
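
As a back-of-the-envelope check, here is a small Python sketch (assuming
the digests behave like random numbers, i.e. nobody has planted
deliberately crafted collision pairs among your files):

    import math

    def collision_probability(n_files, hash_bits):
        """Approximate chance that at least two of n_files random digests
        of hash_bits bits collide (the birthday bound)."""
        return -math.expm1(-n_files * (n_files - 1) / (2.0 * 2.0 ** hash_bits))

    # Ten million files: MD5 (128 bits) versus SHA-512 (512 bits).
    for bits in (128, 512):
        print(bits, collision_probability(10_000_000, bits))
    # MD5 gives roughly 1.5e-25; SHA-512 is smaller still.

All published MD5 collisions are deliberately constructed pairs; an
accidental collision among even hundreds of millions of ordinary files is,
for practical purposes, impossible.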
Best regards
Ron_1st





