[Gambas-user] Binary compare of files?

Kari Laine klaine8 at ...626...
Fri Oct 17 16:44:44 CEST 2008


On Fri, Oct 17, 2008 at 12:28 PM, Stefano Palmeri <rospolosco at ...152...> wrote:

> On Friday 17 October 2008 10:28:28, Ron_1st wrote:
> > On Friday 17 October 2008, Stefano Palmeri wrote:
> > > If you only want to know if two files are identical,
> > > you could use md5sum.
> > >
> > > Ciao,
> > >
> > > Stefano
> >
> > A little misunderstanding of MD5.
> > You know for _*_sure_*_ that if the sums differ, the files are not equal.
> >
> > You may only _*_assume_*_ they are the same if the sums are equal.
> >
> > As a first test, to find out whether you need to investigate further
> > for 100% identical files, it is helpful.
> > The real test could be done with e.g. a shell call to the diff or comp command.
> >
> >
> >
> >
> > Best regards
> > Ron_1st
> >
>
> Thanks Ron! I didn't know about this MD5 issue ("collision").
>
>
> http://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
>
> I've always believed that md5sum was 100% safe.
> There's always something to learn.
>
> Ciao,
>
> Stefano
>

Hi, thanks for the comments.

I am writing a backup program. I have been collecting things (files) for years
(15 years) now and have ended up with an uncontrolled pile of hard disks which
contain backups, backups of backups, partly backups under different
names, and so on. To clean up this mess, one feature of my backup program is
that it should not back up a file twice, even if it has a different name.
Therefore I turned to MD5. But this collision thing is not good. Are there
better checksumming algorithms with a longer fingerprint? Any ideas how to do this?
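
To make the "longer fingerprint" question concrete, here is a minimal sketch.
It assumes Python and its standard hashlib module, and picks SHA-256 (a 256-bit
digest, against MD5's 128 bits) as one possible candidate. Neither is mentioned
in this thread, and the name file_fingerprint is made up for illustration.

```python
import hashlib


def file_fingerprint(path, algorithm="sha256", chunk_size=1024 * 1024):
    """Return the hex digest of the file at `path`.

    The file is read in chunks so that large files do not have to fit
    in memory. `algorithm` is any name accepted by hashlib.new().
    """
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read() until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Calling file_fingerprint(path, "md5") would give the shorter MD5 digest for
comparison, so both could be stored side by side if that is useful.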

I cannot use file comparison because, when later backing up a particular disk,
the colliding file is no longer mounted on the machine or might be
mounted in a different directory. So I need a way to store enough information
to make that decision based on information stored in a database, without
access to the colliding file. Size might help. The name cannot be used because
I have ended up with a lot of duplicate files with different names.
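
The database idea could look roughly like the sketch below. It is only an
illustration under assumptions that are not from this thread: Python with
SQLite as the database, SHA-256 as the checksum, and a hypothetical table
layout keyed on (size, checksum). The file name is stored for reference only
and takes no part in the decision.

```python
import hashlib
import os
import sqlite3


def open_catalog(db_path=":memory:"):
    """Open (or create) the catalog of files that were already backed up."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        " size INTEGER NOT NULL,"
        " sha256 TEXT NOT NULL,"
        " first_seen_as TEXT NOT NULL,"
        " PRIMARY KEY (size, sha256))"
    )
    return conn


def fingerprint(path, chunk_size=1024 * 1024):
    """Return (size in bytes, SHA-256 hex digest) for the file at `path`."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return os.path.getsize(path), h.hexdigest()


def record_if_new(conn, path):
    """Return True and record the file if its (size, digest) is unknown.

    Return False if a file with the same size and digest was recorded
    before, whatever its name or location was at that time.
    """
    size, digest = fingerprint(path)
    known = conn.execute(
        "SELECT 1 FROM files WHERE size = ? AND sha256 = ?", (size, digest)
    ).fetchone()
    if known is not None:
        return False
    conn.execute(
        "INSERT INTO files (size, sha256, first_seen_as) VALUES (?, ?, ?)",
        (size, digest, path),
    )
    conn.commit()
    return True
```

The decision needs only the catalog, so the disk holding the earlier copy does
not have to be mounted when a later disk is backed up.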

I am now going to test this collision thing. I will checksum 5 TB of files
and see how many collisions I get. If it is fewer than 10 files then I don't
care.
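
A rough sketch of how such a test could be run, again assuming Python (not
something used in this thread) and a made-up function name. Since the disks
are full of real duplicates, equal MD5 sums alone do not show a collision, so
the sketch compares files with equal sums byte by byte and counts only those
that actually differ. Accidental collisions would be expected to be extremely
rare, so the likely result is an empty list.

```python
import filecmp
import hashlib
import os
from collections import defaultdict


def md5_of(path, chunk_size=1024 * 1024):
    """Return the MD5 hex digest of the file at `path`, read in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_md5_collisions(root):
    """Return (duplicate_count, collisions) for regular files under `root`.

    Files are grouped by MD5 sum. Within a group, every file is compared
    byte by byte against the first file of the group: identical content
    counts as a duplicate, different content is reported as a collision.
    """
    by_digest = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # Skip symbolic links and anything that is not a regular file.
            if os.path.isfile(path) and not os.path.islink(path):
                by_digest[md5_of(path)].append(path)

    duplicates = 0
    collisions = []
    for paths in by_digest.values():
        reference = paths[0]
        for other in paths[1:]:
            # shallow=False forces a comparison of the file contents.
            if filecmp.cmp(reference, other, shallow=False):
                duplicates += 1
            else:
                collisions.append((reference, other))
    return duplicates, collisions
```

This only works while the files are still readable, so it suits a one-off test
on mounted disks and not the later backup runs described above.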


Best Regards
Kari Laine


