[Gambas-user] Binary compare of files?

Kari Laine klaine8 at ...626...
Fri Oct 17 17:55:44 CEST 2008


On Fri, Oct 17, 2008 at 5:44 PM, Kari Laine <klaine8 at ...626...> wrote:

> On Fri, Oct 17, 2008 at 12:28 PM, Stefano Palmeri <rospolosco at ...152...> wrote:
>
>> On Friday 17 October 2008 at 10:28:28, Ron_1st wrote:
>> > On Friday 17 October 2008, Stefano Palmeri wrote:
>> > > If you only want to know if two files are identical,
>> > > you could use md5sum.
>> > >
>> > > Ciao,
>> > >
>> > > Stefano
>> >
>> > A little misunderstanding of MD5:
>> > you know for *sure* that if the sums differ, the files are not equal.
>> >
>> > You may only *assume* they are the same if the sums are equal.
>> >
>> > As a first test, to decide whether you need to investigate further
>> > whether two files really are 100% the same, it is helpful.
>> > The real test could then be done with, e.g., a shell call to the
>> > diff or cmp command.
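>> >
>> > Something like this, perhaps (an untested Gambas sketch; Shell$()
>> > quotes a string for the shell - if your Gambas lacks it, quote the
>> > paths by hand):
>> >
>> >   ' True byte-for-byte test by shelling out to cmp (untested sketch).
>> >   ' cmp -s prints nothing and exits 0 only when the files match,
>> >   ' so let the shell echo a marker on success.
>> >   Public Function FilesIdentical(sFile1 As String, sFile2 As String) As Boolean
>> >     Dim sOut As String
>> >     Shell "cmp -s " & Shell$(sFile1) & " " & Shell$(sFile2) & " && echo SAME" To sOut
>> >     Return InStr(sOut, "SAME") > 0
>> >   End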
>> >
>> > Best regards
>> > Ron_1st
>> >
>>
>> Thanks Ron! I didn't know about this MD5 issue ("collisions").
>>
>>
>> http://www.gnu.org/software/coreutils/manual/html_node/md5sum-invocation.html
>>
>> I've always believed that md5sum was 100% safe.
>> There's always something to learn.
>>
>> Ciao,
>>
>> Stefano
>>
>
> Hi, thanks for the comments,
>
> I am writing a backup program. I have been collecting things (files) for
> about 15 years now and have ended up with an uncontrolled pile of hard
> disks containing backups, backups of backups, and partly backups under
> different names, and so on. To clean up this mess, one feature of my
> backup program is that it should not back up a file twice, even if the
> file exists under a different name. That is why I turned to MD5. But this
> collision issue is not good. Are there better checksumming algorithms
> with longer fingerprints? Any ideas on how to do this?
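>
> This is how I fingerprint a file at the moment (untested sketch of the
> idea; Shell$() quotes the path for the shell):
>
>   ' Fingerprint a file by shelling out to md5sum (untested sketch).
>   Public Function Md5Of(sPath As String) As String
>     Dim sOut As String
>     Shell "md5sum " & Shell$(sPath) To sOut
>     ' md5sum prints "<32 hex chars>  <path>"; keep only the hash.
>     Return Left$(sOut, 32)
>   End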
>
> I cannot use a direct file compare, because when a particular disk is
> backed up later, the colliding file may no longer be mounted on the
> machine, or may be mounted under a different directory. So I need a way
> to store enough information in the database to make that decision
> without access to the colliding file. The file size might help. Names
> cannot be used, because I have ended up with a lot of duplicate files
> under different names.
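>
> So the idea is to store a key like size plus fingerprint in the
> database (untested sketch, building on the Md5Of() above):
>
>   ' Dedup key kept in the database: file size plus fingerprint, so a
>   ' later decision needs no access to the original file.
>   Public Function DedupKey(sPath As String) As String
>     Return CStr(Stat(sPath).Size) & ":" & Md5Of(sPath)
>   End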
>
> I am now going to put this collision issue to the test: I will checksum
> 5 TB of files and see how many collisions I get. If it is fewer than 10
> files, I don't care.
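>
> Roughly like this (untested sketch, one directory level only for
> brevity; a real collision means the same hash on files of different
> sizes - same hash and same size is most likely just a duplicate):
>
>   ' Flag possible collisions among the files of one directory.
>   Public Sub CountCollisions(sDir As String)
>     Dim sFile As String
>     Dim sKey As String
>     Dim cSeen As New Collection
>     For Each sFile In Dir(sDir, "*", gb.File)
>       sKey = Md5Of(sDir &/ sFile)
>       If cSeen.Exist(sKey) Then
>         ' Same hash but a different size cannot be a duplicate.
>         If Stat(sDir &/ sFile).Size <> cSeen[sKey] Then
>           Print "Possible collision: "; sDir &/ sFile
>         Endif
>       Else
>         cSeen[sKey] = Stat(sDir &/ sFile).Size
>       Endif
>     Next
>   End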
>
>
> Best Regards
> Kari Laine
>
OK, I should have looked around myself. I found the command sha512sum,
which seems to calculate longer fingerprints. Is this better at avoiding
collisions?
How about calculating both an md5sum and a sha512sum for each file - do
collisions happen in different places with different methods?
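
What I have in mind is something like this (untested sketch; a false
match would then require a simultaneous MD5 and SHA-512 collision on
the same pair of files):

  ' Combined fingerprint from md5sum and sha512sum (untested sketch).
  Public Function CombinedSum(sPath As String) As String
    Dim sMd5 As String
    Dim sSha As String
    Shell "md5sum " & Shell$(sPath) To sMd5
    Shell "sha512sum " & Shell$(sPath) To sSha
    ' Keep only the hex digests: 32 and 128 characters respectively.
    Return Left$(sMd5, 32) & Left$(sSha, 128)
  End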

Thankful for any advice.


Best Regards
Kari Laine


