Hard links are not a suitable alternative here. When you deduplicate files, you typically want copy-on-write: if an app writes to one file, it should not change the other. Because of this, I would be extremely scared to use anything based on hard links.
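To make the concern concrete, here is a minimal sketch of what goes wrong. The file names and the helper function are made up for illustration. It creates a file, hard-links it, writes in place through the second name, and reads the first name back.

```python
import os
import tempfile


def hard_link_shares_writes():
    """Show that an in-place write through one hard link is visible via the other."""
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "a.txt")  # hypothetical file names
        b = os.path.join(d, "b.txt")

        with open(a, "w") as f:
            f.write("original")

        # "Deduplicate" by making b a second name for the same inode.
        os.link(a, b)

        # An app that opens b and writes in place (no rename-over) ...
        with open(b, "r+") as f:
            f.write("CHANGED!")

        # ... has also changed what a contains.
        with open(a) as f:
            content = f.read()

        same_inode = os.stat(a).st_ino == os.stat(b).st_ino
        return content, same_inode
```

Apps that save by writing a temporary file and renaming it over the target would break the link instead, so whether you get bitten depends on how each app writes, which is exactly why it is scary.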

In any case, a good design is to ask the kernel to do the dedupe step after user space has found duplicates. The kernel can double-check for you that the ranges are really identical before sharing them, so a stale or wrong match cannot corrupt data. On Linux this is available as the ioctl BTRFS_IOC_FILE_EXTENT_SAME, which has since been generalized as FIDEDUPERANGE.
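As a rough sketch of what that call looks like from user space, the snippet below uses FIDEDUPERANGE, the filesystem-independent form of the Btrfs ioctl. The helper names are my own, and the request number is hard-coded on the assumption of the common ioctl encoding (x86, ARM); some architectures encode it differently. It only works on filesystems that support the operation, and the destination normally has to be opened for writing.

```python
import fcntl
import struct

# _IOWR(0x94, 54, struct file_dedupe_range); assumes the common ioctl encoding.
FIDEDUPERANGE = 0xC0189436

FILE_DEDUPE_RANGE_SAME = 0     # ranges matched and now share extents
FILE_DEDUPE_RANGE_DIFFERS = 1  # kernel found the contents differ; nothing done

# struct file_dedupe_range: src_offset, src_length, dest_count, reserved1, reserved2
HEADER = struct.Struct("=QQHHI")
# struct file_dedupe_range_info: dest_fd, dest_offset, bytes_deduped, status, reserved
INFO = struct.Struct("=qQQiI")


def build_request(src_offset, length, dest_fd, dest_offset):
    """Pack a request with exactly one destination range."""
    return bytearray(
        HEADER.pack(src_offset, length, 1, 0, 0)
        + INFO.pack(dest_fd, dest_offset, 0, 0, 0)
    )


def parse_result(buf):
    """Return (bytes_deduped, status) for the single destination."""
    _, _, bytes_deduped, status, _ = INFO.unpack_from(buf, HEADER.size)
    return bytes_deduped, status


def dedupe_range(src_fd, dest_fd, length, src_offset=0, dest_offset=0):
    """Ask the kernel to verify and share one range between two open files.

    A negative status is a negated errno for that destination.
    """
    buf = build_request(src_offset, length, dest_fd, dest_offset)
    fcntl.ioctl(src_fd, FIDEDUPERANGE, buf)  # kernel fills in the result fields
    return parse_result(buf)
```

The point of the design is in the status field: user space only proposes candidates, and a result of FILE_DEDUPE_RANGE_DIFFERS means the kernel compared the data itself and declined.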



It was for me. I was using rsync with "--link-dest" earlier for this purpose, but that only works if the file is present in consecutive backups. I wanted to have the option of seeing a potentially different subset of files for each backup and saving disk space at the same time.

Restic and Borg can do this at the block level, which is more effective, but then the tool has to be installed whenever I want to retrieve something from a backup.
