There is a lot of confusion about btrfs and its RAID features. Let's start from the basics:
- RAID stands for "redundant array of inexpensive/independent disks". It is a way of organizing disks.
- There are hardware and software RAID implementations, for instance:
  - mdadm is a software implementation found in the Linux kernel
  - some CPUs have hardware RAID
  - there are specialized hardware RAID controllers
  - some filesystems implement software RAID (ZFS, btrfs)
- All of the examples above are independent of each other.
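As a concrete example of the software side, this is roughly how you would create a two-disk mdadm RAID1 array (the device names here are placeholders, adapt them to your system):

mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
cat /proc/mdstat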
- There are many RAID configurations. Not all are really redundant, and they
affect performance (for better or worse) in different ways. The basic concept
is:
- RAID0 doesn't offer any redundancy at all, but may increase performance. If you lose one disk, you lose all your data.
- RAID1 duplicates all data over your disks (redundancy), so that if one disk fails you can save your data and replace the faulty disk.
- RAID10 (or RAID 1+0) offers redundancy and potential performance benefits, as it is a combination of RAID0 and RAID1.
- RAID5 and RAID6 offer redundancy through parity: RAID5 survives the loss of one disk, RAID6 the loss of two.
- Most RAID implementations don't do anything about corrupted data. They can warn about corruption if the drive's SMART detects some specific condition, but they typically can't fix anything.
- RAID is not a backup. If anything happens to your RAID array, you lose your data. However, a RAID array can certainly serve as a backup, as long as it is a separate copy kept apart from the original data.
Many people complain that btrfs RAID isn't really RAID, because it is in fact very different from the traditional conception of RAID.
First things first, RAID5 and RAID6 are unstable and should not be used except for testing with throwaway data. It doesn't matter if you have used RAID5 or 6 for years without running into a problem. The moment anything weird happens, the blame is on you for using them despite all the warnings, even if the problem apparently has nothing to do with RAID.
The other RAID profiles are stable and can be used. Btrfs can apply different profiles for metadata and data. For instance, to add a device to your btrfs file system and then convert the metadata to raid1 and the data to raid0:
btrfs device add /dev/sdb1 /mnt
btrfs balance start -mconvert=raid1 -dconvert=raid0 /mnt
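Once the balance finishes, you can verify which profile each block group type ended up with:

btrfs filesystem df /mnt

The output lists something like "Data, RAID0" and "Metadata, RAID1" along with the space used by each.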
RAID0 works pretty much like traditional RAID0. Data is striped across the disks to achieve better performance, and if one disk fails, you lose all data. Now, RAID1 and RAID10 are where things get different, and they are the main reason people complain about btrfs RAID. Traditional RAID1 offers more redundancy as you add more disks to your array: if you lose 2 disks of a 3-disk array, you still have all your data. Btrfs RAID1 only keeps one redundant copy (2 copies total), no matter how many disks you have, so if you lose 2 disks or more, you are screwed. The same applies to btrfs RAID10. If you want more redundant copies, use btrfs RAID1C3 or RAID1C4 for 3 or 4 copies respectively; a conversion example follows below.
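To convert an existing filesystem so that both data and metadata keep 3 copies (RAID1C3 needs at least 3 devices, see the table at the end):

btrfs balance start -mconvert=raid1c3 -dconvert=raid1c3 /mnt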
Btrfs RAID10 requires 2 disks at minimum. All data is mirrored and also striped, which makes me wonder why anyone would choose btrfs RAID1 over btrfs RAID10, even though btrfs RAID10 performance isn't better under the current implementation. Someone else asked this at https://lore.kernel.org/linux-btrfs/[email protected]/T/ but no one was able to answer it.
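For reference, a btrfs RAID10 filesystem can also be created directly at mkfs time (placeholder device names; note that older btrfs-progs versions required 4 devices, while current ones accept the 2-device minimum from the table below):

mkfs.btrfs -m raid10 -d raid10 /dev/sda1 /dev/sdb1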
Another point of complaint about btrfs is its inability to mount the filesystem
when there are missing disks. To do that, you have to supply the degraded
mount option, as shown below. Since RAID is supposed to improve the uptime of your storage, this
design decision does not make any sense.
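For example, if a raid1 array has lost one of its two devices, the surviving device has to be mounted explicitly like this (placeholder device name):

mount -o degraded /dev/sda1 /mnt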
As I said before, most RAID implementations can't fix corrupted data, while btrfs can. However, btrfs needs a block group profile with redundancy to auto-repair corrupted data. You may also want to look into mdadm, which provides fixing of corrupted data.
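With a redundant profile, a scrub reads all copies, verifies them against their checksums, and rewrites any corrupted block from a good copy:

btrfs scrub start /mnt
btrfs scrub status /mnt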
Profiles | Redundant copies | Parity | Striping | Space usage | Min. devices
---|---|---|---|---|---
single | 1 | | | 100% | 1
DUP | 2 / 1 device | | | 50% | 1
RAID0 | 1 | | 1 to N | 100% | 1
RAID1 | 2 | | | 50% | 2
RAID1C3 | 3 | | | 33% | 3
RAID1C4 | 4 | | | 25% | 4
RAID10 | 2 | | 1 to N | 50% | 2
RAID5 | 1 | 1 | 2 to N-1 | (N-1)/N | 2
RAID6 | 1 | 2 | 3 to N-2 | (N-2)/N | 3