Building a RAID Array with USB Drives
If you asked what I was thinking about at any given moment, there’s a decent chance I’d say RAID, USB drives, and how confusing it all is. This past week I built a RAID array using USB drives after seeing a post by Darko.
This is a walkthrough of how I set it up, which RAID level I went with, and what I learned along the way.
Backup… RAID?
Redundant Array of Independent Disks, or RAID, is a technology that combines multiple hard drives or solid-state drives into a single logical unit to improve performance, provide data redundancy, or both.
RAID has always been an interesting topic to me, one I wish I had paid more attention to in computer science class, largely because of how practical it can be for data centers or anyone running on-prem.
RAID has several levels (0 through 6). I considered two when embarking on this project: RAID 1 and RAID 5.
RAID 1
RAID 1 appealed to me because it is primarily focused on creating a redundant copy of data. I run a media server, and one more backup never hurts.
Worth noting: RAID 1 does not use parity or striping, so if both drives fail, the data is gone and there is no way to reconstruct it.
Some limitations that made it less viable for me:
- Capacity: only 50% of total capacity is usable. With two 1 TB USB sticks, you get 1 TB, not 2.
- Cost: this project was supposed to be quick and fun. Buying five 1 TB USB drives starts to feel less like a fun weekend project, and storage prices haven’t exactly been friendly.
- Write performance: every write must go to every mirror, so the array writes only as fast as the slowest drive.
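For comparison, a two-drive mirror with mdadm is about as simple as it gets. A sketch, with /dev/sdf and /dev/sdg as placeholder device names (not from my actual setup):

sudo mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdf /dev/sdg  # two-way mirror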
RAID 5
RAID 5 was a lot more appealing. It combines block-level striping with distributed parity, so if a single drive fails, reads can still be serviced by reconstructing the missing blocks from the parity spread across the remaining drives.
Additionally, once the failed drive is replaced, its contents can be rebuilt from that distributed parity, though the rebuild process can take a while.
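mdadm also lets you rehearse that failure-and-rebuild path by hand. A sketch, assuming the failing member is /dev/sdc1 (a placeholder):

sudo mdadm --manage /dev/md0 --fail /dev/sdc1    # mark the member as faulty
sudo mdadm --manage /dev/md0 --remove /dev/sdc1  # detach it from the array
sudo mdadm --manage /dev/md0 --add /dev/sdc1     # re-add it; a rebuild kicks off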
Fun fact
While writing this, one of the drives actually failed :)
You can think of RAID 5's fault tolerance in terms of XOR parity: the parity block in each stripe is the XOR of the data blocks, so any single missing block can be recomputed from the rest. A toy sketch in bash, with made-up byte values standing in for blocks:
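# three data bytes and their XOR parity (values made up for illustration)
d1=0xA5; d2=0x3C; d3=0x5F
parity=$(( d1 ^ d2 ^ d3 ))
# say the drive holding d2 dies: XOR the survivors with the parity to get it back
recovered=$(( d1 ^ d3 ^ parity ))
printf 'parity=0x%02X  recovered d2=0x%02X\n' "$parity" "$recovered"

This is the same relationship RAID 5 maintains per stripe, just at block granularity and rotated across the drives.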
Unlike RAID 1, the total usable space is (N−1) drives' worth. With five 128 GB drives, four drives' worth is usable, so you get 512 GB of capacity while one drive's worth goes to parity (distributed across all of the drives rather than parked on a single one).
Assembling the array was fairly straightforward. Unfortunately, I am not made of gold, and hardware RAID controllers run anywhere from $50 to $400+.
Thankfully, Linux has a software implementation of this via a command-line utility called mdadm:
sudo mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[abcde]
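The initial parity sync takes a while; you can watch its progress in /proc/mdstat. To actually use the array you also need a filesystem on top. A minimal sketch, assuming ext4 and a mount point of /mnt/raid (both are my assumptions, not part of the original setup):

cat /proc/mdstat                 # shows sync/rebuild progress for all md arrays
sudo mkfs.ext4 /dev/md0          # put a filesystem on the array
sudo mkdir -p /mnt/raid
sudo mount /dev/md0 /mnt/raid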
Checking on my drives, it looked something like this:
sudo mdadm --detail /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon Apr 27 00:33:56 2026
Raid Level : raid5
Array Size : 393010176 (374.80 GiB 402.44 GB)
Used Dev Size : 131003392 (124.93 GiB 134.15 GB)
Raid Devices : 4
Total Devices : 4
Persistence : Superblock is persistent
Intent Bitmap : Internal
Update Time : Mon Apr 27 08:46:12 2026
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 0
Spare Devices : 0
Layout : left-symmetric
Chunk Size : 512K
Consistency Policy : bitmap
Name : somehost:0 (local to host somehost)
UUID : b7434282:0d0a0d25:138beec6:7b594ed3
Events : 3844
Number Major Minor RaidDevice State
0 8 17 0 active sync /dev/sdb1
1 8 33 1 active sync /dev/sdc1
2 8 49 2 active sync /dev/sdd1
4 8 65 3 active sync /dev/sde1
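One thing worth doing so the array assembles itself after a reboot is persisting the config. On Debian-family systems that looks roughly like this (file paths and the initramfs step vary by distro):

sudo mdadm --detail --scan | sudo tee -a /etc/mdadm/mdadm.conf
sudo update-initramfs -u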
Physically, my array ended up looking something like this:
Closing thoughts
RAID is an incredible technology, and this short experiment gave me a much better sense of how to reason about data availability requirements at a small scale.
In the future I would like to attempt a more serious version of this for backups, and probably write a follow-up post.
Until then, happy hacking.
Notes
- Striping: the technique of splitting data across drives at the block level.
- Hamming codes are the error-correcting math behind how parity bits work (RAID 5 itself uses simple XOR parity).
- Worth reading: DigitalOcean's tutorial on RAID.