Data Redundancy Methods: Erasure Coding Explained

Why Outdated RAID Drive Arrays No Longer Suit Video Storage, and the Advantages of Erasure Coding

Encoding Strategies Unveiled: Examination of Erasure Coding Techniques

The Evolution of Hard Drives and the Rise of RAID Technology

The cost and capacity of hard drives have changed dramatically since their inception, paving the way for the widespread adoption of RAID (Redundant Array of Independent Disks) systems in modern storage.

The Early Days of Hard Drives

The first hard disk, IBM's RAMAC, was introduced in 1956. It used a stack of 52 magnetic disks, a far cry from the compact terabyte-scale drives we have today. By the 1980s, personal computing had taken a significant leap, with machines like the IBM PC/XT shipping with hard drives of about 10 MB [1].

Throughout the 2000s, storage density improved significantly with multiple platters and refined head control. For instance, an 80 GB hard drive cost approximately $200 in 2002, a considerable drop from earlier cost-per-gigabyte levels and consistent with a roughly 30% annual price decline, similar to memory cost trends [3].

In the 2010s and 2020s, typical capacities reached the terabyte range (e.g., 1 TB and 2 TB drives), and prices kept falling relative to capacity. By 2018-2020, 1 TB hard drives were priced competitively against smaller SSDs, and above 2 TB, HDD prices scale roughly in proportion to capacity. Smaller drives became less popular because they no longer offered enough space [4].

External hard drives remain cheaper per gigabyte than flash drives, though flash drive capacities and speeds have also increased markedly [5].

The Impact on RAID in Modern Storage Systems

The declining cost per gigabyte made multiple-disk arrays economically viable, allowing RAID techniques to protect data and improve throughput in modern storage systems across enterprise and personal contexts [1][2][3][4].

Early RAID levels striped data across drives to achieve high transfer rates: RAID 3 in bytes and RAID 4 in blocks. Small-file operations were slow on RAID 3 but improved with RAID 4, making RAID 4 popular for personal computers, where small files predominate [2].

As drives became larger and more affordable, building RAID arrays with multiple disks to provide redundancy (protection from single drive failure) and performance enhancements became common in enterprise and consumer systems.

The increased size and lower cost of drives reduced the relative overhead of using multiple disks, encouraging configurations like RAID 5 or RAID 6 that use distributed parity for fault tolerance.

However, RAID levels that rely on a dedicated parity or checksum disk place an increased load on that disk, potentially reducing its lifespan [2]. Modern storage often balances cheaper, large-capacity HDDs in RAID arrays for bulk storage against faster, more expensive SSDs for speed-critical applications.

Erasure Coding: A Modern Approach to Data Storage

Erasure Coding breaks data into fragments and computes additional redundancy fragments from them, so that lost fragments can be re-created from the ones that survive. This method is particularly effective in distributed storage systems, making optimal use of storage resources while providing fault tolerance [6].

High-performance Erasure Coding requires substantial processing power and is achieved with a virtual storage system built on cloud computing technology and heavy use of virtualization [6]. Erasure Coding is used in video surveillance appliances such as those from Pivot3, allowing quick data recovery even after the loss of an entire node and one disk drive [7].
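To make the fragment-and-redundancy idea concrete, here is a minimal sketch in Python. It uses the simplest possible erasure code, a single XOR parity fragment, which can rebuild exactly one lost fragment. This is an illustration only: production systems generally use stronger codes (Reed-Solomon is a common choice) that survive several simultaneous losses, and the function names below are hypothetical, not taken from any product.

```python
from functools import reduce
from operator import xor
from typing import List, Optional


def split_into_fragments(data: bytes, k: int) -> List[bytes]:
    """Split data into k equal-length fragments, zero-padding the last one."""
    frag_len = -(-len(data) // k)  # ceiling division
    padded = data.ljust(frag_len * k, b"\0")
    return [padded[i * frag_len:(i + 1) * frag_len] for i in range(k)]


def xor_fragments(fragments: List[bytes]) -> bytes:
    """Byte-wise XOR of equal-length fragments."""
    return bytes(reduce(xor, column) for column in zip(*fragments))


def encode(data: bytes, k: int) -> List[bytes]:
    """Return k data fragments followed by one XOR parity fragment."""
    fragments = split_into_fragments(data, k)
    return fragments + [xor_fragments(fragments)]


def recover(stored: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing fragment (marked None) from the survivors."""
    missing = [i for i, frag in enumerate(stored) if frag is None]
    if len(missing) > 1:
        raise ValueError("single parity can rebuild at most one lost fragment")
    rebuilt = list(stored)
    if missing:
        survivors = [frag for frag in stored if frag is not None]
        # XOR of all surviving fragments equals the missing one.
        rebuilt[missing[0]] = xor_fragments(survivors)
    return rebuilt


message = b"camera-01 frame data"
stored = encode(message, k=4)   # 4 data fragments + 1 parity fragment
stored[2] = None                # simulate losing one disk or node
rebuilt = recover(stored)
# Stripping the zero padding is safe here only because the sample
# message does not itself end in zero bytes.
assert b"".join(rebuilt[:4]).rstrip(b"\0") == message
```

Each fragment would live on a different disk or node, so losing any one of the five still leaves enough information to reconstruct the original data.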

Erasure Coding writes data in parallel to many disks at once, resulting in high-performance storage systems. This technology is used by Amazon S3, Google Cloud, and Microsoft Azure for efficient fault-tolerant data storage, and on large drive arrays it provides storage efficiencies (the usable share of raw capacity) of 80% to 90% or higher [7].
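The efficiency figures follow from simple arithmetic. Assuming a layout with k data fragments and m redundancy fragments of equal size, the usable share of raw capacity is k / (k + m). The sketch below works this out for a few layouts; the specific k and m values are illustrative assumptions, not the configurations used by any named vendor.

```python
def storage_efficiency(data_fragments: int, redundancy_fragments: int) -> float:
    """Usable fraction of raw capacity for a k + m fragment layout."""
    return data_fragments / (data_fragments + redundancy_fragments)


# (k, m) layouts chosen purely for illustration.
layouts = [(1, 1), (4, 1), (8, 2), (9, 1), (16, 2)]

for k, m in layouts:
    print(f"{k} data + {m} redundancy: "
          f"{storage_efficiency(k, m):.1%} usable, tolerates up to {m} lost fragment(s)")
```

The 1 + 1 layout corresponds to simple mirroring at 50% efficiency, which shows why wider layouts are attractive once an array has enough drives to spread fragments across.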

Balancing Storage Needs

Many entry-level NVRs (network video recorders) use JBOD (just a bunch of disks) storage without RAID. JBOD storage doesn't require a high level of IT expertise for configuration or drive replacement [8]. However, it lacks the fault tolerance and performance benefits offered by RAID, so it does nothing to prevent data loss when a drive fails.

In contrast, a RAID array survives a drive failure but is highly vulnerable during the rebuild: depending on the RAID level, it can lose all data if another disk fails before the rebuild completes [8]. The main problem with RAID on large-capacity disk drives is rebuild time, which can be measured in weeks for a single disk loss [8].
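A back-of-the-envelope estimate shows why rebuild time grows with drive size. The sketch below assumes the rebuild has to rewrite the entire replacement drive at some sustained effective rate; the rates used are hypothetical, and real rebuilds depend on the controller, the RAID level, and how much normal traffic the array keeps serving in the meantime.

```python
SECONDS_PER_DAY = 86_400
MB_PER_TB = 1_000_000  # decimal units, as drive vendors count them


def rebuild_time_days(capacity_tb: float, rebuild_rate_mb_s: float) -> float:
    """Days needed to rewrite a whole drive at a sustained effective rate."""
    seconds = capacity_tb * MB_PER_TB / rebuild_rate_mb_s
    return seconds / SECONDS_PER_DAY


# Hypothetical effective rates: an idle array versus one busy recording video.
for rate in (100, 20, 5):
    print(f"10 TB drive at {rate} MB/s: {rebuild_time_days(10, rate):.1f} days")
```

Under these assumptions, an array that can spare only a few megabytes per second for rebuilding takes weeks to restore a 10 TB drive, and it remains exposed to a second failure for that entire period.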

Pivot3 uses "virtual sparing," which reserves space on each drive, allowing the system to proactively fail a hard drive showing early warning signs of failure [7]. This approach mitigates the risk of data loss and speeds up the rebuild process.

In 1983, a 5 MB hard drive cost $4,500 (expressed in 1980 dollars). Today, a Seagate 10-terabyte 3.5-inch drive is $300 at Best Buy, which is $0.00003 (3 thousandths of a penny) per MB [5]. This dramatic reduction in cost per megabyte has been instrumental in the widespread adoption of RAID and other advanced storage technologies.
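The per-megabyte figures can be checked with a few lines of Python. This sketch uses only the prices and capacities quoted above, counts a terabyte as 1,000,000 MB, and makes no further inflation adjustment.

```python
MB_PER_TB = 1_000_000  # decimal units, as drive vendors count them


def cost_per_mb(price_dollars: float, capacity_mb: float) -> float:
    """Price divided by capacity, in dollars per megabyte."""
    return price_dollars / capacity_mb


early = cost_per_mb(4_500, 5)                # the 5 MB drive
modern = cost_per_mb(300, 10 * MB_PER_TB)    # the 10 TB drive

print(f"early drive:  ${early:,.2f} per MB")
print(f"modern drive: ${modern:.5f} per MB")
print(f"ratio: about {early / modern:,.0f} to 1")
```

The early drive works out to $900 per MB, so the cost per megabyte has fallen by a factor of roughly thirty million.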
