Datum Technology

Comparing Inline/Block-Level vs Post-Process/Byte-Level Deduplication

Organizations choosing a disk-based backup appliance with deduplication often ask us how ExaGrid differs from other appliances, such as Data Domain or Quantum. The main architectural difference is that ExaGrid uses post-process, byte-level deduplication (which is faster and more scalable), while appliances such as Data Domain or Quantum use inline, block-level deduplication (which is slower and harder to scale).

The IT goal is to have a deduplication solution that offers the following attributes and capabilities:

  1. Highest performance solution for the shortest backup window
  2. High performance maintained as data grows, to keep the backup window short
  3. Fast full system restores
  4. Fast offsite tape copy
  5. Fast older backup version restores
  6. Simplified GUI-based management and comprehensive reporting
  7. Plug-and-play installation, requiring little ongoing IT staff time to manage
  8. Cost-effective up-front purchase
  9. Cost effectiveness as data grows
  10. Strong technical customer support

The Post Process Advantage

[Figure: ExaGrid post-process deduplication diagram]

Shortest backup window via post-process data deduplication

ExaGrid employs post-process deduplication, which allows the backups to write directly to disk at disk speed. This produces a faster backup and shorter backup window. The rationale here is to defer the compute-intensive process until after the backup has landed so as not to impact the time it takes to perform the backup.

Another approach in the market is inline deduplication, which deduplicates data on the fly, before it lands on disk. Because this can create a bottleneck at the point where data streams into the backup appliance, inline deduplication can result in slower performance and a longer backup window. Proponents of inline deduplication often argue that their approach requires less disk and is therefore less expensive. However, inline deduplication must rely on faster, more expensive processors and on more memory to avoid being prohibitively slow, so any savings on disk are offset by the cost of those processors and that memory.
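The difference between the two write paths can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not any vendor's implementation: the names `BlockStore`, `inline_backup` and `LandingZone` are hypothetical, the fixed 8 KB block size and SHA-256 fingerprints are illustrative choices, and block hashing is used on both paths purely to show when the compute-intensive work happens. It does not model byte-level deltas.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # assumed fixed block size, for illustration only


class BlockStore:
    """Hypothetical deduplicated store: unique blocks plus a per-backup recipe."""

    def __init__(self):
        self.blocks = {}   # fingerprint -> block bytes, stored once
        self.recipes = {}  # backup name -> ordered list of fingerprints

    def ingest(self, name, data):
        """Fingerprint each fixed-size block and keep only unseen ones."""
        recipe = []
        for offset in range(0, len(data), BLOCK_SIZE):
            block = data[offset:offset + BLOCK_SIZE]
            fingerprint = hashlib.sha256(block).hexdigest()
            self.blocks.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        self.recipes[name] = recipe

    def rehydrate(self, name):
        """Reassemble a backup from its stored blocks."""
        return b"".join(self.blocks[fp] for fp in self.recipes[name])


def inline_backup(store, name, data):
    """Inline path: hashing happens while the backup is still streaming in."""
    store.ingest(name, data)


class LandingZone:
    """Post-process path: land raw data first, deduplicate after the backup ends."""

    def __init__(self, store):
        self.store = store
        self.raw = {}  # backup name -> raw bytes not yet deduplicated

    def backup(self, name, data):
        self.raw[name] = data  # plain write, no hashing in the backup window

    def post_process(self):
        """Deduplicate everything that has landed, then free the raw copies."""
        for name in list(self.raw):
            self.store.ingest(name, self.raw.pop(name))
```

In this sketch `inline_backup` pays the hashing cost on every block before the write completes, whereas `LandingZone.backup` is a plain write and the same hashing runs later in `post_process`, outside the backup window.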

Fast Post-Process Byte-level Deduplication vs Slow Inline Block-Level Deduplication

The comparison below contrasts ExaGrid's post-process/byte-level deduplication with other vendors' inline/block-level deduplication for each of the ten goals above.

Attribute/Capability: Highest performance for shortest backup window

  • Post-process with byte-level deduplication: Fastest. Writes to disk at disk speed, so backups complete quickly.
  • Inline with block-level deduplication: Slower. Performs the compute-intensive deduplication between the backup server and disk.

Attribute/Capability: High performance maintained as data grows to keep short backup window

  • Post-process with byte-level deduplication: Strong (can add full appliances). Backup jobs are broken into 50MB to 100MB segments, which are then compared to find the bytes that changed. Because the segments are large (on the order of 100,000 to 200,000 of them at 10TB), the tracking data is small enough to be copied across full servers, each with its own processor, memory, bandwidth and disk. As a result, full appliances are added rather than just disk. When you grow from 10TB to 20TB, the processor, memory, bandwidth and disk all double along with the data, so the backup window stays the same length.
  • Inline with block-level deduplication: Weak (can only add disk). Uses roughly 8KB blocks, which produces a hash tracking table of about a billion entries at just 10TB. Because the hash table is so large, and keeping multiple copies of it in memory is prohibitively expensive, it has to live in a single front-end server, so only disk shelves are added as data grows. If your backup grows from 10TB to 20TB, only disk is added while the processor, memory and bandwidth stay the same. With twice as much data but the same processor and memory, the backup window expands.

Attribute/Capability: Fast full system restores

  • Post-process with byte-level deduplication: Fastest restores. Keeps a full copy of the most recent backups, with historical versions stored as byte-level deltas behind the most recent backup. The latest full copy is always ready to restore in complete form, for the fastest full system restores.
  • Inline with block-level deduplication: Slower restores. Deduplicates data on the fly, so all data on disk is deduplicated. When doing a full system restore, which is often time-sensitive, you have to wait for all of the data to be put back together (rehydrated).

Attribute/Capability: Fast offsite tape copy

  • Post-process with byte-level deduplication: Fastest tape copies. Keeps a full copy, so when your Friday night backup is complete, the full backup is sitting on disk waiting to be copied to tape. The tape copy job simply copies the full backup from disk to tape with no rehydration time.
  • Inline with block-level deduplication: Slow tape copies. Deduplicates data on the fly, so during the Friday night full backup, data is deduplicated on the way to disk. As soon as the Friday night full is complete and the offsite tape copy starts, the entire full backup needs to be put back together (rehydrated), which makes for slow tape copies.

Attribute/Capability: Fast older backup version restores

  • Post-process with byte-level deduplication: Rehydrates older versions. Takes a similar time to restore older versions as inline/block-level deduplication.
  • Inline with block-level deduplication: Rehydrates older versions. Takes a similar time to restore older versions as post-process/byte-level deduplication.

Attribute/Capability: Plug-and-play installation

  • Post-process with byte-level deduplication: Plug-and-play appliance. Simple to install initially.
  • Inline with block-level deduplication: Plug-and-play appliance. Simple to install initially.

Attribute/Capability: Simplified GUI-based management and comprehensive reporting

  • Post-process with byte-level deduplication: Simplified management, content-aware reporting. Multiple sites and devices are managed through a single web-based UI. Job-level reporting of deduplication ratio and replication status makes it easy to determine how to optimize backups.
  • Inline with block-level deduplication: Complex management, generic reports. Each device is configured through a command-line interface (CLI). Deduplication ratios and replication status are reported only per device, not per job.

Attribute/Capability: Cost-effective up-front purchase

  • Post-process with byte-level deduplication: Best price for highest performance. Performs the processing after the backups are complete, so the systems can use mass-market Intel processors that ship in high volume and are therefore inexpensive. This greatly reduces the cost of the system. These systems can cost up to 30% less than an inline/block-level system.
  • Inline with block-level deduplication: Higher price. The inline approach requires the most recent, high-performance CPU in order to keep up with backups. The premium processor and memory drive the cost of the system up. These systems are more expensive than post-process/byte-level systems.

Attribute/Capability: Cost effectiveness as data grows

  • Post-process with byte-level deduplication: Cost effective to scale. Uses full servers. As data grows, you add another server into a GRID architecture. Each server comes with processor, memory, bandwidth and disk. When your data goes from 10TB to 40TB, you simply keep adding appliances and the system keeps growing. There are no forklift upgrade points, so there are no replacement costs to plan for. Just add as you grow.
  • Inline with block-level deduplication: Costly to scale. Disk is added as data grows, but the amount of disk you can add behind a fixed processor, memory and bandwidth is limited. At some point the front-end server can no longer keep up and must be replaced with a server that has a faster processor and more memory, which is called a "forklift" upgrade. Some product lines have as many as five forklift upgrade points. Since the front end can cost as much as the initial system, hitting a forklift upgrade point may mean spending about as much on the upgrade as you originally spent.

Attribute/Capability: Strong technical customer support

  • Post-process with byte-level deduplication: Depends on the company. Ask each company about its support model.
  • Inline with block-level deduplication: Depends on the company. Ask each company about its support model.
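The tracking-table sizes quoted in the comparison (roughly 8KB blocks versus 50MB to 100MB segments at 10TB) follow from simple division. The short Python sketch below reproduces that arithmetic. The unit conventions are assumptions made for illustration (decimal terabytes and megabytes, 8 × 1024 bytes per block), and `tracking_entries` is a hypothetical helper name.

```python
TB = 10**12          # assumed: decimal terabyte
MB = 10**6           # assumed: decimal megabyte
BLOCK = 8 * 1024     # assumed: "roughly 8KB" taken as 8192 bytes


def tracking_entries(data_bytes, unit_bytes):
    """Number of blocks or segments needed to cover the data (rounded up)."""
    return -(-data_bytes // unit_bytes)  # integer ceiling division


data = 10 * TB
blocks = tracking_entries(data, BLOCK)          # about 1.2 billion entries
segments_50 = tracking_entries(data, 50 * MB)   # 200,000 segments
segments_100 = tracking_entries(data, 100 * MB) # 100,000 segments

print(f"8KB blocks at 10TB:     {blocks:,}")
print(f"50MB segments at 10TB:  {segments_50:,}")
print(f"100MB segments at 10TB: {segments_100:,}")
```

Under these assumptions the block-level table has several thousand times as many entries as the segment-level tracking data, which is the gap the comparison relies on when it says the tracking data can be copied across full servers.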


The Bottom Line on Post-Process Deduplication

Post-process, byte-level deduplication offers the following advantages compared to inline, block-level deduplication:

  • Faster for backups and full system restores
  • More scalable, with no forklift upgrades
  • Faster for offsite tape copy
  • Better management and reporting
  • Same power consumption, cooling and rack space
  • Costs less up front and costs less over time
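The scalability claim above (adding full appliances keeps the backup window constant, while adding only disk lets it grow) can be illustrated with a toy model. The Python sketch below is an idealized illustration only: the function names are hypothetical, the capacity and throughput figures are made-up placeholders rather than measurements of any product, and the model ignores network limits, deduplication ratios and everything else that affects a real backup window.

```python
import math


def backup_window_hours(data_tb, ingest_tb_per_hour):
    """Idealized window: amount of data divided by aggregate ingest rate."""
    return data_tb / ingest_tb_per_hour


def scale_out_window(data_tb, tb_per_appliance, appliance_tb_per_hour):
    """Full appliances are added with the data, so ingest rate grows too."""
    appliances = math.ceil(data_tb / tb_per_appliance)
    return backup_window_hours(data_tb, appliances * appliance_tb_per_hour)


def scale_up_window(data_tb, head_tb_per_hour):
    """Only disk is added; the single front-end server's rate stays fixed."""
    return backup_window_hours(data_tb, head_tb_per_hour)


# Placeholder figures, chosen only to make the arithmetic easy to follow.
for data_tb in (10, 20, 40):
    out = scale_out_window(data_tb, tb_per_appliance=10, appliance_tb_per_hour=2)
    up = scale_up_window(data_tb, head_tb_per_hour=2)
    print(f"{data_tb}TB: scale-out {out:.1f}h, disk-only {up:.1f}h")
```

With these placeholder numbers the scale-out window stays flat as data doubles, while the disk-only window doubles with the data, until a front-end replacement resets it.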



Copyright © 2008-2011 Datum Technology Pte Ltd

e: sales@datum.com.sg
t: (65) 6842 6966      f: (65) 6842 3933