Scale-out Storage with High-Speed Erasure Coding

Overview


RozoFS is a software scale-out NAS that makes it easy to scale to petabytes of storage while optimizing raw capacity on heterogeneous commodity hardware. RozoFS is an easy fit for almost every business case.

As the need for unstructured data storage and the complexity of storage environments explode, traditional storage solutions show many pain points:

  • Unstructured data is difficult to store and index.
  • Scaling is expensive and complex.
  • Unavailability is expensive.


Architecture Basics

RozoFS aims to provide an open source, high-performance, highly available scale-out storage software appliance for data center scenarios with intensive disk I/O. It lets you see pools of commodity servers as a single storage resource. Using RozoFS is as simple as using a local filesystem, but behind the scenes things are quite different: the data to be stored is partitioned into multiple chunks using the Mojette Transform and distributed across storage devices in such a way that it can be retrieved even if multiple devices are unavailable. No chunk is meaningful on its own, and the data can only be rebuilt from a sufficient subset of chunks, which is why redundancy is required. RozoFS uses redundancy schemes based on coding techniques that achieve significant storage savings compared to simple replication.
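To make the chunking idea concrete, here is a minimal Python sketch of a Mojette-style encoder and decoder. It is an illustration under stated assumptions, not RozoFS's implementation: a data block is laid out as a small grid of integers (`rows` × `cols`), each projection direction `(p, q)` sums the pixels lying on the same discrete line into bin `b = p*l - q*k`, and the decoder uses a simple peeling strategy (repeatedly solve a bin that has a single unknown pixel). The names `forward` and `inverse` are hypothetical. When every direction uses `q = 1`, the Katz criterion says any `rows` distinct projections are enough to rebuild the block, so three projections of a two-row block tolerate the loss of any one of them.

```python
from math import gcd

Direction = tuple[int, int]  # (p, q) with gcd(p, q) == 1


def bin_index(k: int, l: int, p: int, q: int) -> int:
    # Pixel (column k, row l) is summed into bin b = p*l - q*k of projection (p, q).
    return p * l - q * k


def forward(grid: list[list[int]], directions: list[Direction]) -> dict[Direction, dict[int, int]]:
    """Mojette projections of a rows x cols block: one {bin: sum} dict per direction."""
    projections = {}
    for p, q in directions:
        if gcd(p, q) != 1:
            raise ValueError(f"direction {(p, q)} must have gcd(p, q) == 1")
        bins: dict[int, int] = {}
        for l, row in enumerate(grid):
            for k, value in enumerate(row):
                b = bin_index(k, l, p, q)
                bins[b] = bins.get(b, 0) + value
        projections[(p, q)] = bins
    return projections


def inverse(projections: dict[Direction, dict[int, int]], rows: int, cols: int) -> list[list[int]]:
    """Rebuild the block by repeatedly solving bins that contain a single unknown pixel."""
    residual = {d: dict(bins) for d, bins in projections.items()}
    unknown: dict[tuple[Direction, int], set[tuple[int, int]]] = {}
    for p, q in projections:
        for l in range(rows):
            for k in range(cols):
                unknown.setdefault(((p, q), bin_index(k, l, p, q)), set()).add((k, l))
    grid = [[0] * cols for _ in range(rows)]
    for _ in range(rows * cols):
        key = next((key for key, pixels in unknown.items() if len(pixels) == 1), None)
        if key is None:
            raise ValueError("too few projections: reconstruction is stuck")
        k, l = next(iter(unknown[key]))
        value = residual[key[0]][key[1]]  # all other pixels of this bin are known
        grid[l][k] = value
        # Remove the solved pixel from every projection it contributes to.
        for p, q in projections:
            b = bin_index(k, l, p, q)
            residual[(p, q)][b] -= value
            unknown[((p, q), b)].discard((k, l))
    return grid


if __name__ == "__main__":
    block = [[1, 2, 3, 4],
             [5, 6, 7, 8]]  # 2 rows: any 2 of the q = 1 projections suffice
    projections = forward(block, [(-1, 1), (0, 1), (1, 1)])
    del projections[(0, 1)]  # simulate losing the device holding one projection
    assert inverse(projections, rows=2, cols=4) == block
```

A projection with `q = 1` has `cols + (rows - 1)*|p|` bins, about as many as a row has pixels, so three projections of a two-row block store roughly 1.5 times the original data, slightly more because of the edge bins (14 bins for the 8 pixels in this example).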

RozoFS is designed to give customers the performance of scale-out storage with simplicity and high availability, at lower cost, more securely, and without the drawback of keeping multiple copies:

  • Reuse existing network and storage resources.
  • Scaling of users and capacity with no impact on performance.
  • Simple data management.
  • Reduced raw capacity with fewer devices and less bandwidth.
  • Higher security and data confidentiality.

Deployed on commodity servers, RozoFS provides a global namespace and high availability, and relies on well-known technologies to offer multiple access methods that best fit business needs.

(Figure: architecture overview)

The file system itself comprises three components:

  • exportd: metadata server managing the location (layout) of chunks (ensuring the best capacity load balancing while preserving high availability), file access and the namespace (hierarchy). Multiple replicated metadata servers are used to provide failover.
  • storaged: storage server storing the chunks.
  • rozofsmount: clients that communicate with both the export servers and the storage (chunk) servers and are responsible for data transformation.
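The placement role of exportd can be illustrated with a toy policy. The sketch below is hypothetical (the `StorageNode` type, the `place_chunks` function and the greedy rule are assumptions, not RozoFS's actual algorithm): it puts each of a file's `forward` chunks on a distinct storage node, so that losing one node costs at most one chunk, and prefers the nodes with the most free space to balance capacity across the volume.

```python
from dataclasses import dataclass


@dataclass
class StorageNode:
    name: str
    free_bytes: int


def place_chunks(nodes: list[StorageNode], forward: int, chunk_bytes: int) -> list[str]:
    """Hypothetical placement: one chunk per distinct node, emptiest nodes first."""
    if not 0 < forward <= len(nodes):
        raise ValueError("need between 1 and len(nodes) chunks to place")
    # Distinct nodes: a single node failure costs this file at most one chunk.
    # Most free space first: new chunks spread capacity usage across the volume.
    chosen = sorted(nodes, key=lambda n: n.free_bytes, reverse=True)[:forward]
    # chosen is sorted by free space, so its last node has the least room.
    if chosen[-1].free_bytes < chunk_bytes:
        raise ValueError("not enough free space on enough distinct nodes")
    for node in chosen:
        node.free_bytes -= chunk_bytes
    return [node.name for node in chosen]


if __name__ == "__main__":
    volume = [StorageNode("sid-1", 100), StorageNode("sid-2", 300),
              StorageNode("sid-3", 200), StorageNode("sid-4", 50)]
    print(place_chunks(volume, forward=3, chunk_bytes=10))  # ['sid-2', 'sid-3', 'sid-1']
```

A real placement policy would also weigh node reliability and performance tiers; the greedy "most free space" rule is only the simplest way to show capacity balancing combined with one-chunk-per-node fault isolation.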

Beyond its scale-out architecture, RozoFS is designed for performance and scalability, using a single-process, event-driven architecture with non-blocking calls to perform asynchronous I/O operations.
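The following Python sketch shows the event-driven style on a client read path, using `asyncio` as a stand-in for the event loop RozoFS implements itself. A single thread issues non-blocking requests to every storage server at once and returns as soon as `needed` chunks have arrived, skipping servers that fail. `fetch_chunk` only simulates a network call with `asyncio.sleep`; the server names, delays and `read_block` are hypothetical.

```python
import asyncio


async def fetch_chunk(server: str, delay: float, up: bool) -> tuple[str, str]:
    """Stand-in for a non-blocking request to a storage server (simulated)."""
    await asyncio.sleep(delay)  # yields to the event loop instead of blocking
    if not up:
        raise ConnectionError(f"{server} unreachable")
    return server, f"chunk-from-{server}"


async def read_block(servers: list[tuple[str, float, bool]], needed: int) -> list[tuple[str, str]]:
    """Request a chunk from every server concurrently; return the first `needed` that arrive."""
    tasks = [asyncio.create_task(fetch_chunk(*server)) for server in servers]
    chunks: list[tuple[str, str]] = []
    try:
        for next_done in asyncio.as_completed(tasks):
            try:
                chunks.append(await next_done)
            except ConnectionError:
                continue  # a failed server only removes one candidate chunk
            if len(chunks) == needed:
                return chunks
        raise RuntimeError(f"only {len(chunks)} of {needed} chunks reachable")
    finally:
        for task in tasks:
            task.cancel()  # drop requests that are no longer needed


if __name__ == "__main__":
    servers = [("storaged-1", 0.06, True), ("storaged-2", 0.02, False),
               ("storaged-3", 0.04, True)]
    print(asyncio.run(read_block(servers, needed=2)))
```

Because nothing blocks, a slow or dead server never stalls the others: the read completes with whichever chunks arrive first, which is what lets one process keep many I/O operations in flight.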

Data Availability and Protection

Given the scale of such systems, failure of a significant subset of the constituent nodes, as well as of other network components, is the norm rather than the exception. To deliver a highly available overall service, it is therefore essential both to tolerate short-term outages of some nodes and to provide resilience against permanent failures of individual components. Fault tolerance is achieved through redundancy, while long-term resilience relies on replenishing lost redundancy over time. The simplest and most commonly used form of redundancy is straightforward replication of the data on multiple storage nodes. However, erasure coding techniques can potentially achieve orders of magnitude more reliability than replication for the same redundancy.

RozoFS uses such a code, the Mojette Transform, and can thus deliver the same system availability and durability as three replicas with only 50% redundancy overhead. That is, data is stored only 1.5 times in the system to obtain a 99.9999% availability level and, thanks to self-healing, a 99.99999999999% durability level. Using this technology, RozoFS can withstand up to four simultaneous failures in the commodity storage pool, depending on its configuration.
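The overhead and failure-tolerance figures can be checked with a little arithmetic. The sketch below compares r-way replication with a code that writes `forward` chunks and needs any `inverse` of them. The `(inverse, forward)` pairs are illustrative examples with a 1.5 ratio, and the availability estimate assumes a deliberately simple model in which each chunk sits on its own node and every node is up independently with the same probability; availability figures like those quoted above also depend on measured failure rates and self-healing, which this model ignores.

```python
from math import comb


def replication(copies: int, node_availability: float) -> tuple[float, int, float]:
    """(storage overhead, tolerated failures, availability) for plain replication."""
    # Readable as long as at least one copy is on a node that is up.
    availability = 1 - (1 - node_availability) ** copies
    return float(copies), copies - 1, availability


def erasure_code(inverse: int, forward: int, node_availability: float) -> tuple[float, int, float]:
    """Same figures for a code writing `forward` chunks, any `inverse` of which rebuild the data."""
    overhead = forward / inverse
    tolerated = forward - inverse
    a = node_availability
    # Binomial model: nodes are up independently with probability a, and the
    # data is readable when at least `inverse` of the `forward` nodes are up.
    availability = sum(
        comb(forward, up) * a**up * (1 - a) ** (forward - up)
        for up in range(inverse, forward + 1)
    )
    return overhead, tolerated, availability


if __name__ == "__main__":
    a = 0.99  # hypothetical per-node availability
    print("3 replicas:", replication(3, a))
    for inverse, forward in [(2, 3), (4, 6), (8, 12)]:
        print(f"{inverse}-of-{forward} code:", erasure_code(inverse, forward, a))
```

In this example, widening the code while keeping `forward / inverse = 1.5` tolerates more simultaneous failures (1, 2, then 4) and raises the estimated availability without increasing the storage overhead; at `a = 0.99` the 8-of-12 code's estimate exceeds that of three replicas.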

When writing data (file, block, or object), RozoFS manages the path, rights, quotas, etc. and the storage location, transforms the data into a set of redundant chunks (according to the selected availability level and the reported reliability of the nodes), and then stores the resulting chunks. When reading data, RozoFS ensures the data can be reached even if several failures occur (network, host, disk, etc.).

Data Management

RozoFS lets you aggregate the storage of commodity hardware in the way that best fits your needs: it can manage several pools (called volumes) built on top of the storage available on each node. A node can belong to several volumes. Volumes can be declared according to the performance of the underlying storage (tiering) and can be extended on the fly without service interruption. These volumes are the raw storage on top of which several file systems (called exports) can be created and exposed to clients. Exports can be declared or removed at any time. All exports on a volume share its raw capacity, and each export's usage can be managed through resizable hard and soft quotas.
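A minimal model of volumes, exports and quotas helps show how these pieces relate. It assumes the usual meaning of hard and soft limits (a write beyond the hard quota is refused; crossing the soft quota is allowed but reported). The `Volume` and `Export` classes and the exception names are hypothetical, and the model counts logical bytes only, ignoring the redundancy overhead that the volume's raw capacity actually absorbs.

```python
from dataclasses import dataclass


class QuotaExceeded(Exception):
    """Write refused: it would push an export past its hard quota."""


class VolumeFull(Exception):
    """Write refused: the shared volume has no capacity left."""


@dataclass
class Volume:
    capacity: int  # bytes shared by all exports built on this volume
    used: int = 0


@dataclass
class Export:
    name: str
    volume: Volume
    soft_quota: int
    hard_quota: int
    used: int = 0

    def resize(self, soft_quota: int, hard_quota: int) -> None:
        """Quotas can be changed at any time; soft must not exceed hard."""
        if soft_quota > hard_quota:
            raise ValueError("soft quota must not exceed hard quota")
        self.soft_quota, self.hard_quota = soft_quota, hard_quota

    def write(self, nbytes: int) -> bool:
        """Account for a write; return True if the export is now above its soft quota."""
        if self.used + nbytes > self.hard_quota:
            raise QuotaExceeded(f"{self.name}: hard quota of {self.hard_quota} bytes")
        if self.volume.used + nbytes > self.volume.capacity:
            raise VolumeFull(f"{self.name}: volume capacity exhausted")
        self.used += nbytes
        self.volume.used += nbytes
        return self.used > self.soft_quota


if __name__ == "__main__":
    volume = Volume(capacity=1000)
    home = Export("home", volume, soft_quota=100, hard_quota=150)
    print(home.write(80), home.write(40))  # False True: the second write crosses the soft quota
```

Because every export draws on the same `Volume`, one export can run out of space before reaching its own hard quota if the others have consumed the shared capacity.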