
@sachin-j-joshi
Created October 30, 2019 18:27
PDP-38: Support for multi-tier storage

Summary

Motivation

Support multiple “tiers”:

  • Cloud storage tier (Amazon S3, Azure, GCP)
  • Cold storage tier
  • Edge tier
  • Fast, expensive tiers backed by specialized hardware (e.g. Optane)

Current situation

Prerequisite

This PDP requires PDP-34 to be implemented first.

Assumptions

High level requirement

Key Concepts

Mounting multiple storage

Layered Cascading Storage

Policy-driven eviction

  • Initially: a simple policy based on size and/or time thresholds [min, max]
  • Many possibilities for the future:
    • Smarter algorithms
    • Machine learning
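
The simple threshold policy above could be sketched roughly as follows. This is only an illustration under stated assumptions: the proposal does not define what [min, max] means, so the sketch treats `max_bytes` as a high watermark that triggers eviction and `min_bytes` as a low watermark at which eviction stops, with an optional `max_age_s` time limit. The names `Chunk`, `EvictionPolicy`, and `select_for_eviction` are hypothetical and do not come from the proposal or the existing code base.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Chunk:
    """Hypothetical unit of data stored on one tier."""
    name: str
    size_bytes: int
    age_s: float


@dataclass(frozen=True)
class EvictionPolicy:
    """Assumed reading of the [min, max] thresholds; not defined by the PDP."""
    min_bytes: int                     # low watermark: stop evicting at or below this
    max_bytes: int                     # high watermark: start evicting above this
    max_age_s: Optional[float] = None  # evict anything older; None disables the check


def select_for_eviction(chunks: List[Chunk], policy: EvictionPolicy) -> List[Chunk]:
    """Pick the chunks to move to the next tier, oldest first."""
    victims: List[Chunk] = []
    remaining = sum(c.size_bytes for c in chunks)
    over_high = remaining > policy.max_bytes
    for chunk in sorted(chunks, key=lambda c: c.age_s, reverse=True):
        too_old = policy.max_age_s is not None and chunk.age_s > policy.max_age_s
        # Once the high watermark is crossed, keep evicting down to the low one.
        too_full = over_high and remaining > policy.min_bytes
        if too_old or too_full:
            victims.append(chunk)
            remaining -= chunk.size_bytes
    return victims
```

With a 50-byte low watermark and a 100-byte high watermark, three 40-byte chunks would cause the two oldest to be selected, leaving 40 bytes on the tier. Using two watermarks rather than one avoids evicting a single chunk on every write once the tier is nearly full.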

User stories

Ability to specify multiple tier-2 storages

Ability to auto-tier

Ability to specify eviction policy

  • Policy-driven (initially very simple)
  • Based on performance/cost, QoS, or other considerations

Public Interface Changes

API Changes

None

Config Changes

Deployment Changes

Internal Changes

Key Components

Key Operations

Write Path

  • Data is first written to the nearest hot storage. When enough data accumulates there, it is moved to the next specified storage, and from there to the one after it.
  • Data is deleted from a mount once it has been moved to a different mount.
  • While moving data to colder storage, we may also merge many small objects into a single larger object on that mount.
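
The cascading behaviour described in the bullets above might look like the following minimal in-memory sketch. Everything here is an assumption made for illustration: `Mount`, `write`, and `cascade` are hypothetical names, "enough data" is modelled as a fixed per-mount byte capacity, and the merged-object naming is invented. A real implementation would also need an index recording where each original object sits inside a merged one.

```python
from typing import Dict, List


class Mount:
    """Hypothetical in-memory stand-in for one mounted storage."""

    def __init__(self, name: str, capacity_bytes: int):
        self.name = name
        self.capacity_bytes = capacity_bytes  # assumed trigger for moving data on
        self.objects: Dict[str, bytes] = {}

    def used(self) -> int:
        return sum(len(data) for data in self.objects.values())


def cascade(tiers: List[Mount]) -> None:
    """Move data from each over-full mount to the next colder one."""
    for hot, cold in zip(tiers, tiers[1:]):
        if hot.used() <= hot.capacity_bytes:
            continue
        # Merge the small objects into one larger object on the colder mount.
        merged = b"".join(hot.objects.values())  # dicts keep insertion order
        cold.objects[f"merged-{len(cold.objects)}"] = merged
        # Delete from the source mount only after the copy exists.
        hot.objects.clear()


def write(tiers: List[Mount], key: str, data: bytes) -> None:
    """Writes always land on the first (hottest) mount."""
    tiers[0].objects[key] = data
    cascade(tiers)
```

Because `cascade` walks the tiers from hottest to coldest in one pass, a single write can push data through several mounts when each becomes over-full in turn. The last mount never evicts in this sketch.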

Read Path

  • Data is read directly from the storage that holds it.
  • Data is prefetched, analogous to L1, L2, and L3 caches.
  • Data is not temporarily copied to nearer storage.
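
One possible reading of the read path above is sketched below. It assumes that prefetched data is held in memory by the reader rather than written to a nearer mount, which is an interpretation and not something the proposal states. `Reader`, `Tier`, and the `next_key` hint are hypothetical names introduced for this example.

```python
from typing import Dict, List, Optional, Tuple

Tier = Tuple[str, Dict[str, bytes]]  # (mount name, objects on that mount)


class Reader:
    """Hypothetical reader that serves data in place and prefetches in memory."""

    def __init__(self, tiers: List[Tier]):
        self.tiers = tiers                      # ordered hottest to coldest
        self.prefetched: Dict[str, bytes] = {}  # in-memory only, never a mount

    def _fetch(self, key: str) -> bytes:
        for _name, objects in self.tiers:
            if key in objects:
                return objects[key]  # read in place; nothing is promoted
        raise KeyError(key)

    def read(self, key: str, next_key: Optional[str] = None) -> bytes:
        # Serve from the prefetch buffer when an earlier read warmed it.
        data = self.prefetched.pop(key, None)
        if data is None:
            data = self._fetch(key)
        # Optionally warm the buffer with the object expected to be read next.
        if next_key is not None:
            try:
                self.prefetched[next_key] = self._fetch(next_key)
            except KeyError:
                pass  # a missing prefetch target is not an error
        return data
```

A sequential reader would pass the key it expects to read next as `next_key`, so the following call is served from memory while the mounts themselves stay unchanged.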

Compatibility and Migration

Discarded Approaches

References
