When you store data in a computer file on a persistent storage device, what are you doing?

Understanding the meaning of persistence is important for evaluating different data store systems. Given the importance of the data store in most modern applications, making a poorly informed choice could mean substantial downtime or loss of data. In this post, we'll discuss persistence and data store design approaches and provide some background on these in the context of Cassandra.

If you’re interested in learning more about persistence in Cassandra and other NoSQL databases, check out our complete guide to NoSQL.

What is data persistence?

Persistence is "the continuance of an effect after its cause is removed". In the context of storing data in a computer system, this means that the data survives after the process that created it has ended. In other words, for a data store to be considered persistent, it must write to non-volatile storage.

Which data stores provide persistence?

If you need persistence in your data store, then you need to also understand the four main design approaches that a data store can take and how (or if) these designs provide persistence:

  • Pure in-memory, no persistence at all, such as memcached or Scalaris
  • In-memory with periodic snapshots, such as Oracle Coherence or Redis
  • Disk-based with update-in-place writes, such as MySQL ISAM or MongoDB
  • Commitlog-based, such as all traditional OLTP databases (Oracle, SQL Server, etc.)

In-memory approaches can achieve blazing speed, but at the cost of being limited to a relatively small data set. Most workloads have a relatively small "hot" (active) subset of their total data; systems that require the whole dataset to fit in memory, rather than just the active part, are fine for caches but a bad fit for most other applications. Because the data is in memory only, it will not survive process termination. Therefore these types of data stores are not considered persistent.

Adding persistence to systems

The easiest way to add persistence to an in-memory system is with periodic snapshots to disk at a configurable interval. Thus, you can lose up to that interval's worth of updates.
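To make the snapshot trade-off concrete, here is a minimal sketch of an in-memory store that snapshots to disk. The `SnapshotStore` class, its method names, and the JSON file format are hypothetical choices made for illustration; they do not reflect how Redis or Oracle Coherence implement snapshots.

```python
import json
import os
import tempfile
import time


class SnapshotStore:
    """Toy in-memory key-value store with periodic snapshots (hypothetical)."""

    def __init__(self, path, interval_seconds):
        self.path = path
        self.interval = interval_seconds
        self.data = {}
        self.last_snapshot = time.monotonic()
        # Recover whatever the last snapshot captured, if one exists.
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def put(self, key, value):
        self.data[key] = value  # the update lives only in memory for now
        if time.monotonic() - self.last_snapshot >= self.interval:
            self.snapshot()

    def snapshot(self):
        # Write to a temp file, force it to disk, then atomically swap it in,
        # so a crash mid-snapshot does not leave a half-written file behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w") as f:
            json.dump(self.data, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, self.path)
        self.last_snapshot = time.monotonic()


path = os.path.join(tempfile.mkdtemp(), "snapshot.json")
store = SnapshotStore(path, interval_seconds=3600)
store.put("a", 1)
store.snapshot()   # force a snapshot now
store.put("b", 2)  # newer than the snapshot, so it would be lost in a crash

recovered = SnapshotStore(path, interval_seconds=3600)  # simulate a restart
print(recovered.data)  # {'a': 1}
```

The write of `"b"` was accepted but never reached disk, which is exactly the window of loss the snapshot interval defines.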

Update-in-place and commitlog-based systems store to non-volatile memory immediately, but only commitlog-based persistence provides Durability -- the D in ACID -- with every write persisted before success is returned to the client.
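The commit-log idea can be sketched in a few lines: append each write to a log, force it to disk, and only then report success. The `CommitLogStore` class and its JSON-lines log format below are assumptions made for illustration, not the design of Cassandra or any other database, and the sketch ignores real-world concerns such as torn final records and log compaction.

```python
import json
import os
import tempfile


class CommitLogStore:
    """Toy key-value store that logs every write before acknowledging it."""

    def __init__(self, log_path):
        self.log_path = log_path
        self.data = {}
        # On startup, replay the log to rebuild the in-memory state.
        if os.path.exists(log_path):
            with open(log_path) as f:
                for line in f:
                    record = json.loads(line)
                    self.data[record["key"]] = record["value"]
        self.log = open(log_path, "a")

    def put(self, key, value):
        # 1. Append the write to the log.
        self.log.write(json.dumps({"key": key, "value": value}) + "\n")
        # 2. Push it through the process and OS buffers to the device.
        self.log.flush()
        os.fsync(self.log.fileno())
        # 3. Only now apply it in memory and report success to the caller.
        self.data[key] = value
        return True

    def close(self):
        self.log.close()


log_path = os.path.join(tempfile.mkdtemp(), "commit.log")
store = CommitLogStore(log_path)
store.put("a", 1)
store.put("b", 2)
store.close()  # stand-in for the process ending

reopened = CommitLogStore(log_path)  # simulate a restart
print(reopened.data)  # {'a': 1, 'b': 2}
reopened.close()
```

Because every acknowledged write is already in the log, a restart recovers all of them. The cost is an `fsync` per write, which is the performance side of the trade-off.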

Cassandra implements a commit-log-based persistence design, but at the same time provides for tunable levels of durability. This allows you to decide what the right trade-off is between safety and performance. You can choose, for each write operation, to wait for that update to be:

  • buffered to memory
  • written to disk on a single machine
  • written to disk on multiple machines
  • written to disk on multiple machines in different data centers 

Or, you can choose to accept writes as quickly as possible, acknowledging their receipt immediately before they have even been fully deserialized from the network.

Why data persistence matters

At the end of the day, you're the only one who knows what the right performance/durability trade-off is for your data. Making an informed decision on data store technologies is critical to addressing this trade-off on your terms. Because Cassandra provides such tunability, it is a logical choice for systems that need a durable, performant data store.

In the last few years, Kubernetes has emerged as a de facto standard for container orchestration and a go-to platform for hosting microservices-based workloads. This has raised ongoing questions about data storage: where the data is stored, how much capacity is available, and how the data is retrieved.

The answer to all these questions lies in one concept, persistent storage.

Persistent storage is hugely important for containers and for computer systems in general. If all data were volatile, nothing could be kept for later use, because it would be gone as soon as the system was turned off.

Persistent storage is what lets us keep files and data for later use. A hard disk drive is a typical example, as it allows us to store a variety of data permanently.
In this blog, we learn more about persistent storage, its benefits, persistent storage for containers, the process of linking Kubernetes to persistent storage via NVMe®, and more.

Overview

What is Persistent Storage?
Benefits of Persistent Storage
Persistent Storage and Containerization
Persistent Storage in Kubernetes
Importance of Persistent Block Storage for Modern Application Development
Persistent Storage Use Case
Linking Kubernetes to Persistent Storage via NVMe®

What is Persistent Storage?

Also known as non-volatile storage, persistent storage refers to any data storage device that retains data after power to that device is removed.

Common types of persistent storage include magnetic media, such as hard disk drives and tape, and several forms of optical media, such as DVD. Persistent storage can take the form of file, block, or object storage.

Benefits of Persistent Storage

The key advantages of persistent storage include:

Simplicity: Persistent storage lets developers provision their own storage without needing storage expertise. They can provision volumes for both on-premises and public cloud services.

Security: Persistent storage scores high on the security and encryption aspects of storage. It fulfils the security requirements of most enterprises, including volume-level encryption, self-encrypting disks, and key management, to protect them against data loss and security breaches.

Flexibility: Persistent storage offers you great flexibility over traditional storage and lets you use the same software across different virtual machines, clouds and containers. Further, developers also enjoy the flexibility to choose the storage interfaces for their workload, including file, block or object storage. It also gives developers the ability to deliver data services with one system, irrespective of protocol, thus boosting productivity, offering more freedom, and leading to more effective application development.

Portability: Today’s cloud-native world requires organizations to adopt a hybrid cloud approach to be able to combine the benefits of public and on-premises clouds. Persistent storage makes it easy to migrate your stateless applications across multiple clouds and migrate your data from one cloud to other clouds.

Efficiency: Persistent storage makes application development much more efficient. It eliminates the need to rewrite applications when you want to port them from one cloud provider to another and you can simply move applications without expensive or time-consuming rewrites whenever you want.

Cost-effectiveness: With persistent storage, you only have to pay for the storage and compute you use. It scales on-demand with no disruptions, growing and shrinking automatically as you add and remove files.

Persistent Storage and Containerization

Containers are a key ingredient for building an agile, DevOps-oriented infrastructure and have emerged as an easy way to port software to wherever it needs to be. In containerization specifically, persistent storage refers to storage volumes, typically associated with stateful applications such as databases, that remain accessible even after the application has shut down.

In recent years, containerization has emerged as a common way to package software and its operating system dependencies into transportable, isolated modules that are created and destroyed as often as needed. Originally, though, containers did not provide permanent storage, which meant that all the data generated by a containerized app disappeared once the app completed its function and the container was destroyed.

Since then, software and storage vendors have developed methods to retain the data generated by containerized applications and keep it safely in familiar storage volumes. Persistent storage resolves the problem of retaining data that would otherwise sit on ephemeral storage volumes, which generally live and die with the containers that use them.

Persistent Storage in Kubernetes

Kubernetes is primarily an open-source container orchestration framework. It provides management and services capabilities required to efficiently deploy, operate, and scale containers in a cloud/cluster environment.

Kubernetes storage is useful for storage administrators because it allows them to maintain multiple forms of persistent and non-persistent data in a Kubernetes cluster. This enables them to create dynamic storage resources that can serve different types of applications.

If managed properly, Kubernetes storage can automatically provision the most appropriate storage for a range of applications, with minimal administrative cost.

To enable persistent storage, Kubernetes primarily uses two main concepts as discussed below:

1. Persistent Volume (PV)

PV is mainly a storage element in a cluster, which is defined manually by an administrator or dynamically defined by a storage class. It has its own lifecycle which is separate from the lifecycle of Kubernetes pods.

2. Persistent Volume Claim (PVC)

PVC is primarily a storage request by a user, where any application running on a container can request storage. For instance, a container can specify the way it needs to access the data or the size of storage it requires.

Apart from access mode and storage size, administrators can offer PVs with various custom properties, such as the level of performance, type of disk, or storage tier. Users can then request storage based on all these custom parameters without knowing the implementation details of the underlying storage.
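As a rough illustration of what such a request looks like, the sketch below builds a PersistentVolumeClaim manifest as a Python dictionary and prints it as JSON, a format Kubernetes accepts alongside YAML. The helper `build_pvc`, the claim name `db-data`, and the storage class name `fast-ssd` are hypothetical; the field names follow the standard `v1` PersistentVolumeClaim schema.

```python
import json


def build_pvc(name, size, access_mode, storage_class):
    """Return a PersistentVolumeClaim manifest as a plain dict (illustrative helper)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # How the volume may be mounted, e.g. read-write by a single node.
            "accessModes": [access_mode],
            # The class an administrator defined (performance, disk type, tier).
            "storageClassName": storage_class,
            # How much storage the application is asking for.
            "resources": {"requests": {"storage": size}},
        },
    }


# "db-data" and "fast-ssd" are placeholder names chosen for this example.
pvc = build_pvc("db-data", "10Gi", "ReadWriteOnce", "fast-ssd")
print(json.dumps(pvc, indent=2))
```

The claim states only what the application needs (size, access mode, class); which physical volume satisfies it is left to the cluster.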

Importance of Persistent Block Storage for Modern Application Development

Development teams across the board are modernizing their applications by adopting containers, serverless, and microservices-based architectures. Most of these applications are stateful in nature making persistent storage a necessity.

Cloud-native persistent storage offers powerful capabilities and significant flexibility and portability for DevOps teams. Here are some of the reasons it is important for modern application development:

1. Developers working with a Kubernetes orchestrator find it simpler to create their own resources for a project. A robust persistent storage layer gives developers confidence that storage also meets the data security and resilience requirements of modern application deployments.

2. With a viable software-defined persistent storage platform, development teams can easily define and adjust their data requirements for a project on the go instead of completing this process manually. Further, they don’t need to rely on storage administrators for provisioning the storage.

3. Open source software-defined persistent storage allows for portable storage across various kinds of infrastructures, including virtual machines (VMs), bare metal, and public and private cloud environments.
Since data federation can also take place across hybrid and multi-cloud environments, developers can place sensitive data where it needs to be while integrating applications and microservices from various multi-cloud deployments.

Persistent Storage Use Case

One of the top use cases of persistent storage:

Stateful Applications

A stateful application refers to a program that saves important data from the activities of one session for use in the subsequent session. The saved data here is called the application’s state.

The advent of persistent storage on Kubernetes made it possible to support stateful applications, which was not practical before.

For these applications, persistent storage serves as the data foundation and allows application state to persist across sessions.

Linking Kubernetes to Persistent Storage via NVMe®

Optimal Kubernetes persistent storage requires a robust solution that is as flexible and portable as containers yet performs like local NVMe® SSDs. To preserve container portability, it must also speak common network protocols, require no special NICs, be standards-based, be managed via an API, and run on standard servers.

LightOS by Lightbits Labs is designed to meet these requirements as a high-performance persistent storage solution for Kubernetes, with improved scaling and availability via clustering. It aims to speed up your Kubernetes-based applications while increasing reliability and flexibility by providing:

– Performance similar to flash Local Persistent Volumes with greater utilization of your storage investment
– Better service levels and a better user experience with consistent latency
– Faster rebuild time with higher resiliency levels
– No changes to your TCP/IP network with no proprietary drivers on Kubernetes servers
– Simple and secure storage access to Kubernetes application servers

All in all, the primary goal of LightOS is to completely transform commodity servers into a powerful storage pool linked via NVMe®/TCP to the Kubernetes cluster orchestrator that allows you to separate storage from computing without much hassle and at a relatively lower cost.

How does persistent storage work?

Persistent storage is any data storage device that retains data after power to that device is shut off. It is also sometimes referred to as non-volatile storage. Magnetic media, such as hard disk drives and tape, are common types of persistent storage, as are the various forms of optical media such as DVD.

What is a file? Explain the concept of a file as persistent storage.

Persistent storage (aka non-volatile storage) is any storage device that retains data after power to the device is turned off. Examples of non-volatile storage used to persist data are flash memory, hard disk, tape or optical media. Persistent storage also comes in the form of file, block or object storage.

What are the purposes of the storage devices in the computer system?

A storage device is any type of computing hardware that is used for storing, porting or extracting data files and objects. Storage devices can hold and store information both temporarily and permanently. They may be internal or external to a computer, server or computing device.

When you write a program and save it to a disk, what kind of storage are you using?

When you write a program and save it to a disk, you are using persistent (non-volatile) storage, because the file remains after the program that wrote it has exited and after power is removed. The terms directory and folder are used synonymously to refer to an entity that is used to organize files.