Persistent storage in Docker containers using Volumes

In this tutorial, we'll learn how to persist data in a Docker container. By default, a container's filesystem is ephemeral: once the container is removed, everything written inside it is lost. This is because all files created inside a container are stored in the container's writable layer, a thin layer created on top of the read-only image layers when the container is created. If an application inside a container saves data to the filesystem or keeps it in memory, that data disappears the moment the container is removed.

Most of the time, this stateless nature is exactly what we want, as stateless applications are much easier to manage and scale than stateful ones. Such containers rely on ephemeral storage that lives and dies with the container. Sometimes, however, we want the data to outlive the container, and for that we need persistent storage.

In Docker, volumes and bind mounts can both be used to persist data, but volumes are the preferred mechanism. A volume mounted without a name is anonymous and gets a random identifier; the recommended approach is to create a named volume by giving it an explicit name. In this chapter, we will briefly discuss bind mounts and then look at named volumes in depth.

Bind Mounts

Bind mounts allow us to persist data by mounting a specific file or directory on the host machine at a specific location inside the Docker container. They offer limited functionality compared to volumes. The file or directory being mounted is referenced by its absolute path on the host machine. Bind mounts are often used to provide additional data to containers. For example, when working on an application, we can bind-mount our source code into the container so that it picks up code changes and lets us see the results right away, such as using nodemon to reload a Node.js application when files change in the source directory on the host machine. Similarly, in React applications, we can use the same technique to see frontend changes right away.

One major problem with bind mounts is that, since they can point anywhere on the host system, even non-Docker processes on the host can add, modify, or remove the mounted files and directories at any time, which can break the containers that rely on them. Another issue is that a bind mount depends entirely on the directory structure and operating system of the host machine.
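
As a rough sketch of the development workflow described above, the helper below starts a throwaway Node.js container with a project directory bind-mounted at /app. The function name run_dev_container, the node:18 image, the /app target and the index.js entry point are all assumptions made for illustration, and it presumes nodemon is available to npx in the project.

```shell
# Sketch only: image, paths and entry point are placeholders.
run_dev_container() {
  # Host directory to mount; must be an absolute path that already exists,
  # because --mount (unlike -v) does not create a missing source directory.
  local src_dir="${1:-$PWD}"

  docker run --rm -it \
    --mount "type=bind,source=${src_dir},target=/app" \
    -w /app \
    node:18 \
    npx nodemon index.js
}
```

Calling run_dev_container "$PWD" from the project directory would let the container see edits made on the host without rebuilding the image.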

Anonymous Volumes

With anonymous volumes, we don't specify a name; Docker assigns a random string of characters as the name. This makes the volume hard to identify and reuse later, so it is not the preferred approach. An anonymous volume can be created in the following way:

   
docker run --name mysql -d \
 -v /var/lib/mysql \
 -e MYSQL_ROOT_PASSWORD=Db#2022PwdNP \
 -e MYSQL_DATABASE=test_database \
 -e MYSQL_USER=test_db_user \
 -e MYSQL_PASSWORD=testN#DB2028P \
 --restart unless-stopped \
 mysql:8
   
anonymous docker volume
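
Since the name of an anonymous volume is generated for us, we have to look it up before we can refer to it. One possible way, sketched below, is to ask docker inspect for the container's mounts. The helper name container_volumes is hypothetical; the .Mounts, .Type, .Name and .Destination fields come from the inspect output.

```shell
# Print the type, name and mount path of everything mounted into a container.
# For bind mounts the name column is empty.
container_volumes() {
  local container="${1:?usage: container_volumes CONTAINER}"

  docker inspect \
    --format '{{ range .Mounts }}{{ .Type }} {{ .Name }} -> {{ .Destination }}{{ println }}{{ end }}' \
    "$container"
}
```

For the container started above, container_volumes mysql should list one volume whose name is a long random string mounted at /var/lib/mysql.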


Named Volumes

A named volume is created and managed by Docker itself. We can create one explicitly with the docker volume create command, or let Docker create it automatically when a container that references it is started. Volumes are stored in a part of the host filesystem that is managed by Docker. On Linux, that is the /var/lib/docker/volumes directory by default; on other operating systems the location may differ. This directory is accessible only to the root user of the host machine. With a named volume, we only need to know its name, and it can be attached to any number of containers. We can also easily back up, restore, or transfer named volumes to external storage.

Volumes have several advantages over bind mounts. Some of them are as follows:

  • They are easier to back up or migrate than bind mounts.
  • They can be managed using Docker CLI commands or the Docker API.
  • They work on both Linux and Windows containers.
  • They can be shared more safely among multiple containers.
  • Using volume drivers, we can store volumes on remote hosts or cloud providers, encrypt the contents of volumes, and add other functionality.
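
To make the backup point concrete, here is a minimal sketch of one common approach: mount the volume read-only into a short-lived helper container and archive its contents into a host directory. The function name backup_volume, the alpine helper image, and the /source and /backup paths are assumptions chosen for illustration. For a database volume, it is safer to stop the container that uses the volume first so the files are not changing during the copy.

```shell
# Archive the contents of a named volume into DEST_DIR/VOLUME.tar.gz.
backup_volume() {
  local volume="${1:?usage: backup_volume VOLUME [DEST_DIR]}"
  local dest_dir="${2:-$PWD}"   # absolute host path that receives the archive

  docker run --rm \
    -v "${volume}:/source:ro" \
    -v "${dest_dir}:/backup" \
    alpine \
    tar czf "/backup/${volume}.tar.gz" -C /source .
}
```

For example, backup_volume mysql_db "$PWD" would write mysql_db.tar.gz to the current directory.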

Let's test persistent storage using a MySQL database.

Docker container without persistent storage

Let's run a MySQL container without any persistent storage attached to it:

Note:

On container orchestration platforms, we normally use a secrets manager such as AWS Secrets Manager, Google Cloud Secret Manager, or HashiCorp Vault to inject secrets at the time of container creation, rather than passing them on the command line as we do here.

   
docker run --name mysql -d \
 -p 3306:3306 \
 -e MYSQL_ROOT_PASSWORD=Db#2022PwdNP \
 -e MYSQL_DATABASE=test_database \
 -e MYSQL_USER=test_db_user \
 -e MYSQL_PASSWORD=testN#DB2028P \
 --restart unless-stopped \
 mysql:8
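
The command above passes the passwords directly on the command line, where they can end up in shell history. For local work, one lighter alternative is docker run's --env-file option, which reads VARIABLE=value lines from a file. The sketch below assumes a file (here called mysql.env, a hypothetical name) that contains the same four MYSQL_ variables, one per line; the function name is also made up for illustration. It is an alternative to the command above, not something to run in addition to it.

```shell
# Start the same MySQL container, reading the MYSQL_* variables from a file.
run_mysql_with_env_file() {
  local env_file="${1:?usage: run_mysql_with_env_file ENV_FILE}"

  docker run --name mysql -d \
    -p 3306:3306 \
    --env-file "$env_file" \
    --restart unless-stopped \
    mysql:8
}
```

It would be called as run_mysql_with_env_file ./mysql.env, with mysql.env kept out of version control.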
   

To show the list of running containers, run docker ps:

list of docker containers

Issue the following command to open a MySQL client session inside the container:

   
	docker exec -it mysql mysql -u test_db_user test_database -p
   

Create a table called users:

   
CREATE TABLE `users` (
  `user_id` varchar(36),
  `first_name` varchar(64) NOT null,
  `last_name` varchar(64) NOT null,
  `email` varchar(48) UNIQUE NOT null,
  `password` varchar(128) NOT null,
  `added_on` datetime DEFAULT NOW(),
  PRIMARY KEY (`user_id`)
);
   

Insert a row into the users table:

   

INSERT INTO `users`
(`user_id`,
`first_name`,
`last_name`,
`email`,
`password`)
VALUES
("056a3ce0-dc51-4c26-88fe-809a5a7a48b4",
"John",
"Doe",
"john.doe@nodexplained.com",
"this_is_secure_password");
   

Issue the following command to see the newly inserted row:

   
	SELECT * FROM users;
   
mysql container - select query from table users

If you want to know more about MySQL, click on the following link:

https://www.nodexplained.com/create-database-and-perform-crud-operations-in-mysql-server/

Now, let's stop and remove the MySQL container:

   
	docker stop mysql && docker rm mysql
   

Now run the docker run command from the start of this section again to start a fresh MySQL container, then check the list of tables. As the image below shows, there are no tables any more: all the previous data has been erased.

mysql container - data reset after re-initialization
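
Checking the list of tables can be done from the MySQL prompt with SHOW TABLES; or, as sketched below, in one step from the host. The helper name show_tables is hypothetical; the user and database names are the ones used in this tutorial, and the client still prompts for the password.

```shell
# List the tables in test_database without opening an interactive SQL session.
show_tables() {
  local container="${1:-mysql}"

  docker exec -it "$container" \
    mysql -u test_db_user test_database -p -e 'SHOW TABLES;'
}
```

Right after the container has been recreated without a volume, show_tables should report no tables.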

Docker container with persistent storage

Let's start by creating a volume. The syntax for creating a named volume is as follows:

   
	docker volume create VOLUME_NAME
   

Replace VOLUME_NAME with a meaningful volume name:

   
	docker volume create mysql_db
   

We can attach this named volume to a container with the following command:

   
docker run --name mysql -d \
 -p 3306:3306 \
 -v mysql_db:/var/lib/mysql \
 -e MYSQL_ROOT_PASSWORD=Db#2022PwdNP \
 -e MYSQL_DATABASE=test_database \
 -e MYSQL_USER=test_db_user \
 -e MYSQL_PASSWORD=testN#DB2028P \
 --restart unless-stopped \
 mysql:8
   

As you can see, we use the -v parameter to attach the volume to the container; it mounts the named volume mysql_db at /var/lib/mysql, the MySQL data directory inside the container. Let's see the list of containers:

   
	docker ps
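
The -v flag used in the docker run command above has a more verbose equivalent, --mount, which spells out the mount type, source and target as key=value pairs. The sketch below shows the same MySQL container started that way; the function name run_mysql_with_mount is made up for illustration, and it is an alternative to the -v command, so an existing container named mysql would have to be removed first.

```shell
# Same container as before, with the volume attached via --mount instead of -v.
run_mysql_with_mount() {
  local volume="${1:-mysql_db}"

  docker run --name mysql -d \
    -p 3306:3306 \
    --mount "type=volume,source=${volume},target=/var/lib/mysql" \
    -e 'MYSQL_ROOT_PASSWORD=Db#2022PwdNP' \
    -e 'MYSQL_DATABASE=test_database' \
    -e 'MYSQL_USER=test_db_user' \
    -e 'MYSQL_PASSWORD=testN#DB2028P' \
    --restart unless-stopped \
    mysql:8
}
```

Both forms attach the same named volume; --mount is simply easier to read when a mount needs several options.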
   

To show a list of volumes:

   
	docker volume ls
   
docker volume list

To view detailed information about a volume, we can issue the following command:

   
	docker volume inspect mysql_db
   

The output looks like this:

   
[
    {
        "CreatedAt": "2022-08-22T11:34:20Z",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/mysql_db/_data",
        "Name": "mysql_db",
        "Options": {},
        "Scope": "local"
    }
]
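
When only one field of this output is needed, docker volume inspect accepts a --format template. As a small sketch, the hypothetical helper below prints just the Mountpoint shown in the JSON above.

```shell
# Print only the host path where a volume's data is stored.
volume_mountpoint() {
  local volume="${1:?usage: volume_mountpoint VOLUME}"

  docker volume inspect --format '{{ .Mountpoint }}' "$volume"
}
```

On a Linux host, volume_mountpoint mysql_db prints the path under /var/lib/docker/volumes; on Docker Desktop that path lives inside Docker's virtual machine rather than on the host filesystem.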
   

Issue the following command to open a MySQL client session inside the container:

   
	docker exec -it mysql mysql -u test_db_user test_database -p
   

Create a table called users:

   
CREATE TABLE `users` (
  `user_id` varchar(36),
  `first_name` varchar(64) NOT null,
  `last_name` varchar(64) NOT null,
  `email` varchar(48) UNIQUE NOT null,
  `password` varchar(128) NOT null,
  `added_on` datetime DEFAULT NOW(),
  PRIMARY KEY (`user_id`)
);
   

Insert a row into the users table:

   

INSERT INTO `users`
(`user_id`,
`first_name`,
`last_name`,
`email`,
`password`)
VALUES
("056a3ce0-dc51-4c26-88fe-809a5a7a48b4",
"John",
"Doe",
"john.doe@nodexplained.com",
"this_is_secure_password");
   

Issue the following command to see the newly inserted row:

   
	SELECT * FROM users;
   
mysql container - select query from table users

Now, let's stop and remove the MySQL container:

   
	docker stop mysql && docker rm mysql
   

Now run the docker run command with the -v mysql_db:/var/lib/mysql option again to start a new MySQL container, then check the list of tables. This time, the table we created earlier still exists.

docker volume - data persists

To remove a named volume, we first need to stop and remove the container that uses it, and then issue the following command:

   
	docker volume rm mysql_db
   

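To remove all unused volumes (on Docker Engine 23.0 and later, this removes only unused anonymous volumes unless the --all flag is added):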

   
	docker volume prune
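
Before removing or pruning volumes, it can help to check what is still using them. The two hypothetical helpers below are a sketch of that check: the first lists containers (running or stopped) that reference a given volume, using the volume filter of docker ps, and the second lists volumes that no container references, using the dangling filter of docker volume ls.

```shell
# Names of all containers, running or stopped, that use the given volume.
containers_using_volume() {
  local volume="${1:?usage: containers_using_volume VOLUME}"

  docker ps -a --filter "volume=${volume}" --format '{{ .Names }}'
}

# Volumes that are not referenced by any container.
dangling_volumes() {
  docker volume ls --filter dangling=true
}
```

If containers_using_volume mysql_db prints nothing, docker volume rm mysql_db should succeed.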
   


Note:

For databases, even though we can use volumes to persist data in a container, it is highly recommended to use a managed database service for production systems. Every major cloud provider offers one: on AWS it is RDS, on Google Cloud it is Cloud SQL, and so on. Working with data is critical and sensitive, and beyond storage there are many other database administration tasks, such as backups and restores, that a managed service makes much easier. Scaling is also easier to achieve with a managed database service.

For local development and testing, it is perfectly fine to use Docker volumes to persist data for databases and for any other services that need persistent storage. This speeds up environment setup for local development and offers additional flexibility.

In our next docker chapter, we will look into docker networking.
