AWS storage for Kubernetes performance

Do you want some form of persistence on Kubernetes? What are your options? How does it perform? In this post I go over some options on AWS, but it can be used for other platforms aswell.

Step 1: Don’t use volumes if you can

I think I have to say this. Try to not get into the state (pun intended) of having to worry about stateful applications on Kubernetes. It’s just annoying and even though there are options, it’s not that cool to figure these things out. Storage is hard. Storage for a cloud-native orchestration tool containing diverse workloads is even harder.

Best-case you work with stateless applications, cloud-native, and based on the ‘12 factor principle’ (https://12factor.net/)

So if your use-case is some wacko old-school application that needs to share a volume between 10 pods and does madness read/writes to it.. Then seriously stop and don’t use Kubernetes in the first place. Not using Kubernetes does not solve the actual problem, but at least you don’t have to figure your NFS problems out on another complex system.

However, I do believe there are use-cases.

Identifing your use-case

I believe there are 4 options;

So basically these are also the availible products.

Let’s just recap this a bit. We have storage that is basically the standard disk you have in your PC. Then we have two options for sharing files between multiple systems. It’s that simple. So why is storage difficult?

Expecting and wanting too much

Now we have this application and “I have to share these files between 100 systems”. Best-case you can use S3 to give your systems access to the files. The applications load this data or serve it (or even as CDN). Yet what happens is that often applications start to do the processing. It’s going to read the file, write others, change files and do a gazillion other actions. The question is “how”.

There is nothing wrong with “downloading” the data from S3, do some processing, and uploading it back. It does not “hurt” other systems and the processing can take place on your local storage (or EBS in that case).

Yet if you have an EFS/NFS mount, applications tend to do that processing ON that share. Continuously doing those read/writes. It’s just not made for that. NFS can work if you periodically call a file. It just does not work if you write the logs of 100 systems to 1 single file on a share. For that you just ship it into a logging solution that can run cloud-native and you are set.

Difference between EFS and S3?

It might look that EFS and S3 are somewhat similar. I do think they share the same use-case but the implementation is different. We can go into much more details, but I try to keep it high-level here: With EFS we have a file system, with S3 we have a RESTful API. We mount the EFS volume and with S3 we can programmatically get our objects.

This is also why EFS is a thing. Not all applications are made for an object-store. It tends to be that legacy wants a simple file system. This is “fine”. However, it’s keener to cause performance issues because it’s easier to do things that it should not be done on a file share.

What is a valid use-case for EFS or NFS-like systems?

The problem is that people tend to go for EFS because it’s “the” solution that can mount a single volume to many systems. So instead of making your application cloud-native, 12factor-style. They just mount a share and “yolo”.

Yes, it would work. Yes, it might be “good” for you now. However, there are so many pitfalls and issues that can and will happen. If you do something wrong in your application or the workload is increasing, you have the chance to completely fail on your storage. It’s then to slow, it has hiccups, it locks - you will cry.

Using the right tool for the job

So we can define a somewhat solid use-case for the various storage solutions. Let’s say I’m running Prometheus. It needs storage but I’m also taking a more cloud-native approach by implementing either Cortex or Thanos.

What happens is that Prometheus will process data and store this, but for a limited amount. Every 2 hours my data gets uploaded to an object-store. That’s it. For the future chain of ‘logic’, the object store data is used.

In this case, I will use EBS volumes for Prometheus and let Cortex/Thanos do the rest on S3. I even can make this setup high available without the need to have a file share between the Prometheuses. The data just gets shipped twice to the object store and de-duplicated later on.

Neat.

The single flaw in EBS

However, there is one thing that EBS cannot do. It’s not able to mount cross-zone. If I have a machine in zone A and I want EBS; the EBS volume will get created in zone A and it will be locked there. I simply cannot unmount and mount that volume to an instance in zone B.

I could “solve” this by running, for example, Prometheus in two different zones. If there is a zone outage, at least one instance is still up. The other instance will not be able to reschedule in Kubernetes because it simply cannot find a node that is in the correct zone for the volume.

Multi-zone clusters and EBS

The neat thing about Kubernetes is that you provide a workload (I.e. pods) and they get scheduled over the nodes you have. When we have a cluster that is using 3 AZ’s (availability zones) - we do however have a little problem.

If we use EBS, that pod is dead-locked to a specific set of nodes in a specific AZ. If the zone goes down, it won’t be able to schedule that pod. It will search for space in a node, which can mount the volume. Yet there are no nodes because the AZ is down.

If you use EBS and you want to be HA (high availability), you will need to have a minimum of 2 replicas with a zone anti-affinity. I.e. replica’s get scheduled in different zones. If the zone goes down, 1 replica is down, yet the other will keep working.

The trade-off with EBS and multi-zone Kubernetes

If you want to be HA, you will have to force a minimum of 2 replicas on your workloads that require a volume like EBS. This has multiple downsides:

Let’s say I run 10 pods that have a volume. 4 live in zone A, 4 in zone B, and 2 in zone C. Things are fine, but now I have to scale my cluster because my workload increases, and I need more memory. So I add extra nodes. If I want to support the 4 pods that live in Zone A, I need to add n*3 nodes, because of AZ’s.

I.e I want to add 2 nodes in zone A, I need to add 2*3, 6 nodes. 2 in A, 2 in B, and 2 nodes in zone C. Unless I’m going to specifically add nodes to a certain zone. Yet this beats the purpose of Kubernetes and perhaps poses more problems later on when a zone outage happens.

But why not use EFS then?

So EFS can mount in every AZ. Yet why I should not use it? That’s using the right tool for the job. Yes, it can mount everywhere, but it’s not made for the workload that I do. I do know that AWS advertise on IOPS and throughput for EFS, but frankly, it’s not suitable for applications or EBS-like volumes. Don’t take my word for it, let’s test it.

Testing it

So I’ve created an EC2 instance for the sake of testing. It’s a bit easier than setting up various storage controllers/drivers.

The instance I used was a 2 CPU, 4 GB memory instance. I then mounted 3 volumes.

I used the tool FIO with the following test-case:

[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=4
time_based
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16

I believe this somewhat pushes the system for the behaviour that you can expect from an application that is trying quite hard.

EBS

file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
Jobs: 4 (f=4): [m(4)][100.0%][r=18.3MiB/s,w=12.2MiB/s][r=4677,w=3122 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20470: Sat Feb  6 20:53:59 2021
   read: IOPS=562, BW=2251KiB/s (2305kB/s)(1979MiB/900001msec)
    slat (usec): min=2, max=178266, avg=1088.46, stdev=1565.14
    clat (usec): min=28, max=262712, avg=16074.01, stdev=15871.56
     lat (usec): min=398, max=269008, avg=17163.72, stdev=16870.85
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   24], 90.00th=[   41], 95.00th=[   55],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   91], 99.95th=[   96],
     | 99.99th=[  106]
   bw (  KiB/s): min=  496, max= 5648, per=24.95%, avg=2251.02, stdev=1615.05, samples=1800
   iops        : min=  124, max= 1412, avg=562.72, stdev=403.76, samples=1800
  write: IOPS=375, BW=1503KiB/s (1539kB/s)(1321MiB/900001msec)
    slat (usec): min=3, max=96415, avg=1013.27, stdev=1747.81
    clat (usec): min=5, max=258974, avg=15844.44, stdev=15836.98
     lat (usec): min=23, max=262640, avg=16858.86, stdev=16776.30
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   91], 99.95th=[   95],
     | 99.99th=[  106]
   bw (  KiB/s): min=  288, max= 3624, per=24.94%, avg=1503.06, stdev=1083.55, samples=1800
   iops        : min=   72, max=  906, avg=375.73, stdev=270.89, samples=1800
  lat (usec)   : 10=0.01%, 50=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.07%, 4=2.11%, 10=52.53%, 20=23.86%, 50=15.10%
  lat (msec)   : 100=6.29%, 250=0.02%, 500=0.01%
  cpu          : usr=0.57%, sys=3.36%, ctx=954242, majf=1, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=506556,338245,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20471: Sat Feb  6 20:53:59 2021
   read: IOPS=564, BW=2259KiB/s (2314kB/s)(1986MiB/900001msec)
    slat (usec): min=2, max=96180, avg=1087.73, stdev=1547.74
    clat (usec): min=27, max=250965, avg=16000.04, stdev=15779.10
     lat (usec): min=50, max=257360, avg=17089.02, stdev=16775.86
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   91], 99.95th=[   96],
     | 99.99th=[  107]
   bw (  KiB/s): min=  496, max= 5458, per=25.06%, avg=2261.26, stdev=1622.41, samples=1800
   iops        : min=  124, max= 1364, avg=565.18, stdev=405.62, samples=1800
  write: IOPS=377, BW=1512KiB/s (1548kB/s)(1329MiB/900001msec)
    slat (usec): min=3, max=174374, avg=1002.89, stdev=1748.38
    clat (usec): min=5, max=246378, avg=15780.95, stdev=15731.99
     lat (usec): min=19, max=252531, avg=16784.97, stdev=16661.97
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   91], 99.95th=[   96],
     | 99.99th=[  106]
   bw (  KiB/s): min=  336, max= 3912, per=25.10%, avg=1512.91, stdev=1090.43, samples=1800
   iops        : min=   84, max=  978, avg=378.11, stdev=272.58, samples=1800
  lat (usec)   : 10=0.01%, 50=0.01%, 100=0.01%, 500=0.01%, 750=0.01%
  lat (usec)   : 1000=0.01%
  lat (msec)   : 2=0.08%, 4=2.20%, 10=52.85%, 20=23.41%, 50=15.30%
  lat (msec)   : 100=6.13%, 250=0.03%, 500=0.01%
  cpu          : usr=0.53%, sys=3.40%, ctx=955398, majf=0, minf=13
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=508376,340134,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20472: Sat Feb  6 20:53:59 2021
   read: IOPS=563, BW=2255KiB/s (2309kB/s)(1982MiB/900001msec)
    slat (usec): min=2, max=36556, avg=1085.50, stdev=1544.38
    clat (usec): min=142, max=239522, avg=16009.40, stdev=15792.14
     lat (usec): min=147, max=242281, avg=17096.17, stdev=16787.27
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   81], 99.90th=[   93], 99.95th=[   97],
     | 99.99th=[  110]
   bw (  KiB/s): min=  424, max= 5680, per=24.99%, avg=2255.00, stdev=1619.12, samples=1800
   iops        : min=  106, max= 1420, avg=563.72, stdev=404.78, samples=1800
  write: IOPS=376, BW=1507KiB/s (1543kB/s)(1325MiB/900001msec)
    slat (usec): min=3, max=180505, avg=1012.37, stdev=1765.62
    clat (usec): min=5, max=242263, avg=15859.64, stdev=15842.96
     lat (usec): min=135, max=243047, avg=16873.17, stdev=16782.22
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   93], 99.95th=[   97],
     | 99.99th=[  109]
   bw (  KiB/s): min=  328, max= 3608, per=25.00%, avg=1506.78, stdev=1080.48, samples=1800
   iops        : min=   82, max=  902, avg=376.66, stdev=270.12, samples=1800
  lat (usec)   : 10=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.07%, 4=2.12%, 10=52.70%, 20=23.70%, 50=15.21%
  lat (msec)   : 100=6.16%, 250=0.04%
  cpu          : usr=0.67%, sys=3.24%, ctx=956670, majf=1, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=507450,339079,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20473: Sat Feb  6 20:53:59 2021
   read: IOPS=564, BW=2256KiB/s (2310kB/s)(1983MiB/900001msec)
    slat (usec): min=2, max=174351, avg=1086.38, stdev=1564.66
    clat (usec): min=5, max=224279, avg=16001.65, stdev=15727.92
     lat (usec): min=392, max=227137, avg=17089.26, stdev=16724.56
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   23], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   80], 99.90th=[   91], 99.95th=[   96],
     | 99.99th=[  107]
   bw (  KiB/s): min=  480, max= 5434, per=25.04%, avg=2259.20, stdev=1624.18, samples=1800
   iops        : min=  120, max= 1358, avg=564.59, stdev=406.06, samples=1800
  write: IOPS=376, BW=1506KiB/s (1542kB/s)(1323MiB/900001msec)
    slat (usec): min=3, max=100681, avg=1011.28, stdev=1735.27
    clat (usec): min=395, max=224278, avg=15876.79, stdev=15810.02
     lat (usec): min=400, max=224290, avg=16889.20, stdev=16740.12
    clat percentiles (msec):
     |  1.00th=[    4],  5.00th=[    5], 10.00th=[    6], 20.00th=[    7],
     | 30.00th=[    8], 40.00th=[    9], 50.00th=[   10], 60.00th=[   11],
     | 70.00th=[   13], 80.00th=[   24], 90.00th=[   41], 95.00th=[   54],
     | 99.00th=[   73], 99.50th=[   79], 99.90th=[   91], 99.95th=[   95],
     | 99.99th=[  107]
   bw (  KiB/s): min=  248, max= 3632, per=25.01%, avg=1507.46, stdev=1079.29, samples=1800
   iops        : min=   62, max=  908, avg=376.69, stdev=269.78, samples=1800
  lat (usec)   : 10=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec)   : 2=0.08%, 4=2.10%, 10=52.72%, 20=23.61%, 50=15.29%
  lat (msec)   : 100=6.17%, 250=0.02%
  cpu          : usr=0.62%, sys=3.29%, ctx=956696, majf=0, minf=13
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=507663,338748,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=9022KiB/s (9239kB/s), 2251KiB/s-2259KiB/s (2305kB/s-2314kB/s), io=7930MiB (8315MB), run=900001-900001msec
  WRITE: bw=6028KiB/s (6172kB/s), 1503KiB/s-1512KiB/s (1539kB/s-1548kB/s), io=5298MiB (5555MB), run=900001-900001msec

Disk stats (read/write):
  xvdb: ios=1511378/1245594, merge=0/22, ticks=1525247/4155320, in_queue=2205008, util=99.50%

EFS General Purpose

file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 4 (f=4): [f(4)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20508: Sat Feb  6 21:24:36 2021
   read: IOPS=73, BW=294KiB/s (301kB/s)(258MiB/900031msec)
    slat (usec): min=3, max=11172k, avg=8297.13, stdev=72921.08
    clat (msec): min=10, max=13528, avg=120.65, stdev=397.06
     lat (msec): min=14, max=13568, avg=128.95, stdev=415.57
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   44], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   66], 60.00th=[   70],
     | 70.00th=[   75], 80.00th=[   82], 90.00th=[   94], 95.00th=[  115],
     | 99.00th=[ 1452], 99.50th=[ 1670], 99.90th=[ 4111], 99.95th=[ 9597],
     | 99.99th=[13087]
   bw (  KiB/s): min=    7, max=  760, per=27.72%, avg=325.15, stdev=256.13, samples=1626
   iops        : min=    1, max=  190, avg=81.21, stdev=64.04, samples=1626
  write: IOPS=48, BW=195KiB/s (200kB/s)(172MiB/900031msec)
    slat (usec): min=4, max=12026k, avg=7973.68, stdev=97935.35
    clat (usec): min=9, max=13646k, avg=125701.44, stdev=456961.73
     lat (msec): min=12, max=13864, avg=133.68, stdev=478.26
    clat percentiles (msec):
     |  1.00th=[   37],  5.00th=[   44], 10.00th=[   49], 20.00th=[   55],
     | 30.00th=[   59], 40.00th=[   63], 50.00th=[   66], 60.00th=[   70],
     | 70.00th=[   75], 80.00th=[   83], 90.00th=[   95], 95.00th=[  117],
     | 99.00th=[ 1485], 99.50th=[ 1687], 99.90th=[ 9194], 99.95th=[12416],
     | 99.99th=[13489]
   bw (  KiB/s): min=    7, max=  598, per=28.13%, avg=220.00, stdev=170.06, samples=1596
   iops        : min=    1, max=  149, avg=54.91, stdev=42.52, samples=1596
  lat (usec)   : 10=0.01%
  lat (msec)   : 20=0.02%, 50=12.95%, 100=79.19%, 250=3.90%, 500=0.19%
  lat (msec)   : 750=0.25%, 1000=0.68%, 2000=2.61%, >=2000=0.21%
  cpu          : usr=0.12%, sys=0.67%, ctx=146755, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=66133,43920,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20509: Sat Feb  6 21:24:36 2021
   read: IOPS=73, BW=293KiB/s (300kB/s)(257MiB/900030msec)
    slat (usec): min=3, max=12061k, avg=8374.49, stdev=82376.05
    clat (usec): min=12, max=13553k, avg=122864.34, stdev=420015.71
     lat (msec): min=18, max=13627, avg=131.24, stdev=439.43
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   44], 10.00th=[   49], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   66], 60.00th=[   70],
     | 70.00th=[   75], 80.00th=[   82], 90.00th=[   94], 95.00th=[  117],
     | 99.00th=[ 1469], 99.50th=[ 1670], 99.90th=[ 6812], 99.95th=[ 9866],
     | 99.99th=[13355]
   bw (  KiB/s): min=    7, max=  808, per=27.44%, avg=321.84, stdev=255.40, samples=1638
   iops        : min=    1, max=  202, avg=80.42, stdev=63.81, samples=1638
  write: IOPS=48, BW=195KiB/s (200kB/s)(171MiB/900030msec)
    slat (usec): min=4, max=9339.9k, avg=7911.09, stdev=87069.71
    clat (msec): min=17, max=13737, avg=123.18, stdev=431.11
     lat (msec): min=17, max=13817, avg=131.09, stdev=451.21
    clat percentiles (msec):
     |  1.00th=[   37],  5.00th=[   45], 10.00th=[   49], 20.00th=[   55],
     | 30.00th=[   59], 40.00th=[   63], 50.00th=[   67], 60.00th=[   71],
     | 70.00th=[   77], 80.00th=[   83], 90.00th=[   95], 95.00th=[  118],
     | 99.00th=[ 1485], 99.50th=[ 1703], 99.90th=[ 7148], 99.95th=[10134],
     | 99.99th=[13489]
   bw (  KiB/s): min=    8, max=  536, per=28.49%, avg=222.78, stdev=169.51, samples=1576
   iops        : min=    2, max=  134, avg=55.69, stdev=42.38, samples=1576
  lat (usec)   : 20=0.01%
  lat (msec)   : 20=0.01%, 50=12.47%, 100=79.68%, 250=3.92%, 500=0.17%
  lat (msec)   : 750=0.28%, 1000=0.65%, 2000=2.58%, >=2000=0.26%
  cpu          : usr=0.13%, sys=0.66%, ctx=146681, majf=0, minf=8
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65869,43900,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20510: Sat Feb  6 21:24:36 2021
   read: IOPS=73, BW=293KiB/s (300kB/s)(258MiB/900016msec)
    slat (usec): min=3, max=8465.3k, avg=7962.96, stdev=39572.21
    clat (usec): min=12, max=13775k, avg=123922.26, stdev=437645.82
     lat (msec): min=18, max=13883, avg=131.89, stdev=449.52
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   44], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   66], 60.00th=[   70],
     | 70.00th=[   75], 80.00th=[   82], 90.00th=[   94], 95.00th=[  117],
     | 99.00th=[ 1435], 99.50th=[ 1636], 99.90th=[ 8490], 99.95th=[ 9866],
     | 99.99th=[13489]
   bw (  KiB/s): min=    8, max=  784, per=27.57%, avg=323.41, stdev=255.67, samples=1633
   iops        : min=    2, max=  196, avg=80.80, stdev=63.86, samples=1633
  write: IOPS=49, BW=196KiB/s (201kB/s)(172MiB/900016msec)
    slat (usec): min=4, max=11945k, avg=8458.35, stdev=123591.78
    clat (msec): min=11, max=13759, avg=120.56, stdev=395.14
     lat (msec): min=14, max=13775, avg=129.02, stdev=426.79
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   45], 10.00th=[   49], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   63], 50.00th=[   66], 60.00th=[   71],
     | 70.00th=[   75], 80.00th=[   83], 90.00th=[   95], 95.00th=[  117],
     | 99.00th=[ 1435], 99.50th=[ 1603], 99.90th=[ 3809], 99.95th=[ 9463],
     | 99.99th=[13489]
   bw (  KiB/s): min=    8, max=  609, per=28.59%, avg=223.54, stdev=170.40, samples=1580
   iops        : min=    2, max=  152, avg=55.88, stdev=42.60, samples=1580
  lat (usec)   : 20=0.01%
  lat (msec)   : 20=0.01%, 50=12.88%, 100=79.25%, 250=3.86%, 500=0.19%
  lat (msec)   : 750=0.26%, 1000=0.65%, 2000=2.68%, >=2000=0.21%
  cpu          : usr=0.13%, sys=0.64%, ctx=146791, majf=0, minf=9
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65983,44155,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20511: Sat Feb  6 21:24:36 2021
   read: IOPS=73, BW=293KiB/s (300kB/s)(258MiB/900016msec)
    slat (usec): min=3, max=11536k, avg=8309.89, stdev=78193.81
    clat (msec): min=13, max=13882, avg=121.87, stdev=424.10
     lat (msec): min=13, max=13887, avg=130.18, stdev=442.50
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   44], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   62], 50.00th=[   66], 60.00th=[   70],
     | 70.00th=[   75], 80.00th=[   82], 90.00th=[   94], 95.00th=[  117],
     | 99.00th=[ 1469], 99.50th=[ 1720], 99.90th=[ 6812], 99.95th=[10268],
     | 99.99th=[13355]
   bw (  KiB/s): min=    8, max=  753, per=27.64%, avg=324.17, stdev=255.93, samples=1630
   iops        : min=    2, max=  188, avg=80.94, stdev=63.88, samples=1630
  write: IOPS=48, BW=196KiB/s (201kB/s)(172MiB/900016msec)
    slat (usec): min=4, max=12026k, avg=7952.12, stdev=92106.64
    clat (usec): min=50, max=13783k, avg=123812.27, stdev=421857.04
     lat (msec): min=14, max=13892, avg=131.77, stdev=443.19
    clat percentiles (msec):
     |  1.00th=[   36],  5.00th=[   44], 10.00th=[   48], 20.00th=[   54],
     | 30.00th=[   58], 40.00th=[   63], 50.00th=[   67], 60.00th=[   71],
     | 70.00th=[   75], 80.00th=[   83], 90.00th=[   95], 95.00th=[  118],
     | 99.00th=[ 1502], 99.50th=[ 1687], 99.90th=[ 6477], 99.95th=[ 9731],
     | 99.99th=[13355]
   bw (  KiB/s): min=    8, max=  569, per=28.35%, avg=221.71, stdev=171.23, samples=1591
   iops        : min=    2, max=  142, avg=55.42, stdev=42.80, samples=1591
  lat (usec)   : 100=0.01%
  lat (msec)   : 20=0.01%, 50=13.29%, 100=78.87%, 250=3.89%, 500=0.17%
  lat (msec)   : 750=0.24%, 1000=0.69%, 2000=2.61%, >=2000=0.22%
  cpu          : usr=0.12%, sys=0.65%, ctx=146572, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65972,44098,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=1173KiB/s (1201kB/s), 293KiB/s-294KiB/s (300kB/s-301kB/s), io=1031MiB (1081MB), run=900016-900031msec
  WRITE: bw=783KiB/s (801kB/s), 195KiB/s-196KiB/s (200kB/s-201kB/s), io=688MiB (721MB), run=900016-900031msec

EFS Max I/O

file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
fio: native_fallocate call failed: Operation not supported

Jobs: 4 (f=4): [f(4)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20617: Sat Feb  6 22:33:00 2021
   read: IOPS=44, BW=177KiB/s (181kB/s)(155MiB/900008msec)
    slat (usec): min=3, max=1025.5k, avg=14004.16, stdev=55094.99
    clat (msec): min=11, max=15599, avg=203.57, stdev=723.39
     lat (msec): min=14, max=16181, avg=217.57, stdev=767.70
    clat percentiles (msec):
     |  1.00th=[   55],  5.00th=[   66], 10.00th=[   73], 20.00th=[   82],
     | 30.00th=[   88], 40.00th=[   93], 50.00th=[   99], 60.00th=[  104],
     | 70.00th=[  110], 80.00th=[  118], 90.00th=[  131], 95.00th=[  144],
     | 99.00th=[ 4933], 99.50th=[ 5470], 99.90th=[ 6275], 99.95th=[ 6745],
     | 99.99th=[15368]
   bw (  KiB/s): min=    7, max=  544, per=30.98%, avg=218.10, stdev=177.46, samples=1458
   iops        : min=    1, max=  136, avg=54.44, stdev=44.36, samples=1458
  write: IOPS=29, BW=118KiB/s (121kB/s)(104MiB/900008msec)
    slat (usec): min=5, max=11845k, avg=12893.30, stdev=92058.77
    clat (usec): min=6, max=15754k, avg=203521.09, stdev=710834.42
     lat (msec): min=11, max=15754, avg=216.42, stdev=755.18
    clat percentiles (msec):
     |  1.00th=[   55],  5.00th=[   67], 10.00th=[   74], 20.00th=[   83],
     | 30.00th=[   88], 40.00th=[   94], 50.00th=[  100], 60.00th=[  105],
     | 70.00th=[  111], 80.00th=[  120], 90.00th=[  132], 95.00th=[  146],
     | 99.00th=[ 4866], 99.50th=[ 5403], 99.90th=[ 6477], 99.95th=[ 6745],
     | 99.99th=[13624]
   bw (  KiB/s): min=    7, max=  392, per=34.45%, avg=161.90, stdev=115.63, samples=1310
   iops        : min=    1, max=   98, avg=40.38, stdev=28.91, samples=1310
  lat (usec)   : 10=0.01%
  lat (msec)   : 20=0.01%, 50=0.46%, 100=52.90%, 250=44.20%, 500=0.05%
  lat (msec)   : 750=0.03%, 1000=0.04%, 2000=0.14%, >=2000=2.17%
  cpu          : usr=0.06%, sys=0.93%, ctx=94550, majf=0, minf=10
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39782,26537,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20618: Sat Feb  6 22:33:00 2021
   read: IOPS=43, BW=175KiB/s (180kB/s)(154MiB/900004msec)
    slat (usec): min=3, max=1121.0k, avg=14131.26, stdev=57098.32
    clat (msec): min=8, max=17004, avg=201.05, stdev=724.28
     lat (msec): min=10, max=17341, avg=215.18, stdev=770.32
    clat percentiles (msec):
     |  1.00th=[   55],  5.00th=[   67], 10.00th=[   73], 20.00th=[   82],
     | 30.00th=[   88], 40.00th=[   93], 50.00th=[   99], 60.00th=[  104],
     | 70.00th=[  110], 80.00th=[  118], 90.00th=[  131], 95.00th=[  146],
     | 99.00th=[ 5000], 99.50th=[ 5604], 99.90th=[ 6745], 99.95th=[ 7282],
     | 99.99th=[15368]
   bw (  KiB/s): min=    8, max=  521, per=31.47%, avg=221.56, stdev=175.76, samples=1425
   iops        : min=    2, max=  130, avg=55.39, stdev=43.94, samples=1425
  write: IOPS=29, BW=117KiB/s (120kB/s)(103MiB/900004msec)
    slat (usec): min=5, max=11488k, avg=12939.67, stdev=90499.57
    clat (usec): min=7, max=16864k, avg=210838.70, stdev=765745.75
     lat (msec): min=8, max=17197, avg=223.78, stdev=810.75
    clat percentiles (msec):
     |  1.00th=[   56],  5.00th=[   68], 10.00th=[   74], 20.00th=[   83],
     | 30.00th=[   89], 40.00th=[   94], 50.00th=[  100], 60.00th=[  106],
     | 70.00th=[  112], 80.00th=[  121], 90.00th=[  133], 95.00th=[  148],
     | 99.00th=[ 5134], 99.50th=[ 5738], 99.90th=[ 7215], 99.95th=[ 7752],
     | 99.99th=[16442]
   bw (  KiB/s): min=    8, max=  376, per=34.02%, avg=159.88, stdev=114.79, samples=1320
   iops        : min=    2, max=   94, avg=39.97, stdev=28.70, samples=1320
  lat (usec)   : 10=0.01%
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.47%, 100=52.07%, 250=45.09%
  lat (msec)   : 500=0.05%, 750=0.03%, 1000=0.03%, 2000=0.15%, >=2000=2.10%
  cpu          : usr=0.06%, sys=0.90%, ctx=94572, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39474,26387,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20619: Sat Feb  6 22:33:00 2021
   read: IOPS=44, BW=176KiB/s (181kB/s)(155MiB/900007msec)
    slat (usec): min=3, max=11683k, avg=14867.35, stdev=82362.34
    clat (msec): min=11, max=16858, avg=204.65, stdev=715.21
     lat (msec): min=11, max=17187, avg=219.52, stdev=765.14
    clat percentiles (msec):
     |  1.00th=[   54],  5.00th=[   67], 10.00th=[   73], 20.00th=[   82],
     | 30.00th=[   88], 40.00th=[   93], 50.00th=[   99], 60.00th=[  104],
     | 70.00th=[  111], 80.00th=[  118], 90.00th=[  132], 95.00th=[  146],
     | 99.00th=[ 4732], 99.50th=[ 5537], 99.90th=[ 6611], 99.95th=[ 6879],
     | 99.99th=[14429]
   bw (  KiB/s): min=    8, max=  520, per=30.48%, avg=214.60, stdev=177.32, samples=1479
   iops        : min=    2, max=  130, avg=53.65, stdev=44.33, samples=1479
  write: IOPS=29, BW=117KiB/s (120kB/s)(103MiB/900007msec)
    slat (usec): min=4, max=1052.3k, avg=11700.04, stdev=51144.23
    clat (usec): min=7, max=16668k, avg=203458.32, stdev=725566.69
     lat (msec): min=10, max=17344, avg=215.16, stdev=761.82
    clat percentiles (msec):
     |  1.00th=[   56],  5.00th=[   67], 10.00th=[   73], 20.00th=[   83],
     | 30.00th=[   88], 40.00th=[   94], 50.00th=[  100], 60.00th=[  105],
     | 70.00th=[  111], 80.00th=[  120], 90.00th=[  132], 95.00th=[  146],
     | 99.00th=[ 4866], 99.50th=[ 5537], 99.90th=[ 6477], 99.95th=[ 6745],
     | 99.99th=[16576]
   bw (  KiB/s): min=    8, max=  400, per=33.97%, avg=159.68, stdev=115.75, samples=1324
   iops        : min=    2, max=  100, avg=39.92, stdev=28.94, samples=1324
  lat (usec)   : 10=0.01%
  lat (msec)   : 20=0.01%, 50=0.54%, 100=52.27%, 250=44.71%, 500=0.05%
  lat (msec)   : 750=0.04%, 1000=0.05%, 2000=0.15%, >=2000=2.20%
  cpu          : usr=0.06%, sys=1.03%, ctx=94387, majf=0, minf=11
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39683,26434,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20620: Sat Feb  6 22:33:00 2021
   read: IOPS=43, BW=176KiB/s (180kB/s)(154MiB/900003msec)
    slat (usec): min=3, max=11508k, avg=14369.45, stdev=81491.16
    clat (usec): min=7, max=17024k, avg=200537.92, stdev=722782.28
     lat (msec): min=8, max=17349, avg=214.91, stdev=771.35
    clat percentiles (msec):
     |  1.00th=[   55],  5.00th=[   67], 10.00th=[   73], 20.00th=[   82],
     | 30.00th=[   88], 40.00th=[   93], 50.00th=[   99], 60.00th=[  104],
     | 70.00th=[  110], 80.00th=[  118], 90.00th=[  131], 95.00th=[  144],
     | 99.00th=[ 4866], 99.50th=[ 5470], 99.90th=[ 6946], 99.95th=[ 7550],
     | 99.99th=[16040]
   bw (  KiB/s): min=    8, max=  561, per=30.93%, avg=217.76, stdev=176.59, samples=1452
   iops        : min=    2, max=  140, avg=54.44, stdev=44.15, samples=1452
  write: IOPS=29, BW=118KiB/s (121kB/s)(104MiB/900003msec)
    slat (usec): min=4, max=1116.2k, avg=12454.71, stdev=55514.46
    clat (msec): min=19, max=16182, avg=209.52, stdev=735.52
     lat (msec): min=19, max=16427, avg=221.97, stdev=775.08
    clat percentiles (msec):
     |  1.00th=[   54],  5.00th=[   67], 10.00th=[   74], 20.00th=[   82],
     | 30.00th=[   89], 40.00th=[   94], 50.00th=[  100], 60.00th=[  105],
     | 70.00th=[  112], 80.00th=[  120], 90.00th=[  133], 95.00th=[  148],
     | 99.00th=[ 4933], 99.50th=[ 5604], 99.90th=[ 6812], 99.95th=[ 7483],
     | 99.99th=[15234]
   bw (  KiB/s): min=    8, max=  424, per=34.20%, avg=160.76, stdev=116.03, samples=1323
   iops        : min=    2, max=  106, avg=40.19, stdev=29.01, samples=1323
  lat (usec)   : 10=0.01%
  lat (msec)   : 10=0.01%, 20=0.01%, 50=0.45%, 100=52.41%, 250=44.69%
  lat (msec)   : 500=0.06%, 750=0.04%, 1000=0.05%, 2000=0.14%, >=2000=2.15%
  cpu          : usr=0.05%, sys=1.07%, ctx=94392, majf=0, minf=13
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39534,26590,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
   READ: bw=704KiB/s (721kB/s), 175KiB/s-177KiB/s (180kB/s-181kB/s), io=619MiB (649MB), run=900003-900008msec
  WRITE: bw=471KiB/s (482kB/s), 117KiB/s-118KiB/s (120kB/s-121kB/s), io=414MiB (434MB), run=900003-900008msec

Results

The workload of EFS looks like this, where the first part of the timeline is the General Purpose and the second one is the Max I/O.

efs

Now if we get the IOPS from both EFS types:

GP:

Reads:
iops        : min=    1, max=  190, avg=81.21, stdev=64.04, samples=1626
iops        : min=    1, max=  202, avg=80.42, stdev=63.81, samples=1638
iops        : min=    2, max=  196, avg=80.80, stdev=63.86, samples=1633
iops        : min=    2, max=  188, avg=80.94, stdev=63.88, samples=1630

Writes:

iops        : min=    1, max=  149, avg=54.91, stdev=42.52, samples=1596
iops        : min=    2, max=  134, avg=55.69, stdev=42.38, samples=1576
iops        : min=    2, max=  152, avg=55.88, stdev=42.60, samples=1580
iops        : min=    2, max=  142, avg=55.42, stdev=42.80, samples=1591

Max I/O:

Reads:
iops        : min=    1, max=  136, avg=54.44, stdev=44.36, samples=1458
iops        : min=    2, max=  130, avg=55.39, stdev=43.94, samples=1425
iops        : min=    2, max=  130, avg=53.65, stdev=44.33, samples=1479
iops        : min=    2, max=  140, avg=54.44, stdev=44.15, samples=1452

Writes:
iops        : min=    1, max=   98, avg=40.38, stdev=28.91, samples=1310
iops        : min=    2, max=   94, avg=39.97, stdev=28.70, samples=1320
iops        : min=    2, max=  100, avg=39.92, stdev=28.94, samples=1324
iops        : min=    2, max=  106, avg=40.19, stdev=29.01, samples=1323

I provisioned 4MB/s specifically because of 10GB with a provision of 4MB/s costs about 30$ p/m. Going higher in MB/s is quite expensive. I felt this was a “sane” value. Funny thing though, my more expensive setup was worse than my first run with just the General Purpose one.

Now I did invest some time to comprehend the EFS solution regarding hard & soft limits, pricing, various performance types, and the throughput. Yet honestly, I have no idea why this happened. My best bet is that the GP used a freebie on throughput.

If you take a look again at the previous screenshot, you can see that the GP was quite fast on uploading the 10GB file, while the Max I/O took a while (in line with 4MB/s):

Anyhow, if we look at EBS:

Reads:

iops        : min=  124, max= 1412, avg=562.72, stdev=403.76, samples=1800
iops        : min=  124, max= 1364, avg=565.18, stdev=405.62, samples=1800
iops        : min=  106, max= 1420, avg=563.72, stdev=404.78, samples=1800
iops        : min=  120, max= 1358, avg=564.59, stdev=406.06, samples=1800

Writes:

iops        : min=   72, max=  906, avg=375.73, stdev=270.89, samples=1800
iops        : min=   84, max=  978, avg=378.11, stdev=272.58, samples=1800
iops        : min=   82, max=  902, avg=376.66, stdev=270.12, samples=1800
iops        : min=   62, max=  908, avg=376.69, stdev=269.78, samples=1800

These are just sane values we can work with for most use-cases. Perhaps we have to buff EBS a bit more (which is possible) for higher performance, but the base-line is solid.

Stability

The thing the tests did not show was stability on the performance and how the systems deals with its storage. To give you an example, if I would run the read/write test and try to do something on the file system, it either locks out or is terrible slow.

[email protected]:/efs# time ls -lah
total 11G
drwxr-xr-x  2 root root 6.0K Feb  6 21:07 .
drwxr-xr-x 25 root root 4.0K Feb  6 20:15 ..
-rw-r--r--  1 root root  10G Feb  6 21:19 fio-rand-RW

real    0m11.318s
user    0m0.000s
sys 0m0.036s

Yes, that took 11.3 seconds.

Also, a funny thing, when I placed the 10GB file and did a list in my folder, it took the time of uploading the file (about 30-40 minutes) before my list command was completed.

Obviously, I’m here “abusing” EFS, but I really wanted to show you why it’s not suitable for applications.

Use-cases for EFS

So it does have use-cases, for instance in a CMS to include documents and other files. I guess various SAP solutions can use it. It can be really good on that. It just scales with how much data you have on it and you can provision throughput based on what you want. It allows for many hosts to mount the volume and has quite some read/write power. Roughly 35.000 actions per second.

Yet this is just not great for workloads you expect on Kubernetes. If EBS is not an option and S3 is not supported, I assume you can go for EFS if your workload does not go mental on it. I.e. if you store artifacts on it, fine. If you use it to process data ON EFS, I would say: nope.

EFS is a can of worms though.

When I was fiddling around, I felt I have to address this. I’m just going to post some quotes from the AWS website.

Amazon EFS offers a Standard and an Infrequent Access storage class. The Standard storage class is designed for active file system workloads and you pay only for the file system storage you use per month.

Enable Lifecycle Management when your file system contains files that are not accessed every day to reduce your storage costs

Files smaller than 128 KiB are not eligible for Lifecycle Management and will always be stored on Amazon EFS Standard storage class.

When reading from or writing to Amazon EFS IA, your first-byte latency is higher than that of Amazon EFS Standard.

Throughput of bursting mode file systems scales linearly with the amount of data stored. If you need more throughput than you can achieve with your amount of data stored, you can configure Provisioned Throughput.

EFS supports one to thousands of Amazon EC2 instances connecting to a file system concurrently

Amazon EFS’s distributed design avoids the bottlenecks and constraints inherent to traditional file servers

This distributed data storage design means that multi-threaded applications, and applications that concurrently access data from multiple Amazon EC2 instances can drive substantial levels of aggregate throughput and IOPS

Due to this per-operation latency, overall throughput generally increases as the average I/O size increases, since the overhead is amortized over a larger amount of data

“Max I/O” performance mode is optimized for applications where tens, hundreds, or thousands of EC2 instances are accessing the file system

With bursting mode, the default throughput mode for Amazon EFS file systems, the throughput available to a file system scales as a file system grows.

Also, because many workloads are read-heavy, read operations are metered at a 1:3 ratio to other NFS operations (like write).

All file systems deliver a consistent baseline performance of 50 MB/s per TB of Standard class storage

All file systems (regardless of size) can burst to 100 MB/s,

File systems with more than 1TB of Standard class storage can burst to 100 MB/s per TB

Since read operations are metered at a 1:3 ratio, you can drive up to 300 MiBs/s per TiB of read throughput.

Provisioned Throughput also includes 50 KB/s per GB (or 1 MB/s per 20 GB) of throughput in the price of Standard storage.

and there are more…

mindblown

Look, I get it. Storage is hard. AWS tries to give options to users but also requires $$$ depending on what your requirements are. The problem is however, it’s getting too many variables. It’s getting hard to understand what is happening, why that’s happening, and how much that will cost you.

So don’t get me wrong, I do think AWS made something really awesome with EFS (if you use it correctly) but its setup and billing model is an abomination.

Furthermore, I see quite some statements regarding IOPS and throughput but the reality is different depending on your use-case. The test I did with random read/writes gave about 50 IOPS for both reads and writes on average. So, 100 IOPS to make it easier. AWS states:

“In General Purpose mode, there is a limit of 35,000 file operations per second. Operations that read data or metadata consume one file operation, operations that write data or update metadata consume five file operations.”

Talking about that can of worms again, but let’s continue:

“This means that a file system can support 35,000 read operations per second, or 7,000 write operations, or some combination of the two. For example, 20,000 read operations and 3,000 write operations (20,000 reads x 1 file operation per read + 3,000 writes x 5 file operations per write = 35,000 file operations).”

Well, that was not really in line with my test. So after searching and searching I could not find exact numbers on IOPS and how this was calculated. Then I found this slide:

aws-slide

So I believe these “35,000” file operations are based on concurrency. Multiple hosts asking for “some file”. Often this is not the case. Therefore I believe the base-line IOPS for EFS is 100.

What if you want good storage for multi-AZ?

Now we know that EFS is not a solid option, unless we change the behavior of our application, we can think about a solution for having EBS-like storage but for multi-AZ.

The fact is, there is not an AWS solution for this. No storage option does not focus on RWX type of “file sharing” but rather plain cool EBS that can be mounted in every AZ.

If you do not want the trade-off that I discussed before, there is only one other solution and that is creating your own storage stack :)

My two favorites are:

Longhorn: https://github.com/longhorn/longhorn and Rook/Ceph: https://github.com/rook/rook

It’s enterprise-grade distributed storage with no single point of failure, cloud-native and it’s distributed in a way that it’s like EBS, without the zone limitation.

Now, this does add complexity and something you have to manage. Yet if you want to run your clusters on 3 AZ’s, have workloads that can use simple block storage for processing, and/or don’t want to run inefficiently on Kubernetes: This might be just it.

Perhaps in a follow-up, I’ll do a deeper dive into these solutions.

Recap

If you have certain requirements, be sure that the storage you pick can support that. Also, make sure you keep your requirements in line. Don’t abuse storage for your requirements. If you need RWX volumes and EFS is your only option, perhaps consider rewriting your application to modern standards.

Multi-zone mounts are not supported by EBS, which is really unfortunate.

My take on EBS multi-zone support

I believe AWS makes a lot of money on EFS. Legacy applications need to use it if they are pushed into the cloud (I’m watching you SAP). EFS does have legit use-cases but I strongly believe it’s often used because there is no alternative and the only solution to make something work. Therefore the incentive to make something like EBS for multi-zones is gone.

0,30$ per GB stored and 6$ per MB/s provisioned per month is also no incentive to create something else.

For reference, EBS costs $0.08/GB-month, 125 MB/s free and $0.04/provisioned MB/s-month over 125. That includes the up to 3,000 IOPS standard.

comments powered by Disqus