AWS storage for Kubernetes performance
Do you want some form of persistence on Kubernetes? What are your options, and how do they perform? In this post I go over some options on AWS, but the reasoning applies to other platforms as well.
Step 1: Don't use volumes if you can
I think this has to be said. Try not to get into the state (pun intended) of having to worry about stateful applications on Kubernetes. Even though there are options, figuring them out is rarely fun. Storage is hard. Storage for a cloud-native orchestration tool running diverse workloads is even harder.
Best-case, you work with stateless, cloud-native applications built on the twelve-factor principles (https://12factor.net/).
So if your use-case is some old-school application that needs to share a volume between 10 pods and hammers it with reads and writes: seriously, stop and don't use Kubernetes in the first place. Not using Kubernetes does not solve the actual problem, but at least you don't have to figure out your NFS problems on yet another complex system.
However, I do believe there are use-cases.
Identifying your use-case
I believe there are 4 options:
- Storage for "applications"
- Storage for "media" and cloud-native solutions
- The "I have legacy stuff and I'm just doing something, or I have no idea what I'm doing" storage
- The "I actually have a decent use-case and I'm using this" storage
So basically, these map to the available products:
- EBS: just good storage, in both IOPS and throughput. It can be mounted to a single system. In fact, you can treat it as a regular HDD (or better yet, SSD).
- S3, object storage. Great for hosting "media". Either you use it as a CDN or do something smart with it in your application.
- EFS: an NFS file share on steroids. Use-cases 3 and 4 tend to end up here.
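To make the EBS option concrete: on Kubernetes you would claim such a volume through a PersistentVolumeClaim. The sketch below assumes a cluster (e.g. EKS) with a `gp2` storage class backed by the EBS provisioner; the claim name and size are placeholders.

```yaml
# Hypothetical claim; the storage class name depends on your cluster setup.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: prometheus-data
spec:
  accessModes:
    - ReadWriteOnce     # EBS: mountable by a single node at a time
  storageClassName: gp2
  resources:
    requests:
      storage: 30Gi
```

Note the `ReadWriteOnce` access mode - this is exactly the "mounted to a single system" property of EBS.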
Let's recap. We have storage that is basically the standard disk in your PC, and we have two options for sharing files between multiple systems. It's that simple. So why is storage difficult?
Expecting and wanting too much
Now we have this application and "I have to share these files between 100 systems". Best-case, you can use S3 to give your systems access to the files. The applications load this data or serve it (even as a CDN). Yet what often happens is that applications start doing the processing themselves: reading a file, writing others, changing files, and doing a gazillion other actions. The question is "how".
There is nothing wrong with downloading the data from S3, doing some processing, and uploading it back. It does not hurt other systems, and the processing takes place on your local storage (or EBS, in this case).
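A minimal sketch of that "download, process, upload" pattern, assuming a boto3-style S3 client (the function, bucket, and key names here are illustrative, not from any particular library):

```python
# Sketch of the "download, process, upload" pattern: all heavy I/O
# happens on local (EBS-backed) disk, not on a shared file system.
# `s3` is anything with boto3-style download_file/upload_file methods.
import os
import tempfile

def process_object(s3, bucket, key, transform):
    """Download an object, process it locally, upload the result back."""
    with tempfile.TemporaryDirectory() as workdir:
        local_in = os.path.join(workdir, "input")
        local_out = os.path.join(workdir, "output")
        s3.download_file(bucket, key, local_in)
        with open(local_in, "rb") as f:
            data = transform(f.read())   # processing on local storage
        with open(local_out, "wb") as f:
            f.write(data)
        s3.upload_file(local_out, bucket, key + ".processed")
```

No other system is affected while the processing runs; only the final result goes back to the shared store.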
Yet if you have an EFS/NFS mount, applications tend to do that processing ON the share, continuously doing those reads/writes. It's just not made for that. NFS can work if you periodically read a file. It does not work if you write the logs of 100 systems to 1 single file on a share. For that, you ship the logs into a logging solution that can run cloud-native and you are set.
Difference between EFS and S3?
It might look like EFS and S3 are somewhat similar. I do think they share the same use-case, but the implementation is different. We could go into much more detail, but I'll keep it high-level here: with EFS we have a file system, with S3 we have a RESTful API. We mount an EFS volume; with S3 we programmatically get our objects.
This is also why EFS is a thing. Not all applications are made for an object store. Legacy tends to want a simple file system. This is "fine". However, it's more likely to cause performance issues, because a file share makes it easy to do things that should not be done on one.
What is a valid use-case for EFS or NFS-like systems?
- Probably not your use-case.
- Not for applications
- Unlikely for Kubernetes workloads
The problem is that people tend to go for EFS because it's "the" solution that can mount a single volume to many systems. So instead of making their application cloud-native, twelve-factor-style, they just mount a share and "yolo".
Yes, it would work. Yes, it might be "good" for you now. However, there are so many pitfalls and issues that can and will happen. If you do something wrong in your application, or the workload increases, you can completely fail on your storage. It's then too slow, it has hiccups, it locks - you will cry.
Using the right tool for the job
So we can define a somewhat solid use-case for each storage solution. Let's say I'm running Prometheus. It needs storage, but I'm also taking a more cloud-native approach by implementing either Cortex or Thanos.
What happens is that Prometheus will process data and store it, but only for a limited time. Every 2 hours my data gets uploaded to an object store. That's it. The rest of the chain works from the object-store data.
In this case, I will use EBS volumes for Prometheus and let Cortex/Thanos do the rest on S3. I can even make this setup highly available without needing a file share between the Prometheus instances. The data just gets shipped twice to the object store and de-duplicated later on.
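For illustration, pointing the Thanos sidecar at S3 is done with a small object-store config file, roughly like the sketch below (bucket name and region endpoint are placeholders, not from this setup):

```yaml
# objstore.yml passed to the Thanos sidecar - placeholder values.
type: S3
config:
  bucket: "thanos-metrics"
  endpoint: "s3.eu-west-1.amazonaws.com"
```

The sidecar then uploads the 2-hour Prometheus blocks to the bucket, which is where the de-duplication story later takes place.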
Neat.
The single flaw in EBS
However, there is one thing EBS cannot do: mount cross-zone. If I have a machine in zone A and I want EBS, the EBS volume gets created in zone A and is locked there. I simply cannot unmount that volume and mount it to an instance in zone B.
I could "solve" this by running, for example, Prometheus in two different zones. If there is a zone outage, at least one instance is still up. The other instance will not be able to reschedule in Kubernetes, because it simply cannot find a node in the correct zone for its volume.
Multi-zone clusters and EBS
The neat thing about Kubernetes is that you provide a workload (i.e. pods) and it gets scheduled over the nodes you have. When the cluster spans 3 AZs (availability zones), however, we have a little problem.
If we use EBS, that pod is locked to a specific set of nodes in a specific AZ. If the zone goes down, the pod cannot be scheduled. Kubernetes will search for space on a node that can mount the volume, yet there are no such nodes, because the AZ is down.
If you use EBS and you want to be HA (highly available), you need a minimum of 2 replicas with zone anti-affinity, i.e. replicas get scheduled in different zones. If a zone goes down, 1 replica is down, yet the other keeps working.
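Sketched as a pod spec fragment, zone anti-affinity looks roughly like this (the app label is a placeholder; the topology key is the well-known `topology.kubernetes.io/zone` node label):

```yaml
# Fragment of a Deployment/StatefulSet pod template with replicas: 2.
# Forces the 2 replicas into different availability zones.
affinity:
  podAntiAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: my-stateful-app    # placeholder label
        topologyKey: topology.kubernetes.io/zone
```

Each replica then gets its own EBS volume in its own zone, so a zone outage takes down at most one of them.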
The trade-off with EBS and multi-zone Kubernetes
If you want to be HA, you will have to force a minimum of 2 replicas on your workloads that require a volume like EBS. This has multiple downsides:
- You will need twice the compute power (CPU/mem)
- You will need a big overcommitment to be able to schedule everything
Let's say I run 10 pods that each have a volume. 4 live in zone A, 4 in zone B, and 2 in zone C. Things are fine, but now my workload increases and I need more memory, so I have to scale my cluster by adding extra nodes. If I want to support the 4 pods in zone A, I need to add n*3 nodes, because of the AZs.
I.e. if I want to add 2 nodes in zone A, I need to add 2*3 = 6 nodes: 2 in A, 2 in B, and 2 in zone C - unless I specifically add nodes to a certain zone. Yet that defeats the purpose of Kubernetes and perhaps poses more problems later on, when a zone outage happens.
But why not use EFS then?
So EFS can mount in every AZ. Why shouldn't I use it, then? That's "using the right tool for the job" again. Yes, it can mount everywhere, but it's not made for the workload that I run. I know AWS advertises IOPS and throughput for EFS, but frankly, it's not suitable for applications or as an EBS-like volume. Don't take my word for it; let's test it.
Testing it
So I've created an EC2 instance for the sake of testing. It's a bit easier than setting up the various storage controllers/drivers.
The instance I used was a 2 CPU, 4 GB memory instance. I then mounted 3 volumes.
- 1 EBS volume of 30GB
- 1 EFS General Purpose volume
- 1 EFS Max I/O volume with a Provisioned Throughput of 4MB/s
I used the tool fio with the following job file:
[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=4
time_based
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16
I believe this somewhat pushes the system toward the behaviour you can expect from an application that is trying quite hard.
EBS
file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
Jobs: 4 (f=4): [m(4)][100.0%][r=18.3MiB/s,w=12.2MiB/s][r=4677,w=3122 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20470: Sat Feb 6 20:53:59 2021
  read: IOPS=562, BW=2251KiB/s (2305kB/s)(1979MiB/900001msec)
   slat (usec): min=2, max=178266, avg=1088.46, stdev=1565.14
   clat (usec): min=28, max=262712, avg=16074.01, stdev=15871.56
    lat (usec): min=398, max=269008, avg=17163.72, stdev=16870.85
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 24], 90.00th=[ 41], 95.00th=[ 55],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 91], 99.95th=[ 96],
    | 99.99th=[ 106]
   bw ( KiB/s): min= 496, max= 5648, per=24.95%, avg=2251.02, stdev=1615.05, samples=1800
   iops : min= 124, max= 1412, avg=562.72, stdev=403.76, samples=1800
  write: IOPS=375, BW=1503KiB/s (1539kB/s)(1321MiB/900001msec)
   slat (usec): min=3, max=96415, avg=1013.27, stdev=1747.81
   clat (usec): min=5, max=258974, avg=15844.44, stdev=15836.98
    lat (usec): min=23, max=262640, avg=16858.86, stdev=16776.30
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 91], 99.95th=[ 95],
    | 99.99th=[ 106]
   bw ( KiB/s): min= 288, max= 3624, per=24.94%, avg=1503.06, stdev=1083.55, samples=1800
   iops : min= 72, max= 906, avg=375.73, stdev=270.89, samples=1800
  lat (usec) : 10=0.01%, 50=0.01%, 250=0.01%, 500=0.01%, 750=0.01%
  lat (usec) : 1000=0.01%
  lat (msec) : 2=0.07%, 4=2.11%, 10=52.53%, 20=23.86%, 50=15.10%
  lat (msec) : 100=6.29%, 250=0.02%, 500=0.01%
  cpu : usr=0.57%, sys=3.36%, ctx=954242, majf=1, minf=11
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=506556,338245,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20471: Sat Feb 6 20:53:59 2021
  read: IOPS=564, BW=2259KiB/s (2314kB/s)(1986MiB/900001msec)
   slat (usec): min=2, max=96180, avg=1087.73, stdev=1547.74
   clat (usec): min=27, max=250965, avg=16000.04, stdev=15779.10
    lat (usec): min=50, max=257360, avg=17089.02, stdev=16775.86
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 91], 99.95th=[ 96],
    | 99.99th=[ 107]
   bw ( KiB/s): min= 496, max= 5458, per=25.06%, avg=2261.26, stdev=1622.41, samples=1800
   iops : min= 124, max= 1364, avg=565.18, stdev=405.62, samples=1800
  write: IOPS=377, BW=1512KiB/s (1548kB/s)(1329MiB/900001msec)
   slat (usec): min=3, max=174374, avg=1002.89, stdev=1748.38
   clat (usec): min=5, max=246378, avg=15780.95, stdev=15731.99
    lat (usec): min=19, max=252531, avg=16784.97, stdev=16661.97
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 91], 99.95th=[ 96],
    | 99.99th=[ 106]
   bw ( KiB/s): min= 336, max= 3912, per=25.10%, avg=1512.91, stdev=1090.43, samples=1800
   iops : min= 84, max= 978, avg=378.11, stdev=272.58, samples=1800
  lat (usec) : 10=0.01%, 50=0.01%, 100=0.01%, 500=0.01%, 750=0.01%
  lat (usec) : 1000=0.01%
  lat (msec) : 2=0.08%, 4=2.20%, 10=52.85%, 20=23.41%, 50=15.30%
  lat (msec) : 100=6.13%, 250=0.03%, 500=0.01%
  cpu : usr=0.53%, sys=3.40%, ctx=955398, majf=0, minf=13
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=508376,340134,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20472: Sat Feb 6 20:53:59 2021
  read: IOPS=563, BW=2255KiB/s (2309kB/s)(1982MiB/900001msec)
   slat (usec): min=2, max=36556, avg=1085.50, stdev=1544.38
   clat (usec): min=142, max=239522, avg=16009.40, stdev=15792.14
    lat (usec): min=147, max=242281, avg=17096.17, stdev=16787.27
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 81], 99.90th=[ 93], 99.95th=[ 97],
    | 99.99th=[ 110]
   bw ( KiB/s): min= 424, max= 5680, per=24.99%, avg=2255.00, stdev=1619.12, samples=1800
   iops : min= 106, max= 1420, avg=563.72, stdev=404.78, samples=1800
  write: IOPS=376, BW=1507KiB/s (1543kB/s)(1325MiB/900001msec)
   slat (usec): min=3, max=180505, avg=1012.37, stdev=1765.62
   clat (usec): min=5, max=242263, avg=15859.64, stdev=15842.96
    lat (usec): min=135, max=243047, avg=16873.17, stdev=16782.22
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 93], 99.95th=[ 97],
    | 99.99th=[ 109]
   bw ( KiB/s): min= 328, max= 3608, per=25.00%, avg=1506.78, stdev=1080.48, samples=1800
   iops : min= 82, max= 902, avg=376.66, stdev=270.12, samples=1800
  lat (usec) : 10=0.01%, 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec) : 2=0.07%, 4=2.12%, 10=52.70%, 20=23.70%, 50=15.21%
  lat (msec) : 100=6.16%, 250=0.04%
  cpu : usr=0.67%, sys=3.24%, ctx=956670, majf=1, minf=11
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=507450,339079,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20473: Sat Feb 6 20:53:59 2021
  read: IOPS=564, BW=2256KiB/s (2310kB/s)(1983MiB/900001msec)
   slat (usec): min=2, max=174351, avg=1086.38, stdev=1564.66
   clat (usec): min=5, max=224279, avg=16001.65, stdev=15727.92
    lat (usec): min=392, max=227137, avg=17089.26, stdev=16724.56
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 23], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 80], 99.90th=[ 91], 99.95th=[ 96],
    | 99.99th=[ 107]
   bw ( KiB/s): min= 480, max= 5434, per=25.04%, avg=2259.20, stdev=1624.18, samples=1800
   iops : min= 120, max= 1358, avg=564.59, stdev=406.06, samples=1800
  write: IOPS=376, BW=1506KiB/s (1542kB/s)(1323MiB/900001msec)
   slat (usec): min=3, max=100681, avg=1011.28, stdev=1735.27
   clat (usec): min=395, max=224278, avg=15876.79, stdev=15810.02
    lat (usec): min=400, max=224290, avg=16889.20, stdev=16740.12
   clat percentiles (msec):
    | 1.00th=[ 4], 5.00th=[ 5], 10.00th=[ 6], 20.00th=[ 7],
    | 30.00th=[ 8], 40.00th=[ 9], 50.00th=[ 10], 60.00th=[ 11],
    | 70.00th=[ 13], 80.00th=[ 24], 90.00th=[ 41], 95.00th=[ 54],
    | 99.00th=[ 73], 99.50th=[ 79], 99.90th=[ 91], 99.95th=[ 95],
    | 99.99th=[ 107]
   bw ( KiB/s): min= 248, max= 3632, per=25.01%, avg=1507.46, stdev=1079.29, samples=1800
   iops : min= 62, max= 908, avg=376.69, stdev=269.78, samples=1800
  lat (usec) : 10=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
  lat (msec) : 2=0.08%, 4=2.10%, 10=52.72%, 20=23.61%, 50=15.29%
  lat (msec) : 100=6.17%, 250=0.02%
  cpu : usr=0.62%, sys=3.29%, ctx=956696, majf=0, minf=13
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=507663,338748,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  READ: bw=9022KiB/s (9239kB/s), 2251KiB/s-2259KiB/s (2305kB/s-2314kB/s), io=7930MiB (8315MB), run=900001-900001msec
  WRITE: bw=6028KiB/s (6172kB/s), 1503KiB/s-1512KiB/s (1539kB/s-1548kB/s), io=5298MiB (5555MB), run=900001-900001msec

Disk stats (read/write):
  xvdb: ios=1511378/1245594, merge=0/22, ticks=1525247/4155320, in_queue=2205008, util=99.50%
EFS General Purpose
file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
fio: native_fallocate call failed: Operation not supported
Jobs: 4 (f=4): [f(4)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20508: Sat Feb 6 21:24:36 2021
  read: IOPS=73, BW=294KiB/s (301kB/s)(258MiB/900031msec)
   slat (usec): min=3, max=11172k, avg=8297.13, stdev=72921.08
   clat (msec): min=10, max=13528, avg=120.65, stdev=397.06
    lat (msec): min=14, max=13568, avg=128.95, stdev=415.57
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 62], 50.00th=[ 66], 60.00th=[ 70],
    | 70.00th=[ 75], 80.00th=[ 82], 90.00th=[ 94], 95.00th=[ 115],
    | 99.00th=[ 1452], 99.50th=[ 1670], 99.90th=[ 4111], 99.95th=[ 9597],
    | 99.99th=[13087]
   bw ( KiB/s): min= 7, max= 760, per=27.72%, avg=325.15, stdev=256.13, samples=1626
   iops : min= 1, max= 190, avg=81.21, stdev=64.04, samples=1626
  write: IOPS=48, BW=195KiB/s (200kB/s)(172MiB/900031msec)
   slat (usec): min=4, max=12026k, avg=7973.68, stdev=97935.35
   clat (usec): min=9, max=13646k, avg=125701.44, stdev=456961.73
    lat (msec): min=12, max=13864, avg=133.68, stdev=478.26
   clat percentiles (msec):
    | 1.00th=[ 37], 5.00th=[ 44], 10.00th=[ 49], 20.00th=[ 55],
    | 30.00th=[ 59], 40.00th=[ 63], 50.00th=[ 66], 60.00th=[ 70],
    | 70.00th=[ 75], 80.00th=[ 83], 90.00th=[ 95], 95.00th=[ 117],
    | 99.00th=[ 1485], 99.50th=[ 1687], 99.90th=[ 9194], 99.95th=[12416],
    | 99.99th=[13489]
   bw ( KiB/s): min= 7, max= 598, per=28.13%, avg=220.00, stdev=170.06, samples=1596
   iops : min= 1, max= 149, avg=54.91, stdev=42.52, samples=1596
  lat (usec) : 10=0.01%
  lat (msec) : 20=0.02%, 50=12.95%, 100=79.19%, 250=3.90%, 500=0.19%
  lat (msec) : 750=0.25%, 1000=0.68%, 2000=2.61%, >=2000=0.21%
  cpu : usr=0.12%, sys=0.67%, ctx=146755, majf=0, minf=8
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=66133,43920,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20509: Sat Feb 6 21:24:36 2021
  read: IOPS=73, BW=293KiB/s (300kB/s)(257MiB/900030msec)
   slat (usec): min=3, max=12061k, avg=8374.49, stdev=82376.05
   clat (usec): min=12, max=13553k, avg=122864.34, stdev=420015.71
    lat (msec): min=18, max=13627, avg=131.24, stdev=439.43
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 44], 10.00th=[ 49], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 62], 50.00th=[ 66], 60.00th=[ 70],
    | 70.00th=[ 75], 80.00th=[ 82], 90.00th=[ 94], 95.00th=[ 117],
    | 99.00th=[ 1469], 99.50th=[ 1670], 99.90th=[ 6812], 99.95th=[ 9866],
    | 99.99th=[13355]
   bw ( KiB/s): min= 7, max= 808, per=27.44%, avg=321.84, stdev=255.40, samples=1638
   iops : min= 1, max= 202, avg=80.42, stdev=63.81, samples=1638
  write: IOPS=48, BW=195KiB/s (200kB/s)(171MiB/900030msec)
   slat (usec): min=4, max=9339.9k, avg=7911.09, stdev=87069.71
   clat (msec): min=17, max=13737, avg=123.18, stdev=431.11
    lat (msec): min=17, max=13817, avg=131.09, stdev=451.21
   clat percentiles (msec):
    | 1.00th=[ 37], 5.00th=[ 45], 10.00th=[ 49], 20.00th=[ 55],
    | 30.00th=[ 59], 40.00th=[ 63], 50.00th=[ 67], 60.00th=[ 71],
    | 70.00th=[ 77], 80.00th=[ 83], 90.00th=[ 95], 95.00th=[ 118],
    | 99.00th=[ 1485], 99.50th=[ 1703], 99.90th=[ 7148], 99.95th=[10134],
    | 99.99th=[13489]
   bw ( KiB/s): min= 8, max= 536, per=28.49%, avg=222.78, stdev=169.51, samples=1576
   iops : min= 2, max= 134, avg=55.69, stdev=42.38, samples=1576
  lat (usec) : 20=0.01%
  lat (msec) : 20=0.01%, 50=12.47%, 100=79.68%, 250=3.92%, 500=0.17%
  lat (msec) : 750=0.28%, 1000=0.65%, 2000=2.58%, >=2000=0.26%
  cpu : usr=0.13%, sys=0.66%, ctx=146681, majf=0, minf=8
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65869,43900,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20510: Sat Feb 6 21:24:36 2021
  read: IOPS=73, BW=293KiB/s (300kB/s)(258MiB/900016msec)
   slat (usec): min=3, max=8465.3k, avg=7962.96, stdev=39572.21
   clat (usec): min=12, max=13775k, avg=123922.26, stdev=437645.82
    lat (msec): min=18, max=13883, avg=131.89, stdev=449.52
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 62], 50.00th=[ 66], 60.00th=[ 70],
    | 70.00th=[ 75], 80.00th=[ 82], 90.00th=[ 94], 95.00th=[ 117],
    | 99.00th=[ 1435], 99.50th=[ 1636], 99.90th=[ 8490], 99.95th=[ 9866],
    | 99.99th=[13489]
   bw ( KiB/s): min= 8, max= 784, per=27.57%, avg=323.41, stdev=255.67, samples=1633
   iops : min= 2, max= 196, avg=80.80, stdev=63.86, samples=1633
  write: IOPS=49, BW=196KiB/s (201kB/s)(172MiB/900016msec)
   slat (usec): min=4, max=11945k, avg=8458.35, stdev=123591.78
   clat (msec): min=11, max=13759, avg=120.56, stdev=395.14
    lat (msec): min=14, max=13775, avg=129.02, stdev=426.79
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 45], 10.00th=[ 49], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 63], 50.00th=[ 66], 60.00th=[ 71],
    | 70.00th=[ 75], 80.00th=[ 83], 90.00th=[ 95], 95.00th=[ 117],
    | 99.00th=[ 1435], 99.50th=[ 1603], 99.90th=[ 3809], 99.95th=[ 9463],
    | 99.99th=[13489]
   bw ( KiB/s): min= 8, max= 609, per=28.59%, avg=223.54, stdev=170.40, samples=1580
   iops : min= 2, max= 152, avg=55.88, stdev=42.60, samples=1580
  lat (usec) : 20=0.01%
  lat (msec) : 20=0.01%, 50=12.88%, 100=79.25%, 250=3.86%, 500=0.19%
  lat (msec) : 750=0.26%, 1000=0.65%, 2000=2.68%, >=2000=0.21%
  cpu : usr=0.13%, sys=0.64%, ctx=146791, majf=0, minf=9
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65983,44155,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20511: Sat Feb 6 21:24:36 2021
  read: IOPS=73, BW=293KiB/s (300kB/s)(258MiB/900016msec)
   slat (usec): min=3, max=11536k, avg=8309.89, stdev=78193.81
   clat (msec): min=13, max=13882, avg=121.87, stdev=424.10
    lat (msec): min=13, max=13887, avg=130.18, stdev=442.50
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 62], 50.00th=[ 66], 60.00th=[ 70],
    | 70.00th=[ 75], 80.00th=[ 82], 90.00th=[ 94], 95.00th=[ 117],
    | 99.00th=[ 1469], 99.50th=[ 1720], 99.90th=[ 6812], 99.95th=[10268],
    | 99.99th=[13355]
   bw ( KiB/s): min= 8, max= 753, per=27.64%, avg=324.17, stdev=255.93, samples=1630
   iops : min= 2, max= 188, avg=80.94, stdev=63.88, samples=1630
  write: IOPS=48, BW=196KiB/s (201kB/s)(172MiB/900016msec)
   slat (usec): min=4, max=12026k, avg=7952.12, stdev=92106.64
   clat (usec): min=50, max=13783k, avg=123812.27, stdev=421857.04
    lat (msec): min=14, max=13892, avg=131.77, stdev=443.19
   clat percentiles (msec):
    | 1.00th=[ 36], 5.00th=[ 44], 10.00th=[ 48], 20.00th=[ 54],
    | 30.00th=[ 58], 40.00th=[ 63], 50.00th=[ 67], 60.00th=[ 71],
    | 70.00th=[ 75], 80.00th=[ 83], 90.00th=[ 95], 95.00th=[ 118],
    | 99.00th=[ 1502], 99.50th=[ 1687], 99.90th=[ 6477], 99.95th=[ 9731],
    | 99.99th=[13355]
   bw ( KiB/s): min= 8, max= 569, per=28.35%, avg=221.71, stdev=171.23, samples=1591
   iops : min= 2, max= 142, avg=55.42, stdev=42.80, samples=1591
  lat (usec) : 100=0.01%
  lat (msec) : 20=0.01%, 50=13.29%, 100=78.87%, 250=3.89%, 500=0.17%
  lat (msec) : 750=0.24%, 1000=0.69%, 2000=2.61%, >=2000=0.22%
  cpu : usr=0.12%, sys=0.65%, ctx=146572, majf=0, minf=11
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=65972,44098,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16

Run status group 0 (all jobs):
  READ: bw=1173KiB/s (1201kB/s), 293KiB/s-294KiB/s (300kB/s-301kB/s), io=1031MiB (1081MB), run=900016-900031msec
  WRITE: bw=783KiB/s (801kB/s), 195KiB/s-196KiB/s (200kB/s-201kB/s), io=688MiB (721MB), run=900016-900031msec
EFS Max I/O
file1: (g=0): rw=randrw, bs=(R) 4096B-4096B, (W) 4096B-4096B, (T) 4096B-4096B, ioengine=libaio, iodepth=16
...
fio-3.1
Starting 4 processes
file1: Laying out IO file (1 file / 10240MiB)
fio: native_fallocate call failed: Operation not supported

Jobs: 4 (f=4): [f(4)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 00m:00s]
file1: (groupid=0, jobs=1): err= 0: pid=20617: Sat Feb 6 22:33:00 2021
  read: IOPS=44, BW=177KiB/s (181kB/s)(155MiB/900008msec)
   slat (usec): min=3, max=1025.5k, avg=14004.16, stdev=55094.99
   clat (msec): min=11, max=15599, avg=203.57, stdev=723.39
    lat (msec): min=14, max=16181, avg=217.57, stdev=767.70
   clat percentiles (msec):
    | 1.00th=[ 55], 5.00th=[ 66], 10.00th=[ 73], 20.00th=[ 82],
    | 30.00th=[ 88], 40.00th=[ 93], 50.00th=[ 99], 60.00th=[ 104],
    | 70.00th=[ 110], 80.00th=[ 118], 90.00th=[ 131], 95.00th=[ 144],
    | 99.00th=[ 4933], 99.50th=[ 5470], 99.90th=[ 6275], 99.95th=[ 6745],
    | 99.99th=[15368]
   bw ( KiB/s): min= 7, max= 544, per=30.98%, avg=218.10, stdev=177.46, samples=1458
   iops : min= 1, max= 136, avg=54.44, stdev=44.36, samples=1458
  write: IOPS=29, BW=118KiB/s (121kB/s)(104MiB/900008msec)
   slat (usec): min=5, max=11845k, avg=12893.30, stdev=92058.77
   clat (usec): min=6, max=15754k, avg=203521.09, stdev=710834.42
    lat (msec): min=11, max=15754, avg=216.42, stdev=755.18
   clat percentiles (msec):
    | 1.00th=[ 55], 5.00th=[ 67], 10.00th=[ 74], 20.00th=[ 83],
    | 30.00th=[ 88], 40.00th=[ 94], 50.00th=[ 100], 60.00th=[ 105],
    | 70.00th=[ 111], 80.00th=[ 120], 90.00th=[ 132], 95.00th=[ 146],
    | 99.00th=[ 4866], 99.50th=[ 5403], 99.90th=[ 6477], 99.95th=[ 6745],
    | 99.99th=[13624]
   bw ( KiB/s): min= 7, max= 392, per=34.45%, avg=161.90, stdev=115.63, samples=1310
   iops : min= 1, max= 98, avg=40.38, stdev=28.91, samples=1310
  lat (usec) : 10=0.01%
  lat (msec) : 20=0.01%, 50=0.46%, 100=52.90%, 250=44.20%, 500=0.05%
  lat (msec) : 750=0.03%, 1000=0.04%, 2000=0.14%, >=2000=2.17%
  cpu : usr=0.06%, sys=0.93%, ctx=94550, majf=0, minf=10
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39782,26537,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20618: Sat Feb 6 22:33:00 2021
  read: IOPS=43, BW=175KiB/s (180kB/s)(154MiB/900004msec)
   slat (usec): min=3, max=1121.0k, avg=14131.26, stdev=57098.32
   clat (msec): min=8, max=17004, avg=201.05, stdev=724.28
    lat (msec): min=10, max=17341, avg=215.18, stdev=770.32
   clat percentiles (msec):
    | 1.00th=[ 55], 5.00th=[ 67], 10.00th=[ 73], 20.00th=[ 82],
    | 30.00th=[ 88], 40.00th=[ 93], 50.00th=[ 99], 60.00th=[ 104],
    | 70.00th=[ 110], 80.00th=[ 118], 90.00th=[ 131], 95.00th=[ 146],
    | 99.00th=[ 5000], 99.50th=[ 5604], 99.90th=[ 6745], 99.95th=[ 7282],
    | 99.99th=[15368]
   bw ( KiB/s): min= 8, max= 521, per=31.47%, avg=221.56, stdev=175.76, samples=1425
   iops : min= 2, max= 130, avg=55.39, stdev=43.94, samples=1425
  write: IOPS=29, BW=117KiB/s (120kB/s)(103MiB/900004msec)
   slat (usec): min=5, max=11488k, avg=12939.67, stdev=90499.57
   clat (usec): min=7, max=16864k, avg=210838.70, stdev=765745.75
    lat (msec): min=8, max=17197, avg=223.78, stdev=810.75
   clat percentiles (msec):
    | 1.00th=[ 56], 5.00th=[ 68], 10.00th=[ 74], 20.00th=[ 83],
    | 30.00th=[ 89], 40.00th=[ 94], 50.00th=[ 100], 60.00th=[ 106],
    | 70.00th=[ 112], 80.00th=[ 121], 90.00th=[ 133], 95.00th=[ 148],
    | 99.00th=[ 5134], 99.50th=[ 5738], 99.90th=[ 7215], 99.95th=[ 7752],
    | 99.99th=[16442]
   bw ( KiB/s): min= 8, max= 376, per=34.02%, avg=159.88, stdev=114.79, samples=1320
   iops : min= 2, max= 94, avg=39.97, stdev=28.70, samples=1320
  lat (usec) : 10=0.01%
  lat (msec) : 10=0.01%, 20=0.01%, 50=0.47%, 100=52.07%, 250=45.09%
  lat (msec) : 500=0.05%, 750=0.03%, 1000=0.03%, 2000=0.15%, >=2000=2.10%
  cpu : usr=0.06%, sys=0.90%, ctx=94572, majf=0, minf=11
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39474,26387,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20619: Sat Feb 6 22:33:00 2021
  read: IOPS=44, BW=176KiB/s (181kB/s)(155MiB/900007msec)
   slat (usec): min=3, max=11683k, avg=14867.35, stdev=82362.34
   clat (msec): min=11, max=16858, avg=204.65, stdev=715.21
    lat (msec): min=11, max=17187, avg=219.52, stdev=765.14
   clat percentiles (msec):
    | 1.00th=[ 54], 5.00th=[ 67], 10.00th=[ 73], 20.00th=[ 82],
    | 30.00th=[ 88], 40.00th=[ 93], 50.00th=[ 99], 60.00th=[ 104],
    | 70.00th=[ 111], 80.00th=[ 118], 90.00th=[ 132], 95.00th=[ 146],
    | 99.00th=[ 4732], 99.50th=[ 5537], 99.90th=[ 6611], 99.95th=[ 6879],
    | 99.99th=[14429]
   bw ( KiB/s): min= 8, max= 520, per=30.48%, avg=214.60, stdev=177.32, samples=1479
   iops : min= 2, max= 130, avg=53.65, stdev=44.33, samples=1479
  write: IOPS=29, BW=117KiB/s (120kB/s)(103MiB/900007msec)
   slat (usec): min=4, max=1052.3k, avg=11700.04, stdev=51144.23
   clat (usec): min=7, max=16668k, avg=203458.32, stdev=725566.69
    lat (msec): min=10, max=17344, avg=215.16, stdev=761.82
   clat percentiles (msec):
    | 1.00th=[ 56], 5.00th=[ 67], 10.00th=[ 73], 20.00th=[ 83],
    | 30.00th=[ 88], 40.00th=[ 94], 50.00th=[ 100], 60.00th=[ 105],
    | 70.00th=[ 111], 80.00th=[ 120], 90.00th=[ 132], 95.00th=[ 146],
    | 99.00th=[ 4866], 99.50th=[ 5537], 99.90th=[ 6477], 99.95th=[ 6745],
    | 99.99th=[16576]
   bw ( KiB/s): min= 8, max= 400, per=33.97%, avg=159.68, stdev=115.75, samples=1324
   iops : min= 2, max= 100, avg=39.92, stdev=28.94, samples=1324
  lat (usec) : 10=0.01%
  lat (msec) : 20=0.01%, 50=0.54%, 100=52.27%, 250=44.71%, 500=0.05%
  lat (msec) : 750=0.04%, 1000=0.05%, 2000=0.15%, >=2000=2.20%
  cpu : usr=0.06%, sys=1.03%, ctx=94387, majf=0, minf=11
  IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=39683,26434,0, short=0,0,0, dropped=0,0,0
     latency : target=0, window=0, percentile=100.00%, depth=16
file1: (groupid=0, jobs=1): err= 0: pid=20620: Sat Feb 6 22:33:00 2021
  read: IOPS=43, BW=176KiB/s (180kB/s)(154MiB/900003msec)
   slat (usec): min=3, max=11508k, avg=14369.45, stdev=81491.16
   clat (usec): min=7, max=17024k, avg=200537.92, stdev=722782.28
    lat (msec): min=8, max=17349, avg=214.91, stdev=771.35
   clat percentiles (msec):
    | 1.00th=[ 55], 5.00th=[ 67], 10.00th=[ 73], 20.00th=[ 82],
    | 30.00th=[ 88], 40.00th=[ 93], 50.00th=[ 99], 60.00th=[ 104],
    | 70.00th=[ 110], 80.00th=[ 118], 90.00th=[ 131], 95.00th=[ 144],
    | 99.00th=[ 4866], 99.50th=[ 5470], 99.90th=[ 6946], 99.95th=[ 7550],
    | 99.99th=[16040]
   bw ( KiB/s): min= 8, max= 561, per=30.93%, avg=217.76, stdev=176.59, samples=1452
   iops : min= 2, max= 140, avg=54.44, stdev=44.15, samples=1452
  write: IOPS=29, BW=118KiB/s (121kB/s)(104MiB/900003msec)
125 slat (usec): min=4, max=1116.2k, avg=12454.71, stdev=55514.46
126 clat (msec): min=19, max=16182, avg=209.52, stdev=735.52
127 lat (msec): min=19, max=16427, avg=221.97, stdev=775.08
128 clat percentiles (msec):
129 | 1.00th=[ 54], 5.00th=[ 67], 10.00th=[ 74], 20.00th=[ 82],
130 | 30.00th=[ 89], 40.00th=[ 94], 50.00th=[ 100], 60.00th=[ 105],
131 | 70.00th=[ 112], 80.00th=[ 120], 90.00th=[ 133], 95.00th=[ 148],
132 | 99.00th=[ 4933], 99.50th=[ 5604], 99.90th=[ 6812], 99.95th=[ 7483],
133 | 99.99th=[15234]
134 bw ( KiB/s): min= 8, max= 424, per=34.20%, avg=160.76, stdev=116.03, samples=1323
135 iops : min= 2, max= 106, avg=40.19, stdev=29.01, samples=1323
136 lat (usec) : 10=0.01%
137 lat (msec) : 10=0.01%, 20=0.01%, 50=0.45%, 100=52.41%, 250=44.69%
138 lat (msec) : 500=0.06%, 750=0.04%, 1000=0.05%, 2000=0.14%, >=2000=2.15%
139 cpu : usr=0.05%, sys=1.07%, ctx=94392, majf=0, minf=13
140 IO depths : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=100.0%, 32=0.0%, >=64=0.0%
141 submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
142 complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.1%, 32=0.0%, 64=0.0%, >=64=0.0%
143 issued rwt: total=39534,26590,0, short=0,0,0, dropped=0,0,0
144 latency : target=0, window=0, percentile=100.00%, depth=16
145
146Run status group 0 (all jobs):
147 READ: bw=704KiB/s (721kB/s), 175KiB/s-177KiB/s (180kB/s-181kB/s), io=619MiB (649MB), run=900003-900008msec
148 WRITE: bw=471KiB/s (482kB/s), 117KiB/s-118KiB/s (120kB/s-121kB/s), io=414MiB (434MB), run=900003-900008msec
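For reference, the output above came from fio's mixed random read/write workload. I'm not reproducing the exact job file here; the sketch below is a reconstruction based on fio's stock fio-rand-RW example and the numbers visible in the output (4 jobs, iodepth 16, a 900-second time-based run, a shared 10G file, roughly 60/40 read/write mix), so treat every value as an assumption:

```shell
# Hypothetical reconstruction of the fio job, NOT necessarily the exact one used.
cat > fio-rand-RW.job <<'EOF'
[global]
name=fio-rand-RW
filename=fio-rand-RW
rw=randrw
rwmixread=60
rwmixwrite=40
bs=4K
direct=0
numjobs=4
time_based=1
runtime=900

[file1]
size=10G
ioengine=libaio
iodepth=16
EOF

# Run it from the mounted volume, e.g.:
# cd /efs && fio fio-rand-RW.job
```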
Results
The EFS workload looks like this; the first part of the timeline is the General Purpose run and the second part is the Max I/O run.
(image: efs.png)
Now if we get the IOPS from both EFS types:
GP:
Reads:
iops : min= 1, max= 190, avg=81.21, stdev=64.04, samples=1626
iops : min= 1, max= 202, avg=80.42, stdev=63.81, samples=1638
iops : min= 2, max= 196, avg=80.80, stdev=63.86, samples=1633
iops : min= 2, max= 188, avg=80.94, stdev=63.88, samples=1630

Writes:
iops : min= 1, max= 149, avg=54.91, stdev=42.52, samples=1596
iops : min= 2, max= 134, avg=55.69, stdev=42.38, samples=1576
iops : min= 2, max= 152, avg=55.88, stdev=42.60, samples=1580
iops : min= 2, max= 142, avg=55.42, stdev=42.80, samples=1591
Max I/O:
Reads:
iops : min= 1, max= 136, avg=54.44, stdev=44.36, samples=1458
iops : min= 2, max= 130, avg=55.39, stdev=43.94, samples=1425
iops : min= 2, max= 130, avg=53.65, stdev=44.33, samples=1479
iops : min= 2, max= 140, avg=54.44, stdev=44.15, samples=1452

Writes:
iops : min= 1, max= 98, avg=40.38, stdev=28.91, samples=1310
iops : min= 2, max= 94, avg=39.97, stdev=28.70, samples=1320
iops : min= 2, max= 100, avg=39.92, stdev=28.94, samples=1324
iops : min= 2, max= 106, avg=40.19, stdev=29.01, samples=1323
I provisioned 4 MB/s specifically because 10GB of storage with 4 MB/s of provisioned throughput costs about $30 per month. Going higher in MB/s gets expensive quickly, so this felt like a "sane" value. The funny thing is that this more expensive setup performed worse than my first run with just the General Purpose type.
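That cost estimate is easy to sanity-check with some quick integer arithmetic (prices are assumed to be roughly $0.30 per GB-month for Standard storage and $6 per provisioned MB/s-month, both rounded and expressed in cents below):

```shell
# Back-of-the-envelope EFS cost, in cents per month (rounded, assumed prices)
storage_gb=10
provisioned_mbps=4
storage_cents=$(( storage_gb * 30 ))           # ~$0.30 per GB-month
throughput_cents=$(( provisioned_mbps * 600 )) # ~$6.00 per MB/s-month
total_cents=$(( storage_cents + throughput_cents ))
echo "total: \$$(( total_cents / 100 )) per month"   # prints: total: $27 per month
```

Note that the provisioned throughput dwarfs the storage cost: $24 of the ~$27 is throughput.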
Now, I did invest some time in understanding EFS: its hard and soft limits, pricing, the performance modes, and the throughput modes. Yet honestly, I have no idea why this happened. My best guess is that the General Purpose run benefited from free burst throughput.
If you look at the previous screenshot again, you can see that the GP file system was quite fast at ingesting the 10GB test file, while the Max I/O one took a while (in line with 4 MB/s).
Anyhow, if we look at EBS:
Reads:
iops : min= 124, max= 1412, avg=562.72, stdev=403.76, samples=1800
iops : min= 124, max= 1364, avg=565.18, stdev=405.62, samples=1800
iops : min= 106, max= 1420, avg=563.72, stdev=404.78, samples=1800
iops : min= 120, max= 1358, avg=564.59, stdev=406.06, samples=1800

Writes:
iops : min= 72, max= 906, avg=375.73, stdev=270.89, samples=1800
iops : min= 84, max= 978, avg=378.11, stdev=272.58, samples=1800
iops : min= 82, max= 902, avg=376.66, stdev=270.12, samples=1800
iops : min= 62, max= 908, avg=376.69, stdev=269.78, samples=1800
These are sane values we can work with for most use-cases. For higher performance we may have to buff EBS a bit more (which is possible), but the baseline is solid.
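Buffing EBS from Kubernetes is straightforward with the AWS EBS CSI driver: gp3 volumes let you provision IOPS and throughput independently of volume size. A sketch of such a StorageClass (the name and the numbers are illustrative, not a recommendation):

```shell
# Sketch of a "buffed" gp3 StorageClass for the AWS EBS CSI driver.
# The provisioner name ebs.csi.aws.com is the driver's; iops/throughput
# values here are just illustrative.
cat > gp3-fast.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp3-fast
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  type: gp3
  iops: "6000"       # above the 3,000 included by default
  throughput: "250"  # MB/s, above the 125 included by default
EOF
# kubectl apply -f gp3-fast.yaml
```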
Stability
What the tests did not show is how stable the performance is and how the system deals with its storage under load. To give you an example: if I run the read/write test and try to do anything else on the file system, it either locks up or is terribly slow.
root@ip-172-31-16-158:/efs# time ls -lah
total 11G
drwxr-xr-x 2 root root 6.0K Feb 6 21:07 .
drwxr-xr-x 25 root root 4.0K Feb 6 20:15 ..
-rw-r--r-- 1 root root 10G Feb 6 21:19 fio-rand-RW

real    0m11.318s
user    0m0.000s
sys     0m0.036s
Yes, that took 11.3 seconds.
Another funny thing: when I placed the 10GB file and then listed the folder, the list command did not complete until the upload itself had finished, about 30-40 minutes later.
Obviously, I'm "abusing" EFS here, but I really wanted to show you why it's not suitable for general application storage.
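In fairness, part of getting reasonable behavior out of EFS is mounting it the way AWS recommends. A sketch of an fstab entry with those NFS options (the file system ID and region are placeholders, and the option set is what AWS documents for EFS rather than anything I tuned myself):

```shell
# Hypothetical /etc/fstab entry with the NFS options AWS recommends for EFS.
# fs-12345678 and eu-west-1 are placeholders; written to a scratch file here.
echo "fs-12345678.efs.eu-west-1.amazonaws.com:/ /efs nfs4 nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2,noresvport 0 0" > fstab.example
cat fstab.example
```

The `noresvport` and `hard` options matter for reconnect behavior; the large `rsize`/`wsize` values help amortize EFS's per-operation latency over bigger I/Os.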
Use-cases for EFS
So it does have use-cases, for instance a CMS storing documents and other files; I imagine various SAP solutions can use it too. It can be really good at that. It scales with how much data you store, you can provision throughput on top of that, it allows many hosts to mount the same volume, and it has quite some read/write capacity: roughly 35,000 file operations per second.
Yet this is just not great for the workloads you expect on Kubernetes. If EBS is not an option and S3 is not supported, you can go for EFS as long as your workload does not go mental on it. Storing artifacts on it: fine. Processing data directly ON EFS: nope.
EFS is a can of worms though.
While I was fiddling around, I felt I had to address this. I'm just going to post some quotes from the AWS website.
- "Amazon EFS offers a Standard and an Infrequent Access storage class. The Standard storage class is designed for active file system workloads and you pay only for the file system storage you use per month."
- "Enable Lifecycle Management when your file system contains files that are not accessed every day to reduce your storage costs."
- "Files smaller than 128 KiB are not eligible for Lifecycle Management and will always be stored on Amazon EFS Standard storage class."
- "When reading from or writing to Amazon EFS IA, your first-byte latency is higher than that of Amazon EFS Standard."
- "Throughput of bursting mode file systems scales linearly with the amount of data stored. If you need more throughput than you can achieve with your amount of data stored, you can configure Provisioned Throughput."
- "EFS supports one to thousands of Amazon EC2 instances connecting to a file system concurrently."
- "Amazon EFS's distributed design avoids the bottlenecks and constraints inherent to traditional file servers."
- "This distributed data storage design means that multi-threaded applications, and applications that concurrently access data from multiple Amazon EC2 instances can drive substantial levels of aggregate throughput and IOPS."
- "Due to this per-operation latency, overall throughput generally increases as the average I/O size increases, since the overhead is amortized over a larger amount of data."
- "'Max I/O' performance mode is optimized for applications where tens, hundreds, or thousands of EC2 instances are accessing the file system."
- "With bursting mode, the default throughput mode for Amazon EFS file systems, the throughput available to a file system scales as a file system grows."
- "Also, because many workloads are read-heavy, read operations are metered at a 1:3 ratio to other NFS operations (like write)."
- "All file systems deliver a consistent baseline performance of 50 MB/s per TB of Standard class storage."
- "All file systems (regardless of size) can burst to 100 MB/s."
- "File systems with more than 1TB of Standard class storage can burst to 100 MB/s per TB."
- "Since read operations are metered at a 1:3 ratio, you can drive up to 300 MiBs/s per TiB of read throughput."
- "Provisioned Throughput also includes 50 KB/s per GB (or 1 MB/s per 20 GB) of throughput in the price of Standard storage."
and there are more...
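To make the bursting rules above a bit more concrete, here is the arithmetic for a hypothetical 100GB Standard-class file system, using the quoted 50 MB/s per TB baseline (which works out to roughly 50 KB/s per GB) and the 1:3 read metering:

```shell
# Baseline throughput for a hypothetical 100 GB Standard-class file system
storage_gb=100
baseline_kbps=$(( storage_gb * 50 ))   # ~50 KB/s per GB of data stored
read_kbps=$(( baseline_kbps * 3 ))     # reads metered at 1:3 vs other ops
echo "baseline: $(( baseline_kbps / 1000 )) MB/s, reads up to $(( read_kbps / 1000 )) MB/s"
# prints: baseline: 5 MB/s, reads up to 15 MB/s
```

So a 100GB file system earns a baseline of only ~5 MB/s (it can still burst to 100 MB/s, but only while it has burst credits), which fits what I saw in my tests.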
Look, I get it. Storage is hard. AWS tries to give users options, and charges more depending on your requirements. The problem, however, is that there are too many variables. It's getting hard to understand what is happening, why it's happening, and how much it will cost you.
So don't get me wrong, I do think AWS made something really awesome with EFS (if you use it correctly), but its setup and billing model are an abomination.
Furthermore, I see quite some statements regarding IOPS and throughput, but the reality differs per use-case. My random read/write test gave about 50 IOPS on average for reads and about the same for writes, so call it 100 IOPS combined. AWS states:
"In General Purpose mode, there is a limit of 35,000 file operations per second. Operations that read data or metadata consume one file operation, operations that write data or update metadata consume five file operations."
Talking about that can of worms again, but let's continue:
"This means that a file system can support 35,000 read operations per second, or 7,000 write operations, or some combination of the two. For example, 20,000 read operations and 3,000 write operations (20,000 reads x 1 file operation per read + 3,000 writes x 5 file operations per write = 35,000 file operations)."
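AWS's own example works out like this:

```shell
# AWS's example: reads consume 1 file operation, writes consume 5
reads=20000
writes=3000
file_ops=$(( reads * 1 + writes * 5 ))
echo "$file_ops file operations per second"   # prints: 35000 file operations per second
```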
Well, that was not really in line with my test. After searching and searching I could not find exact numbers on IOPS and how they are calculated. Then I found a slide.
So I believe these "35,000" file operations are based on concurrency: multiple hosts asking for "some file" at the same time. Often that is not your situation. Therefore I believe the baseline IOPS for EFS is around 100.
What if you want good storage for multi-AZ?
Now that we know EFS is not a solid option unless we change the behavior of our application, we can think about a solution that gives us EBS-like storage across multiple AZs.
The fact is, there is no AWS solution for this. There is no storage option that drops the RWX style of "file sharing" and simply offers plain EBS-like block storage that can be mounted in every AZ.
If you do not want the trade-off that I discussed before, there is only one other solution and that is creating your own storage stack :)
My two favorites are:
- Longhorn: https://github.com/longhorn/longhorn
- Rook/Ceph: https://github.com/rook/rook
Both are enterprise-grade distributed storage with no single point of failure, cloud-native, and distributed in a way that makes them behave like EBS without the zone limitation.
Now, this does add complexity and something you have to manage. Yet if you want to run your clusters across 3 AZs, have workloads that can use simple block storage for processing, and/or don't want to run inefficiently on Kubernetes, this might be just it.
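To give an impression of what "using Longhorn like EBS" looks like, here is a sketch of a Longhorn StorageClass (the class name and parameter values are illustrative; `driver.longhorn.io` is Longhorn's CSI provisioner, and check the Longhorn docs before copying this):

```shell
# Sketch of a Longhorn StorageClass with cross-node replication.
# Values are illustrative, not a recommendation.
cat > longhorn-sc.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-replicated
provisioner: driver.longhorn.io
allowVolumeExpansion: true
parameters:
  numberOfReplicas: "3"        # e.g. one replica per AZ/node
  staleReplicaTimeout: "2880"  # minutes before a failed replica is cleaned up
EOF
# kubectl apply -f longhorn-sc.yaml
```

Pods in any zone can then claim a volume from this class, and Longhorn keeps the replicas spread across nodes, which is exactly the multi-AZ behavior EBS lacks.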
Perhaps in a follow-up, I'll do a deeper dive into these solutions.
Recap
- EFS is good as a product but only suitable for certain workloads and requires A LOT of thought on how you want to use it
- S3 is not a file system, but an excellent way to store "media", backups, data-lake, etc.
- EBS is solid for storage; boot volumes, whatever database (I'm not saying you should run databases on k8s though)
If you have certain requirements, be sure the storage you pick can support them, and keep those requirements sane: don't abuse storage to meet them. If you need RWX volumes and EFS is your only option, perhaps consider rewriting your application to modern standards.
Multi-zone mounts are not supported by EBS, which is really unfortunate.
My take on EBS multi-zone support
I believe AWS makes a lot of money on EFS. Legacy applications need to use it when they are pushed into the cloud (I'm watching you, SAP). EFS does have legit use-cases, but I strongly believe it's often used because there is no alternative; it's the only way to make something work. That removes the incentive to build something like EBS for multiple zones.
At $0.30 per GB stored and $6 per provisioned MB/s per month, there is also little financial incentive to create something else.
For reference, EBS (gp3) costs $0.08/GB-month, with 125 MB/s included for free and $0.04 per provisioned MB/s-month above that. That includes the standard 3,000 IOPS.