[documentation] Add benchmarking section and update benchmark results after new run

Ciprian Dorin Craciun 2019-08-10 21:46:45 +03:00
parent 08eb9ae918
commit 12f9a2c0ec


@@ -65,62 +65,69 @@ Results
Bottom line (**even on my 6-year-old laptop**):
* under normal conditions (16 concurrent connections), you get around 72k requests / second, at about 0.4ms latency for 99% of the requests;
* under stress conditions (512 concurrent connections), you get around 74k requests / second, at about 15ms latency for 99% of the requests;
* **under extreme conditions (2048 concurrent connections), you get around 74k requests / second, at about 500ms latency for 99% of the requests (meanwhile the average is 50ms);**
* (the timeout errors are due to the fact that ``wrk`` is configured to time out after waiting only 1 second;)
* (the read errors are due to the fact that the server closes a keep-alive connection after serving 256k requests;)

.. note::

    Please note that the values under ``Thread Stats`` are reported per ``wrk`` thread.
    Therefore it is best to look at the first two values, i.e. ``Requests/sec`` and ``Transfer/sec``, which are aggregated over all threads.
    (For example, in the 16 connections run, 4 threads at ~18k requests / second each amount to the reported ~72k requests / second.)

* 16 connections / 2 server threads / 4 wrk threads: ::

    Requests/sec:  71935.39
    Transfer/sec:     29.02MB

    Running 30s test @ http://127.0.0.1:8080/
      4 threads and 16 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency   220.12us   96.77us   1.98ms   64.61%
        Req/Sec    18.08k    234.07    18.71k    82.06%
      Latency Distribution
         50%  223.00us
         75%  295.00us
         90%  342.00us
         99%  397.00us
      2165220 requests in 30.10s, 0.85GB read

* 512 connections / 2 server threads / 4 wrk threads: ::

    Requests/sec:  74050.48
    Transfer/sec:     29.87MB

    Running 30s test @ http://127.0.0.1:8080/
      4 threads and 512 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency     6.86ms    6.06ms  219.10ms   54.85%
        Req/Sec    18.64k     1.62k    36.19k    91.42%
      Latency Distribution
         50%    7.25ms
         75%   12.54ms
         90%   13.56ms
         99%   14.84ms
      2225585 requests in 30.05s, 0.88GB read
      Socket errors: connect 0, read 89, write 0, timeout 0

* 2048 connections / 2 server threads / 4 wrk threads: ::

    Requests/sec:  74714.23
    Transfer/sec:     30.14MB

    Running 30s test @ http://127.0.0.1:8080/
      4 threads and 2048 connections
      Thread Stats   Avg      Stdev     Max   +/- Stdev
        Latency    52.45ms   87.02ms  997.26ms   88.24%
        Req/Sec    18.84k     3.18k    35.31k    80.77%
      Latency Distribution
         50%   23.60ms
         75%   34.86ms
         90%  162.92ms
         99%  435.41ms
      2244296 requests in 30.04s, 0.88GB read
      Socket errors: connect 0, read 106, write 0, timeout 51

Notes
@@ -128,14 +135,16 @@ Notes
The above benchmarks were executed as follows:
* the machine was my personal laptop: 6 years old with an Intel Core i7 3667U (2 cores with 2 threads each);
* the ``kawipiko-server`` was started with ``--processes 1 --threads 2``; (i.e. 2 threads handling the requests;)
* the ``kawipiko-server`` was started with ``--archive-inmem``; (i.e. the CDB database file was preloaded into memory, thus no disk I/O;)
* the benchmarking tool was wrk_;
* both ``kawipiko-server`` and ``wrk`` tools were run on the same machine;
* both ``kawipiko-server`` and ``wrk`` tools were pinned on different physical cores;
* the benchmark was run over loopback networking (i.e. ``127.0.0.1``);
* the served file contains the content ``Hello World!``;
* the protocol was HTTP (i.e. no TLS), with keep-alive;
* see the `benchmarking section <#benchmarking>`_ for details;
@@ -283,6 +292,121 @@ Examples
Benchmarking
------------
* get the binaries (either `download <#download-binaries>`_ or `build <#build-from-sources>`_ them);
* get the ``hello-world.cdb`` (from the `examples <./examples>`__ folder inside the repository);
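* alternatively, one can build an equivalent archive locally with ``kawipiko-archiver``; the following is a minimal sketch, assuming the ``--sources`` / ``--archive`` flags (check ``kawipiko-archiver --help`` for the authoritative syntax): ::

    # (the folder layout and index file name below are only placeholders)
    mkdir ./hello-world
    printf '%s' 'Hello World!' > ./hello-world/index.txt

    kawipiko-archiver \
            --sources ./hello-world \
            --archive ./hello-world.cdb \
    #
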
Single process / single threaded
................................
* this scenario yields the "baseline performance" per core;
* execute the server (in-memory and indexed) (i.e. the "best case scenario"): ::

    kawipiko-server \
            --bind 127.0.0.1:8080 \
            --archive ./hello-world.cdb \
            --archive-inmem \
            --index-all \
            --processes 1 \
            --threads 1 \
    #

* execute the server (memory mapped) (i.e. the "recommended scenario"): ::

    kawipiko-server \
            --bind 127.0.0.1:8080 \
            --archive ./hello-world.cdb \
            --archive-mmap \
            --processes 1 \
            --threads 1 \
    #

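* optionally, sanity-check the server before benchmarking with a plain HTTP request (``curl`` is merely one option; the expected response body is the ``Hello World!`` content of the example archive): ::

    curl -v http://127.0.0.1:8080/
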
Single process / two threads
............................
* this scenario is the usual setup; configure ``--threads`` to equal the number of cores; (see the ``nproc`` sketch below;)
* execute the server (memory mapped): ::

    kawipiko-server \
            --bind 127.0.0.1:8080 \
            --archive ./hello-world.cdb \
            --archive-mmap \
            --processes 1 \
            --threads 2 \
    #

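* on other machines the core count can be queried with ``nproc`` (from GNU coreutils) and passed directly; a small sketch following the same invocation pattern as above: ::

    kawipiko-server \
            --bind 127.0.0.1:8080 \
            --archive ./hello-world.cdb \
            --archive-mmap \
            --processes 1 \
            --threads "$( nproc )" \
    #
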
Load generators
...............
* 512 concurrent connections (handled by 2 threads): ::

    wrk \
            --threads 2 \
            --connections 512 \
            --timeout 6s \
            --duration 30s \
            --latency \
            http://127.0.0.1:8080/ \
    #

* 4096 concurrent connections (handled by 4 threads): ::

    wrk \
            --threads 4 \
            --connections 4096 \
            --timeout 6s \
            --duration 30s \
            --latency \
            http://127.0.0.1:8080/ \
    #

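* note that before the 4096 connections run, the per-process file descriptor limit (which usually defaults to 1024) must be raised in the shells of both the server and ``wrk``; a sketch using the ``sh`` / ``bash`` built-in (raising it above the hard limit requires root): ::

    ulimit -n 16384
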
Take into account
.................
* the number of threads for the server plus those for ``wrk`` shouldn't be larger than the number of available cores; (or use different machines for the server and the client;)
* also take into account that by default the limit on "file descriptors" on most UNIX/Linux machines is 1024, therefore if you want to try with more than about 1000 connections, you need to raise this limit; (see below;)
* additionally, you can try to pin the server and ``wrk`` to specific cores, and increase various priorities (scheduling, I/O, etc.); (given that Intel processors feature Hyper-Threading, whose logical cores appear to the OS as individual cores, make sure to pin each process onto logical cores belonging to the same physical core; see the ``lscpu`` example at the end of this list;)
* pinning the server (cores ``0`` and ``1`` are mapped on physical core ``1``): ::

    sudo -u root -n -E -P -- \
    taskset -c 0,1 \
    nice -n -19 -- \
    ionice -c 2 -n 0 -- \
    chrt -r 10 \
    prlimit -n16384 -- \
    sudo -u "${USER}" -n -E -P -- \
    kawipiko-server \
        ... \
    #

* pinning the client (cores ``2`` and ``3`` are mapped on physical core ``2``): ::

    sudo -u root -n -E -P -- \
    taskset -c 2,3 \
    nice -n -19 -- \
    ionice -c 2 -n 0 -- \
    chrt -r 10 \
    prlimit -n16384 -- \
    sudo -u "${USER}" -n -E -P -- \
    wrk \
        ... \
    #

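* to find out which logical CPUs belong to which physical core (and thus choose the ``taskset -c`` arguments), one can use ``lscpu`` (part of ``util-linux``): ::

    lscpu --extended=CPU,CORE,SOCKET
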
Installation
============