``kawipiko`` is a **super-lightweight static HTTP (and HTTPS, H2, H3) server** written in Go, whose main purpose is to serve static content as fast as possible, and with the lowest possible resource consumption (CPU and RAM).
However "simple" doesn't imply "dumb" or "limited", instead it implies "efficient" through the removal of superfluous features, thus being inline with UNIX's old philosophy of `"do one thing and do it well" <https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well>`__.
As such ``kawipiko`` basically supports only ``GET`` requests, and does not provide features like dynamic content generation, authentication, reverse proxying, etc.; while still providing compression (``gzip``, ``zopfli``, ``brotli``), plus HTML-CSS-JS minification (TODO), without affecting its performance (due to its unique architecture as seen below).
What ``kawipiko`` does provide is something unique that no other HTTP server offers: the static content is served from a CDB_ database with almost no latency (as compared to classical static servers that still have to go through the OS via the ``open-read-close`` syscalls).
Moreover as noted earlier, the static content can be compressed (with either ``gzip``, ``zopfli`` or ``brotli``) or minified ahead of time, thus reducing not only CPU but also bandwidth and latency.
CDB_ databases are binary files that provide efficient read-only key-value lookup tables, initially used in some DNS and SMTP servers, mainly for their low overhead lookup operations, zero locking in multi-threaded / multi-process scenarios, and "atomic" multi-record updates.
For those familiar with Netlify (or competitors like CloudFlare Pages, GitHub Pages, etc.), ``kawipiko`` is a "host-it-yourself" alternative featuring:
* self-contained deployment with simple configuration; (i.e. just `fetch the executable <#installation>`__ and use the `proper flags <#kawipiko-server>`__;)
* (hopefully) extremely secure; (i.e. it doesn't launch processes, it doesn't connect to other services or databases, it doesn't open any files, etc.; basically you can easily ``chroot`` it, or containerize it as is in fashion these days;)
* highly portable, supporting at least Linux (the main development, testing and deployment platform), FreeBSD, OpenBSD, and OSX;
With regard to performance, as described in the `benchmarks section <#benchmarks>`__, ``kawipiko`` is at least on-par with NGinx, sustaining over 100K requests / second with 0.25ms latency for 99% of the requests, even on my 6-year-old laptop.
However the main advantage over NGinx is not raw performance, but deployment and configuration simplicity, plus efficient management and storage of large collections of many small files.
* ``kawipiko`` -- an all-in-one executable that bundles all functionality; (i.e. ``kawipiko server ...`` or ``kawipiko archiver ...``);
Unlike most (if not all) other servers out there, where you just point your web server at the folder holding the static website content root, ``kawipiko`` takes a radically different approach.
In order to serve the static content, one has to first "archive" the content into the CDB file through ``kawipiko-archiver``, and then one can "serve" it from the CDB file through ``kawipiko-server``.
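For example, a minimal two-step flow might look like the following sketch (the ``--sources`` and ``--archive`` flag names are assumptions based on the flag descriptions in the sections below): ::

    # archive the static content into a CDB file
    kawipiko-archiver --sources ./content --archive ./website.cdb --compress gzip

    # serve the static content from the CDB file
    kawipiko-server --archive ./website.cdb --archive-mmap --bind 127.0.0.1:8080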
This two-step approach also presents a few opportunities:
* one can decouple the "building", "testing", and "publishing" phases of a static website, by using a similar CI/CD pipeline as done for other software projects;
* one can instantaneously rollback to a previous version if the newly published one has issues;
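For example, since publishing amounts to atomically replacing a single file (see the note on the ``rename`` syscall further below), a rollback can be sketched as follows (the file names are merely illustrative): ::

    # publish the new version by atomically replacing the served archive
    cp ./website-v2.cdb ./website.cdb.tmp
    mv ./website.cdb.tmp ./website.cdb

    # roll back by re-publishing the previous version in the same way
    cp ./website-v1.cdb ./website.cdb.tmp
    mv ./website.cdb.tmp ./website.cdb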
* (insecure) HTTP/1.1 for ``--bind``, leveraging ``fasthttp`` library;
* (secure) HTTP/1.1 over TLS for ``--bind-tls``, leveraging ``fasthttp`` library;
* (insecure) HTTP/1.1 for ``--bind-2``, leveraging Go's ``net/http`` library; (not as performant as the ``fasthttp`` powered endpoint;)
* (secure) H2 or HTTP/1.1 over TLS for ``--bind-tls-2``, leveraging Go's ``net/http``; (not as performant as the ``fasthttp`` powered endpoint;)
* (secure) H3 over QUIC for ``--bind-quic``, leveraging the ``github.com/lucas-clemente/quic-go`` library; (given that H3 is still a new protocol, this must be used with caution; also, one should use the ``--http3-alt-svc <ip:port>`` flag;)
* if one uses just ``--bind-tls`` (without ``--bind-tls-2``, and without ``--http2-disable``), then the TLS endpoint is split between ``fasthttp`` for HTTP/1.1 and Go's ``net/http`` for H2;
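For example, one could enable several of these endpoints at once (the addresses are placeholders, and the ``--archive`` flag name is an assumption): ::

    kawipiko-server \
        --archive ./website.cdb --archive-mmap \
        --bind 0.0.0.0:8080 \
        --bind-tls 0.0.0.0:8443 \
        --bind-quic 0.0.0.0:8443 \
        --http3-alt-svc example.com:8443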
If TLS is enabled, these options allow one to specify the certificate to use, either as a single file (a bundle) or as separate files (the actual public certificate and the private key).
If one doesn't specify any of these options, an embedded self-signed certificate will be used. In such a case, one can choose between RSA (the ``--tls-self-rsa`` flag) or Ed25519 (the ``--tls-self-ed25519`` flag);
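A hedged example of supplying one's own certificate (the ``--tls-bundle`` flag name is an assumption based on the "single file (a bundle)" description above): ::

    kawipiko-server \
        --archive ./website.cdb \
        --bind-tls 0.0.0.0:8443 \
        --tls-bundle ./certificate.pem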
``--http1-disable``, ``--http2-disable``
Disables that particular protocol.
(It can be used only with ``--bind-tls-2``, given that ``fasthttp`` only supports HTTP/1.)
The number of processes to start, and the number of threads per process. (Given Go's concurrency model, the thread count is a soft limit of sorts, hinting to the runtime the desired parallelism level.)
It is highly recommended to use one process and as many threads as there are cores.
(However note that if using ``--archive-inmem``, then each process will allocate its own copy of the database in RAM; in such cases it is highly recommended to use ``--archive-mmap``.)
(**recommended**) The CDB file is `memory mapped <#mmap>`__, thus reading its data uses the kernel's file-system cache, as opposed to issuing ``read`` syscalls.
Before starting to serve requests, read the CDB file so that its data is buffered in the kernel's file-system cache. (This option can be used with or without ``--archive-mmap``.)
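Putting the above together, a plausible invocation for a multi-core machine might be the following (the ``--processes``, ``--threads``, and ``--archive-preload`` flag names are assumptions based on the descriptions above): ::

    kawipiko-server \
        --archive ./website.cdb \
        --archive-mmap --archive-preload \
        --bind 0.0.0.0:8080 \
        --processes 1 --threads 8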
* given the request's path, it is used to locate the corresponding resource's metadata (i.e. response headers) and data (i.e. response body) references;
by using ``--index-paths`` a RAM-based lookup table is created to eliminate a CDB read operation for this purpose; (the memory impact is proportional to the size of all resource paths combined; provided the number of resources is reasonable, say up to a couple hundred thousand, one can safely use this option;)
* based on the resource's metadata reference, the actual metadata (i.e. the response headers) is located;
by using ``--index-data-meta`` a RAM-based lookup table is created to eliminate a CDB read operation for this purpose; (the memory impact is proportional to the size of all resource metadata blocks combined; given that the metadata blocks are deduplicated, one could safely use this option; if one also uses ``--archive-mmap`` or ``--archive-inmem``, then the memory impact is only proportional to the number of resource metadata blocks;)
* based on the resource's data reference, the actual data (i.e. the response body) is located;
by using ``--index-data-content`` a RAM-based lookup table is created to eliminate a CDB read operation for this purpose; (the memory impact is proportional to the size of all resource data blocks combined; one can use this option to obtain the best performance; if one also uses ``--archive-mmap`` or ``--archive-inmem``, then the memory impact is only proportional to the number of resource data blocks;)
* (depending on the use-case) it is recommended to use ``--index-paths``; if ``--exclude-etag`` was used during archival, one can also use ``--index-data-meta``;
* it is recommended to use either ``--archive-mmap`` or ``--archive-inmem``, else (especially if data is indexed) the resulting effect is that of loading everything in RAM;
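Putting these recommendations together, a configuration aiming for the best performance might look like the following sketch (assuming ``--exclude-etag`` was used during archival; the ``--archive`` flag name is an assumption): ::

    kawipiko-server \
        --archive ./website.cdb --archive-mmap \
        --index-paths --index-data-meta --index-data-content \
        --bind 0.0.0.0:8080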
It starts the server in a "dummy" mode, ignoring all archive-related arguments, and always responding with ``hello world!\n`` (unless ``--dummy-empty`` was used), without any additional headers except the HTTP status line and ``Content-Length``.
This argument can be used to benchmark the raw performance of the underlying ``fasthttp``, Go's ``net/http``, or QUIC stacks; this is the upper limit of the achievable performance given the underlying technologies.
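A hedged sketch of such a raw benchmark (the ``--dummy`` flag name is an assumption inferred from ``--dummy-empty`` above; the ``wrk`` parameters mirror those in the benchmarks section): ::

    # start the server in "dummy" mode
    kawipiko-server --dummy --bind 127.0.0.1:8080

    # measure the raw performance of the endpoint
    wrk --threads 2 --connections 128 --duration 30s http://127.0.0.1:8080/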
The path to the target CDB file that contains the archived static content.
``--compress``, and ``--compress-level``
Each individual file (and consequently the corresponding HTTP response body) is compressed with either ``gzip``, ``zopfli`` or ``brotli``; by default (or alternatively with ``identity``) no compression is used.
Even if compression is explicitly requested, if the compression ratio is below a certain threshold (depending on the uncompressed size), the file is stored without any compression.
(It's senseless to force the client to spend time and decompress the response body if that time is not recovered during network transmission.)
The compression level can be chosen, the value depending on the algorithm:
* ``gzip`` -- ``-1`` for algorithm default, ``-2`` for Huffman only, ``0`` to ``9`` for fast to slow;
* ``zopfli`` -- ``-1`` for algorithm default, ``0`` to ``30`` iterations for fast to slow;
* ``brotli`` -- ``-1`` for algorithm default, ``0`` to ``9`` for fast to slow, ``-2`` for extreme;
* (by "algorithm default", it is meant "what that algorithm considers the recommended default compression level";)
* ``kawipiko`` by default uses the maximum compression level for each algorithm; (i.e. ``9`` for ``gzip``, ``30`` for ``zopfli``, and ``-2`` for ``brotli``;)
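For example, to archive with ``brotli`` at its extreme level (which, per the above, is also the default), one might run the following (the ``--sources`` and ``--archive`` flag names are assumptions): ::

    kawipiko-archiver \
        --sources ./content --archive ./website.cdb \
        --compress brotli --compress-level -2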
``--sources-cache <path>``, and ``--compress-cache <path>``
At the given path a single file (a BBolt database) is created, which will be used to cache the following information:
* in the case of ``--sources-cache``, the fingerprint of each file's contents is stored, so that unchanged files are not re-read unless absolutely necessary; also, if a file is small enough, its contents are stored in this database (deduplicated by fingerprint);
* in the case of ``--compress-cache``, the compression outcome of each file's contents is stored (deduplicated by fingerprint), so that compression is done only once over multiple runs;
Each of these caches can be safely reused between multiple related archives, especially when they have many files in common.
Each of these caches can be independently used (or shared).
Using these caches allows one to very quickly rebuild an archive when only a couple of files have been changed, without even touching the file-system for the unchanged ones.
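For example (the cache file names are arbitrary, and the ``--sources`` / ``--archive`` flag names are assumptions): ::

    kawipiko-archiver \
        --sources ./content --archive ./website.cdb \
        --compress brotli \
        --sources-cache ./caches/sources.bbolt \
        --compress-cache ./caches/compress-brotli.bbolt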
Disables using ``index.*`` files (where ``.*`` is one of ``.html``, ``.htm``, ``.xhtml``, ``.xht``, ``.txt``, ``.json``, and ``.xml``) to respond to a request whose URL path ends in ``/`` (corresponding to the folder wherein the ``index.*`` file is located).
Disables using a file with the suffix ``.html``, ``.htm``, ``.xhtml``, ``.xht``, and ``.txt`` to respond to a request whose URL does not exactly match an existing file.
Disables adding a ``Cache-Control: public, immutable, max-age=3600`` header that forces the browser (and other intermediary proxies) to cache the response for an hour (the ``public`` and ``max-age=3600`` arguments), and furthermore not to request it even on reloads (the ``immutable`` argument).
By not including the ``ETag`` header (i.e. the default), and because identical headers are stored only once, having many files of the same type (which, without the ``ETag``, generate identical headers) can lead to a significant reduction in stored header blocks, and thus in RAM usage.
(At this moment the server does not support HTTP conditional requests, i.e. ``If-None-Match``, ``If-Modified-Since`` and their counterparts; however the ``ETag`` header might be used in conjunction with ``HEAD`` requests to see if a resource has changed.)
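For example, one could check whether a resource has changed by fetching only its headers and inspecting the ``ETag`` (the URL is a placeholder): ::

    # fetch only the response headers, then compare the ETag value
    curl --head http://127.0.0.1:8080/index.html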
* (at the moment these rules are not configurable through flags;)
``_wildcard.*`` files
.....................
A file whose name matches ``_wildcard.*`` (i.e. with the prefix ``_wildcard.`` and any suffix) will be used to respond to any request whose URL fails to find a "better" match.
These wildcard files respect the folder hierarchy, in that wildcard files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive).
You may freely use symlinks (including ones pointing outside of the content root), and they will be followed during archival, respecting the "logical" hierarchy they introduce.
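An illustrative content layout (the file and folder names are merely examples): ::

    ./content
    |-- index.html              # served for the URL path /
    |-- _wildcard.html          # served for any otherwise unmatched URL
    `-- errors
        `-- _wildcard.html      # overrides the root wildcard under /errors/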
* (optionally) in order to reduce the serving latency even further, one can preload the entire CDB database in memory, or alternatively mapping it in memory (mmap_); this trades memory for CPU;
* "atomic" static website content changes; because the entire content is held in a single CDB database file, and because the file replacement is atomically achieved via the ``rename`` syscall (or the ``mv`` tool), all resources are "changed" at the same time;
* ``_wildcard.*`` files (where ``.*`` stands for the regular extensions like ``.txt``, ``.html``, etc.), which will be used if an actual resource is not found under that folder; (these files respect the hierarchical tree structure, i.e. "deeper" ones override the ones closer to the "root";)
* support for custom HTTP response headers (for specific files, for specific folders, etc.); (currently only ``Content-Type``, ``Content-Length``, ``Content-Encoding``, and optionally ``ETag`` are included; additionally ``Cache-Control: public, immutable, max-age=3600`` and a few security related headers are also included;)
* (won't fix) the CDB database **maximum size is 4 GiB**; (however if you have a static website this large, you are probably doing something extremely wrong, as large files should be offloaded to something like AWS S3 and served through a CDN like CloudFlare or AWS CloudFront;)
* (won't fix) the server **does not support per-request decompression / recompression**; this implies that if the content was saved in the CDB database with compression (say ``gzip``), the server will serve all resources compressed (i.e. ``Content-Encoding: gzip``), regardless of what the browser accepts (i.e. ``Accept-Encoding: gzip``); the same applies for uncompressed content; (however always using ``gzip`` compression is safe enough as it is implemented in virtually all browsers and HTTP clients out there;)
* (won't fix) regarding the "atomic" static website changes, there is a small time window in which a client that has fetched an "old" version of a resource (say an HTML page), but has not yet fetched the required resources (say the CSS or JS files) when the CDB database is swapped, will consequently fetch the "new" version of those required resources; however due to the low serving latency, this time window is extremely small; (**this is not a limitation of this HTTP server, but a limitation of the way the "web" is built;** always use fingerprints in your resource URLs, and perhaps always include the current and previous versions on each deploy;)
* under normal conditions (16 concurrent connections), you get around 111k requests / second, at about 0.25ms latency for 99% of the requests;
* under light stress conditions (128 concurrent connections), you get around 118k requests / second, at about 2.5ms latency for 99% of the requests;
* under medium stress conditions (512 concurrent connections), you get around 106k requests / second, at about 10ms latency for 99% of the requests (meanwhile the average is 4.5ms);
* **under high stress conditions (2048 concurrent connections), you get around 100k requests / second, at about 400ms latency for 99% of the requests (meanwhile the average is 45ms);**
* under extreme stress conditions (16384 concurrent connections) (i.e. someone tries to DDOS the server), you get around 53k requests / second, at about 2.8s latency for 99% of the requests (meanwhile the average is 200ms);
* (the timeout errors are due to the fact that ``wrk`` is configured to timeout after only 1 second of waiting while connecting or receiving the full response;)
* **the raw performance is at least on-par with NGinx**; (from my measurements ``kawipiko`` in fact serves 30% more requests / second than NGinx, at least for my "synthetic" benchmark;) however, especially for "real world" scenarios (i.e. thousands of small files, accessed in random patterns), I think ``kawipiko`` fares better; (not to mention how simple it is to configure and deploy ``kawipiko`` as compared to NGinx;)
* the ``kawipiko-server`` was started with ``--security-headers-disable``; (because these headers are not set by default by other HTTP servers;)
* the ``kawipiko-server`` was started with ``--timeout-disable``; (because, due to a known Go issue, using ``net.Conn.SetDeadline`` has an impact of about 20% on the raw performance; thus the values reported above might be about 10%-15% smaller when used with timeouts;)
* both the CDB and the NGinx folder were put on ``tmpfs`` (which implies that the disk is not a limiting factor); (in fact ``kawipiko`` performs quite well even on spinning disks due to careful storage management;)
* the NGinx configuration file can be found in the `examples folder <./examples>`__; the configuration was obtained after many experiments to squeeze out of NGinx as much performance as possible, given the targeted use-case, namely many small files;
* moreover NGinx seems to be quite sensitive to the actual path requested:
* if one requests ``http://127.0.0.1:8080/``, and one has configured NGinx to look for ``index.txt``, and that file actually exists, the performance is quite a bit lower than just asking for that file; (perhaps it issues more syscalls, searching for the index file;)
* if one requests ``http://127.0.0.1:8080/index.txt``, as mentioned above, it achieves higher performance; (perhaps it issues fewer syscalls;)
* if one requests ``http://127.0.0.1:8080/does-not-exist``, it seems to achieve the "best" performance; (perhaps it issues the least amount of syscalls;) (however this is not an actually useful corner-case;)
* it must be noted that ``kawipiko`` doesn't exhibit this behaviour: the same performance is achieved regardless of the path variant;
* therefore the benchmarks above use ``/index.txt`` as opposed to ``/``;
* the number of threads for the server plus those for ``wrk`` shouldn't be larger than the number of available cores; (or use different machines for the server and the client;)
* also take into account that by default the limit on "file descriptors" on most UNIX/Linux machines is 1024; therefore if you want to try with more than 1000 connections, you need to raise this limit, for example: ::
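    # raise the open file descriptors limit for the current shell
    # (a standard shell builtin, not specific to kawipiko)
    ulimit -n 16384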
* additionally, you can try to pin the server and ``wrk`` to specific cores, and increase various priorities (scheduling, IO, etc.); (given that Intel processors have HyperThreading, which appears to the OS as individual cores, you should make sure to pin each process onto cores belonging to the same physical processor;)
* pinning the server (cores ``0`` and ``1`` are mapped on physical core ``1``): ::
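    # a plausible reconstruction of the elided example; the exact flags are assumptions;
    # taskset restricts the process to logical cores 0 and 1
    taskset --cpu-list 0,1 \
        kawipiko-server --archive ./website.cdb --bind 127.0.0.1:8080 \
            --processes 1 --threads 2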
Until I expand upon why I have chosen to use CDB for serving static website content, you can read about `sparkey <https://github.com/spotify/sparkey>`__ from Spotify.
For details about the copyright and licensing, please consult the `notice <./documentation/licensing/notice.txt>`__ file in the `documentation/licensing <./documentation/licensing>`__ folder.
* `Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB <https://www.influxdata.com/blog/benchmarking-leveldb-vs-rocksdb-vs-hyperleveldb-vs-lmdb-performance-for-influxdb/>`__ (article);
* `Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go <https://blog.dgraph.io/post/badger-lmdb-boltdb/>`__ (article);
* `Benchmarking BDB, CDB and Tokyo Cabinet on large datasets <https://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/>`__ (article);