kawipiko/documentation/readme.rst



#############################################
kawipiko -- blazingly fast static HTTP server
#############################################


``kawipiko`` is a **super-lightweight static HTTP (and HTTPS, H2, H3) server** written in Go, whose main purpose is to serve static content as fast as possible, and with the lowest resource consumption (CPU and RAM) as possible.

However "simple" doesn't imply "dumb" or "limited", instead it implies "efficient" through the removal of superfluous features, thus being inline with UNIX's old philosophy of `"do one thing and do it well" <https://en.wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_Do_It_Well>`__.
As such ``kawipiko`` basically supports only ``GET`` requests, and does not provide features like dynamic content generation, authentication, reverse proxying, etc.;  while still providing compression (``gzip``, ``zopfli``, ``brotli``), plus HTML-CSS-JS minification (TODO), without affecting its performance (due to its unique architecture as seen below).

What ``kawipiko`` does provide is something very unique, that no other HTTP server offers:  the static content is served from a `CDB database <#why-cdb>`__ with almost no latency (as compared to classical static servers that still have to pass through the OS via the ``open-read-close`` syscalls).
Moreover as noted earlier, the static content can be compressed (with either ``gzip``, ``zopfli`` or ``brotli``) or minified ahead of time, thus reducing not only CPU but also bandwidth and latency.

`CDB databases <#why-cdb>`__ are binary files that provide efficient read-only key-value lookup tables, initially used in some DNS and SMTP servers, mainly for their low overhead lookup operations, zero locking in multi-threaded / multi-process scenarios, and "atomic" multi-record updates.
This also makes them suitable for low-latency static website content serving over HTTP, which this project provides.

For those familiar with Netlify (or competitors like CloudFlare Pages, GitHub Pages, etc.), ``kawipiko`` is a "host-it-yourself" alternative featuring:

* self-contained deployment with simple configuration;  (i.e. just `fetch the executable <#installation>`__ and use the `proper flags <#kawipiko-server>`__;)
* low and constant resource consumption (both in terms of CPU and RAM);  (i.e. you won't have surprises when under load;)
* (hopefully) extremely secure;  (i.e. it doesn't launch processes, it doesn't connect to other services or databases, it doesn't open any files, etc.;  basically you can easily ``chroot`` it, or containerize it as is in fashion these days;)
* highly portable, supporting at least Linux (the main development, testing and deployment platform), FreeBSD, OpenBSD, and OSX;

For a complete list of features please consult the `features section <#features>`__.
Unfortunately, there are also some tradeoffs as described in the `limitations section <#limitations>`__ (although none are critical).

With regard to performance, as described in the `benchmarks section <#benchmarks>`__, ``kawipiko`` is at least on-par with NGinx, sustaining over 100K requests / second with 0.25ms latency for 99% of the requests even on my 6 years old laptop.
However the main advantage over NGinx is not raw performance, but deployment and configuration simplicity, plus efficient management and storage of large collections of many small files.


--------


.. contents::
    :depth: 1
    :backlinks: none


--------


::

    +---------------------------------------------------------------------------+
    .                                                                           .
    .   __                                                 __                   .
    .  /\ \                              __            __ /\ \                  .
    .  \ \ \/'\       __     __  __  __ /\_\   _____  /\_\\ \ \/'\      ___     .
    .   \ \ , <     /'__`\  /\ \/\ \/\ \\/\ \ /\ '__`\\/\ \\ \ , <     / __`\   .
    .    \ \ \\`\  /\ \L\.\_\ \ \_/ \_/ \\ \ \\ \ \L\ \\ \ \\ \ \\`\  /\ \L\ \  .
    .     \ \_\ \_\\ \__/.\_\\ \___x___/' \ \_\\ \ ,__/ \ \_\\ \_\ \_\\ \____/  .
    .      \/_/\/_/ \/__/\/_/ \/__//__/    \/_/ \ \ \/   \/_/ \/_/\/_/ \/___/   .
    .                                            \ \_\                          .
    .                                             \/_/                          .
    .                                                                           .
    .            _  _ ___ ___ ___     ____ ____ ____ _  _ ____ ____             .
    .            |__|  |   |  |__]    [__  |___ |__/ |  | |___ |__/             .
    .            |  |  |   |  |       ___] |___ |  \  \/  |___ |  \             .
    .                                                                           .
    .                                                                           .
    +---------------------------------------------------------------------------+


--------


Manual
======

.. contents::
    :local:
    :backlinks: none


Workflow
--------

The project provides the following executables (statically linked, without any other dependencies):

* ``kawipiko-server`` -- which serves the static content from the CDB file either via HTTP, HTTPS, H2 (i.e. HTTP/2) or H3 (i.e. HTTP/3 over QUIC);
* ``kawipiko-archiver`` -- which creates the CDB file from a source folder holding the static content, optionally compressing and minimifing it;
* ``kawipiko`` -- an all-in-one executable that bundles all functionality in one executable;  (i.e. ``kawipiko server ...`` or ``kawipiko archiver ...``);

Unlike most (if not all) other servers out-there, in which you just point your web server to the folder holding the static website content root, ``kawipiko`` takes a radically different approach.
In order to serve the static content, one has to first "archive" the content into the CDB file through ``kawipiko-archiver``, and then one can "serve" it from the CDB file through ``kawipiko-server``.

This two step phase also presents a few opportunities:

* one can decouple the "building", "testing", and "publishing" phases of a static website, by using a similar CI/CD pipeline as done for other software projects;
* one can instantaneously rollback to a previous version if the newly published one has issues;
* one can apply extreme compression (e.g. ``brotli``) to trade CPU during deployment vs latency and bandwidth at runtime.


.. note ::

   As described in the `limitations section <#limitations>`__, at the moment, if one rebuilds the CDB file, the server has to be restarted.


kawipiko-server
---------------

See `dedicated manual <./documentation/manual-server.rst>`__.


kawipiko-archiver
-----------------

See `dedicated manual <./documentation/manual-archiver.rst>`__.


--------


Examples
========

* fetch and extract the Python 3.7 documentation HTML archive: ::

    curl \
            -s -S -f \
            -o ./python-3.7.3-docs-html.tar.bz2 \
            https://docs.python.org/3/archives/python-3.7.3-docs-html.tar.bz2 \
    #

    tar \
            -x -j -v \
            -f ./python-3.7.3-docs-html.tar.bz2 \
    #

* create the CDB archive (without any compression): ::

    kawipiko-archiver \
            --archive ./python-3.7.3-docs-html-nozip.cdb \
            --sources ./python-3.7.3-docs-html \
            --debug \
    #

* create the CDB archive (with ``gzip`` compression): ::

    kawipiko-archiver \
            --archive ./python-3.7.3-docs-html-gzip.cdb \
            --sources ./python-3.7.3-docs-html \
            --compress gzip \
            --debug \
    #

* create the CDB archive (with ``zopfli`` compression): ::

    kawipiko-archiver \
            --archive ./python-3.7.3-docs-html-zopfli.cdb \
            --sources ./python-3.7.3-docs-html \
            --compress zopfli \
            --debug \
    #

* create the CDB archive (with ``brotli`` compression): ::

    kawipiko-archiver \
            --archive ./python-3.7.3-docs-html-brotli.cdb \
            --sources ./python-3.7.3-docs-html \
            --compress brotli \
            --debug \
    #

* serve the CDB archive (with ``gzip`` compression): ::

    kawipiko-server \
            --bind 127.0.0.1:8080 \
            --archive ./python-3.7.3-docs-html-gzip.cdb \
            --archive-mmap \
            --archive-preload \
            --debug \
    #

* compare sources and archive sizes: ::

    du \
            -h -s \
            \
            ./python-3.7.3-docs-html-nozip.cdb \
            ./python-3.7.3-docs-html-gzip.cdb \
            ./python-3.7.3-docs-html-zopfli.cdb \
            ./python-3.7.3-docs-html-brotli.cdb \
            \
            ./python-3.7.3-docs-html \
            ./python-3.7.3-docs-html.tar.bz2 \
    #

    45M     ./python-3.7.3-docs-html-nozip.cdb
    9.7M    ./python-3.7.3-docs-html-gzip.cdb
    ???     ./python-3.7.3-docs-html-zopfli.cdb
    7.9M    ./python-3.7.3-docs-html-brotli.cdb

    46M     ./python-3.7.3-docs-html
    6.0M    ./python-3.7.3-docs-html.tar.bz2


--------


Installation
============

See `dedicated installation document <./documentation/installation.rst>`__.


--------


Features
========

.. contents::
    :local:
    :backlinks: none


Implemented
-----------

The following is a list of the most important features:

* (optionally)  the static content is compressed when the CDB database is created, thus no CPU cycles are used while serving requests;

* (optionally)  the static content can be compressed with either ``gzip``, ``zopfli`` or ``brotli``;

* (optionally)  in order to reduce the serving latency even further, one can preload the entire CDB database in memory, or alternatively mapping it in memory (mmap_);  this trades memory for CPU;

* "atomic" static website content changes;  because the entire content is held in a single CDB database file, and because the file replacement is atomically achieved via the ``rename`` syscall (or the ``mv`` tool), all resources are "changed" at the same time;

* ``_wildcard.*`` files (where ``.*`` are the regular extensions like ``.txt``, ``.html``, etc.) which will be used if an actual resource is not found under that folder;  (these files respect the hierarchical tree structure, i.e. "deeper" ones override the ones closer to "root";)

* support for HTTPS, with HTTP/1.1, by leveraging ``fasthttp``;
* support for H2 (i.e. HTTP/2), by leveraging Go's ``net/http``;
* support for H3 (i.e. HTTP/3), by leveraging ``github.com/lucas-clemente/quic-go``;


Pending
-------

The following is a list of the most important features that are currently missing and are planed to be implemented:

* support for custom HTTP response headers (for specific files, for specific folders, etc.);  (currently only ``Content-Type``, ``Content-Length``, ``Content-Encoding`` and optionally ``ETag`` is included;  additionally ``Cache-Control: public, immutable, max-age=3600`` and a few security related headers are also included;)

* support for mapping virtual hosts to key prefixes;  (currently virtual hosts, i.e. the ``Host`` header, are ignored;)

* support for mapping virtual hosts to multiple CDB database files;  (i.e. the ability to serve multiple domains, each with its own CDB database;)

* automatic reloading of CDB database files;

* customized error pages (also part of the CDB database);


Limitations
-----------

As stated in the `about section <#about>`__, nothing comes for free, and in order to provide all these features, some corners had to be cut:

* (TODO)  currently if the CDB database file changes, the server needs to be restarted in order to pickup the changed files;

* (won't fix)  the CDB database **maximum size is 4 GiB**;  (however if you have a static website this large, you are probably doing something extremely wrong, as large files should be offloaded to something like AWS S3 and served through a CDN like CloudFlare or AWS CloudFront;)

* (won't fix)  the server **does not support per-request decompression / recompression**;  this implies that if the content was saved in the CDB database with compression (say ``gzip``), the server will serve all resources compressed (i.e. ``Content-Encoding: gzip``), regardless of what the browser accepts (i.e. ``Accept-Encoding: gzip``);  the same applies for uncompressed content;  (however always using ``gzip`` compression is safe enough as it is implemented in virtually all browsers and HTTP clients out there;)

* (won't fix)  regarding the "atomic" static website changes, there is a small time window in which a client that has fetched an "old" version of a resource (say an HTML page), but which has not yet fetched the required resources (say the CSS or JS files), and the CDB database was swapped, it will consequently fetch the "new" version of these required resources;  however due to the low latency serving, this time window is extremely small;  (**this is not a limitation of this HTTP server, but a limitation of the way the "web" is built;**  always use fingerprints in your resources URL, and perhaps always include the current and previous version on each deploy;)


--------


Benchmarks
==========

See `dedicated benchmarks document <./documentation/benchmarks.rst>`__.


--------


FAQ
===


Is it production ready?
-----------------------

Yes, it currently is serving ~600K HTML pages.

Although, being open source, you are responsible for making sure it works within your requirements!


Why CDB?
--------

Until I expand upon why I have chosen to use CDB for service static website content, you can read about the `sparkey <https://github.com/spotify/sparkey>`__ from Spotify.


Why Go?
-------

Because Go is highly portable, highly stable, and especially because it can easily support cross-compiling statically linked binaries to any platform it supports.


Why not Rust?
-------------

Because Rust fails to easily support cross-compiling (statically or dynamically linked) executables to any platform it supports.

Because Rust is less portable than Go;  for example Rust doesn't consider OpenBSD as a "tier-1" platform.


--------


Notice (copyright and licensing)
================================

.. contents::
    :local:
    :backlinks: none


Authors
-------

Ciprian Dorin Craciun
  * `ciprian@volution.ro <mailto:ciprian@volution.ro>`__ or `ciprian.craciun@gmail.com <mailto:ciprian.craciun@gmail.com>`__
  * `<https://volution.ro/ciprian>`__
  * `<https://github.com/cipriancraciun>`__


Notice -- short version
-----------------------

The code is licensed under AGPL 3 or later.

If you **change** the code within this repository **and use** it for **non-personal** purposes, you'll have to release it as per AGPL.


Notice -- long version
----------------------

For details about the copyright and licensing, please consult the `notice <./documentation/licensing/notice.txt>`__ file in the `documentation/licensing <./documentation/licensing>`__ folder.

If someone requires the sources and/or documentation to be released
under a different license, please send an email to the authors,
stating the licensing requirements, accompanied with the reasons
and other details; then, depending on the situation, the authors might
release the sources and/or documentation under a different license.


References
==========


.. [CDB]
    * `CDB <https://en.wikipedia.org/wiki/Cdb_(software)>`__ (@Wikipedia);
    * `cdb <http://cr.yp.to/cdb.html>`__ (project website, reference implementation by DJB);
    * `cdb <https://github.com/colinmarc/cdb>`__ (project @GitHub, pure Go implementation, used by ``kawipiko`` with patches;)
    * `Constant Database Internals <http://www.unixuser.org/~euske/doc/cdbinternals/index.html>`__ (article);
    * `Benchmarking LevelDB vs. RocksDB vs. HyperLevelDB vs. LMDB Performance for InfluxDB <https://www.influxdata.com/blog/benchmarking-leveldb-vs-rocksdb-vs-hyperleveldb-vs-lmdb-performance-for-influxdb/>`__ (article);
    * `Badger vs LMDB vs BoltDB: Benchmarking key-value databases in Go <https://blog.dgraph.io/post/badger-lmdb-boltdb/>`__ (article);
    * `Benchmarking BDB, CDB and Tokyo Cabinet on large datasets <https://www.dmo.ca/blog/benchmarking-hash-databases-on-large-data/>`__ (article);
    * `TinyCDB <http://www.corpit.ru/mjt/tinycdb.html>`__ (fork project);
    * `tinydns <https://cr.yp.to/djbdns/tinydns.html>`__ (DNS server using CDB);
    * `qmail <https://cr.yp.to/qmail.html>`__ (SMTP server using CDB);

.. [Go]
    * `Go <https://en.wikipedia.org/wiki/Go_(programming_language)>`__ (@Wikipedia);
    * `Go <https://golang.com/>`__ (project website);

.. [fasthttp]
    * `fasthttp <https://github.com/valyala/fasthttp>`__ (project @GitHub);
    * high performance HTTP server implementation;  (alternative to Go's ``net/http`` implementation;)
    * supports HTTP/1 with or without TLS;
    * used by ``kawipiko``;

.. [quic-go]
    * `quic-go <https://github.com/lucas-clemente/quic-go>`__ (project @GitHub);
    * supports H3 over QUIC;
    * used by ``kawipiko``;

.. [Zopfli]
    * `Zopfli <https://en.wikipedia.org/wiki/Zopfli>`__ (@Wikipedia);
    * `Zopfli <https://github.com/google/zopfli>`__ (project @GitHub, reference implementation by Google);
    * `Zopfli <https://github.com/foobaz/go-zopfli>`__ (project @GitHub, pure Go implementation, used by ``kawipiko``);

.. [Brotli]
    * `Brotli <https://en.wikipedia.org/wiki/Brotli>`__ (@Wikipedia);
    * `Brotli <https://github.com/google/brotli>`__ (project @GitHub, reference implementation by Google);
    * `Brotli <https://github.com/andybalholm/brotli>`__ (project @GitHub, pure Go implementation, used by ``kawipiko``);
    * `Results of experimenting with Brotli for dynamic web content <https://blog.cloudflare.com/results-experimenting-brotli/>`__ (article);

.. [Blake3]
    * `Blake3 <https://en.wikipedia.org/wiki/BLAKE_(hash_function)>`__ (@Wikipedia);
    * `Blake3 <https://github.com/BLAKE3-team/BLAKE3>`__ (project @GitHub, reference implementation);
    * `Blake3 <https://github.com/zeebo/blake3>`__ (project @GitHub, pure Go implementation, used by ``kawipiko``);

.. [Bolt]
    * `bolt <https://github.com/boltdb/bolt>`__ (project @GitHub, original pure Go implementation);
    * `bbolt <https://github.com/etcd-io/bbolt>`__ (project @GitHub, forked pure Go implementation, used by ``kawipiko``);

.. [wrk]
    * `wrk <https://github.com/wg/wrk>`__ (project @GitHub);
    * modern HTTP benchmarking tool;
    * multi threaded, implemented in C, with event loop and Lua support;
    * supports HTTP/1 with and without TLS;

.. [h2load]
    * part of the ``nghttp2`` project;
    * `nghttp2 <https://github.com/nghttp2/nghttp2>`__ (project @GitHub);
    * modern HTTP benchmarking tool;
    * multi threaded, implemented in C, with event loop;
    * supports HTTP/1 with TLS, H2, and H3;

.. [Netlify]
    * `Netlify <https://www.netlify.com/>`__ (cloud provider);

.. [HAProxy]
    * `HAProxy <https://en.wikipedia.org/wiki/HAProxy>`__ (@Wikipedia);
    * `HAProxy <https://www.haproxy.org/>`__ (project website);
    * reliable high performance TCP/HTTP load-balancer;
    * multi threaded, implemented in C, with event loop and Lua support;

.. [NGinx]
    * `NGinx <https://en.wikipedia.org/wiki/Nginx>`__ (@Wikipedia);
    * `NGinx <https://nginx.org/>`__ (project website);
    * reliable high performance HTTP server;
    * multi threaded, implemented in C, with event loop;

.. [darkhttpd]
    * `darkhttpd <https://unix4lyfe.org/darkhttpd/>`__ (project website);
    * simple static HTTP server;
    * single threaded, implemented in C, with event loop and ``sendfile`` support;

.. [mmap]
    * `Memory mapping <https://en.wikipedia.org/wiki/Memory-mapped_file>`__ (@Wikipedia);
    * `mmap(2) <http://man7.org/linux/man-pages/man2/mmap.2.html>`__ (Linux man page);