<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
<meta name="generator" content="Docutils 0.19: https://docutils.sourceforge.io/" />
<title>kawipiko -- blazingly fast static HTTP server</title>
<style type="text/css">
/*
:Author: David Goodger (goodger@python.org)
:Id: $Id: html4css1.css 8954 2022-01-20 10:10:25Z milde $
:Copyright: This stylesheet has been placed in the public domain.
Default cascading style sheet for the HTML output of Docutils.
See https://docutils.sourceforge.io/docs/howto/html-stylesheets.html for how to
customize this style sheet.
*/
/* used to remove borders from tables and images */
.borderless, table.borderless td, table.borderless th {
border: 0 }
table.borderless td, table.borderless th {
/* Override padding for "table.docutils td" with "! important".
The right padding separates the table cells. */
padding: 0 0.5em 0 0 ! important }
.first {
/* Override more specific margin styles with "! important". */
margin-top: 0 ! important }
.last, .with-subtitle {
margin-bottom: 0 ! important }
.hidden {
display: none }
.subscript {
vertical-align: sub;
font-size: smaller }
.superscript {
vertical-align: super;
font-size: smaller }
a.toc-backref {
text-decoration: none ;
color: black }
blockquote.epigraph {
margin: 2em 5em ; }
dl.docutils dd {
margin-bottom: 0.5em }
object[type="image/svg+xml"], object[type="application/x-shockwave-flash"] {
overflow: hidden;
}
/* Uncomment (and remove this text!) to get bold-faced definition list terms
dl.docutils dt {
font-weight: bold }
*/
div.abstract {
margin: 2em 5em }
div.abstract p.topic-title {
font-weight: bold ;
text-align: center }
div.admonition, div.attention, div.caution, div.danger, div.error,
div.hint, div.important, div.note, div.tip, div.warning {
margin: 2em ;
border: medium outset ;
padding: 1em }
div.admonition p.admonition-title, div.hint p.admonition-title,
div.important p.admonition-title, div.note p.admonition-title,
div.tip p.admonition-title {
font-weight: bold ;
font-family: sans-serif }
div.attention p.admonition-title, div.caution p.admonition-title,
div.danger p.admonition-title, div.error p.admonition-title,
div.warning p.admonition-title, .code .error {
color: red ;
font-weight: bold ;
font-family: sans-serif }
/* Uncomment (and remove this text!) to get reduced vertical space in
compound paragraphs.
div.compound .compound-first, div.compound .compound-middle {
margin-bottom: 0.5em }
div.compound .compound-last, div.compound .compound-middle {
margin-top: 0.5em }
*/
div.dedication {
margin: 2em 5em ;
text-align: center ;
font-style: italic }
div.dedication p.topic-title {
font-weight: bold ;
font-style: normal }
div.figure {
margin-left: 2em ;
margin-right: 2em }
div.footer, div.header {
clear: both;
font-size: smaller }
div.line-block {
display: block ;
margin-top: 1em ;
margin-bottom: 1em }
div.line-block div.line-block {
margin-top: 0 ;
margin-bottom: 0 ;
margin-left: 1.5em }
div.sidebar {
margin: 0 0 0.5em 1em ;
border: medium outset ;
padding: 1em ;
background-color: #ffffee ;
width: 40% ;
float: right ;
clear: right }
div.sidebar p.rubric {
font-family: sans-serif ;
font-size: medium }
div.system-messages {
margin: 5em }
div.system-messages h1 {
color: red }
div.system-message {
border: medium outset ;
padding: 1em }
div.system-message p.system-message-title {
color: red ;
font-weight: bold }
div.topic {
margin: 2em }
h1.section-subtitle, h2.section-subtitle, h3.section-subtitle,
h4.section-subtitle, h5.section-subtitle, h6.section-subtitle {
margin-top: 0.4em }
h1.title {
text-align: center }
h2.subtitle {
text-align: center }
hr.docutils {
width: 75% }
img.align-left, .figure.align-left, object.align-left, table.align-left {
clear: left ;
float: left ;
margin-right: 1em }
img.align-right, .figure.align-right, object.align-right, table.align-right {
clear: right ;
float: right ;
margin-left: 1em }
img.align-center, .figure.align-center, object.align-center {
display: block;
margin-left: auto;
margin-right: auto;
}
table.align-center {
margin-left: auto;
margin-right: auto;
}
.align-left {
text-align: left }
.align-center {
clear: both ;
text-align: center }
.align-right {
text-align: right }
/* reset inner alignment in figures */
div.align-right {
text-align: inherit }
/* div.align-center * { */
/* text-align: left } */
.align-top {
vertical-align: top }
.align-middle {
vertical-align: middle }
.align-bottom {
vertical-align: bottom }
ol.simple, ul.simple {
margin-bottom: 1em }
ol.arabic {
list-style: decimal }
ol.loweralpha {
list-style: lower-alpha }
ol.upperalpha {
list-style: upper-alpha }
ol.lowerroman {
list-style: lower-roman }
ol.upperroman {
list-style: upper-roman }
p.attribution {
text-align: right ;
margin-left: 50% }
p.caption {
font-style: italic }
p.credits {
font-style: italic ;
font-size: smaller }
p.label {
white-space: nowrap }
p.rubric {
font-weight: bold ;
font-size: larger ;
color: maroon ;
text-align: center }
p.sidebar-title {
font-family: sans-serif ;
font-weight: bold ;
font-size: larger }
p.sidebar-subtitle {
font-family: sans-serif ;
font-weight: bold }
p.topic-title {
font-weight: bold }
pre.address {
margin-bottom: 0 ;
margin-top: 0 ;
font: inherit }
pre.literal-block, pre.doctest-block, pre.math, pre.code {
margin-left: 2em ;
margin-right: 2em }
pre.code .ln { color: grey; } /* line numbers */
pre.code, code { background-color: #eeeeee }
pre.code .comment, code .comment { color: #5C6576 }
pre.code .keyword, code .keyword { color: #3B0D06; font-weight: bold }
pre.code .literal.string, code .literal.string { color: #0C5404 }
pre.code .name.builtin, code .name.builtin { color: #352B84 }
pre.code .deleted, code .deleted { background-color: #DEB0A1}
pre.code .inserted, code .inserted { background-color: #A3D289}
span.classifier {
font-family: sans-serif ;
font-style: oblique }
span.classifier-delimiter {
font-family: sans-serif ;
font-weight: bold }
span.interpreted {
font-family: sans-serif }
span.option {
white-space: nowrap }
span.pre {
white-space: pre }
span.problematic {
color: red }
span.section-subtitle {
/* font-size relative to parent (h1..h6 element) */
font-size: 80% }
table.citation {
border-left: solid 1px gray;
margin-left: 1px }
table.docinfo {
margin: 2em 4em }
table.docutils {
margin-top: 0.5em ;
margin-bottom: 0.5em }
table.footnote {
border-left: solid 1px black;
margin-left: 1px }
table.docutils td, table.docutils th,
table.docinfo td, table.docinfo th {
padding-left: 0.5em ;
padding-right: 0.5em ;
vertical-align: top }
table.docutils th.field-name, table.docinfo th.docinfo-name {
font-weight: bold ;
text-align: left ;
white-space: nowrap ;
padding-left: 0 }
/* "booktabs" style (no vertical lines) */
table.docutils.booktabs {
border: 0px;
border-top: 2px solid;
border-bottom: 2px solid;
border-collapse: collapse;
}
table.docutils.booktabs * {
border: 0px;
}
table.docutils.booktabs th {
border-bottom: thin solid;
text-align: left;
}
h1 tt.docutils, h2 tt.docutils, h3 tt.docutils,
h4 tt.docutils, h5 tt.docutils, h6 tt.docutils {
font-size: 100% }
ul.auto-toc {
list-style-type: none }
</style>
</head>
<body>
<div class="document" id="kawipiko-blazingly-fast-static-http-server">
<h1 class="title">kawipiko -- blazingly fast static HTTP server</h1>
<h2 class="subtitle" id="kawipiko-archiver"><tt class="docutils literal"><span class="pre">kawipiko-archiver</span></tt></h2>
<pre class="literal-block">
&gt;&gt; kawipiko-archiver --help
&gt;&gt; kawipiko-archiver --man
</pre>
<pre class="literal-block">
--sources &lt;path&gt;
--archive &lt;path&gt;
--compress &lt;gzip | zopfli | brotli | identity&gt;
--compress-level &lt;number&gt;
--compress-cache &lt;path&gt;
--sources-cache &lt;path&gt;
--exclude-index
--exclude-strip
--exclude-cache
--include-etag
--exclude-slash-redirects
--include-folder-listing
--exclude-paths-index
--progress --debug
--version
--help (show this short help)
--man (show the full manual)
--sources-md5 (dump an ``md5sum`` of the sources)
--sources-cpio (dump a ``cpio.gz`` of the sources)
--sbom --sbom-text --sbom-json
</pre>
<hr class="docutils" />
<div class="section" id="flags">
<h1>Flags</h1>
<p><tt class="docutils literal"><span class="pre">--sources</span></tt></p>
<blockquote>
The path to the source folder that is the root of the static website content.</blockquote>
<p><tt class="docutils literal"><span class="pre">--archive</span></tt></p>
<blockquote>
The path to the target CDB file that contains the archived static content.</blockquote>
<p><tt class="docutils literal"><span class="pre">--compress</span></tt>, and <tt class="docutils literal"><span class="pre">--compress-level</span></tt></p>
<blockquote>
<p>Each individual file (and consequently the corresponding HTTP response body) is compressed with either <tt class="docutils literal">gzip</tt>, <tt class="docutils literal">zopfli</tt> or <tt class="docutils literal">brotli</tt>; by default (or alternatively with <tt class="docutils literal">identity</tt>) no compression is used.</p>
<p>Even if compression is explicitly requested, if the compression ratio is below a certain threshold (depending on the uncompressed size), the file is stored without any compression.
(It's senseless to force the client to spend time decompressing the response body if that time is not recovered during network transmission.)</p>
<p>The compression level can be chosen, the value depending on the algorithm:</p>
<ul class="simple">
<li><tt class="docutils literal">gzip</tt> -- <tt class="docutils literal"><span class="pre">-1</span></tt> for algorithm default, <tt class="docutils literal"><span class="pre">-2</span></tt> for Huffman only, <tt class="docutils literal">0</tt> to <tt class="docutils literal">9</tt> for fast to slow;</li>
<li><tt class="docutils literal">zopfli</tt> -- <tt class="docutils literal"><span class="pre">-1</span></tt> for algorithm default, <tt class="docutils literal">0</tt> to <tt class="docutils literal">30</tt> iterations for fast to slow;</li>
<li><tt class="docutils literal">brotli</tt> -- <tt class="docutils literal"><span class="pre">-1</span></tt> for algorithm default, <tt class="docutils literal">0</tt> to <tt class="docutils literal">9</tt> for fast to slow, <tt class="docutils literal"><span class="pre">-2</span></tt> for extreme;</li>
<li>(&quot;algorithm default&quot; means &quot;what that algorithm considers the recommended default compression level&quot;;)</li>
<li><tt class="docutils literal">kawipiko</tt> by default uses the maximum compression level for each algorithm (i.e. <tt class="docutils literal">9</tt> for <tt class="docutils literal">gzip</tt>, <tt class="docutils literal">30</tt> for <tt class="docutils literal">zopfli</tt>, and <tt class="docutils literal"><span class="pre">-2</span></tt> for <tt class="docutils literal">brotli</tt>);</li>
</ul>
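<p>As a minimal sketch of how these flags combine (the <tt class="docutils literal">./site</tt> and <tt class="docutils literal">./site.cdb</tt> paths are placeholders chosen for illustration), the following invocation would archive a folder with <tt class="docutils literal">gzip</tt> at level <tt class="docutils literal">6</tt>, trading some archive size for a faster build compared to the default maximum level:</p>
<pre class="literal-block">
&gt;&gt; kawipiko-archiver \
        --sources ./site \
        --archive ./site.cdb \
        --compress gzip \
        --compress-level 6 \
        --progress
</pre>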
</blockquote>
<p><tt class="docutils literal"><span class="pre">--compress-cache</span> &lt;path&gt;</tt>, and <tt class="docutils literal"><span class="pre">--sources-cache</span> &lt;path&gt;</tt></p>
<blockquote>
<p>At the given path a single file is created (that is a BBolt database), that will be used to cache the following information:</p>
<ul class="simple">
<li>in case of <tt class="docutils literal"><span class="pre">--sources-cache</span></tt>, the fingerprint of each file's contents is stored, so that if the file was not changed, re-reading it is not attempted unless absolutely necessary; also, if the file is small enough, its contents are stored in this database (deduplicated by fingerprint);</li>
<li>in case of <tt class="docutils literal"><span class="pre">--compress-cache</span></tt>, the compression outcome of each file's contents is stored (deduplicated by fingerprint), so that compression is done only once over multiple runs;</li>
</ul>
<p>Each of these caches can be safely reused between multiple related archives, especially when they have many files in common.
Each of these caches can be independently used (or shared).</p>
<p>Using these caches allows one to very quickly rebuild an archive when only a couple of files have been changed, without even touching the file-system for the unchanged ones.</p>
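<p>For illustration, a rebuild that uses both caches might look like the sketch below (all paths are hypothetical; each cache is a single file, as described above):</p>
<pre class="literal-block">
&gt;&gt; kawipiko-archiver \
        --sources ./site \
        --archive ./site.cdb \
        --compress zopfli \
        --sources-cache ./sources-cache.db \
        --compress-cache ./compress-cache.db
</pre>
<p>Running the same command again after changing only a few files should then re-read and re-compress just the changed ones.</p>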
</blockquote>
<p><tt class="docutils literal"><span class="pre">--exclude-index</span></tt></p>
<blockquote>
Disables using <tt class="docutils literal">_index.*</tt> and <tt class="docutils literal">index.*</tt> files (where <tt class="docutils literal">.*</tt> is one of <tt class="docutils literal">.html</tt>, <tt class="docutils literal">.htm</tt>, <tt class="docutils literal">.xhtml</tt>, <tt class="docutils literal">.xht</tt>, <tt class="docutils literal">.txt</tt>, <tt class="docutils literal">.json</tt>, and <tt class="docutils literal">.xml</tt>) to respond to a request whose URL path ends in <tt class="docutils literal">/</tt> (corresponding to the folder wherein the <tt class="docutils literal">_index.*</tt> or <tt class="docutils literal">index.*</tt> file is located).
(This can be used to implement &quot;slash&quot; blog style URLs like <tt class="docutils literal">/blog/whatever/</tt> which maps to <tt class="docutils literal">/blog/whatever/index.html</tt>.)</blockquote>
<p><tt class="docutils literal"><span class="pre">--exclude-strip</span></tt></p>
<blockquote>
Disables using a file with the suffix <tt class="docutils literal">.html</tt>, <tt class="docutils literal">.htm</tt>, <tt class="docutils literal">.xhtml</tt>, <tt class="docutils literal">.xht</tt>, and <tt class="docutils literal">.txt</tt> to respond to a request whose URL does not exactly match an existing file.
(This can be used to implement &quot;suffix-less&quot; blog style URLs like <tt class="docutils literal">/blog/whatever</tt> which maps to <tt class="docutils literal">/blog/whatever.html</tt>.)</blockquote>
<p><tt class="docutils literal"><span class="pre">--exclude-cache</span></tt></p>
<blockquote>
Disables adding a <tt class="docutils literal"><span class="pre">Cache-Control:</span> public, immutable, <span class="pre">max-age=3600</span></tt> header that forces the browser (and other intermediary proxies) to cache the response for an hour (the <tt class="docutils literal">public</tt> and <tt class="docutils literal"><span class="pre">max-age=3600</span></tt> arguments), and furthermore not request it even on reloads (the <tt class="docutils literal">immutable</tt> argument).</blockquote>
<p><tt class="docutils literal"><span class="pre">--include-etag</span></tt></p>
<blockquote>
<p>Enables adding an <tt class="docutils literal">ETag</tt> response header that contains the SHA256 of the response body.</p>
<p>By not including the <tt class="docutils literal">ETag</tt> header (i.e. the default), and because identical headers are stored only once, if one has many files of the same type (which, without <tt class="docutils literal">ETag</tt>, generate the same headers), this can lead to a significant reduction in stored header blocks, including reduced RAM usage.
(At this moment HTTP conditional requests are not supported, i.e. <tt class="docutils literal"><span class="pre">If-None-Match</span></tt>, <tt class="docutils literal"><span class="pre">If-Modified-Since</span></tt> and their counterparts; however this <tt class="docutils literal">ETag</tt> header might be used in conjunction with <tt class="docutils literal">HEAD</tt> requests to see if the resource has changed.)</p>
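<p>As a sketch of the <tt class="docutils literal">HEAD</tt> approach, assuming the archive was created with <tt class="docutils literal"><span class="pre">--include-etag</span></tt> and is being served at the hypothetical address <tt class="docutils literal"><span class="pre">http://127.0.0.1:8080</span></tt>, one could fetch only the response headers with <tt class="docutils literal">curl</tt> and compare the <tt class="docutils literal">ETag</tt> value against a previously recorded one:</p>
<pre class="literal-block">
&gt;&gt; curl --head http://127.0.0.1:8080/index.html
</pre>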
</blockquote>
<p><tt class="docutils literal"><span class="pre">--exclude-slash-redirects</span></tt></p>
<blockquote>
Disables adding redirects to/from paths with/without <cite>/</cite>.
(For example, by default, if <cite>/file</cite> exists, then there is also a <cite>/file/</cite> redirect towards <cite>/file</cite>; and vice-versa from <cite>/folder</cite> towards <cite>/folder/</cite>.)</blockquote>
<p><tt class="docutils literal"><span class="pre">--include-folder-listing</span></tt></p>
<blockquote>
Enables the creation of an internal list of folders.</blockquote>
<p><tt class="docutils literal"><span class="pre">--exclude-paths-index</span></tt></p>
<blockquote>
Disables the creation of an internal list of references that can be used in conjunction with the <tt class="docutils literal"><span class="pre">--index-all</span></tt> flag of the <tt class="docutils literal"><span class="pre">kawipiko-server</span></tt>.</blockquote>
<p><tt class="docutils literal"><span class="pre">--progress</span></tt></p>
<blockquote>
Enables periodic reporting of various metrics.</blockquote>
<p><tt class="docutils literal"><span class="pre">--debug</span></tt></p>
<blockquote>
Enables verbose logging.
It will log various information about the archived files (including compression statistics).</blockquote>
</div>
<div class="section" id="ignored-files">
<h1>Ignored files</h1>
<ul class="simple">
<li>any file with the following prefixes: <tt class="docutils literal">.</tt>, <tt class="docutils literal">#</tt>;</li>
<li>any file with the following suffixes: <tt class="docutils literal">~</tt>, <tt class="docutils literal">#</tt>, <tt class="docutils literal">.log</tt>, <tt class="docutils literal">.tmp</tt>, <tt class="docutils literal">.temp</tt>, <tt class="docutils literal">.lock</tt>;</li>
<li>any file that contains the following: <tt class="docutils literal">#</tt>;</li>
<li>any file that exactly matches the following: <tt class="docutils literal">Thumbs.db</tt>, <tt class="docutils literal">.DS_Store</tt>;</li>
<li>(at the moment these rules are not configurable through flags;)</li>
</ul>
</div>
<div class="section" id="wildcard-files">
<h1>Wildcard files</h1>
<p>A file whose name matches <tt class="docutils literal">_wildcard.*</tt> (i.e. with the prefix <tt class="docutils literal">_wildcard.</tt> and any other suffix) is used to respond to any request whose URL fails to find a &quot;better&quot; match.</p>
<p>These wildcard files respect the folder hierarchy, in that wildcard files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive).</p>
<p>In addition to <tt class="docutils literal">_wildcard.*</tt>, there is also support for <tt class="docutils literal">_200.html</tt> (or just <tt class="docutils literal">200.html</tt>), plus <tt class="docutils literal">_404.html</tt> (or just <tt class="docutils literal">404.html</tt>).</p>
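<p>As an illustration of the hierarchy rule, consider the hypothetical layout created below (the <tt class="docutils literal">./site</tt> folder and the empty placeholder files are assumptions made for this sketch).
Under the rules above, a request such as <tt class="docutils literal">/blog/missing</tt> that matches no file would be expected to be answered by <tt class="docutils literal">./site/blog/_wildcard.html</tt>, while <tt class="docutils literal">/missing</tt> would fall back to <tt class="docutils literal">./site/_wildcard.html</tt>.</p>
<pre class="literal-block">
&gt;&gt; mkdir -p ./site/blog
&gt;&gt; touch ./site/_wildcard.html
&gt;&gt; touch ./site/blog/_wildcard.html
</pre>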
</div>
<div class="section" id="redirect-files">
<h1>Redirect files</h1>
<p>A file named <tt class="docutils literal">_redirects</tt> (or <tt class="docutils literal">_redirects.txt</tt>) instructs the archiver to create redirect responses.</p>
<p>The syntax is quite simple:</p>
<pre class="literal-block">
# This is a comment.
# NOTE: Absolute paths are allowed only at the top of the sources folder.
/some-path https://example.com/ 301
# NOTE: Relative paths are always allowed, and are reinterpreted as relative to the containing folder.
./some-path https://example.com/ 302
# NOTE: Redirects only for a specific domain. (The protocol is irrelevant.)
# (Allowed only at the top of the sources folder.)
://example.com/some-path https://example.com/ 303
http://example.com/some-path https://example.com/ 307
https://example.com/some-path https://example.com/ 308
</pre>
</div>
<div class="section" id="symlinks-hardlinks-loops-and-duplicated-files">
<h1>Symlinks, hardlinks, loops, and duplicated files</h1>
<p>You can freely use symlinks (including ones pointing outside of the content root) and they will be crawled during archival respecting the &quot;logical&quot; hierarchy they introduce.
(Any loop that you introduce into the hierarchy will be ignored and a warning will be issued.)</p>
<p>You can safely symlink or hardlink the same file (or folder) in multiple places (within the content hierarchy), and its data will be stored only once.
(The same applies to duplicated files that have exactly the same data.)</p>
</div>
</div>
</body>
</html>