diff --git a/documentation/manuals/archiver.1.man b/documentation/manuals/archiver.1.man index 4b5245b..8d35914 100644 --- a/documentation/manuals/archiver.1.man +++ b/documentation/manuals/archiver.1.man @@ -139,7 +139,7 @@ Using these caches allows one to very quickly rebuild an archive when only a cou \fB\-\-exclude\-index\fP .INDENT 0.0 .INDENT 3.5 -Disables using \fBindex.*\fP files (where \fB\&.*\fP is one of \fB\&.html\fP, \fB\&.htm\fP, \fB\&.xhtml\fP, \fB\&.xht\fP, \fB\&.txt\fP, \fB\&.json\fP, and \fB\&.xml\fP) to respond to a request whose URL path ends in \fB/\fP (corresponding to the folder wherein \fBindex.*\fP file is located). +Disables using \fB_index.*\fP and \fBindex.*\fP files (where \fB\&.*\fP is one of \fB\&.html\fP, \fB\&.htm\fP, \fB\&.xhtml\fP, \fB\&.xht\fP, \fB\&.txt\fP, \fB\&.json\fP, and \fB\&.xml\fP) to respond to a request whose URL path ends in \fB/\fP (corresponding to the folder wherein \fB_index.*\fP or \fBindex.*\fP file is located). (This can be used to implement "slash" blog style URL\(aqs like \fB/blog/whatever/\fP which maps to \fB/blog/whatever/index.html\fP\&.) .UNINDENT .UNINDENT @@ -223,6 +223,35 @@ any file that exactly matches the following: \fBThumbs.db\fP, \fB\&.DS_Store\fP; By placing a file whose name matches \fB_wildcard.*\fP (i.e. with the prefix \fB_wildcard.\fP and any other suffix), it will be used to respond to any request whose URL fails to find a "better" match. .sp These wildcard files respect the folder hierarchy, in that wildcard files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive). +.sp +In addition to \fB_wildcard.*\fP, there is also support for \fB_200.html\fP (or just \fB200.html\fP), plus \fB_404.html\fP (or just \fB404.html\fP). +.SH REDIRECT FILES +.sp +By placing a file whose name is \fB_redirects\fP (or \fB_redirects.txt\fP), it instructs the archiver to create redirect responses. 
+.sp +The syntax is quite simple: +.INDENT 0.0 +.INDENT 3.5 +.sp +.nf +.ft C +# This is a comment. + +# NOTE: Absolute paths are allowed only at the top of the sources folder. +/some\-path https://example.com/ 301 + +# NOTE: Relative paths are always allowed, and are reinterpreted as relative to the containing folder. +\&./some\-path https://example.com/ 302 + +# NOTE: Redirects only for a specific domain. (The protocol is irrelevant.) +# (Allowed only at the top of the sources folder.) +://example.com/some\-path https://example.com/ 303 +http://example.com/some\-path https://example.com/ 307 +https://example.com/some\-path https://example.com/ 308 +.ft P +.fi +.UNINDENT +.UNINDENT .SH SYMLINKS, HARDLINKS, LOOPS, AND DUPLICATED FILES .sp You freely use symlinks (including pointing outside of the content root) and they will be crawled during archival respecting the "logical" hierarchy they introduce. diff --git a/documentation/manuals/archiver.rst b/documentation/manuals/archiver.rst index e659c95..a23113a 100644 --- a/documentation/manuals/archiver.rst +++ b/documentation/manuals/archiver.rst @@ -88,7 +88,7 @@ Flags ``--exclude-index`` - Disables using ``index.*`` files (where ``.*`` is one of ``.html``, ``.htm``, ``.xhtml``, ``.xht``, ``.txt``, ``.json``, and ``.xml``) to respond to a request whose URL path ends in ``/`` (corresponding to the folder wherein ``index.*`` file is located). + Disables using ``_index.*`` and ``index.*`` files (where ``.*`` is one of ``.html``, ``.htm``, ``.xhtml``, ``.xht``, ``.txt``, ``.json``, and ``.xml``) to respond to a request whose URL path ends in ``/`` (corresponding to the folder wherein ``_index.*`` or ``index.*`` file is located). (This can be used to implement "slash" blog style URL's like ``/blog/whatever/`` which maps to ``/blog/whatever/index.html``.) ``--exclude-strip`` @@ -152,6 +152,34 @@ By placing a file whose name matches ``_wildcard.*`` (i.e.
with the prefix ``_wi These wildcard files respect the folder hierarchy, in that wildcard files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive). +In addition to ``_wildcard.*``, there is also support for ``_200.html`` (or just ``200.html``), plus ``_404.html`` (or just ``404.html``). + + + + +Redirect files +.............. + +By placing a file whose name is ``_redirects`` (or ``_redirects.txt``), it instructs the archiver to create redirect responses. + +The syntax is quite simple: + +:: + + # This is a comment. + + # NOTE: Absolute paths are allowed only at the top of the sources folder. + /some-path https://example.com/ 301 + + # NOTE: Relative paths are always allowed, and are reinterpreted as relative to the containing folder. + ./some-path https://example.com/ 302 + + # NOTE: Redirects only for a specific domain. (The protocol is irrelevant.) + # (Allowed only at the top of the sources folder.) + ://example.com/some-path https://example.com/ 303 + http://example.com/some-path https://example.com/ 307 + https://example.com/some-path https://example.com/ 308 + diff --git a/documentation/manuals/archiver.txt b/documentation/manuals/archiver.txt index 1648a75..ad198ef 100644 --- a/documentation/manuals/archiver.txt +++ b/documentation/manuals/archiver.txt @@ -97,11 +97,12 @@ FLAGS the file-system for the unchanged ones. --exclude-index - Disables using index.* files (where .* is one of .html, .htm, - .xhtml, .xht, .txt, .json, and .xml) to respond to a request whose - URL path ends in / (corresponding to the folder wherein index.* file - is located). (This can be used to implement "slash" blog style - URL's like /blog/whatever/ which maps to /blog/whatever/index.html.) + Disables using _index.* and index.* files (where .* is one of .html, + .htm, .xhtml, .xht, .txt, .json, and .xml) to respond to a request + whose URL path ends in / (corresponding to the folder wherein + _index.* or index.* file is located).
(This can be used to + implement "slash" blog style URL's like /blog/whatever/ which maps + to /blog/whatever/index.html.) --exclude-strip Disables using a file with the suffix .html, .htm, .xhtml, .xht, and @@ -170,6 +171,29 @@ WILDCARD FILES files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive). + In addition to _wildcard.*, there is also support for _200.html (or + just 200.html), plus _404.html (or just 404.html). + +REDIRECT FILES + By placing a file whose name is _redirects (or _redirects.txt), it + instructs the archiver to create redirect responses. + + The syntax is quite simple: + + # This is a comment. + + # NOTE: Absolute paths are allowed only at the top of the sources folder. + /some-path https://example.com/ 301 + + # NOTE: Relative paths are always allowed, and are reinterpreted as relative to the containing folder. + ./some-path https://example.com/ 302 + + # NOTE: Redirects only for a specific domain. (The protocol is irrelevant.) + # (Allowed only at the top of the sources folder.) + ://example.com/some-path https://example.com/ 303 + http://example.com/some-path https://example.com/ 307 + https://example.com/some-path https://example.com/ 308 + SYMLINKS, HARDLINKS, LOOPS, AND DUPLICATED FILES You freely use symlinks (including pointing outside of the content root) and they will be crawled during archival respecting the "logical" diff --git a/sources/cmd/archiver/manual.txt b/sources/cmd/archiver/manual.txt index 1648a75..ad198ef 100644 --- a/sources/cmd/archiver/manual.txt +++ b/sources/cmd/archiver/manual.txt @@ -97,11 +97,12 @@ FLAGS the file-system for the unchanged ones. --exclude-index - Disables using index.* files (where .* is one of .html, .htm, - .xhtml, .xht, .txt, .json, and .xml) to respond to a request whose - URL path ends in / (corresponding to the folder wherein index.* file - is located).
(This can be used to implement "slash" blog style - URL's like /blog/whatever/ which maps to /blog/whatever/index.html.) + Disables using _index.* and index.* files (where .* is one of .html, + .htm, .xhtml, .xht, .txt, .json, and .xml) to respond to a request + whose URL path ends in / (corresponding to the folder wherein + _index.* or index.* file is located). (This can be used to + implement "slash" blog style URL's like /blog/whatever/ which maps + to /blog/whatever/index.html.) --exclude-strip Disables using a file with the suffix .html, .htm, .xhtml, .xht, and @@ -170,6 +171,29 @@ WILDCARD FILES files in (direct or transitive) subfolders override the wildcard file in their parents (direct or transitive). + In addition to _wildcard.*, there is also support for _200.html (or + just 200.html), plus _404.html (or just 404.html). + +REDIRECT FILES + By placing a file whose name is _redirects (or _redirects.txt), it + instructs the archiver to create redirect responses. + + The syntax is quite simple: + + # This is a comment. + + # NOTE: Absolute paths are allowed only at the top of the sources folder. + /some-path https://example.com/ 301 + + # NOTE: Relative paths are always allowed, and are reinterpreted as relative to the containing folder. + ./some-path https://example.com/ 302 + + # NOTE: Redirects only for a specific domain. (The protocol is irrelevant.) + # (Allowed only at the top of the sources folder.) + ://example.com/some-path https://example.com/ 303 + http://example.com/some-path https://example.com/ 307 + https://example.com/some-path https://example.com/ 308 + SYMLINKS, HARDLINKS, LOOPS, AND DUPLICATED FILES You freely use symlinks (including pointing outside of the content root) and they will be crawled during archival respecting the "logical"