derhuerst/_.md

## _.md

      
    curl-based HTTP mirroring script

This is a script that creatively uses the curl CLI to download an HTTP resource (colloquially "file"). It saves time & bandwidth whenever possible, but not at the expense of correctness.

Compares ETags to make sure that an unchanged resource is not transferred again, but a changed resource always is.
Requests a CE-coded (a.k.a. compressed, e.g. gzipped) representation of the resource, falling back to the "regular" one.
Supports continuing an interrupted download using conditional requests. In contrast to curl's plain -C - flag, this also works with CE-coded responses, and it falls back to a "full body" request if the resource has changed in the meantime.
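
The choice between a fresh download and a continuation can be sketched as a small pure function. This is only a minimal sketch that loosely follows how mirror.mjs assembles its curl flags; `buildCurlArgs` and its parameter names (`hasPartialFile`, `savedEtag`) are made up for illustration and do not appear in the script.

```javascript
// Sketch: pick curl flags depending on the local state.
// `buildCurlArgs`, `hasPartialFile` & `savedEtag` are hypothetical names.
const buildCurlArgs = ({url, rawDestPath, etagPath, hasPartialFile, savedEtag}) => {
	const args = [
		url,
		'-f', '-L', '-s', '-S',
		'-H', 'Accept-Encoding: gzip', // ask for a CE-coded representation
		'-o', rawDestPath,
	]
	if (hasPartialFile && savedEtag) {
		// resume, but only if the server still has the same "version"
		args.push('-C', '-', '-H', `If-Range: ${savedEtag}`)
	} else if (hasPartialFile) {
		// resume without a validator, remember the ETag for next time
		args.push('-C', '-', '--etag-save', etagPath)
	} else {
		// fresh download, remember the ETag for next time
		args.push('--etag-save', etagPath)
	}
	return args
}
```

With a partial file and a stored ETag, the result contains `-C -` and an `If-Range` header; without a partial file it contains `--etag-save` instead.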

People asked me: Why not use wget for this?

wget does not store the resource's ETag, so it cannot compare it when re-requesting.
wget's -c (continuation) and -N (timestamping using Last-Modified) flags don't work together.
I witnessed some subtle but significant bug in the -c (continuation) implementation once. I can't remember the details anymore, unfortunately.

server side

For all of the above features to work, you need a server that supports

serving pre-compressed sidecar files (i.e. a statically compressed file next to the original) as CE-coded;
range requests, both on the "regular" file as well as the pre-compressed one;
conditional requests, specifically If-Range with ETags.
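
To check whether a server meets these requirements, one could inspect the response headers of a ranged request for the compressed representation. The following is a rough sketch, not part of mirror.mjs: `parseHeaders` and `checkServerSupport` are hypothetical helpers, and the input is assumed to be the header dump of a single response (no redirects) to a request sent with `Accept-Encoding: gzip` and `Range: bytes=0-99`.

```javascript
// Sketch: parse a raw header dump (as written by `curl -D`) into a Map.
// `parseHeaders` & `checkServerSupport` are hypothetical helpers.
const parseHeaders = (raw) => {
	const headers = new Map()
	for (const line of raw.split(/\r?\n/)) {
		const sep = line.indexOf(':')
		if (sep <= 0) continue // skip the status line & blank lines
		headers.set(line.slice(0, sep).trim().toLowerCase(), line.slice(sep + 1).trim())
	}
	return headers
}

const checkServerSupport = (raw) => {
	const statusLine = raw.split(/\r?\n/)[0]
	const status = parseInt(statusLine.split(' ')[1], 10)
	const headers = parseHeaders(raw)
	return {
		// the pre-compressed sidecar file was served as CE-coded
		ceCoded: headers.get('content-encoding') === 'gzip',
		// the range request was honoured (206 Partial Content)
		rangeHonoured: status === 206 && headers.has('content-range'),
		// there is an ETag to use with If-Range later on
		hasEtag: headers.has('etag'),
	}
}
```

Such a header dump can be obtained with e.g. `curl -s -o /dev/null -D - -H 'Accept-Encoding: gzip' -H 'Range: bytes=0-99' <url>`. If any of the three fields is `false`, some of the features described above will probably not work against that server.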

For testing purposes, we create a test file:
yes | head -n 50000000 >/var/www/y.txt
gzip -k /var/www/y.txt
ls -lh /var/www
# -rw-r--r--   1 j  staff    95M Aug  8 16:54 y.txt
# -rw-r--r--   1 j  staff    95K Aug  8 16:54 y.txt.gz
Caddy

A recently fixed bug with a wrong ETag aside, Caddy v2 does this with the following Caddyfile:
localhost:8080 {
	root * /var/www
	file_server browse {
		precompressed gzip
	}
}
nginx

I couldn't find much about this topic. 😔 A response on the mailing list reads like nginx does not support range requests on pre-compressed files, because there are (non-trivial) problems with dynamically compressed responses:


Note also that it's impossible to ungzip a response part if you have not preceding parts from the very start.

This as well applies to many other types of data.
The main problem with Content-Encoding and ranges is that one somehow should be able to reproduce exactly the same entity-body  (or at least make sure cache validators would change on entity-body change). This is not something trivial when you  compress on the fly with possible different compression options.
I personally think that moving towards using Transfer-Encoding would be a good step for "on the fly" compression. But browser support seems to be not here at all.
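
The first point of the quote is easy to reproduce locally. The following sketch uses Node's built-in `node:zlib` module on made-up test data (not on a real HTTP response): a complete gzip stream decodes fine, whereas its second half, roughly what a byte-range request on the compressed file would return, cannot be decoded on its own. This is why mirror.mjs appends the missing bytes to the partial file and only runs gunzip once the file is complete.

```javascript
import {gzipSync, gunzipSync} from 'node:zlib'

// made-up test data, similar to the y.txt test file
const original = Buffer.from('y\n'.repeat(100000))
const compressed = gzipSync(original)

// The complete stream round-trips.
const roundTripped = gunzipSync(compressed)

// The second half of the stream lacks the gzip header & the preceding data,
// so decoding it in isolation is expected to fail.
let tailError = null
try {
	gunzipSync(compressed.subarray(Math.floor(compressed.length / 2)))
} catch (err) {
	tailError = err
}

console.log('full stream decodable:', roundTripped.equals(original))
console.log('second half decodable:', tailError === null)
```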

TLDR: The following nginx config file enables every aspect but the range requests:
server {
	listen 80 default_server;
	listen [::]:80 default_server;
	server_name _;

	root /var/www;
	gzip_static on;
	gzip_vary on;

	location / {
		try_files $uri $uri/ =404;
	}
}
usage

The following script

downloads the resource into a temp file (/tmp/mirror-${hash(url)}, with hash(url) being the first 10 hex characters of the URL's SHA-256 digest) and stores the ETag & response headers next to it (/tmp/mirror-${hash(url)}.etag & /tmp/mirror-${hash(url)}-${randomHex()}.headers),
if applicable, decompresses the file (into /tmp/mirror-${hash(url)}-${randomHex()}.decompressed),
copies the (decompressed) file to the actual destination path only once it is complete, so that the destination path never holds a partially downloaded file.
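
The naming scheme of these temp files can be sketched as follows. This is a simplified illustration, assuming the `/tmp/mirror-` prefix used in this README and a truncated SHA-256 digest as in mirror.mjs; `tmpPathsFor` is a hypothetical helper, not a function of the script.

```javascript
import {createHash, randomBytes} from 'node:crypto'

// Sketch: derive the temp file paths for a URL.
// `tmpPathsFor` and the default `prefix` are illustrative.
const tmpPathsFor = (url, prefix = '/tmp/mirror-') => {
	// stable per URL, so that a later run finds the partial download again
	const urlHash = createHash('sha256').update(url).digest('hex').slice(0, 10)
	// random per call, so that files of separate runs don't clash
	const randomHex = () => randomBytes(3).toString('hex')
	return {
		raw: `${prefix}${urlHash}`,
		etag: `${prefix}${urlHash}.etag`,
		headers: `${prefix}${urlHash}-${randomHex()}.headers`,
		decompressed: `${prefix}${urlHash}-${randomHex()}.decompressed`,
	}
}

console.log(tmpPathsFor('http://localhost:8080/y.txt'))
```

The raw download and the ETag file get stable names because they must survive between runs, while the headers and the decompressed file are only needed within one run.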

demo

To demonstrate that it works as intended, we abort it in between:
export LOG_LEVEL=debug

./mirror.mjs 'http://localhost:8080/y.txt' y.txt
# {
#   destPath: 'y.txt',
#   rawDestPath: '/tmp/mirror-15c86ece76',
#   headersPath: '/tmp/mirror-15c86ece76-f57746.headers',
#   etagPath: '/tmp/mirror-15c86ece76.etag'
# }
# /tmp/mirror-15c86ece76 does not exist
# /tmp/mirror-15c86ece76 does not exist, downloading "regularly" & saving ETag
# curl http://localhost:8080/y.txt -f -L -s -S -H Accept-Encoding: gzip -D /tmp/mirror-15c86ece76-f57746.headers -o /tmp/mirror-15c86ece76 --etag-save /tmp/mirror-15c86ece76.etag { stdio: [ 'ignore', 'inherit', 'inherit' ] }

# we abort the download half way:
^C
ls -lh /tmp/mirror-15c86ece76
# -rw-r--r--   1 j  staff    40M Aug  9 15:50 mirror-15c86ece76

# and then continue it by re-running the script:
./mirror.mjs 'http://localhost:8080/y.txt' y.txt
# {
#   destPath: 'y.txt',
#   rawDestPath: '/tmp/mirror-15c86ece76',
#   headersPath: '/tmp/mirror-15c86ece76-f57746.headers',
#   etagPath: '/tmp/mirror-15c86ece76.etag'
# }
# /tmp/mirror-15c86ece76 exists
# /tmp/mirror-15c86ece76 exists, continuing download
# curl http://localhost:8080/y.txt -f -L -s -S -H Accept-Encoding: gzip -D /tmp/mirror-15c86ece76-f57746.headers -o /tmp/mirror-15c86ece76 -C - -H If-Range: "rg7r1m6dvsyb" { stdio: [ 'ignore', 'inherit', 'inherit' ] }
# curl exited { status: 0, … }
# file is fully downloaded
# downloaded file is CE-coded, decompressing
# gunzip { stdio: [ 22, 23, 'inherit' ] }
# gunzip exited { status: 0, … }
# copying processed download file to destination path
# cp /tmp/mirror-15c86ece76-153154.decompressed y.txt { stdio: [ 'ignore', 'ignore', 'inherit' ] }
# cp exited { status: 0, … }
# done!

# we check if the file has been downloaded correctly:
shasum /var/www/y.txt y.txt
f1f40059b87621eca87321c4436747d75ecaebbf  /var/www/y.txt
f1f40059b87621eca87321c4436747d75ecaebbf  y.txt
Now that we have downloaded the file, let's emulate the file changing on the server by changing the ETag stored locally:
echo '"foo"' >/tmp/mirror-15c86ece76.etag

./mirror.mjs 'http://localhost:8080/y.txt' y.txt
# {
#   destPath: 'y.txt',
#   rawDestPath: '/tmp/mirror-15c86ece76',
#   headersPath: '/tmp/mirror-15c86ece76-f57746.headers',
#   etagPath: '/tmp/mirror-15c86ece76.etag'
# }
# /tmp/mirror-15c86ece76 exists
# /tmp/mirror-15c86ece76 exists, continuing download
# curl http://localhost:8080/y.txt -f -L -s -S -H Accept-Encoding: gzip -D /tmp/mirror-15c86ece76-f57746.headers -o /tmp/mirror-15c86ece76 -C - -H If-Range: "foo" { stdio: [ 'ignore', 'inherit', 'inherit' ] }
# curl: (33) HTTP server doesn't seem to support byte ranges. Cannot resume.
# curl exited { status: 33, … }
# file download couldn't be continued, server responded with 200 & full body; starting "regular" download
# curl http://localhost:8080/y.txt -f -L -s -S -H Accept-Encoding: gzip -D /tmp/mirror-15c86ece76-f57746.headers -o /tmp/mirror-15c86ece76 --etag-save /tmp/mirror-15c86ece76.etag { stdio: [ 'ignore', 'inherit', 'inherit' ] }
# curl exited { status: 0, … }
# file is fully downloaded
# downloaded file is CE-coded, decompressing
# gunzip { stdio: [ 22, 23, 'inherit' ] }
# gunzip exited { status: 0, … }
# copying processed download file to destination path
# cp /tmp/mirror-15c86ece76-7cb151.decompressed y.txt { stdio: [ 'ignore', 'ignore', 'inherit' ] }
# cp exited { status: 0, … }
# done!
It requested a full "regular" (re-)download because the If-Range header did not match: the locally stored ETag differs from the server's.
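
How the script reacts to the outcome of a resume attempt can be condensed into a small decision function. This is a simplified sketch of the logic in mirror.mjs; `nextStep` and its return values are hypothetical names, while the two exit codes are the ones documented in curl's man page.

```javascript
// curl exit codes, see the EXIT CODES section of curl's man page
const HTTP_PAGE_NOT_RETRIEVED = 22 // HTTP status >= 400 while using -f
const RANGE_CMD_DIDNT_WORK = 33 // server sent the full body instead of a range

// Sketch: decide what to do after `curl -C - -H 'If-Range: …'` has exited.
// `fullyDownloaded` is true, false or null (unknown).
// `nextStep` and its return values are hypothetical.
const nextStep = (curlExitCode, fullyDownloaded) => {
	// the requested bytes have been written
	if (curlExitCode === 0) return 'process'
	// ETag mismatch: the server responded with 200 & the full body
	if (curlExitCode === RANGE_CMD_DIDNT_WORK) return 'restart'
	// e.g. 416: fine only if there was nothing left to download
	if (curlExitCode === HTTP_PAGE_NOT_RETRIEVED && fullyDownloaded === true) return 'process'
	return 'fail'
}
```

In the run above, curl exited with status 33, which corresponds to the `'restart'` branch.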
If we re-run it without changing the ETag again, it will refrain from re-downloading the file:
./mirror.mjs 'http://localhost:8080/y.txt' y.txt
# {
#   destPath: 'y.txt',
#   rawDestPath: '/tmp/mirror-15c86ece76',
#   headersPath: '/tmp/mirror-15c86ece76-f57746.headers',
#   etagPath: '/tmp/mirror-15c86ece76.etag'
# }
# /tmp/mirror-15c86ece76 exists
# /tmp/mirror-15c86ece76 exists, continuing download
# curl http://localhost:8080/y.txt -f -L -s -S -H Accept-Encoding: gzip -D /tmp/mirror-15c86ece76-f57746.headers -o /tmp/mirror-15c86ece76 -C - -H If-Range: "rg7r1m6dvsyb" { stdio: [ 'ignore', 'inherit', 'inherit' ] }
# curl: (22) The requested URL returned error: 416
# curl exited { status: 22, … }
# server-reported size 32601485
# downloaded size 32601485
# file is fully downloaded
# downloaded file is CE-coded, decompressing
# gunzip { stdio: [ 22, 23, 'inherit' ] }
# gunzip exited { status: 0, … }
# copying processed download file to destination path
# cp /tmp/mirror-15c86ece76-fd6f2d.decompressed y.txt { stdio: [ 'ignore', 'ignore', 'inherit' ] }
# cp exited { status: 0, … }
# done!
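
In this last run, the server responded with 416 because the local file already contains all bytes. A 416 response can carry a `Content-Range: bytes */<complete length>` header, which the script compares with the size of the local file. A minimal sketch of that check, with hypothetical helper names (`completeLengthFrom416`, `isComplete`) and the local file size passed in as a plain number:

```javascript
// Sketch: extract <complete length> from `Content-Range: bytes */<complete length>`.
// `completeLengthFrom416` & `isComplete` are hypothetical helpers.
const completeLengthFrom416 = (rawHeaders) => {
	const match = /^Content-Range:\s*bytes\s+\*\/(\d+)\s*$/im.exec(rawHeaders)
	return match ? parseInt(match[1], 10) : null
}

// returns true/false, or null if the server did not report a length
const isComplete = (rawHeaders, bytesOnDisk) => {
	const completeLength = completeLengthFrom416(rawHeaders)
	if (completeLength === null) return null
	return completeLength === bytesOnDisk
}
```

If the sizes match, the download is treated as complete and only the decompress & copy steps are run again, as seen in the log above.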

  
## mirror.mjs
#!/usr/bin/env node
// curl-based HTTP mirroring script
// Jannis R <[email protected]>
// from https://gist.github.com/derhuerst/745cf09fe5f3ea2569948dd215bbfe1a

import {parseArgs} from 'node:util'
import {basename} from 'node:path'
import {createHash, randomBytes} from 'node:crypto'
import {
	accessSync, constants,
	readFileSync,
	statSync,
	openSync, closeSync,
	utimesSync,
} from 'node:fs'
import {spawnSync} from 'node:child_process'
import {strictEqual} from 'node:assert'

// curl errors
// HTTP page not retrieved. The requested url was not found or returned another error with the HTTP error code being 400 or above.
const HTTP_PAGE_NOT_RETRIEVED = 22
// HTTP range error. The range "command" didn't work.
const RANGE_CMD_DIDNT_WORK = 33

const args = parseArgs({
	options: {
		help: {
			type: 'boolean',
			short: 'h',
		},
		'tmp-prefix': {
			type: 'string',
		},
		'log-level': {
			type: 'string',
			short: 'l',
		},
		'debug-curl': {
			type: 'boolean',
		},
		'times': {
			type: 'boolean',
		},
	},
	allowPositionals: true,
})
if (args.values.help) {
	process.stdout.write(`\
curl-mirror.mjs [--tmp-prefix …] [--log-level …] [--debug-curl] [--times] <url> <dest-path> [-- curl-opts...]
`)
	process.exit(0)
}

const url = args.positionals[0]
if (!url) {
	process.stderr.write('missing 1st argument: url\n')
	process.exit(1)
}
const destPath = args.positionals[1]
if (!destPath) {
	process.stderr.write('missing 2nd argument: dest-path\n')
	process.exit(1)
}
const additionalCurlArgs = args.positionals.slice(2)

const tmpPrefix = 'tmp-prefix' in args.values
	? args.values['tmp-prefix']
	: `/tmp/${basename(destPath)}.mirror-`

const ERROR = 0
const WARN = 1
const INFO = 2
const DEBUG = 3
const LOG_LEVEL = new Map([
	['warn', WARN],
	['info', INFO],
	['debug', DEBUG],
]).get(args.values['log-level'] || process.env.LOG_LEVEL) || WARN

const DEBUG_CURL = process.env.DEBUG_CURL === 'true' || Boolean(args.values['debug-curl'])

const fileExists = (path) => {
	try {
		accessSync(path, constants.R_OK | constants.W_OK) // check read/write access
		if (LOG_LEVEL >= DEBUG) console.debug(path + ' exists')
		return true
	} catch (err) {
		if (err.code !== 'ENOENT') throw err
	}
	if (LOG_LEVEL >= DEBUG) console.debug(path + ' does not exist')
	return false
}

const exitWithError = (err) => {
	if (LOG_LEVEL >= ERROR) console.error(err)
	process.exit(1)
}

const defaultIsOkExitCode = exitCode => exitCode === 0
const run = (cmd, args, opts, isOkExitCode = defaultIsOkExitCode) => {
	if (LOG_LEVEL >= DEBUG) console.debug(cmd, ...args, opts)
	const proc = spawnSync(cmd, args, opts)
	if (LOG_LEVEL >= DEBUG) console.debug(cmd, 'exited', proc)
	// for some reason, proc.error is not always populated, e.g. if curl failed with status code 22 (HTTP 416)
	if (proc.error) throw proc.error
	// so we mimic https://github.com/sindresorhus/execa/blob/c2114519066057414d47a2bed46f17df2c68219d/lib/error.js here
	if (!isOkExitCode(proc.status)) {
		const _cmd = `${cmd} ${args.join(' ')}`
		const err = new Error(`cmd failed with exit code ${proc.status}: ${_cmd}`)
		// todo: add stdout & stderr to err msg
		err.command = _cmd
		err.exitCode = proc.status
		err.stdout = proc.stdout
		err.stderr = proc.stderr
		err.process = proc
		throw err
	}
	return proc
}

const isFullyDownloaded = (destPath, responseHeaders) => {
	// https://httpwg.org/specs/rfc7233.html#header.content-range
	const unsatisfiedRange = /Content-Range:\s+bytes\s\*\/(.+)/i
	const contentRange = responseHeaders.match(unsatisfiedRange)
	if (!contentRange) return null // unknown
	const completeLength = parseInt(contentRange[1])

	const {size: bytesDownloaded} = statSync(destPath)
	if (LOG_LEVEL >= DEBUG) {
		console.debug('server-reported size', completeLength)
		console.debug('downloaded size', bytesDownloaded)
	}
	return bytesDownloaded === completeLength
}

// modified from parsehttpdate, (c) 2018-2021 Pimm "de Chinchilla" Hogeling, MIT-licensed
// https://github.com/Pimm/parseHttpDate/blob/npm-1.0.11/index.js

// (The number of seconds may start with a 6 because of leap seconds.)
const _httpDatePattern = /^[F-W][a-u]{2}, [0-3]\d (?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec) \d{4} [0-2]\d:[0-5]\d:[0-6]\d GMT$/;
//                         ⎣day o' week⎦  ⎣date ⎦ ⎣                      month                      ⎦ ⎣yr ⎦ ⎣hour ⎦ ⎣ min ⎦ ⎣ sec ⎦
//                        J F M A M J J A S O N D
const _httpMonthsNames = 'anebarprayunulugepctovec';
//                         u r c i   e y u t o e e
//                         a u h l       s e b m m
//                         r a           t m e b b
//                         y r             b r e e
//                           y             e   r r
//                                         r

// parses e.g. `Tue, 15 Nov 1994 08:12:31 GMT`
const parseHttpDate = (httpDate) => {
	if (false == _httpDatePattern.test(httpDate)) {
		return NaN
	}
	return Date.UTC(
		parseInt(httpDate.substring(12, 16), 10),
		// Skip over the first character of the month abbreviation, as we can safely detect the name by the second and third character only.
		_httpMonthsNames.indexOf(httpDate.substring(9, 11)) >> 1,
		parseInt(httpDate.substring(5, 7), 10),
		parseInt(httpDate.substring(17, 19), 10),
		parseInt(httpDate.substring(20, 22), 10),
		parseInt(httpDate.substring(23, 25), 10)
	)
}
strictEqual(parseHttpDate('Sun, 06 Nov 1994 08:49:37 GMT'), Date.parse('1994-11-06T08:49:37.000Z'))
strictEqual(parseHttpDate('Wed, 21 Oct 2015 07:28:00 GMT'), Date.parse('2015-10-21T07:28:00.000Z'))

// todo: include dest dir in hash?
// todo: include request headers in hash
const urlHash = createHash('sha256').update(url).digest('hex').slice(0, 10)
const tmpFilePath = (random = false, suffix = '') => {
	return [
		tmpPrefix,
		urlHash,
		...(random ? ['-' + randomBytes(3).toString('hex')] : []),
		...(suffix ? ['.' + suffix] : []),
	].join('')
}

const rawDestPath = tmpFilePath()

// The headers are only needed within one run, so they go into a randomly named file that respects --tmp-prefix.
const headersPath = tmpFilePath(true, 'headers')
const readHeaders = () => {
	return readFileSync(headersPath, {encoding: 'utf8'})
}

const etagPath = tmpFilePath(false, 'etag')
const readEtag = () => {
	try {
		return readFileSync(etagPath, {encoding: 'utf8'}).trim()
	} catch (err) {
		if (err.code === 'ENOENT') return null
		throw err
	}
}

if (LOG_LEVEL >= DEBUG) {
	console.debug({destPath, rawDestPath, headersPath, etagPath})
}

// Because the HTTP RFCs define `Content-Encoding` (CE) as being a property of the entity, range requests *do not* "make sense" on CE-coded files. Therefore continuing an interrupted download is only possible with a *non-CE-coded* representation of the resource. `Transfer-Encoding` would cleanly solve this problem, but unfortunately it is not widely supported in web servers and has no equivalent in HTTP/2 and HTTP/3 (yet?).
// Also, because a CE-coded entity has a different `ETag` than its un-CE-coded equivalent, we *cannot* re-use the CE-coded `ETag` to continue downloading from the un-CE-coded entity, in order to make sure we're still downloading the same "version" of the resource!
// more details:
// - https://github.com/golang/go/issues/30829#issuecomment-476694405
// - https://github.com/httpwg/http2-spec/issues/445
// Thus, we can only use CE-coding when downloading in one go (and start over after an interruption), and support continuation *for non-CE-coded entities only*.

const baseCurlArgs = [
	url,
	'-f', // fail on HTTP errors
	'-L', // follow redirects
	...(DEBUG_CURL
		? ['-v', '-#'] // show headers & one-line progress bar
		: ['-s', '-S'] // silent mode, but show errors
	),
	'-H', 'Accept-Encoding: gzip', // request CE-coded entity (but don't decode it)
	'-D', headersPath, // dump headers into a file
	'-o', rawDestPath,
]

const curlArgs = []
if (fileExists(rawDestPath)) {
	if (LOG_LEVEL >= INFO) console.info(`${rawDestPath} exists, continuing download`)

	// $rawDestPath exists, continue downloading
	curlArgs.push('-C', '-')

	// With an *existing* ETag file and an unfinished download, curl --etag-compare *does not* continue the download, because the server reports 304 Not Modified.
	// related: https://curl.se/mail/archive-2020-03/0049.html
	// curlArgs.push('--etag-compare', etagPath)
	const etag = readEtag()
	if (etag === null) {
		curlArgs.push('--etag-save', etagPath)
	} else {
		curlArgs.push('-H', `If-Range: ${etag}`)
	}

	// todo: `-z $rawDestPath`
} else {
	if (LOG_LEVEL >= INFO) {
		console.info(`${rawDestPath} does not exist, downloading "regularly" & saving ETag`)
	}

	// With an *existing* ETag file and an unstarted download, curl --etag-compare *does not* download, because the server reports 304 Not Modified.
	// related: https://curl.se/mail/archive-2020-03/0049.html
	// curlArgs.push('--etag-compare', etagPath)

	curlArgs.push('--etag-save', etagPath)
}

try {
	const isOkExitCode = (exitCode) => [
		0,
		HTTP_PAGE_NOT_RETRIEVED,
		RANGE_CMD_DIDNT_WORK,
	].includes(exitCode)
	let curlProc = run('curl', [
		...baseCurlArgs,
		...curlArgs,
		...additionalCurlArgs,
	], {
		// todo: on HTTP_PAGE_NOT_RETRIEVED, don't let curl log to stderr
		stdio: ['ignore', 'inherit', 'inherit'],
	}, isOkExitCode)

	let headers = readHeaders()

	if (
		curlProc.status === HTTP_PAGE_NOT_RETRIEVED &&
		!isFullyDownloaded(rawDestPath, headers)
	) {
		throw new Error(`file download couldn't be continued, server responded with 416`)
	}

	// If the etag doesn't match (because the entity has changed) and the server returns the full body with 200, curl refuses to overwrite the whole file.
	if (curlProc.status === RANGE_CMD_DIDNT_WORK) {
		if (LOG_LEVEL >= INFO) {
			console.info(`file download couldn't be continued, server responded with 200 & full body; (re-)starting "regular" download`)
		}
		// We re-run curl here, with a full "regular" download. The server has a new file *anyways*, so we don't need to send an ETag.

		curlProc = run('curl', [
			...baseCurlArgs,
			'--etag-save', etagPath,
			...additionalCurlArgs,
		], {
			stdio: ['ignore', 'inherit', 'inherit'],
		})

		headers = readHeaders()
	}

	if (LOG_LEVEL >= INFO) console.info('file is fully downloaded')
	let processedPath = rawDestPath

	// todo:
	// > The HTTP/1.1 standard also recommends that the servers supporting this content-encoding should recognize x-gzip as an alias, for compatibility purposes.
	const contentEncoding = /Content-Encoding:\s+(.+)/gi.exec(headers)
	if (contentEncoding) {
		const encoding = contentEncoding[1]
		if (encoding !== 'gzip') {
			throw new Error(`invalid/unsupported Content-Encoding: ${encoding}`)
		}
		if (LOG_LEVEL >= INFO) console.info('downloaded file is CE-coded, decompressing')

		const decompressedPath = tmpFilePath(true, 'decompressed')

		const rawDestFd = openSync(processedPath, 'r')
		const decompressedFd = openSync(decompressedPath, 'wx') // fail if exists
		run('gunzip', [], {
			stdio: [
				rawDestFd, // stdin
				decompressedFd, // stdout
				'inherit',
			],
		})
		closeSync(rawDestFd)
		closeSync(decompressedFd)

		processedPath = decompressedPath
	}

	if (LOG_LEVEL >= INFO) console.info('copying processed download file to destination path')

	const runCp = (flags, src, dest) => {
		return run('cp', [...flags, src, dest], {
			stdio: ['ignore', 'ignore', 'inherit'],
		})
	}
	const cpFlags = []
	// use copy-on-write if cp & the file system support it
	try {
		if (process.platform === 'linux') { // GNU/Linux
			// note: cp from GNU coreutils 9+ does this automatically
			// see also https://unix.stackexchange.com/a/152639
			cpFlags.push('--reflink=auto')
		} else if (process.platform === 'darwin') { // macOS
			cpFlags.push('-c')
		}
		runCp(cpFlags, processedPath, destPath)
	} catch (err) {
		if (LOG_LEVEL >= DEBUG) {
			console.debug(`using copy-on-write (${cpFlags.join(' ')}) failed:`, err?.message)
			console.debug('using plain cp instead')
		}
		runCp([], processedPath, destPath)
	}

	const lastModified = /Last-Modified:\s+(.+)/gi.exec(headers)
	if (args.values.times) {
		if (lastModified === null) {
			console.warn('cannot set file mtime: response has no Last-Modified header')
		} else {
			const timeModified = parseHttpDate(lastModified[1])
			if (Number.isNaN(timeModified)) {
				console.warn('cannot set file mtime: failed to parse the Last-Modified time:', lastModified[1])
			} else {
				const mtime = Math.ceil(timeModified / 1000)
				if (LOG_LEVEL >= DEBUG) {
					console.debug(`changing atime & mtime to ${mtime} (${new Date(timeModified).toISOString()})`)
				}
				utimesSync(destPath, mtime, mtime)
			}
		}
	}

	if (LOG_LEVEL >= INFO) console.info('mirrored successfully!')
} catch (err) {
	exitWithError(err)
}