Skip to content

Instantly share code, notes, and snippets.

@maaku
Last active May 30, 2022 14:33
Show Gist options
  • Save maaku/8996338 to your computer and use it in GitHub Desktop.
Save maaku/8996338 to your computer and use it in GitHub Desktop.
*.o
decode-ntxid
encode-ntxid
verify-crc-gp

  BIP: X
  Title: Base-32 error correction coding
  Author: Mark Friedenbach <[email protected]>
  Status: Draft
  Type: Standards Track
  Created: 20-02-2014

Table of Contents

Abstract

The BIP proposes an human-centered encoding format for base-32 data serialization. It differs from the presumptive default hexadecimal or base58 encodings in the following ways:

1. Visually distinctive in that it includes the full range of alphanumeric digits in its base-32 encoding, except the characters 0, l, v, and 2. which are too easily confused with 1, i, u, r, or z in font or handwriting.

2. Automatic correction of up to 1 transcription error per 31 coded digits (130 bits of payload data). For a 256-bit hash or secret key, this enables seamless recovery from up to two transcription errors so long as they occur in separate halves of the coded representation.

3. Highly probable detection of errors beyond the error correction threshold, with a false negative rate on the order of 25 bits, or 1 in 33 million likelihood.

4. Case-insensitive encoding ensures that it may be displayed in an easier to read uniform case, and it is faster and more comfortable to vocally read off a base-32 encoded number than the alternatives of hexadecimal or base58.

In addition to the error correction code transformation of base-32 data, a padding scheme is specified for extending numbers or bit vectors of any length to a multiple of 5 bits suitable for base-32 encoding.

z-base-32

The bitcoin reference client already has one implementation of base-32 encoding following the RFC 3548 standard, using the following alphabet:

    const char *pbase32 = "abcdefghijklmnopqrstuvwxyz234567";

For error correction coded strings this BIP specifies usage of Phil Zimmermann's z-base-32 encoding alphabet[1], which provides better resistance to transcriptive errors than the RFC 3548 standard:

    const char *pzbase32 = "ybndrfg8ejkmcpqxot1uwisza345h769";

The same RFC 3548 coder is used for z-base-32, except that unnecessary '=' padding characters are stripped before performing the alphabet substitution. For example, the hexadecimal string 'ae653be0049be3' is RFC 3548 encoded as 'vzstxyaetprq====', and z-base-32 encoded as 'i31uzayruxto'.

CRC-5-USB error correction coding

Herein we describe an error correction encoding using cyclic redundancy check polynomial division[2], which requires 5 error correction digits per 26 digits of input, instead of the theoretically optimal 4, but is much, much easier to implement correctly then available non-patented error correction codes. Cyclic redundancy check polynomial division provides a very straightforward, patent-free mechanism for reliably detecting transcription errors in input, and performing up to 1-digit corrections per 26 digit block.

Encoding

The input to this error correction encoder is a sequence of 26 base-32 digits. These digits are decoded into 5-bit unsigned integers with values equal to their offset into the base-32 alphabet string. If the input is less than 26 digits in length, it is extended with zero-valued digits. If For example, the string 'vzstxyaetprq' using the RFC 3548 alphabet becomes the code point sequence:

    <21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0>'
     |--------------input-------------|---------padding----------|

Expositionally it helps to think of this array as a 26-element column vector of 5-bit binary integers:

    <0b10101
     0b11001
     0b10010
     0b10011
     0b10111
     0b11000
     0b00000
     0b00100
     0b10011
     0b01111
     0b10001
     0b10000
     ... 14 zero elements ...>

If we explode the bits of each element into 5, 1-bit columns, we get a 26 x 5 matrix:

    <1 0 1 0 1
     1 1 0 0 1
     1 0 0 1 0
     1 0 0 1 1
     1 0 1 1 1
     1 1 0 0 0
     0 0 0 0 0
     0 0 1 0 0
     1 0 0 1 1
     0 1 1 1 1
     1 0 0 0 1
     1 0 0 0 0
     ... 14 x 5 zero elements ...>

The array is then transposed, such that we get a 5 x 26 matrix where each row represents the 5th, 4th, 3rd, 2nd, or 1st bit, respectfully, of each element:

    <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
     1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0>
     |---------input--------|----------padding---------|

We then use each of these rows separately as input into a cyclic redundancy check polynomial division operation, using the CRC-5-USB generating polynomial x^5 + x^2 + 1. The result is 5 element column vector:

    <10111
     01000
     10010
     00111
     00110>

The elements of this vector are then exploded horizontally, and affixed to the end of the bit matrix.

    <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
     0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
     1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
     0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
     1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0>
     |---------input--------|----------padding----------|--crc---|

At this point, calculating the CRC checksum for each row should result in zero:

    <0 0 0 0 0>

Now we reverse the process in order to encode the output. First the padding bits are removed:

    <1 1 1 1 1 1 0 0 1 0 1 1 1 0 1 1 1
     0 1 0 0 0 1 0 0 0 1 0 0 0 1 0 0 0
     1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0
     0 0 1 1 1 0 0 0 1 1 0 0 0 0 1 1 1
     1 1 0 1 1 0 0 0 1 1 1 0 0 0 1 1 0>
     |---------input--------|--crc---|

Then the matrix is once again transposed, to yield an (l+5) x 5 matrix, where l is the number of digits in the original input:

<1 0 1 0 1 -

 1 1 0 0 1  |
 1 0 0 1 0  |
 1 0 0 1 1  |
 1 0 1 1 1  i
 1 1 0 0 0  n
 0 0 0 0 0  p
 0 0 1 0 0  u
 1 0 0 1 1  t
 0 1 1 1 1  |
 1 0 0 0 1  |
 1 0 0 0 0  -
 1 0 1 0 0  -
 0 1 0 0 0  c
 1 0 0 1 1  r
 1 0 1 1 1  c
 1 0 0 1 0> -

And the rows are imploded:

    <21 25 18 19 23 24 0 4 19 15 17 16 20 8 19 23 18>'
     |--------------input-------------|----crc-----|

And the result is converted into z-base-32: 'i31uzayruxtoweuz1'.

Decoding and error recovery

The process of decoding and error detection and recovery is similar to encoding, and this section will not explain steps that were adequately covered in the encoder description.

First, the input is converted from z-base-32 into a sequence of up to 31 (26+5) 5-bit integers, with zero-valued padding inserted between the end of the input and the 5-digit checksum. Using our running example of 'i31uzayruxtoweuz1', this result is the following:

    <21 25 18 19 23 24 0 4 19 15 17 16 0 0 0 0 0 0 0 0 0 0 0 0 0 0 20 8 19 23 18>'
     |--------------input-------------|----------padding----------|----crc-----|

The binary representation of each element is exploded horizontally, and the matrix transposed to yield the following 5 x 31 bit matrix:

    <1 1 1 1 1 1 0 0 1 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 1 1 1
     0 1 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
     1 0 0 0 1 0 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0
     0 0 1 1 1 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
     1 1 0 1 1 0 0 0 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 0>
     |---------input--------|----------padding----------|--crc---|

A CRC calculation of each row should yield zero if the encoded string was transcribed correctly. If, however, a digit was wrong, that would result in some or all of the bits of the corresponding column being flipped. A property of the CRC generating polynomial used is that in the case of a single digit error, each row with a flipped bit would result in the same checksum, and that checksum value can be used as the index into a table indicating which bit was flipped:

    // EC[0] has no meaning, because offsets are 1-based
    const unsigned char EC[32] =
        { 32,  4,  3, 17,  2, 30, 16, 24,
           1,  6, 29,  8, 15, 27, 23, 12,
           0, 25,  5, 18, 28, 13,  7,  9,
          14, 10, 26, 19, 22, 21, 11, 20 };

If all checksum values are zero, the decoder assumes the input is correct. If all non-zero checksum values are equal valued, the decoder assumes the corresponding digit was transcribed incorrectly and flips the bit in that column of the rows with non-zero checksums.

If any of the five checksum values are non-zero, and not equal to each other, then there is an unrecoverable error in the input. The probability of two or more digits being incorrect and yet by chance the checksums being zero or equal valued is less than 1 in 33 million.

Now that errors have been detected and single-digit errors corrected, the padding bits and CRC checksum bits are removed. The matrix is transposed, it's rows imploded, and the resulting sequence of up to 26 characters converted into base-32 using the RFC 3548 alphabet: 'vzstxyaetprq'.

CodedBase32 integer encoding

Although providing an error correction coder for base-32 data interesting and useful in contexts where base-32 is already deployed, many applications involve encoding of integers which are powers of two in length. This section provides a standard scheme for the encoding of any sized bitstring into a multiple of 5 bits in length, suitable for direct encoding into base-32.

First the size in bits of the integer is rounded up to the next highest power of two greater than or equal to 128. This value with a factor of 128 removed is known as n, and its base-2 logorithm as e. Pseudocode:

    int n = max(next_power_of_two(BITS), 128) / 128;
    int e = log2(n);

A total of 2*n padding bits are prefixed to the bitstring. These consist of e 1 bits, a single 0 bit, and 2*n-e-1 user-specified bits (the "extra" field).

The bitstring is now a multiple of 5 in length and can be directly converted into base-32.

Padding and encoding an integer with error correction bits results in a base-32 string which is a multiple of 31 digits in length. Since the length is known, the decoder is permissive and allow up to n base-32 digits to be prefixed to the encoding, which are stripped off prior to processing. This allows extra application specific identifying digits to be prefixed, rounding the length of the encoded string out to a multiple of 32.

References

[1] http://philzimmermann.com/docs/human-oriented-base-32-encoding.txt [2] http://www.drdobbs.com/article/print?articleId=184401662&siteSectionName=

#include <assert.h>
#include <iostream>
using namespace std;
#include <boost/crc.hpp>
#include <boost/foreach.hpp>
#include "common.hpp"
const char *pzbase32 = "ybndrfg8ejkmcpqxot1uwisza345h769";
const int decode_zbase32_table[256] =
{ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, 18, -1, 25, 26, 27, 30, 29, 7, 31, -1, -1, -1, -1, -1, -1,
-1, 24, 1, 12, 3, 8, 5, 6, 28, 21, 9, 10, -1, 11, 2, 16,
13, 14, 4, 22, 17, 19, -1, 20, 15, 0, 23, -1, -1, -1, -1, -1,
-1, 24, 1, 12, 3, 8, 5, 6, 28, 21, 9, 10, -1, 11, 2, 16,
13, 14, 4, 22, 17, 19, -1, 20, 15, 0, 23, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1 };
unsigned char EC[31] =
{ 4, 3, 17, 2, 30, 16, 24, 1,
6, 29, 8, 15, 27, 23, 12, 0,
25, 5, 18, 28, 13, 7, 9, 14,
10, 26, 19, 22, 21, 11, 20 };
string AddErrorCorrectionCode32(const char *pch, size_t len)
{
int gp = (0x12<<1) + 1; // CRC-5-USB
boost::crc_basic<5> code(gp & 0x1f);
int n = (len+25)/26;
// The error correcting encoder take input in batches of 26 digits.
// We zero-extend the input if necessary to be a multiple of 26.
string input(pch, len);
input.resize(26*n, pbase32[0]);
// We will build the output string one batch at a time, appending
// the 26 input digits + 5 digit checksum.
string strRet;
strRet.resize(31*n);
int i, b, r, c;
for (i=0; i < n; ++i) {
// The packet will be where the 26 digits of input +
// 5 checksum digits are stored
vector<boost::uint_t<5>::fast> pkt(31);
// Copy over the 26 input digits, converting from rfc-3548
transform(input.begin()+26*i,
input.begin()+26*(i+1),
pkt.begin(),
Rfc3548Digit);
#if DEBUG
for (r=0; r < 26; ++r)
clog << "pkt[" << r << "]=" << static_cast<int>(pkt[r]) << endl;
#endif
boost::uint_t<31>::fast msg[5] = {};
_transpose_pkt_to_msg(pkt, msg);
#if DEBUG
for (r=0; r < 5; ++r)
clog << "msg[" << r << "]=" << static_cast<int>(msg[r]) << endl;
#endif
// Calculate the checksum of the first 26 bits.
boost::uint_t<5>::fast cs[5];
for (c=0; c < 5; ++c) {
code.reset();
for (r=0; r < 26; ++r)
code.process_bit(msg[c] & (1<<r));
cs[c] = code.checksum();
}
#if DEBUG
for (c=0; c < 5; ++c)
clog << "cs[" << c << "]=" << static_cast<int>(cs[c]) << endl;
#endif
// Transpose the checksum and insert back into the
// last five digits of the packet structure, MSB-first.
// Loop is unrolled for clarity and speed.
pkt[26] = (cs[0]&0x10)
| (cs[1]&0x10)>>1
| (cs[2]&0x10)>>2
| (cs[3]&0x10)>>3
| (cs[4]&0x10)>>4;
pkt[26+1] = (cs[0]&0x08)<<1
| (cs[1]&0x08)
| (cs[2]&0x08)>>1
| (cs[3]&0x08)>>2
| (cs[4]&0x08)>>3;
pkt[26+2] = (cs[0]&0x04)<<2
| (cs[1]&0x04)<<1
| (cs[2]&0x04)
| (cs[3]&0x04)>>1
| (cs[4]&0x04)>>2;
pkt[26+3] = (cs[0]&0x02)<<3
| (cs[1]&0x02)<<2
| (cs[2]&0x02)<<1
| (cs[3]&0x02)
| (cs[4]&0x02)>>1;
pkt[26+4] = (cs[0]&0x01)<<4
| (cs[1]&0x01)<<3
| (cs[2]&0x01)<<2
| (cs[3]&0x01)<<1
| (cs[4]&0x01);
#if DEBUG
for (r=26; r < 31; ++r)
clog << "pkt[" << r << "]=" << static_cast<int>(pkt[r]) << endl;
// Verify that the checksum is in fact zero once the remainder
// has been processed (this should always be the case, but was
// written as a debugging check to make sure that the endianness
// is right and the transpositions performed correctly).
for (c=0; c < 5; ++c) {
code.reset();
for (r=0; r < 31; ++r)
code.process_bit(pkt[r] & (1<<(4-c)));
if (code.checksum()) {
cerr << "ERROR" << endl;
exit(1);
}
}
#endif
// Transform into z-base-32 encoding.
transform(pkt.begin(), pkt.end(),
strRet.begin()+31*i,
ZBase32Code);
}
return strRet;
}
string AddErrorCorrectionCode32(const string& str)
{
return AddErrorCorrectionCode32(str.c_str(), str.size());
}
string RemoveErrorCorrectionCode32(
const char *pch, size_t len,
bool* pfInvalid, map<size_t, char> *pMapErrors)
{
int gp = (0x12<<1) + 1; // CRC-5-USB
boost::crc_basic<5> code(gp & 0x1f);
int n = (len+30)/31;
// The error correcting decoder take input in batches of 31 codes.
string input(pch, len);
assert(len%31 == 0);
// We will reconstruct the output string one batch at a time,
// appending the 26 decoded digits from each 31 digit input.
string strRet;
strRet.resize(26*n);
if (pMapErrors)
pMapErrors->clear();
int i, b, f, r, c;
for (i=0; i < n; ++i) {
// The packet is the 26 digits of payload + 5 checksum digits.
vector<boost::uint_t<5>::fast> pkt(31);
// Copy over the 31 input digits, converting from z-base-32
transform(input.begin()+31*i,
input.begin()+31*(i+1),
pkt.begin(),
ZBase32Digit);
#if DEBUG
for (r=0; r < 31; ++r)
clog << "pkt[" << r << "]=" << static_cast<int>(pkt[r]) << endl;
#endif
boost::uint_t<31>::fast msg[5] = {};
_transpose_pkt_to_msg(pkt, msg);
#if DEBUG
for (c=0; c < 5; ++c)
clog << "msg[" << c << "]=" << static_cast<int>(msg[c]) << endl;
#endif
// Calculate checksums
boost::uint_t<5>::fast cs[5];
for (c=0; c < 5; ++c) {
code.reset();
for (r=0; r < 31; ++r)
code.process_bit(msg[c] & (1<<r));
cs[c] = code.checksum();
}
#if DEBUG
for (c=0; c < 5; ++c)
clog << "cs[" << c << "]=" << static_cast<int>(cs[c]) << endl;
#endif
for (f=0, c=0; c < 5; ++c)
if (cs[c])
f |= 1 << EC[cs[c]-1];
if (f) {
if (f != 1<<(ffs(f)-1)) {
if (pfInvalid) {
*pfInvalid = true;
strRet.resize(i*26);
return strRet;
} else {
throw -1; // FIXME: use some sort of exception object to
// indicate that multiple uncorrectable errors.
}
}
b = ffs(f) - 1;
for (c=0; c < 5; ++c)
if (cs[c])
msg[c] ^= f;
if (pMapErrors)
pMapErrors->insert(pair<size_t, char>(31*i+b,
ZBase32Code((msg[0]&f) >> (b-4) |
(msg[1]&f) >> (b-3) |
(msg[2]&f) >> (b-2) |
(msg[3]&f) >> (b-1) |
(msg[4]&f) >> b )));
}
#if DEBUG
for (r=0; r < 5; ++r)
clog << "msg[" << r << "]=" << static_cast<int>(msg[r]) << endl;
// Verify that the checksum is in fact zero now that any bit errors
// have been corrected.
for (c=0; c < 5; ++c) {
code.reset();
for (r=0; r < 31; ++r)
code.process_bit(msg[c] & (1<<r));
if (code.checksum()) {
cerr << "ERROR" << endl;
exit(1);
}
}
#endif
_transpose_msg_to_pkt(msg, pkt);
#if DEBUG
for (r=26; r < 31; ++r)
clog << "pkt[" << r << "]=" << static_cast<int>(pkt[r]) << endl;
// Verify that the checksum is in fact zero once the remainder
// has been processed (this should always be the case, but was
// written as a debugging check to make sure that the endianness
// is right and the transpositions performed correctly).
for (c=0; c < 5; ++c) {
code.reset();
for (r=0; r < 31; ++r)
code.process_bit(pkt[r] & (1<<(4-c)));
if (code.checksum()) {
cerr << "ERROR" << endl;
exit(1);
}
}
#endif
// Transform into rfc-3548 encoding.
transform(pkt.begin(), pkt.begin()+26,
strRet.begin()+26*i,
Rfc3548Code);
}
return strRet;
}
string RemoveErrorCorrectionCode32(
const string& str,
bool* pfInvalid, map<size_t, char> *pMapErrors)
{
return RemoveErrorCorrectionCode32(str.c_str(), str.size(), pfInvalid, pMapErrors);
}
// ---------------------------------------------------------------------
// BEGIN DEFINITIONS COPIED FROM BITCOIN SOURCE TREE
// ---------------------------------------------------------------------
const char p_util_hexdigit[256] =
{ -1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
0, 1, 2, 3, 4, 5, 6, 7, 8, 9,-1,-1,-1,-1,-1,-1,
-1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,0xa,0xb,0xc,0xd,0xe,0xf,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,
-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1,-1, };
bool IsHex(const string& str)
{
BOOST_FOREACH(char c, str)
{
if (HexDigit(c) < 0)
return false;
}
return (str.size() > 0) && (str.size()%2 == 0);
}
vector<unsigned char> ParseHex(const char* psz)
{
// convert hex dump to vector
vector<unsigned char> vch;
while (true)
{
while (isspace(*psz))
psz++;
signed char c = HexDigit(*psz++);
if (c == (signed char)-1)
break;
unsigned char n = (c << 4);
c = HexDigit(*psz++);
if (c == (signed char)-1)
break;
n |= c;
vch.push_back(n);
}
return vch;
}
vector<unsigned char> ParseHex(const string& str)
{
return ParseHex(str.c_str());
}
const char *pbase32 = "abcdefghijklmnopqrstuvwxyz234567";
const int decode32_table[256] =
{ -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 26, 27, 28, 29, 30, 31, -1, -1, -1, -1,
-1, -1, -1, -1, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14,
15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, -1, -1, -1, -1, -1, -1, 0, 1, 2,
3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22,
23, 24, 25, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1,
-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, };
string EncodeBase32(const unsigned char* pch, size_t len)
{
string strRet="";
strRet.reserve((len+4)/5*8);
int mode=0, left=0;
const unsigned char *pchEnd = pch+len;
while (pch<pchEnd)
{
int enc = *(pch++);
switch (mode)
{
case 0: // we have no bits
strRet += pbase32[enc >> 3];
left = (enc & 7) << 2;
mode = 1;
break;
case 1: // we have three bits
strRet += pbase32[left | (enc >> 6)];
strRet += pbase32[(enc >> 1) & 31];
left = (enc & 1) << 4;
mode = 2;
break;
case 2: // we have one bit
strRet += pbase32[left | (enc >> 4)];
left = (enc & 15) << 1;
mode = 3;
break;
case 3: // we have four bits
strRet += pbase32[left | (enc >> 7)];
strRet += pbase32[(enc >> 2) & 31];
left = (enc & 3) << 3;
mode = 4;
break;
case 4: // we have two bits
strRet += pbase32[left | (enc >> 5)];
strRet += pbase32[enc & 31];
mode = 0;
}
}
static const int nPadding[5] = {0, 6, 4, 3, 1};
if (mode)
{
strRet += pbase32[left];
for (int n=0; n<nPadding[mode]; n++)
strRet += '=';
}
return strRet;
}
string EncodeBase32(const string& str)
{
return EncodeBase32((const unsigned char*)str.c_str(), str.size());
}
vector<unsigned char> DecodeBase32(const char* p, bool* pfInvalid)
{
if (pfInvalid)
*pfInvalid = false;
vector<unsigned char> vchRet;
vchRet.reserve((strlen(p))*5/8);
int mode = 0;
int left = 0;
while (1)
{
int dec = decode32_table[(unsigned char)*p];
if (dec == -1) break;
p++;
switch (mode)
{
case 0: // we have no bits and get 5
left = dec;
mode = 1;
break;
case 1: // we have 5 bits and keep 2
vchRet.push_back((left<<3) | (dec>>2));
left = dec & 3;
mode = 2;
break;
case 2: // we have 2 bits and keep 7
left = left << 5 | dec;
mode = 3;
break;
case 3: // we have 7 bits and keep 4
vchRet.push_back((left<<1) | (dec>>4));
left = dec & 15;
mode = 4;
break;
case 4: // we have 4 bits, and keep 1
vchRet.push_back((left<<4) | (dec>>1));
left = dec & 1;
mode = 5;
break;
case 5: // we have 1 bit, and keep 6
left = left << 5 | dec;
mode = 6;
break;
case 6: // we have 6 bits, and keep 3
vchRet.push_back((left<<2) | (dec>>3));
left = dec & 7;
mode = 7;
break;
case 7: // we have 3 bits, and keep 0
vchRet.push_back((left<<5) | dec);
mode = 0;
break;
}
}
if (pfInvalid)
switch (mode)
{
case 0: // 8n base32 characters processed: ok
break;
case 1: // 8n+1 base32 characters processed: impossible
case 3: // +3
case 6: // +6
*pfInvalid = true;
break;
case 2: // 8n+2 base32 characters processed: require '======'
if (left || p[0] != '=' || p[1] != '=' || p[2] != '=' || p[3] != '=' || p[4] != '=' || p[5] != '=' || decode32_table[(unsigned char)p[6]] != -1)
*pfInvalid = true;
break;
case 4: // 8n+4 base32 characters processed: require '===='
if (left || p[0] != '=' || p[1] != '=' || p[2] != '=' || p[3] != '=' || decode32_table[(unsigned char)p[4]] != -1)
*pfInvalid = true;
break;
case 5: // 8n+5 base32 characters processed: require '==='
if (left || p[0] != '=' || p[1] != '=' || p[2] != '=' || decode32_table[(unsigned char)p[3]] != -1)
*pfInvalid = true;
break;
case 7: // 8n+7 base32 characters processed: require '='
if (left || p[0] != '=' || decode32_table[(unsigned char)p[1]] != -1)
*pfInvalid = true;
break;
}
return vchRet;
}
string DecodeBase32(const string& str)
{
vector<unsigned char> vchRet = DecodeBase32(str.c_str());
return string((const char*)&vchRet[0], vchRet.size());
}
// ---------------------------------------------------------------------
// END DEFINITIONS COPIED FROM BITCOIN SOURCE TREE
// ---------------------------------------------------------------------
#ifndef COMMON_H_
#define COMMON_H_
#define DEBUG 0
#include <map>
#include <string>
#include <vector>
#include <boost/integer.hpp>
// ---------------------------------------------------------------------
// BEGIN DECLARATIONS COPIED FROM BITCOIN SOURCE TREE
// ---------------------------------------------------------------------
extern const char p_util_hexdigit[256];
inline char HexDigit(char c)
{
return p_util_hexdigit[(unsigned char)c];
}
bool IsHex(const std::string& str);
std::vector<unsigned char> ParseHex(const char* psz);
std::vector<unsigned char> ParseHex(const std::string& str);
template<typename T>
std::string HexStr(const T itbegin, const T itend, bool fSpaces=false)
{
std::string rv;
static const char hexmap[16] = { '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', 'a', 'b', 'c', 'd', 'e', 'f' };
rv.reserve((itend-itbegin)*3);
for(T it = itbegin; it < itend; ++it)
{
unsigned char val = (unsigned char)(*it);
if(fSpaces && it != itbegin)
rv.push_back(' ');
rv.push_back(hexmap[val>>4]);
rv.push_back(hexmap[val&15]);
}
return rv;
}
extern const char *pbase32;
extern const int decode32_table[256];
std::string EncodeBase32(const unsigned char* pch, size_t len);
std::string EncodeBase32(const std::string& str);
std::vector<unsigned char> DecodeBase32(const char* p, bool* pfInvalid=NULL);
std::string DecodeBase32(const std::string& str);
// ---------------------------------------------------------------------
// END DECLARATIONS COPIED FROM BITCOIN SOURCE TREE
// ---------------------------------------------------------------------
inline char Rfc3548Code(boost::uint_t<5>::fast digit)
{ return pbase32[digit]; }
inline boost::uint_t<5>::fast Rfc3548Digit(char c)
{ return decode32_table[(unsigned char)c]; }
extern const char *pzbase32;
extern const int decode_zbase32_table[256];
inline char ZBase32Code(boost::uint_t<5>::fast digit)
{ return pzbase32[digit]; }
inline boost::uint_t<5>::fast ZBase32Digit(char c)
{ return decode_zbase32_table[(unsigned char)c]; }
extern unsigned char EC[31];
// Transpose the input, creating 5 integer values each 31 bits long.
// The first integer contains the 1st bits of each input code point.
// The second integer contains the 2nd bits of each input code point.
// Etc. This allows for correctable burst errors of up to 5 bits on
// a code point boundary (e.g. an incorrect base32 digit).
inline void _transpose_pkt_to_msg(
const std::vector<boost::uint_t<5>::fast> &pkt,
boost::uint_t<31>::fast* msg)
{
fill_n(msg, 5, 0);
int r, c;
for (r=0; r < 31; ++r)
for (c=0; c < 5; ++c)
msg[c] |= ((pkt[r] >> (4-c)) & 1) << r;
}
inline void _transpose_msg_to_pkt(
const boost::uint_t<31>::fast* msg,
std::vector<boost::uint_t<5>::fast> &pkt)
{
fill(pkt.begin(), pkt.end(), 0);
int r, c;
for (c=0; c < 5; ++c)
for (r=0; r < 31; ++r)
if (msg[c] & 1<<r)
pkt[r] |= 1 << (4-c);
}
std::string AddErrorCorrectionCode32(const char *pch, size_t len);
std::string AddErrorCorrectionCode32(const std::string& str);
std::string RemoveErrorCorrectionCode32(const char *pch, size_t len, bool* pfInvalid=NULL, std::map<size_t, char> *pMapErrors=NULL);
std::string RemoveErrorCorrectionCode32(const std::string& str, bool* pfInvalid=NULL, std::map<size_t, char> *pMapErrors=NULL);
#endif // COMMON_H_
#include <iostream>
#include <string>
#include <vector>
using namespace std;
#include <strings.h> // for ffs
#include "common.hpp"
const char *hexdigits = "0123456789abcdef";
int
main
(int argc, char **argv)
{
const int BITS=256;
// The length of the payload for coding is always a power-of-2
// multiple of 128. The parameter e is the solution to the equations
// size=128*n and n=2^e. Most or all of the following bit-twiddling
// should be optimized away by any decent compiler. See:
// <http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2>
int n, e;
e = (BITS+127) / 128 - 1;
e = (e >> 1) | e;
e = (e >> 2) | e;
e = (e >> 4) | e;
e = (e >> 8) | e;
e = (e >> 16) | e;
e = ffs(e + 1) - 1;
n = 1 << e;
int a, i, j;
for (a=1; a < argc; ++a) {
// Deserialize the normative transaction ID from the command line
string ntxid(argv[a]);
int padding = ntxid.size() - 31*n;
if (n < padding) {
cerr << "invalid ntxid: " << ntxid << " (improper length)"
<< endl;
exit(1);
}
// Drop prefix characters "tx", if present
string payload = ntxid.substr(ntxid.size()-31*n, string::npos);
bool fInvalid = false;
map<size_t, char> mapErrors;
payload = RemoveErrorCorrectionCode32(payload, &fInvalid, &mapErrors);
if (fInvalid) {
cerr << "invalid ntxid: " << ntxid << " (fInvalid=true)"
<< endl;
exit(1);
}
// Reverse encoding, moving prefix to end
// and flipping endianness of hash
reverse(payload.begin(), payload.end());
// Add zero-valued padding to achieve 8-bit alignment
payload.resize((payload.size()+7)&~7, pbase32[0]);
payload = DecodeBase32(payload);
#if DEBUG
clog << "payload=";
string::iterator itr;
for (itr = payload.begin(); itr != payload.end(); ++itr) {
clog << hexdigits[(unsigned char)((*itr>>4)&0xf)];
clog << hexdigits[(unsigned char)( *itr &0xf)];
}
clog << endl;
#endif
// Skip to the end of the payload, after the 32-byte transaction hash.
string::iterator meta = payload.begin()+BITS/8;
// Check prefix bits encoding the length of the payload.
for (i=0, j=2*n-1; i<e; ++i, --j) {
if (!(*(meta + j/8) & 1 << (7 - (j&7)))) {
cerr << "invalid ntxid: " << ntxid
<< " (prefix[" << i << "] bit is 0, not 1)"
<< endl;
exit(1);
}
}
if (*(meta + j/8) & 1 << (7 - (j&7))) {
cerr << "invalid ntxid: " << ntxid
<< " (prefix[" << i << "] bit is 1, not 0)"
<< endl;
exit(1);
}
// Extract nExtra field, which is 2*n-e-1 bits in length.
int nExtra = 0;
for (i=2*n-e-2; i >= 0; --i) {
j = 7 - (i&7);
nExtra <<= 1;
nExtra |= (*(meta + i/8) & (1 << j)) >> j;
}
payload.resize(BITS/8);
for (meta=payload.begin(); meta != payload.end(); ++meta) {
cout << hexdigits[(unsigned char)((*meta>>4)&0xf)];
cout << hexdigits[(unsigned char)( *meta &0xf)];
}
cout << " nExtra=" << nExtra;
if (mapErrors.size()) {
cout << " {";
map<size_t, char>::iterator iter;
for (iter=mapErrors.begin(); iter != mapErrors.end(); ++iter) {
if (iter != mapErrors.begin())
cout << ", ";
cout << padding + iter->first << ":\'"
<< iter->second << "\'";
}
cout << "}";
}
cout << endl;
}
}
#include <iostream>
#include <string>
#include <vector>
using namespace std;
#include <strings.h> // for ffs
#include "common.hpp"
int
main
(int argc, char **argv)
{
const int BITS=256;
// The length of the payload for coding is always a power-of-2
// multiple of 128. The parameter e is the solution to the equations
// size=128*n and n=2^e. Most or all of the following bit-twiddling
// should be optimized away by any decent compiler. See:
// <http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2>
int n, e;
e = (BITS+127) / 128 - 1;
e = (e >> 1) | e;
e = (e >> 2) | e;
e = (e >> 4) | e;
e = (e >> 8) | e;
e = (e >> 16) | e;
e = ffs(e + 1) - 1;
n = 1 << e;
int a, i, j;
for (a=1; a < argc; ++a) {
// An extra data payload, maximum 2 bits in length, which is included
// in the serialization in order to achieve correct base32 padding.
// Currenly has no use and set to zero.
int nExtra=0;
// Deserialize the transaction ID received on the command line
// from hex to binary.
string txid(argv[a]);
vector<unsigned char> pn = ParseHex(txid.c_str());
if (pn.size() != BITS/8) {
cerr << "invalid txid: " << txid
<< endl;
exit(1);
}
// Our correction codes are able to encode 130-bit payloads,
// giving us 4 extra bits to work with for a 256-bit txid hash.
// First we extend the size of the buffer use to hold the payload
// (and convert it into a std::string).
string payload(pn.begin(), pn.end());
payload.resize((130*n+7)/8); // 65 bits per 64 bits input
// Skip to the end of the payload, after the 32-byte transaction hash.
string::iterator meta = payload.begin()+BITS/8;
// Add the nExtra field, which is 2*n-e-1 bits in length
for (i=0; i<2*n-e-1; ++i, nExtra>>=1)
*(meta + i/8) |= (nExtra&1) << (7 - (i&7));
// Add prefix bits encoding the length of the payload.
for (i=0, j=2*n-1; i<e; ++i, --j)
*(meta + j/8) |= 1 << (7 - (j&7));
string b32 = EncodeBase32(payload); // Encode base-32
b32.resize(26*n); // strip redundant padding
// Reverse encoding, moving prefix to front
// and flipping endianness of hash
reverse(b32.begin(), b32.end());
// Output normalized/canonical hash with error correction
// encoding, and "tx" prefix (to round us out to 64 bytes).
cout << "tx" << AddErrorCorrectionCode32(b32) << endl;
}
}
CXXFLAGS := -I/opt/local/include
PROGS := verify-crc-gp encode-ntxid decode-ntxid
all: ${PROGS}
clean:
rm ${PROGS} *.o
verify-crc-gp: verify-crc-gp.o
$(CXX) -o $@ verify-crc-gp.o
encode-ntxid: common.hpp common.o encode-ntxid.o
$(CXX) -o $@ common.o encode-ntxid.o
decode-ntxid: common.hpp common.o decode-ntxid.o
$(CXX) -o $@ common.o decode-ntxid.o
#include <strings.h>
#include <algorithm>
#include <iostream>
#include <string>
#include <boost/crc.hpp>
using namespace std;
int
main
( int argc, char **argv )
{
unsigned char EC[32] = {0xff};
int gp = (0x12<<1) + 1, hpo2 = 32;
boost::crc_basic<5> code(gp & 0x1f);
clog << "testing " << gp;
// see how long the cycle is from 1 to 1
int i=1;
code.reset(i);
do
{ code.process_bit(true); i += 1; }
while (code.checksum() != 1);
clog << " cyc:" << i;
if (i == hpo2)
clog << " winner!" << endl;
// build error corrective table
int r, b, cs, pkt;
fill_n(EC, 32, 0x7f);
for (r=0; r < 31; ++r) {
code.reset();
for (b=0; b < 31; ++b)
code.process_bit(b==r);
EC[code.checksum()] = r;
}
clog << "EC table for gp = " << gp << endl;
clog << "index position of" << endl;
clog << " the bad bit" << endl;
clog << " ----" << endl;
for (r=1; r < hpo2; ++r) {
clog << " ";
if (r<10) clog << " ";
clog << r << " | ";
if (EC[r]<10) clog << " ";
clog << static_cast<int>(EC[r])
<< " |" << endl;
}
int j, msg;
for (i=0; i < (1<<26); ++i) {
// Reset to zero CRC
code.reset();
// Add payload
for (b=0; b < 26; ++b)
code.process_bit((i>>b)&1);
// Add checksum
cs = code.checksum();
for (j=4; j >= 0; --j)
code.process_bit((cs>>j)&1);
// Sanity check that the CRC actually works...
if (code.checksum()) {
cerr << "ERROR: i=" << i
<< " cs=" << cs
<< " res=" << static_cast<int>(code.checksum())
<< ": CRC failure" << endl;
exit(1);
}
// Compute message
msg = i;
msg |= (cs&0x01)<<30
| (cs&0x02)<<28
| (cs&0x04)<<26
| (cs&0x08)<<24
| (cs&0x10)<<22;
// Sanity check that our message packing is correct...
code.reset();
for (b=0; b < 31; ++b)
code.process_bit((msg>>b)&1);
if (code.checksum()) {
cerr << "ERROR: i=" << i
<< " cs=" << cs
<< " msg=" << msg
<< " res=" << static_cast<int>(code.checksum())
<< ": message pack failure" << endl;
exit(1);
}
// Verify that a single bit error is correctly identified
for (r=0; r < 31; ++r) {
pkt = msg ^ 1<<r;
code.reset();
for (b=0; b < 31; ++b)
code.process_bit((pkt>>b)&1);
if (EC[code.checksum()] != r) {
cerr << "ERROR: i=" << i
<< " r=" << r
<< " cs=" << cs
<< " msg=" << msg
<< " pkt=" << pkt
<< " res=" << static_cast<int>(code.checksum())
<< " EC=" << static_cast<int>(EC[code.checksum()])
<< ": wrong bit error detected" << endl;
exit(1);
}
}
if (!(i&((1<<18)-1)))
clog << "Progress: " << (i>>18) << " of " << (1<<(26-18)) << endl;
}
return 0;
}
@jonasschnelli
Copy link

[09:25:29] jonasschnelli: have you seen maaku's base-32 encoding? https://gist.github.com/maaku/8996338

[09:28:24] jonasschnelli: for something like that I'd probably also add a DAMM check digit... at the time maaku was writing that I wasn't aware of quasigroup error detection schemes.

I think adding a DAMM check digit wouldn't hurt.

My plans is to use this "bip" for a hdwallet master seed handwrite "export/backup".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment