I wanted to compare the solutions from JonSG and chepner to see whether either ran noticeably faster (chepner's in particular), and whether they only add the BOM without mutating the text along the way.
Both failed, but for different reasons; JonSG's can easily be fixed.
My comparator:
- runs and times both functions against a 10 MB UTF-8 encoded file of random text that spans the full range of Unicode, minus the surrogate code points (U+D800 to U+DFFF), which cannot be encoded as valid UTF-8
- reads each output and asserts that it starts with a BOM, then strips the BOM, leaving what should be the original UTF-8 bytes
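
A minimal sketch of the steps listed so far might look like the following. It is not the exact comparator I ran. The names `random_utf8`, `run_and_check`, and `naive_add_bom` are made up for illustration, and it assumes each function under test takes a source path and a destination path, which may not match the signatures in either answer.

```python
import codecs
import os
import random
import tempfile
import time


def random_utf8(n_chars, seed=0):
    """Return UTF-8 bytes for n_chars random code points, skipping surrogates."""
    rng = random.Random(seed)
    chars = []
    while len(chars) < n_chars:
        cp = rng.randrange(0x110000)
        # Surrogates (U+D800..U+DFFF) cannot be encoded as UTF-8.
        if not 0xD800 <= cp <= 0xDFFF:
            chars.append(chr(cp))
    return "".join(chars).encode("utf-8")


def naive_add_bom(src, dst):
    """Hypothetical stand-in for a function under test (neither answer's code)."""
    with open(src, "rb") as f_in, open(dst, "wb") as f_out:
        f_out.write(codecs.BOM_UTF8)
        f_out.write(f_in.read())


def run_and_check(func, original):
    """Time func(src, dst), assert the output starts with a BOM,
    and return (elapsed_seconds, output_bytes_without_bom)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "in.txt")
        dst = os.path.join(tmp, "out.txt")
        with open(src, "wb") as f:
            f.write(original)
        start = time.perf_counter()
        func(src, dst)
        elapsed = time.perf_counter() - start
        with open(dst, "rb") as f:
            output = f.read()
    assert output.startswith(codecs.BOM_UTF8), f"{func.__name__}: no BOM"
    return elapsed, output[len(codecs.BOM_UTF8):]
```

Calling `run_and_check(naive_add_bom, random_utf8(10_000))` returns the elapsed time and the BOM-stripped bytes. Everything is read and written in binary mode so that newline translation cannot alter the data being checked.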