@khanzadimahdi
Last active December 1, 2024 19:17
regex pattern base64 data uri according to RFC 2397

pattern:

^data:((?:\w+\/(?:(?!;).)+)?)((?:;[\w\W]*?[^;])*),(.+)$

test this pattern on regexr: https://regexr.com/4inht

regex pattern to match RFC 2397 data URL

syntax:

   dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
   mediatype  := [ type "/" subtype ] *( ";" parameter )
   data       := *urlchar
   parameter  := attribute "=" value

examples:

example1: simple



example2: with meta key=value

data:image/jpeg;key=value;base64,UEsDBBQAAAAI

example3: without base64 key name

data:image/jpeg;key=value,UEsDBBQAAAAI

example4: without mime-type

data:;base64;sdfgsdfgsdfasdfa=s,UEsDBBQAAAAI

example5: without mime-type, base64, or meta key=value

data:,UEsDBBQAAAAI
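
To try the pattern outside regexr, here is a minimal sketch in Python. It assumes Python's `re` module accepts the pattern unchanged; the names `DATA_URL` and `samples` are made up for this illustration, and the sample URLs are examples 2 to 5 from above.

```python
import re

# The gist's pattern, copied verbatim into a Python raw string.
DATA_URL = re.compile(r"^data:((?:\w+\/(?:(?!;).)+)?)((?:;[\w\W]*?[^;])*),(.+)$")

# Examples 2 to 5 from this gist.
samples = [
    "data:image/jpeg;key=value;base64,UEsDBBQAAAAI",
    "data:image/jpeg;key=value,UEsDBBQAAAAI",
    "data:;base64;sdfgsdfgsdfasdfa=s,UEsDBBQAAAAI",
    "data:,UEsDBBQAAAAI",
]

for url in samples:
    match = DATA_URL.match(url)
    # group 1: type/subtype (may be empty)
    # group 2: everything between the type and the comma (may be empty)
    # group 3: the data after the comma
    print(match.groups() if match else None)
```

Note that group 2 captures the parameters and the `;base64` marker together, so the caller still has to pick them apart.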

@Gpinchon

Gpinchon commented Feb 1, 2021

Maybe I am wrong, but according to Wikipedia (and RFC 2397), base64 should not come before the parameters, so example4 should not validate.
https://en.wikipedia.org/wiki/Data_URI_scheme
Here is the regex I came up with: https://regex101.com/r/6DOoB1/1
The same regex with a short list of tests: https://regexr.com/5lf3v

@khanzadimahdi
Author

Example 4 is valid; you can use it in your browser.

@Gpinchon

Gpinchon commented Feb 2, 2021

I suppose the browser fixes it, but tools like this one seem to reject it...
https://bit.dev/chriso/validator-js/is-data-uri
Are the characters [;=,] valid in the data part of a data URI?

@khanzadimahdi
Author

Yes. According to https://tools.ietf.org/html/rfc2397
the syntax is as follows:

   dataurl    := "data:" [ mediatype ] [ ";base64" ] "," data
   mediatype  := [ type "/" subtype ] *( ";" parameter )
   data       := *urlchar
   parameter  := attribute "=" value

As you can see, a parameter is attribute "=" value, and each parameter must be preceded by a ";".
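
One literal way to read the grammar quoted above is that everything up to the first comma is the header and everything after it is data, so any later `;`, `=`, or `,` lands in the data part. A minimal sketch of that reading follows. The helper name `split_data_url` is hypothetical, and the sketch neither decodes the data nor checks that it contains only urlchar characters.

```python
def split_data_url(url: str):
    """Split a data URL into (mediatype, is_base64, data) at the first comma.

    Assumption: ";base64" only counts when it is the last thing before the
    comma, which is how the quoted grammar orders it.
    """
    if not url.startswith("data:"):
        raise ValueError("not a data URL")
    header, sep, data = url[len("data:"):].partition(",")
    if not sep:
        raise ValueError("missing comma separator")
    is_base64 = header.endswith(";base64")
    mediatype = header[: -len(";base64")] if is_base64 else header
    return mediatype, is_base64, data


# Later ';', '=' and ',' characters stay in the data part.
print(split_data_url("data:text/plain;charset=utf-8,a;b=c,d"))
# Example 4 from the gist: ";base64" is not directly before the comma here.
print(split_data_url("data:;base64;sdfgsdfgsdfasdfa=s,UEsDBBQAAAAI"))
```

Under this reading, example 4 is split without error, but its `;base64` is treated as part of the mediatype rather than as the encoding marker.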

@khanzadimahdi
Author

As for https://bit.dev/chriso/validator-js/is-data-uri, keep in mind that it is an npm package, and any package can have bugs.

@Gpinchon

Gpinchon commented Feb 2, 2021

OK, thanks for the clarifications 👍
My understanding was that parameters, being part of [mediatype], had to be placed before [";base64"] and after the [type "/" subtype] tokens, if present, but I guess I was wrong.

@Gpinchon

Gpinchon commented Feb 2, 2021

and about https://bit.dev/chriso/validator-js/is-data-uri you should know, it is a NPM package. and any packages could have bugs.

Sure. If you know a more reliable tool for validating data URIs, I would be very interested.

@wfoojjaec

Even in a valid URL, the data part is allowed to be empty, so it might be a good idea to replace
,(.+)$ with ,(.*)$
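
A quick sketch of what that change does, assuming Python's `re` module; the names `ORIGINAL` and `RELAXED` are only labels for this comparison.

```python
import re

# The gist's pattern as posted: the data part needs at least one character.
ORIGINAL = re.compile(r"^data:((?:\w+\/(?:(?!;).)+)?)((?:;[\w\W]*?[^;])*),(.+)$")
# Same pattern with the suggested change: the data part may be empty.
RELAXED = re.compile(r"^data:((?:\w+\/(?:(?!;).)+)?)((?:;[\w\W]*?[^;])*),(.*)$")

for url in ["data:,", "data:text/plain,"]:
    print(url, ORIGINAL.match(url) is not None, RELAXED.match(url) is not None)
```

Both URLs are rejected by the original pattern and accepted by the relaxed one.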

@thced

thced commented Sep 1, 2021

One should perhaps also be humble enough to admit that one can have bugs of one's own; I would second that example #4 is wrong: data:;base64;sdfgsdfgsdfasdfa=s,UEsDBBQAAAAI is not valid afaik.

The reason for me saying so is that if you look at the standard you posted, you see that:

  1. attribute = value is a parameter
  2. a parameter is placed after type / subtype as part of media-type
  3. a parameter is prefixed by a semi-colon
  4. a parameter is optional
  5. a media-type is always placed before ";base64"

A valid example according to these statements would be: data:;sdfgsdfgsdfasdfa=s;base64,UEsDBBQAAAAI
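
The five statements above can be folded into a stricter pattern. This is only a sketch under assumptions: the name `STRICT_DATA_URL`, the helper `is_strict_data_url`, and the character classes used for type, subtype, attribute, and value are simplifications chosen for this illustration, not the RFC's exact token rules.

```python
import re

STRICT_DATA_URL = re.compile(
    r"^data:"
    r"(?P<type>[\w.+-]+/[\w.+-]+)?"        # optional type "/" subtype
    r"(?P<params>(?:;[\w.+-]+=[^;,]+)*)"   # zero or more ";" attribute "=" value
    r"(?P<base64>;base64)?"                # optional ";base64", only after the parameters
    r",(?P<data>.*)$"                      # data after the comma (may be empty)
)


def is_strict_data_url(url: str) -> bool:
    """Return True if url follows the parameters-before-base64 ordering."""
    return STRICT_DATA_URL.match(url) is not None


# Parameters first, then ";base64": accepted.
print(is_strict_data_url("data:;sdfgsdfgsdfasdfa=s;base64,UEsDBBQAAAAI"))
# Example 4 from the gist, ";base64" before a parameter: rejected.
print(is_strict_data_url("data:;base64;sdfgsdfgsdfasdfa=s,UEsDBBQAAAAI"))
```

With this ordering enforced, example 4 fails while the reordered URL passes; whether that is the right call depends on which reading of the grammar you follow.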

I may of course be wrong.. 👍
