@netgusto
Last active April 26, 2024 08:04
PEG.js grammar for parsing a simple HTML-ish balanced markup language. Try it at http://pegjs.majda.cz/online
Content = (DocType / Comment / BalancedTag / SelfClosingTag / Text)*

DocType = "<!doctype " doctype:[^>]* ">" {
  return {
    type: 'DocType',
    content: doctype.join('')
  };
}

Comment = "<!--" c:(!"-->" c:. { return c; })* "-->" {
  return {
    type: 'Comment',
    content: c.join('')
  };
}

BalancedTag = startTag:StartTag content:Content endTag:EndTag {
  // The closing tag must carry the same name as the opening tag.
  if (startTag.name !== endTag) {
    throw new Error("Expected </" + startTag.name + "> but </" + endTag + "> found.");
  }
  return {
    type: 'BalancedTag',
    name: startTag.name,
    attributes: startTag.attributes,
    content: content
  };
}

SelfClosingTag = "<" name:TagName attributes:Attributes* "/>" {
  return {
    type: 'SelfClosingTag',
    name: name,
    attributes: attributes
  };
}

StartTag = "<" name:TagName attributes:Attributes* ">" {
  return {
    name: name,
    attributes: attributes
  };
}

EndTag = "</" name:TagName ">" { return name; }

Attributes = " " attributes:Attribute* { return attributes; }

Attribute = (ValuedAttribute / ValuelessAttribute)

ValuedAttribute = name:AttributeName "=" value:AttributeValue {
  return {
    name: name,
    value: value
  };
}

ValuelessAttribute = name:AttributeName {
  return {
    name: name,
    value: null
  };
}

AttributeName = chars:[a-zA-Z0-9\-]+ { return chars.join(""); }

AttributeValue = (QuotedAttributeValue / UnquotedAttributeValue)

QuotedAttributeValue = value:QuotedString { return value; }

// Unquoted values are limited to decimal digits.
UnquotedAttributeValue = value:decimalDigit* { return value.join(''); }

TagName = chars:[a-zA-Z0-9]+ { return chars.join(""); }

// Text runs up to the next "<", so a literal "<" inside text is not accepted.
Text = chars:[^<]+ {
  return {
    type: 'Text',
    content: chars.join("")
  };
}

decimalDigit = [0-9]

QuotedString = quoteStart:('"'/"'") chars:[a-zA-Z0-9://\.-]+ quoteEnd:('"'/"'") {
  // The opening and closing quote characters must match.
  if (quoteStart !== quoteEnd) {
    throw new Error("Unmatched quote; Expected " + quoteStart + " but " + quoteEnd + " found.");
  }
  return chars.join("");
}
/*
  NOTE: the block below defines QuotedString a second time, which newer
  PEG.js versions report as a duplicate rule. Its actions also appear to use
  CoffeeScript-style syntax (implicit returns, calls without parentheses),
  so it is kept here commented out.

QuotedString
  = "\"\"\"" d:(stringData / "'" / $("\"" "\""? !"\""))+ "\"\"\"" {
      return d.join('');
    }
  / "'''" d:(stringData / "\"" / "#" / $("'" "'"? !"'"))+ "'''" {
      return d.join('');
    }
  / "\"" d:(stringData / "'")* "\"" { return d.join(''); }
  / "'" d:(stringData / "\"" / "#")* "'" { return d.join(''); }
stringData
  = [^"'\\#]
  / "\\0" !decimalDigit { '\0' }
  / "\\0" &decimalDigit { throw new SyntaxError ['string data'], 'octal escape sequence', offset(), line(), column() }
  / "\\b" { '\b' }
  / "\\t" { '\t' }
  / "\\n" { '\n' }
  / "\\v" { '\v' }
  / "\\f" { '\f' }
  / "\\r" { '\r' }
  / "\\" c:. { c }
  / c:"#" !"{" { c }
*/
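
To show how the node objects returned by the grammar's actions might be consumed, here is a minimal sketch in plain JavaScript that turns a tree back into markup. The helper names (`flattenAttributes`, `renderAttributes`, `render`) and `sampleTree` are hypothetical, and the tree is written by hand in the shape the actions appear to return, not produced by running the parser. Because `Attributes*` wraps a rule that itself returns an array, the attribute list is assumed to arrive as an array of arrays, so the sketch flattens it first.

```javascript
// Assumption: `attributes` is an array of arrays (one inner array per
// space-separated group matched by the Attributes rule). A flat array
// of attribute objects passes through unchanged.
function flattenAttributes(attributes) {
  return [].concat.apply([], attributes || []);
}

// Render attributes as ` name` (valueless) or ` name="value"`.
function renderAttributes(attributes) {
  return flattenAttributes(attributes).map(function (attr) {
    return attr.value === null
      ? ' ' + attr.name
      : ' ' + attr.name + '="' + attr.value + '"';
  }).join('');
}

// Walk a list of nodes and rebuild the markup they describe.
function render(nodes) {
  return nodes.map(function (node) {
    switch (node.type) {
      case 'DocType':
        return '<!doctype ' + node.content + '>';
      case 'Comment':
        return '<!--' + node.content + '-->';
      case 'Text':
        return node.content;
      case 'SelfClosingTag':
        return '<' + node.name + renderAttributes(node.attributes) + '/>';
      case 'BalancedTag':
        return '<' + node.name + renderAttributes(node.attributes) + '>' +
          render(node.content) + '</' + node.name + '>';
      default:
        throw new Error('Unknown node type: ' + node.type);
    }
  }).join('');
}

// Hand-written example tree (hypothetical data, not parser output).
const sampleTree = [
  { type: 'DocType', content: 'html' },
  {
    type: 'BalancedTag',
    name: 'p',
    attributes: [
      [{ name: 'class', value: 'intro' }],
      [{ name: 'hidden', value: null }]
    ],
    content: [
      { type: 'Text', content: 'Hello ' },
      { type: 'SelfClosingTag', name: 'br', attributes: [] }
    ]
  }
];

console.log(render(sampleTree));
// prints: <!doctype html><p class="intro" hidden>Hello <br/></p>
```

Attribute values are always written back with double quotes here, since the grammar's QuotedString action discards the original quote character.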
@Galen-Yip
hey bro, I found it can't parse the text '< ' or '>'

@bakso
bakso commented Jul 21, 2016

www

@touzoku
touzoku commented Nov 29, 2016

<script>var hello = a < b;</script>

fail

@ochui
ochui commented Apr 22, 2017

QuotedString on line 93 is a duplicate rule

@kunKun-tx
<h2>A<B</h2>

would fail
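
The failures reported above seem to share one cause: `Text = [^<]+` stops at every `<`, and no other alternative in `Content` accepts a `<` that does not open a well-formed tag. One possible workaround is a last-resort alternative that keeps such a `<` as text. The sketch below is a hand-written approximation in plain JavaScript, not PEG.js, and it only handles tags without attributes; the function names are hypothetical. In the grammar, the rough equivalent would be an extra final alternative in `Content`, for example a hypothetical rule `StrayLt = "<" !"/"` (untested against PEG.js).

```javascript
// Approximates Content = (BalancedTag / Text / StrayLt)* with ordered choice.
function parseContent(input, pos) {
  const nodes = [];
  for (;;) {
    const tag = parseBalancedTag(input, pos);
    if (tag) { nodes.push(tag.node); pos = tag.pos; continue; }

    // Same idea as Text = [^<]+ in the grammar.
    const text = /^[^<]+/.exec(input.slice(pos));
    if (text) {
      nodes.push({ type: 'Text', content: text[0] });
      pos += text[0].length;
      continue;
    }

    // Fallback (assumption, not in the gist): a "<" that does not start
    // a closing tag is kept as a one-character Text node.
    if (input[pos] === '<' && input[pos + 1] !== '/') {
      nodes.push({ type: 'Text', content: '<' });
      pos += 1;
      continue;
    }
    return { nodes: nodes, pos: pos };
  }
}

// Approximates BalancedTag for attribute-less tags; returns null to backtrack.
function parseBalancedTag(input, pos) {
  const start = /^<([a-zA-Z0-9]+)>/.exec(input.slice(pos));
  if (!start) return null;
  const inner = parseContent(input, pos + start[0].length);
  const end = /^<\/([a-zA-Z0-9]+)>/.exec(input.slice(inner.pos));
  if (!end) return null;
  if (end[1] !== start[1]) {
    throw new Error('Expected </' + start[1] + '> but </' + end[1] + '> found.');
  }
  return {
    node: { type: 'BalancedTag', name: start[1], attributes: [], content: inner.nodes },
    pos: inner.pos + end[0].length
  };
}

function parse(input) {
  const result = parseContent(input, 0);
  if (result.pos !== input.length) {
    throw new Error('Unexpected input at offset ' + result.pos);
  }
  return result.nodes;
}

// Join the text of a node's children back together.
function innerText(node) {
  return node.content.map(function (n) { return n.content; }).join('');
}

console.log(innerText(parse('<h2>A<B</h2>')[0]));                        // prints: A<B
console.log(innerText(parse('<script>var hello = a < b;</script>')[0])); // prints: var hello = a < b;
```

The trade-off is leniency: with this fallback an opening tag that is never closed, such as `<b>unclosed`, is no longer a parse failure but comes back as plain text.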
