Last active
August 30, 2024 01:30
A table of every HTML element and some properties relevant to formatting HTML into plaintext
> A table of every HTML element and some properties relevant to formatting | |
> HTML into plaintext. | |
> | |
> Key: | |
> transparent | |
> elements that are noted in the HTML spec as having a "transparent content model", | |
> and aren't [replaced elements] | |
> https://developer.mozilla.org/en-US/docs/Web/CSS/Replaced_element | |
> (just <a>, <del>, <ins>, and <map>) | |
> trans-space | |
> elements that have a transparent content model and *are* replaced elements; we want | |
> to plaintext-ify them as inline containers, but with spaces around their contents | |
> block | |
> noted in the HTML spec for breaking <p> elements, and non-void (everything except <hr>) | |
> https://html.spec.whatwg.org/multipage/syntax.html#optional-tags:the-p-element | |
> block-void | |
> breaks <p> elements and *is* void (only <hr>) | |
> block* | |
> not noted in the HTML spec for breaking <p> elements, but for our purposes | |
> we want to break lines when we encounter it. Primarily table rows (<tr>s), | |
> table row groupings (<thead> <tbody> <tfoot>), and list items (<li>s), | |
> because we want each table row or list item to plaintext-ify to its own line. | |
> (Table row groupings should only ever appear immediately between a <table> and a <tr>, | |
> but they are included just in case.) <dialog> also falls under this, but that appears to be | |
> a spec bug, because the observed behavior in browsers is that it does break <p>s | |
> https://github.com/whatwg/html/issues/10590 | |
> block-void* | |
> <br>, which is considered inline-level (aka phrasing content category) by the | |
> browser when parsing HTML and does not break paragraphs, but for our purposes | |
> we do want to break lines when we encounter it | |
> inline | |
> noted in the HTML spec as being in the "phrasing content category" and | |
> not noted for having a transparent content model; and neither void nor | |
> a replaced element. For our purposes, ends up treated the same as transparent | |
> inline-space | |
> same as inline, but *is* a replaced element (and non-void). Ends up treated | |
> the same as trans-space | |
> inline-void | |
> same as inline, but void, meaning it has an empty content model or, for the purposes | |
> of plaintext-ifying, we want to ignore its contents (<template>, <slot>, | |
> <math>, <svg>) | |
> void | |
> not in the flow content category in the HTML spec, but void so we want to | |
> ignore its contents regardless | |
> Note: all "flow content" elements that are neither phrasing content nor have a transparent content model, except for <dialog>, break paragraphs | |
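The key above boils down to three questions per category: does it break the line, does it pad its contents with spaces, and are its contents kept? A minimal sketch of that mapping follows. The `Handling` class and `HANDLING` dict are hypothetical names for illustration, and treating `inline-void` as adding no padding is an assumption, since the key only says its contents are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Handling:
    """How one category from the key is treated when plaintext-ifying."""
    breaks_line: bool       # start a new line when the element is encountered
    pads_with_spaces: bool  # put spaces around the element's contents
    keeps_contents: bool    # False means the contents are ignored


# Hypothetical lookup table; the values follow the key above.
HANDLING = {
    "transparent":  Handling(False, False, True),
    "trans-space":  Handling(False, True,  True),
    "block":        Handling(True,  False, True),
    "block-void":   Handling(True,  False, False),
    "block*":       Handling(True,  False, True),
    "block-void*":  Handling(True,  False, False),
    "inline":       Handling(False, False, True),   # same as transparent
    "inline-space": Handling(False, True,  True),   # same as trans-space
    "inline-void":  Handling(False, False, False),  # assumption: no padding
    "void":         Handling(False, False, False),
}
```

Seen this way, the ten categories collapse into five distinct behaviors; the finer distinctions record why an element lands where it does (spec category, replaced or not, void or not).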
transparent <a> | |
inline <abbr> | |
block <address> | |
inline-void <area>, if it is a descendant of a <map> element | |
block <article> | |
block <aside> | |
trans-space <audio> | |
inline <b> | |
void <base> | |
inline <bdi> | |
inline <bdo> | |
block <blockquote> | |
block-void* <br> | |
inline-space <button> | |
trans-space <canvas> | |
inline <cite> | |
inline <code> | |
void <col> | |
void <colgroup> | |
inline <data> | |
inline <datalist> | |
transparent <del> | |
block <details> | |
inline <dfn> | |
block* <dialog> | |
block <div> | |
block <dl> | |
inline <em> | |
inline-void <embed> | |
block <fieldset> | |
block <figcaption> | |
block <figure> | |
block <footer> | |
block <form> | |
block <h1>-<h6> | |
block <header> | |
block <hgroup> | |
block-void <hr> | |
inline <i> | |
inline-void <iframe> | |
inline-void <img> | |
inline-void <input> | |
transparent <ins> | |
inline <kbd> | |
inline <label> | |
block* <li> | |
inline-void <link>, if the itemprop attribute is present | |
block <main> | |
transparent <map> | |
inline <mark> | |
inline-void <math> | |
block <menu> | |
inline-void <meta>, if the itemprop attribute is present | |
inline-space <meter> | |
block <nav> | |
inline-void <noscript> | |
trans-space <object> | |
block <ol> | |
inline <output> | |
void <optgroup> | |
void <option> | |
block <p> | |
inline-void <picture> | |
block <pre> | |
inline-space <progress> | |
inline-space <q> | |
inline-space <ruby> | |
inline <s> | |
inline <samp> | |
inline-void <script> | |
block <search> | |
block <section> | |
inline-void <select> | |
void <source> | |
inline-void <slot> | |
inline <small> | |
inline <span> | |
inline <strong> | |
void <style> | |
inline <sub> | |
inline <sup> | |
inline-void <svg> | |
block <table> | |
block* <thead> | |
block* <tbody> | |
block* <tfoot> | |
block* <tr> | |
inline-space <th> | |
inline-space <td> | |
inline-void <template> | |
inline-void <textarea> | |
inline <time> | |
void <title> | |
void <track> | |
inline <u> | |
block <ul> | |
inline <var> | |
trans-space <video> | |
inline-void <wbr> | |
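As a rough illustration of how the table might drive a plaintext formatter, here is a sketch built on Python's standard `html.parser`. It is not the author's implementation. `CATEGORY` holds only a small hand-picked subset of the rows above, and `NO_END_TAG`, `PlainText`, and `to_plaintext` are hypothetical names. It also assumes unknown tags behave as `inline`, collapses all source whitespace (so `<pre>` formatting is lost), and does not apply the conditional rows (`<area>`, `<link>`, `<meta>`).

```python
import re
from html.parser import HTMLParser

# Hypothetical subset of the table above: tag -> category.
CATEGORY = {
    "a": "transparent", "b": "inline", "em": "inline", "span": "inline",
    "p": "block", "div": "block", "ul": "block", "table": "block",
    "li": "block*", "tr": "block*",
    "hr": "block-void", "br": "block-void*",
    "td": "inline-space", "th": "inline-space", "button": "inline-space",
    "img": "inline-void", "script": "inline-void", "svg": "inline-void",
    "style": "void", "title": "void",
}

# Elements that never have an end tag, so they cannot open a skipped subtree.
NO_END_TAG = {"br", "hr", "img", "input", "embed", "wbr", "area",
              "base", "col", "link", "meta", "source", "track"}


class PlainText(HTMLParser):
    def __init__(self):
        super().__init__()
        self.parts = []
        self.skip_depth = 0  # > 0 while inside an element whose contents are ignored

    def _emit(self, category):
        if category.startswith("block"):
            self.parts.append("\n")   # block, block*, block-void, block-void*
        elif category in ("trans-space", "inline-space"):
            self.parts.append(" ")    # spaces around the contents

    def handle_starttag(self, tag, attrs):
        category = CATEGORY.get(tag, "inline")  # assumption: unknown tags are inline
        if self.skip_depth or category in ("inline-void", "void"):
            if tag not in NO_END_TAG:
                self.skip_depth += 1
            return
        self._emit(category)

    def handle_endtag(self, tag):
        if tag in NO_END_TAG:
            return  # already handled at the start tag
        if self.skip_depth:
            self.skip_depth -= 1
            return
        self._emit(CATEGORY.get(tag, "inline"))

    def handle_data(self, data):
        if not self.skip_depth:
            # Collapse source whitespace, including newlines, to single spaces.
            self.parts.append(re.sub(r"\s+", " ", data))


def to_plaintext(html):
    parser = PlainText()
    parser.feed(html)
    parser.close()
    lines = "".join(parser.parts).split("\n")
    lines = [re.sub(r" +", " ", line).strip() for line in lines]
    return "\n".join(line for line in lines if line)
```

For example, `to_plaintext("<ul><li>one</li><li>two<br>three</li></ul>")` puts `one`, `two`, and `three` on separate lines, while the cells of a `<tr>` end up space-separated on a single line. The skip counter assumes well-nested markup inside ignored elements; a real formatter would more likely walk a properly parsed DOM.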