Partial Lark Grammar for Hledger CSV transform rules
Gist airtonix/84653b8117ac80840a74000f52f10d60 · last active August 19, 2024 01:37
// https://hledger.org/1.34/hledger.html#csv
// https://lark-parser.readthedocs.io/en/latest/grammar.html
//
// TODO:
// - IF tables (import_rule)
// - IF empty row (import_rule)
//
// NEEDS TESTING:
// - every rule except import and include
//
// Mini Help
//
// TERMINAL (UPPERCASE name): matches a literal string or a regex and produces a token
// rule (lowercase name): matches a sequence of terminals and rules and produces a tree node
// _UPPERCASE_NAME: a filtered terminal; it is matched but left out of the tree
// "strings": match exactly this text; anonymous string literals are left out of the tree
// /regex/: match this pattern; anonymous regex matches are kept in the tree
// %ignore: terminals listed here are discarded by the lexer wherever they occur and never reach the tree
// optional leading blank lines, then any number of rules
// (every rule consumes its own trailing newlines)
start: _NEWLINE* rule*
// a rule is one of the directives below; import_rule is a conditional "if" block
rule: source_rule
    | separator_rule
    | skip_rule
    | date_format_rule
    | timezone_rule
    | newest_first_rule
    | intra_day_reversed_rule
    | decimal_mark_rule
    | field_list_rule
    | field_assignment_rule
    | balance_type_rule
    | include_rule
    | import_rule
// https://hledger.org/1.34/hledger.html#source
source_rule: "source" source_value _NEWLINE*
source_value: /[^\n]+/
// https://hledger.org/1.34/hledger.html#separator
separator_rule: "separator" separator_value _NEWLINE*
// keywords first: with "." first, only the "T" of TAB would be matched
separator_value: /(TAB|SPACE|.)/i
// https://hledger.org/1.34/hledger.html#skip
skip_rule: "skip" skip_row_count _NEWLINE*
skip_row_count: /[0-9]+/
// https://hledger.org/1.34/hledger.html#date-format
date_format_rule: "date-format" date_format _NEWLINE*
date_format: /[^\n]+/
// https://hledger.org/1.34/hledger.html#timezone
timezone_rule: "timezone" timezone_value _NEWLINE*
timezone_value: /[^\n]+/
// https://hledger.org/1.34/hledger.html#newest-first
newest_first_rule: "newest-first" _NEWLINE*
// https://hledger.org/1.34/hledger.html#intra-day-reversed
intra_day_reversed_rule: "intra-day-reversed" _NEWLINE*
// https://hledger.org/1.34/hledger.html#decimal-mark
decimal_mark_rule: "decimal-mark" decimal_mark_value _NEWLINE*
decimal_mark_value: "," | "."
// https://hledger.org/1.34/hledger.html#fields-list
field_list_rule: "fields " (field_header|_HEADER_SEPARATOR)* _NEWLINE*
// allow "-" so names such as amount-in are read as one field
field_header: /[\w-]+/
_HEADER_SEPARATOR: /,\s*/
// https://hledger.org/1.34/hledger.html#field-assignment
field_assignment_rule: assignment_field assignment_value _NEWLINE*
assignment_field: HLEDGER_FIELD_KEY
assignment_value: /[^\n]+/
//
balance_type_rule: "balance-type" balance_type _NEWLINE*
balance_type: "="
    | "=*"
    | "=="
    | "==*"
// Include
include_rule: "include" include_path _NEWLINE*
include_path: /[^\n]+/
// conditional "if" blocks (named import_rule here) start with "if"
import_rule: _IF (match_field | match_line)+ transform+ _NEWLINE*
match_line: match_line_value _NEWLINE
match_line_value: /^[^%](.+)/im
match_field: (match_field_or_key|match_field_and_key) match_field_value _NEWLINE
match_field_or_key: /%[a-zA-Z0-9-_]+/
match_field_and_key: /&\ %?[a-zA-Z0-9-_]+/
match_field_value: /.+/
transform: _INDENT transform_key transform_value _NEWLINE?
transform_key: HLEDGER_FIELD_KEY
transform_value: /[^\n]+/
// named without a leading "_" so the matched key (date, account2, ...) is kept
// in the tree instead of leaving assignment_field and transform_key empty
HLEDGER_FIELD_KEY: "date"
    | "date2"
    | "status"
    | "code"
    | "description"
    | /comment\d?/
    | /account\d?/
    | /amount\d?/
    | /amount\d?-in/
    | /amount\d?-out/
    | /currency\d?/
    | /balance\d?/
_WS: /^\s+$/i
_INDENT: " "+
_IF: /if.*\n?/i
_MATCH_KEY: _MATCH_OR_KEY | _MATCH_AND_KEY
_MATCH_OR_KEY: "%"
_MATCH_AND_KEY: "& %"
%import common.NEWLINE -> _NEWLINE
%import common.WORD -> _WORD
%import common.CNAME -> _CNAME
%import lark.COMMENT -> _COMMENT
%import common.WS_INLINE -> _WS_INLINE
%import common.WS
%ignore _COMMENT _NEWLINE?
%ignore _WS
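Most terminals above are plain Python regular expressions, and Lark's lexer tries them at the current position of the input, much like `re.Pattern.match(text, pos)`. That makes a few of them easy to check in isolation. The sketch below uses only the standard library, and `lex_at` is a hypothetical helper, not part of Lark. It shows four pitfalls that patterns like `skip_row_count`, `separator_value`, `field_header` and `_WS` can run into. First, `0-9` outside brackets is literal text. Second, alternatives inside one regex are tried left to right. Third, `\w` stops at a hyphen. Fourth, a `^...$`-anchored pattern never matches the space after a keyword.

```python
import re
from typing import Optional


def lex_at(pattern: str, text: str, pos: int = 0, flags: int = 0) -> Optional[str]:
    """Return the text `pattern` consumes when matched at `pos`, or None.

    The match is anchored at `pos`, and within one regex Python's `re` takes
    the leftmost alternative that lets the match succeed, not the longest one.
    """
    m = re.compile(pattern, flags).match(text, pos)
    return m.group(0) if m else None


# skip_row_count: outside brackets, 0-9 is the literal three-character text "0-9"
assert lex_at(r"0-9", "0-9") == "0-9"
assert lex_at(r"0-9", "12") is None
assert lex_at(r"[0-9]+", "12") == "12"

# separator_value: "." is tried first, so it consumes only the "T" of TAB
assert lex_at(r"(.|TAB|SPACE)", "TAB", flags=re.I) == "T"
assert lex_at(r"(TAB|SPACE|.)", "TAB", flags=re.I) == "TAB"

# field_header: \w excludes "-", so the match stops inside amount-in
assert lex_at(r"\w+", "amount-in") == "amount"
assert lex_at(r"[\w-]+", "amount-in") == "amount-in"

# _WS: ^ and $ (without re.MULTILINE) only accept input that is whitespace
# from the very start to the end, so the space after "separator" is not matched
assert lex_at(r"^\s+$", "separator ;", pos=9, flags=re.I) is None
assert lex_at(r"[ \t]+", "separator ;", pos=9) == " "
```

Checks like these are cheap to run before loading the grammar. They do not replace parsing real rules files with `hledger_parser`.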
from lark import Lark

# Build an LALR(1) parser from the grammar file stored next to this module
hledger_parser = Lark.open(
    "hledger-csv-rule.lark",
    rel_to=__file__,
    parser="lalr",
)
if
SOME DESCRIPTION
%a-field-name some value in field
  account2 expenses:self:entertainment:eatingout ;something
  comment icon:🍫

if
SOME DESCRIPTION
%a-field-name some value in field
  account2 expenses:self:entertainment:eatingout ;something
  comment icon:🍫
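The `import_rule` in the grammar expects an `if` line, one or more matcher lines, and then indented field assignments. As a rough cross-check of that shape, here is a standard-library sketch that groups a rules file like the example into blocks. It is not the author's code, and `IfBlock` and `split_if_blocks` are hypothetical names. It assumes that a blank line, or an unindented line after the assignments, closes a block. It also keeps a matcher written on the same line as `if`; the grammar's `_IF` terminal (`/if.*\n?/i`) currently consumes and discards such a matcher.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class IfBlock:
    """One conditional block: matcher lines plus indented assignments."""
    matchers: List[str] = field(default_factory=list)
    assignments: List[Tuple[str, str]] = field(default_factory=list)


def split_if_blocks(text: str) -> List[IfBlock]:
    """Group the `if` blocks of a rules file; other directives are skipped."""
    blocks: List[IfBlock] = []
    current: Optional[IfBlock] = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            current = None  # assumption: a blank line ends the block
            continue
        parts = stripped.split(None, 1)  # first word, then the rest of the line
        head = parts[0]
        rest = parts[1] if len(parts) > 1 else ""
        if head.lower() == "if":
            current = IfBlock()
            blocks.append(current)
            if rest:
                current.matchers.append(rest)  # matcher on the "if" line itself
        elif current is None:
            continue  # top-level directive such as "skip" or "fields"
        elif line[0] in " \t":
            # indented "<field> <value>"; the value is kept verbatim, ";..." included
            current.assignments.append((head, rest))
        elif current.assignments:
            current = None  # unindented line after the assignments ends the block
        else:
            current.matchers.append(stripped)  # "%field regex", "& ..." or a record regex
    return blocks


rules = """\
if
SOME DESCRIPTION
%a-field-name some value in field
  account2 expenses:self:entertainment:eatingout ;something
  comment icon:🍫
"""

for block in split_if_blocks(rules):
    print(block.matchers)                         # ['SOME DESCRIPTION', '%a-field-name some value in field']
    print([key for key, _ in block.assignments])  # ['account2', 'comment']
```

Comparing this grouping with the `import_rule` subtrees from `hledger_parser` is one way to spot lines that the grammar attaches to the wrong part of a block.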