Sunday, March 23, 2014

Braced

I've found that the Braced package was undocumented. First of all, I've added the proper unit tests for it, and thus renamed it from Triceps::X::Braced to Triceps::Braced (the X namespace is for the packages with limited testing). I've also renamed the methods, replacing the word "quote" with "escape", to make their meaning clearer.  And here goes the documentation section that I wrote for the upcoming manual:


The package Braced is designed to parse the Tcl-like nested lists where the elements are separated by whitespace, and braces are used to enquote the elements with spaces in them. These lists are used to write the pipelines that form the Tql queries. For example:

{read table tWindow} {project fields {symbol price}} {print tokenized 0}

These lists can then be parsed into elements, and the elements might be also lists that could be parsed into elements and so on. The spaces between the braces are optional, braces also serve as separators. For example, the following lines are equivalent:

a b c
{a} {b} {c}
{a}{b}{c}
{a}b{c}

In case if a brace character needs to be included into one of the strings, they can be escaped by backslashes, for example:

{a\{} b\}c

Any other Perl backslash escapes, such as \n or \x20, work too. The quote characters have no special meaning, they don't need to be escaped and they don't group the words. For example, the following two are equivalent:

"a b c"
{"a} {b} {c"}

Escaping the spaces (\ ) provides another way to combine the words into one element. The following two are equivalent:

{a b c}
a\ b\ c

 There is no need for the nested escaping. The characters need to be escaped only once, and then the resulting strings can be wrapped into any number of brace levels.

All the methods in this module are static, there are no objects.

$string = $data;
@elements = Triceps::Braced::raw_split_braced($string)
confess "Unbalanced braces around '$string'" if $string;

Split the string into the braced elements. If any of the elements were enclosed into their own braces, these braces are left in place, the element string will still contain them. For example, a {b} {c d} will be split into a, {b}, {c d}. No unescaping is done, the escaped characters are passed through as-is. This method of splitting is rarely used, it's present as a baseline.

The original string argument will be fully consumed. If anything is left unconsumed, this is an indication of a syntax error, with unbalanced braces. The argument may not be a constant because it gets modified.

$string = $data;
@elements = Triceps::Braced::split_braced($string)
confess "Unbalanced braces around '$string'" if $string;

Split the string into the braced elements. If any of the elements were enclosed into their own braces, these braces will be removed from the results. For example, a {b} {c d} will be split into a, b, c d. No unescaping is done, the escaped characters are passed through as-is. This is the normal method of splitting, it allows the elements to be split further recursively.

The original string argument will be fully consumed. If anything is left unconsumed, this is an indication of a syntax error, with unbalanced braces. The argument may not be a constant because it gets modified.

$result = Triceps::Braced::bunescape($string);

Un-escape a string by processing all the escape characters in it. This step is normally done last, after all the splitting is done. The result will become unsuitable for the future splitting because the escaped characters will lose their special meaning. If any literal braces are present in the argument, they will pass through to the result as literals. For example, {a \{b } will become {a {b }.

@results = Triceps::Braced::bunescape_all(@strings);

Perform the un-escaping on a whole array of strings. The result array will contain the same number of elements as the argument.

$ref_results = Triceps::Braced::split_braced_final($string);
confess "Unbalanced braces around '$string'" if $string;

The combined functionality of splitting a string and un-escaping the result elements. That's why it's final: no further splits must be done after un-escaping. The return value is different from the other split methods. It is a reference to the array of result strings. The difference has been introduced to propagate the undef from the argument to the result: if the argument string is undef, the result will be also undef, not a reference to an empty array. The string gets consumed in the same way as for the other split methods, and anything left in it indicates an unbalanced brace.

No comments:

Post a Comment