
4. Text manipulation

All expressions mentioned here use / (slash) as the bounding character. Any other character can be used instead, except letters, digits, blanks, underscore, hyphen and semicolon. The beginning and the ending bounding characters must be the same, and there is no way to refer to that character inside the expression.
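The bounding-character rule can be pictured with a small Python sketch. This is not how Kaptain parses its grammar; split_expression is a hypothetical helper, and the use of | as a bounding character follows only from the rule stated above and has not been checked against Kaptain itself.

```python
import string

# Characters that, according to the text, may NOT be the bounding character.
FORBIDDEN = set(string.ascii_letters + string.digits + " \t_-;")


def split_expression(expr):
    """Split e.g. 's|a/b|c|g' into (operator, parts, flags). Illustrative only."""
    # The operator is the leading run of letters (m, s or tr).
    i = 0
    while i < len(expr) and expr[i].isalpha():
        i += 1
    op, rest = expr[:i], expr[i:]
    if not rest or rest[0] in FORBIDDEN:
        raise ValueError("missing or invalid bounding character")
    delim = rest[0]
    # The bounding character cannot occur inside the expression,
    # so a plain split on it is sufficient.
    *parts, flags = rest[1:].split(delim)
    return op, parts, flags


# With | as the bounding character, slashes can appear in the pattern.
print(split_expression("s|/usr/bin|/opt/bin|g"))
print(split_expression("tr/abc/def/"))
```

Choosing a bounding character that does not occur in the pattern is the only way to match a literal slash, since the bounding character cannot be referred to inside the expression.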

4.1 Regular expressions

Kaptain understands regular expressions similar to those of grep or perl. For a detailed description, type man grep or man perlre at the command prompt. A regular expression can be passed as a parameter to some special symbols, like this:

@regexp(m/^[0-9]*$/)

Here m/^[0-9]*$/ means that the string typed into the line input field must match the regular expression. The ^ and $ anchors tie the match to the beginning and the end of the string, so in this particular case the user can only type digits into the field, that is, non-negative integers.
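To see what a digits-only pattern anchored at both ends, ^[0-9]*$, accepts and rejects, here is a minimal Python sketch. Python's re module is only a stand-in for Kaptain's own matcher, and accepts is a hypothetical helper name.

```python
import re

# Digits only, anchored at the start (^) and the end ($) of the string.
DIGITS_ONLY = re.compile(r"^[0-9]*$")


def accepts(text):
    """Return True if the whole text consists of digits (possibly none)."""
    return DIGITS_ONLY.search(text) is not None


for sample in ["2024", "", "12a", "-5"]:
    print(repr(sample), accepts(sample))
```

Note that the star also lets the empty string through and that a leading minus sign is rejected; a pattern such as ^-?[0-9]+$ would be one way to change both, if that is what is wanted.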

When parentheses are found in the regular expression, each parenthesized subexpression is matched separately and can be referred to as \d, where d is a digit. For example, in

@multicol(m/([^[:blank:]]*)[[:blank:]]+([^[:blank:]]*)/, "First_name Last_name",
  "Albert Einstein Dr.", "Isaac Newton", "Rudolf Kepler")

the regular expression is matched against each string, and the subexpressions match the first and the second word. A two-column listview is displayed in which each line contains a name: the first column shows the first name, the second the last name. Note that the third word "Dr." in the line "Albert Einstein Dr." is not matched by the second subexpression, so it is not displayed. However, when this special symbol is evaluated, it yields the whole string.
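The way the two subexpressions pick out the columns can be sketched in Python. Python's re module does not support the POSIX class [[:blank:]], so [ \t] is used in its place; columns is a hypothetical helper and says nothing about how Kaptain builds the listview.

```python
import re

# [ \t] stands in for [[:blank:]] (space or tab).
TWO_WORDS = re.compile(r"([^ \t]*)[ \t]+([^ \t]*)")


def columns(entry):
    """Return the (first, second) subexpression matches, or None if no match."""
    m = TWO_WORDS.search(entry)
    return m.groups() if m else None


for name in ["Albert Einstein Dr.", "Isaac Newton", "Rudolf Kepler"]:
    print(columns(name))
```

The trailing "Dr." falls outside both groups, which is why it would not show up in either column.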

4.2 Substitution

Substitution is based on regular expression matching, just as in perl or sed. (For some metacharacters, sed uses a different syntax.) A substitution expression takes a regular expression and a substitution string as input:

s/regexp/subs/

For example, to replace the word "dog" with "cat" in a text, just write s/dog/cat/g. The g at the end means that the substitution is repeated until the regular expression no longer matches. In the substitution string, you can refer to the matched subexpressions with \d, where d is a digit; \0 refers to the whole matched string. Thus

s/([^[:blank:]]*)[[:blank:]]+([^[:blank:]]*)/\2 \1/

swaps the first two words in the text. You can use it in a listbox:

@list(s/([^[:blank:]]*)[[:blank:]]+([^[:blank:]]*)/\2 \1/,
  "Albert Einstein Dr.", "Isaac Newton", "Rudolf Kepler")

Here, the names in the listbox will appear with the last name first (this is common in Hungary), while the selected name will appear in the generated text in Western style.
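The effect of the word-swapping substitution, and of the g flag, can be tried out with Python's re.sub as a stand-in for Kaptain's own engine. As assumptions of this sketch, [ \t] replaces [[:blank:]], count=1 plays the role of a substitution without g, and swap_first_two_words is a hypothetical helper name.

```python
import re

PATTERN = r"([^ \t]*)[ \t]+([^ \t]*)"


def swap_first_two_words(text):
    """Mimic s/(...)[[:blank:]]+(...)/\\2 \\1/ without the g flag."""
    # count=1 replaces only the first match, like a substitution without g.
    return re.sub(PATTERN, r"\2 \1", text, count=1)


print(swap_first_two_words("Albert Einstein Dr."))  # Einstein Albert Dr.
print(swap_first_two_words("Isaac Newton"))         # Newton Isaac

# With and without the g flag (count=0 means "replace every match").
print(re.sub("dog", "cat", "dog eat dog", count=1))  # cat eat dog
print(re.sub("dog", "cat", "dog eat dog"))           # cat eat cat

# Python spells the whole match as \g<0> where the text above uses \0.
print(re.sub("dog", r"[\g<0>]", "dog"))              # [dog]
```

Only the part of the string covered by the match is rewritten, so the "Dr." after the second word stays where it is.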

4.3 Translations

Translation is a very simple operation that replaces certain characters with others.

tr/abc/def/

replaces a with d, b with e, c with f.
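The same character-by-character mapping can be reproduced in Python with str.maketrans, shown here only as an analogy; translate_abc is a hypothetical helper, and the sketch assumes that every occurrence of a listed character is replaced, as in perl's tr.

```python
# Map a->d, b->e, c->f, pairing the characters by position.
TABLE = str.maketrans("abc", "def")


def translate_abc(text):
    """Mimic tr/abc/def/ on the given text."""
    return text.translate(TABLE)


print(translate_abc("cab"))     # fde
print(translate_abc("aabbcc"))  # ddeeff
print(translate_abc("xyz"))     # xyz (characters not listed are left alone)
```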

4.4 Using text manipulation in grammar rules

If you put substitution or translation expressions at the beginning of the right side of a rule, they are executed each time text is generated with that rule. This means that for

no_jim_and_joe -> s/Jim/Peter/g s/Joe/Peter/g tr/+/-/ @string;

if the user types Jim or Joe into the input box, it is replaced with Peter, and plus signs are changed to minus signs when the text is generated. s/// and tr/// operations can only appear directly after the arrow of a rule, but any number of them can be written there. They are executed from right to left, which in my opinion is the natural order.
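The right-to-left order matters as soon as one operation produces text that another one matches. The Python sketch below imitates the rule above and then a second, made-up chain, s/a/b/g s/b/c/g, that is not taken from Kaptain's documentation; apply_right_to_left is a hypothetical helper and the lambdas merely stand in for the s/// and tr/// expressions.

```python
import re


def apply_right_to_left(operations, text):
    """Apply the operations starting with the last one written."""
    for op in reversed(operations):
        text = op(text)
    return text


# Stand-ins for: s/Jim/Peter/g s/Joe/Peter/g tr/+/-/
no_jim_and_joe = [
    lambda t: re.sub("Jim", "Peter", t),
    lambda t: re.sub("Joe", "Peter", t),
    lambda t: t.translate(str.maketrans("+", "-")),
]
print(apply_right_to_left(no_jim_and_joe, "Jim+Joe"))  # Peter-Peter

# Hypothetical chain where the order changes the result: s/a/b/g s/b/c/g
chain = [
    lambda t: t.replace("a", "b"),
    lambda t: t.replace("b", "c"),
]
print(apply_right_to_left(chain, "ab"))  # bc: first b->c, then a->b
```

Run from left to right instead, the second chain would turn "ab" into "bb" and then into "cc", so it is worth writing dependent operations with the right-to-left order in mind.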

