STATES(1) STATES STATES(1)
NAME
states - awk alike text processing tool
SYNOPSIS
states [-hV] [-D var=val] [-f file] [-o outputfile] [-s
startstate] [-W level] [filename ...]
DESCRIPTION
States is an awk-alike text processing tool with some
state machine extensions. It is designed for program
source code highlighting and to similar tasks where state
information helps input processing.
At a single point of time, States is in one state, each
quite similar to awk's work environment, they have regular
expressions which are matched from the input and actions
which are executed when a match is found. From the action
blocks, states can perform state transitions; it can move
to another state from which the processing is continued.
State transitions are recorded so states can return to the
calling state once the current state has finished.
The biggest difference between states and awk, besides
state machine extensions, is that states is not line-ori-
ented. It matches regular expression tokens from the
input and once a match is processed, it continues process-
ing from the current position, not from the beginning of
the next input line.
OPTIONS
-D var=val, --define=var=val
Define variable var to have string value val.
Command line definitions overwrite variable defi-
nitions found from the config file.
-f file, --file=file
Read state definitions from file file. As a
default, states tries to read state definitions
from file states.st in the current working direc-
tory.
-h, --help
Print short help message and exit.
-o file, --output=file
Save output to file file instead of printing it to
stdout.
-s state, --state=state
Start execution from state state. This definition
overwrites start state resolved from the start
block.
-V, --version
Print states version and exit.
-W level, --warning=level
Set the warning level to level. Possible values
for level are:
light light warnings (default)
all all warnings
STATES PROGRAM FILES
States program files can contain on start block,
startrules and namerules blocks to specify the initial
state, state definitions and expressions.
The start block is the main() of the states program, it is
executed on script startup for each input file and it can
perform any initialization the script needs. It normally
also calls the check_startrules() and check_namerules()
primitives which resolve the initial state from the input
file name or the data found from the begining of the input
file. Here is a sample start block which initializes two
variables and does the standard start state resolving:
start
{
a = 1;
msg = "Hello, world!";
check_startrules ();
check_namerules ();
}
Once the start block is processed, the input processing is
continued from the initial state.
The initial state is resolved by the information found
from the startrules and namerules blocks. Both blocks
contain regular expression - symbol pairs, when the regu-
lar expression is matched from the name of from the begin-
ning of the input file, the initial state is named by the
corresponding symbol. For example, the following start
and name rules can distinguish C and Fortran files:
namerules
{
/.(c|h)$/ c;
/.[fF]$/ fortran;
}
startrules
{
/- [cC] -/ c;
/- fortran -/ fortran;
}
If these rules are used with the previously shown start
block, states first check the beginning of input file. If
it has string -*- c -*-, the file is assumed to contain C
code and the processing is started from state called c.
If the beginning of the input file has string -*- fortran
-*-, the initial state is fortran. If none of the start
rules matched, the name of the input file is matched with
the namerules. If the name ends to suffix c or C, we go
to state c. If the suffix is f or F, the initial state is
fortran.
If both start and name rules failed to resolve the start
state, states just copies its input to output unmodified.
The start state can also be specified from the command
line with option -s, --state.
State definitions have the following syntax:
state { expr {statements} ... }
where expr is: a regular expression, special expression or
symbol and statements is a list of statements. When the
expression expr is matched from the input, the statement
block is executed. The statement block can call states'
primitives, user-defined subroutines, call other states,
etc. Once the block is executed, the input processing is
continued from the current intput position (which might
have been changed if the statement block called other
states).
Special expressions BEGIN and END can be used in the place
of expr. Expression BEGIN matches the beginning of the
state, its block is called when the state is entered.
Expression END matches the end of the state, its block is
executed when states leaves the state.
If expr is a symbol, its value is looked up from the
global environment and if it is a regular expression, it
is matched to the input, otherwise that rule is ignored.
The states program file can also have top-level expres-
sions, they are evaluated after the program file is parsed
but before any input files are processed or the start
block is evaluated.
PRIMITIVE FUNCTIONS
call (symbol)
Move to state symbol and continue input file pro-
cessing from that state. Function returns what-
ever the symbol state's terminating return state-
ment returned.
check_namerules ()
Try to resolve start state from namerules rules.
Function returns 1 if start state was resolved or
0 otherwise.
check_startrules ()
Try to resolve start state from startrules rules.
Function returns 1 if start state was resolved or
0 otherwise.
concat (str, ...)
Concanate argument strings and return result as a
new string.
float (any)
Convert argument to a floating point number.
getenv (str)
Get value of environment variable str. Returns an
empty string if variable var is undefined.
int (any)
Convert argument to an integer number.
length (item, ...)
Count the length of argument strings or lists.
list (any, ...)
Create a new list which contains items any, ...
panic (any, ...)
Report a non-recoverable error and exit with sta-
tus 1. Function never returns.
print (any, ...)
Convert arguments to strings and print them to the
output.
range (source, start, end)
Return a sub-range of source starting from posi-
tion start (inclusively) to end (exclusively).
Argument source can be string or list.
regexp (string)
Convert string string to a new regular expression.
regexp_syntax (char, syntax)
Modify regular expression character syntaxes by
assigning new syntax syntax for character char.
Possible values for syntax are:
'w' character is a word constituent
' ' character isn't a word constituent
regmatch (string, regexp)
Check if string string matches regular expression
regexp. Functions returns a boolean success sta-
tus and sets sub-expression registers $n.
regsub (string, regexp, subst)
Search regular expression regexp from string
string and replace the matching substring with
string subst. Returns the resulting string. The
substitution string subst can contain $n refer-
ences to the n:th parenthesized sup-expression.
regsuball (string, regexp, subst)
Like regsub but replace all matches of regular
expression regexp from string string with string
subst.
split (regexp, string)
Split string string to list considering matches of
regular rexpression regexp as item separator.
sprintf (fmt, ...)
Format arguments according to fmt and return
result as a string.
strcmp (str1, str2)
Perform a case-sensitive comparision for strings
str1 and str2. Function returns a value that is:
-1 string str1 is less than str2
0 strings are equal
1 string str1 is greater than str2
string (any)
Convert argument to string.
strncmp (str1, str2, num)
Perform a case-sensitive comparision for strings
str1 and str2 comparing at maximum num characters.
substring (str, start, end)
Return a substring of string str starting from
position start (inclusively) to end (exclusively).
BUILTIN VARIABLES
$. current input line number
$n the nth parenthesized regular expression sub-
expression from the latest state regular expres-
sion or from the regmatch primitive
$` everything before the matched regular rexpression.
This is usable when used with the regmatch primi-
tive; the contents of this variable is undefined
when used in action blocks to refer the data
before the block's regular expression.
$B an alias for $`
argv list of input file names
filename
name of the current input file
program name of the program (usually states)
version program version string
FILES
/usr/share/enscript/enscript.st enscript's states definitions
SEE ALSO
awk(1), enscript(1)
AUTHOR
Markku Rossi <mtr@iki.fi> <http://www.iki.fi/~mtr/>
GNU Enscript WWW home page: <http://www.iki.fi/~mtr/gen-
script/>
STATES Jun 6, 1997 STATES(1)