This file defines the moonfilter module, which wraps the osbf module and adds functionality for more comfortable training and classification. Unless explicitly stated otherwise, all methods raise an error on failure (instead of returning nil plus an error message).
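Because failures are raised as errors rather than returned as values, a caller that wants to recover would typically wrap calls in pcall. The following is a minimal sketch of that pattern only; failing_operation is a hypothetical stand-in, not a function of this module.

```lua
-- Hypothetical stand-in for a module function that fails by raising an error.
local function failing_operation()
  error("database file not found")
end

-- pcall turns the raised error back into a status flag plus a message.
local ok, err = pcall(failing_operation)
if not ok then
  print("operation failed: " .. tostring(err))
end
```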
threshold = 20
Minimum absolute pR that a correct classification must reach in order not to trigger reinforcement.
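The rule can be sketched as a comparison of the absolute pR against the threshold. This is an illustration under assumptions: needs_reinforcement is a hypothetical helper, and the strict comparison is a guess, since the text does not say how a pR exactly equal to the threshold is treated.

```lua
local threshold = 20  -- same value as the documented default

-- Returns true when a correct classification lies too close to the decision
-- boundary, i.e. its absolute pR is below the threshold.
-- Assumption: strict comparison; tie handling is not documented.
local function needs_reinforcement(pR)
  return math.abs(pR) < threshold
end

print(needs_reinforcement(35.2))  -- false
print(needs_reinforcement(-4.7))  -- true
```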
buckets = 94321
Number of buckets in the database. The minimum value recommended for production is 94321.
max_text_size = 0
Maximum text size, 0 means full document (default). A reasonable value might be 500000 (half a megabyte).
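As an illustration of the "0 means full document" convention, a size limit of this kind could be applied as below. limit_text is a hypothetical helper, and cutting by byte count with string.sub is an assumption about how the limit is enforced.

```lua
-- Returns the text unchanged when max_text_size is 0 (no limit),
-- otherwise keeps at most max_text_size bytes.
local function limit_text(text, max_text_size)
  if max_text_size > 0 and #text > max_text_size then
    return text:sub(1, max_text_size)
  end
  return text
end

print(limit_text("abcdef", 0))  -- abcdef
print(limit_text("abcdef", 4))  -- abcd
```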
min_p_ratio = 1
Minimum probability ratio over the classes that a feature must have in order not to be ignored. 1 means ignore nothing (default).
delimiters = ""
Token delimiters, in addition to whitespace. None by default, could be set e.g. to ".@:/".
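To show the effect of extra delimiters, here is a rough sketch of splitting text on whitespace plus a configurable delimiter set. tokenize is a hypothetical helper written for this illustration; the actual tokenizer lives in the osbf module and may behave differently.

```lua
-- Splits text on whitespace and on every character listed in delimiters.
local function tokenize(text, delimiters)
  -- Escape each non-alphanumeric character so it is taken literally
  -- inside the character class.
  local escaped = delimiters:gsub("%W", "%%%0")
  local pattern = "[^%s" .. escaped .. "]+"
  local tokens = {}
  for token in text:gmatch(pattern) do
    tokens[#tokens + 1] = token
  end
  return tokens
end

print(table.concat(tokenize("user@example.com", ""), " "))      -- user@example.com
print(table.concat(tokenize("user@example.com", ".@:/"), " "))  -- user example com
```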
wrap_around = true
Whether text should be wrapped around (by re-appending the first 4 tokens after the last).
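The wrapping described above can be pictured on a plain token list. wrap_tokens is a hypothetical helper that only illustrates the idea of re-appending the leading tokens; it is not how the module is implemented internally.

```lua
-- Returns a copy of tokens with the first `count` tokens (default 4)
-- appended again after the last one.
local function wrap_tokens(tokens, count)
  count = count or 4
  local wrapped = {}
  for i = 1, #tokens do
    wrapped[i] = tokens[i]
  end
  for i = 1, math.min(count, #tokens) do
    wrapped[#wrapped + 1] = tokens[i]
  end
  return wrapped
end

print(table.concat(wrap_tokens({ "a", "b", "c", "d", "e", "f" }), " "))
-- a b c d e f a b c d
```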
classdir = ""
The directory where class database files are stored. Defaults to the current working directory (empty string). Note that the directory name MUST end in a path separator (typically '/' or '\', depending on your OS) in all other cases. Changing this value will only affect future calls to the classes command; it won't change the location of currently active classes.
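Since a non-empty directory name has to end in a path separator, a caller might normalize the value before assigning it. The sketch below uses a hypothetical helper, as_classdir, and reads the platform separator from package.config, which is standard Lua and not part of this module.

```lua
-- First character of package.config is the platform's directory separator.
local separator = package.config:sub(1, 1)

-- Leaves the empty string (current directory) untouched and makes sure any
-- other directory name ends in a path separator.
local function as_classdir(dir)
  if dir == "" then
    return dir
  end
  local last = dir:sub(-1)
  if last == "/" or last == "\\" then
    return dir
  end
  return dir .. separator
end

print(as_classdir("db/"))  -- db/
```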
function classes(...)
Selects the classes to use for all following operations (until a new set of classes is selected). Specify two or more classes as arguments. Returns true on success.
function create()
Creates new databases for the active classes. Returns true on success.
function destroy()
Deletes the databases for all active classes.
function readuntil(delimiter_line)
Reads standard input until the specified delimiter_line is encountered. Reads until the next empty line if delimiter_line is nil or empty. The lines read (excluding the delimiter_line) are stored as the default argument for subsequent train and classify operations.
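The reading rule can be sketched over any line iterator. read_until below is a hypothetical re-implementation for illustration only; joining the collected lines with newlines is an assumption about how the text is stored.

```lua
-- Collects lines from the iterator next_line until delimiter_line is seen.
-- A nil or empty delimiter_line stops at the next empty line.
local function read_until(next_line, delimiter_line)
  if delimiter_line == nil then
    delimiter_line = ""
  end
  local lines = {}
  for line in next_line do
    if line == delimiter_line then
      break
    end
    lines[#lines + 1] = line
  end
  return table.concat(lines, "\n")
end

-- Small in-memory iterator so the sketch runs without standard input.
local function lines_of(list)
  local i = 0
  return function()
    i = i + 1
    return list[i]
  end
end

print(read_until(lines_of({ "Subject: hi", "body", "END", "ignored" }), "END"))
```

With real input, the same helper could be fed `io.lines()`, which iterates over the lines of standard input.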
function classify(filename)
Classifies a file. If the filename argument is omitted/nil, the text read by the last readuntil or classify operation (whichever came later) is classified instead. The special filename "-" means to read from standard input until the end of input (must be the last command).
Returns a table with the following name=value pairs:
function train(class, filename)
Trains the specified file as an instance of the specified class, if necessary. If the filename argument is omitted/nil, the text read by the last readuntil or classify operation (whichever came later) is trained instead. The special filename "-" means to read from standard input until the end of input (must be the last command).
Training is skipped as unnecessary if a call to classify(filename) returns the correct class and indicates no need for reinforcement. The result of the last classify operation is cached and will be inspected if this method is subsequently invoked on the same file/text; otherwise this method will internally call classify to determine whether training is necessary.
Returns a table with name=value pairs describing the training operation:
Both misclassified and reinforced will be false if (and only if) training has been skipped as unnecessary; misclassified and reinforced will never both be true.
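The relationship between the two flags can be sketched as a small decision function. training_outcome and its parameters are hypothetical and only model the rule stated above; the strict comparison against the threshold is an assumption.

```lua
-- Derives the two flags from a classification result.
-- predicted_class / pR: what classify reported; actual_class: the class
-- being trained; threshold: minimum absolute pR that avoids reinforcement.
local function training_outcome(predicted_class, pR, actual_class, threshold)
  local misclassified = predicted_class ~= actual_class
  -- Reinforcement applies only to correct but low-confidence results,
  -- so the two flags can never both be true.
  local reinforced = (not misclassified) and math.abs(pR) < threshold
  return { misclassified = misclassified, reinforced = reinforced }
end

local result = training_outcome("spam", 5, "spam", 20)
print(result.misclassified, result.reinforced)  -- false   true
```

When both flags come out false, the text was classified correctly with enough confidence, which corresponds to the case where training is skipped.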
function stats(class)
Returns a string with a statistics report for the given class, or for all active classes if no class parameter is given.
[Last generated: 2024-09-21]