This file defines the moonfilter module, which wraps the osbf module and adds functionality for more convenient training and classification. Unless explicitly stated otherwise, all methods raise an error on failure (instead of returning nil plus an error message).
threshold = 20
Minimum absolute pR value that a correct classification must reach in order not to trigger reinforcement training.
buckets = 94321
Number of buckets in the database. The minimum value recommended for production is 94321.
max_text_size = 0
Maximum text size, 0 means full document (default). A reasonable value might be 500000 (half a megabyte).
minpratio = 1
Minimum probability ratio across the classes that a feature must have in order not to be ignored. 1 means nothing is ignored (default).
delimiters = ""
Token delimiters, in addition to whitespace. None by default, could be set e.g. to ".@:/".
wrap_around = true
Whether text should be wrapped around (by re-appending the first 4 tokens after the last).
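The interplay of delimiters and wrap_around can be illustrated with a small pure-Lua sketch. It is not the module's actual tokenizer: the helper name tokenize and its parameter list are hypothetical, and it only mirrors the behaviour described above (split on whitespace plus the extra delimiters, then re-append the first 4 tokens).

```lua
-- Hypothetical helper, not part of moonfilter: split text into tokens and,
-- if wrap_around is true, re-append the first 4 tokens after the last one.
local function tokenize(text, delimiters, wrap_around)
  -- Escape every non-alphanumeric delimiter so it is safe inside a
  -- Lua pattern character class.
  local escaped = (delimiters or ""):gsub("%W", "%%%0")
  local tokens = {}
  for token in text:gmatch("[^%s" .. escaped .. "]+") do
    tokens[#tokens + 1] = token
  end
  if wrap_around then
    -- Texts with fewer than 4 tokens simply repeat all of their tokens.
    local n = math.min(4, #tokens)
    for i = 1, n do
      tokens[#tokens + 1] = tokens[i]
    end
  end
  return tokens
end

-- "a b c d e" becomes a b c d e a b c d when wrapped around.
print(table.concat(tokenize("a b c d e", "", true), " "))
-- With ".@" as extra delimiters, an address splits into three tokens.
print(table.concat(tokenize("user@example.com", ".@", false), " "))
```

Wrapping around presumably lets the last tokens of a text be combined with following tokens just like any others, which is why the sketch appends the leading tokens rather than padding with placeholders.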
classdir = ""
The directory where class database files are stored. Defaults to the
current working directory (empty string). In all other cases, the
directory name MUST end in a path separator (typically '/' or '\',
depending on your OS). Changing this value only affects future calls
to the classes command; it does not change the location of currently
active classes.
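Because a missing trailing separator is easy to overlook, a small normalising helper can be useful. The following is a minimal sketch; the function name as_classdir and its sep parameter are hypothetical and not part of moonfilter.

```lua
-- Hypothetical helper, not part of moonfilter: make a directory name
-- suitable for classdir by ensuring it ends in a path separator.
local function as_classdir(dir, sep)
  sep = sep or "/"
  -- The empty string is left alone: it means the current working directory.
  if dir == "" then return "" end
  local last = dir:sub(-1)
  if last == "/" or last == "\\" then return dir end
  return dir .. sep
end

print(as_classdir("/var/lib/filter"))   -- /var/lib/filter/
print(as_classdir("/var/lib/filter/"))  -- /var/lib/filter/
print(as_classdir("C:\\filter", "\\"))  -- C:\filter\
```

The result would be assigned to classdir before selecting classes, since later changes do not affect classes that are already active.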
function classes(...)
Selects the classes to use for all following operations (until a new set of classes is selected). Specify two or more classes as arguments. Returns true on success.
function create()
Creates new databases for the active classes. Returns true on success.
function destroy()
Deletes the databases for all active classes.
function readuntil(delimiter_line)
Reads standard input until the specified delimiter_line is encountered. If delimiter_line is nil or empty, reads until the next empty line. The lines read (excluding the delimiter_line) are stored as the default input for subsequent train and classify operations.
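The reading rule can be sketched in plain Lua as follows. This is an illustration under stated assumptions, not the module's code: the helper read_until is hypothetical and takes a line iterator (such as the one returned by io.lines()) instead of reading standard input directly, and joining the lines with newlines is likewise an assumption.

```lua
-- Hypothetical helper, not part of moonfilter: collect lines from an
-- iterator until delimiter_line is seen; the delimiter is not included.
local function read_until(next_line, delimiter_line)
  -- A nil delimiter behaves like an empty one: stop at the next empty line.
  if delimiter_line == nil then delimiter_line = "" end
  local lines = {}
  for line in next_line do
    if line == delimiter_line then break end
    lines[#lines + 1] = line
  end
  return table.concat(lines, "\n")
end

-- Simulate input with a string; on real input one would pass io.lines().
local input = ("first\nsecond\nEND\nignored\n"):gmatch("(.-)\n")
print(read_until(input, "END"))
```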
function classify(filename)
Classifies a file. If the filename argument is omitted/nil, the text read
by the last readuntil or classify operation (whichever came later) is
classified instead. The special filename "-" means to read from standard
input until the end of input (must be the last command).
Returns a table with the following name=value pairs:
function train(class, filename)
Trains the specified file as an instance of the specified class, if
necessary. If the filename argument is omitted/nil, the text read by the
last readuntil or classify operation (whichever came later) is
trained instead. The special filename "-" means to read from standard
input until the end of input (must be the last command).
Training is skipped as unnecessary if a call to classify(filename)
returns the correct class and indicates no need for reinforcement. The
result of the last classify operation is cached and is inspected if this
method is subsequently invoked on the same file/text; otherwise this
method internally calls classify to determine whether training is necessary.
Returns a table with name=value pairs describing the training operation:
Both misclassified and reinforced will be false if (and only if)
training has been skipped as unnecessary; misclassified and reinforced
will never both be true.
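The relationship between the threshold setting and the two flags can be sketched as below. This is a simplified illustration, not the module's implementation: the function training_decision is hypothetical, and the field names class and pR on the classification result are assumptions, since the actual fields of the table returned by classify are not listed here. Only misclassified and reinforced are names taken from the description above.

```lua
-- Documented default: minimum absolute pR that avoids reinforcement.
local threshold = 20

-- Hypothetical helper, not part of moonfilter. `result` stands in for the
-- table returned by classify; its fields `class` and `pR` are assumed names.
local function training_decision(result, expected_class)
  -- Wrong class: the text must be trained as a misclassification.
  local misclassified = result.class ~= expected_class
  -- Right class but low confidence: train again as reinforcement.
  local reinforced = not misclassified and math.abs(result.pR) < threshold
  return { misclassified = misclassified, reinforced = reinforced }
end

local d = training_decision({ class = "spam", pR = 35 }, "spam")
print(d.misclassified, d.reinforced)  -- false  false (training skipped)
```

By construction the two flags are never both true, and both are false exactly when the classification is correct with an absolute pR at or above the threshold.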
function stats(class)
Returns a string with statistics reports for a given class; or for all active classes if no class parameter is given.
Last generated: 2008-04-28