moonfilter

This file defines the moonfilter module, which wraps the osbf module and adds functionality for more convenient training and classification. Unless explicitly stated otherwise, all functions raise an error on failure instead of returning nil plus an error message.
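Because failures are raised as errors, callers that want to recover need a protected call. The following is a minimal sketch using Lua's standard pcall; failing_call is a hypothetical stand-in for any moonfilter function that fails, and its error message is invented for illustration.

```lua
-- Hypothetical stand-in for a moonfilter function that fails.
local function failing_call()
  error("database not found")
end

-- pcall returns false plus the error value instead of aborting the script.
local ok, err = pcall(failing_call)
if not ok then
  print("operation failed: " .. tostring(err))
end
```

The same pattern applies to any call that should not terminate the surrounding script on failure.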

Public Variables

threshold

    threshold = 20

Minimum absolute pR value that a correct classification must reach to avoid triggering a reinforcement.
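The rule can be sketched as a simple comparison. This illustrates the description above and is not moonfilter's actual code; the function name needs_reinforcement and its arguments are hypothetical.

```lua
-- Hypothetical helper: a correctly classified text is still reinforced
-- when the absolute pR score stays below the threshold.
local function needs_reinforcement(pR, threshold)
  return math.abs(pR) < threshold
end

print(needs_reinforcement(5.3, 20))    -- weak score: reinforcement wanted
print(needs_reinforcement(-42.0, 20))  -- strong score: no reinforcement
```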

buckets

    buckets = 94321

Number of buckets in the database. The minimum value recommended for production is 94321.

max_text_size

    max_text_size = 0

Maximum text size, 0 means full document (default). A reasonable value might be 500000 (half a megabyte).
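A minimal sketch of how such a limit might be applied, assuming the size is counted in bytes and that longer texts are simply cut off at the end; limit_text is a hypothetical helper, not part of the module.

```lua
-- Hypothetical helper: 0 means "use the full document".
local function limit_text(text, max_text_size)
  if max_text_size == 0 then
    return text
  end
  return text:sub(1, max_text_size)  -- keep only the leading bytes
end
```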

min_p_ratio

    min_p_ratio = 1

Minimum probability ratio across the classes that a feature must have in order not to be ignored. 1 means ignore nothing (default).

delimiters

    delimiters = ""

Token delimiters, in addition to whitespace. None by default, could be set e.g. to ".@:/".
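To illustrate the effect of extra delimiters, here is a sketch of a tokenizer that splits on whitespace plus a configurable set of characters. It is an assumption about the general behavior, not the tokenizer used by osbf; tokenize is a hypothetical name.

```lua
-- Hypothetical tokenizer: splits on whitespace plus any extra delimiter characters.
local function tokenize(text, delimiters)
  -- Escape every non-alphanumeric character so it is taken literally
  -- inside the character class.
  local escaped = (delimiters or ""):gsub("%W", "%%%0")
  local pattern = "[^%s" .. escaped .. "]+"
  local tokens = {}
  for token in text:gmatch(pattern) do
    tokens[#tokens + 1] = token
  end
  return tokens
end

print(table.concat(tokenize("mail from: joe@example.org", ".@:/"), "|"))
-- prints "mail|from|joe|example|org"
```

With an empty delimiter string, the same function splits on whitespace only.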

wrap_around

    wrap_around = true

Whether text should be wrapped around (by re-appending the first 4 tokens after the last).
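The wrap-around described above can be sketched on a plain token list. wrap_tokens is a hypothetical helper; how texts with fewer than 4 tokens are treated is an assumption made for this example.

```lua
-- Hypothetical helper: returns a copy of `tokens` with the first `n`
-- tokens (4 by default) appended again after the last one.
local function wrap_tokens(tokens, n)
  n = math.min(n or 4, #tokens)  -- assumption: short texts repeat what they have
  local wrapped = {}
  for i = 1, #tokens do
    wrapped[i] = tokens[i]
  end
  for i = 1, n do
    wrapped[#tokens + i] = tokens[i]
  end
  return wrapped
end

print(table.concat(wrap_tokens({ "a", "b", "c", "d", "e", "f" }), " "))
-- prints "a b c d e f a b c d"
```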

classdir

    classdir = ""

The directory where class database files are stored. Defaults to the empty string, which means the current working directory. Any other value MUST end in a path separator (typically '/' or '\', depending on your OS). Changing this value only affects future calls to the classes command; it does not change the location of the currently active classes.
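Since a missing trailing separator is easy to overlook, a caller could normalize the directory name before assigning it. The helper below is a hypothetical sketch and not part of the module; it relies on package.config, whose first character is the platform's directory separator.

```lua
-- Hypothetical helper: make sure a non-empty directory name ends in a path separator.
local function with_trailing_separator(dir)
  if dir == "" then
    return dir  -- empty string: current working directory
  end
  local last = dir:sub(-1)
  if last == "/" or last == "\\" then
    return dir  -- already terminated
  end
  return dir .. package.config:sub(1, 1)  -- append this platform's separator
end
```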

Public Functions

classes

    function classes(...)

Selects the classes to use for all following operations (until a new set of classes is selected). Specify two or more classes as arguments. Returns true on success.

create

    function create()

Creates new databases for the active classes. Returns true on success.
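A typical setup sequence might look like the sketch below. It assumes the loaded module is passed in as mf, that the public variables are set as fields of that table (for example mf.classdir), and that functions are called with dot syntax; none of this is confirmed by the reference above. fresh_databases is a hypothetical helper.

```lua
local unpack = table.unpack or unpack  -- Lua 5.2+ or Lua 5.1

-- Hypothetical helper; `mf` is the loaded moonfilter module.
local function fresh_databases(mf, dir, class_names)
  mf.classdir = dir                -- set before classes(), as noted for classdir
  mf.classes(unpack(class_names))  -- two or more class names
  return mf.create()               -- true on success, raises an error otherwise
end
```

A call could then read fresh_databases(mf, "db/", { "spam", "ham" }), where the class names and directory are purely illustrative.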

destroy

    function destroy()

Deletes the databases for all active classes.

readuntil

    function readuntil(delimiter_line)

Reads standard input until the specified delimiter_line is encountered. If delimiter_line is nil or empty, reads until the next empty line. The lines read (excluding the delimiter_line) are stored as the default input for subsequent train and classify operations.
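The reading rule can be sketched independently of the module. read_until below is a hypothetical re-implementation for illustration; it takes any line iterator so that it can be tried without standard input, and joining the lines with newlines is an assumption.

```lua
-- Hypothetical sketch of the reading rule described above.
-- `next_line` is any function returning the next line, or nil at end of input.
local function read_until(next_line, delimiter_line)
  if delimiter_line == nil then
    delimiter_line = ""  -- nil or empty: stop at the next empty line
  end
  local lines = {}
  for line in next_line do
    if line == delimiter_line then
      break  -- the delimiter line itself is not stored
    end
    lines[#lines + 1] = line
  end
  return table.concat(lines, "\n")
end
```

Against standard input, the iterator returned by io.lines() could be passed, as in read_until(io.lines(), "END").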

classify

    function classify(filename)

Classifies a file. If the filename argument is omitted/nil, the text read by the last readuntil or classify operation (whichever came later) is classified instead. The special filename "-" means to read from standard input until the end of input (must be the last command).

Returns a table with the following name=value pairs:

train

    function train(class, filename)

Trains the specified file as an instance of the specified class, if necessary. If the filename argument is omitted/nil, the text read by the last readuntil or classify operation (whichever came later) is trained instead. The special filename "-" means to read from standard input until the end of input (must be the last command).

Training is skipped as unnecessary if a call to classify(filename) returns the correct class and indicates no need for reinforcement. The result of the last classify operation is cached and will be inspected if this function is subsequently invoked on the same file/text; otherwise this function will internally call classify to determine whether training is necessary.

Returns a table with name=value pairs describing the training operation:

Both misclassified and reinforced will be false if (and only if) training has been skipped as unnecessary; misclassified and reinforced will never both be true.
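The relationship between the two flags can be sketched as follows. This is an illustration of the stated invariants only, not moonfilter's implementation: training_outcome and its parameters are hypothetical, and the returned table contains just the two fields named above.

```lua
-- Hypothetical decision logic for one training request.
-- predicted/pR: outcome of the preceding classification; target: class to train.
local function training_outcome(predicted, pR, target, threshold)
  local misclassified = predicted ~= target
  -- Reinforcement only applies to correct but weak classifications,
  -- so the two flags can never both be true.
  local reinforced = (not misclassified) and math.abs(pR) < threshold
  return { misclassified = misclassified, reinforced = reinforced }
end
```

Training is skipped exactly when both fields are false, that is, when the predicted class was correct and its absolute pR reached the threshold.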

stats

    function stats(class)

Returns a string with statistics reports for the given class, or for all active classes if no class parameter is given.


[Last generated: 2024-09-21]