OSBF-Lua Reference Manual

Text classification module for the Lua programming language



Introduction

OSBF-Lua (Orthogonal Sparse Bigrams with confidence Factor) is a Lua C module for text classification. It is a port of the OSBF classifier implemented in the CRM114 project. This implementation keeps the focus on the classification task itself and leaves the rest to Lua, a powerful, lightweight and fast scripting language. That makes it easier to build and test more elaborate filters and training methods.
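To illustrate what "orthogonal sparse bigrams" are, here is a minimal pure-Lua sketch that turns a text into OSB features. Each token is paired with each of the preceding tokens inside a small window, and the number of skipped positions is kept as part of the feature, so "a b" and "a <skip> b" are different features. This is not OSBF-Lua's implementation, which does this work in C. The whitespace tokenizer, the `<skip>` marker, the function names and the default window of 5 tokens are assumptions made for illustration.

```lua
-- Sketch only: builds OSB feature strings; OSBF-Lua itself works in C.

-- Assumed, simplistic tokenizer: split on whitespace.
local function tokenize(text)
  local tokens = {}
  for tok in string.gmatch(text, "%S+") do
    tokens[#tokens + 1] = tok
  end
  return tokens
end

-- Pair every token with each of the (window - 1) tokens before it.
-- Skipped positions are encoded with "<skip>" so the distance between
-- the two tokens is part of the feature.
local function osb_features(text, window)
  window = window or 5  -- assumed default window size
  local tokens = tokenize(text)
  local features = {}
  for i = 1, #tokens do
    for d = 1, window - 1 do
      local j = i - d
      if j < 1 then break end  -- no earlier token at this distance
      local parts = { tokens[j] }
      for _ = 1, d - 1 do
        parts[#parts + 1] = "<skip>"
      end
      parts[#parts + 1] = tokens[i]
      features[#features + 1] = table.concat(parts, " ")
    end
  end
  return features
end
```

For the text "a b c" this yields the three features "a b", "b c" and "a <skip> c".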

OSBF-Lua is free software, released under the GPL version 2. The license text is available on the GNU project website, and this distribution also includes a copy in the file gpl.txt.

Reference

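OSBF-Lua offers the following functions:

osbf.classify(text, dbset, flags)

    text: string with the text to be classified;

    dbset: Lua table describing the class databases, with the fields used in the classify.lua example: classes (list of database file names, e.g. {"nonspam.cfc", "spam.cfc"}), ncfs and delimiters;

    flags: classification flags (the classify.lua example passes 0).

    On success, it returns pR, p_array and i_pmax, as used in the classify.lua example, which treats a non-negative pR as HAM. In case of error, it returns 2 values: nil and an error message.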
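Because errors are reported as nil plus a message, a script can turn them into Lua errors with the standard `assert`. The sketch below assumes the classify function is passed in as a parameter (so it can be exercised with a stub; a real script would pass `osbf.classify`). `classify_or_fail` and `label` are hypothetical helper names, and `label` simply reuses the HAM/SPAM rule from the classify.lua example.

```lua
-- Hypothetical helper: raise a Lua error instead of returning nil, msg.
-- On success pR is a number, and every number (even 0) is true in Lua,
-- so assert passes all of classify's results through unchanged.
local function classify_or_fail(classify, text, dbset, flags)
  return assert(classify(text, dbset, flags))
end

-- Hypothetical helper: same decision rule as classify.lua
-- (non-negative score means HAM, negative means SPAM).
local function label(pR)
  if pR >= 0 then
    return "HAM"
  else
    return "SPAM"
  end
end
```

Usage in a real script might look like `local pR, p_array, i_pmax = classify_or_fail(osbf.classify, text, dbset, 0)`, optionally wrapped in `pcall` to handle the error message.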

Training with text uses the following arguments:

    dbset: table with the classes, with the same structure as in osbf.classify;

    class_index: index of the single class in dbset.classes to be trained with text.
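Since class_index is a position in dbset.classes rather than a file name, a script that knows the database name may need to look the index up. The helper below is a hypothetical sketch of that lookup; it follows the nil-plus-message error convention used by the module.

```lua
-- Hypothetical helper: return the class_index of a database file name.
-- Returns nil and an error message if the name is not in dbset.classes.
local function class_index_of(dbset, dbname)
  for i, name in ipairs(dbset.classes) do
    if name == dbname then
      return i
    end
  end
  return nil, "class not found in dbset.classes: " .. tostring(dbname)
end
```

With the dbset from the examples, "nonspam.cfc" has index 1 and "spam.cfc" has index 2.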



Examples

------------------------------------------------------------------

-- create_databases.lua: Script for creating the databases

require "osbf"


-- class databases to be created
dbset = { classes = {"nonspam.cfc", "spam.cfc"} }

-- number of buckets in each database
num_buckets = 94321

-- remove previous databases with the same name
osbf.remove_db(dbset.classes)

-- create new, empty databases
osbf.create_db(dbset.classes, num_buckets)

------------------------------------------------------------------

-- classify.lua: Script for classifying a message read from stdin

require "osbf"

dbset = {
    classes = {"nonspam.cfc", "spam.cfc"},
    ncfs = 1,
    delimiters = ""
}
classify_flags = 0

-- read entire message into var "text"
text = io.read("*all")
pR, p_array, i_pmax = osbf.classify(text, dbset, classify_flags)
if (pR == nil) then
    print(p_array)    -- in case of error, p_array contains
                      -- the error message
else
    io.write(string.format("The message score is %f - ", pR))
    if (pR >= 0) then
        io.write("HAM\n")
    else
        io.write("SPAM\n")
    end
end
------------------------------------------------------------------

More examples of the osbf module can be found in the spamfilter directory. In particular, take a look at the script toer.lua, which offers a very fast way to prepare your databases from a previously classified corpus.
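The general idea of preparing databases from a pre-classified corpus can be sketched as a train-on-error pass: classify each message and train only when the result is wrong, or, with a positive margin, when it is right but not confident enough. This is not the algorithm of toer.lua. The `classify` and `train` callbacks, the corpus layout and the `margin` parameter are all assumptions; a real script would build the callbacks on top of osbf.classify and the module's training function.

```lua
-- Sketch of a train-on-error pass (not toer.lua's actual algorithm).
-- corpus: list of { text = ..., class_index = 1 or 2 } entries.
-- classify(text): hypothetical callback returning a score pR, where
--   pR >= 0 means class 1, as in classify.lua.
-- train(text, class_index): hypothetical callback updating the databases.
-- margin: also train on correct results with |pR| < margin
--   ("train on or near error"); 0 means train on errors only.
local function train_on_error(corpus, classify, train, margin)
  margin = margin or 0
  local trained = 0
  for _, msg in ipairs(corpus) do
    local pR = classify(msg.text)
    local predicted = (pR >= 0) and 1 or 2
    if predicted ~= msg.class_index or math.abs(pR) < margin then
      train(msg.text, msg.class_index)
      trained = trained + 1
    end
  end
  return trained  -- number of messages that were trained
end
```

Repeating the pass until it returns 0 (or a maximum number of rounds is reached) is one common way to use such a function.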
