cl-mlep API reference

About mlep:

mlep is a Machine Learning library for Educational Purposes.

It aims to provide a collection of simple machine learning algorithms with the following goals:

- to use only ANSI Common Lisp (and thus be implementation-independent)
- to be easy enough to use that even intermediate Common Lisp programmers can start using the library right away, without pain
- to provide tutorial-style documentation so that the library is easy to get to know

About mlep-add:

mlep-add contains the parts of mlep that require external dependencies. Currently these are lla and cl-num-utils.

Other functions in mlep

Function analyze (instance input)

Arguments:

instance -- an instance of markov-chain
input -- some input data

Returns:

the probability of input

Details:

Computes the probability that input was generated by instance.

Function classify (instance &key new-data-set verbose)

Arguments:

instance -- an instance of k-means, perceptron or neuronal-network
new-data-set -- use new-data-set instead of the internal data-set
verbose -- print some more information (only taken into account for neuronal-network)

Returns:

a list containing one class number for each sample in the classified data-set

Details:

Classifies a data-set.

Function data-set (instance)

Arguments:

instance -- an instance of any mlep learning algorithm

Returns:

the data-set of instance

Details:

Get the data-set of instance.

Function distance (instance)

Arguments:

instance -- an instance of k-means or k-nearest-neighbors

Returns:

the function for calculating the distance for instance

Details:

Get the function for calculating the distance for instance, e.g. #'euclidian-distance.
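The following minimal sketch shows what a distance function of this kind computes. It is plain ANSI Common Lisp; the name euclidean-distance-sketch is hypothetical, and it assumes points are equal-length lists of numbers, which may not match how mlep's own #'euclidian-distance represents them.

```lisp
;; Hypothetical stand-in for a distance function such as #'euclidian-distance.
;; Assumes both points are lists of numbers of equal length.
(defun euclidean-distance-sketch (a b)
  "Return the square root of the sum of squared coordinate differences of A and B."
  (sqrt (reduce #'+ (mapcar (lambda (x y) (expt (- x y) 2)) a b))))
```

Because the standard leaves it implementation-dependent whether sqrt of an exact square returns a rational or a float, (euclidean-distance-sketch '(0 0) '(3 4)) returns either 5 or 5.0.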

Function forward (instance &key input)

Arguments:

instance -- an instance of neuronal-network
input -- the input data to be considered

Returns:

the output of the neuronal-network given the input

Details:

Computes a forward pass through the network and returns its output.
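A forward pass can be sketched as repeated weighted sums, each followed by the activation function. The representation below (a list of layers, each a list of weight rows with one row per neuron, no bias terms) and the names forward-sketch and heaviside-sketch are assumptions for illustration, not mlep's internal data layout.

```lisp
;; Heaviside step function; this sketch maps 0 to 1.
(defun heaviside-sketch (x)
  (if (>= x 0) 1 0))

;; WEIGHTS: list of layers; each layer is a list of rows, one row of
;; input weights per neuron. No bias terms (an assumption of this sketch).
(defun forward-sketch (weights input &key (activation #'heaviside-sketch))
  "Propagate INPUT through every layer in WEIGHTS; return the last layer's output."
  (let ((activations input))
    (dolist (layer weights activations)
      (setf activations
            (mapcar (lambda (row)
                      ;; weighted sum of the previous layer's outputs
                      (funcall activation
                               (reduce #'+ (mapcar #'* row activations))))
                    layer)))))
```

For example, (forward-sketch '(((1 2) (3 4))) '(1 1) :activation #'identity) evaluates a single linear layer and returns (3 7).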

Function k (instance)

Arguments:

instance -- an instance of k-means or k-nearest-neighbors

Returns:

the parameter k

Details:

k determines how many means are assumed (for k-means) or how many neighbors are considered (for k-nearest-neighbors).

Function learning-rate (instance)

Arguments:

instance -- an instance of neuronal-network or perceptron

Returns:

the learning-rate

Details:

The learning rate controls the size of each change when the weights are updated.
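As an illustration of what the learning rate scales, here is the classic perceptron learning rule, w_i <- w_i + learning-rate * (target - output) * x_i. Whether mlep uses exactly this update is an assumption, and update-weights-sketch is a hypothetical name.

```lisp
;; Classic perceptron rule (assumed here for illustration):
;; w_i <- w_i + learning-rate * (target - output) * x_i
(defun update-weights-sketch (weights inputs target output learning-rate)
  "Return new weights moved toward TARGET by a step scaled by LEARNING-RATE."
  (mapcar (lambda (w x) (+ w (* learning-rate (- target output) x)))
          weights inputs))
```

(update-weights-sketch '(0 0) '(1 2) 1 0 1/2) returns (1/2 1); with a learning rate of 1 the same error would move the weights twice as far, and a correct output (target = output) leaves them unchanged.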

Function means (instance)

Arguments:

instance -- an instance of k-means or principal-component-analysis

Returns:

the current means

Details:

Get the current means.

Function order (instance)

Arguments:

instance -- an instance of markov-chain

Returns:

the order of the markov chain

Details:

The order of a markov chain determines how many past events are considered when producing the current event.

Function plot-points (vals &key (height 20) (width 80) (char x))

Arguments:

vals -- a list of lists with x/y-points or a 2d-array -- ((x1 y1) ... (xn yn)) or #2a((x1 y1) ... (xn yn))
height -- the height in characters used for the plot
width -- the width in characters used for the plot
char -- the character used for printing

Returns:

nothing

Details:

Plots points given by their x/y-coordinates.

Function plot-values (vals &key (height 20) (char x))

Arguments:

vals -- a sequence of numbers to be plotted
height -- the height in characters used for the plot
char -- the character used for printing

Returns:

nothing

Details:

Plots the values of vals one after another.

Function probabilities (instance)

Arguments:

instance -- an instance of markov-chain

Returns:

the probabilities of the markov chain

Details:

Get the probability matrix (or tensor); its rank is order+1.

Function random-from-to (from to &key (state *random-state*))

Arguments:

from -- the lower bound (inclusive)
to -- the upper bound (exclusive)
state -- a random state object containing information used by the pseudo-random number generator

Returns:

a random number

Details:

Returns a random number in the range [from, to).
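The documented behavior (a number in [from, to) drawn using state) can be sketched with the standard CL:RANDOM function. random-from-to-sketch is a hypothetical name, and this is not necessarily how mlep implements random-from-to.

```lisp
;; Sketch of the documented behavior: a number in [FROM, TO), drawn with
;; CL:RANDOM using STATE. Integer bounds give an integer, float bounds a float.
;; CL:RANDOM needs a positive limit, so FROM must be smaller than TO.
(defun random-from-to-sketch (from to &key (state *random-state*))
  (+ from (random (- to from) state)))
```

With integer bounds, (random-from-to-sketch 3 7) yields one of 3, 4, 5 or 6. With float bounds the upper bound is excluded up to floating-point rounding of the final addition.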

Function run (instance &key (epochs 100) (threshold 0.1))

Arguments:

instance -- an instance of any mlep learning algorithm
threshold -- the target global error; iterative training runs until the global error reaches threshold (supported by perceptron)
epochs -- the number of iterations an iterative algorithm performs (supported by k-means and neuronal-network)

Returns:

depends on the learning algorithm:

k-means: the computed means
k-nearest-neighbors: the classes assigned to test-set
max-likelihood: a list with the first item being the mean and the second one being the covariance matrix of the normal distribution
markov-chain: the probability matrix (or tensor)
naive-bayes: the classes assigned to test-set
neuronal-network: nothing
perceptron: nothing
principal-component-analysis: a list of three matrices: unitary-matrix1 U (an orthogonal matrix), unitary-matrix2 Vt (an orthogonal matrix) and singular-values (a diagonal matrix D whose diagonal elements are the singular values); U x D x Vt should reconstruct the input matrix
imputer: the default replacement value for each column

Details:

A general interface for 'running' a learning algorithm.
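In CLOS, such a general interface is naturally a generic function with one method per learning-algorithm class. The sketch below illustrates the idea with a hypothetical generic function run-sketch and a toy class mean-estimator-sketch; neither is part of mlep, and mlep's actual methods for run may be defined differently.

```lisp
;; Hypothetical illustration of a CLOS generic interface; not mlep's code.
(defgeneric run-sketch (instance &key &allow-other-keys)
  (:documentation "Run INSTANCE; what is returned depends on the class of INSTANCE."))

;; A toy "learning algorithm" whose only job is to estimate a mean.
(defclass mean-estimator-sketch ()
  ((data-set :initarg :data-set
             :reader data-set-sketch
             :documentation "A list of numbers to be analyzed.")))

;; One method per class: running this class returns the mean of its data-set.
(defmethod run-sketch ((instance mean-estimator-sketch) &key)
  (let ((data (data-set-sketch instance)))
    (/ (reduce #'+ data) (length data))))
```

Extra keyword arguments such as :epochs are accepted because the generic function declares &allow-other-keys, so a single call site can pass options that only some algorithms use.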

Function set-labels (instance)

Arguments:

instance -- an instance of k-nearest-neighbors, naive-bayes, neuronal-network or perceptron

Returns:

the target labels for the data-set of a supervised learning algorithm

Details:

Get the target labels.

Function synthesize (instance &key (start) (howmany 10))

Arguments:

instance -- an instance of markov-chain
start -- how the synthesized sequence begins: 1) the symbol random: a new random starting sequence is generated; 2) nil (default): a literal subsequence of data-set, beginning at a random index, is used; 3) a list: the user-given list is used as the starting sequence
howmany -- number of elements to be synthesized

Details:

Synthesizes new data from the chain.

Function test-set (instance)

Arguments:

instance -- an instance of k-nearest-neighbors or naive-bayes

Returns:

the test-set, i.e. a set that has no target labels and needs to be classified

Details:

Get the test-set.

Function transform (instance &key new-data inverse components)

Arguments:

instance -- an instance of principal-component-analysis or imputer
components -- a number that states how many dimensions should be used for the transformation (default is nil, which means that all dimensions of data-set are used)
inverse -- do an inverse transformation (t or nil, default: nil)
new-data -- do the transformation on this data-set (default is nil, which means that data-set is used)

Returns:

the transformed data-set

Details:

For principal-component-analysis: projects the data onto its principal components. For imputer: fills in missing values.

Function unique (instance)

Arguments:

instance -- an instance of markov-chain

Returns:

all unique elements in data-set

Details:

Get all unique values considered by the chain.

Other classes in mlep

Class imputer

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

missing-value --

The value that is recognized as a missing value.

missing-value-test --

The test-function for comparing each item in a data-set with this missing value.

replacers --

The values that are used for replacing missing values per column.

Details:

Replaces missing values by the mean (for numerical data) or the mode (for categorical data).
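Mean/mode replacement for a single column can be sketched as follows. Assumptions not taken from mlep: a column is a list, it counts as numerical when all of its non-missing entries are numbers, at least one entry is present, and the names replacer-sketch and impute-column-sketch are hypothetical. The test keyword plays the role of the missing-value-test slot.

```lisp
(defun replacer-sketch (column missing-value &key (test #'eql))
  "Mean of the non-missing values if they are all numbers, otherwise their mode."
  (let ((present (remove missing-value column :test test)))
    (if (every #'numberp present)
        (/ (reduce #'+ present) (length present))
        ;; Mode: count occurrences and keep the most frequent value.
        (let ((counts (make-hash-table :test #'equal))
              (best nil)
              (best-count 0))
          (dolist (v present)
            (incf (gethash v counts 0)))
          (maphash (lambda (v c)
                     (when (> c best-count)
                       (setf best v best-count c)))
                   counts)
          best))))

(defun impute-column-sketch (column missing-value &key (test #'eql))
  "Return a copy of COLUMN with each missing entry replaced by the column's replacer."
  (let ((replacer (replacer-sketch column missing-value :test test)))
    (mapcar (lambda (v) (if (funcall test missing-value v) replacer v))
            column)))
```

For example, (impute-column-sketch '(1 2 :na 3) :na) returns (1 2 2 3). If several categorical values share the highest count, which one becomes the mode depends on hash-table iteration order.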

Class k-means

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

distance --

A distance measuring function.

k --

The number of groups/clusters to be determined.

means --

The means of the data points.

Details:

k-means is a simple unsupervised clustering algorithm for a known number of clusters.
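The two alternating steps of k-means (assign every point to its nearest mean, then move each mean to the centroid of its points) can be sketched as below. This is Lloyd's algorithm with caller-supplied initial means and squared Euclidean distance; the function names and the list-based data layout are hypothetical, and mlep's initialization may differ.

```lisp
(defun squared-distance-sketch (a b)
  (reduce #'+ (mapcar (lambda (x y) (expt (- x y) 2)) a b)))

(defun nearest-mean-index-sketch (point means)
  "Index of the mean in MEANS closest to POINT."
  (let ((best 0) (best-d nil))
    (loop for m in means
          for i from 0
          for d = (squared-distance-sketch point m)
          when (or (null best-d) (< d best-d))
            do (setf best i best-d d))
    best))

(defun centroid-sketch (points)
  "Component-wise average of a non-empty list of points."
  (let ((n (length points)))
    (mapcar (lambda (s) (/ s n))
            (reduce (lambda (acc p) (mapcar #'+ acc p)) points))))

(defun k-means-sketch (data-set initial-means &key (epochs 100))
  "Alternate assignment and update steps EPOCHS times; return the final means."
  (let ((means (copy-list initial-means)))
    (dotimes (epoch epochs means)
      (declare (ignorable epoch))
      (let ((clusters (make-array (length means) :initial-element nil)))
        ;; Assignment step: put every point into the cluster of its nearest mean.
        (dolist (p data-set)
          (push p (aref clusters (nearest-mean-index-sketch p means))))
        ;; Update step: move each mean to the centroid of its cluster
        ;; (an empty cluster keeps its old mean).
        (setf means
              (loop for m in means
                    for i from 0
                    for members = (aref clusters i)
                    collect (if members (centroid-sketch members) m)))))))
```

For instance, (k-means-sketch '((0 0) (0 1) (10 10) (10 11)) '((0 0) (10 10))) returns ((0 1/2) (10 21/2)); exact rationals keep the result free of rounding.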

Class k-nearest-neighbors

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set that is already known. (set-labels holds the corresponding labels.)

distance --

A distance measuring function.

k --

The number of neighbors to be taken into account.

set-labels --

The labels for data-set.

test-set --

The data-set that has no labels and needs to be classified.

Details:

k-nearest-neighbors is a simple supervised classification algorithm: each sample in test-set is assigned the label that occurs most often among its k nearest neighbors in data-set.
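A minimal sketch of this voting scheme in plain Common Lisp, assuming samples are lists of numbers, set-labels is a list parallel to data-set, and the distance is Euclidean. The function names are hypothetical, and the tie-breaking rule (the label whose count first reaches the maximum, scanning the nearest neighbors first) is a choice made for this sketch.

```lisp
(defun knn-distance-sketch (a b)
  (sqrt (reduce #'+ (mapcar (lambda (x y) (expt (- x y) 2)) a b))))

(defun knn-classify-sketch (sample data-set set-labels k)
  "Return the majority label among the K training samples closest to SAMPLE."
  (let* ((pairs (mapcar #'cons
                        (mapcar (lambda (x) (knn-distance-sketch sample x)) data-set)
                        set-labels))
         ;; Sort (distance . label) pairs by distance and keep the first K.
         (sorted (sort pairs #'< :key #'car))
         (nearest (subseq sorted 0 (min k (length sorted))))
         (counts (make-hash-table :test #'equal))
         (best nil)
         (best-count 0))
    (dolist (pair nearest best)
      (let ((c (incf (gethash (cdr pair) counts 0))))
        (when (> c best-count)
          (setf best (cdr pair) best-count c))))))

(defun knn-run-sketch (test-set data-set set-labels k)
  "Classify every sample of TEST-SET."
  (mapcar (lambda (s) (knn-classify-sketch s data-set set-labels k)) test-set))
```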

Class markov-chain

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

order --

The order of the markov chain.

probabilities --

A matrix/tensor with probabilities.

unique --

Unique values of data-set

Details:

A Markov chain.
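For order 1, the probabilities of a Markov chain form a matrix (rank order+1 = 2) of transition probabilities estimated from counts. The sketch below assumes data-set is a list of values comparable with equal and that every value in the analyzed input also occurs in data-set. Scoring an input as the product of its transition probabilities, ignoring the probability of its first element, is an assumption about what analyze computes; all names are hypothetical.

```lisp
(defun unique-sketch (data-set)
  "Unique values of DATA-SET in order of first occurrence."
  (remove-duplicates data-set :test #'equal :from-end t))

(defun value-index-sketch (value unique)
  (position value unique :test #'equal))

(defun transition-matrix-sketch (data-set unique)
  "Entry (i j) estimates P(next = j-th unique value | current = i-th unique value)."
  (let* ((n (length unique))
         (m (make-array (list n n) :initial-element 0)))
    ;; Count every observed transition from one element to the next.
    (loop for tail on data-set
          while (rest tail)
          do (incf (aref m (value-index-sketch (first tail) unique)
                           (value-index-sketch (second tail) unique))))
    ;; Normalize each row to sum to 1 (rows without outgoing transitions stay 0).
    (dotimes (i n m)
      (let ((row-sum (loop for j below n sum (aref m i j))))
        (when (plusp row-sum)
          (dotimes (j n)
            (setf (aref m i j) (/ (aref m i j) row-sum))))))))

(defun analyze-sketch (input unique matrix)
  "Product of the first-order transition probabilities along INPUT."
  (let ((p 1))
    (loop for tail on input
          while (rest tail)
          do (setf p (* p (aref matrix
                                (value-index-sketch (first tail) unique)
                                (value-index-sketch (second tail) unique)))))
    p))
```

With data-set (a b a b a c), a is followed by b twice and by c once, so P(b | a) = 2/3, and analyzing (a b a) gives 2/3 * 1 = 2/3.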

Class max-likelihood

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

degrees-of-freedom --

Delta degrees of freedom: the divisor used for the covariance is the length of data-set minus degrees-of-freedom.

Details:

With max-likelihood one can estimate the parameters (mean and covariance) of the normal probability density function that best fits the data-set.
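The estimate can be sketched as the sample mean plus a covariance matrix whose divisor is the number of samples minus degrees-of-freedom (0 gives the maximum-likelihood estimate, 1 the unbiased sample covariance). The list-of-lists data layout, the default of 0 and the name ml-estimate-sketch are assumptions for illustration.

```lisp
(defun ml-estimate-sketch (data-set &key (degrees-of-freedom 0))
  "Return (list mean covariance): a mean list and a covariance 2D array."
  (let* ((n (length data-set))
         (d (length (first data-set)))
         (mean (mapcar (lambda (s) (/ s n))
                       (reduce (lambda (acc x) (mapcar #'+ acc x)) data-set)))
         (cov (make-array (list d d) :initial-element 0))
         (divisor (- n degrees-of-freedom)))
    ;; Accumulate the outer products of the centered samples.
    (dolist (x data-set)
      (let ((centered (mapcar #'- x mean)))
        (loop for ci in centered
              for i from 0
              do (loop for cj in centered
                       for j from 0
                       do (incf (aref cov i j) (* ci cj))))))
    ;; Divide by n - degrees-of-freedom.
    (dotimes (i d)
      (dotimes (j d)
        (setf (aref cov i j) (/ (aref cov i j) divisor))))
    (list mean cov)))
```

(ml-estimate-sketch '((1 2) (3 6))) returns the mean (2 4) and the covariance #2A((1 2) (2 4)); with :degrees-of-freedom 1 the covariance entries double.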

Class naive-bayes

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

all-labels --

All given data-labels. To be pre-computed.

data-set --

The data-set that is already known. (set-labels holds the corresponding labels.)

label-count --

The count of each label. To be pre-computed.

likelihoods --

The likelihoods of a feature attribute given a class. To be pre-computed.

possible-data-values --

All possible values occurring in each attribute of data-set. To be pre-computed.

prior-probabilities --

The prior-probabilities of each class. To be pre-computed.

set-labels --

The labels for data-set.

test-set --

The data-set that has no labels and needs to be classified.

Details:

Naive-bayes takes a probabilistic approach to simple supervised classification.
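For categorical attributes, the prior-probabilities and likelihoods can be sketched as plain relative frequencies. This sketch computes them on the fly instead of pre-computing them, uses no smoothing (so an attribute value never seen with a class gives that class probability 0) and uses a hypothetical name; mlep's implementation may differ in all of these points.

```lisp
(defun nb-classify-sketch (sample data-set set-labels)
  "Return the label maximizing P(label) * product over i of P(attribute_i | label)."
  (let ((n (length set-labels))
        (best nil)
        (best-score -1))
    (dolist (label (remove-duplicates set-labels :test #'equal :from-end t) best)
      (let* ((members (loop for x in data-set
                            for l in set-labels
                            when (equal l label) collect x))
             (class-count (length members))
             ;; Start from the prior probability of the class.
             (score (/ class-count n)))
        ;; Multiply in the relative frequency of each attribute value within the class.
        (loop for value in sample
              for i from 0
              do (setf score
                       (* score
                          (/ (count value members
                                    :key (lambda (x) (nth i x))
                                    :test #'equal)
                             class-count))))
        (when (> score best-score)
          (setf best label best-score score))))))
```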

Class neuronal-network

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

activation-function --

The activation function, usually a Heaviside step function.

data-set --

The data-set to be analyzed.

learning-rate --

The learning rate.

net-structure --

The structure of the net, i.e. the number of neurons per layer.

output-net --

The output of all neurons of the network.

output-net-before-activation --

The output of all neurons of the network before the activation function was applied.

set-labels --

The output-values for data-set.

weight-init-range --

The maximum range for initializing the weights.

weights --

The weights from input to output values.

Details:

A fully-connected Feed-Forward Multi-Layer Perceptron.

Class perceptron

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

activation-function --

The activation function, usually a Heaviside step function.

data-set --

The data-set to be analyzed.

learning-rate --

The learning rate.

max-weight-init-value --

The maximum value for initializing the weights.

set-labels --

The output-values for data-set.

weights --

The weights from input to output values.

Details:

A perceptron is a very simple neuron model and turns out to be a linear classifier.
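Training a perceptron can be sketched as repeated passes of the perceptron rule until the global error reaches the threshold given to run. Assumptions of this sketch: a constant bias input 1 is appended to each sample, labels are 0 or 1, the global error is the number of misclassified samples in one pass, weights start at 0 (mlep initializes them randomly up to max-weight-init-value), and all names are hypothetical.

```lisp
;; Heaviside step function; this sketch maps 0 to 1.
(defun step-sketch (x)
  (if (>= x 0) 1 0))

(defun perceptron-output-sketch (weights sample)
  "Step of the weighted sum of SAMPLE plus a bias input of 1."
  (step-sketch (reduce #'+ (mapcar #'* weights (append sample '(1))))))

(defun train-perceptron-sketch (data-set set-labels
                                &key (learning-rate 1/10) (threshold 0.1) (max-epochs 1000))
  "Return trained weights; the last weight belongs to the bias input."
  (let ((weights (make-list (1+ (length (first data-set))) :initial-element 0)))
    (loop repeat max-epochs
          do (let ((global-error 0))
               (loop for x in data-set
                     for target in set-labels
                     do (let* ((output (perceptron-output-sketch weights x))
                               (err (- target output)))
                          (incf global-error (abs err))
                          ;; Perceptron rule: w <- w + rate * error * input.
                          (setf weights
                                (mapcar (lambda (w xi) (+ w (* learning-rate err xi)))
                                        weights (append x '(1))))))
               ;; Stop once a whole pass stays within the threshold.
               (when (<= global-error threshold)
                 (return))))
    weights))
```

Because AND is linearly separable, (train-perceptron-sketch '((0 0) (0 1) (1 0) (1 1)) '(0 0 0 1)) converges; with the defaults above it returns (1/5 1/10 -3/10), which classifies all four inputs correctly.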

Other variables in mlep

Variable *heights-weights*

Details:

SOCR Data Dinov 020108 HeightsWeights

Human height and weight are mostly heritable, but lifestyle, diet, health and environmental factors also play a role in determining an individual's physical characteristics. This dataset contains 25,000 records of human heights and weights. These data were obtained in 1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for weight, height, weight-for-age, weight-for-height and body mass index (BMI). See also the Major League Baseball Players Height and Weight dataset.

http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

Attribute Information:

1. Height (Inches)
2. Weight (Pounds)

Variable *iris*

Details:

Iris flower data set by Sir Ronald Fisher (1936)

http://archive.ics.uci.edu/ml/datasets/Iris

Attribute Information:

1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm
5. class (Iris Setosa, Iris Versicolour, Iris Virginica)

Variable *lenses*

Details:

Lenses Data Set by J. Cendrowska (1987)

https://archive.ics.uci.edu/ml/datasets/Lenses

Attribute Information:

1. age of the patient (1 = young, 2 = pre-presbyopic, 3 = presbyopic)
2. spectacle prescription (1 = myope, 2 = hypermetrope)
3. astigmatic (1 = no, 2 = yes)
4. tear production rate (1 = reduced, 2 = normal)
5. class
- 1 = the patient should be fitted with hard contact lenses,
- 2 = the patient should be fitted with soft contact lenses,
- 3 = the patient should not be fitted with contact lenses.

Variable *wages*

Details:

Determinants of Wages from the 1985 Current Population Survey

Therese Stukel

The datafile contains 534 observations on 11 variables sampled from the Current Population Survey of 1985

http://lib.stat.cmu.edu/datasets/CPS_85_Wages

Attribute Information:

1. EDUCATION: Number of years of education.
2. SOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).
3. SEX: Indicator variable for sex (1=Female, 0=Male).
4. EXPERIENCE: Number of years of work experience.
5. UNION: Indicator variable for union membership (1=Union member, 0=Not union member).
6. WAGE: Wage (dollars per hour).
7. AGE: Age (years).
8. RACE: Race (1=Other, 2=Hispanic, 3=White).
9. OCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).
10. SECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).
11. MARR: Marital Status (0=Unmarried, 1=Married)

Other classes in mlep-add

Class principal-component-analysis

Superclasses:

common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t

Documented Subclasses:

None

Direct Slots:

data --

Converted data.

singular-values --

The singular values (the diagonal matrix D).

unitary-matrix1 --

Unitary matrix U.

unitary-matrix2 --

Unitary matrix V.

data-set --

The data-set to be analyzed.

means --

Means of data.

Details:

Principal Component Analysis by Singular Value Decomposition
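In matrix form, and assuming the data are centered with the means slot before the decomposition, the quantities above relate as follows. Here X is the data matrix with one sample per row, 1 a column of ones, mu the vector of means, V_c the first c columns of V (c being the components argument of transform), U_c the first c columns of U and D_c the leading c-by-c block of D. How mlep centers and truncates internally is an assumption.

$$
\begin{aligned}
X_c &= X - \mathbf{1}\mu^{\top} = U D V^{\top} \\
Z &= X_c V_c = U_c D_c \\
\hat{X} &= Z V_c^{\top} + \mathbf{1}\mu^{\top}
\end{aligned}
$$

The second line is the projection onto the principal components; the third is the inverse transformation, which reproduces X exactly when c equals the rank of X_c.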

Index of exported symbols

`mlep:`	`*heights-weights*`, variable
`mlep:`	`*iris*`, variable
`mlep:`	`*lenses*`, variable
`mlep:`	`*wages*`, variable
`mlep:`	`analyze`, function
`mlep:`	`classify`, function
`mlep:`	`data-set`, function
`mlep:`	`distance`, function
`mlep:`	`forward`, function
`mlep:`	`imputer`, class
`mlep:`	`k`, function
`mlep:`	`k-means`, class
`mlep:`	`k-nearest-neighbors`, class
`mlep:`	`learning-rate`, function
`mlep:`	`markov-chain`, class
`mlep:`	`max-likelihood`, class
`mlep:`	`means`, function
`mlep:`	`naive-bayes`, class
`mlep:`	`neuronal-network`, class
`mlep:`	`order`, function
`mlep:`	`perceptron`, class
`mlep:`	`plot-points`, function
`mlep:`	`plot-values`, function
`mlep-add:`	`principal-component-analysis`, class
`mlep:`	`probabilities`, function
`mlep:`	`random-from-to`, function
`mlep:`	`run`, function
`mlep:`	`set-labels`, function
`mlep:`	`synthesize`, function
`mlep:`	`test-set`, function
`mlep:`	`transform`, function
`mlep:`	`unique`, function