About mlep:

mlep is a Machine Learning library for Educational Purposes.

It aims to provide a collection of simple machine learning algorithms with the following goals:
  • to use only ANSI Common Lisp (and thus to be implementation independent)
  • to be easy enough to use that even intermediate Common Lisp programmers can start with the library right away
  • to provide tutorial-style documentation so that the library is easy to learn

About mlep-add:

mlep-add contains all parts of mlep that don't run without dependencies. Currently lla and cl-num-utils are needed.

Other functions in mlep

Function analyze (instance input)
Arguments:
  • instance -- an instance of markov-chain
  • input -- some input data
Returns:
the probability of input
Details:
Check the probability of input being generated by instance.
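
As a rough illustration of what "the probability of input" can mean, the following sketch estimates first-order transition probabilities by counting adjacent pairs and multiplies them along the input. The function names are hypothetical and not part of mlep; how analyze treats the first element or higher orders is not specified here, so this is an assumption-laden sketch rather than the library's method.

```lisp
;; Hypothetical helpers for illustration; they are not part of mlep.

(defun sketch-transition-probability (data prev next)
  "Estimate P(NEXT | PREV) by counting adjacent pairs in the list DATA."
  (let ((prev-count 0)
        (pair-count 0))
    (loop for (current . more) on data
          while more
          do (when (eql current prev)
               (incf prev-count)
               (when (eql (first more) next)
                 (incf pair-count))))
    (if (zerop prev-count)
        0
        (/ pair-count prev-count))))

(defun sketch-sequence-probability (data input)
  "Multiply the first-order transition probabilities along the list INPUT.
The probability of the first element itself is ignored (an assumption)."
  (let ((probability 1))
    (loop for (current . more) on input
          while more
          do (setf probability
                   (* probability
                      (sketch-transition-probability data current (first more)))))
    probability))
```

With the training data (a b a b a c), the input (a b a) gets probability 2/3 in this sketch, because a is followed by b in two of three cases and b is always followed by a.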

Function classify (instance &key new-data-set verbose)
Arguments:
  • instance -- an instance of k-means, perceptron or neuronal-network
  • new-data-set -- use new-data-set instead of the internal data-set
  • verbose -- print some more information (only taken into account for neuronal-network)
Returns:
a list containing one classification number for each sample in the classified data-set
Details:
Classify a data-set.

Function data-set (instance)
Arguments:
  • instance -- an instance of any mlep learning algorithm
Returns:
the data-set of instance
Details:
Get the data-set of instance.

Function distance (instance)
Arguments:
  • instance -- an instance of k-means or k-nearest-neighbors
Returns:
the function for calculating the distance for instance
Details:
Get the function for calculating the distance for instance, e.g. #'euclidian-distance.
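
A distance function in this sense takes two points and returns a non-negative number. The sketch below shows a Euclidean distance over two sequences of equal length; the name sketch-euclidean-distance is hypothetical and is not claimed to match mlep's own definition.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-euclidean-distance (a b)
  "Return the Euclidean distance between the sequences A and B.
Both sequences are assumed to have the same length."
  (sqrt (reduce #'+
                (map 'list
                     (lambda (x y) (expt (- x y) 2))
                     a b))))
```

Because map is used instead of mapcar, the points may be lists or vectors.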

Function forward (instance &key input)
Arguments:
  • instance -- an instance of neuronal-network
  • input -- the input data to be considered
Returns:
the output of the neuronal-network given the input
Details:
Compute a forward pass through the network and return its output.
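
To make the idea of propagating an input through the layers concrete, here is a minimal sketch. It assumes a network represented as a list of layers, each layer a list of per-neuron weight lists, and it leaves out bias terms; this representation and the name sketch-forward are assumptions for illustration, not mlep's internal format.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-forward (layers input activation)
  "Propagate INPUT through LAYERS and return the outputs of the last layer.
LAYERS is a list of layers; each layer is a list of weight lists, one per
neuron. ACTIVATION is applied to every weighted sum. No bias is modelled."
  (reduce (lambda (activations layer)
            (mapcar (lambda (neuron-weights)
                      (funcall activation
                               (reduce #'+ (mapcar #'* neuron-weights
                                                   activations))))
                    layer))
          layers
          :initial-value input))
```

For example, with two layers (((1 1) (1 -1)) ((1 1))), the input (2 3) and #'identity as activation, the first layer yields (5 -1) and the second layer yields (4).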

Function k (instance)
Arguments:
  • instance -- an instance of k-means or k-nearest-neighbors
Returns:
the parameter k
Details:
k determines how many means are assumed (for k-means) or how many neighbors are considered (for k-nearest-neighbors).

Function learning-rate (instance)
Arguments:
  • instance -- an instance of neuronal-network or perceptron
Returns:
the learning-rate
Details:
The learning rate controls how much the weights change in each update.
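
The effect of the learning rate can be seen in the classic perceptron update rule, where each weight moves by learning-rate times error times input. The sketch below uses a hypothetical function name and is meant as an illustration of that textbook rule, not as a copy of mlep's update code.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-update-weights (weights input target output learning-rate)
  "Return new weights after one classic perceptron update step:
w_i <- w_i + LEARNING-RATE * (TARGET - OUTPUT) * x_i."
  (mapcar (lambda (w x)
            (+ w (* learning-rate (- target output) x)))
          weights input))
```

With weights (0 0), input (1 2), target 1, output 0 and a learning rate of 1/2, the new weights are (1/2 1); doubling the learning rate doubles the step.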

Function means (instance)
Arguments:
  • instance -- an instance of k-means or principal-component-analysis
Returns:
the current means
Details:
Get the current means.

Function order (instance)
Arguments:
  • instance -- an instance of markov-chain
Returns:
the order of the markov chain
Details:
The order of a markov chain determines how many past events are considered when producing the current event.
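
One way to picture the order is to list every window of order+1 consecutive elements: the first order elements are the past events and the last one is the current event. The helper below is a hypothetical sketch for illustration and not part of mlep.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-contexts (data order)
  "Return all windows of ORDER+1 consecutive elements of the list DATA.
In each window the first ORDER elements are the past events and the
last element is the event that followed them."
  (loop for tail on data
        while (nthcdr order tail)
        collect (subseq tail 0 (1+ order))))
```

For the data (a b c d), order 1 gives ((a b) (b c) (c d)) and order 2 gives ((a b c) (b c d)).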

Function plot-points (vals &key (height 20) (width 80) (char x))
Arguments:
  • vals -- a list of lists with x/y-points or a 2d-array -- ((x1 y1) ... (xn yn)) or #2a((x1 y1) ... (xn yn))
  • height -- the height in characters used for the plot
  • width -- the width in characters used for the plot
  • char -- the character used for printing
Returns:
nothing
Details:
Plot points with x/y-coordinates.

Function plot-values (vals &key (height 20) (char x))
Arguments:
  • vals -- a sequence of numbers to be plotted
  • height -- the height in characters used for the plot
  • char -- the character used for printing
Returns:
nothing
Details:
Plot the values of vals successively.

Function probabilities (instance)
Arguments:
  • instance -- an instance of markov-chain
Returns:
the probabilities of the markov chain
Details:
Get the probability matrix (or tensor) -- its rank is order+1.
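
The relation between the order and the rank can be sketched with make-array: one dimension per past event plus one for the current event, each as large as the number of unique values. The function name and the zero initial element are assumptions for illustration only.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-make-probability-table (unique-count order)
  "Return a zero-filled array of rank ORDER+1 in which every dimension
has UNIQUE-COUNT entries."
  (make-array (make-list (1+ order) :initial-element unique-count)
              :initial-element 0))
```

A chain of order 1 over 3 unique values therefore has a 3x3 matrix, and a chain of order 2 has a 3x3x3 tensor.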

Function random-from-to (from to &key (state *random-state*))
Arguments:
  • from -- the lower bound (inclusive)
  • to -- the upper bound (exclusive)
  • state -- a random state object containing information used by the pseudo-random number generator
Returns:
a random number
Details:
Return a random number in the given range.
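
The documented bounds (from inclusive, to exclusive) match what a thin wrapper around the standard random function would produce. The sketch below is one plausible way to get that behaviour; it is an assumption for illustration, not mlep's source, and it requires to to be greater than from.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-random-from-to (from to &key (state *random-state*))
  "Return a random number X with FROM <= X < TO.
TO must be greater than FROM because RANDOM needs a positive limit."
  (+ from (random (- to from) state)))
```

Since random returns an integer for an integer limit and a float for a float limit, (sketch-random-from-to 2 5) yields 2, 3 or 4, while (sketch-random-from-to 2.0 5.0) yields a float.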

Function run (instance &key (epochs 100) (threshold 0.1))
Arguments:
  • instance -- an instance of any mlep learning algorithm
  • threshold -- the minimum global error to be achieved -- iterative training runs until the threshold is reached (supported by perceptron)
  • epochs -- the number of iterations an iterative algorithm should perform (supported by k-means and neuronal-network)
Returns:
depends on the learning algorithm:
  • k-means: the computed means
  • k-nearest-neighbors: the classes assigned to test-set
  • max-likelihood: a list with the first item being the mean and the second one being the co-variance matrix of the normal distribution
  • markov-chain: the probability matrix (or tensor)
  • naive-bayes: the classes assigned to test-set
  • neuronal-network: nothing
  • perceptron: nothing
  • principal-component-analysis: a list with three matrices: unitary-matrix1 U (an orthogonal matrix), unitary-matrix2 Vt (an orthogonal matrix) and singular-values D (a diagonal matrix whose diagonal elements are the singular values); UxDxVt should be a reconstruction of the input matrix
  • imputer: the default replacement values for each column
Details:
A general interface for 'running' a learning algorithm.

Function set-labels (instance)
Arguments:
  • instance -- an instance of k-nearest-neighbors, naive-bayes, neuronal-network or perceptron
Returns:
the target labels for data-set of a supervised learning algorithm.
Details:
Get the target labels.

Function synthesize (instance &key (start) (howmany 10))
Arguments:
  • instance -- an instance of markov-chain
  • start -- 1) the symbol random -- a new random sequence is generated as the beginning; 2) nil (default) -- a literal subsequence with a random starting index is taken as the beginning; 3) a list with a user-given starting sequence
  • howmany -- number of elements to be synthesized
Details:
Synthesize some data.

Function test-set (instance)
Arguments:
  • instance -- an instance of k-nearest-neighbors or naive-bayes
Returns:
the test-set, i.e. a set that has no target labels and needs to be classified.
Details:
Get the test-set.

Function transform (instance &key new-data inverse components)
Arguments:
  • instance -- an instance of principal-component-analysis or imputer
  • components -- a number that states how many dimensions should be used for a transformation (default is nil, which means that all dimensions of data-set are used)
  • inverse -- do an inverse transformation (t or nil, default: nil)
  • new-data -- do the transformation on this data-set (default is nil, which means that data-set is used)
Returns:
the transformed data-set
Details:
Project some data on its principal components. / Fit missing values.

Function unique (instance)
Arguments:
  • instance -- an instance of markov-chain
Returns:
all unique elements in data-set
Details:
Get all unique values that are considered by the chain.

Other classes in mlep

Class imputer
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data-set --
The data-set to be analyzed.
missing-value --
The value that is recognized as a missing value.
missing-value-test --
The test-function for comparing each item in a data-set with this missing value.
replacers --
The values that are used for replacing missing values per column.
Details:
Replace missing values by the mean (for numerical data) or the mode (for categorical data).
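
The mean-based replacement for a single numerical column can be sketched as follows. The function name, the default missing value nil and the keyword names are assumptions for illustration; the mode-based variant for categorical data is left out.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-impute-column (column &key (missing-value nil) (test #'eql))
  "Return a copy of the list COLUMN in which every item matching
MISSING-VALUE under TEST is replaced by the mean of the other items.
At least one item must be present, otherwise the mean is undefined."
  (let* ((present (remove missing-value column :test test))
         (mean (/ (reduce #'+ present) (length present))))
    (substitute mean missing-value column :test test)))
```

For the column (1 nil 3) the present values are 1 and 3, so the missing entry is replaced by their mean and the result is (1 2 3).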

Class k-means
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data-set --
The data-set to be analyzed.
distance --
A distance measuring function.
k --
The number of groups/clusters to be determined.
means --
The means of the data points.
Details:
k-means is a simple unsupervised clustering algorithm for a known number of clusters.
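
The core of a k-means iteration is the assignment step, in which each data point is attached to its nearest mean. The sketch below shows only that step, with a hypothetical function name and the distance function passed in explicitly; it is an illustration of the general algorithm, not mlep's implementation.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-nearest-mean-index (point means distance)
  "Return the index of the element of the non-empty list MEANS that is
closest to POINT according to the two-argument function DISTANCE."
  (let ((best-index 0)
        (best-distance (funcall distance point (first means))))
    (loop for mean in (rest means)
          for index from 1
          do (let ((d (funcall distance point mean)))
               (when (< d best-distance)
                 (setf best-index index
                       best-distance d))))
    best-index))
```

After all points are assigned, each mean would be recomputed as the average of its assigned points, and both steps repeat for the requested number of epochs.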

Class k-nearest-neighbors
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data-set --
The data-set that is already known. (set-labels go hand in hand with it.)
distance --
A distance measuring function.
k --
The number of neighbors to be taken into account.
set-labels --
The labels for data-set.
test-set --
The data-set that has no labels and needs to be classified.
Details:
k-nearest-neighbors is a simple supervised classification algorithm for a known set of classes.
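
The usual way k, the distance function, data-set and set-labels work together is a majority vote among the k closest labelled points. The following sketch illustrates that general scheme for a single sample; the function name and the tie handling (the first label reaching the highest count wins) are assumptions, not a description of mlep's code.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-knn-classify (sample data-set set-labels k distance)
  "Return the most frequent label among the K points of DATA-SET that are
closest to SAMPLE. SET-LABELS holds one label per point of DATA-SET."
  (let* ((pairs (mapcar (lambda (point label)
                          (cons (funcall distance sample point) label))
                        data-set set-labels))
         ;; PAIRS is a fresh list, so the destructive SORT is safe here.
         (sorted (sort pairs #'< :key #'car))
         (nearest (mapcar #'cdr
                          (subseq sorted 0 (min k (length sorted)))))
         (best-label nil)
         (best-count 0))
    (dolist (label nearest best-label)
      (let ((n (count label nearest)))
        (when (> n best-count)
          (setf best-label label
                best-count n))))))
```

Classifying a whole test-set would then amount to mapping this function over every sample.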

Class markov-chain
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data-set --
The data-set to be analyzed.
order --
The order of the markov chain.
probabilities --
A matrix/tensor with probabilities.
unique --
Unique values of data-set.
Details:
A Markov-Chain.

Class max-likelihood
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data-set --
The data-set to be analyzed.
degrees-of-freedom --
Delta Degrees of Freedom. Divisor is length of data-set minus degrees-of-freedom.
Details:
With max-likelihood one can estimate the parameters of the normal distribution that best fits the data-set.
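
The role of degrees-of-freedom as a divisor can be illustrated for the one-dimensional case, where the estimate consists of a mean and a variance instead of a mean vector and a covariance matrix. The function name and the default of 0 are assumptions made for this sketch.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-mean-and-variance (samples &key (degrees-of-freedom 0))
  "Return a list (MEAN VARIANCE) for the list of numbers SAMPLES.
The sum of squared deviations is divided by the number of samples
minus DEGREES-OF-FREEDOM."
  (let* ((n (length samples))
         (mean (/ (reduce #'+ samples) n))
         (sum-of-squares (reduce #'+
                                 (mapcar (lambda (x) (expt (- x mean) 2))
                                         samples))))
    (list mean (/ sum-of-squares (- n degrees-of-freedom)))))
```

For the samples (1 2 3 4) this gives (5/2 5/4) with a divisor of 4, and (5/2 5/3) when degrees-of-freedom is 1.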

Class naive-bayes
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
all-labels --
All given data-labels. To be pre-computed.
data-set --
The data-set that is already known. (set-labels go hand in hand with it.)
label-count --
Counting of each item of a label. To be pre-computed.
likelihoods --
The likelihoods of a feature attribute given a class. To be pre-computed.
possible-data-values --
All possible values occurring in each attribute of data-set. To be pre-computed.
prior-probabilities --
The prior-probabilities of each class. To be pre-computed.
set-labels --
The labels for data-set.
test-set --
The data-set that has no labels and needs to be classified.
Details:
Naive-bayes takes a probabilistic approach to simple supervised classification.
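
Of the pre-computed slots, the prior probabilities are the simplest to illustrate: each class prior is the relative frequency of its label. The sketch below uses a hypothetical function name and returns an association list, which is an assumed representation and not necessarily the one mlep stores.

```lisp
;; Hypothetical helper for illustration; it is not part of mlep.
(defun sketch-prior-probabilities (set-labels)
  "Return an alist that maps each distinct label in the list SET-LABELS
to its relative frequency, in order of first appearance."
  (let ((total (length set-labels)))
    (mapcar (lambda (label)
              (cons label (/ (count label set-labels) total)))
            (remove-duplicates set-labels :from-end t))))
```

For the labels (a b a a) the result is ((a . 3/4) (b . 1/4)). A classifier would combine these priors with the per-attribute likelihoods to score each class.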

Class neuronal-network
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
activation-function --
The activation function, usually a Heaviside step function.
data-set --
The data-set to be analyzed.
learning-rate --
The learning rate.
net-structure --
The net-structure of the net. (Neurons per layer.)
output-net --
The output of all neurons of the network.
output-net-before-activation --
The output of all neurons of the network before the activation function was applied.
set-labels --
The output-values for data-set.
weight-init-range --
The maximum range for initializing the weights.
weights --
The weights from input to output values.
Details:
A fully-connected Feed-Forward Multi-Layer Perceptron.

Class perceptron
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
activation-function --
The activation function, usually a Heaviside step function.
data-set --
The data-set to be analyzed.
learning-rate --
The learning rate.
max-weight-init-value --
The maximum value for initializing the weights.
set-labels --
The output-values for data-set.
weights --
The weights from input to output values.
Details:
A perceptron is a very simple neuron model and turns out to be a linear classifier.
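
The linear decision made by such a model can be sketched as a weighted sum passed through a Heaviside step function. The names below are hypothetical, the step is assumed to return 1 at exactly 0, and a bias is modelled as an ordinary weight on a constant input of 1; none of this is claimed to match mlep's internals.

```lisp
;; Hypothetical helpers for illustration; they are not part of mlep.
(defun sketch-heaviside (x)
  "Return 1 if X is non-negative and 0 otherwise."
  (if (>= x 0) 1 0))

(defun sketch-perceptron-output (weights input)
  "Return the step-function output for the weighted sum of INPUT.
WEIGHTS and INPUT are lists of numbers of equal length."
  (sketch-heaviside (reduce #'+ (mapcar #'* weights input))))
```

With the weights (-1.5 1 1), where the first input is the constant 1, the output is 1 only when both remaining inputs are 1, so these weights separate the logical AND of two inputs with a single line.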

Other variables in mlep

Variable *heights-weights*
Details:
SOCR Data Dinov 020108 HeightsWeights

Human Height and Weight are mostly heritable, but lifestyles, diet, health and environmental factors also play a role in determining an individual's physical characteristics. The dataset below contains 25,000 records of human heights and weights. These data were obtained in 1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for weight, height, weight-for-age, weight-for-height and body mass index (BMI). See also the Major League Baseball Players Height and Weight dataset.

http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

Attribute Information:

  • 1. Height (Inches)
  • 2. Weight (Pounds)

Variable *iris*
Details:
Iris flower data set by Sir Ronald Fisher (1936)

http://archive.ics.uci.edu/ml/datasets/Iris

Attribute Information:

  • 1. sepal length in cm
  • 2. sepal width in cm
  • 3. petal length in cm
  • 4. petal width in cm
  • 5. class (Iris Setosa, Iris Versicolour, Iris Virginica)

Variable *lenses*
Details:
Lenses Data Set by J. Cendrowska (1987)

https://archive.ics.uci.edu/ml/datasets/Lenses

Attribute Information:

  • 1. age of the patient (1 = young, 2 = pre-presbyopic, 3 = presbyopic)
  • 2. spectacle prescription (1 = myope, 2 = hypermetrope)
  • 3. astigmatic (1 = no, 2 = yes)
  • 4. tear production rate (1 = reduced, 2 = normal)
  • 5. class
    • 1 = the patient should be fitted with hard contact lenses,
    • 2 = the patient should be fitted with soft contact lenses,
    • 3 = the patient should not be fitted with contact lenses.

Variable *wages*
Details:
Determinants of Wages from the 1985 Current Population Survey

Therese Stukel

The datafile contains 534 observations on 11 variables sampled from the Current Population Survey of 1985.

http://lib.stat.cmu.edu/datasets/CPS_85_Wages

Attribute Information:

  • 1. EDUCATION: Number of years of education.
  • 2. SOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).
  • 3. SEX: Indicator variable for sex (1=Female, 0=Male).
  • 4. EXPERIENCE: Number of years of work experience.
  • 5. UNION: Indicator variable for union membership (1=Union member, 0=Not union member).
  • 6. WAGE: Wage (dollars per hour).
  • 7. AGE: Age (years).
  • 8. RACE: Race (1=Other, 2=Hispanic, 3=White).
  • 9. OCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).
  • 10. SECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).
  • 11. MARR: Marital Status (0=Unmarried, 1=Married)

Other classes in mlep-add

Class principal-component-analysis
Superclasses:
common-lisp:standard-object, sb-pcl::slot-object, common-lisp:t
Documented Subclasses:
None
Direct Slots:
data --
Converted data.
singular-values --
The singular values for every matrix.
unitary-matrix1 --
Unitary matrix U.
unitary-matrix2 --
Unitary matrix V.
data-set --
The data-set to be analyzed.
means --
Means of data.
Details:
Principal Component Analysis by Singular Value Decomposition.

Index of exported symbols

mlep:*heights-weights*, variable
mlep:*iris*, variable
mlep:*lenses*, variable
mlep:*wages*, variable
mlep:analyze, function
mlep:classify, function
mlep:data-set, function
mlep:distance, function
mlep:forward, function
mlep:imputer, class
mlep:k, function
mlep:k-means, class
mlep:k-nearest-neighbors, class
mlep:learning-rate, function
mlep:markov-chain, class
mlep:max-likelihood, class
mlep:means, function
mlep:naive-bayes, class
mlep:neuronal-network, class
mlep:order, function
mlep:perceptron, class
mlep:plot-points, function
mlep:plot-values, function
mlep-add:principal-component-analysis, class
mlep:probabilities, function
mlep:random-from-to, function
mlep:run, function
mlep:set-labels, function
mlep:synthesize, function
mlep:test-set, function
mlep:transform, function
mlep:unique, function