*About mlep:*

`mlep` is a Machine Learning library for Educational Purposes.

It aims to provide a collection of simple machine learning algorithms with the following goals:

- to use only ANSI Common Lisp (thus to be implementation independent)
- to be easy enough to use that even intermediate Common Lisp programmers can pick up the library right away
- to provide tutorial-style documentation so that the library is easy to get to know

*About mlep-add:*

`mlep-add` contains all parts of `mlep` that don't run without dependencies. Currently `lla` and `cl-num-utils` are needed.

### Other functions in mlep

**Function analyze**(instance input)

Arguments:

- `instance` -- an instance of `markov-chain`
- `input` -- some input data

Returns:

the probability of `input`

Details:

Check the probability of `input` being generated by `instance`.

**Function classify**(instance &key new-data-set verbose)

Arguments:

- `instance` -- an instance of `k-means`, `perceptron` or `neuronal-network`
- `new-data-set` -- use `new-data-set` instead of the internal `data-set`
- `verbose` -- print some more information (only taken into account for `neuronal-network`)

Returns:

a list with one classification number for each sample in the classified data-set

Details:

Classifying some data-set.

**Function data-set**(instance)

Arguments:

- `instance` -- an instance of any `mlep` learning algorithm

Returns:

the data-set of `instance`

Details:

Get the data-set of `instance`.

**Function distance**(instance)

Arguments:

- `instance` -- an instance of `k-means` or `k-nearest-neighbors`

Returns:

the function for calculating the distance for `instance`

Details:

Get the function for calculating the distance for `instance`, e.g. `#'euclidian-distance`.

**Function forward**(instance &key input)

Arguments:

- `instance` -- an instance of `neuronal-network`
- `input` -- the input data to be considered

Returns:

the output of the `neuronal-network` given the `input`

Details:

Computes a forward pass through the network and returns its output.

**Function k**(instance)

Arguments:

- `instance` -- an instance of `k-means` or `k-nearest-neighbors`

Returns:

the parameter `k`

Details:

`k` determines how many means are assumed (for `k-means`) or how many neighbors are considered (for `k-nearest-neighbors`).

**Function learning-rate**(instance)

Arguments:

- `instance` -- an instance of `neuronal-network` or `perceptron`

Returns:

the learning-rate

Details:

The learning rate controls how much the weights change in each update.
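To make the role of the learning rate concrete, here is a minimal sketch of a perceptron-style weight update in plain Common Lisp. The function `update-weights` and its argument list are hypothetical and not part of `mlep`; the rule shown is the classic perceptron update, which may differ from what `mlep` does internally.

```lisp
;; Hypothetical helper, not part of mlep: one perceptron-style update step.
;; Each weight moves by RATE * (TARGET - OUTPUT) * the matching input value.
(defun update-weights (weights input target output rate)
  (map 'vector
       (lambda (w x) (+ w (* rate (- target output) x)))
       weights input))

;; With a learning rate of 1/2 the correction is half as large as with 1.
(update-weights #(0 0) #(1 2) 1 0 1/2) ; => #(1/2 1)
(update-weights #(0 0) #(1 2) 1 0 1)   ; => #(1 2)
```

A small rate gives small, cautious corrections; a large rate changes the weights quickly but can overshoot.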

**Function means**(instance)

Arguments:

- `instance` -- an instance of `k-means` or `principal-component-analysis`

Returns:

the current means

Details:

Get the current means.

**Function order**(instance)

Arguments:

- `instance` -- an instance of `markov-chain`

Returns:

the order of the Markov chain

Details:

The order of a Markov chain determines how many past events are considered for producing a current event.
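As an illustration of what the order means, the following sketch counts how often each context of `order` past elements is followed by each next element. The function `transition-counts` is a hypothetical helper written in plain Common Lisp; it is not part of `mlep` and only illustrates the idea, not the library's implementation.

```lisp
;; Hypothetical helper, not part of mlep.
;; Counts how often each ORDER-long context in DATA is followed by an element.
;; Keys of the returned hash table are lists of the form (context next).
(defun transition-counts (data order)
  (let ((counts (make-hash-table :test #'equal)))
    (loop for i from 0 to (- (length data) order 1)
          for context = (subseq data i (+ i order))
          for next = (elt data (+ i order))
          do (incf (gethash (list context next) counts 0)))
    counts))

;; Order 1: only the previous element is the context.
(gethash '((a) b) (transition-counts '(a b a b a) 1))   ; => 2
;; Order 2: the two previous elements are the context.
(gethash '((a b) a) (transition-counts '(a b a b a) 2)) ; => 2
```

Dividing each count by the total for its context would give transition probabilities, which is why a chain of order `n` needs a table indexed by `n+1` elements.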

**Function plot-points**(vals &key (height 20) (width 80) (char x))

Arguments:

- `vals` -- a list of lists with x/y-points or a 2d-array -- `((x1 y1) ... (xn yn))` or `#2a((x1 y1) ... (xn yn))`
- `height` -- the height in characters used for the plot
- `width` -- the width in characters used for the plot
- `char` -- the character used for printing

Returns:

nothing

Details:

Plotting points with x/y-coordinates.

**Function plot-values**(vals &key (height 20) (char x))

Arguments:

- `vals` -- a sequence of numbers to be plotted
- `height` -- the height in characters used for the plot
- `char` -- the character used for printing

Returns:

nothing

Details:

Plot the values of `vals` successively.

**Function probabilities**(instance)

Arguments:

- `instance` -- an instance of `markov-chain`

Returns:

the probabilities of the Markov chain

Details:

Get the probability matrix (or tensor); its rank is `order+1`.

**Function random-from-to**(from to &key (state *random-state*))

Arguments:

- `from` -- the lower bound (inclusive)
- `to` -- the upper bound (exclusive)
- `state` -- a random state object containing information used by the pseudo-random number generator

Returns:

a random number

Details:

Gives a random number in the range from `from` (inclusive) to `to` (exclusive).
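One way such a function can be built on the standard `random` is sketched below. The name `random-in-range` is hypothetical and chosen so that it does not shadow the library function; the sketch assumes `from` is smaller than `to` and is not necessarily how `mlep` implements `random-from-to`.

```lisp
;; Hypothetical sketch, not mlep's implementation.
;; (random n state) returns a value in [0, n), so shifting by FROM
;; gives a value in [FROM, TO). Assumes FROM < TO.
(defun random-in-range (from to &key (state *random-state*))
  (+ from (random (- to from) state)))

;; Integer bounds give integers, float bounds give floats.
(random-in-range 3 7)     ; one of 3, 4, 5, 6
(random-in-range 0.0 1.0) ; a float in [0.0, 1.0)
```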

**Function run**(instance &key (epochs 100) (threshold 0.1))

Arguments:

- `instance` -- an instance of any `mlep` learning algorithm
- `threshold` -- the minimum global error to be achieved; iterative training runs until the threshold is reached (supported by `perceptron`)
- `epochs` -- how many times an iterative algorithm should be performed (supported by `k-means` and `neuronal-network`)

Returns:

depends on the learning algorithm:

- `k-means`: the computed means
- `k-nearest-neighbors`: the classes assigned to `test-set`
- `max-likelihood`: a list with the first item being the mean and the second one being the co-variance matrix of the normal distribution
- `markov-chain`: the probability matrix (or tensor)
- `naive-bayes`: the classes assigned to `test-set`
- `neuronal-network`: nothing
- `perceptron`: nothing
- `principal-component-analysis`: a list with three matrices: `unitary-matrix1 U` (orthogonal matrix), `unitary-matrix2 Vt` (orthogonal matrix) and `singular-values` (a diagonal matrix with the diagonal elements being the singular values, called `D`; `UxDxVt` should be a reconstruction of the input matrix)
- `imputer`: the default replacement values for each column

Details:

A general interface for 'running' a learning algorithm.

**Function set-labels**(instance)

Arguments:

- `instance` -- an instance of `k-nearest-neighbors`, `naive-bayes`, `neuronal-network` or `perceptron`

Returns:

the target labels for `data-set` of a supervised learning algorithm.

Details:

Get the target labels.

**Function synthesize**(instance &key (start) (howmany 10))

Arguments:

- `instance` -- an instance of `markov-chain`
- `start` -- one of: 1) the symbol `random` -- a new random sequence is generated as the beginning; 2) `nil` (default) -- a literal subsequence with a random starting index is taken as the beginning; 3) a list with a user-given starting sequence
- `howmany` -- number of elements to be synthesized

Details:

Synthesize some data.

**Function test-set**(instance)

Arguments:

- `instance` -- an instance of `k-nearest-neighbors` or `naive-bayes`

Returns:

the test-set, i.e. a set that has no target labels and needs to be classified.

Details:

Get the test-set.

**Function transform**(instance &key new-data inverse components)

Arguments:

- `instance` -- an instance of `principal-component-analysis` or `imputer`
- `components` -- a number that states how many dimensions should be used for a transformation (default is `nil`, which means that it should use all dimensions of `data-set`)
- `inverse` -- do an inverse transformation (`t` or `nil`, default: `nil`)
- `new-data` -- do the transformation on this data-set (default is `nil`, which means that it should use `data-set`)

Returns:

the transformed data-set

Details:

Project some data onto its principal components (for `principal-component-analysis`) or fill in missing values (for `imputer`).

**Function unique**(instance)

Arguments:

- `instance` -- an instance of `markov-chain`

Returns:

all unique elements in `data-set`

Details:

Get all unique values that are considered by the chain.

### Other classes in mlep

**Class imputer**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

missing-value --

The value that is recognized as a missing value.

missing-value-test --

The test-function for comparing each item in a data-set with this missing value.

replacers --

The values that are used for replacing missing values per column.

Details:

Replaces missing values by the mean (for numerical data) or the mode (for categorical data).
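The mean-replacement idea for a single numerical column can be sketched in plain Common Lisp as follows. The helpers `column-mean` and `impute-column` are hypothetical and not part of `mlep`; the sketch assumes missing entries are marked with a value that can be compared by `eql` (here `nil`) and covers only the numerical case, not the mode for categorical data.

```lisp
;; Hypothetical helpers, not part of mlep.
;; Mean of all entries in COLUMN that are not the MISSING marker.
(defun column-mean (column missing)
  (let ((present (remove missing column)))
    (/ (reduce #'+ present) (length present))))

;; Return a copy of COLUMN with every MISSING marker replaced by the mean.
(defun impute-column (column missing)
  (substitute (column-mean column missing) missing column))

(impute-column '(1 nil 3) nil)     ; => (1 2 3)
(impute-column '(2 4 nil nil) nil) ; => (2 4 3 3)
```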

**Class k-means**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

distance --

A distance measuring function.

k --

The number of groups/clusters to be determined.

means --

The means of the data points.

Details:

k-means is a simple unsupervised clustering algorithm for a known number of clusters.
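A single k-means iteration (assign every point to its nearest mean, then move each mean to the average of its points) can be sketched in plain Common Lisp as below. All function names here are hypothetical and not part of `mlep`; points are assumed to be lists of numbers, and squared Euclidean distance is assumed as the distance measure.

```lisp
;; Hypothetical helpers, not part of mlep.
(defun squared-distance (a b)
  (reduce #'+ (mapcar (lambda (x y) (expt (- x y) 2)) a b)))

;; Index of the mean in MEANS that is closest to POINT.
(defun nearest-index (point means)
  (let ((best 0))
    (loop for mean in means
          for i from 0
          when (< (squared-distance point mean)
                  (squared-distance point (nth best means)))
            do (setf best i))
    best))

;; Component-wise average of a non-empty list of points.
(defun mean-of (points)
  (let ((n (length points)))
    (apply #'mapcar (lambda (&rest xs) (/ (reduce #'+ xs) n)) points)))

;; One iteration: returns the updated list of means.
;; A mean without any assigned points is left unchanged.
(defun k-means-step (data means)
  (loop for mean in means
        for i from 0
        for members = (remove-if-not
                       (lambda (p) (= i (nearest-index p means)))
                       data)
        collect (if members (mean-of members) mean)))

(k-means-step '((0 0) (0 2) (10 10) (10 12)) '((0 0) (10 10)))
;; => ((0 1) (10 11))
```

Repeating this step until the means stop changing (or for a fixed number of epochs) gives the usual k-means procedure.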

**Class k-nearest-neighbors**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set that is already known. (`set-labels` go hand in hand with it.)

distance --

A distance measuring function.

k --

The number of neighbors to be taken into account.

set-labels --

The labels for `data-set`.

test-set --

The data-set that has no labels and needs to be classified.

Details:

k-nearest-neighbors is a simple supervised classification algorithm: each sample in `test-set` is assigned the label that is most common among its `k` nearest neighbors in `data-set`.
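The majority vote among the `k` nearest labeled points can be sketched in plain Common Lisp as follows. The functions `euclidean-distance` and `knn-classify` are hypothetical helpers, not part of `mlep`; ties between labels are resolved arbitrarily in this sketch.

```lisp
;; Hypothetical helpers, not part of mlep.
(defun euclidean-distance (a b)
  (sqrt (reduce #'+ (map 'list (lambda (x y) (expt (- x y) 2)) a b))))

;; Classify SAMPLE by majority vote among its K nearest points in DATA.
;; LABELS holds one label per point in DATA, in the same order.
(defun knn-classify (sample data labels k)
  (let* ((pairs (mapcar (lambda (point label)
                          (cons (euclidean-distance sample point) label))
                        data labels))
         (nearest (subseq (sort pairs #'< :key #'car)
                          0 (min k (length pairs))))
         (votes (mapcar #'cdr nearest))
         (best nil)
         (best-count 0))
    (dolist (label (remove-duplicates votes :test #'equal) best)
      (let ((n (count label votes :test #'equal)))
        (when (> n best-count)
          (setf best label
                best-count n))))))

(knn-classify '(1 0) '((0 0) (0 1) (5 5) (6 5)) '(a a b b) 3) ; => A
(knn-classify '(5 6) '((0 0) (0 1) (5 5) (6 5)) '(a a b b) 1) ; => B
```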

**Class markov-chain**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

order --

The order of the Markov chain.

probabilities --

A matrix/tensor with probabilities.

unique --

Unique values of `data-set`.

Details:

A Markov chain.

**Class max-likelihood**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data-set --

The data-set to be analyzed.

degrees-of-freedom --

Delta Degrees of Freedom. Divisor is length of `data-set` minus `degrees-of-freedom`.

Details:

With max-likelihood one can estimate the parameters of the normally distributed probability density function that fits the data-set.
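For one-dimensional data, the effect of `degrees-of-freedom` on the divisor can be sketched in plain Common Lisp. The functions `sample-mean` and `sample-variance` are hypothetical helpers, not part of `mlep`, and the sketch covers only the scalar case rather than a full co-variance matrix.

```lisp
;; Hypothetical helpers, not part of mlep.
(defun sample-mean (xs)
  (/ (reduce #'+ xs) (length xs)))

;; Variance with divisor (length XS) - DEGREES-OF-FREEDOM.
;; 0 gives the maximum-likelihood estimate, 1 the unbiased estimate.
(defun sample-variance (xs &optional (degrees-of-freedom 0))
  (let ((m (sample-mean xs)))
    (/ (reduce #'+ (mapcar (lambda (x) (expt (- x m) 2)) xs))
       (- (length xs) degrees-of-freedom))))

(sample-mean '(1 2 3 4))       ; => 5/2
(sample-variance '(1 2 3 4))   ; => 5/4
(sample-variance '(1 2 3 4) 1) ; => 5/3
```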

**Class naive-bayes**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

all-labels --

All given data-labels. To be pre-computed.

data-set --

The data-set that is already known. (`set-labels` go hand in hand with it.)

label-count --

Counting of each item of a label. To be pre-computed.

likelihoods --

The likelihoods of a feature attribute given a class. To be pre-computed.

possible-data-values --

All possible values occurring in each attribute of `data-set`. To be pre-computed.

prior-probabilities --

The prior-probabilities of each class. To be pre-computed.

set-labels --

The labels for `data-set`.

test-set --

The data-set that has no labels and needs to be classified.

Details:

Naive-bayes takes a probabilistic approach to simple supervised classification.
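As an illustration of one of the pre-computed quantities, the prior probability of each class is simply its relative frequency among the labels. The function below is a hypothetical sketch in plain Common Lisp, not `mlep`'s implementation; it returns an association list, which is an assumption made for the example.

```lisp
;; Hypothetical helper, not part of mlep.
;; Returns an alist mapping each distinct label to its relative frequency.
(defun prior-probabilities (labels)
  (let ((total (length labels)))
    (mapcar (lambda (label)
              (cons label (/ (count label labels :test #'equal) total)))
            (remove-duplicates labels :test #'equal))))

(cdr (assoc 'a (prior-probabilities '(a a b)))) ; => 2/3
(cdr (assoc 'b (prior-probabilities '(a a b)))) ; => 1/3
```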

**Class neuronal-network**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

activation-function --

The activation function, usually a Heaviside step function.

data-set --

The data-set to be analyzed.

learning-rate --

The learning rate.

net-structure --

The net-structure of the net. (Neurons per layer.)

output-net --

The output of all neurons of the network.

output-net-before-activation --

The output of all neurons of the network before the activation function was applied.

set-labels --

The output-values for `data-set`.

weight-init-range --

The maximum range for initializing the weights.

weights --

The weights from input to output values.

Details:

A fully-connected Feed-Forward Multi-Layer Perceptron.

**Class perceptron**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

activation-function --

The activation function, usually a Heaviside step function.

data-set --

The data-set to be analyzed.

learning-rate --

The learning rate.

max-weight-init-value --

The maximum value for initializing the weights.

set-labels --

The output-values for `data-set`.

weights --

The weights from input to output values.

Details:

A perceptron is a very simple neuron model and turns out to be a linear classifier.
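Why a perceptron is a linear classifier can be seen from a small sketch: its output depends only on which side of a hyperplane the weighted sum of the inputs falls. The functions `heaviside` and `perceptron-output` are hypothetical helpers in plain Common Lisp, not part of `mlep`; the sketch assumes the first weight is a bias paired with a constant input of 1.

```lisp
;; Hypothetical helpers, not part of mlep.
;; Heaviside step function: 1 for non-negative input, 0 otherwise.
(defun heaviside (x)
  (if (>= x 0) 1 0))

;; Weighted sum of the inputs followed by the step function.
(defun perceptron-output (weights input)
  (heaviside (reduce #'+ (map 'list #'* weights input))))

;; Weights (-3/2 1 1) implement logical AND on the last two inputs;
;; the first input is the constant 1 for the bias.
(perceptron-output '(-3/2 1 1) '(1 1 1)) ; => 1
(perceptron-output '(-3/2 1 1) '(1 1 0)) ; => 0
(perceptron-output '(-3/2 1 1) '(1 0 0)) ; => 0
```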

### Other variables in mlep

**Variable *heights-weights***

Details:

SOCR Data Dinov 020108 HeightsWeights

Human Height and Weight are mostly heritable, but lifestyles, diet, health and environmental factors also play a role in determining an individual's physical characteristics. The dataset below contains 25,000 records of human heights and weights. These data were obtained in 1993 by a Growth Survey of 25,000 children from birth to 18 years of age recruited from Maternal and Child Health Centres (MCHC) and schools and were used to develop Hong Kong's current growth charts for weight, height, weight-for-age, weight-for-height and body mass index (BMI). See also the Major League Baseball Players Height and Weight dataset.

http://wiki.stat.ucla.edu/socr/index.php/SOCR_Data_Dinov_020108_HeightsWeights

Attribute Information:

- 1. Height (Inches)
- 2. Weight (Pounds)

**Variable *iris***

Details:

Iris flower data set by Sir Ronald Fisher (1936)

http://archive.ics.uci.edu/ml/datasets/Iris

Attribute Information:

- 1. sepal length in cm
- 2. sepal width in cm
- 3. petal length in cm
- 4. petal width in cm
- 5. class (Iris Setosa, Iris Versicolour, Iris Virginica)

**Variable *lenses***

Details:

Lenses Data Set by J. Cendrowska (1987)

https://archive.ics.uci.edu/ml/datasets/Lenses

Attribute Information:

- 1. age of the patient (1 = young, 2 = pre-presbyopic, 3 = presbyopic)
- 2. spectacle prescription (1 = myope, 2 = hypermetrope)
- 3. astigmatic (1 = no, 2 = yes)
- 4. tear production rate (1 = reduced, 2 = normal)
- 5. class
  - 1 = the patient should be fitted with hard contact lenses,
  - 2 = the patient should be fitted with soft contact lenses,
  - 3 = the patient should not be fitted with contact lenses.

**Variable *wages***

Details:

Determinants of Wages from the 1985 Current Population Survey

Therese Stukel

The datafile contains 534 observations on 11 variables sampled from the Current Population Survey of 1985

http://lib.stat.cmu.edu/datasets/CPS_85_Wages

Attribute Information:

- 1. EDUCATION: Number of years of education.
- 2. SOUTH: Indicator variable for Southern Region (1=Person lives in South, 0=Person lives elsewhere).
- 3. SEX: Indicator variable for sex (1=Female, 0=Male).
- 4. EXPERIENCE: Number of years of work experience.
- 5. UNION: Indicator variable for union membership (1=Union member, 0=Not union member).
- 6. WAGE: Wage (dollars per hour).
- 7. AGE: Age (years).
- 8. RACE: Race (1=Other, 2=Hispanic, 3=White).
- 9. OCCUPATION: Occupational category (1=Management, 2=Sales, 3=Clerical, 4=Service, 5=Professional, 6=Other).
- 10. SECTOR: Sector (0=Other, 1=Manufacturing, 2=Construction).
- 11. MARR: Marital Status (0=Unmarried, 1=Married)

### Other classes in mlep-add

**Class principal-component-analysis**

Superclasses:

`common-lisp:standard-object`, `sb-pcl::slot-object`, `common-lisp:t`

Documented Subclasses:

None

Direct Slots:

data --

Converted data.

singular-values --

The singular values for every matrix.

unitary-matrix1 --

Unitary matrix U.

unitary-matrix2 --

Unitary matrix V.

data-set --

The data-set to be analyzed.

means --

Means of data.

Details:

Principal Component Analysis by Singular Value Decomposition

### Index of exported symbols

| Package | Symbol | Kind |
|---|---|---|
| mlep | `*heights-weights*` | variable |
| mlep | `*iris*` | variable |
| mlep | `*lenses*` | variable |
| mlep | `*wages*` | variable |
| mlep | `analyze` | function |
| mlep | `classify` | function |
| mlep | `data-set` | function |
| mlep | `distance` | function |
| mlep | `forward` | function |
| mlep | `imputer` | class |
| mlep | `k` | function |
| mlep | `k-means` | class |
| mlep | `k-nearest-neighbors` | class |
| mlep | `learning-rate` | function |
| mlep | `markov-chain` | class |
| mlep | `max-likelihood` | class |
| mlep | `means` | function |
| mlep | `naive-bayes` | class |
| mlep | `neuronal-network` | class |
| mlep | `order` | function |
| mlep | `perceptron` | class |
| mlep | `plot-points` | function |
| mlep | `plot-values` | function |
| mlep-add | `principal-component-analysis` | class |
| mlep | `probabilities` | function |
| mlep | `random-from-to` | function |
| mlep | `run` | function |
| mlep | `set-labels` | function |
| mlep | `synthesize` | function |
| mlep | `test-set` | function |
| mlep | `transform` | function |
| mlep | `unique` | function |