Wednesday, August 19, 2015

Research Task 1 Notes

OpenViBE paper

Section 1:
OpenViBE is a software platform for designing, testing, and using brain-computer interface (BCI) systems.
Section 2:
BioSig - an open-source software library for biosignal processing, used in BCI research.
BCI2000 - a general-purpose system for BCI research.
BCI++ - a framework for designing BCI systems.
Section 3:
OpenViBE is a free platform for designing, testing, and using BCIs. It is highly modular and can be adapted to a wide range of needs.
OpenViBE lets non-programmers design BCIs without having to write code. It can also connect to and work with VR systems easily.
Section 4:
4 types of users:
The developer - extends OpenViBE itself by adding new software modules, using the platform's software development kit (SDK); can create new modules or modify existing ones. A programmer.
The application developer - uses the SDK to create standalone applications, e.g., VR applications that the BCI can interact with. A programmer.
The author - uses the graphical tools to assemble a BCI system; needs basic signal-processing knowledge etc., but is a non-programmer.
The operator - runs and tests the resulting BCI; knows how to operate it.
Section 5:
A- acquisition of training data
B- offline training
C- Online BCI loop:
1- brain activity measurement
2- preprocessing (filtering out/enhancing the specific signals)
3- feature extraction (extraction of features that describe relevant info embedded in the signals, such as the power of signals in specific frequency bands. These features are gathered in a feature vector.)
4- classification (the feature vector is fed into an algorithm called the classifier. This assigns a class to each vector, an identifier of the brain signal recognized.)
5- translation into a command
6- feedback (so the user can see whether the mental command was recognized correctly)
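
As a mental model, the online loop above can be sketched as a small processing pipeline. This is only a toy illustration: the function names and bodies are my own placeholders, not OpenViBE code, and the "EEG" is simulated noise.

```python
import numpy as np

def preprocess(raw):
    # Placeholder preprocessing: in practice this would be a band-pass
    # filter that keeps only the frequency band of interest.
    return raw - raw.mean()

def extract_features(signal):
    # Example feature: signal power (mean squared amplitude),
    # gathered into a feature vector.
    return np.array([np.mean(signal ** 2)])

def classify(features, threshold=1.0):
    # Toy "classifier": assign a class by thresholding the power feature.
    return "rest" if features[0] < threshold else "active"

def translate(label):
    # Map the recognized class to an application command.
    return {"rest": "DO_NOTHING", "active": "MOVE_CURSOR"}[label]

rng = np.random.default_rng(0)
for _ in range(3):
    raw = rng.normal(size=256)      # 1. measurement (simulated EEG chunk)
    sig = preprocess(raw)           # 2. preprocessing
    vec = extract_features(sig)     # 3. feature extraction
    label = classify(vec)           # 4. classification
    print(translate(label))         # 5. command (printing doubles as 6. feedback)
```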
Section 6:
Tools provided in the system:
The acquisition server - provides interface to various kinds of acquisition machines (EEG systems, for ex.).
The designer - helps author build scenarios using a graphical interface. The author has access to a list of existing modules and can drag and drop them in the scenario window.
The 2D visualization features - specific boxes that provide brain-activity-related visualizations. Ex: power spectrum, time-frequency map, etc.
Existing, pre-configured, ready-to-use scenarios - provided to assist the author. Ex: a scenario that lets the user spell letters using only brain activity.
Section 7:
The platform is able to interact with VR applications. It has a complete library for building 3D plugins, and offers functionality to load and modify a 3D scene based on input data. The platform can also be used to visualize brain activity in real time.
Section 8:
The box concept
The kernel
Plug-Ins
(Internals, how OpenViBE works)
Section 9:
Examples of implementations explained: a handball game and a "use-the-force" application
Section 10:
Performance tests showed that the system runs efficiently.
Section 11:
Available on Windows
Ships with a large number of ready-made boxes and built-in capabilities
Section 12:
OpenViBE website

NeuroSky MindWave user guide

I read through all of it, and despite all the fancy jargon (or maybe because of it) I couldn't successfully connect and start using the MindWave. So below is all the necessary information regarding the MindWave. (I copied/pasted these instructions from an email Mr. Lin sent me; they are really clear and helpful.)
For pairing the Bluetooth device:
1. Left click the little "triangle" ("show hidden icons") at the bottom right area of the Taskbar. You will see the Bluetooth icon in the pop-up menu.
2. Right click the "Bluetooth icon", and select "Add a Device".
3. Now turn on the NeuroSky Mindwave headset, and you will see it blinking.
4. You can push the switch further (the blue light will blink faster) for a few seconds to pair with PC.
5. If you push too long, the light will turn red, and you will need to turn the headset off and on again. Try a few times, and it will work.
6. Once the PC finds it, click the "MindWave icon" and then click "Next".
7. After loading the driver and configuring the computer, a message will tell you "This device has been successfully added to this computer". Now click "Close" and you are done.
For running OpenViBE with the headset:
1. Click "Start > All Programs > OpenViBE > openvibe acquisition server. A terminal window will pop up and running some scripts.
2. At the end of the scripts, a new OpenViBE Acquisition Server window will pop up.
3. Change "Driver" to "NeuroSky MindSet" by clicking and selecting from the drop-down menu.
4. Click "Connect" and you will see some scripts (with green "INFs" running in the terminal window.
5. Click "Play". In a few seconds, you will see some "Blue Bars" changing at the bottom. It means that the OpenViBE has start acquire your brainwaves. You are all set!

For running OpenViBE Designer:

1. Click "Start > All Programs > OpenViBE > openvibe designer. Another terminal window will pop up and running some scripts.
2. At the end of the scripts, a new OpenViBE Designer window will pop up. Now you are ready to follow the video tutorial and learn the OpenViBE step-by-step!
NeuroSky MindWave and OpenViBE video
·       Try wearing the device backwards and compare results.
OpenViBE video tutorial
Seems simple enough, but I tried following the instructions several times and it wouldn't work. (For example, I can't seem to find the sample ghp files to run.)

Classification algorithms video series

Video 1:
Decision trees algorithm
Building a decision tree
When you have a set containing samples of different patterns, set a threshold on a feature and divide the set by it, building a tree (kind of like a family tree). For ex: all elements higher than 90 go right, all lower go left. You purposely choose your thresholds so as to isolate different classes. For ex: after the first split, you see that all data on the right is blue, while the data on the left is a mix of red and blue. Success! You've established that all data higher than 90 is blue. The right leaf of your decision tree is done and will not extend any further. Now create a new threshold for the left side (ex: all data older than a year goes right, everything else left), again aiming to create a leaf that stops. Keep going until your last node yields two leaves that stop.
·       Leaf: a terminal branch of the decision tree that does not continue (zero entropy; a 'pure' node)
·       Node: branch with entropy, that will continue to be divided in two so as to reduce the entropy
·       Threshold: the cut value on a feature used to split the data; chosen to maximize the purity of the resulting branches
·       Entropy: how uncertain are we of the outcome?
H(p1, p2, ..., pn) = -K Σi pi log(pi)
With class counts c1, c2, ... and sum = Σ ci, this works out to (-c1/sum × log(c1/sum)) + (-c2/sum × log(c2/sum)) + ...
·       Gain: how much entropy was reduced by some test?
info(before the split) - info(of the resulting nodes) = gain(feature)
So basically the starting set is a bunch of jumbled-together elements and thus has high entropy. We set a threshold; we have now divided the set and (hopefully) created a pure leaf, lowering the entropy. What is the exact gain of this threshold, then? (How much entropy did this specific threshold reduce?) Calculate it by subtracting the entropy of the two resulting branches (node + leaf), each weighted by its share of the samples, from the entropy of the data before the threshold was set. Your answer is the gain of the split you just made. (If you've in fact created a pure leaf, its entropy is zero, so you're just subtracting the weighted entropy of the node from the previous entropy value. If you've created two nodes instead, you might still have a gain, because the data may be better divided and easier to filter down. That's why it's (node + leaf): in case you haven't actually created a leaf, it ends up being (node + node).)
Ex: follow along with his video; it's pretty clear once you get it. A sketch of the arithmetic follows below.
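
Here is a small sanity-check sketch of the entropy/gain bookkeeping described above, with toy data and my own function names:

```python
import math
from collections import Counter

def entropy(labels):
    # H = -sum(p_i * log2(p_i)) over the class proportions in `labels`.
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(parent, left, right):
    # Information gain = parent entropy minus the size-weighted child entropies.
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted

# Toy split: the threshold at 90 separates out a pure "blue" leaf on the right.
data = [(95, "blue"), (99, "blue"), (80, "red"), (85, "blue"), (70, "red")]
left  = [c for v, c in data if v <= 90]   # mixed node: red and blue
right = [c for v, c in data if v > 90]    # pure leaf: all blue -> entropy 0
print(gain([c for _, c in data], left, right))   # ~0.42 bits of gain
```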

Video 2:
Self organizing maps
The map consists of a set of neurons (let's say 2) placed in the same space as the input data points, which sit on a graph.
1. Select a random input.
2. Compute the winner neuron (the neuron closest to the input).
3. Update the neurons (the winner is moved closer to the input; and because the neurons are connected to each other, in a sense, the others get rearranged a little too).
4. Repeat for all inputs.
5. Classify the input data (all inputs closest to neuron 1 are grouped with it; all inputs closest to neuron 2 are grouped with it).
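
A minimal sketch of those steps, assuming 2 neurons in a 2D input space and a fixed learning rate that I picked arbitrarily (a full SOM would also update the winner's neighbors with a decaying neighborhood function):

```python
import numpy as np

rng = np.random.default_rng(1)
inputs = rng.uniform(0, 10, size=(20, 2))     # data points on a 2D "graph"
neurons = rng.uniform(0, 10, size=(2, 2))     # 2 neurons, randomly placed
lr = 0.5                                      # learning rate (assumed value)

for _ in range(10):                           # several passes over the data
    for x in rng.permutation(inputs):         # 1. pick inputs in random order
        dists = np.linalg.norm(neurons - x, axis=1)
        w = np.argmin(dists)                  # 2. winner = closest neuron
        neurons[w] += lr * (x - neurons[w])   # 3. move the winner toward the input
        # (a full SOM would also nudge the winner's neighbors here, which is
        #  what "both neurons kind of get rearranged" refers to)

# 5. classify: each input joins the group of its closest neuron
labels = np.argmin(np.linalg.norm(inputs[:, None] - neurons, axis=2), axis=1)
print(labels)
```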

Video 3:
K-Means algorithm
·       Suitable for compact clusters
·       Sensitive to outliers and noise
·       Uses only numerical attributes
Input: a set of feature vectors
K: the number of clusters to be detected
Convergence: the stopping criterion - e.g., the number of inputs that stay in the same cluster as in the last iteration, or (as in step 5) how little the means move between iterations
1. Define an initial (random) solution as a vector of means M_t = (m_1, m_2, ..., m_K)
2. Classify all input data according to the current solution (is each input closest to m_1, m_2, etc.)
3. Use the classifications from step 2 to recompute the vector of means as M_(t+1)
4. Update t = t + 1
5. If ||M_t - M_(t-1)|| < the convergence threshold, take M_t as the solution. If not, go back to step 2.

·       (How do you pick the convergence threshold?? One common answer is in the sketch below.)
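
A short sketch of the loop, stopping when the means move less than a small tolerance - one common answer to the question above. The tolerance value and the toy data are arbitrary choices of mine:

```python
import numpy as np

def kmeans(X, k, tol=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    # 1. random initial solution: k distinct input points as the means
    means = X[rng.choice(len(X), size=k, replace=False)]
    while True:
        # 2. classify each input to its closest mean
        labels = np.argmin(np.linalg.norm(X[:, None] - means, axis=2), axis=1)
        # 3. recompute each mean from its assigned points
        #    (assumes no cluster goes empty - fine for well-separated toy data)
        new_means = np.array([X[labels == i].mean(axis=0) for i in range(k)])
        if np.linalg.norm(new_means - means) < tol:
            return new_means, labels        # 5. converged: means barely moved
        means = new_means                   # 4. t = t + 1, loop back to step 2

# two well-separated blobs of 2D points
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
means, labels = kmeans(X, k=2)
print(means)
```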

Video 4:
Neural networks algorithm
Propagate the neural network: go from input to output.
Follow along video example, it's pretty simple.
Two inputs: 0.0 and 1.0. We plug both into the step function and get an output; we plug both into the logistic function and get another output. Now we forget the two old inputs: these two new outputs become our new input values. We continue to propagate by plugging both into the linear function and receiving an output. This is the final output.
A) step function
1. Each input will receive a weight. In this case, 0.0 has a weight of 0.3, and 1.0 has a weight of 0.7.
2. Multiply each input by its weight. Add both results. In this case, (0.0x0.3=0.0)+(1.0x0.7=0.7)=0.7
3. Now plug your x value - 0.7 - into the activation function for the step function. (Imagine plotting a value on a graph for x and seeing what the y value is.)
4. In this case, f(x) = 1.0. This is our output. It will later become our new input.
B) logistic function
1. Again, each input will receive a new weight. In this case, 0.0 has a weight of 0.1 and 1.0 has a weight of 0.1.
2. Multiply each input by its weight. Add both results. In this case, (0.0x0.1=0.0)+(1.0x0.1=0.1)=0.1
3. Same as before. The x value is 0.1; plug it into the activation function for the logistic function.
4. f(x) = 0.2. Our output for this function (and our eventual new input) is 0.2.
C) linear function (now we are dealing with new inputs, the outputs from the past functions.)
1. Each input receives a weight. 1.0 has a weight of 0.3, and 0.2 has a weight of 0.5.
2. Same as before. (1.0x0.3=0.3)+(0.2x0.5=0.1)=0.4
3. The x value is 0.4. Plug it into the activation function.
4. f(x) = 0.7. This is our final output.
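
The same forward pass, written out. The weights come from the notes above, but the exact activation functions (step threshold, logistic scaling, linear coefficients) are my guesses at what the video used, so the logistic and final outputs here won't match the 0.2 and 0.7 exactly:

```python
import math

def step(x, threshold=0.5):
    # Step activation: outputs 1.0 once the weighted sum passes the threshold.
    return 1.0 if x >= threshold else 0.0

def logistic(x):
    # Standard logistic (sigmoid). Note: logistic(0.1) ~= 0.52, so the video's
    # 0.2 must come from a differently scaled curve; the structure is the same.
    return 1.0 / (1.0 + math.exp(-x))

def linear(x, a=1.0, b=0.3):
    # Linear activation f(x) = a*x + b (coefficients guessed, not from the video).
    return a * x + b

def neuron(inputs, weights, activation):
    # Weighted sum of the inputs, then the activation function.
    return activation(sum(i * w for i, w in zip(inputs, weights)))

inputs = [0.0, 1.0]
h1 = neuron(inputs, [0.3, 0.7], step)       # hidden neuron A: step(0.7) -> 1.0
h2 = neuron(inputs, [0.1, 0.1], logistic)   # hidden neuron B: logistic(0.1)
out = neuron([h1, h2], [0.3, 0.5], linear)  # output neuron: linear combination
print(h1, h2, out)
```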

Video 5:
SVM algorithm
·       For linearly separable binary sets
·       Hyperplane (imagine a line)
The goal is to design a hyperplane that classifies all training vectors into two classes.
Imagine two clusters on a graph. We need to draw a line showing a division between the two clusters, and many such lines can be drawn. (Let's say cluster 1 is above y = 20 and cluster 2 is below y = 10. The line y = 12 divides them, but so does y = 13.) We want the line with the largest margin between the two classes. (In this case, y = 15 is the best bet.)
The hyperplane (line) has an equation. If we plug the values of the class above the hyperplane into the hyperplane's equation, it yields values greater than or equal to 1. If we plug in the class below the hyperplane, it yields values less than or equal to -1. (In this case, any point from the cluster below 10 plugged into the y = 15 line's equation gives a value less than or equal to -1, and any point from the cluster above 20 gives a value greater than or equal to 1.) So the distance from the closest element to the hyperplane is always at least 1.
We are trying to maximize the separability by maximizing the margin, which for a hyperplane w·x + b = 0 works out to 2/||w||.
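
A tiny illustration of the decision rule using the example numbers above. Here w and b are hand-picked to encode the y = 15 line, not learned; a real SVM finds them by solving an optimization problem:

```python
import numpy as np

# Hand-picked hyperplane for the example: 0*x + 0.2*y - 3 = 0, i.e. the line y = 15.
w = np.array([0.0, 0.2])
b = -3.0

def decide(point):
    # Signed score: >= +1 for the cluster above y = 20, <= -1 for the one below y = 10.
    score = np.dot(w, point) + b
    return score, ("class +1" if score >= 0 else "class -1")

print(decide(np.array([4.0, 22.0])))      # above the line -> score 1.4, class +1
print(decide(np.array([6.0, 8.0])))       # below the line -> score -1.4, class -1
print("margin =", 2 / np.linalg.norm(w))  # 2/||w|| = 10, the gap between 10 and 20
```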

Video 6:
kNN algorithm
Given N training vectors, kNN algorithm identifies the k nearest neighbors of c, regardless of labels.
Ex:
·       K=3
·       Classes a and o
·       Aim: Find class for c.
c is another point on the graph. We determine whether it falls into class a or class o by finding its 3 nearest neighbors. We find that its nearest neighbors are 3 elements: two of class o and one of class a. Thus, c is classified as class o.
Remarks:
·       Choose an odd k value for a two-class problem
·       k must not be a multiple of the number of classes
·       The main drawback of kNN is the computational cost of searching for the nearest neighbors of each sample (in cases with lots of elements).
When k=1:
Each training vector defines a region in the feature space, producing a Voronoi partition: any point inside a region is closer to that region's training vector than to any other training vector (that's what makes it a Voronoi partition).
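
A compact sketch of the vote, with a toy version of the example above (k = 3, classes "a" and "o"; the points are made up):

```python
import numpy as np
from collections import Counter

def knn_classify(train_points, train_labels, c, k=3):
    # Find the k nearest neighbors of c and take a majority vote on their labels.
    dists = np.linalg.norm(train_points - c, axis=1)
    nearest = np.argsort(dists)[:k]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

points = np.array([[1, 1], [2, 1], [1, 2], [8, 8], [9, 8]])
labels = ["o", "o", "a", "a", "a"]
c = np.array([1.5, 1.5])
print(knn_classify(points, labels, c))  # two "o" neighbors vs one "a" -> "o"
```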

Video 7:
Random forest algorithm
·       The combination of different learning models increases the classification accuracy.
·       Bagging: averaging many noisy but unbiased models to create a combined model with low variance
·       This algorithm works as a large collection of decorrelated decision trees.
How it works:
1. Assume a matrix S of training samples: fA1---fAn, fB1---fBn, etc. (feature A of sample 1 through sample n, feature B of sample 1 through sample n). The last column holds each sample's class label. (Ex: C1---Cn)
2. Create random subsets. (For ex: all features of samples 12, 15, and 35 in one subset. All features of samples 2, 6, and 20 in another subset. Etc)
3. Create decision trees using your random subsets. (So let's say you now have 4 trees you created using four random subsets.)
4. Have each tree predict the classification for a new sample. (Let's say two trees predict class 1, one predicts class 2, and one predicts class 3.)
5. Count the votes. (In this case, the sample is classified as class 1.)
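
A rough sketch of the bagging-and-voting machinery. To keep it short I use one-split decision stumps in place of full decision trees; the data and subset sizes are invented:

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(0)

def train_stump(X, y):
    # A one-split "tree" (decision stump), standing in for a full decision tree:
    # pick the feature/threshold whose two branches classify this subset best.
    best = None
    for f in range(X.shape[1]):
        for t in np.unique(X[:, f]):
            left, right = y[X[:, f] <= t], y[X[:, f] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            correct = max(Counter(left).values()) + max(Counter(right).values())
            if best is None or correct > best[0]:
                best = (correct, f, t,
                        Counter(left).most_common(1)[0][0],
                        Counter(right).most_common(1)[0][0])
    if best is None:                       # degenerate subset: predict majority class
        maj = Counter(y).most_common(1)[0][0]
        return lambda x: maj
    _, f, t, left_label, right_label = best
    return lambda x: left_label if x[f] <= t else right_label

# toy training matrix: one feature, classes 1 and 2
X = np.array([[1.0], [2.0], [3.0], [7.0], [8.0], [9.0]])
y = np.array([1, 1, 1, 2, 2, 2])

# 2-3. draw random subsets (bagging) and grow one "tree" per subset
forest = [train_stump(X[idx], y[idx])
          for idx in (rng.choice(len(X), size=len(X), replace=True) for _ in range(4))]

# 4-5. each tree votes on a new sample; the majority vote wins
sample = np.array([2.5])
votes = Counter(tree(sample) for tree in forest)
print(votes.most_common(1)[0][0])          # -> 1
```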

Video 8:
Supervised and unsupervised classification algorithm
·       A priori information: information that is justified by knowledge. We know something about this information.
·       When dealing with supervised classification, we are dealing with a priori information.
Ex: points on a graph. We are trying to classify them. The information we know: three areas of training patterns. Now we can classify all other points based on these areas of patterns.
·       Examples of supervised algorithms: decision trees, random forests, kNN, SVM, neural networks
·       When dealing with unsupervised classifications we do not have a priori information available.
To find groups of points we use clustering. Ex: a cluster of points that lie close to each other gets grouped together. The question is always how many clusters to define. Sometimes it is given (for example, in k-means the k stands for how many clusters we are supposed to find).
·       Examples of unsupervised algorithms: k-means, self-organizing maps, expectation-maximization, ISODATA, hierarchical clustering

Video 9:
MLE algorithm
We are trying to find the class-conditional density of a feature vector for each of C classes.
Ex: C = 3. We have three classes - red, blue, green - and every class has several sample points. We are trying to estimate the parameters of each class. So, let's say for class red:
·       average x value = -0.07.
·       The standard deviation in x values = 3.02.
·       The average y value = 4.83
·       The standard deviation in y values = 0.98.
We do the same for class blue and green. This defines the parameters of all classes.
So if we're trying to find the maximum likelihood estimate for a random point, we use the class-conditional density equation. We plug in the x coordinate and get an answer, we plug in the y coordinate and get an answer, and we add the two (log-densities add where the densities themselves would multiply). We repeat for all classes. The class that yields the highest result is the class the point has the maximum likelihood of belonging to.
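
A sketch of that procedure using per-axis Gaussian log-densities, which add, matching the "plug in x, plug in y, and add" step. The red parameters are the ones from the notes above; blue and green are invented for the example:

```python
import math

def log_gauss(v, mean, std):
    # Log of the 1D Gaussian density; logs add where densities would multiply.
    return -math.log(std * math.sqrt(2 * math.pi)) - ((v - mean) ** 2) / (2 * std ** 2)

# Per-class parameters estimated from training data
# (red from the notes; blue and green made up).
classes = {
    "red":   {"mx": -0.07, "sx": 3.02, "my":  4.83, "sy": 0.98},
    "blue":  {"mx":  5.00, "sx": 1.50, "my":  0.50, "sy": 1.20},
    "green": {"mx": -4.00, "sx": 2.00, "my": -3.00, "sy": 1.00},
}

def classify(x, y):
    # Score each class: log-density of x plus log-density of y; highest wins.
    scores = {c: log_gauss(x, p["mx"], p["sx"]) + log_gauss(y, p["my"], p["sy"])
              for c, p in classes.items()}
    return max(scores, key=scores.get)

print(classify(0.0, 5.0))   # close to red's means -> "red"
```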



Monday, August 3, 2015

Research Task 2 (08/03/15 - 08/16/15)

While you are re-reading the materials from task 1 and learning how to run OpenViBE with the MindWave headset, we are going to continue our journey.

The second task includes the study of two important classification techniques in OpenViBE: Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA). They are purely mathematical in nature, so expect them to be very challenging! However, don't get frustrated too quickly: most likely, you are not going to develop, modify, or prove those math tools. Instead, you will use OpenViBE to access them in your application. So understanding the schemes, learning the vocabulary, becoming familiar with the mathematical symbols, and grasping the concepts are the most important things.

You may need some Linear Algebra to read the math. Feel free to google whatever you don't know. Some matrix operations are not too hard to pick up. Good luck!
     
Classification Algorithm - Support Vector Machine (SVM)

       Start with a series of 3 simple step-by-step tutorials about SVM.
  1. SVM - Understanding the math - Part 1 - The margin by Alexandre Kowalczyk.
  2. SVM - Understanding the math - Part 2 - Calculate the margin by Alexandre Kowalczyk.
  3. SVM - Understanding the math - Part 3 - The optimal hyperplane by Alexandre Kowalczyk.
       Followed by another 2 in-depth videos:
  1. SVM Tutorial - Part 1: A YouTube video tutorial on Support Vector Machine.
  2. Support Vector Machines (1): Linear SVMs, primal form: Covers the primal form of SVM in a concise way.
Classification Algorithm - Linear Discriminant Analysis (LDA)

       Study the following documents. It may take some repetition to grasp the concepts.
  1. Pattern recognition systems – Lab 10 Linear Discriminant Analysis.
  2. Lecture 10- Linear Discriminant Analysis.
  3. A Tutorial on Data Reduction: Linear Discriminant Analysis (LDA).
  4. Fisher Linear Discriminant Analysis.