Saturday, October 27, 2018

Hacking Brain with Neural Network

Detecting brain activity state using Brain Computer Interface

Brain computer interface device (BCI) from OpenBCI
You may have noticed that within past few years scientists, labs, startups and companies like Facebook and Google are working on brain computer interfaces that will enable human to interact with machine just with power of thought.

You may think that such technology is exotic and at least very expensive to play with. However, today there are devices available on the market that would allow you to experiment on, write your own projects and learn how to create mind machine yourself!

Within this project let me guide you through one of such exercises - reading human brain with handy and geek friendly device.


Brain Computer Interface sometimes called a neural-control interface (NCI), mind-machine interface (MMI), direct neural interface (DNI), or brain–machine interface (BMI), is a direct communication pathway between an enhanced or wired brain and an external device. BCI differs from neuromodulation in that it allows for bidirectional information flow. BCIs are often directed at researching, mapping, assisting, augmenting, or repairing human cognitive or sensory-motor functions.
Brain computer interface technology represents a highly growing field of research with application systems. Its contributions in medical fields range from prevention to neuronal rehabilitation for serious injuries. Mind reading, and remote communication have their unique fingerprint in numerous fields such as educational, self-regulation, production, marketing, security as well as games and entertainment. It creates a mutual understanding between users and the surrounding systems. 

The motivation for this project is to research and learn the BCI technology with applying Machine Learning algorithms to estimate the complexity of creating systems of communications between human and machine or in between humans with machine as intermediaries, using brain activity and BCI devices.
The data analysis and model training were performed on pre-collected datasets from EEG device.
Electroencephalography is an electrophysiological monitoring method to record electrical activity of the brain. It is typically noninvasive, with the electrodes placed along the scalp, although invasive electrodes are sometimes used such as in electrocorticography. EEG measures voltage fluctuations resulting from ionic current within the neurons of the brain. The EEG device used for this project is Ultracortex "Mark IV" EEG Headset (8 channels) from OpenBCI. EEG has 8 electrodes that located on designated areas of scalp and provide 8 channels data respectively.
For easy exploration and convenient representation, the analysis and algorithmic model were put in Jupyter notebook. Code and requirements along with the notebook you may find in GitHub repository or open notebook in separate tab right away to navigate as you read material here and lookup the code.
         Data collected and used for this project is not available within repository. If you are interested in data sample please see Data_extract.txt file.

Case statement

For the purpose of this project it was decided to train neural network model to recognize two user's states while user performs two different but mentally similar activities: reading and writing. Reading and writing activities were performed in almost identic environments but within various 24 hours ranges. User was reading the same book and writing at the same desk in same light conditions and pose.
Data collection session consisted of two phases: 1) EEG setup (mounting, connecting to interface, checking); 2) Recording while performing activity (when user reads or writes, EEG is activated and recording signals to file).
For network training the significant amount of data is required. To make collected data consistent and eliminate excessive preprocessing, data samples should be coherent, and noise excluded (occurred due to distractions, unwanted muscular activity, etc.). With that purpose multiple recorded datasets need to fit 'identical' environment and psychological conditions of user. To produce datasets more coherent to each other it was decided to make recordings for around one-minute length.

Data Exploration

During this project it was noted that the environment, user’s mood and psychological state play very important role. Such conditions like daydreaming, twilight state (when person is drifting to sleep or from awake) become significant factors from data standpoint. Within those factors such brain waves of specific diapason – alpha – may skew sample data significantly.
Therefore, there were three options for collecting and analyzing data identified:
Option 1:To collect necessary amount of data with multiple datasets obtained within multiple sessions for short period of time. For instance, in between 5pm and 8pm within single 24h period.
Option 2:To collect large number of datasets during multiple sessions completed within long range of time - one week, month or couple of months.
Option 3:Mixed option - collect data in similar conditions within unidentified period of time. For instance, each day between 6pm and 8pm within couple of weeks.
Option 1 is good to exclude mental and psychological states that may impact research results, while option 2 might be good to embrace most occurred states for long period of time to consider them and train the model appropriately.
Benefit from 1-st option, where the focus is just on current user’s state and psychological state is to get relatively fast and proved results for research, train model on more coherent data and obviously achieving goal of the project, distinguishing 2 activities' states. However, within this option there is no opportunity to use pre-trained model for prediction of user's states at any other time in future due to lack of background data generated from users mental and psychological conditions. This option is also not perfect to proceed due to persons physical and emotional variance - working on project in lab conditions reading and writing states blur and merge into single 'dreaming/abandoned' passive condition, where person starts loosing focus. This drawback was noticed after numerous attempts and analysis of brainwaves' patterns.
There is much more benefit from 2-nd option, where users mental and psychological states may be considered due to long term period of data collection.
Using this option, unique dataset will be created that is useful not only for purposes of this research. Also, dataset collected within such option will help to train the model, which may be used in future any time to predict user's state of given actions (reading/writing). Such model will be more reliable. However, this option is very time and resource consuming.
Given the purpose of the project, option 3 was selected and performed.
8 channels EEG data is represented in ┬ÁV and collected with 1000Hz sample rate. Filters are applied within equipment firmware to generate data with minimum noise. For this particular dataset the 7–13Hz bandpass and a 60Hz notch filters were applied.

Data Visualization

Obtaining data and first look:
It is necessary to get 8 channels of data from EEG recorded dataset. Raw datafile contains more than that and requires preprocessing and cleanup. 
Sample of data

As you can see from tiny sample of data above, there are columns 1 - 8 representing corresponding channels (obtained data from respective electrodes 1 - 8). Column 12 is required to break out data by seconds.

Below is plot of one of the whole recorded sessions.
Raw dataset

Data Preprocessing

Samples rebalancing

Data is logged and recorded in high resolution with 1000 Hz sample rate. Ideally each row of data represents 1/1000 fraction of 1 second. Other words, there are 1000 samples/ impulses recorded per 1 second. And 1 second represents 1 data sample required for the network model. However, due to equipment specifics some data cycles/impulses may be lost, causing unequal number of impulses per second rather than 1000. Therefore, the goal is to compile each sample of data input consisting of equal number of impulses for 1 second.
Therefore to balance the data and equalize number of impulses in one data sample it is needed to go through each sample of training data and preprocess it respecitvely. It was decided to equalize each sample of data to have 990 rows (recorded impulses). Samples with less rows than 990 will be deleted.
Spikes cleaning

Before even rescaling the dataset, there is another problem to solve. Although data received from EEG is filtered, it can still contain noise  - spikes that are coming from muscular activity. This needs to be eliminated.
Lets see short sample of data before rescaling. There is noticeable spike between second 15 and 16.

Short piece of data before rescaling and spikes removing.
To handle that it was decided to remove spikes on non-rescaled dataset first. The main parameter for spikes removing is ‘margin’variable, which sets up the ‘corridor’ for wave oscillation. Such ‘corridor’ is set up by written function 'variance_clean()'Wave taken by that ‘corridor’ function will be trimmed to fit. It is important to note that spikes trimming was performed for each second of wave – such approach helped to preserve wave’s wider dynamic range observed through whole recorded time long.
Below image represents the part of wave with spike:


And here is the same part with spike after trimming:
Trimmed spike

There is still shift on the place where spike has been removed, however such slip has  much less impact on data integrity. 
It also noticeable that the length of dataset (number of seconds we want to work with) decreases while we clean up the data. For this reason simple 'seconds()' function was used to monitor the remaining useful data left for training. 


One of the most important functions for data preprocessing is scaling function -'scaler()'. With it help the dataset is rescaled with equal seconds batches.
Rescaling is major thing for dataset before feeding it to training model. But each data sample can’t be rescaled against the whole dataset because it will skew the integrity and content of dataset after rescaling. As a result of wholistic rescaling it could be possible to train the model and even achieve good results, but such model will not be able to make prediction on shorter datasets that would represent just couple of seconds.

Below images shows how long (300 seconds/samples)  'reading' dataset looks like.

'Reading' dataset
It is noticeable where subsets received from multiple sessions are concatenated and how scattered data is. 

As a result it was decided to rescale dataset by seconds' batches. Dedicated 'scaler()' function iterates through all dataset with short seconds-long window to rescale each window/batch consecutively. The size of window/batch was defined given two goals:
1) Shortest possible interval for the model to be able to predict (especially will be useful for live wave stream recognition);
2) Get highest model accuracy. Given the amount of training data and specifics of this project it was challenging to pick the right size of batch to train the model effectively.

Given rescaled batches it should not be confused with training batches for the model, where one second represents just one data sample for training.

With regards to rescaling range it was decided to rescale all channels in between 0 and 1 range.
On below image there are samples of data after seconds rebalancing and spikes cleaning:

Spikes cleaned
Below image depicts data samples after rescaling – last preprocessing step before labeling and training:

Preprocessed set

Noticeable shifts are acceptable as far as shifts' edges coincide with rescaled batches boundaries.

Preprocessed results

Lets have a look now how larger dataset looks before preprocessing and after.
'Reading' raw dataset
'Reading' dataset after preprocessing
Lets zoom into couple of rescaled batches:

Batches concatenation

Labeling and training preparation

After data is preprocessed it should be properly labelled. For that purpose, two labels were created: 0 and 1 - 'reading' and 'writing' correspondingly. Dataset that contained recorded ‘reading’ state has been measured in length and respective series of labels of the same size have been created. The same approach was taken for ‘writing’ dataset. After that two datasets have been concatenated: features data represents ‘x’ data variable and labels data represents ‘y’ data variable. It is important to note that output labels were one-hot encoded using keras.utils.to_categorical()function.
Before model training, the compiled dataset has been split onto training and testing samples. For this purpose train_test_split() function was used from python sklearn library (dataset size allowed to use that - beware having enormously big datasets you will not be able to use train_test_split() from sklearn and more likely to write something custom).

The Model

Convolutional 1D Neural Network model from Keras has been selected due to specifics and volume of data to train on. As far as the problem deals with classification and the model will be used for multiclassification, the decision was to build corresponding model architecture.
The 1D Convolutional model consists of 2 convolutional layers, 2 MaxPooling layers, 2 BatchNormalization, 1 Flatten and 2 Dense layers where last Dense layer has softmax activation. Such architecture allows to train model fast and with high accuracy.

Model summary:
Model Summary
For model optimization ‘Adam’optimizer has been selected as best practice. Learning rate was set on 0.001. Such configuration for optimizer worked best.
For the purposes of saving best training results and use them to upload model when necessary ‘ModelCheckpoint()’ function has been used to store model weights in separate file.
Model was trained on 16 batches through 25 epochs. Such number of epochs proved to be optimal for training. If model was not successfully trained within 25 batches, it was a sign to change the model parameters rather that number of batches.
Parameters of model were adjusted in course of multiple training attempts. The final model has been tested on different datasets, different rescaled batches sizes to prove model’s efficiency. However, it was noted that if dataset was significantly skewed by mental/physical state of user, or subsets had different length, the model parameters required some tweaks to reach highest accuracy. The following parameters of the model may need to be tuned in case if dataset features like length change drastically:
  •      Pool size of MaxPooling layer;
  •      Batch size.

Rest of parameters were proven to be stable. In case if model fails to train this an evidence of low quality dataset that need to be fixed. 
Given the task for this project the higher model accuracy the better it will help in recognizing ‘reading’ and ‘writing’ states through complex noisy environment. So it is critical to receive highest accuracy possible. 

Benchmarking and Test

The benchmarking goal is to test model in real life conditions, i.e. on absolutely new dataset. For this purpose, new dataset was collected separately at different time. As it was mentioned before, the challenge here is variability of data given users mental and physical states. Also, given that data collected is relatively small and does not refer to persons mental states, the results expected might be not very much impressive. If total accuracy of developed model is more than 65% it means that approach works, and project goal may deem accomplished for now.
There were bunch of new datasets used for testing. Some of them turned test to prove model’s accuracy more than 50%, some of them, up to 93%. 
Accuracy measurement:
Total model accuracy is calculated by measuring average accuracy for both reading test and writing test. Each test accuracy is measured by number of correct predictions of state. For example, if model predicts 50 seconds of reading from 100 seconds of reading test set, the accuracy is 50% for reading. 
The formula for overall accuracy measurement is following: 
Accuracy = (Nr / Tr + Nw / Tw) / S
Nr – correct 'reading' seconds predicted
Tr – total 'reading' seconds
Nw – correct 'writing' seconds predicted
Tw – total 'writing' seconds
S – number of states (in this case = 2 (reading and writing)

The length of benchmark dataset should be not less than 30 seconds to ensure clean results of the test.
In this project benchmark test there were two new datasets used for reading and writing with 50 seconds and 100 seconds respectively.
Final model benchmark test has proved that the model corresponds to the problem and resolves it. Accuracy achieved was over 70%. Please see notebook for detailed results.


The idea of recognizing ‘reading’and ‘writing’states was picked as simple task to train and test the model for the purpose of this project. 
However, in course of research it was identified that those two states may not significantly differ from each other. First of all, when person performs routine exercises it is very hard to capture expressive signal patterns, secondly it was noted that person while reading or writing is being mostly in idle state.
What might help to contribute in distinguishing those state are eyeballs muscular patterns that may be captured from electrodes located on forehead and mid-scalp. In the rest of it, both writing or reading activities may activate similar areas of brain. For example, when person writes not just piece of text but something that requires thinking ahead, person is planning the story and vision-processing regions in the brain become active. Same process may appear while reading thoughtfully and focused
It was noticed that unfocused reading as well as compulsive writing does not bring any benefit for the purpose of research. To address that challenge the user’s data was collected within different time ranges. There was also control put in place to maintain user focused on specific task. At the end of the day only reliable datasets were picked for training with most confidence of being appropriate for each ‘reading’ and ‘writing’ state.

Finally, the model was able to distinguish such activities given the patterns learned. The accuracy tested may not be impressive, however this is good start to research this problematic further.

Future plans

Given this project proved opportunity to create deep learning models to train on brain signals obtained from BCI device - research and experiments to be continued!

My next project with BCI will deal with recognizing limbs movement.

Stay tuned!


  1. Hi, I'm amazed by this project because i'm working on something similar(music listening data for training neural net or SVM). You wrote, that filters were applied within equipment firmware, the 7–13Hz bandpass and a 60Hz notch filters were applied. I'm aware, that there is no bandpass or notch filter on firmware(I'm using ganglion and there is 0.3 Hz high pass filter) Also when bandpass is applied, there should be dc offset in data. am i wrong ? If there was dc offset in data, how would it affect results ?

    1. Hi! This is very exciting project you are working on and it is very challenging. It will be tricky to train NN to recognize - good luck on this and hope you will share your progress!
      The notch filter will disable DC offset. Not familiar with Ganglion, however you should be able to switch it on from GUI. Have you installed latest version: ?
      Here is similar question:
      or this:
      If you keep 60 Hz offset and train your NN you may not reach the goal in your project - it is better to get rid of this noise.

    2. Sorry, I was a little bit confused, because I didn't realized, that you are using OpenBCI GUI to save data (I'm using Python library, so I have to do the filtering by myself). I don't thing that notch filter removes dc offset, but high pass (or band pass) does. :
      Did you do any filtering by yourself ? Because I'm aware, that GUI saves data without applied filters. :
      Data shown in your Jupyter notebook, just looks unfiltered to me.

    3. I see now. This is great you have mentioned you are using python libs. I am working with it as well on instant live recognition. I will check GUI capabilities and limitations and come back to confirm on your concern. Thank you for point!

    4. You are welcome. What have you find out ?

    5. Ok, for consistency I have conducted tests here:

      GUI filtering does its job and data can be used for model training quite well with notch 60 and bandpass 7-13.

      Another thing that confuses are values recorded in 10K diapason. Will take a look at this when I am hands-on with python processing. Let me know what you find out also till this moment.

    6. I looked at it :
      I used your data and did some fft plots and they show the opposite. Take a look.
      I also found this :

    7. See also this:

    8. Fantastic! Thank you, it makes a lot more sense. So basically, I think it would be efficient just to remove DC, especially given this:

      I will use your filters to make the exercise again to see what happens.

    9. I'm glad I helped. I hope, there will be improvement, then please share the results. :)