Verbalize is a Machine Learning model that aims to predict Imagined Speech from brain signals. Electroencephalography (EEG) data is used to train a model that maps brain signals to their corresponding labels.

1 The Target Audience

According to this data from the National Center for Biotechnology Information, there are over 2.5 million people in the world with Multiple Sclerosis.

Based on this data from the World Health Organization, between 250,000 and 500,000 people suffer a Spinal Cord Injury every year.

The global prevalence of muscular dystrophy was estimated at 3.6 per 100,000 people, with the highest prevalence in the Americas at 5.1 per 100,000 people, as per this research from the National Center for Biotechnology Information.

This means a considerable number of people could benefit from such a product. Based on this information, I was able to create a User Persona for a potential user of this product.

2 Comparisons with AAC

Currently, there are Augmentative and Alternative Communication (AAC) devices available to help people "speak". There are no-tech, low-tech, and high-tech options available, the most popular being text-to-speech applications that can be installed on handheld devices such as a mobile phone or a tablet. This is how Verbalize compares to them.

3 The Dataset

The dataset that I will be using to train an Imagined Speech model is from the 2020 International BCI Competition. This dataset contains 5 labels, which are Hello, Help me, Stop, Thank you, and Yes, all commonly used phrases.

After analyzing and understanding the dataset, I converted the MATLAB file to a CSV file that I could use to create a model. I then realized something...
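For reference, the conversion step can be sketched roughly as follows. The file name and the variable keys inside the `.mat` file are placeholders, not the actual names used in the competition dataset; it assumes the epoched EEG is stored as a (channels × timepoints × trials) array.

```python
# Sketch of the MATLAB-to-CSV conversion, assuming epoched EEG is stored
# as a (channels x timepoints x trials) array. File and variable names
# below are hypothetical placeholders, not the competition's actual ones.
import numpy as np

def epochs_to_rows(data, labels):
    """Flatten (channels, timepoints, trials) EEG epochs into one row
    per trial: [label, ch1_t1, ch1_t2, ..., chN_tT]."""
    n_channels, n_timepoints, n_trials = data.shape
    rows = []
    for trial in range(n_trials):
        features = data[:, :, trial].reshape(-1)  # channel-major flatten
        rows.append([labels[trial], *features.tolist()])
    return rows

if __name__ == "__main__":
    import csv
    from scipy.io import loadmat  # only needed for the real .mat file

    mat = loadmat("Training_set.mat")                         # hypothetical filename
    data, labels = mat["epo_train"], mat["y_train"].ravel()   # hypothetical keys
    with open("train.csv", "w", newline="") as f:
        csv.writer(f).writerows(epochs_to_rows(data, labels))
```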

4 Practical Limitations

The dataset comes with recordings for all 64 channels. We can't expect our users to wear a 64-Channel EEG Cap at all times, so we need to choose a more realistic option.

Performing Principal Component Analysis to find the most "impactful" channels for Imagined Speech may sound like a good option, but we need to keep the hardware limitations in mind. If the channels that carry the most relevant information are not located near one another, it would be impractical to build an EEG headset that can capture all of this information.
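A channel ranking of this kind could be sketched as follows. This is purely illustrative: it uses random data in place of the real 64-channel recordings, and it ranks channels by their loadings on the leading principal components.

```python
# Illustrative sketch: rank channels by how strongly they load on the
# leading principal components of the channel covariance matrix.
# Random data stands in for the real 64-channel recordings.
import numpy as np

def rank_channels_by_pca(X, n_components=5):
    """X: (samples, channels). Returns channel indices sorted by the
    summed absolute loadings on the top principal components."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)    # eigenvalues in ascending order
    top = eigvecs[:, ::-1][:, :n_components]  # leading components first
    importance = np.abs(top).sum(axis=1)      # per-channel loading strength
    return np.argsort(importance)[::-1]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 64))   # stand-in for 200 samples x 64 channels
ranking = rank_channels_by_pca(X)
print(ranking[:8])               # the eight "most impactful" channels
```

As noted above, a ranking like this ignores electrode geometry entirely, which is exactly why it was not the route taken here.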

The alternative and much more realistic option was to look at existing portable EEG headsets. Two options I considered were the Emotiv EPOC X (14 channels) and the Neurosity Crown (8 channels).

I thoroughly experimented with both the 8-channel and 14-channel datasets (channels chosen based on the hardware specifications of these EEG headsets) and chose the one that resulted in higher accuracy.
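Extracting a headset-specific subset from the full 64-channel recording can be sketched like this. The montages below follow the manufacturers' published specifications (14 channels for the EPOC X, 8 for the Crown), but they should be verified against the spec sheets before being relied on.

```python
# Sketch of extracting headset-specific channel subsets from the full
# 64-channel recording. Montages follow the manufacturers' published
# specs (Emotiv EPOC X: 14 ch; Neurosity Crown: 8 ch) -- verify against
# the spec sheets before relying on them.
import numpy as np

EPOC_X_14 = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
             "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]
CROWN_8 = ["CP3", "C3", "F5", "PO3", "PO4", "F6", "C4", "CP4"]

def select_channels(data, all_channels, keep):
    """data: (channels, timepoints, trials) array; all_channels: the 64
    channel names in recording order; keep: names to retain.
    Returns the reduced array with only the requested channels."""
    idx = [all_channels.index(name) for name in keep]
    return data[idx, :, :]
```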

5 The Model

After a lot of experimentation, the 14-Channel Dataset performed considerably better than the 8-Channel one. This is the model I settled on, and it achieved an accuracy of 68% on a test set consisting of 50 trials. My reason for choosing an LSTM (Long Short-Term Memory) network lies in its ability to selectively remember or forget information from previous time steps. This makes it well suited to processing time-series data such as EEG signals, where the order in which samples appear is very important.
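The gating behaviour that motivated the LSTM choice can be illustrated with a single cell step. This is a didactic NumPy sketch with random placeholder weights, not the trained model itself.

```python
# Didactic sketch of one LSTM cell step, showing the forget/input/output
# gates that let the network keep or discard past EEG context.
# Weights are random placeholders, not the trained model's parameters.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One time step. x: (input,), h_prev/c_prev: (hidden,).
    W: (4*hidden, input), U: (4*hidden, hidden), b: (4*hidden,)."""
    z = W @ x + U @ h_prev + b
    f, i, o, g = np.split(z, 4)
    f, i, o = sigmoid(f), sigmoid(i), sigmoid(o)  # gates squashed to (0, 1)
    c = f * c_prev + i * np.tanh(g)  # forget old state, admit new input
    h = o * np.tanh(c)               # expose a gated view of the state
    return h, c

rng = np.random.default_rng(0)
n_in, n_hid = 14, 32                 # e.g. 14 channels -> 32 hidden units
W = rng.normal(scale=0.1, size=(4 * n_hid, n_in))
U = rng.normal(scale=0.1, size=(4 * n_hid, n_hid))
b = np.zeros(4 * n_hid)
h = c = np.zeros(n_hid)
for t in range(100):                 # walk over 100 EEG timepoints
    x = rng.normal(size=n_in)
    h, c = lstm_step(x, h, c, W, U, b)
print(h.shape)
```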

6 The Results

To get an idea of how consistent the model is, I ran it multiple times to check the variation in accuracy. The average accuracy across these 5 runs was 67.6% with a standard deviation of 0.8. Based on these results, we can conclude that the model is consistent.
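The consistency figures boil down to a mean and standard deviation over the per-run accuracies. The five values below are illustrative placeholders, not the actual run results.

```python
# How the run-to-run consistency figures can be computed.
# The five accuracies below are hypothetical, not the actual runs.
from statistics import mean, pstdev  # pstdev = population std deviation

run_accuracies = [68.0, 67.0, 68.5, 66.5, 68.0]  # hypothetical values
print(f"mean = {mean(run_accuracies):.1f}%, std = {pstdev(run_accuracies):.1f}")
```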

To examine the class-wise performance, I calculated the F1, Precision, and Recall scores of the model for each class. From this, we can see that the model performs best at classifying Class 2 and worst at classifying Class 4. Perhaps the data for Class 4 is too similar to the other classes, leading to misclassification. It's also possible that the Class 4 data has fewer distinct patterns or features for the model to learn from, making it harder to classify accurately.
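These per-class scores can be computed directly from a confusion matrix. The sketch below uses a toy matrix, not the model's actual predictions.

```python
# Sketch of per-class Precision/Recall/F1 from a confusion matrix
# (rows = true class, columns = predicted class). The matrix used in
# the usage example below is a toy, not the model's actual results.
def per_class_scores(cm):
    """Return (precision, recall, f1) lists for an NxN confusion matrix."""
    n = len(cm)
    precision, recall, f1 = [], [], []
    for k in range(n):
        tp = cm[k][k]
        pred_k = sum(cm[i][k] for i in range(n))  # all predicted as class k
        true_k = sum(cm[k])                       # all truly class k
        p = tp / pred_k if pred_k else 0.0
        r = tp / true_k if true_k else 0.0
        precision.append(p)
        recall.append(r)
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
    return precision, recall, f1

# Toy 2-class example: class 0 gets 8/10 right, class 1 gets 9/10 right.
p, r, f = per_class_scores([[8, 2], [1, 9]])
```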

7 My Reflections

As I have mentioned before, this is only a starting point meant to demonstrate a working prototype, as the dataset contains only 5 labels. As the number of labels increases, the complexity of the model will also increase. It may require more advanced methods such as transfer learning or ensemble models to create a fully working imagined speech model that can replicate regular speech. Building such a model will require a lot of expertise, experience, and research. It can also be expensive to acquire and process the data required to build it. Regardless, I am confident that my initial prototype can serve as a good starting point to find a working solution.

The best model is one tailored to each individual, but it is not practical to bring in a specific user, collect their data, and design a model solely for them. One way to circumvent this is to increase the number of participants from whom EEG data is collected, somewhat generalizing the model, but this process can be expensive and time-consuming. Ideally, the participants in the data collection are part of the intended audience for the EEG headset: EEG signals can vary significantly across different populations, so data collected from individuals who are not representative of the target audience can lead to a less accurate model. In addition, participants drawn from the intended audience may be more motivated to take part in the data collection process, as they may be eventual users of the product.

I hope the model meets your expectations. In case you want to contact me for further projects, you can reach out to me at pradhyumnaag30@gmail.com.

8 References

https://www.nidcd.nih.gov/health/statistics/quick-statistics-voice-speech-language

https://www.degruyter.com/document/doi/10.1515/jisys-2022-0076/html

https://www.ncbi.nlm.nih.gov/books/NBK499849/

https://www.who.int/news-room/fact-sheets/detail/spinal-cord-injury

https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8848641/

https://osf.io/pq7vb/

https://www.emotiv.com/product/emotiv-epoc-x-14-channel-mobile-brainwear/#tab-description

https://neurosity.co/crown