Mind reading and AI: Non-invasive brain text decoding and sensors for deep learning architectures from Meta AI

Konrad Wolfenstein

1 year ago


The future of human-machine interaction is now – brain signals as the key to communication

Brain-to-text decoding technologies: A comparison between non-invasive and invasive approaches

The ability to translate thoughts into text represents a revolutionary advancement in human-computer interaction and holds the potential to fundamentally improve the quality of life for people with communication impairments. Both Meta AI's non-invasive Brain2Qwerty technology and invasive electrocorticography (ECoG) aim to achieve this goal by decoding speech intentions directly from brain signals. Although both technologies share the same overarching objective, they differ fundamentally in their approach, strengths, and weaknesses. This comprehensive comparison highlights the crucial advantages of the non-invasive method without diminishing the role and benefits of invasive procedures.

Safety profile and clinical risks: A crucial difference

The most significant difference between non-invasive and invasive brain-computer interfaces (BCIs) lies in their safety profile and the associated clinical risks. This aspect is of central importance, as it significantly influences the accessibility, applicability, and long-term acceptance of these technologies.

Avoiding neurosurgical complications: An undeniable advantage of non-invasive procedures

Electrocorticography (ECoG) requires neurosurgical intervention in which electrode arrays are implanted directly onto the surface of the brain, beneath the dura mater (the outermost membrane covering the brain). While routinely performed in specialized centers, this procedure carries inherent risks. Statistics indicate a 2 to 5 percent risk of serious complications following such procedures. These complications can encompass a wide range, including:

Intracranial hemorrhages

Bleeding within the skull, such as subdural hematomas (blood collections between the dura mater and arachnoid mater) or intracerebral hemorrhages (bleeding directly within the brain tissue), can be caused by the surgery itself or by the presence of the electrodes. This bleeding can lead to increased intracranial pressure, neurological deficits, and in severe cases, even death.

Infections

Every surgical procedure carries a risk of infection. With ECoG implantation, infections of the wound, the meninges (meningitis), or the brain tissue (encephalitis) can occur. Such infections often require aggressive antibiotic therapy and, in rare cases, can lead to permanent neurological damage.

Neurological deficits

Although the goal of ECoG implantation is to improve neurological function, there is a risk that the procedure itself or the placement of the electrodes may lead to new neurological deficits. These can manifest as weakness, loss of sensation, speech disorders, seizures, or cognitive impairment. In some cases, these deficits may be temporary, but in others, they may be permanent.

Anesthesia-related complications

ECoG implantation usually requires general anesthesia, which also carries its own risks, including allergic reactions, respiratory problems, and cardiovascular complications.

In contrast, Meta AI's MEG/EEG-based approach avoids these surgical risks entirely. The sensors remain outside the head: EEG electrodes rest on the scalp, as in a conventional EEG examination, and MEG sensors sit in a helmet around the head. No surgery is required, so none of the aforementioned complications can arise. Clinical trials with the Brain2Qwerty system, conducted with 35 participants, showed no adverse effects requiring treatment. This underscores the superior safety profile of non-invasive methods.

Long-term stability and hardware failure: An advantage for chronic applications

Another important aspect regarding clinical applicability is the long-term stability of the systems and the risk of hardware failure. With ECoG electrodes, there is a risk of them losing functionality over time due to tissue scarring or electrode degradation. Studies suggest that ECoG electrodes can have a lifespan of approximately 2 to 5 years. After this time, electrode replacement may be necessary, which involves another surgical procedure and its associated risks. Furthermore, there is always the possibility of sudden hardware failure, which can abruptly terminate the system's functionality.

Non-invasive systems, such as those developed by Meta AI, offer a clear advantage in this regard. Because the sensors are attached externally, they are not subject to the same biological degradation processes as implanted electrodes. Non-invasive systems offer virtually unlimited maintenance cycles. Components can be replaced or upgraded as needed without requiring invasive surgery. This long-term stability is particularly crucial for chronic applications, especially for patients with locked-in syndrome or other chronic paralysis conditions who rely on a permanent communication solution. The need for repeated surgical interventions and the risk of hardware failure would significantly impair these patients' quality of life and limit the acceptance of invasive systems for long-term applications.

Signal quality and decoding performance: A detailed comparison

While safety is an undeniable advantage of non-invasive methods, signal quality and the resulting decoding performance are a more complex field where both invasive and non-invasive approaches have their strengths and weaknesses.

Spatial-temporal resolution comparison: Precision vs. Non-invasiveness

ECoG systems, in which electrodes are placed directly on the cerebral cortex, offer outstanding spatial and temporal resolution. The spatial resolution of ECoG is typically in the range of 1 to 2 millimeters, meaning it can capture neural activity from very small and specific areas of the brain. The temporal resolution is also excellent, at approximately 1 millisecond, enabling ECoG systems to accurately capture extremely rapid neural events. This high resolution allows ECoG systems to achieve clinically validated character error rates (CER) of less than 5%. This means that out of 100 characters generated with an ECoG-based BCI, fewer than 5 will contain errors. This high accuracy is crucial for effective and fluent communication.
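To make the metric concrete, here is a minimal sketch of how a character error rate is commonly computed: the Levenshtein edit distance between the intended text and the decoded text, divided by the length of the intended text. The function names are illustrative and not taken from any of the systems discussed here.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum number of insertions, deletions, and substitutions."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        current = [i]
        for j, hyp_char in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_char != hyp_char)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]


def character_error_rate(reference: str, hypothesis: str) -> float:
    """Edit distance normalized by the length of the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # One wrong character in an 11-character reference.
    print(character_error_rate("hello world", "hallo world"))
```

Because insertions count as errors too, the rate can exceed 100% when the decoded text is much longer than the intended text.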

Brain2Qwerty, Meta AI's non-invasive system, currently achieves character error rates of 19 to 32% using magnetoencephalography (MEG). These error rates are higher than those of ECoG, but they are achieved with a non-invasive method that carries no surgical risks. The spatial resolution of MEG is in the range of 2 to 3 millimeters, which is slightly lower than ECoG but still sufficient to capture relevant neural signals. The temporal resolution of MEG is also very good, in the millisecond range.

However, Meta AI has made significant progress in improving the signal quality and decoding performance of non-invasive systems. This progress is based on three key innovations:

CNN-Transformer hybrid architecture

This advanced architecture combines the strengths of convolutional neural networks (CNNs) and transformer networks. CNNs are particularly effective at extracting spatial features from the complex patterns of neural activity captured by MEG and EEG. They can identify local patterns and spatial relationships in the data that are relevant for decoding speech intentions. Transformer networks, on the other hand, excel at learning and utilizing linguistic context. They can model the relationships between words and sentences over long distances, thereby improving the prediction of speech intentions based on context. Combining these two architectures in a hybrid model allows for the effective use of both spatial features and linguistic context to enhance decoding accuracy.
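The data flow of such a hybrid can be illustrated with a deliberately tiny sketch: a convolutional step that mixes the sensors and filters the result over time, followed by a self-attention step that lets every time step draw on all the others. Everything here is an assumption made for illustration: the weights are fixed rather than learned, each time step carries a single scalar feature, and the attention uses identity projections. It is not the Brain2Qwerty model.

```python
import math


def conv1d(signal, kernel):
    """Valid 1D convolution (cross-correlation form) of one signal over time."""
    width = len(kernel)
    return [sum(signal[t + k] * kernel[k] for k in range(width))
            for t in range(len(signal) - width + 1)]


def conv_features(recording, spatial_weights, kernel):
    """Mix the sensors with spatial weights, then filter the mixture over time."""
    n_times = len(recording[0])
    mixed = [sum(w * channel[t] for w, channel in zip(spatial_weights, recording))
             for t in range(n_times)]
    return conv1d(mixed, kernel)


def softmax(scores):
    """Numerically stable softmax."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(sequence):
    """Single-head dot-product self-attention on scalar features (identity projections)."""
    output = []
    for query in sequence:
        weights = softmax([query * key for key in sequence])
        output.append(sum(w * value for w, value in zip(weights, sequence)))
    return output


if __name__ == "__main__":
    # Two hypothetical sensors, four time samples each.
    recording = [[0, 1, 2, 3], [1, 1, 1, 1]]
    features = conv_features(recording, spatial_weights=[1, 0], kernel=[1, -1])
    print(self_attention(features))
```

In a trained model, the spatial weights, kernels, and attention projections would be learned from data, and each time step would carry a vector of many features rather than one number.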

Wav2Vec integration

The integration of Wav2Vec, a self-supervised learning model for speech representations, represents another significant advancement. Wav2Vec is pre-trained on large amounts of unlabeled audio data, learning to extract robust and context-rich representations of speech. By integrating Wav2Vec into the Brain2Qwerty system, neural signals can be matched against these pre-built speech representations. This allows the system to learn the relationship between neural activity and linguistic patterns more effectively and improve decoding accuracy. Self-supervised learning is particularly valuable because it reduces the need for large amounts of labeled training data, which are often difficult to obtain in neuroscience.
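One simple way to picture "matching" neural signals against speech representations is nearest-neighbor retrieval by cosine similarity. The sketch below assumes that both the neural signal and each candidate speech segment have already been turned into vectors of the same length; the two-dimensional vectors and the labels are made up for illustration, whereas in a real pipeline the speech vectors would come from a pre-trained model such as Wav2Vec and the neural vectors from a trained brain encoder.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(neural_embedding, speech_embeddings):
    """Return the label whose speech embedding is most similar to the neural embedding."""
    return max(
        speech_embeddings,
        key=lambda label: cosine_similarity(neural_embedding, speech_embeddings[label]),
    )


if __name__ == "__main__":
    # Hypothetical embeddings, not real model outputs.
    speech_embeddings = {"yes": [1.0, 0.0], "no": [0.0, 1.0]}
    print(best_match([0.9, 0.1], speech_embeddings))
```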

Multisensor fusion

Brain2Qwerty leverages synergistic effects by fusing MEG and high-density electroencephalography (HD-EEG). MEG and EEG are complementary neurophysiological measurement techniques. MEG measures magnetic fields generated by neuronal activity, while EEG measures electrical potentials at the scalp. MEG offers superior spatial resolution and is less susceptible to artifacts from the skull, while EEG is more cost-effective and portable. By simultaneously acquiring and fusing MEG and HD-EEG data, the Brain2Qwerty system can harness the advantages of both modalities, further enhancing signal quality and decoding performance. HD-EEG systems with up to 256 channels enable a more detailed capture of electrical activity at the scalp, complementing the spatial precision of MEG.
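How the two modalities are fused is not described here, so the following is only a sketch of one common option, feature-level ("early") fusion. MEG and EEG are recorded in very different units (magnetic field strength versus voltage), so each channel is first standardized to zero mean and unit variance, and the channels are then stacked into one multichannel input. The function names and sample values are illustrative.

```python
from statistics import mean, pstdev


def zscore(channel):
    """Standardize one channel to zero mean and unit variance."""
    mu = mean(channel)
    sigma = pstdev(channel)
    if sigma == 0:
        return [0.0 for _ in channel]
    return [(x - mu) / sigma for x in channel]


def fuse(meg_channels, eeg_channels):
    """Stack standardized MEG and EEG channels (assumes at least one MEG channel)."""
    all_channels = meg_channels + eeg_channels
    n_times = len(meg_channels[0])
    if any(len(channel) != n_times for channel in all_channels):
        raise ValueError("all channels must share the same number of time samples")
    return [zscore(channel) for channel in all_channels]


if __name__ == "__main__":
    # Hypothetical values: MEG in tesla, EEG in microvolts.
    fused = fuse(meg_channels=[[1e-13, 3e-13]], eeg_channels=[[10.0, 30.0]])
    print(fused)
```

Without the standardization step, the numerically much larger EEG values would dominate any downstream model, regardless of which modality carries more information.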

Cognitive decoding depth: Beyond motor skills

A key advantage of non-invasive systems like Brain2Qwerty lies in their ability to go beyond simply measuring motor cortex activity and also capture higher-level language processes. ECoG, particularly when placed in motor areas, primarily measures activity related to the motor execution of speech, such as movements of the speech muscles. Brain2Qwerty, on the other hand, by utilizing MEG and EEG, can also capture activity from other brain regions involved in more complex language processes. Three capabilities illustrate this:

Correction of typos through semantic prediction

Brain2Qwerty is able to correct typos by using semantic prediction. The system analyzes the context of the entered words and sentences and can recognize likely errors and correct them automatically. This significantly improves the fluency and accuracy of communication. This ability to make semantic predictions suggests that the system not only decodes motor intentions but has also developed a certain understanding of the semantic content of language.
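The mechanism behind this correction is not spelled out here. As a rough stand-in, the sketch below corrects a decoded word in two steps: it collects vocabulary words that are spelled similarly, then picks the one that most often follows the previous word. The vocabulary and the bigram counts are hypothetical, and the word-level approach is an assumption chosen for brevity.

```python
import difflib


def correct_word(previous_word, decoded_word, vocabulary, bigram_counts):
    """Replace an out-of-vocabulary word with the most plausible similar word."""
    if decoded_word in vocabulary:
        return decoded_word
    # Candidates are returned most similar first.
    candidates = difflib.get_close_matches(decoded_word, vocabulary, n=5, cutoff=0.6)
    if not candidates:
        return decoded_word
    # max() keeps the first maximum, so ties fall back to spelling similarity.
    return max(candidates, key=lambda word: bigram_counts.get((previous_word, word), 0))


if __name__ == "__main__":
    vocabulary = ["coffee", "coffin", "toffee"]
    bigram_counts = {("drink", "coffee"): 10, ("eat", "toffee"): 4}  # made-up counts
    print(correct_word("drink", "cofeee", vocabulary, bigram_counts))
    print(correct_word("eat", "cofeee", vocabulary, bigram_counts))
```

The same misspelling is resolved differently depending on the preceding word, which is the essence of using context rather than spelling alone.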

Reconstruction of complete sentences outside of the training set

A remarkable feature of Brain2Qwerty is its ability to reconstruct complete sentences, even when those sentences were not included in the original training dataset. This suggests a generalization capability of the system that goes beyond simply memorizing patterns. The system appears to be able to learn underlying language structures and rules and apply them to new and unfamiliar sentences. This is an important step toward more natural and flexible brain-text interfaces.

Detection of abstract language intentions

Initial studies have shown that Brain2Qwerty achieves an accuracy of 40% in detecting abstract speech intentions in untrained participants. Abstract speech intentions refer to the overarching communicative intent behind an utterance, such as "I want to ask a question," "I want to express my opinion," or "I want to tell a story." The ability to recognize such abstract intentions suggests that non-invasive BCIs may one day be able to not only decode individual words or sentences but also understand the user's overarching communicative intent. This could lay the foundation for more natural and dialogue-oriented human-computer interactions.

It is important to note that the decoding performance of non-invasive systems has not yet reached the level of invasive ECoG systems. ECoG remains superior in terms of decoding precision and speed. However, advances in non-invasive signal processing and deep learning are steadily closing this gap.

Scalability and application range: accessibility and cost-efficiency

Besides safety and decoding performance, scalability and applicability play a crucial role in the widespread acceptance and societal benefit of brain-text decoding technologies. In this area, non-invasive systems show clear advantages over invasive methods.

Cost efficiency and accessibility: Reducing barriers

A key factor influencing the scalability and accessibility of technologies is cost. ECoG systems are associated with significant costs due to the need for surgery, specialized medical equipment, and highly skilled personnel. The total cost of an ECoG system, including implantation and long-term monitoring, can reach approximately €250,000 or more. These high costs make ECoG systems unaffordable for the general public and restrict their use to specialized medical centers.

In contrast, Meta AI, with its MEG-based solution Brain2Qwerty, aims for significantly lower costs. By utilizing non-invasive sensors and the possibility of mass-producing MEG devices, the goal is to reduce the cost per device to below €50,000. This substantial cost difference would make non-invasive BCIs accessible to a much larger number of people. Furthermore, non-invasive systems eliminate the need for specialized neurosurgical centers. They could be used in a wider range of medical settings and even at home. This is a crucial factor for providing care in rural areas and ensuring equitable access to this technology for people worldwide. The lower costs and greater accessibility of non-invasive systems have the potential to transform brain-text decoding technology from a specialized and expensive treatment into a more widely available and affordable solution.

Adaptive generalizability: Personalization vs. standardization

Another aspect of scalability is the adaptability and generalizability of the systems. ECoG models typically require individual calibration for each patient. This is because the neural signals recorded by ECoG electrodes are highly dependent on the individual brain anatomy, electrode placement, and other patient-specific factors. Individual calibration can be time-consuming, requiring up to 40 training hours per patient. This calibration effort presents a significant obstacle to the widespread use of ECoG systems.

Brain2Qwerty takes a different approach, utilizing transfer learning to reduce the need for time-consuming individual calibration. The system is pre-trained on a large dataset of MEG/EEG data collected from 169 individuals. This pre-trained model already contains extensive knowledge about the relationship between neural signals and speech intentions. For new participants, only a short adaptation phase of 2 to 5 hours is required to tailor the model to the individual characteristics of each user. This short adaptation phase makes it possible to achieve 75% of the maximum decoding performance with minimal effort. The use of transfer learning enables significantly faster and more efficient commissioning of non-invasive systems, thus contributing to their scalability and broad applicability. The ability to transfer a pre-trained model to new users is a key advantage of non-invasive BCIs in terms of their widespread applicability.
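The idea of reusing a pre-trained model and fitting only a few user-specific parameters can be shown in miniature. In the sketch below, the pre-trained decoder is treated as a frozen function, and adaptation consists of fitting a gain and an offset for the new user by closed-form least squares on a handful of calibration samples. This is a strong simplification chosen for illustration; the decoder, the data, and the choice of an affine correction are all assumptions, not the adaptation procedure used by Brain2Qwerty.

```python
def fit_affine(outputs, targets):
    """Least-squares fit of targets ≈ gain * outputs + offset."""
    n = len(outputs)
    mean_x = sum(outputs) / n
    mean_y = sum(targets) / n
    var_x = sum((x - mean_x) ** 2 for x in outputs)
    if var_x == 0:
        # No variation in the outputs: only an offset can be estimated.
        return 1.0, mean_y - mean_x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(outputs, targets))
    gain = cov_xy / var_x
    return gain, mean_y - gain * mean_x


def adapt(pretrained_decoder, calibration_inputs, calibration_targets):
    """Return a user-specific decoder; the pre-trained decoder itself stays frozen."""
    outputs = [pretrained_decoder(x) for x in calibration_inputs]
    gain, offset = fit_affine(outputs, calibration_targets)
    return lambda x: gain * pretrained_decoder(x) + offset


if __name__ == "__main__":
    pretrained = lambda x: 2.0 * x  # hypothetical frozen decoder
    adapted = adapt(pretrained, [0.0, 1.0, 2.0], [1.0, 7.0, 13.0])
    print(adapted(5.0))
```

Because only two parameters are estimated, a few calibration samples suffice, which mirrors why adapting a pre-trained model takes far less data than training one from scratch.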

Ethical and regulatory aspects: Data protection and approval procedures

The development and application of brain-text decoding technologies raises important ethical and regulatory questions that must be carefully considered. Differences also exist between invasive and non-invasive approaches in this field.

Data protection through limited signal yield: Protection of privacy

An ethical aspect often discussed in connection with BCIs is data privacy and the possibility of thought manipulation. Invasive ECoG systems, which allow direct access to brain activity, potentially pose a higher risk of brain data misuse. In principle, ECoG systems could be used not only to decode speech intentions but also to record other cognitive processes and even to manipulate thoughts through closed-loop stimulation. Although current technology is still far from such scenarios, it is important to keep these potential risks in mind and develop appropriate safeguards.

Brain2Qwerty and other non-invasive systems are limited to the passive acquisition of motor intention signals. Their architecture is designed to automatically filter out non-verbal activity patterns. The attenuated and noisy signals captured by MEG and EEG due to scalp interference make it technically more challenging to extract detailed cognitive information or even manipulate thoughts. The “limited signal yield” of non-invasive methods can, in some ways, be seen as a protection of privacy. However, it is important to emphasize that non-invasive BCIs also raise ethical questions, particularly regarding data protection, informed consent, and the potential for misuse of the technology. It is essential to develop ethical guidelines and regulatory frameworks that ensure the responsible use of all types of BCIs.

Approval pathway for medical devices: Faster to application

The regulatory pathway for medical device approval is another important factor influencing the speed at which new technologies can be introduced into clinical practice. Invasive ECoG systems are generally classified as high-risk medical devices because they require surgical intervention and can potentially cause serious complications. Therefore, the approval of ECoG systems requires extensive Phase III trials with comprehensive long-term safety data. This approval process can take several years and require significant resources.

Non-invasive systems, on the other hand, potentially have a faster regulatory pathway. In the United States, non-invasive systems that build upon and complement existing EEG/MEG devices may be eligible for approval through the Food and Drug Administration's (FDA) 510(k) process. The 510(k) process is a simplified approval pathway for medical devices that are "substantially equivalent" to already approved products. This faster pathway could allow non-invasive brain-text decoding technologies to enter clinical use more quickly and benefit patients sooner. However, it is important to emphasize that even for non-invasive systems, rigorous safety and efficacy evidence is required for approval. The regulatory framework for BCIs is an evolving field, and it is essential that regulators, researchers, and industry collaborate to develop clear and appropriate regulatory pathways that foster innovation while ensuring patient safety.

Limitations of the non-invasive approach: Technical challenges remain

Despite the numerous advantages of non-invasive brain-text decoding systems, it is important to acknowledge the existing technical hurdles and limitations. These challenges must be addressed to fully realize the potential of non-invasive BCIs.

Real-time latency

Brain2Qwerty and other non-invasive systems currently exhibit higher decoding latency than invasive ECoG systems. Brain2Qwerty decodes speech intentions only after a sentence has finished, resulting in a delay of approximately 5 seconds. In comparison, ECoG systems achieve a significantly lower latency of around 200 milliseconds, enabling near real-time communication. The higher latency of non-invasive systems is due to the more complex signal processing and the need to analyze weaker and noisier signals. Reducing latency is a key goal for the further development of non-invasive BCIs to enable smoother and more natural communication.

Motion artifacts

MEG systems are highly sensitive to motion artifacts. Even slight head movements can significantly disrupt measurements and impair signal quality. Therefore, MEG-based data acquisition typically requires a fixed head position, which limits mobile applications. While EEG is less susceptible to motion artifacts, muscle movements and other artifacts can still affect signal quality. Developing robust artifact suppression algorithms and creating portable and motion-tolerant MEG and EEG systems are crucial areas of research for expanding the range of applications for non-invasive BCIs.
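A very basic form of artifact handling is to discard short segments of data (epochs) whose amplitude swings are implausibly large. The sketch below rejects any epoch whose peak-to-peak amplitude exceeds a threshold. The threshold and the sample values are placeholders; suitable values depend on the sensor type and units, and practical pipelines usually combine such rejection with more sophisticated methods.

```python
def peak_to_peak(epoch):
    """Difference between the largest and smallest sample in an epoch."""
    return max(epoch) - min(epoch)


def reject_artifacts(epochs, threshold):
    """Keep only epochs whose peak-to-peak amplitude is at or below the threshold."""
    return [epoch for epoch in epochs if peak_to_peak(epoch) <= threshold]


if __name__ == "__main__":
    epochs = [
        [0.0, 1.0, -1.0],    # quiet epoch
        [0.0, 50.0, -40.0],  # large swing, e.g. from a head movement
    ]
    print(reject_artifacts(epochs, threshold=5.0))
```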

Patient compatibility

Non-invasive systems based on decoding typing intention signals may reach their limits in patients with a severely atrophied motor cortex, as seen in the late stages of amyotrophic lateral sclerosis (ALS). In such cases, motor intention-based decoding may fail because the neural signals associated with typing movements are too weak or absent. For these patient groups, alternative non-invasive approaches may be needed, such as those based on decoding cognitive language processes or other modalities like eye tracking. Furthermore, it is important to consider individual differences in brain activity and the variability in signal quality between individuals to make non-invasive brain-computer interfaces (BCIs) accessible to a broader patient population.

Complementary roles in neuroprosthetics: coexistence and convergence

Despite existing technical challenges and the superior precision of invasive ECoG systems, the non-invasive approach of Meta AI and other researchers is revolutionizing early interventional care in the field of neuroprosthetics. Non-invasive BCIs offer the advantage of being low-risk and usable even at the onset of a disease, such as ALS. They can provide early communication support to patients with emerging communication difficulties, thereby improving their quality of life and participation in society.

ECoG systems remain indispensable for high-precision applications in completely paralyzed patients, particularly those with locked-in syndrome, where maximum decoding accuracy and real-time communication are crucial. For this patient group, the potential benefits of invasive BCIs justify the higher risks and costs.

The future of brain-computer interfaces may lie in the convergence of both technologies. Hybrid systems that combine the advantages of non-invasive and invasive approaches could usher in a new era of neuroprosthetics. For example, such a hybrid approach could utilize epidural microelectrodes, which are less invasive than ECoG electrodes but still offer higher signal quality than non-invasive sensors. Combined with advanced AI algorithms for signal processing and decoding, such hybrid systems could bridge the gap between invasiveness and accuracy, enabling a wider range of applications. The continued development of both non-invasive and invasive brain-text decoding technologies, along with the exploration of hybrid approaches, promises a future where people with communication impairments have access to effective, safe, and accessible communication solutions.
