WO2024211637A1

WO2024211637A1 - Systems and methods associated with determination of user intensions involving aspects of brain computer interface (bci), artificial reality, activity and/or state of a user's mind, brain or other interactions with an environment and/or other features

Info

Publication number: WO2024211637A1
Application number: PCT/US2024/023165
Authority: WO
Inventors: Ekram ALAM; Jack BABER; Francesca BIANCO; Suyi Zhang; Lucia LOPEZ
Original assignee: Mindportal Inc
Current assignee: Mindportal Inc
Priority date: 2023-04-04
Filing date: 2024-04-04
Publication date: 2024-10-10
Anticipated expiration: 2025-10-04

Abstract

Various systems and methods of the disclosed technology relate to improved computer- based systems, wearable devices, Al models, methods, platforms, user interfaces and/or combinations thereof, such as those associated with wearable brain computer interfaces and/or user selections/interactions regarding objects and elements on different types of user interfaces (UIs) and/or in artificial environments involving a brain-interface. Certain example implementations involve decoding imagined speech from a user to determine an intended action. In some implementations, systems and methods herein may improve accuracy to overcome problems and drawbacks with false positives. Other illustrative implementations may leverage visual feedback from the computer system and use the visual feedback to evoke responses from the user which can be detectable via the brain-interface and thereby produce outputs such as signals regarding whether or not a user intends to select UI elements. Other innovations and aspects of signal and computer processing are also disclosed.

Description

SYSTEMS AND METHODS ASSOCIATED WITH DETERMINATION OF USER INTENSIONS INVOLVING ASPECTS OF BRAIN COMPUTER INTERFACE (BCI), ARTIFICIAL REALITY, ACTIVITY AND/OR STATE OF A USER’S MIND, BRAIN OR OTHER INTERACTIONS WITH AN ENVIRONMENT AND/OR OTHER FEATURES

Cross-Reference to and Incorporation of Related Applications Information

[1] This PCT International Application claims benefit of/priority to U.S. provisional patent applications No. 63/457,134, filed April 4, 2023, and No. 63/457,577, filed April 6, 2023, both of which are incorporated herein by reference in entirety.

Description of Related Information

[2] Some challenges in the general fields of brain-computer interfaces and determining user intension relate to absences of and/or problems with options that are one or both of non-invasive and/or effective with regard to the interfaces utilized, the interactions involved, the models, interpretations and/or processing used to analyze brain and/or user activity, and/or the outputs desired. In regard to some applications, for example, existing brain-interfaces don’t provide solutions for efficiently and fluidly interacting with user-interfaces, operating on a real-time basis, requiring users to do counter-intuitive things, processing data, providing results and/or outputs of sufficient quality, and processing user input incorrectly, among a host of other drawbacks.

[3] Among the drawbacks of some known solutions, for example, existing brain interfaces are rudimentary in nature and do not enable users to directly interact with machines in a natural and high-bandwidth way with low risk of error. Humans naturally use speech to communicate with others as well as computers and humans naturally imagine commands, potential actions, and desires in our own minds using internal/imagined speech. Brain-computer interface that can directly decode our imagined, intended speech would have massive impact on the field and across various industries. Instead of using keyboards, individuals simply think thoughts and directly have brain data translated to text. Instead of using controllers in virtual or artificial reality, users think of language commands to interface with the spatial software. In patients with disabilities and who have lost the ability to communicate, such technology may give them unrestricted capacity to communicate. Further, such innovations can be used across industries.

[4] At present, Electrocorticography (ECOG), an invasive surgically implanted array of electrodes, has been demonstrated to be capable of reconstructing imagined speech with word-to- word reconstruction, albeit with a limited vocabulary of words. However, it is highly desirable to enable non-invasive decoding of continuous imagined speech, so that the capability is available to everyone and is much more widely accessible. In addition, the added benefit of being able to collect vast quantities of data with non-invasive systems offers the opportunity to leverage modern artificial intelligence algorithms more effectively and broaden the vocabulary, capabilities and robustness of the system.

[5] Significant challenges in the general field of brain-computer interfaces lies in the absence of options that are one or more of; non-invasive, effective with regard to the interfaces that are in communication with both neurons (e.g., a brain) and computing systems, and/or involve models and/or interpretation(s) that ineffectively process brain activity. Among other drawbacks, for example, using purely E-wave and eye-tracking alone, there can be false positives, e.g., the user can end up selecting things when they look at the elements but don’t actually intend to select elements that prior techniques might flag.

Overview of Some Aspects of the Disclosed Technology

[6] One or more aspects of the present disclosure generally relate to improved computer- based systems, wearable devices, methods, platforms and/or user interfaces, and/or combinations thereof, and particularly to, improved computer-based systems, wearable devices, hardware architectures, methods, signal and other computer processing, platforms and/or user interfaces associated with mind/brain-computer interfaces that are driven by various technical features and functionality, including laser/optical-based brain signal acquisition, decoding modalities, encoding modalities, brain-computer interfacing, virtual reality (VR)/extended reality (XR)/ augmented reality (AR)/mixed reality (i.e., “artificial reality” or “altered reality”) environments and/or content interaction, signal processing, data processing, among other features and functionality set forth herein. Aspects of the disclosed technology and platforms here may comprise and/or involve processes of collecting and processing brain activity data, such as those associated with the use of a brain-computer interface that enables, for example, decoding and/or encoding a user’s brain/neural activities/activity patterns associated with thoughts such as those involving all types of senses (e.g., vision, language, movement, touch, smell functionality, sound, etc.), and the like. Systems and methods herein may include and/or involve the leveraging of innovative brain-computer interface aspects and/or associated user environments, non- invasive wearable or portable devices and/or systems to facilitate and enhance user interactions and which provide technical outputs, solutions and results, such as those required for or associated with next generation wearable devices, controllers, and/or other computing components based on human thought/brain/mind signal detection and processing and/or computer processing and interaction.

[7] As set forth in the various illustrative embodiments described below, the present disclosure provides exemplary technically improved computer-based processes, systems and computer readable media. In some implementations, such innovations may be associated with and/or involve a brain-computer interface based platform that decodes and/or encodes neural activities associated with thoughts (e.g., human thoughts, etc.), user motions, and/or brain activity based on signals gathered from location(s) where brain-detection optodes are placed, which may operate in modalities such as vision, speech, sound, gesture, movement, actuation, touch, smell, and the like.

[8] According to some embodiments, the disclosed technology may include or involve process for detecting, collecting, recording, and/or analyzing brain signal activity, and/or generating output(s) and/or instructions regarding various data and/or data patterns associated with various neural activities/activity patterns, all via a non-invasive brain-computer interface platform. Empowered by the improvements set forth in the wearable and associated hardware, optodes, etc. herein, as well as its various improved aspects of data acquisition/processing set forth below, aspects of the disclosed brain-computer interface technology achieve high resolution, portability, and/or enhanced volume in terms of data collecting ability, among other benefits and advantages. These systems and methods leverage numerous technological solutions and their combinations to create a novel brain-computer interface platform, wearable devices, and/or other innovations that facilitate mind/brain computer interactions to provide specialized, computer-implemented functionality, such as optical modules (e.g., with optodes, etc.), thought detection and processing, limb movement decoding, whole body continuous movement decoding, AR and/or VR content interaction, direct movement goal decoding, direct imagined speech decoding, touch sensation decoding, and the like.

[9] One or more aspects of the present disclosure generally relate to improved computer- based systems, wearable devices, Al models, methods, platforms, user interfaces and/or combinations thereof, and particularly to those associated with wearable brain computer interfaces for revolving around users being able to select things and elements on different types of UI using a brain-interface rather than a controller or a mouse click, e.g., involving VR headset with built-in eye tracking, wearable caps/devices that measure brain/mind signals in realtime, and/or Al models which interpret the brain activity allowing for different brain-computer interface (BCI) applications.

[10] In some embodiments, the systems described herein may include a non-invasive braininterface system that enables a selection of imagined command words to be decoded directly from the brain of a user. These command words may be used to interact with user interfaces of various forms, both 2D such as smartphones, laptops, computer screens and TV screens and 3D such as augmented and virtual reality and metaverse content.

[11] The disclosed technology’s use of the imagined speech commands may include interacting with user interfaces (UIs), including (but not limited to) in combination with eyetracking. For example, the eye-tracking system may determine that the user is looking at the UI icon for an application on their smartphone. In illustrative implementations described below, for example, the user may then imagine the command word “open” and the application is opened. Alternatively, they may imagine the command word “delete” and the UI icon is deleted.

[12] The following also include exemplary designs that enable the collection of brain data which directly corresponds to a selection of command words of interest such as “open’, “close’, “send” etc. as well as example hardware electrode/optode placement to enable optimal data recording and quality for imagined speech decoding. Furthermore, the instant disclosure includes a description of attention-based deep networks for imagined speech decoding, with two different example implementations to enable robust classification of multiple command words.

[13] Consistent with various aspects of the disclosed technology, there are a few ways to improve the accuracy and to overcome the false positives problem. One way is to leverage visual feedback from the computer system and use the visual feedback to evoke responses from the user which can be detectable via the brain-interface to thereby produce a confirmatory signal that they either do intend to select the UI element, or that they don’t and were just gazing at it.

[14] Some other aspects of the disclosed technology describe systems and methods for decoding user brain activity (e.g., translating electrode firing into imagined speech) using various algorithms including neural networks (e.g., convolutional neural networks), combinations of multiple neural networks, and/or other types of algorithms. Brief Description of the Drawings

[15] Various embodiments of the present disclosure can be further explained with reference to the attached drawings, wherein like structures are referred to by like numerals throughout the several views. The drawings shown are not necessarily to scale, with emphasis instead generally being placed upon illustrating the principles of the present disclosure. Therefore, specific structural and functional details disclosed herein are not to be interpreted as limiting, but merely as a representative basis for teaching one skilled in the art to variously employ one or more illustrative embodiments.

[16] Figure 1A is a diagram illustrating one exemplary process decoding neural activities associated with human thoughts of manifesting sensory modalities, consistent with exemplary aspects of certain embodiments of the present disclosure.

[17] Figure IB is a diagram illustrating one exemplary BCI configured with exemplary features, consistent with exemplary aspects of certain embodiments of the present disclosure.

[18] Figure 2 depicts various exemplary aspects/principles involving detecting neural activities, consistent with exemplary aspects of certain embodiments of the present disclosure.

[19] Figure 3 depicts other exemplary aspects/processes involving detecting neural activities, consistent with exemplary aspects of certain embodiments of the present disclosure.

[20] Figures 4A-4D are diagram illustrating one exemplary wearable BCI device and system, consistent with exemplary aspects of certain embodiments of the present disclosure.

[21] Figures 5A-5B depict two illustrative implementations including components associated with the combined VR and eye tracking hardware and EEG measuring BCI hardware, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[22] Figures 6A-6B depict two illustrative implementations including exemplary optode and electrode placement aspects associated with certain exemplary BCI hardware, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[23] Figures 7A-7B depict exemplary process flows associated with processing brain data and eye tracking data and converting the data to compressed images in the latent space, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[24] Figure 8 depicts an exemplary flow diagram associated with processing compressed images from the latent space as well as creation of a generator/discriminator network for comparison of brain signal (e g., EEG, etc.) generated versus eye tracking actual visual saliency maps, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[25] Figure 9 depicts an exemplary user interface generated by a VR headset, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[26] Figure 10 depicts an illustrative flow diagram detailing one exemplary process of using a VR (and/or other) headset in combination with a BCI to create and compare a ground truth visual saliency map with a BCI-EEG generated visual saliency map, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[27] Figures 11A and 1 IB depict various illustrative user interfaces and aspects of an exemplary user interfaces, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[28] Figure 12 depicts an exemplary set of command words, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[29] Figure 13 depicts an exemplary optode array arranged against a user’s head, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[30] Figures 14 depicts an example method for predicting words or phrases imaged by a user, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[31] Figure 15 depicts an example model that may predict words or phrases, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[32] Figure 16 depicts an exemplary flow diagram illustrating aspects associated with combining data streams from the integrated brain computer interface and the mixed reality system as well as associated analysis of event-related brain data, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[33] Figure 17 depicts an exemplary flow diagram illustrating aspects associated with combining data streams from the integrated brain computer interface and the mixed reality system to predict words or phrases, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[34] Figure 18 depicts one exemplary workflow of using secondary response (reaction) as a confirmatory signal to determine whether or not the user intends to select an object. [35] Figure 19 depicts one exemplary workflow of using secondary response (imagined speech) as a confirmatory signal to determine whether or not the user intends to select an object.

[36] Figure 20 is an illustration of exemplary E-wave and emotional/error potential real-time decoders.

[37] Figure 21 is an illustration of E-wave and Error-related evoked potential signals. Left graph of Figure 21 is an illustration of signals from a brain signal recording device before time Os, including an original signal and a filtered signal, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Right graph of Figure 21 is an illustration of signals from a brain signal recording device after time Os.

[38] Figure 22 depicts the workflow of determining emotional responses for selection threshold modulation.

[39] Figure 23 depicts one exemplary electrode arrangement having a new electrode montage design for combined E-wave and error potential detection, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[40] Figure 24 depicts each EEG channel ranked according to its importance in the trained model for decoding the two conditions, indicated in y axis (the most important one starting from the bottom of the graph).

[41] Figures 25A-25D depict example training paradigms which allow both E-waves and error potentials to be recorded from user brain data, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[42] Figure 26 depicts a representative process of speech decoding, showing illustrative steps associated with processing signals from an exemplary system, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[43] Figure 27 illustrates electrical activity of the brain, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[44] Figure 28 illustrates an example of two trials from the Imagined Speech paradigm, including both spoken and imagined speech trials, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[45] Figures 29A and 29B illustrate noise removal in spoken and imagined EEG data respectively, e.g. using ICA, consistent with various exemplary aspects of one or more implementations of the disclosed technology. [46] Figures 30A and 30B are examples of Euclidean Alignment impact on spoken and imagined data before and after Euclidean alignment, respectively, where aligned data share a similar space, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[47] Figure 31 illustrates an example image that was converted back to a 2D matrix, denormalized, and was input into a pre-trained vocoder trained with general purpose audio generation, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[48] FIG. 32 illustrates a model architecture for a machine-learning-based decoder, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[49] Figure 33 illustrates an example implementation of a machine-leaming-based model in a trial, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[50] Figure 34 illustrates feature importance and dimensionality reduction for classification, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[51] Figure 35 A illustrates a representative example of ERPs in offline imagined positive/no conditions at channel F3, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[52] Figure 35B illustrates a representative example of ERPs in real-time imagined positive/no conditions at channel F3, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

Detailed Description of Illustrative Implementations

[53] Systems, methods and wearable devices associated with mind/brain-computer interfaces are disclosed. Embodiments herein include features related to one or more of optical-based brain signal acquisition, decoding modalities, encoding modalities, brain-computer interfacing, artificial reality environments and/or content interaction, signal processing, signal to noise ratio enhancement, motion artefact reduction, and/or various aspects of related user intention detection, processing and/or output generation, among other features set forth herein. Certain implementations may include or involve processes of collecting and processing brain activity data, such as those associated with the use of a brain-computer interface that enables, for example, decoding and/or encoding a user’s brain functioning, neural activities, and/or activity patterns associated with thoughts, including sensory -based thoughts. Further, the present systems and methods may be configured to leverage brain-computer interface and/or non-invasive wearable device aspects to provide enhanced user interactions for next-generation wearable devices, controllers, and/or other computing components based on the human thoughts, brain signals, and/or mind activity that are detected and processed. Certain underlying aspects, such as detection of mind/brain activity, signals associated therewith and device therefor, may be implemented as disclosed in co-owned PCT International publication No. WO2022/198142A1, which is incorporated herein as if part of this document.

[54] Figure 1A is a diagram illustrating one exemplary process related to decoding neural activities associated with human thoughts involving or manifesting sensory modalities, consistent with exemplary aspects of certain embodiments of the present disclosure. In this illustrated example, a human user 205 may wear a non-invasive wearable device 210 that implements the brain computer interface (BCI) technology disclosed herein. Here, e.g., in connection with the BCI platform, device 210 may be configured to be positioned around the head/brain of the user so as to mount the scalp of the user. The BCI device 210 may also be configured to be worn in other suitable ways that do not require surgery on the user’s head/brain. With the neural signals detected/collected, the BCI device 210 may directly decode those neural activities of the user’s thoughts associated with all types of sensory abilities. For example, as shown herein, the neural activities may be decoded by the BCI device 210 into vision 215 (e g., mental images, etc.), language 220 (e.g., imagined speech, etc.), movement 225, touch 230, and other sensory modalities. Empowered with the sensory data decoded from the neural activities, a variety of applications 235 may be implemented via the BSI technology disclosed herein. In some embodiments, direct human goals and software symbiosis is enabled using the present brain computer interface technology.

[55] Figure IB is a diagram illustrating one exemplary brain computer interface 240 and various illustrative features, consistent with exemplary aspects of certain embodiments of the present disclosure. Here, such brain-computer interface 240 may be configured with various exemplary features and functionality such as ability to obtain neural activity data of high spatial resolution (e.g., highly precise) 245, obtain data spanning full brain coverage 250, obtain data of high temporal resolution (e.g., works at real-time speed/manner, etc.) 255, obtain data that is robust and accurate, e.g., when the user moves about in everyday situations, etc., at 260, obtain data of high classification accuracy 265, obtain data of high signal to noise ratio 270, and the like. As a result of these features and benefits, the brain-computer interface 240 may be configured to be worn by the user (or otherwise applied onto the user) in a more compelling and/or natural fashion.

[56] Figure 2 depicts exemplary principles involved with detection of neural activities, consistent with exemplary aspects of certain embodiments of the disclosed technology. In this example, a non-invasive technology such as optoelectronic based technique is illustrated to implement a brain-computer interface of, for example, FIGs. 1A-1B. Here, a detector 310 and a source (e.g., an optical source, such as a laser light source, etc.) 305 are applied to the brain of the user 205. As the brain incurs neural activities (e.g., upon thoughts, including by not limited to those of senses such as an image, sound, speech, movement, touch, smell, etc., upon body movement(s), and/or upon brain activity in certain locations, etc.), different regions of neurons in the brain are “activated” as manifested in, e.g., changes in neurons themselves, changes in blood supplies to the neurons, and so on. In this example, using optical system(s) herein, braincomputer interfaces consistent with the disclosed technology may be configured to detect: 1) neuronal changes, as illustrated in the upper half circle, at 315; and/or 2) blood changes at active site of the brain, as illustrated in the lower half circle, at 320.

[57] Figure 3 is an exemplary flow diagram involving detection of neural activities, consistent with exemplary aspects of certain embodiments of the present disclosure. In this illustrated example, two pathways are shown with regard to communicate the user’ s thoughts to an external device. Here, the user (shown with wearable device 210) thinks or carries out an action at step 405. Upon such thoughts and/or actions, the pattern(s) of the brain activities 410 incurred are detected/collected via the brain-computer interface, at 420 and 435. Along the upper pathway, such patterns of neuron activities may be manifested in, as shown at step 415, activated neuron change in size and/or opacity, and/or other characteristics associated with a neuron activation state. As such, at step 420, the patterns of brain activities may be detected (e.g., by use of the optodes configured on device 210, etc.) via detectable changes in light scattering and/or properties caused by the afore-described changes in neurons. In this illustrative implementation, at 425, the detected signals are shown, in turn, being transmitted to an external device for further processing/application. Along the lower pathway, such patterns of neuron activities 410 may be manifested in, as shown at step 430, oxygenated blood supply increase, and/or other blood/blood vessel characteristics associated with a neuron activation state. As such, at step 435, the patterns of brain activities may be detected (e.g., by use of the optodes configured on device 210, etc.) via detectable changes in light absorption. In turn, such detected signals may then be transmitted to the external device for further processing/application, at 440. In some embodiments, the two pathways may also be reversed such that the brain-computer interface is a bi-directional and/or encoding interface that is capable of encoding signals (e.g., data, information of images, texts, sounds, movements, touches, smell, etc.) onto the user’ brain to invoke thoughts/actions based thereof. In some embodiments, the BCI may be configured to achieve signal detection precision level of 5mm cubed, but the precision (spatial resolution) can be altered so as to extract information from volumes of the brain of different volumes/sizes.

[58] Figure 4A is a diagram illustrating one exemplary wearable brain-computer interface device, consistent with exemplary aspects of certain embodiments of the present disclosure. As shown in this example, a wearable/portable brain-computer interface device 902 may be attached/associated and configured with respect to a user’ head in a non-invasive manner. Here, one illustrative BCI device 902 is shown as mounted atop/across the user’s head, with one or more brain-facing detection portions, panels or subcomponents 904 facing towards the user’s brain. Of course, in other embodiments, various other systems, devices and techniques may be utilized to acquire mind/brain activity of a user, as set forth elsewhere herein. According to implementations herein, such brain-facing portions, panels or subcomponents 904 may include one or more optodes, which may each comprise: one or more sources, such as dual-wavelength sources, and/or one or more detectors, such as photodiodes (e.g., in some exemplary embodiments, with integrated TIAs, transimpedance amplifiers, etc.). Here, e.g., examples of such sources and detectors are shown and described in more detail in connection with Figures 4C-4D, below. In this illustrative example of Fig. 4A, the BCI device 902 may be adapted in any wearable shape or manner, including but not limited to the example embodiment shown, here, having a curved and/or head-shaped design. Here, for example, such that the wearable devices and/or subcomponents thereof (e.g., with optodes and/or comparable sources and detectors) can be adapted to adjustably fit and cover the user’s head such that the desired optodes (or equivalent) are positioned over the portion(s) of the user’s brain to capture the signals needed and/or of interest. In this example, the BCI device 902 may also be configured with one or more processing/computing subcomponents 906, which may be positioned or located on the wearable device itself and/or all or one or more portions thereof may be located elsewhere, physically and/or operationally/computationally, such as in a separate subcomponent and/or integrated with other computing/processing components of the disclosed technology e.g., the housing of 906 and everything within 906 may be placed on a wristband, watch, another such wearable, or other device and be connected to the remainder of the headset wirelessly, such as via Bluetooth or WiFi. Further, element 906 may be a housing which in this particular embodiment of a wearable is being used to house the wiring and the electronic circuitry shown in Figure 4B, including components such as, e.g., the optical source drivers 922, analog to digital converters 916, 920, and microcontroller and wifi module 914.

[59] Figure 4B is a block diagram illustrating an exemplary brain-computer interface device, such as wearable device 902 shown in Figure 4A and an associated computing device 912 (e.g., computer, PC, gaming console, etc ), consistent with exemplary aspects of certain embodiments of the present disclosure. As shown herein, an exemplary brain-computer interface device may comprise one or more of: one or more one optical source driver(s) 922, which may, e.g., be utilized to control the intensity, frequency and/or wavelength of the optical signals emitted by the optical sources and may also be configured, in some implementations, to set the electromagnetic energy to be emitted in continuous form or in timed pulses of various length or in other such frequency-type variation(s).; at least one optical source 924 configured to emit the optical signal (e.g., light, laser, electro-magnetic energy, etc.), which, in some embodiments, may be in the near-infrared, infrared or visual range of the spectrum; one or more optional conversion components 916, 920, if/as needed, such as analog to digital and/or digital to analog converters; one or more optical detectors 918 that detect optical signals that exit the brain tissue containing information regarding the properties of the brain of the human user; at least one microcontroller and/or wifi module 914, where such microcontroller may be configured to sends control signals to activate the various components on the electronics layout and may also be configured to control the WiFi module. In some implementations, the microcontroller and/or wifi module 914 may be connected to one or more computing components 912, such as a PC, other computing device(s), gaming consoles, etc.] by one or both of a physical/hard-wire connection (e.g., USB, etc.) and/or via a wireless (e.g., WiFi, etc.) module. Further, the microcontroller and wifi module may be housed together, as shown, they may be split or distributed, and one, both or neither may be integrated with a wearable BCI device, another mobile device of the user (e g., watch, smartphone, etc.) and/or the other computing and/or PC device(s) 912.

[60] According to one or more embodiments, in operation, the driver(s) 922 may be configured to send a control signal to drive/activate the light sources 922 at a set intensity (e.g., energy, fluence, etc.), frequency and/or wavelength, such that optical sources 924 emit the optical signals into the brain of the human subject.

[61] Turning next to operations associated with detection and/or handling of detected signals, in some embodiments, various processing occurs utilizing fast optical signals and/or haemodynamic measurement features, as also set forth elsewhere herein. In some embodiments, for example, one or both of such processing may be utilized, which may be carried out simultaneously (whether being performed simultaneously, time-wise, or in series but simultaneous in the sense that they are both performed during a measurement sequence) or separately from one another:

Fast optical signal (FOS) processing:

[62] According to such fast optical signal implementations, the optical signal entering the brain tissue passes through regions of neural activity, in which changes in neuronal properties alter optical properties of brain tissue, causing the optical signal to scatter differently as a scattered signal. Further, such scattered light then serves as the optical signal that exits the brain tissue as an output signal, which is detected by the one or more detectors to be utilized as the received optical signal that is processed.

Haemodynamic:

[63] According to such haemodynamic implementations, first, optical signals entering brain tissue pass through regions of active blood flow near neural activity sites, at which changes in blood flow alter optical absorption properties of the brain tissue. Further, the optical signal is then absorbed to a greater/lesser extent, and, finally, the non-absorbed optical signal(s) exit brain tissue as an output signal, which is detected by the one or more detectors to be utilized as the received optical signal that is processed.

[64] Turning to next steps or processing, the one or more detectors 918 pick-up optical signals which emerge from the human brain tissue. These optical signals may be converted, such as from analog to digital form or as otherwise needed, by one or more converters, such as one or more analog to digital converters 916, 920. Further, the resulting digital signals can be transferred to a computing component, such as a computer, PC, gaming console, etc. via the microcontroller and communication components (wireless, wired, WiFi module, etc.) for signal processing and classification.

[65] Finally, various specific components of such exemplary wearable brain-computer interface device are also shown in the illustrative embodiment depicted in Fig. 4B, such as microcontroller/wireless component(s) 914 (which may include, e.g., a Pyboard D component), converters 916, 920 (which may include, e.g., ADS1299 ADC components), detectors 918 (which may include, e.g., an OPT101 component having photodiode with integrated TIA, etc ), optical sources (which may, e.g., include laser sources, LEDs, etc.), driver(s) 922 (which may include, e.g., one or more TLC5940 components, etc.), and a power management component, though the innovations herein are not limited to any such illustrative subcomponents.

[66] Figures 4C-4D are diagrams illustrating aspects of the one or more brain-facing detection portions, panels or subcomponents 904, consistent with exemplary aspects of certain embodiments of the present disclosure. As set forth in more detail elsewhere herein, such portions, panels or subcomponents 904 may be comprised of one or more panels, which each comprise one or more optodes 928. As set forth, below, each such optode may comprise: one or more sources, such as dual -wavelength sources, and/or one or more detectors, such as photodiodes (e.g., in some exemplary embodiments, with integrated TIAs, transimpedance amplifiers, etc.), such as shown and described in more detail in connection with Figures 4C-4D. Brain Computer Interface (BCD + Artificial Reality with Eye Tracking, EEG and Other

Features

[67] Figures 5A-5B depict two illustrative implementations including components associated with the combined artificial reality and user action/activity (e.g., eye tracking, etc.) hardware and brain signal (e.g., EEG measuring, etc.) BCI hardware, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Turning to embodiments of the presently-disclosed inventions, here referring to Figures 5A-5B, a user may simultaneously wear an artificial reality headset 1202, such as a VR, XR or similar headset, in combination with the brain computer interface (BCI) 1201. Herein, while the term VR headset 1202 is used in numerous instances for the sake of convenience, it should be understood that this term refers to artificial reality headsets, such as VR headsets, XR headsets, and the like. In one illustrative embodiment, shown in Figure A, the BCI and VR headset containing eye tracking hardware and software components are implemented as two separate components, such as two separate headsets. In another illustrative embodiment, shown in Figure 5B, the BCI and VR headset containing eye tracking hardware and software components are contained within one combined headset. Other configurations and arrangements of such components may be utilized, e.g., in other embodiments. Consistent with the disclosed technology, the VR headset 1202 may further contain built in eye-tracking hardware and software components. Further, the VR headset 1202 may be arranged and configured to be capable of displaying a visual user interface, such as the exemplary visual user interface 1601 shown in Figure 9. According to embodiments herein, the eye tracking hardware and software components of the VR headset 1202 may be utilized to measure a user’s eye movement in response to the display of such visual user interface 1601. Further, the BCI 1201 may include or involve various optodes, electrodes, and/or other measurement instruments and/or functionality for the collection of EEG and other brain data, as described elsewhere herein.

[68] In the illustrative embodiments shown in Figures 5 A and 5B, such exemplary BCI 1201 and VR headset 1202 systems may also be connected to a computing component 1204, with the computing component 1204 operating and/or implementing software, which may be, e.g., in one embodiment, the Unity software system, but in other embodiments may include and/or involve any extended reality software system and which implements/displays an extended reality experience. In such embodiments, the BCI and VR headset components, 1201 and 1202, connected to the computing component 1204, can be used to operate a virtual, augmented, or mixed reality experimental paradigm 1206. Here, for example, in some embodiments, the BCI 1201 and corresponding measurement instruments used with the BCI acquire EEG measurements while the VR headset 1202 with eye tracking hardware and software components simultaneously acquires eye tracking data. Further, according to the experimental paradigm, such features may be utilized in training and implementing the system. For example, in some embodiments when training the user to use the system, a VR game may be played where objects appear in a random position of the field of view. Here and otherwise, according to aspects of the disclosed technology, associated EEG signal data and eye position data may be detected and synchronously registered onto the connected computing components 1204 of the system.

[69] Figure 6A depicts one illustrative electrode and optode arrangement or montage that may be utilized, e.g., in an exemplary implementation in which XR, VR, etc. eye-tracking information is involved, and such montage may comprise 36 channels, 68 optodes, 15 sources, 21 detectors, and 8 S.D. detectors. Figure 6B depicts another illustrative electrode and optode arrangement or montage that may be utilized, e.g., in an exemplary implementation in which no XR, VR, etc. eye-tracking information is involved, and such montage may comprise 48 channels, 70 optodes, 15 sources, 23 detectors, and 8 S.D. detectors. Other quantities and ranges of such components may also be utilized, in various differing embodiments. Herein, while this one style or representation of the electrodes and optodes is depicted, other arrangements, configurations and/or illustrations of signal detection and acquisition hardware may also be used consistent with the inventions herein, such as those used by various organization affiliated with the study of the brain and/or sleep. According to other embodiments consistent with the disclosed technology, an electrode/optode arrangement may use some or all of the known 10/20 EEG system electrode/optode positioning, such as that of the 10/20 EEG placement described by the European Respiratory Society (ERS). Still other electrode and optode arrangements that provide suitable signals may also be utilized. The 10/20 EEG positioning is noted, as this is a standard arrangement used when recording brain data, particularly when recording using EEG devices. It is further noted that such 10/20 positioning, and other such electrode/optode placements, may be utilized in certain embodiments of the disclosed technology and inventions herein.

[70] Referring to Figures 6A-6B, the illustrated montages of electrode and optode positions were specifically engineered to provide superior coverage/results regarding the brain activity most pertinent to aspects of the innovations herein, such as yielding accurate visual attention and saliency map determinations. According to various embodiments herein, for example, there is a denser clustering of sensors over some of the visual cortical areas of the brain including the primary visual (striate) cortex, the prestriate cortex and posterior parietal regions of the brain. Such montage positions may comprise optodes arranged to capture the optical data, as described above, and electrodes to capture specified EEG data, e.g., such as an array (ntrials x nchannels x nsamples) as explained further below. The exemplary sensor and detector locations of Figures 6A-6B are configured for utilization of EEG and Optical equipment for NIRS and fast optical signal processing. The exemplary sensor and detector montages of Figures 6A-6B are specific arrangements developed explicitly for this visual attention paradigm, with an emphasis on visual cortical areas. Additionally, these embodiments describe multimodal relationships, i.e., so they illustrate both EEG (electrodes) and optical detector locations for simultaneous measurements. Further, it is noted here that, while certain exemplary configurations of electrode and optode arrangements are shown in Figures 6A-6B, various other possible electrode and optode arrangements may be implemented to function in a same way to yield similar results. Among a variety of such alternative arrangements, for example, various electrodes (both dry and wet electrodes) and near-infrared optodes may be utilized in a regular arrangement purely over the visual cortex of the participant, removing any data acquired from other brain regions. As one further example of such alternative arrangements, systems and methods involving active channel selection may be utilized, whereby brain data is recorded in a standard arrangement such as 10/20 system EEG arrangement, and then the channels which best contribute to an accurate saliency map in training can be selected automatically via an algorithm, e.g., based on each channel’s weighted contribution to the accurate parts of the saliency map.

[71] According to certain embodiments, the VR headset is capable of generating a stimulus presentation 1205 to create trials for data collection utilizing both the BCI and VR headsets. Here, for example, such stimulus presentation 1205 may include, but is not limited to, the presentation of visual stimulus in the form of flashes of light and alternating light and colors across the VR headset 1206. The EEG signal data captured by the BCI 1201 and the visual data captured by the VR headset 1202 are both captured and synchronously registered in specified windows of time surrounding each visual stimulus event produced by the VR headset 1202. In one embodiment, the window of data collection and registration occurs beginning one second before the visual stimulus event and extending three seconds after the disappearance of the visual stimulus. In other embodiments, the window for data capturing may be at other intervals to acquire more or less data surrounding a visual event.

[72] According to some embodiments, the raw EEG data may be captured and configured as an array formatted as comprising the number of trials (Ni) by the number of channels (N2) by the number of samples (N3). Further, in one exemplary implementation, the images of data may then be encoded into data streams from the BCI 1201 and VR headset 1202 eye tracking hardware and software components to the computing component 1204 using a variational autoencoder (VAE) 1203. In embodiments here, a variational autoencoder 1203 may be utilized because there is not always a one-to-one relationship between brain activity and the user’s attention in a visual saliency map, in other words there may occur a different brain activation time for each event but can correspond to the same task. Using a variational autoencoder 1203 allows for estimating the distribution (characterized by the mean and the standard deviation) of the latent space 1407, meaning the apparatus can be used to study the relationship between the distribution of brain activations rather than just a one-to-one relationship between the latent vectors. Each sample of raw brain data is converted to images in the format as [n trials x n down x h x w],

[73] Figures 7A-7B depict exemplary process flows associated with processing brain data and eye tracking data and converting the data to compressed images in the latent space, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring first to the process map in Figure 7A, the variational autoencoder 1203 encodes both the BCI data 1401 and the VR eye tracking data 1404 from the BCI 1201 user interface and the VR headset 1202 user interface to the computing component 1204 in latent space 1407. Latent space 1407 is a theoretical representation of the process of transforming and compressing the raw data (from the BCI 1401, and from the VR headset 1404) output from the BCI 1201 and VR headset 1202, to the images in a specified format, or the Representations in the Latent Space 1 and 2 (1403 and 1406 respectively). In other embodiments, the images generated may be in a format other than the format specified above, where such other data image formats are equivalent, e.g., function in a same way and/or achieve similar result as the formats specified above. Here, for example, with recent demands of deep learning applications, synthetic data have the potential to become a vital component in the training pipeline. As a result, a multitude of image synthesis methods now exist which can be implemented with or involved in particular brain-computer interfacing contexts of the present inventions. Further, various of these image synthesis methods have not been applied to brain data previously, though due, e.g., to the capability to transform the data into conventional computational forms such as matrices and vectors in some cases, such image synthesis methods are applicable in the instant brain-computer interfacing contexts and would represent novel usage.

[74] Similarly, while the present embodiment specifies using a variational autoencoder 1203 to encode the data streams, 1401 and 1404, other encoders or comparable hardware and/or software can be used to encode the data received from the BCI 1201 and VR headset 1202 eye tracking hardware and software components. Among other options, for example, in the place of a variational autoencoder, a generative adversarial network (GAN) may be utilized to synthesize image data directly from the eye-tracking data and then use another GAN to synthesize the brain data derived saliency map consistent with the inventions described herein. In still other embodiments, a diffusion method and/or a transformer may also be utilized.

[75] In addition, there are a number of data augmentation techniques which can subsequently be applied to the synthetic image data to enlarge the dataset and potentially improve the accuracy of the discriminator, including but not limited to flips, translations, scale increases or decreases, rotations, crops, addition of Gaussian noise, and use of conditional GANs to alter the style of the generated image.

[76] In still other implementations, brain data other than EEG may be utilized consistent with systems and methods of the disclosed technology, e.g., to provide similar or comparable functionality and/or similar results. Examples of other brain data that may be utilized, here, include NIRS, FOS, and combined EEG.

[77] Figure 8 depicts an exemplary process flow associated with processing compressed images from the latent space as well as creation of a generator network and/or discriminator network for comparison of EEG generated versus eye tracking actual visual saliency maps, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring to the exemplary process flow or map shown in Figure 8, EEG brain data images 1501 are generated from the raw brain data 1401 features obtained (as encoded and compressed by the variational autoencoder 1203, in some embodiments), based on the spatial location of the features, optodes, or electrodes on the user’s head. Consistent with the innovations herein, autoencoding of the saliency map from the brain data may be achieved, inter alia, via a four-part process:

1. Constructing Images from raw brain data via spatial location of features on the user’s head

[78] According to aspects of the disclosed technology, generation of such brain data images 1501 may be accomplished by creating an array of data from multiple trials (e.g., via creation of a visual or other stimulus 1206 by the VR headset 1202, shown in Figures 5A-5B), then organizing each trial data set by electrode number, and by time. The trial data sets may be organized in other ways, as well, consistent with the innovations herein. According to certain systems and methods herein, the data may be additionally processed to obtain the desired/pertinent signals. In some embodiments, for example, a low-pass fdter and/or down sampling can be applied to the data, such as through the variational autoencoder and/or other computing components, to extract just the pertinent signals, while excluding the artifacts or unwanted data. Further, the data may preprocessed using the computing components to remove the remaining noise and artifacts, or unwanted data. The data may then be represented on a two- dimensional map, such as via utilization of an azimuthal projection calculation. Next, in some implementations, the data may be represented as a continuous stream of data through bicubic interpolation, and this process may be repeated for all data samples collected by trial and separated by electrode location. In accordance with such aspects, the data stream created through bicubic interpolation and subsequent projection onto a two-dimensional map through azimuthal projection is then concatenated from each sample to produce an image with the number of channels corresponding to the number of temporal samples after down-sampling the signals. Finally, a process or step of normalizing the data may then be performed, e.g., after such image generation and down-sampling is performed.

[79] Note that the above-described features represent only one exemplary methodology regarding taking and processing the raw data from the BCI 1401 and representing such data in a two-dimensional map. Other methods may be used to represent the raw data in a two- dimensional map and may be utilized to achieve the desired result(s), here, consistent with the disclosed technology. Among other things, e.g., calculations other than a bicubic interpolation may be used to interpolate the data, projections other than an azimuth projection may be used to map the signal data, and/or fdtering and down sampling are ways to exclude unwanted data though are not necessary to achieve the result(s) or they may be achieved in a consonant way.

[80] In certain embodiments herein, a random signal following a Gaussian distribution of zero mean or average and a standard deviation of 0.25 can be added to the filtering and image creation model to increase the model’s stability and to better filter between noise and EEG signals. Further, according to some implementations, the use of an image format leads to better results when using convolutional networks than when using a simple array representation of the brain data, but the use of simple array representations of brain data may still be utilized, in various instances, to achieve desired results consistent with the innovations herein.

2, Processing EEG Images (e g., through a VAE) to Represent in Sub Space

[81] As can be seen, in part, in Figure 7B, a process of utilizing or involving a variational autoencoder 1203 to encode and filter eye tracking data 1404 may be performed to create a saliency map 1405 represented in the latent space 1407 derived from the eye-tracking data 1404. According to implementations, here, the raw eye-tracking data 1404 may first be converted to a saliency map 1405 of the user's attention using existing methods (e.g., while watching something, one can extract saliency features representing the degree of attention and average position of the center of interest in the video). Then, a variational autoencoder 1203 may be applied to recreate the saliency images or map representations of the images 1406 in latent space 1407. For example, using e.g. a raw eye tracker in some instances, one can create a visual saliency map representing the area of attention in an image of one channel with values between 0 and 1 representing the degree of visual attention on specific pixels and their neighbors. As such, the data and corresponding value between 0 and 1 can also be considered as a probability for a given pixel to be watched or not. Accordingly, visual saliency images 1405 as representations 1406 in the latent space 1407 are thus generated.

3, Generation of Visual Saliency Images

[82] Next, various exemplary aspects of illustrative embodiments are described. Here, for example, in some implementations, the VR eye tracking and EEG data and resulting two- dimensional maps may be generated and/or recorded simultaneously. Further, discrete VR eye tracking measurements may be projected on two-dimensional images (e.g., one per trial). According to certain aspects, accuracy may be taken into account using circles of radius proportional to error rate. Further, in some instances, Gaussian fdtering may be applied with kernel size corresponding to the eye-tracker field of view, to improve output/results.

4, Representing the Images in Lower Sub Space

[83] Once the saliency images have been generated, the images may be represented in a lower sub space. Here, for example, in some embodiments, a variational autoencoder may be trained to represent the images in a lower sub space. In one illustrative embodiment, for example, a ResNet architecture may be utilized, though in other embodiments similar software and/or other programming may be used. Here, e.g., in this illustrative implementation, for the encoding section, four (4) stacks of ResNet may be used, each with 3 conversion layers and batch norm separated by a max pooling operation, though other quantities of such elements may be utilized in other embodiments. Further, in such illustrative implementations, in a decoding section, the same architecture may be utilized though with an up-sampling layer instead of a max-pooling operation. Moreover, regardless of the exact architecture, an objective of such autoencoding saliency map network is to recreate an image as close as possible to the original saliency map with a representation in shorter latent space. Additionally, in some embodiments, the latent space should be continuous, without favoring one dimension over another. Finally, in one or more further/optional embodiments, data augmentation techniques may be implemented or applied to avoid overfitting of the variational autoencoder.

[84] Figure 8 depicts an exemplary flow diagram associated with processing compressed images from the latent space as well as creation of a generator/discriminator network for comparison of EEG generated versus eye tracking actual visual saliency maps, consistent with various exemplary aspects of one or more implementations of the disclosed technology. As shown in part via the process map of Figure 8, in some embodiments, e.g., after the generation of both the EEG two-dimensional map 1403 and the eye tracking data visual saliency map 1406, a generator/discriminator adversarial network (GAN) 1502 for producing the brain data derived saliency map 1503 may be implemented.

[85] Figure 8 illustrates an exemplary GAN 1500, which combines the EEG/brain data latent space with the saliency latent space derived from two VAEs, e.g., such as described above. However, here it is also noted that there are various other possible approaches to map the latent spaces consistent with the disclosed technology. Here, according to the disclosed technology, a goal is to map the 2 distributions (e.g., the map from the eye-tracking and the map from the EEG signals in the illustrated example). Consistent with certain embodiments, an aim is to create a model permitting the estimation of a saliency map from EEG without considering a 1 : 1 correspondence between modalities.

[86] Referring to the example embodiment of Figure 8, such GAN implementation may achieve the desired objective(s) via use of various aspects and features. For example, implementations herein may utilize a generator or generator network 1502 to recreate the image latent representation from EEG latent representation. In the example generator model shown and described, here, the generator may be created/implemented by concatenating the two parts of the VAE and linking them with fully connected layers, e.g., CNN (convolutional neural network) layers, etc. Further, a discriminator or discriminator network 1504 may be utilized to distinguish the images derived from the generator (e.g., in this illustration, as generated from the EEG representation which was in turn generated by the encoding and decoding part of the VAE) and those that are just derived from the eyetracking VAE. According to some embodiments, noise following a normal-centered distribution may be concatenated to the latent vector at the center of the generator. Overall, in the example shown in Figure 8, the generator 1502 may perform the concatenation of the encoding part of the EEG VAE and decoding part of saliency VAE through a generator composed of fully connected layers.

[87] Additionally, as shown in Figure 8, a discriminator 1504 is then placed at the end of the model. Here, for example, such discriminator 1504 may then process the output(s) of the generator 1502 and discern whether the saliency maps are derived from the model (synthetic) or from the real-time eye-tracking recordation.

[88] Importantly, it is also noted that other methods besides adversarial methods or processing (i.e., other than GAN, etc.) can be utilized in order to produce a saliency map just from the brain data. Examples of such other methods include, though are not limited to, transformer architectures and/or diffusion models (e.g., denoising diffusion model s/score-based generative models, etc.) and other implementations that work in this context by adding Gaussian noise to the eye-tracking derived saliency map (the input data), repeatedly, and then performing learning to get the original data back by reversing the process of adding the Gaussian noise.

[89] Figure 9 depicts an exemplary user interface generated by a VR headset, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring to Figure 9, an exemplary user interface is depicted, e g., as generated by the system/component(s) herein such as a VR or other headset, consistent with various exemplary aspects of one or more implementations of the disclosed technology. As shown in Figure 9, an example user interface 1601, which may be two-dimensional (2D) or three-dimensional (3D) may be generated and employed to represent a ‘heat spot’ image indicating the locus of attention of the user and therefore which element the user would like to select in the environment.

[90] Figure 10 depicts an illustrative flow diagram detailing one exemplary process of using an artificial reality (e.g., VR or other) headset in combination with a BCI to create and compare a ground truth visual saliency map (e.g., developed from eye tracking) with a BCI-EEG generated visual saliency map, i.e., to update the performance of the generator adversarial network, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring to Figure 10, such overall, exemplary approach may be utilized to estimate visual attention of a user directly from EEG data taken from a BCI, while simultaneously recording data from headset eye tracking hardware and software components to create a visual saliency map to update and refine the EEG measurement data. Here, then, one illustrative process utilized to estimate visual attention directly from brain data, may include steps such as: creating images representing the features from brain data according to their spatial location on the participant scalp, at 1702; encoding of the brain data derived images using, at 1704, which may be performed, e.g., via a variational autoencoder (example above) or similar equipment or components; performing distribution mapping of the brain data latent space to the original/ground truth/eye tracking derived saliency map latent space, at 1706; decoding the distribution map to estimate the saliency map derived from the brain data signals alone, at 1708; and performing discrimination processing between the saliency map generated from brain data signals and the ground truth saliency map (e.g., developed from eye-tracking), at 1710. Consistent with such process, then, a loss function may be derived, which is then used to update the performance of the generator network. Accordingly, such processing provides for more accurately estimating visual attention directly from the brain data, such as by more accurately mapping the user’s locus of attention using EEG signals detected through the BCI alone. Further, in other embodiments, such processing and modeling may also be utilized for other estimations from brain data. Here, by way of one example, a quantified map of a user’s body movement may be derived from brain data, such as by first using a 3D body movement tracker or sensor, taking the place of the eye-tracker used in this visual attention context, and then generating an appropriate saliency map from the brain data associated with the brain activity derived from, for example, the user’s premotor and motor cortices. Further, such ‘motor saliency map’ may then be utilized, to, e.g., predict the imagined and intended movement of a user purely from the brain data recorded from the premotor cortex, in the absence of tracking the user’s body movements directly with an external body tracker. In another such example, a quantified emotional state of a user may be derived, such as by first using a combination of signals derived from external signals such as eye movements, heart rate, body movement, sweat sensors, glucose sensors, etc., to derive a quantified map of a user’s emotional state by weighting the various external signals according to a graph or scale of human emotions and the corresponding bodily signals. Based on this, an appropriate saliency map from brain data from a across the user’s neocortex may then be generated via a similar method to that described in the visual attention context, by having the machine learning network learn to generate the equivalent saliency map from the brain data alone. This ‘emotional saliency map’ may then be utilized to predict a user’s emotional state automatically using, e.g., the presently described brain-computer interface headwear or equivalent. Visual Attention Tracking with Eye Tracking and Utilization of BCI + Artificial Reality Saliency Mapping to Generate a Brain-Derived Selection (e.g.. click, etc.)

1. Participant Wears a BCI Headgear with Eye Tracking and/or Artificial Reality Headset

[91] With regard to initial aspects of capturing and/or gathering various BCI and/or eyetracking information from a user, here, the technology illustrated and described above in connection with Figure 5A through Figure 8 may be utilized in some embodiments, though various inventions herein are not limited thereto. Namely, it is noted that, in the context of the inventions disclosed here, various other types of technologies, such as differing BCI and/or eyetracking devices and their associated signals, may also be utilized in implementations consistent with the presently-disclosed technology, such as the alternative examples of devices (e.g., BCI and/or eye-tracking devices, etc.) set forth herein. a. Electrode/Optode Arrangement with the Artificial Reality Headset

[92] With regard to the electrode/optode arrangements, embodiments herein that involve capturing and/or gathering various BCI information from a user along with capture of eyetracking information via use of an XR, VR, etc. headset may utilize the electrode/optode arrangement and related aspects set forth and described above in connection with Figure 6A. Similarly, though, as above, it is noted that, in the context of these innovations, various other types of devices and technologies, such as differing BCI and/or eye-tracking devices and their associated signals, may also be utilized in such implementations, including the alternative examples of devices (e g., BCI and/or eye-tracking devices, etc.) set forth herein. a. Electrode/Optode Arrangement with No Artificial Reality Headset

[93] With regard to the electrode/optode arrangements initial aspects of capturing and/or gathering various BCI information from a user without any corresponding use of such XR, VR, etc. headset, the electrode/optode arrangement and related aspects set forth and described above may be utilized. In some embodiments, for example, such BCI electrode/optode configurations may be utilized for interactions with conventional 2D computer screens, phones, tablets, other screens, and the like. Further, as above, it is noted that, in the context of these innovations, various other types of devices and technologies, such as differing BCI devices and their associated signals, may also be utilized in such implementations, the alternative examples of devices (e.g., BCI, etc. devices) set forth herein. Decoding/Processing Imagined Words

[94] Systems, methods and wearable devices associated with mind/brain-computer interfaces are disclosed. Embodiments herein include features related to one or more of optical-based brain signal acquisition, decoding modalities, encoding modalities, brain-computer interfacing, and augmented reality (AR)/virtual reality (VR) content interaction, among other features set forth herein. Certain implementations may include or involve processes of collecting and processing brain activity data, such as those associated with the use of a brain-computer interface that enables, for example, decoding and/or encoding a user’s brain functioning, neural activities, and/or activity patterns associated with thoughts, including sensory-based thoughts. Further, the present systems and methods may be configured to leverage brain-computer interface and/or non-invasive wearable device aspects to provide enhanced user interactions for next-generation wearable devices, controllers, and/or other computing components based on the human thoughts, brain signals, and/or mind activity that are detected and processed.

Example Headband With Optional Eye-Tracking

[95] As illustrated in the example system 100 in FIG. 11A, a user 105 may be wearing both a headband brain-interface 110 (e.g., a non-invasive EEG or NIRS or MEG or fMRI system, etc.) as well as an eye-tracking system 125 in the form of, e.g., glasses, eye-tracking embedded within an AR or VR headset, an eye-tracking additional bar, built in front-facing cameras on a smartphone or laptop, etc. The eye-tracking system may register what the user is looking at, for example the UI icon 122 in a UI 115. UI icon 122, when looked at, may have some form of UI visual feedback such as a subtle color change or dotted circle. In one example, user 105 may imagine a command word 130 from a selection of trained command words. For example, user 105 may imagine the word “open” and the brain-interface decodes the pattern of brain activity associated with the imagined speech of the word “open” and the machine learning algorithm classifies what the user is thinking as “open.” When the brain-interface registers the user thinking open, this command may be directly sent through to the UI wirelessly or via wired link and the command may be immediately actioned.

[96] In some embodiments, as shown in FIG. 1 IB in system 150, there may be a second check in UI 115 to confirm that the user does indeed wish to open the application as opposed to another command such as close or delete. One form of this second check may be visual feedback 152. In an example implementation, here, when the imagined speech word “open” is decoded and classified from the brain-interface, it may be sent to the UI controller and, in some embodiments, an aesthetically pleasing visual popup may appear, e.g., with the word “open” and a question mark, or similar function. The user may then confirm that is their choice by imagining a second confirmatory command word 160 such as the word “yes,” or alternatively by looking at the popup with the eye-tracking, or alternatively, using a solution such as an e-wave detected by headband 110.

[97] Note that there are a multitude of implementations for this combination of imagined speech and visual feedback. In another example, a user may think of the word “new” and multiple options may appear on the screen - Document? Tab? Sentence? - and the user may then select the one of interest rapidly. The imagined speech commands in such embodiments may act as effective keyboard shortcuts.

[98] Other implementations in the absence of eye-tracking may involve interaction with present and future artificial intelligence (Al) assistance that include natural language processing. Here, for example, using a selection of word commands (or continuous phrases), the user may interact with an Al assistant seamlessly without needing to type or touch-type with their fingers and thumbs on a smartphone. Using just binary commands such as “yes/no’, a user may play 20 questions with the assistant or use the simple word commands in addition to typing longer-form responses to more swiftly end, close, start the chat, and so forth.

Example Experiment Design

[99] In one example experimental design involving training/trials, during each trial, the user’s brain activity is recorded throughout the trial and sessions. For example, as illustrated in design 200 in FIG. 12, a word cue 210 may be displayed on screen for two seconds e.g. “open,” a user may be tasked to visually focus on a cross 212 for one second, then use overt speech 214 for two seconds (e.g., user says the word out loud), then resume visual focus on a cross 216 for one second, then use imagined speech 218 for two seconds (e.g., user imagines saying the word). In some examples, there may be an optional participant subjective assessment on their performance, such as asking them to rate of 1-5 on how well they imagined the word. In some experimental designs, there may be a separation between trials of a few seconds and/or multiple trials per word (e g., 10-minute sessions for each word). Some word examples may include, without limitation, Open, Close, Zoom, New, Send, and/or Message.

[100] There are alternate variations of this data collection procedure that may be completed, for example with a three second fixation after the imagined speech to “clear the afterimage” of the first word before moving on. Any number of trials per word may be recorded, for example 25 trials per word for a total of 250 trials when training a 10-word batch.

[101] Importantly, different words may be chosen and trained for different applications. For example, the brain-interface may be used in a defense application and different words could be used for controlling a drone, including “fire,” “navigate,” “descend,” etc.

Example Sensor Configuration

[102] The systems described herein may use various different headset setups in different implementations. In some embodiments, the systems described herein may apply an attentional- deep network to the imagined speech data, meaning that it will automatically look for the most salient information in the brain data, even without optimal placement of sensor channels.

[103] FIG. 13 illustrates one example recording setup using 28 channels across occipital and temporal-parietal regions to cover classical language network and others which are particularly key regions for language/semantic decoding. According to one illustrative implementation, the 28 channels utilized may comprise the following, as also shown by way of example in FIG. 13: (i) FC6, FT8; (ii) C6, T8; (iii) CP6, TP8; (iv) P6, P8, P10; (v) PO4, PO8; (vi) 02; (vii) Oz; (viii) CMS, POz, DRL; (ix) 01; (x) PO7, PO3; (xi) P5, P7, P9; (xii) CP5, TP7; (xiii) C5, T7; (xiv) FC5, FT7

[104] Another example setup is to simply use the classical 10-20 arrangement as shown in FIG. 13 without optimization of channels on the hardware front, thereby relying on automated channel selection methodologies or the attentional mechanisms described below. For example, a large number of EEG channels may be used, such as 58 placed them on the subject’s scalp according to the international 10/20 system. EOG signals may also be recorded by placing electrodes around the eyes, to understand the impact of user eye movements on the data quality. Reference electrodes may be chosen, such as FCz and FPz channels for reference and ground respectively.

[105] Different signal processing pipelines may be used, dependent on whether EEG or NIRS or another modality is used. In some implementations, a pipeline may be used for an EEG setup with an example sampling rate 2048Hz with a 24-bit ADC and/or less than 1000 ohms impedance for each electrode. For example, a pre-processing step may include (i) bandpass fdter to remove low frequency below 0.01Hz and high frequency noise above 250Hz, (ii) FIR notch filter in the range of 48-52Hz to remove the 50Hz line noise harmonic, (iii) eye blink artefacts removed using independent component analysis, and/or (iv) recorded data epoched into trials.

[106] In some examples, the most likely activated brain regions may include the left/right superior temporal gyrus, Wernicke’s area (left posterior superior temporal gyrus), and/or the posterior frontal lobe. Example visually discernible activation on time-frequency analysis may include early positive deflection in the left temporal region, which represents the speech comprehension and/or higher positive deflection at the right temporal region during imagined speech production.

[107] In one implementation, the systems described herein may perform feature extraction by having recorded signals downsampled to 512Hz and decomposed using db4 mother wavelet to eight levels corresponding to different frequency bands and/or PCA applied to reduce dimensionality and identify the components with maximum variance.

Classification of the appropriate word using machine learning

[108] Classification may be carried out with multiple algorithms, including but not limited to SVM, Multi-layered CNN, and/or attention based deep networks. Attention based deep networks may improve results for imagined speech classification by focusing on important features and suppressing unimportant features. Different versions of attention based deep networks may be instantiated to classify the imagined speech commands from brain activity.

[109] For example, as illustrated in FIG. 14, a system 400 may enable user 105 to imagine a command 405 that is interpreted by headband 110. At a data acquisition step 410, the systems described herein may measure brain activity related to command 405. Next, at a signal processing step 415, the systems described herein may process this brain activity as described in further detail below. At a classification step 420, the systems described herein may determine which option of a preset set of options 430 matches command 405.

[110] In some implementations, it is possible to decode imagined speech using EEGNet with an additional attention module. EEGNet is a deep network used for decoding EEG data, which is able to use a smaller number of parameters by leveraging depthwise separable convolutional blocks. Depthwise Separable Convolution is a technique used in convolutional neural networks to reduce the number of parameters and computational cost of a convolutional layer while maintaining its effectiveness in feature extraction. A Depthwise Separable Convolution block consists of two separate operations: a depthwise convolution and a pointwise convolution. The depthwise convolution applies a single filter to each input channel separately. This operation reduces the spatial dimensions of the feature maps while preserving the number of input channels. The pointwise convolution, on the other hand, applies a 1x1 convolution to the output of the depthwise convolution. This operation combines the feature maps from the depthwise convolution and reduces the number of input channels, allowing for a more efficient use of the following convolutional layer.

[111] The EEGNet network consists of temporal convolution and spatial depth-wise convolution blocks, followed by the depth-wise separable convolutional block as shown in FIG. 15. Each convolutional block uses batchnorm and dropout, along with an ELU function for the activation function. ELU stands for Exponential Linear Unit, and it is a type of activation function used in artificial neural networks. The ELU function is defined as follows: f(x) = { x, x >= 0 { alpha * (exp(x) - 1 ), x < 0

[112] where alpha is a hyperparameter that controls the value of the function for negative inputs. Typically, alpha is set to a small positive value, such as 0.1. The ELU function is similar to the ReLU (Rectified Linear Unit) function, which sets all negative inputs to zero. However, the ELU function has a smooth curve for negative inputs, which may be beneficial for gradientbased optimization methods. The ELU function has been shown to improve the performance of neural networks in some cases, particularly for deep networks with many layers. It may help to prevent the “vanishing gradient” problem that may occur in deep networks when the gradient becomes very small, making it difficult to update the weights.

[113] The addition here is to add an attention module prior to the EEGNet network, to allow for more important features with respect to the imagined speech to be “highlighted” relative to the less salient features, this in simple terms is a multi-head attention module plus multi-layered convolutional network (in this case EEGNet or similar) for imagined speech-based brain data, along with pre-layer normalization to help train the model with stability. Note that although pre- processed input data is noted in the diagram (and may follow the signal pre-processing steps given earlier), it is also feasible to use raw input data.

The addition operation [1 14] In some embodiments, the output of the multi-head attention module is added to the input data through a residual connection. This is done to preserve the original input data and allow the network to learn residual features that capture the difference between the input and output of the attention module.

[115] According to implementations, here, for example, the residual connection allows the output of the attention module to be added to the original input data before being passed through the feed-forward network. This means that the feed-forward network may learn to focus on the residual features of the input data that are not captured by the attention module, while still preserving the original input data.

[116] The use of residual connections has been shown to improve the performance of deep neural networks by allowing the network to learn the identity function, which may help to prevent the vanishing or exploding gradients problem that may occur during training. Additionally, by preserving the original input data, residual connections may also help to improve the interpretability of the network by allowing researchers to analyze the contribution of different parts of the network to the final output.

[117] Overall, the addition of the output of the multi-head attention module to the input data through a residual connection is an important technique used in transformer-based neural networks to improve the stability and interpretability of the network.

[118] In one example, as illustrated in FIG. 16, a method 600 may receive pre-processed data 610 and perform an addition operation 620 as well as using a Pre-Layer Normalization (Pre-LN) operation 630 that feeds into multi-head attention 640 which also feeds into addition operation 620. In this example, the systems described herein may perform Pre-LN 650 that feeds into a feed forward network 670 as well as addition operation 660 to arrive at output 680.

Pre-layer normalization

[119] Pre-Layer Normalization (Pre-LN) architecture refers to a neural network architecture in which the normalization operation is applied before the non-linear activation function in each layer of the network. In contrast, the more common practice is to apply normalization after the non-linear activation function, known as Post-Layer Normalization (Post-LN).

[120] In Pre-LN architecture, the input to each layer is first normalized using techniques such as Batch Normalization (BN) or Layer Normalization (LN). Then, the normalized input is passed through the non-linear activation function. The normalized input helps to mitigate the vanishing gradient problem and helps to ensure that each layer's input has a similar scale, making training more stable.

[121] The vanishing gradient problem is a common issue that occurs during the training of deep neural networks, especially those with many layers. It occurs when the gradients used to update the weights of the network during backpropagation become very small, making it difficult or impossible for the network to learn and converge to a good solution.

[122] During backpropagation, the gradients are calculated by multiplying the error of the output layer by the derivative of the activation function of each layer. In a deep network, these gradients may become very small as they are multiplied by the derivative of the activation function of each layer, which is typically less than one. As a result, the gradients may quickly approach zero, making it difficult for the network to learn.

[123] The vanishing gradient problem is especially pronounced in recurrent neural networks (RNNs), which are often used in natural language processing and other sequential data tasks. In RNNs, the gradients may become very small as they are backpropagated through time, leading to difficulty in learning long-term dependencies.

[124] Various techniques have been developed to address the vanishing gradient problem, including the use of different activation functions, gradient clipping, and normalization techniques such as Batch Normalization and Layer Normalization. Additionally, newer architectures such as the Transformer architecture in natural language processing have been developed that may mitigate the vanishing gradient problem.

[125] One advantage of the Pre-LN architecture is that it may reduce the internal covariate shift problem. Internal covariate shift refers to the change in the distribution of input values to a layer caused by the changing parameters of previous layers during training. By normalizing the input, the effect of the parameter changes on the distribution of the input is reduced, leading to faster and more stable training.

[126] Overall, Pre-LN architecture has been found to improve the performance of various deep neural networks, especially in natural language processing tasks.

[127] In one example, a standard process may be described as Input — > Norm — > Linear — > Activation -> Norm -> Linear -> Activation — > ... — > Norm — > Linear -> Output. However, a multi-head attention module may be described as Query — > Linear — > Split — > Attention — > Concatenate — > Linear — > Output.

[128] In this architecture, the input sequence is first transformed using a linear layer, and then split into multiple “heads.” Each head performs an attention operation, which calculates the importance of each element in the input sequence relative to the other elements in that head. The outputs of each head are then concatenated and passed through another linear layer to generate the final output. This architecture is particularly useful for capturing long-range dependencies and dealing with sequential data.

[129] Multi-head attention is a mechanism used in the transformer architecture, which is a neural network architecture used in natural language processing tasks such as machine translation and language modelling. Multi-head attention allows the model to focus on different parts of the input sequence simultaneously, enabling it to capture more complex relationships and dependencies between the input and output.

[130] An illustrative multi-head attention mechanism may be implemented by splitting the input into multiple “heads” or subspaces, each of which is processed separately. Each head performs an attention operation, which calculates the importance of each element in the input sequence relative to the other elements in that head. The outputs of each head are then concatenated and passed through a linear projection layer to generate the final output.

[131] The benefits of multi -head attention are that it allows the model to attend to multiple positions in the input sequence at once, which is particularly useful for capturing long-range dependencies and dealing with sequential data. Additionally, it allows the model to capture different types of information at different scales, which may improve the model's performance on complex tasks.

[132] Multi-head attention, here, is effective in a range of natural language processing tasks, including machine translation, language modelling, and question answering. It may also be applied to other types of data, such as images and music, etc.

[133] In the context of deep learning and natural language processing, an attention operation is a mechanism that allows a model to selectively focus on different parts of an input sequence, based on their relevance to the task at hand. It works by calculating an attention score for each element in the input sequence, which reflects its importance relative to the other elements. [134] There are different types of attention mechanisms, but one common form is called dotproduct attention, which operates as follows: (1) Compute a query vector and a set of key vectors for each element in the input sequence. (2) Calculate the dot product between the query vector and each key vector, and scale the results by the square root of the dimension of the key vectors. (3) Apply a softmax function to the scaled dot products, to obtain an attention distribution over the keys. This gives a weight to each key element, indicating its relative importance for the query. (4) Multiply each value vector by its corresponding attention weight and sum the results, to obtain a weighted sum of the value vectors.

[135] The result of the attention operation is a new vector, which summarizes the input sequence based on the relevance of its elements to the query. This vector may then be used as input to a subsequent layer in the neural network.

[136] The attention operation allows the model to selectively attend to different parts of the input sequence, based on their relevance to the task at hand. This may be particularly useful in natural language processing tasks such as machine translation and summarization, where the model needs to focus on specific parts of the input sequence to generate an accurate output.

[137] FIG. 17 illustrates an additional attention-based network with the goal of improving imagined speech classification accuracy with good stability. As illustrated in FIG. 17, a network 700 may receive raw brain data 710, process this raw brain data via an attention mechanism 720 that includes change attention 722 and/or temporal attention 724, send the processed data to a classification network 730 that includes various networks (e.g., N1 732, N2 734, and/or N3 736), and produce word classification output 740. This implementation may include both temporal and channel attention mechanisms. This means that from a hardware point of view, the algorithm may adjust to different arrangements of electrodes and optodes.

[138] The algorithm follows a similar overall plan to the algorithm above, with an attention mechanism module to first process the raw input data followed by a network architecture to carry out the classification of the imagined speech word. In this case, pre-processing is not used. However, this architecture has two types of attention mechanism built into the attention module - a temporal attention mechanism to identify the time slices of the brain data which are imagined speech/word related and a channel attention mechanism to focus on the channels of information which are imagined speech/word related. [139] The output of the temporal attention mechanism are slices of the raw brain data which are most related to the imagined speech task. The time slices of the raw brain data are fed as input to network N2 in the classification network module. The output of the channel attention mechanism is information that may be used to update the weights in the network Nl. This means that N1 receives as input both the raw brain data and also information to update its weights. This information is regularization information. The output of Nl is a feature map representing the entirety of the brain data inputted, and modulated by the channel attention mechanism. The output of N2 is a feature map representing just the most salient time slices, not the whole of the raw brain data. The outputs of Nl and N2 are combined into a combined feature map. The combined feature map represents the input to N3. N3 performs the final classification to determine the imagined word, which is the word classification output.

[140] In some embodiments, a temporal attention mechanism may include a signal preprocessing method to identify the imagined speech related time slices from the raw brain data. The result is that the N2 network may focus on the imagined speech related information, meaning that the overall architecture may better focus on the salient information from the task, improving accuracy by removing noise /irrelevant information.

[141] This method works by taking in both the raw brain data as input and also leveraging an initial pass output of network N3 (prior to having had the output of N2 determined, thereby just being based on the entirety of the brain data). The output of N3 is a probability distribution over the possible word class members, which is generated by first processing the entire time series data to generate a global feature map from Nl, which is a set of vectors that represent the data at different time slices. Then, the global feature map is used to make a preliminary classification of the data by N3.

[142] Next, a temporal salience map is calculated based on the preliminary classification using the entropy of the classification distribution. The salience map represents the importance of each time slice in the data.

[143] Using the salience map, the temporal attention mechanism selects a small number of time slices with the highest salience scores as the task-related time slices. These time slices are the most important ones for the algorithm to focus on, and the algorithm may perform fine-grained processing on them for better performance. [144] The entropy is a measure of the amount of uncertainty or randomness in a probability distribution. In the context of the temporal attention mechanism, the entropy is calculated based on the preliminary classification distribution over the possible classes.

[145] The entropy of the preliminary classification distribution is calculated using the following formula:

H(y) = - sum(p(y_i) * log(p(y_i)))

[146] where H(y) is the entropy of the distribution, p(y_i) is the probability of the i-th class given the input data, and log is the natural logarithm.

[147] This formula calculates the entropy value for each time slice in the input data, which is used to create a temporal salience map. Specifically, the temporal salience level of each time slice is determined by the gradient of the entropy with respect to that time slice's corresponding vector in the global feature map.

[148] This means that the time slices that have the largest gradients of the entropy (i.e., the highest level of uncertainty or randomness in the classification distribution) are considered the most important ones and are selected as the task-related time slices. These task-related time slices are then used for fine-grained processing, by being passed through as the input to N2.

[149] In some embodiments, the systems described herein may employ the channel attention mechanism to adaptively adjust the attention paid to different electrode channels. In some embodiments, this may include a regularization method to adjust the weight coefficients in the network architecture, which are channel-related. Regularization methods may adjust the weight coefficients in the network architecture by adding an additional term to the loss function that penalizes large weights. The regularization term is typically a function of the weights, and its value increases as the weights get larger.

[150] There are several types of regularization methods, each of which adjusts the weights in a different way: a. LI regularization: This method adds a penalty term to the loss function that is proportional to the absolute value of the weights. This has the effect of forcing some of the weights to become zero, which may simplify the model and improve its interpretability. b. L2 regularization: This method adds a penalty term to the loss function that is proportional to the square of the weights. This has the effect of shrinking the weights towards zero, which may help to prevent overfitting and improve the generalization performance of the model. c. Dropout regularization: This method randomly drops out some of the neurons in the network during training, which has the effect of reducing the interdependence between the neurons and preventing overfitting. d. Data augmentation: This method generates additional training data by applying random transformations to the existing data, which has the effect of making the model more robust and generalizable.

[151] In each of these methods, the additional term is added to the loss function and the weights are adjusted during backpropagation. By adjusting the weights in this way, regularization methods may prevent overfitting and improve the generalization performance of the model.

[152] According to aspects of the disclosed technology, the channel attention mechanism may adjust the attention paid to different electrode/optode channels to account for nonuniform information distribution among channels. The signals from different channels contain varying levels of imagined speech task information, and the channel attention mechanism explores the salience levels of different channels to impose a well-designed regularization term on the training loss. This regularization term is applied to weight coefficients associated with each channel to properly weight the signals from different channels, guiding the model to focus more on the task-related channels for improved classification accuracy.

[153] In some embodiments, to calculate the salience level of each channel, wavelet packet transform to decompose the EEG signal into sub-band signals may be used, and the wavelet packet sub-band energy ratio (WPSER) is used to quantify the ratio of the energy within the frequency range most relevant to imagined speech, such as within the gamma frequency range between 25 and 140Hz in the case of EEG, to the signal's total energy.

[154] The WPSER is then used as a measure of the salience level of task-related information for each electrode channel. The channel salience map is based on the WPSER of all electrode channels, which is used to adjust the weight coefficients of the spatial convolutional layer in the global sub-network.

[155] A regularize is added to the loss function to regularize the learning of each weight coefficient in the spatial convolutional layer, which is guided by the channel salience map calculated before the training phase and remains fixed. By minimizing this regularize, the channel with more imagined speech information will have a larger weight coefficient in the spatial convolutional layer of the model, resulting in improved classification accuracy.

[156] In other words, N1 ’ s weight coefficients are regularized by the channel attention mechanism, meaning that the channel with more imagined speech related information will have a greater representation in the spatial layer of N2.

[157] In some embodiments, the classification network module may consist of Nl, N2 and N3. The module itself may have as input both the raw brain data and the time slices of the brain data derived from the temporal attention module. Nl also may also have its weight coefficients adjusted by the regularization derived from the channel attention mechanism.

[158] Nl works on all of the raw brain data and may take different structures. One example is that it may have a spatial convolutional layer (e.g., performing convolution over space to extract spatial features) followed by a temporal convolutional layer (e.g., performing convolution over time to extract temporal features). The filter sizes of these layers may be determined through experimentation, but a number around 60 provides good results for imagined speech classification.

[159] N2 works on the derived time slices from the temporal attention mechanism and may take the form of a spatial convolutional layer followed by multiple temporal convolutional layers with intervening mean pooling layers. Multiple temporal layers may be used here to enable more detailed processing of the time slices which have been determined as most pertinent/ salient for imagined speech.

[160] The outputs of Nl and N2 may be combined to act as the input to N3 which may take the form of a fully connected layer and a softmax layer to produce the final probability distribution for the classification of the imagined speech word.

Reducing False Positive Embodiments/Selections

[161] According to various exemplary aspects of the disclosed technology, wearable brain computer interfaces are implemented for hands-free navigation and interaction with devices including personal computers, smartphones, etc. and artificial reality (e.g., VR, etc.) systems and/or devices. As set forth herein, systems and methods are presented involving brain interfaces which reduce false positives when a user is intending to select elements on a UI. Aspects, here, may involve eye-tracking and detecting a secondary response from the brain. Other aspects may be implemented by enabling control signals, such as a binary control signal, which can be used for ‘clicking’ and ‘selecting’ any icon on a user interface just via determination and processing of user intentions to do so. Here, for example, aspects of the disclosed technology may relate to systems and methods for determining and/or processing user intentions such as via utilization of expectancy waves or E-waves.

[162] Consistent with the E-ware features and functionality set forth herein, the disclosed technology relates to systems and methods involving a BCI that allows a user to simply look at the (GUI) button etc. they want to press, i.e., instead of using a controller to navigate in VR or a mouse to interact with a computer by clicking. In the present disclosure, the intention to select is automatically decoded from the user’s brain signals without requiring a physical click, and to improve the accuracy and to overcome the false positives problem. Here, for example, systems and methods herein utilize an E-wave, i.e., an anticipation-related EEG component occurring in the occipital and parietal regions of the brain when an interaction with the interface is anticipated, to carry out the disclosed technology. As set forth in more detail, below, to decode the intention of the user, systems and methods herein record and stream brain signals in realtime, and include pre-processing of the data and/or classifying intentional vs. spontaneous gaze dwells using a secondary response from the brain that would indicate whether the user is happy or not. This secondary recorded response can serve as a confirmatory signal. According to various embodiments, decisions of such classifier may then be utilized to control the virtual reality or software user interface. In some implementations, when a user looks at a selectable or manipulatable object on the user interface, systems and methods herein may decide whether or not the user has likely determined to select the object, as indicated by the presence of the brain anticipatory signal, confirmatory processing and/or other features and functionality herein.

[163] Accordingly, systems and methods herein may therefore improve user experience, inter alia, by eliminating false positive selection caused by purely gaze-based interfaces. As set forth in more detail, below, aspects of the disclosed technology may include and/or involve a fusion of ‘gaze direction,’ highlighting objects and BCI. According to some embodiments, for example, a VR headset with built-in eye-tracking may be utilized to assess where the user is looking, and employ advanced neuroimaging technology and the detection of secondary response (emotional response) to assess what the user intends to do. Further, in other embodiments, systems and methods herein may also utilize other eyetrackers, e g., when working with phones, laptops, etc. Conceptually, aspects herein might be imagined with such ‘gaze direction’ feature being equivalent to the cursor of a computer and the decoding of brain signals as the ‘click’ operation of a mouse. Using independent modalities for these two operations addresses challenges in the development of eye-gaze interfaces, e.g., among other things, if everything a user looks at gets activated (even when the user is just exploring a scene or reading a description) it leads to frustration, discomfort and eye-strain. As disclosed herein, utilization of a BCI to decode if a user intends to select or manipulate a UI object based on secondary response (emotional response) offers a solution.

[164] Systems, methods and wearable devices associated with mind/brain-computer interfaces which reduce false positive selection are disclosed. Embodiments herein include features related to one or more of optical-based brain signal acquisition, decoding modalities, encoding modalities, brain-computer interfacing, AR/VR content interaction, signal to noise ratio enhancement, false positive selection reduction, and/or motion artefact reduction, among other features set forth herein. Certain implementations may include or involve processes of collecting and processing brain activity data, such as those associated with the use of a brain-computer interface that enables, for example, decoding and/or encoding a user’s brain functioning, neural activities, and/or activity patterns associated with thoughts, including sensory-based thoughts. Further, the present systems and methods may be configured to leverage brain-computer interface and/or non-invasive wearable device aspects to provide enhanced user interactions for next-generation wearable devices, controllers, and/or other computing components based on the human thoughts, brain signals, and/or mind activity that are detected and processed.

Advanced E-Wave Neural Interface for Intention Decoding

[165] As explained in detail, below, to decode the intention of the user, brain signals may be collected and streamed in real-time, followed by pre-processing of the data, and then classification to determine intentional vs. spontaneous gaze dwells using a secondary response (emotional response). The decisions made by the secondary response may then be used for control, such as to control a virtual reality interface. As such, when a user looks at a selectable or manipulatable object in VR for a given amount of time, the disclosed technology is able to decide whether the user is likely aiming to select the object, as indicated by the presence of the brain anticipatory signal (E-wave), or whether they are just exploring the surroundings without the intention to interact. [166] One way to reduce false positive selection is to leverage visual feedback from the computer system and use the visual feedback to evoke responses from the user which is detectable via the brain-interface to thereby produce a confirmatory signal that the user either does intend to select the UI element, or the user does not and was just gazing at it.

[167] E-waves are produced in anticipation of feedback. If providing visual feedback, i.e., highlighting the object, then a secondary response from the brain that would indicate whether the user is happy or not, might be detectable. This secondary recorded response can serve as a confirmatory signal. There are the reaction part shown in Figure 18 and the imagined part shown in Figure 19.

[168] Figure 18 depicts one exemplary workflow of using secondary response (reaction) as a confirmatory to determine whether or not a user intends to select an object. Said workflow is consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[169] Referring to Figure 18, to enable active interaction with users, an intermediate step of “highlighting” can be added before “selection” or “click.” During this intermediate phase the subject’s reaction 130 can be detected. This paradigm is similar to hovering a mouse cursor on icons, which would highlight them and then click using a mouse to select the desired icon. In response to the visual feedback, emotional reactions can be detected using EEG/fNTRS or FOS and error potentials can be detected.

[170] Figure 19 depicts one exemplary workflow of using secondary response (imagined speech) as a confirmatory to determine whether or not a user intends to select an object. Said workflow is consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[171] Referring to Figure 19, to enable active interaction with users, an intermediate step of “highlighting” can be added before “selection” or “click.” During this intermediate phase the subject’s imagined speech can be detected. This paradigm is similar to hovering a mouse cursor on icons, which would highlight them and then click using a mouse to select the desired icon. In response to the visual feedback, emotional reactions can be detected using EEG/fNTRS or FOS and error potentials can be detected. Yes/no imagined speech can be decoded after detecting E- waves. A cascade of E- wave detection followed by Yes/No discrimination will improve the accuracy beyond the level that can be achieved from each method individually. [172] This can be achieved by designing experiments in which the user is trained to imagine Yes or No after the object at which he/she is staring is highlighted. “Yes” and “No” words are different in concept and also the letters and vowels are pronounced and articulated differently.

[173] Figure 20 is an illustration of E-wave and emotional/error potential real-time decoders, consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring to Figure 20, to address any false positive selection (for example, an object is selected when the user had no intention of selecting it and was only looking at it), a decoding pipeline including both E-wave and Error-related evoked potentials (ErrP) is utilized. When the E-wave decoder suggests there is a selection intention from the user, an indicative visual response is presented to the user, and the ErrP decoder will decode any evoked responses for errors. If an error signal is detected, it would indicate that the selection was falsely decoded, and the system will not proceed with the selection action.

Neural Signal Processing

[174] Figure 21 depicts illustrative signals associated with a user’s brain signals (e.g., Ewave, selection condition, etc.) and a user’s gazing condition (e.g., no Ewave, etc.), showing an exploded view of a portion thereof illustrating potential error conditions (e.g., ErrP signal, etc.), consistent with various exemplary aspects of one or more implementations of the disclosed technology. Referring to the drawing, Figure 21 illustrates an E-wave (the green line in the color graph with potential decreasing steeply at -500 ms), a gazing condition signal (the purple line), and an ErrP signal (the dark black line along the top of the exploded view).

Machine Learning-Based Decoder

[175] Advanced machine learning techniques can pick up on patterns in brain signals that are not detectable by conventional statistical methods. Furthermore, they can deal with the great amount of parallel-streamed data with ease, eliminating the need for handcrafted model features. Consistent with aspects of the disclosed technology, intention decoding may be accomplished with machine learning-based decoders. These decoders may be trained on a large dataset collected from a range of individuals. Such training involves processing the recorded data, segmenting and realigning it, and optimizing the decoder parameters given the training data.

[176] According to various embodiments herein, the decoder(s) utilized may be based on deep neural networks (DNNs). One of the network architectures used as a base for some embodiments is called EEGNetV4, which is a convolutional neural network (CNN) based architecture optimized for EEG signals. While base components of such decoders are provided by as packaged products by known vendors (e g., Braindecode, using PyTorch and Skorch, etc.), additional features have been developed for the model to use the datasets disclosed herein. For certain embodiments, other DNNs provide better decoding performance, including transformers and geometric deep learning-based models. Further, such embodiments enable the transfer of trained models to new users, e.g., where DNN model weights may be frozen and the classification layer weights may be retrained with new data on day of use.

[177] Referring to Figure 21, the left side (before time Os) shows an example of E-wave generating at -750 ms to 0 ms to feedback from system, where a noticeable negativity in the signal is detected in the selection condition only (green line), as compared to the control condition of gazing at an object but without selection intention (blue line). A visual feedback is provided to the user at 0 s. The right side of Figure 21 (after time 0 s) shows if the system provides a wrong feedback to the user (as the user’s intention was falsely decoded), an error- related potential at time +200 to +400 ms will be evident from the brain signals (negative component followed by positive component in the red box). This signal can be used to correct any false positive selection from the E-wave decoder described herein.

[178] Consistent with some embodiments, one exemplary approach for correcting any false positive selection from the E-wave decoder may comprise the following steps: (a) Using the E- wave for decoding the user’s intention, (b) Providing an indicative visual feedback to the user (e g. grey highlight of the target object) to monitor whether the user’s intention was correctly decoded before the final selection action, (c) Determining whether an error-related potential is detected following the system’s feedback; and (dl) if an error-related potential is NOT detected, the system will select the target object, as the user’s intention was decoded correctly, or (d2) in contrast, if an error-related potential is detected, it would indicate that the selection was falsely decoded and the system will not proceed with the selection action. This approach will help reduce false positives in the decoder’s classification of user’s intentions.

[179] Figure 22 depicts the workflow of determining emotional responses for selection threshold modulation. Referring to Figure 22, in addition to, or instead of error potentials, the emotional response of the user can be detected to confirm or cancel a selection event. Erroneous manipulation in the user interface can lead to user’s frustration that is detectable from the user's brain signals. The system described herein can leverage this emotional response to change the selection threshold to fit the user’s intention. As a result, if the system has been selecting frequently and frustration is detected, the probability threshold for selection is increased to require a higher level of confidence before outputting a “selection” prediction. Adding complexity to the BCI disclosed herein, in some earlier versions of BCI, brain signals are being monitored to perform a binary classification task determining if the user wants to select an object or not. In some later version of BCI, a second classifier is simultaneously decoding the user’s emotions, specifically checking for frustration. If frustration is detected, the probability threshold of the “selection classifier” is adjusted to ensure accurate results. If this classifier had mostly predicted “selection” up to this point, the probability threshold for selection is increased to require a higher level of confidence before outputting a “"selection” prediction. Such sophisticated schemes enable integration of a person’s thoughts and emotions in BCI’s consistent with the disclosed technology, e g., to enhance the user experience by reducing false positives and false negatives, minimizing frustration as a result.

New Montage Design for Combined E-wave and Error Potential Detection

[180] Among other benefits, the disclosed innovations improve user experience by eliminating cumbersome controller point-and-clicks and eye strains caused by purely gaze-based interfaces. Systems and methods herein have the advantage of not requiring separate controllers, which can be misplaced or lost. In addition, purely gaze-based systems have been shown to lead to frustration, discomfort and eye strain of users, leading to avoidance behaviors, as everything the user looks at automatically gets selected. In contrast, aspects of the disclosed technology may utilize, inter alia, independent modalities for these two operations (e.g., to establish ‘point’ via eye-tracking and ‘click’ via the disclosed BCI), which provides a solution for such problems. Aspects Involving Electroencephalography (EEG)-based Signal Capture/Features

[181] Various aspects of EEG systems and techniques may be used to directly measure electrical activity of the brain in a non-invasive way. As background, EEG systems and techniques do not record the activity of single neurons, but rather detect the signals created when populations of neurons are active at the same time. Other techniques may also be utilized consistent with the innovations herein. As to EEG, the measured signals provide an image of electrical activity in the brain represented as waves of varying frequency, amplitude, and shape. Further, the brain signals detected by EEG may be amplified to allow for viewing and processing. EEG can be used to measure neural activity that occurs during an event, e.g., the completion of a task or the presentation of a stimulus, or to measure spontaneous neural activity that happens in the absence of a specific event. For signal capture, the electrical activity of the brain is detected by electrodes placed on the scalp, with one or both of wet and/or dry electrodes being utilized according to certain implementations.

[182] For wet EEG electrodes, a conductive gel is applied to reduce el ectrode- scalp impedance and obtain better signal quality. In some embodiments, for example, EEG brain activity may be collected using various wet-electrode EEG systems, such as known 32-channel wet-electrode montages, a novel electrode montage design (e.g., Figure 23), or other known arrangements.

Such arrangements may also, for example, be augmented to measure the signals based on various known optimized techniques, such as, in one example, by using a 19-channel montage known in the art.

[183] Figure 23 depicts one exemplary electrode arrangement having a new electrode montage design for combined E-wave and error potential detection, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[184] Referring to Figure 23, the illustrative montage shown utilizes channels include Fpl, Fp2, AF7, AF3, AFz, AF4, AF8, F7, F5, F3, Fl, Fz, F2, F4, F6, F8, FT9, FT7, FC5, FC3, FC1, FCz, FC2, FC4, FC6, FT8, FT10, T7, C5, C3, Cl, Cz, C2, C4, C6, T8, TP9, TP7, CP5, CP3, CPI, Pz, P2, P4, P6, P8, PO9, PO7, PO3, POz, PO4, PO8, 01, Oz, and 02. The E-wave detection EEG electrodes are concentrated in the occipital lobe of the brain (blue). The E-wave detection EEG electrodes comprise P07, P03, POz, P04, P08, 01, Oz, and 02. The error-related potentials detection electrodes are concentrated in fronto-central region of the brain (orange). The error- related potentials detection electrodes comprise Fz, FCz, Cz, CPz, Pz.

E-wave Characterization Aspects

[185] Additional processing may be performed in connection with the Ewave processes herein For example, with regard to the information captured and processed by the system, certain information and data, such as raw EEG data collected using the above paradigm, may be analyzed offline, such as by using EEGLAB, an interactive MATLAB toolbox for processing continuous and event-related EEG, and/or other such tools. All such information gathered may then be utilized to further inform and/or perform further processing and achieve insights using, inter alia, Al methodologies, and/or other technologies. Here, for example, one representative example of the E-wave detected in illustrative channels of interest (here, in this example, PO7, PO3, POz, PO4, PO8, 01, Oz, 02, as depicted in Figure 23) may be analyzed offline.

[186] Figure 7 depicts each EEG channel ranked according to its importance in the trained model for decoding the two conditions, indicated along the y axis (the most important one starting from the bottom of the graph). Referring to Figure 24, integrated gradient is used to check trained deep neural network models, without the need to alter trained models. Individual channel and time point information can be visualized by passing testing data into trained models, which helps to sanity check for important features used by the model. It can also be used to prune EEG channels not fully used by the model in a particular application, reducing the setup time required for a full system. Figure 24 shows each EEG channel ranked according to its importance in the trained model for decoding the two conditions, indicated in y axis (the most important one starting from the bottom of the graph). The higher attributes in time across the x axis suggest the important period in the signal being used by the model, where a consistent yellow pattern can be seen from adjacent channels.

[187] Figures 25A-25D depict example training paradigms which allow both E-waves and error potentials to be recorded from user brain data, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[188] Referring to the illustrative example shown in Figures 25A-25D, Ey eLines was used to generate the data. EyeLines is a simple computer puzzle game, with the goal to construct as many “lines” from colored balls as possible. The player is presented with a square board (7x7) on which three colored balls appear in the beginning of the game. On each turn, the player has to move one ball to a free cell on the board. When the required or higher number of balls with the same color form a “full” line, either horizontal, vertical or diagonal, these balls disappear. If no such line is formed, three new balls randomly selected out of seven different colors are put on random cells. The game is over when the board is full.

[189] There are 3 steps to making a move: (a) turn on the control button (left yellow box) by looking at it for 500 (Version 1) or 750 ms (Version 2) - once selected, a ball appears inside the box; (b) select the ball you want to move by looking at it for 500 (Version 1) or 750 ms (Version 2), once selected, a red box appears around the ball; (c) select the free cell you would like to move the ball to by looking at the free cell for 500 (Versionl) or 750ms (Version2) - once selected, the ball will move to the free cell. This way, there are three visual feedbacks from the system when making a move.

[190] In the standard paradigm, there are two conditions, Selection vs Gazing, which are described below. Some Error conditions were additionally included, which are also described below, to study intermediate feedback and reduce false positives.

[191] In operation, the participant needs to switch on the control button (dwell threshold of 500 (Version 1) or 750 ms (Version 2)) to determine the start of the selection condition. When the control button is switched on, participants can move a ball (selected by looking at target ball for 500 (Version 1) or 750 ms (Version 2)) to a free cell (selected by looking at the target cell for 750 ms). The selection condition ends after the ball is moved to a new cell. The user makes a move using 3 steps (see below for details) and receives the correct feedback from the system. There are 3 types of visual feedback from the system upon selection: (a) selection of control button: a colored ball appears in the control button; (b) selection of a ball: a red box appears surrounding the selected ball; (c) selection of free cell: the target ball moves to the target free cell. Participants need to have the intention to select objects in this condition.

[192] If the participant does not switch on the control button, they find themselves in the gazing condition. Participants do not need to have the intention to select objects in this condition, but only to explore the board and strategize next moves. The gazing condition ends after the participant switches on the control button. Dwell times of 500 (Version 1) or 750 ms (Version 2) on balls and cells are still considered.

[193] To help illustrate the innovations herein, the error condition was built to address our false positives. There are 3 types of error condition. The first one is unexpected selection, e.g., an error occurring during gazing condition which is similar to the Gazing condition. In this condition, the participant does need to switch on the control button, but only needs to look around the board to explore the board and strategize next moves. However, in this unexpected selection error condition, a ball will be selected unexpectedly, when the participant did not have the intention to select it but was only looking at it. This will happen in a scaled approach: 25, 50, 75% error rate.

[194] The second one is wrong ball selection, which is an error occurring during the selection condition. Similar to the selection condition, in this condition, the participant needs to switch on the control button (dwell threshold of 750 ms) and then make the move (ball selection + cell selection). However, in this wrong ball selection error condition, the wrong ball (neighboring to the target ball) will be selected instead (feedback error from the system). This will happen in a scaled approach: 25, 50, 75% error rate.

[195] The third one is wrong cell selection, which is an error occurring during the selection condition. Similar to the selection condition, in this condition, the participant needs to switch on the control button (dwell threshold of 750 ms) and then make the move (ball selection + cell selection). However, in this wrong cell selection error condition, the wrong ball (neighboring to the target cell) will be selected instead (feedback error from the system). This will happen in a scaled approach: 25, 50, 75% error rate.

Speech Decoding

[196] Figure 26 depicts a representative process of speech decoding, showing illustrative steps associated with processing signals from an exemplary system, consistent with various exemplary aspects of one or more implementations of the disclosed technology.

[197] Referring to Figure 26, a person is seen imagining the word “yes” in order to exit the screen. Their brain signals are captured and translated, allowing them to interface with a computer (or alternatively a VR environment) through the power of imagined speech. Through simple single word commands (such as Yes/No), this interface will allow users to perform selections, execute programs, and potentially perform more complex tasks such as conducting simple conversations with language-based AIs. It can revolutionize the way people interact with technology by prompting computers with just one thought.

[198] The approach disclosed herein comprises using imagined speech and the different brain activity associated with the “Yes” and “No” words as an additional feature to feed to the decoder to classify the user’s intention and reduce false positives.

[199] More detailed methodologies for this can be found as described herewith, and the methodologies for language classification can be combined with the methodologies for classifying the E-wave and using eye-tracking, so the overall system produces superior accuracy when the user intends to select things on a UI.

Binary Imagined Speech Decoder

[200] In some embodiments, the systems described herein may include a binary imagined speech decoder that is able to differentiate positive/no words from EEG data of participants while engaging in imagined speech. [201] Briefly, EEG data from both spoken and imagined speech is collected from subjects, as well as voice recordings, by the systems described herein. A spoken EEG generative model is first trained with actual recorded voice as ground truth, then transfer learning is used to finetune the model to adapt to imagined speech data. In some embodiments, the systems described herein may use Mel-spectrograms for words derived from voice recordings as ground truth and labels for machine learning. In some embodiments, the systems described herein may extend this approach to multiple sessions, multiple subjects, and/or real-time testing (where models are adapted for online decoding of imagined speech). Both data collection and RT testing may be done on 2D paradigms.

Brain signals for BCI

[202] Electroencephalography (EEG) is a technique used to directly measure the electrical activity of the brain in a non-invasive way. Typically, EEG does not record the activity of single neurons, but rather detects the signals created when populations of neurons are active at the same time. Electrical activity of the brain (e.g., as illustrated in FIG. 27) is detected by electrodes, placed on the scalp.

[203] In some embodiments, the systems described herein may use a 32-channel dry EEG system. Dry electrodes (spider-like sensors contacting the scalp through the hair) may reduce setup time, increase comfort for the user, and do not require the application of gel which needs to be washed out of the hair following use. In some implementations, a reduced 32-channel montage may provide whole head coverage while improving decoding performance. This suggests that, while (imagined) speech processing may recruit several regions of the brain, having 64 channels is redundant.

[204] EEG provides an image of electrical activity in the brain represented as waves of varying frequency, amplitude, and shape. The brain signals detected by EEG are amplified to allow for its viewing and processing. EEG data is collected using the Imagined Speech paradigm, which is a simple word presentation paradigm. Subjects need to determine whether the presented words are upper or lower case (based on the question asked at the start of the task). On each trial, subjects reply “positive” or “no”, either vocally (when a sound icon appears) or using imagined speech (if the sound icon does not appear). The participant needs to provide the answer when the fixation cross appears on the screen. FIG. 28 illustrates an example of two trials from the Imagined Speech paradigm, including both spoken and imagined speech trials. Spoken and imagined speech trials may be presented in a random, counterbalanced order throughout the testing session.

Signal processing and machine learning

[205] In one embodiment, EEG signals may be filtered at hardware level with a bandpass 0.5- 100Hz filter and 50Hz notch filter. The recorded signals may be separated into different trial types given event markers from the paradigm. In one example, these may include 4 classes: (i) spoken yes/positive, (ii) imagined yes/positive, (iii) spoken no, and/or (iv) imagined no.

[206] The systems described herein may use both raw signals and features extracted from raw signals as input to the machine learning model. In one implementation, the systems described herein may process raw signals for decoding. Additionally or alternatively, the systems described herein may apply Common Spatial Patterns (CSP) to raw data for feature decomposition. CSP may exaggerate spatial variations in the raw signal (e.g. laterality differences) and/or may act as a dimension reduction tool. For example, a trial of 32 channels x 500 time points matrix can be reduced to a 16 x 16 matrix after CSP.

[207] Additional processing steps may be used to reduce noise and align spoken data to imagined data. Noise reduction techniques including Independent Component Analysis (ICA), additional bandpass filtering (e.g., investigating 30-100hz bandpass filtering), and EMG based artefact removal may be used to reduce the impact of noise in the data, particularly movement artefacts from spoken speech, as well as to identify the brain frequencies of interest for imagined speech decoding. FIGS. 29A and 29B illustrate noise removal in spoken and imagined EEG data using ICA, respectively. In FIGS. 29A and 29B, red represents the original noisy data and black represents the reconstructed cleaned data

[208] Because of the different noise profile, spoken and imagined EEG can have different spatial patterns, therefore realigning data from these domains (spoken and imagined) is useful to improve decoder performance. Techniques including spatial and temporal normalization, Euclidean Alignment, Riemannian Alignment have been used to transform data to adapt to the other domain. In addition, sharing a CSP across domains can also be used as an additional step to domain transfer. FIGS.30A and 30B are examples of Euclidean Alignment impact on spoken and imagined data before and after Euclidean alignment, respectively, where aligned data share a similar space. Audio Signal Processing

[209] In some examples, audio data for spoken EEG may provide ground truths for training which does not exist for imagined speech. In one example, audio may be recorded using a PC microphone at 11025Hz with a sample size of 16. The data may be epoched into trials with the same method as EEG, each trial lasting for 2s. Then the data may be resampled to 22050Hz and denoised using any appropriate software package.

[210] Mel-scaled spectrograms convert audio data into images representing temporal information in x-axis, human audible range of frequency information on y-axis (Mel scale), and amplitude information as decibels with different colors. They may be a more intuitive representation of speech data.

[211] To generate the Mel-scaled spectrograms required for model training, the systems described herein may send the audio data through Short-time Fourier Transform (STFT) with 1024 FFT window length, 256 hop size, and 80 Mel bands. The systems described herein may perform additional normalization steps to the resulting Mel spectrograms individually. In short, each Mel spectrogram represents each word spoken in a single trial. For the same word, each Mel spectrogram is similar in shape and duration, potentially with slight shifts in time.

[212] In some embodiments, the systems described herein may convert Mel spectrograms to audio. FIG. 31 illustrates an example image that was converted back to a 2D matrix, denormalized, and was input into a pre-trained vocoder trained with general purpose audio generation. FIG. 31 illustrates a Mel-spectrogram and the audio wave of original voice, reconstructed voice from spoken EEG, and reconstructed voice from imagined EEG. The two examples of reconstruction include Positive and “No” words.

Machine-learning-based decoder

[213] FIG. 32 illustrates a model architecture for a machine-learning-based decoder. In one implementation, the systems described herein may use a generator model that includes a classifier optimized as part of the generator architecture. In this way, the systems described herein may produce a combined loss based on the mel-spectrogram reconstruction loss and the classification loss (e.g., weighted sum of these where the weights are fine-tuned to achieve best results on the available data).

[214] In one embodiment, the systems described herein may include a generator designed to accept either raw EEG signals or features extracted with CSP. This may be achieved by allowing a slight modification to be made to the initial pre-convolutional layer depending on the desired input. The input of the generator is, thus, given as the embedding/raw vector of EEG signals and the output is generated as a mel-spectrogram. The embedding/raw vector goes through the preconvolution layer consisting of a Id convolution meant for feature extraction and consecutively through a bi-directional GRU layer to capture the sequential information in the EEG. The purpose of adding a residual connection as shown in FIG. 32 is to allow capturing the temporal and spatial information while preventing vanishing gradients. This residual connection results in a concatenation of the features from the bi-directional GRU and pre-conv layer. Following this, the generator upsamples the features to generate the correct size mel-spectrogram. This is done by using a transposed convolution with different strides and adding a multi-receptive field fusion (MRF) module (i.e. sum of outputs of multiple residual blocks with different kernel size).

[21 ] After the mel-spectrogram is generated, the systems described herein may calculate the reconstruction (RMSE) loss against the ground truth mel-spectrogram and pass it to the classifier where the classification loss (binary cross entropy) is computed considering the correct and the predicted class label (positive/no). Additionally, the systems described herein may include the option to make this model simpler (reducing number of layers and thus parameters) in case overfitting is observed.

[216] Aside from the simple classifier consisting of a fully-connected layer as shown in FIG. 32, the systems described herein may include more complex architectures for this model, such as a simplified version of a discriminator.

[217] In some embodiments, the systems described herein may be modified from other systems for classifying similar data in one or more of the following ways: (i) the systems described herein adapted the task to a binary yes/no or positive/no classification, (ii) the systems described herein use generator only (and allow the option to modify slightly its architecture into one containing less parameters), (iii) feature embedding with CSP is optional, (iv) the systems described herein added classification loss (two types of classifier), (v) the systems described herein are mixing spoken and imagined trials in a session, (vi) the systems described herein are using vocoder reconstruction for QA only, and/or (vii) the systems described herein added additional noise reduction (ICA) and data alignment technique (EA). FIG. 33 illustrates an example implementation of this model in a trial. [218] The above-described model may be able to classify spoken EEG (positive and no classes) with high accuracy both offline and in real-time. FIG. 34 illustrates feature importance and dimensionality reduction for classification.

[219] In some examples, using traditional neuroscience methods for imagined speech decoding on offline and real-time data included ERP analysis (e.g., analysis of the average time course of the EEG evoked responses in the different conditions) may include offline ERP analysis for imagined positive/no that does not seem to show any significant differences between conditions, suggesting that their differentiation would be challenging if using this approach. Additionally, real-time ERP analysis may show, upon visual inspection, similar ERPs to offline data. FIG. 35 A illustrates representative example of ERPs in offline imagined positive/no conditions at channel F3. FIG. 35B illustrates representative example of ERPs in real-time imagined positive/no conditions at channel F3.

Overall Implementations of the Disclosed Technology

[222] According to the above such technology, systems and methods herein may be utilized to perform various brain state assessment features and functionality, and/or include aspects that detect and/or involve detection of haemodynamic signals and direct neuronal signals (fast optical signals) that both correspond to neural activity, simultaneously, among other things.

[223] While the above disclosure sets forth certain illustrative examples, such as embodiments utilizing, involving and/or producing fast optical signal (FOS) and haemodynamic (e.g., NIRS, etc.) brain-computer interface features, the present disclosure encompasses multiple other potential arrangements and components that may be utilized to achieve the brain-interface innovations of the disclosed technology. Some other such alternative arrangements and/or components may include or involve other optical architectures that provide the desired results, signals, etc. (e.g., pick up NIRS and FOS simultaneously for brain-interfacing, etc.), while some such implementations may also enhance resolution and other metrics further.

[224] Among other aspects, for example, implementations herein may utilize different optical sources than those set forth, above. Here, for example, such optical sources may include one or more of: semiconductor LEDs, superluminescent diodes or laser light sources with emission wavelengths principally, but not exclusively within ranges consistent with the near infrared wavelength and/or low water absorption loss window (e.g., 700-950nm, etc.); nonsemiconductor emitters; sources chosen to match other wavelength regions where losses and scattering are not prohibitive; here, e.g., in some embodiments, around 1060nm and 1600nm, inter alia, narrow linewidth (coherent) laser sources for interferometric measurements with coherence lengths long compared to the scattering path through the measurement material (here, e.g., (DFB) distributed feedback lasers, (DBR) distributed Bragg reflector lasers, vertical cavity surface emitting lasers (VCSEL) and/or narrow linewidth external cavity lasers; coherent wavelength swept sources (e.g., where the center wavelength of the laser can be swept rapidly at 10-200 KHz or faster without losing its coherence, etc.); multi wavelength sources where a single element of co packaged device emits a range of wavelengths; modulated sources (e.g., such as via direct modulation of the semiconductor current or another means, etc.); and pulsed laser sources (e.g., pulsed laser sources with pulses between picoseconds and microseconds, etc.), among others that meet sufficient/proscribed criteria herein.

[225] Implementations herein may also utilize different optical detectors than those set forth, above. Here, for example, such optical detectors may include one or more of: semiconductor pin diodes; semiconductor avalanche detectors; semiconductor diodes arranged in a high gain configuration, such as transimpedance configuration(s), etc.; single-photon avalanche detectors (SPAD); 2-D detector camera arrays, such as those based on CMOS {complementary metal oxide semiconductor} or CCD {charge-coupled device} technologies, e.g., with pixel resolutions of 5x5 to 1000x1000; 2-D single photon avalanche detector (SPAD) array cameras, e.g., with pixel resolutions of 5x5 to 1000x1000; and photomultiplier detectors, among others that meet sufficient/proscribed criteria herein.

[226] Implementations herein may also utilize different optical routing components than those set forth, above. Here, for example, such optical routing components may include one or more of: silica optical fibre routing using single mode, multi-mode, few mode, fibre bundles or crystal fibres; polymer optical fibre routing; polymer waveguide routing; planar optical waveguide routing; slab waveguide / planar routing; free space routing using lenses, micro optics or diffractive elements; and wavelength selective or partial mirrors for light manipulation (e.g. diffractive or holographic elements, etc.), among others that meet sufficient/proscribed criteria herein.

[227] Implementations herein may also utilize other different optical and/or computing elements than those set forth, above. Here, for example, such other optical/computing elements may include one or more of: interferometric, coherent, holographic optical detection elements and/or schemes; interferometric, coherent, and/or holographic lock-in detection schemes, e.g., where a separate reference and source light signal are separated and later combined; lock in detection elements and/or schemes; lock in detection applied to a frequency domain (FD) NIRS; detection of speckle for diffuse correlation spectroscopy to track tissue change, blood flow, etc. using single detectors or preferably 2-D detector arrays; interferometric, coherent, holographic system(s), elements and/or schemes where a wavelength swept laser is used to generate a changing interference patter which can be analyzed; interferometric, coherent, holographic system where interference is detected on, e.g., a 2-D detector, camera array, etc.; interferometric, coherent, holographic system where interference is detected on a single detector; controllable routing optical medium such as a liquid crystal; and fast (electronics) decorrelator to implement diffuse decorrelation spectroscopy, among others that meet sufficient/proscribed criteria herein.

[228] Implementations herein may also utilize other different optical schemes than those set forth, above. Here, for example, such other optical schemes may include one or more of: interferometric, coherent, and/or holographic schemes; diffuse decorrelation spectroscopy via speckle detection; FD-NIRS; and/or diffuse decorrelation spectroscopy combined with TD-NIRS or other variants, among others that meet sufficient/proscribed criteria herein.

[229] Implementations herein may also utilize other multichannel features and/or capabilities than those set forth, above. Here, for example, such other multichannel features and/or capabilities may include one or more of: the sharing of a single light source across multiple channels; the sharing of a single detector (or detector array) across multiple channels; the use of a 2-D detector array to simultaneously receive the signal from multiple channels; multiplexing of light sources via direct switching or by using “fast” attenuators or switches; multiplexing of detector channels on to a single detector (or detector array) via by using “fast” attenuators or switches in the routing circuit; distinguishing different channels / multiplexing by using different wavelengths of optical source; and distinguishing different channels / multiplexing by modulating the optical sources differently, among others that meet sufficient/proscribed criteria herein.

[230] Among other aspects, for example, implementations herein may utilize different optical sources than those set forth, above. Here, for example, such optical sources may include one or more of: semiconductor LEDs, superluminescent diodes or laser light sources with emission wavelengths principally, but not exclusively within ranges consistent with the near infrared wavelength and/or low water absorption loss window (e.g., 700-950nm, etc.); nonsemiconductor emitters; sources chosen to match other wavelength regions where losses and scattering are not prohibitive; here, e.g., in some embodiments, around 1060nm and 1600nm, inter alia; narrow linewidth (coherent) laser sources for interferometric measurements with coherence lengths long compared to the scattering path through the measurement material (here, e.g., (DFB) distributed feedback lasers, (DBR) distributed Bragg reflector lasers, vertical cavity surface emitting lasers (VCSEL) and/or narrow linewidth external cavity lasers; coherent wavelength swept sources (e.g., where the center wavelength of the laser may be swept rapidly at 10-200 KHz or faster without losing its coherence, etc.); multi -wavelength sources where a single element of co packaged device emits a range of wavelengths; modulated sources (e.g., such as via direct modulation of the semiconductor current or another means, etc.); and pulsed laser sources (e g., pulsed laser sources with pulses between picoseconds and microseconds, etc.), among others that meet sufficient/proscribed criteria herein.

[231] Implementations herein may also utilize different optical detectors than those set forth, above. Here, for example, such optical detectors may include one or more of: semiconductor pin diodes; semiconductor avalanche detectors; semiconductor diodes arranged in a high gain configuration, such as transimpedance configuration(s), etc.; single-photon avalanche detectors (SPAD); 2-D detector camera arrays, such as those based on CMOS {complementary metal oxide semiconductor} or CCD {charge-coupled device} technologies, e.g., with pixel resolutions of 5x5 to 1000x1000; 2-D single photon avalanche detector (SPAD) array cameras, e.g., with pixel resolutions of 5x5 to 1000x1000; and photomultiplier detectors, among others that meet sufficient/proscribed criteria herein.

[232] Implementations herein may also utilize different optical routing components than those set forth, above. Here, for example, such optical routing components may include one or more of: silica optical fibre routing using single mode, multi-mode, few mode, fibre bundles or crystal fibres; polymer optical fibre routing; polymer waveguide routing; planar optical waveguide routing; slab waveguide / planar routing; free space routing using lenses, micro optics or diffractive elements; and wavelength selective or partial mirrors for light manipulation (e.g. diffractive or holographic elements, etc.), among others that meet sufficient/proscribed criteria herein. [233] As disclosed herein, implementations and features of the present inventions may be implemented through computer-hardware, software and/or firmware. For example, the systems and methods disclosed herein may be embodied in various forms including, for example, one or more data processors, such as computer(s), server(s), and the like, and may also include or access at least one database, digital electronic circuitry, firmware, software, or in combinations of them. Further, while some of the disclosed implementations describe specific (e.g., hardware, etc.) components, systems, and methods consistent with the innovations herein may be implemented with any combination of hardware, software and/or firmware. Moreover, the above-noted features and other aspects and principles of the innovations herein may be implemented in various environments. Such environments and related applications may be specially constructed for performing the various processes and operations according to the inventions or they may include a general-purpose computer or computing platform selectively activated or reconfigured by code to provide the necessary functionality. The processes disclosed herein are not inherently related to any particular computer, network, architecture, environment, or other apparatus, and may be implemented by a suitable combination of hardware, software, and/or firmware. For example, various general-purpose machines may be used with programs written in accordance with teachings of the inventions, or it may be more convenient to construct a specialized apparatus or system to perform the required methods and techniques.

[234] In the present description, the terms component, module, device, etc. may refer to any type of logical or functional device, process or blocks that may be implemented in a variety of ways. For example, the functions of various blocks can be combined with one another and/or distributed into any other number of modules. Each module can be implemented as a software program stored on a tangible memory (e.g., random access memory, read only memory, CD- ROM memory, hard disk drive) within or associated with the computing elements, sensors, receivers, etc. disclosed above, e.g., to be read by a processing unit to implement the functions of the innovations herein. Also, the modules can be implemented as hardware logic circuitry implementing the functions encompassed by the innovations herein. Finally, the modules can be implemented using special purpose instructions (SIMD instructions), field programmable logic arrays or any mix thereof which provides the desired level performance and cost.

[235] Aspects of the systems and methods described herein may be implemented as functionality programmed into any of a variety of circuitry, including programmable logic devices (PLDs), such as field programmable gate arrays (FPGAs), programmable array logic (PAL) devices, electrically programmable logic and memory devices and standard cell-based devices, as well as application specific integrated circuits. Some other possibilities for implementing aspects include: memory devices, microcontrollers with memory (such as EEPROM), embedded microprocessors, firmware, software, etc. Furthermore, aspects may be embodied in microprocessors having software-based circuit emulation, discrete logic (sequential and combinatorial), custom devices, fuzzy logic, neural networks, other Al (Artificial Intelligence) or machine learning systems, quantum devices, and hybrids of any of the above device types.

[236] It should also be noted that various logic and/or features disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in tangible various forms (e.g., optical, magnetic or semiconductor storage media), though do not encompass transitory media.

[237] Other implementations of the inventions will be apparent to those skilled in the art from consideration of the specification and practice of the innovations disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the inventions being indicated by the present disclosure and various associated principles of related patent doctrine.

[238] As one overview of aspects of the disclosed technology, systems, methods and wearable devices associated with mind/brain-computer interfaces are disclosed. Embodiments herein include features related to one or more of optical-based brain signal acquisition, decoding modalities, encoding modalities, brain-computer interfacing, AR/VR content interaction, brain state assessment, signal to noise ration enhancement, and/or motion artefact reduction, among other features set forth herein. Certain implementations may include or involve processes of collecting and processing brain activity data, such as those associated with the use of a braincomputer interface that enables, for example, decoding and/or encoding a user’s brain functioning, neural activities, and/or activity patterns associated with thoughts, including sensory-based thoughts. Further, the present systems and methods may be configured to leverage brain-computer interface and/or non-invasive wearable device aspects to provide enhanced user interactions for next-generation wearable devices, controllers, and/or other computing components based on the human thoughts, brain signals, and/or mind activity that are detected and processed.

[239] It should also be noted that various logic and/or features disclosed herein may be enabled using any number of combinations of hardware, firmware, and/or as data and/or instructions embodied in various machine-readable or computer-readable media, in terms of their behavioral, register transfer, logic component, and/or other characteristics. Computer-readable media in which such formatted data and/or instructions may be embodied include, but are not limited to, non-volatile storage media in tangible various forms (e.g., optical, magnetic or semiconductor storage media), though do not encompass transitory media.

[240] Other implementations of the disclosed technology /present inventions will be apparent to those skilled in the art from consideration of the specification and practice of the innovations disclosed herein. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the inventions being indicated by the present disclosure and various associated principles of related patent doctrine.

Claims

Claims:

1. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data; tracking and/or monitoring, the user movement and/or interaction with a user interface and/or artificial reality environment, to acquire second data regarding the movement and/or interaction; performing processing, via at least one processor, of the first data and the second data to determine an intended action indicated by the user; determining, as a function of the intended action indicated by the user, different and/or updated content to feedback to the user; and displaying to and/or immersing the user with the different and/or updated content in the user interface and/or artificial reality environment.

2. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or an artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data, which may be temporally formatted, may include time-stamped samples of brain activity, etc.; tracking the user's movements and/or interactions with a user interface and/or artificial reality environment, to acquire or obtain second data therefrom, wherein the second data may be temporally formatted and may include time-stamped samples of the user's motion, position, orientation, and/or actions relative to the user interface and/or artificial reality environment; performing processing, via at least one processor, of the first data, the second data, at least one first data stream based on the first data, and/or at least one second data stream based on the second data to determine an intended action indicated by the user; determining, as a function of the intended action indicated by the user, different and/or updated content to feedback to the user; and displaying to and/or immersing the user with the different and/or updated content in the user interface and/or artificial reality environment.

3. The method of either one of claim 1 or claim 2, or the invention of any claim herein, wherein the processing step comprises: performing pre-processing on at least one of the first data and the second data; processing the pre-processed data by one or more of a pre-layer normalization module and/or a multi-head attention module; performing an addition operation on the pre-processed data to create processed data; performing one or more of an additional addition operation and/or an additional pre-layer normalization operation followed by feedback network processing on the processed data to create output; and determining the intended action of the user based at least in part on the output.

4. The method of either one of claim 1 or claim 2, or the invention of any claim herein, wherein the processing step comprises: providing the first data as input to an attention mechanism that comprises one or more of a temporal attention mechanism, a channel attention mechanism, and/or a network N1 within a classification network module; providing data output by the temporal attention mechanism to a network N2 within the classification network module; updating one or more weights of network N1 based at least in part on output produced by the channel attention mechanism; combining output of N1 and output of N2 into a combined feature map; providing the combined feature map to a network N3 within the classification network module; performing, by network N3, classification to determine the intended action of the user.

5. The method of any one of claims 1-4, or the invention of any claim herein, wherein the first data includes imagined speech by the user.

6. The method of any one of claims 1-5, or the invention of any claim herein, wherein the imagined speech includes one or more command words.

7. The method of any one of claims 1-6, or the invention of any claim herein, wherein the second data includes eye-tracking data of the user.

8. The method of any one of claims 1-7 or the invention of any claim herein, further comprising: synchronizing information from the first data, or a first data stream associated therewith, and the second data, or a second stream associated therewith, to analyze event-related brain data occurring in temporal conjunction with occurrences and/or activity of an artificial reality experience in the artificial reality environment.

9. The method of any one of claims 1-8 or the invention of any claim herein, wherein the user’s brain activity is recorded, via the BCI system, in real-time, and/or wherein the at least one neuroassessment technique includes one or more of electroencephalography (EEG), functional magnetic resonance imaging (fMRI), near-infrared spectroscopy (NIRS), functional near-infrared spectroscopy (fNTRS), FOS imaging, and/or other such known techniques.

10. The method of any one of claims 1-9 or the invention of any claim herein, wherein the method records EEG activity via a set of any or all of twenty eight channels across occipital and/or temporal-parietal regions comprising: FC6, FT8; C6, T8; CP6, TP8; P6, P8, PIO; PO4, PO8; 02; Oz; CMS, POz, DRL; 01; PO7, PO3; P5, P7, P9; CP5, TP7; C5, T7; and/or FC5, FT7.

11. The method of any one of claims 10 or the invention of any claim herein, wherein the processing step comprises a pre-processing step comprising any of: performing bandpass filtering to remove low frequency (e.g., 0.01Hz, etc.) and high frequency (e.g., 250Hz, etc.) noise; performing a finite impulse response notch filter in a predefined range (e.g., 48-52Hz, etc.) to remove a specified line noise harmonic (e g., 50Hz, etc.); removing eye blink artifacts via independent component analysis; and/or recording data epoched into trials.

11 . The method of any one of claims 1-10 or the invention of any claim herein, wherein the user’s brain activity is recorded via any one, two, or three of: left and/or right superior temporal gyrus; left posterior superior temporal gyrus; and/or posterior frontal lobe.

12. The method of any one of claims 1-11 or the invention of any claim herein, wherein the processing step comprises feature extraction comprising one or more of: a discrete wavelet transform (DVT); recorded signals downsampled (e.g., to 512Hz, etc.) and decomposed (e.g., using db4 mother wavelet, etc.) to eight levels corresponding to different frequency bands; and/or principal component analysis applied to reduce dimensionality and identify the components with maximum variance.

13. The method of any one of claims 1-12 or the invention of any claim herein, wherein the processing step comprises classification carried out via one or more of: a support vector machine; a multi-layer convolutional neural network (CNN); and/or an attention-based deep network.

13. The method of any one of claims 1-12 or the invention of any claim herein, wherein the classification comprises providing at least one of the first data and the second data as input data to a network comprising of temporal convolution and spatial depth-wise convolution blocks followed by a depth-wise separable convolutional block.

14. The method of any one of claims 1-13 or the invention of any claim herein, wherein the classification further comprises providing output of a multi-head attention module to the input data through a residual connection.

15. The method of any one of claims 1-14 or the invention of any claim herein, wherein the processing step comprises using a signal preprocessing method to identify imagined-speech- related time slices from the first data by: generating, by a CNN, a probability distribution over possible word class candidates; calculating a temporal salience map based on a preliminary classification based at least in part on the probability distribution; designating, by a temporal attention mechanism, a plurality of time slices as task-related time slices based at least in part on salience scores within the temporal salience map; and providing the task-related time slices as input to an additional CNN.

16. The method of any one of claims 1-15 or the invention of any claim herein, wherein the processing step comprises a regularization step that comprises at least one of calculating a salience of each electrode channel via calculating a wavelet packet sub-band energy ratio (WPSER) to quantify the ratio of the energy within a frequency range most relevant to imagined speech; measuring a salience level of task-related information for each electrode channel via the WPSER; adding a regulariser to a loss function to regularise a learning of each weight coefficient in a spatial convolutional layer of a CNN that is guided by a channel salience map; and minimizing the regulariser such that more imagined speech information correlates to a larger weight coefficient for a given electrode channel.

17. One or more computer readable media that contain and/or are configured to execute computer-readable instructions, the computer-readable instructions comprising instructions that, when executed by at least one processor, cause the at least one processors to: perform one or more portions, aspects and/or steps of any of claims 1-16, of any other claim herein, and/or of other features or functionality set forth elsewhere in this disclosure.

18. A system comprising: one or more computers, processors, devices and/or computer readable media, one or more of which contain and/or are configured to execute computer-readable instructions, the computer- readable instructions comprising instructions that, when executed by at least one processor, cause the at least one processors to: perform one or more portions, aspects and/or steps of any of claims 1-16 and/or of other features or functionality set forth elsewhere in this disclosure.

19. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data; tracking and/or monitoring, the user movement and/or interaction with a user interface and/or artificial reality environment, to acquire second data regarding the movement and/or interaction; performing processing, via at least one processor, of the first data and the second data to determine an intended action indicated by the user; determining, as a function of the intended action indicated by the user, different and/or updated content to feedback to the user; and displaying to and/or immersing the user with the different and/or updated content in the user interface and/or artificial reality environment.

20. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or an artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data, which may be temporally formatted, may include time-stamped samples of brain activity, etc.; tracking the user's movements and/or interactions with a user interface and/or artificial reality environment, to acquire or obtain second data therefrom, wherein the second data may be temporally formatted and may include time-stamped samples of the user's motion, position, orientation, and/or actions relative to the user interface and/or artificial reality environment; performing processing, via at least one processor, of the first data, the second data, at least one first data stream based on the first data, and/or at least one second data stream based on the second data to determine an intended action indicated by the user; determining, as a function of the intended action indicated by the user, different and/or updated content to feedback to the user; and displaying to and/or immersing the user with the different and/or updated content in the user interface and/or artificial reality environment.

21. The method of either one of claim 19 or claim 20, or the invention of any claim herein, wherein the processing step comprises: performing pre-processing on at least one of the first data and the second data; processing the pre-processed data by one or more of a pre-layer normalization module and/or a multi-head attention module; performing an addition operation on the pre-processed data to create processed data; performing one or more of an additional addition operation and/or an additional pre-layer normalization operation followed by feedback network processing on the processed data to create output; and determining the intended action of the user based at least in part on the output.

22. The method of either one of claim 19 or claim 20, or the invention of any claim herein, wherein the processing step comprises: providing the first data as input to an attention mechanism that comprises one or more of a temporal attention mechanism, a channel attention mechanism, and/or a network N1 within a classification network module; providing data output by the temporal attention mechanism to a network N2 within the classification network module; updating one or more weights of network N1 based at least in part on output produced by the channel attention mechanism; combining output of N1 and output of N2 into a combined feature map; providing the combined feature map to a network N3 within the classification network module; performing, by network N3, classification to determine the intended action of the user.

23. The method of any one of claims 19-22, or the invention of any claim herein, wherein the first data includes imagined speech by the user.

24. The method of any one of claims 19-23, or the invention of any claim herein, wherein the imagined speech includes one or more command words.

25. The method of any one of claims 19-24, or the invention of any claim herein, wherein the second data includes eye-tracking data of the user.

26. The method of any one of claims 19-25 or the invention of any claim herein, further comprising: synchronizing information from the first data, or a first data stream associated therewith, and the second data, or a second stream associated therewith, to analyze event-related brain data occurring in temporal conjunction with occurrences and/or activity of an artificial reality experience in the artificial reality environment.

27. The method of any one of claims 19-26 or the invention of any claim herein, wherein the user’s brain activity is recorded, via the BCI system, in real-time, and/or wherein the at least one neuro-assessment technique includes one or more of electroencephalography (EEG), functional magnetic resonance imaging (fMRI), near-infrared spectroscopy (NIRS), functional near-infrared spectroscopy (fNTRS), FOS imaging, and/or other such known techniques.

28. The method of any one of claims 19-27 or the invention of any claim herein, wherein the method records EEG activity via a set of any or all of twenty eight channels across occipital and/or temporal-parietal regions comprising: FC6, FT8; C6, T8; CP6, TP8; P6, P8, PIO; PO4, PO8; 02; Oz; CMS, POz, DRL; 01; P07, P03; P5, P7, P9; CP5, TP7; C5, T7; and/or FC5, FT7.

29. The method of any one of claims 19-28 or the invention of any claim herein, wherein the processing step comprises a pre-processing step comprising any of: performing bandpass filtering to remove low frequency (e.g., 0.01Hz, etc.) and high frequency (e.g., 250Hz, etc.) noise; performing a finite impulse response notch filter in a predefined range (e.g., 48-52Hz, etc.) to remove a specified line noise harmonic (e.g., 50Hz, etc.); removing eye blink artifacts via independent component analysis; and/or recording data epoched into trials.

30. The method of any one of claims 19-29 or the invention of any claim herein, wherein the user’s brain activity is recorded via any one, two, or three of: left and/or right superior temporal gyrus; left posterior superior temporal gyrus; and/or posterior frontal lobe.

31. The method of any one of claims 19-30 or the invention of any claim herein, wherein the processing step comprises feature extraction comprising one or more of a discrete wavelet transform (DVT); recorded signals downsampled (e.g., to 512Hz, etc.) and decomposed (e.g., using db4 mother wavelet, etc.) to eight levels corresponding to different frequency bands; and/or principal component analysis applied to reduce dimensionality and identify the components with maximum variance.

32. The method of any one of claims 19-31 or the invention of any claim herein, wherein the processing step comprises classification carried out via one or more of a support vector machine; a multi-layer convolutional neural network (CNN); and/or an attention-based deep network.

33. The method of any one of claims 19-32 or the invention of any claim herein, wherein the classification comprises providing at least one of the first data and the second data as input data to a network comprising of temporal convolution and spatial depth-wise convolution blocks followed by a depth-wise separable convolutional block.

34. The method of any one of claims 19-33 or the invention of any claim herein, wherein the classification further comprises providing output of a multi-head attention module to the input data through a residual connection.

35. The method of any one of claims 19-34 or the invention of any claim herein, wherein the processing step comprises using a signal preprocessing method to identify imagined-speech- related time slices from the first data by: generating, by a CNN, a probability distribution over possible word class candidates; calculating a temporal salience map based on a preliminary classification based at least in part on the probability distribution; designating, by a temporal attention mechanism, a plurality of time slices as task-related time slices based at least in part on salience scores within the temporal salience map; and providing the task-related time slices as input to an additional CNN.

36. The method of any one of claims 19-35 or the invention of any claim herein, wherein the processing step comprises a regularization step that comprises at least one of: calculating a salience of each electrode channel via calculating a wavelet packet sub-band energy ratio (WPSER) to quantify the ratio of the energy within a frequency range most relevant to imagined speech; measuring a salience level of task-related information for each electrode channel via the WPSER; adding a regulariser to a loss function to regularise a learning of each weight coefficient in a spatial convolutional layer of a CNN that is guided by a channel salience map; and minimizing the regulariser such that more imagined speech information correlates to a larger weight coefficient for a given electrode channel.

37. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data; tracking and/or monitoring, the user movement and/or interaction with a user interface and/or artificial reality environment, to acquire second data regarding the movement and/or interaction; performing processing, via at least one processor, of the first data and the second data to determine a result or information associated with the user; performing an action and/or providing an output as a function of the result or information.

38. A computer-implemented method of processing data associated with a brain computer interface (BCI) system and/or an artificial reality system, the method comprising: acquiring and/or processing, via the BCI system, a user's brain activity using at least one neuro-assessment technique, wherein the user’s brain activity is recorded as first data, which may be temporally formatted, may include time-stamped samples of brain activity, etc.; tracking the user's movements and/or interactions with a user interface and/or artificial reality environment, to acquire or obtain second data therefrom, wherein the second data may be temporally formatted and may include time-stamped samples of the user's motion, position, orientation, and/or actions relative to the user interface and/or artificial reality environment; performing processing, via at least one processor, of the first data, the second data, at least one first data stream based on the first data, and/or at least one second data stream based on the second data to determine information associated with the user; determining, as a function of the information, an output, a result and/or an action to be generated and/or performed.

39. One or more computer readable media that contain and/or are configured to execute computer-readable instructions, the computer-readable instructions comprising instructions that, when executed by at least one processor, cause the at least one processors to: perform one or more portions, aspects and/or steps of any of claims 19-38, of any other claim herein, and/or of other features or functionality set forth elsewhere in this disclosure.

40. A system comprising: one or more computers, processors, devices and/or computer readable media, one or more of which contain and/or are configured to execute computer-readable instructions, the computer- readable instructions comprising instructions that, when executed by at least one processor, cause the at least one processors to: perform one or more portions, aspects and/or steps of any of claims 19-38 and/or of other features or functionality set forth elsewhere in this disclosure.