JP6728507B2

JP6728507B2 - Low power integrated circuit for analyzing digitized audio streams

Info

Publication number: JP6728507B2
Application number: JP2020005933A
Authority: JP
Inventors: エリック・リウ; ステファン・ジェイ・マーティ; スン・ウォク・キム
Original assignee: Qualcomm Inc
Current assignee: Qualcomm Inc
Priority date: 2020-01-17
Filing date: 2020-01-17
Publication date: 2020-07-22
Anticipated expiration: 2031-12-07
Also published as: JP2020098342A

Description

Computing devices have become increasingly sophisticated for users by processing audio instructions and providing responses. A user may speak audio instructions that can be used to control these computing devices. For example, a user may speak to a computing device to provide information, such as an instruction to provide directions to a particular location.

In the accompanying drawings, like reference numerals refer to like components or blocks. The following detailed description references the drawings.

FIG. 1 is a block diagram of an example computing device including a low power integrated circuit to analyze an audio stream and a processor to analyze a digitized audio stream in response to detection of a keyword by the integrated circuit.

FIG. 2 is a block diagram of an example low power integrated circuit to analyze an audio stream and to transmit a signal to a processor to increase power when a keyword is detected in the audio stream.

FIG. 3 is a block diagram of an example computing device to analyze a digitized audio stream and a server in communication with the computing device to analyze a text stream generated from the digitized audio stream.

FIG. 4 is a flowchart of an example method performed on a computing device to receive an audio stream and determine a response.

FIG. 5 is a flowchart of an example method performed on a computing device to compress a digitized audio stream and render a response.

Detailed description

In audio information processing, a user typically activates an application to process audio by pressing a button and/or reciting an instruction. Once the audio processing application is launched, the user must additionally recite explicit instructions that they would like the computing device to perform. Processing speech instructions from a user can therefore be time consuming and repetitive. In addition, continuously monitoring for instructions from the user consumes much power and drains the battery.

To address these issues, example embodiments disclosed herein use a low power integrated circuit to continuously monitor for the occurrence of a keyword in an audio stream (e.g., the user's speech), while relying on a processor for more thorough analysis of the user's speech. For example, various examples disclosed herein provide for receiving an audio stream at a low power integrated circuit, digitizing the audio stream, and analyzing the digitized audio stream to recognize a keyword. Once the keyword is recognized within the digitized audio stream, the integrated circuit sends a signal to the processor to increase power. Once power to the processor is increased, the digitized audio stream is retrieved to determine a response. This decreases the amount of time the user spends launching a specific audio processing application and prevents the user from having to repeat their speech. Determining the response from the retrieved audio stream spares the user from providing additional explicit instructions for the computing device to perform a speech analysis.

Additionally, in the various examples disclosed herein, once power to the processor is increased, the processor retrieves the digitized audio stream from a memory and converts the digitized audio stream to a text stream. After the conversion, the processor determines a response based on the text within the text stream. Determining the response from the text stream reduces the time a user of the computing device spends instructing the computing device. Further, the processor may determine the appropriate response based on the context of the audio stream, and the computing device determines which application needs to execute in order to fulfill the response to the user. Further still, because power to the processor is increased once the keyword is recognized within the digitized audio stream, the computing device consumes less power while listening for the user's speech.

In one embodiment, the computing device may also determine the response by receiving the response from a server or from the processor. In a further embodiment, the memory maintains the stored digitized audio stream for a predetermined period of time. In this embodiment, the processor may retrieve the digitized audio stream in time increments. For example, the processor may retrieve the complete digitized audio stream or a shorter time interval of the digitized audio stream. Retrieving the digitized audio stream allows the processor to analyze the context of the audio stream to determine the appropriate response.
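Retrieval in time increments can be illustrated with a minimal Python sketch. The function name `retrieve_increment`, the representation of the stored stream as a list of integer samples, and the sample rate parameter are all assumptions made for illustration; the description does not specify how the increments are selected.

```python
from typing import List, Optional


def retrieve_increment(stored: List[int], sample_rate_hz: int,
                       seconds: Optional[float] = None) -> List[int]:
    """Return the most recent `seconds` of stored samples.

    Hypothetical helper: passing None returns the complete stored stream,
    mirroring the choice between the full stream and a shorter interval.
    """
    if seconds is None:
        return list(stored)
    count = int(seconds * sample_rate_hz)
    if count <= 0:
        # Guard against stored[-0:], which would return the whole list.
        return []
    return stored[-count:]
```

A caller could first request a short interval and fall back to the complete stream when more context is needed to determine a response.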

In this manner, example embodiments disclosed herein save a user time by preventing repetitive audio instructions to a computing device, since the computing device determines an appropriate response based on the context of the audio stream. Further, the computing device receives and processes audio streams while consuming less power.

Referring now to the drawings, FIG. 1 is a block diagram of an example computing device 100 including a low power integrated circuit 104 to receive an audio stream 102 and a digitization module 106 to digitize the audio stream and provide the digitized audio stream 114 to a memory 112. Further, the low power integrated circuit 104 includes a keyword comparison module 108 to compare the digitized audio stream 114 to a keyword and, based on recognition of the keyword, transmit a signal 116 to a processor 118 to increase power 122. Further still, the processor includes an analysis module 120 to analyze the digitized audio stream 114. Embodiments of the computing device 100 include a client device, personal computer, desktop computer, laptop, mobile device, or other computing device suitable to include components 104, 112, and 118.

The audio stream 102 is received by the computing device 100, specifically by the low power integrated circuit 104. The audio stream 102 is an input analog signal that is digitized at module 106 to provide the digitized audio stream 114. Embodiments of the audio stream 102 include speech from a user or audio from another computing device. For example, several computing devices 100 may receive audio streams 102, which may cause confusion. The computing devices may therefore designate one device as a central point to receive the audio stream 102. In this embodiment, the low power integrated circuit 104 operates as part of an ad hoc network that may be a central unit of one or more computing devices.

For example, a user may discuss with another person the shortest route from New York to Los Angeles, California. In this example, the audio stream would be the discussion of the shortest route from New York to Los Angeles. In a further embodiment, the audio stream 102 may include audio over a predetermined period of time. For example, the audio stream 102 may span a few seconds or minutes when received by the low power integrated circuit 104. In this example, the low power integrated circuit 104 may distinguish the audio stream 102 from other audio streams 102.

The low power integrated circuit 104 includes the module 106 to digitize the audio stream 102 and the module 108 to compare the digitized audio stream 114 to a keyword. The low power integrated circuit 104 is an electronic circuit with patterned trace elements on the surface of a material that form interconnections between other electronic components. For example, the low power integrated circuit 104 forms connections between the processor 118 and the memory 112. Embodiments of the low power integrated circuit 104 include a microchip, chipset, electronic circuit, chip, microprocessor, semiconductor, microcontroller, or other electronic circuit capable of receiving the audio stream 102 and transmitting the signal 116. The low power integrated circuit 104 may continuously monitor the audio stream 102, use the digitization module 106 to digitize the audio stream, and store the digitized audio stream in the memory 112. As such, further embodiments of the low power integrated circuit 104 include a transmitter, receiver, microphone, or other suitable component to receive the audio stream 102.

The audio stream is digitized at module 106 to provide the digitized audio stream 114. The digitization module 106 converts the audio stream to a discrete-time signal representation. Embodiments of the digitization module 106 include an analog-to-digital converter (ADC), digital conversion device, instructions, firmware, and/or software operating in conjunction with the low power integrated circuit 104. For example, the digitization module 106 may include an electronic device to convert an input analog voltage to a digital number proportional to the magnitude of the analog signal.
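The conversion of an analog voltage into a proportional digital number can be sketched in Python as a simple uniform quantizer. This is an illustration only: the function names, the unipolar input range from 0 to `v_ref`, and the 8-bit default resolution are assumptions, not details taken from the description.

```python
from typing import Iterable, List


def quantize(voltage: float, v_ref: float = 1.0, bits: int = 8) -> int:
    """Map a voltage in [0, v_ref] to an integer code proportional to it.

    Assumed model: an ideal unipolar ADC that clamps out-of-range input
    and truncates to the nearest lower code.
    """
    levels = (1 << bits) - 1          # highest code, e.g. 255 for 8 bits
    clamped = min(max(voltage, 0.0), v_ref)
    return int(clamped / v_ref * levels)


def digitize(voltages: Iterable[float], v_ref: float = 1.0,
             bits: int = 8) -> List[int]:
    """Convert a sequence of sampled voltages into a discrete-time signal."""
    return [quantize(v, v_ref, bits) for v in voltages]
```

With the defaults, `digitize([0.0, 0.5, 1.0])` yields `[0, 127, 255]`: each code grows in proportion to the input magnitude.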

Once the audio stream 102 is digitized at module 106, it is compared to the keyword at module 108. The keyword operates as an indication to signal 116 the processor 118 to increase power 122 and obtain the digitized audio stream 114 for analysis at module 120. Embodiments of module 108 include instructions, processes, operations, logic, algorithms, techniques, logical functions, firmware, and/or software. Once the keyword is recognized, the low power integrated circuit 104 transmits the signal 116 to the processor 118 to increase power 122.

Embodiments of the keyword include a digital signal, analog signal, pattern, database, command, direction, instruction, or other representation to compare at module 108. For example, a user of the computing device may discuss the difference between a shrimp and a prawn with a friend and subsequently wish to perform a web search to find the answer. The user may then state the predetermined keyword to trigger recognition of the keyword by the keyword comparison module 108 and the subsequent analysis of the preceding discussion by the analysis module 120.

The keyword may include, for example, a phrase, a single keyword, or a single keyword that is private to the user of the computing device. In keeping with the previous example, the keyword may be the phrase "Computer, what do you think?" In this example, the phrase causes the low power integrated circuit 104 to send the signal 116 to the processor 118 to obtain the digitized audio stream 114, which may include audio from before or after the phrase. The user therefore does not need to repeat instructions, since the processor 118 analyzes the digitized audio stream 114 to determine the context of the audio stream 102 for an appropriate response. In a further example, the single keyword may be "Shazam." As a specific example, when the user speaks the word "Shazam," the circuit 104 may detect the keyword and transmit the signal 116 to instruct the processor 118 to obtain the digitized audio stream 114 and convert the stream to a text stream. Supposing the text stream is an instruction to compose a text message to the user's mother, the appropriate response would be to compose the text message. Thus, as described above, using the predetermined keyword(s), the low power integrated circuit 104 recognizes when the user of the computing device needs a further response completed, such as providing directions or performing a web search.

In a further embodiment of module 108, when no keyword is recognized within the digitized audio stream 114, the low power integrated circuit 104 continues monitoring for another audio stream 102, which is digitized at module 106 and stored in the memory 112. In yet a further embodiment, the low power integrated circuit 104 compresses the digitized audio stream 114, and the compressed digitized audio stream is used to recognize the keyword by comparing it to the keyword at module 108.
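The monitor-and-compare behavior of module 108 can be sketched as a loop in Python. The sketch makes strong simplifying assumptions: the digitized stream arrives as byte chunks, the keyword is a fixed byte pattern, and matching is a plain substring test standing in for whatever acoustic comparison a real circuit would perform. The names `monitor`, `chunks`, and `keyword_pattern` are hypothetical.

```python
from typing import Iterable, List, Optional


def monitor(chunks: Iterable[bytes], keyword_pattern: bytes,
            memory: List[bytes]) -> Optional[int]:
    """Store each digitized chunk and stop at the first keyword match.

    Returns the index of the matching chunk (the point at which a signal
    would be sent to the processor), or None if no keyword was recognized.
    """
    for index, chunk in enumerate(chunks):
        memory.append(chunk)              # keep the stream for later analysis
        if keyword_pattern in chunk:      # placeholder for real matching
            return index
        # No keyword recognized: continue monitoring the next chunk.
    return None
```

A pattern that straddles two chunks would be missed by this sketch; handling that case would require overlapping the chunks or matching against the stored stream.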

The memory 112 stores and/or maintains the digitized audio stream 114. Embodiments of the memory 112 may include a memory buffer, cache, non-volatile memory, volatile memory, random access memory (RAM), electrically erasable programmable read-only memory (EEPROM), storage drive, compact disc read-only memory (CD-ROM), or other memory capable of storing and/or maintaining the digitized audio stream 114.

The digitized audio stream 114 is stored in the memory 112. Embodiments may include the low power integrated circuit 104 compressing the audio stream 102 after the digitization module 106 to obtain a compressed digitized audio stream prior to placement in the memory 112. Although FIG. 1 depicts the digitized audio stream 114 stored in the memory 112, the digitized audio stream may also be stored in a memory on the low power integrated circuit 104. In a further embodiment, the digitized audio stream 114 includes a predetermined length of time of the audio stream 102. In this embodiment, when the audio stream 102 is received over a predetermined period of time, such as a few seconds or minutes, that period of the audio stream 102 is digitized and stored in the memory 112 for the processor 118 to obtain and/or retrieve. Further in this embodiment, when another audio stream 102 is received and digitized by the low power integrated circuit 104, the prior digitized audio stream in the memory is replaced with the more current digitized audio stream 114. The processor 118 therefore obtains and/or retrieves the most current audio stream 102. In this embodiment, the memory operates as a first-in, first-out buffer to provide the most current audio stream 102.
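A first-in, first-out buffer that retains only the most recent audio can be sketched in Python with a bounded `collections.deque`. The class name `RecentAudioBuffer` and its parameters are assumptions for illustration; the description does not prescribe a particular data structure or capacity.

```python
from collections import deque
from typing import Iterable, List


class RecentAudioBuffer:
    """Keeps only the newest `seconds` of digitized samples."""

    def __init__(self, sample_rate_hz: int, seconds: int) -> None:
        # A deque with maxlen silently discards the oldest entries
        # when new ones are appended past capacity.
        self._samples = deque(maxlen=sample_rate_hz * seconds)

    def write(self, samples: Iterable[int]) -> None:
        """Append newly digitized samples, evicting the oldest if full."""
        self._samples.extend(samples)

    def read_all(self) -> List[int]:
        """Return the retained samples, oldest first."""
        return list(self._samples)
```

For example, a buffer created with `RecentAudioBuffer(4, 2)` holds eight samples; after writing the integers 0 through 9, `read_all()` returns 2 through 9, so the processor always sees the most current audio.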

The signal 116 is transmitted from the low power integrated circuit 104 to the processor 118 upon recognition of the keyword within the digitized audio stream 114. The signal 116 instructs the processor 118 to increase power 122 and analyze the digitized audio stream 114 from the memory 112. Embodiments of the signal 116 include a communication, transmission, electrical signal, instruction, digital signal, analog signal, or other type of communication to the processor 118 to increase power 122. A further embodiment of the signal 116 includes an interrupt transmitted to the processor 118 upon recognition of the keyword within the digitized audio stream 114.
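The effect of the signal on the processor can be modeled in Python as a small state change followed by retrieval and analysis. This is a software analogy only, not a description of the hardware: the `Processor` class, its `"low"` and `"high"` power states, and the injected `analyze` callable are all hypothetical.

```python
from typing import Callable, List


class Processor:
    """Toy model of a processor that idles until it is signaled."""

    def __init__(self, analyze: Callable[[List[int]], str]) -> None:
        self.power_state = "low"      # assumed idle state before the signal
        self._analyze = analyze

    def on_signal(self, memory: List[int]) -> str:
        """Handle the wake signal: raise power, fetch the stream, analyze it."""
        self.power_state = "high"
        stream = list(memory)         # obtain the digitized stream from memory
        return self._analyze(stream)
```

In this model the low power circuit would call `on_signal` exactly once per recognized keyword, much as an interrupt handler runs once per interrupt.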

The processor 118 receives the signal 116 to increase power 122 and obtain the digitized audio stream 114 for analysis at module 120. Embodiments of the processor 118 may include a central processing unit (CPU), visual processing unit (VPU), microprocessor, graphics processing unit (GPU), or other programmable device suitable to analyze the digitized audio stream 114 at module 120.

Once the processor 118 obtains the digitized audio stream 114 from the memory 112, the processor analyzes the digitized audio stream 114 at module 120. Embodiments of the analysis module 120 include instructions, processes, operations, logic, algorithms, techniques, logical functions, firmware, and/or software that the processor 118 may fetch, decode, and/or execute. Additional embodiments of module 120 include converting the digitized audio stream 114 to a text stream to determine an appropriate response based on the context of the audio stream 102. Further embodiments of module 120 include determining a response to render to the user of the computing device 100, as seen in later figures.

The power 122 supplies electrical energy in the form of electrical potential to the processor 118. Specifically, the power 122 increases the electrical energy to the processor 118 once the signal 116 is received from the low power integrated circuit 104. Increasing the power 122 to the processor 118 wakes or triggers the processor 118 to obtain the digitized audio stream 114. Embodiments of the power 122 include a power supply, power management device, battery, energy storage, electromechanical system, solar power, power plug, or other device capable of delivering power 122 to the processor 118. In a further embodiment, the power 122 supplies electrical energy to the computing device 100.

Referring now to FIG. 2, a block diagram is shown of an example low power integrated circuit 204 to analyze an audio stream 202 and transmit a signal 216 to a processor to increase power when a keyword is detected in the audio stream 202. The low power integrated circuit 204 includes circuitry 210 to produce a digitized audio stream 214 using digitization circuitry 206, detect the keyword using comparison circuitry 208, and transmit the signal 216 upon recognition of the keyword in the digitized audio stream 214.

The audio stream 202 is received by the low power integrated circuit 204. The audio stream 202 may be similar in structure to the audio stream 102 of FIG. 1.

The low power integrated circuit 204 includes the circuitry 210 to digitize the audio stream 202 and compare the digitized audio stream 214 to a keyword. The low power integrated circuit 204 may be similar in functionality and structure to the low power integrated circuit 104 described above in FIG. 1.

The circuitry 210 includes the digitization circuitry 206 and the comparison circuitry 208. Embodiments of the circuitry 210 include logic, analog circuitry, electronic circuitry, digital circuitry, or other circuitry capable of digitizing the audio stream 202 and comparing the digitized audio stream 214 to the keyword. In further embodiments, the circuitry includes an application and/or firmware that may be used independently of and/or in conjunction with the low power integrated circuit 204 to fetch, decode, and/or execute the circuitry 206 and 208.

The audio stream 202 is received and digitized by the circuitry 206 to produce the digitized audio stream 214. The digitization circuitry 206 is a type of conversion for the audio stream 202. Further, the digitization circuitry 206 may be similar in functionality to the digitization module 106 described in connection with FIG. 1.

The low power integrated circuit 204 receives the audio stream 202 and digitizes it at the circuitry 206 to produce the digitized audio stream 214. The digitized audio stream 214 may be similar in structure to the digitized audio stream 114 described in connection with FIG. 1. Further, although FIG. 2 depicts the digitized audio stream 214 outside of the low power integrated circuit 204, the digitized audio stream 214 may also be located within the low power integrated circuit 204. A digitized audio stream 214 located within the low power integrated circuit 204 is used at the circuitry 208 for comparison to the keyword. In another embodiment, the digitized audio stream 214 is stored and/or maintained in a memory.

The circuitry 208, included in the circuitry 210 of the low power integrated circuit 204, compares the digitized audio stream 214 to the keyword. Further, the circuitry 208 is used to recognize the keyword within the digitized audio stream 214 and transmit the signal 216 to the processor to increase power. The comparison circuitry 208 may be similar in functionality to the module 108 described in connection with FIG. 1.

The signal 216 instructs a device to increase power upon recognition of the keyword within the digitized audio stream 214 by the comparison circuitry 208. The signal 216 may be similar in structure and functionality to the signal 116 of FIG. 1. Embodiments of the signal 216 include instructing a processor to increase power and analyze the digitized audio stream 214 from the memory. In this embodiment, the signal 216 instructs the processor to obtain and analyze the digitized audio stream 214 and determine a response based on the keyword recognition at the circuitry 208.

FIG. 3 is a block diagram of an example computing device 300 to analyze a digitized audio stream 314 and a server 326 in communication with the computing device 300 to analyze a text stream 324 generated from the digitized audio stream 314. The computing device 300 includes a low power integrated circuit 304, a memory 312, a processor 318, an output device 328, and the server 326. Specifically, FIG. 3 depicts the text stream 324 processed by the server 326 or the processor 318 to render a response to the user of the computing device on the output device 328. The computing device 300 may be similar in structure and functionality to the computing device 100 described in connection with FIG. 1.

The audio stream 302 is received by the computing device 300, specifically by the low power integrated circuit 304. The audio stream 302 may be similar in structure to the audio streams 102 and 202 of FIGS. 1 and 2, respectively.

The low power integrated circuit 304 includes a digitization module 306 and an analysis module 308. In one embodiment, the low power integrated circuit 304 includes circuitry to comprise the modules 306 and 308. The low power integrated circuit 304 may be similar in structure and functionality to the low power integrated circuits 104 and 204 described in connection with FIGS. 1 and 2, respectively.

Upon receipt by the computing device 300, the audio stream 302 is digitized at module 306 to produce the digitized audio stream 314. The digitization module 306 may be similar in structure and functionality to the digitization module 106 and the digitization circuitry 206 of FIGS. 1 and 2, respectively. In a further embodiment, once the audio stream 302 is digitized at module 306, the low power integrated circuit 304 transmits the digitized audio stream 314 to the memory 312 for storage and/or maintenance.

Once the digitized audio stream 314 is produced, the low power integrated circuit analyzes the digitized audio stream 314 at module 308. In one embodiment, module 308 compares a keyword to the digitized audio stream 314. In this embodiment, module 308 includes the functionality of the comparison module 108 described above in FIG. 1.

The memory 312 stores the digitized audio stream 314 from the low power integrated circuit 304. In one embodiment, the memory 312 maintains the digitized audio stream 314 received during a predetermined period of time. For example, the audio stream 302 may be monitored for a predetermined time of a few seconds, so that these few seconds of the audio stream 302 are digitized at module 306 and sent to the memory 312. In this example, the memory 312 stores the few seconds of the digitized audio stream 314 to be retrieved and/or obtained by the processor 318 for analysis once the signal 316 is received. Also in this example, when another few seconds of audio stream 302 are received and digitized, this other digitized audio stream 314 replaces the prior digitized audio stream 314. This allows the memory 312 to maintain the most recent audio stream 302 for the processor 318 to obtain and/or retrieve. The memory 312 may be similar in structure and functionality to the memory 112 described in connection with FIG. 1.

音声ストリーム３０２がデジタル化され３０６、デジタル化された音声ストリーム３１４が生成される。デジタル化された音声ストリーム３１４は、メモリ３１２に記憶および／または維持される。実施形態において、プロセッサ３１８は、信号３１６を受信すると、デジタル化された音声ストリーム３１４を取得してモジュール３２０で分析する。デジタル化された音声ストリーム３１４は、図１および図２に関連してそれぞれ説明されたデジタル化された音声ストリーム１１４および２１４と構造および機能が同様であり得る。 The audio stream 302 is digitized 306 to produce a digitized audio stream 314. The digitized audio stream 314 is stored and/or maintained in memory 312. In an embodiment, the processor 318, upon receiving the signal 316, obtains the digitized audio stream 314 and analyzes it at the module 320. Digitized audio stream 314 may be similar in structure and function to digitized audio streams 114 and 214 described in connection with FIGS. 1 and 2, respectively.

信号３１６は、低電力集積回路３０４からプロセッサ３１８への、電力３２２を増大させるための送信である。信号３１６の実施形態は、デジタル化された音声ストリーム３１４を取得してモジュール３２０で分析するようプロセッサ３１８に追加で命令する。信号３１６は、図１および図２に関連してそれぞれ説明された信号１１６および２１６と構造および機能が同様であり得る。 Signal 316 is a transmission from low power integrated circuit 304 to processor 318 to increase power 322. Embodiments of signal 316 additionally direct processor 318 to obtain digitized audio stream 314 for analysis at module 320. Signal 316 may be similar in structure and function to signals 116 and 216 described in connection with FIGS. 1 and 2, respectively.

電力３２２は、プロセッサ３１８および／またはコンピューティングデバイス３００に電気エネルギーを供給する。電力３２２は、図１に関連して説明された電力１２２と構造および機能が同様であり得る。 The power 322 supplies electrical energy to the processor 318 and/or the computing device 300. The power 322 may be similar in structure and function to the power 122 described in connection with FIG. 1.

プロセッサ３１８は、分析モジュール３２０およびテキストストリーム３２４を含む。特に、プロセッサ３１８は、電力３２２を増大させるための信号３１６を受信する。この信号３１６を受信すると、プロセッサ３１８は、デジタル化された音声ストリーム３１４を取得してモジュール３２０で分析する。さらなる実施形態において、プロセッサ３１８は、デジタル化された音声ストリーム３１４をテキストストリーム３２４に変換する。この実施形態において、テキストストリーム３２４内のテキストは、コンピューティングデバイス３００のための応答を命じる。テキストストリームは、アルファベット、数字のセット、または英数字のセットからの、シンボルまたは表現の有限のシーケンスのストリングである。たとえば、デジタル化された音声ストリーム３１４は、二進言語（binary language）におけるものであり得るので、プロセッサは、二進表現のバイトを単語に翻訳する。さらなる例において、デジタル化された音声ストリーム３１４は、単語および／または数を表す言語におけるものであり得るので、プロセッサ３１８は、この言語をプロセッサ３１８が理解するテキストに翻訳する。応答の実施形態は、ウェブ検索を実行すること、電話番号をダイヤルすること、アプリケーションを開くこと、テキストを記録すること、メディアをストリーミングすること、テキストメッセージを作成すること、道程を一覧表示すること、または道程を話すことを含む。さらなる実施形態において、プロセッサ３１８は、コンピューティングデバイス３００のユーザに差し出すための応答を決定する。プロセッサ３１８は、図１に関連して説明されたプロセッサ１１８と構造および機能が同様であり得る。 The processor 318 includes the analysis module 320 and the text stream 324. In particular, the processor 318 receives the signal 316 to increase power 322. Upon receiving this signal 316, the processor 318 obtains the digitized audio stream 314 and analyzes it at module 320. In a further embodiment, the processor 318 converts the digitized audio stream 314 into the text stream 324. In this embodiment, the text in the text stream 324 dictates a response for the computing device 300. A text stream is a string consisting of a finite sequence of symbols or representations drawn from an alphabet, a set of digits, or a set of alphanumeric characters. For example, the digitized audio stream 314 may be in a binary language, so the processor translates the bytes of the binary representation into words. In a further example, the digitized audio stream 314 may be in a language representing words and/or numbers, so the processor 318 translates this language into text that the processor 318 understands. Embodiments of the response include performing a web search, dialing a phone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions. In a further embodiment, the processor 318 determines a response to present to the user of the computing device 300. The processor 318 may be similar in structure and function to the processor 118 described in connection with FIG. 1.

プロセッサ３１８は、モジュール３２０で記憶されたデジタル化された音声ストリーム３１４を分析する。分析モジュール３２０の実施形態は、メモリ３１４から取得されたデジタル化された音声ストリーム３１４をサーバ３２６に送信することを含む。モジュール３２０の他の実施形態は、メモリ３１２から取得されたデジタル化された音声ストリーム３１４をテキストストリーム３２４に変換することと、テキストストリーム３２４をサーバ３２６に送信することとを含む。モジュール３２０の他の実施形態は、音声ストリーム３０２のコンテキストを分析することによって適切な応答を決定するために、デジタル化された音声ストリーム３１４をテキストストリーム３２４に変換することを含む。たとえば、デジタル化された音声ストリーム３１４は、モジュール３２０でテキストストリーム３２４に変換されることができ、プロセッサ３１８は、音声ストリーム３０２のコンテキストに基づいて適切な応答を決定するためにテキストストリーム３２４内のテキストを分析するために自然言語処理を利用し得る。 The processor 318 analyzes the stored digitized audio stream 314 at module 320. Embodiments of the analysis module 320 include transmitting the digitized audio stream 314 obtained from memory 314 to the server 326. Other embodiments of module 320 include converting the digitized audio stream 314 obtained from the memory 312 into the text stream 324 and transmitting the text stream 324 to the server 326. Still other embodiments of module 320 include converting the digitized audio stream 314 into the text stream 324 in order to determine an appropriate response by analyzing the context of the audio stream 302. For example, the digitized audio stream 314 can be converted into the text stream 324 at module 320, and the processor 318 may use natural language processing to analyze the text in the text stream 324 and determine an appropriate response based on the context of the audio stream 302.

テキストストリーム３２４は、コンピューティングデバイス３００のための適切な応答を決定するためのテキストを含む。一実施形態において、テキストストリーム３２４は、出力デバイス３２８でコンピューティングデバイス３００のユーザに差し出すための適切な応答を決定するためにプロセッサによって処理される。別の実施形態において、テキストストリーム３２４は、コンピューティングデバイス３００に送信される適切な応答を決定するためにサーバ３２６によって処理される。この実施形態において、応答は、サーバ３２６からコンピューティングデバイス３００に送られる。さらなる実施形態において、コンピューティングデバイス３００は、コンピューティングデバイス３００のユーザに応答を差し出す。たとえば、テキストストリーム３２４は、母親にテキストメッセージを送ることを話し合うテキストを含み得る。したがって、テキストストリーム３２４内のテキストは、コンピューティングデバイス３００のために、母親へのテキストメッセージを作成することによって応答するよう命じる。 The text stream 324 includes text for determining the appropriate response for the computing device 300. In one embodiment, the text stream 324 is processed by the processor to determine the appropriate response to present to the user of the computing device 300 at the output device 328. In another embodiment, the text stream 324 is processed by the server 326 to determine the appropriate response to be sent to the computing device 300. In this embodiment, the response is sent from the server 326 to the computing device 300. In a further embodiment, the computing device 300 presents the response to the user of the computing device 300. For example, the text stream 324 may include text discussing sending a text message to one's mother. The text in the text stream 324 therefore dictates that the computing device 300 respond by composing a text message to the mother.

サーバ３２６は、ネットワークにわたってサービスを提供し、テキストストリーム３２４を処理してコンピューティングデバイス３００に応答を送信するのに適した、たとえば、ウェブサーバ、ネットワークサーバ、ローカルエリアネットワーク（ＬＡＮ）サーバ、ファイルサーバ、または任意の他のコンピューティングデバイスを含み得る。 The server 326 provides services across a network and may include, for example, a web server, a network server, a local area network (LAN) server, a file server, or any other computing device suitable for processing the text stream 324 and sending a response to the computing device 300.

出力デバイス３２８は、コンピューティングデバイス３００のユーザにテキストストリーム３２４内のテキストから決定された応答を差し出す。出力デバイス３２８の実施形態は、コンピューティングデバイス３００のユーザに応答を差し出すための、表示デバイス、スクリーン、またはスピーカーを含む。母親へのテキストメッセージの例を踏まえると、コンピューティングデバイス３００のユーザは、母親へのテキストメッセージが作成されているのを示すディスプレイ、および／または、テキストメッセージをユーザに通信するためのスピーカーを有し得る。 The output device 328 presents to the user of the computing device 300 the response determined from the text in the text stream 324. Embodiments of the output device 328 include a display device, a screen, or a speaker for presenting the response to the user of the computing device 300. Continuing the example of the text message to the mother, the user of the computing device 300 may have a display showing that the text message to the mother is being composed and/or a speaker for communicating the text message to the user.

ここで図４を見てみると、音声ストリームを受信し、応答を決定するためにコンピューティングデバイスで実行される例示的な方法のフローチャートである。図４は、図１におけるようなコンピューティングデバイス１００で実行されるものとして説明されるが、それはまた、当業者に理解されるように、他の適切なコンポーネントで実行されることもできる。たとえば、図４は、メモリ１１２のような機械可読記憶媒体における実行可能な命令の形態で実現され得る。 Turning now to FIG. 4, it is a flowchart of an exemplary method performed at a computing device for receiving an audio stream and determining a response. Although FIG. 4 is described as being executed on the computing device 100 as in FIG. 1, it can also be executed on other suitable components, as will be appreciated by those skilled in the art. For example, FIG. 4 may be implemented in the form of executable instructions on a machine-readable storage medium such as memory 112.

動作４０２で、低電力集積回路と共に動作するコンピューティングデバイスが、音声ストリームを受信する。一実施形態において、音声ストリームは、所定の長さの時間のものであり得る。たとえば、音声ストリームは、数秒または数ミリ秒であり得る。この実施形態において、コンピューティングデバイスは、絶えず音声を監視し得る。さらなる実施形態において、音声ストリームは、ユーザからの話声または他のコンピューティングデバイスからの音声の少なくとも１つを含む。 At operation 402, a computing device operating with a low power integrated circuit receives an audio stream. In one embodiment, the audio stream may be of a predetermined length of time. For example, the audio stream can be seconds or milliseconds. In this embodiment, the computing device may constantly monitor audio. In a further embodiment, the audio stream comprises at least one of speech from a user or audio from other computing devices.

動作４０４で、コンピューティングデバイスと共に動作する低電力集積回路は、動作４０２で受信された音声ストリームをデジタル化して、デジタル化された音声ストリームを生成する。動作４０４の実施形態は、低電力集積回路と共に動作する、アナログデジタルコンバータ（ＡＤＣ）、デジタル変換デバイス、命令、ファームウェア、および／またはソフトウェアの使用を含む。動作４０４の実施形態は、デジタル化された音声ストリームをメモリに送信することを含む。４０４のさらなる実施形態が動作４０２で受信された音声ストリームを圧縮することを含む一方で、４０４の別の実施形態は、デジタル化された音声ストリームを圧縮することを含む。 At operation 404, the low power integrated circuit operating with the computing device digitizes the audio stream received at operation 402 to produce a digitized audio stream. Embodiments of operation 404 include the use of analog-to-digital converters (ADCs), digital conversion devices, instructions, firmware, and/or software that operate with low power integrated circuits. Embodiments of act 404 include sending the digitized audio stream to memory. While a further embodiment of 404 includes compressing the audio stream received at operation 402, another embodiment of 404 includes compressing the digitized audio stream.

動作４０６で、動作４０４で生成されたデジタル化された音声ストリームが、メモリに記憶される。動作４０６の実施形態は、メモリがデジタル化された音声ストリームを記憶および／または維持することを含む。動作４０６の別の実施形態において、動作４０２で所定の長さの時間中に受信された音声ストリームが動作４０４でデジタル化され、たとえば、別の音声ストリームが、動作４０２で受信され、動作４０４でデジタル化された場合、この現在のデジタル化された音声ストリームが、前のデジタル化された音声ストリームと置き換わる。この実施形態において、メモリは、現在の時間より前の所定の時間期間中に受信された記憶されたデジタル化された音声ストリームを維持する。 At operation 406, the digitized audio stream generated at operation 404 is stored in memory. Embodiments of operation 406 include the memory storing and/or maintaining the digitized audio stream. In another embodiment of operation 406, the audio stream received during a predetermined length of time at operation 402 is digitized at operation 404; for example, when another audio stream is received at operation 402 and digitized at operation 404, this current digitized audio stream replaces the previous digitized audio stream. In this embodiment, the memory maintains the stored digitized audio stream received during a predetermined time period prior to the current time.

動作４０８で、低電力集積回路は、動作４０４で生成されたデジタル化された音声ストリームを分析する。動作４０８の実施形態が、デジタル化された音声ストリームを処理することを含む一方で、他の実施形態は、デジタル化された音声ストリームをキーワードと比較することを含む。動作４０８のこれらの実施形態において、低電力集積回路は、キーワードについてデジタル化された音声ストリームを処理する。デジタル化された音声ストリーム内にキーワードが認識されると、方法は、信号を送信するための動作４１０へと移行する。さらなる実施形態において、低電力集積回路がデジタル化された音声ストリーム内にキーワードを認識しない場合、方法は、動作４０２へと戻る。さらに、さらなる実施形態は、デジタル化された音声ストリームを、コンピューティングデバイスのユーザがコンピューティングデバイスによる応答を所望することを示すアナログまたはデジタルの表現と比較することを含む。さらなる実施形態ではまた、動作４０２、４０４、４０６、および４０８は、並行して行われる。たとえば、コンピューティングデバイスが４０８でデジタル化された音声ストリームを分析するときに、集積回路は、動作４０２で音声ストリームを受信し、動作４０４および４０６で音声ストリームをデジタル化し、記憶し続ける。 At operation 408, the low power integrated circuit analyzes the digitized audio stream generated at operation 404. While some embodiments of operation 408 include processing the digitized audio stream, other embodiments include comparing the digitized audio stream with a keyword. In these embodiments of operation 408, the low power integrated circuit processes the digitized audio stream for the keyword. When the keyword is recognized in the digitized audio stream, the method proceeds to operation 410 to send a signal. In a further embodiment, if the low power integrated circuit does not recognize the keyword in the digitized audio stream, the method returns to operation 402. Still further embodiments include comparing the digitized audio stream with an analog or digital representation indicating that the user of the computing device desires a response by the computing device. In a further embodiment, operations 402, 404, 406, and 408 are performed in parallel. For example, while the computing device analyzes the digitized audio stream at 408, the integrated circuit continues to receive the audio stream at operation 402 and to digitize and store it at operations 404 and 406.
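The loop through operations 402 to 408 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: the audio arrives as already-digitized byte chunks, and an exact byte match stands in for whatever keyword comparison the low power integrated circuit actually performs. The function names `contains_keyword` and `monitor` are hypothetical.

```python
def contains_keyword(digitized_stream: bytes, keyword_template: bytes) -> bool:
    """Stand-in for operation 408: exact byte match rather than acoustic matching."""
    return keyword_template in digitized_stream


def monitor(chunks, keyword_template: bytes):
    """Walks operations 402-408 and returns the stored stream once the keyword is found."""
    stored = b""
    for chunk in chunks:  # operation 402: receive (assumed already digitized here)
        stored = chunk    # operations 404/406: store, replacing the previous stream
        if contains_keyword(stored, keyword_template):  # operation 408
            return stored  # operation 410 would signal the processor at this point
    return None  # keyword never recognized: the method would keep monitoring


found = monitor([b"background noise", b"hey device call mom"], b"hey device")
missed = monitor([b"background noise"], b"hey device")
```

When no chunk contains the keyword, the sketch returns `None`, which corresponds to the method returning to operation 402 without signaling the processor.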

動作４１０で、低電力集積回路は、電力を増大させるようプロセッサに信号を送信する。特に、デジタル化された音声ストリーム内にキーワードが認識されると、低電力集積回路は、電力を増大させるようプロセッサに信号を送信する。動作４１０の実施形態において、プロセッサは、プロセッサおよび／またはコンピューティングデバイスに与えられる電力または電気エネルギーを増大させる。 At operation 410, the low power integrated circuit sends a signal to the processor to increase power. In particular, when a keyword is recognized in the digitized audio stream, the low power integrated circuit signals the processor to increase the power. In an embodiment of operation 410, the processor increases power or electrical energy provided to the processor and/or computing device.

動作４１２で、プロセッサは、メモリから動作４０６で記憶されたデジタル化された音声ストリームを取得する。動作４１２の一実施形態では、メモリがプロセッサに、デジタル化された音声ストリームを送信する一方で、動作４１２の別の実施形態では、プロセッサがメモリから、デジタル化された音声ストリームを検索する。 At operation 412, the processor obtains from memory the digitized audio stream stored at operation 406. In one embodiment of act 412, the memory sends the digitized audio stream to the processor, while in another embodiment of act 412, the processor retrieves the digitized audio stream from the memory.
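Operations 410 and 412 can be pictured as a small state change followed by a read from memory. The Python sketch below is illustrative only; the `PowerState` enum, the `Processor` class, and the use of a `bytearray` as the memory are assumptions introduced for the example and are not defined in the specification.

```python
from enum import Enum


class PowerState(Enum):
    LOW = "low"
    HIGH = "high"


class Processor:
    """Hypothetical processor that stays in a low power state until signaled."""

    def __init__(self, memory: bytearray) -> None:
        self.power_state = PowerState.LOW
        self._memory = memory
        self.audio = None

    def on_signal(self) -> bytes:
        # Operation 410: the signal from the low power integrated circuit raises power.
        self.power_state = PowerState.HIGH
        # Operation 412: obtain the stored digitized audio stream from memory.
        self.audio = bytes(self._memory)
        return self.audio


processor = Processor(bytearray(b"hey device call mom"))
state_before = processor.power_state
obtained = processor.on_signal()
```

Here the processor pulls the stream from memory itself; the embodiment in which the memory pushes the stream to the processor would invert that call.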

動作４１４で、プロセッサは、動作４１２で取得されたデジタル化された音声ストリームをテキストストリームに変換する。デジタル化された音声ストリームをテキストストリームに変換した後、プロセッサは、テキストストリーム内のテキストを分析して、適切な応答を決定する。動作４１４の実施形態は、スピーチトゥテキスト（ＳＴＴ）、ボイストゥテキスト、デジタルトゥテキスト、または他のタイプの、テキスト変換を使用することを含む。動作４１４のさらなる実施形態は、テキストストリームへの変換後に自然言語処理を使用することを含む。この実施形態では、コンピューティングデバイスは、テキストストリーム内のテキストを処理して、動作４０２で受信された音声ストリームのコンテキストに基づいて適切な応答を決定する。たとえば、４０８でデジタル化された音声ストリーム内にキーワードを検出すると、プロセッサがデジタル化された音声ストリームを動作４１２で取得して、デジタル化された音声ストリームが動作４１４でテキストストリームに変換される。さらなる例において、音声ストリームは、２つの場所の間の道程についての会話を含み得、たとえば、このデジタル化された音声ストリームが動作４１２でテキストストリームに変換されると、プロセッサは、テキストストリーム内のテキストを分析することによって適切な応答を決定し得る。 At operation 414, the processor converts the digitized audio stream obtained at operation 412 into a text stream. After converting the digitized audio stream into a text stream, the processor analyzes the text in the text stream to determine an appropriate response. Embodiments of operation 414 include using speech-to-text (STT), voice-to-text, digital-to-text, or other types of text conversion. A further embodiment of operation 414 includes using natural language processing after conversion to the text stream. In this embodiment, the computing device processes the text in the text stream to determine an appropriate response based on the context of the audio stream received at operation 402. For example, upon detection of the keyword in the digitized audio stream at 408, the processor obtains the digitized audio stream at operation 412, and the digitized audio stream is converted into a text stream at operation 414. In a further example, the audio stream may include a conversation about directions between two locations; when this digitized audio stream is converted into a text stream at operation 412, the processor may determine the appropriate response by analyzing the text in the text stream.

動作４１６で、プロセッサは、動作４１４で生成されたテキストストリームに基づいて応答を決定する。応答の実施形態は、ウェブ検索を実行すること、電話番号をダイヤルすること、アプリケーションを開くこと、テキストを記録すること、メディアをストリーミングすること、テキストメッセージを作成すること、道程を一覧表示すること、または道程を話すことを含む。一実施形態において、テキストストリーム内のテキストは、プロセッサのための適切な応答を命じる。さらなる実施形態において、応答は、コンピューティングデバイスのユーザに差し出される。たとえば、テキストストリームは、どのようにして中国に到達するかを尋ねる話声を含み得、したがって、中国への道程が適切な応答であろう。加えて、この例では、中国への道程を、地図表示で一覧表示すること、および／または、話すことが含まれ得る。 At operation 416, the processor determines a response based on the text stream generated at operation 414. Embodiments of the response include performing a web search, dialing a phone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions. In one embodiment, the text in the text stream dictates the appropriate response for the processor. In a further embodiment, the response is presented to the user of the computing device. For example, the text stream may include speech asking how to get to China, so directions to China would be the appropriate response. In this example, the response may additionally include listing the directions to China on a map display and/or speaking them.
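As a rough illustration of operation 416, the following Python sketch maps a text stream to one of the response types listed above. It uses simple phrase matching as a stand-in for natural language processing; the trigger phrases, the response labels, and the names `RESPONSE_RULES` and `determine_response` are hypothetical and are not taken from the specification.

```python
# Hypothetical trigger phrases; the source does not define how text maps to a response.
RESPONSE_RULES = [
    (("how do i get to", "directions to"), "give_directions"),
    (("text message",), "compose_text_message"),
    (("call ", "dial "), "dial_phone_number"),
]


def determine_response(text_stream: str) -> str:
    """Stand-in for operation 416: the first rule with a matching phrase wins."""
    text = text_stream.lower()
    for phrases, response in RESPONSE_RULES:
        if any(phrase in text for phrase in phrases):
            return response
    return "web_search"  # fallback when no rule matches


directions = determine_response("How do I get to China?")
message = determine_response("Send a text message to Mom")
fallback = determine_response("difference between shrimp and prawns")
```

A rule table of this kind is easy to extend but ignores context; an implementation that analyzes the context of the audio stream, as described above, would replace the phrase matching with a proper language model or parser.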

ここで図５を参照すると、デジタル化された音声ストリームを圧縮し、コンピューティングデバイスのユーザに応答を差し出すためにコンピューティングデバイスで実行される例示的な方法のフローチャートである。図５は、図３における上述したコンピューティングデバイス３００で実行されるものとして説明されるが、それはまた、当業者に理解されるように、他の適切なコンポーネントで実行されることもできる。たとえば、図５は、メモリ３１２のような機械可読記憶媒体における実行可能な命令の形態で実現され得る。 Referring now to FIG. 5, it is a flowchart of an exemplary method performed on a computing device to compress a digitized audio stream and present a response to a user of the computing device. Although FIG. 5 is described as being executed on the computing device 300 described above with reference to FIG. 3, it can also be executed on other suitable components, as will be appreciated by those skilled in the art. For example, FIG. 5 may be implemented in the form of executable instructions on a machine-readable storage medium such as memory 312.

動作５０２で、コンピューティングデバイスは、デジタル化された音声ストリームを圧縮する。一実施形態において、動作５０２は、図４における動作４０６より前の動作４０４と共に実行される。たとえば、受信された音声ストリームがデジタル化されると、低電力集積回路がコンピューティングデバイスと共に動作して、ストリームのデータバイトサイズを減じるためにデジタル化された音声ストリームを圧縮し得る。この例において、デジタル化された音声ストリームの圧縮は、動作４０６でメモリに記憶される前に行われる。さらなる実施形態において、動作５０２は、図４における動作４１２でデジタル化された音声ストリームを受信する前に実行される。たとえば、プロセッサが、メモリからのデジタル化された音声ストリームを圧縮するための動作５０２を実行し得る一方で、別の例では、メモリが、プロセッサがデジタル化された音声ストリームを取得する前にデジタル化された音声ストリームを圧縮し得る。動作５０２のさらなる別の実施形態では、圧縮されたデジタル化された音声ストリームが、図４におけるステップ４０８でのように、キーワードを認識するために分析される。 At operation 502, the computing device compresses the digitized audio stream. In one embodiment, operation 502 is performed together with operation 404, prior to operation 406 in FIG. 4. For example, once the received audio stream is digitized, the low power integrated circuit may work with the computing device to compress the digitized audio stream in order to reduce the data byte size of the stream. In this example, the digitized audio stream is compressed before it is stored in memory at operation 406. In a further embodiment, operation 502 is performed prior to receiving the digitized audio stream at operation 412 in FIG. 4. For example, the processor may perform operation 502 to compress the digitized audio stream from the memory, while in another example the memory may compress the digitized audio stream before the processor obtains it. In yet another embodiment of operation 502, the compressed digitized audio stream is analyzed to recognize the keyword, as in step 408 in FIG. 4.
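The effect of operation 502 on the data byte size can be shown with a brief Python sketch. The specification does not name a compression scheme, so the use of lossless `zlib` compression here is purely an assumption for illustration, as are the function names and the synthetic sample data.

```python
import zlib


def compress_stream(digitized: bytes) -> bytes:
    """Operation 502 (assumed lossless): shrink the stream before it is stored."""
    return zlib.compress(digitized)


def decompress_stream(compressed: bytes) -> bytes:
    """Restores the original digitized stream for later analysis."""
    return zlib.decompress(compressed)


# Synthetic, highly repetitive samples standing in for a digitized audio stream.
digitized = bytes([0, 1, 2, 3] * 1000)
compressed = compress_stream(digitized)
restored = decompress_stream(compressed)
```

Real audio is far less repetitive than this synthetic input, so the size reduction shown here is not representative; a practical design would more likely use an audio-specific codec.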

動作５０４で、コンピューティングデバイスは、コンピューティングデバイスのユーザに応答を差し出す。動作５０４の実施形態は、図４における動作４１６中または後に行われることを含む。たとえば、プロセッサが適切な応答を決定すると、この応答は、コンピューティングデバイスのユーザに差し出され得る。さらなる実施形態において、応答は、コンピューティングデバイスと共に動作するディスプレイ画面またはスピーカーといった出力デバイス上でユーザに差し出されることができる。たとえば、ユーザが小エビとクルマエビとの違いを話し合っている場合、プロセッサは、ウェブ検索アプリケーションを起動して、たとえば、小エビとクルマエビとの違いのウェブ検索を実行し得る。実行されたウェブ検索は、コンピューティングデバイスの表示デバイス上でユーザに差し出され得る。さらなる例において、コンピューティングデバイスは、小エビとクルマエビの違いを、スピーカーを通じてユーザに聞こえるように読み上げる。これらの実施形態において、コンピューティングデバイスは、ユーザがコンピューティングデバイスに命令するよりもむしろ音声ストリームを用いて応答を決定するように動作する。 At operation 504, the computing device presents a response to the user of the computing device. Embodiments of operation 504 include being performed during or after operation 416 in FIG. 4. For example, once the processor determines an appropriate response, this response may be presented to the user of the computing device. In a further embodiment, the response can be presented to the user on an output device, such as a display screen or a speaker, that operates with the computing device. For example, if the user is discussing the difference between shrimp and prawns, the processor may launch a web search application and perform a web search on the difference between shrimp and prawns. The results of the web search may be presented to the user on a display device of the computing device. In a further example, the computing device reads the difference between shrimp and prawns aloud to the user through a speaker. In these embodiments, the computing device operates to determine the response from the audio stream, rather than from the user instructing the computing device.

本明細書において詳細に説明された実施形態は、キーワードを検出するために音声ストリームをデジタル化し、デジタル化された音声ストリーム内のキーワードの認識に基づいて、電力を増大させ、さらにデジタル化された音声ストリームを分析して応答を決定するよう、プロセッサに信号を送信することに関する。このように、例示的な実施形態は、コンピューティングデバイスへの反復した音声命令を防止しながら、コンピューティングデバイスの電力消費を減じることによって、ユーザの時間を節約する。
以下に本願の出願当初の特許請求の範囲に記載された発明を付記する。
［Ｃ１］
低電力集積回路とプロセッサとを含むコンピューティングデバイスによって実行される方法であって、
音声ストリームを受信することと、
前記音声ストリームをデジタル化することと、
前記デジタル化された音声ストリームをメモリに記憶することと、
前記低電力集積回路を使用してキーワードの認識のために前記デジタル化された音声ストリームを分析することと、
前記デジタル化された音声ストリーム内に前記キーワードが認識されると、前記低電力集積回路から前記プロセッサに電力を増大させるための信号を送信することと、
前記メモリから前記プロセッサに前記記憶されたデジタル化された音声ストリームを送信することと、
前記プロセッサを使用して前記デジタル化された音声ストリームをテキストストリームに変換することと、
前記テキストストリームに基づいて前記プロセッサのための応答を決定することと
を備える方法。
［Ｃ２］
前記デジタル化された音声ストリームを圧縮して、圧縮されたデジタル化された音声ストリームにすること
をさらに備え、前記分析することは、前記キーワードの認識のために前記圧縮されたデジタル化された音声ストリームを分析することを備える、Ｃ１に記載の方法。
［Ｃ３］
前記コンピューティングデバイスのユーザに前記応答を差し出すこと
をさらに備える、Ｃ１に記載の方法。
［Ｃ４］
前記応答は、ウェブ検索を実行すること、電話番号をダイヤルすること、アプリケーションを開くこと、テキストを記録すること、メディアをストリーミングすること、テキストメッセージを作成すること、道程を一覧表示すること、または道程を話すこと、の少なくとも１つを含む、Ｃ３に記載の方法。
［Ｃ５］
前記メモリは、現在の時間より前の所定の時間期間中に受信された、前記記憶されたデジタル化された音声ストリームを維持する、Ｃ１に記載の方法。
［Ｃ６］
前記プロセッサのための前記応答を決定することは、
サーバから前記サーバによる前記テキストストリームの分析に基づいた前記応答を受信することと、
前記プロセッサによって前記テキストストリームの分析に基づいて前記応答を決定することと
の１つを備える、Ｃ１に記載の方法。
［Ｃ７］
前記音声ストリームは、ユーザからの話声、別のコンピューティングデバイスからの話声、および前記別のコンピューティングデバイスからの音声、の少なくとも１つを含む、Ｃ１に記載の方法。
［Ｃ８］
コンピューティングデバイスであって、
音声ストリームを受信すると、前記音声ストリームをデジタル化してメモリに記憶し、キーワードを認識するために前記デジタル化された音声ストリームを分析し、
前記デジタル化された音声ストリーム中に前記キーワードを認識すると、プロセッサに電力を増大させるための信号を送信する
ための低電力集積回路と、
前記低電力集積回路からの前記信号に基づいて電力を増大させ、
応答を決定するために前記デジタル化された音声ストリームを分析する
ためのプロセッサと
を備えるコンピューティングデバイス。
［Ｃ９］
前記デジタル化された音声ストリームを分析するために、前記プロセッサはさらに、
前記電力を増大させるための信号を受信することに基づいて、前記メモリから前記デジタル化された音声ストリームを検索し、
前記デジタル化された音声ストリームをテキストストリームに変換し、
前記テキストストリーム中のテキストによって命じられた前記応答を決定する
ためのものである、Ｃ８に記載のコンピューティングデバイス。
［Ｃ１０］
前記デジタル化された音声ストリームを分析するために、前記プロセッサはさらに、
前記電力を増大させるための信号を受信することに基づいて、前記メモリから前記デジタル化された音声ストリームを検索し、
前記応答を決定するために、サーバに、前記デジタル化された音声ストリームまたは前記デジタル化された音声ストリームから発生させたテキストストリームを送信し、
前記サーバから前記応答を受信する
ためのものである、Ｃ８に記載のコンピューティングデバイス。
［Ｃ１１］
前記低電力集積回路はさらに、
前記デジタル化された音声ストリームを圧縮して、圧縮されたデジタル化された音声ストリームを取得し、
前記キーワードを認識するために前記圧縮されたデジタル化された音声ストリームを分析する
ためのものである、Ｃ８に記載のコンピューティングデバイス。
［Ｃ１２］
前記コンピューティングデバイスのユーザに前記応答を差し出すための出力デバイス
をさらに備える、Ｃ８に記載のコンピューティングデバイス。
［Ｃ１３］
前記キーワードを認識するために前記デジタル化された音声ストリームを分析するために、前記低電力集積回路は、前記デジタル化された音声ストリームを前記キーワードと比較する、Ｃ８に記載のコンピューティングデバイス。
［Ｃ１４］
低電力集積回路であって、
音声ストリームを受信し、
前記音声ストリームをデジタル化し、
前記デジタル化された音声ストリームをメモリに記憶し、
前記デジタル化された音声ストリームをキーワードと比較し、
前記デジタル化された音声ストリーム中に前記キーワードが認識されると、電力を増大させ、前記メモリからの前記記憶されたデジタル化された音声ストリームを分析するよう、コンピューティングデバイスのプロセッサに命令するための信号を送信する
ための回路素子
を備える低電力集積回路。
［Ｃ１５］
前記メモリは、所定の時間期間にわたる前記記憶されたデジタル化された音声ストリームを維持する、Ｃ１４に記載の低電力集積回路。
The embodiments described in detail herein relate to digitizing an audio stream to detect a keyword and, based on recognition of the keyword in the digitized audio stream, sending a signal to a processor to increase power and further analyze the digitized audio stream to determine a response. In this way, the exemplary embodiments save the user time by reducing the power consumption of the computing device while preventing repeated voice instructions to the computing device.
The inventions recited in the claims as originally filed are appended below.
[C1]
A method performed by a computing device including a low power integrated circuit and a processor, the method comprising:
receiving an audio stream;
digitizing the audio stream;
storing the digitized audio stream in a memory;
analyzing the digitized audio stream for recognition of a keyword using the low power integrated circuit;
upon recognition of the keyword in the digitized audio stream, transmitting a signal from the low power integrated circuit to the processor to increase power;
transmitting the stored digitized audio stream from the memory to the processor;
converting the digitized audio stream into a text stream using the processor; and
determining a response for the processor based on the text stream.
[C2]
The method of C1, further comprising compressing the digitized audio stream into a compressed digitized audio stream, wherein the analyzing comprises analyzing the compressed digitized audio stream for recognition of the keyword.
[C3]
The method of C1, further comprising presenting the response to a user of the computing device.
[C4]
The method of C3, wherein the response includes at least one of performing a web search, dialing a phone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions.
[C5]
The method of C1, wherein the memory maintains the stored digitized audio stream received during a predetermined time period prior to the current time.
[C6]
The method of C1, wherein determining the response for the processor comprises one of:
receiving, from a server, the response based on an analysis of the text stream by the server; and
determining, by the processor, the response based on an analysis of the text stream.
[C7]
The method of C1, wherein the audio stream includes at least one of speech from a user, speech from another computing device, and audio from the other computing device.
[C8]
A computing device comprising:
a low power integrated circuit to:
upon receiving an audio stream, digitize the audio stream, store it in a memory, and analyze the digitized audio stream to recognize a keyword; and
upon recognizing the keyword in the digitized audio stream, transmit a signal to a processor to increase power; and
the processor to:
increase power based on the signal from the low power integrated circuit; and
analyze the digitized audio stream to determine a response.
[C9]
The computing device of C8, wherein, to analyze the digitized audio stream, the processor is further to:
retrieve the digitized audio stream from the memory based on receiving the signal to increase the power;
convert the digitized audio stream into a text stream; and
determine the response dictated by text in the text stream.
[C10]
The computing device of C8, wherein, to analyze the digitized audio stream, the processor is further to:
retrieve the digitized audio stream from the memory based on receiving the signal to increase the power;
transmit, to a server, the digitized audio stream or a text stream generated from the digitized audio stream, in order to determine the response; and
receive the response from the server.
[C11]
The computing device of C8, wherein the low power integrated circuit is further to:
compress the digitized audio stream to obtain a compressed digitized audio stream; and
analyze the compressed digitized audio stream to recognize the keyword.
[C12]
The computing device of C8, further comprising an output device for presenting the response to a user of the computing device.
[C13]
The computing device of C8, wherein, to analyze the digitized audio stream to recognize the keyword, the low power integrated circuit compares the digitized audio stream with the keyword.
[C14]
A low power integrated circuit comprising circuitry to:
receive an audio stream;
digitize the audio stream;
store the digitized audio stream in a memory;
compare the digitized audio stream with a keyword; and
upon recognition of the keyword in the digitized audio stream, transmit a signal instructing a processor of a computing device to increase power and analyze the stored digitized audio stream from the memory.
[C15]
The low power integrated circuit of C14, wherein the memory maintains the stored digitized audio stream for a predetermined time period.

Claims

A method for processing audio information, comprising:
receiving an audio stream by a computing device, the audio stream including a keyword and additional audio;
monitoring the audio stream by a low power integrated circuit of the computing device;
digitizing the audio stream by the low power integrated circuit, the digitized audio stream including the keyword and the additional audio;
storing the digitized audio stream in a memory within the computing device, wherein storing the digitized audio stream comprises replacing a previous digitized audio stream stored in the memory with the digitized audio stream;
analyzing the stored digitized audio stream for recognition of the keyword by the low power integrated circuit;
upon recognizing the keyword in the stored digitized audio stream, causing, by the low power integrated circuit, a processor of the computing device to enter an increased power usage state and to obtain at least the additional audio of the stored digitized audio stream from the memory;
obtaining, by the processor, at least the additional audio of the stored digitized audio stream from the memory;
transmitting, by the computing device, at least the additional audio of the stored digitized audio stream to a server to determine one or more actions based on at least the additional audio of the stored digitized audio stream; and
rendering a response received by the computing device from the server, wherein the response is based on an analysis of at least the additional audio of the digitized audio stream and comprises one or more instructions for executing the one or more actions.

The method of claim 1, wherein the stored digitized audio stream is converted to a text stream.

The method of claim 1, wherein the one or more instructions comprise at least one of performing a web search, dialing a phone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions.

The method of claim 1, wherein the audio stream comprises at least one of speech from a user, speech from another computing device, or audio from the other computing device.

The method of claim 1, further comprising compressing the digitized audio stream prior to storage in the memory.

The method of claim 1, further comprising, in response to recognizing the keyword in the stored digitized audio stream:
constantly receiving the audio stream;
constantly monitoring the audio stream; and
constantly digitizing the audio stream.

The method of claim 1, further comprising outputting the rendered response by at least one of a display of the computing device or a speaker of the computing device.

The method of claim 1, wherein receiving the audio stream comprises receiving the audio stream from a user by a microphone of the computing device.

A computing device for processing audio information, comprising:
a low power integrated circuit;
a memory coupled to the low power integrated circuit; and
a processor coupled to the low power integrated circuit and the memory,
wherein the low power integrated circuit is configured to:
monitor and digitize an audio stream received by the computing device, the digitized audio stream including a keyword and additional audio;
store the digitized audio stream in the memory, wherein storing the digitized audio stream comprises replacing a previous digitized audio stream stored in the memory with the digitized audio stream;
analyze the stored digitized audio stream to recognize the keyword; and
in response to recognizing the keyword in the stored digitized audio stream, cause the processor of the computing device to enter an increased power usage state and to obtain at least the additional audio of the stored digitized audio stream from the memory, and
wherein the processor is configured to:
obtain at least the additional audio of the stored digitized audio stream from the memory;
transmit at least the additional audio of the stored digitized audio stream to a server to determine one or more actions based on at least the additional audio of the stored digitized audio stream; and
render a response received by the computing device from the server, wherein the response is based on an analysis of at least the additional audio of the digitized audio stream.
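
By way of illustration only, the division of work between the low power integrated circuit and the processor recited above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the class names, the `KEYWORD` value, the `fake_server` stand-in, and the representation of digitized audio as a list of words are all hypothetical and do not appear in the claims.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

KEYWORD = "computer"  # hypothetical keyword; the claims do not name one


@dataclass
class Memory:
    """Holds one digitized audio stream; storing replaces the previous one."""
    stream: List[str] = field(default_factory=list)

    def store(self, digitized: List[str]) -> None:
        self.stream = list(digitized)  # overwrite rather than append


@dataclass
class Processor:
    memory: Memory
    server: Callable[[List[str]], str]  # stand-in for the remote server
    power_state: str = "low"

    def wake_and_handle(self, keyword_index: int) -> str:
        # Enter the increased power usage state, then obtain the additional
        # audio (everything after the keyword) from memory.
        self.power_state = "increased"
        additional = self.memory.stream[keyword_index + 1:]
        response = self.server(additional)  # transmit to the server
        return f"rendered: {response}"      # render the server's response


@dataclass
class LowPowerIC:
    memory: Memory
    processor: Processor

    def on_audio(self, audio: str) -> Optional[str]:
        # Toy "digitization": lowercase words stand in for audio frames.
        digitized = audio.lower().split()
        self.memory.store(digitized)
        if KEYWORD in self.memory.stream:
            index = self.memory.stream.index(KEYWORD)
            return self.processor.wake_and_handle(index)
        return None  # no keyword: the processor is left in its low power state


def fake_server(additional_audio: List[str]) -> str:
    """Hypothetical server that just echoes what it would act on."""
    return "action for '" + " ".join(additional_audio) + "'"
```

In this sketch the processor is touched only after the keyword is recognized, which is the point of placing the monitoring, storing, and keyword analysis on the low power integrated circuit.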

The computing device of claim 9, wherein the stored digitized audio stream is converted into a text stream.

The computing device of claim 9, wherein the response received from the server comprises one or more instructions for performing the one or more actions by the processor.

The computing device of claim 11, wherein the one or more instructions comprise at least one of performing, by the processor, a web search, dialing a telephone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions.

The computing device of claim 9, wherein the audio stream comprises at least one of a voice from a user or a voice from another computing device.

The computing device of claim 9, wherein the digitized audio stream is compressed prior to storage in the memory.

The computing device of claim 9, wherein the low power integrated circuit is further configured to, in response to recognizing the keyword in the stored digitized audio stream:
continuously receive the audio stream;
continuously monitor the audio stream; and
continuously digitize the audio stream.

The computing device of claim 9, further comprising at least one of:
a display configured to output the rendered response; or
a speaker configured to output the rendered response.

The computing device of claim 9, wherein the received audio stream is received by a microphone of the computing device.

The computing device of claim 9, wherein the processor is further configured to send the keyword and the additional audio of the stored digitized audio stream to the server to determine the one or more actions.

A system for analyzing a digitized audio stream, comprising:
a computing device comprising:
one or more processors;
a memory; and
a low power integrated circuit coupled to the one or more processors and the memory,
wherein the low power integrated circuit is configured to:
monitor and digitize an audio stream received by the computing device, the digitized audio stream including a keyword and additional audio;
store the digitized audio stream in the memory;
analyze the stored digitized audio stream to recognize the keyword; and
in response to recognizing the keyword in the stored digitized audio stream, cause the one or more processors to enter an increased power usage state; and
a server configured to:
receive at least the additional audio of the digitized audio stream transmitted from the computing device;
process at least the additional audio of the received digitized audio stream to determine one or more actions; and
send instructions to the computing device to perform the one or more actions, wherein the instructions are based on an analysis of at least the additional audio of the received digitized audio stream.

The system of claim 19, wherein the one or more processors are further configured to perform the one or more actions based at least in part on the instructions received from the server.

The system of claim 20, wherein the one or more actions comprise at least one of performing a web search, dialing a phone number, opening an application, recording text, streaming media, composing a text message, listing directions, or speaking directions.

The system of claim 19, wherein the one or more processors are further configured to send at least the additional audio of the digitized audio stream to the server.

The system of claim 22, wherein the one or more processors are further configured to compress the digitized audio stream.

The system of claim 19, wherein the received audio stream is received by a microphone of the computing device.

The system of claim 19, wherein the low power integrated circuit is further configured to replace a previous digitized audio stream stored in the memory with the digitized audio stream.

The system of claim 19, wherein the computing device further comprises at least one of:
a display configured to output the rendered response; or
a speaker configured to output the rendered response.

The system of claim 19, wherein, to process the received digitized audio stream, the server is further configured to:
convert the received digitized audio stream into a text stream; and
analyze the text stream to determine the meaning of the text stream.
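
By way of illustration only, the server-side steps recited in the system claims (converting the received audio into a text stream, analyzing the text, and sending instructions for one or more actions) might be sketched as follows. The function names, the `ACTION_RULES` table, and the trigger words are hypothetical assumptions made for the example; the claims do not specify how the meaning of the text stream is determined, and real speech-to-text is replaced here by a trivial stand-in.

```python
from typing import Dict, List


def to_text_stream(digitized_audio: List[str]) -> str:
    """Stand-in for speech-to-text; the frames here are already words."""
    return " ".join(digitized_audio)


# Hypothetical trigger words mapped to actions of the kind named in the claims.
ACTION_RULES: Dict[str, str] = {
    "search": "perform_web_search",
    "call": "dial_phone_number",
    "open": "open_application",
    "play": "stream_media",
}


def determine_instructions(digitized_audio: List[str]) -> List[Dict[str, str]]:
    """Convert audio to text, analyze it, and build instructions for the device."""
    words = to_text_stream(digitized_audio).split()
    instructions: List[Dict[str, str]] = []
    for i, word in enumerate(words):
        action = ACTION_RULES.get(word)
        if action is not None:
            # Toy rule: the first trigger word wins; the rest is its argument.
            instructions.append(
                {"action": action, "argument": " ".join(words[i + 1:])}
            )
            break
    return instructions
```

For example, `determine_instructions(["call", "home"])` yields a single instruction naming the hypothetical `dial_phone_number` action with the argument `home`, which the computing device would then perform.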