JP2024529843A

JP2024529843A - Base calling using multiple base call models

Info

Publication number: JP2024529843A
Application number: JP2023580568A
Authority: JP
Inventors: ギャヴィン・デレク・パーナビー; マーク・デイヴィッド・ハーム
Original assignee: イルミナインコーポレイテッド
Priority date: 2021-08-03
Filing date: 2022-08-02
Publication date: 2024-08-14
Also published as: EP4381514A1; WO2023014741A1; KR20240035413A

Abstract

A method of base calling using at least two base callers is disclosed. The method includes: executing at least a first base caller and a second base caller on sensor data generated for a sensing cycle in a series of sensing cycles; generating, by the first base caller, first classification information associated with the sensor data, based on executing the first base caller on the sensor data; and generating, by the second base caller, second classification information associated with the sensor data, based on executing the second base caller on the sensor data. In one example, final classification information is generated based on the first classification information and the second classification information, the final classification information including one or more base calls for the sensor data.
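As a rough illustration of how first and second classification information might be merged into final classification information, the Python sketch below blends per-base confidence scores from two base callers with a single weight. The function name `combine_base_calls`, the dictionary layout of the scores, and the linear weighting are all assumptions made for this example; the abstract does not prescribe any particular combining function.

```python
# Hypothetical sketch: merge per-base confidence scores from two base callers.
# The linear weighting below is an illustrative assumption, not the claimed method.
BASES = ("A", "C", "G", "T")


def combine_base_calls(first_scores, second_scores, first_weight=0.5):
    """Return (final_base, final_scores) from two callers' per-base scores.

    first_scores, second_scores: dicts mapping each base to a confidence score.
    first_weight: assumed weight of the first caller; the second caller
    receives 1 - first_weight.
    """
    if not 0.0 <= first_weight <= 1.0:
        raise ValueError("first_weight must be in [0, 1]")
    second_weight = 1.0 - first_weight
    # Weighted average of the two callers' scores for every candidate base.
    final_scores = {
        base: first_weight * first_scores[base] + second_weight * second_scores[base]
        for base in BASES
    }
    # The final base call is the base with the highest combined score.
    final_base = max(BASES, key=lambda base: final_scores[base])
    return final_base, final_scores


# Two callers that disagree on the same sensor data.
first = {"A": 0.70, "C": 0.10, "G": 0.10, "T": 0.10}
second = {"A": 0.20, "C": 0.05, "G": 0.70, "T": 0.05}

base_equal, _ = combine_base_calls(first, second, first_weight=0.5)
base_second, _ = combine_base_calls(first, second, first_weight=0.2)
```

With equal weights the first caller's call prevails in this example, while shifting weight toward the second caller flips the final call, which is the kind of behavior a context-dependent weighting scheme would exploit.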

Description

(Priority Application)
This application claims priority to U.S. Nonprovisional Patent Application No. 17/876,528, entitled "Base Calling Using Multiple Base Caller Models," filed July 28, 2022 (Attorney Docket No. ILLM1021-2/IP-1856-US), which claims the benefit of U.S. Provisional Patent Application No. 63/228,954, entitled "Base Calling Using Multiple Base Caller Models," filed August 3, 2021 (Attorney Docket No. ILLM1021-1/IP-1856-PRV). The priority applications are incorporated herein by reference for all purposes.

(Field of the Invention)
The disclosed technology relates to artificial intelligence type computers and digital data processing systems, and to corresponding data processing methods and products for emulation of intelligence (i.e., knowledge-based systems, reasoning systems, and knowledge acquisition systems), including systems for reasoning with uncertainty (e.g., fuzzy logic systems), adaptive systems, machine learning systems, and artificial neural networks. In particular, the disclosed technology relates to using deep neural networks, such as deep convolutional neural networks, to analyze data.

(Incorporations)
The following are incorporated by reference as if fully set forth herein:
U.S. Provisional Patent Application No. 62/979,384, entitled "Artificial Intelligence-Based Base Calling of Index Sequences," filed February 20, 2020 (Attorney Docket No. ILLM1015-1/IP-1857-PRV);
U.S. Provisional Patent Application No. 62/979,414, entitled "Artificial Intelligence-Based Many-to-Many Base Calling," filed February 20, 2020 (Attorney Docket No. ILLM1016-1/IP-1858-PRV);
U.S. Nonprovisional Patent Application No. 16/825,987, entitled "Training Data Generation for Artificial Intelligence-Based Sequencing," filed March 20, 2020 (Attorney Docket No. ILLM1008-16/IP-1693-US);
U.S. Nonprovisional Patent Application No. 16/825,991, entitled "Artificial Intelligence-Based Generation of Sequencing Metadata," filed March 20, 2020 (Attorney Docket No. ILLM1008-17/IP-1741-US);
U.S. Nonprovisional Patent Application No. 16/826,126, entitled "Artificial Intelligence-Based Base Calling," filed March 20, 2020 (Attorney Docket No. ILLM1008-18/IP-1744-US);
U.S. Nonprovisional Patent Application No. 16/826,134, entitled "Artificial Intelligence-Based Quality Scoring," filed March 20, 2020 (Attorney Docket No. ILLM1008-19/IP-1747-US); and
U.S. Nonprovisional Patent Application No. 16/826,168, entitled "Artificial Intelligence-Based Sequencing," filed March 21, 2020 (Attorney Docket No. ILLM1008-20/IP-1752-PRV-US).

The subject matter discussed in this section should not be assumed to be prior art merely as a result of its mention in this section. Similarly, a problem mentioned in this section, or associated with the subject matter provided as background, should not be assumed to have been previously recognized in the prior art. The subject matter in this section merely represents different approaches, which in and of themselves may also correspond to implementations of the claimed technology.

Rapid improvements in computing power have, in recent years, enabled deep convolutional neural networks (CNNs) to achieve great success in many computer vision tasks, with significantly improved accuracy. During the inference phase, many applications require low-latency processing of a single image under strict power consumption requirements. These requirements reduce the efficiency of graphics processing units (GPUs) and other general-purpose platforms, and they create an opportunity for specialized acceleration hardware, such as field programmable gate arrays (FPGAs), whose digital circuits can be customized to be particularly effective for inference with deep learning algorithms. However, deploying CNNs in portable and embedded systems remains challenging because of the large data volume, intensive computation, varying algorithm structures, and frequent memory accesses.

Because convolution accounts for most of the operations in a CNN, the convolution acceleration scheme strongly affects the efficiency and performance of a hardware CNN accelerator. Convolution consists of multiply and accumulate (MAC) operations organized in four levels of loops that slide along the kernels and feature maps. The first loop level computes the MACs of the pixels within one kernel window. The second loop level accumulates the sums of MAC products across the different input feature maps. After the first and second loop levels complete, adding a bias yields one final output element in the output feature map. The third loop level slides the kernel window within the input feature map. The fourth loop level generates the different output feature maps.
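The four loop levels described above can be sketched in plain Python as follows. This is a minimal reference implementation, assuming stride 1, no padding, and nested lists indexed as `inputs[c_in][y][x]` and `kernels[c_out][c_in][ky][kx]`; the function name `conv2d_four_loops` and these layout choices are illustrative assumptions rather than anything specified in the text.

```python
def conv2d_four_loops(inputs, kernels, biases):
    """Direct convolution written to mirror the four loop levels.

    inputs:  inputs[c_in][y][x]             input feature maps
    kernels: kernels[c_out][c_in][ky][kx]   one kernel per (output, input) map pair
    biases:  biases[c_out]                  one bias per output feature map
    Assumes stride 1 and no padding (illustrative simplifications).
    """
    n_in = len(inputs)
    height, width = len(inputs[0]), len(inputs[0][0])
    k_h, k_w = len(kernels[0][0]), len(kernels[0][0][0])
    out_h, out_w = height - k_h + 1, width - k_w + 1

    outputs = []
    for c_out in range(len(kernels)):          # loop level 4: output feature maps
        out_map = [[0.0] * out_w for _ in range(out_h)]
        for y in range(out_h):                 # loop level 3: slide the kernel window
            for x in range(out_w):
                acc = 0.0
                for c_in in range(n_in):       # loop level 2: accumulate over input maps
                    for ky in range(k_h):      # loop level 1: MACs within one window
                        for kx in range(k_w):
                            acc += inputs[c_in][y + ky][x + kx] * kernels[c_out][c_in][ky][kx]
                # Levels 1 and 2 are done: add the bias to get the output element.
                out_map[y][x] = acc + biases[c_out]
        outputs.append(out_map)
    return outputs


# One 3x3 input map, two output maps with 2x2 kernels.
example_inputs = [[[1, 2, 3], [4, 5, 6], [7, 8, 9]]]
example_kernels = [
    [[[1, 1], [1, 1]]],    # sums each 2x2 window
    [[[1, 0], [0, -1]]],   # top-left minus bottom-right of each window
]
example_biases = [0.0, 1.0]
example_outputs = conv2d_four_loops(example_inputs, example_kernels, example_biases)
```

A hardware accelerator would unroll, tile, or reorder these loops to expose parallelism and reduce memory traffic; the nesting order here is chosen only for readability.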

FPGAs have attracted growing interest and become increasingly popular, especially for accelerating inference tasks. This is because FPGAs (1) are highly reconfigurable, (2) offer faster development times than application specific integrated circuits (ASICs), which helps in keeping up with the rapid evolution of CNNs, (3) deliver good performance, and (4) are more energy efficient than GPUs. The high performance and efficiency of an FPGA can be realized by synthesizing a circuit customized for a specific computation, so that billions of operations are processed directly with a customized memory system. For example, the hundreds to thousands of digital signal processing (DSP) blocks on modern FPGAs support the core convolution operations, e.g., multiply-and-accumulate operations, with a high degree of parallelism. Dedicated data buffers between the external on-chip memory and the on-chip processing engines (PEs) can be designed to realize the preferred data flow by configuring tens of megabytes of on-chip block random access memory (BRAM) on the FPGA chip.

An efficient data flow and hardware architecture for CNN acceleration are desired in order to minimize data communication while maximizing resource utilization to achieve high performance. An opportunity therefore arises to design methodologies and frameworks that accelerate the inference process of various CNN algorithms on acceleration hardware with high performance, high efficiency, and high flexibility.

In the drawings, like reference characters generally refer to like parts throughout the different views. The drawings are not necessarily to scale; emphasis is instead placed on illustrating the principles of the disclosed technology. In the following description, various implementations of the disclosed technology are described with reference to the following drawings, listed in order:
A cross-sectional view of a biosensor that can be used in various embodiments.
One implementation of a flow cell that contains clusters in its tiles.
An example flow cell with eight lanes, together with a zoom-in on one tile, its clusters, and their surrounding background.
A simplified block diagram of a system for analysis of sensor data from a sequencing system, such as base call sensor outputs.
A simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor.
A simplified diagram of a configuration of a configurable processor, such as the configurable processor of FIG. 4.
A system that employs two or more base callers for base calling operations on raw images output by a biosensor.
A diagram of a neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.
A simplified illustration of an organization of tiles of sensor data used by a neural network architecture such as that of FIG. 7.
A simplified illustration of a patch of tiles of sensor data used by a neural network architecture such as that of FIG. 7.
Part of a configuration of a neural network such as that of FIG. 7 on a configurable or reconfigurable array, such as a field programmable gate array (FPGA).
A diagram of another, alternative neural network architecture that can be executed using a configurable or reconfigurable array configured as described herein.
One implementation of a specialized architecture of a neural network-based base caller that is used to segregate the processing of data for different sequencing cycles.
One implementation of segregated layers, each of which can include convolutions.
One implementation of combinatory layers, each of which can include convolutions.
Another implementation of combinatory layers, each of which can include convolutions.
A base calling system comprising a plurality of base callers for predicting base calls of an unknown sample that includes a base sequence.
Each of the next five figures is a corresponding flowchart depicting various operations of the base calling system of FIG. 14 for a corresponding set of sensor data.
A context information generation module of the base calling system of FIG. 14, generating context information for an example set of sensor data.
A flow cell comprising tiles that are categorized based on the spatial locations of the tiles.
A tile of a flow cell comprising clusters that are categorized based on the spatial locations of the clusters.
An example of fading, in which signal intensity decreases as a function of cycle number over a sequencing run of a base calling operation.
A conceptual illustration of the signal-to-noise ratio decreasing as the cycles of sequencing progress.
Base calling accuracy (1 - base calling error rate) over homopolymers (e.g., GGGGG) and near-homopolymers (e.g., GGTGG) for different example configurations of base callers.
Generation of a final base call for a set of sensor data as a function of first base call classification information from a first base caller and second base call classification information from a second base caller of the base calling system of FIG. 14.
A look-up table (LUT) showing an example weighting scheme used for the final confidence score, based on temporal context information.
A LUT indicating the base caller to be used when the bases being called include a special base sequence.
A LUT indicating the weights given to the confidence scores of the individual base callers when the bases being called include a special base sequence.
A LUT showing the operation of the base call combining module of FIG. 14 in view of the detection of one or more bubbles in a cluster of a flow cell.
A LUT showing the operation of the base call combining module of FIG. 14 in view of the detection of one or more out-of-focus images from a cluster of a flow cell.
A LUT showing example weights given to the confidence scores of the individual base callers based on the group of reagents used.
A LUT showing the operation of the base call combining module of FIG. 14 in view of the spatial classification of tiles.
A LUT showing the operation of the base call combining module of FIG. 14 in view of the spatial classification of clusters.
A LUT showing the operation of the base call combining module of FIG. 14 when (i) a special base sequence is detected and (ii) a first called base from the first base caller does not match a second called base from the second base caller.
A LUT showing the operation of the base call combining module of FIG. 14 when (i) a bubble is detected in a cluster and (ii) a first called base from the first base caller does not match a second called base from the second base caller.
A LUT showing the operation of the base call combining module of FIG. 14 when (i) one or more out-of-focus images are detected from at least one cluster and (ii) a first called base from the first base caller does not match a second called base from the second base caller.
A LUT showing the operation of the base call combining module of FIG. 14 when (i) the sensor data is from an edge cluster and (ii) a first called base from the first base caller does not match a second called base from the second base caller.
A base calling system comprising a plurality of base callers for predicting base calls of an unknown sample that includes a base sequence, in which a neural network-based final base call determination module determines the final base calls based on the outputs of one or more of the plurality of base callers.
A block diagram of a base calling system in accordance with one implementation.
A block diagram of a system controller that can be used in the system of FIG. 22.
A simplified block diagram of a computer system that can be used to implement the disclosed technology.

As used herein, the terms "polynucleotide" and "nucleic acid" refer to deoxyribonucleic acid (DNA); where appropriate, however, the skilled artisan will recognize that the systems and devices herein can also be used with ribonucleic acid (RNA). The terms should be understood to include, as equivalents, analogs of either DNA or RNA made from nucleotide analogs. As used herein, the terms also encompass cDNA, that is, complementary or copy DNA produced from an RNA template, for example by the action of reverse transcriptase.

The single-stranded polynucleotide molecules sequenced by the systems and devices herein may have originated in single-stranded form, as DNA or RNA, or in double-stranded DNA (dsDNA) form (e.g., genomic DNA fragments, PCR and amplification products, and the like). Thus, a single-stranded polynucleotide may be the sense or antisense strand of a polynucleotide duplex. Methods for preparing single-stranded polynucleotide molecules suitable for use in the methods of the present disclosure using standard techniques are well known in the art. The precise sequence of the primary polynucleotide molecules is generally not material to the present disclosure and may be known or unknown. The single-stranded polynucleotide molecules can represent genomic DNA molecules (e.g., human genomic DNA), including both intron and exon sequences (coding sequences), as well as non-coding regulatory sequences such as promoter and enhancer sequences.

In certain embodiments, the nucleic acid to be sequenced through use of the present disclosure is immobilized on a substrate (e.g., a substrate within a flow cell, or one or more beads on a substrate such as a flow cell). The term "immobilized" as used herein is intended to encompass direct or indirect, covalent or non-covalent attachment, unless indicated otherwise either explicitly or by context. In certain embodiments, covalent attachment may be preferred, but generally all that is required is that the molecules (e.g., nucleic acids) remain immobilized on or attached to the support under the conditions in which the support is intended to be used, for example in applications requiring nucleic acid sequencing.

The term "solid support" (or "substrate" in certain usages), as used herein, refers to any inert substrate or matrix to which nucleic acids can be attached, such as glass surfaces, plastic surfaces, latex, dextran, polystyrene surfaces, polypropylene surfaces, polyacrylamide gels, gold surfaces, and silicon wafers. In many embodiments, the solid support is a glass surface (e.g., the planar surface of a flow cell channel). In certain embodiments, the solid support may comprise an inert substrate or matrix that has been "functionalized," for example by applying a layer or coating of an intermediate material comprising reactive groups that permit covalent attachment to molecules such as polynucleotides. As a non-limiting example, such a support can include a polyacrylamide hydrogel supported on an inert substrate such as glass. In such embodiments, the molecules (polynucleotides) can be directly covalently attached to the intermediate material (e.g., the hydrogel), while the intermediate material itself can be non-covalently attached to the substrate or matrix (e.g., the glass substrate). Covalent attachment to a solid support is to be interpreted accordingly as encompassing this type of arrangement.

As indicated above, the present disclosure comprises novel systems and devices for sequencing nucleic acids. As will be apparent to those of skill in the art, references herein to a particular nucleic acid sequence may, depending on the context, also refer to nucleic acid molecules that comprise that nucleic acid sequence. Sequencing of a target fragment means that a read of the chronological order of bases is established. The bases that are read do not need to be contiguous, although this is preferred, nor does every base on the entire fragment have to be sequenced during the sequencing. Sequencing can be carried out using any suitable sequencing technique in which nucleotides or oligonucleotides are added successively to a free 3' hydroxyl group, resulting in synthesis of a polynucleotide chain in the 5' to 3' direction. The nature of the nucleotide added is preferably determined after each nucleotide addition. Sequencing techniques that use sequencing by ligation, in which not every contiguous base is sequenced, and techniques such as massively parallel signature sequencing (MPSS), in which bases are removed from, rather than added to, the strands on the surface, are also suitable for use with the systems and devices of the present disclosure.

In certain embodiments, the present disclosure discloses sequencing by synthesis (SBS). In SBS, four fluorescently labeled modified nucleotides are used to sequence dense clusters of amplified DNA (possibly millions of clusters) present on the surface of a substrate (e.g., a flow cell). Various additional aspects of SBS procedures and methods that can be used with the systems and devices herein are disclosed, for example, in WO 04018497, WO 04018493, and U.S. Pat. No. 7,057,026 (nucleotides); WO 05024010 and WO 06120433 (polymerases); WO 05065814 (surface attachment techniques); and WO 9844151, WO 06064199, and WO 07010251, the contents of each of which are incorporated herein by reference in their entirety.

In particular uses of the systems and devices herein, a flow cell containing the nucleic acid samples for sequencing is placed within an appropriate flow cell holder. The samples for sequencing can take the form of single molecules, amplified single molecules in the form of clusters, or beads comprising molecules of nucleic acid. The nucleic acids are prepared such that they comprise an oligonucleotide primer adjacent to an unknown target sequence. To initiate the first SBS sequencing cycle, one or more differently labeled nucleotides, DNA polymerase, and the like are flowed into and through the flow cell by a fluid flow subsystem (various embodiments of which are described herein). Either a single nucleotide can be added at a time, or the nucleotides used in the sequencing procedure can be specially designed to possess a reversible termination property, which allows each cycle of the sequencing reaction to occur simultaneously in the presence of all four labeled nucleotides (A, C, T, G). When the four nucleotides are mixed together, the polymerase is able to select the correct base to incorporate, and each sequence is extended by a single base. In such methods of using the systems, the natural competition between all four alternatives leads to higher fidelity than when only one nucleotide is present in the reaction mixture (in which case most of the sequences are not exposed to the correct nucleotide). Sequences in which a particular base is repeated one after another (e.g., homopolymers) are addressed like any other sequence, with high fidelity.
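The one-base-per-cycle extension described above can be illustrated with a deliberately idealized Python model. It assumes error-free incorporation, a template written in the 3' to 5' direction, and perfect detection of each label; the function name `sequence_by_synthesis` and the `COMPLEMENT` table are assumptions introduced for this sketch and do not model real chemistry, imaging, or base calling.

```python
# Idealized, hypothetical model of reversible-terminator SBS: one base per cycle.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def sequence_by_synthesis(template_3_to_5, n_cycles):
    """Return the bases read over n_cycles for one template strand.

    template_3_to_5: template bases in the order the polymerase encounters them.
    n_cycles: number of sequencing cycles to run (capped at the template length).
    """
    read = []
    for cycle in range(min(n_cycles, len(template_3_to_5))):
        template_base = template_3_to_5[cycle]
        # All four labeled nucleotides are present; the polymerase incorporates
        # only the one complementary to the current template base.
        incorporated = COMPLEMENT[template_base]
        # The reversible terminator stops extension after this single base.
        # Its label is detected here, then the block and label are removed
        # before the next cycle begins.
        read.append(incorporated)
    return "".join(read)


# A template containing a homopolymer run (GGGGG) is read one base per cycle.
short_read = sequence_by_synthesis("TACGGGGGA", n_cycles=6)
full_read = sequence_by_synthesis("TACG", n_cycles=10)
```

Because extension halts after each incorporation, a homopolymer run in the template is read as separate cycles rather than as one merged signal, which is why such runs are handled like any other sequence in this model.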

The fluid flow subsystem also flows the appropriate reagents to remove the blocked 3' terminus (if appropriate) and the fluorophore from each incorporated base. The substrate can then be exposed either to a second round of the four blocked nucleotides or, optionally, to a second round with a different individual nucleotide. Such cycles are then repeated, and the sequence of each cluster is read over the multiple chemistry cycles. The computer aspects of the present disclosure can optionally align the sequence data gathered from each single molecule, cluster, or bead to determine the sequence of longer polymers and the like. Alternatively, the image processing and alignment can be performed on a separate computer.

The heating/cooling components of the system regulate the reaction conditions within the flow cell channels and the reagent storage areas/containers (and optionally the camera, optics, and/or other components), while the fluid flow components allow the substrate surface to be exposed to suitable reagents for incorporation (e.g., the appropriate fluorescently labeled nucleotides to be incorporated) while unincorporated reagents are rinsed away. An optional movable stage on which the flow cell is placed allows the flow cell to be brought into proper orientation for laser (or other light) excitation of the substrate and, optionally, to be moved relative to the objective lens so that different areas of the substrate can be read. Other components of the system are also optionally movable or adjustable (e.g., the camera, the objective lens, the heater/cooler, and the like). During laser excitation, the image and location of the fluorescence emitted from the nucleic acids on the substrate are captured by the camera component, so that the computer component records the identity of the first base for each single molecule, cluster, or bead.

The embodiments described herein may be used in various biological or chemical processes and systems for academic or commercial analysis. More specifically, the embodiments described herein may be used in various processes and systems where it is desirable to detect an event, property, quality, or characteristic that is indicative of a desired reaction. For example, the embodiments described herein include cartridges, biosensors, and their components, as well as bioassay systems that operate with the cartridges and biosensors. In particular embodiments, the cartridges and biosensors include a flow cell and one or more sensors, pixels, light detectors, or photodiodes that are coupled together in a substantially unitary structure.

The following detailed description of certain embodiments will be better understood when read in conjunction with the appended drawings. To the extent that the figures illustrate diagrams of the functional blocks of various embodiments, the functional blocks are not necessarily indicative of the division between hardware circuitry. Thus, for example, one or more of the functional blocks (e.g., processors or memories) may be implemented in a single piece of hardware (e.g., a general-purpose signal processor or random access memory, a hard disk, or the like). Similarly, the programs may be stand-alone programs, may be incorporated as subroutines in an operating system, may be functions in an installed software package, and the like. It should be understood that the various embodiments are not limited to the arrangements and instrumentality shown in the drawings.

As used herein, an element or step recited in the singular and preceded by the word "a" or "an" should be understood as not excluding a plurality of said elements or steps, unless such exclusion is explicitly stated. Furthermore, references to "one embodiment" are not intended to be interpreted as excluding the existence of additional embodiments that also incorporate the recited features. Moreover, unless explicitly stated to the contrary, embodiments "comprising," "having," or "including" an element or a plurality of elements having a particular property may include additional elements, whether or not those additional elements have that property.

As used herein, a "desired reaction" includes a change in at least one of a chemical, electrical, physical, or optical property (or quality) of an analyte of interest. In particular embodiments, the desired reaction is a positive binding event (e.g., incorporation of a fluorescently labeled biomolecule into the analyte of interest). More generally, the desired reaction may be a chemical transformation, a chemical change, or a chemical interaction. The desired reaction may also be a change in electrical properties. For example, the desired reaction may be a change in ion concentration within a solution. Exemplary reactions include, but are not limited to, chemical reactions such as reduction, oxidation, addition, elimination, rearrangement, esterification, amidation, etherification, cyclization, or substitution; binding interactions in which a first chemical binds to a second chemical; dissociation reactions in which two or more chemicals detach from each other; fluorescence; luminescence; bioluminescence; chemiluminescence; and biological reactions, such as nucleic acid replication, nucleic acid amplification, nucleic acid hybridization, nucleic acid ligation, phosphorylation, enzymatic catalysis, receptor binding, or ligand binding. The desired reaction may also be the addition or removal of a proton, detectable, for example, as a change in the pH of a surrounding solution or environment. A further desired reaction may be the flow of ions across a membrane (e.g., a natural or synthetic bilayer membrane); for example, as ions flow through the membrane, the current is disrupted, and the disruption can be detected.

In particular embodiments, the desired reaction includes the incorporation of a fluorescently labeled molecule into an analyte. The analyte may be an oligonucleotide, and the fluorescently labeled molecule may be a nucleotide. The desired reaction may be detected when an excitation light is directed toward the oligonucleotide having the labeled nucleotide and the fluorophore emits a detectable fluorescent signal. In alternative embodiments, the detected fluorescence is a result of chemiluminescence or bioluminescence. The desired reaction may also be an increase in fluorescence (or Förster) resonance energy transfer (FRET), for example, by bringing a donor fluorophore into proximity with an acceptor fluorophore; a decrease in FRET by separating the donor and acceptor fluorophores; an increase in fluorescence by separating a quencher from a fluorophore; or a decrease in fluorescence by co-locating a quencher and a fluorophore.

As used herein, a "reaction component" or "reactant" includes any substance that may be used to obtain a desired reaction. For example, reaction components include reagents, enzymes, samples, other biomolecules, and buffer solutions. The reaction components are typically delivered to a reaction site in a solution and/or immobilized at a reaction site. The reaction components may interact directly or indirectly with another substance, such as the analyte of interest.

As used herein, the term "reaction site" is a localized region where a desired reaction may occur. A reaction site may include support surfaces of a substrate on which a substance may be immobilized. For example, a reaction site may include a substantially planar surface in a channel of a flow cell that has a colony of nucleic acids thereon. Typically, but not always, the nucleic acids in the colony have the same sequence, being, for example, clonal copies of a single-stranded or double-stranded template. However, in some embodiments a reaction site may contain only a single nucleic acid molecule, for example, in a single-stranded or double-stranded form. Furthermore, a plurality of reaction sites may be unevenly distributed along the support surface or arranged in a predetermined manner (e.g., side by side in a matrix, such as in a microarray). A reaction site can also include a reaction chamber (or well) that at least partially defines a spatial region or volume configured to compartmentalize the desired reaction.

This application uses the terms "reaction chamber" and "well" interchangeably. As used herein, the term "reaction chamber" or "well" includes a spatial region that is in fluid communication with a flow channel. The reaction chamber may be at least partially separated from the surrounding environment or from other spatial regions. For example, a plurality of reaction chambers may be separated from each other by shared walls. As a more specific example, the reaction chamber may include a cavity defined by interior surfaces of a well and have an opening or aperture so that the cavity is in fluid communication with a flow channel. Biosensors including such reaction chambers are described in greater detail in International Application No. PCT/US2011/057111, filed on October 20, 2011, which is incorporated herein by reference in its entirety.

In some embodiments, the reaction chambers are sized and shaped relative to solids (including semi-solids) so that the solids may be inserted, fully or partially, therein. For example, a reaction chamber may be sized and shaped to accommodate only one capture bead. The capture bead may have clonally amplified DNA or other substances thereon. Alternatively, the reaction chamber may be sized and shaped to receive an approximate number of beads or solid substrates. As another example, the reaction chambers may also be filled with a porous gel or substance that is configured to control diffusion or to filter fluids that may flow into the reaction chamber.

In some embodiments, sensors (e.g., light detectors, photodiodes) are associated with corresponding pixel areas of a sample surface of a biosensor. A pixel area is thus a geometrical construct that represents the area on the biosensor's sample surface belonging to one sensor (or pixel). A sensor that is associated with a pixel area detects light emissions gathered from that pixel area when a desired reaction has occurred at a reaction site or reaction chamber overlying it. In a flat-surface embodiment, the pixel areas can overlap. In some cases, a plurality of sensors may be associated with a single reaction site or a single reaction chamber. In other cases, a single sensor may be associated with a group of reaction sites or a group of reaction chambers.

As used herein, a "biosensor" includes a structure having a plurality of reaction sites and/or reaction chambers (or wells). A biosensor may include a solid-state imaging device (e.g., a CCD or CMOS imager) and, optionally, a flow cell mounted thereto. The flow cell may include at least one flow channel that is in fluid communication with the reaction sites and/or the reaction chambers. As one specific example, the biosensor is configured to couple fluidically and electrically to a bioassay system. The bioassay system may deliver reactants to the reaction sites and/or the reaction chambers according to a predetermined protocol (e.g., sequencing-by-synthesis) and perform a plurality of imaging events. For example, the bioassay system may direct solutions to flow along the reaction sites and/or the reaction chambers. At least one of the solutions may include four types of nucleotides having the same or different fluorescent labels. The nucleotides may bind to corresponding oligonucleotides located at the reaction sites and/or the reaction chambers. The bioassay system may then illuminate the reaction sites and/or the reaction chambers using an excitation light source (e.g., a solid-state light source, such as a light-emitting diode (LED)). The excitation light may have a predetermined wavelength or wavelengths, including a range of wavelengths. The excited fluorescent labels provide emission signals that may be captured by the sensors.

In alternative embodiments, the biosensor may include electrodes or other types of sensors configured to detect other identifiable properties. For example, the sensors may be configured to detect a change in ion concentration. In another example, the sensors may be configured to detect the flow of ion current across a membrane.

As used herein, a "cluster" is a colony of similar or identical molecules, nucleotide sequences, or DNA strands. For example, a cluster can be an amplified oligonucleotide or any other group of polynucleotides or polypeptides with the same or a similar sequence. In other embodiments, a cluster can be any element or group of elements that occupies a physical area on a sample surface. In embodiments, clusters are immobilized to a reaction site and/or a reaction chamber during a base calling cycle.

As used herein, the term "immobilized," when used with respect to a biomolecule or a biological or chemical substance, includes substantially attaching the biomolecule or biological or chemical substance to a surface at a molecular level. For example, a biomolecule or biological or chemical substance may be immobilized to a surface of a substrate material using adsorption techniques, including non-covalent interactions (e.g., electrostatic forces, van der Waals forces, and dehydration of hydrophobic interfaces), and covalent binding techniques in which functional groups or linkers facilitate attaching the biomolecules to the surface. Immobilizing biomolecules or biological or chemical substances to a surface of a substrate material may depend on the properties of the substrate surface, the liquid medium carrying the biomolecule or biological or chemical substance, and the properties of the biomolecules or biological or chemical substances themselves. In some cases, a substrate surface may be functionalized (e.g., chemically or physically modified) to facilitate immobilizing the biomolecules (or biological or chemical substances) to the substrate surface. The substrate surface may first be modified to have functional groups bound to the surface. The functional groups may then bind to the biomolecules or biological or chemical substances to immobilize them thereon. A substance can be immobilized to a surface via a gel, for example, as described in U.S. Patent Application Publication No. 2011/0059865 (A1), which is incorporated herein by reference.

In some embodiments, nucleic acids can be attached to a surface and amplified using bridge amplification. Useful bridge amplification methods are described, for example, in U.S. Pat. No. 5,641,658; WO 2007/010251; U.S. Pat. No. 6,090,592; U.S. Patent Application Publication No. 2002/0055100 (A1); U.S. Pat. No. 7,115,400; U.S. Patent Application Publication No. 2004/0096853 (A1); U.S. Patent Application Publication No. 2004/0002090 (A1); U.S. Patent Application Publication No. 2007/0128624 (A1); and U.S. Patent Application Publication No. 2008/0009420 (A1), each of which is incorporated herein in its entirety. Another useful method for amplifying nucleic acids on a surface is rolling circle amplification (RCA), for example, using methods set forth in further detail below. In some embodiments, the nucleic acids can be attached to a surface and amplified using one or more primer pairs. For example, one of the primers can be in solution and the other primer can be immobilized on the surface (e.g., 5'-attached). By way of example, a nucleic acid molecule can hybridize to one of the primers on the surface, followed by extension of the immobilized primer to produce a first copy of the nucleic acid. The primer in solution then hybridizes to the first copy of the nucleic acid and can be extended using the first copy as a template. Optionally, after the first copy of the nucleic acid is produced, the original nucleic acid molecule can hybridize to a second immobilized primer on the surface and can be extended at the same time as, or after, the primer in solution is extended. In any embodiment, repeated rounds of extension (e.g., amplification) using the immobilized primer and the primer in solution provide multiple copies of the nucleic acid.

In particular embodiments, the assay protocols executed by the systems and methods described herein include the use of natural nucleotides and of enzymes that are configured to interact with the natural nucleotides. Natural nucleotides include, for example, ribonucleotides (RNA) or deoxyribonucleotides (DNA). Natural nucleotides can be in the mono-, di-, or tri-phosphate form and can have a base selected from adenine (A), thymine (T), uracil (U), guanine (G), or cytosine (C). It will be understood, however, that non-natural nucleotides, modified nucleotides, or analogs of the aforementioned nucleotides can be used. Some examples of useful non-natural nucleotides are set forth below in regard to reversible terminator-based sequencing-by-synthesis methods.

In embodiments that include reaction chambers, items or solid substances (including semi-solid substances) may be disposed within the reaction chambers. When disposed, the item or solid may be physically held or immobilized within the reaction chamber through an interference fit, adhesion, or entrapment. Exemplary items or solids that may be disposed within the reaction chambers include polymer beads, pellets, agarose gel, powders, quantum dots, or other solids that may be compressed and/or held within the reaction chamber. In particular embodiments, a nucleic acid superstructure, such as a DNA ball, can be disposed in or at a reaction chamber, for example, by attachment to an interior surface of the reaction chamber or by residence in a liquid within the reaction chamber. A DNA ball or other nucleic acid superstructure can be preformed and then disposed in or at the reaction chamber. Alternatively, a DNA ball can be synthesized in situ at the reaction chamber. A DNA ball can be synthesized by rolling circle amplification to produce a concatemer of a particular nucleic acid sequence, and the concatemer can be treated under conditions that form a relatively compact ball. DNA balls and methods for their synthesis are described, for example, in U.S. Patent Application Publication No. 2008/0242560 (A1) or No. 2008/0234136 (A1), each of which is incorporated herein in its entirety. A substance that is held or disposed in a reaction chamber can be in a solid, liquid, or gaseous state.

As used herein, a "base call" identifies a nucleotide base in a nucleic acid sequence. Base calling refers to the process of determining a base call (A, C, G, T) for every cluster at a specific cycle. As an example, base calling can be performed utilizing the four-channel, two-channel, or one-channel methods and systems described in the incorporated materials of U.S. Patent Application Publication No. 2013/0079232. In particular embodiments, a base calling cycle is referred to as a "sampling event." In one-dye and two-channel sequencing protocols, a sampling event comprises two illumination stages in time sequence, such that a pixel signal is generated at each stage. The first illumination stage induces illumination from a given cluster indicating nucleotide bases A and T in an AT pixel signal, and the second illumination stage induces illumination from a given cluster indicating nucleotide bases C and T in a CT pixel signal.
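The two-stage sampling event described above can be illustrated with a minimal sketch. It assumes each stage yields one normalized intensity per cluster and that a fixed threshold separates "on" from "off"; the function names, the threshold value, and the mapping of "dark in both stages" to G are illustrative assumptions, not details taken from the disclosure.

```python
from typing import Dict, Iterable, Tuple

# (AT stage on?, CT stage on?) -> called base.
# A and T light up in the AT stage; C and T light up in the CT stage.
# Assumption: a cluster that is dark in both stages is called G.
TWO_STAGE_DECODE: Dict[Tuple[bool, bool], str] = {
    (True, False): "A",
    (False, True): "C",
    (True, True): "T",
    (False, False): "G",
}


def call_base(at_signal: float, ct_signal: float, threshold: float = 0.5) -> str:
    """Call one base from the AT and CT pixel signals of one sampling event.

    The threshold is a hypothetical cutoff on normalized intensities.
    """
    return TWO_STAGE_DECODE[(at_signal > threshold, ct_signal > threshold)]


def call_read(events: Iterable[Tuple[float, float]], threshold: float = 0.5) -> str:
    """Call one base per sampling event and join the calls into a read."""
    return "".join(call_base(at, ct, threshold) for at, ct in events)


if __name__ == "__main__":
    # Four sampling events (cycles) for a single cluster.
    print(call_read([(0.9, 0.1), (0.1, 0.8), (0.9, 0.9), (0.0, 0.1)]))
```

A real base caller would typically work with noisy intensities and produce per-base quality scores rather than a hard threshold decision; the lookup table is only meant to make the AT/CT encoding concrete.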

The technology disclosed, e.g., the disclosed base callers, can be implemented on processors such as central processing units (CPUs), graphics processing units (GPUs), field-programmable gate arrays (FPGAs), coarse-grained reconfigurable architectures (CGRAs), application-specific integrated circuits (ASICs), application-specific instruction-set processors (ASIPs), and digital signal processors (DSPs).

Biosensor
FIG. 1 illustrates a cross-section of a biosensor 100 that can be used in various embodiments. The biosensor 100 has pixel areas 106', 108', 110', 112', and 114' that can each hold more than one cluster during a base calling cycle (e.g., two clusters per pixel area). As shown, the biosensor 100 may include a flow cell 102 that is mounted onto a sampling device 104. In the illustrated embodiment, the flow cell 102 is affixed directly to the sampling device 104. However, in alternative embodiments, the flow cell 102 may be removably coupled to the sampling device 104. The sampling device 104 has a sample surface 134 that may be functionalized (e.g., chemically or physically modified in a manner suitable for conducting the desired reactions). For example, the sample surface 134 may be functionalized and may include a plurality of pixel areas 106', 108', 110', 112', and 114' that can each hold more than one cluster during a base calling cycle (e.g., each having a corresponding cluster pair 106A, 106B; 108A, 108B; 110A, 110B; 112A, 112B; or 114A, 114B immobilized thereto). Each pixel area is associated with a corresponding sensor (or pixel or photodiode) 106, 108, 110, 112, or 114, such that light received by the pixel area is captured by the corresponding sensor. A pixel area 106' can also be associated with a corresponding reaction site 106'' on the sample surface 134 that holds a cluster pair, such that light emitted from the reaction site 106'' is received by the pixel area 106' and captured by the corresponding sensor 106. As a result of this sensing structure, when two or more clusters are present in the pixel area of a particular sensor during a base calling cycle (e.g., each pixel area having a corresponding cluster pair), the pixel signal in that base calling cycle carries information based on all of the two or more clusters. Consequently, the signal processing described herein is used to distinguish the individual clusters when there are more clusters than pixel signals in a given sampling event of a particular base calling cycle.
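To make the idea of one pixel signal carrying information from two clusters concrete, the following is a purely hypothetical sketch, not the signal processing of the disclosure. It assumes the pixel signal is the sum of the emissions of two clusters with unequal, known intensities (`BRIGHT` and `DIM` are invented values), so that each on/off combination maps to a distinct signal level.

```python
from itertools import product
from typing import Tuple

# Hypothetical per-cluster emission intensities (arbitrary units).
# Unequal values are assumed so that every on/off combination is distinct.
BRIGHT = 2.0
DIM = 1.0


def pixel_signal(bright_on: bool, dim_on: bool) -> float:
    """Model the pixel signal as the sum of the two clusters' emissions."""
    return BRIGHT * bright_on + DIM * dim_on


def decode_states(signal: float) -> Tuple[bool, bool]:
    """Return the (bright_on, dim_on) combination whose ideal level is nearest."""
    candidates = product([False, True], repeat=2)
    return min(candidates, key=lambda state: abs(pixel_signal(*state) - signal))


if __name__ == "__main__":
    for measured in (0.1, 0.9, 2.2, 3.1):
        print(measured, decode_states(measured))
```

Under these assumptions the four ideal levels are 0, 1, 2, and 3, so a single measurement identifies the state of both clusters; with equal intensities the levels for "only bright on" and "only dim on" would coincide and the clusters could not be separated this way.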

In the illustrated embodiment, the flow cell 102 includes sidewalls 138, 125 and a flow cover 136 that is supported by the sidewalls 138, 125. The sidewalls 138, 125 are coupled to the sample surface 134 and extend between the flow cover 136 and the sidewalls 138, 125. In some embodiments, the sidewalls 138, 125 are formed from a curable adhesive layer that bonds the flow cover 136 to the sampling device 104.

The sidewalls 138, 125 are sized and shaped so that a flow channel 144 exists between the flow cover 136 and the sampling device 104. The flow cover 136 may include a material that is transparent to excitation light 101 propagating from outside the biosensor 100 into the flow channel 144. In one example, the excitation light 101 approaches the flow cover 136 at a non-orthogonal angle.

Also as shown, the flow cover 136 may include inlet and outlet ports 142, 146 that are configured to fluidically engage other ports (not shown). For example, the other ports may belong to a cartridge or a workstation. The flow channel 144 is sized and shaped to direct a fluid along the sample surface 134. The height Hi and other dimensions of the flow channel 144 may be configured to maintain a substantially even flow of fluid along the sample surface 134. The dimensions of the flow channel 144 may also be configured to control bubble formation.

By way of example, the flow cover 136 (or the flow cell 102) may comprise a transparent material, such as glass or plastic. The flow cover 136 may constitute a substantially rectangular block having a planar exterior surface and a planar inner surface that defines the flow channel 144. The block may be mounted onto the sidewalls 138, 125. Alternatively, the flow cell 102 may be etched to define the flow cover 136 and the sidewalls 138, 125. For example, a recess may be etched into the transparent material. When the etched material is mounted to the sampling device 104, the recess may become the flow channel 144.

The sampling device 104 may be similar to, for example, an integrated circuit comprising a plurality of stacked substrate layers 120-126. The substrate layers 120-126 may include a base substrate 120, a solid-state imager 122 (e.g., a CMOS image sensor), a filter or light-management layer 124, and a passivation layer 126. It should be noted that the above is only illustrative and that other embodiments may include fewer or additional layers. Moreover, each of the substrate layers 120-126 may include a plurality of sub-layers. The sampling device 104 may be manufactured using processes similar to those used in manufacturing integrated circuits, such as CMOS image sensors and CCDs. For example, the substrate layers 120-126 or portions thereof may be grown, deposited, etched, and the like to form the sampling device 104.

The passivation layer 126 is configured to shield the filter layer 124 from the fluidic environment of the flow channel 144. In some cases, the passivation layer 126 is also configured to provide a solid surface (i.e., the sample surface 134) on which biomolecules or other analytes of interest can be immobilized. For example, each of the reaction sites may include a cluster of biomolecules immobilized on the sample surface 134. Thus, the passivation layer 126 may be formed from a material that permits the reaction sites to be immobilized on it. The passivation layer 126 may also comprise a material that is at least transparent to the desired fluorescent light. By way of example, the passivation layer 126 may include silicon nitride (Si₃N₄) and/or silica (SiO₂). However, other suitable materials may be used. In the illustrated embodiment, the passivation layer 126 may be substantially planar. In alternative embodiments, however, the passivation layer 126 may include recesses such as pits, wells, or grooves. In the illustrated embodiment, the passivation layer 126 has a thickness of about 150-200 nm, more specifically about 170 nm.

フィルタ層１２４は、光の透過に影響を及ぼす様々な特徴を含み得る。いくつかの実施形態では、フィルタ層１２４は、複数の機能を実行することができる。例えば、フィルタ層１２４は、（ａ）励起光源からの光信号など、不要な光信号をフィルタリングするか、（ｂ）反応部位からの発光信号を、反応部位からの発光信号を検出するように構成された対応するセンサ１０６、１０８、１１０、１１２、及び１１４に向かって方向付けるか、又は（ｃ）隣接する反応部位からの不要な発光信号の検出を遮断若しくは防止するように構成され得る。したがって、フィルタ層１２４は光管理層とも呼ばれ得る。図示の実施形態では、フィルタ層１２４は、約１～５μｍ、より具体的には約２～４μｍの厚さを有する。代替の実施形態では、フィルタ層１２４は、マイクロレンズ又は他の光学構成要素のアレイを含み得る。マイクロレンズの各々は、関連する反応部位からの発光信号をセンサに方向付けるように構成され得る。 The filter layer 124 may include various features that affect the transmission of light. In some embodiments, the filter layer 124 may perform multiple functions. For example, the filter layer 124 may be configured to (a) filter unwanted light signals, such as light signals from an excitation light source, (b) direct luminescence signals from the reaction sites toward corresponding sensors 106, 108, 110, 112, and 114 configured to detect luminescence signals from the reaction sites, or (c) block or prevent detection of unwanted luminescence signals from adjacent reaction sites. Thus, the filter layer 124 may also be referred to as a light management layer. In the illustrated embodiment, the filter layer 124 has a thickness of about 1-5 μm, more specifically about 2-4 μm. In alternative embodiments, the filter layer 124 may include an array of microlenses or other optical components. Each of the microlenses may be configured to direct the luminescence signal from an associated reaction site to a sensor.

いくつかの実施形態では、ソリッドステートイメージャ１２２及びベース基質１２０は、以前に構成されたソリッドステート撮像デバイス（例えば、ＣＭＯＳチップ）として一緒に提供され得る。例えば、ベース基質１２０は、シリコンのウェハであってもよく、ソリッドステートイメージャ１２２は、その上に取り付けられてもよい。ソリッドステートイメージャ１２２は、半導体材料（例えば、シリコン）の層、並びにセンサ１０６、１０８、１１０、１１２、及び１１４を含む。図示の実施形態では、センサは、光を検出するように構成されたフォトダイオードである。他の実施形態では、センサは、光検出器を備える。ソリッドステートイメージャ１２２は、ＣＭＯＳベースの製造プロセスを介して単一のチップとして製造され得る。 In some embodiments, the solid-state imager 122 and the base substrate 120 may be provided together as a previously constructed solid-state imaging device (e.g., a CMOS chip). For example, the base substrate 120 may be a wafer of silicon, and the solid-state imager 122 may be mounted thereon. The solid-state imager 122 includes a layer of semiconductor material (e.g., silicon) and sensors 106, 108, 110, 112, and 114. In the illustrated embodiment, the sensors are photodiodes configured to detect light. In other embodiments, the sensors include photodetectors. The solid-state imager 122 may be fabricated as a single chip via a CMOS-based manufacturing process.

The solid-state imager 122 may include a dense array of sensors 106, 108, 110, 112, and 114 configured to detect activity indicative of a desired reaction from within or along the flow channel 144. In some embodiments, each sensor has a pixel area (or detection area) of about 1-2 square micrometers (μm²). The array may include 500,000 sensors, 5 million sensors, 10 million sensors, or even 120 million sensors. The sensors 106, 108, 110, 112, and 114 may be configured to detect a predetermined wavelength of light that is indicative of the desired reaction.

いくつかの実施形態では、サンプリングデバイス１０４は、参照によりその全体が本明細書に組み込まれる米国特許第７，５９５，８８２号に記載されているマイクロ回路配置などのマイクロ回路配置を含む。より具体的には、サンプリングデバイス１０４は、センサ１０６、１０８、１１０、１１２、及び１１４の平面アレイを有する集積回路を備え得る。サンプリングデバイス１０４内に形成された回路は、信号増幅、デジタル化、記憶、及び処理のうちの少なくとも１つのために構成され得る。回路は、検出された蛍光を収集及び分析し、検出データを信号プロセッサに通信するためのピクセル信号（又は検出信号）を発生させることができる。回路はまた、サンプリングデバイス１０４において追加のアナログ及び／又はデジタル信号処理を実行し得る。サンプリングデバイス１０４は、信号ルーティングを実行する（例えば、ピクセル信号を信号プロセッサに送信する）導電ビア１３０を含み得る。ピクセル信号はまた、サンプリングデバイス１０４の電気接点１３２を通って送信され得る。 In some embodiments, the sampling device 104 includes a microcircuit arrangement, such as the microcircuit arrangement described in U.S. Patent No. 7,595,882, which is incorporated herein by reference in its entirety. More specifically, the sampling device 104 may include an integrated circuit having a planar array of sensors 106, 108, 110, 112, and 114. The circuitry formed within the sampling device 104 may be configured for at least one of signal amplification, digitization, storage, and processing. The circuitry may collect and analyze the detected fluorescence and generate a pixel signal (or detection signal) for communicating the detection data to a signal processor. The circuitry may also perform additional analog and/or digital signal processing in the sampling device 104. The sampling device 104 may include conductive vias 130 that perform signal routing (e.g., transmitting the pixel signal to a signal processor). The pixel signal may also be transmitted through electrical contacts 132 of the sampling device 104.

サンプリングデバイス１０４は、本明細書に完全に記載されているかのように参照により組み込まれる、２０２０年５月１４日に出願された「ＳｙｓｔｅｍｓａｎｄＤｅｖｉｃｅｓｆｏｒＣｈａｒａｃｔｅｒｉｚａｔｉｏｎａｎｄＰｅｒｆｏｒｍａｎｃｅＡｎａｌｙｓｉｓｏｆＰｉｘｅｌ－ＢａｓｅｄＳｅｑｕｅｎｃｉｎｇ」と題する米国特許非仮出願第１６／８７４，５９９号（代理人整理番号ＩＬＬＭ１０１１－４／ＩＰ－１７５０－ＵＳ）に関して更に詳細に論じられている。サンプリングデバイス１０４は、上述されたような上記の構成又は使用に限定されない。代替の実施形態では、サンプリングデバイス１０４は、他の形態をとってもよい。例えば、サンプリングデバイス１０４は、フローセルに結合されているか、又は反応部位をその中に有するフローセルとインターフェース接続するように移動される、ＣＣＤカメラなどのＣＣＤデバイスを備え得る。 The sampling device 104 is discussed in further detail with respect to U.S. Non-Provisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM1011-4/IP-1750-US), which is incorporated by reference as if fully set forth herein. The sampling device 104 is not limited to the above configurations or uses as described above. In alternative embodiments, the sampling device 104 may take other forms. For example, the sampling device 104 may comprise a CCD device, such as a CCD camera, coupled to a flow cell or moved to interface with a flow cell having reaction sites therein.

図２は、そのタイル内にクラスタを含むフローセル２００の一実装形態を示す。フローセル２００は、図１のフローセル１０２に対応し、例えば、フローカバー１３６なしである。更に、フローセル２００の描写は、本質的に記号的であり、フローセル２００は、その中に様々な他の構成要素を示すことなく、その中に様々なレーン及びタイルを記号的に示している。図２は、フローセル２００の上面図を示している。 Figure 2 shows one implementation of a flow cell 200 that includes clusters within its tiles. Flow cell 200 corresponds to flow cell 102 of Figure 1, e.g., without flow cover 136. Additionally, the depiction of flow cell 200 is symbolic in nature, and flow cell 200 symbolically shows various lanes and tiles therein without showing various other components therein. Figure 2 shows a top view of flow cell 200.

一実施形態では、フローセル２００は、レーン２０２ａ、２０２ｂ、．．．、２０２Ｐ、すなわち、Ｐ個のレーンなど、複数のレーンに分けられるか又は分割される。図２の例では、フローセル２００は、８つのレーンを含むように、すなわち、この例ではＰ＝８であるように示されているが、フローセル内のレーンの数は、実装形態固有である。 In one embodiment, the flow cell 200 is divided or split into multiple lanes, such as lanes 202a, 202b, ... 202P, i.e., P lanes. In the example of FIG. 2, the flow cell 200 is shown as including eight lanes, i.e., P=8 in this example, although the number of lanes in a flow cell is implementation specific.

一実施形態では、個々のレーン２０２は、「タイル」２１２と呼ばれる非重複領域に更に分割される。例えば、図２は、例示的なレーンのセクション２０８の拡大図を示している。セクション２０８は、複数のタイル２１２を含むように示されている。 In one embodiment, each lane 202 is further divided into non-overlapping regions called "tiles" 212. For example, FIG. 2 shows an expanded view of a section 208 of an exemplary lane. Section 208 is shown to include multiple tiles 212.

一例では、各レーン２０２は、１つ以上のタイル列を含む。例えば、図２では、各レーン２０２は、拡大セクション２０８内に示されているように、２つの対応するタイル列２１２を含む。各レーン内の各タイル列内のタイルの数は、実装形態固有であり、一例では、各レーン内の各タイル列に５０個のタイル、６０個のタイル、１００個のタイル、又は別の適切な数のタイルが存在し得る。 In one example, each lane 202 includes one or more tile columns. For example, in FIG. 2, each lane 202 includes two corresponding tile columns 212, as shown in the enlarged section 208. The number of tiles in each tile column in each lane is implementation specific, and in one example, there may be 50 tiles, 60 tiles, 100 tiles, or another suitable number of tiles in each tile column in each lane.
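
The lane, tile-column, and tile hierarchy described above can be sketched as a small addressing scheme. This is only an illustrative sketch: the `TileId` type, the `enumerate_tiles` helper, and the 0-based indexing are assumptions made for the example and do not appear in the specification. The numbers used (8 lanes, 2 tile columns per lane, 50 tiles per column) are taken from the examples given for FIG. 2.

```python
from dataclasses import dataclass
from itertools import product
from typing import List


@dataclass(frozen=True)
class TileId:
    """Hypothetical address of one tile on the flow cell."""
    lane: int    # 0-based lane index (lanes 202a ... 202P)
    column: int  # 0-based tile column within the lane
    row: int     # 0-based position of the tile within its column


def enumerate_tiles(num_lanes: int, columns_per_lane: int,
                    tiles_per_column: int) -> List[TileId]:
    """Return every tile address in lane-major order."""
    return [
        TileId(lane, column, row)
        for lane, column, row in product(
            range(num_lanes), range(columns_per_lane), range(tiles_per_column)
        )
    ]


# Example layout: P = 8 lanes, 2 tile columns per lane, 50 tiles per column.
tiles = enumerate_tiles(num_lanes=8, columns_per_lane=2, tiles_per_column=50)
print(len(tiles))  # 8 * 2 * 50 = 800
```

Because the regions are non-overlapping, each address identifies exactly one tile, so per-tile data can be keyed on it directly.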

Each tile comprises a corresponding plurality of clusters. During the sequencing procedure, the clusters on a tile and their surrounding background are imaged. For example, FIG. 2 shows example clusters 216 within an example tile.

FIG. 3 shows an example Illumina GA-IIx™ flow cell with eight lanes, along with a zoomed-in view of one tile, its clusters, and their surrounding background. For example, there are 100 tiles per lane in the Illumina Genome Analyzer II and 68 tiles per lane in the Illumina HiSeq 2000. A tile 212 holds hundreds of thousands to millions of clusters. In FIG. 3, an image generated from a tile, with clusters shown as bright spots, is shown at 308 (i.e., 308 is a magnified image of the tile), and an example cluster 304 is labeled. A cluster 304 contains approximately one thousand identical copies of a template molecule, though clusters vary in size and shape. The clusters are grown from the template molecule, prior to the sequencing run, by bridge amplification of the input library. The purpose of the amplification and cluster growth is to increase the intensity of the emitted signal, because the imaging device cannot reliably sense a single fluorophore. However, the physical distance between the DNA fragments within a cluster 304 is small, so the imaging device perceives the cluster of fragments as a single spot 304.

クラスタ及びタイルは、２０２０年３月２０日に出願された「ＴｒａｉｎｉｎｇＤａｔａＧｅｎｅｒａｔｉｏｎＦｏｒＡｒｔｉｆｉｃｉａｌＩｎｔｅｌｌｉｇｅｎｃｅ－ＢａｓｅｄＳｅｑｕｅｎｃｉｎｇ」と題する米国特許非仮出願第１６／８２５，９８７号（代理人整理番号ＩＬＬＭ１００８－１６／ＩＰ－１６９３－ＵＳ）に関して更に詳細に論じられている。 Clusters and tiles are discussed in further detail in U.S. Nonprovisional Patent Application No. 16/825,987, entitled "Training Data Generation For Artificial Intelligence-Based Sequencing," filed March 20, 2020 (Attorney Docket No. ILLM1008-16/IP-1693-US).

FIG. 4 is a simplified block diagram of a system for analysis of sensor data from a sequencing system, such as base call sensor outputs (see, e.g., FIG. 1). In the example of FIG. 4, the system includes a sequencing machine 400 and a configurable processor 450. The configurable processor 450 can execute a neural network-based base caller and/or a non-neural network-based base caller (both discussed in further detail herein) in coordination with a runtime program executed by a host processor, such as a central processing unit (CPU) 402. The sequencing machine 400 comprises base call sensors and a flow cell 401 (e.g., as discussed with respect to FIGS. 1-3). As discussed with respect to FIGS. 1-3, the flow cell can comprise one or more tiles in which clusters of genetic material are exposed to a sequence of analyte flows used to cause reactions in the clusters that identify the bases in the genetic material. The sensors sense the reactions for each cycle of the sequence in each tile of the flow cell to provide tile data. Examples of this technology are described in more detail below. Genetic sequencing is a data-intensive operation, which translates base call sensor data into sequences of base calls for each cluster of genetic material sensed during a base calling operation.

The system in this example includes the CPU 402, which executes a runtime program to coordinate the base calling operations, and memory 403, which stores sequences of arrays of tile data, base call reads produced by the base calling operation, and other information used in the base calling operations. Also, in this illustration, the system includes memory 404, which stores a configuration file (or files), such as FPGA bit files, and model parameters for the neural networks; these are used to configure and reconfigure the configurable processor 450 and to execute the neural networks. The sequencing machine 400 can include a program for configuring the configurable processor and, in some embodiments, a reconfigurable processor to execute the neural networks.

The sequencing machine 400 is coupled to the configurable processor 450 by a bus 405. The bus 405 can be implemented using a high-throughput technology, such as, in one example, bus technology compatible with the PCIe (Peripheral Component Interconnect Express) standards currently maintained and developed by the PCI-SIG (PCI Special Interest Group). Also in this example, a memory 460 is coupled to the configurable processor 450 by a bus 461. The memory 460 can be on-board memory, disposed on a circuit board with the configurable processor 450. The memory 460 is used for high-speed access by the configurable processor 450 to working data used in the base calling operation. The bus 461 can also be implemented using a high-throughput technology, such as bus technology compatible with the PCIe standards.

フィールドプログラマブルゲートアレイ（ＦＰＧＡ）、粗粒化された再構成可能アレイ（Coarse Grained Reconfigurable Array、ＣＧＲＡ）、及び他の構成可能かつ再構成可能なデバイスを含む、構成可能なプロセッサは、コンピュータプログラムを実行する汎用プロセッサを使用して達成され得るよりも、より効率的に又はより高速に様々な機能を実装するように構成することができる。構成可能なプロセッサの構成は、時にはビットストリーム又はビットファイルと呼ばれる構成ファイルを生成するために機能的な説明を編集することと、構成ファイルをプロセッサ上の構成可能要素に配布することと、を含む。 Configurable processors, including field programmable gate arrays (FPGAs), coarse grained reconfigurable arrays (CGRAs), and other configurable and reconfigurable devices, can be configured to implement various functions more efficiently or faster than can be achieved using a general purpose processor running a computer program. Configuring a configurable processor involves compiling a functional description to generate a configuration file, sometimes called a bitstream or bitfile, and distributing the configuration file to the configurable elements on the processor.

The configuration file configures the circuit to set data flow patterns, the use of distributed memory and other on-chip memory resources, lookup table contents, the operations of configurable logic blocks and configurable execution units, configurable interconnects, and other elements of the configurable array. A configurable processor is reconfigurable if the configuration file can be changed in the field, by changing the loaded configuration file. For example, the configuration file may be stored in volatile SRAM elements or in non-volatile read-write memory elements, distributed among the array of configurable elements on the configurable or reconfigurable processor. A variety of commercially available configurable processors are suitable for use in a base calling operation as described herein. Examples include commercially available products such as the Xilinx Alveo™ U200, Xilinx Alveo™ U250, Xilinx Alveo™ U280, Intel/Altera Stratix™ GX2800, and Intel Stratix™ GX10M. In some examples, a host CPU can be implemented on the same integrated circuit as the configurable processor.

Embodiments described herein implement the multi-cycle neural network using the configurable processor 450. The configuration file for a configurable processor can be implemented by specifying the logic functions to be executed using a high-level description language (HDL) or a register transfer level (RTL) language specification. The specification can be compiled, using the resources designed for the selected configurable processor, to generate the configuration file. The same or a similar specification can be compiled to generate a design for an application-specific integrated circuit, which may not be a configurable processor.

Thus, alternatives for the configurable processor in all embodiments described herein include a configured processor comprising an application-specific integrated circuit (ASIC), a special-purpose integrated circuit or set of integrated circuits, or a system-on-a-chip (SoC) device, configured to execute the neural network-based base calling operations described herein.

一般に、ニューラルネットワークの動作を実行するように構成された、本明細書に記載の構成可能なプロセッサ及び構成されたプロセッサは、本明細書ではニューラルネットワークプロセッサと呼ばれる。別の例では、非ニューラルネットワークベースのベースコーラの動作を実行するように構成された、本明細書に記載の構成可能なプロセッサ及び構成されたプロセッサは、本明細書では非ニューラルネットワークプロセッサと呼ばれる。一般に、構成可能なプロセッサ及び構成されたプロセッサは、本明細書で後述するように、ニューラルネットワークベースのベースコーラ及び非ニューラルネットワークベースのベースコーラの一方又は両方を実装するために使用することができる。 In general, the configurable and configured processors described herein that are configured to perform neural network operations are referred to herein as neural network processors. In another example, the configurable and configured processors described herein that are configured to perform non-neural network based base caller operations are referred to herein as non-neural network processors. In general, the configurable and configured processors may be used to implement one or both of a neural network based base caller and a non-neural network based base caller, as described later in this specification.

The configurable processor 450 is configured in this example by a configuration file loaded using a program executed by the CPU 402, or by other sources, which configures the array of configurable elements on the configurable processor 450 to execute the base call function. In this example, the configuration includes data flow logic 451, which is coupled to the buses 405 and 461 and executes functions for distributing data and control parameters among the elements used in the base calling operation.

また、構成可能なプロセッサ４５０は、マルチサイクルニューラルネットワークを実行するためにベースコール実行論理４５２を用いて構成されている。論理４５２は、複数のマルチサイクル実行クラスタ（例えば、４５３）を含み、これは、この実施例では、マルチサイクルクラスタ１からマルチサイクルクラスタＸを含む。マルチサイクルクラスタの数は、動作の所望のスループットを伴うトレードオフ、及び構成可能なプロセッサ上の利用可能なリソースに従って選択することができる。 The configurable processor 450 is also configured with base call execution logic 452 to execute the multi-cycle neural network. The logic 452 includes a number of multi-cycle execution clusters (e.g., 453), which in this example include multi-cycle cluster 1 through multi-cycle cluster X. The number of multi-cycle clusters can be selected according to tradeoffs with the desired throughput of operation and available resources on the configurable processor.

The multi-cycle clusters are coupled to the data flow logic 451 by data flow paths 454 implemented using configurable interconnect and memory resources on the configurable processor. The multi-cycle clusters are also coupled to the data flow logic 451 by control paths 455 implemented using, for example, configurable interconnect and memory resources on the configurable processor. The control paths 455 provide control signals indicating available clusters, readiness to provide input units for execution of neural network operations to the available clusters, readiness to provide trained parameters for the neural network, readiness to provide output patches of base call classification data, and other control data used for execution of the neural network.

The configurable processor is configured to execute runs of the multi-cycle neural network using trained parameters to produce classification data for the sensing cycles of the base calling operation. A run of the neural network is executed to produce classification data for a subject sensing cycle of the base calling operation. A run of the neural network operates on a sequence including a number N of arrays of tile data from respective sensing cycles of N sensing cycles, where the N sensing cycles provide, in the examples described herein, sensor data for different base call operations for one base position per operation in time sequence. Optionally, some of the N sensing cycles can be out of sequence if needed according to the particular neural network model being executed. The number N can be any number greater than one. In some examples described herein, the N sensing cycles represent a set of sensing cycles comprising at least one sensing cycle preceding the subject sensing cycle and at least one sensing cycle following the subject cycle in time sequence. Examples are described herein in which the number N is an integer equal to or greater than five.
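
The selection of N sensing cycles around a subject cycle can be illustrated with a short sketch. It assumes a window centered on the subject cycle with an odd N, and it assumes that cycles falling outside the run are simply dropped at the boundaries; neither choice is stated in the specification, and the `cycle_window` helper is hypothetical.

```python
from typing import List


def cycle_window(subject: int, n: int, total_cycles: int) -> List[int]:
    """Return the sensing-cycle indices used to classify `subject`.

    Assumption for this sketch: the window is centered on the subject
    cycle, and indices outside the run are dropped rather than padded.
    """
    if n < 3 or n % 2 == 0:
        raise ValueError("this sketch assumes an odd window of at least 3 cycles")
    if not 0 <= subject < total_cycles:
        raise ValueError("subject cycle out of range")
    half = n // 2
    return [
        cycle
        for cycle in range(subject - half, subject + half + 1)
        if 0 <= cycle < total_cycles
    ]


print(cycle_window(10, 5, 100))  # [8, 9, 10, 11, 12]
print(cycle_window(0, 5, 100))   # [0, 1, 2]
```

With N = 5, the subject cycle is classified using two preceding and two following cycles, which satisfies the "at least one preceding and at least one following" condition wherever the run has enough cycles on both sides.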

The data flow logic 451 is configured to move tile data and at least some trained parameters of the model from the memory 460 to the configurable processor for runs of the neural network, using input units for a given run that include tile data for spatially aligned patches of the N arrays. The input units can be moved by direct memory access (DMA) in a single DMA operation, or in smaller units moved during available time slots in coordination with the execution of the deployed neural network.

本明細書に記載される感知サイクルのタイルデータは、１つ以上の特徴を有するセンサデータのアレイを含むことができる。例えば、センサデータは、ＤＮＡ、ＲＮＡ、又は他の遺伝物質の遺伝的配列における塩基位置で４塩基のうちの１つを識別するために分析される２つの画像を含むことができる。タイルデータはまた、画像及びセンサに関するメタデータを含むことができる。例えば、ベースコール動作の実施形態では、タイルデータは、タイル上の遺伝物質のクラスタの中心からのセンサデータのアレイ内の各ピクセルの距離を示す中心情報からの距離などの、クラスタとの画像の位置合わせに関する情報を含むことができる。 The tile data of the sensing cycles described herein can include an array of sensor data having one or more features. For example, the sensor data can include two images that are analyzed to identify one of four bases at a base position in a genetic sequence of DNA, RNA, or other genetic material. The tile data can also include metadata about the images and the sensor. For example, in an embodiment of a base calling operation, the tile data can include information about the alignment of the images with the clusters, such as distance from center information indicating the distance of each pixel in the array of sensor data from the center of a cluster of genetic material on the tile.
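
As one way to picture the distance-from-center metadata, the sketch below builds a per-pixel distance channel for a small tile. It assumes Euclidean distance to the nearest cluster center and (row, column) coordinates; the specification does not say how the distance is defined when several clusters are present, so both choices, and the `distance_from_center` helper itself, are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def distance_from_center(height: int, width: int,
                         centers: Sequence[Tuple[float, float]]) -> List[List[float]]:
    """Return a height x width grid of distances to the nearest cluster center.

    `centers` holds (row, column) coordinates of cluster centers, which may be
    fractional (sub-pixel) positions.
    """
    if not centers:
        raise ValueError("at least one cluster center is required")
    return [
        [min(math.hypot(y - cy, x - cx) for cy, cx in centers) for x in range(width)]
        for y in range(height)
    ]


# A 3 x 3 patch with a single cluster centered on the middle pixel.
grid = distance_from_center(3, 3, [(1.0, 1.0)])
print(grid[1][1])  # 0.0
print(grid[0][1])  # 1.0
```

Such a grid can travel alongside the image channels of a tile so that a downstream model sees, for every pixel, how far it lies from a cluster center.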

以下に記載されるようなマルチサイクルニューラルネットワークの実行中に、タイルデータはまた、中間データと呼ばれる、マルチサイクルニューラルネットワークの実行中に生成されたデータを含むことができ、これは、マルチサイクルニューラルネットワークの実行中に再計算されるのではなく再利用され得る。例えば、マルチサイクルニューラルネットワークの実行中に、データフロー論理は、タイルデータのアレイの所与のパッチのセンサデータの代わりに、中間データをメモリ４６０に書き込むことができる。このような実施形態は、以下により詳細に記載される。 During execution of the multi-cycle neural network as described below, the tile data may also include data generated during execution of the multi-cycle neural network, referred to as intermediate data, which may be reused rather than recomputed during execution of the multi-cycle neural network. For example, during execution of the multi-cycle neural network, the data flow logic may write the intermediate data to memory 460 in place of the sensor data for a given patch of the array of tile data. Such an embodiment is described in more detail below.

As illustrated, a system is described for analysis of base call sensor output, comprising memory (e.g., 460) accessible by the runtime program, storing tile data including sensor data for a tile from sensing cycles of a base calling operation. The system also includes a neural network processor, such as the configurable processor 450, having access to the memory. The neural network processor is configured to execute runs of a neural network using trained parameters to produce classification data for sensing cycles. As described herein, a run of the neural network operates on a sequence of N arrays of tile data from respective sensing cycles of N sensing cycles, including a subject cycle, to produce the classification data for the subject cycle. The data flow logic 451 is provided to move tile data and the trained parameters from the memory to the neural network processor for runs of the neural network, using input units that include data for spatially aligned patches of the N arrays from respective sensing cycles of the N sensing cycles.

Also described is a system in which the neural network processor has access to the memory and includes a plurality of execution clusters, the execution clusters in the plurality being configured to execute the neural network. The data flow logic has access to the memory and to the execution clusters in the plurality of execution clusters. It provides input units of tile data to available execution clusters, each input unit including a number N of spatially aligned patches of arrays of tile data from respective sensing cycles, including a subject sensing cycle. It also causes the execution clusters to apply the N spatially aligned patches to the neural network to produce output patches of classification data for the spatially aligned patch of the subject sensing cycle, where N is greater than 1.
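
The notion of an input unit made of N spatially aligned patches can be sketched as follows. The sketch uses plain nested lists for the tile arrays and square patches; the `extract_patch` and `build_input_unit` helpers, and the patch geometry, are assumptions for illustration only.

```python
from typing import List, Sequence

Array2D = List[List[int]]


def extract_patch(tile: Sequence[Sequence[int]], top: int, left: int, size: int) -> Array2D:
    """Cut a size x size patch out of one tile array."""
    if top < 0 or left < 0 or top + size > len(tile) or left + size > len(tile[0]):
        raise ValueError("patch falls outside the tile array")
    return [list(row[left:left + size]) for row in tile[top:top + size]]


def build_input_unit(tile_arrays: Sequence[Sequence[Sequence[int]]],
                     top: int, left: int, size: int) -> List[Array2D]:
    """Return N patches, one per sensing cycle, that share the same coordinates."""
    if len(tile_arrays) < 2:
        raise ValueError("N must be greater than 1")
    return [extract_patch(tile, top, left, size) for tile in tile_arrays]


# Three sensing cycles of a 4 x 4 tile; each value encodes cycle, row, and column.
cycles = [[[c * 100 + y * 10 + x for x in range(4)] for y in range(4)] for c in range(3)]
unit = build_input_unit(cycles, top=1, left=2, size=2)
print(unit[0])  # [[12, 13], [22, 23]]
print(unit[2])  # [[212, 213], [222, 223]]
```

Every patch in the unit covers the same pixel coordinates, so the network sees the same physical region of the tile across the N sensing cycles.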

FIG. 5 is a simplified diagram showing aspects of the base calling operation, including functions of a runtime program executed by a host processor. In this diagram, the output of image sensors from a flow cell (such as those shown in FIGS. 1 and 2) is provided on line 500 to an image processing thread 501. The image processing thread 501 can perform processes on the images such as resampling, alignment, and arrangement into an array of sensor data for each individual tile. Its output can be used by a process that computes a tile cluster mask for each tile in the flow cell; the mask identifies the pixels in the array of sensor data that correspond to clusters of genetic material on the corresponding tile of the flow cell. To compute a cluster mask, one exemplary algorithm is based on a process that detects clusters that are unreliable in the early sequencing cycles, using a metric derived from the softmax output. The data from those wells/clusters is then discarded, and no output data is produced for those clusters. For example, the process can identify clusters with high reliability during the first N1 (e.g., 25) base calls and reject the others. Rejected clusters may be polyclonal, of very weak intensity, or unclear according to the criteria. This procedure can be performed on the host CPU. In alternative implementations, this information could potentially be used to identify the clusters of interest that need to be passed back to the CPU, thereby limiting the storage required for intermediate data.
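The cluster filtering step described above can be sketched as follows. This is a minimal illustration, not the patented procedure: the metric (mean top-class softmax probability), the threshold `min_confidence`, and the function name `reliable_cluster_mask` are all assumptions introduced here.

```python
def reliable_cluster_mask(softmax_probs, n1=25, min_confidence=0.9):
    """Keep clusters whose mean top-class probability over the first n1
    cycles meets a threshold.

    softmax_probs[cycle][cluster] is a list of four per-base probabilities.
    The metric and the threshold are assumptions of this sketch.
    """
    early = softmax_probs[:n1]
    n_clusters = len(early[0])
    mask = []
    for c in range(n_clusters):
        mean_top = sum(max(cycle[c]) for cycle in early) / len(early)
        mask.append(mean_top >= min_confidence)
    return mask


confident = [0.97, 0.01, 0.01, 0.01]   # one base clearly dominates
ambiguous = [0.25, 0.25, 0.25, 0.25]   # e.g., a polyclonal or very weak cluster
probs = [[confident, ambiguous] for _ in range(25)]
mask = reliable_cluster_mask(probs)    # first cluster kept, second rejected
```

Clusters whose mask entry is false would simply produce no output data in later cycles.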

The output of the image processing thread 501 is provided on line 502 to dispatch logic 510 in the CPU, which, depending on the state of the base calling operation, routes the arrays of tile data either to a data cache 504 on a high-speed bus 503 or to hardware 520, such as the configurable processor of FIG. 4, on a high-speed bus 505. The hardware 520 may be a multi-cluster neural network processor for executing a neural network-based base caller, as discussed later in this specification, or may be hardware for executing a non-neural network-based base caller.

The hardware 520 returns classification data (e.g., output by the neural network base caller and/or the non-neural network base caller) to the dispatch logic 510, which passes the information either to the data cache 504 or on line 511 to a thread 502 that performs base call and quality score computations using the classification data and can arrange the data in a standard format for base call reads. The output of the thread 502 that performs base call and quality score computations is provided on line 512 to a thread 503 that aggregates the base call reads, performs other operations such as data compression, and writes the resulting base call output to a specified destination for use by the customer.

In some embodiments, the host can include a thread (not shown) that performs final processing of the output of the hardware 520 in support of the neural network. For example, the hardware 520 can provide outputs of classification data from a final layer of the multi-cluster neural network. The host processor can execute an output activation function, such as a softmax function, over the classification data to configure the data for use by the base call and quality score thread 502. The host processor can also execute input operations (not shown), such as resampling, batch normalization, or other adjustments of the tile data, prior to input to the hardware 520.

FIG. 6 is a simplified diagram of a configuration of a configurable processor, such as that of FIG. 4. In FIG. 6, the configurable processor comprises an FPGA with a plurality of high-speed PCIe interfaces. The FPGA is configured with a wrapper 600 that includes the data flow logic described with reference to FIG. 1. The wrapper 600 manages the interface and coordination with a runtime program in the CPU via a CPU communication link 609, and manages communication with the on-board DRAM 602 (e.g., memory 460) via a DRAM communication link 610. The data flow logic in the wrapper 600 provides to a cluster 601 the patch data obtained by traversing the arrays of tile data on the on-board DRAM 602 for the number N of cycles, and retrieves process data 615 from the cluster 601 for delivery back to the on-board DRAM 602. The wrapper 600 also manages the transfer of data between the on-board DRAM 602 and host memory, for both the input arrays of tile data and the output patches of classification data. The wrapper transfers patch data on line 613 to the allocated cluster 601. The wrapper provides trained parameters, such as weights and biases retrieved from the on-board DRAM 602, to the cluster 601 on line 612. The wrapper provides configuration and control data to the cluster 601 on line 611; this data is provided by, or generated in response to, the runtime program on the host via the CPU communication link 609. The cluster can also provide status signals on line 616 to the wrapper 600. These status signals are used in cooperation with control signals from the host to manage traversal of the arrays of tile data so as to provide spatially aligned patch data, and to execute the multi-cycle neural network for base calling and/or the non-neural network-based base calling operations over the patch data using the resources of the cluster 601.

As described above, there can be multiple clusters on a single configurable processor managed by the wrapper 600, each configured to execute on a corresponding one of multiple patches of the tile data. Each cluster can be configured to provide classification data for base calls in a subject sensing cycle using the tile data of multiple sensing cycles as described herein.

In examples of the system, model data, including kernel data such as filter weights and biases, can be sent from the host CPU to the configurable processor, so that the model can be updated as a function of cycle number. A base calling operation can comprise, in a representative example, on the order of hundreds of sensing cycles. A base calling operation can include paired-end reads in some embodiments. For example, the trained model parameters may be updated once every 20 cycles (or some other number of cycles), or according to an update pattern implemented for a particular system. In some embodiments that include paired-end reads, in which the sequence for a given string in a genetic cluster on a tile includes a first part extending from a first end down (or up) the string and a second part extending from a second end up (or down) the string, the trained parameters can be updated at the transition from the first part to the second part.

In some examples, image data for multiple cycles of sensing data for a tile can be sent from the CPU to the wrapper 600. The wrapper 600 can optionally perform some preprocessing and transformation of the sensing data and write the information to the on-board DRAM 602. The input tile data for each sensing cycle can include arrays of sensor data comprising on the order of 4000 x 3000 pixels or more per sensing cycle per tile, with two features representing the colors of two images of the tile, and one or two bytes per feature per pixel. In an embodiment in which the number N is three sensing cycles used in each run of the multi-cycle neural network, the array of tile data for each run of the multi-cycle neural network can consume on the order of hundreds of megabytes per tile. In some embodiments of the system, the tile data also includes an array of DFC data, stored once per tile, or other types of metadata about the sensor data and the tiles.
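As a rough check on the storage figure above, the arithmetic can be worked through directly. This sketch assumes the stated minimum tile size and reads "one or two bytes" as applying to each feature value; the function name `tile_input_bytes` is introduced here for illustration.

```python
def tile_input_bytes(width=4000, height=3000, features=2,
                     bytes_per_value=2, cycles=3):
    """Bytes of input tile data for one run of an N-cycle network.

    Assumes every pixel stores `features` values of `bytes_per_value` bytes.
    """
    return width * height * features * bytes_per_value * cycles


MB = 10 ** 6
low = tile_input_bytes(bytes_per_value=1) / MB    # 72.0 MB with 1-byte values
high = tile_input_bytes(bytes_per_value=2) / MB   # 144.0 MB with 2-byte values
```

At the minimum dimensions this gives roughly 72 to 144 MB for N = 3, so larger tiles or additional per-tile metadata would bring the total to the order of hundreds of megabytes.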

During operation, when a multi-cycle cluster is available, the wrapper allocates a patch to it. The wrapper fetches the next patch of tile data in its traversal of the tile and sends it to the allocated cluster along with the appropriate control and configuration information. The cluster can be configured with enough memory on the configurable processor to hold the patch of data being worked on in place, which in some systems includes patches from multiple cycles. In various embodiments, the patches are handled using a ping-pong buffer technique or a raster scanning technique.

When the allocated cluster completes its run of the neural network for the current patch and produces an output patch, it signals the wrapper. The wrapper reads the output patch from the allocated cluster, or alternatively the allocated cluster pushes the data to the wrapper. The wrapper then assembles the output patches for the processed tile in the DRAM 602. When processing of the entire tile is complete and the output patches of data have been transferred to the DRAM, the wrapper sends the processed output array for the tile back to the host/CPU in a specified format. In some embodiments, the on-board DRAM 602 is managed by memory management logic in the wrapper 600. The runtime program can control the sequencing operations so that the analysis of all the arrays of tile data, for all the cycles in the sequencing run, completes in a continuous flow to provide real-time analysis.
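The fetch, process, and reassemble loop described above can be sketched in a few lines. This is an illustrative model only: `run_cluster` stands in for a hardware cluster, the raster-order traversal is one of the options the text mentions, and the sketch assumes each output patch has the same size as its input patch, which the description does not require.

```python
def iter_patches(height, width, patch):
    """Yield (row, col) origins of patches in raster order."""
    for r in range(0, height, patch):
        for c in range(0, width, patch):
            yield r, c


def process_tile(tile, patch, run_cluster):
    """Send each patch to a cluster and assemble the outputs into one array."""
    h, w = len(tile), len(tile[0])
    out = [[None] * w for _ in range(h)]
    for r, c in iter_patches(h, w, patch):
        block = [row[c:c + patch] for row in tile[r:r + patch]]
        result = run_cluster(block)               # hypothetical cluster call
        for dr, row in enumerate(result):
            out[r + dr][c:c + len(row)] = row     # write output patch in place
    return out


tile = [[r * 4 + c for c in range(4)] for r in range(4)]
doubled = process_tile(tile, 2, lambda b: [[2 * v for v in row] for row in b])
```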

Multiple Base Callers
FIG. 6A illustrates a system 600 that employs two or more base callers for base calling operations on raw images (i.e., sensor data) output by a biosensor. For example, the system 600 includes a sequencing machine 1404, such as the sequencing machine discussed with respect to FIG. 1 (and also discussed later herein with respect to FIG. 14). The sequencing machine 1404 includes a flow cell 1405, such as the flow cell discussed with respect to FIGS. 1-3. The flow cell 1405 includes a plurality of tiles 1406, and each tile 1406 includes a plurality of clusters 1407 (e.g., exemplary clusters of a single tile are shown in FIG. 6A), as discussed with respect to FIGS. 2 and 3. As discussed with respect to FIGS. 4-6, sensor data 1412 including raw images from the tiles 1406 is output by the sequencing machine 1404.

In one embodiment, the system 600 includes two or more base callers, such as a first base caller 1414 and a second base caller 1416. Although two base callers are shown in the figure, in one example three or more base callers may be present in the system 600.

Each base caller of FIG. 6A outputs corresponding base call classification information. For example, the first base caller 1414 outputs first base call classification information 1434, and the second base caller 1416 outputs second base call classification information 1436. A base call combining module 1428 generates final base calls 1440 based on one or both of the first base call classification information 1434 and the second base call classification information 1436.
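One way a combining module could merge the two outputs is sketched below. The passage above does not specify a combination rule, so the weighted average used here, the A, C, G, T ordering, and the names `combine_calls` and `weight_first` are assumptions for illustration only.

```python
BASES = "ACGT"   # assumed ordering of the per-base scores


def combine_calls(first, second, weight_first=0.5):
    """Blend two per-base probability lists and return (base, score).

    weight_first = 1.0 uses only the first caller, 0.0 only the second,
    matching "one or both" of the classification outputs.
    """
    mixed = [weight_first * p + (1.0 - weight_first) * q
             for p, q in zip(first, second)]
    best = max(range(4), key=mixed.__getitem__)
    return BASES[best], mixed[best]


first_info = [0.7, 0.1, 0.1, 0.1]      # e.g., from the first base caller
second_info = [0.4, 0.5, 0.05, 0.05]   # e.g., from the second base caller
base, score = combine_calls(first_info, second_info)
```

With equal weights the blended call is A; relying on the second caller alone (`weight_first=0.0`) would give C, which shows how the choice of rule can change the final base call.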

In one example, the first base caller 1414 is a neural network-based base caller. For example, the first base caller 1414 is a non-linear system that employs one or more neural network models for base calling, as described later herein. The first base caller 1414 is also referred to herein as a DeepRTA (deep real-time analysis) base caller or a deep neural network base caller.

In one example, the second base caller 1416 is a non-neural network-based base caller. For example, the second base caller 1416 is an at least partially linear system used for base calling. For example, the second base caller 1416 does not employ a neural network for base calling (or uses a smaller neural network model for base calling than the larger neural network model used by the first base caller 1414), as discussed later herein. The second base caller 1416 is also referred to herein as an RTA (real-time analysis) base caller.

Examples of DeepRTA (or deep neural network) base callers and RTA base callers are discussed in U.S. Nonprovisional Patent Application No. 16/826,126, entitled "Artificial Intelligence-Based Base Calling," filed March 20, 2020 (Attorney Docket No. ILLM1008-18/IP-1744-US), which is incorporated by reference for all purposes as if fully set forth herein.

Further details of the operation of the system 600 of FIG. 6A, as well as further examples of the first base caller 1414 and the second base caller 1416, are discussed in more detail later herein, for example with respect to FIG. 14.

Non-Neural Network-Based, At Least Partially Linear Base Caller (e.g., the Second Base Caller 1416 of FIGS. 6A and 14)
As discussed with respect to FIG. 6A, the second base caller 1416 is a non-neural network-based, at least partially linear base caller. That is, the second base caller 1416 does not employ a neural network for base calling (or uses a smaller neural network model for base calling than the larger neural network model used by the first base caller 1414). One example of the second base caller 1416 is the RTA base caller.

RTA is a base caller that uses linear intensity extractors to extract features from sequencing images for base calling. The following discussion describes one implementation of intensity extraction and base calling by RTA. In this implementation, RTA performs a template generation step to produce a template image that identifies the locations of clusters on a tile, using sequencing images from a number of initial sequencing cycles called template cycles. The template image is used as a reference for the subsequent registration and intensity extraction steps. The template image is generated by detecting and merging bright spots in each sequencing image of the template cycles. This in turn involves sharpening the sequencing image (e.g., using a Laplacian convolution), determining an "on" threshold by a spatially segregated Otsu's method, and subsequent five-pixel local maximum detection with sub-pixel location interpolation. In another example, the locations of the clusters on a tile are identified using fiducial markers. The solid support on which the biological specimen is imaged can include such fiducial markers, to facilitate determining the orientation of the specimen, or of its image, relative to the probes attached to the solid support. Exemplary fiducials include, but are not limited to, beads (with or without fluorescent moieties or moieties such as nucleic acids to which labeled probes can bind), fluorescent molecules attached at known or determinable features, or structures that combine morphological shapes with fluorescent moieties. Exemplary fiducials are described in U.S. Patent Application Publication No. 2002/0150909, which is incorporated herein by reference.

RTA then registers the current sequencing image against the template image. This is achieved by using image correlation to align the current sequencing image to the template image on a sub-region, or by using non-linear transformations (e.g., a full six-parameter linear affine transformation).
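The six-parameter affine transformation mentioned above maps each template coordinate through a 2 x 2 matrix plus a translation. A minimal sketch follows; the parameter ordering `(a, b, c, d, tx, ty)` and the function name `apply_affine` are choices made here, and estimating the parameters from image correlation is outside the scope of the sketch.

```python
def apply_affine(params, points):
    """Map (x, y) points with x' = a*x + b*y + tx and y' = c*x + d*y + ty.

    params = (a, b, c, d, tx, ty): six parameters covering rotation, scale,
    shear, and translation.
    """
    a, b, c, d, tx, ty = params
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


template_centers = [(10.0, 20.0), (30.0, 5.0)]
pure_shift = (1.0, 0.0, 0.0, 1.0, 0.5, -0.25)   # identity matrix plus offset
registered = apply_affine(pure_shift, template_centers)
```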

RTA generates a color matrix to correct for cross-talk between the color channels of the sequencing images. RTA also applies an empirical phasing correction to compensate for noise in the sequencing images caused by phasing errors.
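For a two-channel system, cross-talk correction with a color matrix amounts to inverting a 2 x 2 mixing matrix. The sketch below assumes the observed intensities equal the color matrix times the true intensities; the matrix values and function names are illustrative, not taken from the source.

```python
def invert_2x2(m):
    """Invert a 2 x 2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("color matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def correct_crosstalk(color_matrix, observed):
    """Recover per-channel intensities, assuming observed = color_matrix @ true."""
    inv = invert_2x2(color_matrix)
    x, y = observed
    return (inv[0][0] * x + inv[0][1] * y, inv[1][0] * x + inv[1][1] * y)


color_matrix = [[1.0, 0.25], [0.5, 1.0]]   # off-diagonals model the cross-talk
observed = (1.25, 1.5)                     # what true intensities (1.0, 1.0) produce
corrected = correct_crosstalk(color_matrix, observed)
```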

After the different corrections have been applied to the sequencing image, RTA extracts a signal intensity for each spot location in the sequencing image. For example, for a given spot location, the signal intensity may be extracted by determining a weighted average of the intensities of the pixels in the spot location. For example, a weighted average of the center pixel and neighboring pixels may be computed using bilinear or bicubic interpolation. In some implementations, each spot location in the image may include a few pixels (e.g., 1-5 pixels).
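Bilinear interpolation, one of the weighting schemes named above, can be written out directly. In this sketch the image is a list of rows, `x` is the column coordinate and `y` the row coordinate, and the coordinates are assumed to lie within the image bounds; the function name is introduced here.

```python
import math


def bilinear_intensity(image, x, y):
    """Interpolate the image at sub-pixel (x, y) from its four nearest pixels."""
    x0, y0 = math.floor(x), math.floor(y)
    x1 = min(x0 + 1, len(image[0]) - 1)   # clamp at the right edge
    y1 = min(y0 + 1, len(image) - 1)      # clamp at the bottom edge
    fx, fy = x - x0, y - y0
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


image = [[0.0, 10.0], [20.0, 30.0]]
center = bilinear_intensity(image, 0.5, 0.5)   # equal weight on all four pixels
```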

RTA then spatially normalizes the extracted signal intensities to account for variation in illumination across the imaged sample. For example, the intensity values may be normalized such that the 5th and 95th percentiles have values of 0 and 1, respectively. The normalized signal intensities of the image (e.g., the normalized intensities of each channel) can be used to calculate the average purity of a plurality of spots in the image.
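The percentile normalization above is a linear rescaling that sends the 5th percentile to 0 and the 95th to 1. The sketch below uses a linearly interpolated percentile, which is one common definition and an assumption here; spatial (per-region) application is omitted for brevity.

```python
def percentile(sorted_values, q):
    """Linearly interpolated percentile of pre-sorted values, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_values) - 1)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def normalize_intensities(values):
    """Rescale so the 5th percentile maps to 0 and the 95th to 1."""
    ordered = sorted(values)
    p5, p95 = percentile(ordered, 5), percentile(ordered, 95)
    if p95 == p5:
        raise ValueError("intensities have no spread")
    return [(v - p5) / (p95 - p5) for v in values]


raw = [float(v) for v in range(101)]   # intensities 0, 1, ..., 100
normalized = normalize_intensities(raw)
```

Values outside the 5th to 95th percentile range fall below 0 or above 1 rather than being clipped.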

In some implementations, RTA uses an equalizer to maximize the signal-to-noise ratio of the extracted signal intensities. The equalizer can be trained (e.g., using least squares estimation or an adaptive equalization algorithm) to maximize the signal-to-noise ratio of cluster intensity data in the sequencing images. In some implementations, the equalizer is a bank of lookup tables (LUTs) with sub-pixel resolution; the LUTs are also referred to as "equalizer filters" or "convolution kernels." In one implementation, the number of LUTs in the equalizer depends on the number of sub-pixels into which a pixel of the sequencing images can be divided. For example, if a pixel is divisible into n × n sub-pixels (e.g., 5 × 5 sub-pixels), the equalizer generates n² LUTs (e.g., 25 LUTs).

In one implementation of training the equalizer, data from the sequencing images is binned by well sub-pixel location. For example, for a 5 × 5 LUT bank, 1/25 of the wells have a center that lies in bin (1, 1) (e.g., the upper-left corner of a sensor pixel), 1/25 of the wells lie in bin (1, 2), and so on. In one implementation, the equalizer coefficients for each bin are determined using least squares estimation on the subset of data from the wells corresponding to that bin. As a result, the estimated equalizer coefficients differ from bin to bin.

Each LUT/equalizer filter/convolution kernel has a plurality of coefficients learned during training. In one implementation, the number of coefficients in a LUT corresponds to the number of pixels used for base calling a cluster. For example, if the local grid of pixels (image or pixel patch) used to base call a cluster is of size p × p (e.g., a 9 × 9 pixel patch), then each LUT has p² coefficients (e.g., 81 coefficients).

In one implementation, the training produces equalizer coefficients that are configured to mix/combine the intensity values of pixels that depict intensity emissions from the target cluster being base called and intensity emissions from one or more adjacent clusters, in a manner that maximizes the signal-to-noise ratio. The signal maximized in the signal-to-noise ratio is the intensity emissions from the target cluster. The noise minimized in the signal-to-noise ratio is the intensity emissions from the adjacent clusters, i.e., spatial crosstalk, plus some random noise (e.g., to account for background intensity emissions). The equalizer coefficients are used as weights, and the mixing/combining involves element-wise multiplication between the equalizer coefficients and the intensity values of the pixels to compute a weighted sum of the intensity values of the pixels, i.e., a convolution operation.
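Applying a trained equalizer therefore has two steps: pick the LUT for the sub-pixel bin containing the cluster center, then take the weighted sum over the pixel patch. The sketch below uses a 2 × 2 bank of 3 × 3 kernels with hand-set placeholder coefficients; in practice the coefficients would come from the least squares training described above, and the function names are hypothetical.

```python
def subpixel_bin(x, y, n=5):
    """Map the fractional part of a (non-negative) cluster center to a
    (row, col) bin in an n x n sub-pixel grid."""
    fx, fy = x - int(x), y - int(y)
    return min(int(fy * n), n - 1), min(int(fx * n), n - 1)


def equalized_intensity(patch, lut_bank, center_x, center_y, n=5):
    """Weighted sum of a p x p patch using the LUT selected by sub-pixel bin."""
    row_bin, col_bin = subpixel_bin(center_x, center_y, n)
    coeffs = lut_bank[row_bin][col_bin]
    return sum(c * v
               for coeff_row, pixel_row in zip(coeffs, patch)
               for c, v in zip(coeff_row, pixel_row))


# Placeholder coefficients, not trained values.
center_only = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
box_mean = [[1 / 9] * 3 for _ in range(3)]
lut_bank = [[center_only, box_mean], [box_mean, box_mean]]   # n = 2, so 4 LUTs

patch = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
value = equalized_intensity(patch, lut_bank, 10.2, 7.1, n=2)   # bin (0, 0)
```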

RTA then performs base calling by fitting a mathematical model to the optimized intensity data. Suitable mathematical models that can be used include, for example, a k-means clustering algorithm, a k-means-like clustering algorithm, an expectation maximization clustering algorithm, a histogram-based method, and the like. Four Gaussian distributions may be fit to the set of two-channel intensity data such that one distribution is applied to each of the four nucleotides represented in the data set. In one particular implementation, an expectation maximization (EM) algorithm may be applied. As a result of the EM algorithm, for each X, Y value (referring to each of the two channel intensities, respectively), a value can be generated that represents the likelihood that the X, Y intensity value belongs to one of the four Gaussian distributions to which the data is fitted. Where the four bases give four separate distributions, each X, Y intensity value also has four associated likelihood values, one for each of the four bases. The maximum of the four likelihood values indicates the base call. For example, if a cluster is "off" in both channels, the base call is G. If the cluster is "off" in one channel and "on" in the other channel, the base call is either C or T (depending on which channel is on), and if the cluster is "on" in both channels, the base call is A.
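The final step above, choosing the base with the largest likelihood, can be sketched with fixed Gaussians. This sketch skips the EM fitting entirely: the distribution centers, the shared `sigma`, and the assignment of C and T to particular channels are all assumptions for illustration (the text only says the C/T choice depends on which channel is on).

```python
import math

# Assumed centers in normalized (channel 1, channel 2) intensity space.
# Which single-channel state means C and which means T is an assumption.
CENTERS = {"G": (0.0, 0.0), "C": (1.0, 0.0), "T": (0.0, 1.0), "A": (1.0, 1.0)}


def call_base(x, y, sigma=0.2):
    """Return (base, likelihoods) for isotropic Gaussians with equal priors."""
    likelihoods = {}
    for base, (mx, my) in CENTERS.items():
        d2 = (x - mx) ** 2 + (y - my) ** 2
        likelihoods[base] = math.exp(-d2 / (2 * sigma ** 2))
    total = sum(likelihoods.values())
    likelihoods = {b: v / total for b, v in likelihoods.items()}
    return max(likelihoods, key=likelihoods.get), likelihoods


base, likelihoods = call_base(0.9, 0.95)   # "on" in both channels
```

A fitted model would replace the fixed centers and `sigma` with the means and covariances estimated by EM for each tile or run.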

Further details about RTA can be found in U.S. Nonprovisional Patent Application No. 15/909,437, entitled "Optical Distortion Correction For Imaged Samples," filed March 1, 2018; U.S. Nonprovisional Patent Application No. 14/530,299, entitled "Image Analysis Useful for Patterned Objects," filed October 31, 2014; U.S. Nonprovisional Patent Application No. 15/153,953, entitled "Methods and Systems for Analyzing Image Data," filed December 3, 2014; U.S. Nonprovisional Patent Application No. 13/006,206, entitled "Data Processing System and Methods," filed January 13, 2011; and U.S. Nonprovisional Patent Application No. 17/308,035, entitled "Equalization-Based Image Processing and Spatial Crosstalk Attenuator," filed May 4, 2021 (Attorney Docket No. ILLM1032-2/IP-1991-US), all of which are incorporated by reference as if fully set forth herein.

Neural Network-Based, At Least Partially Non-Linear Base Caller (e.g., the First Base Caller 1414 of FIG. 6A)
FIGS. 7-13B discuss various examples of the first base caller 1414 of FIG. 6A. For example, FIG. 7 is a diagram of a multi-cycle neural network model that can be executed using the system described herein. The multi-cycle neural network model is one example of the first base caller 1414 of FIG. 6A, although another neural network-based model may be used for the first base caller 1414.

The example shown in FIG. 7 can be referred to as a five-cycle input, one-cycle output neural network. Note, however, that the five-cycle input, one-cycle output neural network is merely an example, and the neural network can have a different number of inputs (such as six, seven, nine, or another suitable number). For example, FIG. 10, discussed later herein, has a nine-cycle input. Referring again to FIG. 7, the inputs to the multi-cycle neural network model include five spatially aligned patches (e.g., 700) from the tile data arrays of five sensing cycles of a given tile. Spatially aligned patches have the same aligned row and column dimensions (x, y) as the other patches in the set, so that the information relates to the same clusters of genetic material on the tile across the sequencing cycles. In this example, the subject patch is a patch from the array of tile data for cycle K. In addition to the subject patch, the set of five spatially aligned patches includes a patch from cycle K-2 preceding the subject patch by two cycles, a patch from cycle K-1 preceding the subject patch by one cycle, a patch from cycle K+1 following the subject patch by one cycle, and a patch from cycle K+2 following the subject patch by two cycles.
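Gathering the five spatially aligned patches amounts to slicing the same (row, column) window out of the tile data arrays for cycles K-2 through K+2. A minimal sketch, in which the dictionary layout `tiles_by_cycle` and the function name are assumptions:

```python
def aligned_patches(tiles_by_cycle, k, row, col, size, radius=2):
    """Return patches for cycles k-radius .. k+radius at one (row, col) window.

    tiles_by_cycle maps a cycle number to a 2-D list of pixel values.
    With radius=2 this yields the five-cycle input centered on subject cycle k.
    """
    patches = []
    for cycle in range(k - radius, k + radius + 1):
        tile = tiles_by_cycle[cycle]
        patches.append([r[col:col + size] for r in tile[row:row + size]])
    return patches


# Toy 4 x 4 tiles for cycles 1..7; values encode cycle, row, and column.
tiles = {c: [[c * 100 + r * 4 + x for x in range(4)] for r in range(4)]
         for c in range(1, 8)}
stack = aligned_patches(tiles, k=4, row=1, col=2, size=2)
```

Setting `radius=4` would give the nine-cycle input variant mentioned above.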

The model includes an isolated stack 701 of layers of the neural network for each of the input patches. Thus, stack 701 receives as input the tile data of the patch from cycle K+2, and is isolated from stacks 702, 703, 704, and 705 so that they do not share input data or intermediate data. In some embodiments, all of the stacks 701-705 can have an identical model and identical trained parameters. In other embodiments, the models and trained parameters may differ between stacks. Stack 702 receives as input the tile data of the patch from cycle K+1. Stack 703 receives as input the tile data of the patch from cycle K. Stack 704 receives as input the tile data of the patch from cycle K-1. Stack 705 receives as input the tile data of the patch from cycle K-2. Each layer of the isolated stacks executes a convolution operation of a kernel including a plurality of filters over the input data for the layer. As in the example above, the patch 700 may include three features. The output of layer 710 may include many more features, such as 10 to 20 features. Likewise, the output of each of layers 711 to 716 can include any number of features suitable for a particular implementation. The parameters of the filters are trained parameters of the neural network, such as weights and biases. The output feature set (intermediate data) from each of the stacks 701-705 is provided as input to an inverse hierarchy 720 of temporal combinatorial layers, in which the intermediate data from the multiple cycles is combined. In the illustrated example, the inverse hierarchy 720 includes a first layer comprising three combinatorial layers 721, 722, 723, each receiving intermediate data from three of the isolated stacks, and a final layer comprising one combinatorial layer 730 that receives intermediate data from the three temporal layers 721, 722, 723.

The output of the final combination layer 730 is an output patch of classification data for the clusters located in the corresponding patch of the tile from cycle K. The output patches can be assembled into an output array of classification data for the tile for cycle K. In some embodiments, the output patch may have a different size and different dimensions than the input patches. In some embodiments, the output patch may include pixel-by-pixel data that can be filtered by the host to select cluster data.

The output classification data 735 can then be applied to a softmax function 740 (or other output activation function), optionally executed by the host or on the configurable processor, depending on the particular implementation. An output function different from softmax could be used (e.g., making a base call output parameter according to the largest output, then using a learned nonlinear mapping with the context/network outputs to give base quality).

Finally, the output of the softmax function 740 can be provided as the base call probabilities for cycle K (750) and stored in host memory for use in subsequent processing. Other systems may use another function for the output probability calculation, e.g., another nonlinear model.
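The normalization step described above can be illustrated with a minimal sketch. It assumes four raw scores per cluster, ordered (A, C, T, G); the ordering, the function names, and the example scores are hypothetical and are not taken from the specification.

```python
import math

# Assumed ordering of the four per-base scores (hypothetical).
BASES = ("A", "C", "T", "G")


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(scores):
    """Return (base, probability) for the highest-scoring base."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return BASES[best], probs[best]
```

For example, `call_base([2.0, 0.5, 0.1, -1.0])` selects the first base in the assumed ordering, because it has the largest raw score.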

The neural network can be implemented using a configurable processor with a plurality of execution clusters so as to complete evaluation of one tile cycle within, or close to, the duration of the time interval of one sensing cycle, effectively providing the output data in real time. Data flow logic can be configured to distribute input units of tile data and trained parameters to the execution clusters, and to distribute output patches for aggregation in memory.

Input units of data for a 5-cycle input, 1-cycle output neural network like that of FIG. 7 are described with reference to FIGS. 8A and 8B for a base calling operation using two-channel sensor data. For example, for a given base in a genetic sequence, the base calling operation can execute two flows of analyte and two reactions that generate two channels of signals, such as images, which can be processed to identify which one of the four bases is located at the current position in the genetic sequence for each cluster of genetic material. In other systems, a different number of channels of sensing data may be utilized. For example, base calling can be performed using one-channel methods and systems. Incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, or four channels.

FIG. 8A shows arrays of tile data for five cycles for a given tile, tile M, used for the purposes of executing a 5-cycle input, 1-cycle output neural network. The 5-cycle input tile data in this example can be written to on-board DRAM or other memory in the system that can be accessed by the data flow logic. It includes, for cycle K-2, an array 801 for channel 1 and an array 811 for channel 2; for cycle K-1, an array 802 for channel 1 and an array 812 for channel 2; for cycle K, an array 803 for channel 1 and an array 813 for channel 2; for cycle K+1, an array 804 for channel 1 and an array 814 for channel 2; and for cycle K+2, an array 805 for channel 1 and an array 815 for channel 2. Also, an array 820 of metadata for the tile can be written once in the memory, in this case a DFC file, included for use as input to the neural network along with each cycle.

Although FIG. 8A discusses two-channel base calling operations, using two channels is merely an example, and base calling can be performed using any other appropriate number of channels. For example, incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, four channels, or another appropriate number of channels.

The data flow logic composes input units of tile data, which can be understood with reference to FIG. 8B, that include spatially aligned patches of the arrays of tile data for each execution cluster configured to execute a run of the neural network over an input patch. An input unit for an allocated execution cluster is composed by the data flow logic by reading spatially aligned patches (e.g., 851, 852, 861, 862, 870) from each of the arrays 801-805, 811-815, 820 of tile data for the five input cycles, and delivering them via data paths (schematically 850) to memory on the configurable processor configured for use by the allocated execution cluster. The allocated execution cluster executes a run of the 5-cycle input, 1-cycle output neural network, and delivers an output patch for the subject cycle K of classification data for the same patch of the tile in the subject cycle K.
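The composition of an input unit from spatially aligned patches can be sketched as follows. This is an illustrative sketch only: the dictionary layout keyed by (cycle, channel), the function names, and the parameter `reach` are assumptions made for the example, and the tile metadata array is omitted for brevity.

```python
def extract_patch(array, row, col, height, width):
    """Cut a height x width window whose top-left corner is (row, col)."""
    return [r[col:col + width] for r in array[row:row + height]]


def build_input_unit(tile_data, subject_cycle, row, col, height, width, reach=2):
    """Collect spatially aligned patches for cycles K-reach .. K+reach.

    tile_data maps (cycle, channel) -> 2D list of intensities for one tile.
    Every patch uses the same (row, col) window, so all patches cover the
    same clusters on the tile.
    """
    channels = sorted({ch for (_, ch) in tile_data})
    unit = {}
    for cycle in range(subject_cycle - reach, subject_cycle + reach + 1):
        for ch in channels:
            unit[(cycle, ch)] = extract_patch(
                tile_data[(cycle, ch)], row, col, height, width
            )
    return unit
```

With five cycles and two channels, the resulting unit holds ten patches, one per (cycle, channel) pair, all cut from the same tile location.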

FIG. 9 is a simplified representation of a stack of a neural network usable in a system like that of FIG. 7 (e.g., 701 and 720). In this example, some functions of the neural network (e.g., 900, 902) are executed on the host, and other portions of the neural network (e.g., 901) are executed on the configurable processor.

In one example, a first function can be batch normalization (layer 910) performed on the CPU. In another example, however, batch normalization as a function may be fused into one or more layers, and no separate batch normalization layer may be present.

A number of spatially isolated convolution layers are executed as a first set of convolution layers of the neural network, as discussed above, on the configurable processor. In this example, the first set of convolution layers applies 2D convolutions spatially.

As shown in FIG. 9, a first spatial convolution 921 is executed, followed by a second spatial convolution 922, followed by a third spatial convolution 923, and so on, for a number L/2 of spatially isolated neural network layers in each stack (L is described with reference to FIG. 7). As indicated at 923A, the number of spatial layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.

For SP_CONV_0, kernel weights are stored, for example, in a (1, 6, 6, 3, L) structure, since there are three input channels to this layer.

For the other SP_CONV layers, kernel weights are stored in this example in a (1, 6, 6L) structure, since there are K(=L) inputs and outputs for each of these layers.

The outputs of the stack of spatial layers are provided to temporal layers, including convolution layers 924, 925 executed on the FPGA. Layers 924 and 925 can be convolution layers applying 1D convolutions across cycles. As indicated at 924A, the number of temporal layers can be any practical number, which for context may range from a few to more than 20 in different embodiments.

The first temporal layer, TEMP_CONV_0 layer 924, reduces the number of cycle channels from 5 to 3, as illustrated in FIG. 7. The second temporal layer, layer 925, reduces the number of cycle channels from 3 to 1, as illustrated in FIG. 7, and reduces the number of feature maps to four outputs for each pixel, representing the confidence in each base call.
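The reduction from 5 cycle channels to 3 and then to 1 follows from sliding a window over consecutive cycles. The sketch below assumes a window of three cycles that advances one cycle at a time, which matches the three combination layers of FIG. 7 that each receive data from three stacks; the function names are hypothetical.

```python
def cycles_after_temporal_layer(num_cycles, window=3):
    """Number of outputs when a window slides one cycle at a time."""
    if num_cycles < window:
        raise ValueError("need at least one full window")
    return num_cycles - window + 1


def cycles_through_stack(num_cycles, num_layers, window=3):
    """Apply the reduction once per temporal layer and record each width."""
    widths = [num_cycles]
    for _ in range(num_layers):
        widths.append(cycles_after_temporal_layer(widths[-1], window))
    return widths


print(cycles_through_stack(5, 2))  # [5, 3, 1]
```

Under the same assumption, ten input cycles passed through two such layers would leave six outputs, since each layer removes `window - 1` cycle channels.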

The output of the temporal layers is accumulated in output patches and delivered to the host CPU, which applies, for example, a softmax function 930, or another function, to normalize the base call probabilities.

FIG. 10 illustrates an alternative implementation showing a 10-input, 6-output neural network which can be executed for a base calling operation. In this example, tile data for spatially aligned input patches from cycles 0 to 9 are applied to isolated stacks of spatial layers, such as stack 1001 for cycle 9. The outputs of the isolated stacks are applied to an inverse hierarchical arrangement of temporal stacks 1020, having outputs 1035(2) through 1035(7) that provide base call classification data for subject cycles 2 through 7.

FIG. 11 shows one implementation of a dedicated architecture of a neural network-based base caller (e.g., FIG. 7) that is used to separate the processing of data for different sequencing cycles. The motivation for using the dedicated architecture is described first.

The neural network-based base caller processes data for a current sequencing cycle, one or more preceding sequencing cycles, and one or more succeeding sequencing cycles. Data for the additional sequencing cycles provides sequence-specific context. During training, the neural network-based base caller learns to use the sequence-specific context to improve base calling accuracy. Furthermore, data for the preceding and succeeding sequencing cycles provides the second-order contribution of pre-phasing and phasing signals to the current sequencing cycle.

The spatial convolution layers use so-called "separated convolutions," which implement the separation by independently processing data for each of a plurality of sequencing cycles through a "dedicated, non-shared" sequence of convolutions. The separated convolutions convolve over the data and resulting feature maps of only a given sequencing cycle, i.e., intra-cycle, without convolving over the data and resulting feature maps of any other sequencing cycle.

Consider, for example, that the input data comprises (i) current data for the current (time t) sequencing cycle to be base called, (ii) previous data for the previous (time t-1) sequencing cycle, and (iii) next data for the next (time t+1) sequencing cycle. The dedicated architecture then initiates three separate data processing pipelines (or convolution pipelines), namely, a current data processing pipeline, a previous data processing pipeline, and a next data processing pipeline. The current data processing pipeline receives as input the current data for the current (time t) sequencing cycle and independently processes it through a plurality of spatial convolution layers to produce a so-called "current spatial convolution representation" as the output of a final spatial convolution layer. The previous data processing pipeline receives as input the previous data for the previous (time t-1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "previous spatial convolution representation" as the output of the final spatial convolution layer. The next data processing pipeline receives as input the next data for the next (time t+1) sequencing cycle and independently processes it through the plurality of spatial convolution layers to produce a so-called "next spatial convolution representation" as the output of the final spatial convolution layer.

In some implementations, the current pipeline, the one or more previous pipelines, and the one or more next processing pipelines are executed in parallel.

In some implementations, the spatial convolution layers are part of a spatial convolution network (or subnetwork) within the dedicated architecture.

The neural network-based base caller further comprises temporal convolution layers that mix information between sequencing cycles, i.e., inter-cycle. The temporal convolution layers receive their inputs from the spatial convolution network and operate on the spatial convolution representations produced by the final spatial convolution layer for the respective data processing pipelines.

The temporal convolution layers use so-called "combination convolutions," which convolve group-wise over input channels in successive inputs on a sliding window basis. In one implementation, the successive inputs are successive outputs produced by a previous spatial convolution layer or a previous temporal convolution layer.

In some implementations, the temporal convolution layers are part of a temporal convolution network (or subnetwork) within the dedicated architecture. The temporal convolution network receives its inputs from the spatial convolution network. In one implementation, a first temporal convolution layer of the temporal convolution network combines, group-wise, the spatial convolution representations between the sequencing cycles. In another implementation, subsequent temporal convolution layers of the temporal convolution network combine successive outputs of previous temporal convolution layers. In one example, compression logic (or a compression network, compression subnetwork, compression layer, or squeeze layer) processes the outputs of the temporal and/or spatial convolution networks and generates a compressed representation of the outputs. In one implementation, the compression network comprises a compression convolution layer that reduces the depth dimensionality of the feature maps generated by the network.

The output of the final temporal convolution layer (e.g., with or without compression) is fed to an output layer that produces an output. The output is used to base call one or more clusters at one or more sequencing cycles.

During forward propagation, the dedicated architecture processes information from a plurality of inputs in two stages. In the first stage, separated convolutions are used to prevent mixing of information between the inputs. In the second stage, combination convolutions are used to mix information between the inputs. The results from the second stage are used to make a single inference for the plurality of inputs.

This differs from batch-mode techniques, in which a convolution layer processes multiple inputs in a batch concurrently and makes a corresponding inference for each input in the batch. In contrast, the dedicated architecture maps the plurality of inputs to a single inference. The single inference can comprise more than one prediction, such as a classification score for each of the four bases (A, C, T, and G).

In one implementation, the inputs have temporal ordering such that each input is generated at a different time step and has a plurality of input channels. For example, the plurality of inputs can include the following three inputs: a current input generated by a current sequencing cycle at time step (t), a previous input generated by a previous sequencing cycle at time step (t-1), and a next input generated by a next sequencing cycle at time step (t+1). In another implementation, each input is respectively derived from the current, previous, and next inputs by one or more previous convolution layers and includes k feature maps.

In one implementation, each input can include the following five input channels: a red image channel (in red), a red distance channel (in yellow), a green image channel (in green), a green distance channel (in purple), and a scaling channel (in blue). In another implementation, each input can be in blue and violet channels (or one or more other appropriate color channels) instead of, or in addition to, the red and green channels. In another implementation, each input can be in blue and violet channels instead of, or in addition to, the red, green, purple, and/or yellow channels. In another implementation, each input can include k feature maps produced by a previous convolution layer, and each feature map is treated as an input channel. In yet another example, each input can have merely one channel, two channels, or another different number of channels. Incorporated materials of U.S. Patent Application Publication No. 2013/0079232 discuss base calling using various numbers of channels, such as one channel, two channels, or four channels.

FIG. 12 shows one implementation of separated layers, each of which can include convolutions. Separated convolutions process the plurality of inputs at once by applying a convolution filter to each input in parallel. With separated convolutions, the convolution filter combines input channels within the same input and does not combine input channels across different inputs. In one implementation, the same convolution filter is applied to each input in parallel. In another implementation, a different convolution filter is applied to each input in parallel. In some implementations, each spatial convolution layer comprises a bank of k convolution filters, each of which is applied to each input in parallel.
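The behavior of a separated layer can be illustrated with a small sketch. To keep it short, it assumes a 1x1 kernel, so each filter is simply one weight per input channel, and it applies the same filter bank to every input; the function names and data layout are hypothetical, and a real spatial layer would use a larger kernel.

```python
def pointwise_filter(channels, weights, bias=0.0):
    """Combine the channels of ONE input into a single feature map (1x1 kernel)."""
    rows, cols = len(channels[0]), len(channels[0][0])
    return [
        [
            bias + sum(w * ch[y][x] for w, ch in zip(weights, channels))
            for x in range(cols)
        ]
        for y in range(rows)
    ]


def separated_layer(inputs, filter_bank):
    """Apply every filter in the bank to each input on its own.

    inputs: list of per-cycle inputs, each a list of 2D channels.
    filter_bank: list of k weight lists, one weight per input channel.
    Returns one list of k feature maps per input; cycles are never mixed.
    """
    return [
        [pointwise_filter(channels, weights) for weights in filter_bank]
        for channels in inputs
    ]
```

Because each output depends only on the channels of its own input, changing the data of one sequencing cycle leaves the feature maps of every other cycle unchanged.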

FIG. 13A shows one implementation of combination layers, each of which can include convolutions. FIG. 13B shows another implementation of combination layers, each of which can include convolutions. Combination convolutions mix information between different inputs by grouping corresponding input channels of the different inputs and applying a convolution filter to each group. The grouping of the corresponding input channels and the application of the convolution filter occur on a sliding window basis. In this context, a window spans two or more successive input channels representing, for instance, outputs for two successive sequencing cycles. Since the window is a sliding window, most input channels are used in two or more windows.

In some implementations, the different inputs originate from an output sequence produced by a preceding spatial or temporal convolution layer. In the output sequence, the different inputs are arranged as successive outputs and are therefore viewed by the next temporal convolution layer as successive inputs. Then, in the next temporal convolution layer, the combination convolutions apply the convolution filter to groups of corresponding input channels in the successive inputs.

In one implementation, the successive inputs have temporal ordering such that a current input is generated by a current sequencing cycle at time step (t), a previous input is generated by a previous sequencing cycle at time step (t-1), and a next input is generated by a next sequencing cycle at time step (t+1). In another implementation, each successive input is respectively derived from the current, previous, and next inputs by one or more previous convolution layers and includes k feature maps.

In one implementation, each input can include the following five input channels: a red image channel (in red), a red distance channel (in yellow), a green image channel (in green), a green distance channel (in purple), and a scaling channel (in blue). In another implementation, an additional input channel can be a violet channel. In another implementation, each input can include k feature maps produced by a previous convolution layer, and each feature map is treated as an input channel.

The depth B of the convolution filter depends on the number of successive inputs whose corresponding input channels are convolved group-wise by the convolution filter on a sliding window basis. In other words, the depth B is equal to the number of successive inputs in each sliding window, that is, the group size.

In FIG. 13A, corresponding input channels from two successive inputs are combined in each sliding window, and therefore B = 2. In FIG. 13B, corresponding input channels from three successive inputs are combined in each sliding window, and therefore B = 3.
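The sliding-window grouping can be sketched as follows. The sketch assumes one value per channel (a single pixel) and one filter of depth B shared by all windows and all channel groups; these simplifications, the function name, and the example values are assumptions made for illustration only.

```python
def combination_layer(inputs, weights):
    """Mix corresponding channels across B consecutive inputs.

    inputs: list of consecutive inputs, each a list of k channel values
            (one pixel per channel, to keep the sketch small).
    weights: B filter weights, shared by all windows and channel groups.
    """
    depth = len(weights)  # B in the text
    num_channels = len(inputs[0])
    outputs = []
    for start in range(len(inputs) - depth + 1):
        window = inputs[start:start + depth]
        # Group channel c of every input in the window, then apply the filter.
        outputs.append([
            sum(w * step[c] for w, step in zip(weights, window))
            for c in range(num_channels)
        ])
    return outputs
```

With three inputs, a filter of depth B = 2 produces two outputs and the middle input contributes to both windows, whereas a filter of depth B = 3 produces a single output.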

In one implementation, the sliding windows share the same convolution filter. In another implementation, a different convolution filter is used for each sliding window. In some implementations, each temporal convolution layer comprises a bank of k convolution filters, each of which is applied to the successive inputs on a sliding window basis.

Further details of FIGS. 4-10 and variations thereof can be found in co-pending U.S. Nonprovisional Patent Application No. 17/176,147, entitled "Hardware Execution and Acceleration of Artificial Intelligence-Based Base Caller," filed February 15, 2021 (Attorney Docket No. ILLM1020-2/IP-1866-US), which is incorporated by reference as if fully set forth herein.

Base Calling Using Multiple Base Callers
FIG. 14 shows a base calling system 1400 that includes multiple base callers for predicting base calls of an unknown analyte comprising a base sequence.

Note that FIG. 6A, discussed earlier, shows only some components of the system 1400 of FIG. 14, and FIG. 14 shows various other components that were not shown in FIG. 6A.

As discussed with respect to FIG. 6A, the system 1400 of FIG. 14 comprises a sequencing machine 1404, such as the sequencing machine discussed with respect to FIG. 1. The sequencing machine 1404 comprises a flow cell 1405, such as the flow cell discussed with respect to FIGS. 1-3. The flow cell 1405 comprises a plurality of tiles 1406, and each tile 1406 comprises a plurality of clusters 1407 (example clusters of a single tile are shown in FIG. 6A), e.g., as discussed with respect to FIGS. 2 and 3. As discussed with respect to FIGS. 4-6, sensor data 1412 comprising raw images from the tiles 1406 is output by the sequencing machine 1404.

In an embodiment, the system 1400 comprises two or more base callers, such as a first base caller 1414 and a second base caller 1416. Although two base callers are shown in the figure, in an example more than two base callers, such as three, four, or a higher number of base callers, may be present in the system 1400.

In an example, the base callers 1414 and 1416 are local to the sequencing machine 1404. Thus, the base callers 1414 and 1416 and the sequencing machine 1404 are located in proximity to one another (e.g., within the same housing, or within two housings located close together), and the base callers 1414 and 1416 receive the sensor data 1412 directly from the sequencing machine 1404.

In another example, the base callers 1414 and 1416 are located remotely relative to the sequencing machine 1404, which is an example of so-called cloud-based base callers. Thus, the base callers 1414 and 1416 receive the sensor data 1412 from the sequencing machine 1404 via a computer network, such as the Internet.

Each of the base callers 1414 and 1416 of FIG. 14 outputs corresponding base call classification information. For example, the first base caller 1414 outputs first base call classification information 1434, and the second base caller 1416 outputs second base call classification information 1436. A base call combination module 1428 generates final base calls 1440 based on one or both of the first base call classification information 1434 and the second base call classification information 1436.

In one example, the first base caller 1414 is a neural network-based base caller. For example, the first base caller 1414 is a nonlinear system that employs one or more neural network models for base calling, as discussed previously herein (e.g., see FIGS. 6-13B).

In one example, the second base caller 1416 is a non-neural-network-based base caller. For example, the second base caller 1416 is, at least in part, a linear system used for base calling. For example, as discussed previously herein (e.g., see FIG. 6 and the subsequent discussion), the second base caller 1416 does not employ a neural network for base calling (or uses a smaller neural network model for base calling than the larger neural network model used by the first base caller 1414).

In one embodiment, the system 1400 includes a context information generation module 1418, which generates context information 1420. In one embodiment, the base call combination module 1428 operates based on the context information 1420. For example, based on the context information 1420, the base call combination module 1428 uses one or both of the base call classification information 1434 and the base call classification information 1436 to generate the final base call. Context information is discussed later herein, e.g., with respect to FIG. 16.

In one embodiment, the system 1400 also includes a switching module 1422. Note that in FIG. 14 the switching module 1422, the context information generation module 1418, and the base call combination module 1428 are illustrated as three separate components of the system 1400. In one example, however, two or more of these modules can be combined to form a combined module.

In one embodiment, the switching module 1422 selectively switches the base callers 1414 and 1416 on or off. For example, if, according to the context information 1420, only one of the base callers 1414 and 1416 is to analyze a particular set of the sensor data 1412, then only the selected base caller is enabled for that set of sensor data and the other base caller is disabled, as discussed in further detail later herein.

Enabling or switching on a base caller for a set of sensor data means that the base caller operates, or is executed, on that particular set of sensor data. Thus, enabling or switching on a base caller does not necessarily mean turning on the base caller as such; it simply means that the base caller is executed on the particular corresponding set of sensor data. Disabling or switching off a base caller for a set of sensor data means that the base caller refrains from operating or executing on that particular set of sensor data. Note that, for example, a base caller may be disabled for a first set of sensor data while being enabled for a second set of sensor data. In one example, the first base caller 1414 can be selectively enabled or disabled using an enable signal 1424, and the second base caller 1416 can be selectively enabled or disabled using an enable signal 1426. Thus, the enable signals 1424 and 1426 are signals for selectively enabling (or disabling) the corresponding base callers 1414 and 1416, respectively.

A "set of sensor data," as discussed herein, refers to a section of the sensor data 1412, or a data set within the sensor data 1412. For example, a set of sensor data may be the sensor data from one or more specific clusters 1407 or one or more specific tiles 1406 of the flow cell 1405. A set of sensor data may also be the sensor data from one or more specific base sensing cycles. Thus, a set of sensor data may be associated with specific spatial aspects of the flow cell 1405 (e.g., from one or more specific clusters 1407 of the flow cell 1405) and/or specific temporal aspects of the base calling cycles (e.g., from one or more specific base calling cycles).
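To make the spatial and temporal scoping concrete, the following minimal Python sketch models a set of sensor data as a record keyed by tiles, clusters, and cycles. The class name `SensorDataSet`, its fields, and the use of integer identifiers are illustrative assumptions and do not appear in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SensorDataSet:
    """Hypothetical descriptor for one set of the sensor data 1412."""
    tiles: frozenset     # spatial aspect: tile identifiers (assumed to be integers)
    clusters: frozenset  # spatial aspect: cluster identifiers (assumed to be integers)
    cycles: range        # temporal aspect: sensing cycles covered by this set

    def covers(self, tile: int, cluster: int, cycle: int) -> bool:
        """Return True if the tile, cluster, and cycle all fall within this set."""
        return (
            tile in self.tiles
            and cluster in self.clusters
            and cycle in self.cycles
        )


# Example: clusters 10 and 11 of tile 3, for cycles 1 through 100 inclusive.
first_set = SensorDataSet(frozenset({3}), frozenset({10, 11}), range(1, 101))
```

Under this sketch, a base caller could be enabled for `first_set` and disabled for a different set covering other tiles or later cycles.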

Merely as an example, for a first set of the sensor data 1412, the base call combination module 1428 may rely solely on the base call classification information 1434 from the first base caller 1414 to generate the final base call 1440 for that first set. The base call combination module 1428 decides to rely solely on the base call classification information 1434 (and not on the base call classification information 1436) based on, for example, the context information 1420 associated with the first set of the sensor data 1412. In one example, when the first set of sensor data is processed, the switching module 1422 enables only the first base caller 1414 using the enable signal 1424 (e.g., the first base caller 1414 executes on the first set of data) and disables the second base caller 1416 using the enable signal 1426 (e.g., the second base caller 1416 does not execute on the first set of data), and the first base call classification information 1434 from the first base caller 1414 is used to generate the final base call 1440. In another example, however, although the first base call classification information 1434 from the first base caller 1414 is used to generate the final base call 1440 for the first set of data, the switching module 1422 enables the first base caller 1414 and, optionally, also enables the second base caller 1416, e.g., for reasons discussed later herein. In such an example, both the base call classification information 1434 and 1436 are available for the first set of data, and the final base call 1440 is based solely on the first base call classification information 1434.

Merely as another example, for a second set of the sensor data 1412, the base call combination module 1428 may rely solely on the base call classification information 1436 from the second base caller 1416 to generate the final base call 1440 for that second set. The base call combination module 1428 decides to rely solely on the base call classification information 1436 (and not on the base call classification information 1434) based on, for example, the context information 1420 associated with the second set of the sensor data 1412. In one example, when the second set of sensor data is processed, the switching module 1422 enables only the second base caller 1416 using the enable signal 1426 and disables the first base caller 1414 using the enable signal 1424, and the second base call classification information 1436 from the second base caller 1416 is used to generate the final base call 1440. In another example, however, although the second base call classification information 1436 from the second base caller 1416 is used to generate the final base call 1440 for the second set of data, the switching module 1422 enables the second base caller 1416 and, optionally, also enables the first base caller 1414, e.g., for reasons discussed later herein. In such an example, both the base call classification information 1434 and 1436 are available, and the final base call 1440 is based solely on the base call classification information 1436.

Merely as yet another example, for a third set of the sensor data 1412, the base call combination module 1428 may rely on both the base call classification information 1434 and 1436, from the base callers 1414 and 1416 respectively, to generate the final base call 1440 for that third set. The base call combination module 1428 decides to rely on both the base call classification information 1434 and 1436 based on, for example, the context information 1420 associated with the third set of the sensor data 1412. Accordingly, when the third set of sensor data is processed, the switching module 1422 enables both base callers 1414 and 1416 using the enable signals 1424 and 1426, respectively.

Thus, for a given set of sensor data, the base call combination module 1428 decides to rely on a specific one, or both, of the base call classification information 1434 and 1436, based on the context information 1420 associated with the corresponding set of sensor data. Similarly, the switching module 1422 decides to enable a specific one, or both, of the base callers 1414 and 1416, based on the context information 1420 associated with the corresponding set of sensor data.
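The per-set decisions described above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the switching module 1422 or the base call combination module 1428: the `Mode` enum, the function names, and the `run_other_anyway` flag are hypothetical, and how the context information 1420 maps to a mode is left open here.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical outcomes of evaluating the context information 1420."""
    FIRST_ONLY = "first"
    SECOND_ONLY = "second"
    BOTH = "both"


def enable_signals(mode: Mode, run_other_anyway: bool = False) -> tuple:
    """Return (enable_1424, enable_1426) for one set of sensor data.

    run_other_anyway models the optional case in which the non-selected
    base caller is also enabled although its output is not relied upon.
    """
    enable_first = mode in (Mode.FIRST_ONLY, Mode.BOTH) or run_other_anyway
    enable_second = mode in (Mode.SECOND_ONLY, Mode.BOTH) or run_other_anyway
    return enable_first, enable_second


def select_outputs(mode: Mode, info_1434, info_1436) -> list:
    """Return the classification information the combiner relies on."""
    if mode is Mode.FIRST_ONLY:
        return [info_1434]
    if mode is Mode.SECOND_ONLY:
        return [info_1436]
    return [info_1434, info_1436]
```

Keeping the enable decision separate from the selection decision mirrors the text: both base callers may run on a set even when the final base call relies on only one of them.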

Exemplary Operation of First Base Caller 1414 and Second Base Caller 1416
FIGS. 15A, 15B, 15C, 15D, and 15E illustrate corresponding flowcharts depicting various operations of the base calling system 1400 of FIG. 14 for corresponding sets of sensor data. For example, FIGS. 15A-15E illustrate various permutations and combinations in which the system 1400 can operate.

First base caller 1414 enabled, with final base call 1440 based on first base call classification information 1434
FIG. 15A illustrates an operation of the system 1400 in which the first base caller 1414 is enabled and generates base call classification information for a set of sensor data 1501a (e.g., while the second base caller 1416 does not operate on the set of sensor data 1501a), and the final base call 1440 is based on the first base call classification information 1434 for the set of sensor data 1501a.

Thus, FIG. 15A illustrates the operation of the system 1400 for the set of sensor data 1501a generated by the flow cell 1405. At 1505a, the flow cell 1405 generates the set of sensor data 1501a. As discussed, the set of sensor data 1501a can be generated for specific sequencing cycles at specific locations of the flow cell, such as by specific clusters of a specific tile, or by a specific tile (i.e., the set is associated with one or more specific spatial locations of the flow cell 1405 and one or more specific temporal sequencing cycles). Also at 1505a, the context information associated with the set of sensor data 1501a is accessed (e.g., by the switching module 1422 and/or the base call combination module 1428). As discussed, the context information can be generated by the context information generation module 1418.

In the example of FIG. 15A, the switching module 1422 determines that the first base caller 1414 (and not the second base caller 1416) is to process the set of sensor data 1501a. Accordingly, at 1510a, the switching module 1422 enables the first base caller 1414, e.g., by asserting the enable signal 1424. The second base caller 1416 may remain disabled; that is, the second base caller 1416 does not operate on the set of sensor data 1501a.

At 1515a, the first base caller 1414 generates the first base call classification information 1434 for the set of sensor data 1501a, whereas the second base caller 1416 refrains from generating the second base call classification information 1436 for the set of sensor data 1501a.

At 1520a, the base call combination module 1428 generates the final base call for the set of sensor data 1501a using the first base call classification information 1434, based on the context information 1420 associated with the set of sensor data 1501a.

Second base caller 1416 enabled, with final base call 1440 based on second base call classification information 1436
FIG. 15B illustrates an operation of the system 1400 in which the second base caller 1416 is enabled and generates base call classification information for a set of sensor data 1501b (e.g., while the first base caller 1414 does not operate on the set of sensor data 1501b), and the final base call 1440 is based on the second base call classification information 1436 for the set of sensor data 1501b.

At 1505b, the flow cell 1405 generates the set of sensor data 1501b. As discussed, the set of sensor data 1501b can be generated for specific sequencing cycles at specific locations of the flow cell, such as by specific clusters of a specific tile, or by a specific tile (i.e., the set is associated with one or more specific spatial locations of the flow cell 1405 and one or more specific temporal sequencing cycles). Also at 1505b, the context information associated with the set of sensor data 1501b is accessed (e.g., by the switching module 1422 and/or the base call combination module 1428). As discussed, the context information can be generated by the context information generation module 1418.

In the example of FIG. 15B, the switching module 1422 determines that the second base caller 1416 (and not the first base caller 1414) is to process the set of sensor data 1501b. Accordingly, at 1510b, the switching module 1422 enables the second base caller 1416, e.g., by using the enable signal 1426. The first base caller 1414 may remain disabled; that is, the first base caller 1414 does not operate on the set of sensor data 1501b.

At 1515b, the second base caller 1416 generates the second base call classification information 1436 for the set of sensor data 1501b, whereas the first base caller 1414 refrains from generating any first base call classification information 1434 for the set of sensor data 1501b.

At 1520b, the base call combination module 1428 generates the final base call for the set of sensor data 1501b using the second base call classification information 1436, based on the context information 1420 associated with the set of sensor data 1501b.

First base caller 1414 and second base caller 1416 both enabled, with final base call 1440 based on one or both of (i) first base call classification information 1434 and (ii) second base call classification information 1436
FIG. 15C illustrates an operation of the system 1400 in which both the first base caller 1414 and the second base caller 1416 are enabled (i.e., both base callers operate on a corresponding set of sensor data 1501c) and generate corresponding base call classification information for the set of sensor data 1501c, and the final base call 1440 is based on one or both of (i) the first base call classification information 1434 and (ii) the second base call classification information 1436.

At 1505c, the flow cell 1405 generates the set of sensor data 1501c. As discussed, the set of sensor data 1501c can be generated for one or more specific sequencing cycles at one or more specific locations of the flow cell, such as by specific clusters of a specific tile, or by a specific tile (i.e., the set is associated with one or more specific spatial locations of the flow cell 1405 of FIG. 14 and one or more specific temporal sequencing cycles). Also at 1505c, the context information associated with the set of sensor data 1501c is accessed (e.g., by the switching module 1422 and/or the base call combination module 1428). As discussed in further detail herein, the context information can be generated by the context information generation module 1418.

In the example of FIG. 15C, the switching module 1422 determines that both the first base caller 1414 and the second base caller 1416 are to process the set of sensor data 1501c. Accordingly, at 1510c, the switching module 1422 (FIG. 14) enables both the first base caller 1414 and the second base caller 1416, e.g., using the enable signals 1424 and 1426 (FIG. 14). In one example, both the first base caller 1414 and the second base caller 1416 process the entire set of sensor data 1501c. In another example, the first base caller 1414 processes a first subset of the set of sensor data 1501c, and the second base caller 1416 processes a second subset of the set of sensor data 1501c.

At 1515c, the first base caller 1414 generates the first base call classification information 1434 for the set of sensor data 1501c, and the second base caller 1416 generates the second base call classification information 1436 for the set of sensor data 1501c.

At 1520c, the base call combination module 1428 generates the final base call for the set of sensor data 1501c using the first base call classification information 1434 and/or the second base call classification information 1436, based on the context information 1420 associated with the set of sensor data 1501c.
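The passage above does not say how the two sets of classification information are combined at 1520c. Purely as an illustration, the Python sketch below assumes that each base caller emits a per-base score mapping and that the context information 1420 has already been reduced to a single weight; the function `combine`, the parameter `weight_first`, and the example scores are all hypothetical.

```python
def combine(info_1434: dict, info_1436: dict, weight_first: float) -> str:
    """Blend two per-base score mappings and return the highest-scoring base.

    weight_first is a hypothetical context-derived weight in [0, 1]:
    1.0 relies only on the first base caller, 0.0 only on the second.
    """
    if not 0.0 <= weight_first <= 1.0:
        raise ValueError("weight_first must be in [0, 1]")
    blended = {
        base: weight_first * info_1434[base]
        + (1.0 - weight_first) * info_1436[base]
        for base in "ACGT"
    }
    return max(blended, key=blended.get)


# Illustrative scores only; these are not data from the specification.
p1 = {"A": 0.6, "C": 0.2, "G": 0.1, "T": 0.1}
p2 = {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}
```

With these example scores, a weight of 1.0 or 0.0 reproduces the single-caller cases of FIG. 15A and FIG. 15B, and intermediate weights use both outputs.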

Enabling and using second base call classification information 1436 when the final base call cannot be generated using only first base call classification information 1434
FIG. 15D illustrates an operation of the system 1400 in which the second base call classification information 1436 is used for the final base call 1440 when the final base call cannot be generated using only the first base call classification information 1434.

At 1505d, the flow cell 1405 generates the set of sensor data 1501d. As discussed, the set of sensor data 1501d can be generated for specific sequencing cycles at specific locations of the flow cell, such as by specific clusters of a specific tile, or by a specific tile (i.e., the set is associated with one or more specific spatial locations of the flow cell 1405 and one or more specific temporal sequencing cycles). Also at 1505d, the context information associated with the set of sensor data 1501d is accessed (e.g., by the switching module 1422 and/or the base call combination module 1428). As discussed, the context information can be generated by the context information generation module 1418.

In the example of FIG. 15D, the switching module 1422 determines that the first base caller 1414 is to process the set of sensor data 1501d. Optionally, the switching module 1422 may also determine that the second base caller 1416 can process the set of sensor data 1501d as well. Accordingly, at 1510d, the first base caller 1414 is enabled and, optionally, the second base caller 1416 is also enabled.

At 1515d, the first base caller 1414 generates the first base call classification information 1434 for the set of sensor data 1501d. If the second base caller 1416 was enabled in the optional operation at 1510d, the second base caller 1416 optionally generates the second base call classification information 1436 for the set of sensor data 1501d.

At 1520d, a determination is made (e.g., by the switching module 1422 and/or the base call combination module 1428) as to whether the final base call can be generated from the first base call classification information 1434 (e.g., without using the second base call classification information 1436). For example, it may be determined that the probability of an error in the final base call 1440 would be relatively high if the final base call 1440 were based solely on the first base call classification information 1434. Numerous examples of such determinations are discussed in turn later herein. Merely as an example, if the first base call classification information 1434 indicates a homopolymer (e.g., GGGGG) or near-homopolymer (e.g., GGTGG) sequence, the first base call classification information 1434 may be insufficient or inadequate to generate the final base call (e.g., the second base call classification information 1436 has to be relied upon to generate the final base call), as discussed later herein, e.g., with respect to FIGS. 19B and 19C.
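As a hedged illustration of the homopolymer example above, the Python sketch below flags a window of called bases as a homopolymer or near-homopolymer. The function name, the one-mismatch tolerance, and the five-base minimum window are assumptions chosen to match the GGGGG and GGTGG examples; the specification does not define these thresholds in this passage.

```python
from collections import Counter


def is_homopolymer_like(seq: str, max_mismatches: int = 1, min_length: int = 5) -> bool:
    """Return True if seq is a homopolymer or within max_mismatches of one.

    Windows shorter than min_length are not flagged, because very short
    windows would trivially satisfy the mismatch tolerance.
    """
    if not seq or len(seq) < min_length:
        return False
    # Count of the most frequent base in the window.
    dominant_count = Counter(seq).most_common(1)[0][1]
    return len(seq) - dominant_count <= max_mismatches
```

A check of this kind could serve as one input to the determination at 1520d; other criteria discussed later in the specification are not modeled here.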

If "Yes" at 1520d (i.e., if the final base call can be generated from the first base call classification information 1434 without using the second base call classification information 1436), the method 1500d proceeds to 1525d, where the final base call for the set of sensor data 1501d is generated using the first base call classification information 1434.

If "No" at 1520d (i.e., if the final base call cannot be generated from the first base call classification information 1434 alone, without using the second base call classification information 1436), the method 1500d proceeds to 1530d, where the second base caller 1416 is enabled, and then to 1535d, where the second base call classification information 1436 is generated using the second base caller 1416. Note that the operations at blocks 1530d and 1535d are optional and are therefore illustrated using dotted lines. For example, if the second base caller 1416 was optionally enabled at 1510d, the operation 1530d can be skipped. Similarly, if the second base call classification information 1436 was optionally generated using the second base caller 1416 at 1515d, the operation 1535d can be skipped.

Assume a scenario in which the second base caller 1416 was not enabled at 1510d and is enabled at 1530d. Thus, at 1530d, the second base caller 1416 begins processing the set of sensor data 1501d. Note that, for a given base calling cycle, the second base caller 1416 cannot immediately begin processing the corresponding sensor data to generate a base call. This is because, due to phasing discussed later herein (e.g., see FIGS. 17C and 17D), the second base caller 1416 has to process the sensor data of one or more previous base calling cycles in order to satisfactorily call the bases of the current cycle. For example, assume that base calling cycles 1 to 1000 are performed and that the set of sensor data 1501d includes the images from base calling cycle 100 onwards. Also assume that, at 1530d, the second base caller 1416 is enabled to process the sensor data for base calling cycle 100 and one or more subsequent base calling cycles. As discussed, the second base caller 1416 has to process sensor data from one or more previous cycles in order to satisfactorily call the bases of cycle 100 and the subsequent cycles. Processing the sensor data from several previous cycles allows the second base caller 1416 to estimate the effects of phasing at cycle 100, which improves the quality of the base calls at cycle 100. Merely as an example, 5, 10, 20, or another appropriate number of previous cycles are processed by the second base caller 1416 in order for it to satisfactorily call the bases of cycle 100.

In a first example, assume that the second base caller 1416 has to process sensor data from N1 previous cycles in order to satisfactorily call the bases of cycle 100. In a second example, assume that the second base caller 1416 has to process sensor data from N2 previous cycles in order to satisfactorily call the bases of cycle 1000. As discussed with respect to FIGS. 17C and 17D, the effects of phasing and pre-phasing become more pronounced as the base calling cycles progress, so phasing and pre-phasing are more prominent at cycle 1000 than at cycle 100. Consequently, to satisfactorily call the bases of cycle 1000, the second base caller 1416 has to process a larger number of previous cycles than it has to process to satisfactorily call the bases of cycle 100. Thus, N2 is greater than N1.
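The warm-up requirement described above amounts to simple window arithmetic, sketched below in Python. The helper `warmup_start` is hypothetical, and the look-back values used in the usage note (10 and 20) are illustrative choices consistent with N2 being greater than N1; the specification gives no formula for N1 or N2.

```python
def warmup_start(target_cycle: int, n_previous: int) -> int:
    """Return the first cycle the second base caller must process
    in order to call the bases of target_cycle.

    Cycles are numbered from 1, so the window is clamped at cycle 1.
    """
    if target_cycle < 1 or n_previous < 0:
        raise ValueError("target_cycle must be >= 1 and n_previous >= 0")
    return max(1, target_cycle - n_previous)
```

For instance, with an assumed N1 of 10, calling cycle 100 would require processing from cycle 90 onwards, whereas with an assumed N2 of 20, calling cycle 1000 would require processing from cycle 980 onwards.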

Referring again to FIG. 15D, following 1535d, at 1540d, the final base calls for the set of sensor data 1501d are generated using one or both of (i) the first base call classification information 1434 and (ii) the second base call classification information 1436.

Enabling and using the first base call classification information 1434 when the final base call cannot be generated using only the second base call classification information 1436
FIG. 15E illustrates an operation of the system 1400 in which the first base call classification information 1434 is used for the final base call 1440 when the final base call cannot be generated using only the second base call classification information 1436.

At 1505e, the flow cell 1405 generates a set of sensor data 1501e. As discussed, the set of sensor data 1501e can be generated for a specific sequencing cycle at a specific location of the flow cell, such as by specific clusters of a specific tile or by a specific tile (i.e., the set is associated with a specific spatial location of the flow cell 1405 and a specific temporal sequencing cycle). Also at 1505e, context information associated with the set of sensor data 1501e is accessed (e.g., by the switching module 1422 and/or the base call combining module 1428). As discussed, the context information can be generated by the context information generation module 1418.

In the example of FIG. 15E, the switching module 1422 determines, e.g., based on the associated context information, that the second base caller 1416 is to process the set of sensor data 1501e. Optionally, the switching module 1422 may also determine that the first base caller 1414 can process the set of sensor data 1501e as well. Thus, at 1510e, the second base caller 1416 is enabled and, optionally, the first base caller 1414 is also enabled.

At 1515e, the second base caller 1416 generates the second base call classification information 1436 for the set of sensor data 1501e. In the option in which the first base caller 1414 is also enabled, the first base caller 1414 generates the first base call classification information 1434 for the set of sensor data 1501e.

At 1520e, a determination is made (e.g., by the switching module 1422 and/or the base call combining module 1428) as to whether the final base call can be generated from the second base call classification information 1436 alone (e.g., without using the first base call classification information 1434). For example, it may be determined (e.g., based on the context information) that the probability of an error in the final base call 1440 is likely to be relatively high if the final base call 1440 is based only on the second base call classification information 1436. Numerous examples of such determinations are discussed in turn later in this specification. Merely as an example, if the context information indicates the detection of a bubble in a cluster, as discussed later in this specification with respect to FIG. 19D, the final base call cannot be generated from the second base call classification information 1436 alone (e.g., without using the first base call classification information 1434).

If "Yes" at 1520e (i.e., if the final base call can be generated from the second base call classification information 1436 without using the first base call classification information 1434), the method 1500e proceeds to 1525e, where the final base call for the set of sensor data 1501e is generated using the second base call classification information 1436.

If "No" at 1520e (i.e., if, for example, the final base call cannot be generated from the second base call classification information 1436 without using the first base call classification information 1434), the method 1500e proceeds to 1530e, where the first base caller 1414 is enabled, and then to 1535e, where the first base call classification information 1434 is generated using the first base caller 1414. Note that the operations at blocks 1530e and 1535e are optional and are therefore illustrated using dotted lines. For example, if the first base caller 1414 was optionally enabled at 1510e, operation 1530e can be skipped. Similarly, if the first base call classification information 1434 was optionally generated using the first base caller 1414 at 1515e, operation 1535e can be skipped.

Assume a scenario in which the first base caller 1414 is not enabled at 1510e and is instead enabled at 1530e. At 1530e, the first base caller 1414 then begins processing the set of sensor data 1501e. Note that, for a given base calling cycle, the first base caller 1414 cannot immediately start processing the corresponding sensor data to generate base calls. For example, assume that the first base caller 1414 is to operate on the set of data corresponding to base calling cycle Na. To satisfactorily generate base calls for cycle Na, the first base caller 1414 must also operate on sensor data from at least a few cycles before cycle Na. This is because, as discussed with respect to FIGS. 7 and 10, the base call for the current cycle is also based on data from one or more past cycles and one or more future cycles. Thus, to generate the first base call classification information 1434 for cycle Na, the first base caller 1414 must also process sensor data from several previous cycles (e.g., 2 cycles in the example of FIG. 7, or 5 cycles in the example of FIG. 10).

Subsequently, at 1540e, the final base call for the set of sensor data 1501e is generated using one or both of (i) the first base call classification information 1434 and (ii) the second base call classification information 1436.
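Merely as an illustration, the flow of FIG. 15E can be sketched as follows. The `Context` class, the bubble-based criterion in `second_alone_is_sufficient`, and the averaging rule used to combine the two sets of classification information are assumptions made for this sketch; the description does not prescribe a particular decision criterion or combining rule at this point.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict

Probs = Dict[str, float]  # per-base probabilities, e.g. {"A": 0.1, "C": 0.7, ...}


@dataclass
class Context:
    """Hypothetical context record; only a bubble flag is modeled here."""
    bubble_detected: bool = False


def second_alone_is_sufficient(ctx: Context) -> bool:
    """Decision 1520e under an assumed criterion: fall back when a bubble is detected."""
    return not ctx.bubble_detected


def final_base_call(sensor_data: Any, ctx: Context,
                    first_caller: Callable[[Any], Probs],
                    second_caller: Callable[[Any], Probs]) -> str:
    second = second_caller(sensor_data)          # 1515e: second classification information
    if second_alone_is_sufficient(ctx):          # 1520e: "Yes" branch
        return max(second, key=second.get)       # 1525e: call from the second caller only
    first = first_caller(sensor_data)            # 1530e and 1535e: enable and run first caller
    # 1540e: combine both; simple averaging is an assumption of this sketch.
    combined = {base: 0.5 * (first[base] + second[base]) for base in "ACGT"}
    return max(combined, key=combined.get)
```

In this sketch the first base caller runs only on the "No" branch, which corresponds to the case in which it was not already enabled at 1510e.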

Context information
FIG. 16 illustrates the context information generation module 1418 of the base calling system 1400 of FIG. 14 generating context information 1420 for an example set of sensor data 1601. For example, the context information generation module 1418 receives information about the set of sensor data 1601 and generates various types of context information for it, which in combination are referred to as the context information for the set of sensor data 1601. For example, the context information generation module 1418 generates spatial context information 1604, temporal context information 1606, base sequence context information 1608, and other context information 1610 for the set of sensor data 1601.
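Merely as an illustration, the four types of context information could be grouped in a single record as sketched below. All class and field names are assumptions made for this sketch; only the reference numerals in the comments come from the description.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class SpatialContext:          # spatial context information 1604
    edge_tile: bool            # generated at an edge tile?
    edge_cluster: bool         # generated from an edge cluster?


@dataclass(frozen=True)
class TemporalContext:         # temporal context information 1606
    first_cycle: int           # first base calling cycle covered by the set
    last_cycle: int            # last base calling cycle covered by the set


@dataclass(frozen=True)
class SequenceContext:         # base sequence context information 1608
    special_pattern: bool      # associated with a special base sequence pattern?


@dataclass(frozen=True)
class ContextInformation:      # context information 1420
    spatial: SpatialContext
    temporal: TemporalContext
    sequence: SequenceContext
    other: Dict[str, bool] = field(default_factory=dict)  # other context information 1610


example = ContextInformation(
    spatial=SpatialContext(edge_tile=True, edge_cluster=False),
    temporal=TemporalContext(first_cycle=100, last_cycle=110),
    sequence=SequenceContext(special_pattern=False),
    other={"bubble_detected": False},
)
```

A switching or combining module could then inspect one such record per set of sensor data when selecting a base caller.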

Spatial context information 1604
As the name suggests, the spatial context information 1604 refers to context information associated with the spatial locations of the tiles and clusters from which the set of sensor data 1601 is generated. FIGS. 17A and 17B, discussed below, illustrate examples of the spatial context information 1604.

FIG. 17A illustrates the flow cell 1405 of the system 1400 of FIG. 14, where the flow cell 1405 includes tiles 1406 that are categorized based on their spatial locations. For example, as discussed with respect to FIG. 2, the flow cell 1405 of FIG. 17A includes a plurality of lanes 1702, with a corresponding plurality of tiles 1406 within each lane. FIG. 17A illustrates a top view of the flow cell 1405.

Individual tiles are categorized based on their locations. For example, tiles adjacent to any edge of the flow cell 1405 are labeled as edge tiles 1406a (illustrated using gray boxes), and the remaining tiles are labeled as non-edge tiles 1406b (illustrated using dotted boxes).

For example, tiles that are on a vertical edge (e.g., along the Y axis) and/or a horizontal edge (e.g., along the X axis) of the flow cell 1405 are categorized as edge tiles 1406a, as illustrated in FIG. 17A. Thus, the edge tiles 1406a are adjacent (e.g., immediately adjacent) to a corresponding edge of the flow cell 1405, and the non-edge tiles 1406b are not adjacent to any edge of the flow cell 1405.
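Merely as an illustration, the edge and non-edge tile categories can be sketched as follows. The sketch assumes, for simplicity, that the tiles form a rectangular grid indexed by row and column; the function name `is_edge_tile` and the grid indexing are assumptions of this sketch, not part of the description.

```python
def is_edge_tile(row: int, col: int, n_rows: int, n_cols: int) -> bool:
    """Return True if the tile at (row, col) touches any edge of the tile grid."""
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("tile index outside the flow cell grid")
    on_horizontal_edge = row in (0, n_rows - 1)   # first or last row
    on_vertical_edge = col in (0, n_cols - 1)     # first or last column
    return on_horizontal_edge or on_vertical_edge


# Count the edge tiles of a hypothetical 4 x 5 grid of tiles.
edge_count = sum(is_edge_tile(r, c, 4, 5) for r in range(4) for c in range(5))
```

The resulting flag could be stored as part of the spatial context information for any set of sensor data generated at that tile.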

Base calling cycles are performed for clusters within individual tiles of the flow cell 1405. In an example, parameters associated with the base calling operation of a tile can depend on the relative location of the tile. For example, the excitation light 101 discussed with respect to FIG. 1 is directed toward the tiles of the flow cell, and different tiles can receive different amounts of the excitation light 101, e.g., based on the locations of the individual tiles and/or the locations of the one or more light sources emitting the excitation light 101. For example, if the light source(s) emitting the excitation light 101 are vertically above the flow cell, the non-edge tiles 1406b can receive a different amount of light than the edge tiles 1406a. In another example, peripheral or external light around the flow cell 1405 (e.g., ambient light from outside the biosensor) can affect the amount and/or characteristics of the excitation light 101 received by individual tiles of the flow cell 1405. Merely as an example, the edge tiles 1406a may receive the excitation light 101 along with some amount of peripheral light from outside the flow cell 1405, whereas the non-edge tiles 1406b may receive primarily the excitation light 101. In yet another example, individual sensors (or pixels or photodiodes) included in the flow cell 1405 (e.g., the sensors 106, 108, 110, 112, and 114 illustrated in FIG. 1) can sense light based on the location of the corresponding sensor, which in turn depends on the location of the corresponding tile. For example, sensing operations performed by one or more sensors associated with the edge tiles 1406a may be affected by peripheral light (along with the excitation light 101) to a greater extent than sensing operations of one or more other sensors associated with the non-edge tiles 1406b. In yet another example, the flow of reactants to the various tiles (reactants include any substance that may be used to obtain a desired reaction during base calling, such as reagents, enzymes, samples, other biomolecules, and buffer solutions) can also be affected by tile location. For example, tiles near a source of the reactants can receive a larger amount of reactants than tiles farther from the source.

In an example, the spatial context information 1604 (see FIG. 16) associated with the set of sensor data 1601 includes information on whether the set of sensor data 1601 was generated at an edge tile 1406a or a non-edge tile 1406b. As discussed above, parameters associated with base calling can differ slightly between the tile categories. Thus, in an embodiment, the spatial context information 1604 indicating whether the set of sensor data 1601 is generated from an edge tile or a non-edge tile can influence the selection of the base caller used to process the set of sensor data 1601. Merely as an example, and depending on implementation details, the first base caller 1414 may be better suited to process sensor data from one of the edge tiles or the non-edge tiles, and the second base caller 1416 may be better suited to process sensor data from the other of the edge tiles or the non-edge tiles.

FIG. 17B illustrates a tile 1406 of the flow cell 1405 of the system 1400 of FIG. 14, where the tile 1406 includes clusters 1407 that are categorized based on their spatial locations.

In an example, the locations of the various clusters within a tile can be estimated based on sensor data (which may be, for example, image data) received from the tile. For example, the location of an individual cluster can be identified using the (x, y) coordinates of the cluster. Thus, each cluster 1407 has corresponding (x, y) coordinates that identify the location of the cluster relative to the tile. In FIG. 17B, the clusters 1407 of the example tile 1406 are categorized as either edge clusters 1407a or non-edge clusters 1407b. For example, clusters 1407 that are within a threshold distance LI from an edge of the tile are labeled as edge clusters 1407a, and clusters 1407 that are farther than the threshold distance LI from the edges of the tile are labeled as non-edge clusters 1407b. Thus, the edge clusters 1407a are located near the periphery of the tile 1406, and the non-edge clusters 1407b are located near the central portion of the tile 1406. As discussed, the (x, y) coordinates of a cluster can be used to determine (e.g., by the context information generation module 1418) the distance of the cluster from the edges of the tile, based on which the context information generation module 1418 categorizes the cluster as either an edge cluster 1407a or a non-edge cluster 1407b. As a simple example, FIG. 17B illustrates an imaginary dotted rectangle that lies within the perimeter of the tile 1406 at a distance LI from that perimeter. Clusters inside the dotted rectangle are categorized as non-edge clusters 1407b, and clusters between the perimeter of the dotted rectangle and the perimeter of the tile 1406 are categorized as edge clusters 1407a.
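Merely as an illustration, the categorization of a cluster from its (x, y) coordinates can be sketched as follows. The sketch assumes a rectangular tile with its origin at one corner and treats a cluster lying exactly at the threshold distance as an edge cluster; the function name `is_edge_cluster`, the coordinate convention, and the boundary treatment are assumptions of this sketch.

```python
def is_edge_cluster(x: float, y: float,
                    tile_width: float, tile_height: float,
                    threshold: float) -> bool:
    """Return True if the cluster lies within `threshold` of any edge of the tile."""
    if not (0 <= x <= tile_width and 0 <= y <= tile_height):
        raise ValueError("cluster coordinates outside the tile")
    # Distance to the nearest of the four tile edges.
    distance_to_edge = min(x, tile_width - x, y, tile_height - y)
    return distance_to_edge <= threshold


# Hypothetical 100 x 100 tile with a threshold distance of 10 units.
near_left_edge = is_edge_cluster(5, 50, 100, 100, 10)
tile_center = is_edge_cluster(50, 50, 100, 100, 10)
```

Clusters for which the function returns False lie inside the imaginary dotted rectangle described above.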

As discussed with respect to FIG. 1, the flow cell 1405 may include lenses (such as the filter layer 124 including an array of microlenses or other optical components) for capturing images of the various clusters. In an example, there may be slight differences in the focus of the various clusters when images are captured, e.g., as an image sensor or camera moves around the flow cell. For example, the edge clusters 1407a may be slightly out of focus relative to the non-edge clusters 1407b when images of the clusters are captured. Out-of-focus events can also occur due to heating or to mechanical vibration caused by movement of the lens. Thus, depending on the implementation, and as discussed in further detail later in this specification (see FIGS. 19G and 20D), one of the first and second base callers 1414 and 1416 may be better suited to process sensor data from the edge clusters 1407a, and the other of the first and second base callers 1414 and 1416 may be better suited to process sensor data from the non-edge clusters 1407b. In an example, the spatial context information 1604 (see FIG. 16) associated with the set of sensor data 1601 includes information on whether the set of sensor data 1601 was generated from one or more edge clusters 1407a or from one or more non-edge clusters 1407b, based on which the set of sensor data 1601 can be processed by a specific one, or both, of the first and second base callers 1414 and 1416.

Thus, to summarize the above discussion, the spatial context information 1604 associated with the set of sensor data 1601 includes (i) information on whether the set of sensor data 1601 was generated from an edge tile 1406a or a non-edge tile 1406b, and/or (ii) information on whether the set of sensor data 1601 was generated from one or more edge clusters 1407a or from one or more non-edge clusters 1407b. Other suitable spatial context information can also be envisioned based on the teachings of this disclosure.

Temporal context information 1606
Referring again to FIG. 16, the context information generation module 1418 also generates temporal context information 1606. For example, the base calling systems discussed herein can be configured to receive a sample whose bases are to be called. Such base calling can be performed over a plurality of base calling cycles. In an example, the temporal context information 1606 of the set of sensor data 1601 indicates the one or more base calling cycle numbers for which the set of sensor data 1601 is generated. For example, assume that there are N base calling cycles and that the set of sensor data 1601 is associated with base calling cycles N1 to N2 of the N total base calling cycles. The temporal context information 1606 of the set of sensor data 1601 includes this information. As discussed herein below, the selection of which base caller is to be used to process the set of sensor data 1601 can also be based on the base calling cycle numbers with which the set of sensor data 1601 is associated.

FIG. 17C illustrates an example of fading, in which signal intensity decreases as a function of cycle number over a sequencing run of a base calling operation. Fading is an exponential decay of fluorescent signal intensity as a function of base calling cycle number. As the sequencing run progresses, the analyte strands are washed excessively, exposed to laser emissions that create reactive species, and subjected to harsh environmental conditions. All of these lead to a gradual loss of fragments in each analyte, which decreases its fluorescent signal intensity. Fading is also called dimming or signal decay. FIG. 17C illustrates one example of fading 1700C, in which the intensity values of analyte fragments with AC microsatellites show exponential decay.
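Merely as an illustration of exponential fading, the following sketch models the intensity at a given cycle as the initial intensity multiplied by exp(-k x cycle), and recovers the decay rate k from two intensity readings. The function names, the single-exponential model without an offset, and the numeric values are assumptions of this sketch; they are not taken from FIG. 17C.

```python
import math


def faded_intensity(initial: float, decay_rate: float, cycle: int) -> float:
    """Intensity after `cycle` cycles under an assumed single-exponential decay."""
    return initial * math.exp(-decay_rate * cycle)


def fit_decay_rate(intensity_a: float, cycle_a: int,
                   intensity_b: float, cycle_b: int) -> float:
    """Estimate the decay rate from two (cycle, intensity) readings of one cluster."""
    if cycle_a == cycle_b:
        raise ValueError("the two readings must come from different cycles")
    return math.log(intensity_a / intensity_b) / (cycle_b - cycle_a)


# Hypothetical readings at cycles 50 and 150 for a cluster with decay rate 0.01 per cycle.
reading_50 = faded_intensity(1000.0, 0.01, 50)
reading_150 = faded_intensity(1000.0, 0.01, 150)
estimated_rate = fit_decay_rate(reading_50, 50, reading_150, 150)
```

An estimate of this kind could feed the temporal context information, since a lower expected intensity at later cycles implies a lower signal-to-noise ratio.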

FIG. 17D conceptually illustrates a decreasing signal-to-noise ratio as the cycles of sequencing progress. For example, as sequencing progresses, accurate base calling becomes increasingly difficult, because signal strength decreases and noise increases, resulting in a substantially decreased signal-to-noise ratio. Physically, it was observed that later synthesis steps attach tags in a different position relative to the sensor than earlier synthesis steps. When the sensor is below the sequence being synthesized, signal decay results from attaching tags to strands further away from the sensor in later sequencing steps than in earlier steps. This causes signal decay as the sequencing cycles progress. In some designs, where the sensor is above the substrate that holds the clusters, the signal could increase, instead of decaying, as sequencing progresses.

In the flow cell design investigated, noise grows while the signal decays. Physically, phasing and prephasing increase noise as sequencing progresses. Phasing refers to steps in sequencing in which tags fail to advance along the sequence. Prephasing refers to sequencing steps in which tags jump two positions forward, instead of one, during a sequencing cycle. Phasing and prephasing are both relatively infrequent, on the order of once in 500 to 1000 cycles. Phasing is slightly more frequent than prephasing. Because phasing and prephasing impact individual strands in a cluster that is producing intensity data, the intensity noise distribution from the cluster accumulates in a binomial, trinomial, quadrinomial, etc. expansion as sequencing progresses.
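Merely as an illustration of how phasing and prephasing accumulate, the following sketch tracks the distribution of strand offsets within a cluster, where an offset of -1 means one base behind the nominal position and +1 means one base ahead. The per-cycle three-outcome model, the independence of cycles, the function names, and the rates of 1/500 and 1/1000 per cycle are assumptions of this sketch, chosen to be consistent with the orders of magnitude stated above.

```python
from collections import defaultdict
from typing import Dict


def offset_distribution(n_cycles: int, phasing: float, prephasing: float) -> Dict[int, float]:
    """Probability of each strand offset (in bases) after `n_cycles` cycles."""
    if phasing < 0 or prephasing < 0 or phasing + prephasing > 1:
        raise ValueError("rates must be non-negative and sum to at most 1")
    normal = 1.0 - phasing - prephasing
    dist: Dict[int, float] = {0: 1.0}            # every strand starts in phase
    for _ in range(n_cycles):
        nxt: Dict[int, float] = defaultdict(float)
        for offset, prob in dist.items():
            nxt[offset - 1] += prob * phasing     # tag fails to advance: falls behind
            nxt[offset] += prob * normal          # tag advances by one position
            nxt[offset + 1] += prob * prephasing  # tag advances by two: jumps ahead
        dist = dict(nxt)
    return dist


def in_phase_fraction(n_cycles: int, phasing: float = 0.002, prephasing: float = 0.001) -> float:
    """Fraction of strands whose offset is zero after `n_cycles` cycles."""
    return offset_distribution(n_cycles, phasing, prephasing).get(0, 0.0)


fraction_at_100 = in_phase_fraction(100)
fraction_at_300 = in_phase_fraction(300)
```

Under these assumptions, the in-phase fraction shrinks as the cycle number grows, which is one way to see why more previous cycles must be processed to call later cycles satisfactorily.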

Further details of fading, signal decay, and the decrease in signal-to-noise ratio, and of FIGS. 17C and 17D, can be found in U.S. Nonprovisional Patent Application No. 16/874,599, entitled "Systems and Devices for Characterization and Performance Analysis of Pixel-Based Sequencing," filed May 14, 2020 (Attorney Docket No. ILLM1011-4/IP-1750-US), which is incorporated by reference as if fully set forth herein.

Thus, during base calling, the reliability or quality of a base call (e.g., the probability that the called base is correct) can depend on the base calling cycle number at which the current base is being called. Accordingly, the selection of the first base caller 1414 and/or the second base caller 1416 to process the set of sensor data 1601 can also be based on the current cycle number at which the base calling operation is being performed, which can be included in the temporal context information 1606 for the set of sensor data 1601, as discussed in further detail later in this specification.

Base sequence context information 1608
FIG. 18 illustrates the base calling accuracy (1 minus the base calling error rate) of different example configurations of a base caller (e.g., DeepRTA, DeepRTA-K0-06, DeepRTA-349-K0-10-160p, DeepRTA-KO-16, DeepRTA-K0-16-Lanczos, DeepRTA-KO-18, and DeepRTA-K0-20) for base calling homopolymers (e.g., GGGGG) and sequences with near homopolymers or flanking homopolymers (e.g., GGTGG). In an example, a sequence with flanking homopolymers (e.g., GGTGG) includes homopolymers (e.g., GG) on both sides of the base of interest (e.g., T). Similarly, a near homopolymer includes a sequence in which most or a majority of the bases are the same (e.g., 3 of 5 bases, 4 of 5 bases, or 4 of 7 bases are G). The table illustrated in FIG. 18 shows data (e.g., base calling probability, or the probability of correctly calling a base) for various base calling cycles, such as cycles 20, 40, 60, and 80. For example, the probability of correctly calling the middle base of the sequence GGGGG using the DeepRTA base caller at cycle 80 is 96.97%. Note that several examples of sequences with homopolymers, near homopolymers, or flanking homopolymers discussed in this disclosure are assumed to have five bases. However, such special sequences can have any other number of bases, such as 3, 5, 6, 7, 9, or another suitable number.

As discussed above, in some implementations, the base caller makes a base call for a current sequencing cycle by processing a window of sequencing images for a plurality of sequencing cycles, including the current sequencing cycle contextualized by right and left sequencing cycles. Because the base "G" is indicated by a dark or minimal-signal state (also referred to herein as an off state, an unreadable signal state, or an inactive state) in the sequencing images, a repeating pattern of the base "G" can lead to erroneous base calls. Such erroneous base calls also occur when the current sequencing cycle is for a non-G base (e.g., base "T") but is flanked on the right and left by Gs. Note that the non-G bases (i.e., A, C, or T) are indicated by an illuminated or on (or active) state in the sequencing images.

In an example, there are several specific base call sequence patterns for which the probability of an error in base calling is relatively high. Two such examples, GGGGG and GGTGG, are illustrated in FIG. 18. There may be other specific base call sequence patterns (e.g., GGTCG) for which the probability of an error in base calling is similarly high. In an example, such specific base call sequence patterns have multiple Gs, e.g., Gs at least at the first and last positions of the sequence, and possibly a third G between the two terminal Gs in a 5-base sequence. Other examples of such specific base call sequences include GGXGG, GXGGG, GGGXG, GXXGG, and GGXXG, where X can be any of A, C, T, or G.
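Merely as an illustration, a 5-base window can be checked against the example patterns listed above as follows, with X expanded to any of A, C, G, or T. The function name `is_special_context`, the restriction to exactly five bases, and the use of regular expressions are assumptions of this sketch; the pattern list itself is taken from the examples above and is not exhaustive.

```python
import re

# Example patterns from the description; "X" stands for any of A, C, G, or T.
SPECIAL_PATTERNS = ("GGXGG", "GXGGG", "GGGXG", "GXXGG", "GGXXG")
_COMPILED = [re.compile(p.replace("X", "[ACGT]")) for p in SPECIAL_PATTERNS]


def is_special_context(window: str) -> bool:
    """Return True if a 5-base window matches one of the example special patterns."""
    bases = window.upper()
    if len(bases) != 5:
        return False
    return any(rx.fullmatch(bases) for rx in _COMPILED)


homopolymer = is_special_context("GGGGG")   # covered by GGXGG with X = G
flanked = is_special_context("GGTGG")       # T flanked by GG on both sides
unrelated = is_special_context("ACGTA")
```

The resulting flag is the kind of indication that the base sequence context information could carry for a set of sensor data.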

In one example, the base sequence context information 1608 for the set of sensor data 1601 also indicates whether the set of sensor data 1601 is associated with any such special base sequence pattern. For example, if the set of sensor data 1601 is for calling the middle base of the sequence GGGGG (or GGTGG), generating the final base call may require special handling, as discussed herein (see, e.g., FIGS. 19B, 19C, and 20A).

Other Context Information 1610
Referring again to FIG. 16, the context information generation module 1418 further generates other context information 1610. The other context information 1610 may cover any type of context information not covered by the spatial, temporal, and base sequence context information.
Numerous examples of the other context information 1610 are possible, some of which are discussed herein below.

Occasionally, bubbles form over one or more clusters during one or more sequences of base calling operations. Such a bubble may be a globule of gas (such as air) in any liquid present at the cluster (such as a bubble within the reagents used for base calling). The presence of a bubble may be detected by analyzing images captured from the affected cluster(s). For example, the presence of a bubble in a cluster may be inferred by detecting a characteristic intensity signal signature in the captured images of the cluster. In one example, the other context information 1610 may indicate whether the set of sensor data 1601 from a cluster is associated with such a bubble. In other words, if the images in the set of sensor data 1601 from the cluster indicate the presence of a bubble in the cluster, the other context information 1610 provides an indication of that bubble. Bubble detection is discussed in further detail in co-pending U.S. Patent Application No. 63/170,072, entitled "Machine-Learning Model for Detecting a Bubble Within a Nucleotide-Sample Slide for Sequencing," filed April 2, 2021, which is incorporated herein by reference. Further details regarding the generation of final base calls when a bubble is detected are discussed later herein.

In one example, the reagents used in the flow cell play a major role in how base calling is performed. For example, the first base caller 1414 may be preferable when a first type of reagent is used, whereas the second base caller 1416 may be preferable when a second type of reagent is used. In one example, the other context information 1610 provides an indication of the reagents used in the flow cell. Further details regarding the generation of final base calls based on the choice of reagents are discussed later herein.

Selective Use of the First Base Caller 1414 and the Second Base Caller 1416: Final Base Call as a Function (e.g., Average, Maximum, Minimum, or Another Suitable Function) of Classification Information from the Two Base Callers
FIG. 19A illustrates the generation of a final base call for a set of sensor data based on a function of the first base call classification information 1434 from the first base caller 1414 and the second base call classification information 1436 from the second base caller 1416 of the system 1400 of FIG. 14.

In one embodiment, each of the base callers 1414, 1416 outputs, for the base being called, corresponding probabilities that the base is A, C, G, or T. For example, consider the first base call classification information 1434 from the first base caller 1414. For a given base being called, the first base call classification information 1434 takes the form of probabilities or confidence scores p1(A), p1(C), p1(G), p1(T). Here, p1(A) indicates the probability that the base being called is A, p1(C) indicates the probability that it is C, p1(G) indicates the probability that it is G, and p1(T) indicates the probability that it is T. Merely as an example, if p1(A), p1(C), p1(G), and p1(T) are 0.6, 0.2, 0.15, and 0.05, respectively, the first base caller 1414 indicates that the base being called is most likely A, with a probability of 0.6.

In one example, p1(A), p1(C), p1(G), and p1(T) sum to 1. Thus, in one example, the first base caller outputs normalized probabilities for the bases, e.g., using a softmax function. In another example, other techniques (i.e., other than softmax) may be used; for example, the base caller may have an output layer that does not use softmax. For example, a regression-based operation may be used, which may derive a probability measure for each base using, e.g., the Euclidean distance or the Mahalanobis distance to a cloud center.
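
For reference, the standard softmax normalization mentioned above can be written as follows. The symbol z_b, denoting the unnormalized output of the base caller for base b, is an assumption introduced only for this illustration and does not appear in the source.

```latex
p_1(b) = \frac{\exp(z_b)}{\sum_{b' \in \{A, C, G, T\}} \exp(z_{b'})},
\qquad b \in \{A, C, G, T\},
\qquad \sum_{b} p_1(b) = 1
```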

Similarly, for a given base being called, the second base call classification information 1436 takes the form of probabilities or confidence scores p2(A), p2(C), p2(G), p2(T), where p2(A), p2(C), p2(G), and p2(T) sum to 1.

In one embodiment, in addition to the probabilities described above, each base caller can also output the corresponding called base. For example, the first base caller 1414 outputs a first called base, and the second base caller 1416 outputs a second called base.

A simple rule for base calling is as follows. For example, assume that, for a given base being called, the first base call classification information 1434 output by the first base caller 1414 is p1(A), p1(C), p1(G), p1(T), where p1(C) is greater than each of p1(A), p1(G), and p1(T). The first base caller 1414 can then call the base as C. In another example, the first base caller 1414 calls the base as C only if the corresponding probability p1(C) is higher than a threshold probability. In yet another example, assume that p1(C) > p1(A), and that p1(A) is greater than both p1(T) and p1(G). That is, p1(C) is the highest probability, followed by p1(A). The first base caller 1414 can then call the base as C if p1(C) exceeds p1(A) by at least a threshold (i.e., if the difference between the probabilities for the two bases is at least a threshold amount). Any other suitable rule(s) for base calling can be envisioned based on the teachings of this disclosure. The second base caller 1416 can call bases in a similar manner.
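
The three rules above (highest probability, minimum probability, and minimum margin over the runner-up) can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name, the parameter names, and the choice to return None when a threshold is not met are all hypothetical, since the text does not say what happens when a rule is not satisfied.

```python
BASES = ("A", "C", "G", "T")


def call_base(scores, min_prob=None, min_margin=None):
    """Call a base from per-base scores, or return None if a threshold fails.

    scores     -- dict mapping each of A, C, G, T to a probability
    min_prob   -- optional: the top score must be higher than this value
    min_margin -- optional: the top score must exceed the runner-up
                  by at least this amount
    """
    ranked = sorted(BASES, key=lambda b: scores[b], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if min_prob is not None and scores[best] <= min_prob:
        return None  # assumed behaviour: no call when below the threshold
    if min_margin is not None and scores[best] - scores[runner_up] < min_margin:
        return None  # assumed behaviour: no call when the margin is too small
    return best


if __name__ == "__main__":
    p1 = {"A": 0.6, "C": 0.2, "G": 0.15, "T": 0.05}  # example values from the text
    print(call_base(p1))
    print(call_base(p1, min_prob=0.7))
    print(call_base(p1, min_margin=0.3))
```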

Referring again to FIG. 19A, the base call combining module 1428 receives the first base call classification information 1434, the second base call classification information 1436, and the context information 1420. Assume that, based on the context information 1420, the base call combining module 1428 decides to combine the first and second base call classification information, e.g., as discussed with respect to the method 1500c of FIG. 15C. Thus, in this example, both base callers 1414 and 1416 process the set of sensor data, and the final confidence scores pf(A), pf(C), pf(G), pf(T) and the final called base are based on the outputs of both base callers 1414 and 1416.

If the classification information from the two base callers agrees or matches, use the average (e.g., arithmetic mean), minimum, maximum, or geometric mean of the confidence scores from the two base callers
With further reference to FIG. 19A, assume a scenario in which the first base call classification information 1434 from the first base caller 1414 and the second base call classification information 1436 from the second base caller 1416 match. For example, the first base caller 1414 outputs the first base call classification information 1434, which includes the confidence scores p1(A), p1(C), p1(G), p1(T), and, merely as an example, calls the base as C. Likewise, the second base caller 1416 outputs the second base call classification information 1436, which includes the confidence scores p2(A), p2(C), p2(G), p2(T), and, merely as an example, also calls the base as C. Thus, the base calls from both base callers match, and are C in this example.

In such a scenario, where the base calls from both base callers 1414 and 1416 match, the final base call 1440 includes a final called base that matches the base calls made by the base callers 1414 and 1416.

In one embodiment, the final confidence scores pf(A), pf(C), pf(G), pf(T) are a suitable function of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416.

For example, each of the final confidence scores pf(A), pf(C), pf(G), pf(T) may be the average, or arithmetic mean, of the corresponding one of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the corresponding one of the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416. Thus, if both base callers 1414 and 1416 call the base under consideration as C, the base call combining module 1428 outputs C as the final called base and outputs the final confidence scores as follows:
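
Written out from the definitions above (a reconstruction, since the formula itself is not reproduced in this text), the per-base arithmetic mean referred to as Equation 1 would take the following form. The index b, ranging over the four bases, is a symbol introduced here for compactness.

```latex
p_f(b) = \frac{p_1(b) + p_2(b)}{2}, \qquad b \in \{A, C, G, T\}
```

If both sets of input scores sum to 1, the averaged scores also sum to 1.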

In another example, another mathematical function (such as the geometric mean) can be used instead of the average or arithmetic mean. For example, if the geometric mean is used, Equation 1 can be rewritten as follows:
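
A reconstruction of the geometric-mean variant from the surrounding definitions (the formula is not reproduced in this text; the index b over the four bases is introduced here for compactness) would be:

```latex
p_f(b) = \sqrt{p_1(b)\, p_2(b)}, \qquad b \in \{A, C, G, T\}
```

Unlike the arithmetic mean, these four values need not sum to 1; the text does not state whether they are renormalized.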

In another example, if the base calling system 1400 is to report conservative scores, each of the final confidence scores pf(A), pf(C), pf(G), pf(T) may be the minimum of the corresponding one of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the corresponding one of the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416 (e.g., assuming that the base calls of the two base callers match). Thus, if both base callers 1414 and 1416 call the base under consideration as C, the base call combining module 1428 outputs C as the final called base and outputs the final confidence scores as follows:
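
A reconstruction of the conservative (minimum) variant from the surrounding definitions (the formula is not reproduced in this text; the index b over the four bases is introduced here for compactness) would be:

```latex
p_f(b) = \min\bigl(p_1(b),\, p_2(b)\bigr), \qquad b \in \{A, C, G, T\}
```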

In yet another example, if the base calling system 1400 is to report high confidence scores, each of the final confidence scores pf(A), pf(C), pf(G), pf(T) may be the maximum of the corresponding one of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the corresponding one of the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416 (e.g., assuming that the base calls of the two base callers match). Thus, if both base callers 1414 and 1416 call the base under consideration as C, the base call combining module 1428 outputs C as the final called base and outputs the final confidence scores as follows:
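
A reconstruction of the maximum variant from the surrounding definitions (the formula is not reproduced in this text; the index b over the four bases is introduced here for compactness) would be:

```latex
p_f(b) = \max\bigl(p_1(b),\, p_2(b)\bigr), \qquad b \in \{A, C, G, T\}
```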

In yet another example, if the base calling system 1400 is to report weighted confidence scores, each of the final confidence scores pf(A), pf(C), pf(G), pf(T) may be a normalized weighted sum of the corresponding one of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the corresponding one of the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416 (e.g., assuming that the base calls of the two base callers match). Thus, if both base callers 1414 and 1416 call the base under consideration as C, the base call combining module 1428 outputs C as the final called base and outputs the final confidence scores as follows:
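
A reconstruction of the normalized weighted sum (referred to as Equation 4) from the surrounding definitions is given below. The formula is not reproduced in this text, so the assignment of weight A1 to the first base caller and A2 to the second base caller is an assumption, as is the index b over the four bases.

```latex
p_f(b) = \frac{A_1\, p_1(b) + A_2\, p_2(b)}{A_1 + A_2}, \qquad b \in \{A, C, G, T\}
```

When A1 + A2 = 1, the denominator equals 1 and the expression reduces to A1·p1(b) + A2·p2(b).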

In one example, the weights A1 and A2 in Equation 4 are pre-specified fixed weights such that A1 + A2 = 1. In another example, the weights A1 and A2 are adjusted or updated during a training process, e.g., based on training data.
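
The combining functions described above (arithmetic mean, geometric mean, minimum, maximum, and weighted sum) can be sketched together as follows. This is a minimal illustration, not the disclosed implementation: the function name, the mode strings, and the convention that a1 weights the first base caller and a2 the second are hypothetical, and the sketch assumes the two base callers have already agreed on the called base.

```python
import math

BASES = ("A", "C", "G", "T")


def combine_scores(p1, p2, mode="mean", a1=0.5, a2=0.5):
    """Combine two per-base score dicts base by base.

    mode -- "mean", "geometric", "min", "max", or "weighted"
    a1, a2 -- weights for the first and second base caller ("weighted" only);
              assumed here to sum to 1, as in one example in the text
    """
    if mode == "mean":
        fn = lambda x, y: (x + y) / 2.0
    elif mode == "geometric":
        fn = lambda x, y: math.sqrt(x * y)
    elif mode == "min":
        fn = min
    elif mode == "max":
        fn = max
    elif mode == "weighted":
        if not math.isclose(a1 + a2, 1.0):
            raise ValueError("this sketch expects a1 + a2 == 1")
        fn = lambda x, y: a1 * x + a2 * y
    else:
        raise ValueError("unknown mode: " + mode)
    return {b: fn(p1[b], p2[b]) for b in BASES}


if __name__ == "__main__":
    # Hypothetical scores; both callers favour C.
    p1 = {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1}
    p2 = {"A": 0.2, "C": 0.6, "G": 0.1, "T": 0.1}
    for mode in ("mean", "geometric", "min", "max"):
        print(mode, combine_scores(p1, p2, mode))
    print("weighted", combine_scores(p1, p2, "weighted", a1=0.25, a2=0.75))
```

Only the mean and the weighted sum (with weights summing to 1) preserve normalization; the other modes may produce scores that do not sum to 1.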

Normalized ratio of confidence scores from the two base callers based on temporal context (e.g., base calling cycle number) associated with the sensor data
In one example, the base calling system 1400 generates each of the final confidence scores pf(A), pf(C), pf(G), pf(T) as a weighted average of the corresponding one of the confidence scores p1(A), p1(C), p1(G), p1(T) output by the first base caller 1414 and the corresponding one of the confidence scores p2(A), p2(C), p2(G), p2(T) output by the second base caller 1416 (e.g., assuming that the base calls of the two base callers match), where the weights are based on the context information 1420. That is, the context information 1420 dictates the weight given to each of the confidence scores from the two base callers.

FIG. 19A1 illustrates a look-up table (LUT) 1901 showing an example weighting scheme used for the final confidence scores based on the temporal context information 1606 (see FIG. 16). The actual weightings included in the LUT 1901 are merely examples and are not limiting.

Because of the phasing, prephasing, and fading discussed with respect to FIGS. 17C and 17D, signal quality is relatively better and noise is lower during the initial base calling cycles (see FIG. 17D), and the performance of the base callers 1414 and 1416 has been observed to be comparable during those cycles. During later base calling cycles, the first base caller 1414 outperforms the second base caller 1416, because the first base caller 1414 may be better equipped to handle the signal degradation that occurs in later cycles. However, the first base caller 1414 may be more computationally intensive to operate than the second base caller 1416.

Thus, in one example, as illustrated in FIG. 19A1, the confidence scores from the second base caller 1416 are emphasized over the confidence scores from the first base caller 1414 for an initial threshold number of base calling cycles. As the base calling cycles progress, the confidence scores from the first base caller 1414 are given more emphasis (e.g., because the first base caller 1414 outperforms the second base caller 1416 during later cycles).

Specifically, referring to the first row of the LUT 1901, assume there are N base calling cycles. For base calling cycles 1 to N1 (i.e., the first N1 base calling cycles), a high weighting (e.g., 90-100%) is given to the confidence scores from the second base caller 1416 and a low weighting (e.g., 0-10%) is given to the confidence scores from the first base caller 1414. Thus, during base calling cycles 1 to N1, the first base caller 1414 may be disabled or non-operational. Because the first base caller 1414 is computationally intensive to operate (e.g., compared to the second base caller 1416), this improves computational efficiency. As discussed, both base callers have comparable performance during the first N1 cycles, and therefore no degradation in base call quality is observed.

Here, N1 is a suitable number of base calling cycles between 1 and N2 (discussed below). Merely as an example, N1 can be 100, 150, 200, 250, or another suitable number of base calling cycles. N1 can be chosen as the number of initial base calling cycles during which both base callers provide base calls of reasonably comparable quality.

Thus, for example, for base calling cycles between cycle 1 and cycle N1, the final confidence score pf is given (e.g., assuming a 100% weighting for the second base caller) by:
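
With a 100% weighting on the second base caller and a 0% weighting on the first, the weighted average reduces to the second base caller's scores. A reconstruction from the surrounding text (the formula is not reproduced here; the cycle index n and the base index b are symbols introduced for this illustration) would be:

```latex
p_f(b) = p_2(b), \qquad b \in \{A, C, G, T\}, \qquad 1 \le n \le N_1
```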

As discussed, the first base caller 1414 can be disabled (i.e., does not operate on the corresponding sets of data) during the first N1 cycles. Note, however, that the first base caller 1414 operates on the corresponding sets of data from cycle (N1+1) onward (see the second row of the LUT 1901). To satisfactorily generate base calls for cycle (N1+1) and subsequent cycles, the first base caller 1414 must also begin operating at least a few cycles before cycle (N1+1), because, as discussed with respect to FIGS. 7 and 10, the base call for a current cycle is also based on data from one or more past cycles and one or more future cycles (see also the discussion of FIG. 15E for further explanation). Thus, the first base caller 1414 may be non-operational for the sets of data for cycles 1 through (N1-T) and operational from cycle (N1-T+1) onward, where T is a threshold number of cycles whose data is needed to begin base calling at cycle (N1+1).
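
The cycle arithmetic above can be made concrete with a short sketch. The function name and the dictionary keys are hypothetical, and the value T = 5 used in the example is an arbitrary illustration; the text does not give a value for T.

```python
def first_caller_schedule(n1, t):
    """Return the key cycle numbers for the first base caller.

    n1 -- last cycle for which only the second base caller's scores are used
    t  -- number of earlier cycles whose data the first base caller needs
          before it can call cycle n1 + 1 (assumed 0 <= t < n1)
    """
    if not 0 <= t < n1:
        raise ValueError("expected 0 <= t < n1")
    return {
        "idle_through": n1 - t,       # non-operational for cycles 1 .. n1 - t
        "warm_up_from": n1 - t + 1,   # starts processing data at this cycle
        "contributes_from": n1 + 1,   # its scores are first used at this cycle
    }


if __name__ == "__main__":
    # Example: N1 = 150 (one of the values mentioned in the text), T = 5 (assumed).
    print(first_caller_schedule(150, 5))
```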

Referring now to the second row of the LUT 1901, in one example, for base calling cycles (N1+1) to N2, a first weighting is given to the confidence scores from the second base caller 1416 and a second weighting is given to the confidence scores from the first base caller 1414. In the example of FIG. 19A1, the first and second weightings are both moderate weights, e.g., about 50% each, merely as an example. Thus, both base callers 1414 and 1416 operate during these cycles. Accordingly, for base calling cycles between cycle (N1+1) and cycle N2, the final score pf is given by the following equation:
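
Taking the approximately 50% weightings of the FIG. 19A1 example at exactly 50% each, a reconstruction of this equation from the surrounding text (the formula is not reproduced here; the cycle index n and the base index b are symbols introduced for this illustration) would be:

```latex
p_f(b) = 0.5\, p_1(b) + 0.5\, p_2(b), \qquad b \in \{A, C, G, T\}, \qquad N_1 + 1 \le n \le N_2
```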

Referring now to the third row of the LUT 1901, in one example, for base calling cycles (N2+1) to N, a low weighting (e.g., 0%) is given to the confidence scores from the second base caller 1416 and a high weighting (e.g., 100%) is given to the confidence scores from the first base caller 1414. This is because, as discussed herein, the first base caller 1414 outperforms the second base caller 1416 during later base calling cycles, as the first base caller 1414 may be better equipped to handle the signal degradation that occurs in later cycles (see FIGS. 17C and 17D).

Thus, for example, for base calling cycles between cycle (N2+1) and cycle N, the final score pf is given by the following equation:
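
With a 100% weighting on the first base caller and a 0% weighting on the second, a reconstruction of this equation from the surrounding text (the formula is not reproduced here; the cycle index n and the base index b are symbols introduced for this illustration) would be:

```latex
p_f(b) = p_1(b), \qquad b \in \{A, C, G, T\}, \qquad N_2 + 1 \le n \le N
```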

In one example, the LUT 1901 (or any other LUT discussed herein) may be stored in a memory of the system 1400 (the memory is not shown in FIG. 14). The switching module 1422 and/or the base call combining module 1428 accesses the LUT 1901 from the memory and receives the context information 1420 (e.g., temporal context information) indicating the current base calling cycle number. Based on the temporal context information, the switching module 1422 and/or the base call combining module 1428 selects the appropriate row of the LUT 1901 and operates according to the weights specified in the selected row.
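
The row selection described above can be sketched as a lookup keyed on the current cycle number. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function names are hypothetical, the weights are fixed at 0%/100%, 50%/50%, and 100%/0% (single points within the example ranges given for LUT 1901), and the value of N2 used in the example is arbitrary because the text does not give one.

```python
BASES = ("A", "C", "G", "T")


def weights_for_cycle(cycle, n1, n2, n_total):
    """Return (w1, w2): weights for the first and second base caller."""
    if not 1 <= cycle <= n_total:
        raise ValueError("cycle out of range")
    if cycle <= n1:        # row 1: cycles 1 .. N1
        return (0.0, 1.0)
    if cycle <= n2:        # row 2: cycles N1+1 .. N2
        return (0.5, 0.5)
    return (1.0, 0.0)      # row 3: cycles N2+1 .. N


def final_scores(p1, p2, cycle, n1, n2, n_total):
    """Weighted per-base scores; a caller with zero weight may pass None."""
    w1, w2 = weights_for_cycle(cycle, n1, n2, n_total)
    if w1 == 0.0:          # first base caller disabled; p1 is not read
        return {b: w2 * p2[b] for b in BASES}
    if w2 == 0.0:          # second base caller not used; p2 is not read
        return {b: w1 * p1[b] for b in BASES}
    return {b: w1 * p1[b] + w2 * p2[b] for b in BASES}


if __name__ == "__main__":
    p1 = {"A": 0.2, "C": 0.6, "G": 0.1, "T": 0.1}      # hypothetical scores
    p2 = {"A": 0.1, "C": 0.8, "G": 0.05, "T": 0.05}    # hypothetical scores
    for cycle in (10, 200, 400):
        print(cycle, final_scores(p1, p2, cycle, n1=150, n2=300, n_total=500))
```

Skipping the disabled caller's scores entirely, rather than multiplying them by zero, mirrors the point made above that the first base caller need not run at all during the early cycles.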

Confidence score correction based on base sequence context information indicating a special base sequence (base calls from the two base callers match)
Assume a scenario in which the base calls from the two base callers match, and the base sequence context information indicates that the bases called by either base caller include a special base sequence, such as a homopolymer (e.g., GGGGG), a sequence with flanking homopolymers (e.g., GGTGG), a near-homopolymer, or another special base sequence. In one example, five consecutive final base calls are made by the base call combining module 1428 (or by either of the base callers 1414, 1416), and the five consecutive final base calls make up a special base sequence. As previously discussed herein (see FIG. 18), the probability of error for such special base sequences may be higher. Thus, the system 1400 can take special measures to possibly modify the confidence scores associated with the bases of such a sequence. Note again that some examples of special base sequences discussed in this disclosure (such as homopolymers, near-homopolymers, or sequences with flanking homopolymers) have five bases. However, such a special base sequence may have any other number of bases, such as 3, 5, 6, 7, 9, or another suitable number.

FIG. 19B illustrates a LUT 1905 indicating the base caller to be used when the bases being called include a special base sequence. In the LUT 1905, the letter "X" denotes any base, such as A, C, T, or G. Thus, for any of the base sequences included in the LUT 1905 (e.g., GGXGG, GXGGG, GGGXG, GXXGG, GGXXG), the confidence scores from, for example, the second base caller 1416 are used to determine the final confidence scores. This is because the inventors' experiments determined that the second base caller 1416 outperforms the first base caller 1414 when any of the special base sequences shown in the LUT 1905 is encountered.

Thus, if five consecutive final base calls made by the base call combining module 1428 form any of the special base sequences of the LUT 1905, the base call combining module 1428 modifies the confidence scores associated with each of the five bases (or at least some of them, e.g., the middle one) to correspond to the confidence scores output by the second base caller 1416 for those bases.
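
The correction step above can be sketched as a pass over the final base calls. This is a minimal illustration, not the disclosed implementation: the function name is hypothetical, each position carries a single confidence score for its called base, and the use of overlapping five-base sliding windows is an assumption, since the text does not say how windows are chosen. All five positions of a matching window are replaced here, although the text also allows replacing only some of them (e.g., the middle one).

```python
import re

# Patterns listed for LUT 1905; "X" stands for any of A, C, G, or T.
_SPECIAL = [re.compile(p.replace("X", "[ACGT]"))
            for p in ("GGXGG", "GXGGG", "GGGXG", "GXXGG", "GGXXG")]


def correct_scores(final_calls, final_scores, second_caller_scores):
    """Replace scores inside special five-base windows.

    final_calls          -- string of final called bases, one per cycle
    final_scores         -- confidence of each final call (same length)
    second_caller_scores -- second base caller's confidence for each call
    Returns a new list; the inputs are left unmodified.
    """
    corrected = list(final_scores)
    for start in range(len(final_calls) - 4):
        window = final_calls[start:start + 5]
        if any(rx.fullmatch(window) for rx in _SPECIAL):
            for i in range(start, start + 5):
                corrected[i] = second_caller_scores[i]
    return corrected


if __name__ == "__main__":
    calls = "ACGGTGGCA"        # contains GGTGG at positions 2..6 (0-based)
    combined = [0.9] * 9       # hypothetical combined scores
    second = [0.5] * 9         # hypothetical second-caller scores
    print(correct_scores(calls, combined, second))
```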

In one example, the LUT 1905 (or any other LUT discussed herein) may be stored in a memory of the system 1400 (the memory is not shown in FIG. 14). The switching module 1422 and/or the base call combining module 1428 accesses the LUT from the memory and receives the context information 1420. Based on the context information, the switching module 1422 and/or the base call combining module 1428 selects the appropriate row of the LUT and operates according to the base calling operation specified in the selected row. Unless otherwise specified, this applies to all LUTs discussed in this disclosure.

FIG. 19C illustrates a LUT 1910 showing the weightings given to the confidence scores of the individual base callers when the bases being called include a special base sequence. Note that the actual weightings included in the LUT 1910 are merely examples and do not limit the scope of this disclosure. For example, referring to the first row of the LUT 1910, when the special sequence GGXGG is encountered, a weighting of 60% may be given to the confidence score from the second base caller 1416 and a weighting of 40% may be given to the confidence score from the first base caller 1414. For example, assume that the middle (i.e., third) base of the sequence, denoted by "X", is T, that the first base caller 1414 indicates a confidence score of p1(T), and that the second base caller 1416 indicates a confidence score of p2(T). In such an example, the final called base is T, and the final confidence score for the middle (i.e., third) base of the sequence is as follows:
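
Using the 40% and 60% weightings stated above, a reconstruction of this equation (the formula is not reproduced in this text) would be:

```latex
p_f(T) = 0.4\, p_1(T) + 0.6\, p_2(T)
```

As a purely hypothetical numerical example, p1(T) = 0.7 and p2(T) = 0.9 would give pf(T) = 0.28 + 0.54 = 0.82.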

In one example, the weights in the LUT 1910 can be determined empirically through testing and calibration.

The other rows of the LUT 1910 can likewise have weights based on the detected base sequence. These weights can be specified in advance, and their optimal values can be determined through testing and calibration.

Note that in all of the example weights specified in the LUT 1910, the weight for the second base caller 1416 is higher than the weight for the first base caller 1414. This is because, as described above, in some examples the second base caller 1416 may outperform the first base caller 1414 when any of the special base sequences listed in the LUT is encountered.
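The weighting for a special sequence can be illustrated with a short calculation. The sketch below is a hedged example rather than the disclosed implementation: the function name `combine_confidence` is hypothetical, the 40%/60% split is the example given for GGXGG in the first row of the LUT 1910, and the input scores 0.90 and 0.98 are made-up values.

```python
def combine_confidence(p1: float, p2: float, w1: float, w2: float) -> float:
    """Return the weighted sum w1*p1 + w2*p2 of two confidence scores."""
    # The weights are expected to be normalized, i.e. to sum to 1.
    if abs(w1 + w2 - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return w1 * p1 + w2 * p2


# Example weights for GGXGG: 40% for the first caller, 60% for the second.
# The scores 0.90 and 0.98 are invented inputs for illustration.
final_score = combine_confidence(p1=0.90, p2=0.98, w1=0.4, w2=0.6)
print(round(final_score, 3))
```

Because the second caller carries the larger weight, the combined score sits closer to its score than to the first caller's.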

Generating Final Classification Information Based on Bubble Detection in Clusters
As previously described herein, bubbles may form over one or more clusters during base calling operations. Such a bubble may be a globule of gas (such as air) in any liquid present in the cluster (for example, a bubble in the reagents used for base calling). The presence of a bubble may be detected by analyzing images captured from the affected cluster(s). For example, the presence of a bubble in a cluster can be inferred by detecting a characteristic intensity signal signature in the captured image of the cluster. In one example, the other context information 1610 may indicate whether a set of sensor data 1601 from a cluster is associated with such a bubble.

FIG. 19D illustrates a LUT 1915 showing the operation of the base call combination module 1428 of FIG. 14 in view of the detection of one or more bubbles in a cluster of the flow cell. For example, referring to the first row of the LUT 1915, if no bubble is detected in the flow cell, the final base call is performed normally by the base call combination module 1428, e.g., according to any suitable operating scheme discussed in this disclosure.

The second row of the LUT 1915 covers the scenario in which one or more bubbles are detected in a cluster of the flow cell. In general, the first base caller 1414 is better equipped to handle base calling for clusters that contain such bubbles. Thus, in one embodiment, in response to the other context information 1610 (see FIG. 16) indicating the presence of a bubble in a cluster, the base call combination module 1428 gives a relatively high weighting to the confidence scores from the first base caller 1414 (e.g., a weighting of 90-100%) and a relatively low weighting to the confidence scores from the second base caller 1416 (e.g., a weighting of 0-10%).

Note that a tile of the flow cell is composed of multiple clusters, and a bubble may be detected, for example, in a single cluster of the tile. In that case, sensor data from that single cluster is processed primarily by the first base caller 1414 according to the second row of the LUT 1915, while sensor data from the other clusters of the tile is processed by the first base caller 1414 and/or the second base caller 1416 according to the first row of the LUT 1915.
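The per-cluster behavior within a tile can be sketched as a mapping from bubble flags to weight pairs. This is an illustrative sketch only: the cluster identifiers, the helper name `weights_per_cluster`, the 95%/5% pair (one arbitrary choice inside the 90-100% and 0-10% example ranges), and the 50%/50% stand-in for normal operation are all assumptions.

```python
from typing import Dict, Tuple

# Hypothetical weight pairs (w1, w2) for the first and second base callers.
BUBBLE_WEIGHTS: Tuple[float, float] = (0.95, 0.05)   # inside the example ranges
DEFAULT_WEIGHTS: Tuple[float, float] = (0.5, 0.5)    # stand-in for normal operation


def weights_per_cluster(bubble_flags: Dict[str, bool]) -> Dict[str, Tuple[float, float]]:
    """Map each cluster id to (w1, w2) depending on its bubble flag."""
    return {
        cluster_id: BUBBLE_WEIGHTS if has_bubble else DEFAULT_WEIGHTS
        for cluster_id, has_bubble in bubble_flags.items()
    }
```

Only the flagged cluster shifts toward the first base caller; the remaining clusters of the tile keep the default treatment.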

Assume that no bubble is detected in a cluster for base calling cycles 1 through Na, and that at cycle (Na+1) the cluster is detected to contain a bubble. Thus, from base calling cycle (Na+1) onward, the first base caller 1414 processes the sensor data from the cluster according to the second row of the LUT 1915. Assume further that before cycle (Na+1) (i.e., for cycles 1 through Na), the first base caller 1414 was not operating on sensor data from that cluster, while the second base caller 1416 was. For the first base caller 1414 to begin calling bases at cycle (Na+1), it needs to process several past cycles, because the base call for the current cycle is also based on data from one or more past cycles and one or more future cycles (as discussed, for example, with respect to FIGS. 7 and 10; see also the discussion with respect to FIG. 15E). Thus, in response to context information indicating the presence of a bubble at cycle (Na+1), the first base caller processes sensor data for several cycles preceding cycle (Na+1) (e.g., cycles Na, (Na-1), (Na-2), ..., (Na-T)) and, based on that processing, is then ready to process and base call cycle (Na+1). Here T is a threshold number of past base calling cycles that the first base caller must process in order to correctly process the sensor data for the current base calling cycle.
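The catch-up step can be made concrete by listing which past cycles would be replayed. The sketch below is a hedged illustration: the helper name `warmup_cycles` is hypothetical, it follows the enumeration in the text (cycles Na-T through Na, which is T+1 cycles), and clipping at cycle 1 is an added assumption for switches that occur early in a run.

```python
from typing import List


def warmup_cycles(switch_cycle: int, t: int) -> List[int]:
    """Past cycles to replay before base calling `switch_cycle`.

    With switch_cycle = Na + 1 this returns Na - T, ..., Na in ascending
    order, clipped so that no cycle number falls below 1.
    """
    if switch_cycle < 1 or t < 0:
        raise ValueError("switch_cycle must be >= 1 and t must be >= 0")
    first = max(1, switch_cycle - 1 - t)
    return list(range(first, switch_cycle))
```

For example, with Na = 50 and T = 3, `warmup_cycles(51, 3)` lists cycles 47 through 50, after which the first base caller would be ready to call cycle 51.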

Generating Final Classification Information Based on Detection of Out-of-Focus Events in Clusters
As previously described herein, the flow cell 1405 may include lenses (such as a filter layer 124 including an array of microlenses or other optical components) for capturing images of the various clusters. In one example, there may be slight differences in focus across clusters when images are captured, for example as an image sensor or camera moves around the flow cell. For example, an edge cluster 1407a may be slightly out of focus relative to a non-edge cluster 1407b when an image of the cluster is captured. Out-of-focus events may also occur due to heating or mechanical vibrations caused by lens movement.

FIG. 19D1 illustrates a LUT 1917 showing the operation of the base call combination module 1428 of FIG. 14 in view of the detection of one or more out-of-focus images from a cluster of the flow cell. For example, referring to the first row of the LUT 1917, if no out-of-focus image is detected for the flow cell, the final base call is performed normally by the base call combination module 1428, e.g., according to any suitable operating scheme discussed in this disclosure.

The second row of the LUT 1917 covers the scenario in which one or more out-of-focus images are detected from one or more clusters of the flow cell. In general, the first base caller 1414 is better equipped to handle base calling for clusters that produce such out-of-focus images. Thus, in one embodiment, in response to the other context information 1610 (see FIG. 16) indicating the presence of an out-of-focus image from a cluster, the base call combination module 1428 gives a relatively high weighting (e.g., a weighting of 90-100%) to the confidence scores from the first base caller 1414 and a relatively low weighting (e.g., a weighting of 0-10%) to the confidence scores from the second base caller 1416.

Note that a tile of the flow cell contains multiple clusters, and out-of-focus images may be detected, for example, in a single cluster or in a few clusters of the tile (but not in all or most of the clusters). In that case, sensor data from the one or more clusters having out-of-focus images is processed primarily by the first base caller 1414 according to the second row of the LUT 1917, while sensor data from the other clusters of the tile is processed by the first base caller 1414 and/or the second base caller 1416 according to the first row of the LUT 1917.

Normalized Ratio of Confidence Scores from Two Base Callers Based on Reagents Used
As previously described herein, reagents play a major role in base calling. By way of example only, when a first group of reagents is used, the first base caller 1414 may be more suitable than the second base caller 1416, and when a second group of reagents is used, the first base caller 1414 may be less suitable than the second base caller 1416. In one embodiment, the context information 1601 indicates the type of reagents used, and the context information generation module 1418 can specify normalized weightings for the confidence scores from the two base callers in order to determine the final confidence score.

FIG. 19E illustrates a LUT 1920 showing example weightings given to the confidence scores of the individual base callers based on the group of reagents used. For example, referring to the first row of the LUT 1920, when example reagent group A is used, a weighting of A1% is given to the confidence score from the first base caller 1414 and a weighting of A2% is given to the confidence score from the second base caller 1416, where A1+A2=100. Similarly, referring to the second row of the LUT 1920, when example reagent group B is used, a weighting of B1% is given to the confidence score from the first base caller 1414 and a weighting of B2% is given to the confidence score from the second base caller 1416, where B1+B2=100.

Normalized Ratio of Confidence Scores from Two Base Callers in the Log-Probability Domain
The various examples and embodiments above discuss confidence scores in terms of probabilities. In one embodiment, however, confidence scores can be expressed on a logarithmic scale, and the mathematical operations discussed herein (e.g., with respect to Equations 1-8) can be performed on confidence scores expressed on that scale. For example, the Phred quality score is a measure of the quality of the identification of nucleobases generated by automated DNA sequencing. The Phred quality score Q is logarithmically related to the base call probability P as follows:

Q = −10 × log₁₀(1 − P)

Thus, a base calling accuracy of 90% (e.g., p1(c) having a value of 0.9) converts to a corresponding Phred score of 10, a base calling accuracy of 99% (e.g., p1(c) having a value of 0.99) converts to a corresponding Phred score of 20, and so on. Here, P is the base call probability, which is related to the probability of error E by P = (1 − E). The Phred quality score Q is therefore related to the probability of error by Q = −10 × log₁₀(E), where E is the probability that a particular base call is in error. Further details of quality scores and error probabilities are discussed, for example, in co-pending U.S. Provisional Patent Application No. 63/226,707, entitled "Quality Score Calibration of Basecalling Systems," filed July 28, 2021 (Attorney Docket No. ILLM1045-1/IP-2093-PRV), which is incorporated by reference.

In one embodiment, the mathematical operations discussed herein (e.g., with respect to Equations 1-8) can be performed using Phred scores instead of confidence scores. Thus, in some examples in which the mathematical operations use Phred or quality scores, the selection of the base caller to be used can be based on the Phred or quality scores (e.g., as discussed with respect to Equations 1-8).
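The conversion between a probability-style confidence score and a Phred score can be sketched in a few lines. The example below assumes the standard Phred relation Q = −10 × log₁₀(E) with E = 1 − P, which reproduces the 90% to 10 and 99% to 20 examples above; the function names are hypothetical and not taken from the disclosure.

```python
import math


def phred_from_accuracy(p: float) -> float:
    """Convert a base call probability P into Q = -10 * log10(1 - P)."""
    error = 1.0 - p
    # P = 1 would mean zero error probability and an unbounded Q.
    if not 0.0 < error <= 1.0:
        raise ValueError("accuracy must satisfy 0 <= p < 1")
    return -10.0 * math.log10(error)


def accuracy_from_phred(q: float) -> float:
    """Invert the relation: P = 1 - 10 ** (-Q / 10)."""
    return 1.0 - 10.0 ** (-q / 10.0)
```

Scores from both base callers can be converted with `phred_from_accuracy` before comparison or combination, and `accuracy_from_phred` maps a result back to the probability domain.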

Generating a Final Confidence Score from Two Base Callers Based on Spatial Context, e.g., Edge Tiles Associated with Sensor Data
As described above with respect to FIG. 17A, some tiles may be classified as edge tiles based on their spatial location. For example, in FIG. 17A, tiles adjacent to any edge of the flow cell 1405 are labeled as edge tiles 1406a, and the remaining tiles are labeled as non-edge tiles 1406b. For example, tiles on a vertical edge (e.g., along the Y axis) and/or a horizontal edge (e.g., along the X axis) of the flow cell 1404 are classified as edge tiles 1406, as shown in FIG. 14. Thus, the edge tiles 1406 are directly adjacent to the corresponding edge of the flow cell 1404.

As also discussed with respect to FIG. 17A, in one example, parameters associated with the base calling operation of a tile can depend on the relative position of the tile. For example, the excitation light 101 discussed with respect to FIG. 1 is directed toward the tiles of the flow cell, and different tiles can receive different amounts of excitation light 101, for example, based on the position of the individual tiles and/or the position of one or more light sources emitting the excitation light 101. For example, if the light source(s) emitting the excitation light 101 are vertically above the flow cell, a non-edge tile 1406b can receive a different amount of light than an edge tile 1406a. In another example, ambient or external light around the flow cell 1405 (e.g., ambient light from outside the biosensor) can affect the amount and/or characteristics of the excitation light 101 received by the individual tiles of the flow cell 1405. Merely as an example, an edge tile 1406a may receive the excitation light 101 together with some amount of ambient light from outside the flow cell 1405, while a non-edge tile 1406b may receive primarily the excitation light 101. In yet another example, individual sensors (or pixels or photodiodes) included in the flow cell 1405 (e.g., the sensors 106, 108, 110, 112, and 114 shown in FIG. 1) may sense light depending on the position of the sensor, which in turn depends on the position of the corresponding tile. For example, the sensing operation performed by one or more sensors associated with an edge tile 1406a may be affected by ambient light (along with the excitation light 101) relatively more than the sensing operation of one or more other sensors associated with a non-edge tile 1406b. In yet another example, the flow of reactants to the various tiles (including, for example, any substances that may be used to obtain a desired reaction during base calling, such as reagents, enzymes, samples, other biomolecules, and buffers) may also be affected by tile position. For example, tiles closer to a source of reactant can receive a larger amount of reactant than tiles farther from the source.

In one example, the spatial context information 1604 (see FIG. 16) associated with the set of sensor data 1601 includes information on whether the set of sensor data 1601 was generated in an edge tile 1406a or in a non-edge tile 1406b.

FIG. 19F illustrates a LUT 1925 showing the operation of the base call combination module 1428 of FIG. 14 in view of the spatial classification of tiles. For example, referring to the first row of the LUT 1925, for non-edge tiles, the final base call is performed normally by the base call combination module 1428, e.g., according to any suitable operating scheme discussed in this disclosure.

The second row of the LUT 1925 covers the final base calling scenario for edge tiles. In general, as discussed herein, the first base caller 1414 is better equipped to handle base calling for edge tiles. Thus, in one embodiment, for edge tiles, the base call combination module 1428 gives a weighting E1 to the confidence score from the first base caller 1414 and a weighting E2 to the confidence score from the second base caller 1416, where in one example E1 is higher than E2 and the sum of E1 and E2 is 100% (i.e., the weightings are normalized).

Generating a Final Confidence Score from Two Base Callers Based on Spatial Context, e.g., Edge Clusters Associated with Sensor Data
As previously described with respect to FIG. 17B, the clusters 1407 of an example tile 1406 are classified as either edge clusters 1407a or non-edge clusters 1407b. Also, as previously described herein, the flow cell 1405 may include lenses (such as a filter layer 124 including an array of microlenses or other optical components) for capturing images of the various clusters, and when the images of the clusters are captured, the edge clusters 1407a may be slightly out of focus relative to the non-edge clusters 1407b. Thus, depending on the implementation, one of the first and second base callers 1414 and 1416 may be better suited to processing sensor data from edge clusters 1407a, and the other may be better suited to processing sensor data from non-edge clusters 1407b. In one example, the spatial context information 1604 (see FIG. 16) associated with the set of sensor data 1601 includes information on whether the set of sensor data 1601 was generated from one or more edge clusters 1407a or from one or more non-edge clusters 1407b; based on this information, the set of sensor data 1601 may be processed by a particular one, or by both, of the first and second base callers 1414 and 1416.

FIG. 19G illustrates a LUT 1930 showing the operation of the base call combination module 1428 of FIG. 14 in view of the spatial classification of clusters. For example, referring to the first row of the LUT 1930, for non-edge clusters, the final base call is performed normally by the base call combination module 1428, e.g., according to any suitable operating scheme discussed in this disclosure.

The second row of the LUT 1930 covers the final base calling scenario for edge clusters. In general, as discussed herein, the first base caller 1414 may be better equipped to handle base calling for edge tiles. Thus, in one embodiment, for edge clusters, the base call combination module 1428 gives a weighting C1 to the confidence scores from the first base caller 1414 and a weighting C2 to the confidence scores from the second base caller 1416, where in one example C1 is higher than C2 and the sum of C1 and C2 is 100% (i.e., the weightings are normalized). In one example, the weighting C1 can be as high as 100%, in which case the classification information from the first base caller 1414 is used exclusively to base call edge clusters.

Lowering the Final Confidence Score When Classification Information from the Two Base Callers Differs or Does Not Match
As discussed with respect to FIG. 19A, the first base caller 1414 outputs a first called base and first confidence scores p1(A), p1(C), p1(G), p1(T), and the second base caller 1416 outputs a second called base and second confidence scores p2(A), p2(C), p2(G), p2(T). In one example, for a given base, the first called base from the first base caller 1414 may not match the second called base from the second base caller 1416.

For example, assume that the first base caller 1414 calls the base as A with a confidence score of p1(A), and the second base caller 1416 calls the base as C with a confidence score of p2(C). In such a scenario, the final called base output by the base call combination module 1428 is as follows:

Note that because the two base calls from the two base callers differ, the probability of error is high. The final confidence score can therefore be reduced. For example, assume that p1(A) is higher than p2(C) (i.e., p1(A) > p2(C)) and that the final called base is A. The final confidence score pf(A) corresponding to A is then as follows:

Thus, the final confidence score is artificially lowered because of the disagreement between the bases called by the two base callers.

In another example, the final confidence score pf(A) is reduced as follows:

Thus, because of the disagreement between the bases called by the two base callers, the final confidence score is lowered using a suitable weight W1 that is less than 1.
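One way such a reduction could work is sketched below. This is an assumption-laden illustration, not the disclosed equations: it assumes the final call is the one with the higher confidence score (matching the p1(A) > p2(C) example), that the weight W1 is applied multiplicatively, and that 0.8 is a reasonable default; the function name `resolve_disagreement` and the default value are hypothetical.

```python
from typing import Tuple


def resolve_disagreement(
    base1: str, p1: float, base2: str, p2: float, w1: float = 0.8
) -> Tuple[str, float]:
    """Pick the higher-confidence call and penalize it if the callers disagree.

    w1 is a hypothetical multiplicative weight strictly between 0 and 1.
    """
    if not 0.0 < w1 < 1.0:
        raise ValueError("w1 must be strictly between 0 and 1")
    # Assumption: the call with the higher confidence score wins.
    base, score = (base1, p1) if p1 >= p2 else (base2, p2)
    if base1 != base2:
        score *= w1  # lower the score because the two callers disagree
    return base, score
```

When both callers agree, the score passes through unchanged; when they disagree, the winning score is scaled down by `w1`.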

Classification Information from the Two Base Callers Differs or Does Not Match, Together with Specific Context Information (e.g., a Special Base Sequence)
As described above, the first base caller 1414 outputs a first called base and first confidence scores p1(A), p1(C), p1(G), p1(T). The second base caller 1416 outputs a second called base and second confidence scores p2(A), p2(C), p2(G), p2(T). In one example, for a given base, the first called base from the first base caller 1414 may not match the second called base from the second base caller 1416.

In one embodiment, if the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, and the mismatch is accompanied by one or more specific items of context information, that context information can be taken into account in determining the final called base.

FIG. 20A shows a LUT 2000 illustrating the operation of the base call combination module 1428 of FIG. 14 when (i) a special base sequence is detected and (ii) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416. For example, the first row of the LUT 2000 covers the scenario in which the first called base from the first base caller 1414 matches the second called base from the second base caller 1416 and a special base sequence is detected, such as a sequence having a homopolymer (e.g., GGGGG), adjacent homopolymers, or a near homopolymer (such as GGXGG). Further examples of such special sequences are discussed with respect to FIG. 19B. Because the first called base from the first base caller 1414 matches the second called base from the second base caller 1416, the final called base is the base called by both base callers, and the confidence score can be calculated according to FIG. 19B and/or according to any suitable operating scheme discussed herein. As also noted above, some examples of the special base sequences discussed in this disclosure (such as sequences having homopolymers, near homopolymers, or adjacent homopolymers) have five bases. However, such special base sequences may contain any other number of bases, such as 3, 5, 6, 7, 9, or another suitable number.

The second row of the LUT 2000 covers the scenario in which the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416 and a special base sequence is detected, such as a sequence having a homopolymer (e.g., GGGGG), adjacent homopolymers, or a near homopolymer (such as GGXGG). Further examples of such special sequences are discussed with respect to FIG. 19B. Because the two called bases do not match, the final called base is based on the second called base from the second base caller 1416 (e.g., for the reasons discussed with respect to FIGS. 19B and 19C, the second base caller 1416 is more reliable for such special base sequences). The confidence score for the final called base can be, for example, the minimum or the average (or another suitable function) of the corresponding confidence scores from the two base callers.

Classification Information from the Two Base Callers Differs or Does Not Match, Together with Specific Context Information (e.g., Bubble Detection)
FIG. 20B shows a LUT 2005 illustrating the operation of the base call combination module 1428 of FIG. 14 when (i) a bubble is detected in a cluster and (ii) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416.

For example, the first row of the LUT 2005 covers the scenario in which the first called base from the first base caller 1414 matches the second called base from the second base caller 1416 and no bubble is detected in any cluster. In that case, the final base call is performed according to any suitable operating scheme discussed herein.

次に、ＬＵＴ２００５の第２の行を参照して、（ｉ）第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致せず、（ｉｉ）気泡がクラスタ内で検出されるシナリオが論じられる。第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致しないので、最終の呼び出される塩基は、第１のベースコーラ１４１４からの第１の呼び出される塩基に基づく（例えば図１９Ｄに関して論じた理由により、第１のベースコーラ１４１４は気泡検出の場合により信頼性が高い）。最終の呼び出される塩基についての信頼スコアは、例えば、２つのベースコーラからの対応する信頼スコアの最小値又は平均値（又は別の適切な関数）であり得る。 Now, referring to the second row of LUT 2005, a scenario is discussed in which (i) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, and (ii) a bubble is detected in the cluster. Since the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, the final called base is based on the first called base from the first base caller 1414 (e.g., the first base caller 1414 is more reliable in the case of bubble detection, for reasons discussed with respect to FIG. 19D). The confidence score for the final called base can be, for example, the minimum or average (or another suitable function) of the corresponding confidence scores from the two base callers.

２つのベースコーラからの分類情報が、特定のコンテキスト情報（例えば、焦点外画像）とともに、異なるか、又は一致しない場合 When the classification information from the two base callers differs or does not match, along with certain context information (e.g., out-of-focus images)
図２０Ｃは、（ｉ）１つ以上の焦点外画像が少なくとも１つのクラスタから検出され、（ｉｉ）第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致しない場合の、図１４のベースコール結合モジュール１４２８の動作を示すＬＵＴ２０１０を示す。 FIG. 20C shows a LUT 2010 illustrating the operation of the base call combination module 1428 of FIG. 14 when (i) one or more out-of-focus images are detected from at least one cluster, and (ii) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416.

例えば、ＬＵＴ２０１０の第１の行を参照して、第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致し、焦点外画像がいずれのクラスタにおいても検出されないシナリオが論じられる。したがって、最終ベースコールは、本明細書で論じる任意の適切な動作方法に従って実行される。 For example, with reference to the first row of LUT 2010, a scenario is discussed in which a first called base from a first base caller 1414 matches a second called base from a second base caller 1416, and no out-of-focus images are detected in any cluster. Thus, the final base call is performed according to any suitable method of operation discussed herein.

次に、ＬＵＴ２０１０の第２の行を参照して、（ｉ）第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致せず、（ｉｉ）１つ以上の焦点外画像が少なくとも１つのクラスタから検出されるシナリオが論じられる。第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致しないので、最終の呼び出される塩基は、第１のベースコーラ１４１４からの第１の呼び出される塩基に基づく（例えば図１９Ｄ１に関して論じた理由により、第１のベースコーラ１４１４は、焦点外画像検出の場合により信頼性が高い）。最終の呼び出される塩基についての信頼スコアは、例えば、２つのベースコーラからの対応する信頼スコアの最小値又は平均値（又は別の適切な関数）であり得る。 Now, referring to the second row of the LUT 2010, a scenario is discussed in which (i) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, and (ii) one or more out-of-focus images are detected from at least one cluster. Since the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, the final called base is based on the first called base from the first base caller 1414 (e.g., the first base caller 1414 is more reliable for out-of-focus image detection, for reasons discussed with respect to FIG. 19D1). The confidence score for the final called base can be, for example, the minimum or average (or another suitable function) of the corresponding confidence scores from the two base callers.

２つのベースコーラからの分類情報が、特定のコンテキスト情報（エッジクラスタを示す空間的コンテキスト情報など）とともに、異なるか、又は一致しない場合 When the classification information from the two base callers differs or does not match, along with certain context information (such as spatial context information indicating an edge cluster)
図２０Ｄは、（ｉ）センサデータがエッジクラスタからのものであり、かつ（ｉｉ）第１のベースコーラ１４１４からの第１の呼び出される塩基が第２のベースコーラ１４１６からの第２の呼び出される塩基と一致しない場合の、図１４のベースコール結合モジュール１４２８の動作を示すＬＵＴ２０１５を示す。 FIG. 20D shows a LUT 2015 illustrating the operation of the base call combination module 1428 of FIG. 14 when (i) the sensor data is from an edge cluster, and (ii) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416.

例えば、ＬＵＴ２０１５の第１の行を参照して、第１のベースコーラ１４１４からの第１の呼び出される塩基が第２のベースコーラ１４１６からの第２の呼び出される塩基と一致し、センサデータのセットがエッジクラスタからのものであるシナリオが論じられる。したがって、最終ベースコールは、図１９Ｇに関して論じたものなど、本明細書で論じる任意の適切な動作方式に従って実行される。 For example, with reference to the first row of LUT 2015, a scenario is discussed in which a first called base from a first base caller 1414 matches a second called base from a second base caller 1416, and the set of sensor data is from an edge cluster. Thus, the final base call is performed according to any suitable operating scheme discussed herein, such as that discussed with respect to FIG. 19G.

次に、ＬＵＴ２０１５の第２の行を参照して、（ｉ）第１のベースコーラ１４１４からの第１の呼び出される塩基が、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致せず、（ｉｉ）センサデータがエッジクラスタからのものであるシナリオが論じられる。第１のベースコーラ１４１４からの第１の呼び出される塩基は、第２のベースコーラ１４１６からの第２の呼び出される塩基と一致しないので、最終の呼び出される塩基は、第１のベースコーラ１４１４からの第１の呼び出される塩基に基づく（例えば図１９Ｇに関して論じた理由により、第１のベースコーラ１４１４は、エッジクラスタに対してより信頼できる）。最終の呼び出される塩基についての信頼スコアは、例えば、２つのベースコーラからの対応する信頼スコアの最小値又は平均値（又は別の適切な関数）であり得る。 Now, referring to the second row of LUT 2015, a scenario is discussed in which (i) the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, and (ii) the sensor data is from an edge cluster. Since the first called base from the first base caller 1414 does not match the second called base from the second base caller 1416, the final called base is based on the first called base from the first base caller 1414 (e.g., the first base caller 1414 is more reliable for edge clusters, for reasons discussed with respect to FIG. 19G). The confidence score for the final called base can be, for example, the minimum or average (or another suitable function) of the corresponding confidence scores from the two base callers.
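Taken together, the second rows of LUTs 2000, 2005, 2010, and 2015 amount to a context-keyed choice of which base caller prevails on a mismatch. A table-driven Python sketch of that dispatch follows; the context key strings and the function name are hypothetical labels chosen for illustration.

```python
from statistics import mean

# Which base caller's call is used on a mismatch, per detected context
# (second row of each LUT). The key names are illustrative assumptions.
PREFERRED_ON_MISMATCH = {
    "special_sequence": 2,  # LUT 2000: homopolymer and similar sequences
    "bubble": 1,            # LUT 2005
    "out_of_focus": 1,      # LUT 2010
    "edge_cluster": 1,      # LUT 2015
}

def combine_on_mismatch(call1, conf1, call2, conf2, context, reduce=min):
    """Return (final_call, final_confidence) for a mismatch in a known context.

    Returns None when the calls match; that case is handled by whichever
    other combining method is in use. `reduce` may be min, mean, or another
    suitable function of the two confidence scores.
    """
    if call1 == call2:
        return None
    winner = PREFERRED_ON_MISMATCH[context]  # KeyError for unknown contexts
    final_call = call1 if winner == 1 else call2
    return final_call, reduce([conf1, conf2])
```

Keeping the preference in a dictionary mirrors the look-up-table framing of the figures: adding a new context only requires a new table entry, not new branching logic.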

潜在的な信頼できない信頼スコアの検出、及びそのような検出に基づくベースコーラ間の選択的切り替え Detection of potentially unreliable confidence scores, and selective switching between base callers based on such detection
図１９Ｂ及び図１９Ｃに関して本明細書で論じるように、第１のベースコーラ１４１４は、いくつかの特定の検出された塩基配列に対して、例えば、ホモポリマー（例えば、ＧＧＧＧＧ）、隣接ホモポリマー、又は近ホモポリマー（例えば、ＧＧＴＧＧ）を有する配列を呼び出すとき、満足に実行しない場合がある（例えば、第２のベースコーラに対して）。一実施形態では、いくつかのそのような配列について、第１のベースコーラ１４１４は、その呼び出される塩基について高い信頼スコアを生成することができるが、そのような高い信頼スコアは、呼び出される塩基についての真の信頼よりも高い可能性がある。したがって、例えば、そのような比較的高い信頼スコア（例えば、閾値より高い）が、ホモポリマー、又は隣接ホモポリマー若しくは近ホモポリマーを有する配列に対して第１のベースコーラ１４１４によって呼び出される場合、そのような高い信頼スコアは信頼できない可能性がある。いくつかのそのようなシナリオでは、第２のベースコーラ１４１６からの信頼スコアが使用され得る。 As discussed herein with respect to FIGS. 19B and 19C, the first base caller 1414 may not perform satisfactorily (e.g., relative to the second base caller 1416) for some particular detected base sequences, for example, when calling sequences with homopolymers (e.g., GGGGG), adjacent homopolymers, or near homopolymers (e.g., GGTGG). In one embodiment, for some such sequences, the first base caller 1414 may generate a high confidence score for its called base, but that score may be higher than the true confidence for the called base. Thus, for example, if such a relatively high confidence score (e.g., higher than a threshold) is generated by the first base caller 1414 for a sequence with a homopolymer, or adjacent homopolymers or near homopolymers, the score may be unreliable. In some such scenarios, the confidence score from the second base caller 1416 may be used instead.

一例では、ホモポリマー、隣接ホモポリマー、又は近ホモポリマーを有する配列について、配列の中央又は第３の呼び出される塩基に関連する信頼スコアｐ１（Ａ）、ｐ１（Ｃ）、ｐ１（Ｇ）、ｐ１（Ｔ）を変更することができる。例えば、隣接ホモポリマー（例えば、ＧＧＴＧＧ）を有する配列について仮定すると、第１のベースコーラ１４１４は、或る特定の信頼スコアを有する第３の塩基を呼び出し、ここで、Ｔである第３の塩基についての信頼スコアは、比較的高い（例えば、閾値より高い）。したがって、第２のベースコーラ１４１６からの信頼スコアｐ２（Ａ）、ｐ２（Ｃ）、ｐ２（Ｇ）、ｐ２（Ｔ）は、隣接ホモポリマー又は近ホモポリマーを有する５塩基配列の第３の塩基に使用することができ、最終信頼スコアを決定するために使用することができる。 In one example, for sequences with homopolymers, adjacent homopolymers, or near homopolymers, the confidence scores p1(A), p1(C), p1(G), p1(T) associated with the middle or third called base of the sequence can be modified. For example, assuming a sequence with adjacent homopolymers (e.g., GGTGG), the first base caller 1414 calls the third base with a certain confidence score, where the confidence score for the third base that is T is relatively high (e.g., higher than a threshold). Thus, the confidence scores p2(A), p2(C), p2(G), p2(T) from the second base caller 1416 can be used for the third base of a 5-base sequence with adjacent homopolymers or near homopolymers and can be used to determine the final confidence score.
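The score substitution for the middle base can be sketched in Python as follows. The threshold value of 0.9, the helper names, and the flank-based pattern test are assumptions made for illustration; the text only states that a threshold exists and that the second base caller's scores are used for the third base of the 5-base sequence.

```python
def is_special_window(window: str) -> bool:
    """True for 5-base windows such as GGGGG or GGTGG, where the two bases on
    each side of the middle base are all identical (illustrative test only)."""
    if len(window) != 5:
        raise ValueError("expected a 5-base window")
    flanks = window[:2] + window[3:]
    return len(set(flanks)) == 1

def scores_for_middle_base(p1, p2, window, threshold=0.9):
    """Choose which caller's per-base scores to use for the middle base.

    p1, p2: dicts mapping 'A', 'C', 'G', 'T' to the confidence scores from
    the first and second base callers. If the window is a special sequence
    and the first caller's top score exceeds the (hypothetical) threshold,
    that score is treated as potentially unreliable and p2 is used.
    Returns (scores, source_caller_index).
    """
    if is_special_window(window) and max(p1.values()) > threshold:
        return dict(p2), 2
    return dict(p1), 1
```

The returned scores can then feed whichever final-confidence calculation is in use.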

２つのベースコーラからの分類情報が異なるか又は一致しない場合の、不確定なベースコールを含む最終ベースコール Final base call including an uncertain base call when the classification information from the two base callers differs or does not match
一例では、２つのベースコーラからの分類情報が異なるか又は一致しない場合、最終ベースコール１４４０は、不確定なベースコール及び対応する信頼スコアを含み得る。例えば、様々な塩基についての最終信頼スコアは、最小、平均、最大、又は正規化された重み付けされた信頼スコアなどの、本明細書で論じられる手法のいずれかを使用して生成することができ、最終ベースコールは、不確定であると示すことができる。 In one example, when the classification information from the two base callers differs or does not match, the final base call 1440 may include an uncertain base call and a corresponding confidence score. For example, final confidence scores for the various bases may be generated using any of the approaches discussed herein, such as minimum, average, maximum, or normalized weighted confidence scores, and the final base call may be indicated as uncertain.

例えば、第１のベースコール分類情報１４３４が、呼び出される所与の塩基について、信頼スコアｐ１（Ａ）、ｐ１（Ｃ）、ｐ１（Ｇ）、ｐ１（Ｔ）、及びＡの第１の呼び出される塩基（例えば、ｐ１（Ａ）は、ｐ１（Ｃ）、ｐ１（Ｇ）、ｐ１（Ｔ）の各々よりも高いので）を含むと仮定する。また、第２のベースコール分類情報１４３６が、呼び出される所与の塩基について、信頼スコアｐ２（Ａ）、ｐ２（Ｃ）、ｐ２（Ｇ）、ｐ２（Ｔ）及びＣの第２の呼び出される塩基（例えば、ｐ２（Ｃ）がｐ２（Ａ）、ｐ２（Ｇ）、ｐ２（Ｔ）の各々よりも高いので）を含むと仮定する。２つのベースコールが一致しないので、最終ベースコールは「Ｎ」であり、ここで、一例では、「Ｎ」は不確定なベースコールを示す。別の例では、本明細書で論じられる特定の使用事例について、「Ｎ」は、塩基Ａ又はＣのいずれかを示し得る（すなわち、２つのベースコーラによって出力された第１及び第２のベースコール）。最終ベースコールＮには、本明細書で先に論じた式１～８のいずれかを使用して計算することができる最終信頼スコアを付加することができる。 For example, assume that first base call classification information 1434 includes, for a given base being called, confidence scores p1(A), p1(C), p1(G), p1(T), and a first called base of A (e.g., because p1(A) is higher than each of p1(C), p1(G), and p1(T)). Also assume that second base call classification information 1436 includes, for a given base being called, confidence scores p2(A), p2(C), p2(G), p2(T), and a second called base of C (e.g., because p2(C) is higher than each of p2(A), p2(G), and p2(T)). Because the two base calls do not match, the final base call is "N," where in one example "N" indicates an uncertain base call. In another example, for the particular use case discussed herein, "N" can represent either base A or C (i.e., the first and second base calls output by the two base callers). The final base call N can be appended with a final confidence score that can be calculated using any of Equations 1-8 discussed previously herein.
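The A-versus-C example above can be expressed as a short Python sketch. The function name and the returned tuple layout are illustrative assumptions, and simple averaging stands in for the final-confidence calculation, which the text leaves open (minimum, average, maximum, or a normalized weighted score).

```python
BASES = "ACGT"

def final_call_with_uncertainty(p1, p2):
    """Return (final_call, final_scores, candidates).

    p1, p2: dicts mapping each base to the confidence score from the first
    and second base callers. Each caller's called base is its highest-scoring
    base. When the callers disagree, the final call is 'N' and `candidates`
    lists the two competing calls.
    """
    call1 = max(BASES, key=lambda b: p1[b])
    call2 = max(BASES, key=lambda b: p2[b])
    # Averaging is one of the options named in the text, used here as a placeholder.
    final_scores = {b: (p1[b] + p2[b]) / 2 for b in BASES}
    if call1 == call2:
        return call1, final_scores, (call1,)
    return "N", final_scores, (call1, call2)
```

Returning the candidate pair alongside `"N"` supports both readings in the text: `"N"` as an uncertain call, or `"N"` as shorthand for "either A or C".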

ニューラルネットワークベースの最終ベースコール決定モジュール Neural network-based final base call determination module
図２１は、塩基配列を含む未知の検体のベースコールを予測するための、複数のベースコーラを含むベースコールシステム２１００を示し、ニューラルネットワークベースの最終ベースコール決定モジュール２１２８は、複数のベースコーラのうちの１つ以上の出力に基づいて最終ベースコール１４４０を決定する。一例では、最終ベースコール決定モジュール２１２８は、コンテキスト情報及び他の変数を考慮に入れて（例えば、図１９Ａに関して論じたように）、第１のベースコール分類情報１４３４と第２のベースコール分類情報１４３６とをどのように組み合わせて最終ベースコール１４４０を生成するかを決定する。システム２１００は、図１４のシステム１４００と少なくとも部分的に類似している。しかしながら、図１４のコンテキスト情報生成モジュール１４１８及びベースコール結合モジュール１４２８は、図２１のシステム２１００において、最終ベースコール決定モジュール２１２８によって置き換えられる。 FIG. 21 illustrates a base calling system 2100 including multiple base callers for predicting base calls for an unknown sample including a base sequence, where a neural network-based final base call determination module 2128 determines a final base call 1440 based on the output of one or more of the multiple base callers. In one example, the final base call determination module 2128 determines how to combine the first base call classification information 1434 and the second base call classification information 1436 to generate the final base call 1440, taking into account context information and other variables (e.g., as discussed with respect to FIG. 19A). The system 2100 is at least partially similar to the system 1400 of FIG. 14. However, the context information generation module 1418 and the base call combination module 1428 of FIG. 14 are replaced by the final base call determination module 2128 in the system 2100 of FIG. 21.

一例では、最終ベースコール決定モジュール２１２８は、２つのベースコーラ１４１４及び１４１６からの出力を使用して訓練されたニューラルネットワークベースのモジュールである。次いで、訓練された最終ベースコール決定モジュール２１２８は、ベースコールのために使用される。最終ベースコール決定モジュール２１２８の訓練は、本明細書で論じられる１つ以上の最終ベースコール決定動作に基づくことができる。図２１のシステム２１００の動作は、図１４に関する議論、及び本明細書に提示される最終ベースコール決定に関する更なる議論に基づいて明らかになるであろう。他の例では、最終ベースコール決定モジュール２１２８は、ロジスティック回帰モデル、勾配ブーストツリーモデル、ランダムフォレストモデル、ナイーブベイズモデルなどの別の適切な機械学習モデルであり得る。一例では、最終ベースコール決定モジュール２１２８は、２つの分類スコアを組み合わせて最終ベースコール１４４０を生成することができる任意の適切な機械学習モデルとすることができる。 In one example, the final base call determination module 2128 is a neural network-based module trained using the output from the two base callers 1414 and 1416. The trained final base call determination module 2128 is then used for base calling. The training of the final base call determination module 2128 can be based on one or more final base call determination operations discussed herein. The operation of the system 2100 of FIG. 21 will become clear based on the discussion regarding FIG. 14 and the further discussion regarding final base call determination presented herein. In other examples, the final base call determination module 2128 can be another suitable machine learning model, such as a logistic regression model, a gradient boosted tree model, a random forest model, a naive Bayes model, etc. In one example, the final base call determination module 2128 can be any suitable machine learning model that can combine the two classification scores to generate the final base call 1440.
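As one illustration of a learned combiner, the following Python sketch shows only the forward pass of a multinomial logistic regression over the eight concatenated scores from the two base callers. The weights `W` and biases `B` are hand-set placeholders, not trained values, and the names and input layout are assumptions; in practice the parameters would be learned from the two base callers' outputs on labeled data.

```python
import math

BASES = "ACGT"

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]

def predict_final_call(p1, p2, weights, bias):
    """Combine two callers' per-base scores into a final call.

    Input features: [p1(A), p1(C), p1(G), p1(T), p2(A), p2(C), p2(G), p2(T)].
    weights: 4 rows of 8 floats (one row per output base); bias: 4 floats.
    Returns (final_call, final_scores).
    """
    x = [p1[b] for b in BASES] + [p2[b] for b in BASES]
    logits = [sum(w * xi for w, xi in zip(row, x)) + b0
              for row, b0 in zip(weights, bias)]
    scores = dict(zip(BASES, softmax(logits)))
    return max(BASES, key=lambda b: scores[b]), scores

# Placeholder parameters (assumption): output base k attends to each caller's
# score for base k, trusting the second caller somewhat more than the first.
W = [[4.0 if j == k else 0.0 for j in range(4)] +
     [6.0 if j == k else 0.0 for j in range(4)] for k in range(4)]
B = [0.0, 0.0, 0.0, 0.0]
```

Context information could be appended to the feature vector as extra inputs (for example, a bubble or edge-cluster flag), with correspondingly wider weight rows.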

重み推定 Weight estimation
本開示全体を通して様々な重みが本明細書で論じられており、重みは、最終分類情報を生成する間に第１の分類情報１４３４及び第２の分類情報１４３６を重み付けするために使用される。重みを生成するために様々な技法を採用することができる。 Various weights are discussed throughout this disclosure and are used to weight the first classification information 1434 and the second classification information 1436 while generating the final classification information. Various techniques can be employed to generate the weights.

一例では、図２１の最終ベースコール決定モジュール２１２８の訓練されたニューラルネットワークモデルを使用して、重みを微調整することができる。別の例では、重みは、試行錯誤法又は別の適切な方法を使用して経験的に決定することもできる。更に別の例では、信頼スコアの予測共分散行列を経験的に推定し、重みを推定するために使用することができる。 In one example, the weights can be fine-tuned using a trained neural network model of the final base call determination module 2128 of FIG. 21. In another example, the weights can be empirically determined using trial and error or another suitable method. In yet another example, the predicted covariance matrix of the confidence scores can be empirically estimated and used to estimate the weights.
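The covariance-based option is not spelled out in the text. One plausible reading, offered here purely as an assumption, is a minimum-variance combination: given per-read errors of each base caller's confidence score on a labeled calibration set (a hypothetical input), the weights are chosen to minimize the variance of the combined error. For two callers this has a closed form, sketched below in Python.

```python
def min_variance_weights(err1, err2):
    """Estimate weights (w1, w2), w1 + w2 = 1, from paired confidence errors.

    err1, err2: equal-length sequences of errors (score minus ground-truth
    indicator) for the first and second base callers; these inputs are an
    assumed calibration set. Uses the sample covariance matrix
    [[v1, c12], [c12, v2]] and the closed form
        w1 = (v2 - c12) / (v1 + v2 - 2 * c12),
    clipped to [0, 1] so both weights stay non-negative.
    """
    n = len(err1)
    if n != len(err2) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    m1 = sum(err1) / n
    m2 = sum(err2) / n
    v1 = sum((a - m1) ** 2 for a in err1) / (n - 1)
    v2 = sum((b - m2) ** 2 for b in err2) / (n - 1)
    c12 = sum((a - m1) * (b - m2) for a, b in zip(err1, err2)) / (n - 1)
    denom = v1 + v2 - 2 * c12
    if denom <= 0:
        raise ValueError("errors are degenerate; weights are undefined")
    w1 = min(1.0, max(0.0, (v2 - c12) / denom))
    return w1, 1.0 - w1
```

With uncorrelated errors this reduces to inverse-variance weighting, so the caller with the smaller error variance receives the larger weight.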

ベースコールシステムアーキテクチャ Base calling system architecture
図２２は、一実装形態による、ベースコールシステム２２００のブロック図である。ベースコールシステム２２００は、生物学的物質又は化学物質のうちの少なくとも１つに関連する任意の情報又はデータを得るように動作することができる。いくつかの実装形態では、ベースコールシステム２２００は、ベンチトップデバイス又はデスクトップコンピュータと同様であり得るワークステーションである。例えば、所望の反応を実施するためのシステム及び構成要素の大部分（又は全て）は、共通のハウジング２２１６内にあってもよい。 FIG. 22 is a block diagram of a base calling system 2200 according to one implementation. The base calling system 2200 can operate to obtain any information or data related to at least one of a biological or chemical substance. In some implementations, the base calling system 2200 is a workstation, which can be similar to a benchtop device or desktop computer. For example, most (or all) of the systems and components for carrying out the desired reactions can be in a common housing 2216.

特定の実装形態では、ベースコールシステム２２００は、ｄｅｎｏｖｏｓｅｑｕｅｎｃｉｎｇ、全ゲノム又は標的ゲノム領域の再配列、及びメタゲノミクスを含むがこれらに限定されない、様々な用途のために構成された核酸配列決定システム（又はシーケンサ）である。シーケンサはまた、ＤＮＡ又はＲＮＡ分析に使用されてもよい。いくつかの実装形態では、ベースコールシステム２２００はまた、バイオセンサ内に反応部位を発生させるように構成されてもよい。例えば、ベースコールシステム２２００は、試料を受容し、試料由来のクローン的に増幅された核酸の表面付着クラスタを発生させるように構成され得る。各クラスタは、バイオセンサ内の反応部位を構成するか、又はその一部であってもよい。 In certain implementations, the base calling system 2200 is a nucleic acid sequencing system (or sequencer) configured for a variety of applications, including, but not limited to, de novo sequencing, resequencing of whole genomes or target genomic regions, and metagenomics. Sequencers may also be used for DNA or RNA analysis. In some implementations, the base calling system 2200 may also be configured to generate reaction sites within a biosensor. For example, the base calling system 2200 may be configured to receive a sample and generate surface-attached clusters of clonally amplified nucleic acids from the sample. Each cluster may constitute or be part of a reaction site within a biosensor.

例示的なベースコールシステム２２００は、バイオセンサ２２０２と相互作用して、バイオセンサ２２０２内で所望の反応を行うように構成されたシステム受け部又はインターフェース２２１２を含んでもよい。図２２に関して以下の説明では、バイオセンサ２２０２はシステム受け部２２１２内に装填される。しかしながら、バイオセンサ２２０２を含むカートリッジは、システム受け部２２１２に挿入されてもよく、一部の状態では、カートリッジは一時的又は永久的に除去され得ることが理解される。上述のように、カートリッジは、とりわけ、流体制御及び流体貯蔵構成要素を含んでもよい。 The exemplary base calling system 2200 may include a system receptacle or interface 2212 configured to interact with the biosensor 2202 to effect a desired reaction within the biosensor 2202. In the following description with respect to FIG. 22, the biosensor 2202 is loaded into the system receptacle 2212. However, it is understood that a cartridge containing the biosensor 2202 may be inserted into the system receptacle 2212, and that in some conditions the cartridge may be temporarily or permanently removed. As discussed above, the cartridge may include, among other things, fluid control and fluid storage components.

特定の実装形態では、ベースコールシステム２２００は、バイオセンサ２２０２内で多数の平行反応を行うように構成されている。バイオセンサ２２０２は、所望の反応が生じ得る１つ以上の反応部位を含む。反応部位は、例えば、バイオセンサの固体表面に固定化されてもよく、又はバイオセンサの対応する反応チャンバ内に位置するビーズ（又は他の可動基質）に固定化されてもよい。反応部位は、例えば、クローン的に増幅された核酸のクラスタを含むことができる。バイオセンサ２２０２は、固体撮像デバイス（例えば、ＣＣＤ又はＣＭＯＳイメージャ）及びそれに取り付けられたフローセルを含んでもよい。フローセルは、ベースコールシステム２２００から溶液を受容し、溶液を反応部位に向かって方向付ける１つ以上のフローチャネルを含んでもよい。任意選択的に、バイオセンサ２２０２は、熱エネルギーをフローチャネルの内外に伝達するための熱要素と係合するように構成することができる。 In certain implementations, the base calling system 2200 is configured to perform multiple parallel reactions within the biosensor 2202. The biosensor 2202 includes one or more reaction sites where a desired reaction can occur. The reaction sites may be immobilized, for example, on a solid surface of the biosensor or on beads (or other mobile substrates) located within corresponding reaction chambers of the biosensor. The reaction sites may include, for example, clusters of clonally amplified nucleic acids. The biosensor 2202 may include a solid-state imaging device (e.g., a CCD or CMOS imager) and a flow cell attached thereto. The flow cell may include one or more flow channels that receive solutions from the base calling system 2200 and direct the solutions toward the reaction sites. Optionally, the biosensor 2202 may be configured to engage a thermal element for transferring thermal energy into and out of the flow channel.

ベースコールシステム２２００は、相互に相互作用して、生物学的又は化学的分析のための所定の方法又はアッセイプロトコルを実行する、様々な構成要素、アセンブリ、及びシステム（又はサブシステム）を含んでもよい。例えば、ベースコールシステム２２００は、ベースコールシステム２２００の様々な構成要素、アセンブリ、及びサブシステムと通信してもよいシステムコントローラ２２０４、及びまたバイオセンサ２２０２を含む。例えば、システム受け部２２１２に加えて、ベースコールシステム２２００はまた、ベースコールシステム２２００及びバイオセンサ２２０２の流体ネットワーク全体にわたる流体の流れを制御するための流体制御システム２２０６と、バイオアッセイシステムによって使用され得る全ての流体（例えば、ガス又は液体）を保持するように構成された流体貯蔵システム２２０８と、流体ネットワーク、流体貯蔵システム２２０８、及び／又はバイオセンサ２２０２内の流体の温度を調節し得る温度制御システム２２１０と、並びにバイオセンサ２２０２を照明するように構成された照明システム２２０９と、を含み得る。上述のように、バイオセンサ２２０２を有するカートリッジがシステム受け部２２１２内に装填される場合、カートリッジはまた、流体制御及び流体貯蔵構成要素を含んでもよい。 The base calling system 2200 may include various components, assemblies, and systems (or subsystems) that interact with each other to perform a given method or assay protocol for biological or chemical analysis. For example, the base calling system 2200 includes a system controller 2204, which may communicate with the various components, assemblies, and subsystems of the base calling system 2200, and also a biosensor 2202. For example, in addition to the system receptacle 2212, the base calling system 2200 may also include a fluid control system 2206 for controlling the flow of fluids throughout the fluidic network of the base calling system 2200 and the biosensor 2202, a fluid storage system 2208 configured to hold all fluids (e.g., gas or liquid) that may be used by the bioassay system, a temperature control system 2210 that may regulate the temperature of the fluids in the fluidic network, the fluid storage system 2208, and/or the biosensor 2202, and an illumination system 2209 configured to illuminate the biosensor 2202. As described above, when a cartridge having a biosensor 2202 is loaded into the system receptacle 2212, the cartridge may also include fluid control and fluid storage components.

また示されるように、ベースコールシステム２２００は、ユーザと相互作用するユーザインターフェース２２１４を含んでもよい。例えば、ユーザインターフェース２２１４は、ユーザから情報を表示又は要求するディスプレイ２２１３と、ユーザ入力を受け取るためのユーザ入力デバイス２２１５と、を含むことができる。いくつかの実装形態では、ディスプレイ２２１３及びユーザ入力デバイス２２１５は、同じデバイスである。例えば、ユーザインターフェース２２１４は、個々のタッチの存在を検出し、またディスプレイ上のタッチの位置を識別するように構成されたタッチ感知ディスプレイを含んでもよい。しかしながら、マウス、タッチパッド、キーボード、キーパッド、ハンドヘルドスキャナ、音声認識システム、動き認識システムなどの他のユーザ入力デバイス２２１５が使用されてもよい。以下でより詳細に説明するように、ベースコールシステム２２００は、所望の反応を実行するために、バイオセンサ２２０２（例えば、カートリッジの形態）を含む様々な構成要素と通信してもよい。ベースコールシステム２２００はまた、バイオセンサから得られたデータを分析して、ユーザに所望の情報を提供するように構成されてもよい。 As also shown, the base calling system 2200 may include a user interface 2214 for interacting with a user. For example, the user interface 2214 may include a display 2213 for displaying or requesting information from a user, and a user input device 2215 for receiving user input. In some implementations, the display 2213 and the user input device 2215 are the same device. For example, the user interface 2214 may include a touch-sensitive display configured to detect the presence of an individual touch and to identify the location of the touch on the display. However, other user input devices 2215, such as a mouse, touchpad, keyboard, keypad, handheld scanner, voice recognition system, motion recognition system, etc. may be used. As described in more detail below, the base calling system 2200 may communicate with various components, including a biosensor 2202 (e.g., in the form of a cartridge), to perform the desired reaction. The base calling system 2200 may also be configured to analyze data obtained from the biosensor to provide the desired information to the user.

システムコントローラ２２０４は、マイクロコントローラ、低減命令セットコンピュータ（Reduced Instruction Set Computer、ＲＩＳＣ）、特定用途向け集積回路（ＡＳＩＣ）、フィールドプログラマブルゲートアレイ（ＦＰＧＡ）、論理回路、及び本明細書に記載される機能を実行することができる任意の他の回路又はプロセッサを使用するシステムを含む、任意のプロセッサベース又はマイクロプロセッサベースのシステムを含み得る。上記の実施例は、例示的なものに過ぎず、したがって、システムコントローラという用語の定義及び／又は意味を制限することを意図するものではない。例示的な実装形態では、システムコントローラ２２０４は、検出データを取得し分析する少なくとも１つのために、１つ以上の記憶要素、メモリ、又はモジュール内に記憶された命令のセットを実行する。検出データは、ピクセル信号の複数の配列を含むことができ、それにより、数百万個のセンサ（又はピクセル）のそれぞれからのピクセル信号の配列を、多くのベースコールサイクルにわたって検出することができる。記憶要素は、ベースコールシステム２２００内の情報源又は物理メモリ要素の形態であってもよい。 The system controller 2204 may include any processor-based or microprocessor-based system, including systems using microcontrollers, reduced instruction set computers (RISC), application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), logic circuits, and any other circuits or processors capable of performing the functions described herein. The above examples are merely exemplary and are therefore not intended to limit the definition and/or meaning of the term system controller. In an exemplary implementation, the system controller 2204 executes a set of instructions stored in one or more storage elements, memories, or modules for at least one of acquiring and analyzing detection data. The detection data may include multiple sequences of pixel signals, such that sequences of pixel signals from each of millions of sensors (or pixels) may be detected over many base call cycles. The storage elements may be in the form of information sources or physical memory elements within the base call system 2200.

命令セットは、本明細書に記載される様々な実装形態の方法及びプロセスなどの特定の動作を実行するようにベースコールシステム２２００又はバイオセンサ２２０２に指示する様々なコマンドを含んでもよい。命令のセットは、有形の非一時的コンピュータ可読媒体又は媒体の一部を形成し得るソフトウェアプログラムの形態であってもよい。本明細書で使用するとき、用語「ソフトウェア」及び「ファームウェア」は互換可能であり、ＲＡＭメモリ、ＲＯＭメモリ、ＥＰＲＯＭメモリ、ＥＥＰＲＯＭメモリ、及び不揮発性ＲＡＭ（non-volatile RAM、ＮＶＲＡＭ）メモリを含むコンピュータによって実行されるメモリに記憶された任意のコンピュータプログラムを含む。上記メモリタイプは、例示的なものに過ぎず、したがって、コンピュータプログラムの記憶に使用可能なメモリのタイプに限定されない。 The set of instructions may include various commands that instruct the base calling system 2200 or the biosensor 2202 to perform specific operations, such as the methods and processes of the various implementations described herein. The set of instructions may be in the form of a software program, which may form part of one or more tangible, non-transitory computer-readable media. As used herein, the terms "software" and "firmware" are interchangeable and include any computer program stored in memory for execution by a computer, including RAM memory, ROM memory, EPROM memory, EEPROM memory, and non-volatile RAM (NVRAM) memory. The above memory types are merely exemplary and thus do not limit the types of memory usable for storing a computer program.

ソフトウェアは、システムソフトウェア又はアプリケーションソフトウェアなどの様々な形態であってもよい。更に、ソフトウェアは、別個のプログラムの集合、又はより大きいプログラム内のプログラムモジュール若しくはプログラムモジュールの一部分の形態であってもよい。ソフトウェアはまた、オブジェクト指向プログラミングの形態のモジュール式プログラミングを含んでもよい。検出データを取得した後、検出データは、ユーザ入力に応じて処理されたベースコールシステム２２００によって自動的に処理されてもよく、又は別の処理マシン（例えば、通信リンクを介したリモート要求）によって行われる要求に応じて処理されてもよい。図示の実装形態では、システムコントローラ２２０４は、（図２３に示される）分析モジュール２３３８を含む。他の実装形態では、システムコントローラ２２０４は分析モジュール２３３８を含まず、代わりに分析モジュール２３３８へのアクセスを有する（例えば、分析モジュール２３３８は、クラウド上で別個にホスティングされ得る）。 The software may be in various forms, such as system software or application software. Furthermore, the software may be in the form of a collection of separate programs, or a program module or a portion of a program module within a larger program. The software may also include modular programming in the form of object-oriented programming. After the detection data is acquired, it may be processed automatically by the base calling system 2200, processed in response to user input, or processed in response to a request made by another processing machine (e.g., a remote request via a communication link). In the illustrated implementation, the system controller 2204 includes an analysis module 2338 (shown in FIG. 23). In other implementations, the system controller 2204 does not include the analysis module 2338 and instead has access to it (e.g., the analysis module 2338 may be separately hosted on the cloud).

システムコントローラ２２０４は、通信リンクを介して、バイオセンサ２２０２及びベースコールシステム２２００の他の構成要素に接続されてもよい。システムコントローラ２２０４はまた、オフサイトシステム又はサーバに通信可能に接続されてもよい。通信リンクは、配線、コード、又は無線であってもよい。システムコントローラ２２０４は、ユーザインターフェース２２１４及びユーザ入力デバイス２２１５からユーザ入力又はコマンドを受信してもよい。 The system controller 2204 may be connected to the biosensor 2202 and other components of the base calling system 2200 via a communication link. The system controller 2204 may also be communicatively connected to an off-site system or server. The communication link may be a wire, a cord, or wireless. The system controller 2204 may receive user input or commands from the user interface 2214 and the user input device 2215.

流体制御システム２２０６は、流体ネットワークを含み、流体ネットワークを通る１つ以上の流体の流れを方向付け、調節するように構成されている。流体ネットワークは、バイオセンサ２２０２及び流体貯蔵システム２２０８と流体連通していてもよい。例えば、流体貯蔵システム２２０８から流体を選択し、制御された方法でバイオセンサ２２０２に方向付けてもよく、又は流体は、バイオセンサ２２０２から引き出され、例えば、流体貯蔵システム２２０８内の廃棄物リザーバに方向付けられてもよい。図示されていないが、流体制御システム２２０６は、流体ネットワーク内の流体の流量又は圧力を検出する流量センサを含んでもよい。センサは、システムコントローラ２２０４と通信してもよい。 The fluid control system 2206 includes a fluid network and is configured to direct and regulate the flow of one or more fluids through the fluid network. The fluid network may be in fluid communication with the biosensor 2202 and the fluid storage system 2208. For example, fluid may be selected from the fluid storage system 2208 and directed to the biosensor 2202 in a controlled manner, or fluid may be drawn from the biosensor 2202 and directed to a waste reservoir, for example, in the fluid storage system 2208. Although not shown, the fluid control system 2206 may include a flow sensor that detects the flow rate or pressure of the fluid in the fluid network. The sensor may be in communication with the system controller 2204.

温度制御システム２２１０は、流体ネットワーク、流体貯蔵システム２２０８及び／又はバイオセンサ２２０２の異なる領域における流体の温度を調節するように構成されている。例えば、温度制御システム２２１０は、バイオセンサ２２０２とインターフェースし、バイオセンサ２２０２内の反応部位に沿って流れる流体の温度を制御する熱循環器を含んでもよい。温度制御システム２２１０はまた、ベースコールシステム２２００又はバイオセンサ２２０２の中実要素又は構成要素の温度を調節してもよい。図示されていないが、温度制御システム２２１０は、流体又は他の構成要素の温度を検出するためのセンサを含んでもよい。センサは、システムコントローラ２２０４と通信してもよい。 The temperature control system 2210 is configured to regulate the temperature of fluids in different regions of the fluid network, the fluid reservoir system 2208, and/or the biosensor 2202. For example, the temperature control system 2210 may include a thermal cycler that interfaces with the biosensor 2202 and controls the temperature of the fluid flowing along a reaction site in the biosensor 2202. The temperature control system 2210 may also regulate the temperature of solid elements or components of the base calling system 2200 or the biosensor 2202. Although not shown, the temperature control system 2210 may include sensors for detecting the temperature of the fluids or other components. The sensors may be in communication with the system controller 2204.

流体貯蔵システム２２０８は、バイオセンサ２２０２と流体連通しており、所望の反応を行うために使用される様々な反応成分又は反応物質を貯蔵してもよい。流体貯蔵システム２２０８はまた、流体ネットワーク及びバイオセンサ２２０２を洗浄又はクリーニングし、反応物質を希釈するための流体を貯蔵してもよい。例えば、流体貯蔵システム２２０８は、試料、試薬、酵素、他の生体分子、緩衝液、水性、及び非極性溶液などを保存するための様々なリザーバを含んでもよい。更に、流体貯蔵システム２２０８はまた、バイオセンサ２２０２から廃棄物を受容するための廃棄物リザーバを含んでもよい。カートリッジを含む実装形態では、カートリッジは、流体貯蔵システム、流体制御システム、又は温度制御システムのうちの１つ以上を含み得る。したがって、これらのシステムに関する本明細書に記載される構成要素のうちの１つ以上は、カートリッジハウジング内に収容され得る。例えば、カートリッジは、試料、試薬、酵素、他の生体分子、緩衝液、水性、及び非極性溶液、廃棄物などを保存するための様々なリザーバを有し得る。したがって、流体貯蔵システム、流体制御システム、又は温度制御システムのうちの１つ以上は、カートリッジ又は他のバイオセンサを介してバイオアッセイシステムと取り外し可能に係合され得る。 The fluid storage system 2208 is in fluid communication with the biosensor 2202 and may store various reaction components or reactants used to carry out the desired reaction. The fluid storage system 2208 may also store fluids for washing or cleaning the fluid network and the biosensor 2202 and diluting the reactants. For example, the fluid storage system 2208 may include various reservoirs for storing samples, reagents, enzymes, other biomolecules, buffers, aqueous, and non-polar solutions, etc. Additionally, the fluid storage system 2208 may also include a waste reservoir for receiving waste from the biosensor 2202. In implementations that include a cartridge, the cartridge may include one or more of a fluid storage system, a fluid control system, or a temperature control system. Thus, one or more of the components described herein with respect to these systems may be housed within the cartridge housing. For example, the cartridge may have various reservoirs for storing samples, reagents, enzymes, other biomolecules, buffers, aqueous, and non-polar solutions, waste, etc. Thus, one or more of the fluid storage system, the fluid control system, or the temperature control system may be removably engaged with the bioassay system via a cartridge or other biosensor.

照明システム２２０９は、バイオセンサを照明するための光源（例えば、１つ以上のＬＥＤ）及び複数の光学構成要素を含んでもよい。光源の例としては、レーザ、アークランプ、ＬＥＤ、又はレーザダイオードが挙げられる。光学構成要素は、例えば、反射器、偏光板、ビームスプリッタ、コリメータ、レンズ、フィルタ、ウェッジ、プリズム、鏡、検出器などであってもよい。照明システムを使用する実装形態では、照明システム２２０９は、励起光を反応部位に方向付けるように構成されてもよい。一例として、蛍光団は、緑色の光の波長によって励起されてもよく、そのため、励起光の波長は約５３２ｎｍであり得る。一実装形態では、照明システム２２０９は、バイオセンサ２２０２の表面の表面法線に平行な照明を生成するように構成されている。別の実装形態では、照明システム２２０９は、バイオセンサ２２０２の表面の表面法線に対してオフアングルである照明を生成するように構成されている。更に別の実装形態では、照明システム２２０９は、いくつかの平行照明及び或る程度のオフアングル照明を含む複数の角度を有する照明を生成するように構成されている。 The illumination system 2209 may include a light source (e.g., one or more LEDs) and multiple optical components for illuminating the biosensor. Examples of light sources include lasers, arc lamps, LEDs, or laser diodes. The optical components may be, for example, reflectors, polarizers, beam splitters, collimators, lenses, filters, wedges, prisms, mirrors, detectors, and the like. In implementations using an illumination system, the illumination system 2209 may be configured to direct excitation light to the reaction site. As an example, a fluorophore may be excited by a wavelength of green light, so the wavelength of the excitation light may be about 532 nm. In one implementation, the illumination system 2209 is configured to generate illumination parallel to a surface normal of the surface of the biosensor 2202. In another implementation, the illumination system 2209 is configured to generate illumination that is off-angled to the surface normal of the surface of the biosensor 2202. In yet another implementation, the illumination system 2209 is configured to generate illumination having multiple angles, including some parallel illumination and some off-angle illumination.

The system receptacle or interface 2212 is configured to engage the biosensor 2202 in at least one of a mechanical, electrical, and fluidic manner. The system receptacle 2212 can hold the biosensor 2202 in a desired orientation to facilitate fluid flow through the biosensor 2202. The system receptacle 2212 can also include electrical contacts configured to engage the biosensor 2202, such that the base calling system 2200 can communicate with and/or supply power to the biosensor 2202. Additionally, the system receptacle 2212 can include fluid ports (e.g., nozzles) configured to engage the biosensor 2202. In some implementations, the biosensor 2202 is removably coupled to the system receptacle 2212 mechanically, electrically, and fluidically.

加えて、ベースコールシステム２２００は、他のシステム若しくはネットワークと遠隔で、又は他のバイオアッセイシステム２２００と通信してもよい。バイオアッセイシステム（複数の場合もある）２２００によって得られた検出データは、リモートデータベースに記憶されてもよい。 In addition, the base calling system 2200 may communicate remotely with other systems or networks, or with other bioassay systems 2200. Detection data obtained by the bioassay system(s) 2200 may be stored in a remote database.

FIG. 23 is a block diagram of a system controller 2204 that can be used in the system of FIG. 22. In one implementation, the system controller 2204 includes one or more processors or modules that can communicate with one another. Each of the processors or modules may include an algorithm (e.g., instructions stored on a tangible and/or non-transitory computer-readable storage medium) or sub-algorithms for performing particular processes. The system controller 2204 is illustrated conceptually as a collection of modules, but may be implemented using any combination of dedicated hardware boards, DSPs, processors, etc. Alternatively, the system controller 2204 may be implemented using an off-the-shelf PC with a single processor or multiple processors, with the functional operations distributed among the processors. As a further option, the modules described below may be implemented using a hybrid configuration in which certain modular functions are performed using dedicated hardware, while the remaining modular functions are performed using an off-the-shelf PC or the like. The modules may also be implemented as software modules within a processing unit.

In operation, the communication port 2320 may transmit information (e.g., commands) to, or receive information (e.g., data) from, the biosensor 2202 (FIG. 22) and/or the subsystems 2206, 2208, 2210 (FIG. 22). In implementations, the communication port 2320 may output multiple arrays of pixel signals. The communication port 2320 may receive user input from the user interface 2214 (FIG. 22) and transmit data or information to the user interface 2214. Data from the biosensor 2202 or the subsystems 2206, 2208, 2210 may be processed by the system controller 2204 in real time during a bioassay session. Additionally or alternatively, the data may be stored temporarily in system memory during a bioassay session and processed slower than real time or offline.

図２３に示すように、システムコントローラ２２０４は、主制御モジュール２３３０と通信する複数のモジュール２３３１～２３３９を含んでもよい。主制御モジュール２３３０は、ユーザインターフェース２２１４と通信してもよい（図２２）。モジュール２３３１～２３３９は、主制御モジュール２３３０と直接通信するものとして示されているが、モジュール２３３１～２３３９はまた、互いに、ユーザインターフェース２２１４と、及びバイオセンサ２２０２と直接通信してもよい。また、モジュール２３３１～２３３９は、他のモジュールを介して主制御モジュール２３３０と通信してもよい。 As shown in FIG. 23, the system controller 2204 may include multiple modules 2331-2339 that communicate with a main control module 2330. The main control module 2330 may communicate with the user interface 2214 (FIG. 22). Although the modules 2331-2339 are shown as communicating directly with the main control module 2330, the modules 2331-2339 may also communicate directly with each other, with the user interface 2214, and with the biosensor 2202. The modules 2331-2339 may also communicate with the main control module 2330 via other modules.

複数のモジュール２３３１～２３３９は、サブシステム２２０６、２２０８、２２１０及び２２０９とそれぞれ通信するシステムモジュール２３３１～２３３３、２３３９を含む。流体制御モジュール２３３１は、流体ネットワークを通る１つ以上の流体の流れを制御するために、流体制御システム２２０６と通信して、流体ネットワークの弁及び流量センサを制御してもよい。流体貯蔵モジュール２３３２は、流体が低い場合、又は廃棄物リザーバが満タン容量又はほぼ満タン容量にあるときにユーザに通知することができる。流体貯蔵モジュール２３３２はまた、流体が所望の温度で貯蔵され得るように、温度制御モジュール２３３３と通信してもよい。照明モジュール２３３９は、所望の反応（例えば、結合事象）が生じた後など、プロトコル中に指定された時間で反応部位を照明するために、照明システム２２０９と通信してもよい。いくつかの実装形態では、照明モジュール２３３９は、照明システム２２０９と通信して、指定された角度で反応部位を照明することができる。 The plurality of modules 2331-2339 include system modules 2331-2333, 2339 that communicate with the subsystems 2206, 2208, 2210, and 2209, respectively. The fluid control module 2331 may communicate with the fluid control system 2206 to control valves and flow sensors of the fluid network to control the flow of one or more fluids through the fluid network. The fluid storage module 2332 may notify a user when fluid is low or when a waste reservoir is at or near full capacity. The fluid storage module 2332 may also communicate with a temperature control module 2333 so that the fluid may be stored at a desired temperature. The illumination module 2339 may communicate with the illumination system 2209 to illuminate the reaction site at a specified time during a protocol, such as after a desired reaction (e.g., a binding event) has occurred. In some implementations, the illumination module 2339 may communicate with the illumination system 2209 to illuminate the reaction site at a specified angle.

複数のモジュール２３３１～２３３９はまた、バイオセンサ２２０２と通信するデバイスモジュール２３３４と、バイオセンサ２２０２に関連する識別情報を決定する識別モジュール２３３５とを含んでもよい。デバイスモジュール２３３４は、例えば、システム受け部２２１２と通信して、バイオセンサがベースコールシステム２２００との電気的及び流体的接続を確立したことを確認することができる。識別モジュール２３３５は、バイオセンサ２２０２を識別する信号を受信してもよい。識別モジュール２３３５は、バイオセンサ２２０２の識別情報を使用して、他の情報をユーザに提供してもよい。例えば、識別モジュール２３３５は、ロット番号、製造日、又はバイオセンサ２２０２で動作することが推奨されるプロトコルを決定し、その後表示してもよい。 The plurality of modules 2331-2339 may also include a device module 2334 that communicates with the biosensor 2202 and an identification module 2335 that determines identification information associated with the biosensor 2202. The device module 2334 may, for example, communicate with the system receptacle 2212 to verify that the biosensor has established electrical and fluidic connection with the base calling system 2200. The identification module 2335 may receive a signal that identifies the biosensor 2202. The identification module 2335 may use the identification information of the biosensor 2202 to provide other information to the user. For example, the identification module 2335 may determine and then display the lot number, date of manufacture, or a recommended protocol to operate with the biosensor 2202.

複数のモジュール２３３１～２３３９はまた、バイオセンサ２２０２から信号データ（例えば、画像データ）を受信及び分析する分析モジュール２３３８（信号処理モジュール又は信号プロセッサとも呼ばれる）も含む。分析モジュール２３３８は、検出データを記憶するためのメモリ（例えば、ＲＡＭ又はフラッシュ）を含む。検出データは、ピクセル信号の複数の配列を含むことができ、それにより、数百万個のセンサ（又はピクセル）のそれぞれからのピクセル信号の配列を、多くのベースコールサイクルにわたって検出することができる。信号データは、その後の分析のために記憶されてもよく、又はユーザインターフェース２２１４に送信されて、所望の情報をユーザに表示することができる。いくつかの実装形態では、信号データは、分析モジュール２３３８が信号データを受信する前に、ソリッドステートイメージャ（例えば、ＣＭＯＳ画像センサ）によって処理され得る。 The modules 2331-2339 also include an analysis module 2338 (also referred to as a signal processing module or signal processor) that receives and analyzes signal data (e.g., image data) from the biosensor 2202. The analysis module 2338 includes memory (e.g., RAM or flash) for storing the detection data. The detection data can include multiple arrays of pixel signals, such that an array of pixel signals from each of millions of sensors (or pixels) can be detected over many base call cycles. The signal data can be stored for subsequent analysis or sent to the user interface 2214 to display desired information to the user. In some implementations, the signal data can be processed by a solid-state imager (e.g., a CMOS image sensor) before the analysis module 2338 receives the signal data.

The analysis module 2338 is configured to obtain image data from the photodetectors at each of a plurality of sequencing cycles. The image data is derived from the emission signals detected by the photodetectors. The analysis module 2338 processes the image data for each of the plurality of sequencing cycles through a neural network (e.g., a neural network-based template generator 2348, a neural network-based base caller 2358 (see, e.g., FIGS. 7, 9, and 10), and/or a neural network-based quality scorer 2368) and generates base calls for at least some of the analytes at each of the plurality of sequencing cycles.
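The last step of this per-cycle processing can be sketched as follows. This is a minimal illustration only: it assumes the neural network emits four raw scores per analyte per cycle, and the base order `ACTG` and the names `softmax` and `call_bases` are hypothetical, not taken from the disclosure. The raw scores are normalized into probabilities and the most probable base is called for each cycle.

```python
import math

BASES = "ACTG"  # score order is an assumption made for this sketch


def softmax(scores):
    """Normalize raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def call_bases(raw_scores_per_cycle):
    """Return one (called_base, probabilities) pair per sequencing cycle."""
    calls = []
    for scores in raw_scores_per_cycle:
        probs = softmax(scores)
        best = max(range(len(BASES)), key=lambda i: probs[i])
        calls.append((BASES[best], probs))
    return calls


# Two cycles of made-up raw scores for a single analyte.
example_scores = [[2.0, 0.1, 0.3, 0.2], [0.0, 0.1, 3.0, 0.2]]
read = "".join(base for base, _ in call_bases(example_scores))
```

Here `read` collects the called base of each cycle into a string, one character per cycle.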

プロトコルモジュール２３３６及び２３３７は、主制御モジュール２３３０と通信して、所定のアッセイプロトコルを実施する際にサブシステム２２０６、２２０８及び２２１０の動作を制御する。プロトコルモジュール２３３６及び２３３７は、所定のプロトコルに従って特定の動作を実行するようにベースコールシステム２２００に指示するための命令セットを含み得る。図示のように、プロトコルモジュールは、配列決定ごとの合成プロセスを実行するための様々なコマンドを発行するように構成された、合成による配列決定（Sequencing-By-Synthesis、ＳＢＳ）モジュール２３３６であってもよい。ＳＢＳにおいて、核酸テンプレートに沿った核酸プライマーの伸長を監視して、テンプレート中のヌクレオチド配列を判定する。下にある化学プロセスは、重合（例えば、ポリメラーゼ酵素により触媒される）又はライゲーション（例えば、リガーゼ酵素により触媒される）であり得る。特定のポリマー系ＳＢＳの実装形態では、プライマーに付加されるヌクレオチドの順序及びタイプの検出を使用してテンプレートの配列を判定することができるように、蛍光標識ヌクレオチドをテンプレート依存様式でプライマー（それによってプライマーを伸長させる）に添加する。例えば、第１のＳＢＳサイクルを開始するために、１つ以上の標識されたヌクレオチド、ＤＮＡポリメラーゼなどを、核酸テンプレートのアレイを収容するフローセル内に／それを介して送達することができる。核酸テンプレートは、対応する反応部位に位置してもよい。プライマー伸長が、組み込まれる標識ヌクレオチドを、撮像イベントを通して検出することができる、これらの反応部位が検出され得る。撮像事象の間、照明システム２２０９は、反応部位に励起光を提供することができる。任意選択的に、ヌクレオチドは、ヌクレオチドがプライマーに付加されると、更なるプライマー伸長を終結する可逆的終結特性を更に含むことができる。例えば、可逆的ターミネーター部分を有するヌクレオチド類似体をプライマーに付加して、デブロッキング剤が送達されて当該部分が除去されるまで、その後の伸長が起こり得ないようにすることができる。したがって、可逆的終結を使用する実装形態では、フローセルにデブロッキング試薬を送達するためのコマンドが与えられ得る（検出が起こる前又は後）。１つ以上のコマンドは、様々な送達ステップ間の洗浄をもたらすために与えられ得る。次いで、サイクルをｎ回繰り返してプライマーをｎヌクレオチドだけ伸長させ、それによって長さｎの配列を検出することができる。例示的な配列決定技術は、例えば、Ｂｅｎｔｌｅｙら、Ｎａｔｕｒｅ４５６：５３－５９（２００８）、国際公開第０４／０１８４９７号、米国特許第７，０５７，０２６号、国際公開第９１／０６６７８号、国際公開第０７／１２３７４４号、米国特許第７，３２９，４９２号、米国特許第７，２１１，４１４号、米国特許第７，３１５，０１９号、及び米国特許第７，４０５，２８１号に記載されており、これらの各々は、参照により本明細書に組み込まれる。 Protocol modules 2336 and 2337 communicate with main control module 2330 to control the operation of subsystems 2206, 2208, and 2210 in carrying out a predetermined assay protocol. Protocol modules 2336 and 2337 may include instruction sets for instructing base calling system 2200 to perform specific operations according to a predetermined protocol. As shown, the protocol module may be a Sequencing-By-Synthesis (SBS) module 2336 configured to issue various commands to carry out a sequencing-by-synthesis process. In SBS, the extension of a nucleic acid primer along a nucleic acid template is monitored to determine the sequence of nucleotides in the template. 
The underlying chemical process may be polymerization (e.g., catalyzed by a polymerase enzyme) or ligation (e.g., catalyzed by a ligase enzyme). In particular polymerase-based SBS implementations, fluorescently labeled nucleotides are added to the primer (thereby extending the primer) in a template-dependent manner, such that detection of the order and type of nucleotides added to the primer can be used to determine the sequence of the template. For example, to initiate a first SBS cycle, one or more labeled nucleotides, DNA polymerase, etc. can be delivered into/through a flow cell that houses an array of nucleic acid templates. The nucleic acid templates may be located at corresponding reaction sites. Those reaction sites where primer extension causes a labeled nucleotide to be incorporated can be detected through an imaging event. During the imaging event, the illumination system 2209 can provide excitation light to the reaction sites. Optionally, the nucleotides can further include a reversible termination property that terminates further primer extension once a nucleotide has been added to the primer. For example, a nucleotide analog having a reversible terminator moiety can be added to the primer such that subsequent extension cannot occur until a deblocking agent is delivered to remove the moiety. Thus, in implementations that use reversible termination, a command can be given to deliver a deblocking reagent to the flow cell (before or after detection occurs). One or more commands can be given to effect washes between the various delivery steps. The cycle can then be repeated n times to extend the primer by n nucleotides, thereby detecting a sequence of length n. Exemplary sequencing techniques are described, for example, in Bentley et al., Nature 456:53-59 (2008), WO 04/018497, U.S. Pat. No. 7,057,026, WO 91/06678, WO 07/123744, U.S. Pat. No. 7,329,492, U.S. Pat. No. 7,211,414, U.S. Pat. No. 7,315,019, and U.S. Pat. No. 7,405,281, each of which is incorporated herein by reference.
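The repeated SBS cycle described above can be sketched as a sequence of commands. This is a simplified illustration under stated assumptions: the command names are hypothetical, and the sketch places deblocking after detection, although the text allows it before or after.

```python
def sbs_commands(n_cycles, reversible_terminator=True):
    """Yield (cycle_number, command) pairs for a simplified SBS run.

    Command names are illustrative placeholders, not instrument commands.
    """
    for cycle in range(1, n_cycles + 1):
        yield cycle, "deliver_labeled_nucleotides_and_polymerase"
        yield cycle, "wash"
        yield cycle, "illuminate_and_image"  # the imaging event
        if reversible_terminator:
            # Remove the terminator so the next cycle can extend the primer.
            yield cycle, "deliver_deblocking_reagent"
            yield cycle, "wash"


# Three cycles extend the primer by three nucleotides.
commands = list(sbs_commands(3))
```

Each pass through the loop corresponds to one incorporated nucleotide, so running n cycles yields a read of length n.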

ＳＢＳサイクルのヌクレオチド送達ステップでは、単一のタイプのヌクレオチドのいずれかを一度に送達することができ、又は複数の異なるヌクレオチドタイプ（例えば、Ａ、Ｃ、Ｔ、及びＧを一緒に）を送達することができる。一度に単一のタイプのヌクレオチドのみが存在するヌクレオチド送達構成では、異なるヌクレオチドは、個別化された送達に固有の時間的分離に基づいて区別することができるため、異なるヌクレオチドは別個の標識を有する必要はない。したがって、配列決定方法又は装置は、単一の色検出を使用することができる。例えば、励起源は、単一の波長又は単一の波長範囲の励起のみを提供する必要がある。或る時点で、送達がフローセル内に存在する複数の異なるヌクレオチドをもたらすヌクレオチド送達構成では、異なるヌクレオチドタイプを組み込む部位は、混合物中のそれぞれのヌクレオチドタイプに付着された異なる蛍光標識に基づいて区別することができる。例えば、４つの異なる蛍光団のうちの１つをそれぞれ有する４つの異なるヌクレオチドを使用することができる。一実装形態では、４つの異なる蛍光団は、スペクトルの４つの異なる領域における励起を使用して区別することができる。例えば、４つの異なる励起放射線源を使用することができる。あるいは、４つ未満の異なる励起源を使用することができるが、単一源からの励起放射線の光学的濾過を使用して、フローセルにおいて異なる励起放射線の範囲を生成することができる。 In the nucleotide delivery step of the SBS cycle, any one of a single type of nucleotide can be delivered at a time, or multiple different nucleotide types (e.g., A, C, T, and G together) can be delivered. In nucleotide delivery configurations where only a single type of nucleotide is present at a time, different nucleotides do not need to have separate labels, since they can be distinguished based on the temporal separation inherent to the individualized delivery. Thus, the sequencing method or device can use a single color detection. For example, the excitation source only needs to provide excitation at a single wavelength or a single wavelength range. In nucleotide delivery configurations where delivery results in multiple different nucleotides being present in the flow cell at a given time, the sites incorporating different nucleotide types can be distinguished based on the different fluorescent labels attached to each nucleotide type in the mixture. For example, four different nucleotides can be used, each with one of four different fluorophores. In one implementation, the four different fluorophores can be distinguished using excitation in four different regions of the spectrum. For example, four different excitation radiation sources can be used. 
Alternatively, fewer than four different excitation sources can be used, with optical filtering of the excitation radiation from a single source used to produce different ranges of excitation radiation at the flow cell.

いくつかの実装形態では、４つ未満の異なる色を、４つの異なるヌクレオチドを有する混合物中で検出することができる。例えば、ヌクレオチドの対は、同じ波長で検出することができるが、他と比較して対のうちの１つのメンバーに対する強度の差に基づいて、又は、対の他のメンバーについて検出された信号と比較して明らかな信号を出現又は消失させる、対の１つのメンバーへの変化（例えば、化学修飾、光化学修飾、又は物理的改質を行うことを介して）に基づいて区別され得る。４個未満の色の検出を使用して４個の異なるヌクレオチドを区別するための例示的な装置及び方法が、例えば、米国特許出願第６１／５３８，２９４号及び同第６１／６１９，８７８号に記載されており、それらの全体が参照により本明細書に組み込まれる。２０１２年９月２１日に出願された米国特許出願第１３／６２４，２００号は、その全体が参照により組み込まれる。 In some implementations, fewer than four different colors can be detected in a mixture having four different nucleotides. For example, pairs of nucleotides can be detected at the same wavelength but can be distinguished based on a difference in intensity for one member of the pair compared to the other, or based on a change to one member of the pair (e.g., via making a chemical modification, photochemical modification, or physical modification) that causes a distinct signal to appear or disappear compared to the signal detected for the other member of the pair. Exemplary devices and methods for distinguishing four different nucleotides using detection of fewer than four colors are described, for example, in U.S. Patent Application Nos. 61/538,294 and 61/619,878, which are incorporated by reference in their entireties. U.S. Patent Application No. 13/624,200, filed September 21, 2012, is incorporated by reference in its entirety.
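The general idea of resolving four nucleotides with fewer than four detection channels can be illustrated with a small decoding sketch. It assumes two channels whose on/off combinations identify the base. The mapping and the threshold are hypothetical choices made for this example and are not the specific schemes of the applications cited above.

```python
# Hypothetical encoding: (channel_1_on, channel_2_on) -> nucleotide.
ENCODING = {
    (True, True): "A",
    (True, False): "C",
    (False, True): "T",
    (False, False): "G",  # no signal in either channel
}


def decode_two_channel(intensity_1, intensity_2, threshold=0.5):
    """Call a base from two normalized channel intensities."""
    return ENCODING[(intensity_1 >= threshold, intensity_2 >= threshold)]


# Made-up normalized intensities for four reaction sites.
sites = [(0.9, 0.8), (0.7, 0.1), (0.05, 0.9), (0.1, 0.0)]
calls = [decode_two_channel(a, b) for a, b in sites]
```

A scheme based on intensity differences within a single channel would replace the single threshold with multiple intensity levels per channel.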

複数のプロトコルモジュールはまた、バイオセンサ２２０２内の製品を増幅するための流体制御システム２２０６及び温度制御システム２２１０にコマンドを発行するように構成された試料調製（又は発生）モジュール２３３７を含んでもよい。例えば、バイオセンサ２２０２は、ベースコールシステム２２００に係合されてもよい。増幅モジュール２３３７は、バイオセンサ２２０２内の反応チャンバに必要な増幅成分を送達するために、流体制御システム２２０６に命令を発行することができる。他の実装形態では、反応部位は、テンプレートＤＮＡ及び／又はプライマーなどの増幅のためのいくつかの成分を既に含有していてもよい。増幅成分を反応チャンバに送達した後、増幅モジュール２３３７は、既知の増幅プロトコルに従って異なる温度段階を通して温度制御システム２２１０にサイクルするように指示し得る。いくつかの実装形態では、増幅及び／又はヌクレオチドの取り込みは、等温的に実行される。 The multiple protocol modules may also include a sample preparation (or generation) module 2337 configured to issue commands to the fluid control system 2206 and the temperature control system 2210 to amplify the product in the biosensor 2202. For example, the biosensor 2202 may be engaged to the base calling system 2200. The amplification module 2337 can issue instructions to the fluid control system 2206 to deliver the necessary amplification components to a reaction chamber in the biosensor 2202. In other implementations, the reaction site may already contain some components for amplification, such as template DNA and/or primers. After delivering the amplification components to the reaction chamber, the amplification module 2337 can instruct the temperature control system 2210 to cycle through different temperature steps according to a known amplification protocol. In some implementations, the amplification and/or incorporation of nucleotides is performed isothermally.

The SBS module 2336 can issue commands to perform bridge PCR, in which clusters of clonal amplicons are formed on localized areas within a channel of the flow cell. After the amplicons are generated through bridge PCR, they may be "linearized" to produce single-stranded template DNA (sstDNA), and a sequencing primer may be hybridized to a universal sequence that flanks a region of interest. For example, a reversible terminator-based sequencing-by-synthesis method can be used as set forth above or as follows.

Each base calling or sequencing cycle can extend the sstDNA by a single base, which can be accomplished, for example, by using a modified DNA polymerase and a mixture of four types of nucleotides. The different types of nucleotides can have unique fluorescent labels, and each nucleotide can further have a reversible terminator that allows only a single-base incorporation to occur in each cycle. After a single base is added to the sstDNA, excitation light can be directed onto the reaction sites and the fluorescent emissions can be detected. After detection, the fluorescent label and the terminator can be chemically cleaved from the sstDNA. Another similar base calling or sequencing cycle may follow. In such a sequencing protocol, the SBS module 2336 can instruct the fluid control system 2206 to direct a flow of reagent and enzyme solutions through the biosensor 2202. Exemplary reversible terminator-based SBS methods that can be utilized with the apparatus and methods described herein are described in U.S. Patent Application Publication No. 2007/0166705 (A1), U.S. Patent Application Publication No. 2006/0188901 (A1), U.S. Pat. No. 7,057,026, U.S. Patent Application Publication No. 2006/0240439 (A1), U.S. Patent Application Publication No. 2006/02814714709 (A1), WO 05/065814, and WO 06/064199, each of which is incorporated herein by reference in its entirety. Exemplary reagents for reversible terminator-based SBS are described in U.S. Pat. No. 7,541,444, U.S. Pat. No. 7,057,026, U.S. Pat. No. 7,427,673, U.S. Pat. No. 7,566,537, and U.S. Pat. No. 7,592,435, each of which is incorporated herein by reference in its entirety.

いくつかの実装形態では、増幅及びＳＢＳモジュールは、単一のアッセイプロトコルで動作してもよく、例えば、テンプレート核酸は増幅され、続いて同じカートリッジ内で配列される。 In some implementations, the amplification and SBS modules may operate in a single assay protocol, e.g., template nucleic acids are amplified and subsequently sequenced within the same cartridge.

ベースコールシステム２２００はまた、ユーザがアッセイプロトコルを再構成することを可能にし得る。例えば、ベースコールシステム２２００は、決定されたプロトコルを修正するために、ユーザインターフェース２２１４を通じてユーザにオプションを提供することができる。例えば、バイオセンサ２２０２が増幅のために使用されると決定された場合、ベースコールシステム２２００は、アニーリングサイクルの温度を要求し得る。 The base calling system 2200 may also allow the user to reconfigure the assay protocol. For example, the base calling system 2200 may provide the user with an option through the user interface 2214 to modify the determined protocol. For example, if it is determined that the biosensor 2202 is to be used for amplification, the base calling system 2200 may request the temperature of the annealing cycle.

更に、ベースコールシステム２２００は、選択されたアッセイプロトコルに対して一般的に許容されないユーザ入力をユーザが提供した場合に、ユーザに警告を発行し得る。 Furthermore, the base calling system 2200 may issue a warning to the user if the user provides user input that is not generally acceptable for the selected assay protocol.
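A check of this kind might look like the following sketch. The protocol name, parameter name, and temperature range are hypothetical values chosen only for illustration; the disclosure does not specify acceptable ranges.

```python
# Hypothetical acceptable ranges per protocol; the values are illustrative only.
ACCEPTABLE_RANGES = {
    "amplification": {"annealing_temp_c": (45.0, 72.0)},
}


def check_user_input(protocol, parameter, value):
    """Return a warning string for an unacceptable input, otherwise None."""
    limits = ACCEPTABLE_RANGES.get(protocol, {}).get(parameter)
    if limits is None:
        return f"warning: '{parameter}' is not a recognized setting for protocol '{protocol}'"
    low, high = limits
    if not low <= value <= high:
        return (
            f"warning: {parameter}={value} is outside the range "
            f"{low}-{high} for protocol '{protocol}'"
        )
    return None  # input is acceptable, no warning needed
```

For example, `check_user_input("amplification", "annealing_temp_c", 95.0)` returns a warning under these assumed limits, while a value of 60.0 returns `None`.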

In implementations, the biosensor 2202 includes millions of sensors (or pixels), each of which generates a sequence of pixel signals over successive base calling cycles. The analysis module 2338 detects the multiple sequences of pixel signals and attributes them to the corresponding sensors (or pixels) according to the row-wise and/or column-wise locations of the sensors on the array of sensors.
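The attribution of pixel signals to sensors by row and column can be sketched as follows, assuming each cycle delivers one two-dimensional frame of signals. The function name `sequences_by_sensor` and the tiny 2x2 frames are hypothetical and serve only to show the bookkeeping.

```python
def sequences_by_sensor(frames):
    """Group per-cycle frames into one signal sequence per sensor.

    frames: list over cycles of 2-D lists indexed as frame[row][col].
    Returns {(row, col): [signal at cycle 0, signal at cycle 1, ...]}.
    """
    sequences = {}
    for frame in frames:
        for r, row in enumerate(frame):
            for c, signal in enumerate(row):
                sequences.setdefault((r, c), []).append(signal)
    return sequences


frames = [
    [[10, 20], [30, 40]],  # cycle 0
    [[11, 21], [31, 41]],  # cycle 1
    [[12, 22], [32, 42]],  # cycle 2
]
per_sensor = sequences_by_sensor(frames)
```

Each key identifies a sensor by its position on the array, and each value is that sensor's signal sequence across the base calling cycles.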

Each sensor in the array of sensors can generate sensor data for a tile of a flow cell, where a tile is an area on the flow cell in which clusters of genetic material are disposed during the base calling operation. The sensor data can include image data in an array of pixels. For a given cycle, the sensor data can include two or more images, yielding multiple features per pixel as the tile data.

FIG. 24 is a simplified block diagram of a computer system 2400 that can be used to implement the disclosed technology. The computer system 2400 includes at least one central processing unit (CPU) 2472 that communicates with a number of peripheral devices via a bus subsystem 2455. These peripheral devices can include, for example, a storage subsystem 2410 including memory devices and a file storage subsystem 2436, user interface input devices 2438, user interface output devices 2476, and a network interface subsystem 2474. The input and output devices allow user interaction with the computer system 2400. The network interface subsystem 2474 provides an interface to outside networks, including an interface to corresponding interface devices in other computer systems.

The user interface input devices 2438 can include a keyboard; pointing devices such as a mouse, trackball, touchpad, or graphics tablet; a scanner; a touch screen incorporated into the display; audio input devices such as voice recognition systems and microphones; and other types of input devices. In general, use of the term "input device" is intended to include all possible types of devices and ways to input information into the computer system 2400.

The user interface output devices 2476 can include a display subsystem, a printer, a fax machine, or non-visual displays such as audio output devices. The display subsystem can include an LED display, a cathode ray tube (CRT), a flat-panel device such as a liquid crystal display (LCD), a projection device, or some other mechanism for creating a visible image. The display subsystem can also provide a non-visual display, such as an audio output device. In general, use of the term "output device" is intended to include all possible types of devices and ways to output information from the computer system 2400 to the user or to another machine or computer system.

記憶サブシステム２４１０は、本明細書に記載されるモジュール及び方法のうちのいくつか又は全ての機能を提供するプログラミング及びデータ構築物を記憶する。これらのソフトウェアモジュールは、概して、深層学習プロセッサ２４７８によって実行される。 The storage subsystem 2410 stores programming and data constructs that provide the functionality of some or all of the modules and methods described herein. These software modules are generally executed by the deep learning processor 2478.

In one implementation, the neural network is implemented using the deep learning processor 2478, which can be a configurable and reconfigurable processor, a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), a coarse-grained reconfigurable architecture (CGRA), a graphics processing unit (GPU), and/or another configured device. The deep learning processor 2478 can be hosted by a deep learning cloud platform such as Google Cloud Platform™, Xilinx™, and Cirrascale™. Examples of the deep learning processor 2478 include Google's Tensor Processing Unit (TPU)™, rackmount solutions such as GX4 Rackmount Series™ and GX149 Rackmount Series™, NVIDIA DGX-1™, Microsoft's Stratix V FPGA™, Graphcore's Intelligent Processor Unit (IPU)™, Qualcomm's Zeroth Platform™ with Snapdragon processors™, NVIDIA's Volta™, NVIDIA's DRIVE PX™, NVIDIA's JETSON TX1/TX2 MODULE™, Intel's Nirvana™, Movidius VPU™, Fujitsu's DPI™, ARM's DynamicIQ™, IBM's TrueNorth™, and others.

記憶サブシステム２４１０で使用されるメモリサブシステム２４２２は、プログラム実行中に命令及びデータを記憶するためのメインランダムアクセスメモリ（random access memory、ＲＡＭ）２４３４と、固定命令が記憶された読み取り専用メモリ（read only memory、ＲＯＭ）２４３２とを含む多数のメモリを含むことができる。ファイル記憶サブシステム２４３６は、プログラム及びデータファイルのための永続的な記憶装置を提供することができ、ハードディスクドライブ、関連する取り外し可能な媒体を伴うフロッピーディスク、ＣＤ－ＲＯＭドライブ、光学ドライブ、又は取り外し可能な媒体カートリッジを含むことができる。或る特定の実装形態の機能を実装するモジュールは、記憶サブシステム２４１０内のファイル記憶サブシステム２４３６によって、又はプロセッサによってアクセス可能な他のマシン内に記憶することができる。 The memory subsystem 2422 used in the storage subsystem 2410 may include multiple memories, including a main random access memory (RAM) 2434 for storing instructions and data during program execution, and a read only memory (ROM) 2432 in which fixed instructions are stored. The file storage subsystem 2436 may provide persistent storage for program and data files, and may include a hard disk drive, a floppy disk with associated removable media, a CD-ROM drive, an optical drive, or a removable media cartridge. Modules implementing the functionality of a particular implementation may be stored by the file storage subsystem 2436 in the storage subsystem 2410, or in another machine accessible by the processor.

バスサブシステム２４５５は、コンピュータシステム２４００の様々な構成要素及びサブシステムを、意図されるように互いに通信させるための機構を提供する。バスサブシステム２４５５は、単一のバスとして概略的に示されているが、バスサブシステムの代替の実装形態は、複数のバスを使用することができる。 Bus subsystem 2455 provides a mechanism for allowing the various components and subsystems of computer system 2400 to communicate with each other as intended. Although bus subsystem 2455 is shown diagrammatically as a single bus, alternative implementations of the bus subsystem may use multiple buses.

The computer system 2400 itself can be of varying types, including a personal computer, a portable computer, a workstation, a computer terminal, a network computer, a television, a mainframe, a server farm, a distributed set of loosely networked computers, or any other data processing system or user device. Due to the ever-changing nature of computers and networks, the description of the computer system 2400 depicted in FIG. 24 is intended only as a specific example for purposes of illustrating the preferred implementations of the present invention. Many other configurations of the computer system 2400 are possible, having more or fewer components than the computer system depicted in FIG. 24.

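Clauses
Clause Set 1 (Generating a Final Classification from the Classification Information of Two Base Callers)
1. A computer-implemented method for base calling using at least two base callers, the method comprising:
executing at least a first base caller and a second base caller on sensor data generated for sensing cycles in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data, based on executing the first base caller on the sensor data;
generating, by the second base caller, second classification information associated with the sensor data, based on executing the second base caller on the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information including one or more base calls for the sensor data.
2. The method of clause 1, wherein at least one of the first base caller and the second base caller implements a non-linear function, and at least another of the first base caller and the second base caller is at least in part linear.
3. The method of clause 1, wherein at least one of the first base caller and the second base caller implements a neural network model, and at least another of the first base caller and the second base caller does not include a neural network model.
4. The method of clause 1, wherein:
the first classification information generated by the first base caller includes, for each base calling cycle, (i) a first plurality of scores, each score of the first plurality of scores indicating a probability that the base to be called is one of A, C, T, or G, and (ii) a first called base; and
the second classification information generated by the second base caller includes, for each base calling cycle, (i) a second plurality of scores, each score of the second plurality of scores indicating a probability that the base to be called is one of A, C, T, or G, and (ii) a second called base.
5. The method of clause 4, wherein:
the final classification information includes, for each base calling cycle, (i) a third plurality of scores, each score of the third plurality of scores indicating a probability that the base to be called is one of A, C, T, or G, and (ii) a final called base.
6. The method of clause 4, wherein at least one of the first base caller and the second base caller uses a softmax function to generate the corresponding plurality of scores.
7. The method of clause 1, wherein generating the final classification information comprises generating the final classification information by selectively combining the first classification information and the second classification information, based on context information associated with the sensor data.
8. The method of clause 7, wherein the context information associated with the sensor data includes temporal context information, spatial context information, base sequence context information, and other context information.
9. The method of clause 7, wherein the context information associated with the sensor data includes temporal context information indicative of one or more base calling cycle numbers associated with the sensor data.
10. The method of clause 7, wherein the context information associated with the sensor data includes spatial context information indicative of locations of one or more tiles within a flow cell that generate the sensor data.
11. The method of clause 7, wherein the context information associated with the sensor data includes spatial context information indicative of locations of one or more clusters within a tile of a flow cell that generate the sensor data.
11A. The method of clause 11, wherein the spatial context information indicates whether the one or more clusters within the tile of the flow cell that generate the sensor data are edge clusters or non-edge clusters.
11B. The method of clause 11A, wherein a cluster is classified as an edge cluster if the cluster is estimated to be located within a threshold distance from an edge of the tile.
11C. The method of clause 11A, wherein a cluster is classified as a non-edge cluster if the cluster is estimated to be located more than a threshold distance from any edge of the tile.
12. The method of clause 7, wherein the context information associated with the sensor data includes base sequence context information indicative of a base sequence being called for the sensor data.
13. The method of clause 1, wherein:
for a specific base to be called, the first classification information includes a first score, a second score, a third score, and a fourth score indicating probabilities that the base to be called is A, C, T, and G, respectively;
for the specific base to be called, the second classification information includes a fifth score, a sixth score, a seventh score, and an eighth score indicating probabilities that the base to be called is A, C, T, and G, respectively; and
generating the final classification information comprises
generating, for the specific base to be called, the final classification information based on the first score, the second score, the third score, the fourth score, the fifth score, the sixth score, the seventh score, and the eighth score.
14.
the final score includes a first final score that is a function of the first score and the fifth score, the first final score indicating a probability that the base to be called is A,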
最終スコアが、第２のスコア及び第６のスコアの関数である第２の最終スコアを含み、第２の最終スコアが、呼び出される塩基がＣである確率を示し、
最終スコアが、第３のスコア及び第７のスコアの関数である第３の最終スコアを含み、第３の最終スコアが、呼び出される塩基がＴである確率を示し、
最終スコアが、第４のスコア及び第８のスコアの関数である第４の最終スコアを含み、第４の最終スコアが、呼び出される塩基がＧである確率を示す、項１３に記載の方法。
１５．
第１の最終スコアが、第１のスコア及び第５のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第２の最終スコアが、第２のスコア及び第６のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第３の最終スコアが、第３のスコア及び第７のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第４の最終スコアが、第４のスコア及び第８のスコアの平均、正規化された加重平均、最小値又は最大値である、項１４に記載の方法。
１６．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つであり、第１のスコア、第２のスコア、第３のスコア、及び第４のスコアの中で最も高い対応するスコアを有する第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つであり、第５のスコア、第６のスコア、第７のスコア、及び第８のスコアの中で最も高い対応するスコアを有する第２の呼び出される塩基を含む、項１４に記載の方法。
１７．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つである第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、第１の呼び出される塩基と同じ第２の呼び出される塩基を含み、
最終分類情報を生成することが、
呼び出される特定の塩基について、最終分類情報が第１の呼び出される塩基及び第２の呼び出される塩基と一致する最終の呼び出される塩基を含むように、最終分類情報を生成することを含む、項１に記載の方法。
１８．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つである第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちのもう１つである第２の呼び出される塩基を含み、それによって、第２の呼び出される塩基が第１の呼び出される塩基と一致せず、
最終分類情報を生成することが、
呼び出される特定の塩基について、最終分類情報が、（ｉ）第１の呼び出される塩基、（ｉｉ）第２の呼び出される塩基、又は（ｉｉｉ）のうちの１つが不確定としてマークされている最終の呼び出される塩基を含むように、最終分類情報を生成することを含む、項１に記載の方法。
１９．
第１の分類情報、第２の分類情報、又は最終分類情報のうちの少なくとも１つが、呼び出される塩基配列が特定の塩基配列パターンを有することを示し、呼び出される塩基配列が特定の塩基配列パターンを有することのインジケーションに応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであり、第１の重みと第２の重みとは異なる、項１に記載の方法。
２０．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項１９に記載の方法。
２０ａ．
特定の塩基配列パターンが、ホモポリマーパターン又は隣接ホモポリマーを有するパターンを含む、項１９に記載の方法。
２１．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基がＧである、項１９に記載の方法。
２１ａ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、少なくとも最初及び最後の塩基がＧである、項１９に記載の方法。
２２．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの複数の塩基の大部分がＧである、項１９に記載の方法。
２２ａ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基がＧである、項１９に記載の方法。
２２Ａ．
特定の塩基配列パターンが、ＧＧＸＧＧ、ＧＸＧＧＧ、ＧＧＧＸＧ、ＧＸＸＧＧ、ＧＧＸＸＧのいずれかを含み、Ｘが、Ａ、Ｃ、Ｔ、又はＧのいずれかである、項１９に記載の方法。
２２Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項１９に記載の方法。
２２Ｂ１．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、少なくとも最初の塩基及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項１９に記載の方法。
２２Ｃ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初の塩基及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項１９に記載の方法。
２２Ｄ．
特定の塩基配列パターンが複数の塩基を含み、特定の塩基配列パターンの塩基の少なくとも大部分のそれぞれが不活性ベースコールに関連付けられている、項１９に記載の方法。
２２Ｅ．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの塩基の少なくとも大部分のそれぞれが、暗サイクルに関連付けられている、項１９に記載の方法。
２３．
第１の重みが第２の重みよりも低く、それによって、最終分類情報を生成しながら、第１の分類情報が第２の分類情報よりも低く重み付けされる、項１９に記載の方法。
２４．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まない、項２３に記載の方法。
２５．
第１の重みが９０％よりも高く、第２の重みが１０％よりも低い、項１９に記載の方法。
２６．
センサデータが、（ｉ）第１の１つ以上の感知サイクルについての第１のセンサデータと、（ｉｉ）第１の１つ以上の感知サイクルに続いて生じる第２の１つ以上の感知サイクルについての第２のセンサデータと、を含み、
最終分類情報が、
（ｉ）（ａ）第１の１つ以上の感知サイクルに関連付けられた第１の分類情報に第１の重みを置き、（ｂ）第１の１つ以上の感知サイクルに関連付けられた第２の分類情報に第２の重みを置くことによって生成される、第１の１つ以上の感知サイクルについての第１の最終分類情報と、
（ｉ）（ａ）第２の１つ以上の感知サイクルに関連付けられた第１の分類情報に第３の重みを置き、（ｂ）第２の１つ以上の感知サイクルに関連付けられた第２の分類情報に第４の重みを置くことによって生成される、第２の１つ以上の感知サイクルについての第２の最終分類情報と、を含み、
第１、第２、第３、及び第４の重みが異なる、項１に記載の方法。
２７．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも低く、それにより、第１の１つ以上の感知サイクルについて、第２のベースコーラからの第２の分類情報が、第１のベースコーラからの第１の分類情報よりも強調され、
第３の重みが第４の重みよりも高く、それにより、第２の１つ以上の感知サイクルについて、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項２６に記載の方法。
２８．
センサデータが、（ｉ）フローセルのタイルの第１の１つ以上のクラスタからの第１のセンサデータと、
（ｉｉ）フローセルのタイルの第２の１つ以上のクラスタからの第２のセンサデータと、を含み、最終分類情報が、
（ｉ）第１の１つ以上のクラスタからの第１のセンサデータについての第１の最終分類情報であって、（ａ）第１の１つ以上のクラスタからの第１の分類情報に第１の重みを置き、（ｂ）第１の１つ以上のクラスタからの第２の分類情報に第２の重みを置くことによって生成される、第１の最終分類情報と、
（ｉ）第２の１つ以上のクラスタからの第２のセンサデータについての第２の最終分類情報であって、（ａ）第２の１つ以上のクラスタからの第１の分類情報に第３の重みを置き、（ｂ）第２の１つ以上のクラスタからの第２の分類情報に第４の重みを置くことによって生成される、第２の最終分類情報と、を含み、
第１、第２、第３、及び第４の重みが異なる、項１に記載の方法。
２９．
第１の１つ以上のクラスタが、フローセルのタイルの１つ以上のエッジから閾値距離内に配置されたエッジクラスタであり、
第２の１つ以上のクラスタが、フローセルのタイルの１つ以上のエッジから閾値距離を超えて配置された非エッジクラスタである、項２８に記載の方法。
３０．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それにより、第１の１つ以上のエッジクラスタについて、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項２９に記載の方法。
３１．
第３の重みが第４の重み以下であり、それにより、第２の１つ以上の非エッジクラスタについて、第１のベースコーラからの第１の分類情報は、第２のベースコーラからの第２の分類情報以下で強調される、項３０に記載の方法。
３２．
センサデータから、フローセルのタイルの少なくとも１つのクラスタ内の１つ以上の気泡の存在を検出することを更に含み、
最終分類情報を生成することが、
１つ以上の気泡の検出に応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであって、第１の重みと第２の重みとは異なることを含む、項１に記載の方法。
３３．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それによって、１つ以上の気泡の検出に応答して、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項３２に記載の方法。
３４．センサデータが、少なくとも１つの画像を含み、方法が、
少なくとも１つの画像が焦点外画像であることを検出することを更に含み、
最終分類情報を生成することが、
焦点外画像の検出に応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであって、第１の重みと第２の重みとは異なることを含む、項１に記載の方法。
３５．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それによって、焦点外画像の検出に応答して、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項３２に記載の方法。
３６．
センサデータが、複数の配列決定サイクルに関連付けられ、
第１の分類情報が、複数の配列決定サイクルに対応する第１の呼び出される塩基配列を含み、第２の分類情報が、複数の配列決定サイクルに対応する第２の呼び出される塩基配列を含み、
第１の呼び出される塩基配列と第２の呼び出される塩基配列とは一致せず、第１又は第２の呼び出される塩基配列の少なくとも一方が、特定の塩基配列パターンを有し、
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
最終分類情報を生成することが、
（ｉ）特定の塩基配列パターンを有する第１又は第２の呼び出される塩基配列のうちの少なくとも１つ、及び（ｉｉ）ニューラルネットワークモデルを含まない第２のベースコーラに応答して、最終分類情報の最終の呼び出される塩基配列が、第２の呼び出される塩基配列と一致し、第１の呼び出される塩基配列と一致しないように、最終分類情報を生成することを含む、項１に記載の方法。
３７．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項３６に記載の方法。
３８．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基がＧである、項３６に記載の方法。
３９．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの塩基の少なくとも大部分がＧである、項３６に記載の方法。
３９ａ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基がＧである、項３６に記載の方法。
３９Ａ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項３６に記載の方法。
３９Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初の塩基及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項３６に記載の方法。
３９Ｃ．
特定の塩基配列パターンが複数の塩基を含み、特定の塩基配列パターンの塩基の少なくとも大部分のそれぞれが不活性ベースコールに関連付けられている、項３６に記載の方法。
３９Ｄ．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの塩基の少なくとも大部分のそれぞれが、暗サイクルに関連付けられている、項３６に記載の方法。
４０．最終分類情報を生成することが、機械学習モデルによって、第１のベースコーラから、センサデータに関連付けられた第１の分類情報を受信することと、
機械学習モデルによって、第２のベースコーラから、センサデータに関連付けられた第２の分類情報を受信することと、
機械学習モデルによって、第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することと、を含む、項１に記載の方法。
４０ａ．機械学習モデルが、ロジスティック回帰モデル、勾配ブーストツリーモデル、ランダムフォレストモデル、ナイーブベイズモデル、又はニューラルネットワークモデルのいずれかである、項４０に記載の方法。
４０ｂ．最終分類情報を生成することが、ニューラルネットワークモデルによって、第１のベースコーラから、センサデータに関連付けられた第１の分類情報を受信することと、
ニューラルネットワークモデルによって、第２のベースコーラから、センサデータに関連付けられた第２の分類情報を受信することと、
ニューラルネットワークモデルによって、第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することと、を含む、項１に記載の方法。
４１．コンピュータ実装方法であって、
一連の感知サイクル内の感知サイクルのためのセンサデータを生成することと、
センサデータの少なくとも対応する部分に対して少なくとも第１のベースコーラ及び第２のベースコーラを実行し、センサデータに関連付けられたコンテキスト情報に基づいて第１及び第２のベースコーラの実行を選択的に切り替えることであって、第１のベースコーラと第２のベースコーラとは異なることと、
第１のベースコーラ及び第２のベースコーラによって、それぞれ、第１の分類情報及び第２の分類情報を生成することと、
第１の分類情報及び第２の分類情報の一方又は両方に基づいて、ベースコールを生成することと、を含む、コンピュータ実装方法。
４２．ベースコーラを漸進的に訓練するためのコンピュータプログラム命令が記憶された非一時的コンピュータ可読記憶媒体であって、命令が、プロセッサ上で実行されると、
一連の感知サイクルにおける感知サイクルについて生成されたセンサデータに対して少なくとも第１のベースコーラ及び第２のベースコーラを実行することと、
第１のベースコーラによって、センサデータに対して第１のベースコーラを実行することに基づいて、センサデータに関連付けられた第１の分類情報を生成することと、
第２のベースコーラによって、センサデータに対して第２のベースコーラを実行することに基づいて、センサデータに関連付けられた第２の分類情報を生成することと、
第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することであって、最終分類情報が、センサデータに対する１つ以上のベースコールを含むことと、を含む、方法を実装する、非一時的コンピュータ可読記憶媒体。
４３．第１のベースコーラ及び第２のベースコーラのうちの少なくとも１つが非線形関数を実装し、第１のベースコーラ及び第２のベースコーラのうちの少なくとももう１つが少なくとも部分的に線形である、項４２に記載の非一時的コンピュータ可読記憶媒体。
４４．第１のベースコーラ及び第２のベースコーラのうちの少なくとも１つがニューラルネットワークモデルを実装し、第１のベースコーラ及び第２のベースコーラのうちの少なくとももう１つがニューラルネットワークモデルを含まない、項４２に記載の非一時的コンピュータ可読記憶媒体。
４５．
第１のベースコーラによって生成された第１の分類情報が、各ベースコールサイクルについて、（ｉ）第１の複数のスコアであって、第１の複数のスコアの各スコアが、呼び出される塩基がＡ、Ｃ、Ｔ、又はＧのうちの１つである確率を示す、第１の複数のスコアと、（ｉｉ）第１の呼び出される塩基と、を含み、
第２のベースコーラによって生成された第２の分類情報が、各ベースコールサイクルについて、（ｉ）第２の複数のスコアであって、第２の複数のスコアの各スコアが、呼び出される塩基がＡ、Ｃ、Ｔ、又はＧのうちの１つである確率を示す、第２の複数のスコアと、（ｉｉ）第２の呼び出される塩基と、を含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
４６．
最終分類情報が、各ベースコールサイクルについて、（ｉ）第３の複数のスコアであって、第３の複数のスコアの各スコアが、呼び出される塩基がＡ、Ｃ、Ｔ、又はＧのうちの１つである確率を示す、第３の複数のスコアと、（ｉｉ）最終の呼び出される塩基と、を含む、項４５に記載の非一時的コンピュータ可読記憶媒体。
４７．第１のベースコーラ及び第２のベースコーラのうちの少なくとも１つが、ソフトマックス関数を使用して、対応する複数のスコアを生成する、項４５に記載の非一時的コンピュータ可読記憶媒体。
４８．最終分類情報を生成することが、
センサデータに関連付けられたコンテキスト情報に基づいて、第１の分類情報と第２の分類情報とを選択的に組み合わせることによって、最終分類情報を生成することを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
４９．センサデータに関連付けられたコンテキスト情報が、時間的コンテキスト情報、空間的コンテキスト情報、塩基配列コンテキスト情報、及び他のコンテキスト情報を含む、項４８に記載の非一時的コンピュータ可読記憶媒体。
５０．センサデータに関連付けられたコンテキスト情報が、センサデータに関連付けられた１つ以上のベースコールサイクル数を示す時間的コンテキスト情報を含む、項４８に記載の非一時的コンピュータ可読記憶媒体。
５１．センサデータに関連付けられたコンテキスト情報が、センサデータを生成するフローセル内の１つ以上のタイルの位置を示す空間的コンテキスト情報を含む、項４８に記載の非一時的コンピュータ可読記憶媒体。
５２．センサデータに関連付けられたコンテキスト情報が、センサデータを生成するフローセルのタイル内の１つ以上のクラスタの位置を示す空間的コンテキスト情報を含む、項４８に記載の非一時的コンピュータ可読記憶媒体。
５２Ａ．空間的コンテキスト情報が、センサデータを生成するフローセルのタイル内の１つ以上のクラスタがエッジクラスタであるか非エッジクラスタであるかを示す、項５２に記載の非一時的コンピュータ可読記憶媒体。
５２Ｂ．クラスタが、クラスタがタイルのエッジから閾値距離内に位置すると推定される場合、エッジクラスタとして分類される、項５２Ａに記載の非一時的コンピュータ可読記憶媒体。
５２Ｃ．クラスタが、クラスタがタイルの任意のエッジから閾値距離を超えて位置すると推定される場合、非エッジクラスタとして分類される、項５２Ａに記載の非一時的コンピュータ可読記憶媒体。
５３．センサデータに関連付けられたコンテキスト情報が、センサデータについて呼び出される塩基配列を示す塩基配列コンテキスト情報を含む、項４８に記載の非一時的コンピュータ可読記憶媒体。
５４．
呼び出される特定の塩基について、第１の分類情報が、呼び出される塩基がそれぞれＡ、Ｃ、Ｔ、及びＧである確率を示す第１のスコア、第２のスコア、第３のスコア、及び第４のスコアを含み、
特定の呼び出される塩基について、第２の分類情報が、呼び出される塩基がそれぞれＡ、Ｃ、Ｔ及びＧである確率を示す第５のスコア、第６のスコア、第７のスコア及び第８のスコアを含み、
最終分類情報を生成することが、
呼び出される特定の塩基について、第１のスコア、第２のスコア、第３のスコア、第４のスコア、第５のスコア、第６のスコア、第７のスコア、及び第８のスコアに基づいて最終分類情報を生成することを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
５５．
最終スコアが、第１のスコア及び第５のスコアの関数である第１の最終スコアを含み、第１の最終スコアが、呼び出される塩基がＡである確率を示し、
最終スコアが、第２のスコア及び第６のスコアの関数である第２の最終スコアを含み、第２の最終スコアが、呼び出される塩基がＣである確率を示し、
最終スコアが、第３のスコア及び第７のスコアの関数である第３の最終スコアを含み、第３の最終スコアが、呼び出される塩基がＴである確率を示し、
最終スコアが、第４のスコア及び第８のスコアの関数である第４の最終スコアを含み、第４の最終スコアが、呼び出される塩基がＧである確率を示す、項５４に記載の非一時的コンピュータ可読記憶媒体。
５６．
第１の最終スコアが、第１のスコア及び第５のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第２の最終スコアが、第２のスコア及び第６のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第３の最終スコアが、第３のスコア及び第７のスコアの平均、正規化された加重平均、最小値、又は最大値であり、
第４の最終スコアが、第４のスコア及び第８のスコアの平均、正規化された加重平均、最小値又は最大値である、項５５に記載の非一時的コンピュータ可読記憶媒体。
５７．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つであり、第１のスコア、第２のスコア、第３のスコア、及び第４のスコアの中で最も高い対応するスコアを有する第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つであり、第５のスコア、第６のスコア、第７のスコア、及び第８のスコアの中で最も高い対応するスコアを有する第２の呼び出される塩基を含む、項５５に記載の非一時的コンピュータ可読記憶媒体。
５８．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つである第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、第１の呼び出される塩基と同じ第２の呼び出される塩基を含み、
最終分類情報を生成することが、
呼び出される特定の塩基について、最終分類情報が第１の呼び出される塩基及び第２の呼び出される塩基と一致する最終の呼び出される塩基を含むように、最終分類情報を生成することを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
５９．
呼び出される特定の塩基について、第１の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちの１つである第１の呼び出される塩基を含み、
呼び出される特定の塩基について、第２の分類情報が、Ａ、Ｃ、Ｔ、及びＧのうちのもう１つである第２の呼び出される塩基を含み、それによって、第２の呼び出される塩基が第１の呼び出される塩基と一致せず、
最終分類情報を生成することが、
呼び出される特定の塩基について、最終分類情報が、（ｉ）第１の呼び出される塩基、（ｉｉ）第２の呼び出される塩基、又は（ｉｉｉ）のうちの１つが不確定としてマークされている最終の呼び出される塩基を含むように、最終分類情報を生成することを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
６０．
第１の分類情報、第２の分類情報、又は最終分類情報のうちの少なくとも１つが、呼び出される塩基配列が特定の塩基配列パターンを有することを示し、呼び出される塩基配列が特定の塩基配列パターンを有することのインジケーションに応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであり、第１の重みと第２の重みとは異なる、項４２に記載の非一時的コンピュータ可読記憶媒体。
６１．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項６０に記載の非一時的コンピュータ可読記憶媒体。
６２．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基がＧである、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの塩基の大部分がＧである、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３Ａ．
特定の塩基配列パターンが、ＧＧＸＧＧ、ＧＸＧＧＧ、ＧＧＧＸＧ、ＧＸＸＧＧ、ＧＧＸＸＧのいずれかを含み、Ｘが、Ａ、Ｃ、Ｔ、又はＧのいずれかである、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３Ｃ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初の塩基及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３Ｄ．
特定の塩基配列パターンが複数の塩基を含み、特定の塩基配列パターンの塩基の大部分が不活性ベースコールに関連付けられている、項６０に記載の非一時的コンピュータ可読記憶媒体。
６３Ｅ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、暗サイクルに関連付けられる、項６０に記載の非一時的コンピュータ可読記憶媒体。
６４．
第１の重みが第２の重みよりも低く、それによって、最終分類情報を生成しながら、第１の分類情報が第２の分類情報よりも低く重み付けされる、項６０に記載の非一時的コンピュータ可読記憶媒体。
６５．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まない、項６４に記載の非一時的コンピュータ可読記憶媒体。
６６．
第１の重みが９０％よりも高く、第２の重みが１０％よりも低い、項６０に記載の非一時的コンピュータ可読記憶媒体。
６７．
センサデータが、（ｉ）第１の１つ以上の感知サイクルについての第１のセンサデータと、（ｉｉ）第１の１つ以上の感知サイクルに続いて生じる第２の１つ以上の感知サイクルについての第２のセンサデータと、を含み、
最終分類情報が、
（ｉ）（ａ）第１の１つ以上の感知サイクルに関連付けられた第１の分類情報に第１の重みを置き、（ｂ）第１の１つ以上の感知サイクルに関連付けられた第２の分類情報に第２の重みを置くことによって生成される、第１の１つ以上の感知サイクルについての第１の最終分類情報と、
（ｉ）（ａ）第２の１つ以上の感知サイクルに関連付けられた第１の分類情報に第３の重みを置き、（ｂ）第２の１つ以上の感知サイクルに関連付けられた第２の分類情報に第４の重みを置くことによって生成される、第２の１つ以上の感知サイクルについての第２の最終分類情報と、を含み、
第１、第２、第３、及び第４の重みが異なる、項４２に記載の非一時的コンピュータ可読記憶媒体。
６８．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも低く、それにより、第１の１つ以上の感知サイクルについて、第２のベースコーラからの第２の分類情報が、第１のベースコーラからの第１の分類情報よりも強調され、
第３の重みが第４の重みよりも高く、それにより、第２の１つ以上の感知サイクルについて、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項６７に記載の非一時的コンピュータ可読記憶媒体。
６９．
センサデータが、（ｉ）フローセルのタイルの第１の１つ以上のクラスタからの第１のセンサデータと、
（ｉｉ）フローセルのタイルの第２の１つ以上のクラスタからの第２のセンサデータと、を含み、最終分類情報が、
（ｉ）第１の１つ以上のクラスタからの第１のセンサデータについての第１の最終分類情報であって、（ａ）第１の１つ以上のクラスタからの第１の分類情報に第１の重みを置き、（ｂ）第１の１つ以上のクラスタからの第２の分類情報に第２の重みを置くことによって生成される、第１の最終分類情報と、
（ｉ）第２の１つ以上のクラスタからの第２のセンサデータについての第２の最終分類情報であって、（ａ）第２の１つ以上のクラスタからの第１の分類情報に第３の重みを置き、（ｂ）第２の１つ以上のクラスタからの第２の分類情報に第４の重みを置くことによって生成される、第２の最終分類情報と、を含み、
第１、第２、第３、及び第４の重みが異なる、項４２に記載の非一時的コンピュータ可読記憶媒体。 Clauses
Clause Set 1 (Generation of a final classification from the classification information of two base callers)
1. A computer-implemented method for base calling using at least two base callers, comprising:
running at least a first base caller and a second base caller on sensor data generated for a sensing cycle in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data based on running the first base caller on the sensor data;
generating, by the second base caller, second classification information associated with the sensor data based on running the second base caller on the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information including one or more base calls for the sensor data.
2. The method of clause 1, wherein at least one of the first base caller and the second base caller implements a non-linear function, and at least one other of the first base caller and the second base caller is at least partially linear.
3. The method of clause 1, wherein at least one of the first base caller and the second base caller implements a neural network model, and at least one other of the first base caller and the second base caller does not include a neural network model.
4. The method of clause 1, wherein:
the first classification information generated by the first base caller includes, for each base calling cycle, (i) a first plurality of scores, each score of the first plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a first called base; and
the second classification information generated by the second base caller includes, for each base calling cycle, (i) a second plurality of scores, each score of the second plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a second called base.
5. The method of clause 4, wherein the final classification information includes, for each base calling cycle, (i) a third plurality of scores, each score of the third plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a final called base.
6. The method of clause 4, wherein at least one of the first base caller and the second base caller uses a softmax function to generate the corresponding plurality of scores.
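As an illustrative sketch of clauses 4 to 6, the following Python shows how a base caller might turn four raw per-base outputs into a plurality of scores with a softmax function and then pick the called base. The names `softmax` and `call_base`, the base ordering in `BASES`, and the example input values are assumptions made for illustration; they are not taken from the specification.

```python
import math

BASES = ("A", "C", "T", "G")  # ordering assumed for illustration


def softmax(logits):
    """Convert raw outputs into positive scores that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def call_base(logits):
    """Return (scores, called_base) for one base calling cycle."""
    scores = softmax(logits)
    best = max(range(len(BASES)), key=lambda i: scores[i])
    return scores, BASES[best]


# Hypothetical raw outputs for one cluster in one cycle.
scores, called = call_base([2.0, 0.5, 0.1, -1.0])
```

Here `scores` plays the role of a plurality of scores and `called` the role of the called base; either base caller could produce such a pair independently.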
7. The method of clause 1, wherein generating the final classification information includes generating the final classification information by selectively combining the first classification information and the second classification information based on context information associated with the sensor data.
8. The method of clause 7, wherein the context information associated with the sensor data includes temporal context information, spatial context information, base sequence context information, and other context information.
9. The method of clause 7, wherein the context information associated with the sensor data includes temporal context information indicating one or more base calling cycle numbers associated with the sensor data.
10. The method of clause 7, wherein the context information associated with the sensor data includes spatial context information indicating a location of one or more tiles within a flow cell that generates the sensor data.
11. The method of clause 7, wherein the context information associated with the sensor data includes spatial context information indicating a location of one or more clusters within a tile of a flow cell that generates the sensor data.
11A. The method of clause 11, wherein the spatial context information indicates whether the one or more clusters within the tile of the flow cell that generates the sensor data are edge clusters or non-edge clusters.
11B. The method of clause 11A, wherein a cluster is classified as an edge cluster if the cluster is estimated to be located within a threshold distance from an edge of the tile.
11C. The method of clause 11A, wherein a cluster is classified as a non-edge cluster if the cluster is estimated to be located more than a threshold distance from any edge of the tile.
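The edge and non-edge test of clauses 11A to 11C can be sketched as follows. This is a minimal example that assumes a rectangular tile with its origin at one corner and cluster coordinates lying inside the tile; the function name `is_edge_cluster`, its parameters, and the numeric values are hypothetical.

```python
def is_edge_cluster(x, y, tile_width, tile_height, threshold):
    """Return True if the cluster lies within `threshold` of any tile edge.

    Assumes 0 <= x <= tile_width and 0 <= y <= tile_height, with all
    quantities in the same unit (for example, pixels).
    """
    distance_to_nearest_edge = min(x, y, tile_width - x, tile_height - y)
    return distance_to_nearest_edge <= threshold


# A cluster 5 units from the left edge of a 1000 x 1000 tile is an edge
# cluster for a threshold of 10; a cluster at the tile center is not.
near_edge = is_edge_cluster(5, 500, 1000, 1000, threshold=10)
center = is_edge_cluster(500, 500, 1000, 1000, threshold=10)
```

A cluster is a non-edge cluster exactly when its distance to every edge exceeds the threshold, which is why the sketch compares only the smallest of the four distances.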
12. The method of clause 7, wherein the context information associated with the sensor data includes base sequence context information indicating a base sequence called for the sensor data.
13. The method of clause 1, wherein:
for a particular base to be called, the first classification information includes a first score, a second score, a third score, and a fourth score indicating the probabilities that the base to be called is A, C, T, and G, respectively;
for the particular base to be called, the second classification information includes a fifth score, a sixth score, a seventh score, and an eighth score indicating the probabilities that the base to be called is A, C, T, and G, respectively; and
generating the final classification information includes generating, for the particular base to be called, the final classification information based on the first score, the second score, the third score, the fourth score, the fifth score, the sixth score, the seventh score, and the eighth score.
14. The method of clause 13, wherein:
the final score includes a first final score that is a function of the first score and the fifth score, the first final score indicating the probability that the called base is A;
the final score includes a second final score that is a function of the second score and the sixth score, the second final score indicating the probability that the called base is C;
the final score includes a third final score that is a function of the third score and the seventh score, the third final score indicating the probability that the called base is T; and
the final score includes a fourth final score that is a function of the fourth score and the eighth score, the fourth final score indicating the probability that the called base is G.
15. The method of clause 14, wherein:
the first final score is the average, normalized weighted average, minimum, or maximum of the first score and the fifth score;
the second final score is the average, normalized weighted average, minimum, or maximum of the second score and the sixth score;
the third final score is the average, normalized weighted average, minimum, or maximum of the third score and the seventh score; and
the fourth final score is the average, normalized weighted average, minimum, or maximum of the fourth score and the eighth score.
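The per-base score combination of clauses 13 to 15 can be illustrated with a short sketch. The function names, the `mode` strings, the weights, and the example scores below are assumptions; in particular, the sketch reads "normalized weighted average" as a weighted sum divided by the sum of the weights, which is one plausible interpretation and not a definition given in the text.

```python
BASES = ("A", "C", "T", "G")  # ordering assumed for illustration


def combine_scores(first, second, mode="average", w1=0.5, w2=0.5):
    """Combine two four-element score lists base by base."""
    pairs = list(zip(first, second))
    if mode == "average":
        return [(a + b) / 2 for a, b in pairs]
    if mode == "weighted":
        # Weighted sum divided by the total weight (assumed normalization).
        return [(w1 * a + w2 * b) / (w1 + w2) for a, b in pairs]
    if mode == "min":
        return [min(a, b) for a, b in pairs]
    if mode == "max":
        return [max(a, b) for a, b in pairs]
    raise ValueError(f"unknown mode: {mode}")


def final_call(final_scores):
    """Return the base with the highest final score."""
    best = max(range(len(BASES)), key=lambda i: final_scores[i])
    return BASES[best]


first = [0.7, 0.1, 0.1, 0.1]     # first to fourth scores (hypothetical)
second = [0.4, 0.5, 0.05, 0.05]  # fifth to eighth scores (hypothetical)

averaged = combine_scores(first, second, "average")
weighted = combine_scores(first, second, "weighted", w1=0.1, w2=0.9)
```

With these example values the plain average still favors A, while weighting the second base caller heavily shifts the final call to C, which shows how the choice of function can change the outcome.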
16. The method of clause 14, wherein:
for the particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G and that has the highest corresponding score among the first score, the second score, the third score, and the fourth score; and
for the particular base to be called, the second classification information includes a second called base that is one of A, C, T, and G and that has the highest corresponding score among the fifth score, the sixth score, the seventh score, and the eighth score.
17. The method of clause 1, wherein:
for a particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base to be called, the second classification information includes a second called base that is the same as the first called base; and
generating the final classification information includes generating the final classification information such that, for the particular base to be called, the final classification information includes a final called base that matches the first called base and the second called base.
18. The method of clause 1, wherein:
for a particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base to be called, the second classification information includes a second called base that is another one of A, C, T, and G, such that the second called base does not match the first called base; and
generating the final classification information includes generating the final classification information such that, for the particular base to be called, the final classification information includes a final called base that is one of (i) the first called base, (ii) the second called base, or (iii) marked as indeterminate.
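Clauses 17 and 18 describe what the final called base may be when the two base callers agree or disagree, but they do not fix a rule for choosing among the options on disagreement. The sketch below uses one hypothetical rule: keep the more confident call unless the two confidences are close, in which case the base is marked indeterminate. The function name, the `"N"` marker, and the `min_margin` value are all assumptions.

```python
INDETERMINATE = "N"  # marker assumed for illustration


def resolve_calls(first_base, second_base, first_conf, second_conf,
                  min_margin=0.2):
    """Pick a final called base from two base callers' calls."""
    if first_base == second_base:
        # Agreement: the final call matches both calls (clause 17).
        return first_base
    if abs(first_conf - second_conf) < min_margin:
        # Disagreement with similar confidence: mark as indeterminate.
        return INDETERMINATE
    # Disagreement with a clear margin: keep the more confident call.
    return first_base if first_conf > second_conf else second_base


agreed = resolve_calls("A", "A", 0.9, 0.6)
clear = resolve_calls("A", "C", 0.9, 0.5)
unclear = resolve_calls("A", "C", 0.55, 0.5)
```

Every return value is the first called base, the second called base, or the indeterminate marker, which are the three outcomes listed in clause 18.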
19. The method of clause 1, wherein at least one of the first classification information, the second classification information, or the final classification information indicates that a called base sequence has a specific base sequence pattern, and wherein generating the final classification information includes, in response to the indication that the called base sequence has the specific base sequence pattern, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight being different from the second weight.
20. The method of clause 19, wherein the specific base sequence pattern includes a homopolymer pattern or a near-homopolymer pattern.
20a. The method of clause 19, wherein the specific base sequence pattern includes a homopolymer pattern or a pattern having adjacent homopolymers.
21. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and at least the first and last bases are G.
21a. The method of clause 19, wherein the specific base sequence pattern includes at least five bases, and at least the first and last bases are G.
22. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and a majority of the plurality of bases of the specific base sequence pattern are G.
22a. The method of clause 19, wherein the specific base sequence pattern includes at least five bases, and at least three bases of the specific base sequence pattern are G.
22A. The method of clause 19, wherein the specific base sequence pattern includes any one of GGXGG, GXGGG, GGGXG, GXXGG, and GGXXG, where X is any one of A, C, T, or G.
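Detecting the five-base patterns listed in clause 22A and switching weights in response, as in clause 19, could look like the sketch below. The helper names and the weight values are hypothetical; the only elements taken from the text are the five patterns and the rule that X stands for any of A, C, T, or G.

```python
import re

# The five patterns from clause 22A; X matches any of A, C, T, or G.
_PATTERNS = ("GGXGG", "GXGGG", "GGGXG", "GXXGG", "GGXXG")
_PATTERN_RE = re.compile("|".join(p.replace("X", "[ACTG]") for p in _PATTERNS))


def has_specific_pattern(called_sequence):
    """Return True if any listed pattern occurs in the called sequence."""
    return _PATTERN_RE.search(called_sequence) is not None


def weights_for(called_sequence,
                pattern_weights=(0.2, 0.8),
                default_weights=(0.5, 0.5)):
    """Return (first_weight, second_weight); the values are placeholders."""
    if has_specific_pattern(called_sequence):
        return pattern_weights
    return default_weights


hit = has_specific_pattern("ACGGTGGA")   # contains GGTGG, i.e. GGXGG
miss = has_specific_pattern("ACGTACGT")  # no listed pattern
```

The placeholder `pattern_weights` put the lower weight on the first classification information; other weightings would be set by swapping in different tuples.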
22B. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
22B1. The method of clause 19, wherein the specific base sequence pattern includes at least five bases, and at least the first base and the last base are each associated with an inactive base call.
22C. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and the base call of each of at least the first base and the last base is associated with a dark cycle.
22D. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and at least a majority of the bases of the specific base sequence pattern are each associated with an inactive base call.
22E. The method of clause 19, wherein the specific base sequence pattern includes a plurality of bases, and at least a majority of the bases of the specific base sequence pattern are each associated with a dark cycle.
23. The method of clause 19, wherein the first weight is lower than the second weight, whereby the first classification information is weighted lower than the second classification information when the final classification information is generated.
24. The method of clause 23, wherein the first base caller implements a neural network model and the second base caller does not include a neural network model.
25. The method of clause 19, wherein the first weight is higher than 90% and the second weight is lower than 10%.
26. The method of clause 1, wherein:
the sensor data includes (i) first sensor data for a first one or more sensing cycles, and (ii) second sensor data for a second one or more sensing cycles occurring subsequent to the first one or more sensing cycles;
the final classification information includes:
(i) first final classification information for the first one or more sensing cycles, generated by (a) placing a first weight on the first classification information associated with the first one or more sensing cycles, and (b) placing a second weight on the second classification information associated with the first one or more sensing cycles; and
(ii) second final classification information for the second one or more sensing cycles, generated by (a) placing a third weight on the first classification information associated with the second one or more sensing cycles, and (b) placing a fourth weight on the second classification information associated with the second one or more sensing cycles; and
the first, second, third, and fourth weights are different.
27. The method of clause 26, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model;
the first weight is lower than the second weight, whereby, for the first one or more sensing cycles, the second classification information from the second base caller is emphasized over the first classification information from the first base caller; and
the third weight is higher than the fourth weight, whereby, for the second one or more sensing cycles, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
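The cycle-dependent weighting of clauses 26 and 27 can be sketched as a lookup of a weight pair by cycle number followed by a weighted sum. The switch point of 25 cycles and the weight values are invented for illustration; they were chosen only so that all four weights differ, the first is lower than the second, and the third is higher than the fourth.

```python
def weights_for_cycle(cycle, switch_cycle=25,
                      early=(0.3, 0.7), late=(0.8, 0.2)):
    """Return (weight on first caller, weight on second caller).

    `early` holds the first and second weights, used up to `switch_cycle`;
    `late` holds the third and fourth weights, used afterwards.
    """
    return early if cycle <= switch_cycle else late


def blend(first_scores, second_scores, cycle):
    """Weighted sum of two per-base score lists for the given cycle."""
    w1, w2 = weights_for_cycle(cycle)
    return [w1 * a + w2 * b for a, b in zip(first_scores, second_scores)]


# Hypothetical scores: the first caller favors A, the second favors C.
early_final = blend([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], cycle=1)
late_final = blend([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], cycle=100)
```

In this example the early cycles lean toward the second base caller and the later cycles lean toward the first, matching the emphasis described in clause 27.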
28. The method of clause 1, wherein:
the sensor data includes (i) first sensor data from a first one or more clusters of a tile of a flow cell, and (ii) second sensor data from a second one or more clusters of the tile of the flow cell;
the final classification information includes:
(i) first final classification information for the first sensor data from the first one or more clusters, generated by (a) placing a first weight on the first classification information from the first one or more clusters, and (b) placing a second weight on the second classification information from the first one or more clusters; and
(ii) second final classification information for the second sensor data from the second one or more clusters, generated by (a) placing a third weight on the first classification information from the second one or more clusters, and (b) placing a fourth weight on the second classification information from the second one or more clusters; and
the first, second, third, and fourth weights are different.
29. The method of clause 28, wherein:
the first one or more clusters are edge clusters located within a threshold distance from one or more edges of the tile of the flow cell; and
the second one or more clusters are non-edge clusters located more than the threshold distance from the one or more edges of the tile of the flow cell.
30. The method of clause 29, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, for the first one or more edge clusters, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
31. The method of clause 30, wherein the third weight is less than or equal to the fourth weight, whereby, for the second one or more non-edge clusters, the first classification information from the first base caller is emphasized no more than the second classification information from the second base caller.
32. The method of clause 1, further comprising:
detecting, from the sensor data, the presence of one or more bubbles in at least one cluster of a tile of a flow cell,
wherein generating the final classification information includes, in response to detecting the one or more bubbles, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight being different from the second weight.
33. The method of clause 32, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, in response to detecting the one or more bubbles, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
34. The method of clause 1, wherein the sensor data includes at least one image, the method further comprising:
detecting that the at least one image is an out-of-focus image,
wherein generating the final classification information includes, in response to detecting the out-of-focus image, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight being different from the second weight.
35. The method of clause 32, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, in response to detecting the out-of-focus image, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
36. The method of clause 1, wherein:
the sensor data is associated with a plurality of sequencing cycles;
the first classification information includes a first called base sequence corresponding to the plurality of sequencing cycles, and the second classification information includes a second called base sequence corresponding to the plurality of sequencing cycles;
the first called base sequence and the second called base sequence do not match, and at least one of the first called base sequence or the second called base sequence has a specific base sequence pattern;
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
generating the final classification information includes generating the final classification information, in response to (i) at least one of the first called base sequence or the second called base sequence having the specific base sequence pattern and (ii) the second base caller not including a neural network model, such that a final called base sequence of the final classification information matches the second called base sequence and does not match the first called base sequence.
37. The method of clause 36, wherein the specific base sequence pattern comprises a homopolymer pattern or a near-homopolymer pattern.
38. The method of clause 36, wherein the specific base sequence pattern includes a plurality of bases, and at least the first and last bases are G.
39. The method of clause 36, wherein the specific base sequence pattern comprises a plurality of bases, and at least a majority of the bases in the specific base sequence pattern are G.
39a. The method of clause 36, wherein the specific base sequence pattern comprises at least five bases, and at least three bases of the specific base sequence pattern are G.
39A. The method of clause 36, wherein the specific base sequence pattern comprises a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
39B. The method of clause 36, wherein the specific base sequence pattern includes a plurality of bases, and the base calls of at least the first base and the last base are each associated with a dark cycle.
39C. The method of clause 36, wherein the specific base sequence pattern comprises a plurality of bases, and at least a majority of the bases in the specific base sequence pattern are each associated with an inactive base call.
39D. The method of clause 36, wherein the specific base sequence pattern comprises a plurality of bases, and at least a majority of the bases in the specific base sequence pattern are each associated with a dark cycle.
40. Generating the final classification information includes:
receiving, by a machine learning model, the first classification information associated with the sensor data from the first base caller;
receiving, by the machine learning model, the second classification information associated with the sensor data from the second base caller; and
generating, by the machine learning model, the final classification information based on the first classification information and the second classification information.
40a. The method of claim 40, wherein the machine learning model is one of a logistic regression model, a gradient boosted trees model, a random forest model, a naive Bayes model, or a neural network model.
40b. Generating the final classification information includes:
receiving, by a neural network model, the first classification information associated with the sensor data from the first base caller;
receiving, by the neural network model, the second classification information associated with the sensor data from the second base caller; and
generating, by the neural network model, the final classification information based on the first classification information and the second classification information.
41. A computer-implemented method comprising:
generating sensor data for a sensing cycle in a series of sensing cycles;
running at least a first base caller and a second base caller on at least a corresponding portion of the sensor data, and selectively switching between running the first base caller and running the second base caller based on context information associated with the sensor data, the first base caller and the second base caller being different;
generating first and second classification information using the first and second base callers, respectively; and
generating base calls based on one or both of the first classification information and the second classification information.
42. A non-transitory computer-readable storage medium having stored thereon computer program instructions for progressively training a base caller, the instructions, when executed on a processor, implementing a method comprising:
running at least a first base caller and a second base caller on sensor data generated for a sensing cycle in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data based on running the first base caller on the sensor data;
generating, by the second base caller, second classification information associated with the sensor data based on running the second base caller on the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information comprising one or more base calls for the sensor data.
43. The non-transitory computer-readable storage medium of clause 42, wherein at least one of the first base caller and the second base caller implements a non-linear function, and at least one other of the first base caller and the second base caller is at least partially linear.
44. The non-transitory computer-readable storage medium of clause 42, wherein at least one of the first base caller and the second base caller implements a neural network model, and at least one other of the first base caller and the second base caller does not include a neural network model.
45. The non-transitory computer-readable storage medium of clause 42, wherein:
the first classification information produced by the first base caller includes, for each base calling cycle, (i) a first plurality of scores, each score of the first plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a first called base; and
the second classification information produced by the second base caller includes, for each base calling cycle, (i) a second plurality of scores, each score of the second plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a second called base.
46. The non-transitory computer-readable storage medium of clause 45, wherein the final classification information includes, for each base calling cycle, (i) a third plurality of scores, each score of the third plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a final called base.
47. The non-transitory computer-readable storage medium of clause 45, wherein at least one of the first base caller and the second base caller uses a softmax function to generate the corresponding plurality of scores.
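For illustration only, the softmax step mentioned in clause 47 can be sketched as follows. The function name and the idea that a base caller first produces four raw values ("logits") in the order A, C, T, G are assumptions of this sketch; the clause only states that a softmax function is used to generate the scores.

```python
import math


def softmax(logits):
    """Map raw per-base values to scores that are positive and sum to 1."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw outputs for A, C, T, G in one base calling cycle.
scores = softmax([2.0, 0.5, 0.1, -1.0])
called_base = "ACTG"[scores.index(max(scores))]
print(called_base)  # A
```

The base with the largest raw value always receives the largest score, so the called base is unchanged by the normalization.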
48. The non-transitory computer-readable storage medium of clause 42, wherein generating the final classification information comprises generating the final classification information by selectively combining the first classification information and the second classification information based on contextual information associated with the sensor data.
49. The non-transitory computer-readable storage medium of claim 48, wherein the contextual information associated with the sensor data includes temporal contextual information, spatial contextual information, sequence contextual information, and other contextual information.
50. The non-transitory computer-readable storage medium of paragraph 48, wherein the contextual information associated with the sensor data includes temporal contextual information indicative of one or more base call cycle numbers associated with the sensor data.
51. The non-transitory computer-readable storage medium of claim 48, wherein the contextual information associated with the sensor data includes spatial contextual information indicative of a location of one or more tiles within a flow cell generating the sensor data.
52. The non-transitory computer-readable storage medium of clause 48, wherein the contextual information associated with the sensor data includes spatial contextual information indicating a location of one or more clusters within a tile of a flow cell generating the sensor data.
52A. The non-transitory computer-readable storage medium of claim 52, wherein the spatial context information indicates whether one or more clusters within a tile of a flow cell generating the sensor data are edge clusters or non-edge clusters.
52B. The non-transitory computer-readable storage medium of claim 52A, wherein a cluster is classified as an edge cluster if the cluster is estimated to be located within a threshold distance from an edge of the tile.
52C. The non-transitory computer-readable storage medium of claim 52A, wherein a cluster is classified as a non-edge cluster if the cluster is estimated to be located more than a threshold distance from any edge of the tile.
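As a rough illustration of clauses 52A to 52C, the edge/non-edge split can be sketched as a distance test against the tile boundary. The function name, the rectangular tile model, the coordinate units, and the threshold value are assumptions made for this sketch; the clauses do not fix any of them. A cluster lying exactly at the threshold distance is treated here as an edge cluster, which is one possible reading of "within".

```python
def is_edge_cluster(x, y, tile_width, tile_height, threshold):
    """Return True if the cluster centre (x, y) lies within `threshold`
    of any edge of a rectangular tile (hypothetical tile model)."""
    # Distance from the cluster centre to the nearest of the four tile edges.
    nearest_edge = min(x, tile_width - x, y, tile_height - y)
    return nearest_edge <= threshold


# A cluster 3 units from the left edge of a 100 x 100 tile, threshold 5.
print(is_edge_cluster(3.0, 50.0, 100.0, 100.0, 5.0))   # True
# A cluster at the centre of the same tile.
print(is_edge_cluster(50.0, 50.0, 100.0, 100.0, 5.0))  # False
```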
53. The non-transitory computer-readable storage medium of claim 48, wherein the context information associated with the sensor data includes sequence context information indicating sequences that are called for the sensor data.
54. The non-transitory computer-readable storage medium of clause 42, wherein:
for a particular base being called, the first classification information includes a first score, a second score, a third score, and a fourth score indicating the probabilities that the called base is A, C, T, and G, respectively;
for the particular base being called, the second classification information includes a fifth score, a sixth score, a seventh score, and an eighth score indicating the probabilities that the called base is A, C, T, and G, respectively; and
generating the final classification information comprises generating the final classification information based on the first score, the second score, the third score, the fourth score, the fifth score, the sixth score, the seventh score, and the eighth score for the particular base being called.
55. The non-transitory computer-readable storage medium of clause 54, wherein the final score includes:
a first final score that is a function of the first score and the fifth score, the first final score indicating the probability that the called base is A;
a second final score that is a function of the second score and the sixth score, the second final score indicating the probability that the called base is C;
a third final score that is a function of the third score and the seventh score, the third final score indicating the probability that the called base is T; and
a fourth final score that is a function of the fourth score and the eighth score, the fourth final score indicating the probability that the called base is G.
56. The non-transitory computer-readable storage medium of clause 55, wherein:
the first final score is the average, normalized weighted average, minimum, or maximum of the first score and the fifth score;
the second final score is the average, normalized weighted average, minimum, or maximum of the second score and the sixth score;
the third final score is the average, normalized weighted average, minimum, or maximum of the third score and the seventh score; and
the fourth final score is the average, normalized weighted average, minimum, or maximum of the fourth score and the eighth score.
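The four combination options named in clause 56 can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the dictionary layout, and the weight values are hypothetical, and "normalized weighted average" is read here as a weighted sum rescaled so that the four final scores sum to 1, which is only one plausible interpretation.

```python
BASES = ("A", "C", "T", "G")


def combine_scores(first, second, mode="average", w1=0.5, w2=0.5):
    """Combine two per-base score dicts (keys A, C, T, G) into final scores."""
    if mode == "average":
        return {b: (first[b] + second[b]) / 2 for b in BASES}
    if mode == "weighted":
        # Weighted sum, then rescale so the final scores sum to 1 (assumed reading).
        raw = {b: w1 * first[b] + w2 * second[b] for b in BASES}
        total = sum(raw.values())
        return {b: raw[b] / total for b in BASES}
    if mode == "min":
        return {b: min(first[b], second[b]) for b in BASES}
    if mode == "max":
        return {b: max(first[b], second[b]) for b in BASES}
    raise ValueError(f"unknown mode: {mode}")


def final_call(scores):
    """Pick the base with the highest final score."""
    return max(scores, key=scores.get)


# Hypothetical scores from the first and second base callers for one base.
first = {"A": 0.7, "C": 0.1, "T": 0.1, "G": 0.1}
second = {"A": 0.4, "C": 0.4, "T": 0.1, "G": 0.1}
print(final_call(combine_scores(first, second, mode="average")))  # A
```

Passing unequal `w1` and `w2` with `mode="weighted"` gives one way to emphasize one base caller over the other, as several of the clauses describe.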
57. The non-transitory computer-readable storage medium of clause 55, wherein:
for the particular base being called, the first classification information includes the first called base, the first called base being one of A, C, T, and G and having a corresponding score that is the highest among the first score, the second score, the third score, and the fourth score; and
for the particular base being called, the second classification information includes the second called base, the second called base being one of A, C, T, and G and having a corresponding score that is the highest among the fifth score, the sixth score, the seventh score, and the eighth score.
58. The non-transitory computer-readable storage medium of clause 42, wherein:
for a particular base being called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base being called, the second classification information includes a second called base that is the same as the first called base; and
generating the final classification information comprises generating the final classification information such that, for the particular base being called, the final classification information includes a final called base that matches the first called base and the second called base.
59. The non-transitory computer-readable storage medium of clause 42, wherein:
for a particular base being called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base being called, the second classification information includes a second called base that is another of A, C, T, and G, whereby the second called base does not match the first called base; and
generating the final classification information comprises generating the final classification information such that, for the particular base being called, the final classification information includes a final called base that is one of (i) the first called base, (ii) the second called base, or (iii) marked as uncertain.
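The agreement and disagreement cases of clauses 58 and 59 might be sketched as follows. The clauses say only that the final call is the first call, the second call, or marked as uncertain; the rule used here to choose among those three (comparing the two callers' top scores against a margin) is an assumption of this sketch, as are the function name, the margin value, and the use of "N" as the uncertain marker.

```python
def reconcile_calls(first_base, second_base, first_score, second_score,
                    min_margin=0.2):
    """Return a final called base from two base callers' calls.

    first_score / second_score are the top scores each caller assigned
    to its own call (hypothetical inputs).
    """
    if first_base == second_base:
        # Clause 58 case: both callers agree, so the final call matches both.
        return first_base
    # Clause 59 case: the callers disagree.
    if abs(first_score - second_score) < min_margin:
        # Neither caller is clearly more confident: mark as uncertain.
        return "N"
    return first_base if first_score > second_score else second_base


print(reconcile_calls("A", "A", 0.9, 0.6))    # A
print(reconcile_calls("A", "C", 0.9, 0.5))    # A
print(reconcile_calls("A", "C", 0.3, 0.8))    # C
print(reconcile_calls("A", "C", 0.55, 0.5))   # N
```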
60. The non-transitory computer-readable storage medium of clause 42, wherein at least one of the first classification information, the second classification information, or the final classification information indicates that the called base sequence has a specific base sequence pattern, and wherein, in response to the indication that the called base sequence has the specific base sequence pattern, the final classification information is generated by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.
61. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern comprises a homopolymer pattern or a near-homopolymer pattern.
62. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes a plurality of bases, and at least the first and last bases are G.
63. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes a plurality of bases, and a majority of the bases in the specific base sequence pattern are G.
63A. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes any of GGXGG, GXGGG, GGGXG, GXXGG, and GGXXG, where X is any of A, C, T, or G.
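One way to test a called base sequence for the five patterns listed in clause 63A is a regular expression in which X is expanded to any of the four bases. The names `PATTERNS`, `PATTERN_RE`, and `has_specific_pattern` are hypothetical, and matching the pattern anywhere inside a longer sequence is an assumption of this sketch.

```python
import re

# The five patterns from clause 63A; X may be any of A, C, T, or G,
# so it is expanded to the character class [ACGT].
PATTERNS = ("GGXGG", "GXGGG", "GGGXG", "GXXGG", "GGXXG")
PATTERN_RE = re.compile("|".join(p.replace("X", "[ACGT]") for p in PATTERNS))


def has_specific_pattern(called_sequence):
    """Return True if any of the listed patterns occurs in the sequence."""
    return PATTERN_RE.search(called_sequence) is not None


print(has_specific_pattern("ACGGTGGCA"))  # True  (contains GGTGG)
print(has_specific_pattern("ACGTACGT"))   # False
```

A positive match could then be used to select a different pair of weights for the two base callers, as clause 60 describes.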
63B. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern comprises a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
63C. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes a plurality of bases, and the base calls of at least the first base and the last base are each associated with a dark cycle.
63D. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern comprises a plurality of bases, and a majority of the bases in the specific base sequence pattern are associated with inactive base calls.
63E. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes at least five bases, and each of at least three bases of the specific base sequence pattern is associated with a dark cycle.
64. The non-transitory computer-readable storage medium of clause 60, wherein the first weight is lower than the second weight, whereby the first classification information is weighted lower than the second classification information when generating the final classification information.
65. The non-transitory computer-readable storage medium of clause 64, wherein the first base caller implements a neural network model and the second base caller does not include a neural network model.
66. The non-transitory computer-readable storage medium of clause 60, wherein the first weight is greater than 90% and the second weight is less than 10%.
67. The non-transitory computer-readable storage medium of clause 42, wherein:
the sensor data includes (i) first sensor data for a first one or more sensing cycles, and (ii) second sensor data for a second one or more sensing cycles occurring subsequent to the first one or more sensing cycles;
the final classification information includes:
(i) first final classification information for the first one or more sensing cycles, generated by (a) placing a first weight on the first classification information associated with the first one or more sensing cycles, and (b) placing a second weight on the second classification information associated with the first one or more sensing cycles; and
(ii) second final classification information for the second one or more sensing cycles, generated by (a) placing a third weight on the first classification information associated with the second one or more sensing cycles, and (b) placing a fourth weight on the second classification information associated with the second one or more sensing cycles; and
the first, second, third, and fourth weights are different.
68. The non-transitory computer-readable storage medium of clause 67, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model;
the first weight is lower than the second weight, whereby, for the first one or more sensing cycles, the second classification information from the second base caller is emphasized over the first classification information from the first base caller; and
the third weight is higher than the fourth weight, whereby, for the second one or more sensing cycles, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
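The cycle-dependent weighting of clauses 67 and 68 can be illustrated with a simple lookup that returns one weight pair for early sensing cycles and another for later ones. The switch-over cycle and all four weight values are placeholders chosen for this sketch; the clauses require only that the first weight be lower than the second, the third higher than the fourth, and the four weights differ.

```python
def weights_for_cycle(cycle, switch_cycle=25,
                      early=(0.2, 0.8), late=(0.7, 0.3)):
    """Return (weight for first base caller, weight for second base caller).

    The first base caller is assumed to be the neural-network caller.
    Early cycles favour the second (non-neural-network) caller; later
    cycles favour the first. All numeric values are hypothetical.
    """
    return early if cycle <= switch_cycle else late


print(weights_for_cycle(10))   # (0.2, 0.8)
print(weights_for_cycle(100))  # (0.7, 0.3)
```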
69. The non-transitory computer-readable storage medium of clause 42, wherein:
the sensor data includes (i) first sensor data from a first one or more clusters of a tile of the flow cell, and (ii) second sensor data from a second one or more clusters of the tile of the flow cell;
the final classification information comprises:
(i) first final classification information for the first sensor data from the first one or more clusters, the first final classification information being generated by (a) placing a first weight on the first classification information from the first one or more clusters, and (b) placing a second weight on the second classification information from the first one or more clusters; and
(ii) second final classification information for the second sensor data from the second one or more clusters, the second final classification information being generated by (a) placing a third weight on the first classification information from the second one or more clusters, and (b) placing a fourth weight on the second classification information from the second one or more clusters; and
the first, second, third, and fourth weights are different.

７０．
第１の１つ以上のクラスタが、フローセルのタイルの１つ以上のエッジから閾値距離内に配置されたエッジクラスタであり、
第２の１つ以上のクラスタが、フローセルのタイルの１つ以上のエッジから閾値距離を超えて配置された非エッジクラスタである、項６９に記載の非一時的コンピュータ可読記憶媒体。 70.
The non-transitory computer-readable storage medium of clause 69, wherein:
the first one or more clusters are edge clusters located within a threshold distance from one or more edges of the tile of the flow cell; and
the second one or more clusters are non-edge clusters located more than the threshold distance from the one or more edges of the tile of the flow cell.

７１．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それにより、第１の１つ以上のエッジクラスタについて、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項７０に記載の非一時的コンピュータ可読記憶媒体。 71.
The non-transitory computer-readable storage medium of clause 70, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, for the first one or more edge clusters, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.

７２．
第３の重みが第４の重み以下であり、それにより、第２の１つ以上の非エッジクラスタについて、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報以下で強調される、項７１に記載の非一時的コンピュータ可読記憶媒体。 72.
The non-transitory computer-readable storage medium of clause 71, wherein the third weight is less than or equal to the fourth weight, whereby, for the second one or more non-edge clusters, the first classification information from the first base caller is emphasized no more than the second classification information from the second base caller.

７３．センサデータから、フローセルのタイルの少なくとも１つのクラスタ内の１つ以上の気泡の存在を検出することを更に含み、
最終分類情報を生成することが、
１つ以上の気泡の検出に応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであって、第１の重みと第２の重みとは異なることを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
73. The non-transitory computer-readable storage medium of clause 42, wherein the method further comprises detecting, from the sensor data, the presence of one or more air bubbles in at least one cluster of a tile of the flow cell,
wherein generating the final classification information comprises:
in response to detecting the one or more air bubbles, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.

７４．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それによって、１つ以上の気泡の検出に応答して、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項７３に記載の非一時的コンピュータ可読記憶媒体。 74.
The non-transitory computer-readable storage medium of clause 73, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, in response to detection of the one or more air bubbles, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.

７５．センサデータが少なくとも１つの画像を含み、方法が、
少なくとも１つの画像が焦点外画像であることを検出することを更に含み、
最終分類情報を生成することが、
焦点外画像の検出に応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することであって、第１の重みと第２の重みとは異なることを含む、項７３に記載の非一時的コンピュータ可読記憶媒体。
75. The non-transitory computer-readable storage medium of clause 73, wherein the sensor data includes at least one image, and the method further comprises:
detecting that the at least one image is an out-of-focus image,
wherein generating the final classification information comprises:
in response to detecting the out-of-focus image, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.

７６．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも高く、それによって、焦点外画像の検出に応答して、第１のベースコーラからの第１の分類情報が、第２のベースコーラからの第２の分類情報よりも強調される、項７３に記載の非一時的コンピュータ可読記憶媒体。 76.
The non-transitory computer-readable storage medium of clause 73, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is higher than the second weight, whereby, in response to detecting an out-of-focus image, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.

７７．
センサデータが、複数の配列決定サイクルに関連付けられ、
第１の分類情報が、複数の配列決定サイクルに対応する第１の呼び出される塩基配列を含み、第２の分類情報が、複数の配列決定サイクルに対応する第２の呼び出される塩基配列を含み、
第１の呼び出される塩基配列と第２の呼び出される塩基配列とは一致せず、第１又は第２の呼び出される塩基配列の少なくとも一方が、特定の塩基配列パターンを有し、
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
最終分類情報を生成することが、
（ｉ）特定の塩基配列パターンを有する第１又は第２の呼び出される塩基配列のうちの少なくとも１つ、及び（ｉｉ）ニューラルネットワークモデルを含まない第２のベースコーラに応答して、最終分類情報の最終の呼び出される塩基配列が、第２の呼び出される塩基配列と一致し、第１の呼び出される塩基配列と一致しないように、最終分類情報を生成することを含む、項４２に記載の非一時的コンピュータ可読記憶媒体。
７８．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項７７に記載の非一時的コンピュータ可読記憶媒体。
７９．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基がＧである、項７７に記載の非一時的コンピュータ可読記憶媒体。
８０．
特定の塩基配列パターンが、複数の塩基を含み、特定の塩基配列パターンの塩基の大部分がＧである、項７７に記載の非一時的コンピュータ可読記憶媒体。
８０Ａ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項７７に記載の非一時的コンピュータ可読記憶媒体。
８０Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初の塩基及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項７７に記載の非一時的コンピュータ可読記憶媒体。
８０Ｃ．
特定の塩基配列パターンが複数の塩基を含み、特定の塩基配列パターンの複数の塩基の大部分が不活性ベースコールに関連付けられている、項７７に記載の非一時的コンピュータ可読記憶媒体。
８０Ｄ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、暗サイクルに関連付けられる、項６０に記載の非一時的コンピュータ可読記憶媒体。
８１．最終分類情報を生成することが、
ニューラルネットワークモデルによって、センサデータに関連付けられた第１の分類情報を第１のベースコーラから受信することと、
ニューラルネットワークモデルによって、第２のベースコーラから、センサデータに関連付けられた第２の分類情報を受信することと、
ニューラルネットワークモデルによって、第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することと、を含む、項４２に記載の非一時的コンピュータ可読記憶媒体。 77.
The non-transitory computer-readable storage medium of clause 42, wherein:
the sensor data is associated with a plurality of sequencing cycles;
the first classification information includes a first called base sequence corresponding to the plurality of sequencing cycles, and the second classification information includes a second called base sequence corresponding to the plurality of sequencing cycles;
the first called base sequence and the second called base sequence do not match, and at least one of the first or second called base sequences has a specific base sequence pattern;
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
generating the final classification information comprises generating the final classification information, in response to (i) at least one of the first or second called base sequences having the specific base sequence pattern and (ii) the second base caller not including a neural network model, such that a final called base sequence of the final classification information matches the second called base sequence and does not match the first called base sequence.
78. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern comprises a homopolymer pattern or a near-homopolymer pattern.
79. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern includes a plurality of bases, and at least the first and last bases are G.
80. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern comprises a plurality of bases, and a majority of the bases in the specific base sequence pattern are G.
80A. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern comprises a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
80B. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern comprises a plurality of bases, and the base calls of at least the first base and the last base are each associated with a dark cycle.
80C. The non-transitory computer-readable storage medium of clause 77, wherein the specific base sequence pattern comprises a plurality of bases, and a majority of the plurality of bases of the specific base sequence pattern are associated with an inactive base call.
80D. The non-transitory computer-readable storage medium of clause 60, wherein the specific base sequence pattern includes at least five bases, and each of at least three bases of the specific base sequence pattern is associated with a dark cycle.
81. The non-transitory computer-readable storage medium of clause 42, wherein generating the final classification information comprises:
receiving, by a neural network model, the first classification information associated with the sensor data from the first base caller;
receiving, by the neural network model, the second classification information associated with the sensor data from the second base caller; and
generating, by the neural network model, the final classification information based on the first classification information and the second classification information.

項セット２（２つのベースコーラを切り替える／選択的に有効化する）
１．少なくとも２つのベースコーラを使用するベースコールのためのコンピュータ実装方法であって、
一連の感知サイクルにおける感知サイクルについて生成されたセンサデータに対して第１のベースコーラを実行することと、第１のベースコーラによって、センサデータに対して第１のベースコーラを実行することに基づいて、センサデータに関連付けられた第１の分類情報を生成することと、
第１の分類情報がセンサデータの最終分類情報の生成に不適切であると決定することと、
第１の分類情報の不適切性の決定に応答して、センサデータに対して第２のベースコーラを実行することであって、第２のベースコーラが第１のベースコーラとは異なることと、
第２のベースコーラによって、センサデータに対して第２のベースコーラを実行することに基づいて、センサデータに関連付けられた第２の分類情報を生成することと、
第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することであって、最終分類情報が、センサデータに対する１つ以上のベースコールを含むことと、を含む、コンピュータ実装方法。
２．第１の分類情報が、第１の呼び出される塩基配列を含み、第１の分類情報が不適切であると決定することが、第１の呼び出される塩基配列が特定の塩基配列パターンに一致すると決定することと、
第１の呼び出される塩基配列が特定の塩基配列パターンと一致することに基づいて、第１の分類情報が最終分類情報の生成に不適切であると決定することと、を含む、項１に記載の方法。
３．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項２に記載の方法。
４．
特定の塩基配列パターンが、複数の塩基を含み、複数の塩基のうち少なくとも最初及び最後の塩基がＧである、項２に記載の方法。
４Ａ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、少なくとも最初及び最後の塩基がＧである、項２に記載の方法。
５．
特定の塩基配列パターンが、複数の塩基を含み、複数の塩基のうち少なくとも３つの塩基がＧである、項２に記載の方法。
５Ａ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基がＧである、項２に記載の方法。
６．
特定の塩基配列パターンが、ＧＧＸＧＧ、ＧＸＧＧＧ、ＧＧＧＸＧ、ＧＸＸＧＧ、ＧＧＸＸＧのいずれかを含み、Ｘが、Ａ、Ｃ、Ｔ、又はＧのいずれかである、項２に記載の方法。
６Ａ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項２に記載の方法。
６Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項２に記載の方法。
６Ｃ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、不活性ベースコールに関連付けられている、項２に記載の方法。
６Ｄ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、暗サイクルに関連付けられる、項２に記載の方法。
７．最終分類情報を生成することが、第１の呼び出される塩基配列が特定の塩基配列パターンと一致することに応答して、第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって最終分類情報を生成することを含み、第１の重みと第２の重みとは異なる、項２に記載の方法。
８．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第１の重みが第２の重みよりも低く、それによって、最終分類情報を生成しながら、第１の分類情報が第２の分類情報よりも低く重み付けされる、項７に記載の方法。
９．
第１のベースコーラがニューラルネットワークモデルを実装し、第２のベースコーラがニューラルネットワークモデルを含まず、
第２の分類情報が、第２の呼び出される塩基配列を含み、
第１の呼び出される塩基配列が、第２の呼び出される塩基配列と一致せず、
（ｉ）第１の呼び出される塩基配列が特定の塩基配列パターンと一致し、（ｉｉ）第２のベースコーラがニューラルネットワークモデルを含まないことに応答して、最終分類情報の最終の呼び出される塩基配列が、第２の呼び出される塩基配列と一致し、第１の呼び出される塩基配列と一致しないように、最終分類情報を生成することである、項２に記載の方法。
１０．第１の分類情報が不適切であると決定することが、
センサデータが生成されたクラスタにおける気泡の存在を検出することと、
気泡の検出に基づいて、第１の分類情報が最終分類情報の生成に不適切であると決定することと、を含む、項１に記載の方法。
１１．第２のベースコーラがニューラルネットワークモデルを実装し、第１のベースコーラがニューラルネットワークモデルを含まず、最終分類情報を生成することが、
第１の分類情報に第１の重みを置き、第２の分類情報に第２の重みを置くことによって、最終分類情報を生成することであって、第２の重みが第１の重みより大きいことを含む、項１０に記載の方法。
１２．
センサデータが、現在のセンサデータであり、
現在のセンサデータが、感知サイクルＮ１及び１つ以上の後続の感知サイクルのためのものであり、Ｎ１が、１より大きい正の整数であり、
現在のセンサデータに対して第２のベースコーラを実行することが、
感知サイクルＮ１の前に発生する少なくともＴ個の感知サイクルに関連付けられた過去のセンサデータに対して第２のベースコーラを最初に実行して、少なくともＴ個の感知サイクルに関連付けられたフェージングデータを推定することと、
推定されたフェージングデータを使用して、感知サイクルＮ１及び１つ以上の後続の感知サイクルに関連付けられた現在のセンサデータに対して第２のベースコーラを続いて実行することと、を含む、項１に記載の方法。
１３．センサデータが、フローセルのタイルの第１の１つ以上のクラスタから生成された第１のセンサデータであり、方法が、
一連の感知サイクル内の感知サイクルについて、フローセルのタイルの第２の１つ以上のクラスタから第２のセンサデータを生成することと、
第２のセンサデータに対して第１のベースコーラを実行することと、
第１のベースコーラによって、第２のセンサデータに対して第１のベースコーラを実行することに基づいて、第２のセンサデータに関連付けられた第３の分類情報を生成することと、
第３の分類情報が第２の１つ以上のクラスタについての最終分類情報の生成に適切であると決定することと、を更に含み、
第１のセンサデータに対して第２のベースコーラを実行することが、
（ｉ）第１の１つ以上のクラスタについての最終分類が第１及び第２のベースコーラの出力に基づき、（ｉｉ）第２の１つ以上のクラスタについての最終分類が第１のベースコーラの出力に基づくが第２のベースコーラの出力には基づかないように、第２のセンサデータに対して第２のベースコーラを実行せずに、第１のセンサデータに対して第２のベースコーラを実行することを含む、項１に記載の方法。
１４．第１の分類情報が不適切であると決定することが、
センサデータに関連付けられたコンテキスト情報を受信することと、
コンテキストデータに基づいて、第１の分類情報が閾値確率よりも高いエラーの確率を含むと決定することと、
第１の分類情報が閾値確率よりも高いエラーの確率を含むと決定することに基づいて、第１の分類情報がセンサデータのための最終分類情報の生成に不適切であると決定することと、を含む、項１に記載の方法。
１５．ベースコールのためのシステムであって、
検体のセットの強度発光を表す画像を記憶するメモリであって、強度発光が、配列決定実行の配列決定サイクル中に検体のセット中の検体によって生成され、メモリが、第１のベースコーラ及び第２のベースコーラのトポロジを更に記憶する、メモリと、
画像に関連付けられたコンテキスト情報を生成するように構成されたコンテキスト情報生成モジュールと、
画像に対して第１のベースコーラを実行し、それによって、画像に関連付けられた第１の分類情報を生成するように構成された１つ以上のプロセッサと、
画像に関連付けられた最終分類情報を生成する際に第１の分類情報の不足を決定するように構成された最終ベースコール決定モジュールと、を含み、
第１の分類情報の不足の決定に応じて、１つ以上のプロセッサは、画像に対して第２のベースコーラを実行することにより、画像に関連付けられた第２の分類情報を生成するように構成され、
最終ベースコール決定モジュールが、第２の分類情報に少なくとも部分的に基づいて、配列決定実行の１つ以上の最終ベースコールを含む最終分類情報を生成するように更に構成されている、システム。
１６．最終分類情報が、第１の分類情報に少なくとも部分的に更に基づいて生成される、項１５に記載のシステム。
１７．最終分類情報が、第１の分類情報及び第２の分類情報の加重和に基づいて生成される、項１５に記載のシステム。
１８．第１の分類情報が、第１の呼び出される塩基配列を含み、第１の分類情報の不足を決定するために、最終ベースコール決定モジュールが、
第１の呼び出される塩基配列が特定の塩基配列パターンに一致することを決定し、
第１の呼び出される塩基配列が特定の塩基配列パターンと一致することに基づいて、第１の分類情報が最終分類情報の生成に不適切であると決定する、項１５に記載のシステム。
１９．
特定の塩基配列パターンが、ホモポリマーパターン又は近ホモポリマーパターンを含む、項１８に記載のシステム。
２０．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基がＧである、項１８に記載のシステム。
２１．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基がＧである、項１８に記載のシステム。
２２．
特定の塩基配列パターンが、ＧＧＸＧＧ、ＧＸＧＧＧ、ＧＧＧＸＧ、ＧＸＸＧＧ、ＧＧＸＸＧのいずれかを含み、Ｘが、Ａ、Ｃ、Ｔ、又はＧのいずれかである、項１８に記載のシステム。
２３Ａ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれが、不活性ベースコールに関連付けられている、項１８に記載のシステム。
２３Ｂ．
特定の塩基配列パターンが、複数の塩基を含み、少なくとも最初及び最後の塩基のそれぞれのベースコールが、暗サイクルに関連付けられている、項１８に記載のシステム。
２３Ｃ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、不活性ベースコールに関連付けられている、項１８に記載のシステム。
２３Ｄ．
特定の塩基配列パターンが、少なくとも５つの塩基を含み、特定の塩基配列パターンの少なくとも３つの塩基の各々が、暗サイクルに関連付けられる、項１８に記載のシステム。
２３．第１の分類情報の不足を決定するために、最終ベースコール決定モジュールが、
画像が生成されたクラスタ内の気泡の存在を検出し、
気泡の検出に基づいて、第１の分類情報が最終分類情報の生成に不十分であると決定する、項１５に記載のシステム。
２４．第１の分類情報の不足を決定するために、最終ベースコール決定モジュールが、
画像内の焦点外画像の存在を検出し、
焦点外画像の検出に基づいて、第１の分類情報が最終分類情報の生成に不十分であると決定する、項１５に記載のシステム。
２５．ベースコーラを漸進的に訓練するためのコンピュータプログラム命令が記憶された非一時的コンピュータ可読記憶媒体であって、命令が、プロセッサ上で実行されると、
一連の感知サイクルにおける感知サイクルについて生成されたセンサデータに対して第１ベースコーラを実行して、センサデータに関連付けられた第１の分類情報を生成することと、
（ｉ）センサデータに関連付けられたコンテキスト情報、及び（ｉｉ）第１の分類情報を処理することと、
コンテキスト情報及び第１の分類情報を処理することに基づいて、センサデータに対して第２のベースコーラを実行して、センサデータに関連付けられた第２の分類情報を生成することと、
第１の分類情報及び第２の分類情報に基づいて、最終分類情報を生成することであって、最終分類情報が、センサデータに対する１つ以上のベースコールを含むことと、を含む方法を実装する、非一時的コンピュータ可読記憶媒体。
Clause Set 2 (Switching/Selectively Activating Two Base Callers)
1. A computer-implemented method for base calling using at least two base callers, the method comprising:
running a first base caller on sensor data generated for a sensing cycle in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data based on running the first base caller on the sensor data;
determining that the first classification information is unsuitable for generating final classification information for the sensor data;
in response to determining that the first classification information is unsuitable, running a second base caller on the sensor data, the second base caller being different from the first base caller;
generating, by the second base caller, second classification information associated with the sensor data based on running the second base caller on the sensor data; and
generating the final classification information based on the first classification information and the second classification information, the final classification information comprising one or more base calls for the sensor data.
2. The method of clause 1, wherein the first classification information includes a first called base sequence, and wherein determining that the first classification information is unsuitable comprises:
determining that the first called base sequence matches a specific base sequence pattern; and
determining, based on the first called base sequence matching the specific base sequence pattern, that the first classification information is unsuitable for generating the final classification information.
3. The method of clause 2, wherein the specific base sequence pattern comprises a homopolymer pattern or a near-homopolymer pattern.
4. The method of clause 2, wherein the specific base sequence pattern comprises a plurality of bases, at least the first and last of which are G.
4A. The method of clause 2, wherein the specific base sequence pattern comprises at least five bases, at least the first and last of which are G.
5. The method of clause 2, wherein the specific base sequence pattern comprises a plurality of bases, at least three of which are G.
5A. The method of clause 2, wherein the specific base sequence pattern comprises at least five bases, at least three of which are G.
6. The method of clause 2, wherein the specific base sequence pattern comprises any of GGXGG, GXGGG, GGGXG, GXXGG, and GGXXG, where X is any of A, C, T, or G.
6A. The method of clause 2, wherein the specific base sequence pattern comprises a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
6B. The method of clause 2, wherein the specific base sequence pattern comprises a plurality of bases, and the base calls of at least the first and last bases are each associated with a dark cycle.
6C. The method of clause 2, wherein the specific base sequence pattern comprises at least five bases, and each of at least three bases of the specific base sequence pattern is associated with an inactive base call.
6D. The method of clause 2, wherein the specific base sequence pattern comprises at least five bases, and each of at least three bases of the specific base sequence pattern is associated with a dark cycle.
7. The method of clause 2, wherein generating the final classification information comprises, in response to the first called base sequence matching the specific base sequence pattern, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.
8. The method of clause 7, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model; and
the first weight is lower than the second weight, such that the first classification information is weighted lower than the second classification information when the final classification information is generated.
9. The method of clause 2, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model;
the second classification information includes a second called base sequence;
the first called base sequence does not match the second called base sequence; and
generating the final classification information comprises, in response to (i) the first called base sequence matching the specific base sequence pattern and (ii) the second base caller not including a neural network model, generating the final classification information such that a final called base sequence of the final classification information matches the second called base sequence and does not match the first called base sequence.
10. The method of clause 1, wherein determining that the first classification information is unsuitable comprises:
detecting the presence of an air bubble in a cluster from which the sensor data was generated; and
determining, based on the detection of the air bubble, that the first classification information is unsuitable for generating the final classification information.
11. The method of clause 10, wherein the second base caller implements a neural network model, the first base caller does not include a neural network model, and generating the final classification information comprises:
generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the second weight being greater than the first weight.
12. The method of clause 1, wherein:
the sensor data is current sensor data;
the current sensor data is for a sensing cycle N1 and one or more subsequent sensing cycles, N1 being a positive integer greater than 1; and
running the second base caller on the current sensor data comprises:
first running the second base caller on past sensor data associated with at least T sensing cycles occurring before sensing cycle N1, to estimate phasing data associated with the at least T sensing cycles; and
subsequently running the second base caller, using the estimated phasing data, on the current sensor data associated with sensing cycle N1 and the one or more subsequent sensing cycles.
13. The method of clause 1, wherein the sensor data is first sensor data generated from a first one or more clusters of a tile of a flow cell, the method further comprising:
generating, for the sensing cycle in the series of sensing cycles, second sensor data from a second one or more clusters of the tile of the flow cell;
running the first base caller on the second sensor data;
generating, by the first base caller, third classification information associated with the second sensor data based on running the first base caller on the second sensor data; and
determining that the third classification information is suitable for generating final classification information for the second one or more clusters,
wherein running the second base caller on the first sensor data comprises:
running the second base caller on the first sensor data without running the second base caller on the second sensor data, such that (i) a final classification for the first one or more clusters is based on the outputs of the first and second base callers, and (ii) a final classification for the second one or more clusters is based on the output of the first base caller but not on the output of the second base caller.
14. The method of clause 1, wherein determining that the first classification information is unsuitable comprises:
receiving context information associated with the sensor data;
determining, based on the context information, that the first classification information includes a probability of error that is higher than a threshold probability; and
determining, based on determining that the first classification information includes a probability of error higher than the threshold probability, that the first classification information is unsuitable for generating the final classification information for the sensor data.
15. A system for base calling, comprising:
a memory that stores an image representing intensity emissions of a set of analytes, the intensity emissions being generated by analytes in the set of analytes during a sequencing cycle of a sequencing run, the memory further storing a topology of a first base caller and a second base caller;
a context information generation module configured to generate context information associated with the image;
one or more processors configured to run the first base caller on the image, thereby generating first classification information associated with the image; and
a final base call determination module configured to determine a deficiency of the first classification information for generating final classification information associated with the image,
wherein, in response to the determination of the deficiency of the first classification information, the one or more processors are configured to generate second classification information associated with the image by running the second base caller on the image, and
wherein the final base call determination module is further configured to generate, based at least in part on the second classification information, the final classification information comprising one or more final base calls for the sequencing run.
16. The system of clause 15, wherein the final classification information is generated further based at least in part on the first classification information.
17. The system of clause 15, wherein the final classification information is generated based on a weighted sum of the first classification information and the second classification information.
18. The system of clause 15, wherein the first classification information includes a first called base sequence, and wherein, to determine the deficiency of the first classification information, the final base call determination module is configured to:
determine that the first called base sequence matches a specific base sequence pattern; and
determine, based on the first called base sequence matching the specific base sequence pattern, that the first classification information is unsuitable for generating the final classification information.
19. The system of clause 18, wherein the specific base sequence pattern comprises a homopolymer pattern or a near-homopolymer pattern.
20. The system of clause 18, wherein the specific base sequence pattern comprises a plurality of bases, at least the first and last of which are G.
21. The system of clause 18, wherein the specific base sequence pattern comprises at least five bases, at least three of which are G.
22. The system of clause 18, wherein the specific base sequence pattern comprises any of GGXGG, GXGGG, GGGXG, GXXGG, and GGXXG, where X is any of A, C, T, or G.
23A. The system of clause 18, wherein the specific base sequence pattern comprises a plurality of bases, and at least the first and last bases are each associated with an inactive base call.
23B. The system of clause 18, wherein the specific base sequence pattern comprises a plurality of bases, and the base calls of at least the first and last bases are each associated with a dark cycle.
23C. The system of clause 18, wherein the specific base sequence pattern comprises at least five bases, and each of at least three bases of the specific base sequence pattern is associated with an inactive base call.
23D. The system of clause 18, wherein the specific base sequence pattern comprises at least five bases, and each of at least three bases of the specific base sequence pattern is associated with a dark cycle.
23. The system of clause 15, wherein, to determine the deficiency of the first classification information, the final base call determination module is configured to:
detect the presence of an air bubble in a cluster from which the image was generated; and
determine, based on the detection of the air bubble, that the first classification information is insufficient for generating the final classification information.
24. The system of clause 15, wherein, to determine the deficiency of the first classification information, the final base call determination module is configured to:
detect the presence of an out-of-focus image in the image; and
determine, based on the detection of the out-of-focus image, that the first classification information is insufficient for generating the final classification information.
25. A non-transitory computer-readable storage medium storing computer program instructions for progressively training a base caller, the instructions, when executed on a processor, implementing a method comprising:
running a first base caller on sensor data generated for a sensing cycle in a series of sensing cycles to generate first classification information associated with the sensor data;
processing (i) context information associated with the sensor data and (ii) the first classification information;
running a second base caller on the sensor data, based on processing the context information and the first classification information, to generate second classification information associated with the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information comprising one or more base calls for the sensor data.
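By way of illustration only, the selective activation recited in clauses 1, 2, 6, and 7 might be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the names `matches_specific_pattern` and `final_base_calls`, the caller interface (each caller returns a called sequence plus per-cycle scores), and the example weights 0.3 and 0.7 are all hypothetical. Only the five patterns themselves come from clause 6.

```python
import re
from typing import Callable, Dict, List, Tuple

BASES = "ACGT"
Scores = List[Dict[str, float]]                    # one {base: probability} dict per cycle
BaseCaller = Callable[[object], Tuple[str, Scores]]  # hypothetical caller interface

# Patterns listed in clause 6; X stands for any of A, C, T, or G.
G_RICH_PATTERNS = ("GGXGG", "GXGGG", "GGGXG", "GXXGG", "GGXXG")
_PATTERN_RE = re.compile("|".join(p.replace("X", "[ACGT]") for p in G_RICH_PATTERNS))


def matches_specific_pattern(called_sequence: str) -> bool:
    """Return True if any listed pattern occurs anywhere in the called sequence."""
    return _PATTERN_RE.search(called_sequence) is not None


def final_base_calls(sensor_data: object,
                     first_caller: BaseCaller,
                     second_caller: BaseCaller,
                     w_first: float = 0.3,          # assumed example weight
                     w_second: float = 0.7) -> str:  # assumed example weight
    """Run the second caller only when the first caller's output looks unsuitable."""
    first_seq, first_scores = first_caller(sensor_data)
    if not matches_specific_pattern(first_seq):
        # First classification information is deemed suitable; second caller stays idle.
        return first_seq
    _, second_scores = second_caller(sensor_data)
    final = []
    for s1, s2 in zip(first_scores, second_scores):
        # Weighted combination of the two callers' per-base scores for this cycle.
        mixed = {b: w_first * s1[b] + w_second * s2[b] for b in BASES}
        final.append(max(BASES, key=lambda b: mixed[b]))
    return "".join(final)
```

In this sketch the first weight is lower than the second, which corresponds to the case of clause 8 where the first base caller is the neural-network-based caller.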

１００バイオセンサ
１０１励起光
１０２フローセル
１０４サンプリングデバイス
１０６、１０８、１１０、１１２センサ
１０６’、１０８’、１１０’、１１２’ ピクセル領域
１０６’’ 反応部位
１０６Ａ、１０６Ｂクラスタ対
１０８Ａ、１０８Ｂクラスタ対
１１０Ａ、１１０Ｂクラスタ対
１１２Ａ、１１２Ｂクラスタ対
１１４Ａ、１１４Ｂクラスタ対
１２０～１２６基質層
１３０導電ビア
１３２電気接点
１３４試料表面
１３６フローカバー
１３８側壁
１４２、１４６出口ポート
１４４フローチャネル
１４６出口ポート
２００フローセル
２０２レーン
２０２ａ、２０２ｂレーン
２０８拡大セクション
２１２タイル
２１６クラスタ
３０４クラスタ
４００配列決定マシン
４０１フローセル
４０２中央処理装置（ＣＰＵ）
４０３メモリ
４０４メモリ
４０５バス
４５０プロセッサ
４５１データフロー論理
４５２ベースコール実行論理
４５４プロセッサ
４５４データフロー経路
４５５制御経路
４５６Ｎａｔｕｒｅ
４６０メモリ
４６１バス
５００ライン
５０１画像処理スレッド
５０２品質スコアスレッド
５０３高速バス
５０４データキャッシュ
５０５高速バス
５１０ディスパッチ論理
５１１ライン
５１２ライン
５２０ハードウェア
６００ラッパー
６０１クラスタ
６０９ＣＰＵ通信リンク
６１０ＤＲＡＭ通信リンク
６１１ライン
６１２ライン
６１３ライン
６１５プロセスデータ
６１６ライン
７００パッチ
７０１～７０５スタック
７１０層
７１１～７１６層
７２０逆階層
７２１、７２２、７２３時間層
７３５出力分類データ
７４０ソフトマックス関数
７５０ベースコール確率
８０１～８０５、８１１、８１５、８２０アレイ
９１０層
９２４、９２５層
９３０ソフトマックス関数
１０００サイクル
１００１スタック
１０２０時間スタック
１０３５，２，７出力
１０３５出力
１４００ベースコールシステム
１４０４配列決定マシン
１４０５フローセル
１４０６タイル
１４０６ａエッジタイル
１４０６ｂ非エッジタイル
１４０７クラスタ
１４０７ａエッジクラスタ
１４０７ａ非エッジクラスタ
１４０７ｂ非エッジクラスタ
１４１２センサデータ
１４１４、１４１６ベースコーラ
１４１８コンテキスト情報生成モジュール
１４２０コンテキスト情報
１４２２スイッチングモジュール
１４２４及び１４２６イネーブル信号
１４２８ベースコール結合モジュール
１４３４及び１４３６ベースコール分類情報
１４４０最終ベースコール
１６０１コンテキスト情報
１６０４空間的コンテキスト情報
１６０６時間的コンテキスト情報
１６０８塩基配列コンテキスト情報
１６１０コンテキスト情報
１７００Ｃフェーディング
１７０２レーン
１９０１ルックアップテーブル（ＬＵＴ）
２１００ベースコールシステム
２１００システム
２１２８最終ベースコール決定モジュール
２２００バイオアッセイシステム
２２０２バイオセンサ
２２０４システムコントローラ
２２０６流体制御システム
２２０８流体貯蔵システム
２２０９照明システム
２２１０温度制御システム
２２１２インターフェース
２２１３ディスプレイ
２２１４ユーザインターフェース
２２１５ユーザ入力デバイス
２２１６ハウジング
２３２０通信ポート
２３３０主制御モジュール
２３３１流体制御モジュール
２３３２流体貯蔵モジュール
２３３３温度制御モジュール
２３３４デバイスモジュール
２３３５識別モジュール
２３３６ＳＢＳモジュール
２３３７増幅モジュール
２３３８分析モジュール
２３３９照明モジュール
２３４８テンプレート発生器
２３５８ベースコーラ
２４００コンピュータシステム
２４１０記憶サブシステム
２４２２メモリサブシステム
２４３２専用メモリ（read only memory、ＲＯＭ）
２４３４メインランダムアクセスメモリ（random access memory、ＲＡＭ）
２４３６ファイル記憶サブシステム
２４３８ユーザインターフェース入力デバイス
２４５５バスサブシステム
２４７２中央処理ユニット（ＣＰＵ）
２４７４ネットワークインターフェースサブシステム
２４７６ユーザインターフェース出力デバイス
２４７８深層学習プロセッサ
１４９７８深層学習プロセッサ
100 Biosensor
101 Excitation light
102 Flow cell
104 Sampling device
106, 108, 110, 112 Sensor
106', 108', 110', 112' Pixel area
106'' Reaction site
106A, 106B Cluster pair
108A, 108B Cluster pair
110A, 110B Cluster pair
112A, 112B Cluster pair
114A, 114B Cluster pair
120-126 Substrate layer
130 Conductive via
132 Electrical contact
134 Sample surface
136 Flow cover
138 Side wall
142, 146 Exit port
144 Flow channel
146 Exit port
200 Flow cell
202 Lane
202a, 202b Lanes
208 Enlarged section
212 Tile
216 Cluster
304 Cluster
400 Sequencing machine
401 Flow cell
402 Central processing unit (CPU)
403 Memory
404 Memory
405 Bus
450 Processor
451 Data flow logic
452 Base call execution logic
454 Processor
454 Data flow path
455 Control path
456 Nature
460 Memory
461 Bus
500 Line
501 Image processing thread
502 Quality score thread
503 High-speed bus
504 Data cache
505 High-speed bus
510 Dispatch logic
511 Line
512 Line
520 Hardware
600 Wrapper
601 Cluster
609 CPU communication link
610 DRAM communication link
611 Line
612 Line
613 Line
615 Process data
616 Line
700 Patch
701-705 Stack
710 Layer
711-716 Layer
720 Inverted hierarchy
721, 722, 723 Time layer
735 Output classification data
740 Softmax function
750 Base call probability
801-805, 811, 815, 820 Array
910 Layer
924, 925 Layer
930 Softmax function
1000 Cycle
1001 Stack
1020 Time stack
1035, 2, 7 Output
1035 Output
1400 Base calling system
1404 Sequencing machine
1405 Flow cell
1406 Tile
1406a Edge tile
1406b Non-edge tile
1407 Cluster
1407a Edge cluster
1407a Non-edge cluster
1407b Non-edge cluster
1412 Sensor data
1414, 1416 Base caller
1418 Context information generation module
1420 Context information
1422 Switching module
1424 and 1426 Enable signals
1428 Base call combining module
1434 and 1436 Base call classification information
1440 Final base call
1601 Context information
1604 Spatial context information
1606 Temporal context information
1608 Base sequence context information
1610 Context information
1700C Fading
1702 Lane
1901 Look-up table (LUT)
2100 Base calling system
2100 System
2128 Final base call determination module
2200 Bioassay system
2202 Biosensor
2204 System controller
2206 Fluidic control system
2208 Fluid storage system
2209 Illumination system
2210 Temperature control system
2212 Interface
2213 Display
2214 User interface
2215 User input device
2216 Housing
2320 Communication port
2330 Main control module
2331 Fluidic control module
2332 Fluid storage module
2333 Temperature control module
2334 Device module
2335 Identification module
2336 SBS module
2337 Amplification module
2338 Analysis module
2339 Illumination module
2348 Template generator
2358 Base caller
2400 Computer system
2410 Storage subsystem
2422 Memory subsystem
2432 Read only memory (ROM)
2434 Main random access memory (RAM)
2436 File storage subsystem
2438 User interface input device
2455 Bus subsystem
2472 Central processing unit (CPU)
2474 Network interface subsystem
2476 User interface output device
2478 Deep learning processor
14978 Deep learning processor

Claims

1. A computer-implemented method for base calling using at least two base callers, the method comprising:
running at least a first base caller and a second base caller on sensor data generated for a sensing cycle in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data based on running the first base caller on the sensor data;
generating, by the second base caller, second classification information associated with the sensor data based on running the second base caller on the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information comprising one or more base calls for the sensor data.

2. The method of claim 1, wherein at least one of the first base caller and the second base caller implements a non-linear function, and at least another of the first base caller and the second base caller is at least partially linear.

3. The method of claim 1 or 2, wherein at least one of the first base caller and the second base caller implements a neural network model, and at least another of the first base caller and the second base caller does not include a neural network model.

4. The method of claim 1, wherein:
the first classification information generated by the first base caller includes, for each base calling cycle, (i) a first plurality of scores, each score of the first plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a first called base; and
the second classification information generated by the second base caller includes, for each base calling cycle, (i) a second plurality of scores, each score of the second plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a second called base.

5. The method of any one of claims 1 to 4, wherein the final classification information includes, for each base calling cycle, (i) a third plurality of scores, each score of the third plurality of scores indicating a probability that the called base is one of A, C, T, or G, and (ii) a final called base.

6. The method of any one of claims 1 to 5, wherein at least one of the first base caller and the second base caller uses a softmax function to generate the corresponding scores.

7. The method of claim 1, wherein generating the final classification information comprises generating the final classification information by selectively combining the first classification information and the second classification information based on context information associated with the sensor data.

8. The method of claim 7, wherein the context information associated with the sensor data includes temporal context information, spatial context information, base sequence context information, and other context information.

9. The method of claim 7 or 8, wherein the context information associated with the sensor data includes temporal context information indicating one or more base calling cycle numbers associated with the sensor data.

10. The method of any one of claims 7 to 9, wherein the context information associated with the sensor data includes spatial context information indicating a location of one or more tiles within a flow cell that generate the sensor data.

11. The method of any one of claims 7 to 10, wherein the context information associated with the sensor data includes spatial context information indicating a location of one or more clusters, within a tile of a flow cell, that generate the sensor data.

12. The method of claim 10 or 11, wherein the spatial context information indicates whether the one or more clusters within the tile of the flow cell that generate the sensor data are edge clusters or non-edge clusters.

13. The method of any one of claims 10 to 12, wherein a cluster is classified as an edge cluster if the cluster is estimated to be located within a threshold distance from an edge of the tile.

14. The method of any one of claims 10 to 13, wherein a cluster is classified as a non-edge cluster if the cluster is estimated to be located more than a threshold distance from any edge of the tile.
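
For illustration only, the edge/non-edge classification described in the preceding claims could be sketched as below. The sketch assumes a rectangular tile with its origin at one corner, cluster coordinates in the same units as the threshold, and treats a distance exactly equal to the threshold as "within" it. The function name `is_edge_cluster` and its parameters are hypothetical and do not appear in the source.

```python
def is_edge_cluster(x: float, y: float,
                    tile_width: float, tile_height: float,
                    threshold: float) -> bool:
    """Classify a cluster as an edge cluster when it lies within `threshold`
    of the nearest tile edge; otherwise it is a non-edge cluster."""
    # Distance from the cluster's estimated position to each of the four edges.
    distance_to_nearest_edge = min(x, tile_width - x, y, tile_height - y)
    return distance_to_nearest_edge <= threshold
```

A cluster for which this function returns False is farther than the threshold from every edge of the tile, which matches the non-edge definition.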

15. The method of any one of claims 7 to 14, wherein the context information associated with the sensor data includes base sequence context information indicating a base sequence called for the sensor data.

16. The method of any one of claims 7 to 15, wherein:
for a particular base to be called, the first classification information includes a first score, a second score, a third score, and a fourth score indicating probabilities that the called base is A, C, T, and G, respectively;
for the particular base to be called, the second classification information includes a fifth score, a sixth score, a seventh score, and an eighth score indicating probabilities that the called base is A, C, T, and G, respectively; and
generating the final classification information comprises generating the final classification information for the particular base to be called based on the first score, the second score, the third score, the fourth score, the fifth score, the sixth score, the seventh score, and the eighth score.

17. The method of claim 16, wherein:
the final score comprises a first final score that is a function of the first score and the fifth score, the first final score indicating a probability that the called base is A;
the final score comprises a second final score that is a function of the second score and the sixth score, the second final score indicating a probability that the called base is C;
the final score comprises a third final score that is a function of the third score and the seventh score, the third final score indicating a probability that the called base is T; and
the final score comprises a fourth final score that is a function of the fourth score and the eighth score, the fourth final score indicating a probability that the called base is G.

18. The method of claim 17, wherein:
the first final score is an average, a normalized weighted average, a minimum, or a maximum of the first score and the fifth score;
the second final score is an average, a normalized weighted average, a minimum, or a maximum of the second score and the sixth score;
the third final score is an average, a normalized weighted average, a minimum, or a maximum of the third score and the seventh score; and
the fourth final score is an average, a normalized weighted average, a minimum, or a maximum of the fourth score and the eighth score.
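
As a non-limiting illustration, the four combining functions named above (average, normalized weighted average, minimum, maximum) can be sketched per base as follows. The function names, the `mode` strings, and the default weights are assumptions made for this sketch and are not taken from the source.

```python
from typing import Dict

BASES = ("A", "C", "T", "G")


def combine_scores(first: Dict[str, float], second: Dict[str, float],
                   mode: str = "average",
                   w_first: float = 0.5, w_second: float = 0.5) -> Dict[str, float]:
    """Combine two callers' per-base scores into final per-base scores."""
    if mode == "average":
        return {b: (first[b] + second[b]) / 2 for b in BASES}
    if mode == "weighted":
        total = w_first + w_second  # normalize so the weights sum to 1
        return {b: (w_first * first[b] + w_second * second[b]) / total for b in BASES}
    if mode == "min":
        return {b: min(first[b], second[b]) for b in BASES}
    if mode == "max":
        return {b: max(first[b], second[b]) for b in BASES}
    raise ValueError(f"unknown mode: {mode}")


def call_base(final_scores: Dict[str, float]) -> str:
    """Pick the base with the highest final score."""
    return max(BASES, key=lambda b: final_scores[b])
```

With first scores favoring A and second scores favoring T, a plain average may still call A, while a weighted average that emphasizes the second caller can flip the call to T.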

19. The method of claim 17 or 18, wherein:
for the particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G and has the highest corresponding score among the first score, the second score, the third score, and the fourth score; and
for the particular base to be called, the second classification information includes a second called base that is one of A, C, T, and G and has the highest corresponding score among the fifth score, the sixth score, the seventh score, and the eighth score.

20. The method of any one of claims 1 to 19, wherein:
for a particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base to be called, the second classification information includes a second called base that is the same as the first called base; and
generating the final classification information comprises generating the final classification information such that, for the particular base to be called, the final classification information includes a final called base that matches the first called base and the second called base.

21. The method of any one of claims 1 to 20, wherein:
for a particular base to be called, the first classification information includes a first called base that is one of A, C, T, and G;
for the particular base to be called, the second classification information includes a second called base that is another of A, C, T, and G, such that the second called base does not match the first called base; and
generating the final classification information comprises generating the final classification information such that, for the particular base to be called, the final classification information includes one of (i) the first called base, (ii) the second called base, or (iii) a final called base that is marked as uncertain.
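
One hypothetical way to realize the agreement and disagreement cases above is sketched below. The source does not specify how to choose among the three outcomes, so the confidence-margin rule, the `min_margin` default, and the use of "N" to mark an uncertain call are assumptions of this sketch.

```python
def resolve_call(first_base: str, first_conf: float,
                 second_base: str, second_conf: float,
                 min_margin: float = 0.2) -> str:
    """Resolve two per-cycle base calls into a final call.

    Agreement: return the shared base.
    Disagreement: return the more confident caller's base, unless the two
    confidences are closer than `min_margin`, in which case return "N"
    (an assumed marker for an uncertain call).
    """
    if first_base == second_base:
        return first_base
    if abs(first_conf - second_conf) < min_margin:
        return "N"
    return first_base if first_conf > second_conf else second_base
```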

22. The method of claim 1, wherein:
at least one of the first classification information, the second classification information, and the final classification information indicates a called base sequence having a specific base sequence pattern; and
generating the final classification information comprises, in response to the indication that the called base sequence has the specific base sequence pattern, generating the final classification information by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.

23. The method of any one of claims 1 to 22, wherein:
the sensor data includes (i) first sensor data for a first one or more sensing cycles, and (ii) second sensor data for a second one or more sensing cycles occurring after the first one or more sensing cycles;
the final classification information includes:
(i) first final classification information for the first one or more sensing cycles, generated by (a) placing a first weight on the first classification information associated with the first one or more sensing cycles, and (b) placing a second weight on the second classification information associated with the first one or more sensing cycles; and
(ii) second final classification information for the second one or more sensing cycles, generated by (a) placing a third weight on the first classification information associated with the second one or more sensing cycles, and (b) placing a fourth weight on the second classification information associated with the second one or more sensing cycles; and
the first, second, third, and fourth weights are different.

24. The method of claim 23, wherein:
the first base caller implements a neural network model, and the second base caller does not include a neural network model;
the first weight is lower than the second weight, such that for the first one or more sensing cycles, the second classification information from the second base caller is emphasized over the first classification information from the first base caller; and
the third weight is higher than the fourth weight, such that for the second one or more sensing cycles, the first classification information from the first base caller is emphasized over the second classification information from the second base caller.
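
The cycle-dependent weighting described above might be sketched, purely as an example, with a simple schedule that switches weights at a fixed cycle. The switch point of 25 cycles and the four weight values are invented for illustration. The source only requires that the four weights differ and that their ordering flips between the earlier and later cycles.

```python
from typing import Tuple


def weights_for_cycle(cycle: int, switch_cycle: int = 25) -> Tuple[float, float]:
    """Return (weight for first caller, weight for second caller) for a cycle.

    Assumes the first caller is neural-network-based and the second is not.
    Early cycles emphasize the second caller; later cycles emphasize the first.
    """
    if cycle < switch_cycle:
        return (0.2, 0.8)  # first weight < second weight (assumed values)
    return (0.9, 0.1)      # third weight > fourth weight (assumed values)
```

The returned pair can be fed to any weighted combination of the two callers' per-base scores for the cycle in question.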

the sensor data includes (i) first sensor data from a first one or more clusters of tiles of a flow cell; and (ii) second sensor data from a second one or more clusters of tiles of the flow cell, and the final classification information includes:
(i) first final classification information for the first sensor data from the first one or more clusters, the first final classification information being generated by: (a) placing a first weight on the first classification information from the first one or more clusters; and (b) placing a second weight on the second classification information from the first one or more clusters; and
(ii) second final classification information for the second sensor data from the second one or more clusters, the second final classification information being generated by: (a) placing a third weight on the first classification information from the second one or more clusters; and (b) placing a fourth weight on the second classification information from the second one or more clusters;
25. The method of any one of claims 1 to 24, wherein the first, second, third and fourth weights are different.

the first one or more clusters are edge clusters located within a threshold distance from one or more edges of a tile of the flow cell;
26. The method of claim 25, wherein the second one or more clusters are non-edge clusters located beyond the threshold distance from the one or more edges of the tile of the flow cell.
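
The edge and non-edge distinction of claims 25 and 26 can be sketched as follows. The sketch assumes a rectangular tile with cluster coordinates measured from one corner; the function names, the default weight pairs, and the choice of which base caller to emphasise for edge clusters are all hypothetical, since the claims only require that the four weights differ.

```python
from typing import Tuple


def distance_to_nearest_edge(x: float, y: float,
                             width: float, height: float) -> float:
    """Distance from a cluster at (x, y) to the closest tile edge."""
    return min(x, y, width - x, height - y)


def is_edge_cluster(x: float, y: float, width: float, height: float,
                    threshold: float) -> bool:
    """True when the cluster lies within `threshold` of any tile edge."""
    return distance_to_nearest_edge(x, y, width, height) <= threshold


def weights_for_cluster(x: float, y: float, width: float, height: float,
                        threshold: float,
                        edge_weights: Tuple[float, float] = (0.3, 0.7),
                        interior_weights: Tuple[float, float] = (0.8, 0.2),
                        ) -> Tuple[float, float]:
    """Return (first-caller weight, second-caller weight) for one cluster.

    The default values are illustrative only.
    """
    if is_edge_cluster(x, y, width, height, threshold):
        return edge_weights
    return interior_weights
```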

detecting the presence of one or more air bubbles in at least one cluster of tiles of a flow cell from the sensor data;
27. The method of any one of claims 1 to 26, comprising generating the final classification information, in response to the detection of the one or more air bubbles, by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.

The sensor data includes at least one image, and the method further comprises:
detecting that the at least one image is an out-of-focus image;
28. The method of any one of claims 1 to 27, comprising generating the final classification information, in response to the detection of the out-of-focus image, by placing a first weight on the first classification information and a second weight on the second classification information, the first weight and the second weight being different.
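
Claim 28 does not state how an out-of-focus image is detected. As one possible illustration, the sketch below uses the variance of a 4-neighbour Laplacian, a common sharpness heuristic that is swapped in here as an assumption and is not described in the source. The threshold and the weight pairs are likewise hypothetical.

```python
from typing import List, Tuple

Image = List[List[float]]  # row-major grid of pixel intensities


def laplacian_variance(image: Image) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    rows, cols = len(image), len(image[0])
    if rows < 3 or cols < 3:
        raise ValueError("image must be at least 3x3")
    values = []
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            values.append(image[r - 1][c] + image[r + 1][c]
                          + image[r][c - 1] + image[r][c + 1]
                          - 4 * image[r][c])
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def is_out_of_focus(image: Image, threshold: float) -> bool:
    """Low Laplacian variance is treated as a sign of blur (assumption)."""
    return laplacian_variance(image) < threshold


def weights_for_image(image: Image, threshold: float,
                      blurred: Tuple[float, float] = (0.25, 0.75),
                      sharp: Tuple[float, float] = (0.75, 0.25),
                      ) -> Tuple[float, float]:
    """Return (first-caller weight, second-caller weight); values illustrative."""
    return blurred if is_out_of_focus(image, threshold) else sharp
```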

29. A computer-implemented method comprising:
generating sensor data for a sensing cycle in a series of sensing cycles;
executing at least a first base caller and a second base caller on at least a corresponding portion of the sensor data, and selectively switching between execution of the first and second base callers based on context information associated with the sensor data, wherein the first base caller is different from the second base caller;
generating first classification information and second classification information based on the first base caller and the second base caller, respectively; and
generating a base call based on one or both of the first classification information and the second classification information.
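
The selective switching recited in this claim can be sketched as a dispatcher that picks one base caller per portion of sensor data. The context keys `cycle` and `is_edge`, the selection rule, and the two toy callers are hypothetical stand-ins chosen only to make the example runnable; the claim itself does not fix any of them.

```python
from typing import Any, Callable, Dict, List, Tuple

BASES = "ACGT"  # assumed channel order
BaseCaller = Callable[[List[float]], List[float]]


def normalised_caller(intensities: List[float]) -> List[float]:
    """Toy first caller: intensities scaled to sum to one."""
    total = sum(intensities)
    return [v / total for v in intensities]


def brightest_channel_caller(intensities: List[float]) -> List[float]:
    """Toy second caller: one-hot vector on the brightest channel."""
    best = intensities.index(max(intensities))
    return [1.0 if i == best else 0.0 for i in range(len(intensities))]


def choose_caller(context: Dict[str, Any]) -> str:
    """Hypothetical rule: second caller for early cycles or edge clusters."""
    if context.get("is_edge", False) or context.get("cycle", 0) < 10:
        return "second"
    return "first"


def call_portions(portions: List[List[float]],
                  contexts: List[Dict[str, Any]],
                  callers: Dict[str, BaseCaller]) -> List[Tuple[str, str]]:
    """Run the selected caller on each portion; return (caller name, base)."""
    results = []
    for intensities, context in zip(portions, contexts):
        name = choose_caller(context)
        probabilities = callers[name](intensities)
        results.append((name, BASES[probabilities.index(max(probabilities))]))
    return results
```

Returning the caller name alongside each base call makes it easy to check which caller handled which portion of the data.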

A non-transitory computer readable storage medium having stored thereon computer program instructions for progressively training a base caller, the instructions, when executed on a processor, performing:
executing at least a first base caller and a second base caller on sensor data generated for a sensing cycle in a series of sensing cycles;
generating, by the first base caller, first classification information associated with the sensor data based on executing the first base caller on the sensor data;
generating, by the second base caller, second classification information associated with the sensor data based on executing the second base caller on the sensor data; and
generating final classification information based on the first classification information and the second classification information, the final classification information comprising one or more base calls for the sensor data.