JP3546582B2 - Semiconductor device

Info

Publication number: JP3546582B2
Application number: JP05132196A
Authority: JP
Inventors: 隆夫渡部; 一重鮎川; 良藤田; 一正柳沢; 均田中
Original assignee: Renesas Technology Corp
Current assignee: Renesas Technology Corp
Priority date: 1996-03-08
Filing date: 1996-03-08
Publication date: 2004-07-28
Anticipated expiration: 2016-03-08
Also published as: JPH09331019A

Description

【０００１】
【発明の属する技術分野】
本発明は、メモリを集積した半導体装置に係わり、特に複数のI/O線をもつメモリと論理回路とを同一の半導体チップ上に集積した半導体装置において、種々の目的に応じたLSIチップを短期間で設計するための方式およびそれによる製品群を与え、また上記メモリ・論理回路間のデータの転送パターンを高速に変化させることのできる高集積なデータ転送回路の方式を与えるものである。
【０００２】
【従来の技術】
多数の演算器、プロセッサおよびメモリを相互結合する並列計算システムを同一の半導体チップ上に集積する試みとして、米国特許5371896をあげることができる。この従来例では、複数のメモリと複数の演算回路を同一の半導体チップ上に集積され、両者の間がクロスバスイッチからなるネットワークで結合される。この従来例は、必要に応じてSIMD(Single Instruction Multi Data Stream)動作とMIMD(Multi Instruction Multi Data Stream)動作切り換えて行うことができることが特長である。SIMD動作時には、複数のメモリのうち１つがインストラクションメモリとして使われ、残りのメモリがデータメモリとして使われる。演算回路には、インストラクションメモリからの命令が共通に与えられる。MIMD動作時には、SIMD動作時にデータメモリとして使われたメモリの一部がインストラクションメモリとして使われることにより、個々の演算回路に、別々のインストラクションメモリからの命令が与えられる。個々のメモリと演算回路との間のデータ転送経路は、クロスバネットワークにより様々に切り換えることができる。
【０００３】
【発明が解決しようとする課題】
メモリを集積した半導体装置は上記のほかにも種々考案されているが、最近特にDRAM(Dynamic Random Access Memory)などからなる比較的高集積のメモリと論理回路とを同一の半導体チップに集積したものが注目を集めている。このようなLSIは、一般にユーザの要求を受けて半導体メーカがそれを作り始めるため、チップの完成までの時間(Time to Customers)の短縮が必要である。しかし、一方、必要となるメモリ容量や演算回路の種類は用途によって異なる。この相反する要求を満たすには、設計方式から改革する必要がある。しかし、従来の高集積メモリ、特にDRAMでは、仕様が標準化されているため、そのままの設計方式では、上記の要求に対応することができなかった。
【０００４】
さらに、DRAMなどの高集積メモリと論理回路とを同一の半導体チップに集積する場合には、それを単に集積しただけでは個別チップに対して大きなメリットが生じにくい。コストと要求性能を考慮すると、1cm角程度の半導体チップ上に大容量のメモリと大規模な論理回路や演算回路を集積し、両者の間の結合線の本数を数百本以上とし、かつてデータ転送速度を１GigaByte/sec以上にする必要がある。したがって、メモリと論理回路とを結合する結合回路として、高速かつ高集積なものが必要である。しかし、上記従来例のようにクロスバスイッチを用いた場合、結合線の数が増加するとスイッチの個数が膨大となりハードウエアの規模が増大し、遅延も増大してしまう。上記の従来技術のように独立した複数のメモリと複数の演算回路との間のデータ転送経路を切り換える場合には一般にメモリや演算回路の数も少ないので従来の並列計算機で使われていた方式をそのまま同じチップ上に実現するのも可能である。しかしながら数百本以上ものメモリのI/O線群と論理回路や演算回路とのI/O線群の間の対応を切り換える場合には、集積度と動作速度の要求がきびしく、従来の方式をそのまま利用するのは困難である。
【０００５】
【課題を解決するための手段】
上記の第一の課題を解決するために、本発明では多くのI/O線をもち、容量の異なるメモリコアとメモリコアのI/O線のピッチに合わせて設計した結合回路用のモジュールのレイアウトパターンをあらかじめ作ってデータベースに記憶させる。さらに、論理回路を合成するための論理ライブラリも作成し、データベースに記憶させる。データベースには、それらのレイアウトパターンや仕様、特性など設計に必要なデータを記憶させる。
【０００６】
結合回路用のモジュールは、スイッチ群とバッファ群とからなり、組合せて結合回路を構成できる。スイッチ群は、入力されたデータをその中でその順番を入れ替えることができるものである。複数のスイッチ群を接続して、転送パターンに合わせて所望の転送パターンに対応するスイッチ群を活性化することにより、高速に転送パターンを切り替えることができる。これらのモジュールは、メモリコアのI/O線のピッチに合わせて作られることとされ、レイアウトパターンを変更することなくメモリコアのI/O線にそのまま結合できる。
【０００７】
上記のように本発明によれば、メモリコア、結合回路用モジュール、論理ライブラリのレイアウトパターンがデータベースにあらかじめ登録されており、なおかつメモリコアと結合回路用モジュールとの配線ピッチがそろえられることとされ、そのまま結合して使うことができる。したがって、ユーザからLSIチップの仕様が与えられてからの設計を短期間に終わらせることができる。すなわち、必要な容量のメモリコアと仕様に合った転送回路を作るためのモジュールとをデータベースから取り出して組合せ、さらに論理部分は、論理合成用のCADツールを用いて論理ライブラリから所望の論理回路を合成すればよい。それらの間の配線は、配置配線CADツールにより高速にできる。したがって、メモリと論理回路とを集積したチップが短期間にできる。
【０００８】
さらに、上記の結合回路では、メモリと論理回路とで転送されるデータが通過するのは活性化されたスイッチ群のみであるために高速なデータ転送が実現できる。さらに、転送パターン数に合わせて段数を増減するため転送パターンが少ない場合には無駄な占有面積がない。
【０００９】
【発明の実施の形態】
[メモリコアを用いたシステムLSIの設計方法]
図１には、本発明に係るメモリコアを内蔵したシステムLSIの概念をが示されている。図１を用いて本発明に係るLSIの設計方法を説明する。
【００１０】
図１の左に示されるのは、コア回路、論理ライブラリのレイアウトパターンや特性を登録したデータベース用記憶装置DBである。ここには、多くのI/O線をもち容量の異なる複数のメモリコアMRと、メモリコアのI/O線のピッチに合わせて設計した転送回路（結合回路）TG用のモジュール群と、論理回路を合成するための基本ゲートからなる論理ライブラリLLとのレイアウトパターンや仕様、特性など設計に必要なデータをあらかじめ記憶する。
【００１１】
ここで、転送回路TG用のモジュールは、スイッチ群SWGとバッファ群TGBUFi
からなり、組合せて転送回路TGを合成できる。詳しくは後述するが、複数のスイッチ群を接続することにより様々な転送パターンを持つ転送回路TGを合成することができる。これらのモジュールは、メモリコアMRのI/O線のピッチに合わせて作られているので、レイアウトパターンを変更することなくメモリコアMRのI/O線にそのまま結合できる。
【００１２】
LSIチップの仕様が与えられると上記データベースDBから必要なデータを設計用ワークステーションWSに転送しながら設計を行なう。メモリコアMRと転送回路TG用モジュールの配線ピッチがそろっているため、こららはそのまま結合して使うことができる。すなわち、必要な容量のメモリコアMRと仕様に合った転送回路TGを作るためのモジュールをデータベースDBから取り出して組合せればよい。論理部分は、論理合成用のCADツールを用いることにより、論理ライブラリLLから所望の論理回路LCを容易に合成できる。最後にチップのフロアプランに合わせてそれらを配置し、その間の配線を、配置配線CADツールにより行なえばチップのレイアウトデータが完成する。このようにして、メモリコアMRを内蔵したシステムLSIの製品群を短期間に設計できる。
【００１３】
なお、ここでは論理ライブラリLLを用いて論理を合成する例を示したが、場合によってはチップの一部をゲートアレイにして論理を合成してもよい。その場合は、メモリコアMRが共通で論理が異なるチップを容易に製造できるという利点がある。
【００１４】
図１の右下に上記のようにして設計したチップの例が２つ示される。半導体チップLSI-Aは、メモリコアMRと論理回路LCを転送回路TGで結合したブロックA,B,C,Dを４つ並べて、その中心にチップ全体を制御する制御回路CCを配置したものである。半導体チップLSI-Bは、メモリコアMRと論理回路LCを転送回路TGで結合したブロックA,Bを２つ並べて、中心にチップ全体を制御する制御回路CCを配置したものである。
【００１５】
本発明では、もちろん一つのメモリコアMRを用いるチップも実現できるが、本例のように複数のブロックを集積するチップも容易に設計できる。その場合、各ブロックのメモリコアMR、論理回路LCを異なるものとしてもよいし、同一の構成にしてもよい。前者は、異なる処理を同一のチップで並列に行なうものに適しており、後者は、同一の処理を並列に行なうものに適している。特に後者は、グラフィックス、自然画像処理、ニューラルネットワークなど並列動作が可能な処理を行なうものに適している。
【００１６】
半導体チップLSI-A,LSI-BのどちらもメモリコアMRとデータの授受を行なう論理回路LCをメモリコアMRに近接しているため配線遅延の影響が少なく高速のデータ転送が実現できる。また、制御回路CCから各ブロックへの距離が半導体チップLSI-Bでは等しく、半導体チップLSI-Aでも差が少ないので、制御信号のスキューが小さくできるという利点がある。
【００１７】
半導体チップLSI-Bでは、論理回路LCを制御回路CCに近接したが、メモリコアMRの制御信号の配線を短くして配線遅延を少なくする必要がある場合には、ブロックを制御回路CCに対して反転させメモリコアMRを制御回路CCに近接してもよい。
なお、半導体チップLSI-Aにおいて制御回路CCからの距離がブロックAとBおよびDとCで異なることが問題となる場合も考えられる。その場合には、半導体チップLSI-Bのような配置を行なって制御回路CCの左右にブロックを２つずつ配置すればよい。
【００１８】
ブロックの形状が横に長い場合には、そのようにするとチップの短辺と長辺の差が大きくなりすぎる場合がある。そのような場合には、図１に示される半導体チップLSI-Aの配置のまま、制御信号の入力端子をブロックの片側の面に集中させ、ブロックAとBおよびDとCを反転して配置することにより、ブロック同士が隣接する面に制御信号の入力端子が来るようにできる。これにより制御信号のスキューを減少することができる。以下では、図１に示される転送回路TGについて詳しく説明する。
【００１９】
[多重I/Oメモリコア内蔵LSI]
図２には、本発明に係る多重I/Oのメモリ内蔵LSIの例が示される。図２に示される半導体チップSICは、複数のI/O線MIOiをもつメモリコアMRと、複数のI/O線LIOiをもつ論理回路LCと、メモリコアMRと論理回路LCの間のデータの転送パターンを制御する転送回路TG等とを単一の単結晶シリコン等からなる半導体基板に集積したものである。
【００２０】
論理回路LCの内容は論理ライブラリLLを用いて目的に応じたものを合成すればよい。ここでは、画像あるいはグラフィックスに適する一例が示される。メモリコアMRに記憶された画素に対して演算を行なう演算器群ARWとメモリコアMRの内容を画面に表示するために一定の速度で読み出すための表示用バッファDBRならびにそれらとメモリコアMRを制御するための制御回路LCCから構成されている。
【００２１】
メモリコアMRは、複数のデータ線DLと複数のワード線WLならびにそれらの交
点に形成されたメモリセルMCを有する。メモリセルMCは、１トランジスタ・１キャパシタのDRAMセル, ４又は６トランジスタのSRAM(Static Random Access
Memory)セル,１トランジスタの不揮発性のフラッシュメモリセルなどを用いることができる。なお、以下では書き込みと読み出しのできるいわゆるRAM型を仮定するが読みだし専用のいわゆるROM型のものを使用する場合にも本発明は有効である。メモリコアMRへのデータの書込み読み出しは、読み出し書込み回路RWCにより制御され、周辺回路PERによって選択された、複数のメモリセルMCに複数のI/O線から並列にデータを読み書きすることができる。周辺回路PERには、論理回路LCからのメモリコア制御信号MRC、制御信号CTL及びアドレス信号DATA等のバスが接続されている。メモリコアMRは、論理回路LCの基準信号であるクロック信号と同期して制御信号、アドレス信号、I/O信号を入力したり、出力したりする。
【００２２】
論理回路LCは、メモリコアMRから転送回路TGを通じて読み出されるデータや半導体チップSIC外部からのデータに対して演算を行う。その結果を再び転送回路TGを通じてメモリコアMRへ書き込んだり、半導体チップSIC外部へ出力する。
【００２３】
転送回路TGは、多段に接続されたスイッチ群SWGより構成され、制御信号TGCiによってメモリコアMRの複数のI/O線MIOiと論理回路LCの複数のI/O線LIOiとの間の接続関係（以下、転送パターンという。）を切り換えることができる。
【００２４】
図３には、転送パターンの例としてP0からP7まで８つのパターンを実現する場合が示される。この例は、2のn乗本のI/O線MIOiとLIOiに対して、その1/4(２の(n−2)乗)を単位としたMIO0,1,2,3とLIO0,1,2,3の対応を切り換えるものである。転送単位が2のn乗本である必要はなく、また全ての転送単位が等しくなくとも本発明を適用できることはもちろんである。矢印の向きはデータの流れを示しており転送パターンP1はメモリへの書込みのみに使用し、残りのパターン(P0,P2〜P7)は読み出し、書込み両方に使う。
【００２５】
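Transfer pattern P0 transfers the data straight through, without any reordering. Transfer pattern P1 conveys the data input on (LIO0,1) to both (MIO0,1) and (MIO2,3) and writes it to the memory. Unlike the other patterns, this one connects different memory I/O lines to each other. On a read, different data could therefore collide, so the pattern is used only for writing. As described later, it is useful for tasks such as initializing the memory contents at high speed.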
【００２６】
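Transfer patterns P2 and P3 form transfer paths between (LIO0,1) and (MIO0,1) and between (LIO0,1) and (MIO2,3), respectively. Transfer patterns P4 to P7 form transfer paths between (LIO1) and (MIO0), (LIO1) and (MIO1), (LIO1) and (MIO2), and (LIO1) and (MIO3), respectively.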
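The eight patterns described above can be written down as a small connection table. The following is a minimal sketch, not the circuit itself: the names `PATTERNS`, `write`, and `read` are hypothetical, each I/O unit is reduced to a single value, and the pairing inside P1 (LIO0 to MIO0 and MIO2, LIO1 to MIO1 and MIO3) is assumed from the description of the switch groups.

```python
# Units are indexed 0..3 for MIO0..MIO3 and LIO0..LIO3.
# Each pattern is a set of (lio, mio) pairs that are connected.
PATTERNS = {
    "P0": {(0, 0), (1, 1), (2, 2), (3, 3)},  # straight through
    "P1": {(0, 0), (1, 1), (0, 2), (1, 3)},  # fan-out, write only
    "P2": {(0, 0), (1, 1)},
    "P3": {(0, 2), (1, 3)},
    "P4": {(1, 0)},
    "P5": {(1, 1)},
    "P6": {(1, 2)},
    "P7": {(1, 3)},
}
WRITE_ONLY = {"P1"}  # P1 ties two MIO units to one LIO unit


def write(pattern, lio):
    """Return the values driven onto MIO0..3; None means not driven."""
    mio = [None] * 4
    for l, m in sorted(PATTERNS[pattern]):
        mio[m] = lio[l]
    return mio


def read(pattern, mio):
    """Return the values seen on LIO0..3; None means not driven."""
    if pattern in WRITE_ONLY:
        raise ValueError(f"{pattern} would let two MIO units collide on a read")
    lio = [None] * 4
    for l, m in sorted(PATTERNS[pattern]):
        lio[l] = mio[m]
    return lio


print(write("P1", ["a", "b", "c", "d"]))  # ['a', 'b', 'a', 'b']
print(read("P3", ["w", "x", "y", "z"]))   # ['y', 'z', None, None]
```

Representing a pattern as a set of pairs keeps the write-only nature of P1 visible: it is the only entry in which one LIO index appears twice.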
【００２７】
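The eight transfer patterns (P0 to P7) can be switched freely by the control signals TGCi. Each transfer pattern is realized by turning on one switch group SWG in the transfer circuit TG. For example, transfer pattern P0 is realized by turning on switch group SWG#0 shown in FIG. 2. The specific configuration of the transfer circuit TG is described later.
【００２８】
In this embodiment, the memory core MR, the transfer circuit TG, and the logic circuit LC are formed on the same semiconductor chip, so several tens to several hundreds of I/O lines can be wired easily.
【００２９】
The operation of the LSI with a built-in multi-I/O memory core shown in FIG. 2 is described below.
【００３０】
First, the read operation. When the peripheral circuit PER in the memory core MR selects one word line WL, data is read from the group of memory cells MC on that word line onto the data lines DL and then read out in parallel onto the I/O lines MIOi through the read/write circuit RWC. When the control signals TGCi activate one of the switch groups SWG in the transfer circuit TG, the transfer pattern between the I/O lines MIOi of the memory core MR and the I/O lines LIOi of the logic circuit LC is fixed, and the data is transferred from the I/O lines MIOi to the I/O lines LIOi and input to the logic circuit LC.
【００３１】
The write operation is the same except that the data flows in the opposite direction. That is, data output from the logic circuit LC onto the I/O lines LIOi is transferred from LIOi to MIOi according to the transfer pattern fixed by the control signals TGCi, conveyed to the data lines DL through the read/write circuit, and written in parallel into the memory cells MC on the selected word line WL.
【００３２】
When reads or writes are performed consecutively or alternately, the selected word line WL and the transfer pattern can be switched every cycle. Consequently, in response to requests from the logic circuit LC, memory cells MC at a different address can be read or written in parallel in each cycle.
【００３３】
According to this embodiment, data passes between the memory core MR and the logic circuit LC through a single stage of switch groups SWG, so very high-speed data transfer is achieved. In addition, because the memory core MR and the logic circuit LC are arranged so that the I/O lines MIOi and LIOi run in the same direction, the transfer circuit TG can be placed between them. The number of stages of switch groups SWG in the transfer circuit TG is determined by the transfer patterns, so when there are few transfer patterns the dimension of the transfer circuit in the data-line direction (the horizontal direction in FIG. 2) can be made small. Therefore, as shown in FIG. 2, laying out the transfer circuit TG and the logic circuit LC so that they fit within the dimension of the memory core MR in the word-line WL direction (the vertical direction in FIG. 2) keeps the overall area small without wasting area.
【００３４】
The peripheral circuit PER may include only the X decoder that selects a word line WL as described above, or may also include a Y decoder that selects some of the data lines and connects them to the I/O lines. Because this embodiment allows a large number of I/O lines, the Y decoder can normally be a simple one that selects, for example, 128 of 1024 data lines.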
【００３５】
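[First specific example of the transfer circuit]
Next, a specific circuit example of the transfer circuit TG is described with reference to FIG. 4. FIG. 4 shows a circuit example of a transfer circuit TG that realizes the transfer patterns of the transfer circuit shown in FIG. 2.
【００３６】
In FIG. 4, MIO0, MIO1, MIO2, and MIO3 are the I/O lines of the memory core MR, and LIO0, LIO1, LIO2, and LIO3 are the I/O lines of the logic circuit LC. SWG0, SWG1, ..., SWG7 are switch groups, and TGBUF0, TGBUF1, TGBUF2, and TGBUF3 are buffer circuits.
【００３７】
TGC0, TGC1, ..., TGC7 are the signals that turn the switch groups SWG0, SWG1, ..., SWG7 on and off, respectively. The polarity depends on the type of transistor used for the switches SW in the switch groups SWG, but here a switch SW turns on when the control signal TGCi applied to it is at a high potential and turns off when it is at a low potential. For example, when control signal TGC3 is driven high and the other control signals are held low, the two switches SW marked with arrows in switch group SWG3 turn on. As a result, transfer pattern P3 is formed, and a transfer path is established between the memory core I/O lines MIO2 and MIO3 and the logic circuit I/O lines LIO0 and LIO1. The other transfer patterns are realized in the same way by driving one of the control signals TGCi high.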
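In this first example exactly one control signal is high at a time, so the eight signals TGC0 to TGC7 act as a one-hot code for the pattern number. The sketch below only illustrates that encoding; `one_hot` and `active_pattern` are hypothetical helper names, and the active-high polarity follows the convention stated in the paragraph above.

```python
def one_hot(i, n=8):
    """Control vector TGC0..TGC(n-1) with only TGCi high."""
    if not 0 <= i < n:
        raise ValueError("pattern index out of range")
    return [1 if k == i else 0 for k in range(n)]


def active_pattern(tgc):
    """Return i such that TGCi is the single high control signal."""
    if any(bit not in (0, 1) for bit in tgc):
        raise ValueError("control signals must be 0 or 1")
    if sum(tgc) != 1:
        # In the first example each pattern has its own switch group,
        # so turning on zero or several groups is not a valid setting.
        raise ValueError("exactly one TGCi must be high")
    return tgc.index(1)


tgc = one_hot(3)
print(tgc)                  # [0, 0, 0, 1, 0, 0, 0, 0]
print(active_pattern(tgc))  # 3, i.e. transfer pattern P3
```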
【００３８】
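The buffer circuits TGBUF0, TGBUF1, TGBUF2, and TGBUF3 are provided to avoid signal delay caused by the added capacitance of the I/O lines. A configuration example of this circuit is shown in FIG. 5. The operation of the buffer circuit TGBUFi is described with reference to FIG. 5.
【００３９】
The buffer circuit TGBUFi is a bidirectional buffer that switches the direction of data flow according to whether the memory core MR is reading or writing. It also latches the potential of any logic circuit I/O line LIOi that is unused when a transfer pattern is formed.
【００４０】
In the example of FIG. 3, every transfer pattern except P0 leaves some of the logic circuit I/O lines LIOi unused. If the potential of an unused I/O line is not fixed and the line is left floating, charge leakage may bring it to an intermediate potential. If that line drives the gate of a CMOS (Complementary Metal Oxide Semiconductor) transistor on the logic circuit LC side, an excessive current flows steadily. To avoid this, the potential of the unused logic circuit I/O lines LIOi is latched.
【００４１】
When the enable signal LIOEi of a buffer circuit TGBUFi on the logic circuit LC side is driven low, then, as is clear from the logic shown in FIG. 4, signals TGWi and TGRi go low and signals TGWBi and TGRBi go high, turning off the clocked inverter circuits RINV and WINV. In addition, MOS transistor Q1, whose gate receives signal LIOPRi, turns on, and I/O signal LIOi is latched low. For an I/O signal LIOi that is in use, the enable signal LIOEi is driven high. The data direction is switched as follows.
【００４２】
When the memory core MR is reading, signal TGRW is driven low. Then, if the enable signal LIOEi is high, only the read clocked inverter RINV is activated and data is transferred from I/O line LIOi' to I/O line LIOi. When the memory core MR is writing, signal TGRW is driven high. Then, if the enable signal LIOEi is high, only the write clocked inverter WINV is activated, and data is transferred from I/O line LIOi to I/O line LIOi' and then through a switch SW to the memory core I/O line MIOi.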
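The three operating states of the buffer (disabled and latched low, read, write) can be summarized as a small truth-table function. This is a behavioural sketch under simplifying assumptions: the function name `tgbuf` is hypothetical, signals are plain 0/1 values, and any logical inversion introduced by the clocked inverters RINV and WINV is ignored so that data is shown passing through unchanged.

```python
def tgbuf(lioe, tgrw, lio, lio_prime):
    """Behavioural model of one buffer circuit TGBUFi.

    lioe      -- enable signal LIOEi (1 = line in use)
    tgrw      -- direction signal TGRW (0 = read, 1 = write)
    lio       -- value on the logic-side line LIOi before the buffer acts
    lio_prime -- value on the switch-side line LIOi' before the buffer acts
    Returns (LIOi, LIOi') after the buffer acts.
    """
    if lioe == 0:
        # Both clocked inverters are off and Q1 holds LIOi low,
        # so the unused line never floats.
        return 0, lio_prime
    if tgrw == 0:
        # Read: only RINV is active, LIOi' drives LIOi.
        return lio_prime, lio_prime
    # Write: only WINV is active, LIOi drives LIOi'.
    return lio, lio


print(tgbuf(lioe=0, tgrw=0, lio=1, lio_prime=1))  # (0, 1)  unused line held low
print(tgbuf(lioe=1, tgrw=0, lio=0, lio_prime=1))  # (1, 1)  read
print(tgbuf(lioe=1, tgrw=1, lio=1, lio_prime=0))  # (1, 1)  write
```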
【００４３】
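As described above, with the embodiment shown in FIGS. 4 and 5, transferred data passes through only one stage of switches SW, so high-speed operation is achieved. Because the number of stages of switches SW equals the number of transfer patterns, no layout area is wasted and high integration is possible. Furthermore, the buffer circuits TGBUFi of the unused logic circuit I/O lines LIOi are stopped and their potentials are kept from floating, so no power is wasted and excessive current is prevented from flowing in the gates of the logic circuit LC. Transfer patterns that leave some of the I/O lines unused can therefore be set freely.
【００４４】
In FIG. 4, the switch groups SWG also include unnecessary switches SW to which no control signal TGCi is input. The reason is as follows. As FIG. 4 shows, the switch groups SWG of the transfer circuit TG have a common shape regardless of the transfer pattern, except for the wiring and contacts needed to connect the switches SW to the control signals TGCi and to the I/O lines MIOi. If the common part, excluding that wiring and those contacts, is prepared as a layout library, chip layout design becomes easy. Moreover, if a transfer pattern ever has to be revised, having all the switches SW already present in the switch groups SWG means that the masks for the transistors that make up the switches SW need not be changed, which reduces the number of masks to be revised. A chip that combines memory and logic, as in the present invention, needs a different memory capacity and logic configuration for each application. Therefore, if several kinds of memory cores MR and the basic pattern of the switch groups SWG for the transfer circuit TG are prepared as a library, the masks of an LSI chip can be designed quickly by selecting the required items from the library, synthesizing the logic part with the basic logic library LL, and then performing placement and routing.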
【００４５】
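[Second specific example of the transfer circuit]
Of course, as the number of switches SW connected to an I/O line increases, delay due to the increased added capacitance may become a problem. When the number of stages of switch groups SWG is very large, unnecessary switches SW may therefore be omitted.
【００４６】
FIG. 6 shows a second specific example in which the transfer circuit TG of FIG. 2 is realized with seven stages of switch groups SWG, fewer than in FIG. 4. In FIG. 4, one switch group SWG corresponds to one transfer pattern. However, transfer patterns P0, P1, and P2 of FIG. 3 have in common that they connect memory core I/O lines MIO0 and MIO1 to logic circuit I/O lines LIO0 and LIO1. Likewise, transfer patterns P1 and P3 have in common that they connect memory core I/O lines MIO2 and MIO3 to logic circuit I/O lines LIO0 and LIO1. The embodiment of FIG. 6 exploits this by removing switch group SWG0 and modifying switch groups SWG1 and SWG2.
【００４７】
The lower part of FIG. 7 shows how the control signals TGCi, TGRW, and LIOEi are set to realize each transfer pattern (P0 to P7). Here "1" denotes a high potential and "0" a low potential. For the reason given earlier, transfer pattern P1 supports only writing, so control signal TGRW can only be set to "1" for it. The settings of the control signals TGCi for transfer patterns P0 and P1 differ from those in the embodiment of FIG. 4.
【００４８】
To realize transfer pattern P0, the two control signals TGC1 and TGC2 are driven high. Control signal TGC1 connects MIO2 to LIO2 and MIO3 to LIO3, and control signal TGC2 connects MIO0 to LIO0 and MIO1 to LIO1.
【００４９】
To realize transfer pattern P1, the two control signals TGC2 and TGC3 are driven high. Control signal TGC2 connects MIO0 to LIO0 and MIO1 to LIO1, and control signal TGC3 connects MIO2 to LIO0 and MIO3 to LIO1. In this way the embodiment reduces the number of stages of switch groups SWG. Although two switch groups SWG are activated to realize transfer patterns P0 and P1, the data still passes through only one stage of switches SW; this is the second feature. In this respect the circuit differs from conventional schemes such as the omega network, in which data passes through multiple stages. As described above, this embodiment achieves higher integration without sacrificing speed.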
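The stage sharing of this second example can be checked by taking the union of the connections made by the activated switch groups. The sketch below is illustrative only: `SWG`, `ACTIVE`, and `connections` are hypothetical names, the connections of SWG1 to SWG3 follow the two paragraphs above, and the assignments for P2 to P7 (one group each, SWG4 to SWG7 unchanged from the first example) are assumptions, since the table of FIG. 7 is not reproduced in the text.

```python
# Connections made by each switch group, as (lio, mio) index pairs.
SWG = {
    1: {(2, 2), (3, 3)},  # TGC1: MIO2-LIO2, MIO3-LIO3
    2: {(0, 0), (1, 1)},  # TGC2: MIO0-LIO0, MIO1-LIO1
    3: {(0, 2), (1, 3)},  # TGC3: MIO2-LIO0, MIO3-LIO1
    4: {(1, 0)},          # assumed unchanged from the first example
    5: {(1, 1)},
    6: {(1, 2)},
    7: {(1, 3)},
}

# Switch groups driven high for each pattern (P2..P7 are assumed).
ACTIVE = {
    "P0": (1, 2), "P1": (2, 3), "P2": (2,), "P3": (3,),
    "P4": (4,), "P5": (5,), "P6": (6,), "P7": (7,),
}


def connections(pattern):
    """Union of the connections of all switch groups active for a pattern."""
    result = set()
    for group in ACTIVE[pattern]:
        result |= SWG[group]
    return result


print(sorted(connections("P0")))  # [(0, 0), (1, 1), (2, 2), (3, 3)]
print(sorted(connections("P1")))  # [(0, 0), (0, 2), (1, 1), (1, 3)]
print(len(SWG))                   # 7 switch groups instead of 8
```

Every pair in the result comes from exactly one switch group, which reflects the point made above: two groups may be on at once, but any given data path still crosses a single switch.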
【００５０】
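[Third specific example of the transfer circuit]
FIG. 8 shows an example in which the number of stages of switch groups SWG is reduced further than in the embodiment of FIG. 6 by connecting switches SW in parallel. In this example, the switch groups SWG are reduced to three stages. The control signals are set in the same way as in the embodiment shown in FIG. 7. In the example of FIG. 8, each switch group SWG has switches SW on both sides of the I/O line LIOi'. FIG. 9 shows an example of the circuit configuration and layout of a switch SW. Each switch SW consists of an nMOS (n-channel MOS) transistor and a pMOS (p-channel MOS) transistor; the control signals TGCi and TGCj are input to the nMOS gates, and their inverted signals TGCiB and TGCjB are input to the pMOS gates.
【００５１】
The right side of FIG. 9 shows a layout example of the switch section. Only the nMOS transistors are shown. M2, M1, CONT1, CONT2, and L denote the second wiring layer, the first wiring layer, the contact between M1 and L, the contact between the first and second wiring layers, and the diffusion layer, respectively. In this embodiment, the diffusion layers L of the MOS transistors forming the two switches SW can be shared at the I/O line LIOi', so the switches fit within a narrow I/O line pitch. Although two switches SW are connected in parallel here, three or more switches SW may of course be connected in parallel to reduce the number of stages further when the I/O line pitch is wide.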
【００５２】
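[Power reduction using memory read/write circuit control signals]
In the embodiments shown in FIGS. 4, 6, and 8, the buffer circuits TGBUFi of the transfer circuit TG are controlled by enable signals, which cuts wasted power and prevents the gate potentials of the logic circuit LC from floating.
【００５３】
FIG. 10 shows an example that additionally controls the read/write circuit RWC of the memory core MR according to the transfer pattern. This cuts the power wasted during reads by driving unused memory core I/O lines MIOi, and prevents erroneous data from being written into the memory core MR from unused I/O lines MIOi during writes.
【００５４】
Of the transfer patterns of FIG. 2, P2 to P7 use only some of the memory core I/O lines MIOi. A signal that controls the read/write circuit RWC of the memory core MR is therefore provided, and the read/write circuits RWCi that serve unused memory core I/O lines MIOi are stopped. In FIG. 10, RWC0, RWC1, RWC2, and RWC3 are the circuits that make up the read/write circuit RWC; they are the read/write circuits RWCi for memory core I/O lines MIO0, MIO1, MIO2, and MIO3, respectively. MIOE0, MIOE1, MIOE2, and MIOE3 are the enable signals that control read/write circuits RWC0, RWC1, RWC2, and RWC3, respectively.
【００５５】
FIG. 11 shows how the enable signals MIOE0, MIOE1, MIOE2, and MIOE3 that control the read/write circuits RWCi, and the enable signals LIOEi of the buffer circuits TGBUFi on the logic circuit LC side, are set for each transfer pattern. An enable signal of "1" denotes a high potential and the active state; "0" denotes a low potential and the stopped state. When the enable signals MIOE0, MIOE1, MIOE2, and MIOE3 are generated by the logic circuit LC adjacent to the memory core MR, routing them through the transfer circuit TG as shown in FIG. 11 permits a dense layout.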
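The enable settings follow a simple rule: a line's enable is "1" exactly when the active transfer pattern uses that line. The sketch below derives the settings from the pattern definitions rather than copying the table of FIG. 11, which is not reproduced in the text; `PATTERNS` and `enables` are hypothetical names, and the pattern table repeats the connections described for FIG. 3.

```python
# Transfer patterns as sets of (lio, mio) index pairs, per the description of FIG. 3.
PATTERNS = {
    "P0": {(0, 0), (1, 1), (2, 2), (3, 3)},
    "P1": {(0, 0), (1, 1), (0, 2), (1, 3)},
    "P2": {(0, 0), (1, 1)},
    "P3": {(0, 2), (1, 3)},
    "P4": {(1, 0)},
    "P5": {(1, 1)},
    "P6": {(1, 2)},
    "P7": {(1, 3)},
}


def enables(pattern):
    """Return (LIOE0..3, MIOE0..3): 1 for every line the pattern uses."""
    pairs = PATTERNS[pattern]
    used_lio = {l for l, _ in pairs}
    used_mio = {m for _, m in pairs}
    lioe = [1 if i in used_lio else 0 for i in range(4)]
    mioe = [1 if i in used_mio else 0 for i in range(4)]
    return lioe, mioe


for name in sorted(PATTERNS):
    lioe, mioe = enables(name)
    print(name, "LIOE", lioe, "MIOE", mioe)
# e.g. P3 gives LIOE [1, 1, 0, 0] and MIOE [0, 0, 1, 1]
```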
【００５６】
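According to this embodiment, controlling the read/write circuit RWC of the memory core MR according to the transfer pattern cuts the power wasted during reads by driving unused I/O lines MIOi, and prevents erroneous data from being written into the memory core MR from unused I/O lines MIOi during writes.
【００５７】
[Sharing of memory read/write circuit and buffer control signals]
In the embodiment shown in FIG. 10, the enable signals MIOEi that control the read/write circuit RWC and the enable signals LIOEi of the buffer circuits TGBUFi on the logic circuit LC side are independent. As FIG. 11 shows, each must then be set differently for each transfer pattern. As the numbers of I/O lines and transfer patterns grow, setting MIOEi and LIOEi independently becomes cumbersome.
【００５８】
FIGS. 12 to 15 show an example in which a transfer circuit CTG is provided for the enable signals LIOEi of the buffer circuits TGBUFi on the logic circuit LC side, so that the enable signals MIOEi of the read/write circuit RWC are generated automatically from the enable signals LIOEi. FIG. 12 shows the data transfer patterns of FIG. 2. FIG. 13 shows the transfer patterns of the buffer circuit control signals LIOEi that correspond to the data transfer patterns of FIG. 12.
【００５９】
If the control signals LIOEi of the buffer circuits TGBUFi are transferred to the memory core MR side according to these transfer patterns, they can be used directly as the enable signals MIOEi of the read/write circuit RWC of the memory core MR.
【００６０】
Note that the control signals serving I/O lines that carry no data must also be transferred, in order to stop the corresponding read/write circuits RWC of the memory core MR. That is, even when the data uses only some of the I/O lines, as in transfer patterns P1 to P7, all of the control signals are transferred, as shown in the middle part of FIG. 8.
【００６１】
FIG. 14 shows a specific configuration of the transfer circuit CTG for the control signals LIOEi of the buffer circuits TGBUFi. Like the data transfer circuit TG, it consists of switch groups SWGEi. With this transfer circuit CTG, setting the control signals ECi according to the transfer pattern as shown in FIG. 15 realizes the transfer patterns shown in the middle part of FIG. 8.
【００６２】
Looking at the transfer patterns of FIG. 13, P0, P2, and P5 have the same shape. The switch groups SWGE for control signals EC0, EC2, and EC5 are therefore merged into one, which receives the logical OR of EC0, EC2, and EC5. This reduces the number of stages of switch groups SWGE and raises the integration density. The operating principle is the same as that of the data transfer circuit TG described so far, so its description is omitted.
【００６３】
According to this embodiment, providing the transfer circuit CTG for the control signals LIOEi of the buffer circuits TGBUFi in addition to the data transfer circuit TG removes the need to set the enable signals MIOEi of the read/write circuit RWC and the enable signals LIOEi of the buffer circuits TGBUFi independently. Setting the enable signals therefore does not become cumbersome even as the numbers of I/O lines and transfer patterns grow.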
【００６４】
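[Enable signals that can be set more finely than the data transfer unit]
In the embodiments so far, an enable signal MIOEi for the read/write circuit RWC and an enable signal LIOEi for the buffer were provided for each group of I/O lines transferred together (2 to the power (n-2) lines in FIG. 2). Setting the enable signals at a finer granularity, however, makes an even wider variety of transfer patterns possible.
【００６５】
FIGS. 16 and 17 show an example of enable signals that can be set more finely than the data transfer unit. In this embodiment, the unit of I/O lines transferred together under the transfer patterns of FIG. 3 is 4 bytes, and the enable signals are set in 1-byte units. That is, as shown in FIG. 16, the eight transfer patterns of FIG. 3 can be realized between the 4-byte groups of memory core I/O lines MIOi and logic circuit I/O lines LIOi, and four separate enable signals are provided for each 4-byte I/O line group. For example, I/O line LIO0 has the four enable signals LIOE-0, LIOE-1, LIOE-2, and LIOE-3.
【００６６】
FIG. 17 shows examples of the transfer patterns made possible by the example of FIG. 16 and the enable signal settings for them. The enable signals MIOEi-j may be generated by transferring the enable signals LIOEi-j, or may be set independently of them. FIG. 17(A) is the case where the basic transfer pattern determined by the transfer circuit TG is P0 and all enable signals are "1". This is the same as the patterns described so far. However, as in FIG. 17(B), with the basic transfer pattern still P0, setting the enable signals to "0" and "1" in 2-byte groups creates a different transfer pattern. FIG. 17(C) is the basic transfer pattern P3, and FIG. 17(D) is P3 with different enable signal settings.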
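The effect of byte-wise enables on a write can be sketched as a masked copy of one 4-byte transfer unit. This is a hedged illustration rather than the circuit of FIG. 16: the function name `masked_write` is hypothetical, and the particular mask used below (lower two bytes off, upper two on) is just one example of a 2-byte split.

```python
def masked_write(new_unit, old_unit, byte_enable):
    """Write one 4-byte transfer unit under byte-wise enable signals.

    new_unit    -- 4 bytes arriving from the logic circuit side
    old_unit    -- 4 bytes currently held on the memory side
    byte_enable -- four 0/1 flags; only bytes with flag 1 are overwritten
    """
    if not (len(new_unit) == len(old_unit) == len(byte_enable) == 4):
        raise ValueError("a transfer unit is assumed to be 4 bytes")
    return bytes(
        new if enable else old
        for new, old, enable in zip(new_unit, old_unit, byte_enable)
    )


old = bytes([0xAA, 0xBB, 0xCC, 0xDD])
new = bytes([0x11, 0x22, 0x33, 0x44])

print(masked_write(new, old, [1, 1, 1, 1]).hex())  # 11223344, all bytes enabled
print(masked_write(new, old, [0, 0, 1, 1]).hex())  # aabb3344, 2-byte split
```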
【００６７】
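Only one example is shown here for each of the two basic transfer patterns, but changing the enable signals yields many other transfer patterns that differ from the basic ones. In image applications and the like, where the attribute of the data differs from byte to byte, it may be necessary to transfer only particular bytes; this embodiment is useful in such cases.
【００６８】
FIG. 18 shows an example in which the present invention is applied to an LSI that performs rendering for three-dimensional computer graphics (hereinafter 3D-CG). The basic transfer patterns of the transfer circuit TG are those shown in FIG. 3, and the data of the memory core MR is assigned to the I/O lines as shown in the upper part of FIG. 18. Here RGB-A and RGB-B are the colors of pixels A and B, and Z-A and Z-B are the depth coordinates of pixels A and B; each is 16 bits long.
【００６９】
In 3D-CG, a special process called Z comparison is performed frequently. As is well known, when a new pixel is to be written to memory, its Z value is compared with that of the pixel already at the same position; the pixel is written if its Z value is smaller and is not written if it is larger. To perform this process for pixel A, as shown in FIG. 18, the transfer pattern is first set to P5 and the Z value Z-Aold already stored in the memory core MR is read. The logic circuit LC then compares it with the Z value Zin of the new pixel, and if Zin is smaller, the RGB and Z values of the new pixel are written. Switching the transfer pattern to P2 at this point allows the RGB and Z values to be written in parallel. For pixel B, transfer patterns P7 and P3 are used. When the RGB and Z values have different bit lengths, for example 3 bytes for RGB and 2 bytes for Z, the basic transfer patterns of the transfer circuit TG can use a 3-byte unit, and the Z value can be handled by providing byte-wise enable signals as in FIG. 16 and applying a mask.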
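The Z comparison sequence can be sketched as a read through one pattern followed by a conditional write through another. The sketch assumes a memory word laid out as RGB-A on MIO0, Z-A on MIO1, RGB-B on MIO2, and Z-B on MIO3, which is inferred from the patterns named above rather than read from FIG. 18; `SLOT` and `z_test_write` are hypothetical names.

```python
# Assumed position of each field in one memory word (MIO0..MIO3).
SLOT = {
    "A": {"rgb": 0, "z": 1},  # read Z via P5, write RGB+Z via P2
    "B": {"rgb": 2, "z": 3},  # read Z via P7, write RGB+Z via P3
}


def z_test_write(word, pixel, rgb_in, z_in):
    """Write (rgb_in, z_in) into `word` only if z_in is nearer than the stored Z.

    word  -- list of four values [RGB-A, Z-A, RGB-B, Z-B], modified in place
    pixel -- "A" or "B"
    Returns True when the pixel was written.
    """
    slot = SLOT[pixel]
    z_old = word[slot["z"]]       # step 1: read the old Z value only
    if z_in < z_old:              # step 2: compare in the logic circuit
        word[slot["rgb"]] = rgb_in  # step 3: RGB and Z written in parallel
        word[slot["z"]] = z_in
        return True
    return False


word = [0x1111, 500, 0x2222, 300]
print(z_test_write(word, "A", 0xAAAA, 400), word)  # True  [43690, 400, 8738, 300]
print(z_test_write(word, "B", 0xBBBB, 300), word)  # False [43690, 400, 8738, 300]
```

Reading only the Z field first keeps the RGB lines idle during the comparison, which is the saving the narrow patterns P5 and P7 are meant to provide.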
【００７０】
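3D-CG also uses alpha blending, a process that expresses transparency. It can be performed as shown in the middle part of FIG. 18. As is well known, when a new pixel is to be written to memory, alpha blending reads the pixel at the same position, adds it to the new pixel with weights given by the coefficient α, and writes the result. To perform this process for pixel A, as shown in FIG. 18, the transfer pattern is first set to P4 and RGB-Aold already stored in the memory core MR is read. The logic circuit LC then adds it to RGBin of the new pixel with weights given by α and writes the result. The transfer pattern can remain P4. For pixel B, transfer pattern P6 is used. If the logic circuit has only one arithmetic circuit for weighted addition, byte-wise enable signals allow alpha blending to be performed one byte at a time for R, G, and B.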
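The weighted addition can be illustrated per channel. The text only states that the old and new pixels are added with weights given by α; the specific form used below, α·in + (1 − α)·old with α between 0 and 1, is the common convention and is an assumption here, as are the function name `alpha_blend` and the representation of a pixel as a tuple of integer channels.

```python
def alpha_blend(rgb_in, rgb_old, alpha):
    """Blend a new pixel over the stored one, channel by channel.

    Assumed formula: out = alpha * in + (1 - alpha) * old, rounded to nearest.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha is assumed to lie in [0, 1]")
    return tuple(
        int(alpha * new + (1.0 - alpha) * old + 0.5)
        for new, old in zip(rgb_in, rgb_old)
    )


rgb_old = (100, 100, 100)  # read from memory (pattern P4 for pixel A)
rgb_in = (200, 100, 0)     # new pixel supplied to the logic circuit
print(alpha_blend(rgb_in, rgb_old, 0.5))  # (150, 100, 50)
print(alpha_blend(rgb_in, rgb_old, 1.0))  # (200, 100, 0), fully opaque
```

Because each channel is computed independently, a design with a single weighted adder could process R, G, and B one after another, as the paragraph above notes for byte-wise enables.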
【００７１】
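Screen clearing can also be done at high speed. This process initializes the data in the memory core MR. Normally the minimum or maximum value is written for RGB, and the maximum value, which corresponds to the greatest depth, is written for Z. In the embodiment of FIG. 18 there are I/O lines for two pixels, so transfer pattern P1 writes two pixels at once and the clear is fast. Although not shown in FIG. 18, transfer pattern P0 combined with the enable signals also allows the RGB values of two pixels to be read at once, enabling fast screen display. As described above, the transfer circuit TG of the present invention enables high-speed 3D-CG rendering.
【００７２】
[Example of assigning I/O lines byte by byte]
So far, to keep the explanation simple, the I/O lines MIOi and LIOi have been drawn grouped by transfer unit. In an actual layout, this arrangement means that data crosses many I/O lines, especially when the transfer unit is large, which can cause adverse effects such as wiring delay and induced noise.
【００７３】
FIG. 19 shows an example in which the assignment of I/O lines is changed to a byte-by-byte basis. It shows a method of interleaving the lines one byte at a time when the transfer unit is 4 bytes. Data then has to move a shorter distance. For example, transfer pattern P3 has to cross 8 bytes' worth of I/O lines with the arrangement of FIG. 3, but only 2 bytes' worth in the example of FIG. 19. The lines are interleaved by byte here, but they may also be interleaved by bit, which shortens the movement further. Of course, with this arrangement the inputs of the logic circuit LC must be designed to match, but adverse effects such as wiring delay and induced noise are avoided, and the area increase due to additional wiring is also reduced.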
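The 8-byte versus 2-byte figures can be reproduced with a short calculation. The sketch assumes that the memory-side and logic-side lines are laid out in the same order and face each other, so the crossing distance is the difference between the two byte positions; the position functions and the name `max_crossing` are hypothetical, and the exact ordering of FIG. 19 is not given in the text.

```python
UNITS = 4  # transfer units MIO0..3 / LIO0..3
BYTES = 4  # bytes per transfer unit


def pos_grouped(unit, byte):
    """Byte position when each unit's 4 bytes sit side by side (as in FIG. 3)."""
    return unit * BYTES + byte


def pos_interleaved(unit, byte):
    """Byte position when units are interleaved one byte at a time."""
    return byte * UNITS + unit


def max_crossing(pairs, pos):
    """Largest number of byte lanes any data byte has to cross."""
    return max(
        abs(pos(lio, b) - pos(mio, b))
        for lio, mio in pairs
        for b in range(BYTES)
    )


P3 = [(0, 2), (1, 3)]  # LIO0-MIO2 and LIO1-MIO3
print(max_crossing(P3, pos_grouped))      # 8
print(max_crossing(P3, pos_interleaved))  # 2
```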
【００７４】
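[Effects of the Invention]
According to the present invention, a semiconductor device that integrates memory and logic circuits can be designed in a short time, and a small-area transfer circuit provides high-speed data transfer between a memory core with a plurality of I/O lines and a logic circuit while changing the transfer pattern in real time.
[Brief Description of the Drawings]
[FIG. 1] Concept of a system LSI with a built-in multi-I/O memory core according to the present invention.
[FIG. 2] Example of an LSI with a built-in multi-I/O memory core according to the present invention.
[FIG. 3] Transfer patterns of the transfer circuit of FIG. 2.
[FIG. 4] First specific example of a transfer circuit that realizes the transfer patterns of FIG. 3.
[FIG. 5] Specific example of the buffer circuit TGBUFi of the transfer circuit.
[FIG. 6] Second specific example of a transfer circuit that realizes the transfer patterns of FIG. 3.
[FIG. 7] Control signal settings for the transfer circuit of FIG. 6.
[FIG. 8] Third specific example of a transfer circuit that realizes the transfer patterns of FIG. 3.
[FIG. 9] Circuit configuration and layout example of the parallel switch section of the transfer circuit of FIG. 8.
[FIG. 10] Example of power reduction using memory read/write control signals.
[FIG. 11] Control signal settings for the transfer circuit of FIG. 10.
[FIG. 12] Data transfer patterns.
[FIG. 13] Transfer patterns of the buffer control signals.
[FIG. 14] Example of a control signal transfer circuit.
[FIG. 15] Control signal settings for the control signal transfer circuit of FIG. 14.
[FIG. 16] Example of enable signals that can be set more finely than the data transfer unit.
[FIG. 17] Examples of transfer patterns made possible by the example of FIG. 16.
[FIG. 18] Example of application to three-dimensional computer graphics.
[FIG. 19] Example in which the I/O line addresses are changed byte by byte.
[Description of Symbols]
MR Memory core
MC Memory cell
DL Data line
WL Word line
PER Peripheral circuit
RWC Read/write circuit
LC Logic circuit
TG Transfer circuit
SWG Switch group
TGBUFi Buffer group
MIOi, TGCi, LIOi Control signals
DB Database storage device for core circuits and the logic library
LL Logic library
WS Design workstation
LSI-A, LSI-B Semiconductor chips.
[0001]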
TECHNICAL FIELD OF THE INVENTION
The present invention relates to a semiconductor device with integrated memory. In particular, for semiconductor devices in which a memory having a plurality of I/O lines and a logic circuit are integrated on the same semiconductor chip, it provides a method for designing LSI chips for various purposes in a short time, together with the resulting product family, and a highly integrated data transfer circuit scheme that can change the data transfer pattern between the memory and the logic circuit at high speed.
[0002]
[Prior art]
One attempt to integrate on a single semiconductor chip a parallel computing system that interconnects many arithmetic units, processors, and memories is described in U.S. Pat. No. 5,371,896. In this conventional example, a plurality of memories and a plurality of arithmetic circuits are integrated on the same semiconductor chip and are connected by a network of crossbar switches. Its distinguishing feature is that it can switch between SIMD (Single Instruction Multi Data Stream) operation and MIMD (Multi Instruction Multi Data Stream) operation as required. During SIMD operation, one of the memories is used as an instruction memory and the remaining memories are used as data memories. Instructions from the instruction memory are supplied in common to the arithmetic circuits. During MIMD operation, some of the memories used as data memories during SIMD operation are used as instruction memories, so that each arithmetic circuit receives instructions from a separate instruction memory. The data transfer paths between the individual memories and the arithmetic circuits can be switched in various ways by the crossbar network.
[0003]
[Problems to be solved by the invention]
In addition to the above, various semiconductor devices with integrated memory have been devised. Recently, devices that integrate a relatively high-density memory, such as a DRAM (Dynamic Random Access Memory), and a logic circuit on the same semiconductor chip have attracted particular attention. Because a semiconductor manufacturer generally begins building such an LSI only after receiving a user's request, the time to complete the chip (Time to Customers) must be shortened. On the other hand, the required memory capacity and the types of arithmetic circuits differ from application to application. Meeting these conflicting demands requires reforming the design method itself. With conventional high-density memories, DRAM in particular, the specifications are standardized, so the existing design method could not meet these demands.
[0004]
Further, when a high-density memory such as a DRAM and a logic circuit are integrated on the same semiconductor chip, merely integrating them yields little advantage over separate chips. Considering cost and required performance, a large-capacity memory and large-scale logic and arithmetic circuits must be integrated on a semiconductor chip of about 1 cm square, with several hundred or more connecting lines between them and a data transfer rate of 1 GByte/s or more. The coupling circuit that connects the memory and the logic circuit must therefore be both fast and highly integrated. If a crossbar switch is used as in the conventional example above, however, the number of switches becomes enormous as the number of connecting lines grows, so the hardware grows and the delay increases as well. When the data transfer paths between several independent memories and several arithmetic circuits are switched, as in the prior art above, the number of memories and arithmetic circuits is generally small, so the schemes used in conventional parallel computers can be implemented on a single chip as they are. When the correspondence between several hundred or more memory I/O lines and the I/O lines of logic and arithmetic circuits must be switched, however, the requirements on integration density and operating speed are severe, and it is difficult to use the conventional schemes as they are.
[0005]
[Means for Solving the Problems]
To solve the first problem above, the present invention prepares in advance, and stores in a database, layout patterns of memory cores that have many I/O lines and differ in capacity, and of coupling-circuit modules designed to match the pitch of the memory core I/O lines. A logic library for synthesizing logic circuits is also created and stored in the database. The database stores the data needed for design, such as these layout patterns, specifications, and characteristics.
[0006]
The coupling-circuit modules consist of switch groups and buffer groups, which can be combined to form a coupling circuit. A switch group can reorder the data input to it. By connecting several switch groups and activating the one that corresponds to the desired transfer pattern, the transfer pattern can be switched at high speed. These modules are built to match the pitch of the memory core I/O lines, so they can be connected directly to those I/O lines without changing the layout pattern.
[0007]
As described above, according to the present invention, the layout patterns of the memory cores, the coupling-circuit modules, and the logic library are registered in the database in advance, and the wiring pitches of the memory cores and the coupling-circuit modules are matched, so they can be connected and used as they are. The design can therefore be completed in a short time once the user supplies the LSI chip specification. That is, a memory core of the required capacity and the modules for building a transfer circuit that meets the specification are taken from the database and combined, and the logic part is obtained by synthesizing the desired logic circuit from the logic library with a logic-synthesis CAD tool. The wiring between them can be done quickly with a place-and-route CAD tool. A chip that integrates memory and logic circuits can thus be produced in a short time.
[0008]
Further, in the coupling circuit described above, data transferred between the memory and the logic circuit passes only through the activated switch group, so high-speed data transfer is achieved. Moreover, because the number of stages is increased or decreased to match the number of transfer patterns, no area is wasted when the number of transfer patterns is small.
[0009]
BEST MODE FOR CARRYING OUT THE INVENTION
[Design method of system LSI using memory core]
FIG. 1 shows the concept of a system LSI with a built-in memory core according to the present invention. The LSI design method according to the present invention is described with reference to FIG. 1.
[0010]
The left side of FIG. 1 shows a database storage device DB in which the layout patterns and characteristics of core circuits and a logic library are registered. It stores in advance the data needed for design, such as layout patterns, specifications, and characteristics, for a plurality of memory cores MR that have many I/O lines and differ in capacity, a group of modules for the transfer circuit (coupling circuit) TG designed to match the pitch of the memory core I/O lines, and a logic library LL of basic gates for synthesizing logic circuits.
[0011]
Here, the modules for the transfer circuit TG consist of switch groups SWG and buffer groups TGBUFi, which can be combined to synthesize a transfer circuit TG. As described in detail later, transfer circuits TG with various transfer patterns can be synthesized by connecting several switch groups. Because these modules are built to match the pitch of the I/O lines of the memory core MR, they can be connected directly to those I/O lines without changing the layout pattern.
[0012]
When the specification of an LSI chip is given, the design proceeds while the necessary data is transferred from the database DB to the design workstation WS. Because the wiring pitches of the memory core MR and the transfer circuit TG modules are matched, they can be connected and used as they are. That is, a memory core MR of the required capacity and the modules for building a transfer circuit TG that meets the specification are simply taken from the database DB and combined. For the logic part, the desired logic circuit LC is readily synthesized from the logic library LL with a logic-synthesis CAD tool. Finally, these blocks are placed according to the chip floor plan and the wiring between them is done with a place-and-route CAD tool, which completes the chip layout data. In this way, a family of system LSI products with a built-in memory core MR can be designed in a short time.
[0013]
Here, an example in which the logic is synthesized using the logic library LL has been described. However, in some cases, the logic may be synthesized using a part of a chip as a gate array. In this case, there is an advantage that a chip having a common memory core MR and different logic can be easily manufactured.
[0014]
The lower right of FIG. 1 shows two examples of chips designed in this way. Semiconductor chip LSI-A has four blocks A, B, C, and D, each consisting of a memory core MR and a logic circuit LC connected by a transfer circuit TG, with a control circuit CC that controls the entire chip placed at the center. Semiconductor chip LSI-B has two such blocks, A and B, with a control circuit CC that controls the entire chip placed at the center.
[0015]
In the present invention, of course, a chip using one memory core MR can be realized, but a chip in which a plurality of blocks are integrated as in this example can be easily designed. In that case, the memory core MR and the logic circuit LC of each block may be different or may have the same configuration. The former is suitable for performing different processes in parallel on the same chip, and the latter is suitable for performing the same processes in parallel. In particular, the latter is suitable for processing that can perform parallel operations, such as graphics, natural image processing, and neural networks.
[0016]
In both of the semiconductor chips LSI-A and LSI-B, since the logic circuit LC for transmitting and receiving data to and from the memory core MR is close to the memory core MR, the effect of wiring delay is small and high-speed data transfer can be realized. Further, since the distance from the control circuit CC to each block is equal in the semiconductor chip LSI-B and the difference is small in the semiconductor chip LSI-A, there is an advantage that the skew of the control signal can be reduced.
[0017]
In semiconductor chip LSI-B, the logic circuit LC is placed next to the control circuit CC. If the control-signal wiring of the memory core MR must be shortened to reduce wiring delay, however, each block may be flipped with respect to the control circuit CC so that the memory core MR is next to it.
In semiconductor chip LSI-A, the fact that the distance from the control circuit CC differs between blocks A and B, and between blocks D and C, may be a problem. In that case, an arrangement like that of semiconductor chip LSI-B can be used, with two blocks placed on each side of the control circuit CC.
[0018]
If the blocks are wide, however, that arrangement may make the difference between the short and long sides of the chip too large. In such a case, the arrangement of semiconductor chip LSI-A shown in FIG. 1 is kept, the control-signal input terminals are concentrated on one side of each block, and blocks A and B, and blocks D and C, are flipped relative to each other so that the control-signal input terminals lie on the sides where the blocks adjoin. This reduces the skew of the control signals. The transfer circuit TG shown in FIG. 1 is described in detail below.
[0019]
[LSI with multiple I / O memory cores]
FIG. 2 shows an example of a multi-I / O built-in memory LSI according to the present invention. The semiconductor chip SIC shown in FIG. 2 includes a memory core MR having a plurality of I / O lines MIOi, a logic circuit LC having a plurality of I / O lines LIOi, and data transfer between the memory core MR and the logic circuit LC. The transfer circuit TG for controlling a transfer pattern and the like are integrated on a single semiconductor substrate made of single-crystal silicon or the like.
[0020]
The contents of the logic circuit LC may be synthesized according to the purpose using the logic library LL. Here, an example suitable for an image or graphics is shown. Operation unit group ARW that performs operations on the pixels stored in the memory core MR, display buffer DBR for reading the contents of the memory core MR at a constant speed to display them on the screen, and control them and the memory core MR And a control circuit LCC.
[0021]
The memory core MR includes a plurality of data lines DL, a plurality of word lines WL, and memory cells MC formed at their intersections. As the memory cell MC, a 1-transistor 1-capacitor DRAM cell, a 4- or 6-transistor SRAM (Static Random Access Memory) cell, a one-transistor nonvolatile flash memory cell, or the like can be used. In the following, a so-called RAM type that can be both written and read is assumed, but the present invention is also effective for a read-only, so-called ROM type. Writing and reading of data to and from the memory core MR are controlled by a read/write circuit RWC, and data can be read and written in parallel through a plurality of I/O lines to and from a plurality of memory cells MC selected by a peripheral circuit PER. The peripheral circuit PER is connected to buses from the logic circuit LC such as a memory core control signal MRC, a control signal CTL, and an address signal DATA. The memory core MR inputs and outputs control signals, address signals, and I/O signals in synchronization with a clock signal that is the reference signal of the logic circuit LC.
[0022]
The logic circuit LC performs an operation on data read from the memory core MR through the transfer circuit TG and data from outside the semiconductor chip SIC. The result is again written to the memory core MR through the transfer circuit TG or output outside the semiconductor chip SIC.
[0023]
The transfer circuit TG is composed of switch groups SWG connected in multiple stages, and can switch the connection relationship (hereinafter referred to as a transfer pattern) between the plurality of I/O lines MIOi of the memory core MR and the plurality of I/O lines LIOi of the logic circuit LC by means of a control signal TGCi.
[0024]
FIG. 3 shows, as an example, a case where eight transfer patterns P0 to P7 are realized. In this example, for 2^n I/O lines MIOi and LIOi, the connections between MIO0, 1, 2, 3 and LIO0, 1, 2, 3 are switched in units of one quarter of the lines (2^(n-2) lines each). Needless to say, the transfer units need not be powers of two, and the present invention is applicable even if the transfer units are not all equal. The direction of each arrow indicates the flow of data: the transfer pattern P1 is used only for writing to the memory, whereas the remaining patterns (P0, P2 to P7) are used for both reading and writing.
[0025]
The transfer pattern P0 transfers data without rearrangement. The transfer pattern P1 transmits the data input to (LIO0,1) to both (MIO0,1) and (MIO2,3) and writes it to the memory. In this pattern, unlike the others, different I/O lines of the memory are electrically connected to each other. Different data could therefore collide at the time of reading, so the pattern is used only for writing. It is effective for initializing the contents of the memory at high speed, as described later.
[0026]
The transfer patterns P2 and P3 form transfer paths between (LIO0,1) and (MIO0,1) and between (LIO0,1) and (MIO2,3), respectively. The transfer patterns P4 to P7 form transfer paths between (LIO1) and (MIO0), (LIO1) and (MIO1), (LIO1) and (MIO2), and (LIO1) and (MIO3), respectively.
[0027]
The eight transfer patterns (P0 to P7) can be switched freely by the control signal TGCi. Each transfer pattern can be realized by turning on one switch group SWG in the transfer circuit TG. For example, the transfer pattern P0 can be realized by turning on the switch group SWG#0 shown in the figure. The specific configuration of the transfer circuit TG will be described later.
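The eight patterns described above can be written down as a small connection table. The following is a minimal sketch, assuming each transfer unit is modeled as a single list element; the names `PATTERNS`, `WRITE_ONLY`, `read`, and `write` are illustrative and do not appear in the specification.

```python
# Transfer patterns of FIG. 3 as (LIO index, MIO index) pairs, taken from the text.
PATTERNS = {
    "P0": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "P1": [(0, 0), (1, 1), (0, 2), (1, 3)],  # one LIO unit drives two MIO units
    "P2": [(0, 0), (1, 1)],
    "P3": [(0, 2), (1, 3)],
    "P4": [(1, 0)],
    "P5": [(1, 1)],
    "P6": [(1, 2)],
    "P7": [(1, 3)],
}
WRITE_ONLY = {"P1"}  # reading would make two MIO units collide on one LIO unit


def read(pattern, mio):
    """Memory -> logic. Returns the four LIO units; unconnected units are None."""
    if pattern in WRITE_ONLY:
        raise ValueError(f"{pattern} is usable only for writing")
    lio = [None] * 4
    for l, m in PATTERNS[pattern]:
        lio[l] = mio[m]
    return lio


def write(pattern, lio, mio):
    """Logic -> memory. Returns a new list of MIO units after the write."""
    out = list(mio)
    for l, m in PATTERNS[pattern]:
        out[m] = lio[l]
    return out
```

For instance, `read("P3", ...)` presents the upper two memory units on LIO0 and LIO1, and `write("P1", ...)` copies LIO0 and LIO1 into both halves of the memory I/O lines in one cycle.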
[0028]
In this embodiment, since the memory core MR, the transfer circuit TG, and the logic circuit LC are formed on the same semiconductor chip, tens to hundreds of I/O lines can be wired easily.
[0029]
The operation of the LSI with a built-in multiple-I/O memory core shown in FIG. 2 is described below.
[0030]
First, the read operation will be described. When one word line WL is selected by the peripheral circuit PER in the memory core MR, data is read from the group of memory cells MC on that word line WL to the data lines DL, and then read out in parallel to the plurality of I/O lines MIOi through the read/write circuit RWC. When one of the switch groups SWG in the transfer circuit TG is activated by the control signal TGCi, the transfer pattern between the plurality of I/O lines MIOi of the memory core MR and the plurality of I/O lines LIOi of the logic circuit LC is determined, and data is transferred from the I/O lines MIOi to the I/O lines LIOi and input to the logic circuit LC.
[0031]
The write operation is the same except that the data flows in the opposite direction. That is, data output from the logic circuit LC to the plurality of I/O lines LIOi is transferred from the I/O lines LIOi to the I/O lines MIOi in accordance with the transfer pattern determined by the control signal TGCi, transmitted through the read/write circuit RWC to the data lines DL, and then written in parallel to the memory cells MC on the selected word line WL.
[0032]
When reading or writing is performed continuously or alternately, the operation can be performed by switching the selected word line WL or transfer pattern for each cycle. Therefore, reading and writing can be performed in parallel on the memory cells MC corresponding to different addresses in each cycle according to the request of the logic circuit LC.
[0033]
According to the present embodiment, data is exchanged between the memory core MR and the logic circuit LC through a single stage of the switch groups SWG, so that extremely high-speed data transfer can be realized. Further, since the memory core MR and the logic circuit LC are arranged so that the I/O lines MIOi and LIOi run in the same direction, the transfer circuit TG can be placed between the memory core MR and the logic circuit LC. Since the number of stages of the switch groups SWG in the transfer circuit TG is determined by the transfer patterns, the size of the transfer circuit in the data line direction (horizontal direction in FIG. 2) can be reduced when the number of transfer patterns is small. Therefore, when the transfer circuit TG and the logic circuit LC are laid out so as to fit within the dimension of the memory core MR in the word line WL direction (vertical direction in FIG. 2), as shown in the figure, the layout area can be reduced.
[0034]
Note that the peripheral circuit PER may include only the X decoder for selecting the word line WL as described above, or may also include a Y decoder for selecting a part of the data lines and connecting them to the I/O lines. According to this embodiment, since a large number of I/O lines can be provided, the Y decoder can usually be a simple one that selects, for example, 128 out of 1024 data lines.
[0035]
[First Specific Example of Transfer Circuit]
Next, a specific circuit example of the transfer circuit TG will be described with reference to FIG. 4. FIG. 4 shows a circuit example of the transfer circuit TG for realizing the transfer patterns shown in FIG. 3.
[0036]
In FIG. 4, MIO0, MIO1, MIO2, and MIO3 are I/O lines of the memory core MR, and LIO0, LIO1, LIO2, and LIO3 are I/O lines of the logic circuit LC. SWG0, SWG1, ..., SWG7 are switch groups, and TGBUF0, TGBUF1, TGBUF2, and TGBUF3 are buffer circuits.
[0037]
TGC0, TGC1, ..., TGC7 are signals for turning the switch groups SWG0, SWG1, ..., SWG7 on and off, respectively. Although it depends on the type of transistor used for the switches SW in a switch group SWG, it is assumed here that a switch SW is turned on when the control signal TGCi applied to it is at a high potential and turned off when it is at a low potential. For example, when the control signal TGC3 is set to a high potential and the other control signals are set to a low potential, the two switches SW indicated by arrows in the switch group SWG3 are turned on. As a result, the transfer pattern P3 is formed, with a transfer path between the I/O lines MIO2 and MIO3 of the memory core MR and the I/O lines LIO0 and LIO1 of the logic circuit LC. Other transfer patterns can be realized in the same way by setting one of the control signals TGCi to a high potential.
[0038]
The buffer circuits TGBUF0, TGBUF1, TGBUF2, and TGBUF3 are added to prevent signals from being delayed by the parasitic capacitance of the I/O lines. An example of the configuration of this circuit is shown in FIG. 5. The operation of the buffer circuit TGBUFi will be described with reference to FIG. 5.
[0039]
The buffer circuit TGBUFi is a bidirectional buffer that switches the direction of data flow according to the read or write operation of the memory core MR, and it also has the function of latching the potential of any I/O line LIOi of the logic circuit LC that is not used in the transfer pattern being formed.
[0040]
In the example shown in FIG. 3, every transfer pattern except P0 leaves some of the I/O lines LIOi of the logic circuit LC unused. If the potential of an unused I/O line is left undetermined, that is, in a so-called floating state, it may drift to an intermediate potential due to charge leakage. If such a line is input to the gate of a CMOS (Complementary Metal Oxide Semiconductor) transistor in the logic circuit LC, an excessive current flows steadily. To avoid this, the potential of each unused I/O line LIOi of the logic circuit LC is latched.
[0041]
When the enable signal LIOEi of the buffer circuit TGBUFi is set to a low potential, both clocked inverters RINV and WINV are turned off, as is apparent from the logic shown in FIG. 5. In addition, the MOS transistor Q1, whose gate receives the signal LIOPRi, turns on, and the I/O line LIOi is latched at a low level. For an I/O line LIOi that is to be used, the enable signal LIOEi is set to a high potential. The data direction is switched as follows.
[0042]
When the memory core MR performs a read operation, the signal TGRW is set to a low potential. Then, if the enable signal LIOEi is at a high potential, only the read clocked inverter RINV is activated, and data is transferred from the I/O line LIOi' to the I/O line LIOi. When the memory core MR performs a write operation, on the other hand, the signal TGRW is set to a high potential. Then, if the enable signal LIOEi is at a high potential, only the write clocked inverter WINV is activated, and data is transferred from the I/O line LIOi to the I/O line LIOi' and then through the switch SW to the I/O line MIOi of the memory core MR.
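The behavior of the buffer circuit described in the preceding paragraphs can be summarized as a small truth-table model. This is only a behavioral sketch: signal inversion inside the clocked inverters is ignored, the latch is reduced to "drive LIOi low", and the function name `tgbuf` is hypothetical.

```python
def tgbuf(lioe, tgrw, lio=None, lio_prime=None):
    """Behavioral model of one buffer TGBUFi; returns (LIOi, LIOi').

    lioe: enable signal LIOEi (1 = line in use, 0 = line unused)
    tgrw: 0 for a memory read, 1 for a memory write
    lio, lio_prime: values currently driven onto LIOi and LIOi'
    """
    if not lioe:
        # Buffer stopped: neither direction is driven and the logic-side
        # line LIOi is held at a low level so that it cannot float.
        return 0, lio_prime
    if tgrw == 0:
        # Read: only the read path is active, LIOi' -> LIOi.
        return lio_prime, lio_prime
    # Write: only the write path is active, LIOi -> LIOi'.
    return lio, lio
```

With `lioe=0` the logic-side line reads as 0 regardless of the other inputs, which corresponds to the floating-gate protection described above.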
[0043]
As described above, with the embodiment shown in FIGS. 4 and 5, high-speed operation can be realized because the data being transferred passes through only one stage of switch SW. Further, since the number of stages of the switches SW equals the number of transfer patterns, no layout area is wasted and high integration is possible. Furthermore, for the I/O lines LIOi of the logic circuit LC that are not used, the buffer circuit TGBUFi is stopped and the potential is kept from floating, so that excessive current is prevented from flowing through the gates of the logic circuit LC. Therefore, transfer patterns that leave some of the I/O lines unused can be set freely.
[0044]
In FIG. 4, the switch groups SWG also contain switches SW to which no control signal TGCi is input and which are therefore unnecessary. This is for the following reason. As can be seen from FIG. 4, the switch groups SWG of the transfer circuit TG have a common shape irrespective of the transfer pattern, except for the wiring and contacts required to connect the switches SW to the control signal TGCi and to the I/O lines MIOi. Therefore, if the common parts, excluding the wiring and contacts for the connections to TGCi and MIOi, are prepared as a layout library, the layout design of the chip becomes easy. Further, when a transfer pattern has to be corrected, forming all the switches SW in each switch group SWG makes it unnecessary to correct the masks of the transistor portions constituting the switches SW, so that the number of masks to be corrected can be reduced. In a chip combining memory and logic as in the present invention, the memory capacity and the logic configuration must be changed according to the application. Therefore, if several types of basic patterns for the memory core MR and for the switch groups SWG of the transfer circuit TG are prepared as a library, the necessary ones are selected from them, the logic part is synthesized using the logic library LL, and these are combined, placed, and routed, the masks of the LSI chip can be designed quickly.
[0045]
[Second Specific Example of Transfer Circuit]
Of course, when the number of switches SW connected to the I / O line increases, a delay due to an increase in additional capacitance may become a problem. Therefore, when the number of stages of the switch group SWG is very large, the unnecessary switch SW may be omitted.
[0046]
FIG. 6 shows a second specific example, in which the transfer circuit TG of FIG. 2 is implemented with seven stages of switch groups SWG, fewer than in FIG. 4. In FIG. 4, one switch group SWG corresponds to one transfer pattern. However, the transfer patterns P0, P1, and P2 in FIG. 3 have in common that they connect the I/O lines MIO0 and MIO1 of the memory core MR to the I/O lines LIO0 and LIO1 of the logic circuit LC. Likewise, the transfer patterns P1 and P3 have in common that they connect the I/O lines MIO2 and MIO3 of the memory core MR to the I/O lines LIO0 and LIO1 of the logic circuit LC. Focusing on this, the embodiment of FIG. 6 deletes the switch group SWG0 and modifies the switch groups SWG1 and SWG2.
[0047]
The lower part of FIG. 7 shows how to set the control signals TGCi, TGRW, and LIOEi to realize each transfer pattern (P0 to P7). Here, "1" indicates a high potential and "0" indicates a low potential. Since the transfer pattern P1 can be used only for writing, for the reason described above, the control signal TGRW can be set only to "1" for it. The setting of the control signal TGCi for realizing the transfer patterns P0 and P1 differs from that of the first specific example.
[0048]
To realize the transfer pattern P0, the two control signals TGC1 and TGC2 are set to a high potential. The control signal TGC1 connects the I/O lines MIO2 and LIO2 and the I/O lines MIO3 and LIO3, and the control signal TGC2 connects the I/O lines MIO0 and LIO0 and the I/O lines MIO1 and LIO1.
[0049]
To realize the transfer pattern P1, the two control signals TGC2 and TGC3 are set to a high potential. The control signal TGC2 connects the I/O lines MIO0 and LIO0 and the I/O lines MIO1 and LIO1, and the control signal TGC3 connects the I/O lines MIO2 and LIO0 and the I/O lines MIO3 and LIO1. In this way, the number of stages of the switch groups SWG can be reduced in this embodiment. Note that although the transfer patterns P0 and P1 are realized by activating two switch groups SWG, the data still passes through only one stage of switch SW. This differs from arrangements in which data passes through a plurality of stages, such as a conventional omega network. As described above, according to the present embodiment, higher integration can be achieved without impairing speed.
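The sharing of switch groups described above can be checked mechanically. The sketch below assumes that the switch groups SWG4 to SWG7 are unchanged from the first specific example (the text describes only SWG1 to SWG3 for this example), and the names `SWG`, `TGC_FOR`, `connections`, and `single_stage` are illustrative.

```python
# Target patterns of FIG. 3 as (LIO index, MIO index) pairs.
PATTERNS = {
    "P0": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "P1": [(0, 0), (1, 1), (0, 2), (1, 3)],
    "P2": [(0, 0), (1, 1)],
    "P3": [(0, 2), (1, 3)],
    "P4": [(1, 0)], "P5": [(1, 1)], "P6": [(1, 2)], "P7": [(1, 3)],
}

# Connections closed by each control signal TGCi in the seven-stage circuit.
SWG = {
    1: [(2, 2), (3, 3)],
    2: [(0, 0), (1, 1)],
    3: [(0, 2), (1, 3)],
    4: [(1, 0)], 5: [(1, 1)], 6: [(1, 2)], 7: [(1, 3)],  # assumed unchanged
}

# Control signals driven high for each pattern; P0 and P1 use two groups.
TGC_FOR = {
    "P0": {1, 2}, "P1": {2, 3}, "P2": {2}, "P3": {3},
    "P4": {4}, "P5": {5}, "P6": {6}, "P7": {7},
}


def connections(pattern):
    """All LIO-MIO connections closed when the pattern's TGCi are high."""
    pairs = []
    for g in sorted(TGC_FOR[pattern]):
        pairs.extend(SWG[g])
    return sorted(pairs)


def single_stage(pattern):
    """True if every connection is closed by exactly one switch."""
    pairs = connections(pattern)
    return len(pairs) == len(set(pairs))
```

Comparing `connections(p)` with the sorted entries of `PATTERNS[p]` for all eight patterns confirms that seven groups suffice under these assumptions.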
[0050]
[Third Specific Example of Transfer Circuit]
FIG. 8 shows an example in which the number of stages of the switch groups SWG is reduced further than in the embodiment of FIG. 6 by connecting switches SW in parallel. In this example, the switch groups SWG can be reduced to three stages. The control signals are set in the same way as in the second specific example. In the example shown in FIG. 8, switches SW are arranged on both sides of the I/O line LIOi' in each switch group SWG. FIG. 9 shows an example of the circuit configuration and layout of the switches SW. Each switch SW is composed of an nMOS (n-channel MOS) and a pMOS (p-channel MOS); the control signals TGCi and TGCj are input to the gates of the nMOS transistors, and the inverted signals TGCiB and TGCjB are input to the gates of the pMOS transistors.
[0051]
The right side of FIG. 9 shows a layout example of the switch section. Here, only the nMOS is shown. M2, M1, CONT1, CONT2, and L denote a second wiring layer, a first wiring layer, a contact between M1 and L, a contact between the first wiring layer and the second wiring layer, and a diffusion layer, respectively. In the present embodiment, the MOS transistors constituting the two switches SW can share the diffusion layer L at the I/O line LIOi', so that the pitch of the I/O lines can be reduced. Here, two switches SW are connected in parallel, but if the pitch of the I/O lines is wide, three or more switches SW may of course be connected in parallel to reduce the number of stages further.
[0052]
[Low power consumption by memory read/write circuit control signal]
In the embodiments shown in FIGS. 4, 6, and 8, the buffer circuit TGBUFi of the transfer circuit TG is controlled by an enable signal, which reduces wasteful power consumption and prevents the gate potentials of the logic circuit LC from floating.
[0053]
FIG. 10 shows an example in which, in addition, the read/write circuit RWC of the memory core MR is controlled according to the transfer pattern. This prevents power from being wasted at read time by driving unused I/O lines MIOi of the memory core MR, and prevents erroneous data from being written to the memory core MR from unused I/O lines MIOi at write time.
[0054]
The transfer patterns P2 to P7 in FIG. 2 use only a part of the I/O lines MIOi of the memory core MR. Therefore, signals for controlling the write/read circuit RWC of the memory core MR are provided, and the write/read circuit RWCi for each unused I/O line MIOi of the memory core MR is stopped. In FIG. 10, RWC0, RWC1, RWC2, and RWC3 are the circuits constituting the read/write circuit RWC, and are the write/read circuits RWCi for the I/O lines MIO0, MIO1, MIO2, and MIO3 of the memory core MR, respectively. MIOE0, MIOE1, MIOE2, and MIOE3 are enable signals for controlling the write/read circuits RWC0, RWC1, RWC2, and RWC3, respectively.
[0055]
FIG. 11 shows how the enable signals MIOE0, MIOE1, MIOE2, and MIOE3 for controlling the write/read circuits RWCi and the enable signals LIOEi for the buffer circuits TGBUFi of the logic circuit LC are set in each transfer pattern. Here, “1” indicates the active state at a high potential, and “0” indicates the stopped state at a low potential. When the enable signals MIOE0, MIOE1, MIOE2, and MIOE3 are generated by the logic circuit LC adjacent to the memory core MR, a denser layout can be obtained by routing them through the transfer circuit TG as shown in the figure.
[0056]
According to the present embodiment, by controlling the read/write circuit RWC of the memory core MR according to the transfer pattern, the power wasted at read time by driving unused I/O lines MIOi is reduced, and erroneous data is prevented from being written to the memory core MR from unused I/O lines MIOi at write time.
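One way to see which enable signals are active in each pattern is to derive them from the connection pairs. The values below are computed from the pattern definitions in the text, not copied from FIG. 11, and the function name `enables` is illustrative.

```python
# Transfer patterns of FIG. 3 as (LIO index, MIO index) pairs.
PATTERNS = {
    "P0": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "P1": [(0, 0), (1, 1), (0, 2), (1, 3)],
    "P2": [(0, 0), (1, 1)],
    "P3": [(0, 2), (1, 3)],
    "P4": [(1, 0)], "P5": [(1, 1)], "P6": [(1, 2)], "P7": [(1, 3)],
}


def enables(pattern):
    """Return (MIOE, LIOE): 1 for every line the pattern uses, else 0."""
    pairs = PATTERNS[pattern]
    lioe = [int(any(l == i for l, _ in pairs)) for i in range(4)]
    mioe = [int(any(m == i for _, m in pairs)) for i in range(4)]
    return mioe, lioe
```

For P5, for example, only MIOE1 and LIOE1 are 1, so three of the four write/read circuits and three of the four buffers are stopped.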
[0057]
[Common use of memory read/write circuit and buffer control signal]
In the embodiment shown in FIG. 10, the enable signal MIOEi for controlling the write/read circuit RWC and the enable signal LIOEi for the buffer circuit TGBUFi of the logic circuit LC are independent. In this case, different settings must be made according to the transfer pattern, as shown in FIG. 11. As the number of I/O lines and the number of transfer patterns increase, setting the enable signals MIOEi and LIOEi independently becomes cumbersome.
[0058]
FIGS. 12 to 15 show an example in which a transfer circuit CTG is provided for the enable signal LIOEi of the buffer circuit TGBUFi of the logic circuit LC, and the enable signal MIOEi of the write/read circuit RWC is generated automatically from the enable signal LIOEi. FIG. 12 shows the data transfer patterns. FIG. 13 shows the transfer patterns of the control signal LIOEi of the buffer circuit TGBUFi corresponding to the data transfer patterns of FIG. 12.
[0059]
If the control signal LIOEi of the buffer circuit TGBUFi is transferred to the memory core MR according to this transfer pattern, the signal can be used as it is as the enable signal MIOEi of the write / read circuit RWC of the memory core MR.
[0060]
Note that the control signals for I/O lines not used for data must also be transferred, in order to stop the corresponding write/read circuits RWC of the memory core MR. That is, even when the data uses only some of the I/O lines, as in the transfer patterns P1 to P7, all of the control signals are transferred, as shown in the transfer patterns of the control signal.
[0061]
FIG. 14 shows a specific configuration of the transfer circuit CTG for the control signal LIOEi of the buffer circuit TGBUFi. Like the data transfer circuit TG, it is composed of switch groups SWGEi. With this transfer circuit CTG, the control-signal transfer patterns of FIG. 13 can be realized by setting the control signal ECi according to the transfer pattern as shown in FIG. 15.
[0062]
It can be seen from the transfer patterns shown in FIG. 13 that P0, P2, and P5 have the same shape. Therefore, the switch groups SWGE for the control signals EC0, EC2, and EC5 are merged into one, which is driven by the logical OR of the control signals EC0, EC2, and EC5. This reduces the number of stages of the switch groups SWGE and achieves high integration. The principle of operation is the same as that of the data transfer circuit TG described above, and its description is omitted.
[0063]
According to the present embodiment, by providing the transfer circuit CTG for the control signal LIOEi of the buffer circuit TGBUFi in addition to the data transfer circuit TG, the enable signal MIOEi of the write/read circuit RWC and the enable signal LIOEi of the buffer circuit TGBUFi need not be set independently of each other. For this reason, even if the number of I/O lines and the number of transfer patterns increase, the setting of the enable signals does not become complicated.
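The idea of generating MIOEi from LIOEi can be sketched as a routing table for the control signals. FIG. 13 is not reproduced here, so the rule used below for memory lines that carry no data (take the same-numbered LIOE if that logic line is unused, otherwise the lowest unused one) is an assumption chosen only so that every stopped line receives a "0"; the names `control_routing` and `mioe_from_lioe` are illustrative.

```python
# Data transfer patterns as (LIO index, MIO index) pairs.
PATTERNS = {
    "P0": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "P1": [(0, 0), (1, 1), (0, 2), (1, 3)],
    "P2": [(0, 0), (1, 1)],
    "P3": [(0, 2), (1, 3)],
    "P4": [(1, 0)], "P5": [(1, 1)], "P6": [(1, 2)], "P7": [(1, 3)],
}


def control_routing(pattern):
    """For each MIO line i, return the index j of the LIOEj routed to MIOEi."""
    pairs = PATTERNS[pattern]
    src = {m: l for l, m in pairs}            # MIO line -> LIO line carrying its data
    used_lio = {l for l, _ in pairs}
    unused = [j for j in range(4) if j not in used_lio]
    routing = []
    for i in range(4):
        if i in src:
            routing.append(src[i])            # follow the data path
        elif i in unused:
            routing.append(i)                 # assumed: same-numbered unused LIOE
        else:
            routing.append(unused[0])         # assumed: any unused LIOE (value 0)
    return routing


def mioe_from_lioe(pattern, lioe):
    """Derive the four MIOE signals from the four LIOE signals."""
    return [lioe[j] for j in control_routing(pattern)]
```

Under this assumed rule the control routing for P0, P2, and P5 comes out identical, which is consistent with the statement above that those three patterns have the same shape and can share one switch group.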
[0064]
[Enable signal that allows fine setting of data transfer unit]
In the embodiments described above, the enable signal MIOEi of the write/read circuit RWC and the enable signal LIOEi of the buffer were provided for each group of I/O lines transferred together as one unit (2^(n-2) lines in FIG. 2). However, by setting the enable signals at a finer granularity, a greater variety of transfer patterns can be realized.
[0065]
FIGS. 16 and 17 show examples of enable signals that can be set at a finer granularity than the data transfer unit. In this embodiment, the unit of I/O lines transferred together in the transfer patterns of FIG. 3 is set to 4 bytes, and the enable signals are set in units of 1 byte. That is, as shown in FIG. 16, the eight types of transfer patterns shown in FIG. 3 can be realized between the I/O lines MIOi of the memory core MR and the I/O lines LIOi of the logic circuit LC in units of 4 bytes, and four enable signals are provided separately for each 4-byte I/O line group. For example, for the I/O line LIO0, there are four enable signals LIOE0-0, LIOE0-1, LIOE0-2, and LIOE0-3.
[0066]
FIG. 17 shows examples of transfer patterns made possible by the example of FIG. 16 and how the enable signals are set for them. The enable signal MIOEi-j may be generated by transferring the enable signal LIOEi-j, or may be set independently of the enable signal LIOEi-j. FIG. 17A shows a case where the basic transfer pattern determined by the transfer circuit TG is P0 and all the enable signals are set to "1". This is the same as the pattern described so far. However, as shown in FIG. 17B, if the basic transfer pattern is P0 and the enable signals alternate between "0" and "1" every 2 bytes, a different transfer pattern is obtained. FIG. 17C shows the basic transfer pattern P3, and FIG. 17D shows P3 with the setting of the enable signals changed.
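The effect of byte-level enable signals on a 4-byte transfer unit can be sketched as follows. The data layout (four units of four bytes held in nested lists) and the function name `write_bytes` are assumptions made for illustration; which two bytes of each unit are enabled in FIG. 17B is not stated in the text, so the mask in the usage note is only one possibility.

```python
# Basic transfer patterns used in FIG. 17, as (LIO unit, MIO unit) pairs.
PATTERNS = {
    "P0": [(0, 0), (1, 1), (2, 2), (3, 3)],
    "P3": [(0, 2), (1, 3)],
}


def write_bytes(pattern, lio, mio, lioe):
    """Write logic-side data to memory-side units with per-byte enables.

    lio, mio: four units of four bytes each
    lioe[l][b]: enable signal LIOEl-b (1 = transfer byte b of unit l)
    Returns a new memory-side array; bytes that are not enabled keep
    their previous contents.
    """
    out = [list(unit) for unit in mio]
    for l, m in PATTERNS[pattern]:
        for b in range(4):
            if lioe[l][b]:
                out[m][b] = lio[l][b]
    return out
```

Calling `write_bytes("P0", lio, mio, [[1, 1, 0, 0]] * 4)` transfers only two bytes of every unit, which is a pattern that the eight basic patterns alone cannot express.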
[0067]
Here, only one example of each of the two basic transfer patterns is shown. However, by changing the enable signal, various transfer patterns different from the basic transfer pattern can be formed. When data attributes are different for each byte for image use or the like, it may be necessary to transfer only specific bytes. In such a case, the present embodiment is useful.
[0068]
FIG. 18 shows an example in which the present invention is applied to an LSI that performs rendering for three-dimensional computer graphics (hereinafter referred to as 3D-CG). The basic transfer patterns of the transfer circuit TG are those shown in FIG. 3, and the data of the memory core MR is assigned to the I/O lines as shown in the upper part of the figure. Here, RGB-A and RGB-B are the colors of pixels A and B, and Z-A and Z-B are the depth coordinates of pixels A and B; each is 16 bits long.
[0069]
In 3D-CG, a special process called Z comparison is often performed. As is well known, in this process, when a new pixel is to be written to the memory, its Z value is compared with that of the pixel already stored at the same position, and the new pixel is written only if its Z value is smaller. When this processing is performed on the pixel A, as shown in FIG. 18, the transfer pattern is first set to P5, and the Z value Z-Aold already stored in the memory core MR is read. Subsequently, the logic circuit LC compares it with the Z value Zin of the new pixel, and if Zin is smaller, the RGB and Z values of the new pixel are written. If the transfer pattern is switched to P2 at this point, the RGB and Z values can be written in parallel. For the pixel B, the transfer patterns P7 and P3 are used. When the RGB value and the Z value differ in bit length, for example 3 bytes for RGB and 2 bytes for Z, the basic transfer patterns of the transfer circuit TG may be set in 3-byte units and an enable signal provided to mask the unneeded part.
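The Z comparison sequence described above (read the old Z value with one pattern, then write RGB and Z in parallel with another) can be sketched as follows. The memory word is modeled as a list `[RGB-A, Z-A, RGB-B, Z-B]` matching MIO0 to MIO3, and the function name `z_test_write` is hypothetical.

```python
# Only the patterns needed here, as (LIO index, MIO index) pairs.
PATTERNS = {
    "P2": [(0, 0), (1, 1)],   # write RGB-A and Z-A in parallel
    "P3": [(0, 2), (1, 3)],   # write RGB-B and Z-B in parallel
    "P5": [(1, 1)],           # read Z-A onto LIO1
    "P7": [(1, 3)],           # read Z-B onto LIO1
}
Z_PATTERNS = {"A": ("P5", "P2"), "B": ("P7", "P3")}


def z_test_write(word, pixel, rgb_in, z_in):
    """Return the memory word after a Z-compared write of one pixel.

    word: [RGB-A, Z-A, RGB-B, Z-B]; pixel: "A" or "B".
    The new pixel is written only if z_in is smaller than the stored Z.
    """
    read_pat, write_pat = Z_PATTERNS[pixel]
    (_, z_line), = PATTERNS[read_pat]      # single connection LIO1 <-> Z line
    z_old = word[z_line]                   # step 1: read the stored Z value
    if not z_in < z_old:
        return list(word)                  # new pixel is hidden; no write
    lio = [rgb_in, z_in]                   # LIO0 carries RGB, LIO1 carries Z
    new = list(word)
    for l, m in PATTERNS[write_pat]:       # step 2: parallel write
        new[m] = lio[l]
    return new
```

The other pixel in the word is untouched in both steps because its I/O lines are not part of the selected patterns.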
[0070]
In 3D-CG, there is also alpha blending, a process for expressing transparency. This can be done as shown in the middle part of FIG. 18. As is well known, alpha blending is a process in which, when a new pixel is to be written to the memory, the pixel at the same position is read, added to the new pixel with weighting by a coefficient α, and the result is written. When this processing is performed on the pixel A, as shown in FIG. 18, the transfer pattern is first set to P4, and RGB-Aold already stored in the memory core MR is read. Subsequently, the logic circuit LC performs the weighted addition with the RGBin of the new pixel using the coefficient α and writes the result. The transfer pattern for the write may remain P4. For the pixel B, the transfer pattern P6 is used. If the logic circuit has only one arithmetic circuit for the weighted addition, providing an enable signal for each byte makes it possible to perform alpha blending separately for each of the R, G, and B bytes.
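A minimal sketch of the read, blend, and write-back sequence follows. The specification does not give the weighting formula, so the common form `alpha * new + (255 - alpha) * old`, with an 8-bit `alpha` and integer division, is an assumption, as are the tuple representation of RGB and the name `alpha_blend_write`.

```python
# P4 connects LIO1 to MIO0 (RGB-A); P6 connects LIO1 to MIO2 (RGB-B).
RGB_LINE = {"A": 0, "B": 2}


def alpha_blend_write(word, pixel, rgb_in, alpha):
    """Return the memory word after alpha-blending one pixel.

    word: [RGB-A, Z-A, RGB-B, Z-B], with each RGB an (r, g, b) tuple of 0..255
    alpha: weight of the new pixel, 0..255 (assumed 8-bit coefficient)
    """
    m = RGB_LINE[pixel]
    old = word[m]                                   # read with P4 or P6
    blended = tuple(
        (alpha * n + (255 - alpha) * o) // 255      # assumed blending formula
        for n, o in zip(rgb_in, old)
    )
    new = list(word)
    new[m] = blended                                # write back with the same pattern
    return new
```

Because the same pattern serves for both the read and the write, no pattern switch is needed between the two steps.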
[0071]
Furthermore, the process of clearing the screen can be performed at high speed. In this process, the data in the memory core MR is initialized. Normally, the minimum or maximum value is written for RGB, and the maximum value, corresponding to the greatest depth, is written for the Z value. In the embodiment shown in FIG. 18, the I/O lines carry two pixels, so if the transfer pattern P1 is used, two pixels can be written simultaneously and the clearing process can be performed at high speed. Further, although not shown in FIG. 18, if the transfer pattern P0 and the enable signals are used, the RGB values of two pixels can be read simultaneously, so that high-speed screen display is possible. As described above, high-speed 3D-CG drawing can be performed by using the transfer circuit TG of the present invention.
[0072]
[Example of assigning I/O lines by byte]
Up to this point, for simplicity of explanation, the I/O lines MIOi and LIOi have been assigned and illustrated one transfer unit at a time. If this is done in an actual layout, data is carried across many I/O lines, especially when the transfer unit is large, which may cause adverse effects such as wiring delay and noise.
[0073]
FIG. 19 shows an example in which the assignment of I/O lines is changed byte by byte. The example of FIG. 19 shows a method of interleaving the lines one byte at a time when the transfer unit is 4 bytes. In this way, data has to travel a shorter distance. For example, in the case of the transfer pattern P3, data must cross I/O lines for 8 bytes in the assignment of FIG. 3, whereas the interleaved assignment of FIG. 19 shortens this distance. Here the interleaving is done byte by byte, but it may also be done bit by bit, in which case even less movement is required. Of course, in the present embodiment the receiving side of the logic circuit LC must be designed to match, but adverse effects such as wiring delay and induced noise are avoided, and the increase in area caused by additional wiring is also reduced.
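The reduction in crossing distance can be illustrated with a simple position model. The interleaving order assumed below (byte 0 of every unit, then byte 1 of every unit, and so on) is one plausible reading of the byte-by-byte assignment and is not taken from FIG. 19; the function names are illustrative.

```python
UNITS, BYTES = 4, 4  # four transfer units of 4 bytes each


def block_pos(unit, byte):
    """Line position when each unit's bytes are adjacent (as in FIG. 3)."""
    return unit * BYTES + byte


def interleaved_pos(unit, byte):
    """Assumed interleaved position: same-numbered bytes of all units adjacent."""
    return byte * UNITS + unit


def crossing(pos, src_unit, dst_unit):
    """Largest distance, in byte-wide line groups, any byte must travel."""
    return max(abs(pos(src_unit, b) - pos(dst_unit, b)) for b in range(BYTES))
```

For the transfer pattern P3 (unit 2 to unit 0), `crossing(block_pos, 2, 0)` gives 8, in agreement with the 8 bytes mentioned in the text, while `crossing(interleaved_pos, 2, 0)` gives 2 under the assumed ordering.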
[0074]
[Effect of the Invention]
According to the present invention, a semiconductor device in which a memory and a logic circuit are integrated can be designed in a short period of time, and high-speed data transfer can be realized.
[Brief description of the drawings]
FIG. 1 is a conceptual view of a system LSI with a built-in multiple-I/O memory core according to the present invention.
FIG. 2 is an example of an LSI with a built-in multiple-I/O memory core according to the present invention.
FIG. 3 is a transfer pattern of the transfer circuit of FIG. 2;
FIG. 4 is a first specific example of a transfer circuit that realizes the transfer pattern of FIG. 3;
FIG. 5 is a specific example of a buffer circuit TGBUFi of a transfer circuit.
FIG. 6 is a second specific example of a transfer circuit that realizes the transfer pattern of FIG. 3;
FIG. 7 is a method for setting a control signal of the transfer circuit of FIG. 6;
FIG. 8 is a third specific example of a transfer circuit that realizes the transfer pattern of FIG. 3;
FIG. 9 is a circuit configuration and layout example of a parallel switch unit of the transfer circuit in FIG. 8.
FIG. 10 illustrates an example in which power consumption is reduced by a memory read / write control signal.
FIG. 11 is a method for setting the control signals of the transfer circuit of FIG. 10.
FIG. 12 is a data transfer pattern.
FIG. 13 is a transfer pattern of a buffer control signal.
FIG. 14 illustrates an example of a control signal transfer circuit.
FIG. 15 is a control signal setting method of the control signal transfer circuit of FIG. 14;
FIG. 16 shows an example of an enable signal that can be set more finely than a data transfer unit.
FIG. 17 shows an example of a transfer pattern enabled by the example of FIG. 16;
FIG. 18 is an example of application to three-dimensional computer graphics.
FIG. 19 is an example in which the assignment of I/O lines is changed for each byte.
[Explanation of symbols]
MR memory core
MC memory cell
DL data line
WL word line
PER peripheral circuit
RWC read / write circuit
LC logic circuit
TG transfer circuit
SWG switch group
TGBUFi buffer circuit
MIOi, LIOi I/O lines
TGCi control signal
DB database storage device for core circuits and logic library
LL logic library
WS design workstation
LSI-A, LSI-B Semiconductor chip.

Claims

1. A semiconductor integrated circuit device comprising:
a memory core having a plurality of first I/O lines;
a logic circuit having a plurality of second I/O lines; and
a data transfer circuit that transfers data between the plurality of first I/O lines and the plurality of second I/O lines,
wherein the data transfer circuit includes a plurality of third I/O lines that have the same pitch as the plurality of first I/O lines and connect to the plurality of first I/O lines, a plurality of fourth I/O lines for transferring data to and from the logic circuit, and a plurality of switches connected between the plurality of third I/O lines and the plurality of fourth I/O lines, and
wherein a data transfer pattern between the plurality of third I/O lines and the plurality of fourth I/O lines can be changed over time by switching the on/off states of the plurality of switches with a switch control signal supplied to the data transfer circuit.
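
The switch-controlled transfer pattern described in this claim can be illustrated with a small behavioral model. The following Python sketch is not taken from the patent: the class `TransferCircuit`, the 4-line width, the control signal names `TGC0` to `TGC2` (borrowed loosely from the TGCi symbol above), and the three patterns are all assumptions chosen for illustration. Each control signal is assumed to close one group of switches, and each switch joins one third I/O line to one fourth I/O line.

```python
from typing import Dict, List, Optional, Set, Tuple

# A switch is modeled as (third-line index, fourth-line index). Hypothetical.
Switch = Tuple[int, int]


class TransferCircuit:
    """Toy model of a switch array between third and fourth I/O lines."""

    def __init__(self, n_third: int, n_fourth: int,
                 switch_groups: Dict[str, Set[Switch]]) -> None:
        # Reject switches that refer to lines that do not exist.
        for name, group in switch_groups.items():
            for t, f in group:
                if not (0 <= t < n_third and 0 <= f < n_fourth):
                    raise ValueError(f"switch {(t, f)} in {name} is out of range")
        self.n_third = n_third
        self.n_fourth = n_fourth
        self.switch_groups = switch_groups

    def transfer(self, control: str,
                 third_data: List[int]) -> List[Optional[int]]:
        """Turn on the switch group selected by `control` and return the
        values seen on the fourth I/O lines (None means not driven)."""
        if len(third_data) != self.n_third:
            raise ValueError("one value is needed per third I/O line")
        out: List[Optional[int]] = [None] * self.n_fourth
        for t, f in sorted(self.switch_groups[control]):
            if out[f] is not None:
                raise ValueError(f"fourth line {f} is driven by two third lines")
            out[f] = third_data[t]
        return out


# Three assumed patterns: straight-through, swap of halves, and broadcast.
groups: Dict[str, Set[Switch]] = {
    "TGC0": {(i, i) for i in range(4)},
    "TGC1": {(i, (i + 2) % 4) for i in range(4)},
    "TGC2": {(0, f) for f in range(4)},
}
tg = TransferCircuit(4, 4, groups)
data = [10, 11, 12, 13]
for name in ("TGC0", "TGC1", "TGC2"):
    print(name, tg.transfer(name, data))
```

In this model, changing the asserted control signal from one call to the next changes the transfer pattern without rewiring the lines, which is the behavior the claim attributes to the switch control signal. The model covers only the memory-to-logic direction and ignores pitch, layout, and timing.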

2. The semiconductor integrated circuit device according to claim 1,
wherein wiring for the switch control signal is arranged orthogonally to the wiring of the plurality of first I/O lines or the wiring of the plurality of second I/O lines.

3. The semiconductor integrated circuit device according to claim 1 or claim 2,
wherein the plurality of switches are arranged at the same pitch as the plurality of first I/O lines in a direction parallel to the plurality of first I/O lines, and are arranged side by side in a direction orthogonal to the plurality of first I/O lines.

4. The semiconductor integrated circuit device according to claim 1,
wherein the logic circuit is synthesized by combining basic logic gates, and
wherein the plurality of switches are formed of MOS transistors and have a common shape.

5. The semiconductor integrated circuit device according to claim 1,
wherein, in at least one of the data transfer patterns, one of the plurality of fourth I/O lines is connected to a plurality of the second I/O lines.

6. The semiconductor integrated circuit device according to any one of claims 1 to 5,
wherein the memory core includes a DRAM-type cell including one transistor and one capacitor.

7. A method for designing a semiconductor integrated circuit device, comprising:
a step of creating a memory core having a plurality of first I/O lines, a logic library, and a library of transfer circuit modules;
a step of synthesizing a desired logic circuit from the logic library; and
a step of setting wiring between the logic circuit and the transfer circuit module and wiring between the plurality of first I/O lines and the transfer circuit module,
wherein the transfer circuit module includes a plurality of second I/O lines that have the same pitch as the plurality of first I/O lines and connect to the plurality of first I/O lines, a plurality of third I/O lines for transmitting and receiving data, and a plurality of switches for connecting the plurality of second I/O lines and the plurality of third I/O lines, and
wherein, in the step of setting the wiring, a data transfer pattern between the plurality of second I/O lines and the plurality of third I/O lines is set by setting wiring that supplies, to the transfer circuit module, a switch control signal for switching the on/off states of the plurality of switches.