The script `./vector-embeddings/02-create-vectors-table.sql` does exactly that.
## Find similar articles by calculating cosine distance
The process starts by invoking Azure OpenAI to get the vector embedding of an arbitrary text.

Make sure you have an Azure OpenAI [embeddings model](https://learn.microsoft.com/azure/cognitive-services/openai/concepts/models#embeddings-models) deployed, and that it uses the `text-embedding-ada-002` model.

Once the model is deployed, it can be called from Azure SQL database using [sp_invoke_external_rest_endpoint](https://learn.microsoft.com/sql/relational-databases/system-stored-procedures/sp-invoke-external-rest-endpoint-transact-sql) to get the embedding vector for a text, for example "the foundation series by isaac asimov", using the following code (make sure to replace `<your-api-name>` and `<api-key>` with the values of your Azure OpenAI deployment):

Thanks to columnstore, even on a small SKU, the performance can be pretty fast, well within the sub-second goal.
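The following minimal Python sketch shows what the database computes when it compares two embeddings: cosine similarity, and cosine distance defined as `1 - similarity`. It uses only the standard library. The function names are illustrative and are not part of this repository. OpenAI describes `text-embedding-ada-002` vectors as normalized to length 1. For such vectors both norms are 1, so cosine similarity reduces to a plain dot product.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine distance as commonly defined: 1 - cosine similarity."""
    return 1.0 - cosine_similarity(a, b)


if __name__ == "__main__":
    # Toy 3-dimensional vectors; real ada-002 embeddings have 1536 dimensions.
    query = [1.0, 0.0, 0.0]
    doc = [0.6, 0.8, 0.0]  # (approximately) unit length
    print(cosine_similarity(query, doc))
    print(cosine_distance(query, doc))
```

Higher similarity (equivalently, lower distance) means the two texts are semantically closer.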
## Encapsulating logic to retrieve embeddings
The described process can be wrapped into stored procedures to make it easy to reuse. The scripts in the `./vector-embeddings/` folder show how to create a stored procedure to retrieve the embeddings from OpenAI:

- `03-store-openai-credentials.sql`: stores the Azure OpenAI credentials in the Azure SQL database.
- `04-create-get-embeddings-procedure.sql`: creates a stored procedure that encapsulates the call to OpenAI.
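The following Python sketch shows what such a stored procedure has to send. It only assembles an Azure OpenAI embeddings request and sends nothing over the network. The request has three parts:

- the URL
- the headers, as a JSON string
- the payload, as a JSON string

These roughly correspond to the `@url`, `@headers` and `@payload` parameters of `sp_invoke_external_rest_endpoint`. The resource name, the deployment name and the default `api-version` value are assumptions. Check them against your own deployment and the Azure OpenAI REST reference.

```python
import json


def build_embeddings_request(resource: str, deployment: str, api_key: str,
                             text: str, api_version: str = "2023-05-15") -> dict:
    """Assemble (but do not send) an Azure OpenAI embeddings request.

    resource, deployment and api_version are placeholders (assumptions):
    replace them with the values of your own Azure OpenAI deployment.
    """
    url = (f"https://{resource}.openai.azure.com/openai/deployments/"
           f"{deployment}/embeddings?api-version={api_version}")
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    payload = {"input": text}
    # Headers and payload serialized as JSON strings, the shape a T-SQL
    # caller would pass to the REST endpoint procedure.
    return {"url": url, "headers": json.dumps(headers), "payload": json.dumps(payload)}


if __name__ == "__main__":
    # "embeddings" is a hypothetical deployment name.
    req = build_embeddings_request("<your-api-name>", "embeddings", "<api-key>",
                                   "the foundation series by isaac asimov")
    print(req["url"])
    print(req["payload"])
```

Keeping the key out of the code (as `03-store-openai-credentials.sql` does on the database side) is preferable to hard-coding it.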
## Finding similar articles
The script `05-find-similar-articles.sql` uses the created stored procedure and the process explained above to find articles similar to the provided text.
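The following minimal Python sketch illustrates the search step outside the database. It ranks stored article vectors against a query vector and keeps the top `k`.

It assumes, hypothetically, that each vector is stored one row per dimension as `(article_id, dimension_id, value)` tuples. That layout suits a columnstore table, but the real schema is whatever `02-create-vectors-table.sql` defines and may differ. The dot product and squared norm are accumulated per article, the same shape a `GROUP BY` aggregation would take.

```python
import heapq
import math
from collections import defaultdict
from typing import Iterable


def top_k_similar(rows: Iterable[tuple[int, int, float]],
                  query: list[float], k: int = 10) -> list[tuple[int, float]]:
    """Return up to k (article_id, cosine_similarity) pairs, most similar first.

    rows: hypothetical (article_id, dimension_id, value) tuples, one per dimension.
    query: the query embedding, indexed by dimension_id.
    """
    dot = defaultdict(float)      # per-article sum of value * query[dim]
    norm_sq = defaultdict(float)  # per-article sum of value * value
    for article_id, dim, value in rows:
        dot[article_id] += value * query[dim]
        norm_sq[article_id] += value * value
    query_norm = math.sqrt(sum(q * q for q in query))
    scores = (
        (article_id, dot[article_id] / (math.sqrt(norm_sq[article_id]) * query_norm))
        for article_id in dot
        if norm_sq[article_id] > 0  # skip all-zero vectors
    )
    # heapq.nlargest avoids sorting every article when only k are needed.
    return heapq.nlargest(k, scores, key=lambda pair: pair[1])


if __name__ == "__main__":
    rows = [(1, 0, 1.0), (1, 1, 0.0), (2, 0, 0.6), (2, 1, 0.8), (3, 0, 0.0), (3, 1, 1.0)]
    print(top_k_similar(rows, [1.0, 0.0], k=2))
```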
## Encapsulating logic to do similarity search

To make it even easier to use, the script `06-sample-function.sql` shows a sample function that can be used to find similar articles by just providing the text, as demonstrated in script `07-sample-function-usage` with the following example:
```sql
declare @e nvarchar(max);
declare @text nvarchar(max) = N'the foundation series by isaac asimov';
-- Set @e to the embedding of @text here, using the stored procedure created by 04-create-get-embeddings-procedure.sql
select * from dbo.SimilarContentArticles(@e) as r order by cosine_distance desc;
```
## Alternative sample with Python and a local embedding model
127
+
128
+
If you don't want or can't use OpenAI to generate embeddings, you can use a local model like `https://huggingface.co/sentence-transformers/multi-qa-MiniLM-L6-cos-v1` to generate embeddings. The Python script `./python/hybrid_search.py` shows how to
129
+
130
+
- use Python to generate the embeddings
131
+
- do similarity search in Azure SQL database
132
+
- use [Fulltext search in Azure SQL database with BM25 ranking](https://learn.microsoft.com/en-us/sql/relational-databases/search/limit-search-results-with-rank?view=sql-server-ver16#ranking-of-freetexttable)
133
+
- do re-ranking applying Reciprocal Rank Fusion (RRF) to combine the BM25 ranking with the cosine similarity ranking
134
+
135
+
Make sure to setup the database for this sample using the `./python/00-setup-database.sql` script. Database can be either an Azure SQL DB or a SQL Server database.
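Reciprocal Rank Fusion gives each document a score by summing `1 / (k + rank)` over every ranking it appears in. Documents ranked well by both BM25 and cosine similarity therefore rise to the top.

The following small Python sketch implements that formula. The constant `k = 60` is a value commonly used for RRF and is an assumption here; `hybrid_search.py` may use a different one. Ranks are 1-based, and a document missing from one list simply gets no contribution from it.

```python
from collections import defaultdict
from collections.abc import Hashable, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[Hashable]],
                           k: int = 60) -> list[tuple[Hashable, float]]:
    """Fuse several ranked lists of document ids into one list, best first.

    Each ranking is ordered from most to least relevant; ranks start at 1.
    """
    scores: dict = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; sorted() is stable, so ties keep first-seen order.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    bm25_ranking = ["a", "b", "c"]     # e.g. ids ordered by full-text rank
    vector_ranking = ["b", "c", "d"]   # e.g. ids ordered by cosine similarity
    for doc_id, score in reciprocal_rank_fusion([bm25_ranking, vector_ranking]):
        print(doc_id, round(score, 5))
```

Only ranks are used, not raw scores. This is why RRF can combine BM25 values and cosine similarities without normalizing them to a common scale.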
## Conclusions
Azure SQL database, and by extension SQL Server, already has great support for vector operations, thanks to columnstore and its usage of [SIMD](https://en.wikipedia.org/wiki/Single_instruction,_multiple_data) [AVX-512 instructions](https://www.intel.com/content/www/us/en/architecture-and-technology/avx-512-overview.html).