Let's go over some vector terms before looking at the AI Vector Search feature.

What is a Vector?
A vector is an ordered list of numbers that represents the attributes of an object. An object can be a word, a sentence, a document, an image, an audio clip, or a video. The data is first converted into numbers and then stored as a vector that represents the object. For example, a two-dimensional vector is written as v = (v1, v2).

What is Vectorization?
Vectorization is the process of converting data (text, images, audio, video) into vectors.

What is a Vector Database?
A vector database is any database that stores and manages vector embeddings and handles unstructured data such as documents, images, audio, or video. It stores and processes data as vectors, which are mathematical representations of the features of objects in a multidimensional space.

What are Vector Indexes?
Vector indexes are used to efficiently store and search high-dimensional vector data. They organize the vectors so that similar items are grouped together, where similarity is defined by the distance between two vectors. This enables efficient similarity searches and faster query performance for AI-driven applications.

What is Semantic Search?
Semantic search means "search with meaning". It is a set of search capabilities that interpret words in light of the searcher's intent and the search context. The older method, lexical search, focuses only on finding exact matches, i.e. keyword-based search. A semantic search algorithm tries to understand what users actually mean, not just what they type. Semantic search uses Natural Language Processing (NLP) and Machine Learning algorithms to improve the accuracy of search results.

What is AI Vector Search?
Oracle AI Vector Search is one of the key features of Oracle Database 23ai. It provides semantic search capabilities using Artificial Intelligence (AI), allowing users to search data based on its semantics, or meaning. It includes a new vector data type, vector indexes, and vector functions that allow the database to store the semantic content of documents, images, and other unstructured data as vectors.

What are Vector Embedding Models?
Vector embeddings are representations of data points such as text, images, audio, and video in the form of vectors. Vector search is often said to be better than keyword-based search because it is based on the meaning and context behind the words, not the actual words. Using embedding models, you can transform unstructured data into vector embeddings that can then be used for semantic queries on business data. Depending on the type of your data, you can use different pretrained open-source models to create vector embeddings. You can generate vector embeddings outside Oracle Database using pretrained open-source embedding models or your own embedding models. You can also import those models directly into Oracle Database if they are compatible with the Open Neural Network Exchange (ONNX) standard. Oracle Database implements an ONNX runtime directly within the database, which allows you to generate vector embeddings inside the database using SQL.

In the above screen, you can see the data points represented in vector format. The user's intent can be, for example:
1) Find the data points that are nearest to v(3,-3).
2) Find the data points that are nearest to v(5,3).
3) Fetch only the two data points that are nearest to v(-4,-3).
4) Fetch only the one data point that is nearest to v(-2,5).
5) Fetch the rows that are closest to any given vector by using different distance metrics.
Key features of Oracle AI Vector Search:
1) VECTOR data type: Starting with Oracle Database 23ai, there is a new built-in data type, VECTOR. It represents a vector as a series of numbers stored in INT8, FLOAT32, or FLOAT64 format, and simplifies applications by supporting vectors with different dimension counts and formats. You can define a column as VECTOR without any further specification, or specify the number of dimensions and the storage format.
2) Vector Indexes: Vector indexes are used to efficiently store and search high-dimensional vector data. This enables efficient similarity searches and faster query performance for AI-driven applications.
3) Flexible Vector Generation: You can use different pretrained open-source models to create vector embeddings depending on the type of your data. You can generate vector embeddings outside Oracle Database using pretrained open-source embedding models or your own embedding models. You can also import those models directly into Oracle Database if they are compatible with the Open Neural Network Exchange (ONNX) standard.
4) SQL Extensions for querying Vectors: Simple, intuitive extensions to SQL allow similarity search on vectors within queries on relational, text, JSON, and other data types.
5) Simple Target Accuracy Specification: You can define a default accuracy during index creation and override it in search queries if needed.
6) Exadata optimizations: Exadata System Software 24ai optimizations accelerate vector index creation and search.

Advantages of Oracle AI Vector Search:
1) Oracle Database 23ai provides a single data platform for vectors and all your other data types. By combining SQL, JSON documents, graphs, geospatial data, text, and vectors in a single database, you can rapidly build new features into your applications.
2) Semantic search on unstructured data can be combined with relational search on business data in a single database. Users can run AI-powered vector similarity searches within their existing Oracle databases instead of moving business data to a separate vector database. Avoiding data movement reduces complexity, improves security, and enables searches on current data.
3) It is easier for developers to build next-generation AI applications directly within Oracle Database.
4) It enables Oracle Database to handle a very wide range of AI use cases.
5) Similarity search can easily be combined with relational, text, JSON, spatial, and graph data types to enhance your apps in a single database.
6) Users can use their favorite development tools, AI frameworks, and languages to build AI apps.
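To make the VECTOR data type and target accuracy features listed above more concrete, here is a minimal sketch. The table docs_demo, its columns, and the index name are hypothetical and exist only for illustration. Creating an in-memory neighbor graph index also assumes the instance has vector memory configured (vector_memory_size), which this post does not cover, and in practice the index would be created after data has been loaded.

```sql
-- Hypothetical table showing three ways to declare a VECTOR column.
create table docs_demo (
  id       number,
  v_any    vector,              -- any number of dimensions, any format
  v_fixed  vector(2, float32),  -- exactly 2 dimensions, stored as FLOAT32
  v_int8   vector(*, int8)      -- any number of dimensions, stored as INT8
);

-- Vector index that sets a default target accuracy of 95 percent.
create vector index docs_demo_hnsw_idx on docs_demo (v_fixed)
  organization inmemory neighbor graph
  distance euclidean
  with target accuracy 95;

-- Approximate search that overrides the index default for this query only.
select id
from   docs_demo
order  by vector_distance(v_fixed, to_vector('[3,0]'), euclidean)
fetch  approx first 3 rows only with target accuracy 90;
```

The last query shows the "override in search queries" idea: the index carries a default accuracy, and an individual query can ask for a different one.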
-: Demonstration :-

Note: Here, I have used the vector_distance function with euclidean as the distance_metric. Below is the list of distance metrics supported in Oracle Database 23ai. When no metric is specified, vector_distance defaults to cosine.
- euclidean
- cosine
- manhattan
- hamming
- dot
- euclidean_squared

Log in to the 23ai database as below:

C:\Windows\System32>sqlplus / as sysdba

SQL*Plus: Release 23.0.0.0.0 - Production on Wed Aug 28 12:39:48 2024
Version 23.4.0.24.05

Copyright (c) 1982, 2024, Oracle. All rights reserved.

Connected to:
Oracle Database 23ai Free Release 23.0.0.0.0 - Production
Version 23.4.0.24.05

SQL> select name,open_mode from v$database;

NAME      OPEN_MODE
--------- ------------
FREE      READ WRITE

Create a table with a vector data type column and insert a few values as per the above (x,y) coordinate graph.

SQL> create table vector_demo(id number(10),dp_name varchar2(20),dp_color varchar2(20),v vector);
Table created.

SQL> insert into vector_demo values (1,'TRIANGLE','GREEN','[3,-3]');
1 row created.

SQL> insert into vector_demo values (2,'TRIANGLE','GREEN','[-4,4]');
1 row created.

SQL> insert into vector_demo values (3,'TRIANGLE','GREEN','[-3,-5]');
1 row created.

SQL> insert into vector_demo values (4,'SQUARE','SKY BLUE','[2,5]');
1 row created.

SQL> insert into vector_demo values (5,'SQUARE','SKY BLUE','[5,3]');
1 row created.

SQL> insert into vector_demo values (6,'SQUARE','SKY BLUE','[-2,5]');
1 row created.

SQL> insert into vector_demo values (7,'CIRCLE','RED','[2,3]');
1 row created.

SQL> insert into vector_demo values (8,'CIRCLE','RED','[4,5]');
1 row created.

SQL> insert into vector_demo values (9,'CIRCLE','RED','[-4,-3]');
1 row created.

SQL> commit;
Commit complete.
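The metric names in the note above are passed as the third argument of vector_distance. As a small sketch against the vector_demo table just created, the query below compares row 1, stored as [3,-3], with the query vector [3,0] under several metrics. The choice of row, query vector, and column aliases is only for illustration.

```sql
-- Distance between the stored vector [3,-3] (id = 1) and [3,0]
-- under four of the supported metrics.
select id,
       vector_distance(v, to_vector('[3,0]'), euclidean)         as d_euclidean,
       vector_distance(v, to_vector('[3,0]'), euclidean_squared) as d_euclidean_sq,
       vector_distance(v, to_vector('[3,0]'), manhattan)         as d_manhattan,
       vector_distance(v, to_vector('[3,0]'), cosine)            as d_cosine
from   vector_demo
where  id = 1;
```

Worked by hand, (3,-3) and (3,0) differ by 0 in x and by 3 in y, so the Euclidean distance is sqrt(0 + 9) = 3, the squared Euclidean distance is 9, and the Manhattan distance is 0 + 3 = 3. Cosine distance depends on the angle between the two vectors rather than on their lengths, so it can rank neighbors differently from the other three.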
SQL> set lines 300 pages 3000
SQL> col v for a25
SQL> col DP_NAME for a8
SQL> col DP_COLOR for a8
SQL> select * from vector_demo;

ID DP_NAME   DP_COLOR   V
-- --------- ---------- ---------------------
 1 TRIANGLE  GREEN      [3.0E+000,-3.0E+000]
 2 TRIANGLE  GREEN      [-4.0E+000,4.0E+000]
 3 TRIANGLE  GREEN      [-3.0E+000,-5.0E+000]
 4 SQUARE    SKY BLUE   [2.0E+000,5.0E+000]
 5 SQUARE    SKY BLUE   [5.0E+000,3.0E+000]
 6 SQUARE    SKY BLUE   [-2.0E+000,5.0E+000]
 7 CIRCLE    RED        [2.0E+000,3.0E+000]
 8 CIRCLE    RED        [4.0E+000,5.0E+000]
 9 CIRCLE    RED        [-4.0E+000,-3.0E+000]

9 rows selected.
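Two of the user intents listed earlier can be written as queries against this table. The following is one possible way to express them, a sketch that assumes euclidean as the metric. The distance alias is there only for readability.

```sql
-- Intent 3: fetch only the two data points nearest to v(-4,-3).
select id, dp_name, dp_color, v,
       vector_distance(v, to_vector('[-4,-3]'), euclidean) as distance
from   vector_demo
order  by distance
fetch  first 2 rows only;

-- Intent 4: fetch only the one data point nearest to v(-2,5).
select id, dp_name, dp_color, v
from   vector_demo
order  by vector_distance(v, to_vector('[-2,5]'), euclidean)
fetch  first 1 row only;
```

Going by the rows shown above, the first query should return ids 9 and 3: RED-CIRCLE sits exactly at (-4,-3), and GREEN-TRIANGLE at (-3,-5) is sqrt(1 + 4), about 2.24, away. The second should return id 6, which sits exactly at (-2,5).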
Query the vector_demo table to find only the three data points nearest to v(3,0). As the output below shows, the three data points nearest to the vector (3,0) are GREEN-TRIANGLE, RED-CIRCLE, and SKY BLUE-SQUARE. You can also see the pictorial representation of the output in the screen below.
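The query itself is not reproduced in the text, so the following is only a sketch of one that matches the description. It assumes euclidean as the metric, as stated in the note at the start of the demonstration.

```sql
-- Three data points nearest to v(3,0), ordered by Euclidean distance.
select dp_name, dp_color, v
from   vector_demo
order  by vector_distance(v, to_vector('[3,0]'), euclidean)
fetch  first 3 rows only;
```

Checking by hand: GREEN-TRIANGLE at (3,-3) is 3 away, RED-CIRCLE at (2,3) is sqrt(1 + 9), about 3.16, away, and SKY BLUE-SQUARE at (5,3) is sqrt(4 + 9), about 3.61, away. Every other point in the table is more than 5 away, which is consistent with the result named above.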
Query the vector_demo table to find only the three data points nearest to v(1,4). As the output below shows, the three data points nearest to the vector (1,4) are SKY BLUE-SQUARE, RED-CIRCLE, and SKY BLUE-SQUARE. You can also see the pictorial representation of the output in the screen below.
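As with the previous search, the query is not reproduced in the text. A sketch that matches the description, again assuming euclidean as the metric, might look like this. Adding id as a secondary sort key is my own choice, made so that ties are broken the same way on every run.

```sql
-- Three data points nearest to v(1,4); id breaks ties between equal distances.
select dp_name, dp_color, v,
       vector_distance(v, to_vector('[1,4]'), euclidean) as distance
from   vector_demo
order  by distance, id
fetch  first 3 rows only;
```

The tiebreaker matters here. SKY BLUE-SQUARE at (2,5) and RED-CIRCLE at (2,3) are both sqrt(1 + 1), about 1.41, away from (1,4). Third place is also a tie: SKY BLUE-SQUARE at (-2,5) and RED-CIRCLE at (4,5) are both sqrt(9 + 1), about 3.16, away. Without a secondary sort key, either of the two could legitimately appear as the third row.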
Thanks for reading this post!