Quickstart
We're going to create a blogging site, then add vector search to it with django-vectordb.
Project setup
Create a new Django project named tutorial, then start a new app called blog.
# Create a virtual environment to isolate our package dependencies locally
python3 -m venv env
source env/bin/activate # On Windows use `env\Scripts\activate`
# Install Django and Django VectorDB into the virtual environment
pip install django
pip install "django-vectordb[standard]" # include optional dependencies (quoted so shells such as zsh do not expand the brackets)
# Set up a new project with a single application
django-admin startproject tutorial
cd tutorial
django-admin startapp blog
The project layout should look like:
tutorial
├── blog
│   ├── __init__.py
│   ├── admin.py
│   ├── apps.py
│   ├── migrations
│   │   └── __init__.py
│   ├── models.py
│   ├── tests.py
│   └── views.py
├── manage.py
└── tutorial
    ├── __init__.py
    ├── asgi.py
    ├── settings.py
    ├── urls.py
    └── wsgi.py
Now sync your database for the first time and create an initial user named admin with a password. We'll authenticate as that user later in our example.
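Both steps use Django's standard management commands. A minimal sketch, run from the project root; the email address is a placeholder chosen for illustration:

```shell
# Apply the initial migrations for Django's built-in apps
python manage.py migrate
# Create the admin user; you will be prompted to choose a password
python manage.py createsuperuser --username admin --email admin@example.com
```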
Now let's add the blog app and vectordb to INSTALLED_APPS in tutorial/settings.py:
# settings.py
INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    "blog",  # add the blog app here
    "vectordb",  # add vectordb here; the order is not important
]
Next, let's add a Post model to our blog app by modifying blog/models.py, which will let us save our posts. We will also add a get_vectordb_text method that specifies the text vectordb should search by, and a get_vectordb_metadata method that returns the specific fields we wish to filter by. If we don't define our own get_vectordb_metadata, vectordb will serialize all of the model's fields into the metadata, which can be suboptimal because we may not be interested in some of them. It is therefore recommended to implement get_vectordb_metadata so that it contains only the fields we want to filter by. vectordb relies on these methods when it indexes a post.
Our model has three fields (title, description, and created_at) and a user foreign key. In the metadata we include user_id, username, created_at_year, and created_at_month. We will use these fields to filter our results later.
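A minimal sketch of blog/models.py matching this description. The field types, max_length, and on_delete behaviour are assumptions, as is the way the title and description are combined in get_vectordb_text:

```python
# blog/models.py
from django.conf import settings
from django.db import models


class Post(models.Model):
    title = models.CharField(max_length=255)  # assumed length
    description = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)
    user = models.ForeignKey(settings.AUTH_USER_MODEL, on_delete=models.CASCADE)

    def get_vectordb_text(self):
        # Text that vectordb embeds and searches by
        return f"{self.title}: {self.description}"

    def get_vectordb_metadata(self):
        # Only the fields we want to filter by later
        return {
            "user_id": self.user_id,
            "username": self.user.username,
            "created_at_year": self.created_at.year,
            "created_at_month": self.created_at.month,
        }
```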
Now let's make migrations and create the tables for the blog app and the vectordb app, then run the development server.
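These steps map onto Django's standard management commands. A short sketch, run from the directory that contains manage.py:

```shell
# Generate migrations for the new Post model
python manage.py makemigrations
# Create the tables for the blog and vectordb apps
python manage.py migrate
# Start the development server on http://127.0.0.1:8000/
python manage.py runserver
```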
Adding Sample Blogs
Let's add the Post model to the admin panel by editing blog/admin.py:
# blog/admin.py
from django.contrib import admin
from .models import Post
admin.site.register(Post)
Now we can visit http://127.0.0.1:8000/admin/blog/post/ and add a few posts. Go ahead and add some.
Configuring Vector Search
First, let's synchronize all existing posts with the vector database. vectordb provides a management command for this, which we run here for the Post model in the blog app.
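A sketch of that step, assuming the command is named vectordb_sync and takes the app label followed by the model name. Running python manage.py help lists the commands actually available in your installation:

```shell
# Assumed general form: python manage.py vectordb_sync <app_name> <model_name>
python manage.py vectordb_sync blog Post
```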
Lastly, let's make sure that the Post model stays synchronized with the vector database, so that creating, updating, or deleting a post is automatically registered by vectordb. To do this, we'll create a signals.py file in the blog directory (blog/signals.py).
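A sketch of what blog/signals.py might contain. It assumes vectordb exposes a helper named autosync_model_to_vectordb in vectordb.shortcuts; treat both the name and the import path as assumptions and confirm them against your installed version:

```python
# blog/signals.py
# Assumed helper: registers save/delete handlers that keep Post in sync
from vectordb.shortcuts import autosync_model_to_vectordb

from .models import Post

autosync_model_to_vectordb(Post)
```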
The above code is equivalent to doing the following manually:
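A sketch of the manual wiring with Django's post_save and post_delete signals. The handler names sync_vectordb_on_create_update and sync_vectordb_on_delete and their module vectordb.sync_signals are assumptions, and the dispatch_uid strings are arbitrary illustrative values:

```python
# blog/signals.py (manual alternative)
from django.db.models.signals import post_delete, post_save

# Assumed handler names and module path; verify against your vectordb version
from vectordb.sync_signals import (
    sync_vectordb_on_create_update,
    sync_vectordb_on_delete,
)

from .models import Post

# Re-index a post whenever it is created or updated
post_save.connect(
    sync_vectordb_on_create_update,
    sender=Post,
    dispatch_uid="blog_post_vectordb_save",
)
# Remove a post from the vector database when it is deleted
post_delete.connect(
    sync_vectordb_on_delete,
    sender=Post,
    dispatch_uid="blog_post_vectordb_delete",
)
```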
Then import the signals in your blog/apps.py so that they are registered when the app loads.
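A minimal sketch of blog/apps.py using Django's standard AppConfig.ready() hook. The BlogConfig class is what startapp generates in recent Django versions; only the ready() method is added here:

```python
# blog/apps.py
from django.apps import AppConfig


class BlogConfig(AppConfig):
    default_auto_field = "django.db.models.BigAutoField"
    name = "blog"

    def ready(self):
        # Importing the module is enough to register the signal handlers
        from . import signals  # noqa: F401
```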
Now we can try it out. Open a Django shell with python manage.py shell and create a new post:
from blog.models import Post
from django.contrib.auth import get_user_model
User = get_user_model()
user = User.objects.first()
post = Post(title="A Culinary Journey", description="A journey through France", user=user)
post.save()
Now run the app. When you visit the Django Vector Database admin panel at http://127.0.0.1:8000/admin/vectordb/vector/, you'll notice that the new post has been automatically added to the vector database. Similarly, if you delete a post, it will be automatically removed from the vector database as well. All of this comes built in with vectordb.
Fast Vector Search
To perform a search, simply invoke the vectordb.search() method:
from vectordb import vectordb
results = vectordb.search("A Culinary Journey", k=10) # k represents the maximum number of results desired.
# get the search time in seconds
print(results.search_time) # only available if search is the last method called
Note that the search method returns a QuerySet with results ordered by best match. The QuerySet also carries a search_time attribute, in seconds, which is only available when search is the last method called on the QuerySet. Each result item contains the following fields: id, content_object, object_id, content_type, text, embedding, and an annotated score, plus a vector property that returns the np.ndarray representation of the embedding field, which is stored as bytes. Because search returns a QuerySet, you can select only the fields you want, like this:
from vectordb import vectordb
results = vectordb.search("A Culinary Journey", k=10).only('text', 'content_object')
If k is not specified, the default value is 10.
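To make the relationship between the bytes-valued embedding field and the vector property more concrete, here is a standard-library sketch of packing floats into bytes and unpacking them again. It assumes 32-bit floats in native byte order, which is an assumption about the storage format rather than something stated above, and encode_embedding and decode_embedding are hypothetical helper names, not part of vectordb:

```python
from array import array


def encode_embedding(values):
    """Pack floats into raw bytes as 32-bit floats (assumed storage format)."""
    return array("f", values).tobytes()


def decode_embedding(raw):
    """Unpack raw bytes back into a list of Python floats."""
    floats = array("f")
    floats.frombytes(raw)
    return floats.tolist()


raw = encode_embedding([0.5, -1.25, 2.0])
print(len(raw))  # total byte length: 4 bytes per 32-bit float
print(decode_embedding(raw))
```

In practice you would read the vector property on a result item instead of decoding the bytes yourself.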
Search doesn't only work for text; you can also search by model instance:
post1 = Post.objects.get(id=1)
# Find the items most similar to post1
results = vectordb.search(post1, k=10)
This is also a way to get posts related to post1, so you can use vectordb for recommendations as well.
Note: Searching by model instance automatically scopes the results to instances of that type. For example, if you search by post1, you will only get results that are instances of Post.
Filtering
You can apply filters on text or metadata using the full capabilities of Django QuerySet filtering:
# Limit the search scope to a user with an id of 1
results = vectordb.filter(metadata__user_id=1).search("A Culinary Journey", k=10)
# Scope the results to text which contains France, belonging to user with id 1 and created in 2023
vectordb.filter(text__icontains="France", metadata__user_id=1, metadata__created_at_year=2023).search("A Culinary Journey", k=10)
We can also use model instances instead of text:
post1 = Post.objects.get(id=1)
# Limit the search scope to a user with an id of 1
results = vectordb.filter(metadata__user_id=1).search(post1, k=10)
# Scope the results to text which contains France, belonging to user with id 1 and created in 2023
vectordb.filter(text__icontains="France", metadata__user_id=1, metadata__created_at_year=2023).search(post1, k=10)
For more information on filtering, refer to the Django documentation on querying JSONField.
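Because the filters are ordinary keyword arguments, they can also be assembled dynamically. The sketch below builds such a dictionary from optional criteria using Django's JSONField key syntax (metadata__key, optionally followed by a lookup such as __gte). The helper build_metadata_filters is hypothetical and only illustrates the naming pattern:

```python
def build_metadata_filters(user_id=None, year=None, min_month=None):
    """Build keyword arguments for a metadata filter from optional criteria.

    Keys follow Django's JSONField lookup syntax: metadata__<key>[__<lookup>].
    """
    filters = {}
    if user_id is not None:
        filters["metadata__user_id"] = user_id
    if year is not None:
        filters["metadata__created_at_year"] = year
    if min_month is not None:
        # __gte is a standard Django lookup applied to the JSON key
        filters["metadata__created_at_month__gte"] = min_month
    return filters


filters = build_metadata_filters(user_id=1, year=2023, min_month=6)
print(filters)
```

The resulting dictionary can then be unpacked into the call, for example vectordb.filter(**filters).search("A Culinary Journey", k=10).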
Manually Adding Items to the Vector Database
VectorDB also provides a way to add items manually. This is useful for content that has not been indexed yet.
VectorDB provides two utility methods for adding items to the database: vectordb.add_instance and vectordb.add_text. Note that to add an instance, its model must provide the get_vectordb_text method and, optionally, the get_vectordb_metadata method.
1. Adding Model Instances
post1 = Post.objects.create(title="post1", description="post1 description", user=user1) # provide a valid user
# add to vector database
vectordb.add_instance(post1)
2. Adding Text to the Database
To add text to the database, use vectordb.add_text(). The text and id are required, and the id must be unique or an error will occur. metadata can be None or any valid JSON.
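A minimal sketch of such a call, assuming add_text accepts text, id, and metadata as keyword arguments, as the description suggests. The values are illustrative only:

```python
from vectordb import vectordb

# id must not collide with an existing entry; metadata may also be None
vectordb.add_text(text="Hello World", id=3, metadata={"user_id": 1})
```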
Settings
You can customize vectordb by providing your settings in the settings.py file of your project. The following settings are available:
# settings.py
DJANGO_VECTOR_DB = {
    "DEFAULT_EMBEDDING_CLASS": ...,  # Default: "vectordb.embedding_functions.SentenceTransformerEncoder"
    "DEFAULT_EMBEDDING_MODEL": ...,  # Default: "all-MiniLM-L6-v2"
    # Can be "cosine" or "l2"
    "DEFAULT_EMBEDDING_SPACE": ...,  # Default: "l2"
    "DEFAULT_EMBEDDING_DIMENSION": ...,  # Default: 384 for "all-MiniLM-L6-v2"
    "DEFAULT_MAX_N_RESULTS": 10,  # Maximum number of results to return from search. Default: 10
    "DEFAULT_MIN_SCORE": 0.0,  # Minimum score to return from search. Default: 0.0
    "DEFAULT_MAX_BRUTEFORCE_N": 10_000,  # Maximum number of items to search using brute force. Default: 10_000. Above this number, the search uses the HNSW index.
}
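To illustrate what the embedding space and brute-force settings refer to, here is a plain-Python sketch of exhaustive nearest-neighbour search under the l2 and cosine metrics. It is not vectordb's implementation, and the function names are hypothetical. It only shows that the two metrics can rank the same items differently, and that brute force compares the query against every stored vector, which becomes expensive as the collection grows:

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 - cosine similarity; close to 0 when the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def brute_force_search(query, vectors, k=10, space="l2"):
    """Compare the query with every stored vector and return the k closest ids."""
    distance = l2_distance if space == "l2" else cosine_distance
    ranked = sorted(vectors, key=lambda item_id: distance(query, vectors[item_id]))
    return ranked[:k]


vectors = {"a": [10.0, 2.0], "b": [1.0, 1.0], "c": [0.0, 1.0]}
query = [1.0, 0.2]
print(brute_force_search(query, vectors, k=2, space="l2"))
print(brute_force_search(query, vectors, k=2, space="cosine"))
```

Here "a" points in the same direction as the query but is far away, so it ranks first under cosine and last under l2.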
Summary
Great! That was a quickstart to django-vectordb. We've created a blogging site and added extremely fast vector search to it. If you want a more in-depth understanding of vectordb, head on over to the tutorial.