A real integration story from the database schema to the first query running in production

The Problem Nobody Documents Well
Our support ticket system had a search bar. It matched keywords. If a user typed “login issues,” they got results containing the exact phrase “login issues.” If our agents had labelled those tickets as “authentication failure” or “credential reset request,” the search returned nothing.
The user assumed no one had ever solved their problem. The agent answered the same question again. Both walked away slightly more frustrated than before.
We were not the only team dealing with this. Exact-match search is fast, predictable and wrong roughly half the time in any system where humans describe the same thing in different words.
I had been watching the pgvector ecosystem for a while. When pgvector 0.7.0 shipped with production-ready HNSW indexing, I decided it was time to stop watching and start integrating.
This is what the actual integration looked like, the decisions that were not obvious from reading the docs and the mistakes I made before the thing worked correctly in production.
What pgvector Actually Does
pgvector is a PostgreSQL extension. It adds a vector data type and a set of operators for computing vector similarity. Instead of comparing strings character by character, you compare numerical representations of meaning.
Here is the core idea. You take a piece of text and convert it into a list of floating-point numbers, a vector, using an embedding model. Similar pieces of text produce vectors that are numerically close to each other. pgvector stores those vectors in your existing Postgres table and gives you SQL operators to search by proximity rather than by exact match.
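To make "numerically close" concrete, here is a toy sketch (not from this codebase) of the cosine-distance arithmetic that pgvector's <=> operator performs, using made-up three-dimensional "embeddings" in place of real 1536-dimensional ones:

```php
<?php
// Toy illustration: cosine distance between small vectors. Real embeddings
// have hundreds or thousands of dimensions, but the arithmetic is the same.

function cosineDistance(array $a, array $b): float
{
    $dot = 0.0;
    $normA = 0.0;
    $normB = 0.0;

    foreach ($a as $i => $v) {
        $dot   += $v * $b[$i];
        $normA += $v * $v;
        $normB += $b[$i] * $b[$i];
    }

    return 1.0 - $dot / (sqrt($normA) * sqrt($normB));
}

$loginIssue  = [0.9, 0.1, 0.2];    // pretend embedding of "login issues"
$authFailure = [0.85, 0.15, 0.25]; // pretend embedding of "authentication failure"
$shipping    = [0.1, 0.9, 0.4];    // pretend embedding of "shipping delay"

// Related phrases land close together; unrelated ones do not.
var_dump(cosineDistance($loginIssue, $authFailure) < cosineDistance($loginIssue, $shipping)); // bool(true)
```

Distance 0 means identical direction; distance near 1 means unrelated. An embedding model's job is to make the vectors for "login issues" and "authentication failure" land close together even though the strings share no words.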
As of version 0.8.x, pgvector supports two index types:
- IVFFlat: Partitions the vector space into clusters. Fast to build and light on memory, but slightly less accurate at retrieval, and because cluster centroids are fixed at build time, accuracy degrades as the table changes. Best when you can build the index after most of the data is already loaded.
- HNSW (Hierarchical Navigable Small World): A graph-based structure. Slower to build and more memory-hungry, but significantly more accurate for approximate nearest neighbor queries. This is what you want in production.
Version 0.8.0 introduced iterative index scans, which prevent a specific failure mode: filtered queries with strict WHERE clauses could return fewer results than requested because the index scan stopped before checking enough candidates. This matters as soon as you combine semantic search with tenant filtering or status filters, which is almost every real use case.
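Iterative scanning is opt-in. These session settings are what 0.8.0 exposes; the values shown are pgvector's documented defaults and modes, not tuned recommendations:

```sql
-- Let HNSW scans resume and fetch more candidates until enough rows
-- survive the WHERE clause. relaxed_order tolerates slightly out-of-order
-- results in exchange for better recall; strict_order preserves ordering.
SET hnsw.iterative_scan = relaxed_order;

-- Cap on how many tuples an iterative scan may visit before giving up.
SET hnsw.max_scan_tuples = 20000;
```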
The Schema Change
Our tickets table looked like this before:
CREATE TABLE tickets (
    id BIGSERIAL PRIMARY KEY,
    tenant_id UUID NOT NULL,
    subject TEXT NOT NULL,
    body TEXT,
    status VARCHAR(50),
    created_at TIMESTAMPTZ DEFAULT NOW()
);
Adding vector support is one migration:
-- Enable the extension (runs once per database)
CREATE EXTENSION IF NOT EXISTS vector;
-- Add the embedding column
ALTER TABLE tickets
    ADD COLUMN subject_embedding vector(1536);
The 1536 dimension count matches the output size of OpenAI's text-embedding-3-small model. If you use a different embedding model, change this number to match. Mismatched dimensions cause an error on insert, so catch this in your migration review.
After the column exists, add the HNSW index:
CREATE INDEX tickets_subject_embedding_hnsw
    ON tickets
    USING hnsw (subject_embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
m = 16 controls the number of connections per node in the graph. ef_construction = 64 controls index build quality. Higher values produce a more accurate index and take longer to build. For a table with under 500,000 rows, m = 16 and ef_construction = 64 are a reasonable starting point.
Generating and Storing Embeddings
We call an embedding API when a ticket is created or its subject changes. Here is the Laravel code:
<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class EmbeddingService
{
    public function embed(string $text): array
    {
        $response = Http::withToken(config('services.openai.key'))
            ->post('https://api.openai.com/v1/embeddings', [
                'model' => 'text-embedding-3-small',
                'input' => $text,
            ]);

        if ($response->failed()) {
            throw new \RuntimeException('Embedding API request failed: ' . $response->status());
        }

        return $response->json('data.0.embedding');
    }
}
In the Ticket model observer:
<?php

namespace App\Observers;

use App\Models\Ticket;
use App\Services\EmbeddingService;

class TicketObserver
{
    public function __construct(private EmbeddingService $embedder) {}

    public function created(Ticket $ticket): void
    {
        $this->updateEmbedding($ticket);
    }

    public function updated(Ticket $ticket): void
    {
        if ($ticket->wasChanged('subject')) {
            $this->updateEmbedding($ticket);
        }
    }

    private function updateEmbedding(Ticket $ticket): void
    {
        $embedding = $this->embedder->embed($ticket->subject);

        // Cast the array to the PostgreSQL vector literal format
        $vector = '[' . implode(',', $embedding) . ']';

        $ticket->updateQuietly(['subject_embedding' => $vector]);
    }
}
updateQuietly prevents the observer from firing again on its own update.
Querying by Similarity
The search query uses a cosine distance operator (<=>) provided by pgvector:
<?php

namespace App\Http\Controllers;

use App\Models\Ticket;
use App\Services\EmbeddingService;
use Illuminate\Http\Request;

class TicketSearchController extends Controller
{
    public function __construct(private EmbeddingService $embedder) {}

    public function search(Request $request)
    {
        $query = $request->input('q');
        $tenantId = $request->user()->tenant_id;

        $embedding = $this->embedder->embed($query);
        $vector = '[' . implode(',', $embedding) . ']';

        $tickets = Ticket::query()
            ->where('tenant_id', $tenantId)
            ->whereNotNull('subject_embedding')
            ->orderByRaw('subject_embedding <=> ?::vector', [$vector])
            ->limit(20)
            ->get();

        return response()->json($tickets);
    }
}
The <=> operator returns the cosine distance between two vectors. Lower distance means higher similarity. Ordering ascending by distance gives you the most similar results first.
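When debugging relevance, it helps to see the distances themselves. This is a raw-SQL equivalent of the controller query above, with the vector literal and tenant ID abbreviated as placeholders:

```sql
SELECT id, subject,
       subject_embedding <=> '[0.012, -0.034, ...]'::vector AS distance
FROM tickets
WHERE tenant_id = '...'
  AND subject_embedding IS NOT NULL
ORDER BY distance
LIMIT 20;
```

If the top result has a distance of 0.6 rather than 0.1, the query and your corpus are not as semantically close as you think, and no amount of index tuning will fix that.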
The Production Mistakes
Mistake 1: Not backfilling before going live.
We turned on the observer and deployed. Existing tickets had no embeddings. The first users to search got results only from newly created tickets. It looked broken.
Backfill before you deploy the observer. Run this as a queued job in batches:
Ticket::whereNull('subject_embedding')
    ->chunkById(100, function ($tickets) use ($embedder) {
        foreach ($tickets as $ticket) {
            $embedding = $embedder->embed($ticket->subject);
            $vector = '[' . implode(',', $embedding) . ']';
            $ticket->updateQuietly(['subject_embedding' => $vector]);
        }
    });
Mistake 2: Building the HNSW index before backfilling.
Building the index on a mostly-empty column and then inserting hundreds of thousands of rows rebuilds graph nodes incrementally, which is slower than building the index once after the data is in place. Drop the index, backfill, then create the index.
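The order of operations we landed on, sketched in SQL. The maintenance_work_mem value is a starting point for our data size, not a recommendation:

```sql
-- 1. Drop the index so the backfill inserts into a plain column.
DROP INDEX IF EXISTS tickets_subject_embedding_hnsw;

-- 2. Run the chunked backfill job.

-- 3. Rebuild once, with extra memory (HNSW construction is memory-bound).
--    CONCURRENTLY avoids blocking writes; it must run outside a transaction.
SET maintenance_work_mem = '2GB';

CREATE INDEX CONCURRENTLY tickets_subject_embedding_hnsw
    ON tickets
    USING hnsw (subject_embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 64);
```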
Mistake 3: Forgetting about the embedding API rate limit.
The backfill job hit the OpenAI rate limit on the second batch. Add a sleep between batches or use a dedicated queue with a rate limiter:
use Illuminate\Support\Facades\RateLimiter;

$executed = RateLimiter::attempt(
    'embedding-api',
    100, // 100 requests
    function () use ($ticket, $embedder) {
        // embed and save
    },
    60 // per 60 seconds
);

if (! $executed) {
    // Over the limit: release the job back onto the queue with a delay
    // instead of silently skipping the ticket.
}
Mistake 4: Not setting ef_search for filtered queries.
When you combine semantic search with a WHERE clause (like filtering by tenant_id and status), the HNSW scan may not explore enough of the graph before the filter eliminates candidates. Setting the session variable increases scan depth:
SET hnsw.ef_search = 100;
You can set this with DB::statement('SET hnsw.ef_search = 100') before running the search. Note that the setting applies to the whole session, not a single query, so issue it on the same connection that runs the search. The default is 40. We found 100 gave better recall without a noticeable latency impact for our dataset size.
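One way to keep the setting from leaking to other queries on a pooled connection is SET LOCAL inside a transaction. A sketch, assuming the default database connection:

```php
use Illuminate\Support\Facades\DB;

// SET LOCAL is scoped to the enclosing transaction, so ef_search reverts
// automatically when the transaction ends.
$tickets = DB::transaction(function () use ($tenantId, $vector) {
    DB::statement('SET LOCAL hnsw.ef_search = 100');

    return Ticket::query()
        ->where('tenant_id', $tenantId)
        ->whereNotNull('subject_embedding')
        ->orderByRaw('subject_embedding <=> ?::vector', [$vector])
        ->limit(20)
        ->get();
});
```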
Choosing the Right Embedding Model
The embedding model you pick determines your vector dimensions, your cost per query and how well semantic similarity actually matches domain-specific language.
text-embedding-3-small (1,536 dimensions) is a reasonable default. It is faster and cheaper than text-embedding-3-large (3,072 dimensions) and performs well on general language tasks. For most support ticket or document search scenarios, the smaller model is the right starting point.
If your content is highly domain-specific (medical records, legal documents, code) you may get better results from a model trained on that domain. Sentence Transformers’ all-MiniLM-L6-v2 (384 dimensions) is widely used for general-purpose tasks and can be self-hosted with no API cost. The trade-off is that smaller dimension counts can miss nuance in longer documents.
A practical test: embed the same ten queries with two candidate models, run them against a labeled dataset of relevant documents and compare the top-5 recall. That test takes an afternoon and will tell you more than any benchmark chart.
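The scoring half of that test is a few lines of plain PHP. This sketch assumes you already have, per query, the IDs a human marked relevant and the top-5 IDs each candidate model retrieved; all names and data here are illustrative:

```php
<?php
// Mean recall@5 over a labeled query set for one candidate model.

function recallAt5(array $relevant, array $retrievedTop5): float
{
    $hits = count(array_intersect($relevant, $retrievedTop5));
    return $hits / max(count($relevant), 1);
}

$labeled = [
    'login issues'   => ['relevant' => [101, 102, 103]],
    'refund request' => ['relevant' => [201, 202]],
];

// Top-5 result IDs per query from one candidate model (illustrative data).
$modelAResults = [
    'login issues'   => [101, 103, 555, 102, 777],
    'refund request' => [201, 888, 999, 444, 333],
];

$scores = [];
foreach ($labeled as $query => $truth) {
    $scores[] = recallAt5($truth['relevant'], $modelAResults[$query]);
}

echo round(array_sum($scores) / count($scores), 2), PHP_EOL; // 0.75
```

Run the same loop for each candidate model against the same labeled set and compare the means. Whichever model wins on your own tickets wins, regardless of published benchmarks.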
One thing to plan for: once you commit to a model and dimension size, your column schema is locked to that number. Changing embedding models later means re-embedding every row and rebuilding the index. Do the comparison before you ship, not after you have 500,000 rows.
Is It Worth the Complexity?
For a simple tag-based search on a well-structured dataset, probably not. Exact match is faster and cheaper.
For anything where users describe the same concept in different words, semantic search removes a real friction point. Our first user test showed a 40% reduction in “no results found” events after the rollout. That was enough to keep it.
The ongoing cost is the embedding API call on each ticket creation and each search query. text-embedding-3-small is priced at $0.02 per million tokens; for most B2B SaaS applications with a few hundred tickets per day, that is negligible.
If you need to run fully on-premises, you can self-host an embedding model (like nomic-embed-text or any sentence-transformers model) and skip the API call entirely. The pgvector integration is identical. Only the embedding generation changes.
What I Would Do Differently
Store the model name alongside the embedding column. When you upgrade your embedding model, old and new vectors are not comparable. A column like subject_embedding_model VARCHAR(100) makes it easy to identify which rows need regeneration.
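A minimal version of that schema addition, plus the query a regeneration job would use. The model name shown is the one this article uses; substitute your own:

```sql
-- Track which model produced each vector, so a future model upgrade
-- can find and re-embed stale rows instead of guessing.
ALTER TABLE tickets
    ADD COLUMN subject_embedding_model VARCHAR(100);

-- The regeneration job targets only rows embedded by a different model
-- (IS DISTINCT FROM also catches NULLs, unlike <>).
SELECT id FROM tickets
WHERE subject_embedding_model IS DISTINCT FROM 'text-embedding-3-small';
```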
Add a background validity check. If the embedding API is down during ticket creation, the row gets saved with a null embedding and silently falls out of semantic search results. A daily job that finds null embeddings and queues them for retry catches this before it becomes a support complaint.
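In Laravel, that check can live in the scheduler. A sketch; RegenerateEmbedding is a hypothetical queued job that calls the embedding service and saves the result:

```php
// routes/console.php — find rows the observer missed and queue a retry.
use App\Jobs\RegenerateEmbedding; // hypothetical job, not from this article
use App\Models\Ticket;
use Illuminate\Support\Facades\Schedule;

Schedule::call(function () {
    Ticket::whereNull('subject_embedding')
        ->chunkById(100, function ($tickets) {
            foreach ($tickets as $ticket) {
                RegenerateEmbedding::dispatch($ticket);
            }
        });
})->daily();
```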
Use a hybrid search approach once you have enough traffic data. pgvector handles the semantic side. Postgres full-text search handles the keyword side. Combining both scores with a weighted sum gives better results than either alone, especially for short queries where exact matches still matter.
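A sketch of what that combined query can look like. The 0.3/0.7 weights and the placeholder literals are illustrative starting points, and production systems often use reciprocal rank fusion rather than a raw weighted sum:

```sql
-- Blend a keyword score (full-text rank) with a semantic score
-- (1 - cosine distance), both higher-is-better.
SELECT id, subject,
       0.3 * ts_rank(to_tsvector('english', subject),
                     plainto_tsquery('english', 'login issues'))
     + 0.7 * (1 - (subject_embedding <=> '[...]'::vector)) AS score
FROM tickets
WHERE tenant_id = '...'
ORDER BY score DESC
LIMIT 20;
```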
Semantic search is no longer a research project or a feature you need a dedicated vector database for. If you already run PostgreSQL, pgvector gives you most of what you need with no new infrastructure. The real work is in the integration details, and those are what the tutorials tend to skip.