Full-Text Search with Astro DB

6 Min Read

000000

Astro DB is a SQL database service created specifically for Astro ( v4.5 or later). One of its goals is to provide an API for directly interacting with a SQL database inside an Astro project. This opens up serveral use-cases which include full-text search (FTS).

Internally, Astro DB partnered with Turso as their DB hosting provider. The underlying SQL language used by Turso is a fork of SQLite called: libSQL . For the context of this guide, this basically means that we can leverage existing SQLite features including FTS5 . This feature allows us to perform full-text search queries and is enabled by default through Turso’s Extensions .

Lets start by creating a Virtual Table using the FTS5 virtual table module. We can place this logic within db/seed.ts so it is run automatically when Astro is started locally:

  1. Prepare the data we want to INSERT into the Virtual Table. The following example will be working with a collection of blog posts.

    db/seed.ts
       import { getCollection } from 'astro:content';
      import { sql, db } from 'astro:db';
      import { fromMarkdown } from 'mdast-util-from-markdown';
      import { toString } from 'mdast-util-to-string';
     
      export default async function () {
        const posts = (await getCollection('posts')).map((post) => {
          const content = toString(fromMarkdown(post.body), { includeHtml: false }).replaceAll('\n', ' ');
          return sql`(${post.slug}, ${post.data.title}, ${content})`;
        });
      } 
  2. Create the Virtual Table using the FTS5 virtual table module and insert the data we prepared in the previous step.

    We DROP the table initially to ensure a clean slate and prevent duplicate entries during local development. Columns slug and title are unindexed as we only care to preform full-text searches against content .

    db/seed.ts
       // . . .
      export default async function () {
        // . . .
        await db.batch([
          db.run(sql`drop table if exists PostSearch`),
          db.run(sql`create virtual table PostSearch using fts5(slug unindexed, title unindexed, content)`),
          db.run(sql.join([sql`insert into PostSearch(slug, title, content) values `, sql.join(posts, sql`,`)])),
        ]);
      } 

When working with Astro DB, table schemas are defined within db/config.ts and will automatically generate TypeScript types for each table. Unfortunately, this does not apply to Virtual Tables. For that, we must manually define schemas using Zod . We can set this up within db/config.ts to keep our schema definitions centralized:

db/config.ts
 import { z } from 'astro/zod';
 
export type PostSearchTable = z.infer<typeof PostSearchTableSchema>;
export type PostSearchQuery = z.infer<typeof PostSeachQuerySchema>;
export const PostSearchTableSchema = z.object({ slug: z.string(), title: z.string(), content: z.string() });
export const PostSeachQuerySchema = z.object({ rows: z.array(PostSearchTableSchema) }).transform((v) => v.rows);
// . . . 

Now that we have our FTS5 Virtual Table setup with proper type-safety, we can create a page that will allow us to perform full-text searches.

This example will be using SSR to handle form submissions and to re-render the page with search results. This can also be achieved using Hybrid Mode in combination with Page Partials . The attached repository below explores the latter approach.

We can create a simple search form with submission validation via Zod. Within the Zod schema, we wrap the raw query with double quotes to allow special characters. Without this, SQLite will throw an error when trying to perform a full-text search. Also, note the * character at the end of the schema which allows for partial or fuzzy matching.

pages/index.astro
 ---
import { z } from 'astro/zod';
import { db, sql } from 'astro:db';
 
const query = Astro.url.searchParams.get('query');
const querySchema = z.string().trim().min(1).transform((v) => `"${v.replaceAll('"', '""')}"*`);
---
 
<form>
  <input name="query" type="text" spellcheck="false" autocomplete="off" value={query} />
  <button type="submit">Search</button>
</form> 

With the search form in place, we can now perform a full-text search against the PostSearch FTS5 Virtual Table. We check the validity of the query string and if it passes, we can proceed with the full-text search using the MATCH operator:

pages/index.astro
 ---
// . . .
import { PostSeachQuerySchema, type PostSearchQuery } from 'db/config.ts';
 
const results: PostSearchQuery = [];
 
if (query) {
  const safeQuery = querySchema.safeParse(query);
 
  if (safeParse.success) {
    const payload = await PostSeachQuerySchema.safeParseAsync(
      await db.run(
        sql`
          select slug, title, highlight(PostSearch, 2, '<mark>', '</mark>') as content
          from PostSearch
          where content match ${safeQuery.data}
          order by rank
        `,
      ),
    );
 
    if (payload.success) results.push(...payload.data);
  }
}
---
 
<!-- . . .  -->
<ul>{results.map((post) => (<li><a href={`/posts/${post.slug}`} aria-label={post.title}>{post.content}</a></li>))}</ul> 

To this point, we have setup full-text search in our Astro project for local development. We need to account for pushing this FTS5 Virtual Table to production and populating it with data:

  1. Similar to the non-production seed file, we need to prepare the data we want to INSERT or UPDATE into the Virtual Table. We can place this logic within db/seed.prod.ts :

    db/seed.prod.ts
       import { sql, db } from 'astro:db';
      import { fromMarkdown } from 'mdast-util-from-markdown';
      import { toString } from 'mdast-util-to-string';
      import { readFile, readdir } from 'node:fs/promises';
     
      export default async function () {
        const posts = await Promise.all(
          (await readdir('src/content/posts')).map(async (file) => {
            const post = await readFile(`src/content/posts/${file}`, 'utf-8');
            const slug = file.replace(/\.mdx$/, '');
            const title = post.match(/^title:\s*(['"])?(.*?)\1?$/m)?.[2]?.replace(/^['"]|['"]$/g, '').trim();
            const content = toString(fromMarkdown(post.replace(/---[\s\S]*?---/, '')), { includeHtml: false }).replaceAll('\n', ' ');
     
            return sql`(${slug}, ${title}, ${content})`;
          }),
        );
      } 
  2. Compose a batch of SQL queries to do the following:

    • Create the Virtual Table if it does not exist.
    • UPDATE existing rows if their content has changed.
    • INSERT new rows if they do not exist.
    db/seed.prod.ts
       // . . .
      export default async function () {
        // . . .
        await db.batch([
          db.run(sql`create virtual table if not exists PostSearch using fts5(slug unindexed, title unindexed, content)`),
          db.run(sql`
            with target(slug, title, content) as (values ${sql.join(posts, sql`, `)})
            update PostSearch
            set content = target.content
            from target
            where target.slug = PostSearch.slug
            and target.content != PostSearch.content;
          `),
          db.run(sql`
            with target(slug, title, content) as (values ${sql.join(posts, sql`, `)})
            insert into PostSearch (slug, content)
            select slug, title, content from target
            where not exists (select 1 from PostSearch where slug = target.slug);
          `),
        ]);
      } 
  3. Finally, we need a way of executing the db/seed.prod.ts file. This can be done manually by running: npx astro db execute db/seed.prod.ts --remote . However, it is recommended to automate this process by baking it into a CI/CD pipeline.

    Here is an example using the _studio.yml action that Astro automatically generates when connecting a project to Astro DB:

    .github/workflows/_studio.yml
     # . . .
    - if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/main' }}
      run: npx astro db execute db/seed.prod.ts --remote 

By no means is this the only way to implement full-text search within an Astro project. This guide is meant to provide a starting point for those looking to leverage Astro DB and SQLite’s FTS5 virtual table module. Here are a few additional libraries to consider which accomplish the same goal:

  • Pagefind : Fully static search library that aims to perform well on large sites.
  • Fuse.js : Powerful, lightweight fuzzy-search library, with zero dependencies.
  • Orama : Full-text search engine entirely written in TypeScript, with zero dependencies.
  • DocSearch : Algolia Search for developer docs.

full-text-search-with-astro-db

Leverage Astro DB and SQLite's FTS5 to perform full-text search queries within an Astro project.

Astro
0
0

Introduction