OceanBase Releases seekdb: An Open-Source, AI-Native Hybrid Search Database for Multi-Model RAG and AI Agents

Artificial intelligence applications rarely work with a single, simple dataset. They combine user profiles, conversation histories, JSON metadata, vector embeddings, and sometimes spatial data. Development teams have traditionally handled this by stitching together an OLTP database, a vector store, and a search engine. seekdb, an open-source, AI-native database licensed under Apache 2.0, takes a different approach: it stores relational tables, vectors, text, JSON, and geographic information system (GIS) data in a single engine, enabling hybrid search and AI workflows that run inside the database.

Introducing seekdb: An AI-Optimized Embedded Database

seekdb is a lightweight, embedded variant of the OceanBase database engine, tailored to AI applications rather than large distributed deployments. It runs as a single-node database in either embedded or client-server mode and is compatible with MySQL drivers and SQL syntax, which eases integration with existing tooling.

In terms of deployment capabilities, seekdb offers:

  • Support for embedded database use cases
  • Functionality as a standalone database
  • No support for distributed deployments (these are reserved for the full OceanBase database)

From a data modeling standpoint, seekdb accommodates:

  • Relational data managed through standard SQL queries
  • Vector-based similarity search
  • Comprehensive full-text search
  • Flexible JSON data handling
  • Spatial GIS data processing

All these data types coexist within a unified storage and indexing framework, simplifying data management for AI applications.

Hybrid Search: The Heart of seekdb’s Innovation

seekdb's standout feature is hybrid search: a single query can combine vector-based semantic retrieval, keyword-based full-text search, and scalar filtering, with one unified ranking of the results. This removes the need to query several engines and merge their results in a separate orchestration layer.

seekdb implements hybrid search through the DBMS_HYBRID_SEARCH package, offering two primary interfaces:

  • DBMS_HYBRID_SEARCH.SEARCH: Returns search results as JSON, ranked by relevance
  • DBMS_HYBRID_SEARCH.GET_SQL: Returns the exact SQL statement that the search executes

This hybrid search mechanism supports:

  • Pure vector similarity searches
  • Pure full-text keyword searches
  • Combined hybrid queries blending both approaches

It also pushes relational filters and joins down to the storage layer for efficiency, and it supports several ways to rank the combined results, including weighted scoring, reciprocal rank fusion (RRF), and rerankers based on large language models (LLMs).
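To make the two fusion strategies concrete, here is a minimal Python sketch of reciprocal rank fusion and weighted score fusion over one vector result list and one full-text result list. It illustrates the general techniques only and is not seekdb's implementation; the function names, the min-max normalization step, the default weights, and the RRF constant k = 60 (a common default in the literature) are all assumptions.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each list adds 1 / (k + rank) to a document's score (rank starts at 1)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale raw scores to [0, 1] so that cosine and BM25 scores become comparable."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def weighted_fusion(vector_scores: dict[str, float], text_scores: dict[str, float],
                    w_vector: float = 0.7, w_text: float = 0.3) -> list[tuple[str, float]]:
    """Weighted sum of normalized scores; a document absent from one list gets 0 for that signal."""
    v, t = min_max(vector_scores), min_max(text_scores)
    fused = {d: w_vector * v.get(d, 0.0) + w_text * t.get(d, 0.0) for d in v.keys() | t.keys()}
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)


# Example: one ranking from a vector index, one from a full-text index.
vector_ranking = ["doc_a", "doc_b", "doc_c"]
text_ranking = ["doc_c", "doc_a", "doc_d"]
print(rrf([vector_ranking, text_ranking]))
print(weighted_fusion({"doc_a": 0.92, "doc_b": 0.80, "doc_c": 0.75},
                      {"doc_c": 12.1, "doc_a": 8.4, "doc_d": 3.0}))
```

RRF looks only at ranks, so it needs no score normalization; weighted fusion uses score magnitudes but needs them on a common scale, which is why this sketch normalizes each list first.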

For retrieval-augmented generation (RAG) and agent memory, this means a developer can write a single SQL statement that combines semantic matching on embeddings, exact keyword matching (for example, product codes or named entities), and relational constraints (for example, user or tenant scope).
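The following pure-Python sketch shows what such a statement computes logically: a tenant filter, an exact keyword match, and semantic ranking applied to the same rows. It does not use seekdb; the `Doc` fields, the whitespace tokenization, and the brute-force scan are illustrative assumptions, whereas a database engine would use indexes and push the filters down rather than scanning a list.

```python
import math
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    tenant_id: str  # relational column used for scoping
    text: str       # text column searched for exact keywords
    embedding: list[float]


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_query(docs: list[Doc], query_vec: list[float], required_term: str,
                 tenant_id: str, top_k: int = 3) -> list[str]:
    # Relational constraint: only this tenant's rows are candidates.
    scoped = [d for d in docs if d.tenant_id == tenant_id]
    # Exact keyword constraint on whitespace tokens, e.g. a product code that embeddings may blur.
    matched = [d for d in scoped if required_term in d.text.split()]
    # Semantic ranking of the surviving rows by cosine similarity.
    matched.sort(key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return [d.doc_id for d in matched[:top_k]]


docs = [
    Doc("d1", "t1", "Reset steps for SKU-1042 router", [0.9, 0.1]),
    Doc("d2", "t1", "Warranty terms for SKU-1042", [0.2, 0.8]),
    Doc("d3", "t1", "Reset steps for SKU-2077 modem", [0.95, 0.05]),
    Doc("d4", "t2", "Reset steps for SKU-1042 router", [0.9, 0.1]),
]
# d3 lacks the product code and d4 belongs to another tenant, so only d1 and d2 remain.
print(hybrid_query(docs, [1.0, 0.0], "SKU-1042", "t1"))  # ['d1', 'd2']
```

Note that d3 is semantically the closest row to the query vector, yet the exact keyword constraint correctly excludes it.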

Robust Vector and Full-Text Search Engines

At its core, seekdb offers a sophisticated vector and full-text search infrastructure.

Vector search capabilities include:

  • Support for both dense and sparse vector formats
  • Multiple distance metrics such as Manhattan, Euclidean, inner product, and cosine similarity
  • In-memory index types: HNSW and its quantized variants HNSW SQ (scalar quantization) and HNSW BQ (binary quantization)
  • Disk-based index types: IVF and IVF PQ (IVF with product quantization)
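For reference, the four distance metrics listed above have standard textbook definitions, sketched here in plain Python, together with an inner product for sparse vectors stored as `{dimension: weight}` maps. This is an illustration of the math, not seekdb code; whether seekdb reports a given metric as a similarity or as a distance (for example, 1 minus cosine) is not covered here, and the sparse representation is an assumption.

```python
import math


def manhattan(a: list[float], b: list[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a: list[float], b: list[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(inner_product(a, a)) * math.sqrt(inner_product(b, b))
    return inner_product(a, b) / norm if norm else 0.0


def sparse_inner_product(a: dict[int, float], b: dict[int, float]) -> float:
    """Sparse vectors as {dimension: weight}; only dimensions present in both contribute."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b[i] for i, w in a.items() if i in b)


a, b = [1.0, 2.0, 2.0], [2.0, 0.0, 1.0]
print(manhattan(a, b), euclidean(a, b), inner_product(a, b), cosine_similarity(a, b))
print(sparse_inner_product({3: 0.5, 17: 1.2, 40: 0.3}, {17: 2.0, 40: 1.0, 99: 0.7}))
```

Sparse vectors, such as those produced by learned term-weighting models, usually have few nonzero dimensions, which is why the sparse inner product only visits the shared keys.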

seekdb’s hybrid vector index can also ingest raw text directly: the system calls the embedding model and maintains the vector index itself, so no separate preprocessing pipeline is needed.
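The idea of an index that accepts raw text can be sketched as follows. Everything here is hypothetical: `toy_embed` is a hashed bag-of-words stand-in for a real embedding model, and the index is a brute-force scan rather than HNSW or IVF. The point is only the interface, where callers insert and query text and the index computes vectors on write and at query time.

```python
import math
import zlib


def toy_embed(text: str, dims: int = 256) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dims
    for token in text.lower().split():
        vec[zlib.crc32(token.encode("utf-8")) % dims] += 1.0
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm else vec


class AutoEmbeddingIndex:
    """Callers insert and query raw text; embedding happens inside the index."""

    def __init__(self, embed=toy_embed):
        self._embed = embed
        self._rows: dict[str, tuple[str, list[float]]] = {}

    def insert(self, row_id: str, text: str) -> None:
        # The vector is computed on write and stored next to the original text.
        self._rows[row_id] = (text, self._embed(text))

    def search(self, query: str, top_k: int = 3) -> list[str]:
        q = self._embed(query)
        # Brute-force dot product; nonzero vectors are unit length, so this equals cosine similarity.
        scored = [(sum(x * y for x, y in zip(q, vec)), row_id)
                  for row_id, (_, vec) in self._rows.items()]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [row_id for _, row_id in scored[:top_k]]


index = AutoEmbeddingIndex()
index.insert("r1", "reset the router")
index.insert("r2", "quarterly revenue report")
print(index.search("reset router", top_k=1))
```

Because the embedding function is injected, swapping the toy embedder for a call to a real model would not change the caller's code, which is the property the hybrid vector index is meant to provide.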

Full-text search features encompass:

  • Support for keyword, phrase, and Boolean queries
  • BM25 algorithm for relevance ranking
  • Multiple tokenizer configurations to handle diverse text formats
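BM25 ranks documents by combining how rare each query term is (inverse document frequency) with how often it appears in a document, saturating repeated occurrences and normalizing for document length. The sketch below is the textbook Okapi BM25 formula in Python with a whitespace tokenizer standing in for seekdb's configurable tokenizers; the parameter values k1 = 1.2 and b = 0.75 are common defaults, and the Lucene-style "+1" idf variant is an assumption, since the article does not state which parameters or idf form seekdb uses.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for a whitespace-tokenized query."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(t) for t in tokenized) / n or 1.0  # average document length
    # Document frequency: number of documents containing each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # "+1" inside the log (as in Lucene) keeps idf positive for very common terms.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation (k1) and document-length normalization (b).
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avgdl)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


docs = ["the router reset guide", "router firmware release notes", "quarterly revenue report"]
print(bm25_scores("router reset", docs))
```

The first document matches both query terms and scores highest, the second matches only the more common term "router", and the third matches nothing and scores zero.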

Importantly, vector and full-text indexes are treated as first-class citizens within the same query planner that manages scalar and GIS indexes. This tight integration enables hybrid search queries to execute without external coordination.

Embedded AI Functions for Streamlined Workflows

seekdb incorporates native AI function expressions, allowing direct invocation of machine learning models from SQL queries without relying on intermediary application services. The primary AI functions include:

  • AI_EMBED: Converts textual data into vector embeddings
  • AI_COMPLETE: Generates text completions or chat responses using language models
  • AI_RERANK: Reorders candidate results based on AI-driven scoring
  • AI_PROMPT: Builds prompts from templates and dynamic inputs, formatted as JSON, for use with AI_COMPLETE
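To show how these functions fit together in a RAG flow, here is a plain-Python sketch of the rerank and prompt-building steps. It does not call seekdb: `overlap_rerank` is a hypothetical word-overlap scorer standing in for a model-based reranker, and the JSON shape of the payload is an assumption, not AI_PROMPT's actual format. The sketch stops before any model call; in seekdb, the article says, AI_PROMPT output is meant to be consumed by AI_COMPLETE.

```python
import json
from typing import Callable


def overlap_rerank(question: str, candidate: str) -> float:
    """Hypothetical reranker: share of question words that also appear in the candidate."""
    q = set(question.lower().split())
    return len(q & set(candidate.lower().split())) / len(q) if q else 0.0


def build_rag_prompt(question: str, candidates: list[str],
                     rerank: Callable[[str, str], float], top_k: int = 2) -> str:
    # Rerank step (the role AI_RERANK plays): score candidates, keep the best top_k.
    best = sorted(candidates, key=lambda c: rerank(question, c), reverse=True)[:top_k]
    # Prompt step (the role AI_PROMPT plays): fill a template and package it as JSON.
    # This JSON shape is hypothetical and only illustrates "template plus dynamic inputs".
    payload = {
        "template": "Answer using only this context:\n{context}\n\nQuestion: {question}",
        "args": {"context": "\n".join(best), "question": question},
    }
    return json.dumps(payload)


candidates = [
    "Hold the reset button for 10 seconds.",
    "Revenue grew 4% this quarter.",
    "The reset button is on the back panel.",
]
print(build_rag_prompt("where is the reset button", candidates, overlap_rerank))
```

Passing the reranker as a parameter keeps the pipeline independent of how scoring is done, mirroring how AI_RERANK can be swapped between model providers registered in the database.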

Model metadata and endpoint configurations are managed through the DBMS_AI_SERVICE package, enabling registration of external AI providers, URL settings, and key management directly within the database environment.

Handling Multimodal Data and Complex Queries

seekdb is architected to manage multiple data modalities within a single node, featuring a multimodal data and indexing layer that supports vectors, text, JSON, and GIS data. Its multi-model compute layer facilitates hybrid queries that combine vector similarity, full-text search, and scalar filters.

Additional indexing capabilities include JSON indexes for metadata queries and GIS indexes for spatial constraints, enabling complex queries such as:

  • Retrieving semantically related documents
  • Filtering results based on JSON metadata attributes like tenant ID, geographic region, or category
  • Applying spatial filters using geographic ranges or polygons
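The three kinds of constraints above can be combined in a single pass, as in this plain-Python sketch: a JSON metadata filter, a point-in-polygon spatial filter using ray casting, and cosine ranking of what remains. The row layout and field names are hypothetical, and the geometry is planar (it ignores the Earth's curvature), so this illustrates the query logic rather than how seekdb's JSON and GIS indexes evaluate it.

```python
import json
import math


def point_in_polygon(lon: float, lat: float, polygon: list[tuple[float, float]]) -> bool:
    """Ray casting on a flat plane: an odd number of edge crossings means the point is inside."""
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > lat) != (y2 > lat):  # edge straddles the ray's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def cosine(a: list[float], b: list[float]) -> float:
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return sum(x * y for x, y in zip(a, b)) / norm if norm else 0.0


def filtered_search(rows: list[dict], query_vec: list[float], tenant_id: str,
                    category: str, polygon: list[tuple[float, float]], top_k: int = 5) -> list[str]:
    hits = []
    for row in rows:
        meta = json.loads(row["metadata"])  # JSON metadata column
        if meta.get("tenant_id") != tenant_id or meta.get("category") != category:
            continue
        if not point_in_polygon(*row["location"], polygon):  # (lon, lat) spatial column
            continue
        hits.append((cosine(row["embedding"], query_vec), row["id"]))
    hits.sort(key=lambda h: h[0], reverse=True)
    return [row_id for _, row_id in hits[:top_k]]


square = [(0.0, 0.0), (10.0, 0.0), (10.0, 10.0), (0.0, 10.0)]
meta_t1 = '{"tenant_id": "t1", "category": "manual"}'
rows = [
    {"id": "a", "metadata": meta_t1, "location": (5.0, 5.0), "embedding": [1.0, 0.0]},
    {"id": "b", "metadata": meta_t1, "location": (15.0, 5.0), "embedding": [1.0, 0.0]},
    {"id": "c", "metadata": '{"tenant_id": "t2", "category": "manual"}',
     "location": (5.0, 5.0), "embedding": [1.0, 0.0]},
    {"id": "d", "metadata": meta_t1, "location": (2.0, 8.0), "embedding": [0.6, 0.8]},
]
# b lies outside the polygon and c belongs to another tenant.
print(filtered_search(rows, [1.0, 0.0], "t1", "manual", square))  # ['a', 'd']
```

Running the cheap metadata check before the geometric test mirrors the usual practice of applying the most selective, least expensive filters first.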

All of these operations run inside the same database engine, which keeps documents, metadata, and indexes consistent and avoids round trips between separate systems.

Derived from the OceanBase engine, seekdb benefits from ACID-compliant transactions, hybrid row-column storage, and vectorized query execution. However, large-scale distributed deployments remain the domain of the full OceanBase system.

Summary of seekdb’s Advantages

  1. AI-Native Hybrid Search: By integrating vector search, full-text search, and relational filtering into a unified SQL interface and the DBMS_HYBRID_SEARCH package, seekdb enables multi-signal retrieval in a single query, simplifying RAG and AI agent workflows.
  2. Unified Multimodal Data Management: seekdb consolidates relational, vector, textual, JSON, and GIS data within one engine, ensuring consistency across documents, embeddings, and metadata without juggling multiple databases.
  3. In-Database AI Model Integration: Functions like AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT allow direct model calls from SQL, reducing pipeline complexity and centralizing orchestration within the database.
  4. Single-Node, Embedded-Friendly Architecture: MySQL-compatible and able to run in embedded or standalone mode, seekdb is well suited to local, edge, and embedded AI applications, while OceanBase handles distributed scaling.
  5. Open Source with Expanding Ecosystem: Licensed under Apache 2.0, seekdb integrates with a growing array of AI tools and frameworks, including Python bindings via pyseekdb and MCP-based integrations for code assistants and AI agents, positioning it as a comprehensive data platform for AI solutions.
