Artificial intelligence applications seldom operate on a single, straightforward dataset. Instead, they combine diverse data types such as user profiles, conversation histories, JSON metadata, vector embeddings, and occasionally spatial information. Development teams have traditionally handled this by stitching together an OLTP database, a vector store, and a search engine, which results in a fragmented architecture. seekdb is an open-source, AI-centric database, licensed under Apache 2.0, that targets this problem. It consolidates relational tables, vector data, text, JSON, and geographic information system (GIS) data in a single engine, and it supports hybrid search and in-database AI workflows.
Introducing seekdb: An AI-Optimized Embedded Database
seekdb is essentially a streamlined, embedded variant of the OceanBase database engine, tailored specifically for AI-driven applications rather than broad distributed systems. Operating as a single-node database, it supports embedded deployment as well as client-server modes, maintaining compatibility with MySQL drivers and SQL syntax to ensure ease of integration.
In terms of deployment capabilities, seekdb offers:
- Support for embedded database use cases
- Functionality as a standalone database
- No support for distributed database configurations (reserved for full OceanBase)
From a data modeling standpoint, seekdb accommodates:
- Relational data managed through standard SQL queries
- Vector-based similarity search
- Comprehensive full-text search
- Flexible JSON data handling
- Spatial GIS data processing
All these data types coexist within a unified storage and indexing framework, simplifying data management for AI applications.
Hybrid Search: The Heart of seekdb’s Innovation
The standout feature of seekdb is hybrid search: vector-based semantic retrieval, keyword-driven full-text search, and scalar filtering are combined in a single query with one unified ranking. This removes the need for multiple search engines or a separate orchestration layer.
seekdb implements hybrid search through the DBMS_HYBRID_SEARCH package, offering two primary interfaces:
- DBMS_HYBRID_SEARCH.SEARCH: Returns search results as JSON, ranked by relevance
- DBMS_HYBRID_SEARCH.GET_SQL: Provides the exact SQL query string executed
This hybrid search mechanism supports:
- Pure vector similarity searches
- Exclusive full-text keyword searches
- Combined hybrid queries blending both approaches
Additionally, it pushes relational filters and join operations down to the storage layer for efficiency. Advanced reranking techniques such as weighted scoring, reciprocal rank fusion, and integration with large language model (LLM)-based rerankers are also supported.
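To make the reranking options above concrete, here is a minimal Python sketch of reciprocal rank fusion and weighted scoring over two result lists. It illustrates the general techniques only and is not seekdb's internal implementation; the function names, the constant k=60, and the sample document IDs are assumptions chosen for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1. k=60 is a commonly used default (assumed here).
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


def weighted_fusion(vector_scores, text_scores, vector_weight=0.5):
    """Linearly combine two score dicts, each assumed normalized to [0, 1].

    A document missing from one signal contributes 0 for that signal.
    """
    doc_ids = set(vector_scores) | set(text_scores)
    combined = {
        d: vector_weight * vector_scores.get(d, 0.0)
        + (1.0 - vector_weight) * text_scores.get(d, 0.0)
        for d in doc_ids
    }
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical result lists from a vector search and a keyword search.
vector_hits = ["doc3", "doc1", "doc2"]
keyword_hits = ["doc1", "doc4", "doc3"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])

weighted = weighted_fusion(
    {"doc1": 0.9, "doc2": 0.4},
    {"doc2": 1.0, "doc3": 0.5},
    vector_weight=0.5,
)
```

Rank fusion needs only positions, so it works even when the two signals produce scores on incompatible scales; weighted scoring requires comparable, normalized scores but lets one signal be favored explicitly.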
For use cases like retrieval-augmented generation (RAG) and agent memory management, this means developers can craft a single SQL statement that performs semantic embedding matching, exact keyword filtering (e.g., product codes or named entities), and relational constraints (e.g., user or tenant scopes) simultaneously.
Robust Vector and Full-Text Search Engines
At its core, seekdb offers a sophisticated vector and full-text search infrastructure.
Vector search capabilities include:
- Support for both dense and sparse vector formats
- Multiple distance metrics such as Manhattan, Euclidean, inner product, and cosine similarity
- In-memory indexing options like HNSW, HNSW SQ, and HNSW BQ
- Disk-based index types including IVF and IVF PQ
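For readers less familiar with the distance metrics listed above, the following Python sketch computes each of them for plain lists of floats. It is a textbook illustration with hypothetical function names, not seekdb code, and it assumes both vectors have the same length and, for cosine similarity, non-zero norms.

```python
import math


def manhattan(a, b):
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def euclidean(a, b):
    """L2 distance: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def inner_product(a, b):
    """Dot product: larger values mean more similar vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Dot product divided by the product of the norms (angle only)."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return inner_product(a, b) / (norm_a * norm_b)


# Two orthogonal unit vectors and one pair pointing the same way.
u, v = [1.0, 0.0], [0.0, 1.0]
p, q = [3.0, 4.0], [6.0, 8.0]
```

Note that Manhattan and Euclidean are distances (smaller is closer), while inner product and cosine are similarities (larger is closer), so the sort direction differs when ranking neighbors.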
seekdb’s hybrid vector index allows raw text to be ingested directly: the system automatically invokes embedding models and maintains the vector indexes internally, eliminating the need for separate preprocessing pipelines.
Full-text search features encompass:
- Support for keyword, phrase, and Boolean queries
- BM25 algorithm for relevance ranking
- Multiple tokenizer configurations to handle diverse text formats
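As a rough illustration of BM25 relevance ranking, the sketch below scores a small in-memory corpus in Python. It uses one common formulation of BM25 with parameters k1=1.5 and b=0.75 and a naive lowercase whitespace tokenizer; all of these are assumptions for the example and say nothing about the parameters or tokenizers seekdb actually uses.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer (assumption): lowercase and split on whitespace."""
    return text.lower().split()


def bm25_scores(query, documents, k1=1.5, b=0.75):
    """Return one BM25 score per document for the given query string.

    Assumes a non-empty corpus with at least one non-empty document.
    """
    docs = [tokenize(d) for d in documents]
    n = len(docs)
    avg_len = sum(len(d) for d in docs) / n
    # Document frequency: number of documents containing each term.
    df = Counter()
    for d in docs:
        df.update(set(d))

    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in tokenize(query):
            if tf[term] == 0:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


corpus = [
    "seekdb supports hybrid search",
    "vector search with hnsw",
    "json and gis data",
]
scores = bm25_scores("hybrid search", corpus)
```

The first document matches both query terms and scores highest, the second matches only "search", and the third matches nothing and scores zero.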
Importantly, vector and full-text indexes are treated as first-class citizens within the same query planner that manages scalar and GIS indexes. This tight integration enables hybrid search queries to execute without external coordination.
Embedded AI Functions for Streamlined Workflows
seekdb incorporates native AI function expressions, allowing direct invocation of machine learning models from SQL queries without relying on intermediary application services. The primary AI functions include:
- AI_EMBED: Converts textual data into vector embeddings
- AI_COMPLETE: Generates text completions or chat responses using language models
- AI_RERANK: Reorders candidate results based on AI-driven scoring
- AI_PROMPT: Constructs prompt templates and dynamic inputs formatted as JSON for use with AI_COMPLETE
Model metadata and endpoint configurations are managed through the DBMS_AI_SERVICE package, enabling registration of external AI providers, URL settings, and key management directly within the database environment.
Handling Multimodal Data and Complex Queries
seekdb is architected to manage multiple data modalities within a single node, featuring a multimodal data and indexing layer that supports vectors, text, JSON, and GIS data. Its multi-model compute layer facilitates hybrid queries that combine vector similarity, full-text search, and scalar filters.
Additional indexing capabilities include JSON indexes for metadata queries and GIS indexes for spatial constraints, enabling complex queries such as:
- Retrieving semantically related documents
- Filtering results based on JSON metadata attributes like tenant ID, geographic region, or category
- Applying spatial filters using geographic ranges or polygons
All of these operations run inside the same database engine, which keeps the data consistent and avoids coordinating queries across separate systems.
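The shape of such a combined query can be sketched in plain Python over in-memory records: filter by metadata, filter by a bounding box, then rank the survivors by vector similarity. This is only a conceptual model of what the engine does in a single query; the record layout, field names, and helper functions are hypothetical, and the embeddings are assumed to be unit-normalized so that a dot product equals cosine similarity.

```python
def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def in_bounding_box(point, box):
    """point is (lon, lat); box is (min_lon, min_lat, max_lon, max_lat)."""
    lon, lat = point
    min_lon, min_lat, max_lon, max_lat = box
    return min_lon <= lon <= max_lon and min_lat <= lat <= max_lat


def hybrid_query(records, query_vec, metadata_filter, box, top_k=3):
    """Apply metadata and spatial filters, then rank by similarity."""
    candidates = [
        r for r in records
        if all(r["meta"].get(k) == v for k, v in metadata_filter.items())
        and in_bounding_box(r["location"], box)
    ]
    candidates.sort(key=lambda r: dot(r["embedding"], query_vec), reverse=True)
    return [r["id"] for r in candidates[:top_k]]


# Hypothetical records: embedding, JSON-like metadata, and a (lon, lat) point.
records = [
    {"id": "a", "embedding": [1.0, 0.0], "meta": {"tenant": "t1"}, "location": (10, 10)},
    {"id": "b", "embedding": [1.0, 0.0], "meta": {"tenant": "t2"}, "location": (10, 10)},
    {"id": "c", "embedding": [0.0, 1.0], "meta": {"tenant": "t1"}, "location": (11, 11)},
    {"id": "d", "embedding": [1.0, 0.0], "meta": {"tenant": "t1"}, "location": (50, 50)},
]
result = hybrid_query(records, [1.0, 0.0], {"tenant": "t1"}, (0, 0, 20, 20))
```

Record "b" is dropped by the tenant filter and "d" by the bounding box, leaving "a" ranked ahead of "c" by similarity. In a database the filters would typically use indexes rather than a linear scan.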
Derived from the OceanBase engine, seekdb benefits from ACID-compliant transactions, hybrid row-column storage, and vectorized query execution. However, large-scale distributed deployments remain the domain of the full OceanBase system.
Summary of seekdb’s Advantages
- AI-Native Hybrid Search: By integrating vector search, full-text search, and relational filtering into a unified SQL interface and the DBMS_HYBRID_SEARCH package, seekdb enables multi-signal retrieval in a single query, simplifying RAG and AI agent workflows.
- Unified Multimodal Data Management: seekdb consolidates relational, vector, textual, JSON, and GIS data within one engine, ensuring consistency across documents, embeddings, and metadata without juggling multiple databases.
- In-Database AI Model Integration: Functions like AI_EMBED, AI_COMPLETE, AI_RERANK, and AI_PROMPT allow direct model calls from SQL, reducing pipeline complexity and centralizing orchestration within the database.
- Single-Node, Embedded-Friendly Architecture: Compatible with MySQL and supporting embedded and standalone modes, seekdb is ideal for local, edge, and embedded AI applications, while OceanBase handles distributed scaling.
- Open Source with Expanding Ecosystem: Licensed under Apache 2.0, seekdb integrates with a growing array of AI tools and frameworks, including Python bindings via pyseekdb and MCP-based integrations for code assistants and AI agents, positioning it as a comprehensive data platform for AI solutions.
