1. Overview
1. User Grants Access: A user connects their Google Drive or Notion through the OAuth flow, granting Mintlify API permission to access their documents.
2. Ingestion Queue: Mintlify API monitors new or updated files and records them in the QueueIngestions table for processing.
3. File Fetcher: Retrieves the actual file content from the provider (Google Drive or Notion) using API requests.
4. Markdown Converter: Converts files (PDF, DOCX, HTML, etc.) into a standardized Markdown format for consistent processing.
5. Enrichment & Embeddings: Optionally applies text classification, metadata extraction, or vector embeddings for AI-powered retrieval.
6. Storage: Processed content and embeddings are securely stored in Supabase/Postgres, making them easily accessible for queries.
7. AI Queries (RAG) & Docs API:
   - Retrieval-Augmented Generation (RAG) enables advanced AI-powered searches over stored documents.
   - Mintlify Docs API provides structured access to documentation, allowing seamless frontend integration.
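The QueueIngestions table referenced in step 2 is the backbone of the pipeline. As a rough sketch only (the column names below are assumptions, not the actual schema), a row might look like this in TypeScript:

```typescript
// Hypothetical shape of a QueueIngestions row; actual column names may differ.
interface QueueIngestionRow {
  id: string;                                           // primary key for the queued job
  userId: string;                                       // owner of the connected integration
  integration: "google_drive" | "notion";               // which provider the file came from
  fileId: string;                                       // provider-specific file identifier
  status: "queued" | "in-progress" | "done" | "error";  // processing state (see Debugging below)
  errorMessage?: string;                                 // populated when status is "error"
  createdAt: string;                                     // when the file was first enqueued
  updatedAt: string;                                     // last status change
}
```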
2. Triggering Ingestion
Siftgum supports both automatic and manual triggers.
Automatic Sync
By default, Siftgum periodically checks each integration for new/updated files. You can customize these intervals in the Integrations or Configuration panel.
Manual Trigger
A manual ingestion request accepts the following fields (see the sketch after this list):
- User: Identifies which user’s account to check.
- Integration: Specifies which integration (e.g., Google Drive).
- Files (optional): If provided, limits ingestion to specific files. Otherwise, the system will scan the user’s entire drive/folder for changes.
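As an illustration only (the endpoint path and field names below are assumptions, not Siftgum’s documented API), a manual trigger might be issued like this:

```typescript
// Hypothetical manual ingestion trigger; endpoint and field names are illustrative.
async function triggerIngestion(userId: string, integration: string, fileIds?: string[]) {
  const response = await fetch("https://api.example.com/ingestions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${process.env.API_KEY}`,
    },
    body: JSON.stringify({
      userId,      // which user's account to check
      integration, // e.g., "google_drive" or "notion"
      fileIds,     // optional: limit ingestion to specific files
    }),
  });
  if (!response.ok) {
    throw new Error(`Trigger failed: ${response.status}`);
  }
  return response.json(); // e.g., the newly created QueueIngestions row(s)
}
```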
3. Processing Logic
File Fetch & Content Extraction
- Check OAuth Token: Siftgum uses the user’s stored token to fetch documents from the provider’s API.
- Rate Limiting: Some providers (Google, Notion) have daily or per-minute quotas. Siftgum automatically handles retry logic (see the sketch below).
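The retry behavior can be pictured with a small sketch. The helper below is illustrative only (function names and limits are assumptions, not Siftgum internals); it retries a provider request with exponential backoff when a 429 rate-limit response comes back:

```typescript
// Illustrative retry wrapper for provider API calls; not Siftgum's actual implementation.
async function fetchWithRetry(url: string, accessToken: string, maxRetries = 5): Promise<Response> {
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    const response = await fetch(url, {
      headers: { Authorization: `Bearer ${accessToken}` }, // user's stored OAuth token
    });
    // 429 means the provider (Google, Notion) is throttling us; back off and retry.
    if (response.status !== 429) return response;
    const delayMs = Math.min(2 ** attempt * 1000, 30_000); // exponential backoff, capped at 30s
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  throw new Error(`Rate limited after ${maxRetries} retries: ${url}`);
}
```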
Markdown Conversion
- Text Extraction: For PDFs, .docx, or other formats, Siftgum parses text content.
- Standardization: The extracted text is saved as Markdown, ensuring consistency across different file types.
- Metadata Insertion: Timestamps, user IDs, and file references are inserted as YAML frontmatter or JSON.
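As a rough illustration of the conversion output (the frontmatter field names here are assumptions), a converted document might be assembled like this:

```typescript
// Illustrative assembly of converted Markdown with YAML frontmatter; field names are hypothetical.
interface ConvertedDoc {
  userId: string;
  fileId: string;
  sourceProvider: string; // e.g., "google_drive" or "notion"
  extractedText: string;  // plain text pulled from the PDF/DOCX/HTML source
}

function toMarkdown(doc: ConvertedDoc): string {
  const frontmatter = [
    "---",
    `user_id: ${doc.userId}`,
    `file_id: ${doc.fileId}`,
    `source: ${doc.sourceProvider}`,
    `ingested_at: ${new Date().toISOString()}`,
    "---",
  ].join("\n");
  return `${frontmatter}\n\n${doc.extractedText}`;
}
```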
Embeddings & Enrichment
- Optional Step: If configured, each chunk of text is passed to an embedding model (e.g., OpenAI, local model) for vector representation.
- Storage: Vectors + text are then stored in Supabase/Postgres.
- Entity Tagging: If an enrichment pipeline is set up, Siftgum can label or tag documents (e.g., categories, named entities).
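A minimal sketch of the embedding step, assuming the OpenAI SDK and the Supabase JS client with a pgvector-backed table (the table and column names are assumptions, not Siftgum’s actual schema):

```typescript
import OpenAI from "openai";
import { createClient } from "@supabase/supabase-js";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// Embed one chunk of Markdown and persist the vector alongside the text.
// The "document_chunks" table and its columns are hypothetical.
async function embedAndStore(fileId: string, chunk: string) {
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: chunk,
  });
  const { error } = await supabase.from("document_chunks").insert({
    file_id: fileId,
    content: chunk,
    embedding: embedding.data[0].embedding, // stored in a pgvector column
  });
  if (error) throw error;
}
```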
4. Access Control & Visibility
During ingestion, Siftgum validates user permissions. If a user revokes access or the integration is disabled, the ingestion process halts for that user’s files. Additionally:
- RBAC: Only users with proper roles can see or query these newly ingested documents.
- Audit Logs: Ingestion events are logged (including file metadata and timestamps).
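To make the check concrete, here is a rough sketch of a pre-ingestion guard; the role names and status fields are assumptions for illustration, not Siftgum’s actual code:

```typescript
// Hypothetical pre-ingestion guard; role names and integration fields are illustrative.
type Role = "admin" | "editor" | "viewer";

interface IntegrationStatus {
  enabled: boolean;       // the integration is still turned on
  accessRevoked: boolean; // the user revoked the OAuth grant
}

function canIngest(role: Role, integration: IntegrationStatus): boolean {
  // Halt if the user revoked access or the integration was disabled.
  if (integration.accessRevoked || !integration.enabled) return false;
  // Only users with sufficient roles may trigger or query the ingested documents.
  return role === "admin" || role === "editor";
}
```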
5. Debugging Ingestion Issues
1. Check the Logs: Look for ingestion-related messages in Siftgum server logs or the admin panel logs.
2. QueueIngestions Table: The QueueIngestions table tracks each file’s status (queued, in-progress, done, error). Inspect failed jobs for error messages (see the query sketch below).
3. Verify OAuth Tokens: Confirm the user’s token is valid and not expired.
4. API Rate Limits: Some providers (e.g., Google) might throttle calls if the ingestion scans too many files at once.
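For example, failed jobs can be pulled from the table with the Supabase client. This is a sketch assuming the status values above and hypothetical column names:

```typescript
import { createClient } from "@supabase/supabase-js";

const supabase = createClient(process.env.SUPABASE_URL!, process.env.SUPABASE_SERVICE_KEY!);

// List recent failed ingestion jobs; column names are assumptions about the QueueIngestions schema.
async function listFailedIngestions(limit = 20) {
  const { data, error } = await supabase
    .from("QueueIngestions")
    .select("id, file_id, error_message, updated_at")
    .eq("status", "error")
    .order("updated_at", { ascending: false })
    .limit(limit);
  if (error) throw error;
  return data;
}
```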
Next Steps
With documents ingested and processed, you can leverage Retrieval-Augmented Generation (RAG), advanced AI queries, or data analytics.