Ingestion & Processing
How files and documents move from external integrations into Siftgum.
Siftgum automates the retrieval, conversion, and indexing of documents from multiple external sources (e.g., Google Drive, Notion). This ensures your AI workflows have fresh, consistent data. The steps below outline how files are fetched, transformed, and made available for AI use cases.
1. Overview
User Grants Access
A user connects their Google Drive or Notion through the OAuth flow, granting Mintlify API permission to access their documents.
Ingestion Queue
Mintlify API monitors new or updated files and records them in the QueueIngestions table for processing.
File Fetcher
Retrieves the actual file content from the provider (Google Drive or Notion) using API requests.
Markdown Converter
Converts files (PDF, DOCX, HTML, etc.) into a standardized Markdown format for consistent processing.
Enrichment & Embeddings
Optionally applies text classification, metadata extraction, or vector embeddings for AI-powered retrieval.
Storage
Processed content and embeddings are securely stored in Supabase/Postgres, making them easily accessible for queries.
AI Queries (RAG) & Docs API
- Retrieval-Augmented Generation (RAG) enables advanced AI-powered searches over stored documents.
- Mintlify Docs API provides structured access to documentation, allowing seamless frontend integration.
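The stages above can be sketched as a single function that moves one job through fetch, conversion, and optional enrichment. This is a minimal illustration, not Siftgum's implementation: `IngestionJob`, `run_pipeline`, and the `fetch`, `to_markdown`, and `embed` callables are hypothetical names, and only the status values (queued, in-progress, done, error) come from this page.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class IngestionJob:
    """Hypothetical in-memory stand-in for one QueueIngestions row."""
    file_id: str
    provider: str
    status: str = "queued"  # queued -> in-progress -> done | error
    markdown: Optional[str] = None
    embedding: Optional[List[float]] = None
    error: Optional[str] = None


def run_pipeline(
    job: IngestionJob,
    fetch: Callable[[str, str], bytes],
    to_markdown: Callable[[bytes], str],
    embed: Optional[Callable[[str], List[float]]] = None,
) -> IngestionJob:
    """Run fetch -> convert -> (optional) embed and record the outcome on the job."""
    job.status = "in-progress"
    try:
        raw = fetch(job.provider, job.file_id)   # File Fetcher
        job.markdown = to_markdown(raw)          # Markdown Converter
        if embed is not None:                    # Enrichment is optional
            job.embedding = embed(job.markdown)
        job.status = "done"
    except Exception as exc:
        # Keep the message so failed jobs can be inspected later.
        job.status = "error"
        job.error = str(exc)
    return job


# Usage with stub stages (assumed values, for illustration only).
job = run_pipeline(
    IngestionJob(file_id="doc-1", provider="google_drive"),
    fetch=lambda provider, file_id: b"<h1>Hello</h1>",
    to_markdown=lambda raw: "# Hello",
    embed=lambda text: [0.0, 1.0],
)
print(job.status)
```

Passing the stages in as callables keeps the sketch independent of any particular provider SDK or embedding model.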
2. Triggering Ingestion
Siftgum supports both automatic and manual triggers.
Automatic Sync
By default, Siftgum periodically checks each integration for new/updated files. You can customize these intervals in the Integrations or Configuration panel.
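As a rough illustration of interval-based syncing, the sketch below decides which integrations are due for a check and which files changed since the last sync. The function names, the 15-minute interval, and the data shapes are assumptions made for this example; the actual scheduler and its defaults are not described here.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List, Optional


def due_integrations(
    last_checked: Dict[str, Optional[datetime]],
    interval: timedelta,
    now: Optional[datetime] = None,
) -> List[str]:
    """Return integrations never checked, or last checked at least `interval` ago."""
    now = now or datetime.now(timezone.utc)
    return sorted(
        name
        for name, checked_at in last_checked.items()
        if checked_at is None or now - checked_at >= interval
    )


def changed_since(
    files: Dict[str, datetime], last_sync: Optional[datetime]
) -> List[str]:
    """Return file IDs modified after the last sync (all files on a first sync)."""
    if last_sync is None:
        return sorted(files)
    return sorted(fid for fid, modified in files.items() if modified > last_sync)


# Usage with made-up timestamps.
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last = {
    "google_drive": now - timedelta(minutes=30),
    "notion": now - timedelta(minutes=5),
}
print(due_integrations(last, timedelta(minutes=15), now))
```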
Manual Trigger via API
Identifies which user’s account to check.
Specifies which integration (e.g., Google Drive).
If provided, limits ingestion to specific files. Otherwise, the system will scan the user’s entire drive/folder for changes.
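The three inputs described above (a user, an integration, and an optional list of files) could be assembled into a request body along these lines. The field names `user_id`, `integration`, and `file_ids` are hypothetical placeholders, since the actual parameter names and endpoint are not given on this page; the sketch only builds the payload and does not send a request.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_trigger_payload(
    user_id: str,
    integration: str,
    file_ids: Optional[Sequence[str]] = None,
) -> Dict[str, Any]:
    """Build a manual-ingestion request body (all field names are hypothetical)."""
    if not user_id or not integration:
        raise ValueError("user_id and integration are required")
    payload: Dict[str, Any] = {"user_id": user_id, "integration": integration}
    # Omit the file list entirely so the whole drive/folder is scanned.
    if file_ids:
        payload["file_ids"] = list(file_ids)
    return payload


# Usage: restrict ingestion to two files (IDs are made up).
print(json.dumps(build_trigger_payload("user-123", "google_drive", ["f1", "f2"])))
```

Leaving the file list out of the payload, rather than sending an empty list, keeps the "scan everything" case unambiguous.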
3. Processing Logic
4. Access Control & Visibility
During ingestion, Siftgum validates user permissions. If a user revokes access or the integration is disabled, the ingestion process halts for that user’s files. Additionally:
RBAC: Only users with proper roles can see or query these newly ingested documents.
Audit Logs: Ingestion events are logged (including file metadata and timestamps).
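The checks in this section can be pictured as three small helpers: one that gates ingestion on the integration's state, one that gates queries on role, and one that builds an audit record. Everything here is a sketch under assumptions: the `Integration` fields, the role names in `ALLOWED_ROLES`, and the audit record layout are hypothetical and not taken from Siftgum.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Dict, Iterable, Optional


@dataclass
class Integration:
    """Hypothetical view of one user's connection to a provider."""
    user_id: str
    provider: str
    enabled: bool = True
    access_revoked: bool = False


# Hypothetical role names; real roles depend on your RBAC configuration.
ALLOWED_ROLES = frozenset({"admin", "editor", "viewer"})


def can_ingest(integration: Integration) -> bool:
    """Ingestion halts if the integration is disabled or access was revoked."""
    return integration.enabled and not integration.access_revoked


def can_query(user_roles: Iterable[str]) -> bool:
    """A user may query ingested documents if any of their roles is allowed."""
    return any(role in ALLOWED_ROLES for role in user_roles)


def audit_event(
    file_id: str, file_name: str, event: str, now: Optional[datetime] = None
) -> Dict[str, str]:
    """Build an audit record holding file metadata and a UTC timestamp."""
    ts = now or datetime.now(timezone.utc)
    return {
        "file_id": file_id,
        "file_name": file_name,
        "event": event,
        "timestamp": ts.isoformat(),
    }


# Usage: skip ingestion for a revoked integration.
revoked = Integration("user-123", "notion", access_revoked=True)
print(can_ingest(revoked))
```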
5. Debugging Ingestion Issues
Check the Logs
Look for ingestion-related messages in Siftgum server logs or the admin panel logs.
QueueIngestions Table
The QueueIngestions table tracks each file’s status (queued, in-progress, done, error). Inspect failed jobs for error messages.
Verify OAuth Tokens
Confirm the user’s token is valid and not expired.
API Rate Limits
Some providers (e.g., Google) might throttle calls if the ingestion scans too many files at once.
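When throttling is the cause, a common mitigation is to retry with exponential backoff. The sketch below shows the general technique rather than Siftgum's own retry logic: `RateLimitError`, `call_with_backoff`, and the default delays are assumptions, and in practice you would raise `RateLimitError` from whatever throttling response your provider client surfaces.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Hypothetical error raised when a provider reports throttling."""


def call_with_backoff(
    request: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry `request` on RateLimitError, doubling the wait after each failure."""
    for attempt in range(max_attempts):
        try:
            return request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Small random jitter keeps parallel workers from retrying in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")


# Usage: a stub request that is throttled twice before succeeding.
calls = {"n": 0}


def flaky_request() -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("throttled")
    return "ok"


print(call_with_backoff(flaky_request, sleep=lambda seconds: None))
```

The `sleep` parameter is injectable so the retry loop can be exercised without actually waiting.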
Next Steps
With documents ingested and processed, you can leverage Retrieval-Augmented Generation (RAG), advanced AI queries, or data analytics.
Check out the RAG & LLM Integration guide to learn how to query these newly ingested documents in your AI workflows.