Siftgum automates the retrieval, conversion, and indexing of documents from multiple external sources (e.g., Google Drive, Notion). This ensures your AI workflows have fresh, consistent data. The steps below outline how files are fetched, transformed, and made available for AI use cases.


1. Overview

Step 1: User Grants Access

A user connects their Google Drive or Notion through the OAuth flow, granting Mintlify API permission to access their documents.

Step 2: Ingestion Queue

Mintlify API watches for new or updated files and records them in the QueueIngestions table for processing.

Step 3: File Fetcher

Retrieves the file content from the provider (Google Drive or Notion) through the provider's API.

Step 4: Markdown Converter

Converts files (PDF, DOCX, HTML, etc.) into a standardized Markdown format for consistent processing.

Step 5: Enrichment & Embeddings

Optionally applies text classification, metadata extraction, or vector embeddings for AI-powered retrieval.

Step 6: Storage

Processed content and embeddings are securely stored in Supabase/Postgres, where they are available for queries.

Step 7: AI Queries (RAG) & Docs API

  • Retrieval-Augmented Generation (RAG) enables AI-powered searches over stored documents.
  • Mintlify Docs API provides structured access to documentation for frontend integration.
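
The steps above can be read as a linear pipeline applied to each file. The following is a minimal TypeScript sketch under stated assumptions: the type names, the `Stages` interface, and `processFile` are hypothetical and are not part of the Siftgum API. Only the status values (queued, in-progress, done, error) come from the QueueIngestions description in the debugging section of this page.

```typescript
// Hypothetical shapes for illustration only; not the actual Siftgum schema.
type IngestionStatus = "queued" | "in-progress" | "done" | "error";

interface QueuedFile {
  fileId: string;
  provider: "google-drive" | "notion";
  status: IngestionStatus;
}

interface ProcessedDocument {
  fileId: string;
  markdown: string;
  embedding?: number[]; // enrichment is optional
}

// Each stage is injected as a function so the sketch stays provider-agnostic.
interface Stages {
  fetch: (file: QueuedFile) => string; // File Fetcher
  toMarkdown: (raw: string) => string; // Markdown Converter
  embed?: (markdown: string) => number[]; // Enrichment & Embeddings (optional)
  store: (doc: ProcessedDocument) => void; // Storage
}

// Runs one queued file through the stages and returns it with its final status.
function processFile(file: QueuedFile, stages: Stages): QueuedFile {
  try {
    const raw = stages.fetch(file);
    const markdown = stages.toMarkdown(raw);
    const doc: ProcessedDocument = { fileId: file.fileId, markdown };
    if (stages.embed) {
      doc.embedding = stages.embed(markdown);
    }
    stages.store(doc);
    return { ...file, status: "done" };
  } catch {
    // Any stage failure marks the file as errored instead of crashing the run.
    return { ...file, status: "error" };
  }
}
```

Passing the stages in as functions keeps the sketch independent of any particular provider or storage backend; a real deployment would presumably also record the error message alongside the status.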

2. Triggering Ingestion

Siftgum supports both automatic and manual triggers.

Automatic Sync

By default, Siftgum periodically checks each integration for new or updated files. You can customize these intervals in the Integrations or Configuration panel.
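
To make the periodic check concrete, here is a minimal sketch of how a sync pass might decide whether it is due and which files changed. It assumes the provider listing exposes a last-modified timestamp per file; `RemoteFile`, `isSyncDue`, and `filesChangedSince` are hypothetical names, not Siftgum functions.

```typescript
// Hypothetical file listing entry; assumes the provider reports a modified time.
interface RemoteFile {
  id: string;
  modifiedAt: Date;
}

// A sync is due on the first run, or once the configured interval has elapsed.
function isSyncDue(lastSyncedAt: Date | null, intervalMs: number, now: Date): boolean {
  if (lastSyncedAt === null) return true;
  return now.getTime() - lastSyncedAt.getTime() >= intervalMs;
}

// Returns files modified after the last sync; on the first sync, returns all files.
function filesChangedSince(files: RemoteFile[], lastSyncedAt: Date | null): RemoteFile[] {
  if (lastSyncedAt === null) return files;
  return files.filter((f) => f.modifiedAt.getTime() > lastSyncedAt.getTime());
}
```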

Manual Trigger via API

// Example call using the Siftgum client
await SiftgumClient.triggerIngestion({
  endUserId: 'user_123',
  integrationId: 'integration_abc',
  fileIds: ['doc_456']
})

Parameters:

  • endUserId (string, required): Identifies which user's account to check.
  • integrationId (string, required): Specifies which integration to use (e.g., Google Drive).
  • fileIds (string[], optional): If provided, limits ingestion to specific files. Otherwise, the system scans the user's entire drive or folder for changes.
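
The parameters above can be captured as a type and checked before a request is sent. This is a sketch only: `TriggerIngestionParams`, `validateTriggerParams`, and `ingestionScope` are hypothetical helpers, and treating an empty `fileIds` array as a full scan is an assumption rather than documented behavior.

```typescript
// Mirrors the documented parameters; the interface name itself is hypothetical.
interface TriggerIngestionParams {
  endUserId: string;
  integrationId: string;
  fileIds?: string[];
}

// Returns a list of human-readable problems; an empty list means the input looks valid.
function validateTriggerParams(params: Partial<TriggerIngestionParams>): string[] {
  const errors: string[] = [];
  if (!params.endUserId) errors.push("endUserId is required");
  if (!params.integrationId) errors.push("integrationId is required");
  if (params.fileIds !== undefined && params.fileIds.some((id) => id.trim() === "")) {
    errors.push("fileIds must not contain empty strings");
  }
  return errors;
}

// Assumption: a missing or empty fileIds list means "scan everything for changes".
function ingestionScope(params: TriggerIngestionParams): "selected-files" | "full-scan" {
  return params.fileIds !== undefined && params.fileIds.length > 0
    ? "selected-files"
    : "full-scan";
}
```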

3. Processing Logic

4. Access Control & Visibility

During ingestion, Siftgum validates user permissions. If a user revokes access or the integration is disabled, the ingestion process halts for that user’s files. Additionally:

  • RBAC: Only users with proper roles can see or query these newly ingested documents.
  • Audit Logs: Ingestion events are logged (including file metadata and timestamps).
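
As an illustration of the two points above, the sketch below filters documents by role and builds an audit record with file metadata and a timestamp. The role names, the `allowedRoles` field, and the event shape are all assumptions made for this example; Siftgum's actual role model and log format are not described on this page.

```typescript
// Hypothetical role names and document shape, for illustration only.
type Role = "admin" | "editor" | "viewer";

interface IngestedDocument {
  id: string;
  allowedRoles: Role[];
}

interface AuditEvent {
  fileId: string;
  event: "ingested";
  timestamp: string; // ISO 8601
  metadata: Record<string, string>;
}

// RBAC: keep only documents that share at least one role with the user.
function visibleDocuments(docs: IngestedDocument[], userRoles: Role[]): IngestedDocument[] {
  return docs.filter((doc) => doc.allowedRoles.some((role) => userRoles.includes(role)));
}

// Audit log: record which file was ingested, when, and with what metadata.
function buildAuditEvent(fileId: string, metadata: Record<string, string>, now: Date): AuditEvent {
  return { fileId, event: "ingested", timestamp: now.toISOString(), metadata };
}
```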

5. Debugging Ingestion Issues

Step 1: Check the Logs

Look for ingestion-related messages in the Siftgum server logs or the admin panel logs.

Step 2: QueueIngestions Table

The QueueIngestions table tracks each file's status (queued, in-progress, done, error). Inspect failed jobs for error messages.

Step 3: Verify OAuth Tokens

Confirm the user's token is valid and not expired.

Step 4: API Rate Limits

Some providers (e.g., Google) may throttle requests if an ingestion run scans too many files at once.
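
Several of the checks above can be expressed as small helpers. The sketch below is illustrative only: the row shape is an assumed subset of the QueueIngestions table, and `failedJobs`, `isTokenExpired`, `backoffDelayMs`, and `chunk` are hypothetical names. Exponential backoff and batching are common ways to stay under provider rate limits, not necessarily what Siftgum does internally.

```typescript
// Assumed subset of a QueueIngestions row; the real schema may differ.
type IngestionStatus = "queued" | "in-progress" | "done" | "error";

interface QueueIngestionRow {
  fileId: string;
  status: IngestionStatus;
  errorMessage?: string;
}

// QueueIngestions check: collect the failed jobs so their messages can be inspected.
function failedJobs(rows: QueueIngestionRow[]): QueueIngestionRow[] {
  return rows.filter((row) => row.status === "error");
}

// Token check: treat a token as expired slightly early to allow for clock skew.
function isTokenExpired(expiresAt: Date, now: Date, skewMs: number = 60000): boolean {
  return expiresAt.getTime() - skewMs <= now.getTime();
}

// Rate limits: capped exponential backoff delay for retry attempt 0, 1, 2, ...
function backoffDelayMs(attempt: number, baseMs: number = 1000, maxMs: number = 60000): number {
  return Math.min(maxMs, baseMs * Math.pow(2, attempt));
}

// Rate limits: split a large scan into smaller batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```

If throttling persists, passing a short `fileIds` list to the manual trigger is another way to narrow a run to the files that matter.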

6. Next Steps

With documents ingested and processed, you can leverage Retrieval-Augmented Generation (RAG), advanced AI queries, or data analytics.

Check out the RAG & LLM Integration guide to learn how to query these newly ingested documents in your AI workflows.