Repository Indexing Stopped

Symptom

One or more repositories are no longer indexing new commits, pull requests, releases, or deployments. You may notice:

  • Missing recent data: The latest commits in the repository are older than expected (e.g., the newest indexed commit is from October although it is now December)
  • Stale metrics: Repository statistics and dashboards show outdated information
  • No new activity: The repository appears inactive even though there is recent GitHub activity

Root Causes

This issue typically occurs when:

  1. Indexing state is stuck in error: The indexing process encountered an error (e.g., API rate limit, permission issue) and the state was never reset
  2. Stale indexing date: The last_indexed_at timestamp in the indexing state is very old (e.g., from 2021) while actual indexed data exists (e.g., from October 2025)
  3. Repository marked as indexed: The repository status still reads "indexed" even though the underlying IndexingState is in an error state, which can hide the fact that indexing has stopped
  4. API errors: GitHub API returned errors (403 Forbidden, rate limits) that prevented indexing from continuing

Diagnosis

Use the diagnostic command to analyze the indexing state of a specific repository:

python manage.py diagnose_repo_indexing <repository_id>

This command will display:

  • Repository indexing status
  • IndexingState information (status, last indexed date, error messages)
  • Actual commits in MongoDB (latest and oldest commit dates)
  • Indexing mode (backfill vs maintenance)
  • Date range that would be fetched in the next batch

Example Output

=== Diagnosing Repository: owner/repo-name (ID: 39) ===

Repository Status: indexed
Last Indexed: 2025-08-25 01:52:31
Commit Count: 3526
Smart Indexing Status: indexed

=== IndexingState for commits ===
Status: error
Total Indexed: 16
Last Indexed At: 2021-01-26 18:49:22
⚠️  Last indexing was 1779 days ago!

=== Commits in MongoDB ===
Total Commits: 3588
Latest Commit Date: 2025-10-02 08:10:38
⚠️  Latest commit is 70 days old!
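
The two age warnings in this output are plain date arithmetic. The sketch below reproduces them from the timestamps shown above. The helper name age_in_days and the run time assigned to now are assumptions chosen for illustration; the command's real implementation is not shown in this document.

```python
from datetime import datetime


def age_in_days(timestamp: datetime, now: datetime) -> int:
    """Return the number of whole days elapsed between timestamp and now."""
    return (now - timestamp).days


# Timestamps copied from the example output above.
state_last_indexed_at = datetime(2021, 1, 26, 18, 49, 22)
latest_commit_date = datetime(2025, 10, 2, 8, 10, 38)

# Assumed time at which the diagnostic was run (not stated in the output).
now = datetime(2025, 12, 11, 12, 0, 0)

state_age = age_in_days(state_last_indexed_at, now)
commit_age = age_in_days(latest_commit_date, now)

print(f"Last indexing was {state_age} days ago")  # 1779
print(f"Latest commit is {commit_age} days old")  # 70
```

A gap of several years between the two ages, as here, is the signature of a stale last_indexed_at value rather than a repository that was never indexed.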

Resolution

Step 1: Diagnose the Issue

First, identify the problematic repository and understand the issue:

python manage.py diagnose_repo_indexing <repository_id>

Replace <repository_id> with the numeric ID of the repository (you can find this in the Django admin or database).

Step 2: Reset Indexing State

If the diagnosis shows an error state or a very old last_indexed_at date, reset the indexing state:

python manage.py diagnose_repo_indexing <repository_id> --fix

This command will:

  • Reset the IndexingState to pending status
  • Clear error messages
  • Reset retry count
  • Set repository status to in_progress
  • Clear the stale last_indexed_at date

Step 3: Trigger Re-indexing

After resetting the state, trigger a new indexing task. The system will automatically use the actual last indexed commit date from MongoDB instead of the stale last_indexed_at value.
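
The date selection described here can be sketched as follows. This is a minimal illustration, not the service's actual code: the function name effective_resume_date, its parameters, and the handling of a cleared (None) last_indexed_at are assumptions. The 90-day threshold is the one listed under Prevention.

```python
from datetime import datetime, timedelta
from typing import Optional

# Threshold taken from the Prevention section of this document.
STALE_AFTER = timedelta(days=90)


def effective_resume_date(
    last_indexed_at: Optional[datetime],
    latest_item_date: Optional[datetime],
    now: datetime,
) -> Optional[datetime]:
    """Pick the date from which the next indexing batch should resume.

    last_indexed_at:  value stored in the indexing state (None after a reset).
    latest_item_date: date of the newest item actually stored in MongoDB.
    """
    if latest_item_date is None:
        # Nothing stored yet: fall back to whatever the state says.
        return last_indexed_at
    if last_indexed_at is None or now - last_indexed_at > STALE_AFTER:
        # State was cleared or is stale: trust the stored data instead.
        return latest_item_date
    return last_indexed_at


# Values from the example output; `now` is an assumed run time.
now = datetime(2025, 12, 11, 12, 0, 0)
stale_state = datetime(2021, 1, 26, 18, 49, 22)
latest_commit = datetime(2025, 10, 2, 8, 10, 38)

resume_stale = effective_resume_date(stale_state, latest_commit, now)
resume_reset = effective_resume_date(None, latest_commit, now)
print(resume_stale, resume_reset)
```

In both the stale and the freshly reset case, the sketch resumes from the newest commit in MongoDB, so the batch does not try to re-fetch history starting in 2021.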

You can manually trigger indexing using:

python manage.py start_intelligent_indexing

Or wait for the scheduled indexing task to run automatically.

Step 4: Verify the Fix

After the indexing task runs, verify that new data is being indexed:

python manage.py diagnose_repo_indexing <repository_id>

You should see:

  • Status changed from error to completed or pending
  • last_indexed_at updated to a recent date
  • New commits appearing in MongoDB

Prevention

The system has been improved to automatically handle this scenario:

  • Smart date detection: When last_indexed_at is older than 90 days, the system automatically uses the actual last indexed item date from MongoDB
  • Error recovery: Failed indexing tasks are automatically retried with exponential backoff
  • State validation: The indexing service validates state consistency before processing
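
As an illustration of the exponential backoff mentioned above, the sketch below computes the wait before each retry. The base delay of 60 seconds and the one-hour cap are assumptions made for the example; the document does not state the values the system actually uses.

```python
def backoff_delay(
    attempt: int,
    base_seconds: float = 60.0,   # assumed base delay
    cap_seconds: float = 3600.0,  # assumed upper bound
) -> float:
    """Return the delay before retry number `attempt` (0-based).

    The delay doubles with every attempt and is clamped to cap_seconds.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    return min(cap_seconds, base_seconds * (2 ** attempt))


delays = [backoff_delay(n) for n in range(8)]
print(delays)
```

Capping the delay keeps a repository that has been failing for a long time from waiting indefinitely between attempts once the underlying API error clears.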

Additional Notes

  • The diagnostic command works for all entity types (commits, pull requests, releases, deployments)
  • If multiple repositories are affected, run the diagnostic command, and the --fix command where needed, for each repository ID
  • The --fix flag is safe to use multiple times; it will reset the state without causing data loss
  • Existing indexed data in MongoDB is preserved; only the indexing state is reset
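
When several repositories are affected, the per-repository commands can be scripted. The sketch below only uses the diagnose_repo_indexing command and the --fix flag documented above; the helper names build_commands and run_all, the example IDs, and the assumption that the script runs from the directory containing manage.py are all illustrative.

```python
import subprocess
import sys
from typing import Iterable, List


def build_commands(repo_ids: Iterable[int], fix: bool = False) -> List[List[str]]:
    """Build one diagnose_repo_indexing invocation per repository ID."""
    commands = []
    for repo_id in repo_ids:
        cmd = [sys.executable, "manage.py", "diagnose_repo_indexing", str(repo_id)]
        if fix:
            cmd.append("--fix")
        commands.append(cmd)
    return commands


def run_all(repo_ids: Iterable[int], fix: bool = False) -> None:
    """Run the command for each repository, stopping at the first failure."""
    for cmd in build_commands(repo_ids, fix=fix):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)


# Example: inspect what would be executed for two hypothetical IDs.
preview = build_commands([39, 42], fix=True)
for cmd in preview:
    print(" ".join(cmd[1:]))

# To execute for real, from the project root:
# run_all([39, 42])            # diagnose only
# run_all([39, 42], fix=True)  # reset indexing state
```

Running the diagnosis pass first and reviewing its output before the --fix pass keeps the reset limited to repositories that actually show an error state or a stale date.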