Repository Indexing Stopped
Symptom
One or more repositories are no longer indexing new commits, pull requests, releases, or deployments. You may notice:
- Missing recent data: The latest indexed commits are older than expected (e.g., the newest commit is from October although it is now December)
- Stale metrics: Repository statistics and dashboards show outdated information
- No new activity: The repository appears inactive even though there is recent GitHub activity
Root Causes
This issue typically occurs when:
- Indexing state is stuck in error: The indexing process encountered an error (e.g., API rate limit, permission issue) and the state was never reset
- Stale indexing date: The last_indexed_at timestamp in the indexing state is very old (e.g., from 2021) while actual indexed data exists (e.g., from October 2025)
- Repository marked as indexed: The repository status is set to "indexed" but indexing stopped working due to an error
- API errors: GitHub API returned errors (403 Forbidden, rate limits) that prevented indexing from continuing
Diagnosis
Use the diagnostic command to analyze the indexing state of a specific repository:
python manage.py diagnose_repo_indexing <repository_id>
This command will display:
- Repository indexing status
- IndexingState information (status, last indexed date, error messages)
- Actual commits in MongoDB (latest and oldest commit dates)
- Indexing mode (backfill vs maintenance)
- Date range that would be fetched in the next batch
Example Output
=== Diagnosing Repository: owner/repo-name (ID: 39) ===
Repository Status: indexed
Last Indexed: 2025-08-25 01:52:31
Commit Count: 3526
Smart Indexing Status: indexed
=== IndexingState for commits ===
Status: error
Total Indexed: 16
Last Indexed At: 2021-01-26 18:49:22
⚠️ Last indexing was 1779 days ago!
=== Commits in MongoDB ===
Total Commits: 3588
Latest Commit Date: 2025-10-02 08:10:38
⚠️ Latest commit is 70 days old!
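The two warnings in this output are plain date differences. The following is a minimal sketch of that arithmetic, not the command's actual code: `days_since` is a hypothetical helper, and the run time is an assumption (it is not shown in the output) picked so that it is consistent with both warnings.

```python
from datetime import datetime


def days_since(timestamp: str, now: datetime) -> int:
    """Whole days elapsed between a 'YYYY-MM-DD HH:MM:SS' timestamp and now."""
    then = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return (now - then).days


# Assumed time of the diagnostic run (not part of the example output).
now = datetime(2025, 12, 11, 12, 0, 0)

state_age = days_since("2021-01-26 18:49:22", now)  # IndexingState "Last Indexed At"
data_age = days_since("2025-10-02 08:10:38", now)   # MongoDB "Latest Commit Date"
print(state_age, data_age)  # 1779 70
```

The mismatch is the useful signal: the state claims nothing has been indexed for years, while the stored data is only weeks old.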
Resolution
Step 1: Diagnose the Issue
First, identify the problematic repository and understand the issue:
python manage.py diagnose_repo_indexing <repository_id>
Replace <repository_id> with the numeric ID of the repository (you can find this in the Django admin or database).
Step 2: Reset Indexing State
If the diagnosis shows an error state or a very old last_indexed_at date, reset the indexing state:
python manage.py diagnose_repo_indexing <repository_id> --fix
This command will:
- Reset the IndexingState to pending status
- Clear error messages
- Reset retry count
- Set repository status to in_progress
- Clear the stale last_indexed_at date
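To make the reset actions concrete, here is a rough sketch that applies them to in-memory stand-ins. The class and field names (`error_message`, `retry_count`, and so on) are assumptions for illustration; the real models behind the command may be shaped differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class IndexingState:
    # Hypothetical field names; the real model may differ.
    status: str = "error"
    error_message: Optional[str] = None
    retry_count: int = 0
    last_indexed_at: Optional[datetime] = None


@dataclass
class Repository:
    status: str = "indexed"


def reset_indexing_state(repo: Repository, state: IndexingState) -> None:
    """Apply the reset actions described above, in place."""
    state.status = "pending"
    state.error_message = None
    state.retry_count = 0
    state.last_indexed_at = None
    repo.status = "in_progress"


repo = Repository(status="indexed")
state = IndexingState(
    status="error",
    error_message="403 Forbidden",
    retry_count=5,
    last_indexed_at=datetime(2021, 1, 26, 18, 49, 22),
)
reset_indexing_state(repo, state)
reset_indexing_state(repo, state)  # a second run leaves the same result
```

Because every action assigns a fixed value, repeating the reset cannot change the outcome, and nothing here touches the indexed data itself.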
Step 3: Trigger Re-indexing
After resetting the state, trigger a new indexing task. The system will automatically use the actual last indexed commit date from MongoDB instead of the stale last_indexed_at value.
You can manually trigger indexing using:
python manage.py start_intelligent_indexing
Or wait for the scheduled indexing task to run automatically.
Step 4: Verify the Fix
After the indexing task runs, verify that new data is being indexed:
python manage.py diagnose_repo_indexing <repository_id>
You should see:
- Status changed from error to completed or pending
- last_indexed_at updated to a recent date
- New commits appearing in MongoDB
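If you want to script this check, the diagnostic output can be parsed as text. The sketch below assumes the output keeps the "=== section ===" and "Key: Value" layout of the example output shown earlier; `parse_diagnostic_output` and `state_recovered` are hypothetical helpers, and the sample text is copied from that example (so it reports the broken state).

```python
from typing import Dict


def parse_diagnostic_output(text: str) -> Dict[str, Dict[str, str]]:
    """Group 'Key: Value' lines under the '=== ... ===' header preceding them."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Dict[str, str] = {}  # lines before the first header are discarded
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("===") and line.endswith("==="):
            current = sections.setdefault(line.strip("= "), {})
        elif ": " in line:
            key, value = line.split(": ", 1)
            current[key] = value
    return sections


def state_recovered(sections: Dict[str, Dict[str, str]], entity: str = "commits") -> bool:
    """True when the IndexingState status is one of the expected healthy values."""
    state = sections.get(f"IndexingState for {entity}", {})
    return state.get("Status") in {"completed", "pending"}


sample = """\
=== IndexingState for commits ===
Status: error
Total Indexed: 16
Last Indexed At: 2021-01-26 18:49:22
=== Commits in MongoDB ===
Total Commits: 3588
Latest Commit Date: 2025-10-02 08:10:38
"""

sections = parse_diagnostic_output(sample)
print(state_recovered(sections))  # False
```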
Prevention
The system has been improved to automatically handle this scenario:
- Smart date detection: When last_indexed_at is older than 90 days, the system automatically uses the actual last indexed item date from MongoDB
- Error recovery: Failed indexing tasks are automatically retried with exponential backoff
- State validation: The indexing service validates state consistency before processing
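The smart date detection rule can be sketched as a small function. This is an illustration under stated assumptions, not the service's implementation: `effective_start_date` is a hypothetical name, and treating a cleared (empty) last_indexed_at the same as a stale one is an assumption based on the reset behaviour described in Step 2.

```python
from datetime import datetime, timedelta
from typing import Optional

STALE_AFTER = timedelta(days=90)  # threshold stated in the list above


def effective_start_date(
    last_indexed_at: Optional[datetime],
    latest_item_date: Optional[datetime],
    now: datetime,
) -> Optional[datetime]:
    """Pick the date from which the next indexing batch should resume."""
    # Assumption: a missing last_indexed_at is handled like a stale one.
    state_is_stale = last_indexed_at is None or now - last_indexed_at > STALE_AFTER
    if state_is_stale and latest_item_date is not None:
        return latest_item_date  # trust the data actually stored in MongoDB
    return last_indexed_at


now = datetime(2025, 12, 11, 12, 0, 0)  # assumed current time
stale_state = datetime(2021, 1, 26, 18, 49, 22)
latest_commit = datetime(2025, 10, 2, 8, 10, 38)

print(effective_start_date(stale_state, latest_commit, now))  # 2025-10-02 08:10:38
```

With the values from the example output, the stale 2021 timestamp is ignored and indexing resumes from the latest stored commit date.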
Additional Notes
- The diagnostic command works for all entity types (commits, pull requests, releases, deployments)
- If multiple repositories are affected, run the diagnostic and fix command for each one
- The --fix flag is safe to use multiple times; it will reset the state without causing data loss
- Existing indexed data in MongoDB is preserved; only the indexing state is reset
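When several repositories are affected, the per-repository commands can be generated in a loop. The sketch below only uses the diagnose_repo_indexing command and --fix flag documented above; `build_commands` and `run_all` are hypothetical helpers, and the repository IDs are placeholders.

```python
import subprocess
from typing import Iterable, List


def build_commands(repository_ids: Iterable[int], fix: bool = False) -> List[List[str]]:
    """Build one diagnose_repo_indexing invocation per repository ID."""
    commands = []
    for repo_id in repository_ids:
        argv = ["python", "manage.py", "diagnose_repo_indexing", str(repo_id)]
        if fix:
            argv.append("--fix")
        commands.append(argv)
    return commands


def run_all(repository_ids: Iterable[int], fix: bool = False) -> None:
    """Run the commands in order; check=True stops at the first failing one."""
    for argv in build_commands(repository_ids, fix=fix):
        subprocess.run(argv, check=True)


# Placeholder IDs; print the commands instead of running them.
for argv in build_commands([39, 42], fix=True):
    print(" ".join(argv))
```

Call `run_all` from the project root (where manage.py lives) once the printed commands look right.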