## Overview
This document describes the Screenshot Job Status and Delete Flow improvements: a bug-fix and architecture-update session that adds separate job status tracking for screenshot processing, fixes the re-upload duplicate-key issue, and verifies the full delete chain end-to-end.
## Job Status Architecture Change
**Major change:** Upload and delete flows now use separate job status tracking for story publishing and screenshot processing.

**Before:** Only `StoryJobStatus` existed (keyed by `trackingId`). The story publish flow created a PENDING status, marked it COMPLETED after saving to MongoDB, and sent SSE to iOS. Screenshot processing had no job status at all — iOS had no way to know when screenshots were ready.

**After:** Two separate job status collections:
| Collection | Unique Key | Purpose | Used By |
|---|---|---|---|
| `publishStoryJobStatus` | `trackingId` (UUID) | Tracks story save/delete lifecycle. SSE pushes status to iOS in real time. | Upload + Delete flows |
| `screenshotJobStatus` | `youtubeContentID` | Tracks screenshot generation lifecycle. iOS polls this to know when thumbnails are ready. | Upload flow only |
**Why this change was needed:**
- iOS needs to know when screenshots are ready — Story save completes in ~1s, but screenshot generation takes 15-20s. Without a separate status, iOS has no way to know when to fetch and display chapter thumbnails.
- Different lifecycles — Story publishing is synchronous (save → SSE → done). Screenshot processing is async and happens after the story SSE is already sent.
- Different unique keys — `StoryJobStatus` uses `trackingId` (a new UUID for every request). `ScreenshotJobStatus` uses `youtubeContentID` (same video = same key, which requires cleanup on re-upload).
- Delete flow stays simple — Delete only needs `StoryJobStatus` for the SSE notification. `ScreenshotJobStatus` cleanup happens automatically at the next upload.
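The key difference above can be illustrated with a minimal sketch (the helper names are hypothetical, not the actual document classes):

```java
import java.util.UUID;

// Illustrative only: shows why the two collections cannot share a key.
public class JobStatusKeys {
    // A story-publish request always gets a fresh trackingId.
    public static String newStoryKey() {
        return UUID.randomUUID().toString();
    }

    // The screenshot key is derived from the video itself, so re-uploading
    // the same video produces the same key (hence the cleanup on insert).
    public static String screenshotKey(String youtubeContentID) {
        return youtubeContentID;
    }
}
```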
## Kafka Communication Pattern

### Upload Flow
```
Geospatial Service                         YouTube Service
        │                                         │
        │─── GENERATE_SCREENSHOTS ───────────────>│  (adventuretube-screenshots)
        │    {youtubeContentID, chapters}         │
        │                                         │── yt-dlp + ffmpeg + S3 upload
        │<── SCREENSHOTS_COMPLETED ───────────────│  (adventuretube-screenshots-result)
        │    {youtubeContentID, urls[]}           │
        │── update MongoDB with URLs              │
```
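The two messages in this exchange can be sketched as plain DTOs. Field names follow the diagram; the actual message classes are assumptions:

```java
import java.util.List;

// Hypothetical payload shapes for the upload-flow Kafka messages.
public class UploadFlowMessages {
    // Sent on "adventuretube-screenshots" to request screenshot generation.
    public record GenerateScreenshots(String youtubeContentID, List<String> chapters) {}

    // Returned on "adventuretube-screenshots-result" with the uploaded S3 URLs.
    public record ScreenshotsCompleted(String youtubeContentID, List<String> urls) {}
}
```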
### Delete Flow
```
Geospatial Service                         YouTube Service
        │                                         │
        │─── DELETE_SCREENSHOTS ─────────────────>│  (adventuretube-screenshots)
        │    {youtubeContentID, trackingId}       │
        │                                         │── delete images from S3
        │<── SCREENSHOTS_DELETED ─────────────────│  (adventuretube-screenshots-result)
        │    {youtubeContentID, trackingId}       │
        │── delete adventureTubeData              │
        │── markCompleted + SSE to iOS            │
```
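The result-handling step of this flow can be sketched as follows (interface and method names are illustrative stand-ins, not the actual `ScreenshotConsumer` code). Note that it deliberately does not delete any job-status records:

```java
// Sketch of handling SCREENSHOTS_DELETED: delete the story data, mark the
// StoryJobStatus COMPLETED, and push SSE to iOS, in that order.
public class DeleteFlowSketch {
    interface AdventureTubeRepo { void deleteByYoutubeContentID(String id); }
    interface StoryJobStatus   { void markCompleted(String trackingId); }
    interface SsePublisher     { void push(String trackingId, String state); }

    private final AdventureTubeRepo repo;
    private final StoryJobStatus jobStatus;
    private final SsePublisher sse;

    public DeleteFlowSketch(AdventureTubeRepo repo, StoryJobStatus jobStatus, SsePublisher sse) {
        this.repo = repo;
        this.jobStatus = jobStatus;
        this.sse = sse;
    }

    // Invoked when SCREENSHOTS_DELETED arrives on the result topic.
    public void onScreenshotsDeleted(String youtubeContentID, String trackingId) {
        repo.deleteByYoutubeContentID(youtubeContentID); // delete adventureTubeData
        jobStatus.markCompleted(trackingId);             // markCompleted
        sse.push(trackingId, "COMPLETED");               // SSE to iOS
        // No screenshotJobStatus cleanup here: that happens at the next upload.
    }
}
```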
## Bug Fix: `screenshotJobStatus` Duplicate Key on Re-upload
**Problem:** After deleting a story, the `screenshotJobStatus` record (unique index on `youtubeContentID`) was never cleaned up. Re-uploading the same video caused a `DuplicateKeyException` in `createPendingJob()`.
**Root cause analysis:**
- First attempted fix: delete both job status records during the delete flow
- This caused a race condition — `storyJobStatusService.deleteByTrackingId()` ran before iOS could poll the SSE status, throwing `JobNotFoundException`
- Final fix: handle cleanup at insert time — `ScreenshotJobStatusService.createPendingJob()` checks for an existing record by `youtubeContentID` and deletes it before inserting
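The insert-time cleanup pattern can be sketched with an in-memory stand-in for the MongoDB collection (illustrative names, not the actual service code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the fix: delete any stale record for the same youtubeContentID
// before inserting a fresh PENDING one, so a re-upload after a delete can
// never hit a duplicate-key error on the unique index.
public class ScreenshotJobStatusSketch {
    // Stand-in for the collection with a unique index on youtubeContentID.
    private final Map<String, String> statusByVideo = new ConcurrentHashMap<>();

    public String createPendingJob(String youtubeContentID) {
        // Insert-time cleanup: drop any orphaned record from a deleted story.
        statusByVideo.remove(youtubeContentID);
        String previous = statusByVideo.putIfAbsent(youtubeContentID, "PENDING");
        if (previous != null) {
            // Mirrors the duplicate-key failure mode of the unique index.
            throw new IllegalStateException("duplicate key: " + youtubeContentID);
        }
        return "PENDING";
    }

    public void markCompleted(String youtubeContentID) {
        statusByVideo.put(youtubeContentID, "COMPLETED");
    }

    public String status(String youtubeContentID) {
        return statusByVideo.get(youtubeContentID);
    }
}
```

With this pattern, re-running `createPendingJob()` for the same video simply resets the record instead of throwing.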
**Files changed:**
| File | Change |
|---|---|
| `ScreenshotJobStatusService.java` | `createPendingJob()` — find-and-delete existing record before insert |
| `ScreenshotJobStatusRepository.java` | Added `deleteByYoutubeContentID()` |
| `StoryJobStatusRepository.java` | Added `deleteByYoutubeContentID()`, `deleteByTrackingId()` |
| `StoryJobStatusService.java` | Added `deleteByTrackingId()` |
| `ScreenshotConsumer.java` | Delete flow: `markCompleted` only (cleanup removed to avoid race condition) |
## Kafka Producer Idempotence Configuration
Added `enable.idempotence: true` and `acks: all` to the Kafka producer config in both `geospatial-service.yml` and `youtube-service.yml`.
**Why:** During debugging, Kafka producer logs showed "Node -1 disconnected" — the producer's connection to the broker metadata node dropped and reconnected. Without idempotence, a retry during an in-flight `send()` could cause the broker to write the same message twice.

**Was this the actual cause of the bug?** No. The real cause was the orphaned `screenshotJobStatus` record. The Kafka duplicate was a separate, rare edge case.

**Why keep it anyway:** `enable.idempotence: true` is a Kafka best practice with no downside. The broker tracks `(ProducerID, sequence number)` per partition and deduplicates retries silently.
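A minimal sketch of how these settings might look in a Spring Boot-style producer config (the exact key layout in the two yml files is assumed, not confirmed):

```yaml
# Sketch only: producer settings as they might appear under spring.kafka;
# actual nesting in geospatial-service.yml / youtube-service.yml may differ.
spring:
  kafka:
    producer:
      acks: all                      # wait for all in-sync replicas
      properties:
        enable.idempotence: true     # broker dedupes retried sends
```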
## TODO
- Handle delete request during active screenshot processing — iOS should check `screenshotJobStatus` before allowing a delete. If the status is PENDING/PROCESSING, block the delete.
- Deploy Kafka idempotence config — push the config-service changes to main and restart geospatial-service + youtube-service.
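The first TODO item could be reduced to a simple pre-delete guard, sketched here with hypothetical class and method names:

```java
import java.util.Set;

// Hypothetical guard for the first TODO item: refuse a delete while
// screenshot processing for the same video is still in flight.
public class DeleteGuardSketch {
    private static final Set<String> ACTIVE = Set.of("PENDING", "PROCESSING");

    // Returns true when it is safe to start the delete flow.
    // A null state means no screenshotJobStatus record exists for the video.
    public static boolean canDelete(String screenshotJobState) {
        return screenshotJobState == null || !ACTIVE.contains(screenshotJobState);
    }
}
```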
Created: 2026-04-05
