Episode 5: Chapter Thumbnail — Screenshot Job Status & Delete Flow

Overview

This episode documents the Screenshot Job Status and Delete Flow improvements: a bug-fix and architecture session that adds separate job status tracking for screenshot processing, fixes the duplicate key error on re-upload, and verifies the full delete chain end to end.


Job Status Architecture Change

Major change: Upload and delete flows now use separate job status tracking for story publishing and screenshot processing.

Before: Only StoryJobStatus existed (keyed by trackingId). The story publish flow created a PENDING status, marked it COMPLETED after saving to MongoDB, and sent SSE to iOS. Screenshot processing had no job status at all — iOS had no way to know when screenshots were ready.

After: Two separate job status collections:

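Collection            | Unique Key         | Purpose                                                                                   | Used By
----------------------|--------------------|-------------------------------------------------------------------------------------------|-----------------------
publishStoryJobStatus | trackingId (UUID)  | Tracks story save/delete lifecycle. SSE pushes status to iOS in real time.                | Upload + Delete flows
screenshotJobStatus   | youtubeContentID   | Tracks screenshot generation lifecycle. iOS polls this to know when thumbnails are ready. | Upload flow only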

Why this change was needed:

  1. iOS needs to know when screenshots are ready: Story save completes in ~1s, but screenshot generation takes 15-20s. Without a separate status, iOS has no way to know when to fetch and display chapter thumbnails.
  2. Different lifecycles: Story publishing is synchronous (save → SSE → done). Screenshot processing is async and happens after the story SSE is already sent.
  3. Different unique keys: StoryJobStatus uses trackingId (UUID, new every request). ScreenshotJobStatus uses youtubeContentID (same video = same key, which requires cleanup on re-upload).
  4. Delete flow stays simple: Delete only needs StoryJobStatus for SSE notification. ScreenshotJobStatus cleanup happens automatically at next upload time.
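
To make the key difference in point 3 concrete, here is a minimal sketch of the two status records. The field sets, the JobStatus values beyond those named in this post, and the pending() factory methods are assumptions for illustration, not the project's actual classes.

```java
import java.util.UUID;

// Status values named in this post; the real enum may contain more (assumption).
enum JobStatus { PENDING, PROCESSING, COMPLETED }

// Keyed by trackingId: a fresh UUID per request, so two requests never share a key.
record StoryJobStatus(String trackingId, String youtubeContentID, JobStatus status) {
    static StoryJobStatus pending(String youtubeContentID) {
        return new StoryJobStatus(UUID.randomUUID().toString(), youtubeContentID, JobStatus.PENDING);
    }
}

// Keyed by youtubeContentID: the same video always maps to the same key,
// so a leftover record from an earlier upload collides with a new one.
record ScreenshotJobStatus(String youtubeContentID, JobStatus status) {
    static ScreenshotJobStatus pending(String youtubeContentID) {
        return new ScreenshotJobStatus(youtubeContentID, JobStatus.PENDING);
    }
}
```

Two calls to StoryJobStatus.pending() for the same video yield different trackingId values, while two calls to ScreenshotJobStatus.pending() yield the same key. That is why only the screenshot collection needs cleanup on re-upload.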

Kafka Communication Pattern

Upload Flow

Geospatial Service                    YouTube Service
      │                                     │
      │─── GENERATE_SCREENSHOTS ──────────>│  (adventuretube-screenshots)
      │    {youtubeContentID, chapters}     │
      │                                     │── yt-dlp + ffmpeg + S3 upload
      │<── SCREENSHOTS_COMPLETED ──────────│  (adventuretube-screenshots-result)
      │    {youtubeContentID, urls[]}       │
      │── update MongoDB with URLs          │

Delete Flow

Geospatial Service                    YouTube Service
      │                                     │
      │─── DELETE_SCREENSHOTS ────────────>│  (adventuretube-screenshots)
      │    {youtubeContentID, trackingId}   │
      │                                     │── delete images from S3
      │<── SCREENSHOTS_DELETED ────────────│  (adventuretube-screenshots-result)
      │    {youtubeContentID, trackingId}   │
      │── delete adventureTubeData          │
      │── markCompleted + SSE to iOS        │
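
The two diagrams above can be summarised in code. The sketch below assumes the names in parentheses are Kafka topic names and models each message type as an enum constant. ResultRouter and stepsFor() are hypothetical helpers that list what the Geospatial Service does when a result message arrives. They only return step descriptions and do not call MongoDB or SSE.

```java
import java.util.List;

// Message types from the diagrams, each paired with the topic it travels on
// (assumed to be the names shown in parentheses).
enum ScreenshotEvent {
    GENERATE_SCREENSHOTS("adventuretube-screenshots"),
    DELETE_SCREENSHOTS("adventuretube-screenshots"),
    SCREENSHOTS_COMPLETED("adventuretube-screenshots-result"),
    SCREENSHOTS_DELETED("adventuretube-screenshots-result");

    private final String topic;

    ScreenshotEvent(String topic) { this.topic = topic; }

    String topic() { return topic; }
}

// Hypothetical helper: the ordered steps the Geospatial Service runs per result event.
final class ResultRouter {
    private ResultRouter() {}

    static List<String> stepsFor(ScreenshotEvent event) {
        switch (event) {
            case SCREENSHOTS_COMPLETED:
                return List.of("update MongoDB with URLs");
            case SCREENSHOTS_DELETED:
                return List.of("delete adventureTubeData", "markCompleted", "send SSE to iOS");
            default:
                // GENERATE_SCREENSHOTS and DELETE_SCREENSHOTS are requests, not results.
                throw new IllegalArgumentException(event + " is a request, not a result");
        }
    }
}
```

Both request types share one topic and both result types share another, so a consumer on either side has to branch on the message type rather than on the topic.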

Bug Fix: screenshotJobStatus Duplicate Key on Re-upload

Problem: After a story was deleted, its screenshotJobStatus record (unique index on youtubeContentID) was never cleaned up. Re-uploading the same video then threw a DuplicateKeyException in createPendingJob().

Root cause analysis:

  1. First attempted fix: delete both job status records during the delete flow.
  2. This caused a race condition: storyJobStatusService.deleteByTrackingId() ran before iOS could read the status it was notified about over SSE, so the lookup threw JobNotFoundException.
  3. Final fix: handle cleanup at insert time. ScreenshotJobStatusService.createPendingJob() checks for an existing record by youtubeContentID and deletes it before inserting.
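
The final fix can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the in-memory repository stands in for the MongoDB collection and mimics its unique index by rejecting a second insert for the same key, and the class names with a Sketch or InMemory prefix are hypothetical. Only createPendingJob() and deleteByYoutubeContentID() are names taken from this post.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// In-memory stand-in for the screenshotJobStatus collection (assumption).
final class InMemoryScreenshotJobStatusRepository {
    private final Map<String, String> statusByContentId = new HashMap<>();

    Optional<String> findByYoutubeContentID(String youtubeContentID) {
        return Optional.ofNullable(statusByContentId.get(youtubeContentID));
    }

    void deleteByYoutubeContentID(String youtubeContentID) {
        statusByContentId.remove(youtubeContentID);
    }

    // Mimics the unique index: inserting an existing key fails.
    void insert(String youtubeContentID, String status) {
        if (statusByContentId.putIfAbsent(youtubeContentID, status) != null) {
            throw new IllegalStateException("duplicate key: " + youtubeContentID);
        }
    }
}

final class ScreenshotJobStatusServiceSketch {
    private final InMemoryScreenshotJobStatusRepository repository;

    ScreenshotJobStatusServiceSketch(InMemoryScreenshotJobStatusRepository repository) {
        this.repository = repository;
    }

    void createPendingJob(String youtubeContentID) {
        // Remove any record orphaned by an earlier upload/delete cycle.
        repository.findByYoutubeContentID(youtubeContentID)
                .ifPresent(existing -> repository.deleteByYoutubeContentID(youtubeContentID));
        repository.insert(youtubeContentID, "PENDING");
    }
}
```

Calling createPendingJob() twice for the same video now succeeds, whereas a bare second insert would fail. Note that find, delete, and insert are three separate steps here, so two simultaneous uploads of the same video could still collide. Whether that case matters depends on how uploads are serialised upstream.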

Files changed:

File                               | Change
-----------------------------------|-------------------------------------------------------------------------
ScreenshotJobStatusService.java    | createPendingJob(): find and delete any existing record before insert
ScreenshotJobStatusRepository.java | Added deleteByYoutubeContentID()
StoryJobStatusRepository.java      | Added deleteByYoutubeContentID(), deleteByTrackingId()
StoryJobStatusService.java         | Added deleteByTrackingId()
ScreenshotConsumer.java            | Delete flow: markCompleted only (cleanup removed to avoid race condition)

Kafka Producer Idempotence Configuration

Added enable.idempotence: true and acks: all to the Kafka producer config in both geospatial-service.yml and youtube-service.yml.

Why: During debugging, Kafka producer logs showed "Node -1 disconnected" — the producer's connection to the broker metadata node dropped and reconnected. Without idempotence, a retry during an in-flight send() could cause the broker to write the same message twice.

Was this the actual cause of the bug? No. The real cause was the orphaned screenshotJobStatus record. The Kafka duplicate was a separate, rare edge case.

Why keep it anyway: enable.idempotence: true is a Kafka best practice with no downside. The broker tracks (ProducerID, sequence number) per partition and deduplicates retries silently.
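
For reference, the same two settings expressed as plain Kafka producer properties in Java. The post sets them in the service YAML files, so this is only an equivalent sketch. IdempotentProducerConfig and producerProperties() are hypothetical names, and the serializer settings a real producer also needs are omitted.

```java
import java.util.Properties;

final class IdempotentProducerConfig {
    private IdempotentProducerConfig() {}

    // Builds the producer settings discussed above; key.serializer and
    // value.serializer must be added before passing this to a KafkaProducer.
    static Properties producerProperties(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Broker deduplicates retried sends per partition.
        props.setProperty("enable.idempotence", "true");
        // Idempotence requires acknowledgement from all in-sync replicas.
        props.setProperty("acks", "all");
        return props;
    }
}
```

The two settings go together: the Kafka client rejects a configuration that enables idempotence while explicitly setting acks to anything other than all.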


TODO

  • Handle delete request during active screenshot processing — iOS should check screenshotJobStatus before allowing delete. If status is PENDING/PROCESSING, block the delete.
  • Deploy Kafka idempotence config — Push config-service changes to main and restart geospatial-service + youtube-service

Created: 2026-04-05
