Kafka in AdventureTube-Backend

1. Why Kafka Over Other Message Brokers

AdventureTube uses Apache Kafka for asynchronous messaging between microservices. Here’s why Kafka was chosen over alternatives like RabbitMQ or JMS.

Kafka vs RabbitMQ — Key Differences

| Feature | RabbitMQ | Kafka |
|---|---|---|
| Model | Queue-based (point-to-point) | Topic-based (pub/sub) |
| Message retention | Deleted once consumed | Stays in topic — multiple consumers read independently |
| Scaling | Clustering added later on top of a single-node design | Designed from day one as a distributed system |
| Replication | Requires explicit queue mirroring config | Built-in topic replication across brokers |
| Partitioning | Not native | Topics split into partitions for parallel processing |
| Spring Boot support | spring-boot-starter-amqp | spring-kafka (auto-configures KafkaTemplate) |

The Pub/Sub Advantage

This is the biggest reason Kafka was chosen. In RabbitMQ, a message goes to one consumer and is removed from the queue:

Producer → Queue → Consumer A reads Message 1 (gone forever)
                   Consumer B reads Message 2

In Kafka, messages stay in the topic. Every consumer reads independently:

Producer → Topic → Consumer A reads Message 1, 2, 3
                   Consumer B reads Message 1, 2, 3
                   Consumer C reads Message 1, 2, 3

This matters for AdventureTube because when geospatial-service publishes an adventure-data-created event:

geo-service → Topic: adventure-data-created
                ├── Consumer 1: web-service (update cache)
                ├── Consumer 2: notification-service (send push)
                └── Consumer 3: analytics-service (track stats)

All services independently read the same event. With RabbitMQ you’d need fanout exchanges and multiple queues to achieve this. With Kafka it’s the default behavior.
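
In spring-kafka terms this fan-out comes from consumer groups: give each service its own groupId and every service receives every event, each tracking its own offset. A minimal sketch of the three subscribers (class and method names are illustrative, and the listeners are collapsed into one class here for brevity; in the real system each lives in its own service):

import org.springframework.kafka.annotation.KafkaListener;
import org.springframework.stereotype.Component;

// Because each listener declares its own groupId, Kafka delivers every event
// on adventure-data-created to all three, each with its own offset cursor.
@Component
public class AdventureDataCreatedListeners {

    @KafkaListener(topics = "adventure-data-created", groupId = "web-service")
    public void updateCache(String event) {
        // web-service: refresh the cache with the new adventure data
    }

    @KafkaListener(topics = "adventure-data-created", groupId = "notification-service")
    public void sendPush(String event) {
        // notification-service: send a push notification
    }

    @KafkaListener(topics = "adventure-data-created", groupId = "analytics-service")
    public void trackStats(String event) {
        // analytics-service: record the event for stats
    }
}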

Other Kafka Advantages

  • Cluster-native scalability — Kafka was designed from the ground up to run across multiple nodes. Adding brokers automatically distributes load. RabbitMQ was designed as a single-node broker with clustering added later.
  • Resilience through replication — Topics are replicated across brokers. If one node dies, no data is lost.
  • Simpler API — KafkaTemplate is typed with generics and handles domain objects directly. No convertAndSend() needed (see the sketch after this list).
  • JMS limitation — JMS is Java-only. Kafka supports any language/platform.
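
To illustrate the "simpler API" point above, a sketch of a typed producer; AdventureData is a placeholder record, and JSON serialization of the value type via spring-kafka's JsonSerializer is assumed to be configured:

import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.stereotype.Service;

// Placeholder domain type for the sketch
record AdventureData(String id, String title) {}

@Service
public class AdventureDataPublisher {

    private final KafkaTemplate<String, AdventureData> kafkaTemplate;

    public AdventureDataPublisher(KafkaTemplate<String, AdventureData> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(AdventureData data) {
        // send(topic, key, value) takes the domain object directly;
        // no convertAndSend() wrapper as with Spring AMQP
        kafkaTemplate.send("adventure-data-created", data.id(), data);
    }
}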

2. Migrate from Zookeeper to KRaft Mode (Completed 2026-03-09)

What is Zookeeper?

Zookeeper is a separate service that Kafka historically needed to manage cluster metadata — which brokers are alive, which broker leads which partition, topic configurations, etc.

What is KRaft?

Starting from Kafka 3.3, Kafka can manage its own metadata without Zookeeper using the KRaft (Kafka Raft) protocol. One of the Kafka brokers acts as the metadata controller.

Why Migrate?

| Aspect | With Zookeeper | With KRaft |
|---|---|---|
| Processes | 2 (zoo1 + kafka1) | 1 (kafka1 only) |
| RAM | ~128-256 MB wasted on the Zookeeper JVM | Freed up |
| Complexity | Two services to configure, monitor, debug | One service |
| Single point of failure | Zookeeper dies → Kafka can’t function | No external dependency |
| Future | Deprecated — will be removed in Kafka 4.0 | The standard going forward |

PI1 (Pi 5) already runs 20+ containers at 65% RAM. Removing the zoo1 container saves resources and simplifies the stack.

Gotcha: ACL Authorizer

The Zookeeper-based kafka.security.authorizer.AclAuthorizer crashes in KRaft mode because it tries to connect to Zookeeper. Must use the KRaft-native authorizer:

Before: KAFKA_AUTHORIZER_CLASS_NAME: kafka.security.authorizer.AclAuthorizer
After:  KAFKA_AUTHORIZER_CLASS_NAME: org.apache.kafka.metadata.authorizer.StandardAuthorizer

Cluster ID

HLThU0HXQJegC_vT4NGpCw

Generated via docker exec kafka1 kafka-storage random-uuid. This same ID is used by both kafka1 (PI1) and kafka2 (PI2) — stored in env files as KAFKA_CLUSTER_ID.

Migration Steps

Step 1: Generate a cluster ID

docker exec kafka1 kafka-storage random-uuid

Step 2: Update docker-compose-kafka.yml — remove zoo1 service, update kafka1 to KRaft mode (see Section 4 for the actual compose file).

Step 3: Stop old stack, start new:

docker compose --env-file env.pi -f docker-compose-kafka.yml down
docker compose --env-file env.pi -f docker-compose-kafka.yml up -d

Step 4: Verify topics are intact:

docker exec kafka1 kafka-topics --bootstrap-server 192.168.1.199:29092 --list

3. Kafka 2-Broker Cluster (Completed 2026-03-09)

A Kafka cluster is not a separate product or special installation. It’s simply multiple Kafka brokers configured to work together using the same Cluster ID.

Architecture

PI1 (192.168.1.199): kafka1 (node.id=1, broker + controller)
PI2 (192.168.1.105): kafka2 (node.id=2, broker only)

kafka1 acts as both broker and KRaft controller. kafka2 is a broker-only node that connects to kafka1’s controller on port 9093.

Topic Replication

Topic: adventuretube-data (2 partitions, replication-factor=2)

PI1 kafka1: Partition 0 (follower),  Partition 1 (leader)
PI2 kafka2: Partition 0 (leader),    Partition 1 (follower)

  • Leader = handles all reads/writes for that partition
  • Follower = keeps a backup copy. If leader dies, follower is promoted
  • Every broker is a peer — no master/slave hierarchy
  • ISR (In-Sync Replicas) = both brokers are in sync for both partitions
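
The same leader/follower/ISR layout can be read programmatically. A sketch using the Kafka AdminClient (kafka-clients 3.1+ assumed for allTopicNames(); the bootstrap addresses are the EXTERNAL listeners from Section 4):

import java.util.Properties;
import java.util.Set;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.TopicDescription;

// Prints the leader/replica/ISR layout for adventuretube-data, the
// programmatic equivalent of kafka-topics --describe in Section 5.
public class DescribeTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG,
                "192.168.1.199:29092,192.168.1.105:29092");
        try (AdminClient admin = AdminClient.create(props)) {
            TopicDescription desc = admin.describeTopics(Set.of("adventuretube-data"))
                    .allTopicNames().get().get("adventuretube-data");
            desc.partitions().forEach(p ->
                    System.out.printf("partition %d: leader=%d replicas=%s isr=%s%n",
                            p.partition(), p.leader().id(), p.replicas(), p.isr()));
        }
    }
}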

Key Terminology

| Term | Meaning |
|---|---|
| Broker | A single Kafka server process |
| Cluster | A group of brokers working together (same Cluster ID) |
| Partition | A slice of a topic’s data — enables parallel consumption |
| Leader | The broker that handles reads/writes for a partition |
| Follower | A broker that keeps a backup copy of a partition |
| Replication factor | How many copies of each partition exist across brokers |
| ISR | In-Sync Replicas — the set of brokers fully caught up with the leader |

Listener Architecture (Critical for Cross-Host Brokers)

Each broker has multiple listeners for different purposes:

| Listener | Port | Purpose | Advertised As |
|---|---|---|---|
| INTERNAL | 19092 | Inter-broker replication | Host IP (e.g. 192.168.1.199:19092) |
| EXTERNAL | 29092 | Client connections (microservices) | Host IP (e.g. 192.168.1.199:29092) |
| CONTROLLER | 9093 | KRaft metadata (kafka1 only) | PI1 IP only |

Gotcha: Since kafka1 and kafka2 run on separate Docker hosts (PI1 and PI2), the INTERNAL advertised listener must use the host IP, not the container hostname. Using kafka2:19092 as the advertised address would fail because kafka1’s container cannot resolve the kafka2 hostname across Docker hosts. This caused replication to hang until fixed.
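
From a client's perspective only the EXTERNAL listeners matter: a client bootstraps against them, the brokers reply with their advertised addresses, and all further connections use those addresses. That round trip is exactly why advertised addresses must be host IPs reachable from the client. A minimal standalone probe, assuming plain kafka-clients (the key and value sent are illustrative):

import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

// Bootstraps against the EXTERNAL listeners; the brokers return their
// advertised addresses, so these must be host IPs the client can reach.
public class ExternalListenerProbe {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG,
                "192.168.1.199:29092,192.168.1.105:29092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("adventuretube-data", "probe", "hello"));
        }
    }
}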

Infrastructure

| Pi | Spec | RAM | Current Role | Kafka |
|---|---|---|---|---|
| PI1 | Pi 5 + 1TB SSD | 8 GiB (65% used) | Main server: Kafka, DBs, monitoring, Jenkins, nginx | kafka1 (broker + controller) |
| PI2 | Pi 4 | 8 GiB (29% used, ~5.6 GiB free) | Cloud services + kafka2 | kafka2 (broker only) |
| PI3 | Pi 4 | 2 GiB (50% used, swapping) | Light workloads | Too small for Kafka |

4. Configuration Files

Env Variables (in Jenkins Credentials)

All env files (env.pi, env.pi2, env.mac, env.prod) contain:

# Pi cluster IP addresses
PI1_IP=192.168.1.199
PI2_IP=192.168.1.105
PI3_IP=192.168.1.144

# Kafka
KAFKA_CLUSTER_ID=HLThU0HXQJegC_vT4NGpCw
KAFKA_BOOTSTRAP_SERVERS=192.168.1.199:29092,192.168.1.105:29092

  • PI1_IP, PI2_IP, PI3_IP — used by docker-compose-kafka.yml for listener addresses
  • KAFKA_CLUSTER_ID — shared cluster identity, used by docker-compose
  • KAFKA_BOOTSTRAP_SERVERS — used by Spring microservices (literal IPs, not ${PI1_IP} references, because Spring Boot reads env values directly without shell interpolation)
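
On the Java side, a sketch of how a microservice might pick up KAFKA_BOOTSTRAP_SERVERS; this configuration class is illustrative, since Spring Boot's auto-configuration can achieve the same via spring.kafka.bootstrap-servers:

import java.util.Map;

import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.kafka.core.DefaultKafkaProducerFactory;
import org.springframework.kafka.core.KafkaTemplate;
import org.springframework.kafka.core.ProducerFactory;

// Reads the literal broker list straight from the environment variable.
// Spring resolves env vars through its property sources, which is why the
// env file stores literal IPs rather than ${PI1_IP} references.
@Configuration
public class KafkaProducerConfig {

    @Value("${KAFKA_BOOTSTRAP_SERVERS}")
    private String bootstrapServers;

    @Bean
    public ProducerFactory<String, String> producerFactory() {
        return new DefaultKafkaProducerFactory<>(Map.<String, Object>of(
                ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers,
                ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class,
                ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class));
    }

    @Bean
    public KafkaTemplate<String, String> kafkaTemplate() {
        return new KafkaTemplate<>(producerFactory());
    }
}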

docker-compose-kafka.yml (PI1 — kafka1 + kafdrop)

services:
  kafka1:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka1
    container_name: kafka1
    ports:
      - "9092:9092"
      - "29092:29092"
      - "9999:9999"
      - "9093:9093"
      - "19092:19092"
    environment:
      KAFKA_NODE_ID: 1
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@${PI1_IP}:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      CLUSTER_ID: ${KAFKA_CLUSTER_ID}
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:19092,EXTERNAL://0.0.0.0:29092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://${PI1_IP}:19092,EXTERNAL://${CLOUD_IP_ADDRESS}:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_AUTHORIZER_CLASS_NAME: org.apache.kafka.metadata.authorizer.StandardAuthorizer
      KAFKA_ALLOW_EVERYONE_IF_NO_ACL_FOUND: "true"

  kafdrop:
    image: obsidiandynamics/kafdrop
    container_name: kafdrop
    ports:
      - "9000:9000"
    environment:
      KAFKA_BROKERCONNECT: ${PI1_IP}:19092,${PI2_IP}:19092
    depends_on:
      - kafka1

docker-compose-kafka.yml (PI2 — kafka2)

services:
  kafka2:
    image: confluentinc/cp-kafka:7.6.0
    hostname: kafka2
    container_name: kafka2
    ports:
      - "9092:9092"
      - "29092:29092"
      - "9999:9999"
      - "19092:19092"
    environment:
      KAFKA_NODE_ID: 2
      KAFKA_PROCESS_ROLES: broker
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@${PI1_IP}:9093
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      CLUSTER_ID: ${KAFKA_CLUSTER_ID}
      KAFKA_LISTENERS: INTERNAL://0.0.0.0:19092,EXTERNAL://0.0.0.0:29092
      KAFKA_ADVERTISED_LISTENERS: INTERNAL://${PI2_IP}:19092,EXTERNAL://${PI2_IP}:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,INTERNAL:PLAINTEXT,EXTERNAL:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: INTERNAL
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_AUTHORIZER_CLASS_NAME: org.apache.kafka.metadata.authorizer.StandardAuthorizer
      KAFKA_ALLOW_EVERYONE_IF_NO_ACL_FOUND: "true"

Key differences from kafka1:

  • KAFKA_NODE_ID: 2 — unique per broker
  • KAFKA_PROCESS_ROLES: broker — not controller (PI1’s kafka1 handles that)
  • KAFKA_CONTROLLER_QUORUM_VOTERS points to PI1’s controller
  • No CONTROLLER listener (only the controller node needs it)
  • Same CLUSTER_ID — this is what makes them part of the same cluster

5. How to Start the 2-Broker Cluster and Verify It Is Working

Step 1: Ensure env files are updated in Jenkins

Both env.pi and env.pi2 in Jenkins Credentials must contain PI1_IP, PI2_IP, PI3_IP, KAFKA_CLUSTER_ID, and KAFKA_BOOTSTRAP_SERVERS. See Section 4 for the exact values.

Step 2: Deploy via Jenkins

Run both Jenkins pipelines to pull the latest main branch. This updates docker-compose-kafka.yml on both Pis via git pull:

  • adventuretube-clouds — deploys to PI2 (eureka, config, gateway)
  • adventuretube-microservice — deploys to PI2 (auth, member, web, geospatial)

Wait for Spring Cloud services (eureka, config, gateway) to be healthy before proceeding.

Step 3: Start kafka1 on PI1

ssh strider@strider-pi.local "cd ~/adventuretube-microservice && docker compose --env-file env.pi -f docker-compose-kafka.yml up -d"

This starts kafka1 (broker + controller) and kafdrop (web UI).

Step 4: Start kafka2 on PI2

ssh strider@strider-pi2.local "cd ~/adventuretube-microservice && docker compose --env-file env.pi2 -f docker-compose-kafka.yml up -d"

kafka2 connects to kafka1’s controller on port 9093 and joins the cluster.

Step 5: Verify — Both brokers visible

ssh strider@strider-pi.local "docker exec kafka1 kafka-broker-api-versions --bootstrap-server 192.168.1.199:29092 2>&1 | grep '^[0-9]'"

Expected output:

192.168.1.199:29092 (id: 1 rack: null) -> (
192.168.1.105:29092 (id: 2 rack: null) -> (

If only one broker shows, check kafka2 logs: docker logs kafka2

Step 6: Verify — Topic replication

ssh strider@strider-pi.local "docker exec kafka1 kafka-topics --bootstrap-server 192.168.1.199:29092 --describe --topic adventuretube-data"

Expected output:

Topic: adventuretube-data  PartitionCount: 2  ReplicationFactor: 2
  Partition: 0  Leader: 2  Replicas: 2,1  Isr: 2,1
  Partition: 1  Leader: 1  Replicas: 1,2  Isr: 1,2

  • Replicas should show both broker IDs (1,2) for each partition
  • Isr should match Replicas (both brokers in sync)
  • If Isr is missing a broker, replication is behind — check inter-broker connectivity

Step 7: Verify — Kafdrop web UI

Open http://192.168.1.199:9000 (or kafka.adventuretube.net).

You should see:

  • 2 brokers listed on the main page
  • Click on adventuretube-data topic — shows 2 partitions, each with replicas on both brokers
  • Partition leaders should be distributed across both brokers

Step 8: Verify — Produce and consume a test message

# Produce a test message
ssh strider@strider-pi.local "docker exec kafka1 bash -c 'echo \"test-message\" | kafka-console-producer --bootstrap-server 192.168.1.199:29092 --topic adventuretube-data'"

# Consume from kafka2 to verify cross-broker replication
ssh strider@strider-pi2.local "docker exec kafka2 kafka-console-consumer --bootstrap-server 192.168.1.105:29092 --topic adventuretube-data --from-beginning --max-messages 1"

If the message appears on kafka2, cross-broker replication is working.

Recreating a Topic with Replication

If adventuretube-data was created before the 2-broker cluster (replication factor 1), recreate it:

# Delete old topic
docker exec kafka1 kafka-topics --bootstrap-server 192.168.1.199:29092 --delete --topic adventuretube-data

# Wait a few seconds, then create with replication
docker exec kafka1 kafka-topics --bootstrap-server 192.168.1.199:29092 \
  --create --topic adventuretube-data --partitions 2 --replication-factor 2

Warning: If a microservice is connected, Kafka may auto-recreate the topic with replication factor 1 before you can recreate it. If that happens, use kafka-reassign-partitions instead:

# Create reassignment JSON covering both partitions, then copy it into the
# container so kafka-reassign-partitions can read it
echo '{"version":1,"partitions":[{"topic":"adventuretube-data","partition":0,"replicas":[2,1]},{"topic":"adventuretube-data","partition":1,"replicas":[1,2]}]}' > /tmp/reassign.json
docker cp /tmp/reassign.json kafka1:/tmp/reassign.json

# Execute reassignment
docker exec kafka1 kafka-reassign-partitions --bootstrap-server 192.168.1.199:29092 \
  --reassignment-json-file /tmp/reassign.json --execute

# Verify completion
docker exec kafka1 kafka-reassign-partitions --bootstrap-server 192.168.1.199:29092 \
  --reassignment-json-file /tmp/reassign.json --verify
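
If you'd rather drive the reassignment from code, the AdminClient exposes the same operation. A sketch mirroring the JSON above (kafka-clients 2.4+ assumed):

import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

// Raises both partitions of adventuretube-data to two replicas, matching
// the layout shown in Step 6 of Section 5.
public class RaiseReplication {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "192.168.1.199:29092");
        try (AdminClient admin = AdminClient.create(props)) {
            admin.alterPartitionReassignments(Map.of(
                    new TopicPartition("adventuretube-data", 0),
                    Optional.of(new NewPartitionReassignment(List.of(2, 1))),
                    new TopicPartition("adventuretube-data", 1),
                    Optional.of(new NewPartitionReassignment(List.of(1, 2)))
            )).all().get();
        }
    }
}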

6. Troubleshooting

kafka2 joins but replication hangs (Isr missing broker 2)

Cause: INTERNAL advertised listener uses container hostname (kafka2:19092) instead of host IP. Brokers on different Docker hosts cannot resolve each other’s container hostnames.

Fix: Use host IPs for INTERNAL advertised listeners:

# kafka1 on PI1
KAFKA_ADVERTISED_LISTENERS: INTERNAL://192.168.1.199:19092,...
# kafka2 on PI2
KAFKA_ADVERTISED_LISTENERS: INTERNAL://192.168.1.105:19092,...

And expose port 19092 on both hosts.

Topic auto-recreates after deletion

Cause: A connected microservice or kafka-exporter triggers auto-topic-creation.

Fix: Use kafka-reassign-partitions to change replication factor in-place (see Section 5), or stop all consumers before deleting.

kafka2 logs show “Connection to controller refused”

Cause: Port 9093 not exposed on PI1, or KAFKA_CONTROLLER_QUORUM_VOTERS points to wrong IP.

Fix: Ensure PI1’s docker-compose exposes port 9093 and the voters config uses PI1’s IP.
