Apache Kafka — Interview Questions

Stack context: This system uses Apache Kafka (Confluent 7.6.1, KRaft mode) as the event-streaming backbone for asynchronous inter-service communication. Topics include order.created, payment.processed, and order.status.updated. Avro + Confluent Schema Registry is used for serialization, and Spring's @RetryableTopic handles retry/DLT flows.

Q1 — What is Apache Kafka and how does it differ from a traditional message queue? `junior`

Answer: Apache Kafka is a distributed, append-only event log. Unlike a traditional message queue (RabbitMQ, ActiveMQ) where a message is removed once consumed, Kafka retains messages on disk for a configurable retention period. Multiple independent consumers can read the same messages at different offsets.

Key differences:

Persistence: Kafka stores messages regardless of consumption; queues delete after acknowledgment.
Multiple consumers: Kafka supports independent consumer groups, each with their own offset. Queues deliver each message to one consumer only.
Ordering: Kafka guarantees ordering within a partition. Queues typically offer FIFO but not strict partition-level ordering.
Throughput: Kafka is designed for millions of messages per second using sequential disk I/O and batching.

Real use case in this system: The order.created topic is consumed independently by payment-processors and notification-processors. Both groups receive every event without interfering with each other.

Q2 — What are topics, partitions, and offsets? `junior`

Answer:

Topic: A named category/stream of messages (e.g., order.created). Think of it as a database table.
Partition: A topic is split into N ordered, immutable sub-logs. Each partition is stored independently and can be hosted on a different broker. This enables horizontal scalability.
Offset: A monotonically increasing integer assigned to each message within a partition. Each consumer group tracks its own committed offset per partition, enabling independent replay and resume-on-restart.

order.created (3 partitions):
  Partition 0: [offset-0, offset-1, offset-2, ...]
  Partition 1: [offset-0, offset-1, ...]
  Partition 2: [offset-0, ...]

Real use case: In this system order.created has 3 partitions. Orders are keyed by orderId, so all events for the same order always land in the same partition, guaranteeing per-order processing order.

Q3 — What is a consumer group and how does partition assignment work? `junior`

Answer: A consumer group is a logical grouping of consumers that share the work of consuming a topic. Kafka assigns each partition to exactly one consumer within a group, ensuring each message is processed once per group.

Rules:

If consumers < partitions: some consumers handle multiple partitions.
If consumers = partitions: 1:1 assignment.
If consumers > partitions: extra consumers are idle (no partition assigned).

When a consumer joins or leaves, Kafka triggers a rebalance to redistribute partitions.

Real use case: payment-processors group has consumers equal to the 3 partitions of order.created. Each consumer handles one partition. If a payment-service pod crashes, Kafka rebalances and the remaining consumers absorb its partition.

Q4 — What is `auto.offset.reset` and when would you use `earliest` vs `latest`? `junior`

Answer: auto.offset.reset determines where a new consumer group (no committed offset yet) starts reading:

earliest: Read from the very beginning of the topic (offset 0 of each partition).
latest: Skip all existing messages; start reading only new ones produced after the consumer starts.

When to use each:

earliest: Testing, data reprocessing, event sourcing catch-up. Used in this system for local dev so new service pods can process all historic events.
latest: Production consumers that should only handle live traffic and not backfill historical data (e.g., live dashboards).

Q5 — Explain the `@KafkaListener` annotation and how it works in Spring. `junior`

Answer: @KafkaListener marks a method as a Kafka message handler. Spring's KafkaListenerContainerFactory creates a MessageListenerContainer that manages the consumer loop, offset commit, and thread management.

@KafkaListener(
    topics = "order.created",
    groupId = "payment-processors",
    containerFactory = "kafkaListenerContainerFactory"
)
public void handleOrder(OrderCreatedEvent event) { ... }

Key behaviors:

Deserialization happens via configured Deserializer (Avro in this system).
By default, offsets are committed after the method returns successfully (at-least-once delivery).
Exceptions propagate to the configured ErrorHandler (or @RetryableTopic if configured).

Q6 — What is Avro and why use it instead of JSON for Kafka messages? `junior`

Answer: Apache Avro is a binary serialization format with a schema defined in JSON. Compared to JSON:

Feature	Avro	JSON
Size	Compact binary (~5-10× smaller)	Verbose text
Schema enforcement	Required at produce/consume	Optional
Evolution	Supports backward/forward compatibility	No built-in rules
Performance	Fast binary encode/decode	Slower text parse

Schema Registry integration: Avro schemas are registered with Confluent Schema Registry. Only the schema ID (4 bytes) is sent in the message payload. Consumers fetch the schema by ID to deserialize.

Real use case: This system uses Avro for OrderCreatedEvent, PaymentProcessedEvent, and OrderStatusUpdatedEvent. Schema Registry prevents a producer from publishing a breaking schema change.

Q7 — What is the Confluent Schema Registry and how does it prevent breaking changes? `junior`

Answer: Schema Registry is a centralized service that stores and versions Avro (or Protobuf/JSON Schema) schemas. Producers register a schema before publishing; consumers retrieve schemas by ID.

Compatibility levels (configured per subject):

BACKWARD: New schema can read data written by the previous schema (consumers can upgrade first).
FORWARD: Previous schema can read data written by the new schema (producers can upgrade first).
FULL: Both directions.

If a proposed schema violates the compatibility rule (e.g., removing a required field), the Registry rejects it and the producer fails to start — preventing runtime deserialization errors.

Q8 — What is `@RetryableTopic` and how does Spring implement retry logic with it? `junior`

Answer: @RetryableTopic is a Spring Kafka annotation that configures non-blocking retry using separate retry topics instead of blocking the main consumer thread.

@RetryableTopic(
    attempts = "4",
    backoff = @Backoff(delay = 1000, multiplier = 2.0),
    dltTopicSuffix = ".DLT"
)
@KafkaListener(topics = "order.created", groupId = "payment-processors")
public void processPayment(OrderCreatedEvent event) { ... }

Spring auto-creates order.created-retry-0, order.created-retry-1, order.created-retry-2, and order.created.DLT. On exception, the message is forwarded to the next retry topic with a header indicating the delay. After all attempts are exhausted, the message lands in the DLT.

Advantage over blocking retry: The main consumer continues processing other messages while retries are scheduled, preventing a single bad message from stalling the entire partition.

Q9 — What is a Dead Letter Topic (DLT) and how should it be handled in production? `senior`

Answer: A DLT receives messages that failed all retry attempts. It serves as a "poison pill parking lot" that preserves failed events for investigation and reprocessing.

Production handling strategy:

Monitor: Alert on DLT consumer-group lag > 0 (PagerDuty/Datadog).
@DltHandler: Annotate a method to process DLT messages — log structured context (orderId, error type, attempt count, headers).
Manual replay: After fixing the root cause, re-publish DLT messages back to the main topic.
Automatic replay (optional): A scheduled job re-publishes DLT messages after a cooldown period.

@DltHandler
public void handleDlt(OrderCreatedEvent event, @Header KafkaHeaders.RECEIVED_TOPIC String topic) {
    log.error("DLT received: orderId={}, topic={}", event.getOrderId(), topic);
    // persist to dead_letters table for ops dashboard
}

This system: order.created.DLT has 3 partitions. The @DltHandler logs the failure; ops teams can replay via the admin endpoint.

Q10 — What is the Transactional Outbox Pattern and why is it needed? `senior`

Answer: The Transactional Outbox solves the dual-write problem: you cannot atomically write to both a database and Kafka in a single transaction (they are different systems).

Problem: order-service creates an order in PostgreSQL and publishes to Kafka. If the app crashes after the DB commit but before the Kafka publish, the order exists but no event is sent — downstream services never know.

Solution: Write the event to an outbox table in the same DB transaction as the order. A separate OutboxPublisher polls the outbox, publishes to Kafka, then marks the row as sent.

BEGIN TRANSACTION
  INSERT INTO orders (...)
  INSERT INTO outbox (event_type, payload, published=false)
COMMIT

-- Separate thread:
SELECT * FROM outbox WHERE published = false
  → KafkaTemplate.send(...)
  → UPDATE outbox SET published = true WHERE id = ?

Guarantees: At-least-once delivery (idempotent consumers needed). The order and event are always consistent.

This system: order-service implements this with OutboxPublisher polling every 100 ms.

Q11 — Explain Kafka's delivery semantics: at-most-once, at-least-once, exactly-once. `senior`

Answer:

Semantic	Description	Config	Risk
At-most-once	Commit offset before processing	`enable.auto.commit=true`, early commit	Message loss on consumer crash
At-least-once	Commit after processing	Manual commit after handler returns	Duplicate processing on crash
Exactly-once	Idempotent producer + transactions	`enable.idempotence=true`, `isolation.level=read_committed`	Higher latency, complexity

At-least-once + idempotent consumers is the pragmatic choice for most systems. This system uses it: @RetryableTopic may redeliver the same event, so payment-service checks if an order payment already exists before processing.

Exactly-once is appropriate for financial ledgers or inventory deductions where duplicates are unacceptable and the overhead is justified.

Q12 — How does Kafka KRaft mode differ from ZooKeeper-based Kafka? `senior`

Answer: KRaft (Kafka Raft) replaces ZooKeeper as the metadata store in Kafka 3.x+. The cluster elects a Controller Quorum using the Raft consensus algorithm, and metadata is stored in an internal __cluster_metadata topic.

Benefits of KRaft:

Eliminates ZooKeeper operational overhead (separate JVM cluster to maintain).
Faster metadata propagation (metadata stored in Kafka itself).
Single system to monitor, secure, and scale.
Supports larger clusters (ZK had practical limits ~tens of thousands of partitions; KRaft supports millions).
Simplifies Docker/Kubernetes deployments (one container type instead of two).

This system: Uses KRaft mode (docker-compose.yml runs Kafka with KAFKA_KRAFT_MODE=true, no ZooKeeper container).

Q13 — What is consumer lag and how do you monitor and alert on it? `senior`

Answer: Consumer lag is the difference between the latest produced offset and the last committed consumer offset for a partition:

lag = latest_offset - committed_offset

High lag means consumers are falling behind producers — events are queuing up.

Monitoring:

kafka-consumer-groups.sh --describe (CLI)
JMX metrics: kafka.consumer:type=consumer-fetch-manager-metrics,attribute=records-lag-max
Kafka Exporter + Prometheus + Grafana (common production setup)

Alerting thresholds (example):

Warning: lag > 1,000 for more than 2 minutes
Critical: lag > 10,000 or lag growing at > 500/min

Remediation:

Add more consumers (up to partition count)
Increase consumer processing throughput (tune max.poll.records, optimize DB calls)
Investigate DLT — stuck DLT processing can block main topic commits

Q14 — How does Kafka guarantee ordering and what are its limitations? `senior`

Answer: Kafka guarantees ordering within a partition. Messages with the same key always go to the same partition (via key hash % partition count). A single consumer per partition in a consumer group processes them in order.

Limitations:

Cross-partition ordering: No guarantee. If an order has events in partition 0 and partition 1, they may be processed out of sequence (shouldn't happen with key-based routing, but edge cases exist on partition count change).
Rebalance window: During consumer rebalance, the previous consumer may not have committed its last offset. The new consumer re-processes from the last commit — potential duplicates.
Repartitioning: Increasing partition count breaks key-to-partition mapping for existing keys until all messages are consumed. Plan partition count upfront.

Q15 — What are idempotent producers and when are they important? `senior`

Answer: An idempotent producer ensures that retrying a failed send() does not result in duplicate messages in Kafka. Each producer is assigned a Producer ID (PID), and each message gets a sequence number. Kafka brokers track PID+sequence and deduplicate retried messages.

Enable: enable.idempotence=true (default in Kafka 3+).

When important:

Network blips where the broker receives a message but the ACK is lost, causing the producer to retry.
Combined with acks=all and retries=Integer.MAX_VALUE for strongest durability.

Transactional producers: Extend idempotency across partitions and topics — required for exactly-once semantics.

Q16 — Explain `acks` configuration and its impact on durability vs. throughput. `senior`

Answer:

`acks`	Meaning	Durability	Throughput
`0`	Fire and forget — no ACK waited	Lowest	Highest
`1`	Leader ACK only	Medium	Medium
`all` (`-1`)	All in-sync replicas (ISR) ACK	Highest	Lowest

acks=all combined with min.insync.replicas=2 ensures that at least 2 replicas have the message before the producer gets success. If the leader crashes before replication, the message isn't lost.

This system: Order events use acks=all (configured via Spring spring.kafka.producer.acks=all) because losing an order event is unacceptable. Payment notifications can tolerate acks=1.

Q17 — What is a Kafka compacted topic and when would you use one? `senior`

Answer: A compacted topic retains only the latest value for each key, removing older versions during log compaction. It behaves like a changelog or snapshot store rather than an event log.

Use cases:

CDC (Change Data Capture): Latest state of a database row keyed by primary key.
Configuration store: Downstream services subscribe to get current configuration.
Materialized views: Reconstruct current state without replaying entire history.

How compaction works: A background thread (log cleaner) periodically scans log segments, keeps the latest record per key, and discards older ones. Tombstones (null value) signal deletion.

This system: Does not use compacted topics currently, but product.catalog.updated would be a candidate — consumers need the latest product state, not full history.

Q18 — How do you handle schema evolution in Avro without breaking consumers? `senior`

Answer: Schema Registry enforces compatibility rules. For backward compatibility (consumers can read new schema with the old one):

Safe: Adding optional fields with defaults.
Safe: Removing optional fields.
Unsafe: Removing required fields, renaming fields, changing types.

Evolution example:

// V1
{"name": "OrderCreatedEvent", "fields": [{"name": "orderId", "type": "string"}]}

// V2 — backward compatible (new optional field with default)
{"name": "OrderCreatedEvent", "fields": [
  {"name": "orderId", "type": "string"},
  {"name": "priority", "type": ["null", "string"], "default": null}
]}

Rolling upgrade strategy:

Register V2 schema.
Deploy consumers first (they handle V1 messages with default for missing field).
Deploy producers (publish V2).

This system: Deploy order-service consumers before order-service producers when adding new event fields.

Q19 — What is the difference between `poll()` timeout and `max.poll.interval.ms`? `senior`

Answer:

poll() timeout: The duration poll() blocks waiting for records. If records are available immediately, it returns sooner. Set it to a value that balances responsiveness and CPU.
max.poll.interval.ms: The maximum time between consecutive poll() calls. If a consumer doesn't call poll() within this window (e.g., processing takes too long), the broker considers it dead and triggers a rebalance.

Common mistake: Setting max.poll.records high and having slow message processing. The consumer processes 500 records but takes 6 minutes, exceeding max.poll.interval.ms=300000 (5 min default) → rebalance → duplicate processing.

Fix: Reduce max.poll.records, increase max.poll.interval.ms accordingly, or process records asynchronously (with care around offset commit ordering).

Q20 — How does partition rebalance work with the cooperative sticky assignor? `senior`

Answer: The default Eager Assignor (Range/RoundRobin) triggers a "stop-the-world" rebalance: all consumers revoke ALL their partitions, then re-assign. This causes processing gaps.

The Cooperative Sticky Assignor (available since Kafka 2.4) implements incremental rebalancing:

Consumers propose their current assignments.
The Group Coordinator identifies partitions that need to move.
Only those partitions are revoked and reassigned; other partitions continue processing.

Benefits: No processing pause for partitions that don't move. Reduces consumer lag spike on rebalance.

Configure: partition.assignment.strategy=org.apache.kafka.clients.consumer.CooperativeStickyAssignor

Q21 — How would you implement exactly-once processing end-to-end? `senior`

Answer: Full exactly-once requires coordination at three layers:

Idempotent producer (enable.idempotence=true): Deduplicates retried produce calls within a session.
Transactional producer: Groups multiple sends into an atomic unit.
Consumer isolation (isolation.level=read_committed): Consumers only see committed transactional messages, not in-flight or aborted ones.

producer.initTransactions();
producer.beginTransaction();
try {
    producer.send(record1);
    producer.send(record2);
    producer.sendOffsetsToTransaction(offsets, consumerGroupMetadata);
    producer.commitTransaction();
} catch (Exception e) {
    producer.abortTransaction();
}

Trade-offs: ~10-20% throughput reduction. Only appropriate for financial or inventory systems where duplicates are catastrophic. Most systems accept at-least-once + idempotent consumers.

Q22 — What is the role of `min.insync.replicas` and how does it interact with `acks=all`? `senior`

Answer: min.insync.replicas (broker/topic config) defines the minimum number of replicas that must acknowledge a write for it to succeed when acks=all.

Formula: replication.factor >= min.insync.replicas + 1 (leave room for one broker to be offline).

Example (replication.factor=3, min.insync.replicas=2):

Broker 1 (leader) + Broker 2 (ISR) = 2 replicas acknowledge → write succeeds.
Broker 1 (leader) only → write fails with NotEnoughReplicasException.

Meaning: With this config, you can lose 1 broker and still write. You cannot lose 2 brokers simultaneously.

This system: Using default single-node Kafka in Docker for dev/test. Production would use replication.factor=3, min.insync.replicas=2.

Q23 — How would you debug a consumer that is not receiving messages? `junior`

Answer: Systematic debugging steps:

Check consumer group status: kafka-consumer-groups.sh --describe --group payment-processors — verify members and assigned partitions.
Check lag: Is lag 0 (caught up) or growing?
Check topic existence: kafka-topics.sh --list — verify topic name (typo in config?).
Check auto.offset.reset: Is it latest and the consumer started before messages were produced?
Check deserializer: Deserialization errors cause the consumer to log and skip (or throw, depending on error handler).
Check Spring logs: Enable logging.level.org.springframework.kafka=DEBUG.
Produce a test message manually: kafka-console-producer.sh → see if consumer receives it.
Check ACL: Is the consumer authorized to read the topic (in secured clusters)?

Q24 — What is the difference between `commitSync` and `commitAsync`? `junior`

Answer:

commitSync(): Blocks until the broker acknowledges the offset commit. Retries on failure. Guarantees the commit succeeded before returning — safe but adds latency.
commitAsync(): Non-blocking. Fires the commit and returns immediately with a callback for success/failure. Higher throughput but riskier on shutdown.

Best practice: Use commitAsync() during normal processing for throughput, and commitSync() in the finally block on shutdown to ensure the last batch is committed.

Spring default: AckMode.BATCH — commits offsets after each poll batch returns. This is a form of async commit optimized for throughput.

Q25 — How do you test Kafka consumers and producers in unit/integration tests? `junior`

Answer: Unit tests (no broker needed):

Use MockProducer and MockConsumer from kafka-clients.
Test message handling logic directly by invoking the listener method with a mock payload.

Integration tests (with Testcontainers):

@Testcontainers
class OrderEventConsumerTest {
    @Container
    static KafkaContainer kafka = new KafkaContainer(DockerImageName.parse("confluentinc/cp-kafka:7.6.1"));

    @Test
    void shouldProcessOrderCreated() {
        // produce test message → await consumer invocation → assert side effects
    }
}

This system: Uses @EmbeddedKafka (Spring) for per-service tests and DockerComposeContainer for full E2E tests. KafkaTestUtils provides getSingleRecord() helper.

Q26 — What is a Kafka Streams application and how does it differ from a consumer group? `senior`

Answer: Kafka Streams is a library for stateful, exactly-once stream processing within a JVM application:

Joins, aggregations, windowing across multiple topics.
Maintains local state stores (RocksDB) backed by changelog topics.
Fault-tolerant: state can be rebuilt from changelog on restart.
No separate cluster: runs inside your application.

vs. Consumer group:

	Consumer Group	Kafka Streams
Processing	Stateless per-message	Stateful (joins, windows)
State	External (DB)	Local state store + changelog
Exactly-once	Complex to achieve	Built-in with `processing.guarantee=exactly_once_v2`
Complexity	Low	Higher

This system: Does not use Kafka Streams. Payment and notification services use simple consumer groups because they don't require stateful aggregation. Streams would be appropriate for "calculate total order value per customer per hour" analytics.

Q27 — How does Kafka handle backpressure? `senior`

Answer: Kafka does not have built-in backpressure signaling from consumer to producer. Instead, backpressure is managed through:

Consumer lag accumulation: Messages queue up in the topic. Producers write at their rate; consumers read at their rate. The gap (lag) absorbs the burst.
Producer blocking: If the producer's internal buffer (buffer.memory) fills up, send() blocks for max.block.ms before throwing an exception.
Application-level throttling: Rate limit producers based on consumer lag metrics (adaptive rate limiting).
Spring reactive WebFlux: In this system's API Gateway, reactive backpressure via Project Reactor Flux.limitRate() can throttle upstream requests.

This system: The order.created topic acts as a buffer. If payment-service is slow (DB contention during sales peak), lag grows in the topic. No messages are lost; they process when capacity is available.

Q28 — Explain Kafka's log segment structure and how retention works. `senior`

Answer: Each partition is stored as a series of log segments on disk:

order.created-0/
  00000000000000000000.log     ← message data
  00000000000000000000.index   ← offset → byte position index
  00000000000000000000.timeindex ← timestamp → offset index
  00000000000001000000.log     ← next segment (after roll)

A new segment is created when the active segment reaches log.segment.bytes (default 1 GB) or log.roll.ms (default 7 days).

Retention modes:

Time-based (log.retention.hours=168): Delete segments older than N hours.
Size-based (log.retention.bytes): Delete oldest segments when total partition size exceeds limit.
Compaction (per-topic): Retain only latest value per key (described in Q17).

This system: Default 7-day retention. Sufficient for retry/DLT scenarios. A payment DLT message is never older than days before ops investigates.

Q29 — How would you increase Kafka throughput for a high-volume producer? `senior`

Answer: Key producer throughput tunings:

Config	Default	High-throughput Value
`batch.size`	16 KB	65536 (64 KB) or higher
`linger.ms`	0	10-50 ms
`compression.type`	none	`lz4` or `snappy`
`buffer.memory`	32 MB	128 MB+
`acks`	`all`	`1` (if durability trade-off acceptable)

linger.ms > 0 causes the producer to wait briefly before sending, allowing more messages to batch together. Combined with larger batch.size, this dramatically increases throughput at a small latency cost.

Consumer throughput:

Increase fetch.min.bytes and fetch.max.wait.ms to reduce fetch round-trips.
Increase max.poll.records.
Scale consumers up to partition count.

Q30 — How would you design a multi-tenant Kafka setup where tenant data must be isolated? `senior`

Answer: Three main strategies:

1. Topic-per-tenant:

Topic naming: tenantA.order.created, tenantB.order.created
Full isolation, simple ACLs.
Scaling problem: 1,000 tenants × 10 topics = 10,000 topics → ZooKeeper/KRaft overhead.

2. Partition-per-tenant:

Single topic with partition key = tenantId.
Consumers filter by tenant header.
Limited to partition count tenants; no ACL isolation.

3. Kafka cluster-per-tenant (dedicated cluster):

Full isolation: storage, throughput, security.
Highest cost and operational burden.
Required for strict data residency compliance (GDPR, HIPAA).

Recommended (SaaS):

Small/medium tenants: shared topics with tenant key routing + ACL filtering.
Large/enterprise tenants: dedicated topics or clusters.
Use Kafka's built-in ACLs (kafka-acls.sh) to restrict consumer group access per topic.

This system: Single-tenant. No isolation needed. Would use topic-per-tenant if extended to multi-tenant SaaS.

Q31 — What is Kafka Connect and when would you use it? `junior`

Answer: Kafka Connect is a framework for streaming data between Kafka and external systems (databases, object stores, search engines, etc.) using reusable connectors — without writing custom producer/consumer code.

Two connector types:

Source connector: Reads from an external system → publishes to Kafka topic (e.g., Debezium reads PostgreSQL WAL → order.events topic).
Sink connector: Reads from Kafka topic → writes to external system (e.g., Elasticsearch Sink Connector → search index).

// Example: Debezium PostgreSQL source connector config
{
  "name": "orders-cdc-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.dbname": "orders_db",
    "database.server.name": "orders",
    "table.include.list": "public.orders",
    "plugin.name": "pgoutput"
  }
}

When to use: CDC pipelines, ETL into data warehouses, syncing Kafka topics to Elasticsearch for search. Avoids writing boilerplate producer/consumer code for standard integrations.

This system: Not used directly, but Debezium Connect would be the natural choice to replace the manual OutboxPublisher polling with CDC-based outbox relay.

Q32 — How do you choose the right number of partitions for a topic? `senior`

Answer: Partitions determine parallelism and throughput. Over-partitioning wastes resources; under-partitioning limits throughput.

Formula starting point:

partitions = max(target_throughput / producer_throughput_per_partition,
                 target_throughput / consumer_throughput_per_partition)

Practical guidelines:

Start with partitions = max_parallelism_needed (e.g., number of consumer instances you plan to run).
A single partition typically supports 10–50 MB/s write throughput depending on hardware.
For low-volume topics: 1–3 partitions is fine.
For high-volume topics: start at 6–12, benchmark, then scale.

Factors to consider:

Factor	Impact
Target consumer parallelism	1 partition per consumer max
Message ordering requirements	More partitions → harder to maintain global order
Replication overhead	`partitions × replication.factor` log files on each broker
Retention size	Each partition is stored separately

Warning: You can increase partitions later, but it changes the key-to-partition mapping. This breaks ordering guarantees for key-based producers. Plan conservatively upfront for order-sensitive topics.

This system: order.created uses 3 partitions — matching the 3 payment-service pods. If we scale to 6 pods, we'd increase to 6 partitions during a maintenance window with consumer group re-assignment.

Q33 — What is the difference between key-based partitioning and round-robin? `junior`

Answer:

Key-based (default): The producer hashes the message key using murmur2(key) % numPartitions. All messages with the same key go to the same partition, guaranteeing ordering per key.
Round-robin (null key or RoundRobinPartitioner): Messages are distributed evenly across partitions in rotation. No ordering guarantee across messages.

// Key-based — orderId ensures all events for the same order go to the same partition
ProducerRecord<String, OrderCreatedEvent> record =
    new ProducerRecord<>("order.created", order.getId().toString(), event);

// Round-robin — no key
ProducerRecord<String, LogEvent> logRecord =
    new ProducerRecord<>("app.logs", null, logEvent);

When to use round-robin: Logs, metrics, or any data where ordering doesn't matter and you want maximum throughput distribution.

When to use key-based: Domain events where processing order matters (e.g., all events for orderId=123 must process in sequence: created → paid → shipped).

This system: order.created, payment.processed, and order.status.updated all use orderId as the key.

Q34 — How does Kafka handle message compression and what are the tradeoffs? `junior`

Answer: Kafka supports producer-side compression. The producer compresses a batch of messages; the broker stores the compressed batch; consumers decompress.

Supported codecs:

Codec	Compression Ratio	CPU Cost	Best For
`none`	1×	None	Low-volume or pre-compressed data
`gzip`	4–8×	High	High ratio needed, CPU available
`snappy`	2–4×	Low	Balanced (Google's default)
`lz4`	2–4×	Very low	Lowest latency
`zstd`	4–8×	Medium	Best ratio/speed tradeoff (Kafka 2.1+)

# Spring Boot producer config
spring:
  kafka:
    producer:
      compression-type: lz4
      batch-size: 65536      # 64 KB batch
      linger-ms: 10          # wait up to 10ms to fill batch

Key insight: Compression works best with larger batches (linger.ms > 0). Compressing single tiny messages has overhead without benefit.

This system: Uses lz4 for Avro-serialized order events. Avro is already compact binary, so compression gain is modest (~20%), but latency savings from smaller network payloads justify it.

Q35 — What are Kafka message headers and how are they used? `junior`

Answer: Kafka headers are key-value metadata attached to a message, separate from the message key and value. They don't affect partitioning and are stored as byte arrays.

Common uses:

Tracing: propagate X-Correlation-ID, X-Trace-ID across services.
Retry metadata: @RetryableTopic uses headers like kafka_original-topic, kafka_exception-message, kafka_backoff-timestamp.
Schema version: custom schema version indicator.
Source system: identify which service produced the message.

// Producer: add correlation ID header
@Autowired KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate;

public void publish(OrderCreatedEvent event, String correlationId) {
    var record = new ProducerRecord<>("order.created", event.getOrderId(), event);
    record.headers().add("X-Correlation-ID", correlationId.getBytes(StandardCharsets.UTF_8));
    kafkaTemplate.send(record);
}

// Consumer: read header
@KafkaListener(topics = "order.created", groupId = "payment-processors")
public void handle(
    OrderCreatedEvent event,
    @Header(value = "X-Correlation-ID", required = false) byte[] correlationIdBytes
) {
    String correlationId = correlationIdBytes != null
        ? new String(correlationIdBytes, StandardCharsets.UTF_8)
        : UUID.randomUUID().toString();
    MDC.put("correlationId", correlationId);
    // process...
}

This system: Propagates X-Correlation-ID from the HTTP request through Kafka events to enable distributed tracing across order → payment → notification flow.

Q36 — How do you produce messages with Spring Boot's `KafkaTemplate`? `junior`

Answer: KafkaTemplate is Spring Kafka's high-level producer abstraction that wraps the native Kafka Producer. It handles serialization, threading, and async callbacks.

@Service
public class OrderEventPublisher {

    private final KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate;

    public OrderEventPublisher(KafkaTemplate<String, OrderCreatedEvent> kafkaTemplate) {
        this.kafkaTemplate = kafkaTemplate;
    }

    public void publish(OrderCreatedEvent event) {
        CompletableFuture<SendResult<String, OrderCreatedEvent>> future =
            kafkaTemplate.send("order.created", event.getOrderId(), event);

        future.whenComplete((result, ex) -> {
            if (ex != null) {
                log.error("Failed to publish order event: orderId={}", event.getOrderId(), ex);
            } else {
                RecordMetadata meta = result.getRecordMetadata();
                log.info("Published to partition={} offset={}",
                    meta.partition(), meta.offset());
            }
        });
    }
}

# application.yml
spring:
  kafka:
    producer:
      bootstrap-servers: localhost:9092
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: io.confluent.kafka.serializers.KafkaAvroSerializer
      acks: all
      retries: 3
      enable-idempotence: true
    properties:
      schema.registry.url: http://localhost:8081

Key methods:

kafkaTemplate.send(topic, key, value) — async send, returns CompletableFuture.
kafkaTemplate.sendDefault(key, value) — sends to the configured default topic.
kafkaTemplate.executeInTransaction(ops -> ...) — transactional batch send.

Q37 — What is the purpose of `fetch.min.bytes` and `fetch.max.wait.ms`? `senior`

Answer: These two configs control how the broker responds to consumer fetch requests, trading latency vs. throughput:

fetch.min.bytes (default: 1 byte): The broker waits until it has at least this many bytes of data before responding to a fetch request. Increasing it allows the broker to batch more data per response, reducing fetch round-trips.
fetch.max.wait.ms (default: 500 ms): The maximum time the broker will wait if fetch.min.bytes hasn't been satisfied yet. Acts as a ceiling so consumers don't wait forever on low-volume topics.

Consumer sends FETCH request
  → Broker checks: do I have ≥ fetch.min.bytes?
      YES → respond immediately
      NO  → wait up to fetch.max.wait.ms, then respond with what's available

Tuning for throughput (high-volume topics):

spring:
  kafka:
    consumer:
      fetch-min-size: 102400     # 100 KB — wait for 100 KB before responding
      fetch-max-wait: 500        # but no more than 500ms
      max-poll-records: 500

Tuning for low latency (real-time alerting):

spring:
  kafka:
    consumer:
      fetch-min-size: 1          # respond immediately with any data
      fetch-max-wait: 50         # max 50ms wait

This system: Notification service uses low fetch-min-size=1 for near-real-time delivery of order.status.updated events.

Q38 — What is the `__consumer_offsets` internal topic? `senior`

Answer: __consumer_offsets is a Kafka-internal compacted topic that stores committed consumer group offsets. Before Kafka 0.9, offsets were stored in ZooKeeper; since 0.9, they're stored in this topic for better scalability and availability.

Structure:

Key: (group_id, topic, partition) — identifies a specific consumer group's position in a partition.
Value: offset + metadata + timestamp — the committed offset and optional metadata string.

50 partitions by default (controlled by offsets.topic.num.partitions). Each consumer group is assigned to a partition of __consumer_offsets based on hash(group_id) % 50.

How to inspect (debugging):

# List all consumer groups
kafka-consumer-groups.sh --bootstrap-server localhost:9092 --list

# Describe a group (shows committed offsets, lag, and assigned members)
kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group payment-processors

# Output:
# GROUP               TOPIC          PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG
# payment-processors  order.created  0          1523            1523            0
# payment-processors  order.created  1          1201            1205            4  ← lag!

This system: Used for monitoring. A lag > 0 alert on payment-processors group triggers investigation into payment-service pod health.

Q39 — How do you implement consumer pause and resume for rate limiting? `senior`

Answer: Kafka consumers support pause(partitions) and resume(partitions) to temporarily stop fetching from specific partitions without leaving the consumer group or triggering a rebalance.

Use cases:

Rate limit downstream services (e.g., external payment gateway has a 100 req/s limit).
Circuit breaker: pause consumption when downstream is degraded.
Backpressure from a slow database write path.

@Service
public class RateLimitedPaymentConsumer {

    private final RateLimiter rateLimiter = RateLimiter.create(100); // 100 events/sec

    @KafkaListener(topics = "order.created", groupId = "payment-processors",
                   containerFactory = "kafkaListenerContainerFactory")
    public void handle(OrderCreatedEvent event,
                       Acknowledgment ack,
                       Consumer<?, ?> consumer) {
        // Pause if rate limit exceeded
        if (!rateLimiter.tryAcquire()) {
            Collection<TopicPartition> assigned = consumer.assignment();
            consumer.pause(assigned);

            // Resume after a short delay (Spring will call poll() again)
            // In practice, use a scheduled task or AOP-based circuit breaker
        }

        processPayment(event);
        ack.acknowledge();

        // Resume paused partitions after processing
        if (!consumer.paused().isEmpty()) {
            consumer.resume(consumer.paused());
        }
    }
}

Spring approach: Use KafkaListenerEndpointRegistry to pause/resume the entire listener container:

@Autowired KafkaListenerEndpointRegistry registry;

// Pause
registry.getListenerContainer("paymentListener").pause();

// Resume
registry.getListenerContainer("paymentListener").resume();

Q40 — What is static group membership (`group.instance.id`) and why does it matter? `senior`

Answer: By default, Kafka consumers have dynamic membership: each poll() session gets a new member ID, and any disconnect (pod restart, GC pause) triggers a rebalance.

Static membership (group.instance.id) assigns a stable identity to a consumer. When a static member disconnects and reconnects within session.timeout.ms, Kafka recognizes it by instance ID and skips the rebalance, reassigning the same partitions.

# application.yml — give each pod a stable instance ID
spring:
  kafka:
    consumer:
      group-instance-id: payment-processor-pod-${HOSTNAME}
      session-timeout-ms: 60000   # give pod 60s to restart before rebalance

Benefits:

Eliminate unnecessary rebalances during rolling deployments (Kubernetes pod restarts).
Reduce consumer lag spikes — no stop-the-world rebalance during deploys.
State preservation — Kafka Streams benefits most: local state stores stay with the same instance ID.

Tradeoff: If a pod dies permanently (not just restarting), its partitions won't be reassigned until session.timeout.ms expires. Set it shorter than your deployment window.

This system: payment-service pods in Kubernetes use HOSTNAME-based group.instance.id. A rolling deploy of 3 pods takes ~90 seconds but causes zero rebalances.

Q41 — How do you handle large messages in Kafka? `senior`

Answer: Kafka's default max.message.bytes is 1 MB. For larger payloads, you have three strategies:

1. Increase message size limits (simplest, not recommended past ~10 MB):

# Broker topic config
log.message.max.bytes=10485760  # 10 MB

# Producer
spring.kafka.producer.properties.max.request.size=10485760

# Consumer
spring.kafka.consumer.properties.fetch.message.max.bytes=10485760

2. External reference pattern (recommended for large blobs): Store the large payload in object storage (S3, MinIO) and send only a reference in the Kafka message:

// Producer
String s3Key = "orders/large-payload/" + orderId + ".json";
s3Client.putObject(bucket, s3Key, largePayload);

LargeOrderEvent event = LargeOrderEvent.newBuilder()
    .setOrderId(orderId)
    .setPayloadRef(s3Key)   // reference, not data
    .build();
kafkaTemplate.send("order.large", orderId, event);

// Consumer
LargeOrderEvent event = ...;
String payload = s3Client.getObject(bucket, event.getPayloadRef());

3. Kafka chunking (complex, avoid if possible): Split the message into multiple Kafka records with sequence headers, reassemble on the consumer side.

This system: Order events are small Avro records (~200 bytes). If image uploads were included, the external reference pattern would be used with MinIO.

Q42 — What is the difference between `subscribe()` and `assign()` in a Kafka consumer? `junior`

Answer:

	`subscribe()`	`assign()`
Usage	`consumer.subscribe(List.of("order.created"))`	`consumer.assign(List.of(new TopicPartition("order.created", 0)))`
Partition assignment	Kafka Group Coordinator manages it	Manual — you specify exact partitions
Rebalance	Triggered automatically on membership changes	No rebalancing — you own the assignment
Consumer group	Participates in a group	No group coordination (no offset commit to broker group)
Offset management	Group offsets via `commitSync`/`commitAsync`	Manual or `commitSync` with assigned partitions

When to use assign():

Exactly-once replay of specific partitions for testing or backfill.
Custom partition assignment strategies outside Kafka's group protocol.
Reading a partition at a specific offset for debugging.

// Direct partition assignment for reprocessing partition 0 from offset 500
TopicPartition tp = new TopicPartition("order.created", 0);
consumer.assign(List.of(tp));
consumer.seek(tp, 500L);   // start from specific offset
while (true) {
    ConsumerRecords<String, OrderCreatedEvent> records = consumer.poll(Duration.ofMillis(100));
    // process...
}

Spring context: @KafkaListener always uses subscribe() internally. For assign() semantics, use KafkaConsumer directly.

Q43 — How does Spring Kafka's `ConcurrentMessageListenerContainer` achieve parallelism? `senior`

Answer: ConcurrentMessageListenerContainer spins up N internal KafkaMessageListenerContainer instances, each running its own poll loop in a dedicated thread. Each thread is an independent consumer in the same consumer group.

@Bean
public ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>
    kafkaListenerContainerFactory(ConsumerFactory<String, OrderCreatedEvent> consumerFactory) {

    ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent> factory =
        new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory);
    factory.setConcurrency(3);  // 3 threads = 3 consumers in the group
    factory.getContainerProperties().setAckMode(ContainerProperties.AckMode.MANUAL);
    return factory;
}

Or via annotation:

@KafkaListener(
    topics = "order.created",
    groupId = "payment-processors",
    concurrency = "3"   // overrides factory default for this listener
)
public void handle(OrderCreatedEvent event, Acknowledgment ack) { ... }

Thread safety: Each thread has its own KafkaConsumer instance (not thread-safe). Do NOT share a single KafkaConsumer across threads.

Partition assignment: With concurrency=3 and 3 partitions, each thread gets 1 partition. With concurrency=6 and 3 partitions, 3 threads are idle.

This system: payment-service uses concurrency=3 matching the 3 partitions of order.created. All 3 pods × 3 threads = 9 consumers, but only 3 partitions means 6 consumers are idle. Correct configuration: 1 thread per pod = 3 pods = 3 consumers = 1 per partition.

Q44 — What is Kafka MirrorMaker 2 and how does it enable cross-datacenter replication? `senior`

Answer: MirrorMaker 2 (MM2) is a Kafka Connect-based tool for replicating topics between Kafka clusters. It's the production replacement for the original MirrorMaker.

Architecture:

Primary DC (us-east)          Secondary DC (eu-west)
┌─────────────────┐           ┌─────────────────────────┐
│ Kafka Cluster A │──MM2────▶│ Kafka Cluster B          │
│ order.created   │           │ us-east.order.created    │
│ payment.processed│          │ us-east.payment.processed│
└─────────────────┘           └─────────────────────────┘

MM2 prefixes replicated topics with the source cluster alias (us-east.) to avoid naming conflicts. It replicates:

Topic data (messages + partitioning)
Consumer group offsets (translated to the target cluster's offset space)
Topic configurations

Use cases:

Disaster recovery: Failover to secondary cluster on primary outage.
Geo-replication: Serve EU consumers from EU cluster to reduce latency.
Data aggregation: Aggregate events from multiple regional clusters into a global cluster for analytics.

Consumer offset sync: MM2 translates offsets using a checkpointing mechanism, enabling consumer groups to resume on the secondary cluster with minimal lag after failover.

This system: Single-region Docker setup. MM2 would be added if deployed to multi-region production (primary: us-east-1, DR: us-west-2).

Q45 — How do you implement message filtering in a Kafka consumer without creating separate topics? `junior`

Answer: Two approaches for filtering without extra topics:

1. RecordFilterStrategy (Spring Kafka): Configures a filter at the container level. Filtered records are acknowledged but not passed to the listener method.

@Bean
public ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>
    filteredListenerContainerFactory(ConsumerFactory<String, OrderCreatedEvent> cf) {

    var factory = new ConcurrentKafkaListenerContainerFactory<String, OrderCreatedEvent>();
    factory.setConsumerFactory(cf);
    // Filter: only process HIGH_VALUE orders (> $500)
    factory.setRecordFilterStrategy(record ->
        record.value().getTotalAmount().compareTo(new BigDecimal("500")) < 0
    );
    factory.setAckDiscarded(true); // auto-ack filtered messages
    return factory;
}

2. Header-based routing in the listener method:

@KafkaListener(topics = "order.created", groupId = "fraud-detectors")
public void handle(
    OrderCreatedEvent event,
    @Header("order-region") String region
) {
    if (!"EU".equals(region)) {
        return; // skip non-EU orders silently
    }
    runFraudCheck(event);
}

Tradeoff: Consumer still fetches all messages from Kafka — filtering is application-level. For high-volume topics where only a small fraction is relevant, consider a dedicated filtered topic (produced by a streaming processor) to reduce consumer load.

Q46 — What are the tradeoffs between Avro, Protobuf, and JSON Schema with Schema Registry? `senior`

Answer:

	Avro	Protobuf	JSON Schema
Wire format	Binary	Binary	JSON text
Schema language	JSON	`.proto` IDL	JSON Schema
Size	Smallest	Smallest	Largest (5–10×)
Speed	Fast	Fastest	Slowest
Language support	Wide	Widest	Universal
Schema evolution	Backward/forward with defaults	Field number-based, robust	Limited enforcement
Tooling	Strong with Confluent	Strong everywhere	Basic
Human readable	No	No	Yes

When to choose:

Avro: Default choice for Kafka with Confluent Schema Registry. Best schema enforcement + compact binary. Ideal for internal services.
Protobuf: When you already use gRPC/Protobuf for service communication or need maximum performance. Field number-based evolution is more robust than Avro's name-based.
JSON Schema: When consumers are external systems, browser clients, or teams that need human-readable payloads without binary tooling.

Schema Registry supports all three with the same compatibility enforcement and schema ID embedding.

This system: Uses Avro throughout internal services. If an external partner API needed to consume events, a separate JSON Schema topic or a transformation layer would be added.

Q47 — How does `sendOffsetsToTransaction` enable consume-transform-produce exactly-once? `senior`

Answer: In a read-process-write pipeline (consume from topic A, transform, produce to topic B), exactly-once requires that the consumer offset commit and the producer send are atomic — either both commit or neither does.

sendOffsetsToTransaction() atomically includes the consumer offset commit inside the producer transaction. If the transaction aborts, the offset commit is rolled back too.

@Bean
public KafkaTransactionManager<String, TransformedEvent> txManager(
    ProducerFactory<String, TransformedEvent> pf) {
    return new KafkaTransactionManager<>(pf);
}

// Transactional consume-transform-produce
@Transactional("kafkaTransactionManager")
@KafkaListener(topics = "order.created", groupId = "transformer-group",
               containerFactory = "exactlyOnceFactory")
public void transformAndForward(
    ConsumerRecord<String, OrderCreatedEvent> record,
    @Autowired KafkaTemplate<String, TransformedEvent> producerTemplate
) {
    TransformedEvent transformed = transform(record.value());

    // Within the same transaction: send + offset commit
    producerTemplate.send("order.transformed", record.key(), transformed);

    // sendOffsetsToTransaction is called automatically by Spring
    // when using ChainedKafkaTransactionManager or TransactionalKafkaMessageListenerContainer
}

Key configs:

spring:
  kafka:
    producer:
      transaction-id-prefix: "transformer-txn-"
    consumer:
      isolation-level: read_committed  # don't read uncommitted transactional messages
    listener:
      ack-mode: MANUAL  # Spring manages offset commit inside transaction

This system: Not implemented (payment-service is not a consume-transform-produce pipeline). Would be applicable if building a stream processor that enriches order.created before forwarding to order.enriched.

Q48 — What is consumer group heartbeat and how do `session.timeout.ms` and `heartbeat.interval.ms` interact? `junior`

Answer: A Kafka consumer sends periodic heartbeats to the Group Coordinator broker to signal it is alive. If the coordinator doesn't receive a heartbeat within session.timeout.ms, it removes the consumer from the group and triggers a rebalance.

heartbeat.interval.ms (default: 3000 ms): How often the consumer sends a heartbeat. Must be < session.timeout.ms / 3.
session.timeout.ms (default: 45000 ms): How long the coordinator waits for a heartbeat before declaring the consumer dead.

Consumer ──heartbeat──▶ Coordinator  (every heartbeat.interval.ms)
Coordinator tracks last-seen timestamp per consumer
If now - last_seen > session.timeout.ms → consumer presumed dead → rebalance

Heartbeats vs. max.poll.interval.ms: Heartbeats are sent by a background thread — a consumer can be heartbeating but still trigger a rebalance if it doesn't call poll() within max.poll.interval.ms. These are two independent liveness checks:

session.timeout.ms / heartbeat.interval.ms: Network/process liveness.
max.poll.interval.ms: Processing liveness (consumer isn't stuck in business logic).

Recommended configuration:

spring:
  kafka:
    consumer:
      heartbeat-interval: 3000      # 3s heartbeat
      session-timeout-ms: 15000     # 15s session timeout (detect fast failures)
      properties:
        max.poll.interval.ms: 300000  # 5 min for slow processing

Q49 — How would you perform a zero-downtime Kafka topic partition increase? `senior`

Answer: Increasing partitions is a one-way operation — you cannot reduce them without recreating the topic. It also breaks key-to-partition mapping for existing keys, which can violate ordering guarantees.

Procedure for zero-downtime:

Step 1: Increase partitions during a low-traffic window:

kafka-topics.sh --bootstrap-server localhost:9092 \
  --alter --topic order.created \
  --partitions 6

Step 2: Wait for all existing messages in old partitions (0–2) to be consumed. Check lag:

kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group payment-processors

Step 3: Scale consumers to match new partition count (e.g., from 3 to 6 pods).

Step 4: Verify new partitions (3–5) are being assigned and consumed.

Ordering caveat: After the change, murmur2("orderId-123") % 6 maps to a different partition than murmur2("orderId-123") % 3. In-flight messages for a key may be split across the old and new partition. Strategy:

Pre-drain: Ensure all in-flight messages complete before routing new ones to new partitions.
Sticky partitioner: Continue routing existing key traffic to old partitions temporarily using a custom partitioner.
Accept brief reordering: For non-critical ordering requirements.

This system: order.created partition increase from 3 → 6 would require a maintenance window to drain existing messages and update the payment-service deployment count.

Q50 — How does Kafka's `isolation.level` protect consumers in transactional topics? `senior`

Answer: isolation.level controls whether consumers read messages from transactional producers before those transactions are committed.

read_uncommitted (default):

Consumers see all messages immediately as they are appended, including those from in-progress or subsequently aborted transactions.
Risk: consumer processes a message from an aborted transaction (a "dirty read").

read_committed:

Consumers only see messages from committed transactions AND non-transactional messages.
Aborted transaction messages are never visible.
Consumer may see a read gap: the consumer pauses at the Last Stable Offset (LSO) — the offset up to which all transactions are decided — until in-flight transactions commit or abort.

# Consumer config for exactly-once processing
spring:
  kafka:
    consumer:
      properties:
        isolation.level: read_committed

Last Stable Offset (LSO) vs. High Watermark (HW):

Partition log:
  offset 0: msg (non-transactional) ✓ visible to all
  offset 1: msg (txn A - committed) ✓ visible to read_committed
  offset 2: msg (txn B - in-flight) ✗ read_committed pauses here (LSO=2)
  offset 3: msg (txn A - committed) ✓ becomes visible after txn B resolves

This system: Payment and notification consumers use read_uncommitted (default) since order events are not published transactionally. If the Outbox Publisher were upgraded to use Kafka transactions, consumers would switch to read_committed to avoid processing rolled-back order records.

Apache Kafka — Interview Questions

Q1 — What is Apache Kafka and how does it differ from a traditional message queue? junior

Q2 — What are topics, partitions, and offsets? junior

Q3 — What is a consumer group and how does partition assignment work? junior

Q4 — What is auto.offset.reset and when would you use earliest vs latest? junior

Q5 — Explain the @KafkaListener annotation and how it works in Spring. junior

Q6 — What is Avro and why use it instead of JSON for Kafka messages? junior

Q7 — What is the Confluent Schema Registry and how does it prevent breaking changes? junior

Q8 — What is @RetryableTopic and how does Spring implement retry logic with it? junior

Q9 — What is a Dead Letter Topic (DLT) and how should it be handled in production? senior

Q10 — What is the Transactional Outbox Pattern and why is it needed? senior

Q11 — Explain Kafka's delivery semantics: at-most-once, at-least-once, exactly-once. senior

Q12 — How does Kafka KRaft mode differ from ZooKeeper-based Kafka? senior

Q13 — What is consumer lag and how do you monitor and alert on it? senior

Q14 — How does Kafka guarantee ordering and what are its limitations? senior

Q15 — What are idempotent producers and when are they important? senior

Q16 — Explain acks configuration and its impact on durability vs. throughput. senior

Q17 — What is a Kafka compacted topic and when would you use one? senior

Q18 — How do you handle schema evolution in Avro without breaking consumers? senior

Q19 — What is the difference between poll() timeout and max.poll.interval.ms? senior

Q20 — How does partition rebalance work with the cooperative sticky assignor? senior

Q21 — How would you implement exactly-once processing end-to-end? senior

Q22 — What is the role of min.insync.replicas and how does it interact with acks=all? senior

Q23 — How would you debug a consumer that is not receiving messages? junior

Q24 — What is the difference between commitSync and commitAsync? junior

Q25 — How do you test Kafka consumers and producers in unit/integration tests? junior

Q26 — What is a Kafka Streams application and how does it differ from a consumer group? senior

Q27 — How does Kafka handle backpressure? senior

Q28 — Explain Kafka's log segment structure and how retention works. senior

Q29 — How would you increase Kafka throughput for a high-volume producer? senior

Q30 — How would you design a multi-tenant Kafka setup where tenant data must be isolated? senior

Q31 — What is Kafka Connect and when would you use it? junior

Q32 — How do you choose the right number of partitions for a topic? senior

Q33 — What is the difference between key-based partitioning and round-robin? junior

Q34 — How does Kafka handle message compression and what are the tradeoffs? junior

Q35 — What are Kafka message headers and how are they used? junior

Q36 — How do you produce messages with Spring Boot's KafkaTemplate? junior

Q37 — What is the purpose of fetch.min.bytes and fetch.max.wait.ms? senior

Q38 — What is the __consumer_offsets internal topic? senior

Q39 — How do you implement consumer pause and resume for rate limiting? senior

Q40 — What is static group membership (group.instance.id) and why does it matter? senior

Q41 — How do you handle large messages in Kafka? senior

Q42 — What is the difference between subscribe() and assign() in a Kafka consumer? junior

Q43 — How does Spring Kafka's ConcurrentMessageListenerContainer achieve parallelism? senior

Q44 — What is Kafka MirrorMaker 2 and how does it enable cross-datacenter replication? senior

Q45 — How do you implement message filtering in a Kafka consumer without creating separate topics? junior

Q46 — What are the tradeoffs between Avro, Protobuf, and JSON Schema with Schema Registry? senior

Q47 — How does sendOffsetsToTransaction enable consume-transform-produce exactly-once? senior

Q48 — What is consumer group heartbeat and how do session.timeout.ms and heartbeat.interval.ms interact? junior

Q49 — How would you perform a zero-downtime Kafka topic partition increase? senior

Q50 — How does Kafka's isolation.level protect consumers in transactional topics? senior

Q1 — What is Apache Kafka and how does it differ from a traditional message queue? `junior`

Q2 — What are topics, partitions, and offsets? `junior`

Q3 — What is a consumer group and how does partition assignment work? `junior`

Q4 — What is `auto.offset.reset` and when would you use `earliest` vs `latest`? `junior`

Q5 — Explain the `@KafkaListener` annotation and how it works in Spring. `junior`

Q6 — What is Avro and why use it instead of JSON for Kafka messages? `junior`

Q7 — What is the Confluent Schema Registry and how does it prevent breaking changes? `junior`

Q8 — What is `@RetryableTopic` and how does Spring implement retry logic with it? `junior`

Q9 — What is a Dead Letter Topic (DLT) and how should it be handled in production? `senior`

Q10 — What is the Transactional Outbox Pattern and why is it needed? `senior`

Q11 — Explain Kafka's delivery semantics: at-most-once, at-least-once, exactly-once. `senior`

Q12 — How does Kafka KRaft mode differ from ZooKeeper-based Kafka? `senior`

Q13 — What is consumer lag and how do you monitor and alert on it? `senior`

Q14 — How does Kafka guarantee ordering and what are its limitations? `senior`

Q15 — What are idempotent producers and when are they important? `senior`

Q16 — Explain `acks` configuration and its impact on durability vs. throughput. `senior`

Q17 — What is a Kafka compacted topic and when would you use one? `senior`

Q18 — How do you handle schema evolution in Avro without breaking consumers? `senior`

Q19 — What is the difference between `poll()` timeout and `max.poll.interval.ms`? `senior`

Q20 — How does partition rebalance work with the cooperative sticky assignor? `senior`

Q21 — How would you implement exactly-once processing end-to-end? `senior`

Q22 — What is the role of `min.insync.replicas` and how does it interact with `acks=all`? `senior`

Q23 — How would you debug a consumer that is not receiving messages? `junior`

Q24 — What is the difference between `commitSync` and `commitAsync`? `junior`

Q25 — How do you test Kafka consumers and producers in unit/integration tests? `junior`

Q26 — What is a Kafka Streams application and how does it differ from a consumer group? `senior`

Q27 — How does Kafka handle backpressure? `senior`

Q28 — Explain Kafka's log segment structure and how retention works. `senior`

Q29 — How would you increase Kafka throughput for a high-volume producer? `senior`

Q30 — How would you design a multi-tenant Kafka setup where tenant data must be isolated? `senior`

Q31 — What is Kafka Connect and when would you use it? `junior`

Q32 — How do you choose the right number of partitions for a topic? `senior`

Q33 — What is the difference between key-based partitioning and round-robin? `junior`

Q34 — How does Kafka handle message compression and what are the tradeoffs? `junior`

Q35 — What are Kafka message headers and how are they used? `junior`

Q36 — How do you produce messages with Spring Boot's `KafkaTemplate`? `junior`

Q37 — What is the purpose of `fetch.min.bytes` and `fetch.max.wait.ms`? `senior`

Q38 — What is the `__consumer_offsets` internal topic? `senior`

Q39 — How do you implement consumer pause and resume for rate limiting? `senior`

Q40 — What is static group membership (`group.instance.id`) and why does it matter? `senior`

Q41 — How do you handle large messages in Kafka? `senior`

Q42 — What is the difference between `subscribe()` and `assign()` in a Kafka consumer? `junior`

Q43 — How does Spring Kafka's `ConcurrentMessageListenerContainer` achieve parallelism? `senior`

Q44 — What is Kafka MirrorMaker 2 and how does it enable cross-datacenter replication? `senior`

Q45 — How do you implement message filtering in a Kafka consumer without creating separate topics? `junior`

Q46 — What are the tradeoffs between Avro, Protobuf, and JSON Schema with Schema Registry? `senior`

Q47 — How does `sendOffsetsToTransaction` enable consume-transform-produce exactly-once? `senior`

Q48 — What is consumer group heartbeat and how do `session.timeout.ms` and `heartbeat.interval.ms` interact? `junior`

Q49 — How would you perform a zero-downtime Kafka topic partition increase? `senior`

Q50 — How does Kafka's `isolation.level` protect consumers in transactional topics? `senior`