PostgreSQL — Interview Questions

Stack context: This system uses four isolated PostgreSQL instances (one per service): user-service (port 5432), product-service (port 5433), order-service (port 5434), payment-service (port 5435). Schema migrations are managed by Flyway. JPA/Hibernate is used for ORM, HikariCP for connection pooling.

Q1 — What is ACID and why does it matter in a relational database? `junior`

Answer: ACID defines the guarantees a database transaction must provide:

Property	Meaning	Example
Atomicity	All operations in a transaction succeed, or none do	Order insert + inventory update — if inventory fails, order is rolled back
Consistency	Transaction brings DB from one valid state to another	Foreign key constraints, NOT NULL constraints are enforced
Isolation	Concurrent transactions behave as if they ran serially	Two users placing the last item order don't both see stock=1
Durability	Committed transactions survive crashes	Power failure after commit does not lose the order

Why microservices make ACID harder: ACID guarantees apply within a single DB. When order-service writes to its DB and payment-service writes to its DB, there is no distributed ACID — hence patterns like Saga and Outbox.

PostgreSQL: Fully ACID-compliant. Uses MVCC for isolation and WAL for durability.

Q2 — What is MVCC (Multi-Version Concurrency Control)? `senior`

Answer: MVCC allows concurrent reads and writes without locking by maintaining multiple versions of each row.

How it works:

Each row has xmin (transaction that created it) and xmax (transaction that deleted/updated it).
When a transaction reads a row, it sees the version that was committed before the transaction started.
Writers create new row versions rather than overwriting existing ones.
Old versions are cleaned up by VACUUM.

Benefits:

Readers never block writers.
Writers never block readers.
High concurrency for read-heavy workloads (product catalog reads, order status queries).

Consequence: UPDATE = insert new version + mark old version as deleted. This causes table bloat over time — VACUUM reclaims dead tuple space.

-- See dead tuples accumulating (needs VACUUM)
SELECT relname, n_live_tup, n_dead_tup, last_vacuum 
FROM pg_stat_user_tables
WHERE relname = 'orders';

Q3 — What are PostgreSQL isolation levels and what anomalies do they prevent? `senior`

Answer: PostgreSQL supports four isolation levels, each preventing a different set of anomalies:

Level	Dirty Read	Non-Repeatable Read	Phantom Read	Serialization Anomaly
READ UNCOMMITTED	Possible (treated as RC)	Possible	Possible	Possible
READ COMMITTED (default)	Not possible	Possible	Possible	Possible
REPEATABLE READ	Not possible	Not possible	Not possible	Possible
SERIALIZABLE	Not possible	Not possible	Not possible	Not possible

Anomaly definitions:

Dirty read: Reading uncommitted changes from another transaction.
Non-repeatable read: Reading the same row twice returns different data (another transaction updated it).
Phantom read: Re-running a query returns different rows (another transaction inserted/deleted rows).
Serialization anomaly: Result is inconsistent with any serial execution order.

Setting isolation level:

// Spring @Transactional
@Transactional(isolation = Isolation.SERIALIZABLE)
public void transferFunds(UUID from, UUID to, BigDecimal amount) {
    // full serializable isolation — slowest but safest for financial operations
}

This system: Uses default READ COMMITTED. For order payment processing, REPEATABLE READ or SERIALIZABLE may be needed to prevent double payment.

Q4 — What is a PostgreSQL index and what types are available? `junior`

Answer: An index is a separate data structure that speeds up data retrieval by providing a fast lookup path. Without an index, PostgreSQL performs a sequential scan (reads every row).

Index types:

Type	Use case	Example
B-tree (default)	Equality, range queries, ordering	`WHERE status = 'CONFIRMED'`, `ORDER BY created_at`
Hash	Exact equality only (rare)	UUID lookups (but B-tree works too)
GIN	Full-text search, JSONB containment, arrays	`WHERE metadata @> '{"category": "electronics"}'`
GiST	Geometric data, ranges, full-text	`tsrange`, `geometry` types
BRIN	Very large tables with physical ordering	Time-series logs (append-only)
Partial index	Index subset of rows	`WHERE status != 'COMPLETED'` (index only pending orders)

Creating indexes:

-- Standard B-tree
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Composite index (multi-column)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

-- Partial index (only index active orders)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status IN ('PENDING', 'CONFIRMED');

-- Covering index (include frequently selected columns)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (status, total_amount);

Q5 — How do you analyze query performance in PostgreSQL? `junior`

Answer: Use EXPLAIN ANALYZE to see the query execution plan and actual execution statistics.

EXPLAIN ANALYZE
SELECT o.id, o.status, o.total_amount
FROM orders o
WHERE o.customer_id = '123e4567-e89b-12d3-a456-426614174000'
  AND o.status = 'CONFIRMED'
ORDER BY o.created_at DESC
LIMIT 10;

Reading the output:

Index Scan using idx_orders_customer_id on orders  (cost=0.28..8.30 rows=5 width=56) 
                                                    (actual time=0.045..0.052 rows=3 loops=1)
  Index Cond: (customer_id = '...')
  Filter: (status = 'CONFIRMED')
  Rows Removed by Filter: 12
Planning Time: 0.8 ms
Execution Time: 0.1 ms

Key things to look for:

Seq Scan: Bad on large tables — missing index.
actual rows vs estimated rows: Large discrepancy = stale statistics → run ANALYZE orders.
Rows Removed by Filter: Many rows removed means a better index is needed.
Nested Loop / Hash Join: Join strategy — nested loop is fine for small results, hash join for large.
Execution Time: Actual wall clock time including I/O.

pg_stat_statements: Track cumulative query statistics to find the slowest queries:

SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

Q6 — What is Flyway and how does it manage database migrations? `junior`

Answer: Flyway is a database migration tool that applies versioned SQL scripts in order, tracking which scripts have been applied in a flyway_schema_history table.

Migration file naming convention:

V1__create_orders_table.sql        (V = versioned, 1 = version, __ = separator)
V2__add_order_status_index.sql
V3__add_outbox_events_table.sql
R__refresh_order_summary_view.sql  (R = repeatable — always re-run when checksum changes)

Flyway workflow:

On app startup, Flyway checks flyway_schema_history.
Compares applied versions to scripts on classpath.
Applies pending scripts in order.
If a script fails, the migration stops (and the transaction is rolled back in supported DBs).

Configuration:

spring:
  flyway:
    enabled: true
    locations: classpath:db/migration
    baseline-on-migrate: false  # set true only when adding Flyway to existing DB
    validate-on-migrate: true   # fail if applied migration checksums don't match files

Critical rule: Never modify an applied migration script. Always create a new version to alter an existing object.

Q7 — What is JPA and how does Hibernate implement it? `junior`

Answer: JPA (Jakarta Persistence API) is a Java specification for ORM (Object-Relational Mapping). Hibernate is the most popular JPA implementation.

Core annotations:

@Entity
@Table(name = "orders")
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @Column(name = "customer_id", nullable = false)
    private UUID customerId;

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private OrderStatus status;

    @Column(name = "total_amount", precision = 10, scale = 2)
    private BigDecimal totalAmount;

    @CreationTimestamp
    @Column(name = "created_at", updatable = false)
    private Instant createdAt;

    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderItem> items = new ArrayList<>();
}

JPA Repository (Spring Data):

public interface OrderRepository extends JpaRepository<Order, UUID> {
    List<Order> findByCustomerIdOrderByCreatedAtDesc(UUID customerId);
    Optional<Order> findByIdAndCustomerId(UUID id, UUID customerId);

    @Query("SELECT o FROM Order o WHERE o.status = :status AND o.createdAt < :cutoff")
    List<Order> findStalePendingOrders(@Param("status") OrderStatus status,
                                       @Param("cutoff") Instant cutoff);
}

Q8 — What is the N+1 query problem and how do you solve it? `junior`

Answer: N+1 occurs when loading N parent entities triggers N additional queries to load their related entities, instead of one join query.

Example:

// N+1: 1 query for orders + N queries for items (one per order)
List<Order> orders = orderRepo.findAll();        // SELECT * FROM orders (returns N rows)
for (Order o : orders) {
    System.out.println(o.getItems().size());     // SELECT * FROM order_items WHERE order_id = ?
}                                                // N times!

Solutions:

1. JOIN FETCH in JPQL:

@Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.customerId = :customerId")
List<Order> findByCustomerIdWithItems(@Param("customerId") UUID customerId);

2. @EntityGraph:

@EntityGraph(attributePaths = {"items"})
List<Order> findByCustomerId(UUID customerId);

3. FetchType.EAGER (use with caution — always fetches, even when not needed):

@OneToMany(fetch = FetchType.EAGER)
private List<OrderItem> items;

4. Batch fetch (@BatchSize):

@BatchSize(size = 20)
@OneToMany
private List<OrderItem> items;
// Generates: SELECT * FROM order_items WHERE order_id IN (?, ?, ?, ...) — batches N queries into ⌈N/20⌉

Detection: Enable SQL logging (show-sql: true) or use p6spy to see all queries fired.

Q9 — What is HikariCP and how do you tune the connection pool? `senior`

Answer: HikariCP is the default JDBC connection pool in Spring Boot. It maintains a pool of pre-established DB connections for reuse, avoiding the latency of creating new connections per request.

Key configuration:

spring:
  datasource:
    hikari:
      maximum-pool-size: 10          # max connections in pool
      minimum-idle: 5                # idle connections maintained
      connection-timeout: 30000      # max time to wait for connection (ms)
      idle-timeout: 600000           # idle connection removed after 10 min
      max-lifetime: 1800000          # connection recycled after 30 min
      keepalive-time: 30000          # heartbeat to prevent idle timeout
      pool-name: "order-service-pool"

Sizing formula (Hikari documentation):

pool_size = (core_count * 2) + effective_spindle_count

For a 4-core machine with SSD: 4 * 2 + 1 = 9 → round to 10.

Monitoring:

hikaricp.connections.active (Micrometer)
hikaricp.connections.pending — high pending = pool too small
hikaricp.connections.timeout — requests waiting too long

This system: Each service has its own HikariCP pool. Typical microservice pool size: 5–20 depending on load.

Q10 — What is a transaction and how does `@Transactional` work in Spring? `junior`

Answer: A transaction is a unit of work that must complete fully or not at all. @Transactional in Spring wraps a method in a DB transaction using AOP (proxy).

@Service
public class OrderService {

    @Transactional  // begin transaction before method, commit after, rollback on exception
    public Order createOrder(CreateOrderRequest request) {
        Order order = new Order(request);
        orderRepository.save(order);

        // If this throws, both the order save AND the outbox insert are rolled back
        outboxRepository.save(new OutboxEvent("order.created", order.getId()));

        return order;
    }
}

Transaction propagation:

Propagation	Behavior
`REQUIRED` (default)	Join existing transaction or create new
`REQUIRES_NEW`	Always create new transaction (suspends existing)
`SUPPORTS`	Join if exists, no transaction if not
`MANDATORY`	Must join existing transaction, else exception
`NOT_SUPPORTED`	Run without transaction
`NEVER`	Must not have transaction, else exception

Pitfall: @Transactional only works when called via the Spring proxy. Calling this.transactionalMethod() from the same class bypasses the proxy — no transaction is started.

Q11 — What are PostgreSQL sequences and how do they differ from `gen_random_uuid()`? `junior`

Answer:

Sequences (auto-incrementing integers):

CREATE SEQUENCE orders_seq START 1 INCREMENT 1;

CREATE TABLE orders (
    id BIGINT DEFAULT nextval('orders_seq') PRIMARY KEY,
    -- or with SERIAL/BIGSERIAL shorthand:
);

-- SERIAL is equivalent to:
CREATE TABLE orders (id BIGSERIAL PRIMARY KEY, ...);

gen_random_uuid() (UUID v4):

CREATE TABLE orders (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    ...
);

	Sequence / BIGSERIAL	`gen_random_uuid()`
Type	Integer (8 bytes)	UUID (16 bytes)
Globally unique	No (unique per table)	Yes (globally unique)
URL/API exposure	Exposes record count (security risk)	No enumeration risk
Index performance	Excellent (sequential inserts)	Moderate (random inserts cause B-tree splits)
Join performance	Better (smaller key)	Slightly worse

This system: Uses UUIDs (gen_random_uuid()) for all entity IDs. Good for distributed systems where records might be created in multiple places.

Q12 — What is VACUUM and why is it important? `senior`

Answer: Due to MVCC, PostgreSQL never overwrites rows — UPDATE creates a new row version, and DELETE marks the old version as dead. VACUUM reclaims storage from dead tuples and updates statistics.

Two modes:

VACUUM (lazy): Marks dead tuple space as reusable but does not return it to the OS.
VACUUM FULL: Rewrites the table, returning space to OS. Requires an exclusive lock — avoid during peak hours.

AUTOVACUUM: PostgreSQL automatically runs VACUUM when:

20% + 50 rows of a table become dead tuples (default autovacuum_vacuum_scale_factor = 0.2).

Monitoring:

SELECT schemaname, relname, n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Transaction ID wraparound: PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, IDs wrap. VACUUM prevents this by advancing the freeze horizon. If ignored, the database becomes read-only for emergency VACUUM (catastrophic).

This system: Autovacuum handles routine cleanup. Monitor n_dead_tup for tables with high UPDATE/DELETE rates (payments, order status updates).

Q13 — How do you use `SELECT FOR UPDATE` and when is it needed? `senior`

Answer: SELECT FOR UPDATE acquires a row-level lock when reading, preventing other transactions from updating those rows until the lock is released.

Use case: Check-then-act on stock inventory:

BEGIN;
-- Lock the product row to prevent concurrent stock deductions
SELECT id, stock_quantity FROM products WHERE id = ? FOR UPDATE;

-- If stock > 0, deduct
UPDATE products SET stock_quantity = stock_quantity - 1 WHERE id = ? AND stock_quantity > 0;
COMMIT;

Without FOR UPDATE, two concurrent requests could both read stock=1, both decide to proceed, and both deduct — resulting in stock=-1.

Spring Data JPA:

@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdForUpdate(@Param("id") UUID id);

SKIP LOCKED (for queue processing — outbox poller):

SELECT * FROM outbox_events WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Skips rows locked by other processes — enables parallel polling workers

Q14 — How do you model a many-to-many relationship in PostgreSQL? `junior`

Answer: Many-to-many requires a join table (junction table) with foreign keys to both parent tables.

Example: Orders and Products (order_items):

CREATE TABLE order_items (
    id          UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    order_id    UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id  UUID NOT NULL REFERENCES products(id),
    quantity    INT NOT NULL CHECK (quantity > 0),
    unit_price  NUMERIC(10, 2) NOT NULL,  -- snapshot of price at order time
    UNIQUE(order_id, product_id)           -- one line per product per order
);

JPA mapping:

@Entity
public class OrderItem {
    @Id @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "order_id")
    private Order order;

    @Column(name = "product_id")
    private UUID productId;   // UUID, not @ManyToOne — cross-service reference

    private int quantity;
    private BigDecimal unitPrice;
}

Note: productId is a UUID, not a JPA @ManyToOne relation — because product data lives in product-service's separate database. Cross-service relationships use IDs, not JPA joins.

Q15 — What is JSONB and when should you use it in PostgreSQL? `senior`

Answer: PostgreSQL supports two JSON types:

json: Stores JSON as text. Validates JSON syntax. No indexing.
jsonb: Stores JSON in binary format. Parsed on insert. Supports GIN indexing. Slightly larger storage. Recommended.

JSONB use cases:

-- Store semi-structured event metadata
CREATE TABLE outbox_events (
    id          UUID PRIMARY KEY,
    event_type  VARCHAR(100) NOT NULL,
    payload     JSONB NOT NULL,
    metadata    JSONB,              -- headers, trace IDs, etc.
    created_at  TIMESTAMPTZ DEFAULT now()
);

-- Query JSONB
SELECT * FROM outbox_events
WHERE payload->>'orderId' = '123'      -- text extraction
  AND payload @> '{"status": "PENDING"}';  -- containment check

-- GIN index for JSONB queries
CREATE INDEX idx_outbox_payload ON outbox_events USING GIN (payload);

When to use JSONB:

Polymorphic data where schema varies per record type.
Event sourcing (events have different payloads).
Storing external API responses without knowing the full schema.

When NOT to use: If you always query specific fields, use typed columns + indexes — they're faster and type-safe.

Q16 — What is connection pooling and why is it important? `junior`

Answer: Each PostgreSQL connection requires a dedicated backend process (~5–10 MB RAM). Without pooling, every request creates a new connection — slow and resource-exhausting.

Connection pool flow:

App request → HikariCP pool:
  Pool has idle connection → return it immediately (fast path)
  Pool is full, all busy → wait (up to connection-timeout)
  Pool below max → create new connection (slow path)

Request completes → return connection to pool (not closed)

External pooler (PgBouncer): For very high concurrency, HikariCP + direct connections can overwhelm PostgreSQL. PgBouncer sits between app and PostgreSQL:

Transaction pooling: Connection returned to PgBouncer pool after each transaction (most efficient).
Session pooling: Connection held for entire client session.

PostgreSQL max connections: Default 100. With 6 services × 10 connections = 60 connections — within limits. With 10 instances × 6 services × 10 = 600 — exceeds limits → need PgBouncer.

Q17 — What is Write-Ahead Logging (WAL) and how does it ensure durability? `senior`

Answer: WAL (Write-Ahead Logging) is PostgreSQL's durability mechanism. Before any data page is written to disk, the change is first written to the WAL log.

How it works:

Transaction commits → WAL record written to WAL buffer.
fsync() flushes WAL to disk (durability guarantee).
Data pages are written to disk lazily (later, by background writer).
On crash, PostgreSQL replays WAL records to recover uncommitted changes.

Benefits beyond durability:

Replication: Standby servers replay WAL from primary — logical and physical replication.
Point-in-time recovery (PITR): Archive WAL files → restore to any point in time.
Logical decoding: Parse WAL to extract change events (used by Debezium CDC).

Logical replication (Debezium alternative to outbox poller):

-- Enable logical replication in postgresql.conf
wal_level = logical

-- Create publication for outbox table
CREATE PUBLICATION outbox_pub FOR TABLE outbox_events;

Debezium reads the WAL stream, eliminating the polling outbox pattern.

Q18 — How do you handle full-text search in PostgreSQL? `senior`

Answer: PostgreSQL has built-in full-text search via tsvector and tsquery.

Setup:

-- Add tsvector column for full-text search on products
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Update with weighted vectors (A = highest weight)
UPDATE products SET search_vector =
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B');

-- GIN index for fast full-text search
CREATE INDEX idx_products_fts ON products USING GIN (search_vector);

-- Update automatically via trigger
CREATE TRIGGER update_product_search
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(
    search_vector, 'pg_catalog.english', name, description);

Querying:

SELECT id, name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'laptop & gaming') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

vs Elasticsearch: PostgreSQL FTS works for moderate datasets. For large-scale, faceted, fuzzy search → Elasticsearch.

Q19 — What is table partitioning and when should you use it? `senior`

Answer: Partitioning divides a large table into smaller physical sub-tables (partitions), improving query performance and maintenance.

Range partitioning by date (logs, orders):

CREATE TABLE orders (
    id         UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    status     VARCHAR(20),
    ...
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE orders_2026_02 PARTITION OF orders
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Benefits:

Partition pruning: WHERE created_at > '2026-01-01' only scans relevant partitions.
Maintenance: Drop old partitions in milliseconds (DROP TABLE orders_2024_01).
Parallel queries: Scan multiple partitions in parallel.

When to use: Tables with > 10 million rows, natural range (time) or list (status, tenant) partitioning key, and queries that typically access a subset of partitions.

This system: Not currently partitioned. orders table would benefit from monthly partitioning at scale (millions of orders per month).

Q20 — What are common PostgreSQL data types and how do you choose between them? `junior`

Answer:

Data	Recommended Type	Notes
Integer IDs	`BIGSERIAL` or `UUID`	BIGSERIAL for sequences, UUID for distributed
Text	`TEXT`	No length limit; use `VARCHAR(n)` only for enforced max
Status/enum	`VARCHAR(20)` or `TEXT` with CHECK	Avoid native `ENUM` (hard to alter)
Decimal money	`NUMERIC(10, 2)`	Never `FLOAT` for money (floating point imprecision)
Timestamps	`TIMESTAMPTZ`	Always store with timezone; normalizes to UTC internally
Boolean	`BOOLEAN`	`TRUE`/`FALSE`/`NULL`
JSON	`JSONB`	Prefer over `json` for indexing
IPs	`INET` or `CIDR`	Supports IP arithmetic and indexing
Binary	`BYTEA`	Small blobs; large files go to object storage
Array	`INTEGER[]`, `TEXT[]`	GIN indexable; avoid for relational data

Money pitfall:

-- WRONG: floating point imprecision
price DOUBLE PRECISION;

-- CORRECT: exact decimal
price NUMERIC(10, 2);  -- up to 99,999,999.99

Q21 — How do you detect and resolve deadlocks in PostgreSQL? `senior`

Answer: A deadlock occurs when two transactions each hold a lock the other needs:

Transaction A holds lock on Order → waits for Product.
Transaction B holds lock on Product → waits for Order.

PostgreSQL handles deadlocks automatically: Detects the cycle after deadlock_timeout (default 1 second) and rolls back one transaction (the "victim"), allowing the other to proceed.

Detection:

-- Check current locks
SELECT pid, relation::regclass, mode, granted
FROM pg_locks l JOIN pg_class c ON l.relation = c.oid
WHERE NOT granted;

-- Check waiting queries
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

Log deadlocks:

log_min_duration_statement = 1000  # log queries > 1s
deadlock_timeout = 1s
log_lock_waits = on

Prevention strategies:

Consistent lock ordering: Always lock resources in the same order across all transactions (A before B, never B before A).
Minimize transaction scope: Hold locks for as short a time as possible.
Use SKIP LOCKED: For queue processing, skip locked rows instead of waiting.
Optimistic locking (JPA @Version): No DB lock; retry on conflict.

Q22 — What is logical replication and how does Debezium use it? `senior`

Answer: Logical replication streams decoded row changes (INSERT/UPDATE/DELETE) from PostgreSQL's WAL to external consumers, without requiring the consumer to understand PostgreSQL's binary format.

Setup:

-- postgresql.conf
wal_level = logical
max_replication_slots = 5
max_wal_senders = 5

-- Create replication slot for Debezium
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

Debezium connector (replaces outbox poller):

{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.hostname": "postgres",
  "database.dbname": "orderdb",
  "slot.name": "debezium_slot",
  "table.include.list": "public.outbox_events",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}

Benefits over outbox poller:

Sub-millisecond latency (streaming vs polling interval).
No DB polling overhead.
Captures ALL changes (not just INSERT — also UPDATE/DELETE).
One Debezium connector per DB instead of one poller per service.

Trade-off: Adds Debezium + Kafka Connect infrastructure complexity.

Q23 — How do you implement pagination in PostgreSQL efficiently? `junior`

Answer: Offset-based pagination (simple but slow for deep pages):

SELECT * FROM orders
WHERE customer_id = ?
ORDER BY created_at DESC
LIMIT 20 OFFSET 200;  -- skips 200 rows before returning 20

Problem: PostgreSQL reads and discards the first 200 rows — slow for large offsets.

Cursor-based (keyset) pagination (efficient for all pages):

-- First page
SELECT * FROM orders WHERE customer_id = ?
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (using last row's values as cursor)
SELECT * FROM orders
WHERE customer_id = ?
  AND (created_at, id) < (:lastCreatedAt, :lastId)  -- keyset condition
ORDER BY created_at DESC, id DESC
LIMIT 20;

Benefits: Constant performance regardless of page depth. Uses the index efficiently.

Spring Data (offset):

Page<Order> findByCustomerId(UUID customerId, Pageable pageable);
// Usage: PageRequest.of(0, 20, Sort.by("createdAt").descending())

Spring Data (keyset — requires Blaze-Persistence or custom query):

@Query("SELECT o FROM Order o WHERE o.customerId = :id AND o.createdAt < :cursor ORDER BY o.createdAt DESC")
List<Order> findNextPage(@Param("id") UUID id, @Param("cursor") Instant cursor, Pageable pageable);

Q24 — What is `pg_stat_activity` and how do you use it to monitor running queries? `junior`

Answer: pg_stat_activity is a system view showing currently active database sessions and their current query.

-- All active queries sorted by duration
SELECT pid, usename, application_name, state,
       now() - query_start AS duration,
       left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Find long-running queries (> 5 seconds)
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 seconds';

-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE query LIKE '%outbox_events%' AND state = 'active' AND pid != pg_backend_pid();

Key columns:

state: active, idle, idle in transaction (potential lock holder), disabled.
wait_event_type / wait_event: What is the backend waiting for (Lock, IO, Client).
query: Currently executing SQL.
application_name: Set via JDBC URL (applicationName=order-service) to identify which service.

Q25 — What is `EXPLAIN` vs `EXPLAIN ANALYZE` and when to use each? `junior`

Answer:

	`EXPLAIN`	`EXPLAIN ANALYZE`
Executes query	No (estimate only)	Yes (runs the query!)
Shows actual rows	No	Yes
Shows actual time	No	Yes
Safe on production	Yes	Careful — runs the query!

EXPLAIN: Shows what the query planner thinks it will do. Useful for checking if the right index will be used without running an expensive query.

EXPLAIN ANALYZE: Actually executes the query and compares planned vs actual. Use when EXPLAIN shows the planner is wrong.

Caution: Never run EXPLAIN ANALYZE on a destructive query (UPDATE, DELETE) in production — it executes! Wrap in a transaction and rollback:

BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'CANCELLED' WHERE created_at < '2020-01-01';
ROLLBACK;

EXPLAIN (ANALYZE, BUFFERS): Also shows cache hits/misses — useful for I/O optimization:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = '...' ORDER BY created_at DESC;

Q26 — How do you implement optimistic locking with PostgreSQL and JPA? `senior`

Answer: Optimistic locking assumes conflicts are rare. Instead of locking rows on read, it checks that the row hasn't changed when writing. Uses a version counter.

@Entity
public class Product {
    @Id private UUID id;

    private int stockQuantity;

    @Version  // JPA version field
    private Long version;
}

How it works:

Read: SELECT * FROM products WHERE id = ? → returns {stockQuantity: 10, version: 5}.
Update: UPDATE products SET stock_quantity = 9, version = 6 WHERE id = ? AND version = 5.
If another transaction updated version to 6 between step 1 and 2: UPDATE affects 0 rows → JPA throws OptimisticLockException.
Caller retries with fresh data.

@Retryable(retryFor = OptimisticLockException.class, maxAttempts = 3)
@Transactional
public void deductStock(UUID productId, int quantity) {
    Product product = productRepo.findById(productId).orElseThrow();
    if (product.getStockQuantity() < quantity) throw new InsufficientStockException();
    product.setStockQuantity(product.getStockQuantity() - quantity);
    productRepo.save(product);  // fails on version conflict → triggers retry
}

Q27 — What is the difference between `TRUNCATE` and `DELETE`? `junior`

Answer:

	`DELETE`	`TRUNCATE`
Removes rows	One by one (MVCC)	All at once (drops and recreates storage)
WAL logging	Every row logged	Minimal logging
Triggers	Fires row-level triggers	Does NOT fire row-level triggers
`WHERE` clause	Yes — partial delete	No — removes ALL rows
Transaction safe	Yes — fully transactional	Yes — transactional in PostgreSQL
Speed	Slow on large tables	Very fast
Resets sequences	No	Optional: `TRUNCATE ... RESTART IDENTITY`
Foreign keys	Respects FK constraints	Respects FK (must cascade or have no children)

-- Delete specific rows
DELETE FROM outbox_events WHERE status = 'PUBLISHED' AND created_at < now() - interval '7 days';

-- Remove all rows quickly (e.g., for test cleanup)
TRUNCATE outbox_events RESTART IDENTITY;

This system: Test setup uses @Transactional rollback (Spring Test) or Testcontainers with fresh containers. TRUNCATE is used in @BeforeEach for integration tests that need clean state.

Q28 — What is connection string security and how do you protect DB credentials? `senior`

Answer: Database connection strings contain passwords — exposing them in source code or logs is a critical security vulnerability (OWASP A02: Cryptographic Failures, A05: Security Misconfiguration).

Anti-patterns (never do):

# WRONG — hardcoded in source code
spring.datasource.password: MyPassword123

Secure approaches:

1. Environment variables:

spring:
  datasource:
    url: jdbc:postgresql://${DB_HOST}:5432/orderdb
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}  # injected from environment

2. Docker secrets (Docker Swarm/Kubernetes secrets):

# Kubernetes Secret
kubectl create secret generic db-secret --from-literal=password=secret123

# Pod spec mounts secret as env var
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password

3. Vault / AWS Secrets Manager: Rotate credentials without redeployment. Spring Cloud Vault or AWS Parameter Store integration.

This system: Docker Compose uses environment variables loaded from .env file (git-ignored). Production would use Kubernetes secrets or Vault.

Q29 — How do you perform a database backup and restore in PostgreSQL? `junior`

Answer: pg_dump — logical backup (SQL format):

# Backup single database
pg_dump -h localhost -U postgres -d orderdb -F c -f orderdb.dump

# Restore
pg_restore -h localhost -U postgres -d orderdb_restored orderdb.dump

pg_dumpall — backup all databases + global objects (roles):

pg_dumpall -h localhost -U postgres > full_backup.sql

Formats:

-F p (plain): SQL text — portable, human-readable, no parallelism.
-F c (custom): Compressed, supports parallel restore, recommended.
-F d (directory): One file per table, supports pg_restore -j 4 parallel restore.

Continuous backup with WAL archiving:

base backup (pg_basebackup) + WAL archive → point-in-time recovery

This system: Docker Compose data persists in named volumes. For production: automated nightly pg_dump to S3, with WAL archiving for PITR.

Q30 — How do you design the database schema for the outbox pattern? `senior`

Answer: The outbox table must support: transactional writes with business data, polling for unpublished events, marking events as published, and DLT for permanently failed events.

CREATE TABLE outbox_events (
    id            UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    aggregate_id  UUID NOT NULL,          -- the entity ID (order_id, payment_id)
    aggregate_type VARCHAR(100) NOT NULL,  -- 'Order', 'Payment'
    event_type    VARCHAR(200) NOT NULL,   -- 'OrderCreatedEvent'
    topic         VARCHAR(200) NOT NULL,   -- 'order.created'
    payload       JSONB NOT NULL,          -- serialized event (Avro bytes or JSON)
    headers       JSONB,                   -- Kafka headers (traceId, correlationId)
    status        VARCHAR(20) NOT NULL DEFAULT 'PENDING',  -- PENDING, PUBLISHED, FAILED
    attempts      INT NOT NULL DEFAULT 0,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at  TIMESTAMPTZ,
    last_error    TEXT
);

-- Index for efficient polling (poller queries pending events by created_at)
CREATE INDEX idx_outbox_pending ON outbox_events(created_at)
WHERE status = 'PENDING';

-- Index for aggregate lookups (debugging)
CREATE INDEX idx_outbox_aggregate ON outbox_events(aggregate_id, aggregate_type);

Poller query with SKIP LOCKED (safe parallel poller):

SELECT * FROM outbox_events
WHERE status = 'PENDING' AND attempts < 5
ORDER BY created_at ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;

This design supports multiple poller instances running in parallel without duplicate publishing.

PostgreSQL — Interview Questions

Q1 — What is ACID and why does it matter in a relational database? junior

Q2 — What is MVCC (Multi-Version Concurrency Control)? senior

Q3 — What are PostgreSQL isolation levels and what anomalies do they prevent? senior

Q4 — What is a PostgreSQL index and what types are available? junior

Q5 — How do you analyze query performance in PostgreSQL? junior

Q6 — What is Flyway and how does it manage database migrations? junior

Q7 — What is JPA and how does Hibernate implement it? junior

Q8 — What is the N+1 query problem and how do you solve it? junior

Q9 — What is HikariCP and how do you tune the connection pool? senior

Q10 — What is a transaction and how does @Transactional work in Spring? junior

Q11 — What are PostgreSQL sequences and how do they differ from gen_random_uuid()? junior

Q12 — What is VACUUM and why is it important? senior

Q13 — How do you use SELECT FOR UPDATE and when is it needed? senior

Q14 — How do you model a many-to-many relationship in PostgreSQL? junior

Q15 — What is JSONB and when should you use it in PostgreSQL? senior

Q16 — What is connection pooling and why is it important? junior

Q17 — What is Write-Ahead Logging (WAL) and how does it ensure durability? senior

Q18 — How do you handle full-text search in PostgreSQL? senior

Q19 — What is table partitioning and when should you use it? senior

Q20 — What are common PostgreSQL data types and how do you choose between them? junior

Q21 — How do you detect and resolve deadlocks in PostgreSQL? senior

Q22 — What is logical replication and how does Debezium use it? senior

Q23 — How do you implement pagination in PostgreSQL efficiently? junior

Q24 — What is pg_stat_activity and how do you use it to monitor running queries? junior

Q25 — What is EXPLAIN vs EXPLAIN ANALYZE and when to use each? junior

Q26 — How do you implement optimistic locking with PostgreSQL and JPA? senior

Q27 — What is the difference between TRUNCATE and DELETE? junior

Q28 — What is connection string security and how do you protect DB credentials? senior

Q29 — How do you perform a database backup and restore in PostgreSQL? junior

Q30 — How do you design the database schema for the outbox pattern? senior

Q1 — What is ACID and why does it matter in a relational database? `junior`

Q2 — What is MVCC (Multi-Version Concurrency Control)? `senior`

Q3 — What are PostgreSQL isolation levels and what anomalies do they prevent? `senior`

Q4 — What is a PostgreSQL index and what types are available? `junior`

Q5 — How do you analyze query performance in PostgreSQL? `junior`

Q6 — What is Flyway and how does it manage database migrations? `junior`

Q7 — What is JPA and how does Hibernate implement it? `junior`

Q8 — What is the N+1 query problem and how do you solve it? `junior`

Q9 — What is HikariCP and how do you tune the connection pool? `senior`

Q10 — What is a transaction and how does `@Transactional` work in Spring? `junior`

Q11 — What are PostgreSQL sequences and how do they differ from `gen_random_uuid()`? `junior`

Q12 — What is VACUUM and why is it important? `senior`

Q13 — How do you use `SELECT FOR UPDATE` and when is it needed? `senior`

Q14 — How do you model a many-to-many relationship in PostgreSQL? `junior`

Q15 — What is JSONB and when should you use it in PostgreSQL? `senior`

Q16 — What is connection pooling and why is it important? `junior`

Q17 — What is Write-Ahead Logging (WAL) and how does it ensure durability? `senior`

Q18 — How do you handle full-text search in PostgreSQL? `senior`

Q19 — What is table partitioning and when should you use it? `senior`

Q20 — What are common PostgreSQL data types and how do you choose between them? `junior`

Q21 — How do you detect and resolve deadlocks in PostgreSQL? `senior`

Q22 — What is logical replication and how does Debezium use it? `senior`

Q23 — How do you implement pagination in PostgreSQL efficiently? `junior`

Q24 — What is `pg_stat_activity` and how do you use it to monitor running queries? `junior`

Q25 — What is `EXPLAIN` vs `EXPLAIN ANALYZE` and when to use each? `junior`

Q26 — How do you implement optimistic locking with PostgreSQL and JPA? `senior`

Q27 — What is the difference between `TRUNCATE` and `DELETE`? `junior`

Q28 — What is connection string security and how do you protect DB credentials? `senior`

Q29 — How do you perform a database backup and restore in PostgreSQL? `junior`

Q30 — How do you design the database schema for the outbox pattern? `senior`