PostgreSQL — Interview Questions

Stack context: This system uses four isolated PostgreSQL instances (one per service): user-service (port 5432), product-service (port 5433), order-service (port 5434), payment-service (port 5435). Schema migrations are managed by Flyway. JPA/Hibernate is used for ORM, HikariCP for connection pooling.


Q1 — What is ACID and why does it matter in a relational database? junior

Answer: ACID defines the guarantees a database transaction must provide:

Property Meaning Example
Atomicity All operations in a transaction succeed, or none do Order insert + inventory update — if inventory fails, order is rolled back
Consistency Transaction brings DB from one valid state to another Foreign key constraints, NOT NULL constraints are enforced
Isolation Concurrent transactions behave as if they ran serially Two users placing the last item order don't both see stock=1
Durability Committed transactions survive crashes Power failure after commit does not lose the order

Why microservices make ACID harder: ACID guarantees apply within a single DB. When order-service writes to its DB and payment-service writes to its DB, there is no distributed ACID — hence patterns like Saga and Outbox.

PostgreSQL: Fully ACID-compliant. Uses MVCC for isolation and WAL for durability.


Q2 — What is MVCC (Multi-Version Concurrency Control)? senior

Answer: MVCC allows concurrent reads and writes without locking by maintaining multiple versions of each row.

How it works:

  1. Each row has xmin (transaction that created it) and xmax (transaction that deleted/updated it).
  2. When a transaction reads a row, it sees the version that was committed before the transaction started.
  3. Writers create new row versions rather than overwriting existing ones.
  4. Old versions are cleaned up by VACUUM.

Benefits:

Consequence: UPDATE = insert new version + mark old version as deleted. This causes table bloat over time — VACUUM reclaims dead tuple space.

-- See dead tuples accumulating (needs VACUUM)
SELECT relname, n_live_tup, n_dead_tup, last_vacuum 
FROM pg_stat_user_tables
WHERE relname = 'orders';

Q3 — What are PostgreSQL isolation levels and what anomalies do they prevent? senior

Answer: PostgreSQL supports four isolation levels, each preventing a different set of anomalies:

Level Dirty Read Non-Repeatable Read Phantom Read Serialization Anomaly
READ UNCOMMITTED Possible (treated as RC) Possible Possible Possible
READ COMMITTED (default) Not possible Possible Possible Possible
REPEATABLE READ Not possible Not possible Not possible Possible
SERIALIZABLE Not possible Not possible Not possible Not possible

Anomaly definitions:

Setting isolation level:

// Spring @Transactional
@Transactional(isolation = Isolation.SERIALIZABLE)
public void transferFunds(UUID from, UUID to, BigDecimal amount) {
    // full serializable isolation — slowest but safest for financial operations
}

This system: Uses default READ COMMITTED. For order payment processing, REPEATABLE READ or SERIALIZABLE may be needed to prevent double payment.


Q4 — What is a PostgreSQL index and what types are available? junior

Answer: An index is a separate data structure that speeds up data retrieval by providing a fast lookup path. Without an index, PostgreSQL performs a sequential scan (reads every row).

Index types:

Type Use case Example
B-tree (default) Equality, range queries, ordering WHERE status = 'CONFIRMED', ORDER BY created_at
Hash Exact equality only (rare) UUID lookups (but B-tree works too)
GIN Full-text search, JSONB containment, arrays WHERE metadata @> '{"category": "electronics"}'
GiST Geometric data, ranges, full-text tsrange, geometry types
BRIN Very large tables with physical ordering Time-series logs (append-only)
Partial index Index subset of rows WHERE status != 'COMPLETED' (index only pending orders)

Creating indexes:

-- Standard B-tree
CREATE INDEX idx_orders_customer_id ON orders(customer_id);

-- Composite index (multi-column)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);

-- Partial index (only index active orders)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status IN ('PENDING', 'CONFIRMED');

-- Covering index (include frequently selected columns)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (status, total_amount);

Q5 — How do you analyze query performance in PostgreSQL? junior

Answer: Use EXPLAIN ANALYZE to see the query execution plan and actual execution statistics.

EXPLAIN ANALYZE
SELECT o.id, o.status, o.total_amount
FROM orders o
WHERE o.customer_id = '123e4567-e89b-12d3-a456-426614174000'
  AND o.status = 'CONFIRMED'
ORDER BY o.created_at DESC
LIMIT 10;

Reading the output:

Index Scan using idx_orders_customer_id on orders  (cost=0.28..8.30 rows=5 width=56) 
                                                    (actual time=0.045..0.052 rows=3 loops=1)
  Index Cond: (customer_id = '...')
  Filter: (status = 'CONFIRMED')
  Rows Removed by Filter: 12
Planning Time: 0.8 ms
Execution Time: 0.1 ms

Key things to look for:

pg_stat_statements: Track cumulative query statistics to find the slowest queries:

SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;

Q6 — What is Flyway and how does it manage database migrations? junior

Answer: Flyway is a database migration tool that applies versioned SQL scripts in order, tracking which scripts have been applied in a flyway_schema_history table.

Migration file naming convention:

V1__create_orders_table.sql        (V = versioned, 1 = version, __ = separator)
V2__add_order_status_index.sql
V3__add_outbox_events_table.sql
R__refresh_order_summary_view.sql  (R = repeatable — always re-run when checksum changes)

Flyway workflow:

  1. On app startup, Flyway checks flyway_schema_history.
  2. Compares applied versions to scripts on classpath.
  3. Applies pending scripts in order.
  4. If a script fails, the migration stops (and the transaction is rolled back in supported DBs).

Configuration:

spring:
  flyway:
    enabled: true
    locations: classpath:db/migration
    baseline-on-migrate: false  # set true only when adding Flyway to existing DB
    validate-on-migrate: true   # fail if applied migration checksums don't match files

Critical rule: Never modify an applied migration script. Always create a new version to alter an existing object.


Q7 — What is JPA and how does Hibernate implement it? junior

Answer: JPA (Jakarta Persistence API) is a Java specification for ORM (Object-Relational Mapping). Hibernate is the most popular JPA implementation.

Core annotations:

@Entity
@Table(name = "orders")
public class Order {
    @Id
    @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @Column(name = "customer_id", nullable = false)
    private UUID customerId;

    @Enumerated(EnumType.STRING)
    @Column(nullable = false)
    private OrderStatus status;

    @Column(name = "total_amount", precision = 10, scale = 2)
    private BigDecimal totalAmount;

    @CreationTimestamp
    @Column(name = "created_at", updatable = false)
    private Instant createdAt;

    @OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
    private List<OrderItem> items = new ArrayList<>();
}

JPA Repository (Spring Data):

public interface OrderRepository extends JpaRepository<Order, UUID> {
    List<Order> findByCustomerIdOrderByCreatedAtDesc(UUID customerId);
    Optional<Order> findByIdAndCustomerId(UUID id, UUID customerId);

    @Query("SELECT o FROM Order o WHERE o.status = :status AND o.createdAt < :cutoff")
    List<Order> findStalePendingOrders(@Param("status") OrderStatus status,
                                       @Param("cutoff") Instant cutoff);
}

Q8 — What is the N+1 query problem and how do you solve it? junior

Answer: N+1 occurs when loading N parent entities triggers N additional queries to load their related entities, instead of one join query.

Example:

// N+1: 1 query for orders + N queries for items (one per order)
List<Order> orders = orderRepo.findAll();        // SELECT * FROM orders (returns N rows)
for (Order o : orders) {
    System.out.println(o.getItems().size());     // SELECT * FROM order_items WHERE order_id = ?
}                                                // N times!

Solutions:

1. JOIN FETCH in JPQL:

@Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.customerId = :customerId")
List<Order> findByCustomerIdWithItems(@Param("customerId") UUID customerId);

2. @EntityGraph:

@EntityGraph(attributePaths = {"items"})
List<Order> findByCustomerId(UUID customerId);

3. FetchType.EAGER (use with caution — always fetches, even when not needed):

@OneToMany(fetch = FetchType.EAGER)
private List<OrderItem> items;

4. Batch fetch (@BatchSize):

@BatchSize(size = 20)
@OneToMany
private List<OrderItem> items;
// Generates: SELECT * FROM order_items WHERE order_id IN (?, ?, ?, ...) — batches N queries into ⌈N/20⌉

Detection: Enable SQL logging (show-sql: true) or use p6spy to see all queries fired.


Q9 — What is HikariCP and how do you tune the connection pool? senior

Answer: HikariCP is the default JDBC connection pool in Spring Boot. It maintains a pool of pre-established DB connections for reuse, avoiding the latency of creating new connections per request.

Key configuration:

spring:
  datasource:
    hikari:
      maximum-pool-size: 10          # max connections in pool
      minimum-idle: 5                # idle connections maintained
      connection-timeout: 30000      # max time to wait for connection (ms)
      idle-timeout: 600000           # idle connection removed after 10 min
      max-lifetime: 1800000          # connection recycled after 30 min
      keepalive-time: 30000          # heartbeat to prevent idle timeout
      pool-name: "order-service-pool"

Sizing formula (Hikari documentation):

pool_size = (core_count * 2) + effective_spindle_count

For a 4-core machine with SSD: 4 * 2 + 1 = 9 → round to 10.

Monitoring:

This system: Each service has its own HikariCP pool. Typical microservice pool size: 5–20 depending on load.


Q10 — What is a transaction and how does @Transactional work in Spring? junior

Answer: A transaction is a unit of work that must complete fully or not at all. @Transactional in Spring wraps a method in a DB transaction using AOP (proxy).

@Service
public class OrderService {

    @Transactional  // begin transaction before method, commit after, rollback on exception
    public Order createOrder(CreateOrderRequest request) {
        Order order = new Order(request);
        orderRepository.save(order);

        // If this throws, both the order save AND the outbox insert are rolled back
        outboxRepository.save(new OutboxEvent("order.created", order.getId()));

        return order;
    }
}

Transaction propagation:

Propagation Behavior
REQUIRED (default) Join existing transaction or create new
REQUIRES_NEW Always create new transaction (suspends existing)
SUPPORTS Join if exists, no transaction if not
MANDATORY Must join existing transaction, else exception
NOT_SUPPORTED Run without transaction
NEVER Must not have transaction, else exception

Pitfall: @Transactional only works when called via the Spring proxy. Calling this.transactionalMethod() from the same class bypasses the proxy — no transaction is started.


Q11 — What are PostgreSQL sequences and how do they differ from gen_random_uuid()? junior

Answer:

Sequences (auto-incrementing integers):

CREATE SEQUENCE orders_seq START 1 INCREMENT 1;

CREATE TABLE orders (
    id BIGINT DEFAULT nextval('orders_seq') PRIMARY KEY,
    -- or with SERIAL/BIGSERIAL shorthand:
);

-- SERIAL is equivalent to:
CREATE TABLE orders (id BIGSERIAL PRIMARY KEY, ...);

gen_random_uuid() (UUID v4):

CREATE TABLE orders (
    id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    ...
);
Sequence / BIGSERIAL gen_random_uuid()
Type Integer (8 bytes) UUID (16 bytes)
Globally unique No (unique per table) Yes (globally unique)
URL/API exposure Exposes record count (security risk) No enumeration risk
Index performance Excellent (sequential inserts) Moderate (random inserts cause B-tree splits)
Join performance Better (smaller key) Slightly worse

This system: Uses UUIDs (gen_random_uuid()) for all entity IDs. Good for distributed systems where records might be created in multiple places.


Q12 — What is VACUUM and why is it important? senior

Answer: Due to MVCC, PostgreSQL never overwrites rows — UPDATE creates a new row version, and DELETE marks the old version as dead. VACUUM reclaims storage from dead tuples and updates statistics.

Two modes:

AUTOVACUUM: PostgreSQL automatically runs VACUUM when:

Monitoring:

SELECT schemaname, relname, n_live_tup, n_dead_tup,
       last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;

Transaction ID wraparound: PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, IDs wrap. VACUUM prevents this by advancing the freeze horizon. If ignored, the database becomes read-only for emergency VACUUM (catastrophic).

This system: Autovacuum handles routine cleanup. Monitor n_dead_tup for tables with high UPDATE/DELETE rates (payments, order status updates).


Q13 — How do you use SELECT FOR UPDATE and when is it needed? senior

Answer: SELECT FOR UPDATE acquires a row-level lock when reading, preventing other transactions from updating those rows until the lock is released.

Use case: Check-then-act on stock inventory:

BEGIN;
-- Lock the product row to prevent concurrent stock deductions
SELECT id, stock_quantity FROM products WHERE id = ? FOR UPDATE;

-- If stock > 0, deduct
UPDATE products SET stock_quantity = stock_quantity - 1 WHERE id = ? AND stock_quantity > 0;
COMMIT;

Without FOR UPDATE, two concurrent requests could both read stock=1, both decide to proceed, and both deduct — resulting in stock=-1.

Spring Data JPA:

@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdForUpdate(@Param("id") UUID id);

SKIP LOCKED (for queue processing — outbox poller):

SELECT * FROM outbox_events WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Skips rows locked by other processes — enables parallel polling workers

Q14 — How do you model a many-to-many relationship in PostgreSQL? junior

Answer: Many-to-many requires a join table (junction table) with foreign keys to both parent tables.

Example: Orders and Products (order_items):

CREATE TABLE order_items (
    id          UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    order_id    UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
    product_id  UUID NOT NULL REFERENCES products(id),
    quantity    INT NOT NULL CHECK (quantity > 0),
    unit_price  NUMERIC(10, 2) NOT NULL,  -- snapshot of price at order time
    UNIQUE(order_id, product_id)           -- one line per product per order
);

JPA mapping:

@Entity
public class OrderItem {
    @Id @GeneratedValue(strategy = GenerationType.UUID)
    private UUID id;

    @ManyToOne(fetch = FetchType.LAZY)
    @JoinColumn(name = "order_id")
    private Order order;

    @Column(name = "product_id")
    private UUID productId;   // UUID, not @ManyToOne — cross-service reference

    private int quantity;
    private BigDecimal unitPrice;
}

Note: productId is a UUID, not a JPA @ManyToOne relation — because product data lives in product-service's separate database. Cross-service relationships use IDs, not JPA joins.


Q15 — What is JSONB and when should you use it in PostgreSQL? senior

Answer: PostgreSQL supports two JSON types:

JSONB use cases:

-- Store semi-structured event metadata
CREATE TABLE outbox_events (
    id          UUID PRIMARY KEY,
    event_type  VARCHAR(100) NOT NULL,
    payload     JSONB NOT NULL,
    metadata    JSONB,              -- headers, trace IDs, etc.
    created_at  TIMESTAMPTZ DEFAULT now()
);

-- Query JSONB
SELECT * FROM outbox_events
WHERE payload->>'orderId' = '123'      -- text extraction
  AND payload @> '{"status": "PENDING"}';  -- containment check

-- GIN index for JSONB queries
CREATE INDEX idx_outbox_payload ON outbox_events USING GIN (payload);

When to use JSONB:

When NOT to use: If you always query specific fields, use typed columns + indexes — they're faster and type-safe.


Q16 — What is connection pooling and why is it important? junior

Answer: Each PostgreSQL connection requires a dedicated backend process (~5–10 MB RAM). Without pooling, every request creates a new connection — slow and resource-exhausting.

Connection pool flow:

App request → HikariCP pool:
  Pool has idle connection → return it immediately (fast path)
  Pool is full, all busy → wait (up to connection-timeout)
  Pool below max → create new connection (slow path)

Request completes → return connection to pool (not closed)

External pooler (PgBouncer): For very high concurrency, HikariCP + direct connections can overwhelm PostgreSQL. PgBouncer sits between app and PostgreSQL:

PostgreSQL max connections: Default 100. With 6 services × 10 connections = 60 connections — within limits. With 10 instances × 6 services × 10 = 600 — exceeds limits → need PgBouncer.


Q17 — What is Write-Ahead Logging (WAL) and how does it ensure durability? senior

Answer: WAL (Write-Ahead Logging) is PostgreSQL's durability mechanism. Before any data page is written to disk, the change is first written to the WAL log.

How it works:

  1. Transaction commits → WAL record written to WAL buffer.
  2. fsync() flushes WAL to disk (durability guarantee).
  3. Data pages are written to disk lazily (later, by background writer).
  4. On crash, PostgreSQL replays WAL records to recover uncommitted changes.

Benefits beyond durability:

Logical replication (Debezium alternative to outbox poller):

-- Enable logical replication in postgresql.conf
wal_level = logical

-- Create publication for outbox table
CREATE PUBLICATION outbox_pub FOR TABLE outbox_events;

Debezium reads the WAL stream, eliminating the polling outbox pattern.


Q18 — How do you handle full-text search in PostgreSQL? senior

Answer: PostgreSQL has built-in full-text search via tsvector and tsquery.

Setup:

-- Add tsvector column for full-text search on products
ALTER TABLE products ADD COLUMN search_vector tsvector;

-- Update with weighted vectors (A = highest weight)
UPDATE products SET search_vector =
    setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
    setweight(to_tsvector('english', coalesce(description, '')), 'B');

-- GIN index for fast full-text search
CREATE INDEX idx_products_fts ON products USING GIN (search_vector);

-- Update automatically via trigger
CREATE TRIGGER update_product_search
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(
    search_vector, 'pg_catalog.english', name, description);

Querying:

SELECT id, name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'laptop & gaming') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;

vs Elasticsearch: PostgreSQL FTS works for moderate datasets. For large-scale, faceted, fuzzy search → Elasticsearch.


Q19 — What is table partitioning and when should you use it? senior

Answer: Partitioning divides a large table into smaller physical sub-tables (partitions), improving query performance and maintenance.

Range partitioning by date (logs, orders):

CREATE TABLE orders (
    id         UUID NOT NULL,
    created_at TIMESTAMPTZ NOT NULL,
    status     VARCHAR(20),
    ...
) PARTITION BY RANGE (created_at);

-- Monthly partitions
CREATE TABLE orders_2026_01 PARTITION OF orders
    FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');

CREATE TABLE orders_2026_02 PARTITION OF orders
    FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');

Benefits:

When to use: Tables with > 10 million rows, natural range (time) or list (status, tenant) partitioning key, and queries that typically access a subset of partitions.

This system: Not currently partitioned. orders table would benefit from monthly partitioning at scale (millions of orders per month).


Q20 — What are common PostgreSQL data types and how do you choose between them? junior

Answer:

Data Recommended Type Notes
Integer IDs BIGSERIAL or UUID BIGSERIAL for sequences, UUID for distributed
Text TEXT No length limit; use VARCHAR(n) only for enforced max
Status/enum VARCHAR(20) or TEXT with CHECK Avoid native ENUM (hard to alter)
Decimal money NUMERIC(10, 2) Never FLOAT for money (floating point imprecision)
Timestamps TIMESTAMPTZ Always store with timezone; normalizes to UTC internally
Boolean BOOLEAN TRUE/FALSE/NULL
JSON JSONB Prefer over json for indexing
IPs INET or CIDR Supports IP arithmetic and indexing
Binary BYTEA Small blobs; large files go to object storage
Array INTEGER[], TEXT[] GIN indexable; avoid for relational data

Money pitfall:

-- WRONG: floating point imprecision
price DOUBLE PRECISION;

-- CORRECT: exact decimal
price NUMERIC(10, 2);  -- up to 99,999,999.99

Q21 — How do you detect and resolve deadlocks in PostgreSQL? senior

Answer: A deadlock occurs when two transactions each hold a lock the other needs:

PostgreSQL handles deadlocks automatically: Detects the cycle after deadlock_timeout (default 1 second) and rolls back one transaction (the "victim"), allowing the other to proceed.

Detection:

-- Check current locks
SELECT pid, relation::regclass, mode, granted
FROM pg_locks l JOIN pg_class c ON l.relation = c.oid
WHERE NOT granted;

-- Check waiting queries
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';

Log deadlocks:

log_min_duration_statement = 1000  # log queries > 1s
deadlock_timeout = 1s
log_lock_waits = on

Prevention strategies:

  1. Consistent lock ordering: Always lock resources in the same order across all transactions (A before B, never B before A).
  2. Minimize transaction scope: Hold locks for as short a time as possible.
  3. Use SKIP LOCKED: For queue processing, skip locked rows instead of waiting.
  4. Optimistic locking (JPA @Version): No DB lock; retry on conflict.

Q22 — What is logical replication and how does Debezium use it? senior

Answer: Logical replication streams decoded row changes (INSERT/UPDATE/DELETE) from PostgreSQL's WAL to external consumers, without requiring the consumer to understand PostgreSQL's binary format.

Setup:

-- postgresql.conf
wal_level = logical
max_replication_slots = 5
max_wal_senders = 5

-- Create replication slot for Debezium
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');

Debezium connector (replaces outbox poller):

{
  "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
  "database.hostname": "postgres",
  "database.dbname": "orderdb",
  "slot.name": "debezium_slot",
  "table.include.list": "public.outbox_events",
  "transforms": "outbox",
  "transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}

Benefits over outbox poller:

Trade-off: Adds Debezium + Kafka Connect infrastructure complexity.


Q23 — How do you implement pagination in PostgreSQL efficiently? junior

Answer: Offset-based pagination (simple but slow for deep pages):

SELECT * FROM orders
WHERE customer_id = ?
ORDER BY created_at DESC
LIMIT 20 OFFSET 200;  -- skips 200 rows before returning 20

Problem: PostgreSQL reads and discards the first 200 rows — slow for large offsets.

Cursor-based (keyset) pagination (efficient for all pages):

-- First page
SELECT * FROM orders WHERE customer_id = ?
ORDER BY created_at DESC, id DESC
LIMIT 20;

-- Next page (using last row's values as cursor)
SELECT * FROM orders
WHERE customer_id = ?
  AND (created_at, id) < (:lastCreatedAt, :lastId)  -- keyset condition
ORDER BY created_at DESC, id DESC
LIMIT 20;

Benefits: Constant performance regardless of page depth. Uses the index efficiently.

Spring Data (offset):

Page<Order> findByCustomerId(UUID customerId, Pageable pageable);
// Usage: PageRequest.of(0, 20, Sort.by("createdAt").descending())

Spring Data (keyset — requires Blaze-Persistence or custom query):

@Query("SELECT o FROM Order o WHERE o.customerId = :id AND o.createdAt < :cursor ORDER BY o.createdAt DESC")
List<Order> findNextPage(@Param("id") UUID id, @Param("cursor") Instant cursor, Pageable pageable);

Q24 — What is pg_stat_activity and how do you use it to monitor running queries? junior

Answer: pg_stat_activity is a system view showing currently active database sessions and their current query.

-- All active queries sorted by duration
SELECT pid, usename, application_name, state,
       now() - query_start AS duration,
       left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;

-- Find long-running queries (> 5 seconds)
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < now() - interval '5 seconds';

-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE query LIKE '%outbox_events%' AND state = 'active' AND pid != pg_backend_pid();

Key columns:


Q25 — What is EXPLAIN vs EXPLAIN ANALYZE and when to use each? junior

Answer:

EXPLAIN EXPLAIN ANALYZE
Executes query No (estimate only) Yes (runs the query!)
Shows actual rows No Yes
Shows actual time No Yes
Safe on production Yes Careful — runs the query!

EXPLAIN: Shows what the query planner thinks it will do. Useful for checking if the right index will be used without running an expensive query.

EXPLAIN ANALYZE: Actually executes the query and compares planned vs actual. Use when EXPLAIN shows the planner is wrong.

Caution: Never run EXPLAIN ANALYZE on a destructive query (UPDATE, DELETE) in production — it executes! Wrap in a transaction and rollback:

BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'CANCELLED' WHERE created_at < '2020-01-01';
ROLLBACK;

EXPLAIN (ANALYZE, BUFFERS): Also shows cache hits/misses — useful for I/O optimization:

EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = '...' ORDER BY created_at DESC;

Q26 — How do you implement optimistic locking with PostgreSQL and JPA? senior

Answer: Optimistic locking assumes conflicts are rare. Instead of locking rows on read, it checks that the row hasn't changed when writing. Uses a version counter.

@Entity
public class Product {
    @Id private UUID id;

    private int stockQuantity;

    @Version  // JPA version field
    private Long version;
}

How it works:

  1. Read: SELECT * FROM products WHERE id = ? → returns {stockQuantity: 10, version: 5}.
  2. Update: UPDATE products SET stock_quantity = 9, version = 6 WHERE id = ? AND version = 5.
  3. If another transaction updated version to 6 between step 1 and 2: UPDATE affects 0 rows → JPA throws OptimisticLockException.
  4. Caller retries with fresh data.
@Retryable(retryFor = OptimisticLockException.class, maxAttempts = 3)
@Transactional
public void deductStock(UUID productId, int quantity) {
    Product product = productRepo.findById(productId).orElseThrow();
    if (product.getStockQuantity() < quantity) throw new InsufficientStockException();
    product.setStockQuantity(product.getStockQuantity() - quantity);
    productRepo.save(product);  // fails on version conflict → triggers retry
}

Q27 — What is the difference between TRUNCATE and DELETE? junior

Answer:

DELETE TRUNCATE
Removes rows One by one (MVCC) All at once (drops and recreates storage)
WAL logging Every row logged Minimal logging
Triggers Fires row-level triggers Does NOT fire row-level triggers
WHERE clause Yes — partial delete No — removes ALL rows
Transaction safe Yes — fully transactional Yes — transactional in PostgreSQL
Speed Slow on large tables Very fast
Resets sequences No Optional: TRUNCATE ... RESTART IDENTITY
Foreign keys Respects FK constraints Respects FK (must cascade or have no children)
-- Delete specific rows
DELETE FROM outbox_events WHERE status = 'PUBLISHED' AND created_at < now() - interval '7 days';

-- Remove all rows quickly (e.g., for test cleanup)
TRUNCATE outbox_events RESTART IDENTITY;

This system: Test setup uses @Transactional rollback (Spring Test) or Testcontainers with fresh containers. TRUNCATE is used in @BeforeEach for integration tests that need clean state.


Q28 — What is connection string security and how do you protect DB credentials? senior

Answer: Database connection strings contain passwords — exposing them in source code or logs is a critical security vulnerability (OWASP A02: Cryptographic Failures, A05: Security Misconfiguration).

Anti-patterns (never do):

# WRONG — hardcoded in source code
spring.datasource.password: MyPassword123

Secure approaches:

1. Environment variables:

spring:
  datasource:
    url: jdbc:postgresql://${DB_HOST}:5432/orderdb
    username: ${DB_USERNAME}
    password: ${DB_PASSWORD}  # injected from environment

2. Docker secrets (Docker Swarm/Kubernetes secrets):

# Kubernetes Secret
kubectl create secret generic db-secret --from-literal=password=secret123

# Pod spec mounts secret as env var
env:
  - name: DB_PASSWORD
    valueFrom:
      secretKeyRef:
        name: db-secret
        key: password

3. Vault / AWS Secrets Manager: Rotate credentials without redeployment. Spring Cloud Vault or AWS Parameter Store integration.

This system: Docker Compose uses environment variables loaded from .env file (git-ignored). Production would use Kubernetes secrets or Vault.


Q29 — How do you perform a database backup and restore in PostgreSQL? junior

Answer: pg_dump — logical backup (SQL format):

# Backup single database
pg_dump -h localhost -U postgres -d orderdb -F c -f orderdb.dump

# Restore
pg_restore -h localhost -U postgres -d orderdb_restored orderdb.dump

pg_dumpall — backup all databases + global objects (roles):

pg_dumpall -h localhost -U postgres > full_backup.sql

Formats:

Continuous backup with WAL archiving:

base backup (pg_basebackup) + WAL archive → point-in-time recovery

This system: Docker Compose data persists in named volumes. For production: automated nightly pg_dump to S3, with WAL archiving for PITR.


Q30 — How do you design the database schema for the outbox pattern? senior

Answer: The outbox table must support: transactional writes with business data, polling for unpublished events, marking events as published, and DLT for permanently failed events.

CREATE TABLE outbox_events (
    id            UUID DEFAULT gen_random_uuid() PRIMARY KEY,
    aggregate_id  UUID NOT NULL,          -- the entity ID (order_id, payment_id)
    aggregate_type VARCHAR(100) NOT NULL,  -- 'Order', 'Payment'
    event_type    VARCHAR(200) NOT NULL,   -- 'OrderCreatedEvent'
    topic         VARCHAR(200) NOT NULL,   -- 'order.created'
    payload       JSONB NOT NULL,          -- serialized event (Avro bytes or JSON)
    headers       JSONB,                   -- Kafka headers (traceId, correlationId)
    status        VARCHAR(20) NOT NULL DEFAULT 'PENDING',  -- PENDING, PUBLISHED, FAILED
    attempts      INT NOT NULL DEFAULT 0,
    created_at    TIMESTAMPTZ NOT NULL DEFAULT now(),
    published_at  TIMESTAMPTZ,
    last_error    TEXT
);

-- Index for efficient polling (poller queries pending events by created_at)
CREATE INDEX idx_outbox_pending ON outbox_events(created_at)
WHERE status = 'PENDING';

-- Index for aggregate lookups (debugging)
CREATE INDEX idx_outbox_aggregate ON outbox_events(aggregate_id, aggregate_type);

Poller query with SKIP LOCKED (safe parallel poller):

SELECT * FROM outbox_events
WHERE status = 'PENDING' AND attempts < 5
ORDER BY created_at ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;

This design supports multiple poller instances running in parallel without duplicate publishing.