PostgreSQL — Interview Questions
Stack context: This system uses four isolated PostgreSQL instances (one per service): user-service (port 5432), product-service (port 5433), order-service (port 5434), payment-service (port 5435). Schema migrations are managed by Flyway. JPA/Hibernate is used for ORM, HikariCP for connection pooling.
Q1 — What is ACID and why does it matter in a relational database? junior
Answer: ACID defines the guarantees a database transaction must provide:
| Property | Meaning | Example |
|---|---|---|
| Atomicity | All operations in a transaction succeed, or none do | Order insert + inventory update — if inventory fails, order is rolled back |
| Consistency | Transaction brings DB from one valid state to another | Foreign key constraints, NOT NULL constraints are enforced |
| Isolation | Concurrent transactions behave as if they ran serially | Two users placing the last item order don't both see stock=1 |
| Durability | Committed transactions survive crashes | Power failure after commit does not lose the order |
Why microservices make ACID harder: ACID guarantees apply within a single DB. When order-service writes to its DB and payment-service writes to its DB, there is no distributed ACID — hence patterns like Saga and Outbox.
PostgreSQL: Fully ACID-compliant. Uses MVCC for isolation and WAL for durability.
Q2 — What is MVCC (Multi-Version Concurrency Control)? senior
Answer: MVCC allows concurrent reads and writes without locking by maintaining multiple versions of each row.
How it works:
- Each row has
xmin(transaction that created it) andxmax(transaction that deleted/updated it). - When a transaction reads a row, it sees the version that was committed before the transaction started.
- Writers create new row versions rather than overwriting existing ones.
- Old versions are cleaned up by VACUUM.
Benefits:
- Readers never block writers.
- Writers never block readers.
- High concurrency for read-heavy workloads (product catalog reads, order status queries).
Consequence: UPDATE = insert new version + mark old version as deleted. This causes table bloat over time — VACUUM reclaims dead tuple space.
-- See dead tuples accumulating (needs VACUUM)
SELECT relname, n_live_tup, n_dead_tup, last_vacuum
FROM pg_stat_user_tables
WHERE relname = 'orders';
Q3 — What are PostgreSQL isolation levels and what anomalies do they prevent? senior
Answer: PostgreSQL supports four isolation levels, each preventing a different set of anomalies:
| Level | Dirty Read | Non-Repeatable Read | Phantom Read | Serialization Anomaly |
|---|---|---|---|---|
| READ UNCOMMITTED | Possible (treated as RC) | Possible | Possible | Possible |
| READ COMMITTED (default) | Not possible | Possible | Possible | Possible |
| REPEATABLE READ | Not possible | Not possible | Not possible | Possible |
| SERIALIZABLE | Not possible | Not possible | Not possible | Not possible |
Anomaly definitions:
- Dirty read: Reading uncommitted changes from another transaction.
- Non-repeatable read: Reading the same row twice returns different data (another transaction updated it).
- Phantom read: Re-running a query returns different rows (another transaction inserted/deleted rows).
- Serialization anomaly: Result is inconsistent with any serial execution order.
Setting isolation level:
// Spring @Transactional
@Transactional(isolation = Isolation.SERIALIZABLE)
public void transferFunds(UUID from, UUID to, BigDecimal amount) {
// full serializable isolation — slowest but safest for financial operations
}
This system: Uses default READ COMMITTED. For order payment processing, REPEATABLE READ or SERIALIZABLE may be needed to prevent double payment.
Q4 — What is a PostgreSQL index and what types are available? junior
Answer: An index is a separate data structure that speeds up data retrieval by providing a fast lookup path. Without an index, PostgreSQL performs a sequential scan (reads every row).
Index types:
| Type | Use case | Example |
|---|---|---|
| B-tree (default) | Equality, range queries, ordering | WHERE status = 'CONFIRMED', ORDER BY created_at |
| Hash | Exact equality only (rare) | UUID lookups (but B-tree works too) |
| GIN | Full-text search, JSONB containment, arrays | WHERE metadata @> '{"category": "electronics"}' |
| GiST | Geometric data, ranges, full-text | tsrange, geometry types |
| BRIN | Very large tables with physical ordering | Time-series logs (append-only) |
| Partial index | Index subset of rows | WHERE status != 'COMPLETED' (index only pending orders) |
Creating indexes:
-- Standard B-tree
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- Composite index (multi-column)
CREATE INDEX idx_orders_status_created ON orders(status, created_at DESC);
-- Partial index (only index active orders)
CREATE INDEX idx_active_orders ON orders(customer_id)
WHERE status IN ('PENDING', 'CONFIRMED');
-- Covering index (include frequently selected columns)
CREATE INDEX idx_orders_covering ON orders(customer_id) INCLUDE (status, total_amount);
Q5 — How do you analyze query performance in PostgreSQL? junior
Answer:
Use EXPLAIN ANALYZE to see the query execution plan and actual execution statistics.
EXPLAIN ANALYZE
SELECT o.id, o.status, o.total_amount
FROM orders o
WHERE o.customer_id = '123e4567-e89b-12d3-a456-426614174000'
AND o.status = 'CONFIRMED'
ORDER BY o.created_at DESC
LIMIT 10;
Reading the output:
Index Scan using idx_orders_customer_id on orders (cost=0.28..8.30 rows=5 width=56)
(actual time=0.045..0.052 rows=3 loops=1)
Index Cond: (customer_id = '...')
Filter: (status = 'CONFIRMED')
Rows Removed by Filter: 12
Planning Time: 0.8 ms
Execution Time: 0.1 ms
Key things to look for:
- Seq Scan: Bad on large tables — missing index.
- actual rows vs estimated rows: Large discrepancy = stale statistics → run
ANALYZE orders. - Rows Removed by Filter: Many rows removed means a better index is needed.
- Nested Loop / Hash Join: Join strategy — nested loop is fine for small results, hash join for large.
- Execution Time: Actual wall clock time including I/O.
pg_stat_statements: Track cumulative query statistics to find the slowest queries:
SELECT query, mean_exec_time, calls FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;
Q6 — What is Flyway and how does it manage database migrations? junior
Answer:
Flyway is a database migration tool that applies versioned SQL scripts in order, tracking which scripts have been applied in a flyway_schema_history table.
Migration file naming convention:
V1__create_orders_table.sql (V = versioned, 1 = version, __ = separator)
V2__add_order_status_index.sql
V3__add_outbox_events_table.sql
R__refresh_order_summary_view.sql (R = repeatable — always re-run when checksum changes)
Flyway workflow:
- On app startup, Flyway checks
flyway_schema_history. - Compares applied versions to scripts on classpath.
- Applies pending scripts in order.
- If a script fails, the migration stops (and the transaction is rolled back in supported DBs).
Configuration:
spring:
flyway:
enabled: true
locations: classpath:db/migration
baseline-on-migrate: false # set true only when adding Flyway to existing DB
validate-on-migrate: true # fail if applied migration checksums don't match files
Critical rule: Never modify an applied migration script. Always create a new version to alter an existing object.
Q7 — What is JPA and how does Hibernate implement it? junior
Answer: JPA (Jakarta Persistence API) is a Java specification for ORM (Object-Relational Mapping). Hibernate is the most popular JPA implementation.
Core annotations:
@Entity
@Table(name = "orders")
public class Order {
@Id
@GeneratedValue(strategy = GenerationType.UUID)
private UUID id;
@Column(name = "customer_id", nullable = false)
private UUID customerId;
@Enumerated(EnumType.STRING)
@Column(nullable = false)
private OrderStatus status;
@Column(name = "total_amount", precision = 10, scale = 2)
private BigDecimal totalAmount;
@CreationTimestamp
@Column(name = "created_at", updatable = false)
private Instant createdAt;
@OneToMany(mappedBy = "order", cascade = CascadeType.ALL, orphanRemoval = true)
private List<OrderItem> items = new ArrayList<>();
}
JPA Repository (Spring Data):
public interface OrderRepository extends JpaRepository<Order, UUID> {
List<Order> findByCustomerIdOrderByCreatedAtDesc(UUID customerId);
Optional<Order> findByIdAndCustomerId(UUID id, UUID customerId);
@Query("SELECT o FROM Order o WHERE o.status = :status AND o.createdAt < :cutoff")
List<Order> findStalePendingOrders(@Param("status") OrderStatus status,
@Param("cutoff") Instant cutoff);
}
Q8 — What is the N+1 query problem and how do you solve it? junior
Answer: N+1 occurs when loading N parent entities triggers N additional queries to load their related entities, instead of one join query.
Example:
// N+1: 1 query for orders + N queries for items (one per order)
List<Order> orders = orderRepo.findAll(); // SELECT * FROM orders (returns N rows)
for (Order o : orders) {
System.out.println(o.getItems().size()); // SELECT * FROM order_items WHERE order_id = ?
} // N times!
Solutions:
1. JOIN FETCH in JPQL:
@Query("SELECT o FROM Order o JOIN FETCH o.items WHERE o.customerId = :customerId")
List<Order> findByCustomerIdWithItems(@Param("customerId") UUID customerId);
2. @EntityGraph:
@EntityGraph(attributePaths = {"items"})
List<Order> findByCustomerId(UUID customerId);
3. FetchType.EAGER (use with caution — always fetches, even when not needed):
@OneToMany(fetch = FetchType.EAGER)
private List<OrderItem> items;
4. Batch fetch (@BatchSize):
@BatchSize(size = 20)
@OneToMany
private List<OrderItem> items;
// Generates: SELECT * FROM order_items WHERE order_id IN (?, ?, ?, ...) — batches N queries into ⌈N/20⌉
Detection: Enable SQL logging (show-sql: true) or use p6spy to see all queries fired.
Q9 — What is HikariCP and how do you tune the connection pool? senior
Answer: HikariCP is the default JDBC connection pool in Spring Boot. It maintains a pool of pre-established DB connections for reuse, avoiding the latency of creating new connections per request.
Key configuration:
spring:
datasource:
hikari:
maximum-pool-size: 10 # max connections in pool
minimum-idle: 5 # idle connections maintained
connection-timeout: 30000 # max time to wait for connection (ms)
idle-timeout: 600000 # idle connection removed after 10 min
max-lifetime: 1800000 # connection recycled after 30 min
keepalive-time: 30000 # heartbeat to prevent idle timeout
pool-name: "order-service-pool"
Sizing formula (Hikari documentation):
pool_size = (core_count * 2) + effective_spindle_count
For a 4-core machine with SSD: 4 * 2 + 1 = 9 → round to 10.
Monitoring:
hikaricp.connections.active(Micrometer)hikaricp.connections.pending— high pending = pool too smallhikaricp.connections.timeout— requests waiting too long
This system: Each service has its own HikariCP pool. Typical microservice pool size: 5–20 depending on load.
Q10 — What is a transaction and how does @Transactional work in Spring? junior
Answer:
A transaction is a unit of work that must complete fully or not at all. @Transactional in Spring wraps a method in a DB transaction using AOP (proxy).
@Service
public class OrderService {
@Transactional // begin transaction before method, commit after, rollback on exception
public Order createOrder(CreateOrderRequest request) {
Order order = new Order(request);
orderRepository.save(order);
// If this throws, both the order save AND the outbox insert are rolled back
outboxRepository.save(new OutboxEvent("order.created", order.getId()));
return order;
}
}
Transaction propagation:
| Propagation | Behavior |
|---|---|
REQUIRED (default) |
Join existing transaction or create new |
REQUIRES_NEW |
Always create new transaction (suspends existing) |
SUPPORTS |
Join if exists, no transaction if not |
MANDATORY |
Must join existing transaction, else exception |
NOT_SUPPORTED |
Run without transaction |
NEVER |
Must not have transaction, else exception |
Pitfall: @Transactional only works when called via the Spring proxy. Calling this.transactionalMethod() from the same class bypasses the proxy — no transaction is started.
Q11 — What are PostgreSQL sequences and how do they differ from gen_random_uuid()? junior
Answer:
Sequences (auto-incrementing integers):
CREATE SEQUENCE orders_seq START 1 INCREMENT 1;
CREATE TABLE orders (
id BIGINT DEFAULT nextval('orders_seq') PRIMARY KEY,
-- or with SERIAL/BIGSERIAL shorthand:
);
-- SERIAL is equivalent to:
CREATE TABLE orders (id BIGSERIAL PRIMARY KEY, ...);
gen_random_uuid() (UUID v4):
CREATE TABLE orders (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
...
);
| Sequence / BIGSERIAL | gen_random_uuid() |
|
|---|---|---|
| Type | Integer (8 bytes) | UUID (16 bytes) |
| Globally unique | No (unique per table) | Yes (globally unique) |
| URL/API exposure | Exposes record count (security risk) | No enumeration risk |
| Index performance | Excellent (sequential inserts) | Moderate (random inserts cause B-tree splits) |
| Join performance | Better (smaller key) | Slightly worse |
This system: Uses UUIDs (gen_random_uuid()) for all entity IDs. Good for distributed systems where records might be created in multiple places.
Q12 — What is VACUUM and why is it important? senior
Answer: Due to MVCC, PostgreSQL never overwrites rows — UPDATE creates a new row version, and DELETE marks the old version as dead. VACUUM reclaims storage from dead tuples and updates statistics.
Two modes:
- VACUUM (lazy): Marks dead tuple space as reusable but does not return it to the OS.
- VACUUM FULL: Rewrites the table, returning space to OS. Requires an exclusive lock — avoid during peak hours.
AUTOVACUUM: PostgreSQL automatically runs VACUUM when:
- 20% + 50 rows of a table become dead tuples (default
autovacuum_vacuum_scale_factor = 0.2).
Monitoring:
SELECT schemaname, relname, n_live_tup, n_dead_tup,
last_autovacuum, last_autoanalyze
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC;
Transaction ID wraparound: PostgreSQL uses 32-bit transaction IDs. After ~2 billion transactions, IDs wrap. VACUUM prevents this by advancing the freeze horizon. If ignored, the database becomes read-only for emergency VACUUM (catastrophic).
This system: Autovacuum handles routine cleanup. Monitor n_dead_tup for tables with high UPDATE/DELETE rates (payments, order status updates).
Q13 — How do you use SELECT FOR UPDATE and when is it needed? senior
Answer:
SELECT FOR UPDATE acquires a row-level lock when reading, preventing other transactions from updating those rows until the lock is released.
Use case: Check-then-act on stock inventory:
BEGIN;
-- Lock the product row to prevent concurrent stock deductions
SELECT id, stock_quantity FROM products WHERE id = ? FOR UPDATE;
-- If stock > 0, deduct
UPDATE products SET stock_quantity = stock_quantity - 1 WHERE id = ? AND stock_quantity > 0;
COMMIT;
Without FOR UPDATE, two concurrent requests could both read stock=1, both decide to proceed, and both deduct — resulting in stock=-1.
Spring Data JPA:
@Lock(LockModeType.PESSIMISTIC_WRITE)
@Query("SELECT p FROM Product p WHERE p.id = :id")
Optional<Product> findByIdForUpdate(@Param("id") UUID id);
SKIP LOCKED (for queue processing — outbox poller):
SELECT * FROM outbox_events WHERE status = 'PENDING'
ORDER BY created_at
LIMIT 10
FOR UPDATE SKIP LOCKED;
-- Skips rows locked by other processes — enables parallel polling workers
Q14 — How do you model a many-to-many relationship in PostgreSQL? junior
Answer: Many-to-many requires a join table (junction table) with foreign keys to both parent tables.
Example: Orders and Products (order_items):
CREATE TABLE order_items (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
product_id UUID NOT NULL REFERENCES products(id),
quantity INT NOT NULL CHECK (quantity > 0),
unit_price NUMERIC(10, 2) NOT NULL, -- snapshot of price at order time
UNIQUE(order_id, product_id) -- one line per product per order
);
JPA mapping:
@Entity
public class OrderItem {
@Id @GeneratedValue(strategy = GenerationType.UUID)
private UUID id;
@ManyToOne(fetch = FetchType.LAZY)
@JoinColumn(name = "order_id")
private Order order;
@Column(name = "product_id")
private UUID productId; // UUID, not @ManyToOne — cross-service reference
private int quantity;
private BigDecimal unitPrice;
}
Note: productId is a UUID, not a JPA @ManyToOne relation — because product data lives in product-service's separate database. Cross-service relationships use IDs, not JPA joins.
Q15 — What is JSONB and when should you use it in PostgreSQL? senior
Answer: PostgreSQL supports two JSON types:
json: Stores JSON as text. Validates JSON syntax. No indexing.jsonb: Stores JSON in binary format. Parsed on insert. Supports GIN indexing. Slightly larger storage. Recommended.
JSONB use cases:
-- Store semi-structured event metadata
CREATE TABLE outbox_events (
id UUID PRIMARY KEY,
event_type VARCHAR(100) NOT NULL,
payload JSONB NOT NULL,
metadata JSONB, -- headers, trace IDs, etc.
created_at TIMESTAMPTZ DEFAULT now()
);
-- Query JSONB
SELECT * FROM outbox_events
WHERE payload->>'orderId' = '123' -- text extraction
AND payload @> '{"status": "PENDING"}'; -- containment check
-- GIN index for JSONB queries
CREATE INDEX idx_outbox_payload ON outbox_events USING GIN (payload);
When to use JSONB:
- Polymorphic data where schema varies per record type.
- Event sourcing (events have different payloads).
- Storing external API responses without knowing the full schema.
When NOT to use: If you always query specific fields, use typed columns + indexes — they're faster and type-safe.
Q16 — What is connection pooling and why is it important? junior
Answer: Each PostgreSQL connection requires a dedicated backend process (~5–10 MB RAM). Without pooling, every request creates a new connection — slow and resource-exhausting.
Connection pool flow:
App request → HikariCP pool:
Pool has idle connection → return it immediately (fast path)
Pool is full, all busy → wait (up to connection-timeout)
Pool below max → create new connection (slow path)
Request completes → return connection to pool (not closed)
External pooler (PgBouncer): For very high concurrency, HikariCP + direct connections can overwhelm PostgreSQL. PgBouncer sits between app and PostgreSQL:
- Transaction pooling: Connection returned to PgBouncer pool after each transaction (most efficient).
- Session pooling: Connection held for entire client session.
PostgreSQL max connections: Default 100. With 6 services × 10 connections = 60 connections — within limits. With 10 instances × 6 services × 10 = 600 — exceeds limits → need PgBouncer.
Q17 — What is Write-Ahead Logging (WAL) and how does it ensure durability? senior
Answer: WAL (Write-Ahead Logging) is PostgreSQL's durability mechanism. Before any data page is written to disk, the change is first written to the WAL log.
How it works:
- Transaction commits → WAL record written to WAL buffer.
fsync()flushes WAL to disk (durability guarantee).- Data pages are written to disk lazily (later, by background writer).
- On crash, PostgreSQL replays WAL records to recover uncommitted changes.
Benefits beyond durability:
- Replication: Standby servers replay WAL from primary — logical and physical replication.
- Point-in-time recovery (PITR): Archive WAL files → restore to any point in time.
- Logical decoding: Parse WAL to extract change events (used by Debezium CDC).
Logical replication (Debezium alternative to outbox poller):
-- Enable logical replication in postgresql.conf
wal_level = logical
-- Create publication for outbox table
CREATE PUBLICATION outbox_pub FOR TABLE outbox_events;
Debezium reads the WAL stream, eliminating the polling outbox pattern.
Q18 — How do you handle full-text search in PostgreSQL? senior
Answer:
PostgreSQL has built-in full-text search via tsvector and tsquery.
Setup:
-- Add tsvector column for full-text search on products
ALTER TABLE products ADD COLUMN search_vector tsvector;
-- Update with weighted vectors (A = highest weight)
UPDATE products SET search_vector =
setweight(to_tsvector('english', coalesce(name, '')), 'A') ||
setweight(to_tsvector('english', coalesce(description, '')), 'B');
-- GIN index for fast full-text search
CREATE INDEX idx_products_fts ON products USING GIN (search_vector);
-- Update automatically via trigger
CREATE TRIGGER update_product_search
BEFORE INSERT OR UPDATE ON products
FOR EACH ROW EXECUTE FUNCTION tsvector_update_trigger(
search_vector, 'pg_catalog.english', name, description);
Querying:
SELECT id, name, ts_rank(search_vector, query) AS rank
FROM products, to_tsquery('english', 'laptop & gaming') query
WHERE search_vector @@ query
ORDER BY rank DESC
LIMIT 20;
vs Elasticsearch: PostgreSQL FTS works for moderate datasets. For large-scale, faceted, fuzzy search → Elasticsearch.
Q19 — What is table partitioning and when should you use it? senior
Answer: Partitioning divides a large table into smaller physical sub-tables (partitions), improving query performance and maintenance.
Range partitioning by date (logs, orders):
CREATE TABLE orders (
id UUID NOT NULL,
created_at TIMESTAMPTZ NOT NULL,
status VARCHAR(20),
...
) PARTITION BY RANGE (created_at);
-- Monthly partitions
CREATE TABLE orders_2026_01 PARTITION OF orders
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE orders_2026_02 PARTITION OF orders
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
Benefits:
- Partition pruning:
WHERE created_at > '2026-01-01'only scans relevant partitions. - Maintenance: Drop old partitions in milliseconds (
DROP TABLE orders_2024_01). - Parallel queries: Scan multiple partitions in parallel.
When to use: Tables with > 10 million rows, natural range (time) or list (status, tenant) partitioning key, and queries that typically access a subset of partitions.
This system: Not currently partitioned. orders table would benefit from monthly partitioning at scale (millions of orders per month).
Q20 — What are common PostgreSQL data types and how do you choose between them? junior
Answer:
| Data | Recommended Type | Notes |
|---|---|---|
| Integer IDs | BIGSERIAL or UUID |
BIGSERIAL for sequences, UUID for distributed |
| Text | TEXT |
No length limit; use VARCHAR(n) only for enforced max |
| Status/enum | VARCHAR(20) or TEXT with CHECK |
Avoid native ENUM (hard to alter) |
| Decimal money | NUMERIC(10, 2) |
Never FLOAT for money (floating point imprecision) |
| Timestamps | TIMESTAMPTZ |
Always store with timezone; normalizes to UTC internally |
| Boolean | BOOLEAN |
TRUE/FALSE/NULL |
| JSON | JSONB |
Prefer over json for indexing |
| IPs | INET or CIDR |
Supports IP arithmetic and indexing |
| Binary | BYTEA |
Small blobs; large files go to object storage |
| Array | INTEGER[], TEXT[] |
GIN indexable; avoid for relational data |
Money pitfall:
-- WRONG: floating point imprecision
price DOUBLE PRECISION;
-- CORRECT: exact decimal
price NUMERIC(10, 2); -- up to 99,999,999.99
Q21 — How do you detect and resolve deadlocks in PostgreSQL? senior
Answer: A deadlock occurs when two transactions each hold a lock the other needs:
- Transaction A holds lock on Order → waits for Product.
- Transaction B holds lock on Product → waits for Order.
PostgreSQL handles deadlocks automatically: Detects the cycle after deadlock_timeout (default 1 second) and rolls back one transaction (the "victim"), allowing the other to proceed.
Detection:
-- Check current locks
SELECT pid, relation::regclass, mode, granted
FROM pg_locks l JOIN pg_class c ON l.relation = c.oid
WHERE NOT granted;
-- Check waiting queries
SELECT pid, query, state, wait_event_type, wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock';
Log deadlocks:
log_min_duration_statement = 1000 # log queries > 1s
deadlock_timeout = 1s
log_lock_waits = on
Prevention strategies:
- Consistent lock ordering: Always lock resources in the same order across all transactions (A before B, never B before A).
- Minimize transaction scope: Hold locks for as short a time as possible.
- Use
SKIP LOCKED: For queue processing, skip locked rows instead of waiting. - Optimistic locking (JPA
@Version): No DB lock; retry on conflict.
Q22 — What is logical replication and how does Debezium use it? senior
Answer: Logical replication streams decoded row changes (INSERT/UPDATE/DELETE) from PostgreSQL's WAL to external consumers, without requiring the consumer to understand PostgreSQL's binary format.
Setup:
-- postgresql.conf
wal_level = logical
max_replication_slots = 5
max_wal_senders = 5
-- Create replication slot for Debezium
SELECT pg_create_logical_replication_slot('debezium_slot', 'pgoutput');
Debezium connector (replaces outbox poller):
{
"connector.class": "io.debezium.connector.postgresql.PostgresConnector",
"database.hostname": "postgres",
"database.dbname": "orderdb",
"slot.name": "debezium_slot",
"table.include.list": "public.outbox_events",
"transforms": "outbox",
"transforms.outbox.type": "io.debezium.transforms.outbox.EventRouter"
}
Benefits over outbox poller:
- Sub-millisecond latency (streaming vs polling interval).
- No DB polling overhead.
- Captures ALL changes (not just INSERT — also UPDATE/DELETE).
- One Debezium connector per DB instead of one poller per service.
Trade-off: Adds Debezium + Kafka Connect infrastructure complexity.
Q23 — How do you implement pagination in PostgreSQL efficiently? junior
Answer: Offset-based pagination (simple but slow for deep pages):
SELECT * FROM orders
WHERE customer_id = ?
ORDER BY created_at DESC
LIMIT 20 OFFSET 200; -- skips 200 rows before returning 20
Problem: PostgreSQL reads and discards the first 200 rows — slow for large offsets.
Cursor-based (keyset) pagination (efficient for all pages):
-- First page
SELECT * FROM orders WHERE customer_id = ?
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Next page (using last row's values as cursor)
SELECT * FROM orders
WHERE customer_id = ?
AND (created_at, id) < (:lastCreatedAt, :lastId) -- keyset condition
ORDER BY created_at DESC, id DESC
LIMIT 20;
Benefits: Constant performance regardless of page depth. Uses the index efficiently.
Spring Data (offset):
Page<Order> findByCustomerId(UUID customerId, Pageable pageable);
// Usage: PageRequest.of(0, 20, Sort.by("createdAt").descending())
Spring Data (keyset — requires Blaze-Persistence or custom query):
@Query("SELECT o FROM Order o WHERE o.customerId = :id AND o.createdAt < :cursor ORDER BY o.createdAt DESC")
List<Order> findNextPage(@Param("id") UUID id, @Param("cursor") Instant cursor, Pageable pageable);
Q24 — What is pg_stat_activity and how do you use it to monitor running queries? junior
Answer:
pg_stat_activity is a system view showing currently active database sessions and their current query.
-- All active queries sorted by duration
SELECT pid, usename, application_name, state,
now() - query_start AS duration,
left(query, 100) AS query_snippet
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- Find long-running queries (> 5 seconds)
SELECT pid, now() - query_start AS duration, query
FROM pg_stat_activity
WHERE state = 'active'
AND query_start < now() - interval '5 seconds';
-- Kill a stuck query
SELECT pg_terminate_backend(pid) FROM pg_stat_activity
WHERE query LIKE '%outbox_events%' AND state = 'active' AND pid != pg_backend_pid();
Key columns:
state:active,idle,idle in transaction(potential lock holder),disabled.wait_event_type/wait_event: What is the backend waiting for (Lock, IO, Client).query: Currently executing SQL.application_name: Set via JDBC URL (applicationName=order-service) to identify which service.
Q25 — What is EXPLAIN vs EXPLAIN ANALYZE and when to use each? junior
Answer:
EXPLAIN |
EXPLAIN ANALYZE |
|
|---|---|---|
| Executes query | No (estimate only) | Yes (runs the query!) |
| Shows actual rows | No | Yes |
| Shows actual time | No | Yes |
| Safe on production | Yes | Careful — runs the query! |
EXPLAIN: Shows what the query planner thinks it will do. Useful for checking if the right index will be used without running an expensive query.
EXPLAIN ANALYZE: Actually executes the query and compares planned vs actual. Use when EXPLAIN shows the planner is wrong.
Caution: Never run EXPLAIN ANALYZE on a destructive query (UPDATE, DELETE) in production — it executes! Wrap in a transaction and rollback:
BEGIN;
EXPLAIN ANALYZE UPDATE orders SET status = 'CANCELLED' WHERE created_at < '2020-01-01';
ROLLBACK;
EXPLAIN (ANALYZE, BUFFERS): Also shows cache hits/misses — useful for I/O optimization:
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT)
SELECT * FROM orders WHERE customer_id = '...' ORDER BY created_at DESC;
Q26 — How do you implement optimistic locking with PostgreSQL and JPA? senior
Answer: Optimistic locking assumes conflicts are rare. Instead of locking rows on read, it checks that the row hasn't changed when writing. Uses a version counter.
@Entity
public class Product {
@Id private UUID id;
private int stockQuantity;
@Version // JPA version field
private Long version;
}
How it works:
- Read:
SELECT * FROM products WHERE id = ?→ returns{stockQuantity: 10, version: 5}. - Update:
UPDATE products SET stock_quantity = 9, version = 6 WHERE id = ? AND version = 5. - If another transaction updated version to 6 between step 1 and 2:
UPDATEaffects 0 rows → JPA throwsOptimisticLockException. - Caller retries with fresh data.
@Retryable(retryFor = OptimisticLockException.class, maxAttempts = 3)
@Transactional
public void deductStock(UUID productId, int quantity) {
Product product = productRepo.findById(productId).orElseThrow();
if (product.getStockQuantity() < quantity) throw new InsufficientStockException();
product.setStockQuantity(product.getStockQuantity() - quantity);
productRepo.save(product); // fails on version conflict → triggers retry
}
Q27 — What is the difference between TRUNCATE and DELETE? junior
Answer:
DELETE |
TRUNCATE |
|
|---|---|---|
| Removes rows | One by one (MVCC) | All at once (drops and recreates storage) |
| WAL logging | Every row logged | Minimal logging |
| Triggers | Fires row-level triggers | Does NOT fire row-level triggers |
WHERE clause |
Yes — partial delete | No — removes ALL rows |
| Transaction safe | Yes — fully transactional | Yes — transactional in PostgreSQL |
| Speed | Slow on large tables | Very fast |
| Resets sequences | No | Optional: TRUNCATE ... RESTART IDENTITY |
| Foreign keys | Respects FK constraints | Respects FK (must cascade or have no children) |
-- Delete specific rows
DELETE FROM outbox_events WHERE status = 'PUBLISHED' AND created_at < now() - interval '7 days';
-- Remove all rows quickly (e.g., for test cleanup)
TRUNCATE outbox_events RESTART IDENTITY;
This system: Test setup uses @Transactional rollback (Spring Test) or Testcontainers with fresh containers. TRUNCATE is used in @BeforeEach for integration tests that need clean state.
Q28 — What is connection string security and how do you protect DB credentials? senior
Answer: Database connection strings contain passwords — exposing them in source code or logs is a critical security vulnerability (OWASP A02: Cryptographic Failures, A05: Security Misconfiguration).
Anti-patterns (never do):
# WRONG — hardcoded in source code
spring.datasource.password: MyPassword123
Secure approaches:
1. Environment variables:
spring:
datasource:
url: jdbc:postgresql://${DB_HOST}:5432/orderdb
username: ${DB_USERNAME}
password: ${DB_PASSWORD} # injected from environment
2. Docker secrets (Docker Swarm/Kubernetes secrets):
# Kubernetes Secret
kubectl create secret generic db-secret --from-literal=password=secret123
# Pod spec mounts secret as env var
env:
- name: DB_PASSWORD
valueFrom:
secretKeyRef:
name: db-secret
key: password
3. Vault / AWS Secrets Manager: Rotate credentials without redeployment. Spring Cloud Vault or AWS Parameter Store integration.
This system: Docker Compose uses environment variables loaded from .env file (git-ignored). Production would use Kubernetes secrets or Vault.
Q29 — How do you perform a database backup and restore in PostgreSQL? junior
Answer:
pg_dump — logical backup (SQL format):
# Backup single database
pg_dump -h localhost -U postgres -d orderdb -F c -f orderdb.dump
# Restore
pg_restore -h localhost -U postgres -d orderdb_restored orderdb.dump
pg_dumpall — backup all databases + global objects (roles):
pg_dumpall -h localhost -U postgres > full_backup.sql
Formats:
-F p(plain): SQL text — portable, human-readable, no parallelism.-F c(custom): Compressed, supports parallel restore, recommended.-F d(directory): One file per table, supportspg_restore -j 4parallel restore.
Continuous backup with WAL archiving:
base backup (pg_basebackup) + WAL archive → point-in-time recovery
This system: Docker Compose data persists in named volumes. For production: automated nightly pg_dump to S3, with WAL archiving for PITR.
Q30 — How do you design the database schema for the outbox pattern? senior
Answer: The outbox table must support: transactional writes with business data, polling for unpublished events, marking events as published, and DLT for permanently failed events.
CREATE TABLE outbox_events (
id UUID DEFAULT gen_random_uuid() PRIMARY KEY,
aggregate_id UUID NOT NULL, -- the entity ID (order_id, payment_id)
aggregate_type VARCHAR(100) NOT NULL, -- 'Order', 'Payment'
event_type VARCHAR(200) NOT NULL, -- 'OrderCreatedEvent'
topic VARCHAR(200) NOT NULL, -- 'order.created'
payload JSONB NOT NULL, -- serialized event (Avro bytes or JSON)
headers JSONB, -- Kafka headers (traceId, correlationId)
status VARCHAR(20) NOT NULL DEFAULT 'PENDING', -- PENDING, PUBLISHED, FAILED
attempts INT NOT NULL DEFAULT 0,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
published_at TIMESTAMPTZ,
last_error TEXT
);
-- Index for efficient polling (poller queries pending events by created_at)
CREATE INDEX idx_outbox_pending ON outbox_events(created_at)
WHERE status = 'PENDING';
-- Index for aggregate lookups (debugging)
CREATE INDEX idx_outbox_aggregate ON outbox_events(aggregate_id, aggregate_type);
Poller query with SKIP LOCKED (safe parallel poller):
SELECT * FROM outbox_events
WHERE status = 'PENDING' AND attempts < 5
ORDER BY created_at ASC
LIMIT 100
FOR UPDATE SKIP LOCKED;
This design supports multiple poller instances running in parallel without duplicate publishing.