Messages Operations Guide

Practical operational guidance for monitoring messaging system health, debugging common issues, and handling production incidents.

System Health Monitoring

Key Metrics to Track

Message volume and read rate:

sql

SELECT
    DATE(created_at) as date,
    COUNT(*) as messages_sent,
    COUNT(CASE WHEN is_read = true THEN 1 END) as messages_read,
    ROUND(COUNT(CASE WHEN is_read = true THEN 1 END)::numeric / COUNT(*)::numeric * 100, 2) as read_rate
FROM messages
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;

Reminder effectiveness:

sql

SELECT
    recipient_type,
    level,
    COUNT(*) as total_reminders,
    COUNT(CASE WHEN sent_at IS NOT NULL THEN 1 END) as sent,
    COUNT(CASE WHEN cancelled_at IS NOT NULL THEN 1 END) as cancelled,
    ROUND(COUNT(CASE WHEN cancelled_at IS NOT NULL THEN 1 END)::numeric / COUNT(*)::numeric * 100, 2) as cancel_rate
FROM message_thread_reminders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY recipient_type, level;

AI service health:

bash

# List failed AI analysis jobs
docker-compose exec -T wedissimo-api php artisan queue:failed | grep CheckMessageNeedsResponse

Queue depth:

bash

# Monitor queue status
docker-compose exec -T wedissimo-api php artisan queue:monitor default --max=100

Alerting Thresholds

Critical alerts:

Queue depth > 500 for more than 10 minutes
AI service failure rate > 5% over 1 hour
Message delivery failure rate > 2%

Warning alerts:

Average message response time > 48 hours
Unread message count > 1000 per vendor
Queue processing lag > 15 minutes
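As a rough illustration of how the thresholds above could be evaluated in a monitoring script, here is a minimal shell sketch. The function names (rate_exceeds, queue_depth_sustained) and the one-reading-per-minute input format are assumptions made for this example; they are not part of the Messages module or of Laravel.

```shell
# Hypothetical helpers for illustration only; names and inputs are assumptions.

# rate_exceeds FAILED TOTAL PERCENT
# Succeeds when FAILED/TOTAL is strictly above PERCENT.
# Uses integer cross-multiplication to avoid division.
rate_exceeds() {
    local failed=$1 total=$2 percent=$3
    [ "$total" -gt 0 ] && [ $((failed * 100)) -gt $((percent * total)) ]
}

# queue_depth_sustained LIMIT MINUTES SAMPLE...
# Samples are assumed to be one queue-depth reading per minute, oldest first.
# Succeeds when more than MINUTES consecutive samples exceed LIMIT.
queue_depth_sustained() {
    local limit=$1 minutes=$2 run=0 sample
    shift 2
    for sample in "$@"; do
        if [ "$sample" -gt "$limit" ]; then
            run=$((run + 1))
            [ "$run" -gt "$minutes" ] && return 0
        else
            run=0
        fi
    done
    return 1
}
```

For example, `rate_exceeds 6 100 5 && echo "CRITICAL: AI failure rate above 5%"` would print the alert, while a rate of exactly 5% would not.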

Common Issues & Solutions

Issue: Reminders Not Being Sent

Symptoms:

Users report not receiving reminder emails
Database shows reminders scheduled but not sent
sent_at column remains NULL past scheduled time

Diagnostic steps:

Check that the command is registered with the scheduler:

bash

docker-compose exec -T wedissimo-api php artisan schedule:list | grep messages:send-reminders

Verify command execution:

bash

docker-compose logs -f wedissimo-api | grep "messages:send-reminders"

Check pending reminders:

sql

SELECT COUNT(*), recipient_type, level
FROM message_thread_reminders
WHERE scheduled_for <= NOW()
  AND sent_at IS NULL
  AND cancelled_at IS NULL
GROUP BY recipient_type, level;

Solutions:

Ensure the Laravel scheduler cron is running
Restart queue workers: docker-compose restart wedissimo-api
Rebuild the config cache so mail configuration changes take effect: php artisan config:cache
Manually run: php artisan messages:send-reminders

Issue: AI Service Timeouts

Symptoms:

Slow message creation
Failed jobs in queue
Vertex AI timeout errors in logs

Diagnostic steps:

Check recent failures:

bash

docker-compose exec -T wedissimo-api php artisan queue:failed | grep CheckMessageNeedsResponseJob

Test AI service directly:

bash

docker-compose exec -T wedissimo-api php artisan tinker

php

$ai = app(\Modules\Messages\Services\VertexAiScanningService::class);
$result = $ai->scanForNeedsResponse("Can you help me with my wedding?");
dd($result);

Solutions:

Check the AI service configuration in modules/Messages/Config/config.php
The job retries automatically with backoff (60s, 120s, 300s)
If the job ultimately fails, it defaults to scheduling reminders (fail-safe)
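To get a feel for how long a message may wait before the fail-safe applies, the backoff delays can be summed. The sketch below assumes the three delays are applied in order, one per retry, and ignores job run time and queue latency; the function name cumulative_backoff is hypothetical.

```shell
# cumulative_backoff DELAY...
# Prints the running total of the given delays (in seconds), one per line.
# Assumption: each retry waits its listed delay after the previous failure.
cumulative_backoff() {
    local total=0 delay
    for delay in "$@"; do
        total=$((total + delay))
        echo "$total"
    done
}

# Delays taken from the backoff list above.
cumulative_backoff 60 120 300
# prints 60, 180 and 480 on separate lines
```

Under those assumptions, the last retry starts about 480 seconds (8 minutes) after the first failure, so reminders scheduled by the fallback can lag by at least that much.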

Issue: Duplicate Reminders

Symptoms:

Users receiving multiple reminder emails
Database shows duplicate reminder records

Diagnostic steps:

Check for duplicate records:

sql

SELECT thread_id, trigger_message_id, level, COUNT(*)
FROM message_thread_reminders
GROUP BY thread_id, trigger_message_id, level
HAVING COUNT(*) > 1;

Check unique constraint exists:

sql

SELECT indexname FROM pg_indexes
WHERE tablename = 'message_thread_reminders'
AND indexname LIKE '%unique%';

Solutions:

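The unique_reminder constraint should prevent duplicate records; if the duplicate check above returns rows, confirm the constraint exists
MessageReminderService::sendReminder() uses pessimistic locking to avoid double sends
Review whether jobs are being dispatched multiple times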

Issue: Reminders Not Cancelled When User Responds

Symptoms:

User responds but still receives reminders
cancelled_at not set after response

Diagnostic steps:

Check event listener is registered:

bash

docker-compose exec -T wedissimo-api php artisan tinker
>>> app()->make('events')->getListeners(\Modules\Messages\Events\ParticipantRespondedToThread::class)

Check if event was fired:

bash

docker-compose logs -f wedissimo-api | grep "ParticipantRespondedToThread"

Check pending reminders for user:

sql

SELECT * FROM message_thread_reminders
WHERE recipient_id = 'user-uuid'
  AND sent_at IS NULL
  AND cancelled_at IS NULL;

Solutions:

Verify the CancelRemindersOnResponse listener is registered in MessagesServiceProvider
Check that MessageService::sendMessage() fires the event
Manually cancel with MessageReminderService::cancelPendingRemindersForRecipient()

Performance Optimization

Slow Reminder Queries

Problem: Reminder scheduler running slowly.

Solution - Verify partial index:

sql

-- Check the partial index exists
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'message_thread_reminders';

-- Should see reminders_due_idx with WHERE clause

Solution - Check query plan:

sql

EXPLAIN ANALYZE
SELECT * FROM message_thread_reminders
WHERE scheduled_for <= NOW()
  AND sent_at IS NULL
  AND cancelled_at IS NULL;

Queue Backlogs

Problem: Jobs are being queued faster than workers can process them.

Solution - Scale workers:

bash

# Run additional worker
docker-compose exec -T wedissimo-api php artisan queue:work --queue=default --tries=3

Solution - Check job throughput:

bash

# Monitor queue size to see whether the backlog is draining
docker-compose exec -T wedissimo-api php artisan queue:monitor default

Data Maintenance

Cleaning Up Old Reminders

Remove sent reminders after 6 months:

php

// Via tinker
MessageThreadReminder::query()
    ->whereNotNull('sent_at')
    ->where('sent_at', '<', now()->subMonths(6))
    ->delete();

Cancel stale pending reminders:

php

// Cancel reminders for messages older than 30 days
MessageThreadReminder::query()
    ->whereNull('sent_at')
    ->whereNull('cancelled_at')
    ->whereHas('triggerMessage', fn($q) => $q->where('created_at', '<', now()->subDays(30)))
    ->update([
        'cancelled_at' => now(),
        'cancellation_reason' => 'stale',
    ]);

Cleaning Up Old Messages

Soft-deleted message cleanup (after 90 days):

php

Message::onlyTrashed()
    ->where('deleted_at', '<', now()->subDays(90))
    ->forceDelete();

Incident Response

Message Delivery Failure

Immediate actions:

Check mail service status (Mailpit/SES)
Verify queue workers are running
Review recent failed jobs
Check notification channel configuration

Recovery steps:

Retry failed jobs: php artisan queue:retry all
Restart queue workers: docker-compose restart wedissimo-api
Verify delivery with test message
Monitor queue for 15 minutes
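The final recovery step can be scripted rather than run by hand. The following is a minimal sketch of a polling helper; the name repeat_for is hypothetical, and the elapsed-time accounting deliberately ignores how long the command itself takes.

```shell
# repeat_for DURATION INTERVAL COMMAND [ARGS...]
# Runs COMMAND every INTERVAL seconds until roughly DURATION seconds have passed.
# Hypothetical helper: elapsed time counts only the sleeps, not command run time.
repeat_for() {
    local duration=$1 interval=$2 elapsed=0
    shift 2
    # Refuse a non-positive interval, which would loop forever.
    [ "$interval" -gt 0 ] || return 2
    while [ "$elapsed" -lt "$duration" ]; do
        "$@"
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
}
```

A 15-minute watch with one reading per minute would then be `repeat_for 900 60 docker-compose exec -T wedissimo-api php artisan queue:monitor default`, using the same monitor command shown elsewhere in this guide.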

AI Service Outage

Immediate actions:

Jobs will fail and retry automatically
After 3 retries, failed() method schedules reminders as fallback
Monitor failed job count

Check fallback behavior:

php

// The job's failed() method schedules reminders even if AI fails
// This is the safe default - better to remind than miss a booking

Recovery:

Test AI service restoration via tinker
Retry failed jobs: php artisan queue:retry all
Monitor AI success rate

Runbook Checklist

Daily checks:

[ ] Queue depth < 100
[ ] No failed AI jobs in last 24 hours
[ ] Scheduler running (check schedule:list)

Weekly checks:

[ ] Review reminder cancellation rates (40-60% is healthy; it means users are responding before reminders go out)
[ ] Check for message spam patterns
[ ] Review slow query logs
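The cancellation-rate check can be automated by feeding the cancelled and total counts from the reminder effectiveness query into a small classifier. This is only a sketch: the function name cancel_rate_status and its output labels are assumptions, and the 40-60% band is treated as inclusive.

```shell
# cancel_rate_status CANCELLED TOTAL
# Prints "healthy" when the cancel rate is within 40-60% (inclusive),
# "low" or "high" outside that band, and "no-data" when TOTAL is zero.
# Hypothetical helper; uses integer cross-multiplication instead of division.
cancel_rate_status() {
    local cancelled=$1 total=$2
    if [ "$total" -le 0 ]; then
        echo "no-data"
    elif [ $((cancelled * 100)) -lt $((total * 40)) ]; then
        echo "low"
    elif [ $((cancelled * 100)) -gt $((total * 60)) ]; then
        echo "high"
    else
        echo "healthy"
    fi
}
```

For instance, `cancel_rate_status 45 100` prints healthy, while `cancel_rate_status 10 100` prints low, which may suggest recipients are not responding before reminders are sent.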

Monthly checks:

[ ] Clean up old sent reminders
[ ] Archive messages > 2 years old
[ ] Review reminder effectiveness metrics

Key Commands

bash

# Send pending reminders manually
docker-compose exec -T wedissimo-api php artisan messages:send-reminders

# Check scheduler
docker-compose exec -T wedissimo-api php artisan schedule:list

# Check failed jobs
docker-compose exec -T wedissimo-api php artisan queue:failed

# Retry failed jobs
docker-compose exec -T wedissimo-api php artisan queue:retry all

# Rebuild config cache
docker-compose exec -T wedissimo-api php artisan config:cache

Support Resources

Laravel Queue Documentation: https://laravel.com/docs/queues

Internal Contacts:

Platform Team: #platform-support
AI Integration: #ai-engineering
