Messages Operations Guide
Practical operational guidance for monitoring messaging system health, debugging common issues, and handling production incidents.
System Health Monitoring
Key Metrics to Track
Message read rate (messages sent and read per day):
SELECT
DATE(created_at) as date,
COUNT(*) as messages_sent,
COUNT(CASE WHEN is_read = true THEN 1 END) as messages_read,
ROUND(COUNT(CASE WHEN is_read = true THEN 1 END)::numeric / COUNT(*)::numeric * 100, 2) as read_rate
FROM messages
WHERE created_at >= NOW() - INTERVAL '7 days'
GROUP BY DATE(created_at)
ORDER BY date DESC;
Reminder effectiveness:
SELECT
recipient_type,
level,
COUNT(*) as total_reminders,
COUNT(CASE WHEN sent_at IS NOT NULL THEN 1 END) as sent,
COUNT(CASE WHEN cancelled_at IS NOT NULL THEN 1 END) as cancelled,
ROUND(COUNT(CASE WHEN cancelled_at IS NOT NULL THEN 1 END)::numeric / COUNT(*)::numeric * 100, 2) as cancel_rate
FROM message_thread_reminders
WHERE created_at >= NOW() - INTERVAL '30 days'
GROUP BY recipient_type, level;
AI service health:
# Check recent failed AI analysis jobs
docker-compose exec -T wedissimo-api php artisan queue:failed | grep CheckMessageNeedsResponse
Queue depth:
# Monitor queue status
docker-compose exec -T wedissimo-api php artisan queue:monitor default --max=100
Alerting Thresholds
Critical alerts:
- Queue depth > 500 for more than 10 minutes
- AI service failure rate > 5% over 1 hour
- Message delivery failure rate > 2%
Warning alerts:
- Average message response time > 48 hours
- Unread message count > 1000 per vendor
- Queue processing lag > 15 minutes
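The thresholds above can be expressed as a single check function, which makes them easy to wire into cron or an existing monitor. This is a sketch under stated assumptions: check_alerts and its argument order are hypothetical, and it expects the metrics to have been collected already (for example from the SQL queries and queue commands earlier in this guide).

```shell
#!/bin/sh
# Sketch: evaluate the alerting thresholds against pre-collected metrics.
# ASSUMPTION: check_alerts and its argument order are hypothetical; metric
# collection happens elsewhere.

# gt A B: succeeds when A > B (awk handles decimal values)
gt() {
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a > b) }'
}

# check_alerts QUEUE_HIGH_MIN AI_FAIL_PCT DELIVERY_FAIL_PCT AVG_RESPONSE_H MAX_UNREAD LAG_MIN
#   QUEUE_HIGH_MIN    minutes the queue depth has stayed above 500
#   AI_FAIL_PCT       AI job failure rate over the last hour, in percent
#   DELIVERY_FAIL_PCT message delivery failure rate, in percent
#   AVG_RESPONSE_H    average message response time, in hours
#   MAX_UNREAD        highest unread message count for any single vendor
#   LAG_MIN           queue processing lag, in minutes
check_alerts() {
  gt "$1" 10   && echo "CRITICAL queue depth > 500 for $1 minutes"
  gt "$2" 5    && echo "CRITICAL AI failure rate $2%"
  gt "$3" 2    && echo "CRITICAL delivery failure rate $3%"
  gt "$4" 48   && echo "WARNING average response time $4h"
  gt "$5" 1000 && echo "WARNING $5 unread messages for one vendor"
  gt "$6" 15   && echo "WARNING queue lag $6 minutes"
  return 0
}
```

Values exactly at a threshold do not alert, which matches the strict "greater than" wording of the list.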
Common Issues & Solutions
Issue: Reminders Not Being Sent
Symptoms:
- Users report not receiving reminder emails
- Database shows reminders scheduled but not sent
- The sent_at column remains NULL after scheduled_for has passed
Diagnostic steps:
- Check that the reminder command is registered with the scheduler:
docker-compose exec -T wedissimo-api php artisan schedule:list | grep messages:send-reminders
- Verify that the command is executing:
docker-compose logs -f wedissimo-api | grep "messages:send-reminders"
- Check for overdue pending reminders:
SELECT COUNT(*), recipient_type, level
FROM message_thread_reminders
WHERE scheduled_for <= NOW()
AND sent_at IS NULL
AND cancelled_at IS NULL
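GROUP BY recipient_type, level;
Solutions:
- Ensure the Laravel scheduler cron entry is running (it should invoke php artisan schedule:run every minute)
- Restart queue workers:
docker-compose restart wedissimo-api
- Check the mail configuration, then rebuild the config cache so changes take effect:
docker-compose exec -T wedissimo-api php artisan config:cache
- Run the command manually:
docker-compose exec -T wedissimo-api php artisan messages:send-reminders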
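The overdue-reminder query in the diagnostic steps treats a reminder as due when scheduled_for has passed and neither sent_at nor cancelled_at is set. The sketch below applies the same predicate to rows exported as CSV, which can help when comparing a database snapshot with what the command actually sent. The CSV layout, the epoch-second timestamps and the due_reminders name are assumptions for illustration.

```shell
#!/bin/sh
# Sketch of the due-reminder predicate:
#   scheduled_for <= NOW() AND sent_at IS NULL AND cancelled_at IS NULL
# ASSUMPTION: CSV rows on stdin with columns id,scheduled_for,sent_at,cancelled_at,
# timestamps as epoch seconds, and an empty field standing for NULL.

# due_reminders NOW: print the id of every reminder that is due at NOW
due_reminders() {
  awk -F, -v now="$1" '$2 <= now && $3 == "" && $4 == "" { print $1 }'
}
```

For example, piping an export through due_reminders "$(date +%s)" lists the reminders that should be candidates for the next run of messages:send-reminders.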
Issue: AI Service Timeouts
Symptoms:
- Slow message creation
- Failed jobs in queue
- Vertex AI timeout errors in logs
Diagnostic steps:
- Check recent failures:
docker-compose exec -T wedissimo-api php artisan queue:failed | grep CheckMessageNeedsResponseJob
- Test the AI service directly:
docker-compose exec -T wedissimo-api php artisan tinker
$ai = app(\Modules\Messages\Services\VertexAiScanningService::class);
$result = $ai->scanForNeedsResponse("Can you help me with my wedding?");
dump($result); // dump() prints the result without exiting tinker, unlike dd()
Solutions:
- Check the AI service configuration in modules/Messages/Config/config.php
- The job has built-in retry with backoff (60s, 120s, 300s)
- If every retry fails, the job falls back to scheduling reminders (fail-safe)
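Assuming the job declares these delays as a Laravel backoff array, the first retry waits 60s, the second 120s, and the last value (300s) is reused for any further retries. The sketch below models that rule to estimate how long a message can wait before the failed() fallback schedules its reminders. The attempt count is an assumption; check the job's $tries setting for the real value.

```shell
#!/bin/sh
# Sketch of array-backoff semantics for the (60s, 120s, 300s) schedule:
# retries past the end of the list reuse the last value.
# ASSUMPTION: the attempt count is supplied by the caller, not read from the job.

# retry_delay N: seconds to wait before retry N (1-based)
retry_delay() {
  case "$1" in
    1) echo 60 ;;
    2) echo 120 ;;
    *) echo 300 ;;   # third retry and beyond reuse the last value
  esac
}

# total_backoff TRIES: seconds spent waiting between attempts when all TRIES
# attempts fail, i.e. before the job's failed() method runs
total_backoff() {
  local tries="$1" n=1 total=0
  while [ "$n" -lt "$tries" ]; do
    total=$((total + $(retry_delay "$n")))
    n=$((n + 1))
  done
  echo "$total"
}
```

For instance, if the job makes four attempts (three retries), total_backoff 4 gives 480 seconds of waiting, plus each attempt's own runtime, before the fallback runs.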
Issue: Duplicate Reminders
Symptoms:
- Users receiving multiple reminder emails
- Database shows duplicate reminder records
Diagnostic steps:
- Check for duplicate records:
SELECT thread_id, trigger_message_id, level, COUNT(*)
FROM message_thread_reminders
GROUP BY thread_id, trigger_message_id, level
HAVING COUNT(*) > 1;
- Check that the unique constraint exists:
SELECT indexname FROM pg_indexes
WHERE tablename = 'message_thread_reminders'
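AND indexname LIKE '%unique%';
Solutions:
- The unique_reminder constraint prevents duplicate records; if the index check above returns no rows, the constraint is missing and needs to be restored
- MessageReminderService::sendReminder() uses pessimistic locking
- Review whether jobs are being dispatched multiple times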
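The reason a unique constraint makes repeated dispatches harmless can be shown with a small model: the first insert for a key succeeds and any repeat is rejected. The key columns are an assumption taken from the duplicate-check query (thread_id, trigger_message_id, level). schedule_reminder is hypothetical and only mimics the constraint, not the service code.

```shell
#!/bin/sh
# Model of an insert guarded by a unique key.
# ASSUMPTION: unique_reminder covers (thread_id, trigger_message_id, level),
# the columns the duplicate-check query groups by.
SEEN_KEYS=""

# schedule_reminder THREAD TRIGGER LEVEL
# Returns 0 and records the key the first time it is seen; returns 1 for a
# repeat, the way the unique constraint would reject a second INSERT.
schedule_reminder() {
  local key="$1|$2|$3"
  if printf '%s\n' "$SEEN_KEYS" | grep -Fqx -- "$key"; then
    return 1
  fi
  SEEN_KEYS="$SEEN_KEYS
$key"
  return 0
}
```

Call schedule_reminder directly rather than inside $(...). Command substitution runs it in a subshell, so the recorded key would be lost. In PostgreSQL, the database-side counterpart is an INSERT that violates the constraint, or one written with ON CONFLICT DO NOTHING.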
Issue: Reminders Not Cancelled When User Responds
Symptoms:
- User responds but still receives reminders
- cancelled_at is not set after the response
Diagnostic steps:
- Check event listener is registered:
docker-compose exec -T wedissimo-api php artisan tinker
app()->make('events')->getListeners(\Modules\Messages\Events\ParticipantRespondedToThread::class)
- Check whether the event was fired:
docker-compose logs -f wedissimo-api | grep "ParticipantRespondedToThread"
- Check pending reminders for the user:
SELECT * FROM message_thread_reminders
WHERE recipient_id = 'user-uuid'
AND sent_at IS NULL
AND cancelled_at IS NULL;
Solutions:
- Verify the CancelRemindersOnResponse listener is registered in MessagesServiceProvider
- Check that MessageService::sendMessage() fires the event
- Manually cancel pending reminders with:
MessageReminderService::cancelPendingRemindersForRecipient()
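The listener's expected effect can be stated as a data rule: when a recipient responds, each of their reminders with neither sent_at nor cancelled_at set gets cancelled_at stamped, and every other row is left alone. The sketch applies that rule to exported rows so the result can be compared with what the database shows after a response. The CSV layout and cancel_pending are hypothetical, and this does not reproduce cancelPendingRemindersForRecipient() itself.

```shell
#!/bin/sh
# Sketch of the cancel-on-response rule applied to exported reminder rows.
# ASSUMPTION: CSV on stdin with columns id,recipient_id,sent_at,cancelled_at,
# where an empty field stands for NULL.

# cancel_pending RECIPIENT NOW: stamp cancelled_at=NOW on the recipient's
# pending reminders and pass every other row through unchanged
cancel_pending() {
  awk -F, -v OFS=, -v r="$1" -v now="$2" '
    $2 == r && $3 == "" && $4 == "" { $4 = now }
    { print }
  '
}
```

Rows that were already sent or belong to another recipient come through untouched, which is the behavior to look for when checking the listener.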
Performance Optimization
Slow Reminder Queries
Problem: Reminder scheduler running slowly.
Solution - Verify the partial index exists:
-- Check the partial index exists
SELECT indexname, indexdef FROM pg_indexes
WHERE tablename = 'message_thread_reminders';
-- Should see reminders_due_idx with a WHERE clause
Solution - Check the query plan:
EXPLAIN ANALYZE
SELECT * FROM message_thread_reminders
WHERE scheduled_for <= NOW()
AND sent_at IS NULL
AND cancelled_at IS NULL;
Queue Backlogs
Problem: Jobs are queued faster than workers can process them.
Solution - Scale workers:
# Run additional worker
docker-compose exec -T wedissimo-api php artisan queue:work --queue=default --tries=3
Solution - Check job throughput:
# Monitor job processing
docker-compose exec -T wedissimo-api php artisan queue:monitor default
Data Maintenance
Cleaning Up Old Reminders
Remove sent reminders after 6 months:
// Via tinker
MessageThreadReminder::query()
->whereNotNull('sent_at')
->where('sent_at', '<', now()->subMonths(6))
->delete();
Cancel stale pending reminders:
// Cancel reminders for messages older than 30 days
MessageThreadReminder::query()
->whereNull('sent_at')
->whereNull('cancelled_at')
->whereHas('triggerMessage', fn($q) => $q->where('created_at', '<', now()->subDays(30)))
->update([
'cancelled_at' => now(),
'cancellation_reason' => 'stale',
]);
Cleaning Up Old Messages
Soft-deleted message cleanup (after 90 days):
Message::onlyTrashed()
->where('deleted_at', '<', now()->subDays(90))
->forceDelete();
Incident Response
Message Delivery Failure
Immediate actions:
- Check mail service status (Mailpit/SES)
- Verify queue workers are running
- Review recent failed jobs
- Check notification channel configuration
Recovery steps:
- Retry failed jobs:
docker-compose exec -T wedissimo-api php artisan queue:retry all
- Restart queue workers:
docker-compose restart wedissimo-api
- Verify delivery with a test message
- Monitor the queue for 15 minutes
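During the 15-minute watch, the useful question is whether the queue is draining, not what its depth is at any single moment. The sketch below compares the first and last of a series of depth samples taken at regular intervals. queue_trend, the sampling approach and the three labels are assumptions, not part of the existing tooling.

```shell
#!/bin/sh
# Sketch: judge queue recovery from depth samples, oldest first.
# ASSUMPTION: samples are collected separately, e.g. once a minute during
# the 15-minute watch.

# queue_trend DEPTH...: prints "empty", "draining" or "not draining"
queue_trend() {
  local first="$1" last=""
  for last in "$@"; do :; done   # leaves $last set to the final sample
  if [ "$last" -eq 0 ]; then
    echo "empty"
  elif [ "$last" -lt "$first" ]; then
    echo "draining"
  else
    echo "not draining"
  fi
}
```

If the result is still "not draining" after workers have been restarted, it is worth going back through the immediate actions above.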
AI Service Outage
Immediate actions:
- Jobs will fail and retry automatically
- After 3 retries, the job's failed() method schedules reminders as a fallback
- Monitor the failed job count
Check fallback behavior:
// The job's failed() method schedules reminders even if AI fails
// This is the safe default: better to remind than to miss a booking
Recovery:
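- Test that the AI service has recovered, using the tinker check from the AI Service Timeouts diagnostics
- Retry failed jobs:
docker-compose exec -T wedissimo-api php artisan queue:retry all
- Monitor the AI success rate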
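The fail-safe described above appears to reduce to one rule. Only a successful analysis that says no response is needed suppresses reminders. A needs-response result, a failed analysis, or anything unexpected leads to reminders being scheduled. The outcome labels below are hypothetical stand-ins for whatever VertexAiScanningService and the job actually use.

```shell
#!/bin/sh
# Sketch of the fail-safe rule: skip reminders only when the analysis
# positively says no response is needed.
# ASSUMPTION: the outcome labels are hypothetical.

# reminder_decision OUTCOME: OUTCOME is needs_response, no_response,
# failed (all retries exhausted) or anything unexpected
reminder_decision() {
  case "$1" in
    no_response) echo "skip" ;;
    *)           echo "schedule" ;;   # needs_response, failed, or unknown
  esac
}
```

Defaulting every unrecognised outcome to "schedule" keeps the bias toward reminding that the comments above describe.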
Runbook Checklist
Daily checks:
- [ ] Queue depth < 100
- [ ] No failed AI jobs in the last 24 hours
- [ ] Scheduler running (check schedule:list)
Weekly checks:
- [ ] Review reminder cancellation rates (40-60% is healthy: it means users are responding)
- [ ] Check for message spam patterns
- [ ] Review slow query logs
Monthly checks:
- [ ] Clean up old sent reminders
- [ ] Archive messages > 2 years old
- [ ] Review reminder effectiveness metrics
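The weekly 40-60% guideline applies to the cancel_rate column produced by the reminder-effectiveness query at the top of this guide. The sketch below recomputes that value (cancelled reminders as a percentage of all reminders, rounded to two decimals) and places it relative to the band. The low/high labels and the zero-total guard are additions for the example, not documented behavior.

```shell
#!/bin/sh
# Sketch: compute cancel_rate the way the reminder-effectiveness query does
# and compare it with the 40-60% healthy band from the weekly checklist.
# ASSUMPTION: the low/high labels and the zero-total guard are illustrative.

# cancel_rate CANCELLED TOTAL: percentage rounded to 2 decimals
cancel_rate() {
  awk -v c="$1" -v t="$2" 'BEGIN { if (t == 0) print "0.00"; else printf "%.2f\n", c * 100 / t }'
}

# cancel_band CANCELLED TOTAL: prints "low", "healthy" or "high"
cancel_band() {
  awk -v c="$1" -v t="$2" 'BEGIN {
    r = (t == 0) ? 0 : c * 100 / t
    if (r < 40) { print "low" } else if (r > 60) { print "high" } else { print "healthy" }
  }'
}
```

Rounding of exact halves can differ slightly from PostgreSQL's ROUND(), so compare the band rather than the last decimal place.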
Key Commands
# Send pending reminders manually
docker-compose exec -T wedissimo-api php artisan messages:send-reminders
# Check scheduler
docker-compose exec -T wedissimo-api php artisan schedule:list
# Check failed jobs
docker-compose exec -T wedissimo-api php artisan queue:failed
# Retry failed jobs
docker-compose exec -T wedissimo-api php artisan queue:retry all
# Clear config cache
docker-compose exec -T wedissimo-api php artisan config:cache
Support Resources
Laravel Queue Documentation: https://laravel.com/docs/queues
Internal Contacts:
- Platform Team: #platform-support
- AI Integration: #ai-engineering