1. Key lifecycle
- Use separate keys per environment (`dev`, `staging`, `prod`)
- Never store keys in source code
- Use a secret manager or secure CI variables
- Schedule rotation windows
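One way to keep keys out of source code is to resolve them at startup from variables injected by a secret manager or CI. The sketch below is a minimal illustration; the variable naming scheme `API_KEY_<ENV>` and the function name `load_api_key` are assumptions, not part of any documented API.

```python
import os
from typing import Mapping

# Environments named in the checklist above.
VALID_ENVIRONMENTS = ("dev", "staging", "prod")


def load_api_key(environment: str, environ: Mapping[str, str] = os.environ) -> str:
    """Return the key for one environment, failing loudly if it is missing.

    The variable name API_KEY_<ENV> is a hypothetical convention.
    """
    if environment not in VALID_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment!r}")
    var_name = f"API_KEY_{environment.upper()}"
    key = environ.get(var_name, "")
    if not key:
        # Never fall back to another environment's key.
        raise RuntimeError(f"{var_name} is not set")
    return key
```

Failing at startup when the variable is missing avoids silently using a key that belongs to a different environment.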
2. HTTP client baseline
Every client should enforce:
- Connection timeout
- Read timeout
- Retries only for transient failures
- Request correlation identifiers in logs
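The baseline above can be sketched with the Python standard library alone. This is an illustration under stated assumptions: the `X-Request-ID` header name, the `Bearer` authorization scheme, the default timeout values, and the names `ClientConfig`, `build_headers`, and `get` are all hypothetical. Retry logic is deliberately left out of this sketch.

```python
import http.client
import logging
import uuid
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

logger = logging.getLogger("api_client")


@dataclass(frozen=True)
class ClientConfig:
    host: str                      # hypothetical API host
    connect_timeout: float = 3.0   # seconds allowed to establish the connection
    read_timeout: float = 10.0     # seconds allowed for each socket read


def build_headers(api_key: str, correlation_id: Optional[str] = None) -> Dict[str, str]:
    """Attach auth and a correlation identifier (header names are assumptions)."""
    return {
        "Authorization": f"Bearer {api_key}",
        "X-Request-ID": correlation_id or uuid.uuid4().hex,
    }


def get(config: ClientConfig, path: str, api_key: str) -> Tuple[int, bytes]:
    """Issue one GET with separate connect and read timeouts."""
    headers = build_headers(api_key)
    conn = http.client.HTTPSConnection(config.host, timeout=config.connect_timeout)
    try:
        conn.connect()                             # connect timeout applies here
        conn.sock.settimeout(config.read_timeout)  # read timeout from here on
        conn.request("GET", path, headers=headers)
        response = conn.getresponse()
        body = response.read()
        # Log the correlation identifier so the call can be traced later.
        logger.info("request_id=%s path=%s status=%s",
                    headers["X-Request-ID"], path, response.status)
        return response.status, body
    finally:
        conn.close()
```

Logging the same identifier that is sent in the header lets a client-side log line be matched to the corresponding request on the server side, provided the server records that header.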
3. Error handling policy
- 400/401/403/404: fix payload/config/permissions, do not blind-retry
- 429: back off and retry
- 5xx: bounded retries with exponential backoff
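The policy above can be expressed as two small pure functions. This is a sketch, not a definitive implementation: the attempt limit, base delay, and cap are assumed values, and bounding 429 retries the same way as 5xx retries is a choice made here for simplicity. Statuses not listed in the policy are treated as non-retryable.

```python
from typing import Optional


def should_retry(status: int, attempt: int, max_attempts: int = 4) -> bool:
    """Decide whether a failed call may be retried.

    attempt is the number of retries already made (0 before the first retry).
    """
    if attempt >= max_attempts:
        return False                      # retries are always bounded
    if status == 429:
        return True                       # throttled: back off and retry
    return 500 <= status <= 599           # transient server-side failure


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  retry_after: Optional[float] = None) -> float:
    """Seconds to wait before the next retry.

    Uses exponential backoff (base * 2**attempt), capped at `cap`.
    If the server supplied a Retry-After value in seconds, prefer it.
    """
    if retry_after is not None:
        return min(retry_after, cap)
    return min(cap, base * (2 ** attempt))
```

With the defaults, the waits before successive retries are 0.5, 1, 2, and 4 seconds. Adding random jitter to each delay is a common refinement when many clients may retry at the same moment.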
4. Async chat policy
- Use stable polling interval (for example 2-3s)
- Define maximum wait time
- Implement fallback path for `failed` status
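A polling loop that follows these three rules might look like the sketch below. The `fetch_status` callable and the `"completed"` status name are assumptions; only the `failed` status comes from the policy above. The `sleep` and `clock` parameters exist so the loop can be tested without real waiting.

```python
import time
from typing import Callable


def wait_for_chat(fetch_status: Callable[[], str],
                  interval: float = 2.0,
                  max_wait: float = 60.0,
                  sleep: Callable[[float], None] = time.sleep,
                  clock: Callable[[], float] = time.monotonic) -> str:
    """Poll at a fixed interval until the chat finishes, fails, or times out.

    Returns "completed", "failed", or "timeout".
    """
    deadline = clock() + max_wait
    while True:
        status = fetch_status()
        if status in ("completed", "failed"):
            return status
        # Stop if another full interval would overshoot the maximum wait.
        if clock() + interval > deadline:
            return "timeout"
        sleep(interval)
```

The caller can then branch on the result, for example routing both `"failed"` and `"timeout"` to the fallback path instead of leaving the user waiting.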
5. Session strategy
- Keep deterministic `session_id` strategy per channel/user
- Avoid long-lived idle sessions
- Close session explicitly when business flow ends
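One possible deterministic scheme is to derive `session_id` by hashing the channel and user together, so the same pair always maps to the same session. The sketch below assumes that approach; the `namespace` parameter, the 32-character length, and the 15-minute idle limit are illustrative choices, not documented requirements.

```python
import hashlib
import json


def make_session_id(channel: str, user_id: str, namespace: str = "chat") -> str:
    """Derive a stable session_id from channel and user.

    JSON encoding keeps the three fields unambiguous, so ("a:b", "c")
    and ("a", "b:c") cannot collide.
    """
    raw = json.dumps([namespace, channel, user_id], separators=(",", ":"))
    return hashlib.sha256(raw.encode("utf-8")).hexdigest()[:32]


def is_idle(last_activity: float, now: float, max_idle_seconds: float = 900.0) -> bool:
    """True when a session has been inactive longer than the allowed window."""
    return now - last_activity > max_idle_seconds
```

A session flagged by `is_idle` is a candidate for explicit closing rather than being left open indefinitely.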
6. Document operations
- Validate quality before activation
- Activate only approved content
- Use bulk endpoints for consistent state updates
7. Observability
Track at minimum:
- Success/failure rate by endpoint
- Latency percentiles
- 429 and 5xx trends
- Chat completion time
- Feedback distribution
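As an illustration of the first three metrics, the sketch below keeps per-endpoint samples in memory and computes failure rate, 429 and 5xx counts, and nearest-rank latency percentiles. The class name `EndpointMetrics` is hypothetical, and a real deployment would more likely export these values to a monitoring system than hold them in process. Chat completion time and feedback distribution are not covered here.

```python
import math
from collections import defaultdict
from typing import Dict, List


class EndpointMetrics:
    """In-memory collector for per-endpoint status codes and latencies."""

    def __init__(self) -> None:
        self._latencies: Dict[str, List[float]] = defaultdict(list)
        self._statuses: Dict[str, List[int]] = defaultdict(list)

    def record(self, endpoint: str, status: int, latency_ms: float) -> None:
        self._latencies[endpoint].append(latency_ms)
        self._statuses[endpoint].append(status)

    def failure_rate(self, endpoint: str) -> float:
        """Fraction of calls with status >= 400 (0.0 if nothing recorded)."""
        statuses = self._statuses.get(endpoint, [])
        if not statuses:
            return 0.0
        return sum(1 for s in statuses if s >= 400) / len(statuses)

    def count_429(self, endpoint: str) -> int:
        return sum(1 for s in self._statuses.get(endpoint, []) if s == 429)

    def count_5xx(self, endpoint: str) -> int:
        return sum(1 for s in self._statuses.get(endpoint, []) if 500 <= s <= 599)

    def latency_percentile(self, endpoint: str, p: int) -> float:
        """Nearest-rank percentile: smallest value with at least p% of samples at or below it."""
        values = sorted(self._latencies.get(endpoint, []))
        if not values:
            raise ValueError(f"no samples for {endpoint!r}")
        rank = max(1, math.ceil(p * len(values) / 100))
        return values[rank - 1]
```

Tracking the 429 and 5xx counts over time windows, rather than as lifetime totals, is what turns them into the trends the list above asks for.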
8. Change management
- Maintain integration changelog
- Make staging smoke tests mandatory before production
- Keep docs and code changes in sync
9. Runbook checklist
- Correct key for correct environment
- Correct tenant/site IDs
- Critical documents active
- Sync + async test suite passing
- Monitoring alerts active
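The checklist can also be run as an automated pre-flight step. The sketch below is a generic runner only; the individual checks are supplied by the caller, and the function name `run_checklist` is hypothetical.

```python
from typing import Callable, Dict, List


def run_checklist(checks: Dict[str, Callable[[], bool]]) -> List[str]:
    """Run each named check and return the names of those that failed.

    A check that raises is counted as failed rather than aborting the run,
    so one broken check cannot hide the results of the others.
    """
    failed: List[str] = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False
        if not ok:
            failed.append(name)
    return failed
```

Each checklist item maps to one entry in `checks`; an empty result means the release can proceed.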