We are using `ringcentral-softphone-ts` as the monitoring softphone in a RingCentral call supervision setup.
Our flow is:
1. Receive telephony session notifications for monitored agents
2. When party status becomes `Answered`, call `POST /restapi/v1.0/account/~/telephony/sessions/{sessionId}/parties/{partyId}/supervise`
3. Wait for the monitoring SIP invite in `softphone.on('invite', ...)`
4. Answer the invite
5. Start consuming `audioPacket` and stream audio to transcription

This works, but we consistently lose the first ~3-4 seconds of the call transcript.

Observed behavior:
- The first transcript segment always starts a few seconds after the actual human conversation begins
- The delay appears to happen before audio packets begin arriving, i.e. somewhere along the path `Answered` -> `/supervise` -> monitoring invite -> audio start
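
To show where along that path the time goes, each stage could be timestamped as it is reached. The following is only a minimal sketch, not our production code: `AttachTimeline`, the stage names, and the injectable clock are hypothetical helpers and not part of `ringcentral-softphone-ts` or the RingCentral API.

```typescript
// Hypothetical stage names, in the order we expect to reach them.
const STAGES = [
  'answered',         // telephony notification with party status Answered
  'superviseSent',    // just before the POST .../supervise request
  'inviteReceived',   // softphone 'invite' event fired
  'inviteAnswered',   // invite answered by the monitoring softphone
  'firstAudioPacket', // first 'audioPacket' event on the call session
] as const;

type Stage = (typeof STAGES)[number];

interface Gap {
  from: Stage;
  to: Stage;
  ms: number;
}

class AttachTimeline {
  private readonly marks = new Map<Stage, number>();
  private readonly now: () => number;

  // The clock is injectable so the helper can be exercised without real delays.
  constructor(now: () => number = () => Date.now()) {
    this.now = now;
  }

  // Record the first time a stage is reached; later marks for the same stage are ignored.
  mark(stage: Stage): void {
    if (!this.marks.has(stage)) {
      this.marks.set(stage, this.now());
    }
  }

  // Elapsed milliseconds between consecutive stages that were actually reached.
  gaps(): Gap[] {
    const out: Gap[] = [];
    let prev: { stage: Stage; at: number } | null = null;
    for (const stage of STAGES) {
      const at = this.marks.get(stage);
      if (at === undefined) continue;
      if (prev !== null) {
        out.push({ from: prev.stage, to: stage, ms: at - prev.at });
      }
      prev = { stage, at };
    }
    return out;
  }
}
```

Calling `mark(...)` at each step of the worker and logging `gaps()` once the first audio packet arrives would show whether most of the delay sits before the `/supervise` call, between `/supervise` and the invite, or between answering the invite and the first packet.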

Questions:
1. Is `Answered` the earliest supported moment to call `/supervise` for call monitoring?
2. Can `/supervise` be called earlier, such as during `Setup` or `Proceeding`, to reduce or eliminate the initial audio gap?
3. Is there any recommended approach to capture audio from the very start of the call when using RingCentral supervision/monitoring?
4. Is the observed 3-4 second delay expected behavior for monitoring leg creation?
5. Are there any best practices to reduce this attach delay?

Relevant details:
- Monitoring extension is correctly configured in the call monitoring group
- SIP monitoring device is reachable and invites are answered successfully
- Once the invite is received, `audioPacket` events flow normally
- The problem is the gap before the monitoring invite/audio starts

Example worker sequence:
- telephony event: `Answered`
- call `/supervise`
- receive monitoring SIP invite
- answer invite
- start receiving `audioPacket`
- transcription begins, but several seconds into the call
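
In code, the sequence is roughly shaped like the sketch below. It is a simplified illustration, not our exact worker: `SoftphoneLike` and `CallSessionLike` are hypothetical interfaces covering only the `invite`, `answer`, and `audioPacket` surface mentioned above (the real library types may differ), and `supervise` stands in for whatever function issues the `/supervise` POST.

```typescript
// Hypothetical minimal shapes; the real softphone and call session types may differ.
interface CallSessionLike {
  on(event: 'audioPacket', listener: (packet: unknown) => void): void;
}

interface SoftphoneLike {
  on(event: 'invite', listener: (invite: unknown) => void): void;
  answer(invite: unknown): Promise<CallSessionLike>;
}

// Stand-in for the REST call to .../sessions/{sessionId}/parties/{partyId}/supervise.
type Supervise = (sessionId: string, partyId: string) => Promise<void>;

// Runs once a telephony notification reports the party as Answered.
async function monitorAnsweredParty(
  softphone: SoftphoneLike,
  supervise: Supervise,
  sessionId: string,
  partyId: string,
  onAudio: (packet: unknown) => void,
): Promise<void> {
  // Register the invite handler before requesting supervision,
  // so the monitoring invite cannot arrive with no listener attached.
  softphone.on('invite', async (invite) => {
    try {
      const session = await softphone.answer(invite);
      // From here on, packets are forwarded to transcription.
      session.on('audioPacket', onAudio);
    } catch (err) {
      console.error('failed to answer monitoring invite', err);
    }
  });

  // Ask the platform to create the monitoring leg for this party.
  await supervise(sessionId, partyId);
}
```

Nothing in this sketch waits unnecessarily on our side, which is why we suspect the gap comes from how late `/supervise` can be called or from how long the monitoring leg takes to set up.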

Any guidance would be appreciated, especially on whether there is a supported way to avoid missing the opening seconds of the conversation.
