diar_streaming_sortformer_4spk-v2.1 / explainability.md

Field | Response
:---|:---
Intended Task/Domain: | Speaker Diarization (Speaker Tagging in Speech Recognition)
Model Type: | FastConformer Encoder, Transformer Encoder, and RNNT Decoder
Intended Users: | People working with conversational AI models that transcribe speech to text for multiple users.
Output: | Text with speaker tags
Describe how the model works: | The model incorporates a novel mechanism, the Arrival-Order Speaker Cache (AOSC). This cache management technique dynamically adjusts each speaker’s cache size, prioritizing the speech frames most valuable to cache. The model is fine-tuned with increased weighting on far-field datasets to perform better on meeting-style speech.
Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of: | Not Applicable
Technical Limitations & Mitigation: | This model can detect up to four speakers; performance degrades in recordings with five or more speakers. The model was trained on publicly available English speech datasets and is therefore not suitable for non-English audio. Performance may also degrade on out-of-domain data, such as recordings made in noisy conditions.
Verified to have met prescribed NVIDIA quality standards: | Yes
Performance Metrics: | Concatenated minimum-permutation word error rate (cpWER) and time-constrained minimum-permutation word error rate (tcpWER)
Potential Known Risks: | Transcripts may not be 100% accurate in instances with background noise. Punctuation/capitalization may not be 100% accurate.
Licensing: | GOVERNING TERMS: Use of this model is governed by the NVIDIA Open Model License Agreement (found [here](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)).
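
To make the cpWER metric named in the table concrete, here is a minimal sketch of how it is commonly computed: each speaker's utterances are concatenated into one word stream, every assignment of hypothesis speakers to reference speakers is tried, and the assignment with the fewest word errors is scored against the total number of reference words. The function names (`edit_distance`, `cpwer`) and the toy transcripts are hypothetical and are not taken from this model's evaluation code. The sketch uses brute-force permutations, which is only practical for a handful of speakers, and it does not implement the temporal constraint that distinguishes tcpWER.

```python
from itertools import permutations


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference word
                curr[j - 1] + 1,         # insertion of a hypothesis word
                prev[j - 1] + (r != h),  # substitution, or match if equal
            ))
        prev = curr
    return prev[-1]


def cpwer(ref_by_speaker, hyp_by_speaker):
    """Illustrative cpWER: minimum over speaker permutations of the summed
    word errors, divided by the total number of reference words."""
    # Concatenate each speaker's utterances into a single word stream.
    refs = [" ".join(utts).split() for utts in ref_by_speaker.values()]
    hyps = [" ".join(utts).split() for utts in hyp_by_speaker.values()]

    # Pad with empty streams so both sides have the same number of speakers.
    n = max(len(refs), len(hyps))
    refs += [[] for _ in range(n - len(refs))]
    hyps += [[] for _ in range(n - len(hyps))]

    total_ref_words = sum(len(r) for r in refs)
    if total_ref_words == 0:
        raise ValueError("reference contains no words")

    # Brute-force search over speaker assignments (fine for up to ~4 speakers).
    best_errors = min(
        sum(edit_distance(r, h) for r, h in zip(refs, perm))
        for perm in permutations(hyps)
    )
    return best_errors / total_ref_words


# Toy example: speaker labels differ between reference and hypothesis.
reference = {"A": ["hello there", "how are you"], "B": ["fine thanks"]}
hypothesis_perfect = {"spk0": ["fine thanks"], "spk1": ["hello there", "how are you"]}
hypothesis_errors = {"spk0": ["fine thanks you"], "spk1": ["hello there how are"]}

print(cpwer(reference, hypothesis_perfect))
print(cpwer(reference, hypothesis_errors))
```

In the second hypothesis the word "you" is attributed to the wrong speaker, so it counts once as a deletion and once as an insertion. This is how cpWER penalizes speaker-tagging mistakes even when every word is recognized correctly.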