Evaluation on the Malayalam Speech Corpus (MSC) dataset

Loading dataset and evaluating model



load_malayalam_speech_corpus_dataset

 load_malayalam_speech_corpus_dataset ()

Evaluating Whisper-based models



evaluate_whisper_model_msc

 evaluate_whisper_model_msc (model_name:str, werlist:List[float],
                             cerlist:List[float], modelsizelist:List[str],
                             timelist:List[float], bs:int=16)
Parameter      Type                Default  Details
model_name     str                          The model name
werlist        typing.List[float]           WER list
cerlist        typing.List[float]           CER list
modelsizelist  typing.List[str]             Model size list
timelist       typing.List[float]           Time (s) list
bs             int                 16       Batch size
Returns        None
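
The `werlist` and `cerlist` arguments collect word error rate (WER) and character error rate (CER), both of which are edit-distance metrics. This page does not show how the function computes them (libraries such as `jiwer` are a common choice), so the following is only a minimal pure-Python sketch of the idea. The helper names `edit_distance`, `corpus_wer`, and `corpus_cer` are hypothetical and not part of this package.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence, hyp: Sequence) -> int:
    """Levenshtein distance between two sequences (words or characters)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Word error rate in percent, pooled over all utterances."""
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    total_words = sum(len(r.split()) for r in refs)
    return 100.0 * errors / total_words


def corpus_cer(refs: List[str], hyps: List[str]) -> float:
    """Character error rate in percent, counted over Unicode code points."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total_chars = sum(len(r) for r in refs)
    return 100.0 * errors / total_chars
```

Note that this sketch counts Unicode code points, so for Malayalam text the CER may differ from a count based on visible characters (grapheme clusters).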

Testing with a sample model

wer_list = []
cer_list = []
model_size_list = []
time_list = []
evaluate_whisper_model_msc("openai/whisper-tiny", wer_list, cer_list, model_size_list, time_list)
evaluate_whisper_model_msc("anuragshas/whisper-large-v2-ml", wer_list, cer_list, model_size_list, time_list, bs=4)

Evaluating faster-whisper based models



evaluate_faster_whisper_model_msc

 evaluate_faster_whisper_model_msc (model_name:str, werlist:List[float],
                                    cerlist:List[float],
                                    modelsizelist:List[str],
                                    timelist:List[float], bs:int=16,
                                    compute_type:str='float16',
                                    beam_size=1)

A utility function for calculating WER and CER on the Malayalam Speech Corpus dataset, given the name of a model on Hugging Face. The WER, CER, model size, and time lists you pass in are updated in place, so you can reuse the same lists to accumulate results across successive evaluations.

Parameter      Type                Default  Details
model_name     str                          The model name
werlist        typing.List[float]           WER list
cerlist        typing.List[float]           CER list
modelsizelist  typing.List[str]             Model size list
timelist       typing.List[float]           Time (s) list
bs             int                 16       Batch size
compute_type   str                 float16  The compute type supported by faster-whisper
beam_size      int                 1        Beam size
Returns        None
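
Both evaluation functions return `None` and take lists as arguments, which suggests that they record results by appending to the caller's lists. The sketch below illustrates that pattern together with batching by `bs` and wall-clock timing. It is not the package's implementation: `evaluate_and_record`, `transcribe_batch`, and `score` are hypothetical names, and the model call is replaced by a callable you supply.

```python
import time
from typing import Callable, List, Tuple


def evaluate_and_record(
    samples: List[Tuple[str, str]],                      # (audio identifier, reference text)
    transcribe_batch: Callable[[List[str]], List[str]],  # hypothetical model wrapper
    score: Callable[[List[str], List[str]], float],      # e.g. a WER function
    scorelist: List[float],
    timelist: List[float],
    bs: int = 16,
) -> None:
    """Transcribe `samples` in batches of `bs` and append the results in place."""
    references = [ref for _, ref in samples]
    predictions: List[str] = []

    start = time.perf_counter()
    for i in range(0, len(samples), bs):
        batch_audio = [audio for audio, _ in samples[i:i + bs]]
        predictions.extend(transcribe_batch(batch_audio))
    elapsed = time.perf_counter() - start

    # Mutate the caller's lists instead of returning values, so repeated
    # calls with the same lists build up one entry per evaluated model.
    scorelist.append(score(references, predictions))
    timelist.append(elapsed)
```

Because the lists are mutated rather than returned, calling the function once per model leaves entry `i` of every list describing the `i`-th model evaluated.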

Testing with a faster-whisper based model

wer_list = []
cer_list = []
model_size_list = []
time_list = []
evaluate_faster_whisper_model_msc("kurianbenoy/vegam-whisper-medium-ml-fp16", wer_list, cer_list, model_size_list, time_list)
wer_list, cer_list, model_size_list, time_list
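
After several evaluations the four lists hold one entry per model, in call order. As a small convenience, they can be lined up in a plain-text table. This is a sketch only: `format_results` is a hypothetical helper, the model names have to be tracked by the caller because the lists do not store them, and the format of the model size strings is an assumption.

```python
from typing import List


def format_results(model_names: List[str], wer_list: List[float],
                   cer_list: List[float], model_size_list: List[str],
                   time_list: List[float]) -> str:
    """Render one row per evaluated model from the accumulated lists."""
    rows = [f"{'model':<12}{'WER':>8}{'CER':>8}{'size':>8}{'time(s)':>10}"]
    for name, wer, cer, size, secs in zip(model_names, wer_list, cer_list,
                                          model_size_list, time_list):
        rows.append(f"{name:<12}{wer:>8.2f}{cer:>8.2f}{size:>8}{secs:>10.1f}")
    return "\n".join(rows)


# Placeholder values for illustration only; these are not measured results.
print(format_results(["model-a", "model-b"], [10.0, 20.0], [1.0, 2.0],
                     ["s1", "s2"], [3.0, 4.0]))
```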

Made by Kurian Benoy. See the code.