from malayalam_asr_benchmarking.commonvoice import evaluate_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_whisper_model_common_voice("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
malayalam_asr_benchmarking
Objective of the project
A study to benchmark automatic speech recognition (ASR) models for Malayalam. So far, the project has benchmarked Malayalam ASR models based on Whisper and faster-whisper.
Benchmarked Datasets
So far, benchmarking has mainly covered two datasets:
- Common Voice 11 Dataset
I have benchmarked on the Malayalam subset of Mozilla’s Common Voice 11. The benchmarking results can be found in the dataset below.
- Malayalam Speech Corpus
I have benchmarked on SMC’s Malayalam Speech Corpus dataset. The benchmarking results can be found in the dataset below.
Install
pip install malayalam_asr_benchmarking
Or from the GitHub repository
# Ensure git is installed; if not, install it first (e.g. on Ubuntu: apt install git)
pip install git+https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
Or locally
# Ensure git is installed; if not, install it first (e.g. on Ubuntu: apt install git)
git clone https://github.com/kurianbenoy/malayalam_asr_benchmarking.git
cd malayalam_asr_benchmarking
pip install -e .
Setting up your development environment
I am developing this project with nbdev. Please take some time to read up on nbdev (how it works, its directives, and so on) by checking out the walkthroughs and tutorials on the nbdev website.
Step 1: Install Quarto
nbdev_install_quarto
Step 2: Install hooks
nbdev_install_hooks
Step 3: Install our library
pip install -e '.[dev]'
How to use
Evaluate Whisper-based Malayalam ASR models
from malayalam_asr_benchmarking.msc import evaluate_whisper_model_msc

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_whisper_model_msc("parambharat/whisper-tiny-ml", werlist, cerlist, modelsizelist, timelist)
Evaluate faster-whisper based models
from malayalam_asr_benchmarking.commonvoice import evaluate_faster_whisper_model_common_voice

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_faster_whisper_model_common_voice("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
from malayalam_asr_benchmarking.msc import evaluate_faster_whisper_model_msc

werlist = []
cerlist = []
modelsizelist = []
timelist = []
evaluate_faster_whisper_model_msc("kurianbenoy/vegam-whisper-medium-ml", werlist, cerlist, modelsizelist, timelist)
Made by Kurian Benoy. See the code.