
Automatic Speech Recognition


The input parameters below apply to the different attack types. To start working with the APIs, see Audio Speech Recognition.

Audio Speech Recognition is an early-access feature with limited functionality. It is not available as part of the AIShield PyPI package. For early access, please contact [email protected]


Supported ASR Models:

  1. OpenAI-Whisper: All variants in PyTorch and Hugging Face formats are supported. Refer to the OpenAI-Whisper model card for the complete list of available variants. [Ref: OpenAI-Whisper-ModelCard]



Common Parameters

The parameters below are common to the Evasion attack type.

| Parameter | Data type | Description | Remark |
| --- | --- | --- | --- |
| model_Id | String | Model ID received during model registration. Provide this model ID as a query parameter in the URL. | You only need to register a model once to perform model analysis. Registration also lets you track the number of API calls made and their success metrics. |

Request Body (JSON format)

| Parameter | Data type | Description | Remark |
| --- | --- | --- | --- |
| model_framework | String | Framework on which the model was trained. | Currently supported frameworks are: onnx for Evasion and pytorch for data poisoning. |
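As a hedged illustration of how these two pieces fit together (the base URL, endpoint path, and model ID below are hypothetical, not taken from this guide), the registered model ID travels as a URL query parameter while model_framework is part of the JSON request body:

```python
import json
from urllib.parse import urlencode

# Hypothetical endpoint; substitute the actual AIShield analysis URL.
BASE_URL = "https://api.example.com/asr/analysis"

model_id = "abcd1234"  # hypothetical model_Id returned at registration
url = f"{BASE_URL}?{urlencode({'model_id': model_id})}"

# Request body carrying the common parameter described above.
body = json.dumps({"model_framework": "onnx"})

print(url)   # https://api.example.com/asr/analysis?model_id=abcd1234
print(body)  # {"model_framework": "onnx"}
```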



To access all sample artifacts, please visit Artifacts.





File upload format

  • Data: The processed audio data, ready to be passed to the model for prediction, should be saved in a folder.
  • Label: A CSV file should be created with two columns: "audio" and "label". The first column should contain the audio file name, and the second column should contain the label (text). Check the sample label file attached.
  • Model: The model should be saved in ONNX format. This can be ignored when the model is hosted as an API.

Note:

  1. Format: All uploaded files must be in a zipped format.
  2. Dataset: The provided files are sample audio data from the LibriSpeech dataset.
  3. Audio File Properties: Each audio file must not exceed 30 seconds and must have a sampling rate of 16,000 Hz.
  4. Sample Count: Provide between 300 and 500 samples.
  5. Data Sampling Strategy: Ensure that the uploaded samples are representative of the complete dataset, for example by using normally distributed sampling for even coverage.
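A label file in the format described above can be generated with a short script. A minimal sketch using Python's csv module (the file names and transcripts are hypothetical placeholders):

```python
import csv

# Hypothetical audio file names and transcripts; replace with your own data.
samples = [
    ("sample_0001.wav", "hello world"),
    ("sample_0002.wav", "speech recognition test"),
]

with open("labels.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["audio", "label"])  # required column headers
    writer.writerows(samples)
```

Zip the folder of processed audio, this label CSV, and (unless the model is hosted as an API) the ONNX model before uploading.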

Parameters

Request Body (JSON format)

| Parameter | Data type | Description | Remark |
| --- | --- | --- | --- |
| model_api_details | String | API details of the hosted model, provided as an encrypted JSON string. | Provide this only if use_model_api is "yes". |
| use_model_api | String | Use a model API endpoint instead of uploading the model as a zip file. | When this parameter is "yes", you do not need to upload the model as a zip file. Instead, pass the API URL along with the other verification credentials in a JSON file. |
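A hedged sketch of a request body for the hosted-model path. The field names inside model_api_details are illustrative assumptions (the guide does not list them), and the encryption step required by AIShield is not shown:

```python
import json

# Hypothetical contents of model_api_details; the actual field names and the
# required encryption are defined by AIShield, not by this sketch.
api_details = {
    "url": "https://models.example.com/whisper/predict",  # hypothetical
    "auth_token": "<your-token>",
}

request_body = {
    "use_model_api": "yes",
    # Shown unencrypted purely for illustration.
    "model_api_details": json.dumps(api_details),
}

print(json.dumps(request_body, indent=2))
```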

Convert PyTorch model to ONNX format for ASR

Converting the PyTorch model to ONNX produces two models (encoder and decoder) in ONNX format.

  • To load an OpenAI model:
    • load_model: "tiny.en", "base.en", "small.en", "medium.en", "large"
  • To load a Distil-Whisper model:
    • load_model: model.bin or "distil-whisper/distil-large-v3"

Loading the Whisper model



Convert to ONNX Format

First, load a sample audio file to extract the audio signal features:



Prepare the input for the decoder model:



Convert and save the model (encoder and decoder) in ONNX format:



File upload format

  • Reference Data: The processed clean audio data, ready to be passed to the model for prediction, should be saved in a folder.
  • Reference Label: A CSV file corresponding to the clean data should be created with two columns: "audio" and "label". The first column should contain the audio file name, and the second column should contain the label (text). Check the sample label file attached.
  • Model: The model should be saved in ONNX format. This can be ignored when the model is hosted as an API.
  • Universal Data: The audio samples under test (poisoned and clean samples) should be saved in a folder.
  • Universal Label: A CSV file corresponding to the universal data should be created with two columns: "audio" and "label". The first column should contain the audio file name, and the second column should contain the label (text). Check the sample label file attached.



Note:

  1. Format: All uploaded files must be in a zipped format.
  2. Dataset: The provided files are sample audio data from the LibriSpeech dataset.
  3. Audio File Properties: Each audio file must not exceed 30 seconds and must have a sampling rate of 16,000 Hz.
  4. Sample Count: Provide between 500 and 700 samples.
  5. Data Sampling Strategy: Ensure that the uploaded samples are representative of the complete dataset, for example by using normally distributed sampling for even coverage.

Conversion of OpenAI-Whisper models to HuggingFace format
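A hedged sketch of one way to obtain a Hugging Face-format Whisper model: each OpenAI-Whisper variant has an equivalent checkpoint published on the Hugging Face Hub (the name below is an assumption of that mapping), which can be loaded with the transformers package and saved locally. transformers also ships a conversion script for raw OpenAI .pt checkpoints, whose exact invocation varies by version:

```python
# Assumes the transformers package and network access to the Hub.
from transformers import WhisperForConditionalGeneration, WhisperProcessor

hf_name = "openai/whisper-tiny.en"  # Hub counterpart of the "tiny.en" variant

model = WhisperForConditionalGeneration.from_pretrained(hf_name)
processor = WhisperProcessor.from_pretrained(hf_name)

# Save locally in Hugging Face format.
model.save_pretrained("whisper-tiny-en-hf")
processor.save_pretrained("whisper-tiny-en-hf")
```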





Updated 04 Feb 2025