`art.estimators.speech_recognition`

Module containing estimators for speech recognition.

Mixin Base Class Speech Recognizer

class art.estimators.speech_recognition.SpeechRecognizerMixin

Mix-in base class for ART speech recognizers.

Speech Recognizer Deep Speech - PyTorch

class art.estimators.speech_recognition.PyTorchDeepSpeech(model: DeepSpeech | None = None, pretrained_model: str | None = None, filename: str | None = None, url: str | None = None, use_half: bool = False, optimizer: torch.optim.Optimizer | None = None, use_amp: bool = False, opt_level: str = 'O1', decoder_type: str = 'greedy', lm_path: str = '', top_paths: int = 1, alpha: float = 0.0, beta: float = 0.0, cutoff_top_n: int = 40, cutoff_prob: float = 1.0, beam_width: int = 10, lm_workers: int = 4, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, device_type: str = 'gpu', verbose: bool = True)

This class implements a model-specific automatic speech recognizer using the end-to-end speech recognizer DeepSpeech and PyTorch. It supports both version 2 and version 3 of DeepSpeech models as released at https://github.com/SeanNaren/deepspeech.pytorch.

Paper link: https://arxiv.org/abs/1512.02595

__init__(model: DeepSpeech | None = None, pretrained_model: str | None = None, filename: str | None = None, url: str | None = None, use_half: bool = False, optimizer: torch.optim.Optimizer | None = None, use_amp: bool = False, opt_level: str = 'O1', decoder_type: str = 'greedy', lm_path: str = '', top_paths: int = 1, alpha: float = 0.0, beta: float = 0.0, cutoff_top_n: int = 40, cutoff_prob: float = 1.0, beam_width: int = 10, lm_workers: int = 4, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, device_type: str = 'gpu', verbose: bool = True)

Initialize an instance of PyTorchDeepSpeech.

Parameters:

model – DeepSpeech model.
pretrained_model – The choice of pretrained model if a pretrained model is required. Currently, this estimator supports three pretrained models: an4, librispeech, and tedlium.
filename – Name of the file.
url – Download URL.
use_half (bool) – Whether to use FP16 for the pretrained model.
optimizer – The optimizer used to train the estimator.
use_amp (bool) – Whether to use the automatic mixed precision tool to enable mixed precision training or gradient computation, e.g. with loss gradient computation. When set to True, this option is only triggered if there are GPUs available.
opt_level (str) – Specify a pure or mixed precision optimization level. Used when use_amp is True. Accepted values are O0, O1, O2, and O3.
decoder_type (str) – Decoder type, either greedy or beam. Only used for transcription output.
lm_path (str) – Path to an (optional) KenLM language model for use with beam search. Only used for transcription output.
top_paths (int) – Number of beams to be returned. Only used for transcription output.
alpha (float) – Weight of the language model. Only used for transcription output.
beta (float) – Language model word bonus (applied to all words). Only used for transcription output.
cutoff_top_n (int) – Only the cutoff_top_n characters with the highest probabilities in the vocabulary are considered during beam search. Only used for transcription output.
cutoff_prob (float) – Cutoff probability used for pruning during beam search; a value of 1.0 disables this pruning. Only used for transcription output.
beam_width (int) – Width of the beam to be used. Only used for transcription output.
lm_workers (int) – Number of language model processes to use. Only used for transcription output.
clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.
preprocessing_defences – Preprocessing defence(s) to be applied by the estimator.
postprocessing_defences – Postprocessing defence(s) to be applied by the estimator.
preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.
device_type (str) – Type of device to be used for the model and tensors: if cpu, run on the CPU; if gpu, run on the GPU if one is available, otherwise fall back to the CPU.
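The cutoff_top_n and cutoff_prob parameters prune the candidate characters considered at each time step of beam search. The plain-Python sketch below shows one common form of this rule (rank characters by probability, then stop after cutoff_top_n characters or once the cumulative probability reaches cutoff_prob). It illustrates the idea under that assumption; it is not the decoder backend PyTorchDeepSpeech actually calls, and the function name prune_step is hypothetical.

```python
from typing import List, Sequence, Tuple


def prune_step(
    probs: Sequence[float], cutoff_top_n: int = 40, cutoff_prob: float = 1.0
) -> List[Tuple[int, float]]:
    """Return the (character index, probability) pairs kept at one time step.

    Hypothetical helper illustrating beam-search pruning; not ART's decoder.
    """
    # Rank characters from most to least probable.
    ranked = sorted(enumerate(probs), key=lambda item: item[1], reverse=True)
    kept: List[Tuple[int, float]] = []
    cumulative = 0.0
    for index, prob in ranked:
        if len(kept) >= cutoff_top_n:
            break  # never keep more than cutoff_top_n characters
        kept.append((index, prob))
        cumulative += prob
        # cutoff_prob == 1.0 means "no probability-based pruning".
        if cutoff_prob < 1.0 and cumulative >= cutoff_prob:
            break
    return kept


if __name__ == "__main__":
    step = [0.05, 0.6, 0.25, 0.1]
    print(prune_step(step))                   # all four characters, ranked
    print(prune_step(step, cutoff_prob=0.8))  # stops once 0.6 + 0.25 >= 0.8
    print(prune_step(step, cutoff_top_n=1))   # only the most probable character
```

Lowering either value shrinks the search space per frame, trading some accuracy for speed.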

property channels_first: bool

Returns:: Boolean to indicate index of the color channels in the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:: Clip values (min, max).

clone_for_refitting() → ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for samples x.

Parameters:

x (ndarray) – Input samples.
y (ndarray) – Target values.

Returns:

Loss values.

Return type:

Format as expected by the model

compute_loss_and_decoded_output(masked_adv_input: torch.Tensor, original_output: ndarray, **kwargs) → Tuple[torch.Tensor, ndarray]

Compute loss function and decoded output.

Parameters:

masked_adv_input – The perturbed inputs.
original_output (ndarray) – Target values of shape (nb_samples). Each sample in original_output is a string, and samples may have different lengths. A possible example of original_output is: original_output = np.array(['SIXTY ONE', 'HELLO']).
real_lengths – Real lengths of the original sequences, passed as a keyword argument.

Returns:

The loss and the decoded output.
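compute_loss_and_decoded_output takes its targets as strings, while CTC losses operate on integer label indices. The sketch below turns transcripts into one flat list of indices plus per-sample lengths, which is one of the target layouts torch.nn.CTCLoss accepts. The label set LABELS (blank "_" at index 0) and the helper encode_targets are assumptions for illustration; the real vocabulary comes from the loaded DeepSpeech model.

```python
from typing import Dict, List, Tuple

# Assumed character vocabulary: index 0 is the CTC blank "_".
LABELS = "_'ABCDEFGHIJKLMNOPQRSTUVWXYZ "
BLANK = LABELS[0]
CHAR_TO_INDEX: Dict[str, int] = {char: index for index, char in enumerate(LABELS)}


def encode_targets(transcripts: List[str]) -> Tuple[List[int], List[int]]:
    """Encode transcripts as one flat list of label indices plus per-sample lengths."""
    flat: List[int] = []
    lengths: List[int] = []
    for text in transcripts:
        # Upper-case to match the vocabulary; drop unknown characters and the
        # blank symbol, which must never appear inside a target sequence.
        indices = [
            CHAR_TO_INDEX[char]
            for char in text.upper()
            if char in CHAR_TO_INDEX and char != BLANK
        ]
        flat.extend(indices)
        lengths.append(len(indices))
    return flat, lengths


if __name__ == "__main__":
    targets, target_lengths = encode_targets(["SIXTY ONE", "HELLO"])
    print(target_lengths)  # [9, 5]
```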

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:

pred (ndarray) – Model predictions.
y (ndarray) – Target values.

Returns:

Loss values.

property device: torch.device

Get the device currently in use.

Returns:: Device currently in use.

property device_type: str

Return the type of device on which the estimator is run.

Returns:: Type of device on which the estimator is run, either gpu or cpu.
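The fallback rule described for device_type can be written as a small pure function. This is a hedged sketch: resolve_device and its gpu_available argument are hypothetical. In PyTorch code, gpu_available would typically come from torch.cuda.is_available(), and the returned string would be passed to torch.device.

```python
def resolve_device(device_type: str, gpu_available: bool) -> str:
    """Map a device_type of "cpu" or "gpu" to a PyTorch device string.

    Hypothetical helper: "gpu" silently falls back to "cpu" when no GPU exists.
    """
    if device_type == "gpu" and gpu_available:
        return "cuda"
    return "cpu"
```

Passing GPU availability in as an argument keeps the rule testable on machines without a GPU.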

fit(x: ndarray, y: ndarray, batch_size: int = 128, nb_epochs: int = 10, **kwargs) → None

Fit the estimator on the training set (x, y).

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
y (ndarray) – Target values of shape (nb_samples). Each sample in y is a string, and samples may have different lengths. A possible example of y is: y = np.array(['SIXTY ONE', 'HELLO']).
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch, and providing it has no effect.
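Because x may hold sequences of different lengths, a batch has to be padded to a common length before it can be stacked into one tensor, and the real lengths have to be kept alongside it (compare the real_lengths argument of compute_loss_and_decoded_output). The plain-Python sketch below shows that bookkeeping. pad_batch and iterate_minibatches are hypothetical helpers, and right-padding with zeros is an assumption, not a statement about ART's internals.

```python
from typing import Iterator, List, Sequence, Tuple


def pad_batch(
    batch: Sequence[Sequence[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Right-pad every sequence to the longest one and return the real lengths."""
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths, default=0)
    padded = [list(seq) + [pad_value] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


def iterate_minibatches(
    x: Sequence[Sequence[float]], batch_size: int = 128
) -> Iterator[Tuple[List[List[float]], List[int]]]:
    """Yield padded minibatches of at most batch_size sequences, in order."""
    for start in range(0, len(x), batch_size):
        yield pad_batch(x[start:start + batch_size])


if __name__ == "__main__":
    x = [[0.1, 0.2, 0.1, 0.4], [0.3, 0.1], [0.5]]
    for padded, lengths in iterate_minibatches(x, batch_size=2):
        print(padded, lengths)
```

Padding per minibatch rather than to the global maximum keeps short batches cheap.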

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:

generator – Batch generator providing (x, y) for each epoch.
nb_epochs (int) – Number of training epochs.
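A generator-based fit usually reduces to an outer loop over epochs and an inner loop that pulls batches from the generator. The sketch below assumes a generator object exposing size, batch_size, and get_batch(), which loosely mirrors the shape of ART's DataGenerator interface. ListBatchGenerator, fit_with_generator, and the train_step callback are hypothetical stand-ins for the estimator's own training step.

```python
from typing import Callable, List, Sequence, Tuple


class ListBatchGenerator:
    """Hypothetical generator that cycles over in-memory (x, y) pairs."""

    def __init__(self, x: Sequence, y: Sequence, batch_size: int) -> None:
        self.x, self.y = list(x), list(y)
        self.size = len(self.x)
        self.batch_size = batch_size
        self._cursor = 0

    def get_batch(self) -> Tuple[List, List]:
        start = self._cursor
        end = min(start + self.batch_size, self.size)
        # Wrap around to the first sample once the data is exhausted.
        self._cursor = 0 if end >= self.size else end
        return self.x[start:end], self.y[start:end]


def fit_with_generator(
    train_step: Callable[[List, List], None],
    generator: ListBatchGenerator,
    nb_epochs: int = 20,
) -> None:
    # Ceiling division so the last, smaller batch is not skipped (a design choice).
    steps_per_epoch = -(-generator.size // generator.batch_size)
    for _ in range(nb_epochs):
        for _ in range(steps_per_epoch):
            x_batch, y_batch = generator.get_batch()
            train_step(x_batch, y_batch)
```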

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) → ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer (between 0 and nb_layers - 1) or the name of the layer. The number of layers can be determined by counting the results returned by layer_names.

Return type:

ndarray

Parameters:

x (ndarray) – Samples
layer – Index or name of the layer.
batch_size (int) – Batch size.
framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.
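Since layer may be given either as an index or as a name, a caller typically normalizes it to an index first. The sketch below shows one way to do that against a layer_names list; the helper resolve_layer and the names "conv", "rnns", and "fc" are hypothetical, and ART's own lookup may differ. Because layer_names may be None for these estimators, the sketch checks for that case too.

```python
from typing import List, Optional, Union


def resolve_layer(layer: Union[int, str], layer_names: Optional[List[str]]) -> int:
    """Return the index of `layer`, given either its index or its name."""
    if layer_names is None:
        raise ValueError("This model does not expose layer names.")
    if isinstance(layer, str):
        if layer not in layer_names:
            raise ValueError(f"Layer name {layer!r} is not in layer_names.")
        return layer_names.index(layer)
    if isinstance(layer, int):
        if not 0 <= layer < len(layer_names):
            raise ValueError(f"Layer index {layer} is outside 0..{len(layer_names) - 1}.")
        return layer
    raise TypeError("layer must be an int or a str.")
```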

get_params() → Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:: A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:: Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:: The names of the hidden layers in the model; input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: ndarray, **kwargs) → ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
y (ndarray) – Target values of shape (nb_samples). Each sample in y is a string, and samples may have different lengths. A possible example of y is: y = np.array(['SIXTY ONE', 'HELLO']).

Returns:

Loss gradients of the same shape as x.
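Because loss_gradient returns gradients with the same (possibly ragged) shape as x, a finite-difference check is a simple way to sanity-check its output on a few short inputs. In this sketch the toy loss (sum of squared samples, whose gradient is 2x) is a hypothetical stand-in for the estimator's CTC loss; in practice loss_fn could wrap compute_loss. The check is only practical for very short sequences, since it needs two loss evaluations per sample value.

```python
import copy
from typing import Callable, List

Ragged = List[List[float]]


def numerical_gradient(
    loss_fn: Callable[[Ragged], float], x: Ragged, eps: float = 1e-5
) -> Ragged:
    """Central-difference gradient of loss_fn w.r.t. every value in a ragged batch."""
    grads: Ragged = []
    for i, seq in enumerate(x):
        seq_grads: List[float] = []
        for j in range(len(seq)):
            # Perturb copies so the caller's x is left untouched.
            plus, minus = copy.deepcopy(x), copy.deepcopy(x)
            plus[i][j] += eps
            minus[i][j] -= eps
            seq_grads.append((loss_fn(plus) - loss_fn(minus)) / (2 * eps))
        grads.append(seq_grads)
    return grads  # same ragged shape as x


def toy_loss(x: Ragged) -> float:
    # Hypothetical stand-in loss: sum of squares, so d(loss)/dx = 2x.
    return sum(v * v for seq in x for v in seq)
```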

property model: DeepSpeech

Get the current model.

Returns:: Current model.

property opt_level: str

Return a string specifying a pure or mixed precision optimization level.

Returns:: A string specifying a pure or mixed precision optimization level. Possible values are O0, O1, O2, and O3.
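The values O0 to O3 match the optimization levels of NVIDIA Apex's amp module. The mapping below paraphrases Apex's documented meaning of each level (an assumption that opt_level is passed through to Apex unchanged), and effective_amp is a hypothetical helper that encodes the rule stated for use_amp: mixed precision is only triggered when a GPU is available. Neither is part of ART's API.

```python
# Paraphrase of NVIDIA Apex amp optimization levels (assumed to be the
# levels that opt_level refers to).
OPT_LEVELS = {
    "O0": "pure FP32 training",
    "O1": "mixed precision (FP16 where considered safe, FP32 elsewhere)",
    "O2": "'almost FP16' mixed precision with FP32 master weights",
    "O3": "pure FP16 training",
}


def effective_amp(use_amp: bool, opt_level: str, gpu_available: bool) -> bool:
    """Return whether mixed precision would actually be enabled."""
    if opt_level not in OPT_LEVELS:
        raise ValueError(f"opt_level must be one of {sorted(OPT_LEVELS)}, got {opt_level!r}.")
    # use_amp=True has no effect without a GPU.
    return use_amp and gpu_available
```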

property optimizer: torch.optim.Optimizer

Return the optimizer.

Returns:: The optimizer.

predict(x: ndarray, batch_size: int = 128, **kwargs) → Tuple[ndarray, ndarray] | ndarray

Perform prediction for a batch of inputs.

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
batch_size (int) – Batch size.
transcription_output – Indicate whether the function produces probabilities or a transcription as prediction output. If transcription_output is not provided, the transcription is returned. Default: True.

Returns:

Predicted probability (if transcription_output is False) or transcription (default, if transcription_output is True):
- Probability return is a tuple of (probs, sizes), where probs is the probability of characters of shape (nb_samples, seq_length, nb_classes) and sizes is the real sequence length of shape (nb_samples,).
- Transcription return is a NumPy array of strings. A possible example of a transcription return is np.array(['SIXTY ONE', 'HELLO']).
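With decoder_type='greedy', turning the probability output (probs, sizes) into text follows the standard greedy CTC rule: take the most probable character in each frame up to the real length, merge consecutive repeats, then drop blanks. The sketch below applies that rule to plain lists for a single sample. The label string "_AB" with the blank at index 0 is an assumption for the example, and this is not ART's decoder code.

```python
from typing import List, Optional, Sequence


def greedy_ctc_decode(
    probs: Sequence[Sequence[float]], size: int, labels: str, blank_index: int = 0
) -> str:
    """Greedy CTC decoding of one sample's per-frame character probabilities."""
    chars: List[str] = []
    previous: Optional[int] = None
    for frame in probs[:size]:  # ignore padded frames beyond the real length
        best = max(range(len(frame)), key=frame.__getitem__)
        # Emit a character only when it differs from the previous frame and is not blank.
        if best != previous and best != blank_index:
            chars.append(labels[best])
        previous = best
    return "".join(chars)
```

A blank between two identical characters is what allows doubled letters (as in "HELLO") to survive the repeat-merging step.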

property sample_rate: int

Get the sampling rate.

Returns:: The audio sampling rate.

set_batchnorm(train: bool) → None

Set all batch normalization layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_dropout(train: bool) → None

Set all dropout layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_multihead_attention(train: bool) → None

Set all multi-head attention layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_params(**kwargs) → None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:: kwargs – A dictionary of attributes.
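set_params validates parameters before they take effect. The sketch below shows one way to structure that; the class DecoderSettings, its two parameters, and its checks are hypothetical and only loosely modeled on the decoder options of PyTorchDeepSpeech. As a design choice, it validates the merged candidate values before assigning anything, so a rejected call leaves the object unchanged.

```python
from typing import Any, Dict


class DecoderSettings:
    """Hypothetical holder of two PyTorchDeepSpeech-like decoder options."""

    estimator_params = ("decoder_type", "beam_width")

    def __init__(self) -> None:
        self.decoder_type = "greedy"
        self.beam_width = 10

    def set_params(self, **kwargs: Any) -> None:
        unknown = set(kwargs) - set(self.estimator_params)
        if unknown:
            raise ValueError(f"Unexpected parameters: {sorted(unknown)}")
        # Validate the merged values first so a failed call changes nothing.
        candidate: Dict[str, Any] = {name: getattr(self, name) for name in self.estimator_params}
        candidate.update(kwargs)
        self._check_params(candidate)
        for name, value in kwargs.items():
            setattr(self, name, value)

    @staticmethod
    def _check_params(params: Dict[str, Any]) -> None:
        if params["decoder_type"] not in ("greedy", "beam"):
            raise ValueError("decoder_type must be 'greedy' or 'beam'.")
        if not isinstance(params["beam_width"], int) or params["beam_width"] < 1:
            raise ValueError("beam_width must be a positive integer.")
```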

to_training_mode() → None

Put the estimator in the training mode.

property use_amp: bool

Return a boolean indicating whether to use the automatic mixed precision tool.

Returns:: Whether to use the automatic mixed precision tool.

Speech Recognizer Espresso - PyTorch

class art.estimators.speech_recognition.PyTorchEspresso(espresso_config_filepath: str | None = None, model: str | None = None, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, device_type: str = 'gpu', verbose: bool = True)

This class implements a model-specific automatic speech recognizer using the end-to-end speech recognizer in Espresso.

Paper link: https://arxiv.org/abs/1909.08723

__init__(espresso_config_filepath: str | None = None, model: str | None = None, clip_values: CLIP_VALUES_TYPE | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, device_type: str = 'gpu', verbose: bool = True)

Initialize an instance of PyTorchEspresso.

Parameters:

espresso_config_filepath – Path to the Espresso config file (YAML).
model – The choice of pretrained model if a pretrained model is required.
clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.
preprocessing_defences – Preprocessing defence(s) to be applied by the estimator.
postprocessing_defences – Postprocessing defence(s) to be applied by the estimator.
preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.
device_type (str) – Type of device to be used for the model and tensors: if cpu, run on the CPU; if gpu, run on the GPU if one is available, otherwise fall back to the CPU.
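For scalar values, the preprocessing tuple and clip_values reduce to simple element-wise arithmetic. The sketch below shows both on a plain list of samples. The helper names are hypothetical, only the scalar (float) form is covered, and the example makes no claim about the order in which ART applies clipping, preprocessing, and defences.

```python
from typing import List, Sequence


def apply_preprocessing(x: Sequence[float], subtrahend: float, divisor: float) -> List[float]:
    """Compute (x - subtrahend) / divisor, as described for the preprocessing tuple."""
    return [(value - subtrahend) / divisor for value in x]


def clip_to_range(x: Sequence[float], clip_min: float, clip_max: float) -> List[float]:
    """Bound every feature to [clip_min, clip_max], as described for clip_values."""
    return [min(max(value, clip_min), clip_max) for value in x]


if __name__ == "__main__":
    print(apply_preprocessing([2.0, 4.0], subtrahend=1.0, divisor=2.0))
    print(clip_to_range([-2.0, 0.5, 3.0], clip_min=-1.0, clip_max=1.0))
```

The array form of clip_values would need one bound per feature, which is awkward when inputs vary in length, so the scalar form is the simpler fit for audio.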

property channels_first: bool

Returns:: Boolean to indicate index of the color channels in the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:: Clip values (min, max).

clone_for_refitting() → ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for samples x.

Parameters:

x (ndarray) – Input samples.
y (ndarray) – Target values.

Returns:

Loss values.

Return type:

Format as expected by the model

compute_loss_and_decoded_output(masked_adv_input: torch.Tensor, original_output: ndarray, **kwargs) → Tuple[torch.Tensor, ndarray]

Compute loss function and decoded output.

Parameters:

masked_adv_input – The perturbed inputs.
original_output (ndarray) – Target values of shape (nb_samples). Each sample in original_output is a string, and samples may have different lengths. A possible example of original_output is: original_output = np.array(['SIXTY ONE', 'HELLO']).

Returns:

The loss and the decoded output.

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:

pred (ndarray) – Model predictions.
y (ndarray) – Target values.

Returns:

Loss values.

property device: torch.device

Get the device currently in use.

Returns:: Device currently in use.

property device_type: str

Return the type of device on which the estimator is run.

Returns:: Type of device on which the estimator is run, either gpu or cpu.

fit(x: ndarray, y: ndarray, batch_size: int = 128, nb_epochs: int = 10, **kwargs) → None

Fit the estimator on the training set (x, y).

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
y (ndarray) – Target values of shape (nb_samples). Each sample in y is a string, and samples may have different lengths. A possible example of y is: y = np.array(['SIXTY ONE', 'HELLO']).
batch_size (int) – Size of batches.
nb_epochs (int) – Number of epochs to use for training.
kwargs – Dictionary of framework-specific arguments. This parameter is not currently supported for PyTorch, and providing it has no effect.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:

generator – Batch generator providing (x, y) for each epoch.
nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) → ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer (between 0 and nb_layers - 1) or the name of the layer. The number of layers can be determined by counting the results returned by layer_names.

Return type:

ndarray

Parameters:

x (ndarray) – Samples
layer – Index or name of the layer.
batch_size (int) – Batch size.
framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() → Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:: A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:: Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:: The names of the hidden layers in the model; input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: ndarray, **kwargs) → ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
y (ndarray) – Target values of shape (nb_samples). Each sample in y is a string, and samples may have different lengths. A possible example of y is: y = np.array(['SIXTY ONE', 'HELLO']).

Returns:

Loss gradients of the same shape as x.

property model: SpeechTransformerModel

Get the current model.

Returns:: Current model.

predict(x: ndarray, batch_size: int = 128, **kwargs) → ndarray

Perform prediction for a batch of inputs.

Return type:

ndarray

Parameters:

x (ndarray) – Samples of shape (nb_samples, seq_length). Sequences in the batch may have different lengths, in which case x is an object array of shape (nb_samples,). A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
batch_size (int) – Batch size.

Returns:

Transcription as a NumPy array of strings. A possible example of a transcription return is np.array(['SIXTY ONE', 'HELLO']).

property sample_rate: int

Get the sampling rate.

Returns:: The audio sampling rate.

set_batchnorm(train: bool) → None

Set all batch normalization layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_dropout(train: bool) → None

Set all dropout layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_multihead_attention(train: bool) → None

Set all multi-head attention layers into train or eval mode.

Parameters:: train (bool) – True for training mode, False for evaluation mode.

set_params(**kwargs) → None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:: kwargs – A dictionary of attributes.

to_training_mode() → None

Put the estimator in the training mode.

Speech Recognizer Lingvo ASR - TensorFlow

class art.estimators.speech_recognition.TensorFlowLingvoASR(clip_values: CLIP_VALUES_TYPE | None = None, channels_first: bool | None = None, preprocessing_defences: Preprocessor | List[Preprocessor] | None = None, postprocessing_defences: Postprocessor | List[Postprocessor] | None = None, preprocessing: PREPROCESSING_TYPE = None, random_seed: int | None = None, sess: Session | None = None)

This class implements the task-specific Lingvo ASR model of Qin et al. (2019).

The estimator uses a pre-trained model provided by Qin et al., which is trained using the Lingvo library and the LibriSpeech dataset.

Paper link: http://proceedings.mlr.press/v97/qin19a.html, https://arxiv.org/abs/1902.08295

Warning

In order to calculate loss gradients, this estimator requires a user-patched Lingvo module. A patched source file for the lingvo.tasks.asr.decoder module will be automatically applied. The original source file can be found in <PYTHON_SITE_PACKAGES>/lingvo/tasks/asr/decoder.py and will be patched as outlined in the following commit diff: https://github.com/yaq007/lingvo/commit/414e035b2c60372de732c9d67db14d1003be6dd6

The patched decoder_patched.py can be found in ART_DATA_PATH/lingvo/asr.

Note: Run python -m site to obtain a list of possible candidates for the location of the <PYTHON_SITE_PACKAGES> folder.
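The same information is available from within Python. The sketch below uses the standard-library site module to list candidate locations of lingvo/tasks/asr/decoder.py. It only builds paths and does not check whether Lingvo is actually installed there; candidate_decoder_paths is a hypothetical helper.

```python
import os
import site
from typing import List


def candidate_decoder_paths() -> List[str]:
    """Possible locations of Lingvo's ASR decoder source file."""
    roots = list(site.getsitepackages())
    user_site = site.getusersitepackages()
    if user_site not in roots:
        roots.append(user_site)
    return [os.path.join(root, "lingvo", "tasks", "asr", "decoder.py") for root in roots]


if __name__ == "__main__":
    for path in candidate_decoder_paths():
        print(path, "(exists)" if os.path.exists(path) else "")
```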

Initialization.

Parameters:

clip_values – Tuple of the form (min, max) of floats or np.ndarray representing the minimum and maximum values allowed for features. If floats are provided, these will be used as the range of all features. If arrays are provided, each value will be considered the bound for a feature, thus the shape of clip values needs to match the total number of features.
channels_first – Set channels first or last.
preprocessing_defences – Preprocessing defence(s) to be applied by the estimator.
postprocessing_defences – Postprocessing defence(s) to be applied by the estimator.
preprocessing – Tuple of the form (subtrahend, divisor) of floats or np.ndarray of values to be used for data preprocessing. The first value will be subtracted from the input. The input will then be divided by the second one.
random_seed – Specify a random seed.

property channels_first: bool

Returns:: Boolean to indicate index of the color channels in the sample x.

property clip_values: CLIP_VALUES_TYPE | None

Return the clip values of the input samples.

Returns:: Clip values (min, max).

clone_for_refitting() → ESTIMATOR_TYPE

Clone estimator for refitting.

compute_loss(x: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for samples x.

Parameters:

x (ndarray) – Input samples.
y (ndarray) – Target values.

Returns:

Loss values.

Return type:

Format as expected by the model

compute_loss_from_predictions(pred: ndarray, y: ndarray, **kwargs) → ndarray

Compute the loss of the estimator for predictions pred.

Return type:

ndarray

Parameters:

pred (ndarray) – Model predictions.
y (ndarray) – Target values.

Returns:

Loss values.

fit(x: ndarray, y, batch_size: int = 128, nb_epochs: int = 20, **kwargs) → None

Fit the model of the estimator on the training data x and y.

Parameters:

x (ndarray) – Samples of shape (nb_samples, nb_features) or (nb_samples, nb_pixels_1, nb_pixels_2, nb_channels) or (nb_samples, nb_channels, nb_pixels_1, nb_pixels_2).
y (Format as expected by the model) – Target values.
batch_size (int) – Batch size.
nb_epochs (int) – Number of training epochs.

fit_generator(generator: DataGenerator, nb_epochs: int = 20, **kwargs) → None

Fit the estimator using a generator yielding training batches. Implementations can provide framework-specific versions of this function to speed up computation.

Parameters:

generator – Batch generator providing (x, y) for each epoch.
nb_epochs (int) – Number of training epochs.

get_activations(x: ndarray, layer: int | str, batch_size: int, framework: bool = False) → ndarray

Return the output of a specific layer for samples x, where layer is either the index of the layer (between 0 and nb_layers - 1) or the name of the layer. The number of layers can be determined by counting the results returned by layer_names.

Return type:

ndarray

Parameters:

x (ndarray) – Samples
layer – Index or name of the layer.
batch_size (int) – Batch size.
framework (bool) – If true, return the intermediate tensor representation of the activation.

Returns:

The output of layer, where the first dimension is the batch size corresponding to x.

get_params() → Dict[str, Any]

Get all parameters and their values of this estimator.

Returns:: A dictionary of string parameter names to their value.

property input_shape: Tuple[int, ...]

Return the shape of one input sample.

Returns:: Shape of one input sample.

property layer_names: List[str] | None

Return the names of the hidden layers in the model, if applicable.

Returns:: The names of the hidden layers in the model; input and output layers are ignored.

Warning

layer_names tries to infer the internal structure of the model. This feature comes with no guarantees on the correctness of the result. The intended order of the layers tries to match their order in the model, but this is not guaranteed either.

loss_gradient(x: ndarray, y: ndarray, batch_mode: bool = False, **kwargs) → ndarray

Compute the gradient of the loss function w.r.t. x.

Return type:

ndarray

Parameters:

x (ndarray) – Samples of shape (nb_samples). Sequences in the batch may have different lengths. A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
y (ndarray) – Target values of shape (nb_samples). Each sample in y is a string, and samples may have different lengths. A possible example of y is: y = np.array(['SIXTY ONE', 'HELLO']).
batch_mode (bool) – If True, calculate the gradient per batch; otherwise, calculate it per sequence.

Returns:

Loss gradients of the same shape as x.

property model

Return the model.

Returns:: The model.

predict(x: ndarray, batch_size: int = 128, **kwargs) → Tuple[ndarray, ndarray] | ndarray

Perform batch-wise prediction for given inputs.

Parameters:

x (ndarray) – Samples of shape (nb_samples) with values in range [-32768, 32767]. Sequences in the batch may have different lengths. A possible example of x is: x = np.array([np.array([0.1, 0.2, 0.1, 0.4]), np.array([0.3, 0.1])], dtype=object).
batch_size (int) – Size of batches.

Returns:

Array of predicted transcriptions of shape (nb_samples). A possible example of a transcription return is np.array(['SIXTY ONE', 'HELLO']).
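predict expects samples in the 16-bit PCM range [-32768, 32767]. If the audio was loaded as floating-point values in [-1, 1] (an assumption; many audio readers return that range), it can be rescaled first. The helper float_to_int16_range below is a hypothetical sketch of that conversion.

```python
from typing import List, Sequence


def float_to_int16_range(samples: Sequence[float]) -> List[int]:
    """Rescale floats in [-1, 1] to the 16-bit PCM range [-32768, 32767]."""
    scaled = []
    for sample in samples:
        value = round(sample * 32768)
        # 1.0 * 32768 overflows int16, so clip to the representable range.
        scaled.append(max(-32768, min(32767, value)))
    return scaled
```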

property sess: Session

Get current TensorFlow session.

Returns:: The current TensorFlow session.

set_params(**kwargs) → None

Take a dictionary of parameters and apply checks before setting them as attributes.

Parameters:: kwargs – A dictionary of attributes.
