Core

class lfeats.Extractor(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None)

A class for extracting features from a specified model.

__call__(source: ndarray | Tensor | Audio, sample_rate: int | None = None, *, layers: int | Sequence[int] | Literal['all', 'last'] = 'last', center: bool = True, chunk_length_sec: int = 30, overlap_length_sec: int = 5, upsample_factor: int = 1, reduction: Literal['none', 'mean', 'auto'] = 'auto') → Features

Extract features from the input waveform.

Parameters:
source : np.ndarray | torch.Tensor | Audio

The input waveform data with shape (T,) or (B, T) or an Audio object.

sample_rate : int | None, optional

The sample rate of the input waveform. Must be provided if source is not an Audio object.

layers : int | list[int] | Literal[“all”, “last”], optional

The layer(s) from which to extract features: a single layer index, a list of indices, ‘all’ for every layer, or ‘last’ for the final layer only.

center : bool, optional

If True, the input audio will be padded to compensate for the delay caused by the model’s convolutional layers.

chunk_length_sec : int, optional

The chunk length in seconds for processing long audio.

overlap_length_sec : int, optional

The overlap length in seconds between chunks.

upsample_factor : int, optional

The factor by which to upsample the features in the time dimension.

reduction : Literal[“none”, “mean”, “auto”], optional

The reduction applied to the features across time frames: ‘mean’ averages the features over the time dimension, ‘none’ applies no reduction, and ‘auto’ chooses based on the feature granularity (‘none’ for frame-level features, ‘mean’ for utterance-level features).

Returns:
out : Features

The extracted features with shape (B, N, D), where B is the batch size, N the number of time frames, and D the feature dimension.

Raises:
ValueError

If the given parameters are invalid.

RuntimeError

If the number of frames in the extracted features is unexpected.

Examples

>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> extractor = lfeats.Extractor(model_name="hubert")
>>> extractor.load(quiet=True)
>>> features = extractor(waveform, sample_rate)
>>> features.shape
(1, 50, 768)
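As a rough illustration of how chunk_length_sec and overlap_length_sec partition a long waveform, the sketch below computes overlapping chunk boundaries. This is a minimal, hypothetical illustration of overlapping windowing, not lfeats’ actual chunking implementation.

```python
def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_length_sec: int = 30, overlap_length_sec: int = 5):
    """Return (start, end) sample indices of overlapping chunks (illustration only)."""
    chunk = chunk_length_sec * sample_rate
    # Consecutive chunks advance by (chunk - overlap) samples.
    hop = (chunk_length_sec - overlap_length_sec) * sample_rate
    stop = max(num_samples - overlap_length_sec * sample_rate, 1)
    return [(s, min(s + chunk, num_samples)) for s in range(0, stop, hop)]

# A 70 s waveform at 16 kHz splits into three chunks of at most 30 s,
# each sharing 5 s with its neighbor:
bounds = chunk_bounds(70 * 16000, 16000)
# → [(0, 480000), (400000, 880000), (800000, 1120000)]
```

Audio shorter than one chunk yields a single pair covering the whole signal, so no special-casing is needed for short inputs.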
__init__(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None) → None

Initialize the Extractor with the specified model.

Parameters:
model_name : str

The name of the model to use.

model_variant : str | None, optional

The variant of the model to use.

resampler_type : str, optional

The type of resampler to use.

resampler_preset : str | None, optional

The preset for the resampler.

device : str, optional

The device to run the model on (e.g., ‘cpu’ or ‘cuda’).

cache_dir : str | None, optional

The directory to cache the model files.

load(quiet: bool = False) → None

Download and load the model if it is not already loaded.

Parameters:
quiet : bool, optional

If True, suppress the output during model loading.

class lfeats.Resampler(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu')

A class for resampling audio using a specified resampler.

__call__(source: ndarray | Tensor | Audio, *, src_rate: int | None = None, dst_rate: int = 16000) → Audio

Resample the given audio to the target sample rate.

Parameters:
source : np.ndarray | torch.Tensor | Audio

The input waveform data with shape (T,) or (B, T) or an Audio object.

src_rate : int | None, optional

The source sample rate in Hz.

dst_rate : int, optional

The destination sample rate in Hz.

Returns:
out : Audio

The resampled audio with shape (B, T').

Examples

>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> resampler = lfeats.Resampler(resampler_type="torchaudio")
>>> resampled_audio = resampler(waveform, src_rate=sample_rate, dst_rate=8000)
>>> resampled_audio.shape
(1, 8000)
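The output length in the example above follows the usual resampling relation T' ≈ T · dst_rate / src_rate. The sketch below illustrates that relation with naive linear interpolation via np.interp; it is a hypothetical toy, not how torchaudio resamples (real resamplers apply proper low-pass filtering to avoid aliasing).

```python
import numpy as np

def naive_resample(x: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a 1-D signal by linear interpolation (illustration only)."""
    num_out = int(round(len(x) * dst_rate / src_rate))
    t_in = np.arange(len(x)) / src_rate    # input sample times in seconds
    t_out = np.arange(num_out) / dst_rate  # output sample times in seconds
    return np.interp(t_out, t_in, x)

x = np.random.randn(16000)            # 1 s of audio at 16 kHz
y = naive_resample(x, 16000, 8000)    # downsample to 8 kHz
print(y.shape)                        # (8000,)
```

Downsampling by a factor of two halves the number of samples, matching the (1, 8000) shape in the doctest above.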
__init__(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu') → None

Initialize the Resampler with the specified resampler type.

Parameters:
resampler_type : str, optional

The type of resampler to use.

resampler_preset : str | None, optional

The preset for the resampler.

device : str, optional

The device to run the model on (e.g., ‘cpu’ or ‘cuda’).