Core

class lfeats.Extractor(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None)

A class for extracting features from a specified model.

__call__(source: ndarray | Tensor | Audio, sample_rate: int | None = None, *, layers: int | Sequence[int] | Literal['all', 'last'] = 'last', center: bool = True, chunk_length_sec: int = 30, overlap_length_sec: int = 5, upsample_factor: int = 1, reduction: Literal['none', 'mean', 'auto'] = 'auto') → Features

Extract features from the input waveform.

Parameters:
source : np.ndarray | torch.Tensor | Audio

The input waveform data with shape (T,) or (B, T) or an Audio object.

sample_rate : int | None, optional

The sample rate of the input waveform. Must be provided if source is not an Audio object.

layers : int | list[int] | Literal[“all”, “last”], optional

The layer(s) from which to extract features: a single layer index, a list of indices, ‘all’ for every layer, or ‘last’ for the final layer only.

center : bool, optional

If True, the input audio will be padded to compensate for the delay caused by the model’s convolutional layers.

chunk_length_sec : int, optional

The chunk length in seconds for processing long audio.

overlap_length_sec : int, optional

The overlap length in seconds between chunks.

upsample_factor : int, optional

The factor by which to upsample the features in the time dimension.

reduction : Literal[“none”, “mean”, “auto”], optional

The reduction applied to the features across time frames: ‘mean’ averages the features over the time dimension, ‘none’ applies no reduction, and ‘auto’ chooses based on the feature granularity (‘none’ for frame-level features, ‘mean’ for utterance-level features).

Returns:
out : Features

The extracted features with shape (B, N, D), where B is the batch size, N the number of time frames, and D the feature dimension.

Raises:
ValueError

If the given parameters are invalid.

RuntimeError

If the number of frames in the extracted features is unexpected.

Examples

>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> extractor = lfeats.Extractor(model_name="hubert")
>>> extractor.load(quiet=True)
>>> features = extractor(waveform, sample_rate)
>>> features.shape
(1, 50, 768)
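As a rough illustration of how chunk_length_sec and overlap_length_sec partition a long waveform, the sketch below computes overlapping chunk boundaries. This is a minimal, hypothetical illustration of overlapping windowing, not lfeats’ actual chunking implementation.

```python
def chunk_bounds(num_samples: int, sample_rate: int,
                 chunk_length_sec: int = 30, overlap_length_sec: int = 5):
    """Return (start, end) sample indices of overlapping chunks (illustration only)."""
    chunk = chunk_length_sec * sample_rate
    # Consecutive chunks advance by (chunk - overlap) samples.
    hop = (chunk_length_sec - overlap_length_sec) * sample_rate
    stop = max(num_samples - overlap_length_sec * sample_rate, 1)
    return [(s, min(s + chunk, num_samples)) for s in range(0, stop, hop)]

# A 70 s waveform at 16 kHz splits into three chunks of at most 30 s,
# each sharing 5 s with its neighbor:
bounds = chunk_bounds(70 * 16000, 16000)
# → [(0, 480000), (400000, 880000), (800000, 1120000)]
```

Audio shorter than one chunk yields a single pair covering the whole signal, so no special-casing is needed for short inputs.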
__init__(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None) → None

Initialize the Extractor with the specified model.

Parameters:
model_name : str

The name of the model to use.

model_variant : str | None, optional

The variant of the model to use.

resampler_type : str, optional

The type of resampler to use.

resampler_preset : str | None, optional

The preset for the resampler.

device : str, optional

The device to run the model on (e.g., ‘cpu’ or ‘cuda’).

cache_dir : str | None, optional

The directory to cache the model files.

load(quiet: bool = False) → None

Download and load the model if it is not already loaded.

Parameters:
quiet : bool, optional

If True, suppress the output during model loading.

class lfeats.Resampler(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu')

A class for resampling audio using a specified resampler.

__call__(source: ndarray | Tensor | Audio, *, src_rate: int | None = None, dst_rate: int = 16000) → Audio

Resample the given audio to the target sample rate.

Parameters:
source : np.ndarray | torch.Tensor | Audio

The input waveform data with shape (T,) or (B, T) or an Audio object.

src_rate : int | None, optional

The source sample rate in Hz.

dst_rate : int, optional

The destination sample rate in Hz.

Returns:
out : Audio

The resampled audio with shape (B, T').

Examples

>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> resampler = lfeats.Resampler(resampler_type="torchaudio")
>>> resampled_audio = resampler(waveform, src_rate=sample_rate, dst_rate=8000)
>>> resampled_audio.shape
(1, 8000)
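The output length in the example above follows the usual resampling relation T' ≈ T · dst_rate / src_rate. The sketch below illustrates that relation with naive linear interpolation via np.interp; it is a hypothetical toy, not how torchaudio resamples (real resamplers apply proper low-pass filtering to avoid aliasing).

```python
import numpy as np

def naive_resample(x: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample a 1-D signal by linear interpolation (illustration only)."""
    num_out = int(round(len(x) * dst_rate / src_rate))
    t_in = np.arange(len(x)) / src_rate    # input sample times in seconds
    t_out = np.arange(num_out) / dst_rate  # output sample times in seconds
    return np.interp(t_out, t_in, x)

x = np.random.randn(16000)            # 1 s of audio at 16 kHz
y = naive_resample(x, 16000, 8000)    # downsample to 8 kHz
print(y.shape)                        # (8000,)
```

Downsampling by a factor of two halves the number of samples, matching the (1, 8000) shape in the doctest above.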
__init__(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu') → None

Initialize the Resampler with the specified resampler type.

Parameters:
resampler_type : str, optional

The type of resampler to use.

resampler_preset : str | None, optional

The preset for the resampler.

device : str, optional

The device to run the model on (e.g., ‘cpu’ or ‘cuda’).