Core
- class lfeats.Extractor(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None)
A class for extracting features from a specified model.
- __call__(source: ndarray | Tensor | Audio, sample_rate: int | None = None, *, layers: int | Sequence[int] | Literal['all', 'last'] = 'last', center: bool = True, chunk_length_sec: int = 30, overlap_length_sec: int = 5, upsample_factor: int = 1, reduction: Literal['none', 'mean', 'auto'] = 'auto') -> Features
Extract features from the input waveform.
- Parameters:
- source : np.ndarray | torch.Tensor | Audio
The input waveform with shape (T,) or (B, T), or an Audio object.
- sample_rate : int | None, optional
The sample rate of the input waveform. Must be provided if source is not an Audio object.
- layers : int | Sequence[int] | Literal["all", "last"], optional
The layer(s) from which to extract features.
- center : bool, optional
If True, the input audio will be padded to compensate for the delay caused by the model’s convolutional layers.
- chunk_length_sec : int, optional
The chunk length in seconds for processing long audio.
- overlap_length_sec : int, optional
The overlap length in seconds between chunks.
- upsample_factor : int, optional
The factor by which to upsample the features in the time dimension.
- reduction : Literal["none", "mean", "auto"], optional
The reduction method to apply to the features across time frames. If ‘mean’, the features will be averaged across the time dimension. If ‘none’, no reduction will be applied. If ‘auto’, the reduction method will be determined based on the feature granularity (‘none’ for frame-level features and ‘mean’ for utterance-level features).
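As an illustration of the `reduction` and `upsample_factor` semantics described above, the following sketch applies them to a dummy feature array with NumPy. The helpers `apply_reduction` and `upsample_frames`, the `granularity` argument, the choice to keep a length-1 time axis after averaging, and the use of frame repetition for upsampling are all assumptions made for this example; this page does not state how lfeats implements either step.

```python
import numpy as np


def apply_reduction(feats, reduction="auto", granularity="frame"):
    """Reduce (B, N, D) features over the time axis N (hypothetical helper)."""
    if reduction == "auto":
        # 'auto' resolves to 'none' for frame-level features
        # and to 'mean' for utterance-level features.
        reduction = "none" if granularity == "frame" else "mean"
    if reduction == "none":
        return feats
    if reduction == "mean":
        # keepdims=True is an assumption so the result stays 3-D: (B, 1, D).
        return feats.mean(axis=1, keepdims=True)
    raise ValueError(f"unknown reduction: {reduction!r}")


def upsample_frames(feats, factor=1):
    """Repeat each frame `factor` times along the time axis (assumed scheme)."""
    return np.repeat(feats, factor, axis=1)


feats = np.arange(2 * 4 * 3, dtype=np.float64).reshape(2, 4, 3)  # (B=2, N=4, D=3)
frame_level = apply_reduction(feats, "auto", granularity="frame")
utterance_level = apply_reduction(feats, "auto", granularity="utterance")
upsampled = upsample_frames(feats, factor=2)
print(frame_level.shape, utterance_level.shape, upsampled.shape)
```

With these assumptions, frame-level features keep their N frames, utterance-level features collapse to a single averaged frame, and an upsampling factor of 2 doubles N.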
- Returns:
- out : Features
The extracted features with shape (B, N, D).
- Raises:
- ValueError
If the given parameters are invalid.
- RuntimeError
If the number of frames in the extracted features is unexpected.
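The expected number of frames, and the effect of `center`, can be reasoned about by tracing the input length through a stack of strided convolutions. The sketch below uses the kernel sizes and strides of the seven-layer convolutional feature encoder found in HuBERT and wav2vec 2.0 base models. The padding amount (receptive field minus total stride) and the function names are assumptions for illustration, not a description of how lfeats pads or validates its output.

```python
# Kernel sizes and strides of the convolutional feature encoder used by
# HuBERT / wav2vec 2.0 base models.
KERNELS = (10, 3, 3, 3, 3, 2, 2)
STRIDES = (5, 2, 2, 2, 2, 2, 2)


def receptive_field_and_stride(kernels=KERNELS, strides=STRIDES):
    """Return the receptive field and total stride, both in samples."""
    rf, jump = 1, 1
    for k, s in zip(kernels, strides):
        rf += (k - 1) * jump
        jump *= s
    return rf, jump


def num_frames(num_samples, center=True, kernels=KERNELS, strides=STRIDES):
    """Count output frames of the unpadded ('valid') convolution stack."""
    rf, hop = receptive_field_and_stride(kernels, strides)
    if center:
        # Assumed padding: just enough for one frame per `hop` input samples.
        num_samples += rf - hop
    n = num_samples
    for k, s in zip(kernels, strides):
        n = (n - k) // s + 1
    return n


print(receptive_field_and_stride())    # (400, 320)
print(num_frames(16000, center=False))  # 49
print(num_frames(16000, center=True))   # 50
```

Under these assumptions, one second of 16 kHz audio yields 50 frames when centered, which matches the N = 50 in the example output shape (1, 50, 768).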
Examples
>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> extractor = lfeats.Extractor(model_name="hubert")
>>> extractor.load(quiet=True)
>>> features = extractor(waveform, sample_rate)
>>> features.shape
(1, 50, 768)
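To make the roles of `chunk_length_sec` and `overlap_length_sec` concrete, here is a minimal sketch of one plausible way to split long audio into overlapping chunks. The function `chunk_bounds` is hypothetical, and this page does not specify how lfeats places chunk boundaries or merges the overlapping regions, so treat the scheme as an assumption.

```python
def chunk_bounds(num_samples, sample_rate, chunk_length_sec=30, overlap_length_sec=5):
    """Return (start, end) sample indices of overlapping chunks (illustrative)."""
    chunk = chunk_length_sec * sample_rate
    hop = (chunk_length_sec - overlap_length_sec) * sample_rate
    if hop <= 0:
        raise ValueError("overlap_length_sec must be smaller than chunk_length_sec")
    bounds = []
    start = 0
    while True:
        # The last chunk is truncated at the end of the signal.
        end = min(start + chunk, num_samples)
        bounds.append((start, end))
        if end == num_samples:
            break
        start += hop
    return bounds


# 70 seconds of 16 kHz audio with the default 30 s chunks and 5 s overlap.
for start, end in chunk_bounds(70 * 16000, 16000):
    print(start / 16000, end / 16000)
```

With the defaults, consecutive chunks start 25 seconds apart, so the 70-second input is covered by chunks spanning 0 to 30 s, 25 to 55 s, and 50 to 70 s.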
- __init__(model_name: str, model_variant: str | None = None, resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu', cache_dir: str | None = None) -> None
Initialize the Extractor with the specified model.
- Parameters:
- model_name : str
The name of the model to use.
- model_variant : str | None, optional
The variant of the model to use.
- resampler_type : str, optional
The type of resampler to use.
- resampler_preset : str | None, optional
The preset for the resampler.
- device : str, optional
The device to run the model on (e.g., ‘cpu’ or ‘cuda’).
- cache_dir : str | None, optional
The directory to cache the model files.
- class lfeats.Resampler(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu')
A class for resampling audio using a specified resampler.
- __call__(source: ndarray | Tensor | Audio, *, src_rate: int | None = None, dst_rate: int = 16000) -> Audio
Resample the given audio to the target sample rate.
- Parameters:
- source : np.ndarray | torch.Tensor | Audio
The input waveform with shape (T,) or (B, T), or an Audio object.
- src_rate : int | None, optional
The source sample rate in Hz.
- dst_rate : int, optional
The destination sample rate in Hz.
- Returns:
- out : Audio
The resampled audio with shape (B, T').
Examples
>>> import lfeats
>>> import numpy as np
>>>
>>> sample_rate = 16000
>>> waveform = np.random.randn(sample_rate)
>>>
>>> resampler = lfeats.Resampler(resampler_type="torchaudio")
>>> resampled_audio = resampler(waveform, src_rate=sample_rate, dst_rate=8000)
>>> resampled_audio.shape
(1, 8000)
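The shape contract above, where a (T,) or (B, T) input becomes a (B, T') output, can be sketched with a deliberately naive NumPy resampler. This uses plain linear interpolation rather than the filtered resampling that a backend such as torchaudio performs, so it applies no anti-aliasing and is only meant to show the shapes. The function `resample_linear` and the rounding rule for T' are assumptions, and it returns a plain array rather than an Audio object.

```python
import numpy as np


def resample_linear(waveform, src_rate, dst_rate=16000):
    """Resample by linear interpolation; returns an array of shape (B, T')."""
    # Promote a 1-D waveform (T,) to a batch of one, (1, T).
    x = np.atleast_2d(np.asarray(waveform, dtype=np.float64))
    num_in = x.shape[-1]
    # Assumed output length: input duration times the target rate, rounded up.
    num_out = int(np.ceil(num_in * dst_rate / src_rate))
    # Positions of the output samples, expressed in input-sample units.
    t_out = np.arange(num_out) * (src_rate / dst_rate)
    t_in = np.arange(num_in)
    return np.stack([np.interp(t_out, t_in, row) for row in x])


waveform = np.random.randn(16000)
resampled = resample_linear(waveform, src_rate=16000, dst_rate=8000)
print(resampled.shape)  # (1, 8000)
```

Halving the sample rate of a one-second, 16 kHz signal yields 8000 samples, and the leading batch dimension of 1 comes from promoting the 1-D input.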
- __init__(resampler_type: str = 'torchaudio', resampler_preset: str | None = None, device: str = 'cpu') -> None
Initialize the Resampler with the specified resampler type.
- Parameters:
- resampler_type : str, optional
The type of resampler to use.
- resampler_preset : str | None, optional
The preset for the resampler.
- device : str, optional
The device to run the resampler on (e.g., ‘cpu’ or ‘cuda’).