Characteristics of Auditory Perception

The assessment of speech quality ultimately relies on human judgment, so understanding how the ear perceives sound helps in designing more effective algorithms. Research indicates that speech can be described by three perceptual attributes: loudness, pitch, and timbre. Loudness is the perceived strength of a sound, pitch is the perceived height of a sound and depends mainly on its frequency, and timbre is the quality that lets a listener tell apart sounds of the same loudness and pitch, such as two different speakers. Timbre can therefore be used to differentiate audio sources.

Some auditory characteristics crucial for improving speech enhancement technology include:

  • The human ear is most sensitive to the amplitude of speech, while distortion of the phase is generally hard to perceive. Early studies therefore focused on estimating the amplitude spectrum of the clean speech.

  • The sensitivity of the human auditory system varies with frequency; for instance, a low-frequency signal must reach a higher sound pressure level than a mid-frequency signal before the ear perceives it.

  • The ear exhibits masking: when a weak signal and a strong signal occur together, the stronger signal can make the weaker one inaudible. Algorithms can exploit this effect when correcting the final enhanced signal, since residual noise that the speech already masks does not need to be removed.

  • In multi-speaker environments, the ear exhibits selective attention (the cocktail party effect): it focuses on sound from a specific direction and filters out the competing signals.
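The first point in the list above can be made concrete with a small sketch of magnitude spectral subtraction, a classic technique that modifies only the amplitude spectrum and reuses the noisy phase unchanged. This is an illustration under simplifying assumptions, not a production method: the function name `spectral_subtract`, the non-overlapping rectangular frames, the frame length of 256, and the spectral floor of 0.05 are all choices made here for brevity, and the noise magnitude is assumed to have been estimated from a noise-only segment.

```python
import numpy as np

def spectral_subtract(noisy, noise_mag, frame_len=256, floor=0.05):
    """Magnitude spectral subtraction that keeps the noisy phase.

    noisy:     1-D noisy signal
    noise_mag: estimated noise magnitude spectrum, shape (frame_len // 2 + 1,)
    floor:     fraction of the noisy magnitude kept as a lower bound (assumed value)
    """
    n_frames = len(noisy) // frame_len
    # Non-overlapping frames keep the sketch short; real systems usually
    # use overlapping windowed frames.
    frames = noisy[: n_frames * frame_len].reshape(n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1)
    mag, phase = np.abs(spec), np.angle(spec)
    # Modify the magnitude only, never letting it drop below the floor.
    clean_mag = np.maximum(mag - noise_mag, floor * mag)
    # Recombine the new magnitude with the untouched noisy phase.
    enhanced = np.fft.irfft(clean_mag * np.exp(1j * phase), n=frame_len, axis=1)
    return enhanced.reshape(-1)

# Usage: a 500 Hz tone stands in for speech, sampled at 8 kHz.
rng = np.random.default_rng(0)
t = np.arange(4096) / 8000.0
clean = np.sin(2 * np.pi * 500 * t)
noise = 0.3 * rng.standard_normal(t.size)
noisy = clean + noise
# Here the noise is known, so its average magnitude spectrum is computed directly.
noise_mag = np.abs(np.fft.rfft(noise.reshape(-1, 256), axis=1)).mean(axis=0)
enhanced = spectral_subtract(noisy, noise_mag)
```

Because the phase is left as it was, any improvement comes entirely from the amplitude estimate, which is the property the early amplitude-spectrum methods relied on.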

Noise Characteristics

Generally, any signal that interferes with the target signal is considered noise, such as competing speech, machinery sounds in a factory, or car horns on the street. Noise is ubiquitous in daily life, and common types include white noise, pink noise, engine noise, and disruptive background speech. White noise, named by analogy with white light, has a constant power spectral density across the frequency range. Pink noise has a power spectral density that falls as frequency increases, so most of its energy lies in the low-frequency components.

Background speech from other talkers is particularly challenging because the target speech and the interfering speech share similar acoustic characteristics. Microphone-array signal enhancement and blind source separation have become the usual approaches to this kind of interference.

In terms of how it combines with the target speech, noise is either additive or multiplicative. In practical processing, to keep a single model, multiplicative noise is typically converted into additive noise through a suitable transformation, for example by taking the logarithm so that a product becomes a sum.
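As a rough illustration of the noise types and the additive model described above, the sketch below generates white noise, approximates pink noise by shaping a white spectrum so that power falls off as 1/f, and mixes noise into a signal at a chosen signal-to-noise ratio. The function names (`white_noise`, `pink_noise`, `mix_at_snr`) and the spectral-shaping approach are assumptions made for this example; other ways of generating pink noise exist.

```python
import numpy as np

def white_noise(n, rng):
    """White noise: flat power spectral density on average."""
    return rng.standard_normal(n)

def pink_noise(n, rng):
    """Approximate pink noise by scaling a white spectrum so power ~ 1/f."""
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n)
    scale = np.zeros_like(freqs)             # DC bin stays at zero
    scale[1:] = 1.0 / np.sqrt(freqs[1:])     # amplitude ~ 1/sqrt(f) gives power ~ 1/f
    return np.fft.irfft(spectrum * scale, n=n)

def mix_at_snr(speech, noise, snr_db):
    """Additive model y = x + g * n, with gain g chosen to reach snr_db."""
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + gain * noise

# Usage: a low-frequency tone stands in for the target speech.
rng = np.random.default_rng(1)
speech = np.sin(2 * np.pi * 0.01 * np.arange(8192))
noisy_white = mix_at_snr(speech, white_noise(8192, rng), snr_db=5.0)
noisy_pink = mix_at_snr(speech, pink_noise(8192, rng), snr_db=5.0)
```

Mixing at a controlled SNR like this is a common way to build test material, since the clean signal remains available as a reference.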

Speech Quality Evaluation

Regardless of the algorithm or technology, performance has to be measured. Evaluation metrics for speech enhancement algorithms fall into two main categories: subjective auditory assessment and objective metric evaluation. In subjective assessment, listeners compare speech before and after processing in a quiet room and score it according to uniform criteria. Objective evaluation methods are based on mathematical models that use higher-order statistical parameters (cumulants, higher-order spectra, etc.) to map feature vectors onto subjective evaluation scores.
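To give a feel for what an objective metric looks like in code, here is a minimal sketch of two simple waveform measures: overall SNR and segmental SNR. These are deliberately simpler than the higher-order-statistics models mentioned above and are swapped in only for illustration. The function names, the frame length of 256, and the clamping range of -10 to 35 dB are assumptions chosen for this example, and both measures require a time-aligned clean reference.

```python
import numpy as np

def snr_db(clean, processed):
    """Overall SNR in dB, treating (clean - processed) as the error signal."""
    error = clean - processed
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))

def segmental_snr_db(clean, processed, frame_len=256, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs, each clamped to [lo, hi] dB (assumed limits)."""
    n_frames = len(clean) // frame_len
    c = clean[: n_frames * frame_len].reshape(n_frames, frame_len)
    p = processed[: n_frames * frame_len].reshape(n_frames, frame_len)
    eps = 1e-12  # avoids division by zero in silent or perfectly matched frames
    signal_energy = np.sum(c ** 2, axis=1)
    error_energy = np.sum((c - p) ** 2, axis=1)
    per_frame = 10 * np.log10((signal_energy + eps) / (error_energy + eps))
    return float(np.mean(np.clip(per_frame, lo, hi)))

# Usage: compare a reference tone with a slightly scaled copy of itself.
clean = np.sin(2 * np.pi * 0.05 * np.arange(2048))
processed = 1.1 * clean
overall = snr_db(clean, processed)
segmental = segmental_snr_db(clean, processed)
```

Averaging clamped per-frame values keeps a few very quiet or very clean frames from dominating the score, which is one reason segmental measures are often preferred over a single overall SNR.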
