PSOLA

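Pitch-synchronous overlap-add (PSOLA; Japanese: ピッチ同期重畳加算) is a speech-processing framework of "pitch-based segmentation, modification, and resynthesis of speech"^[1]. In Japanese it is also called ピッチ同期波形重畳 (pitch-synchronous waveform overlap)^[2].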

Overview

Speech processing based on PSOLA can change the pitch or the duration (tempo) of a signal while preserving its spectral envelope (formants).

PSOLA consists of the following three stages: analysis, modification, and synthesis^[3].

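In the analysis stage, analysis windows synchronized with the period (pitch) of the target speech waveform are used^[6] to split the signal into short, mutually overlapping segments (each about twice the fundamental period^[6]).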
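The analysis stage can be sketched as follows. This is a minimal illustration, not the method of any cited paper: it assumes that the pitch marks are already known and that the period is constant, and it uses a Hann window, which is an assumed choice. The names `hann`, `extract_segments`, `pitch_marks`, and `period` are hypothetical.

```python
import math

def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / (n - 1)) for i in range(n)]

def extract_segments(signal, pitch_marks, period):
    """Cut one windowed segment per pitch mark.

    Each segment is centered on its pitch mark and spans 2 * period + 1
    samples, i.e. about twice the fundamental period. Samples outside
    the signal are treated as zero.
    """
    window = hann(2 * period + 1)
    segments = []
    for mark in pitch_marks:
        seg = []
        for k in range(-period, period + 1):
            idx = mark + k
            sample = signal[idx] if 0 <= idx < len(signal) else 0.0
            seg.append(sample * window[k + period])
        segments.append(seg)
    return segments

# Example: a sine wave with a period of 50 samples, one mark per period.
signal = [math.sin(2.0 * math.pi * n / 50) for n in range(300)]
segments = extract_segments(signal, pitch_marks=[50, 100, 150, 200], period=50)
```

With this window length, two copies of the window shifted by one period sum to one, so adding the unmodified segments back at their original marks would reproduce the interior of the signal.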

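In the modification stage, the segments are, for example, repositioned: to lower the pitch they are moved farther apart, and to raise it they are moved closer together. Because spreading out or overlapping the segments changes the length (duration) of the signal, the following compensation is applied^[6]: to lengthen the signal, the same segment is repeated several times in succession; to shorten it, some segments are dropped.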
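The repositioning and the duration compensation can be sketched together as a "plan" that says where each output segment goes and which analysis segment it copies. This is an illustrative sketch under simplifying assumptions: the period is constant, analysis segment i is taken to sit at sample i * period, and the function name `plan_synthesis` and its parameters `pitch_factor` and `time_factor` are hypothetical.

```python
def plan_synthesis(num_segments, period, pitch_factor, time_factor=1.0):
    """Return a list of (synthesis_mark, source_segment_index) pairs.

    pitch_factor > 1 raises the pitch (marks move closer together),
    pitch_factor < 1 lowers it (marks move farther apart).
    time_factor > 1 lengthens the signal, time_factor < 1 shortens it.
    """
    # New spacing between segments after the pitch change.
    new_period = max(1, round(period / pitch_factor))
    # Target output length in samples.
    out_length = round(num_segments * period * time_factor)
    plan = []
    pos = 0
    while pos < out_length:
        # Map the output position back to input time and pick the
        # analysis segment there; segments get repeated or skipped
        # automatically depending on the two factors.
        src = min(num_segments - 1, int(pos / time_factor / period))
        plan.append((pos, src))
        pos += new_period
    return plan

# Raising the pitch by an octave at unchanged duration repeats each segment:
print(plan_synthesis(4, 100, pitch_factor=2.0))
# Lowering it by an octave drops every other segment:
print(plan_synthesis(4, 100, pitch_factor=0.5))
```

The first call yields marks 50 samples apart with each source segment used twice; the second yields marks 200 samples apart using only segments 0 and 2.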

In the synthesis stage, the modified segments are combined by overlap-add to resynthesize the signal.
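Overlap-add itself is simple: each segment is summed into the output buffer, centered on its synthesis mark. The sketch below is a minimal illustration; the function name `overlap_add` and the `(mark, source_index)` plan format are hypothetical, and no amplitude normalization is applied.

```python
def overlap_add(segments, plan, out_length):
    """Sum each planned segment into the output, centered on its mark.

    segments   -- list of windowed segments (lists of floats)
    plan       -- list of (synthesis_mark, source_segment_index) pairs
    out_length -- number of output samples
    """
    out = [0.0] * out_length
    for mark, src in plan:
        seg = segments[src]
        half = len(seg) // 2
        for k, value in enumerate(seg):
            idx = mark - half + k
            # Parts of a segment that fall outside the output are discarded.
            if 0 <= idx < out_length:
                out[idx] += value
    return out

# Tiny example: one 3-sample segment placed at marks 1 and 2.
result = overlap_add([[1.0, 2.0, 1.0]], [(1, 0), (2, 0)], out_length=4)
print(result)  # [1.0, 3.0, 3.0, 1.0]
```

Where two segments overlap, their samples simply add, which is why the analysis window is usually chosen so that overlapping copies sum smoothly.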

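Algorithms that adopt PSOLA and perform the modification in the time domain are collectively called TD-PSOLA, while those that perform it in the frequency domain are collectively called FD-PSOLA^[7].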

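PSOLA is used for a variety of purposes. One example is modifying the prosody of natural speech to improve the voice quality of text-to-speech systems based on the concatenation of elementary speech units^[8].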

^ a pitch-synchronous overlap-add (PSOLA) approach ... In this paper, we first present the common PSOLA framework (Moulines 1990, pp. 453–454)
^ 板橋秀一 (Shuichi Itahashi) (2005), 音声工学 [Speech Engineering], 森北出版 (Morikita Publishing), p. 169, ISBN 9784627828117
^ The PSOLA synthesis scheme involves the three following steps: an analysis of the original speech waveform ... modifications brought to this intermediate representation ... the synthesis of the modified signal from the modified intermediate representation (Moulines 1990, p. 454)
^ consists of a sequence of short-term signals $x_{m}(n)$ (Moulines 1990, p. 454)
^ at a pitch-synchronous rate on the voiced portions of the signal and at a constant rate on the unvoiced portions. (Moulines 1990, pp. 454–455)
^ R. Kortekaas; A. Kohlrausch (1997), "Psychoacoustical Evaluation of the Pitch-Synchronous Overlap-and-Add Speech-Waveform Manipulation Technique Using Single-Formant Stimuli", Journal of the Acoustical Society of America (JASA) 101 (4): 2202–2213
^ The modifications of the speech signal are performed either in the frequency domain (FD-PSOLA) ... or directly in the time domain (TD-PSOLA) (Moulines 1990, p. 453)
^ a family of methods for modifying the prosody of natural speech ... are used to improve the voice quality of text-to-speech systems based on the concatenation of elementary speech units, (Moulines 1990, p. 453)

Moulines, Eric (1990). "Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones". Speech Communication. 9 (5–6): 453–467. doi:10.1016/0167-6393(90)90021-Z.
Moulines, Eric; Laroche, Jean (February 1995). "Non-parametric techniques for pitch-scale and time-scale modification of speech". Speech Communication. 16 (2). doi:10.1016/0167-6393(94)00054-E.