Free Tool Arena

How to change audio speed

Time-stretch vs pitch-shift, preserving pitch while changing tempo, 0.5× to 2× sweet spots for podcasts and lectures, and avoiding artifacts.

Updated April 2026 · 6 min read

Playing a podcast at 1.5x saves you a third of your listening time. Speeding up a 2-hour lecture to 2x turns it into a 1-hour review session. But there’s a catch: naive speed changes make voices sound like cartoon chipmunks, because pitch rises with speed. Doing it right requires time-stretching (changing speed while preserving pitch), which relies on signal-processing algorithms such as WSOLA and the phase vocoder. This guide covers what actually happens when you move the speed slider, why modern audio apps preserve pitch automatically, the algorithms behind the effect and their tradeoffs, speed limits for intelligibility, and the difference between speech use cases (podcasts at 1.5–2x) and music use cases (subtle, pitch-safe tempo adjustments).

Naive resampling vs time-stretching

Naive resampling speeds up audio by playing the same samples at a higher rate. Played at 2x, a 48kHz recording is consumed as if it had been sampled at 96kHz, so every frequency in it doubles and the pitch rises by one octave. This is what happens when you spin a vinyl record faster.
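
The octave figure follows from the link between playback rate and pitch: a speed factor s shifts pitch by 12 × log2(s) semitones. A minimal sketch of that arithmetic, assuming awk is available for the floating-point math (the helper name pitch_shift_semitones is made up for this example):

```shell
# Hypothetical helper: pitch shift, in semitones, caused by naive resampling
# at a given speed factor. 12 semitones = 1 octave.
pitch_shift_semitones() {
  awk -v s="$1" 'BEGIN { printf "%.2f\n", 12 * log(s) / log(2) }'
}

pitch_shift_semitones 2     # one octave up
pitch_shift_semitones 1.5   # roughly a perfect fifth up
pitch_shift_semitones 0.5   # one octave down
```

The three calls print 12.00, 7.02, and -12.00, which is why even a modest 1.5x naive speedup already sounds far from the original voice.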

Time-stretching changes the duration without changing pitch. The signal is broken into short overlapping windows; the algorithm either repeats windows (for slowdown) or skips some (for speedup) while maintaining phase continuity at the boundaries. Pitch stays the same.

# FFmpeg: time-stretch to 1.5x speed, preserve pitch
ffmpeg -i input.mp3 -af "atempo=1.5" output.mp3

# 2x speed (older FFmpeg builds limit atempo to 0.5x-2.0x per pass)
ffmpeg -i input.mp3 -af "atempo=2.0" output.mp3

# 3x speed: chain two filters. Newer builds accept values above 2.0,
# but skip samples instead of blending them, so chaining is the safer choice
ffmpeg -i input.mp3 -af "atempo=2.0,atempo=1.5" output.mp3

# Naive resample (pitch rises with speed). Assumes a 48 kHz source:
# replace 48000 with the input's actual sample rate
ffmpeg -i input.mp3 -af "asetrate=48000*1.5,aresample=48000" output.mp3
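
Chaining by hand gets tedious for arbitrary factors. The sketch below shows one way to split a speed factor into atempo stages that each stay within 0.5 to 2.0; the helper name atempo_chain is hypothetical, and it assumes awk is available.

```shell
# Hypothetical helper: print a comma-separated atempo chain whose product
# equals the requested speed factor, with every stage inside 0.5..2.0.
atempo_chain() {
  awk -v s="$1" 'BEGIN {
    if (s <= 0) exit 1                      # reject zero or negative factors
    out = ""
    while (s > 2.0) { out = out "atempo=2.0,"; s /= 2.0 }
    while (s < 0.5) { out = out "atempo=0.5,"; s /= 0.5 }
    printf "%satempo=%s\n", out, s + 0      # remaining factor goes last
  }'
}

# Example (not run here):
#   ffmpeg -i input.mp3 -af "$(atempo_chain 3)" output.mp3
```

For a factor of 3 this prints atempo=2.0,atempo=1.5, the same chain used in the command above.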

WSOLA and SOLA

SOLA (Synchronized Overlap-Add) and WSOLA (Waveform Similarity Overlap-Add) are the classic time-stretching algorithms for speech. They break the signal into overlapping frames of roughly 25ms and overlap-add them back together, shifting each frame slightly so the waveform stays continuous across the joins.

WSOLA improves on SOLA by searching a small window for the best alignment point based on waveform similarity, which reduces phasing artifacts. It’s the algorithm behind most podcast-app speed controls. For speech it’s near-transparent up to 2x; beyond 2.5x, artifacts become audible regardless of algorithm choice.

Phase vocoder

For music, phase vocoders work in the frequency domain — STFT (short-time Fourier transform) breaks the signal into overlapping FFT windows, the algorithm manipulates the magnitude and phase of each frequency bin, and inverse STFT recombines them at the new rate.

Phase vocoders preserve complex harmonic content (chords, overtones) better than WSOLA but smear transients (drums, attacks). Modern refinements such as phase locking and transient detection mitigate this, but extreme speed changes on music still smear. A well-known commercial option is zplane’s élastique Pro; a popular open-source alternative is the Rubber Band Library (rubberband).
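
One way to try a Rubber Band based stretch without leaving FFmpeg is its rubberband filter, which exists only in builds compiled with librubberband. The sketch below assumes such a build; the helper names are hypothetical, and the filter’s tempo option takes the same kind of speed factor as atempo.

```shell
# Succeeds if this ffmpeg build lists the rubberband filter.
has_rubberband() {
  ffmpeg -hide_banner -filters 2>/dev/null | grep -q rubberband
}

# Hypothetical helper: filter string for a pitch-preserving tempo change.
music_tempo_filter() {
  printf 'rubberband=tempo=%s\n' "$1"
}

# Example (not run here): fall back to atempo when rubberband is missing.
#   if has_rubberband; then af=$(music_tempo_filter 1.1); else af="atempo=1.1"; fi
#   ffmpeg -i song.wav -af "$af" song_fast.wav
```

Checking for the filter first keeps the command from failing on stock builds that ship without librubberband.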

PSOLA for voice pitch

PSOLA (Pitch-Synchronous Overlap-Add) is used for pitch shifting, that is, changing pitch without changing speed. It’s the counterpart to WSOLA and a classic technique behind vocal tuning tools. It isn’t used directly for speed changes, but many tools combine pitch shifting and time-stretching to let you adjust pitch and speed independently.
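
The idea of adjusting pitch and speed independently can be sketched with the FFmpeg filters already used in this guide. This is not PSOLA: it shifts pitch by naive resampling and then uses atempo to undo the resulting speed change. The helper name pitch_only_filter is hypothetical, and the source sample rate is assumed to be known.

```shell
# Hypothetical helper: shift pitch by N semitones while keeping duration.
# Step 1 (asetrate + aresample) changes pitch and speed together.
# Step 2 (atempo) cancels the speed change.
# Intended for -12..+12 semitones, which keeps atempo inside 0.5..2.0.
pitch_only_filter() {
  awk -v rate="$1" -v semi="$2" 'BEGIN {
    ratio = exp(log(2) * semi / 12)   # frequency ratio for the shift
    printf "asetrate=%d*%.6f,aresample=%d,atempo=%.6f\n", rate, ratio, rate, 1 / ratio
  }'
}

# Example (not run here): raise a 44.1 kHz track by two semitones.
#   ffmpeg -i input.wav -af "$(pitch_only_filter 44100 2)" output.wav
```

Because the pitch step is plain resampling, quality depends on how well atempo handles the compensating stretch; dedicated pitch shifters generally do better on vocals.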

Listening speeds for podcasts

Speed   Perceived                     Retention
1.0x    Natural pace                  Baseline
1.25x   Slightly brisk                ~98% of 1x
1.5x    Comfortably fast              ~95% of 1x (sweet spot)
1.75x   Fast, needs focus             ~88% for most
2.0x    Quite fast                    ~80%, diminishing returns
2.5x    Comprehension drops rapidly   ~60%, for slow speakers only
3.0x    Usually unusable              <50% except for slow speakers

Speakers with deliberate pacing (Dan Carlin, Joe Rogan guests) tolerate 2x well. Speakers who already talk fast (many tech podcasts) peak at around 1.5x. Interview-heavy shows with a lot of silence can feel natural at an effective 1.75x, because silence removal and speedup stack.

Music speed changes

For music, small speed changes (+/- 5%) are nearly imperceptible to casual listeners and can correct for live-recording tempo drift. Larger changes (+/- 15%) are obvious but can still sound musical with good algorithms. Beyond that, you’re in remix territory.
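
Correcting tempo drift comes down to a ratio between the tempo you have and the tempo you want. A minimal sketch, assuming both tempos are known in BPM (the helper name tempo_ratio is hypothetical):

```shell
# Hypothetical helper: speed factor that moves a track from one tempo to another.
tempo_ratio() {
  awk -v from="$1" -v to="$2" 'BEGIN { printf "%.4f\n", to / from }'
}

tempo_ratio 120 126   # a +5% change
tempo_ratio 98 100    # nudging a slightly slow take up to 100 BPM

# Example (not run here):
#   ffmpeg -i take.wav -af "atempo=$(tempo_ratio 98 100)" take_100bpm.wav
```

The two calls print 1.0500 and 1.0204, both comfortably inside the range where a pitch-preserving stretch is hard to notice.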

Note: DJs often change pitch and tempo together, as with the classic pitch control on a vinyl turntable. In that setting, preserving pitch while changing tempo is the exception: speeding a record up to beat-match it raises its pitch as well, and that coupled change is what listeners expect from the mix.

Slowing down for learning

Slowing audio to 0.75x or 0.5x is useful for transcription, language learning, and tabbing out guitar parts. Time-stretching handles slowdown better than speedup because the algorithm has more source material to work with per output sample. Pitch stays intact and phrasing becomes easier to follow.

0.5x speed is the common floor for learning applications — slower than that and the artifacts (smeared consonants, muddy harmonics) outweigh the clarity gain.

Pitch-speed coupling in creative effects

The chipmunk effect (naive speedup with pitch change) is used deliberately for comedy and lo-fi vibes. The slowed-down-and-reverbed effect (speed and pitch drop together) is used in chopped-and-screwed music and slowed-remix TikToks. Both skip time-stretching and embrace the coupled change.

# Chipmunk effect (speed up with pitch); assumes a 48 kHz source
ffmpeg -i input.mp3 \
  -af "asetrate=48000*1.3,aresample=48000" output.mp3

# Slowed-reverb effect (slow down with pitch drop); assumes a 48 kHz source.
# aecho adds a simple echo as a stand-in for reverb
ffmpeg -i input.mp3 \
  -af "asetrate=48000*0.85,aresample=48000,aecho=0.8:0.9:500:0.3" \
  output.mp3
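
The 48000 in these commands only gives the intended factor when the source really is 48 kHz; many MP3s are 44.1 kHz. One way to avoid hardcoding it is to read the rate with ffprobe first. The helper names below are hypothetical.

```shell
# Print the sample rate (Hz) of the first audio stream in a file.
source_rate() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=sample_rate \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: pitch-coupled speed filter for a given rate and factor.
coupled_filter() {
  printf 'asetrate=%s*%s,aresample=%s\n' "$1" "$2" "$1"
}

# Example (not run here):
#   rate=$(source_rate input.mp3)
#   ffmpeg -i input.mp3 -af "$(coupled_filter "$rate" 1.3)" output.mp3
```

Resampling back to the original rate keeps the output compatible with whatever the source was encoded at.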

Video and audio sync

When speeding up a video, the audio and video must change rate together. Most video apps handle this transparently — set a 1.5x speed and both streams adjust. Behind the scenes the audio is time-stretched (pitch preserved) and the video is decimated or frame-blended.

# Speed up video and audio together, preserve pitch
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]setpts=PTS/1.5[v];[0:a]atempo=1.5[a]" \
  -map "[v]" -map "[a]" output.mp4

# Note: setpts=PTS/1.5 divides the video timestamps by 1.5 (video plays faster),
# atempo=1.5 plays the audio 1.5x faster; both must use the same factor
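
To keep the two factors from drifting apart, the filter graph can be generated from a single speed value. This is a sketch with a hypothetical helper name (av_speed_filter); it divides PTS by the factor rather than multiplying by a rounded reciprocal, and it assumes the factor is within atempo’s 0.5 to 2.0 range.

```shell
# Hypothetical helper: build a filter graph that speeds up video and audio
# by the same factor (valid for factors between 0.5 and 2.0).
av_speed_filter() {
  printf '[0:v]setpts=PTS/%s[v];[0:a]atempo=%s[a]\n' "$1" "$1"
}

# Example (not run here):
#   ffmpeg -i input.mp4 \
#     -filter_complex "$(av_speed_filter 1.5)" \
#     -map "[v]" -map "[a]" output.mp4
```

Using one input for both streams means a typo can change the speed but cannot put audio and video out of sync.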

Quality tips

Start from the highest-quality source you have. Speed changes amplify any existing artifacts — compression ringing, low sample rates, clipping all become more audible. A 320kbps MP3 time-stretched sounds notably better than a 128kbps one.

For professional-quality speech speedup, use tools that implement transient detection — the algorithm protects consonants (plosives, sibilants) from smearing, which is the biggest artifact at higher speeds.

Podcast workflow: combine with silence removal

Silence removal and speedup compound. Removing the 15% of a podcast that’s dead air and then applying 1.5x gives an effective listening speed of roughly 1.75x. Podcast apps offer this kind of silence shortening as a feature, for example Overcast’s “Smart Speed”: a 2-hour show plays in roughly 1 hour 10 minutes without sounding rushed.
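
The compounding is simple arithmetic: the effective speed is the playback factor divided by the fraction of audio that remains after silence removal. A small sketch with hypothetical helper names, assuming awk for the math:

```shell
# effective_speed SILENCE_FRACTION SPEED -> combined speedup factor
effective_speed() {
  awk -v f="$1" -v s="$2" 'BEGIN { printf "%.2f\n", s / (1 - f) }'
}

# listening_minutes TOTAL_MINUTES SILENCE_FRACTION SPEED -> minutes of playback
listening_minutes() {
  awk -v m="$1" -v f="$2" -v s="$3" 'BEGIN { printf "%.0f\n", m * (1 - f) / s }'
}

effective_speed 0.15 1.5         # 15% dead air removed, then 1.5x
listening_minutes 120 0.15 1.5   # a 2-hour show under the same settings
```

These print 1.76 and 68, which lines up with the figures above: about 1.75x effective speed and a little under 1 hour 10 minutes for a 2-hour show.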

Common mistakes

Using naive resampling when you meant time-stretching. Chipmunk voices on a podcast are a bug unless you’re being funny.

Exceeding atempo’s 2.0 per-pass limit. Older FFmpeg builds reject larger values, and newer ones skip samples instead of blending them. Chain filters: atempo=2.0,atempo=1.5 for 3x.

Applying a speed change to a lossy source and re-encoding to a lossy format. This compounds quality loss. Start from the highest-quality source possible.

Ignoring transient artifacts on music. Drums smear at big speed changes. Use tools with transient preservation (rubberband) for music.

Forgetting video timestamp adjustment. When speeding up video, you must adjust both video PTS and audio tempo, not just one.

Assuming slowdown is free. It’s better than speedup but still produces phasing artifacts below 0.5x. Don’t go below 0.5 unless you have to.

Testing only with speech. Algorithms optimized for speech (WSOLA) produce audible artifacts on music. Test on representative content.

Run the numbers

Change playback speed with pitch preservation using the audio speed changer. Pair with the audio silence remover to strip dead air before speedup for maximum effective speed gain, and the audio pitch changer when you need to adjust pitch independently (e.g., transposing a music track without tempo change).
