Sprint 3 Retro: Spotify API Client

Goal: Build a complete custom Spotify Web API client. Fetch currently playing track data, send all playback commands, parse JSON into clean structs.

Verdict: The ESP32 knows what you’re listening to. It knows what’s coming next. It knows if you paused. It has buttons for play, pause, skip, volume, shuffle, repeat. It just… can’t press them yet. Charli xcx sounded great through the serial monitor though ╰(°▽°)╯

What We Built

A full Spotify Web API client from scratch. No third-party Spotify libraries. Just HTTPClient, ArduinoJson, and stubbornness.

The Data Layer

SpotifyTrack struct with 13 fields: name, artist (comma-joined for multi-artist tracks), album, trackId, album art URL, duration, progress, play state, shuffle, repeat, volume, device name
SpotifyAudioFeatures struct for tempo, key, mode, energy, danceability, valence (implemented, blocked by Spotify policy, more on that below)
SpotifyQueue struct capped at 5 tracks because this is an ESP32, not a server rack
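
For a sense of what the track struct and the comma-joining might look like, here is a minimal sketch. The field names, the types, and the joinArtists() helper are assumptions for illustration (the real module may well use fixed char buffers or Arduino String instead of std::string), and it only covers the fields listed above.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical layout: names and types are guesses based on the field list,
// not the project's actual definition.
struct SpotifyTrack {
    std::string name;
    std::string artist;        // comma-joined for multi-artist tracks
    std::string album;
    std::string trackId;
    std::string albumArtUrl;
    uint32_t durationMs = 0;
    uint32_t progressMs = 0;
    bool isPlaying = false;
    bool shuffle = false;
    std::string repeat;        // Spotify reports "off", "context", or "track"
    uint8_t volumePercent = 0;
    std::string deviceName;
};

// Joins artist names with ", " (the exact separator is an assumption).
std::string joinArtists(const std::vector<std::string>& names) {
    std::string out;
    for (std::size_t i = 0; i < names.size(); ++i) {
        if (i > 0) out += ", ";   // separator goes between names, never at the ends
        out += names[i];
    }
    return out;
}
```

Feeding it the three artists from "Rain" yields a single string ready to drop into SpotifyTrack::artist.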

The Polling Engine

spotifyPoll() runs every 3 seconds (configurable 1-5s), millis()-gated, non-blocking
Track change detection via trackId comparison
Correctly distinguishes “playing” vs “paused” (both return HTTP 200, because Spotify)
Skips polls when auth has failed or a rate limit is active
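
The gating and change detection above can be sketched roughly like this. PollState, pollDue(), and trackChanged() are hypothetical names, and the clock is passed in as a parameter (instead of calling millis() directly) so the logic can be exercised off-device.

```cpp
#include <cstdint>
#include <string>

// Hypothetical poll bookkeeping; the real spotifyPoll() internals may differ.
struct PollState {
    uint32_t lastPollMs = 0;
    uint32_t intervalMs = 3000;   // 3 s default from the retro
    std::string lastTrackId;
};

// True once intervalMs has elapsed since the last poll. Unsigned subtraction
// keeps the comparison correct when a 32-bit millisecond counter wraps.
bool pollDue(const PollState& s, uint32_t nowMs) {
    return static_cast<uint32_t>(nowMs - s.lastPollMs) >= s.intervalMs;
}

// Returns true (and remembers the new ID) only when the track ID differs
// from the one seen on the previous poll.
bool trackChanged(PollState& s, const std::string& currentId) {
    if (currentId == s.lastTrackId) return false;
    s.lastTrackId = currentId;
    return true;
}
```

On the device, loop() would call pollDue(state, millis()), and only hit the network (then update lastPollMs) when it returns true.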

The Command Arsenal

Seven playback commands, all implemented, all compile-tested, none triggerable yet:

spotifyPlay() / spotifyPause() / spotifyTogglePlayback()
spotifyNext() / spotifyPrevious()
spotifySetVolume(uint8_t percent) with 0-100 clamp
spotifySetShuffle(bool) / spotifySetRepeat(const char*) / spotifyCycleRepeat()

They’re all dressed up with nowhere to go until the encoder gets wired in Sprint 5.
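
Two of these are simple enough to sketch without any network code: the volume clamp and the repeat cycle. The helper names below are hypothetical, and the cycle order (off, then context, then track, then back to off) is an assumption; only the three state strings themselves come from Spotify's API.

```cpp
#include <cstdint>
#include <cstring>

// Clamp a requested volume to Spotify's 0-100 range. The lower bound is
// already guaranteed by the unsigned type.
uint8_t clampVolume(uint8_t percent) {
    return percent > 100 ? 100 : percent;
}

// Assumed cycle order: off -> context -> track -> off.
// Anything unrecognized falls back to "off".
const char* nextRepeatState(const char* current) {
    if (std::strcmp(current, "off") == 0) return "context";
    if (std::strcmp(current, "context") == 0) return "track";
    return "off";
}
```

A spotifyCycleRepeat()-style function could then pass the result of nextRepeatState() straight to spotifySetRepeat().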

Rate Limit Handling

Two-layer defense against getting banned:

Network layer collects Retry-After headers from 429 responses via collectHeaders()
Spotify layer converts that into a millis()-based deadline and silently skips all API calls until it clears

We never actually got rate limited during development. 3-second polling with a single user is apparently very polite.
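
Since the backoff never fired for real, here is a minimal sketch of how the Spotify-layer half could work. RateLimit, startBackoff(), and isRateLimited() are hypothetical names. This version stores a start time plus a wait duration instead of an absolute deadline, a small deviation chosen so the check stays correct across counter wraparound.

```cpp
#include <cstdint>

// Hypothetical backoff state for the Spotify layer.
struct RateLimit {
    bool active = false;
    uint32_t startMs = 0;   // when the 429 was seen
    uint32_t waitMs = 0;    // Retry-After, converted to milliseconds
};

// Called after a 429, with the Retry-After value (in seconds) that the
// network layer collected.
void startBackoff(RateLimit& rl, uint32_t retryAfterSeconds, uint32_t nowMs) {
    rl.active = true;
    rl.startMs = nowMs;
    rl.waitMs = retryAfterSeconds * 1000u;
}

// True while API calls should be skipped. Clears itself once the wait is over.
bool isRateLimited(RateLimit& rl, uint32_t nowMs) {
    if (!rl.active) return false;
    if (static_cast<uint32_t>(nowMs - rl.startMs) >= rl.waitMs) {
        rl.active = false;
        return false;
    }
    return true;
}
```

Every API entry point would check isRateLimited(rl, millis()) first and return early while it is true.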

The Debugging Stories

The Revenge of the Shared Socket (ノಠ益ಠ)ノ

Remember the Sprint 2 “Polite Disconnect” story? Where we fixed TLS hangs between different hosts by calling secureClient.stop() after http.end()? That fix worked perfectly… for requests fired back-to-back in setup().

The moment we moved to polling from loop() with a 3-second gap between requests, the hang came back. Same symptom: prints the URL, hangs forever at TLS handshake. The “polite disconnect” wasn’t polite enough when there was idle time involved.

Deep dive results: A research agent crawled through ESP32 Arduino source, checked every major ESP32 Spotify project on GitHub (witnessmenow, FinianLandes, kaustubhdoval), and traced the actual failure chain:

secureClient.stop() alone is a “quick close” that skips draining remaining data
When there’s a time gap, stale TLS session data lingers in the socket buffer
The next http.begin() finds a socket that thinks it’s clean but isn’t

The actual fix: Every HTTPS function now follows this pattern:

secureClient.stop();           // Clean slate BEFORE request
http.setReuse(false);          // Forces Connection: close
// ... do the request ...
http.end();                    // HTTPClient drains + closes properly

setReuse(false) is the key. It tells HTTPClient to send Connection: close, which means http.end() does a full, proper teardown including draining any remaining response data. Without it, stop() alone leaves ghost TLS state behind.

The cosmetic casualty: (-76) UNKNOWN ERROR CODE (004C) now appears on every request. That’s MBEDTLS_ERR_NET_RECV_FAILED, triggered when HTTPClient’s internal teardown does a zero-byte read probe after the server has already started closing. It’s the TLS equivalent of knocking on a door that’s already closing. Harmless, cosmetic, and literally unfixable without setting CORE_DEBUG_LEVEL=0 because Arduino’s log_e() bypasses ESP-IDF’s log level system entirely.

Every ESP32 Spotify project we found has this same error in their serial output. It’s the club membership sticker. Welcome to the club.

Sprint 2’s retro said secureClient.stop() was “the polite way” and setReuse(false) was “the rude way.” Turns out the rude way is the correct way. Manners don’t survive idle time in TLS. We updated our memory accordingly (⌐■_■)

Spotify’s 200 That Means “No” ◉_◉

Sprint 2 flagged this: Spotify returns 200 (not 204) when a device is “active” but nothing is playing. Sprint 3 confirmed it’s even more nuanced. When you pause a track:

HTTP status: 200
is_playing: false
Full track data still present
All device info still there

204 only shows up when literally no device is active anywhere. So “paused” and “nothing playing” are not the same thing, and there are actually three different states: playing (200, is_playing: true), paused (200, is_playing: false), gone (204). We handle all three.
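
That mapping is small enough to write down directly. The enum and function names here are hypothetical, and the Error bucket for any other status code is an assumption about how the remaining cases get handled.

```cpp
// Hypothetical classification of a now-playing response.
enum class PlaybackState { Playing, Paused, NoActiveDevice, Error };

// httpStatus is the response code; isPlaying is the parsed is_playing field
// (only meaningful when the status is 200).
PlaybackState classifyPlayback(int httpStatus, bool isPlaying) {
    if (httpStatus == 204) return PlaybackState::NoActiveDevice;
    if (httpStatus == 200) {
        return isPlaying ? PlaybackState::Playing : PlaybackState::Paused;
    }
    return PlaybackState::Error;   // 401, 429, 5xx, and friends
}
```

The key point is that the status code alone cannot separate playing from paused. The body has to be parsed before the state is known.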

The 403 That Killed Audio Features ╥﹏╥

Implemented spotifyGetAudioFeatures(). Clean code, correct URL, proper auth. First real request: 403 Forbidden.

Spotify restricted the /v1/audio-features and /v1/audio-analysis endpoints in late 2024. Apps need “extended quota mode” approval to access them. Our app doesn’t have it. We tested both endpoints just to be sure. Both 403.

The code is correct and ready. Spotify’s policy isn’t. We’ll revisit when/if we get extended quota, or find an alternative approach. The struct, the parser, the cache — all there, waiting for a 200 that may never come.

Architecture Decisions

Authenticated Request Wrappers

All Spotify API calls flow through three private helpers: authGet(), authPost(), authPut(). Each calls prepareAuth() which:

Checks rate limit deadline
Checks if auth has permanently failed
Triggers token refresh if needed
Builds the Bearer header

No API function ever touches tokens directly. Token lifecycle is completely invisible to the rest of the module. Change the auth mechanism tomorrow and the API functions don’t know or care.
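
A rough sketch of that gate order, assuming a simplified AuthContext and a stand-in refreshToken() (both hypothetical; the real refresh talks to Spotify's token endpoint). Marking auth as permanently failed when a refresh fails is also an assumption.

```cpp
#include <string>

// Hypothetical auth state; the real module presumably tracks expiry times
// and deadlines instead of plain flags.
struct AuthContext {
    bool rateLimited = false;
    bool authFailed = false;
    bool tokenExpired = false;
    std::string accessToken;
};

// Stand-in for the real token refresh. Always "succeeds" in this sketch.
bool refreshToken(AuthContext& ctx) {
    ctx.accessToken = "refreshed-token";
    ctx.tokenExpired = false;
    return true;
}

// Runs the four checks in order. On success, headerOut holds the value for
// the Authorization header.
bool prepareAuth(AuthContext& ctx, std::string& headerOut) {
    if (ctx.rateLimited) return false;          // 1. rate limit deadline
    if (ctx.authFailed) return false;           // 2. permanent auth failure
    if (ctx.tokenExpired && !refreshToken(ctx)) {
        ctx.authFailed = true;                  // 3. refresh if needed
        return false;
    }
    headerOut = "Bearer " + ctx.accessToken;    // 4. build the Bearer header
    return true;
}
```

authGet(), authPost(), and authPut() would each call this first and bail out without touching the network if it returns false.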

Per-Track Caching for Audio Features

Even though the endpoint is 403’d right now, the caching pattern is worth noting: cachedFeaturesTrackId stores the last fetched track ID. If you ask for features for the same track twice, the second call returns instantly from cache. New track ID invalidates automatically. One track cached at a time, because ESP32 RAM is precious.
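
The single-entry cache could look something like this. Only cachedFeaturesTrackId and the struct's field list come from the retro; FeaturesCache, fetchFeaturesFromApi(), and the call counter are hypothetical stand-ins so the pattern can run without a network.

```cpp
#include <string>

struct SpotifyAudioFeatures {
    float tempo = 0;
    int key = 0;
    int mode = 0;
    float energy = 0;
    float danceability = 0;
    float valence = 0;
};

// Hypothetical container for the one cached entry.
struct FeaturesCache {
    std::string cachedFeaturesTrackId;
    SpotifyAudioFeatures features;
};

int g_fetchCalls = 0;   // counts simulated network fetches

// Stand-in for the real HTTPS request; the tempo is a placeholder value.
bool fetchFeaturesFromApi(const std::string& trackId, SpotifyAudioFeatures& out) {
    (void)trackId;
    ++g_fetchCalls;
    out.tempo = 120.0f;
    return true;
}

// Serves repeat requests for the same track from cache. A different track ID
// triggers a fetch, which replaces the cached entry only on success.
bool getAudioFeatures(FeaturesCache& cache, const std::string& trackId,
                      SpotifyAudioFeatures& out) {
    if (!trackId.empty() && trackId == cache.cachedFeaturesTrackId) {
        out = cache.features;
        return true;
    }
    SpotifyAudioFeatures fresh;
    if (!fetchFeaturesFromApi(trackId, fresh)) return false;
    cache.features = fresh;
    cache.cachedFeaturesTrackId = trackId;
    out = fresh;
    return true;
}
```

Fetching into a temporary first means a failed request never clobbers the entry that is already cached.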

Queue Capped at 5

Spotify’s queue endpoint returns the entire queue. On ESP32 with ~300KB RAM, parsing a 44KB JSON response is already pushing it. We take the first 5 tracks and throw away the rest. Nobody’s scrolling through 50 tracks on a 240x135 screen anyway.
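
The cap itself is just a fixed-size array and an early exit. A minimal sketch, assuming a hypothetical QueueItem with two fields and a fillQueue() helper fed by whatever the JSON parser produced:

```cpp
#include <cstddef>
#include <string>
#include <vector>

constexpr std::size_t kMaxQueueTracks = 5;

// Hypothetical per-track entry; the real one likely carries more fields.
struct QueueItem {
    std::string name;
    std::string artist;
};

struct SpotifyQueue {
    QueueItem tracks[kMaxQueueTracks];
    std::size_t count = 0;
};

// Copies at most kMaxQueueTracks entries and ignores everything after that.
void fillQueue(SpotifyQueue& q, const std::vector<QueueItem>& parsed) {
    q.count = 0;
    for (const QueueItem& item : parsed) {
        if (q.count == kMaxQueueTracks) break;
        q.tracks[q.count++] = item;
    }
}
```

Fixed storage means the queue's footprint is known at compile time, no matter how long the listener's actual queue is.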

Things That Just Worked

Multi-artist parsing. “Rain” by Aitch, AJ Tracey, Tay Keith. Three artists, comma-joined on the first try. The artists[] array iteration with comma separator just worked.
Queue parsing. 44KB JSON response, 5 tracks extracted, zero issues. ArduinoJson handled it without breaking a sweat despite the response being nearly half our available heap.
Track change detection. Switch songs on your phone, see [SPOTIFY] Track changed: "Rain" appear on serial within 3 seconds. Simple trackId comparison, no false positives.
Play state detection. [SPOTIFY] Paused: "Ooh La La" by AntsLive vs [SPOTIFY] Playing: "Always Everywhere" by Charli xcx. One ternary operator, zero ambiguity.
The POST overloads. Skip next/previous use POST (not PUT like other commands). Added an auth-capable httpsPost overload and authPost wrapper, plugged right in. The network layer’s function overload pattern keeps growing gracefully.

By the Numbers

Metric	Value
Flash usage	81.4% (1,067 KB / 1.3 MB)
RAM usage	14.8% (48.6 KB / 327 KB)
Flash increase from Sprint 2	+0.6% (full API client is surprisingly lean)
API endpoints implemented	9 (now-playing, play, pause, next, prev, volume, shuffle, repeat, queue)
API endpoints blocked by Spotify	2 (audio-features, audio-analysis)
Polling interval	3 seconds (configurable 1-5s)
Queue response size	44 KB (we keep 5 tracks)
Track response size	~2.6-3.7 KB
Playback commands testable	0 of 7 (no trigger yet, all implemented)
TLS error messages per request	1 (cosmetic, unfixable, accepted)
Charli xcx tracks during testing	many

What’s Next

Sprint 4: LVGL Integration. The ESP32 has data. It has commands. Now it needs a face. LVGL comes in to replace the raw TFT_eSPI clock display with a proper UI framework. Layouts, widgets, animations, encoder input handling. The 240x135 screen is about to become a lot more interesting.

Sprint 3 complete. The ESP32 can hear what you’re listening to, see what’s coming next, and has seven remote control buttons that are all wired up to nothing. The brain is ready. Now it needs hands. ᕙ(⇀‸↼‶)ᕗ