The Audio Features
AutoQ's mood engine maps every track onto a two-dimensional emotional space using arousal (energy/intensity) and valence (positivity/pleasantness). These two dimensions come from Russell's circumplex model of affect — the same framework used in music psychology research.
Extracting Mood Data with Truedat — Music Mood Extractor
Truedat is the companion tool that runs Essentia audio analysis on your library and produces mbxmoods.json — the raw feature data that powers everything on this page. Without it, AutoQ falls back to genre/BPM metadata estimates (lower confidence). With it, you get full 14-feature mood estimation for every analyzed track.
What You Need
| Component | Purpose | Required? |
|---|---|---|
| truedat.exe | Mood extraction orchestrator — reads your library, runs Essentia on each track, writes mbxmoods.json | Required |
| essentia_streaming_extractor_music.exe | Essentia audio feature extractor — the engine that analyzes waveforms. Ships with Truedat in dist/truedat/ | Required |
| iTunes Music Library.xml | Library index with file paths. MusicBee can export this — see Step 1 below | Required |
| ffmpeg.exe | Multi-channel audio downmixing. Only needed if your library has surround-sound files | Optional |
Setup Steps
1. **Enable iTunes XML Export in MusicBee**

   Go to Edit → Preferences → Library and check "iTunes Music Library.xml". MusicBee writes this file to your library folder and updates it automatically when your library changes. Note the path — you'll pass it to Truedat in Step 3. It's usually something like `C:\Users\You\Music\MusicBee\iTunes Music Library.xml`.

2. **Download Truedat**

   Grab the latest release from github.com/halrad-com/Truedat/dist/truedat. The folder contains `truedat.exe` and the bundled Essentia executables. Extract it anywhere — no installation needed.

3. **Run Truedat**

   Open a terminal in the Truedat folder and point it at your library XML:

   ```
   truedat.exe "C:\Users\You\Music\MusicBee\iTunes Music Library.xml"
   ```

   Truedat processes every track through Essentia's audio feature extractor. This is CPU-intensive — each track takes a few seconds. By default it uses all CPU cores (pass `-p N` to limit). A 10,000-track library takes roughly 8–12 hours; 50,000+ tracks may take multiple days.

   It's incremental — you can stop and restart at any time. Truedat skips tracks that haven't changed since the last run. Progress is saved every 25 tracks.

4. **Place mbxmoods.json**

   Truedat writes `mbxmoods.json` next to the iTunes XML file. MBXHub looks for it in two locations:

   - Your MusicBee Library folder (same directory as the XML) — preferred
   - `%APPDATA%\MusicBee\MBXHub\` — alternative location

   If the file is already in your library folder (the default), no move is needed.

5. **Configure the Mood Tag Field (optional but recommended)**

   In MusicBee, go to Edit → Preferences → Tags (1) → Custom Tags and set one custom tag (e.g. Custom1) to "AutoQ Mood". This lets MBXHub write mood labels (like "Upbeat + Energetic") directly into your MusicBee tags, visible in the library view. Make sure the tag name matches `autoQ.moodTagFieldName` in `mbxhub.json` (default: "AutoQ Mood").

6. **Restart MusicBee**

   MBXHub loads `mbxmoods.json` on startup. After restarting, check the MBXHub log or the AutoQ Tuning Console — you should see your track count in the mood cache stats. The dashboard will show mood labels and confidence badges for the current track.
Useful Options
Output Files
| File | Contents |
|---|---|
| mbxmoods.json | Per-track raw features (BPM, mode, loudness, spectral metrics, danceability, dissonance, pitch salience, chord changes, MFCCs), pre-computed valence/arousal, and metadata. This is what MBXHub reads. |
| mbxmoods-errors.csv | Tracks that failed analysis — file path, error reason, file size. Use --retry-errors to reprocess after fixing issues. |
| truedat.log | Full console output (only when --audit is used). Useful for diagnosing extraction failures. |
How It Works Together
The pipeline is:
1. Essentia analyzes each audio file and extracts 14 raw features from the waveform
2. Truedat orchestrates the extraction and stores the raw features in `mbxmoods.json`
3. MBXHub loads the raw features, applies normalization, genre adjustment, and weighted formulas to compute valence/arousal
4. AutoQ uses the V/A coordinates for mood matching, scoring, and tag writing
Crucially, steps 3 and 4 happen at runtime with your current weights. You can retune the estimation mixer, change genre profiles, or adjust confidence thresholds without re-running Essentia. The raw features are permanent — only the interpretation changes.
The Features — Reference
The rest of this page is reference material. Fourteen audio features extracted from the waveform drive the mood mapping. Eight feed the arousal estimate, six feed valence. Some features (like danceability) contribute to both axes. Each captures a different psychoacoustic cue that listeners intuitively associate with energy or pleasantness. Genre-aware weight adjustment adapts how these features combine per genre, and confidence scoring tells you how much to trust each mood estimate.
AutoQ Tuning Console — every weight on this page is a slider
Mood Quadrants
The arousal-valence plane divides into four emotional quadrants. Every mood channel in AutoQ targets a specific point in this space.
| Quadrant | Feel | Typical Genres | Acoustic Cues |
|---|---|---|---|
| High Arousal + High Valence | Energetic, upbeat, euphoric | EDM, pop, funk, disco, power pop | Fast tempo, bright timbre, strong beats, major keys |
| High Arousal + Low Valence | Tense, aggressive, intense | Metal, hard rock, industrial, hardcore punk | Distortion, high energy, dissonance, minor keys |
| Low Arousal + High Valence | Calm, pleasant, serene | Chillhop, acoustic folk, soft jazz, bossa nova | Warm timbre, consonance, smooth textures, gentle dynamics |
| Low Arousal + Low Valence | Sad, subdued, melancholic | Ambient drone, slow blues, lo-fi, funeral doom | Slow tempo, dark timbre, soft dynamics, minor keys |
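The quadrant logic itself is just a comparison against the axis midpoints. A minimal sketch, assuming V/A coordinates normalized to 0–1 with 0.5 as the neutral point (the labels and midpoint here are illustrative, not MBXHub's internal values):

```python
def quadrant(valence: float, arousal: float, mid: float = 0.5) -> str:
    """Map a (valence, arousal) coordinate to one of the four quadrants.

    Assumes both axes are normalized to 0-1 with `mid` as the neutral
    point; the actual AutoQ scale and midpoint may differ.
    """
    if arousal >= mid:
        return "Energetic/Upbeat" if valence >= mid else "Tense/Aggressive"
    return "Calm/Serene" if valence >= mid else "Sad/Melancholic"

# A fast, bright, major-key track lands in the top-right quadrant.
print(quadrant(valence=0.8, arousal=0.9))  # Energetic/Upbeat
```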
Arousal Features (8 inputs)
These describe how energetic, intense, or activated the audio feels. High values push tracks toward the top of the mood space. Arousal features measure physical intensity — they tell you how much energy the sound carries, not whether the emotion is positive or negative.
1. Tempo (BPM)
The speed of the beat, measured in beats per minute. The most direct signal of musical energy.
Extracted using beat-tracking algorithms. Absolute range: bpmMin/bpmMax (80/170). Weight: arousalWeightBpm (0.18).
2. Loudness
Perceptual intensity accounting for human hearing sensitivity. Weighted by frequency response curves (EBU R128 integrated loudness) so it matches what listeners actually perceive.
Absolute range: loudnessMin/loudnessMax (-25/-5 dB). Weight: arousalWeightLoudness (0.13).
3. Spectral Flux
How much the frequency spectrum changes from one frame to the next. Tracks rapid timbral shifts — onsets, transients, dynamic variation.
Frame-by-frame spectral difference. Absolute range: 0 to fluxMax (0.15). Weight: arousalWeightFlux (0.13).
4. Spectral Centroid
The "center of mass" of the frequency spectrum — the brightness of the sound. Higher centroid means more high-frequency energy. Centroid tracks intensity and excitement, not emotional positivity — a bright, aggressive metal track and a bright, happy pop track both have high centroids.
Computed from the magnitude spectrum. Absolute range: centroidMin/centroidMax (400/2500 Hz). Weight: arousalWeightCentroid (0.14).
5. Danceability
A composite measure of rhythmic regularity and beat strength. Captures how naturally the music invites physical movement.
Computed from tempo stability and beat histogram. Absolute range: 0 to danceMax (2.0). Arousal weight: arousalWeightDance (0.08). Also contributes to valence (0.10).
6. Onset Rate
The density of detected note or beat onsets per second. Measures how "busy" or "active" the music feels.
Derived from transient detection. Absolute range: 0 to onsetRateMax (6.0 events/sec). Weight: arousalWeightOnsetRate (0.13).
7. Zero-Crossing Rate (ZCR)
How often the waveform crosses zero amplitude per unit time. Distinguishes noisy, percussive sounds from clean, tonal ones.
Counted directly from the time-domain waveform. Absolute range: 0 to zcrMax (0.15). Weight: arousalWeightZcr (0.08).
8. RMS Energy
Root Mean Square of the amplitude over time. Tracks the raw dynamic intensity of the waveform — how hard the signal is hitting.
Computed directly from the waveform amplitude. Related to loudness but without psychoacoustic weighting. Absolute range: 0 to rmsMax (0.01). Weight: arousalWeightRms (0.13).
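As a rough sketch, here is how the eight arousal weights listed above combine once each feature has been normalized to 0–1. This is illustrative only; the real pipeline also applies percentile/absolute normalization and genre adjustment before this step:

```python
# Default arousal weights as listed on this page (they sum to 1.0).
AROUSAL_WEIGHTS = {
    "bpm": 0.18, "loudness": 0.13, "flux": 0.13, "centroid": 0.14,
    "danceability": 0.08, "onset_rate": 0.13, "zcr": 0.08, "rms": 0.13,
}

def arousal(features, weights=AROUSAL_WEIGHTS):
    """Weighted sum of normalized (0-1) features -> arousal in 0-1."""
    return sum(weights[name] * features[name] for name in weights)

# A fast, loud, busy track lands high on the arousal axis.
track = {"bpm": 0.9, "loudness": 0.8, "flux": 0.7, "centroid": 0.8,
         "danceability": 0.6, "onset_rate": 0.9, "zcr": 0.5, "rms": 0.8}
print(round(arousal(track), 3))  # 0.778
```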
Valence Features (6 active inputs)
These describe how pleasant, consonant, or emotionally positive the audio feels. High values push tracks toward the right side of the mood space. Valence features measure harmonic and tonal qualities — they tell you whether the emotion leans happy or sad, resolved or tense.
9. Mode (Major/Minor)
The tonal center and mode of the piece. The strongest single predictor of perceived musical positivity.
Detected via harmonic pitch class profiles. Score: modeScoreMajor (0.8) vs. modeScoreMinor (0.4). Weight: valenceWeightMode (0.30). Scores are softened from 1.0/0.0 to reduce the major/minor cliff.
10. Dissonance
Quantifies roughness — the perceptual beating between nearby frequencies. Based on models of sensory dissonance (Plomp-Levelt curves). Inverted for valence: low dissonance (consonance) = high valence.
Computed by summing roughness contributions of all frequency pairs. Ranges 0–1; inverted before weighting so consonant = high valence. Weight: valenceWeightDissonance (0.25).
11. Pitch Salience
How clearly a dominant pitch emerges from the signal. Measures harmonic clarity — whether the sound has a strong, recognizable tonal center or is diffuse and noisy.
Computed from autocorrelation of the spectrum. Ranges 0–1. Weight: valenceWeightPitchSalience (0.15).
12. Chord Changes Rate
How frequently the harmonic content shifts between chords. Captures harmonic movement — static harmony feels different from rapid chord progressions.
Extracted from chroma features (pitch class energy over time). Absolute range: 0 to chordsRateMax (0.2). Weight: valenceWeightChords (0.10).
13. MFCC (Mel-Frequency Cepstral Coefficient 2)
The second cepstral coefficient captures the broad spectral slope — the balance between low and high frequency energy. This is a timbral fingerprint that distinguishes warm, rounded sounds from bright, harsh ones.
Extracted from mel-scaled spectral analysis. MBXHub uses coefficient #2 (spectral slope). Absolute range: mfccMin/mfccMax (50/250). Weight: valenceWeightMfcc (0.10).
14. Danceability (shared)
Danceability contributes to both axes. For valence, rhythmic regularity and beat strength are associated with positive, accessible music — songs that make you want to move tend to feel upbeat.
Same danceability measure as arousal feature #5. Valence weight: valenceWeightDance (0.10). Arousal weight: arousalWeightDance (0.08).
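The valence side works the same way, with two quirks noted above: dissonance is inverted before weighting (consonance is positive), and mode arrives pre-scored (0.8 major / 0.4 minor). A simplified sketch with the default weights from this page:

```python
# Default valence weights as listed on this page (they sum to 1.0).
VALENCE_WEIGHTS = {
    "mode": 0.30, "dissonance": 0.25, "pitch_salience": 0.15,
    "chords_rate": 0.10, "mfcc": 0.10, "danceability": 0.10,
}

def valence(features):
    """Weighted sum of normalized (0-1) features -> valence in 0-1.

    Dissonance is inverted first: low dissonance (consonance) should
    push valence up. Illustrative sketch only.
    """
    f = dict(features)
    f["dissonance"] = 1.0 - f["dissonance"]  # invert: consonant = positive
    return sum(VALENCE_WEIGHTS[k] * f[k] for k in VALENCE_WEIGHTS)

# Major-key (score 0.8), consonant, tonally clear track:
track = {"mode": 0.8, "dissonance": 0.2, "pitch_salience": 0.7,
         "chords_rate": 0.5, "mfcc": 0.6, "danceability": 0.7}
print(round(valence(track), 3))  # 0.725
```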
Normalization
Raw feature values span very different scales (BPM in the hundreds, RMS in the thousandths). Before weighting, every feature is normalized to 0–1. MBXHub offers two normalization modes:
Percentile Normalization (default)
Enabled by usePercentileNormalization: true (the default). Each feature is ranked against every track in your library, and the rank is converted to a 0–1 percentile. The track with the lowest BPM in your library gets 0.0, the highest gets 1.0, and everything else is spread proportionally.
This is library-adaptive: a rock-heavy library where every track has similar loudness will still get full spread on that axis because the ranking is relative. No manual range tuning needed. Falls back to absolute normalization for libraries with fewer than 10 analyzed tracks.
Absolute Normalization
When percentile mode is off, features are clamped to fixed min/max ranges calibrated from a diverse 33K-track library. These are the centroidMin/Max, loudnessMin/Max, etc. settings. This works well for diverse libraries but can compress the range for genre-focused collections where all values land mid-range.
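Both modes can be sketched in a few lines. This is a simplified illustration; the real implementation also has to handle ties and the fewer-than-10-tracks fallback described above:

```python
def normalize_absolute(x, lo, hi):
    """Clamp to a fixed [lo, hi] range and scale to 0-1."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def normalize_percentile(x, library_values):
    """Rank x against every value in the library -> 0-1 percentile.

    The lowest value in the library maps to 0.0, the highest to 1.0,
    and everything else is spread proportionally by rank.
    """
    below = sum(1 for v in library_values if v < x)
    return below / max(len(library_values) - 1, 1)

# Absolute: 125 BPM against the default 80-170 range.
print(normalize_absolute(125, 80, 170))                      # 0.5
# Percentile: the same BPM ranked against a (tiny) library.
print(normalize_percentile(125, [90, 100, 125, 140, 160]))   # 0.5
```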
Genre-Aware Weight Adjustment
Fixed weights treat every genre the same, but acoustic features mean different things in different musical contexts. A 120 BPM jazz track is "fast for jazz" but "mellow for EDM." A loud metal track has very different emotional intent from a loud pop track. Genre-aware adjustment solves this by applying per-genre multipliers to the base weights before combining features.
When enabled (useGenreAdjustment: true, the default), MBXHub reads each track's genre tag from MusicBee and looks up a matching genre profile. The profile contains 14 multipliers — one per feature. Each multiplier scales the base weight for that feature, then the result is renormalized so the total weight sum stays the same.
This means the relative importance of features shifts per genre without changing the overall magnitude of the score. Pop and rock use default weights (all multipliers = 1.0) as the reference genre.
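The multiply-then-renormalize step can be sketched like this (shown with three weights for brevity; the real engine does the same over all 14):

```python
def adjust_weights(base, multipliers):
    """Scale each base weight by its genre multiplier, then renormalize
    so the total weight sum is unchanged: relative importance shifts,
    overall magnitude does not."""
    scaled = {k: w * multipliers.get(k, 1.0) for k, w in base.items()}
    factor = sum(base.values()) / sum(scaled.values())
    return {k: w * factor for k, w in scaled.items()}

base = {"bpm": 0.18, "loudness": 0.13, "flux": 0.13}
# e.g. the Electronic/Dance profile: BPM down, loudness/flux up
adjusted = adjust_weights(base, {"bpm": 0.6, "loudness": 1.4, "flux": 1.3})
print({k: round(v, 3) for k, v in adjusted.items()})
```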
Built-in Genre Profiles
MBXHub ships with 8 genre families covering ~20 genre tags. Each is tuned to the acoustic norms of that genre:
| Genre Family | Tags Matched | Key Adjustments | Rationale |
|---|---|---|---|
| Electronic / Dance | Electronic, Dance, EDM, House, Techno, Trance | BPM ↓0.6, Loudness ↑1.4, Flux ↑1.3 | BPM is uniformly high (120-150) so it carries little information. Loudness and spectral dynamics vary more meaningfully. |
| Metal / Hardcore | Metal, Hardcore, Punk | Loudness ↑1.5, Centroid ↑1.8, Dissonance ↑1.5, Mode ↓0.5 | Brightness and distortion are key energy markers. Mode is less informative — metal is overwhelmingly minor. |
| Jazz / Blues | Jazz, Blues | BPM ↑1.5, Chords ↑1.8, PitchSalience ↑1.4, Loudness ↓0.6 | Harmonic complexity and melodic clarity distinguish moods. Loudness varies less. BPM is more informative (wide range from ballads to bebop). |
| Classical / Ambient | Classical, Ambient | Flux ↑1.5, Mode ↑1.3, Loudness ↓0.5, Dance ↓0.3 | Spectral dynamics and tonality matter most. Danceability and loudness are consistently low and uninformative. |
| Hip-Hop / Rap | Hip-Hop, Rap | OnsetRate ↑1.4, RMS ↑1.3, BPM ↓0.7 | Beat density and energy envelope drive mood. BPM often doesn't reflect perceived energy (half-time patterns are common). |
| Folk / Country | Folk, Country | Mode ↑1.3, Chords ↑1.3, PitchSalience ↑1.2 | Harmonic and tonal features are stronger mood indicators in acoustic, melodic genres. |
| R&B / Soul | R&B, Soul | Mode ↑1.2, Dance ↑1.3, Loudness ↓0.8 | Groove and tonality carry emotional weight. Loudness is relatively consistent. |
| Pop / Rock | Pop, Rock | All 1.0 (reference) | Default weights are calibrated against a general-purpose library. Pop/rock is the baseline. |
Genre matching uses the track's MusicBee genre tag. Exact matches are checked first, then the first word of multi-word genres (e.g., "Progressive Rock" matches "Rock" if there's no "Progressive Rock" profile). Custom profiles in autoQ.genreProfiles override built-in ones for the same genre name.
Custom Genre Profiles
Add your own profiles in mbxhub.json under autoQ.genreProfiles:
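A sketch of the expected shape, inferred from the built-in profile table above. The exact multiplier key names are assumptions, and features omitted from a profile presumably keep a multiplier of 1.0:

```json
{
  "autoQ": {
    "genreProfiles": {
      "Shoegaze": {
        "loudness": 1.3,
        "centroid": 1.4,
        "mode": 0.7
      }
    }
  }
}
```

Per the matching rules above, a custom profile with the same genre name as a built-in one overrides it.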
Confidence Scoring
Not all mood estimates are created equal. A track analyzed by Essentia with features that land squarely on a mood channel is high confidence. A metadata-only fallback for an unknown genre is low confidence. MBXHub computes a confidence score (0–1) for every mood estimate and surfaces it in the dashboard, player, tuning console, and API.
How Confidence Is Computed
Two factors multiply together: a data-source factor (full Essentia analysis vs. metadata fallback) and a channel-proximity factor (how close the track's V/A coordinate lands to a mood channel). An Essentia track right on top of a mood channel gets ~0.9. An Essentia track between channels might get ~0.6. A fallback track with a genre match gets ~0.35–0.45. A fallback track with an unknown genre barely reaches 0.2.
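A sketch of the two-factor product. The constants below are assumptions chosen to roughly reproduce the example scores on this page, not MBXHub's actual values:

```python
def confidence(source_quality, channel_proximity):
    """Confidence = data-source factor x channel-proximity factor."""
    return source_quality * channel_proximity

# Assumed factor values, for illustration only:
ESSENTIA, FALLBACK_GENRE, FALLBACK_UNKNOWN = 0.9, 0.45, 0.25
ON_CHANNEL, BETWEEN_CHANNELS = 1.0, 0.65

print(confidence(ESSENTIA, ON_CHANNEL))        # ~0.9: high confidence
print(confidence(ESSENTIA, BETWEEN_CHANNELS))  # ~0.6: medium
print(confidence(FALLBACK_UNKNOWN, 0.8))       # ~0.2: low
```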
Confidence Labels
High Confidence
Essentia data, features land near a mood channel. Mood label is reliable.
Medium Confidence
Essentia data but V/A is between channels, or fallback with a good genre match.
Low Confidence
Fallback with unknown genre, or V/A far from any channel. Treat with caution.
Confidence Gate
When MBXHub writes mood tags to MusicBee's custom field, it checks confidence against the confidenceMinForTag threshold (default: 0.3). Tracks below this threshold don't get tagged — preventing low-quality mood labels from cluttering your library.
Where Confidence Appears
- Dashboard — Color-coded badge next to the mood label on now-playing
- Player — Percentage badge next to the mood channel name
- AutoQ Tuning Console — Confidence badge in the Now Playing mood panel
- REST API — `confidence` (0–1) and `confidenceLabel` ("high"/"medium"/"low") in `GET /autoq/track-mood`
The Estimation Mixer
Think of the estimation engine as a mixing console. Each of the 14 audio features is a channel with its own fader (weight). Push a fader up and that feature has more influence on the final mood coordinate; pull it down and it fades out. You can retune the mix at any time without re-analyzing your music — MBXHub recomputes valence and arousal from the raw features on every startup using your current weights.
The mixer has two signal paths:
- Essentia path (primary) — 14 waveform-extracted features → normalize (percentile or absolute) → genre-adjust weights → weighted sum → (valence, arousal) + confidence. This is the full-fidelity path for tracks analyzed by Truedat.
- Metadata fallback — genre lookup table → adjust with BPM, rating, and year metadata → (valence, arousal) + confidence. Used automatically for tracks without Essentia data. Coarser, but still places tracks in the right quadrant. Confidence is lower to reflect the reduced precision.
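A sketch of what such a fallback might look like. The seed table, blend factor, and BPM range below are invented for illustration; the real lookup table and adjustment rules (which also use rating and year) are internal to MBXHub:

```python
# Hypothetical genre -> (valence, arousal) seed table, illustration only.
GENRE_SEEDS = {"metal": (0.3, 0.85), "ambient": (0.55, 0.15), "pop": (0.7, 0.65)}

def fallback_mood(genre, bpm=None, default=(0.5, 0.5)):
    """Metadata-only estimate: start from a genre seed, then nudge
    arousal with BPM. Coarse, but places tracks in the right quadrant."""
    valence, arousal = GENRE_SEEDS.get((genre or "").lower(), default)
    if bpm is not None:
        # Blend arousal toward the BPM position in an assumed 80-170 range.
        bpm_norm = min(max((bpm - 80) / 90, 0.0), 1.0)
        arousal = 0.7 * arousal + 0.3 * bpm_norm
    return valence, arousal

v, a = fallback_mood("Metal", bpm=160)
print(round(v, 2), round(a, 2))  # low valence, high arousal
```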
How MBXHub Combines Them
Features are extracted once by Essentia and stored in mbxmoods.json by the Truedat tool. MBXHub's AutoQ engine reads these raw values and applies the weighted sums described above to estimate arousal and valence.
All weights are tunable via mbxhub.json under autoQ.estimation or through the AutoQ tuning page in MBXHub's built-in dashboard. You can change how much each feature contributes without re-running Essentia. Genre profiles shift the relative importance of features per genre, and confidence scores tell you how much to trust each estimate. The full parameter list is available in MBXHub's built-in API docs (/docs) once installed.