MBXHub

The Audio Features

AutoQ's mood engine maps every track onto a two-dimensional emotional space using arousal (energy/intensity) and valence (positivity/pleasantness). These two dimensions come from Russell's circumplex model of affect — the same framework used in music psychology research.

Extracting Mood Data with Truedat — Music Mood Extractor

Truedat is the companion tool that runs Essentia audio analysis on your library and produces mbxmoods.json — the raw feature data that powers everything on this page. Without it, AutoQ falls back to genre/BPM metadata estimates (lower confidence). With it, you get full 14-feature mood estimation for every analyzed track.

What You Need

• truedat.exe (Required) — Mood extraction orchestrator: reads your library, runs Essentia on each track, writes mbxmoods.json
• essentia_streaming_extractor_music.exe (Required) — Essentia audio feature extractor: the engine that analyzes waveforms. Ships with Truedat in dist/truedat/
• iTunes Music Library.xml (Required) — Library index with file paths. MusicBee can export this — see Step 1 below
• ffmpeg.exe (Optional) — Multi-channel audio downmixing. Only needed if your library has surround-sound files

Setup Steps

  1. Enable iTunes XML Export in MusicBee

    Go to Edit → Preferences → Library and check "iTunes Music Library.xml". MusicBee writes this file to your library folder and updates it automatically when your library changes.

    Note the path — you'll pass it to Truedat in Step 3. It's usually something like C:\Users\You\Music\MusicBee\iTunes Music Library.xml.

  2. Download Truedat

    Grab the latest release from github.com/halrad-com/Truedat/dist/truedat. The folder contains truedat.exe and the bundled Essentia executables. Extract it anywhere — no installation needed.

  3. Run Truedat

    Open a terminal in the Truedat folder and point it at your library XML:

    truedat.exe "C:\Users\You\Music\MusicBee\iTunes Music Library.xml"

    Truedat processes every track through Essentia's audio feature extractor. This is CPU-intensive — each track takes a few seconds. By default it uses all CPU cores (-p N to limit). A 10,000-track library takes roughly 8–12 hours; 50,000+ tracks may take multiple days.

    It's incremental — you can stop and restart at any time. Truedat skips tracks that haven't changed since the last run. Progress is saved every 25 tracks.

  4. Place mbxmoods.json

    Truedat writes mbxmoods.json next to the iTunes XML file. MBXHub looks for it in two locations:

    • Your MusicBee Library folder (same directory as the XML) — preferred
    • %APPDATA%\MusicBee\MBXHub\ — alternative location

    If the file is already in your library folder (the default), no move is needed.

  5. Configure the Mood Tag Field (optional but recommended)

    In MusicBee, go to Edit → Preferences → Tags (1) → Custom Tags and set one custom tag (e.g. Custom1) to "AutoQ Mood". This lets MBXHub write mood labels (like "Upbeat + Energetic") directly into your MusicBee tags, visible in the library view.

    Make sure the tag name matches autoQ.moodTagFieldName in mbxhub.json (default: "AutoQ Mood").
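    For reference, the relevant mbxhub.json fragment might look like this (a hypothetical minimal example showing only the field named above):

    ```
    // mbxhub.json — hypothetical fragment
    "autoQ": {
      "moodTagFieldName": "AutoQ Mood"
    }
    ```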

  6. Restart MusicBee

    MBXHub loads mbxmoods.json on startup. After restarting, check the MBXHub log or the AutoQ Tuning Console — you should see your track count in the mood cache stats. The dashboard will show mood labels and confidence badges for the current track.

Useful Options

# Limit to 4 CPU cores (default: all cores)
truedat.exe "library.xml" -p 4

# Retry tracks that failed on a previous run
truedat.exe "library.xml" --retry-errors

# Validate file paths without re-analyzing
truedat.exe "library.xml" --fixup

# Check for problematic filenames before analyzing
truedat.exe "library.xml" --check-filenames

# Full audit log for troubleshooting
truedat.exe "library.xml" --audit

Output Files

• mbxmoods.json — Per-track raw features (BPM, mode, loudness, spectral metrics, danceability, dissonance, pitch salience, chord changes, MFCCs), pre-computed valence/arousal, and metadata. This is what MBXHub reads.
• mbxmoods-errors.csv — Tracks that failed analysis: file path, error reason, file size. Use --retry-errors to reprocess after fixing issues.
• truedat.log — Full console output (only written when --audit is used). Useful for diagnosing extraction failures.

How It Works Together

The pipeline is:

  1. Essentia analyzes each audio file and extracts 14 raw features from the waveform
  2. Truedat orchestrates the extraction, stores raw features in mbxmoods.json
  3. MBXHub loads the raw features, applies normalization, genre adjustment, and weighted formulas to compute valence/arousal
  4. AutoQ uses the V/A coordinates for mood matching, scoring, and tag writing

Crucially, steps 3 and 4 happen at runtime with your current weights. You can retune the estimation mixer, change genre profiles, or adjust confidence thresholds without re-running Essentia. The raw features are permanent — only the interpretation changes.


The Features — Reference

The rest of this page is reference material. Fourteen audio features extracted from the waveform drive the mood mapping. Eight feed the arousal estimate, six feed valence. Some features (like danceability) contribute to both axes. Each captures a different psychoacoustic cue that listeners intuitively associate with energy or pleasantness. Genre-aware weight adjustment adapts how these features combine per genre, and confidence scoring tells you how much to trust each mood estimate.

Mood Quadrants

The arousal-valence plane divides into four emotional quadrants. Every mood channel in AutoQ targets a specific point in this space.

High Arousal + High Valence — Energetic, upbeat, euphoric
Typical genres: EDM, pop, funk, disco, power pop
Acoustic cues: fast tempo, bright timbre, strong beats, major keys

High Arousal + Low Valence — Tense, aggressive, intense
Typical genres: Metal, hard rock, industrial, hardcore punk
Acoustic cues: distortion, high energy, dissonance, minor keys

Low Arousal + High Valence — Calm, pleasant, serene
Typical genres: Chillhop, acoustic folk, soft jazz, bossa nova
Acoustic cues: warm timbre, consonance, smooth textures, gentle dynamics

Low Arousal + Low Valence — Sad, subdued, melancholic
Typical genres: Ambient drone, slow blues, lo-fi, funeral doom
Acoustic cues: slow tempo, dark timbre, soft dynamics, minor keys
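As a rough illustration (not the plugin's actual code), the quadrant lookup reduces to two threshold checks on the V/A pair. The 0.5 cutoffs and the label strings are assumptions for this sketch:

```python
def mood_quadrant(arousal: float, valence: float) -> str:
    """Map a normalized (0-1) arousal/valence pair to its quadrant.

    The 0.5 thresholds are illustrative; AutoQ's mood channels target
    specific points in the plane rather than whole quadrants.
    """
    if arousal >= 0.5:
        return "Energetic / Upbeat" if valence >= 0.5 else "Tense / Aggressive"
    return "Calm / Serene" if valence >= 0.5 else "Sad / Melancholic"

print(mood_quadrant(0.9, 0.8))  # → Energetic / Upbeat
```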

Arousal Features (8 inputs)

These describe how energetic, intense, or activated the audio feels. High values push tracks toward the top of the mood space. Arousal features measure physical intensity — they tell you how much energy the sound carries, not whether the emotion is positive or negative.

1. Tempo (BPM)

The speed of the beat, measured in beats per minute. The most direct signal of musical energy.

High: Fast, driving, urgent — EDM, punk, thrash
Low: Slow, spacious, contemplative — ambient, ballads

Extracted using beat-tracking algorithms. Absolute range: bpmMin/bpmMax (80/170). Weight: arousalWeightBpm (0.18).

2. Loudness

Perceptual intensity accounting for human hearing sensitivity. Weighted by frequency response curves (EBU R128 integrated loudness) so it matches what listeners actually perceive.

High: Commanding, powerful, in-your-face — mastered pop, compressed metal
Low: Gentle, intimate, fragile — fingerpicking, field recordings

Absolute range: loudnessMin/loudnessMax (-25/-5 dB). Weight: arousalWeightLoudness (0.13).

3. Spectral Flux

How much the frequency spectrum changes from one frame to the next. Tracks rapid timbral shifts — onsets, transients, dynamic variation.

High: Dynamic, volatile, punchy — drum fills, genre-mashing, breakbeats
Low: Steady, droning, static — sustained pads, ambient washes

Frame-by-frame spectral difference. Absolute range: 0 to fluxMax (0.15). Weight: arousalWeightFlux (0.13).

4. Spectral Centroid

The "center of mass" of the frequency spectrum — the brightness of the sound. Higher centroid means more high-frequency energy. Centroid tracks intensity and excitement, not emotional positivity — a bright, aggressive metal track and a bright, happy pop track both have high centroids.

High: Bright, sharp, exciting — cymbals, distortion, brass
Low: Warm, mellow, relaxed — bass-heavy, acoustic, muted

Computed from the magnitude spectrum. Absolute range: centroidMin/centroidMax (400/2500 Hz). Weight: arousalWeightCentroid (0.14).

5. Danceability

A composite measure of rhythmic regularity and beat strength. Captures how naturally the music invites physical movement.

High: Groovy, steady, propulsive — disco, house, funk
Low: Free-form, irregular, rubato — free jazz, ambient, spoken word

Computed from tempo stability and beat histogram. Absolute range: 0 to danceMax (2.0). Arousal weight: arousalWeightDance (0.08). Also contributes to valence (0.10).

6. Onset Rate

The density of detected note or beat onsets per second. Measures how "busy" or "active" the music feels.

High: Dense, busy, relentless — drum fills, blast beats, fast arpeggios
Low: Smooth, minimal, spacious — sustained pads, ambient drones

Derived from transient detection. Absolute range: 0 to onsetRateMax (6.0 events/sec). Weight: arousalWeightOnsetRate (0.13).

7. Zero-Crossing Rate (ZCR)

How often the waveform crosses zero amplitude per unit time. Distinguishes noisy, percussive sounds from clean, tonal ones.

High: Noisy, sharp, aggressive — distorted guitar, hi-hats, static
Low: Clean, tonal, calm — sine waves, flutes, sustained strings

Counted directly from the time-domain waveform. Absolute range: 0 to zcrMax (0.15). Weight: arousalWeightZcr (0.08).

8. RMS Energy

Root Mean Square of the amplitude over time. Tracks the raw dynamic intensity of the waveform — how hard the signal is hitting.

High: Loud, forceful, compressed — stadium rock, EDM drops
Low: Soft, subdued, dynamic — chamber music, whispered vocals

Computed directly from the waveform amplitude. Related to loudness but without psychoacoustic weighting. Absolute range: 0 to rmsMax (0.01). Weight: arousalWeightRms (0.13).

Valence Features (6 active inputs)

These describe how pleasant, consonant, or emotionally positive the audio feels. High values push tracks toward the right side of the mood space. Valence features measure harmonic and tonal qualities — they tell you whether the emotion leans happy or sad, resolved or tense.

9. Mode (Major/Minor)

The tonal center and mode of the piece. The strongest single predictor of perceived musical positivity.

Major: Happy, bright, resolved — pop anthems, marches, hymns
Minor: Sad, tense, yearning — blues, laments, film noir

Detected via harmonic pitch class profiles. Score: modeScoreMajor (0.8) vs. modeScoreMinor (0.4). Weight: valenceWeightMode (0.30). Scores are softened from 1.0/0.0 to reduce the major/minor cliff.

10. Dissonance

Quantifies roughness — the perceptual beating between nearby frequencies. Based on models of sensory dissonance (Plomp-Levelt curves). Inverted for valence: low dissonance (consonance) = high valence.

High dissonance: Tension, clashing, unpleasant — tritones, clusters, microtonal intervals
Low dissonance: Consonance, resolution, pleasant — octaves, fifths, triads

Computed by summing roughness contributions of all frequency pairs. Ranges 0–1; inverted before weighting so consonant = high valence. Weight: valenceWeightDissonance (0.25).

11. Pitch Salience

How clearly a dominant pitch emerges from the signal. Measures harmonic clarity — whether the sound has a strong, recognizable tonal center or is diffuse and noisy.

High: Clear, tonal, melodic — solo vocals, piano, clean guitar
Low: Diffuse, atonal, noisy — white noise, percussion-heavy, distorted

Computed from autocorrelation of the spectrum. Ranges 0–1. Weight: valenceWeightPitchSalience (0.15).

12. Chord Changes Rate

How frequently the harmonic content shifts between chords. Captures harmonic movement — static harmony feels different from rapid chord progressions.

High: Harmonically active, shifting, colorful — jazz, prog, neo-soul
Low: Static, droning, minimal — one-chord vamps, ambient, drone metal

Extracted from chroma features (pitch class energy over time). Absolute range: 0 to chordsRateMax (0.2). Weight: valenceWeightChords (0.10).

13. MFCC (Mel-Frequency Cepstral Coefficient 2)

The second cepstral coefficient captures the broad spectral slope — the balance between low and high frequency energy. This is a timbral fingerprint that distinguishes warm, rounded sounds from bright, harsh ones.

High: Warm, rich, full-bodied — orchestral, acoustic, warm synths
Low: Thin, hollow, cold — lo-fi, sparse, metallic timbres

Extracted from mel-scaled spectral analysis. MBXHub uses coefficient #2 (spectral slope). Absolute range: mfccMin/mfccMax (50/250). Weight: valenceWeightMfcc (0.10).

14. Danceability (shared)

Danceability contributes to both axes. For valence, rhythmic regularity and beat strength are associated with positive, accessible music — songs that make you want to move tend to feel upbeat.

High: Groovy, infectious, feel-good — Motown, disco, pop
Low: Unstructured, free, contemplative — ambient, avant-garde

Same danceability measure as arousal feature #5. Valence weight: valenceWeightDance (0.10). Arousal weight: arousalWeightDance (0.08).

Normalization

Raw feature values span very different scales (BPM in the hundreds, RMS in the thousandths). Before weighting, every feature is normalized to 0–1. MBXHub offers two normalization modes:

Percentile Normalization (default)

Enabled by usePercentileNormalization: true (the default). Each feature is ranked against every track in your library, and the rank is converted to a 0–1 percentile. The track with the lowest BPM in your library gets 0.0, the highest gets 1.0, and everything else is spread proportionally.

This is library-adaptive: a rock-heavy library where every track has similar loudness will still get full spread on that axis because the ranking is relative. No manual range tuning needed. Falls back to absolute normalization for libraries with fewer than 10 analyzed tracks.
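A minimal sketch of rank-based normalization, assuming the documented behavior (ranks spread over 0–1, fallback below 10 tracks). Tie handling and the exact rank formula are assumptions, and the fallback here uses the data's own min/max for simplicity, whereas the real fallback clamps to the calibrated absolute ranges:

```python
def percentile_normalize(values):
    """Rank-based 0-1 normalization across a whole library (sketch)."""
    n = len(values)
    if n < 2:
        return [0.5] * n  # assumption: degenerate case
    if n < 10:  # documented fallback threshold; real fallback uses fixed ranges
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]
    order = sorted(range(n), key=lambda i: values[i])
    norm = [0.0] * n
    for rank, i in enumerate(order):
        norm[i] = rank / (n - 1)  # lowest value → 0.0, highest → 1.0
    return norm
```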

Absolute Normalization

When percentile mode is off, features are clamped to fixed min/max ranges calibrated from a diverse 33K-track library. These are the centroidMin/Max, loudnessMin/Max, etc. settings. This works well for diverse libraries but can compress the range for genre-focused collections where all values land mid-range.

Genre-Aware Weight Adjustment

Fixed weights treat every genre the same, but acoustic features mean different things in different musical contexts. A 120 BPM jazz track is "fast for jazz" but "mellow for EDM." A loud metal track has very different emotional intent from a loud pop track. Genre-aware adjustment solves this by applying per-genre multipliers to the base weights before combining features.

When enabled (useGenreAdjustment: true, the default), MBXHub reads each track's genre tag from MusicBee and looks up a matching genre profile. The profile contains 14 multipliers — one per feature. Each multiplier scales the base weight for that feature, then the result is renormalized so the total weight sum stays the same.

// Genre adjustment formula
w_effective[i] = w_base[i] * g_multiplier[i]

// Renormalize so weights sum to original total
w_final[i] = w_effective[i] * (sum(w_base) / sum(w_effective))

// Example: Electronic genre, BPM multiplier = 0.6
// BPM base weight 0.18 → effective 0.108 → renormalized ~0.11
// The 0.072 of lost BPM weight redistributes to other features

This means the relative importance of features shifts per genre without changing the overall magnitude of the score. Pop and rock use default weights (all multipliers = 1.0) as the reference genre.
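The adjust-then-renormalize step can be sketched in a few lines of Python. The weights are the arousal defaults from this page and the multipliers are the Electronic profile's; the function itself is an illustration, not MBXHub's code:

```python
def apply_genre_profile(base_weights, multipliers):
    """Scale base weights by a genre profile's multipliers, then
    renormalize so the total weight is unchanged (the formula above).
    Missing multipliers default to 1.0, as the docs describe."""
    eff = {k: w * multipliers.get(k, 1.0) for k, w in base_weights.items()}
    scale = sum(base_weights.values()) / sum(eff.values())
    return {k: w * scale for k, w in eff.items()}

# Default arousal weights; Electronic multipliers from the built-in profiles
arousal_base = {"bpm": 0.18, "loudness": 0.13, "flux": 0.13, "centroid": 0.14,
                "dance": 0.08, "onset": 0.13, "zcr": 0.08, "rms": 0.13}
electronic = {"bpm": 0.6, "loudness": 1.4, "flux": 1.3}
final = apply_genre_profile(arousal_base, electronic)
# BPM drops from 0.18 to roughly 0.11; the total still sums to 1.0
```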

Built-in Genre Profiles

MBXHub ships with 8 genre families covering ~20 genre tags. Each is tuned to the acoustic norms of that genre:

• Electronic / Dance — matches Electronic, Dance, EDM, House, Techno, Trance. Adjustments: BPM ↓0.6, Loudness ↑1.4, Flux ↑1.3. BPM is uniformly high (120–150) so it carries little information; loudness and spectral dynamics vary more meaningfully.

• Metal / Hardcore — matches Metal, Hardcore, Punk. Adjustments: Loudness ↑1.5, Centroid ↑1.8, Dissonance ↑1.5, Mode ↓0.5. Brightness and distortion are key energy markers; mode is less informative because metal is overwhelmingly minor.

• Jazz / Blues — matches Jazz, Blues. Adjustments: BPM ↑1.5, Chords ↑1.8, PitchSalience ↑1.4, Loudness ↓0.6. Harmonic complexity and melodic clarity distinguish moods; loudness varies less, and BPM is more informative (wide range from ballads to bebop).

• Classical / Ambient — matches Classical, Ambient. Adjustments: Flux ↑1.5, Mode ↑1.3, Loudness ↓0.5, Dance ↓0.3. Spectral dynamics and tonality matter most; danceability and loudness are consistently low and uninformative.

• Hip-Hop / Rap — matches Hip-Hop, Rap. Adjustments: OnsetRate ↑1.4, RMS ↑1.3, BPM ↓0.7. Beat density and energy envelope drive mood; BPM often doesn't reflect perceived energy (half-time patterns are common).

• Folk / Country — matches Folk, Country. Adjustments: Mode ↑1.3, Chords ↑1.3, PitchSalience ↑1.2. Harmonic and tonal features are stronger mood indicators in acoustic, melodic genres.

• R&B / Soul — matches R&B, Soul. Adjustments: Mode ↑1.2, Dance ↑1.3, Loudness ↓0.8. Groove and tonality carry emotional weight; loudness is relatively consistent.

• Pop / Rock — matches Pop, Rock. All multipliers 1.0 (reference). Default weights are calibrated against a general-purpose library; pop/rock is the baseline.

Genre matching uses the track's MusicBee genre tag. Exact matches are checked first, then the first word of multi-word genres (e.g., "Progressive Rock" matches "Rock" if there's no "Progressive Rock" profile). Custom profiles in autoQ.genreProfiles override built-in ones for the same genre name.

Custom Genre Profiles

Add your own profiles in mbxhub.json under autoQ.genreProfiles:

// mbxhub.json
"autoQ": {
  "genreProfiles": {
    "synthwave": {
      "bpm": 0.7,
      "loudness": 1.3,
      "mode": 1.4,
      "dance": 1.2
    }
  }
}

// Omitted multipliers default to 1.0
// All multipliers clamped to [0.0, 5.0]

Confidence Scoring

Not all mood estimates are created equal. A track analyzed by Essentia with features that land squarely on a mood channel is high confidence. A metadata-only fallback for an unknown genre is low confidence. MBXHub computes a confidence score (0–1) for every mood estimate and surfaces it in the dashboard, player, tuning console, and API.

How Confidence Is Computed

confidence = sourceBase × channelProximity

// sourceBase — how reliable is the data source?
Essentia-analyzed:       0.9  (confidenceEssentiaBase)
Fallback + genre match:  0.45 (confidenceFallbackGenre)
Fallback, no genre:      0.2  (confidenceFallbackNone)

// channelProximity — how close is V/A to the nearest mood channel?
channelProximity = 1 - (minDistance / sqrt(2))

// A track sitting right on a channel center: proximity ≈ 1.0
// A track in no-man's-land between channels: proximity ≈ 0.5

The two factors multiply together. An Essentia track right on top of a mood channel gets ~0.9. An Essentia track in between channels might get ~0.6. A fallback track with a genre match gets ~0.35–0.45. A fallback track with an unknown genre barely reaches 0.2.
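The same arithmetic as a runnable sketch. The channel centers below are made up for illustration; real centers come from your AutoQ mood channels:

```python
import math

def confidence(source_base, va, channel_centers):
    """confidence = sourceBase * channelProximity, where proximity is
    1 minus the distance to the nearest channel center over sqrt(2)."""
    d_min = min(math.dist(va, c) for c in channel_centers)
    return source_base * (1 - d_min / math.sqrt(2))

centers = [(0.8, 0.8), (0.8, 0.2), (0.2, 0.8), (0.2, 0.2)]
confidence(0.9, (0.8, 0.8), centers)   # Essentia track on a channel center → 0.9
confidence(0.45, (0.5, 0.5), centers)  # fallback track mid-space → 0.315
```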

Confidence Labels

• High (70%+) — Essentia data, features land near a mood channel. Mood label is reliable.

• Medium (40–69%) — Essentia data but V/A is between channels, or fallback with a good genre match.

• Low (<40%) — Fallback with unknown genre, or V/A far from any channel. Treat with caution.

Confidence Gate

When MBXHub writes mood tags to MusicBee's custom field, it checks confidence against the confidenceMinForTag threshold (default: 0.3). Tracks below this threshold don't get tagged — preventing low-quality mood labels from cluttering your library.

The Estimation Mixer

Think of the estimation engine as a mixing console. Each of the 14 audio features is a channel with its own fader (weight). Push a fader up and that feature has more influence on the final mood coordinate; pull it down and it fades out. You can retune the mix at any time without re-analyzing your music — MBXHub recomputes valence and arousal from the raw features on every startup using your current weights.

The mixer has two signal paths: an arousal bus fed by eight features (how energetic the track feels) and a valence bus fed by six (how positive it feels), with danceability feeding both.

How MBXHub Combines Them

Features are extracted once by Essentia and stored in mbxmoods.json by the Truedat tool. MBXHub's AutoQ engine reads these raw values and applies weighted sums to estimate arousal and valence:

// Arousal: how energetic (8 inputs)
arousal = clamp(
    0.18 * bpmNorm
  + 0.13 * loudnessNorm
  + 0.13 * fluxNorm
  + 0.14 * centroidNorm
  + 0.08 * danceNorm
  + 0.13 * onsetNorm
  + 0.08 * zcrNorm
  + 0.13 * rmsNorm
, 0, 1)

// Valence: how positive (6 active inputs)
valence = clamp(
    0.30 * modeScore
  + 0.10 * danceNorm
  + 0.25 * (1 - dissonanceNorm)
  + 0.15 * salienceNorm
  + 0.10 * chordsNorm
  + 0.10 * mfccNorm
, 0, 1)

// With genre adjustment (e.g., Electronic: BPM × 0.6)
// w_eff[i] = w_base[i] * genre_mult[i]
// w_final[i] = w_eff[i] * (sum(w_base) / sum(w_eff))

// Mood match: distance to target channel
moodScore = 1 - sqrt((arousal - target.arousal)² + (valence - target.valence)²) / sqrt(2)

// Confidence: how trustworthy is this estimate?
confidence = sourceBase * (1 - minChannelDist / sqrt(2))
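For readers who want to experiment, here is the same arithmetic as runnable Python. Feature values would come from mbxmoods.json after normalization; this is a sketch using the default weights listed on this page, not MBXHub's actual code:

```python
import math

def estimate_va(f):
    """Weighted sums with the default weights. `f` maps feature name →
    normalized (0-1) value; `mode` is the softened major/minor score."""
    clamp = lambda x: max(0.0, min(1.0, x))
    arousal = clamp(0.18 * f["bpm"] + 0.13 * f["loudness"] + 0.13 * f["flux"]
                    + 0.14 * f["centroid"] + 0.08 * f["dance"] + 0.13 * f["onset"]
                    + 0.08 * f["zcr"] + 0.13 * f["rms"])
    valence = clamp(0.30 * f["mode"] + 0.10 * f["dance"]
                    + 0.25 * (1 - f["dissonance"]) + 0.15 * f["salience"]
                    + 0.10 * f["chords"] + 0.10 * f["mfcc"])
    return arousal, valence

def mood_score(va, target):
    """1 minus the normalized Euclidean distance to the target channel."""
    return 1 - math.dist(va, target) / math.sqrt(2)
```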

All weights are tunable via mbxhub.json under autoQ.estimation or through the AutoQ tuning page in MBXHub's built-in dashboard. You can change how much each feature contributes without re-running Essentia. Genre profiles shift the relative importance of features per genre, and confidence scores tell you how much to trust each estimate. The full parameter list is available in MBXHub's built-in API docs (/docs) once installed.