MBXHub

The Audio Features

AutoQ's mood engine maps every track onto a two-dimensional emotional space using arousal (energy/intensity) and valence (positivity/pleasantness). These two dimensions come from Russell's circumplex model of affect — the same framework used in music psychology research.

Extracting Mood Data with Truedat — Music Mood Extractor

Truedat is the companion tool that runs Essentia audio analysis on your library and produces mbxmoods.json — the raw feature data that powers everything on this page. Without it, AutoQ falls back to genre/BPM metadata estimates (lower confidence). With it, you get full 14-feature mood estimation for every analyzed track.

What You Need

• truedat.exe (Required) — Mood extraction orchestrator: reads your library, runs Essentia on each track, writes mbxmoods.json
• essentia_streaming_extractor_music.exe (Required) — Essentia audio feature extractor: the engine that analyzes waveforms. Ships with Truedat in dist/truedat/
• iTunes Music Library.xml (Required) — Library index with file paths. MusicBee can export this — see Step 1 below
• ffmpeg.exe (Optional) — Multi-channel audio downmixing. Only needed if your library has surround-sound files

Setup Steps

  1. Enable iTunes XML Export in MusicBee

    Go to Edit → Preferences → Library and check "iTunes Music Library.xml". MusicBee writes this file to your library folder and updates it automatically when your library changes.

    Note the path — you'll pass it to Truedat in Step 3. It's usually something like C:\Users\You\Music\MusicBee\iTunes Music Library.xml.

  2. Download Truedat

    Grab the latest release from github.com/halrad-com/Truedat/dist/truedat. The folder contains truedat.exe and the bundled Essentia executables. Extract it anywhere — no installation needed.

  3. Run Truedat

    Open a terminal in the Truedat folder and point it at your library XML:

    truedat.exe "C:\Users\You\Music\MusicBee\iTunes Music Library.xml"

    Truedat processes every track through Essentia's audio feature extractor. This is CPU-intensive — each track takes a few seconds. By default it uses all CPU cores (-p N to limit). A 10,000-track library takes roughly 8–12 hours; 50,000+ tracks may take multiple days.

    It's incremental — you can stop and restart at any time. Truedat skips tracks that haven't changed since the last run. Progress is saved every 25 tracks.

  4. Place mbxmoods.json

    Truedat writes mbxmoods.json next to the iTunes XML file. MBXHub looks for it in two locations:

    • Your MusicBee Library folder (same directory as the XML) — preferred
    • %APPDATA%\MusicBee\MBXHub\ — alternative location

    If the file is already in your library folder (the default), no move is needed.

  5. Configure the Mood Tag Field (optional but recommended)

    In MusicBee, go to Edit → Preferences → Tags (1) → Custom Tags and set one custom tag (e.g. Custom1) to "AutoQ Mood". This lets MBXHub write mood labels (like "Upbeat + Energetic") directly into your MusicBee tags, visible in the library view.

    Make sure the tag name matches autoQ.moodTagFieldName in mbxhub.json (default: "AutoQ Mood").
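    For reference, the relevant mbxhub.json fragment might look like this (a hypothetical minimal example showing only the field named above):

    ```
    // mbxhub.json — hypothetical fragment
    "autoQ": {
      "moodTagFieldName": "AutoQ Mood"
    }
    ```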

  6. Restart MusicBee

    MBXHub loads mbxmoods.json on startup. After restarting, check the MBXHub log or the AutoQ Tuning Console — you should see your track count in the mood cache stats. The dashboard will show mood labels and confidence badges for the current track.

Useful Options

# Limit to 4 CPU cores (default: all cores)
truedat.exe "library.xml" -p 4

# Retry tracks that failed on a previous run
truedat.exe "library.xml" --retry-errors

# Validate file paths without re-analyzing
truedat.exe "library.xml" --fixup

# Check for problematic filenames before analyzing
truedat.exe "library.xml" --check-filenames

# Full audit log for troubleshooting
truedat.exe "library.xml" --audit

Output Files

• mbxmoods.json — Per-track raw features (BPM, mode, loudness, spectral metrics, danceability, dissonance, pitch salience, chord changes, MFCCs), pre-computed valence/arousal, and metadata. This is what MBXHub reads.
• mbxmoods-errors.csv — Tracks that failed analysis: file path, error reason, file size. Use --retry-errors to reprocess after fixing issues.
• truedat.log — Full console output (only written when --audit is used). Useful for diagnosing extraction failures.

How It Works Together

The pipeline is:

  1. Essentia analyzes each audio file and extracts 14 raw features from the waveform
  2. Truedat orchestrates the extraction, stores raw features in mbxmoods.json
  3. MBXHub loads the raw features, applies normalization, genre adjustment, and weighted formulas to compute valence/arousal
  4. AutoQ uses the V/A coordinates for mood matching, scoring, and tag writing

Crucially, steps 3 and 4 happen at runtime with your current weights. You can retune the estimation mixer, change genre profiles, or adjust confidence thresholds without re-running Essentia. The raw features are permanent — only the interpretation changes.


The Features — Reference

The rest of this page is reference material. Fourteen audio features extracted from the waveform drive the mood mapping. Eight feed the arousal estimate, six feed valence. Some features (like danceability) contribute to both axes. Each captures a different psychoacoustic cue that listeners intuitively associate with energy or pleasantness. Genre-aware weight adjustment adapts how these features combine per genre, and confidence scoring tells you how much to trust each mood estimate.

Mood Quadrants

The arousal-valence plane divides into four emotional quadrants. Every mood channel in AutoQ targets a specific point in this space.

High Arousal + High Valence — Energetic, upbeat, euphoric
Typical genres: EDM, pop, funk, disco, power pop
Acoustic cues: fast tempo, bright timbre, strong beats, major keys

High Arousal + Low Valence — Tense, aggressive, intense
Typical genres: Metal, hard rock, industrial, hardcore punk
Acoustic cues: distortion, high energy, dissonance, minor keys

Low Arousal + High Valence — Calm, pleasant, serene
Typical genres: Chillhop, acoustic folk, soft jazz, bossa nova
Acoustic cues: warm timbre, consonance, smooth textures, gentle dynamics

Low Arousal + Low Valence — Sad, subdued, melancholic
Typical genres: Ambient drone, slow blues, lo-fi, funeral doom
Acoustic cues: slow tempo, dark timbre, soft dynamics, minor keys
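As a rough illustration (not the plugin's actual code), the quadrant lookup reduces to two threshold checks on the V/A pair. The 0.5 cutoffs and the label strings are assumptions for this sketch:

```python
def mood_quadrant(arousal: float, valence: float) -> str:
    """Map a normalized (0-1) arousal/valence pair to its quadrant.

    The 0.5 thresholds are illustrative; AutoQ's mood channels target
    specific points in the plane rather than whole quadrants.
    """
    if arousal >= 0.5:
        return "Energetic / Upbeat" if valence >= 0.5 else "Tense / Aggressive"
    return "Calm / Serene" if valence >= 0.5 else "Sad / Melancholic"

print(mood_quadrant(0.9, 0.8))  # → Energetic / Upbeat
```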

Arousal Features (8 inputs)

These describe how energetic, intense, or activated the audio feels. High values push tracks toward the top of the mood space. Arousal features measure physical intensity — they tell you how much energy the sound carries, not whether the emotion is positive or negative.

1. Tempo (BPM)

The speed of the beat, measured in beats per minute. The most direct signal of musical energy.

High: Fast, driving, urgent — EDM, punk, thrash
Low: Slow, spacious, contemplative — ambient, ballads

Extracted using beat-tracking algorithms. Absolute range: bpmMin/bpmMax (80/170). Weight: arousalWeightBpm (0.18).

2. Loudness

Perceptual intensity accounting for human hearing sensitivity. Weighted by frequency response curves (EBU R128 integrated loudness) so it matches what listeners actually perceive.

High: Commanding, powerful, in-your-face — mastered pop, compressed metal
Low: Gentle, intimate, fragile — fingerpicking, field recordings

Absolute range: loudnessMin/loudnessMax (-25/-5 dB). Weight: arousalWeightLoudness (0.13).

3. Spectral Flux

How much the frequency spectrum changes from one frame to the next. Tracks rapid timbral shifts — onsets, transients, dynamic variation.

High: Dynamic, volatile, punchy — drum fills, genre-mashing, breakbeats
Low: Steady, droning, static — sustained pads, ambient washes

Frame-by-frame spectral difference. Absolute range: 0 to fluxMax (0.15). Weight: arousalWeightFlux (0.13).

4. Spectral Centroid

The "center of mass" of the frequency spectrum — the brightness of the sound. Higher centroid means more high-frequency energy. Centroid tracks intensity and excitement, not emotional positivity — a bright, aggressive metal track and a bright, happy pop track both have high centroids.

High: Bright, sharp, exciting — cymbals, distortion, brass
Low: Warm, mellow, relaxed — bass-heavy, acoustic, muted

Computed from the magnitude spectrum. Absolute range: centroidMin/centroidMax (400/2500 Hz). Weight: arousalWeightCentroid (0.14).

5. Danceability

A composite measure of rhythmic regularity and beat strength. Captures how naturally the music invites physical movement.

High: Groovy, steady, propulsive — disco, house, funk
Low: Free-form, irregular, rubato — free jazz, ambient, spoken word

Computed from tempo stability and beat histogram. Absolute range: 0 to danceMax (2.0). Arousal weight: arousalWeightDance (0.08). Also contributes to valence (0.10).

6. Onset Rate

The density of detected note or beat onsets per second. Measures how "busy" or "active" the music feels.

High: Dense, busy, relentless — drum fills, blast beats, fast arpeggios
Low: Smooth, minimal, spacious — sustained pads, ambient drones

Derived from transient detection. Absolute range: 0 to onsetRateMax (6.0 events/sec). Weight: arousalWeightOnsetRate (0.13).

7. Zero-Crossing Rate (ZCR)

How often the waveform crosses zero amplitude per unit time. Distinguishes noisy, percussive sounds from clean, tonal ones.

High: Noisy, sharp, aggressive — distorted guitar, hi-hats, static
Low: Clean, tonal, calm — sine waves, flutes, sustained strings

Counted directly from the time-domain waveform. Absolute range: 0 to zcrMax (0.15). Weight: arousalWeightZcr (0.08).

8. RMS Energy

Root Mean Square of the amplitude over time. Tracks the raw dynamic intensity of the waveform — how hard the signal is hitting.

High: Loud, forceful, compressed — stadium rock, EDM drops
Low: Soft, subdued, dynamic — chamber music, whispered vocals

Computed directly from the waveform amplitude. Related to loudness but without psychoacoustic weighting. Absolute range: 0 to rmsMax (0.01). Weight: arousalWeightRms (0.13).

Valence Features (6 active inputs)

These describe how pleasant, consonant, or emotionally positive the audio feels. High values push tracks toward the right side of the mood space. Valence features measure harmonic and tonal qualities — they tell you whether the emotion leans happy or sad, resolved or tense.

9. Mode (Major/Minor)

The tonal center and mode of the piece. The strongest single predictor of perceived musical positivity.

Major: Happy, bright, resolved — pop anthems, marches, hymns
Minor: Sad, tense, yearning — blues, laments, film noir

Detected via harmonic pitch class profiles. Score: modeScoreMajor (0.8) vs. modeScoreMinor (0.4). Weight: valenceWeightMode (0.30). Scores are softened from 1.0/0.0 to reduce the major/minor cliff.

10. Dissonance

Quantifies roughness — the perceptual beating between nearby frequencies. Based on models of sensory dissonance (Plomp-Levelt curves). Inverted for valence: low dissonance (consonance) = high valence.

High dissonance: Tension, clashing, unpleasant — tritones, clusters, microtonal intervals
Low dissonance: Consonance, resolution, pleasant — octaves, fifths, triads

Computed by summing roughness contributions of all frequency pairs. Ranges 0–1; inverted before weighting so consonant = high valence. Weight: valenceWeightDissonance (0.25).

11. Pitch Salience

How clearly a dominant pitch emerges from the signal. Measures harmonic clarity — whether the sound has a strong, recognizable tonal center or is diffuse and noisy.

High: Clear, tonal, melodic — solo vocals, piano, clean guitar
Low: Diffuse, atonal, noisy — white noise, percussion-heavy, distorted

Computed from autocorrelation of the spectrum. Ranges 0–1. Weight: valenceWeightPitchSalience (0.15).

12. Chord Changes Rate

How frequently the harmonic content shifts between chords. Captures harmonic movement — static harmony feels different from rapid chord progressions.

High: Harmonically active, shifting, colorful — jazz, prog, neo-soul
Low: Static, droning, minimal — one-chord vamps, ambient, drone metal

Extracted from chroma features (pitch class energy over time). Absolute range: 0 to chordsRateMax (0.2). Weight: valenceWeightChords (0.10).

13. MFCC (Mel-Frequency Cepstral Coefficient 2)

The second cepstral coefficient captures the broad spectral slope — the balance between low and high frequency energy. This is a timbral fingerprint that distinguishes warm, rounded sounds from bright, harsh ones.

High: Warm, rich, full-bodied — orchestral, acoustic, warm synths
Low: Thin, hollow, cold — lo-fi, sparse, metallic timbres

Extracted from mel-scaled spectral analysis. MBXHub uses coefficient #2 (spectral slope). Absolute range: mfccMin/mfccMax (50/250). Weight: valenceWeightMfcc (0.10).

14. Danceability (shared)

Danceability contributes to both axes. For valence, rhythmic regularity and beat strength are associated with positive, accessible music — songs that make you want to move tend to feel upbeat.

High: Groovy, infectious, feel-good — Motown, disco, pop
Low: Unstructured, free, contemplative — ambient, avant-garde

Same danceability measure as arousal feature #5. Valence weight: valenceWeightDance (0.10). Arousal weight: arousalWeightDance (0.08).

Normalization

Raw feature values span very different scales (BPM in the hundreds, RMS in the thousandths). Before weighting, every feature is normalized to 0–1. MBXHub offers two normalization modes:

Percentile Normalization (default)

Enabled by usePercentileNormalization: true (the default). Each feature is ranked against every track in your library, and the rank is converted to a 0–1 percentile. The track with the lowest BPM in your library gets 0.0, the highest gets 1.0, and everything else is spread proportionally.

This is library-adaptive: a rock-heavy library where every track has similar loudness will still get full spread on that axis because the ranking is relative. No manual range tuning needed. Falls back to absolute normalization for libraries with fewer than 10 analyzed tracks.
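A minimal sketch of rank-based normalization, assuming the documented behavior (ranks spread over 0–1, fallback below 10 tracks). Tie handling and the exact rank formula are assumptions, and the fallback here uses the data's own min/max for simplicity, whereas the real fallback clamps to the calibrated absolute ranges:

```python
def percentile_normalize(values):
    """Rank-based 0-1 normalization across a whole library (sketch)."""
    n = len(values)
    if n < 2:
        return [0.5] * n  # assumption: degenerate case
    if n < 10:  # documented fallback threshold; real fallback uses fixed ranges
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]
    order = sorted(range(n), key=lambda i: values[i])
    norm = [0.0] * n
    for rank, i in enumerate(order):
        norm[i] = rank / (n - 1)  # lowest value → 0.0, highest → 1.0
    return norm
```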

Absolute Normalization

When percentile mode is off, features are clamped to fixed min/max ranges calibrated from a diverse 33K-track library. These are the centroidMin/Max, loudnessMin/Max, etc. settings. This works well for diverse libraries but can compress the range for genre-focused collections where all values land mid-range.

Genre-Aware Weight Adjustment

Fixed weights treat every genre the same, but acoustic features mean different things in different musical contexts. A 120 BPM jazz track is "fast for jazz" but "mellow for EDM." A loud metal track has very different emotional intent from a loud pop track. Genre-aware adjustment solves this by applying per-genre multipliers to the base weights before combining features.

When enabled (useGenreAdjustment: true, the default), MBXHub reads each track's genre tag from MusicBee and looks up a matching genre profile. The profile contains 14 multipliers — one per feature. Each multiplier scales the base weight for that feature, then the result is renormalized so the total weight sum stays the same.

// Genre adjustment formula
w_effective[i] = w_base[i] * g_multiplier[i]

// Renormalize so weights sum to original total
w_final[i] = w_effective[i] * (sum(w_base) / sum(w_effective))

// Example: Electronic genre, BPM multiplier = 0.6
// BPM base weight 0.18 → effective 0.108 → renormalized ~0.11
// The 0.072 of lost BPM weight redistributes to other features

This means the relative importance of features shifts per genre without changing the overall magnitude of the score. Pop and rock use default weights (all multipliers = 1.0) as the reference genre.
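The adjust-then-renormalize step can be sketched in a few lines of Python. The weights are the arousal defaults from this page and the multipliers are the Electronic profile's; the function itself is an illustration, not MBXHub's code:

```python
def apply_genre_profile(base_weights, multipliers):
    """Scale base weights by a genre profile's multipliers, then
    renormalize so the total weight is unchanged (the formula above).
    Missing multipliers default to 1.0, as the docs describe."""
    eff = {k: w * multipliers.get(k, 1.0) for k, w in base_weights.items()}
    scale = sum(base_weights.values()) / sum(eff.values())
    return {k: w * scale for k, w in eff.items()}

# Default arousal weights; Electronic multipliers from the built-in profiles
arousal_base = {"bpm": 0.18, "loudness": 0.13, "flux": 0.13, "centroid": 0.14,
                "dance": 0.08, "onset": 0.13, "zcr": 0.08, "rms": 0.13}
electronic = {"bpm": 0.6, "loudness": 1.4, "flux": 1.3}
final = apply_genre_profile(arousal_base, electronic)
# BPM drops from 0.18 to roughly 0.11; the total still sums to 1.0
```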

Built-in Genre Profiles

MBXHub ships with 8 genre families covering ~20 genre tags. Each is tuned to the acoustic norms of that genre:

• Electronic / Dance — matches Electronic, Dance, EDM, House, Techno, Trance. Adjustments: BPM ↓0.6, Loudness ↑1.4, Flux ↑1.3. BPM is uniformly high (120–150) so it carries little information; loudness and spectral dynamics vary more meaningfully.

• Metal / Hardcore — matches Metal, Hardcore, Punk. Adjustments: Loudness ↑1.5, Centroid ↑1.8, Dissonance ↑1.5, Mode ↓0.5. Brightness and distortion are key energy markers; mode is less informative because metal is overwhelmingly minor.

• Jazz / Blues — matches Jazz, Blues. Adjustments: BPM ↑1.5, Chords ↑1.8, PitchSalience ↑1.4, Loudness ↓0.6. Harmonic complexity and melodic clarity distinguish moods; loudness varies less, and BPM is more informative (wide range from ballads to bebop).

• Classical / Ambient — matches Classical, Ambient. Adjustments: Flux ↑1.5, Mode ↑1.3, Loudness ↓0.5, Dance ↓0.3. Spectral dynamics and tonality matter most; danceability and loudness are consistently low and uninformative.

• Hip-Hop / Rap — matches Hip-Hop, Rap. Adjustments: OnsetRate ↑1.4, RMS ↑1.3, BPM ↓0.7. Beat density and energy envelope drive mood; BPM often doesn't reflect perceived energy (half-time patterns are common).

• Folk / Country — matches Folk, Country. Adjustments: Mode ↑1.3, Chords ↑1.3, PitchSalience ↑1.2. Harmonic and tonal features are stronger mood indicators in acoustic, melodic genres.

• R&B / Soul — matches R&B, Soul. Adjustments: Mode ↑1.2, Dance ↑1.3, Loudness ↓0.8. Groove and tonality carry emotional weight; loudness is relatively consistent.

• Pop / Rock — matches Pop, Rock. All multipliers 1.0 (reference). Default weights are calibrated against a general-purpose library; pop/rock is the baseline.

Genre matching uses the track's MusicBee genre tag. Exact matches are checked first, then the first word of multi-word genres (e.g., "Progressive Rock" matches "Rock" if there's no "Progressive Rock" profile). Custom profiles in autoQ.genreProfiles override built-in ones for the same genre name.

Custom Genre Profiles

Add your own profiles in mbxhub.json under autoQ.genreProfiles:

// mbxhub.json
"autoQ": {
  "genreProfiles": {
    "synthwave": {
      "bpm": 0.7,
      "loudness": 1.3,
      "mode": 1.4,
      "dance": 1.2
    }
  }
}

// Omitted multipliers default to 1.0
// All multipliers clamped to [0.0, 5.0]

Confidence Scoring

Not all mood estimates are created equal. A track analyzed by Essentia with features that land squarely on a mood channel is high confidence. A metadata-only fallback for an unknown genre is low confidence. MBXHub computes a confidence score (0–1) for every mood estimate and surfaces it in the dashboard, player, tuning console, and API.

How Confidence Is Computed

confidence = sourceBase × channelProximity

// sourceBase — how reliable is the data source?
Essentia-analyzed:       0.9  (confidenceEssentiaBase)
Fallback + genre match:  0.45 (confidenceFallbackGenre)
Fallback, no genre:      0.2  (confidenceFallbackNone)

// channelProximity — how close is V/A to the nearest mood channel?
channelProximity = 1 - (minDistance / sqrt(2))

// A track sitting right on a channel center: proximity ≈ 1.0
// A track in no-man's-land between channels: proximity ≈ 0.5

The two factors multiply together. An Essentia track right on top of a mood channel gets ~0.9. An Essentia track in between channels might get ~0.6. A fallback track with a genre match gets ~0.35–0.45. A fallback track with an unknown genre barely reaches 0.2.
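The same arithmetic as a runnable sketch. The channel centers below are made up for illustration; real centers come from your AutoQ mood channels:

```python
import math

def confidence(source_base, va, channel_centers):
    """confidence = sourceBase * channelProximity, where proximity is
    1 minus the distance to the nearest channel center over sqrt(2)."""
    d_min = min(math.dist(va, c) for c in channel_centers)
    return source_base * (1 - d_min / math.sqrt(2))

centers = [(0.8, 0.8), (0.8, 0.2), (0.2, 0.8), (0.2, 0.2)]
confidence(0.9, (0.8, 0.8), centers)   # Essentia track on a channel center → 0.9
confidence(0.45, (0.5, 0.5), centers)  # fallback track mid-space → 0.315
```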

Confidence Labels

• High (70%+) — Essentia data, features land near a mood channel. Mood label is reliable.

• Medium (40–69%) — Essentia data but V/A is between channels, or fallback with a good genre match.

• Low (<40%) — Fallback with unknown genre, or V/A far from any channel. Treat with caution.

Confidence Gate

When MBXHub writes mood tags to MusicBee's custom field, it checks confidence against the confidenceMinForTag threshold (default: 0.3). Tracks below this threshold don't get tagged — preventing low-quality mood labels from cluttering your library.

The Estimation Mixer

Think of the estimation engine as a mixing console. Each of the 14 audio features is a channel with its own fader (weight). Push a fader up and that feature has more influence on the final mood coordinate; pull it down and it fades out. You can retune the mix at any time without re-analyzing your music — MBXHub recomputes valence and arousal from the raw features on every startup using your current weights.

The mixer has two signal paths: an arousal bus fed by eight features (how energetic the track feels) and a valence bus fed by six (how positive it feels), with danceability feeding both.

How MBXHub Combines Them

Features are extracted once by Essentia and stored in mbxmoods.json by the Truedat tool. MBXHub's AutoQ engine reads these raw values and applies weighted sums to estimate arousal and valence:

// Arousal: how energetic (8 inputs)
arousal = clamp(
    0.18 * bpmNorm
  + 0.13 * loudnessNorm
  + 0.13 * fluxNorm
  + 0.14 * centroidNorm
  + 0.08 * danceNorm
  + 0.13 * onsetNorm
  + 0.08 * zcrNorm
  + 0.13 * rmsNorm
, 0, 1)

// Valence: how positive (6 active inputs)
valence = clamp(
    0.30 * modeScore
  + 0.10 * danceNorm
  + 0.25 * (1 - dissonanceNorm)
  + 0.15 * salienceNorm
  + 0.10 * chordsNorm
  + 0.10 * mfccNorm
, 0, 1)

// With genre adjustment (e.g., Electronic: BPM × 0.6)
// w_eff[i] = w_base[i] * genre_mult[i]
// w_final[i] = w_eff[i] * (sum(w_base) / sum(w_eff))

// Mood match: distance to target channel
moodScore = 1 - sqrt((arousal - target.arousal)² + (valence - target.valence)²) / sqrt(2)

// Confidence: how trustworthy is this estimate?
confidence = sourceBase * (1 - minChannelDist / sqrt(2))
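For readers who want to experiment, here is the same arithmetic as runnable Python. Feature values would come from mbxmoods.json after normalization; this is a sketch using the default weights listed on this page, not MBXHub's actual code:

```python
import math

def estimate_va(f):
    """Weighted sums with the default weights. `f` maps feature name →
    normalized (0-1) value; `mode` is the softened major/minor score."""
    clamp = lambda x: max(0.0, min(1.0, x))
    arousal = clamp(0.18 * f["bpm"] + 0.13 * f["loudness"] + 0.13 * f["flux"]
                    + 0.14 * f["centroid"] + 0.08 * f["dance"] + 0.13 * f["onset"]
                    + 0.08 * f["zcr"] + 0.13 * f["rms"])
    valence = clamp(0.30 * f["mode"] + 0.10 * f["dance"]
                    + 0.25 * (1 - f["dissonance"]) + 0.15 * f["salience"]
                    + 0.10 * f["chords"] + 0.10 * f["mfcc"])
    return arousal, valence

def mood_score(va, target):
    """1 minus the normalized Euclidean distance to the target channel."""
    return 1 - math.dist(va, target) / math.sqrt(2)
```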

All weights are tunable via mbxhub.json under autoQ.estimation or through the AutoQ tuning page in MBXHub's built-in dashboard. You can change how much each feature contributes without re-running Essentia. Genre profiles shift the relative importance of features per genre, and confidence scores tell you how much to trust each estimate. The full parameter list is available in MBXHub's built-in API docs (/docs) once installed.