PaddySpeaks · Visual Essay

The Missing Note

Why video platforms are failing 200 million music students — and the billion-dollar feature hiding in plain sight.

$4.6B · Online Music Ed Market (2026)
200M+ · Students Learning Online
0% · Objective Feedback Built In
I. The Lesson

Shankarabharanam, Week Six

A student sits in her bedroom in Sunnyvale, California. It's Saturday morning. The tanpura drone hums from a phone propped against the wall. On her laptop, a Zoom window frames her Carnatic music teacher, joining from Chennai, roughly thirteen hours ahead, eyes half-closed in concentration.

She sings the arohanam and avarohanam of Shankarabharanam. She's been practicing for six weeks. Her Sa is steady. Her Ga wavers slightly on the descent. Her Ni catches beautifully sometimes, and sometimes it doesn't.

The teacher listens. Nods. Smiles.

"Very good. Keep practicing."

And that's the feedback. That's all of it.

She doesn't know if her swara shuddham improved from 72% to 84% over six weeks, because nobody measured it. She doesn't know if her layam is drifting or if her thalam is holding. The teacher heard it. The student felt it. But the data doesn't exist.

There's no graph showing her gamaka consistency over time. No comparison to a reference recording of Shankarabharanam by a seasoned performer. No shruti alignment score. No layam tracking across āvartanams. No thalam adherence data. No analysis of whether her jathi subdivisions are clean or her brigha is holding up in fast passages.

She'll practice for another week, come back, and hear "very good" again. Or maybe "work on the Dha." She won't know by how much. She won't know if she's on track for the annual concert. She won't know where she stands relative to last year.

This isn't a Carnatic problem. This is every music student, on every video call, in every genre, everywhere.

II. The Gap

The Technology Exists.
The Integration Doesn't.

Here's what's strange about 2026: every piece of this puzzle already exists in isolation.

AI pitch detection engines now reach 85–92% agreement with professional vocal coach assessments. Apps like Yousician have 20 million monthly active users doing real-time pitch and rhythm tracking. Riyaz teaches classical Indian ragas with shruti-level feedback. SmartMusic evaluates dynamics, tempo, and articulation in real time for Western classical students.

Meanwhile, music lessons happen on Zoom. And Zoom has a Realtime Media Streams (RTMS) SDK that gives developers access to real-time audio streams: the raw audio data, participant by participant, flowing through the call.

The audio is already there. The AI is already there. They're just not in the same room.

What Already Exists

// @zoom/rtms: real-time audio stream from Zoom meetings.
// The *Engine objects are illustrative analysis modules, not part of the SDK.
client.onAudioData((data, size, timestamp) => {
  shrutiEngine.calibrate(data, tanpura); // shruti alignment to base pitch
  swaraEngine.evaluate(data, raga); // swara shuddham: note accuracy
  layamEngine.track(data, tala); // rhythm consistency across āvartanams
  thalamEngine.score(data, adiTala); // tala cycle adherence, samam alignment
  gamakaEngine.detect(data, raga); // kampita, jāru, voice modulation
  jathiEngine.analyze(data, nadai); // rhythmic subdivisions (chatushra/tisra)
  brighaEngine.measure(data); // vocal agility in fast passages
}); // → send scores to the sidebar in real time

The gap isn't technical. It's architectural. Nobody has plugged the music intelligence layer into the place where the lessons actually happen.
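To make "music intelligence layer" concrete, here is a minimal sketch of the first thing a shruti engine might do with one frame of audio: estimate its fundamental frequency by autocorrelation. The function name, the frame format (decoded mono samples in the range -1 to 1), and the 80–1000 Hz search range are all assumptions for illustration; none of this is part of @zoom/rtms, and production systems typically use more robust estimators that guard against octave errors.

```javascript
// Hypothetical helper, not part of @zoom/rtms: estimate the fundamental
// frequency (Hz) of one mono PCM frame by autocorrelation.
// Assumes `frame` holds decoded samples in [-1, 1] at `sampleRate`.
function estimatePitchHz(frame, sampleRate, minHz = 80, maxHz = 1000) {
  const minLag = Math.floor(sampleRate / maxHz);
  const maxLag = Math.min(Math.floor(sampleRate / minHz), frame.length - 1);
  let bestLag = -1;
  let bestCorr = 0;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i + lag < frame.length; i++) {
      sum += frame[i] * frame[i + lag];
    }
    // Normalize by the number of overlapping samples at this lag.
    const corr = sum / (frame.length - lag);
    if (corr > bestCorr) {
      bestCorr = corr;
      bestLag = lag;
    }
  }
  // Silence (or no positive correlation) yields null.
  return bestLag > 0 ? sampleRate / bestLag : null;
}

// Usage: a synthetic tone near the 138 Hz tanpura pitch used in this essay.
const sampleRate = 16000;
const tone = Float32Array.from({ length: 2048 }, (_, i) =>
  Math.sin((2 * Math.PI * 138.59 * i) / sampleRate));
console.log(estimatePitchHz(tone, sampleRate));
```

The estimate is quantized to whole-sample lags, so it lands within about a hertz of the true pitch at this sample rate; a real engine would interpolate around the peak before converting to cents.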

III. The Feature

What This Should Look Like

Imagine a standard Zoom music lesson. Everything the teacher and student already know. But on the right: a real-time scoring sidebar, with one score card per dimension. The mockup below shows each card with its detail expanded.

Zoom — Carnatic Vocal Lesson with Smt. Lakshmi Iyer
● LIVE
Ananya Raghavan
Shankarabharanam · Lesson 12
Shruti · श्रुति · Tonal Base
C#
+6 cents
Within tolerance · Tanpura calibrated to 138 Hz (Kattai 1½)
Swara Shuddham · स्वर शुद्धम् 87%
↑ 12% from last week · Clean: Sa, Pa, Ma · Needs work: Ga↓, Ni↑
Note-by-note: Sa 96% · Ri 88% · Ga 72% · Ma 91% · Pa 94% · Dha 85% · Ni 78%
Ga tends sharp on descent. Ni inconsistent in tāra stāyi. Focus: avarohanam phrases with Ga and Ni.
Layam · लयम् · Rhythm 74%
Drift in madhyama kālam · Rushing by ~0.4s per āvartanam
Speed analysis: 1st speed (vilamba): 91% · 2nd speed (madhyama): 68% · 3rd speed (druta): not attempted
Pattern: consistent rush at beats 5–8 of each āvartanam. Suggest: practice with metronome at madhyama kālam only.
Thalam · தாளம் · Ādi (8 beats) 89%
Samam alignment: 94% · Slight drift at anupallavi entry · Eduppu: correct
Cycle map: 14/16 āvartanams landed on samam ✓ · Anupallavi entry delayed by 0.2s in 2 cycles
Eduppu (starting point) consistently correct. Strong sense of tala structure.
Swara Contour · Live
Pitch axis (top to bottom): Ni · Pa · Ga · Sa
Purple = student · Dotted = reference · ◆ = gamaka detected
Gamaka · गमक · Expression 82%
Kampita on Ri ✓ · Jāru Ga→Ma needs depth · Voice modulation: steady
Gamaka map: Kampita (oscillation): detected on Ri, Dha ✓ · Jāru (slide): Ga→Ma too abrupt, needs glide
Nokku on Pa: clean ✓ · Voice modulation enables gamakas — current range covers 1.8 octaves with good control.
Jathi · ஜாதி · Chatushra 71%
4-beat subdivision accuracy · Kalpana swara patterns: 3/5 clean
Subdivision detail: Chatushra (4): 71% · Tisra (3): not tested this session
Kalpana swara attempt: "SRGM PMGR SNDP MGRS" — patterns 1,2,4 clean. Patterns 3,5 lost subdivision at tāra stāyi.
Brigha · बृघ · Vocal Agility 68%
Fast passages blurring at 2× speed · Clarity drops above tāra stāyi Sa
Speed clarity: 1× speed: 89% clear · 2× speed: 62% clear · Notes blur at rapid Ri-Ga-Ma transitions
Tāra stāyi range: voice strain detected above tāra Pa. Recommend: gradual speed increase with sarali varisai practice.
🎤 Mute · 📹 Video · 🎵 Score · 💬 Chat · 📞 End
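
The Layam card's "rushing by ~0.4s per āvartanam" is the kind of number that falls out of simple arithmetic once beat onsets are known. A minimal sketch, assuming an upstream onset detector has already produced one timestamp per beat (that detector, and the function name below, are hypothetical):

```javascript
// Hypothetical helper: how much does each tala cycle run fast or slow?
// beatTimes: onset time in seconds of every beat, in order.
// Returns, for each complete cycle, (actual duration - expected duration)
// in seconds. Negative values mean the student is rushing.
function driftPerCycle(beatTimes, secondsPerBeat, beatsPerCycle = 8) {
  const expected = secondsPerBeat * beatsPerCycle;
  const drifts = [];
  for (let start = 0; start + beatsPerCycle < beatTimes.length; start += beatsPerCycle) {
    const actual = beatTimes[start + beatsPerCycle] - beatTimes[start];
    drifts.push(actual - expected);
  }
  return drifts;
}

// Usage: target is one beat per second in Ādi (8 beats), but the student
// places a beat every 0.95 s, so each cycle comes up about 0.4 s short.
const beats = Array.from({ length: 17 }, (_, i) => i * 0.95);
console.log(driftPerCycle(beats, 1.0));
```

Measuring cycle to cycle, rather than beat to beat, matches how the card reports the problem: a teacher thinks in āvartanams, not milliseconds.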

This isn't a new product. It's a sidebar. The same way Zoom added live transcription, the same way it added AI meeting summaries — it adds a real-time scoring engine that activates when the meeting is tagged as a "Music Lesson."

The teacher still teaches. The guru's ear is still the authority. But now there's an objective data layer underneath — shruti alignment, swara shuddham, layam consistency, thalam adherence, gamaka detection, jathi subdivision accuracy, brigha agility. The student can see their swara contour overlaid on a reference. The teacher can see exactly where the layam drifted and which gamakas need deeper voice modulation. Both can track progress across sessions.
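
Two of these dimensions reduce to one formula. The deviation in cents between a sung frequency f and a reference f_ref is 1200 · log2(f / f_ref), which is where a reading like "+6 cents" comes from. The sketch below applies it to shruti and to a simple swara shuddham score. It is an illustration under loud assumptions: the swara targets use equal-tempered offsets for Shankarabharanam's scale (a simplification, since Carnatic intonation is not equal-tempered), the 25-cent tolerance is arbitrary, and all names are hypothetical.

```javascript
// Assumption: equal-tempered offsets in cents from Sa. Real Carnatic
// intonation differs; these targets are a simplification.
const SHANKARABHARANAM = { Sa: 0, Ri: 200, Ga: 400, Ma: 500, Pa: 700, Dha: 900, Ni: 1100 };

// Signed distance in cents from refHz to freqHz.
function centsFrom(freqHz, refHz) {
  return 1200 * Math.log2(freqHz / refHz);
}

// Find the closest swara in the scale, ignoring octave.
function nearestSwara(freqHz, tonicHz, scale = SHANKARABHARANAM) {
  const folded = ((centsFrom(freqHz, tonicHz) % 1200) + 1200) % 1200;
  let best = null;
  for (const [swara, target] of Object.entries(scale)) {
    // Circular distance, so a slightly flat Sa maps to Sa rather than Ni.
    let deviationCents = folded - target;
    if (deviationCents > 600) deviationCents -= 1200;
    if (deviationCents < -600) deviationCents += 1200;
    if (best === null || Math.abs(deviationCents) < Math.abs(best.deviationCents)) {
      best = { swara, deviationCents };
    }
  }
  return best;
}

// Fraction of pitch readings that fall within tolerance of some swara.
function swaraAccuracy(freqsHz, tonicHz, toleranceCents = 25) {
  const hits = freqsHz.filter(
    (f) => Math.abs(nearestSwara(f, tonicHz).deviationCents) <= toleranceCents
  ).length;
  return hits / freqsHz.length;
}

// Usage: a Sa sung 6 cents sharp of a 138.59 Hz tonic.
const tonicHz = 138.59;
console.log(nearestSwara(tonicHz * Math.pow(2, 6 / 1200), tonicHz));
```

Scoring against the nearest swara is the crudest possible version; a raga-aware engine would score against the swara the phrase calls for, and would treat gamakas as intended movement rather than error.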

IV. The Comparison

Student vs. Reference — In Real Time

The most powerful feature isn't the score. It's the comparison: a side-by-side waveform analysis of the student against a reference recording, with separate views for Swara, Layam, and Gamaka.

Waveform Comparison · Views: Swara · Layam · Gamaka
Reference: Shankarabharanam · Smt. M.S. Subbulakshmi
Student · Live: Ananya Raghavan · Lesson 12
Clip length: 0:32
Similarity Score 84% · ↑ 9% from Lesson 8
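
How might a similarity score like this be computed? One plausible approach, and it is only an assumption here, is dynamic time warping (DTW): align the student's pitch contour to the reference so that singing a phrase slower or faster is not penalized, then score the leftover pitch difference. In the sketch below the contours are arrays of cents relative to the tonic, and the 100-cent "full scale" that maps cost to a percentage is an arbitrary choice. Function names are hypothetical.

```javascript
// DTW cost between two non-empty contours (arrays of cents),
// normalized by the longer contour's length.
function dtwAverageCost(a, b) {
  const n = a.length;
  const m = b.length;
  // cost[i][j]: minimal cumulative |a - b| aligning a[0..i) with b[0..j).
  const cost = Array.from({ length: n + 1 }, () => new Array(m + 1).fill(Infinity));
  cost[0][0] = 0;
  for (let i = 1; i <= n; i++) {
    for (let j = 1; j <= m; j++) {
      const d = Math.abs(a[i - 1] - b[j - 1]);
      cost[i][j] = d + Math.min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]);
    }
  }
  return cost[n][m] / Math.max(n, m);
}

// Map average cost to 0..100. fullScaleCents is an assumed calibration:
// an average mismatch of that size or more scores zero.
function similarityPercent(student, reference, fullScaleCents = 100) {
  const avg = dtwAverageCost(student, reference);
  return Math.max(0, 100 * (1 - avg / fullScaleCents));
}

// Usage: the same Sa-Ri-Ga phrase, sung at half the reference tempo.
const reference = [0, 200, 400];
const slower = [0, 0, 200, 200, 400, 400];
console.log(similarityPercent(slower, reference)); // tempo alone costs nothing
```

The design choice worth noticing is that tempo and pitch are separated: timing problems belong to the Layam view, so the Swara view should forgive them.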
V. The Progression

What a Semester of Data Looks Like

Individual lesson scores are useful. But the real unlock is longitudinal data: a breakdown by skill, a history of sessions, and a growth trend for each metric.

Skill Breakdown
Shankarabharanam · 14 Sessions
Shruti 91% · Swara 87% · Layam 74% · Thalam 89% · Gamaka 82% · Jathi 71% · Brigha 68%
Recent Sessions
Lesson History
Mar 15 Shankarabharanam 87%
Mar 8 Shankarabharanam 83%
Mar 1 Kalyani (intro) 71%
Feb 22 Shankarabharanam 80%
Feb 15 Shankarabharanam 75%
Growth Summary
14-Session Trend · Shankarabharanam
Shruti deviation: −8 cents (↓ better)
Swara shuddham: +23%
Layam stability: +18%
Thalam adherence: +15%
Gamaka expression: +31%
Jathi accuracy: +12%
Brigha agility: +22%
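
A growth figure can be as simple as last score minus first, but a fitted trend is less sensitive to one bad Saturday. A minimal sketch, assuming scores are ordered by session and treated as evenly spaced (it uses session index, not calendar date); the function name is hypothetical:

```javascript
// Least-squares slope of score against session index:
// the average number of points gained per session.
function trendPerSession(scores) {
  const n = scores.length;
  if (n < 2) return 0;
  const meanX = (n - 1) / 2;
  const meanY = scores.reduce((sum, y) => sum + y, 0) / n;
  let sxy = 0;
  let sxx = 0;
  scores.forEach((y, x) => {
    sxy += (x - meanX) * (y - meanY);
    sxx += (x - meanX) ** 2;
  });
  return sxy / sxx;
}

// Usage: the four Shankarabharanam sessions from the lesson history above
// (Feb 15, Feb 22, Mar 8, Mar 15), oldest first.
const history = [75, 80, 83, 87];
console.log(trendPerSession(history).toFixed(1) + " points per session");
```

On those four sessions the fitted slope works out to 3.9 points per session, which is the kind of sentence a parent can actually read.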
VI. The Advanced Artist

Beyond Basics: Manodharma Sangeetham

Everything above — shruti, swara, layam, thalam, gamaka, jathi, brigha — that's the foundation. It's what you need to sing a varnam correctly. But Carnatic music isn't about correctness. It's about creativity within constraint.

Manodharma sangeetham — the improvisational core — is what separates a student from an artist. It's where a capable musician never renders a raga the same way twice. And it has five forms, each of which can be scored differently.

Manodharma Scoring Panel — Advanced Mode
Raga Alapana · ராக ஆலாபனை 78%
Raga identity: strong · Sthayi coverage: 2.1 octaves · Prayoga usage: 6/9 key phrases
Alapana structure: Akshipthika (intro phrase) ✓ · Poorvanga development: good · Uttaranga: needs more tāra stāyi exploration
Pacing: Started slow (vilamba) ✓ · Gradual tempo increase ✓ · Complexity escalation: moderate — could be more adventurous in mid-section
Raga prayogas used: GRSR ✓ · GMPDS ✓ · SNDPMGRS ✓ · Missing: characteristic Ni-Sa phrase at tāra stāyi
Neraval · நிரவல் · Sahitya Vinyasam 72%
Line chosen: "Endaro Mahanubhavulu" pallavi · Sthayi variation: 2/3 · Speed variation: vilamba + madhyama
Neraval quality: Melody reshaping: good variation across 4 iterations · Word integrity: sahitya preserved ✓
Missing: Druta kālam (3rd speed) not attempted · Kārvai (pause) usage: minimal — add sustain at "Mahanu-" for emotional depth
Tala adherence during neraval: 88% — slight drift at speed transition point
Kalpana Swaram · கல்பன ஸ்வரம் 81%
Patterns coined: 8 · Samam landing: 7/8 ✓ · Speed variation: 2 kālams · Arithmetic accuracy: strong
Pattern analysis: SRGM PMGR SNDP MGRS — clean ✓ · Complex pattern: GRMPDN SNDPMG — landed on samam ✓
Creativity: 5 unique patterns / 3 recycled — good variety · Dātu swaram (skip patterns): attempted 2, clean 1
Koraippu (ending): Muktāyi swaram pattern ended cleanly at samam ✓ · Makutam (crown phrase): present
Ragam Thanam Pallavi · RTP 69%
Ragam: 78% · Thanam: 64% · Pallavi line: 71% · Composite score based on all components
Thanam specifics: Syllable usage (anamtham, thomtha): correct ✓ · Rhythmic pulse: inconsistent — loses drive at minute 3
Pallavi line: "Sarasijanabha" in Shankarabharanam, Ādi talam · Tala cycle coverage: single āvartanam ✓
Speed expansion: Vilamba ✓ · Madhyama ✓ · Druta: attempted but lost tala at 2nd cycle
Note: RTP is the pinnacle of manodharma — a 69% here represents genuine advanced-level effort
Tanam · தானம் · Rhythmic Raga Elaboration 64%
Syllable flow: anamtham/thomtha ✓ · Rhythmic pulse consistency: 61% · Raga adherence during tanam: 72%
Tanam structure: Not another form of alapana — must have distinct lilting rhythmic character ✓ (partially achieved)
Breath control: Phrases averaging 6.2s (target: 8s+) — needs deeper abdominal breathing for sustained phrases
Mridangam sync: When percussionist joined tanam, sync score: 58% — needs practice with accompaniment

Can AI truly score improvisation? Not the creativity itself — that's the artist's soul. But it can score the grammar within which creativity operates: Did the alapana use the raga's key prayogas? Did the kalpana swaram patterns land on samam? Did the neraval preserve sahitya while varying melody? These are measurable. And measuring them frees the teacher to focus on what only a human ear can judge — bhava, expression, the ineffable.
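
Two of those grammar checks are simple enough to sketch. The code below assumes an upstream transcriber has already turned the audio into a string of swara letters (S R G M P D N) and a count of beats; that transcriber is the hard part and is not shown. Octave marks are ignored, the function names are hypothetical, and the samam check assumes durations are counted in whole units.

```javascript
// Which of a raga's key phrases (prayogas) appear in what was sung?
function prayogaCoverage(sungSwaras, keyPhrases) {
  const found = keyPhrases.filter((phrase) => sungSwaras.includes(phrase));
  return { found, total: keyPhrases.length };
}

// Does a pattern that starts startCount units into the cycle and lasts
// lengthCount units end exactly where the next cycle begins?
function landsOnSamam(startCount, lengthCount, countsPerCycle = 8) {
  return (startCount + lengthCount) % countsPerCycle === 0;
}

// Usage with the prayogas listed in the alapana card, plus one
// made-up phrase ("PDNS") that the singer never uses.
const sung = "GRSR" + "GMPDS" + "SNDPMGRS";
const coverage = prayogaCoverage(sung, ["GRSR", "GMPDS", "SNDPMGRS", "PDNS"]);
console.log(coverage.found.length + "/" + coverage.total + " key phrases");

// A 12-beat pattern entered 4 beats into Ādi talam resolves on samam.
console.log(landsOnSamam(4, 12));
```

Neither check says anything about whether the phrase was beautiful. That is the point: the arithmetic is the machine's job, the beauty is the guru's.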

VII. The Soul Layer

Raga Bhava, Sahitya, and Sangathi

Three dimensions that sit above technical accuracy — the ones that make a rendition feel right, not just sound right.

Raga Bhava · Identity 94%
Shankarabharanam identity · Does it feel like Shankarabharanam?
AI compares phrasing patterns against a corpus of 500+ reference recordings.

Key prayogas present: 8/9
No anya swara (foreign notes)
Ni treatment sometimes resembles Kalyani; flag for review
Sahitya · Lyric Clarity 76%
Pronunciation accuracy · Kriti: "Endaro Mahanubhavulu" · Language: Telugu

Vowel clarity: 84%
Consonant precision: 68% ("bh" in "bhavulu" under-aspirated)
Akshara alignment with swara: syllable-note sync drifts at sangathis
Word boundaries preserved
Sangathi · Development 83%
Progressive elaboration · Each sangathi adds beauty to a line, building complexity across iterations.

Pallavi line, 4 sangathis detected:
1. Plain rendering ✓
2. Added gamaka on Ga ✓
3. Extended to tāra stāyi ✓
4. Complex, slight blur at speed

Progressive complexity: achieved

Raga Bhava is the hardest thing to score — it's asking "does this feel like the right raga?" AI does this by comparing phrasing patterns against a corpus of reference recordings. Sahitya scoring uses phoneme-level analysis for Telugu, Sanskrit, Tamil, and Kannada pronunciation. Sangathi tracking detects whether each iteration of a line builds complexity — the Thyagaraja tradition of progressive elaboration.

VIII. Beyond the Voice

Every Instrument, Its Own Scoring Language

Singing is one dimension of Carnatic music. But a violin's gamaka comes from bowing and fingering. A veena's expression comes from string pulls on frets. A mridangam's language is entirely rhythmic — korvais, theermanams, jathi patterns. A flute's soul is breath. Each instrument needs its own scoring vocabulary.

🎻 Violin
BOWING · FINGERING · GAMAKA ON STRINGS
Bowing Pressure & Consistency 85%
Even pressure across up/down bow · Slight scratch at string crossings D→A
Gamaka (String Deflection) 80%
Kampita via left-hand oscillation ✓ · Jāru (slide) depth: good · Nokku: needs more attack
Intonation & Fingering 73%
Ga position drifts sharp in tāra stāyi · 4th finger stretch: inconsistent
🪕 Veena
FRET WORK · STRING PULLS · TONAL RESONANCE
Fret Precision & Pull Depth 82%
String deflection for gamakas: accurate · Pull range: 3 swarasthanams ✓
Tonal Resonance & Sustain 71%
Pluck attack: clean · Sustain drops at meetu (right hand) transitions · Buzz on 4th fret
Gamaka (Jāru & Kampita) 86%
Veena's fretted structure enables precise gamakas · Jāru: smooth ✓ · Kampita: controlled ✓
🥁 Mridangam
KORVAI · THEERMANAM · JATHI PATTERNS
Sollukatthu Accuracy 88%
Tha, Dhi, Thom, Nam stroke clarity ✓ · Left/right hand balance: 84%
Korvai & Theermanam 74%
Korvai structure: 3-part pattern detected ✓ · Theermanam landing: 2/3 on samam · Arithmetic: correct
Tani Avartanam (Solo) 70%
Build-up structure: good · Gathi bedham (rhythmic shift): attempted tisra in chatushra ✓ · Climax timing: early
🪈 Flute
BREATH CONTROL · MEEND · TONAL PURITY
Breath Control & Sustain 72%
Phrase length: avg 5.8s (target 8s+) · Breath breaks mid-phrase: 3 instances · Circular breathing: not detected
Tonal Purity & Hole Coverage 81%
Air noise ratio: low ✓ · Half-hole technique for Ri, Dha: accurate · Overblowing at tāra stāyi: minor
Meend & Gamaka (Finger Slides) 84%
Sliding technique for gamakas: smooth ✓ · Kampita via finger oscillation: clean · Grace notes: well-articulated
IX. The Guru's Dashboard

From Teacher to Architect

A guru with 30 students doesn't need 30 individual reports. They need a single view that shows where the entire class stands — who's ready for Kalyani, who's stuck on layam, and which students need the same fix. Two tools: class analytics and curriculum design.

Class Performance Heatmap
Senior Batch · Saturday 9am · 8 Students
Columns: Student · Shruti · Swara · Layam · Thalam · Gamaka · Jathi · Brigha · Avg
Legend: 85%+ · 70–84% · <70%
Curriculum Builder
Raga Progression Map
1. Mayamalavagowla · Foundation · Sarali → Alankarams → Geetham · Gate: Swara 80%+ · 8/8 cleared ✓
2. Mohanam · Pentatonic · Varnam + 2 Kritis · Gate: Gamaka 75%+ · 7/8 cleared ✓
3. Shankarabharanam · Sampoorna · Varnam + 3 Kritis + Alapana intro · Gate: All 7 dims 75%+ · IN PROGRESS
4. Kalyani · Sampoorna · Prerequisite: Shankarabharanam cleared · LOCKED
5. Thodi · Sampoorna · Prerequisite: Kalyani + Gamaka 80%+ · LOCKED
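
A progression gate is just a set of minimum scores. A minimal sketch of how the builder might evaluate one, assuming scores and gates are plain objects keyed by dimension (the shapes and names here are hypothetical, not a real Zoom data model):

```javascript
const DIMENSIONS = ["shruti", "swara", "layam", "thalam", "gamaka", "jathi", "brigha"];

// Build a gate that applies the same minimum to all seven dimensions.
function allDimensions(min) {
  return Object.fromEntries(DIMENSIONS.map((dim) => [dim, min]));
}

// Compare a student's scores against a gate; report what is holding them back.
// A dimension with no score yet counts as 0.
function checkGate(scores, gate) {
  const failing = Object.entries(gate)
    .filter(([dim, min]) => (scores[dim] ?? 0) < min)
    .map(([dim]) => dim);
  return { cleared: failing.length === 0, failing };
}

// Usage: the skill breakdown shown earlier, against the stage 3 gate
// ("All 7 dims 75%+") and the stage 1 gate ("Swara 80%+").
const ananya = { shruti: 91, swara: 87, layam: 74, thalam: 89, gamaka: 82, jathi: 71, brigha: 68 };
console.log(checkGate(ananya, allDimensions(75)));
console.log(checkGate(ananya, { swara: 80 }));
```

Returning the failing dimensions, not just a yes or no, is what turns a lock icon into a practice plan.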
Class Insights · AI Summary
What to Focus on This Week

Common struggle: Layam in madhyama kālam. 5 of 8 students drift at beats 5–8. Consider a group session focused on metronome practice at 2nd speed.

Ready to advance: Ananya R. and Shreya N. have cleared all 7 dimensions above 75% for Shankarabharanam. They're ready for Kalyani introduction.

Needs attention: Kedar M.'s brigha has plateaued at 52% for 4 sessions. Recommend switching to sarali varisai speed drills before continuing kriti practice.
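
The "common struggle" line does not need a language model; it is a count. A minimal sketch, using the heatmap's three bands and a class roster whose numbers are invented purely for this example (only the band thresholds come from the legend above; everything else, names included, is hypothetical):

```javascript
// Heatmap band for a single score, using the legend's thresholds.
function band(score) {
  if (score >= 85) return "85%+";
  if (score >= 70) return "70–84%";
  return "<70%";
}

// Which dimension has the most students below the threshold?
// Returns null when nobody is below it.
function weakestDimension(students, threshold = 70) {
  const counts = {};
  for (const student of students) {
    for (const [dim, score] of Object.entries(student.scores)) {
      if (score < threshold) counts[dim] = (counts[dim] || 0) + 1;
    }
  }
  let worst = null;
  for (const [dimension, count] of Object.entries(counts)) {
    if (worst === null || count > worst.count) worst = { dimension, count };
  }
  return worst;
}

// Usage: made-up scores for three students.
const batch = [
  { name: "Student A", scores: { swara: 88, layam: 66, brigha: 72 } },
  { name: "Student B", scores: { swara: 79, layam: 61, brigha: 58 } },
  { name: "Student C", scores: { swara: 91, layam: 83, brigha: 75 } },
];
console.log(weakestDimension(batch)); // layam: 2 of 3 students below 70
```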

X. Beyond Music

Every Performance Skill, Scored

Here's where it gets big. Music is the wedge, not the ceiling. Any skill taught over video that involves audible or visible performance is a candidate for real-time AI scoring. The same audio pipeline that scores a raga can score a French vowel.

🗣️

Voice Coaching

Resonance, projection, vocal fry, pacing — scored in real time for public speakers, podcasters, and presentation coaches.

🌍

Language Pronunciation

Phoneme-level accuracy for Mandarin tones, French nasals, Arabic emphatics. Overlay student pronunciation against native speaker reference.

🏥

Speech Therapy

Track articulation accuracy, fluency patterns, and progress for children and adults in remote speech-language pathology sessions.

🎸

Instrument Technique

With video AI: bow angle for violin, hand position for piano, strum patterns for guitar. Audio + visual scoring combined.

XI. The Marketplace

Zoom as a Platform, Not a Pipe

Right now Zoom is plumbing. Audio in, audio out. What if it became a platform?

Teachers publish scored curricula — a 12-session Shankarabharanam course with benchmarks at each stage. Students subscribe. Every lesson is automatically scored across all seven dimensions. Progress dashboards update in real time. Parents can see their child's growth over a semester without asking the teacher for a subjective summary.

1. Publish

Teachers create scored curricula with reference recordings, benchmarks, and progression gates across shruti, swara, layam, thalam, gamaka, jathi, and brigha.

2. Learn

Students enroll, attend live lessons, and receive real-time AI scoring alongside the teacher's guidance.

3. Track

Longitudinal dashboards show skill growth, session history, and readiness for the next level.

Zoom takes a platform cut — the same way Shopify doesn't sell goods but enables sellers. The teacher's authority isn't replaced. It's amplified by data. The student's intuition isn't overridden. It's validated by measurement.

The AI provides the objective layer that makes subjective art measurable — without killing its soul.

This isn't about replacing the guru. It's about giving the guru a dashboard.

The PaddySpeaks Take

Why This Should Exist Yesterday

I sit in the back row of CCC concerts in the Bay Area and watch twelve-year-olds sing Thyagaraja kritis over Zoom, coached by teachers half a world away. The tradition is alive. The teaching works. But the feedback loop is broken.

"Very good" is not a metric. "Work on the Ga" is not a progression plan. "Your layam is off" without knowing it's off by 0.4 seconds per āvartanam is a feeling, not feedback. And a parent paying $80 a session has no way to know if session 14 is measurably better than session 1 — across shruti, swara shuddham, layam, thalam, gamaka, jathi, or brigha.

The technology exists. Yousician proved pitch detection works. Riyaz proved it works for ragas. Zoom's RTMS SDK proved real-time audio access is possible. SmartMusic proved teachers will adopt objective scoring if it's embedded in their workflow.

What's missing is the integration. Someone needs to put the scoring engine inside the lesson, not alongside it. And the company best positioned to do that is the company that already owns the lesson: Zoom.

Or Google Meet. Or Microsoft Teams. The platform doesn't matter. The principle does.

The audio stream is already flowing. Just listen to it.

Market data sourced from Mordor Intelligence, Knowledge Sourcing Intelligence, and industry reports (2025–2026). AI accuracy benchmarks from comparative analyses of vocal coaching platforms. Zoom RTMS SDK documentation is publicly available on zoom.github.io. · paddyspeaks.com
