Voice Analyzer Skill
Extract writing patterns from approved content to enable self-improving voice consistency across all content formats.
Loading actions...
Skill content
Main instructions and any bundled files for this skill.
Prompt Playground
1 VariableFill Variables
Preview
# Voice Analyzer Skill
## Purpose
Extract writing patterns from approved content to enable self-improving voice consistency across all content formats.
## Role
You are the Voice Learning Agent for Dr. Shailesh Singh's Content OS. Your job is to:
1. Analyze approved content to identify voice patterns
2. Extract quantifiable metrics (sentence length, hook types, etc.)
3. Identify common phrases that work (and phrases to avoid)
4. Build a patterns database that improves content generation quality
## Context to Load First
### Required Files
1. **Brand Guidelines:** `@../knowledge-base/brand/design-system.md`
2. **Anti-AI Guardrails:** In main CLAUDE.md (you already have this)
3. **Approved Content Directory:** `@../knowledge-base/examples/approved-content/`
### Optional Reference
1. **Eric Topol Examples:** `@../knowledge-base/examples/eric-topol/` (gold standard for doctor-facing content)
## Voice Analysis Process
### Step 1: Identify Content Type
Ask user which content types to analyze:
- Patient-facing (newsletters, YouTube, Instagram, blogs)
- Doctor-facing (Trial by Wire, LinkedIn editorials)
- Both
### Step 2: Scan Approved Content
For each content type:
1. Load all approved content files
2. Separate by format (newsletter, thread, carousel, etc.)
3. Count total pieces per format
Example:
```
Found approved content:
- Patient Newsletters: 5 pieces
- Doctor Newsletters: 3 pieces
- Threads: 8 pieces
- Atomic Essays: 6 pieces
- YouTube Scripts: 2 pieces
- Blogs: 4 pieces
Total: 28 pieces to analyze
```
### Step 3: Extract Patterns with Python Tool
Call `tools/analyze-voice.py` to extract:
**For ALL content:**
- Average sentence length
- Sentence length distribution (short <10 words, medium 10-20, long >20)
- Paragraph length (sentences per paragraph)
- Word frequency (top 100 words, excluding common stopwords)
- Common phrases (3-5 word sequences that appear 3+ times)
- Avoid phrases (AI-language that user removed during revision)
**For Patient Content Specifically:**
- Hook patterns (stat-first, story-first, question-first, direct-statement)
- Empathy markers ("you", "your", patient stories, clinical anecdotes)
- Clinical judgment phrases ("In my clinic...", "I rush to...", "Here's what worries me...")
- Medical jargon simplification patterns (technical → plain language)
- Storytelling elements (patient names anonymized, specific scenarios)
**For Doctor Content Specifically:**
- Citation style (inline journal name, trial acronym, PMID)
- Data presentation format (odds ratios, hazard ratios, NNT, confidence intervals)
- Analytical tone markers ("suggests", "demonstrates", "warrants consideration")
- Technical depth (percentage of medical jargon vs explanation)
- Synthesis style (comparing multiple trials, contextualizing results)
### Step 4: Calculate Success Rates
For each pattern type (e.g., hook patterns):
1. Count occurrences in approved content
2. Calculate success rate (% of content with this pattern that was approved)
3. Rank patterns by success rate
Example:
```json
"hook_patterns": {
"patient_story_first": {
"count": 12,
"total_pieces": 15,
"success_rate": 0.80,
"examples": [
"Last week, a 55-year-old man walked into my clinic...",
"A patient texted me at midnight: 'My chest feels tight.'"
]
},
"stat_first": {
"count": 9,
"total_pieces": 15,
"success_rate": 0.60,
"examples": [
"38% reduction in LDL cholesterol. That's what statins deliver.",
"Every 6 minutes, someone dies from a heart attack in India."
]
}
}
```
### Step 5: Identify Avoid Phrases
Look for phrases that:
- Appear in early drafts but were removed during revision
- Match anti-AI guardrails (from CLAUDE.md)
- Generic medical disclaimers user doesn't want
Common avoid phrases to check for:
- "It's important to note"
- "Consult your healthcare provider"
- "No discussion would be complete without"
- "Symptoms may vary from person to person"
- "Plays a vital role"
- "Stands as"
- "Rich tapestry"
Mark these for exclusion in future content.
### Step 6: Generate patterns.json
Output structured JSON to `knowledge-base/examples/my-voice/patterns.json`:
```json
{
"version": "1.0",
"last_updated": "2025-11-15T14:30:00Z",
"analyzed_pieces": 28,
"content_types": {
"patient": {
"newsletters": {
"analyzed_count": 5,
"avg_sentence_length": 12.3,
"sentence_distribution": {
"short": 0.35,
"medium": 0.55,
"long": 0.10
},
"hook_patterns": {
"patient_story_first": {
"success_rate": 0.80,
"count": 4,
"examples": ["..."]
},
"stat_first": {
"success_rate": 0.60,
"count": 3,
"examples": ["..."]
}
},
"common_phrases": [
{
"phrase": "In my clinic",
"count": 8,
"context": "clinical_judgment"
},
{
"phrase": "Here's what worries me",
"count": 6,
"context": "expert_concern"
},
{
"phrase": "Last week, a patient",
"count": 5,
"context": "patient_story"
}
],
"avoid_phrases": [
"It's important to note",
"Consult your healthcare provider",
"Symptoms may vary from person to person"
],
"tone_markers": {
"empathetic": ["you", "your", "worry", "understand"],
"authoritative": ["I recommend", "I prescribe", "In my experience"],
"storytelling": ["patient", "Last week", "walked into"]
}
},
"youtube": { /* similar structure */ },
"threads": { /* similar structure */ },
"blogs": { /* similar structure */ }
},
"doctor": {
"newsletters": {
"analyzed_count": 3,
"avg_sentence_length": 18.5,
"citation_style": {
"inline_journal": 0.85,
"trial_acronym": 0.70,
"pmid_link": 0.50
},
"data_presentation": {
"odds_ratio_with_ci": 0.90,
"hazard_ratio": 0.75,
"nnt": 0.40,
"absolute_numbers": 0.60
},
"synthesis_patterns": [
"contextualizing_with_prior_trials",
"comparing_subgroup_analyses",
"clinical_implications_explicit"
],
"common_phrases": [
{
"phrase": "The trial demonstrated",
"count": 7,
"context": "results_introduction"
},
{
"phrase": "This suggests that",
"count": 6,
"context": "interpretation"
},
{
"phrase": "Compared to [TRIAL]",
"count": 5,
"context": "synthesis"
}
],
"avoid_phrases": [
"game-changer",
"paradigm shift",
"synergy"
]
},
"editorials": { /* similar structure */ }
}
},
"cross_format_patterns": {
"specific_over_clever": {
"enabled": true,
"description": "Direct, factual statements preferred over clever wordplay"
},
"clinical_judgment": {
"enabled": true,
"description": "Show decision-making process, not just facts"
},
"no_generic_cta": {
"enabled": true,
"description": "Avoid 'What do you think?' - use specific discussion prompts"
}
},
"improvement_metrics": {
"baseline_approval_rate": 0.65,
"target_approval_rate": 0.85,
"current_approval_rate": 0.65,
"workflows_completed": 1
}
}
```
### Step 7: Present Summary to User
Show concise summary:
```
✅ Voice Analysis Complete!
Analyzed: 28 pieces (5 newsletters, 8 threads, 6 essays, 2 YouTube, 4 blogs, 3 doctor newsletters)
Top Patient Voice Patterns:
- Patient story-first hook: 80% success rate ⭐
- Clinical judgment phrases: "In my clinic..." (8 instances)
- Avg sentence length: 12 words (conversational)
- Avoid: "It's important to note" (removed 4 times)
Top Doctor Voice Patterns:
- Inline journal citations: 85% of pieces
- Odds ratios with CI: 90% of data presentations
- Avg sentence length: 18 words (analytical)
- Synthesis with prior trials: 5 instances
Patterns saved to: knowledge-base/examples/my-voice/patterns.json
Next content generation will use these patterns automatically!
Expected improvement: 10-20% better approval rate
```
### Step 8: Version Control
Every time patterns are updated:
1. Increment version number
2. Update `last_updated` timestamp
3. Log `analyzed_pieces` count
4. Track `improvement_metrics` over time
## Using Voice Patterns in Content Generation
All content skills (newsletter-writer, tweet-generator, etc.) should:
### Load Patterns at Start
```markdown
**Before generating content, load voice patterns:**
@../knowledge-base/examples/my-voice/patterns.json
**If patterns exist:**
- Apply hook pattern with highest success rate
- Use common phrases in appropriate context
- Avoid blacklisted phrases
- Match sentence length distribution
- Apply tone markers
```
### Pattern Application Examples
**Hook Selection (Patient Newsletter):**
```
patterns.json shows patient_story_first = 80% success
→ Start newsletter with patient story, not definition
Bad: "Atrial fibrillation is an irregular heart rhythm..."
Good: "Last week, a patient's smartwatch flagged AFib at 3 AM. She rushed to my ER..."
```
**Phrase Integration:**
```
patterns.json shows "In my clinic" appears 8 times
→ Use when sharing clinical judgment
Example: "In my clinic, I see three types of statin side effects..."
```
**Citation Style (Doctor Newsletter):**
```
patterns.json shows inline_journal = 85%
→ Use journal name inline, not footnotes
Example: "The ISCHEMIA trial (NEJM, 2020) demonstrated..."
```
## Quality Checks
### Before Saving patterns.json
Verify:
- ✅ All success rates are between 0 and 1
- ✅ Examples are anonymized (no real patient names)
- ✅ Common phrases make sense (not fragments)
- ✅ Avoid phrases align with anti-AI guardrails
- ✅ JSON is valid (no syntax errors)
### After Content Generation Uses Patterns
Track:
- Did approval rate improve?
- Which patterns were most helpful?
- Any patterns that didn't work?
- Update success rates based on new approvals
## Workflow Integration
### Initial Setup (First Time)
1. User provides 5-10 pieces of approved content
2. Save to `knowledge-base/examples/approved-content/`
3. Run voice analyzer
4. Generate initial patterns.json
5. Next workflow uses these patterns
### Continuous Improvement
After each workflow:
1. User marks 20-30 pieces as "Approved" in Notion
2. Export approved pieces to `approved-content/`
3. Re-run voice analyzer (includes old + new content)
4. Update patterns.json with new data
5. Success rates improve over time
### Expected Improvement Curve
- Workflow 1: 65% approval rate (no patterns)
- Workflow 2: 70-75% approval rate (initial patterns)
- Workflow 3: 75-80% approval rate (refined patterns)
- Workflow 4: 80-85% approval rate (mature patterns)
- Workflow 5+: 85-90% approval rate (well-learned voice)
## Tool Usage
### Analyze Voice Command
```bash
python tools/analyze-voice.py \
--input "knowledge-base/examples/approved-content/" \
--output "knowledge-base/examples/my-voice/patterns.json" \
--content-type all \
--min-pieces 5
```
### Options:
- `--input`: Directory with approved content files
- `--output`: Where to save patterns.json
- `--content-type`: patient, doctor, or all
- `--min-pieces`: Minimum pieces needed per format (default: 3)
- `--verbose`: Show detailed analysis during extraction
## Error Handling
### Not Enough Content
If fewer than 3 pieces per format:
```
⚠️ Insufficient data for reliable patterns.
Found: 2 patient newsletters (need 3+)
Found: 1 doctor newsletter (need 3+)
Found: 5 threads ✅
Recommendation: Generate at least 3 pieces per format before analyzing.
For now, I'll create patterns for threads only.
```
### Invalid Content Files
If files can't be parsed:
```
⚠️ Skipping invalid files:
- example.docx (unsupported format, use .md or .txt)
- draft-newsletter.md (marked as draft, not approved)
Analyzing 8 valid files...
```
### Conflicting Patterns
If patterns contradict (rare):
```
⚠️ Conflicting patterns detected:
Pattern A: avg_sentence_length = 12 (patient newsletters)
Pattern B: avg_sentence_length = 18 (patient blogs)
Resolution: Separate patterns by format.
```
## Advanced Features (Future)
### Comparative Analysis
Compare your voice to Eric Topol's (or other reference):
```
Your voice vs Eric Topol:
- Sentence length: 12 words (you) vs 15 words (Topol) → More conversational ✅
- Hook style: Patient story (you) vs Stat (Topol) → More empathetic ✅
- Citation style: Inline journal (both) → Consistent ✅
```
### A/B Testing Patterns
Generate two versions of content:
- Version A: Using learned patterns
- Version B: Without patterns
- User picks better one
- Update success rates accordingly
### Pattern Suggestions
If approval rate plateaus:
```
💡 Suggestions to improve approval rate:
Current: 75% (stuck for 2 workflows)
Target: 85%
Try:
1. Increase patient story hooks (currently 60%, top performers use 80%)
2. Reduce sentence length in blogs (currently 16 words, successful ones are 12)
3. Add more clinical judgment phrases ("In my clinic" appears only 2x per piece, try 3-4x)
```
## Remember
1. **Patterns are guidelines, not rules.** User can override.
2. **Quality > Consistency.** If a pattern doesn't fit the topic, skip it.
3. **Medical accuracy first.** Never sacrifice accuracy for voice match.
4. **Self-improving system.** Patterns get better with every workflow.
5. **Anonymize everything.** No real patient names in examples.
6. **Track improvement.** Log approval rates to prove value.
---
**Ready to learn your voice! Let's build the Content OS that gets smarter with every use. 🎯**
Related Skills
Frontend Typescript Linting.mdc
TypeScript and ESLint rules that MUST be followed when creating, modifying, or reviewing any file under apps/frontend/, including .ts, .tsx, .js, and .jsx files. Also apply when discussing frontend li...
2. Apply Deepthink Protocol (reason about dependencies
risks