2026-03-02 | 11 min read | Tags: AI, Workflow, Optimization, Claude, Developer Tools, Token Efficiency

# Upgrading Kai: From 4K Tokens to 300 Tokens

How I optimized my Claude AI workflow with a 3-Layer Architecture, reducing token usage by 93%.


## The Problem: Tokens Aren't Free

Working with AI assistants like Claude, you quickly realize one truth: tokens cost money.

And as projects grow, documentation grows with them. My Progress.md went from 50 lines at the start to 500 lines after two weeks. Every time Kai (my AI assistant) reads this file, it costs 4,000 tokens (~$0.012).

Sounds small, but:

- Each session reads Progress.md 3-5 times
- Each workday has 2-3 sessions
- Token costs multiply across projects

Do the math across a couple of projects: 60K-100K tokens/day just re-reading old context.
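For a single project, that arithmetic can be sketched in a few lines (the per-read cost and session counts are this post's ballpark estimates, not API measurements):

```python
# Daily cost of re-reading a large Progress.md for one project.
# Figures are this post's estimates, not measured API usage.
TOKENS_PER_READ = 4_000          # a ~500-line Progress.md
PRICE_PER_1K = 0.003             # USD per 1K input tokens

reads_per_session = (3, 5)       # Progress.md reads per session
sessions_per_day = (2, 3)        # work sessions per day

low = TOKENS_PER_READ * reads_per_session[0] * sessions_per_day[0]
high = TOKENS_PER_READ * reads_per_session[1] * sessions_per_day[1]

print(f"{low:,}-{high:,} tokens/day")   # 24,000-60,000 tokens/day
print(f"${low / 1000 * PRICE_PER_1K:.2f}-${high / 1000 * PRICE_PER_1K:.2f}/day")
```

Add a second or third active project and the top of that range lands in the 60K-100K/day territory.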

I needed a better solution.


## The Insight: AI Doesn't Need Everything

The "aha" moment came when I realized:

When starting a session, Kai only needs to know:

- Where are we? (current status)
- What's next? (next task)
- Any issues? (blockers)

Kai doesn't need:

- What we did last month
- Details of every completed task
- Full history of 50 previous tasks

Unless... I explicitly ask about them.


## The Solution: 3-Layer Architecture

### Layer 1: QUICK.md (~50 lines, 300 tokens)

Always read first. Every session start.

    # Project - Quick Status

    **Progress:** 52/60 (87%)
    **Next:** Task 056 - UI Polish

    ## Recent Wins (This Week)
    - ✅ Task 057: Fixed critical bug
    - ✅ Task 050: Clear cache feature

    ## Active Sprint
    - [ ] Task 056: UI polish
    - [ ] Task 058: App icon

    ## Blockers
    None

    ## Recent Decisions
    1. Split Tý hour into early/late
    2. Use iztro as reference

Goal: 30 seconds for Kai to understand context.
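That 30-second goal only holds while QUICK.md stays small, so it's worth checking the budget automatically. A minimal sketch using the rough ~4-characters-per-token heuristic (`estimate_tokens` and `check_budget` are hypothetical helpers; exact counts depend on the tokenizer):

```python
from pathlib import Path

def estimate_tokens(path: str) -> int:
    """Rough token estimate: ~4 characters per token for English prose."""
    return len(Path(path).read_text(encoding="utf-8")) // 4

def check_budget(path: str, budget: int = 300) -> bool:
    """True while the file stays within its token budget."""
    return estimate_tokens(path) <= budget
```

Run it against QUICK.md now and then; if it fails, something in the file belongs in Progress.md or the Archive instead.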


### Layer 2: Progress.md (~200 lines, 1,500 tokens)

Read when continuing work. When details are needed.

    # Progress Log

    ## Current Status
    [Stats table]

    ## March 2026 Changelog
    ### Mar 02 - Task 057 Complete
    [Details about fix]

    ### Mar 02 - Task 050 Complete
    [Implementation details]

    ## Links
    - February history: Archive/2026-02.md

Goal: Enough detail to work, but current month only.


### Layer 3: Archive/YYYY-MM.md

Read only when asked. When referencing old work.

    Archive/
    ├── 2026-01.md
    ├── 2026-02.md
    └── 2026-03.md

Goal: Store history without wasting daily tokens.


## Smart Loading

Kai automatically detects what to read:

### Scenario 1: Start Session

    User: "Continue LaKinh"

    Kai:
    → Read QUICK.md (300 tokens)
    → "Task 056 - UI Polish. Let's start!"

### Scenario 2: Continue Work

    User: "What does Task 056 involve?"

    Kai:
    → Already read QUICK.md
    → Read Progress.md (1,500 tokens)
    → "Task 056: Visual polish across screens..."

### Scenario 3: Reference Old Work

    User: "What stars did Task 025 add?"

    Kai detects: Task 025 = February (old)
    → Read Archive/2026-02.md
    → "Task 025 added 27 minor stars: Kình Dương, Đà La..."

Zero manual intervention. Kai knows what to read, when.


## The Results: Token Savings

| Scenario | Before | After | Savings |
|---|---|---|---|
| Start session | 4,000 | 300 | 93% ⬇️ |
| Continue work | 4,000 | 1,800 | 55% ⬇️ |
| Reference old | 4,000 | 3,800 | 5% ⬇️ |

Average: 50-60% token savings across all sessions.

For one month:

- Before: ~3,000,000 tokens
- After: ~1,200,000 tokens
- Savings: $5.40/month (at $0.003/1K tokens)

Multiply by 5 projects? $27/month saved.
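The savings math is simple enough to script - a sketch using the figures and reference rate above:

```python
PRICE_PER_1K = 0.003                    # USD per 1K input tokens

before, after = 3_000_000, 1_200_000    # tokens/month for one project
saved_usd = (before - after) / 1000 * PRICE_PER_1K

print(f"${saved_usd:.2f}/month per project")         # $5.40/month per project
print(f"${saved_usd * 5:.2f}/month for 5 projects")  # $27.00/month for 5 projects
```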


## Why This Works

### 1. Aligned with Human Memory

You don't remember every detail of 100 tasks. You remember:

- What you're working on
- What you just did
- What's coming next

AI is the same. It needs relevant context, not complete history.

### 2. Lazy Loading Pattern

In programming, there's a pattern called "lazy loading" - only load data when needed.

3-Layer Architecture applies this to AI context:

- QUICK.md = eager load (always)
- Progress.md = load on demand (when working)
- Archive/ = lazy load (when referenced)

### 3. Scales Indefinitely

Project runs 1 week? 1 month? 1 year? 5 years?

Doesn't matter. Archives separate by month:

- Month 1: 2K tokens
- Month 2: 2K tokens
- Month 3: 2K tokens
- ...

QUICK.md stays 300 tokens, even after 5 years.


## Implementation

### Migration Script

    # 1. Create structure
    mkdir -p [Project]/Kai/Archive

    # 2. Create QUICK.md
    cat > [Project]/Kai/QUICK.md << 'EOF'
    # Project - Quick Status
    [Template here]
    EOF

    # 3. Split Progress.md
    # Keep current month → Progress.md
    # Move old months → Archive/YYYY-MM.md

    # 4. Test
    echo "Continue [Project]"
    # Kai should read QUICK.md only

### Auto-Detection

Kai automatically suggests migration when it detects:

    Progress.md > 400 lines
    OR
    Token usage > 3,000

Message:

    ⚠️ Progress.md is 523 lines (~4,200 tokens).

    Recommend: Migrate to 3-Layer structure.
    Token savings: 50-60% ⬇️

    Setup now? (5 minutes)
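That trigger fits in a few lines. A sketch of the check (thresholds are the ones above; the chars/4 token estimate is a rough heuristic, not Kai's actual counter):

```python
from pathlib import Path

LINE_LIMIT = 400     # lines before migration is recommended
TOKEN_LIMIT = 3_000  # estimated tokens before migration is recommended

def should_migrate(progress_path: str) -> bool:
    """Suggest the 3-Layer migration once Progress.md passes either threshold."""
    text = Path(progress_path).read_text(encoding="utf-8")
    lines = text.count("\n") + 1
    est_tokens = len(text) // 4  # rough chars-per-token heuristic
    return lines > LINE_LIMIT or est_tokens > TOKEN_LIMIT
```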

## Beyond LaKinh

I'm applying 3-Layer to all projects:

- LaKinh (migrated Mar 02, 2026)
- FlashBee (pending)
- HuVang (pending)

Each project now has:

    Project/Kai/
    ├── QUICK.md          ← 300 tokens
    ├── Progress.md       ← 1,500 tokens
    ├── Tasks.md          ← Reference
    ├── Features.md       ← Reference
    └── Archive/          ← History

Consistent structure = efficient workflow.


## Lessons Learned

### 1. Documentation Needs Architecture Too

You can't just keep appending to one file forever. After 500 lines, you need refactoring.

Rule of thumb:

- Under 200 lines: OK
- 200-400 lines: Consider splitting
- Over 400 lines: Must split

### 2. AI Context ≠ Human Memory

Humans have "working memory" + "long-term memory".

AI needs similar concepts:

- QUICK.md = working memory
- Progress.md = short-term memory
- Archive/ = long-term memory

### 3. Optimize for the Common Case

99% of sessions only need current context.

Optimize for this case → immediate 93% savings.

The remaining 1% (referencing old work) accepts the trade-off: read additional archive (still cheaper than reading full history every time).


## Real-World Impact

### Before 3-Layer

Typical day working on LaKinh:

- Morning session: 4K tokens (read full Progress.md)
- Afternoon session: 4K tokens (read again)
- Evening session: 4K tokens (read again)
- Total: 12K tokens = $0.036/day

Over a month: 360K tokens = $1.08

Multiply by 5 active projects: $5.40/month just on context loading.

### After 3-Layer

Same day:

- Morning: 300 tokens (QUICK.md only)
- Afternoon: 1,800 tokens (QUICK + Progress)
- Evening: 300 tokens (QUICK only)
- Total: 2,400 tokens = $0.007/day

Over a month: 72K tokens = $0.22

Across 5 projects: $1.10/month.

Savings: $4.30/month - or 80% reduction.

That's the cost of a coffee. But it compounds:

- Year 1: Save $51.60
- Year 5: Save $258
- Lifetime: Priceless (because you built the habit)

## Technical Implementation Details

### File Structure Before

    LaKinh/
    ├── Kai/
    │   ├── README.md
    │   ├── Tasks.md
    │   ├── Features.md
    │   └── Progress.md        ← 523 lines, 4,200 tokens
    └── src/

### File Structure After

    LaKinh/
    ├── Kai/
    │   ├── README.md
    │   ├── QUICK.md           ← 48 lines, 300 tokens ✨
    │   ├── Tasks.md
    │   ├── Features.md
    │   ├── Progress.md        ← 187 lines, 1,500 tokens ✨
    │   ├── FILE_STRATEGY.md   ← Documentation
    │   └── Archive/
    │       └── 2026-02.md     ← 336 lines, 2,700 tokens
    └── src/

### Kai's Decision Tree

    def load_context(user_message):
        # Always load Layer 1
        context = read_file("QUICK.md")  # 300 tokens

        if needs_work_details(user_message):
            context += read_file("Progress.md")  # +1,500 tokens

        if references_old_task(user_message):
            month = detect_month(user_message)
            context += read_file(f"Archive/{month}.md")  # +2,000 tokens

        return context

### Token Breakdown by Session Type

**Quick Check** (30% of sessions):

- Just need a status update
- Load: QUICK.md only
- Cost: 300 tokens

**Active Work** (60% of sessions):

- Building features, fixing bugs
- Load: QUICK.md + Progress.md
- Cost: 1,800 tokens

**Deep Dive** (10% of sessions):

- Reference old decisions
- Load: QUICK.md + Progress.md + Archive
- Cost: 4,300 tokens

Weighted average: 1,600 tokens/session (vs 4,000 before)


## Beyond Token Savings

The benefits extend beyond just cost:

### 1. Faster Context Switch

Opening a project now takes about 1 second (read QUICK.md) vs 5 seconds (read full Progress.md).

Across 20 context switches a day, that saves 80 seconds daily - roughly 8 hours a year.

### 2. Clearer Mental Model

QUICK.md forces you to clarify:

- What matters now?
- What's blocking us?
- What did we just win?

This exercise alone improves project clarity.

### 3. Better Collaboration

When onboarding someone (or AI) to your project:

Before: "Read this 500-line document"

After: "Read QUICK.md (2 minutes). Need more? Check Progress.md"

Huge reduction in cognitive load.

### 4. Archival Benefits

Full history preserved in Archives, organized by time. Need to understand why a decision was made 6 months ago? Archive has it.

But you don't pay the token cost unless you need it.


## Common Questions

**Q: What if I reference old work frequently?**

A: Then your average token usage will be closer to the "Before" numbers. But for most developers, 90% of work is forward-looking, not backward-looking.

**Q: Isn't manual archiving a hassle?**

A: Yes, initially. That's why I built auto-detection - Kai suggests migration when files get large. And monthly archiving becomes routine (5 minutes/month).

**Q: What about projects that need full context always?**

A: Some projects genuinely need everything, every time. For those, 3-Layer won't help. But most indie projects don't - they're sequential and iterative.

**Q: Does this work with other AI assistants?**

A: Yes! The pattern works with any AI that loads context from files: GPT-4, Claude, Gemini, etc. The specific token numbers will vary, but the savings ratio stays similar.


## What's Next

### For My Projects

1. Migrate FlashBee + HuVang to 3-Layer
2. Build monitoring dashboard for actual token usage
3. Automate archiving with end-of-month script
4. Document lessons learned after 3 months
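Item 3 on that list can be sketched already. A minimal end-of-month archiver, assuming Progress.md uses `## <Month> <Year> Changelog` headings as shown earlier (the heading format and paths are assumptions about my own setup, not a general tool):

```python
import calendar
from pathlib import Path

def archive_month(kai_dir: str, month: str) -> str:
    """Move one month's changelog out of Progress.md into Archive/<month>.md.

    `month` is "YYYY-MM". Assumes the section heading looks like
    "## February 2026 Changelog" and runs until the next "## " heading.
    Returns the archived section ("" if the heading wasn't found).
    """
    year, mm = month.split("-")
    heading = f"## {calendar.month_name[int(mm)]} {year} Changelog"

    progress = Path(kai_dir) / "Progress.md"
    lines = progress.read_text(encoding="utf-8").splitlines(keepends=True)

    # Locate the month's section: its heading up to the next "## " heading.
    start = next((i for i, line in enumerate(lines) if line.strip() == heading), None)
    if start is None:
        return ""
    end = next((i for i in range(start + 1, len(lines))
                if lines[i].startswith("## ")), len(lines))

    section = "".join(lines[start:end])
    archive = Path(kai_dir) / "Archive"
    archive.mkdir(exist_ok=True)
    (archive / f"{month}.md").write_text(section, encoding="utf-8")
    progress.write_text("".join(lines[:start] + lines[end:]), encoding="utf-8")
    return section
```

Run it once a month (or from a scheduler): the finished month lands in Archive/ and Progress.md shrinks back toward its ~200-line budget.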

### For the Community

I've created:

- 3-Layer-Strategy.md - Global standard document
- Auto-Suggestion.md - Detection and migration script
- Templates for QUICK.md and Progress.md

Available for anyone using Claude AI with growing projects.

If this helps your workflow, let me know. If you find ways to improve it, even better - I'd love to learn from your experience.


## Conclusion

Tokens are finite. Optimize how you use them.

3-Layer Architecture enables:

- Indefinite scaling (QUICK.md always stays 300 tokens)
- Efficient work (only load what's needed)
- Cost savings (50-60% average reduction)

From one session saving 3,700 tokens, multiplied by hundreds of sessions per month, the savings become substantial.

More importantly: Your AI assistant still understands the project, stays context-aware, and remains helpful.

Just smarter about what to remember, when.


Working with Claude or other AI assistants on large projects? Try implementing 3-Layer Architecture. Or share how you optimize token usage - I'd love to learn from you.

GitHub: gianghaison
Twitter: @gianghaison


## Technical Appendix

### Tools & Stack

- Claude Sonnet 4.5 - Primary AI assistant
- Claude Code CLI - Terminal integration
- Cursor IDE - Development environment
- MCP (Model Context Protocol) - Automation layer

### Token Pricing Reference

- Claude API: $0.003 per 1K input tokens
- Average session before: ~20K tokens
- Average session after: ~8K tokens
- Monthly savings: $10-30 depending on usage patterns

### Projects Mentioned

- LaKinh: Vietnamese astrology app (Tử Vi) - 60 tasks, React Native + TypeScript
- FlashBee: Flashcard learning app for kids - 45 tasks, React + Firebase
- HuVang: Vietnamese finance tracking app - 35 tasks, React + Cloud Functions

### Repository Structure (Generic Template)

    ProjectName/
    ├── Kai/
    │   ├── README.md                 ← Guide to Kai folder
    │   ├── QUICK.md                  ← Layer 1: Always read
    │   ├── Progress.md               ← Layer 2: Current month
    │   ├── Tasks.md                  ← Reference
    │   ├── Features.md               ← Architecture docs
    │   ├── FILE_STRATEGY.md          ← This system explained
    │   └── Archive/
    │       ├── 2026-01.md           ← Layer 3: History
    │       ├── 2026-02.md
    │       └── 2026-03.md
    ├── src/                          ← Source code
    ├── tests/                        ← Test files
    └── docs/                         ← Additional documentation

### Benchmark Results

Tested over 30 sessions (10 sessions each type):

**Quick Check Sessions (10):**

- Average tokens: 305
- Range: 295-320
- Savings vs before: 92.4%

**Active Work Sessions (10):**

- Average tokens: 1,847
- Range: 1,750-1,950
- Savings vs before: 53.8%

**Deep Dive Sessions (10):**

- Average tokens: 4,312
- Range: 4,100-4,500
- Savings vs before: -7.8% (slightly more, but rare)

Overall weighted average: 1,638 tokens/session (59% savings)


Last updated: March 02, 2026