2026-03-17 | 6 min read | devlog, xulangedu, excel, parser, indie-maker

# The Day We Taught a School System to Read Excel

There's a specific kind of satisfaction that comes from solving a problem you didn't know existed until you stared at someone else's spreadsheet for twenty minutes.

Today was that kind of day.


## The Setup

XuLangEdu is a school management system I'm building for Dũng — a friend who runs a tutoring center in Lạng Sơn. Real students. Real teachers. Real money changing hands every month.

For months, Dũng has been managing everything in Excel. Attendance sheets. Fee statements. Teacher schedules. All of it lives in a web of spreadsheets that only makes sense if you built them yourself.

My job today: make the app understand those spreadsheets.


## The Files

Dũng uses two types of Excel files every month:

Fee statements — the source of truth. Student names, fees per session, total owed, attendance marks, parent contact info. Messy but complete.

Attendance files — teacher-facing. Class schedules, which teacher taught which session, notes about students. More structured, different enough to break a naive parser.

Five files total. Two subjects (General Math THCS, Specialized Grade 9). One afternoon to make them importable.

I started confident. I ended humbled.


## The Parser That Kept Lying

The first version worked. Sort of.

Upload a file, get a summary: "12 classes, 452 students." I felt good for about three minutes.

Then I opened the student list.

Nguyễn Thanh Dũng. Liễu Thanh Tùng. Ngô Thế Giang.

Those aren't students. Those are teachers. The parser had cheerfully included the teacher roster — repeated across five different class sheets — in the student count.

Here's what happened: Each sheet in the Specialized Grade 9 file has two tables. The first is the student roster (mostly empty template slots with value 0). The second is a statistics block: "Number of students," "Average sessions," and then — the teacher name mapping table.

The parser saw STT (the row-number column) counting 1, 2, 3... next to names that weren't 0, and thought: students.

Fix 1: Stop parsing when you hit the first blank row after finding a real student. (Each table is separated by exactly one blank row.)

Fix 2: Add a blacklist of stat keywords — "Số HS lớp," "Công trung bình," "Tổng số ca dạy được."

Fix 3: If the sheet has no real session dates (all zeros in the date row), it's a template. Skip it.

Three fixes. One root cause: I hadn't read the file carefully enough before writing the code.
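
Here is a minimal sketch of how the three fixes fit together. It is not the production parser: the column layout (STT in column 0, name in column 1, dates from column 2) and the names `parseStudents`, `hasRealDates`, and `isStatRow` are assumptions for illustration, and the student names are placeholders.

```typescript
// One array of raw cell values per worksheet row (hypothetical shape).
type Row = unknown[];

// Fix 2: stat labels that must never be read as student names.
const STAT_KEYWORDS = ["Số HS lớp", "Công trung bình", "Tổng số ca dạy được"];

// Assumed layout: STT in column 0, name in column 1, dates from column 2.
const NAME_COL = 1;
const FIRST_DATE_COL = 2;

const isBlank = (cell: unknown): boolean =>
  cell === null || cell === undefined || String(cell).trim() === "";

const isBlankRow = (row: Row): boolean => row.every(isBlank);

// Fix 3: a sheet is a template when every date cell is blank or 0.
function hasRealDates(dateRow: Row): boolean {
  return dateRow
    .slice(FIRST_DATE_COL)
    .some((cell) => !isBlank(cell) && String(cell).trim() !== "0");
}

function isStatRow(name: string): boolean {
  return STAT_KEYWORDS.some((keyword) => name.startsWith(keyword));
}

// Returns the student names from the first table of a sheet only.
function parseStudents(dateRow: Row, bodyRows: Row[]): string[] {
  if (!hasRealDates(dateRow)) return []; // Fix 3: template sheet, skip it

  const students: string[] = [];
  for (const row of bodyRows) {
    // Fix 1: the first blank row after a real student ends the roster.
    if (isBlankRow(row)) {
      if (students.length > 0) break;
      continue;
    }
    const cell = row[NAME_COL];
    const name = isBlank(cell) ? "" : String(cell).trim();
    if (name === "" || name === "0") continue; // empty template slot
    if (isStatRow(name)) continue; // Fix 2: stat label, not a person
    students.push(name);
  }
  return students;
}

// Placeholder data: two students, one empty slot, then the stats block.
const sampleRows: Row[] = [
  [1, "Student A", 1, 1],
  [2, "Student B", 1, 0],
  [3, 0, 0, 0],
  [],
  ["", "Số HS lớp", 2],
  [1, "Nguyễn Thanh Dũng", "Dũng"],
];
const roster = parseStudents(["STT", "Họ tên", "03/02", "10/02"], sampleRows);
// roster is ["Student A", "Student B"]; the teacher row is never reached.
```

The blank-row stop does most of the work. The keyword list is a second line of defense for sheets where the separator row is missing.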


## The Teacher Name Problem

Teachers in this system have three names:

- Short code (used in attendance cells): "Dũng", "Tùng", "Giang"
- Medium name (used in headers): "Thanh Dũng", "Ngô Giang"
- Full legal name (in the mapping table): "Nguyễn Thanh Dũng", "Liễu Thanh Tùng"

One person. Three representations. Scattered across two different file types.

The solution was hiding in plain sight: every Excel file has a sheet called "Mã hóa DL đầu vào" — roughly, "Data Input Encoding." It's a lookup table mapping short codes to full names.

1. Nguyễn Thanh Dũng → Dũng
2. Liễu Thanh Tùng   → Tùng  
3. Ngô Thế Giang     → Giang

Read this table first. Build a lookup. Normalize every teacher name before storing it.
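
A rough sketch of that lookup, assuming the encoding sheet has already been read into `[fullName, shortCode]` pairs. The function names are hypothetical. The last-word fallback for medium names like "Ngô Giang" is my own assumption, and it would need a rethink the day two teachers share a given name.

```typescript
// One row of the "Mã hóa DL đầu vào" sheet: [fullName, shortCode].
// The two-column shape is an assumption for this sketch.
type EncodingRow = [string, string];

// Normalize Unicode form and collapse stray whitespace before comparing.
const clean = (s: string): string =>
  s.normalize("NFC").trim().replace(/\s+/g, " ");

function buildTeacherLookup(rows: EncodingRow[]): Map<string, string> {
  const lookup = new Map<string, string>();
  for (const [fullName, shortCode] of rows) {
    const full = clean(fullName);
    lookup.set(clean(shortCode).toLowerCase(), full);
    lookup.set(full.toLowerCase(), full); // full names resolve to themselves
  }
  return lookup;
}

// Returns the full legal name, or the cleaned input when nothing matches
// so that unknown names can be flagged for review instead of dropped.
function normalizeTeacher(raw: string, lookup: Map<string, string>): string {
  const cleaned = clean(raw);
  const direct = lookup.get(cleaned.toLowerCase());
  if (direct !== undefined) return direct;

  // Assumed fallback: treat the last word of a medium name as the short code.
  const lastWord = cleaned.split(" ").pop() ?? "";
  return lookup.get(lastWord.toLowerCase()) ?? cleaned;
}

const teacherLookup = buildTeacherLookup([
  ["Nguyễn Thanh Dũng", "Dũng"],
  ["Liễu Thanh Tùng", "Tùng"],
  ["Ngô Thế Giang", "Giang"],
]);
const fromShortCode = normalizeTeacher("Dũng", teacherLookup); // "Nguyễn Thanh Dũng"
const fromMediumName = normalizeTeacher("Ngô Giang", teacherLookup); // "Ngô Thế Giang"
```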


## The Space Problem Nobody Warns You About

Five sheet names in the Specialized Grade 9 fee statement don't match their counterparts in the attendance file. Not by much. Just a space.

- Fee statement: "10. Trung"
- Attendance file: "10.Trung"

One has a space after the period. One doesn't. Same class. Different key. No merge.

The fix: normalize sheet names before comparing. Strip all spaces and periods, lowercase everything. "10. Trung" and "10.Trung" both become "10trung". Match.
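
As a sketch, the key function is only a few lines. `sheetKey` is a hypothetical name. Dropping periods along with spaces is inferred from the "10trung" result, and the NFC step is an extra assumption so that differently encoded Vietnamese diacritics can't split a key.

```typescript
// Builds a comparison key for a sheet name: same class, same key.
function sheetKey(name: string): string {
  return name
    .normalize("NFC") // assumed: unify composed/decomposed diacritics
    .toLowerCase()
    .replace(/[\s.]+/g, ""); // drop every whitespace character and period
}

const feeKey = sheetKey("10. Trung"); // "10trung"
const attendanceKey = sheetKey("10.Trung"); // "10trung"
```

The key is for matching only. The original sheet name is still what should be shown on the review page.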

It's the kind of bug that makes you stare at your screen for a while before you realize you're looking at a whitespace character.


## What We Actually Built Today

By end of day, the import pipeline does this:

  1. Upload any Excel file → system auto-detects which month(s) it belongs to
  2. Separates data into staging buckets, one per month
  3. Fee statements: students, fees, attendance records
  4. Attendance files: class schedules, full teacher names (normalized)
  5. Review page: three tabs (Classes, Students, Warnings)
  6. Merge logic: class info from attendance files enriches fee statement data
  7. Confirm → writes to Production
  8. Re-import button if you discover errors after confirming
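
Step 6 is the one worth sketching. The shapes below (`FeeClass`, `AttendanceClass`, `MergedClass`) and the function name `mergeClasses` are illustrative, not the real staging schema. The sketch assumes both sides already carry a normalized sheet-name key, and that anything unmatched should land in the Warnings tab rather than fail the import.

```typescript
// Hypothetical staging shapes; field names are illustrative only.
interface FeeClass {
  key: string; // normalized sheet name
  sheetName: string; // original name, kept for display
  students: string[];
}

interface AttendanceClass {
  key: string;
  teachers: string[]; // full names, already normalized
}

interface MergedClass extends FeeClass {
  teachers: string[];
}

interface MergeResult {
  classes: MergedClass[];
  warnings: string[];
}

// Fee statements are the source of truth: every fee class survives the
// merge, and attendance data only adds to it.
function mergeClasses(fee: FeeClass[], attendance: AttendanceClass[]): MergeResult {
  const byKey = new Map<string, AttendanceClass>();
  for (const a of attendance) byKey.set(a.key, a);

  const warnings: string[] = [];
  const matched = new Set<string>();

  const classes = fee.map((f): MergedClass => {
    const match = byKey.get(f.key);
    if (match === undefined) {
      warnings.push(`No attendance sheet found for "${f.sheetName}"`);
      return { ...f, teachers: [] };
    }
    matched.add(f.key);
    return { ...f, teachers: match.teachers };
  });

  // Attendance sheets with no fee statement are reported, never imported.
  for (const a of attendance) {
    if (!matched.has(a.key)) {
      warnings.push(`Attendance sheet "${a.key}" has no fee statement`);
    }
  }
  return { classes, warnings };
}

// Placeholder data: one class matches, one does not.
const merged = mergeClasses(
  [
    { key: "10trung", sheetName: "10. Trung", students: ["Student A"] },
    { key: "9a", sheetName: "9A", students: ["Student B"] },
  ],
  [{ key: "10trung", teachers: ["Ngô Thế Giang"] }],
);
```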

Also: the old "type the month manually" input is gone. The file knows what month it is. We just ask it.
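
For a sense of what "asking the file" can look like, here is a sketch that assumes the month appears in a header string as "Tháng N", the way the old template sheets are labeled. `detectMonth` and `bucketByMonth` are hypothetical names, the header strings in the example are made up, and the year is ignored to keep the sketch short.

```typescript
// Extracts the month number from header text such as "Tháng 12".
// Returns null when no valid month (1 to 12) is found.
function detectMonth(headerText: string): number | null {
  const normalized = headerText.normalize("NFC").toLowerCase();
  const match = /tháng\s*(\d{1,2})/.exec(normalized);
  if (match === null) return null;
  const month = Number(match[1]);
  return month >= 1 && month <= 12 ? month : null;
}

// Groups sheets into one staging bucket per detected month.
function bucketByMonth<T extends { header: string }>(sheets: T[]): Map<number, T[]> {
  const buckets = new Map<number, T[]>();
  for (const sheet of sheets) {
    const month = detectMonth(sheet.header);
    if (month === null) continue; // a real pipeline would raise a warning here
    const bucket = buckets.get(month) ?? [];
    bucket.push(sheet);
    buckets.set(month, bucket);
  }
  return buckets;
}

// Made-up headers: two February sheets and one leftover December template.
const monthBuckets = bucketByMonth([
  { header: "Tháng 2", name: "10. Trung" },
  { header: "THÁNG 2 NĂM 2026", name: "9A" },
  { header: "Tháng 12", name: "Template" },
]);
```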


## The Number That Checked Out

At the end of the day, the system reported: 29 classes, 408 students for February 2026.

I ran the numbers manually against the raw Excel. Cross-referencing sheet by sheet, student by student.

29 classes. 408 students. ✓

The 10 "missing" classes? Template sheets with "Tháng 12" (December) in the header: placeholder sheets from three months ago with no real data. The parser correctly ignored them.

It's a small thing. But when a number you computed matches a number you counted by hand, something settles.


## One Bug Still Open

The teacher names aren't showing up in the review page yet. The parser reads them correctly — I verified it locally. The staging document exists in Firestore. But the merge isn't triggering.

Root cause, narrowed down tonight: a TypeScript type issue. unknown[] doesn't play nicely with .slice() at runtime. The hasRealDates check that gates the whole merge was silently returning false for every sheet.

Fix pushed. Deploy in progress. Test tomorrow.


## The Thing About Building for Real People

Dũng is going to use this. Not in a "maybe someday" way. In a "next month's attendance sheets are due" way.

That changes how you think about bugs. A parser that miscounts students isn't a test failure. It's a number Dũng trusts, puts in a report, tells parents about.

I've built a lot of things nobody ever really used. This one has a deadline and a face attached to it.

That's uncomfortable in the best possible way.


Total commits today: 17. Tasks completed: 049A through 049I. One bug open. One very tired designer-who-codes.