
I built an AI voice receptionist called Olivia for a company in Florida. She answers inbound calls, collects the caller's address and property type, identifies what the call is about, and routes it to the right team.

The system prompt I wrote for her, the written instructions she follows on every call, started tidy. About two thousand words, organised into named sections that made sense at the time. 


There was a "behaviour" section on how she should sound, a "flow" section on the order of questions, a "format" section on how to read details back to the caller, and a few more sections for edge cases, examples, and key rules. 

Each section earned its place.

Then testing started.


I made a test call, something went wrong, so I added a rule to handle it. The next call revealed another problem, and I patched that too. 

By lunch, I had made a dozen edits, and by the end of the day, I was closer to two dozen. The prompt was no longer the thing I wrote. It was the thing my tests wrote.


Each individual edit looked fine. The trouble was that nothing had been taken away.

A hundred edits in, I opened the prompt to make a small change and could not find where the rule lived. It was in three places: two copies worded slightly differently, and a third I had added three days earlier and already forgotten about. That was the moment I stopped patching and ran a clean-up pass.

What's a clean-up pass?

It is a deliberate edit session that reorganises the voice prompt without changing what the agent does. Same behaviour, fewer rules, each instruction in one obvious place.

The prompt had picked up scar tissue in predictable places:

  • A rule about property type inference (working out whether the caller is residential or commercial) in the examples section.
  • A near-duplicate of that rule in the flow steps.
  • A rule about not collapsing confirmations (reading details back one at a time rather than all at once) in the behaviour section.
  • A softer reminder of the same thing in the confirmations section.

Each one had been added because a real test had failed, which meant none of them was wrong on its own.

But the prompt had bloated. 


After the clean-up, each behaviour had one home. Flow steps pointed to the examples section instead of repeating it. 

The prompt got about twenty percent shorter. The more important number was that the next time I needed to change a rule, I knew where it was.

Why does this matter?

When the same rule appears slightly reworded in three places, you have effectively given the model three rules to weigh.

Two of them might look identical to you, but not to the model, which picks one based on context rather than on what you intended.

The output looks almost right, except in the cases where it does not, and you have no idea which copy of the rule was responsible. On top of that, every duplicate adds tokens to every call your agent handles.

How do I run a clean-up pass?

I do five checks, in roughly this order. The first one does most of the work.


#1 One behaviour, one home. 

If the same instruction shows up in two sections, keep the cleaner version and delete the other.

When another section needs to refer to it, add a one-line pointer instead of repeating the rule. Most of a clean-up pass is just this check, applied over and over.


Before:

[Examples]

"my house" or "my home" = residential.

"my office" or "my shop" = commercial.

[Flow]

Step 3: Identify the property type.

"my house" / "my home" → residential.

"my office" / "my shop" → commercial.


After:

[Examples]

"my house" or "my home" = residential.

"my office" or "my shop" = commercial.

[Flow]

Step 3: Identify the property type. See Examples for inference patterns.
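Finding duplicates by eye gets harder as the prompt grows. As a minimal sketch (not a tool the check above relies on), the Python below flags lines that look alike but sit in different sections. It assumes sections are marked with bracketed headers such as `[Examples]` and `[Flow]`, treats each non-blank line as one rule, and uses the standard library's `difflib.SequenceMatcher` with an arbitrary 0.75 similarity threshold. The function names are hypothetical.

```python
import difflib
import re
from itertools import combinations


def split_sections(prompt: str) -> dict[str, list[str]]:
    """Group non-blank lines under the most recent [Section] header."""
    sections: dict[str, list[str]] = {}
    current = "(no section)"
    for raw in prompt.splitlines():
        line = raw.strip()
        if not line:
            continue
        header = re.fullmatch(r"\[(.+)\]", line)
        if header:
            current = header.group(1)
            sections.setdefault(current, [])
        else:
            sections.setdefault(current, []).append(line)
    return sections


def normalise(line: str) -> str:
    """Lower-case and keep only words, so '=' vs '→' or quote marks don't count as differences."""
    return " ".join(re.findall(r"[a-z0-9]+", line.lower()))


def find_near_duplicates(prompt: str, threshold: float = 0.75):
    """Return (similarity, section_a, line_a, section_b, line_b) for similar lines in different sections."""
    sections = split_sections(prompt)
    entries = [(name, line) for name, lines in sections.items() for line in lines]
    hits = []
    for (sec_a, a), (sec_b, b) in combinations(entries, 2):
        if sec_a == sec_b:
            continue  # only repeats across sections break "one behaviour, one home"
        ratio = difflib.SequenceMatcher(None, normalise(a), normalise(b)).ratio()
        if ratio >= threshold:
            hits.append((round(ratio, 2), sec_a, a, sec_b, b))
    return sorted(hits, reverse=True)


BEFORE = """
[Examples]
"my house" or "my home" = residential.
"my office" or "my shop" = commercial.
[Flow]
Step 3: Identify the property type.
"my house" / "my home" → residential.
"my office" / "my shop" → commercial.
"""

for hit in find_near_duplicates(BEFORE):
    print(hit)
```

It only catches rules worded alike. A rule restated in entirely different words still needs a human read, which is why this stays a check you do yourself.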


#2 Bullets beat sentences for patterns. 

If the rule is that "my house" should be treated as residential, a paragraph of explanation is overkill.

State the rule once at the top of the section ("Treat as residential:"), then list the patterns as bullet points underneath. Bullets are easier for me to scan and easier for the model to match against.


Before:

If the caller says "my house", treat it as residential. If they say "my home", also residential. If they say "my apartment" or "my flat", residential too.

If they say "my office", that is commercial. If they say "my shop" or "my store", also commercial. If they say "my warehouse", commercial.


After:

Treat as residential:

- my house

- my home

- my apartment / my flat

Treat as commercial:

- my office

- my shop / my store

- my warehouse
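If your prompt is assembled by a script rather than edited by hand (an assumption; the checks above work either way), the same patterns can be kept as data and rendered as bullets. The hypothetical `render_patterns` helper below also raises an error when one phrase is listed under two labels, which enforces check #1 at the same time.

```python
def render_patterns(patterns: dict[str, list[str]]) -> str:
    """Render {label: phrases} as a 'Treat as <label>:' line followed by one bullet per phrase."""
    seen: dict[str, str] = {}
    blocks = []
    for label, phrases in patterns.items():
        for phrase in phrases:
            # One behaviour, one home: a phrase may map to only one label.
            if phrase in seen and seen[phrase] != label:
                raise ValueError(f"{phrase!r} listed as both {seen[phrase]} and {label}")
            seen[phrase] = label
        lines = [f"Treat as {label}:"] + [f"- {phrase}" for phrase in phrases]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks)


# Hypothetical config mirroring the "After" example above.
PROPERTY_TYPES = {
    "residential": ["my house", "my home", "my apartment / my flat"],
    "commercial": ["my office", "my shop / my store", "my warehouse"],
}

print(render_patterns(PROPERTY_TYPES))
```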


#3 Keep format details out of flow steps. 

A flow step like "confirm the address individually with em dash and digit-by-digit ZIP" is doing two jobs at once.

It is telling the agent what to do, and also telling it how to read the result aloud, since the em dash and the digit-by-digit spelling are formatting cues for the text-to-speech engine. 

Move the formatting back to the format section. The flow step should just say "confirm the address." If the format section is doing its job, the rest follows.


Before:

[Flow]

Step 5: Confirm the address back to the caller. Read each part individually, with an em dash before any number, and spell the ZIP code digit by digit.


After:

[Flow]

Step 5: Confirm the address back to the caller.

[Format]

Numbers: prefix with an em dash before reading aloud.

ZIP codes: read digit by digit.
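A crude lint can catch formatting leaking back into flow steps after a round of patches. This sketch assumes bracketed section headers and a hand-picked `FORMAT_CUES` list; both the list and the function name are illustrative, not taken from any real tool.

```python
import re

# Hypothetical cue list: phrases that describe *how* to say something, which
# belong in [Format] rather than [Flow]. Adjust to your own prompt's vocabulary.
FORMAT_CUES = [r"em dash", r"digit[- ]by[- ]digit", r"\bspell\b", r"read each part"]


def flow_format_leaks(prompt: str) -> list[str]:
    """Return lines inside the [Flow] section that contain a formatting cue."""
    in_flow = False
    leaks = []
    for raw in prompt.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\[.+\]", line):
            in_flow = line == "[Flow]"  # entering any other section ends Flow
            continue
        if in_flow and any(re.search(cue, line, re.IGNORECASE) for cue in FORMAT_CUES):
            leaks.append(line)
    return leaks


BEFORE = (
    "[Flow]\n"
    "Step 5: Confirm the address back to the caller. Read each part individually, "
    "with an em dash before any number, and spell the ZIP code digit by digit."
)

AFTER = (
    "[Flow]\n"
    "Step 5: Confirm the address back to the caller.\n"
    "[Format]\n"
    "Numbers: prefix with an em dash before reading aloud.\n"
    "ZIP codes: read digit by digit."
)

print(flow_format_leaks(BEFORE))  # one leaking line
print(flow_format_leaks(AFTER))   # []
```

Keyword matching will miss cues phrased differently, so treat a clean result as "nothing obvious", not "nothing there".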


#4 Sometimes the fix is deletion. 

When something goes wrong, the instinct is to write a new rule.

Often, that is right. But after a few rounds of patches, the better fix is usually to find a rule that has drifted out of date and rewrite it, or to remove a rule whose original failure no longer applies.

I miss this one constantly, because adding feels safer than deleting.


Before (three rules accumulated from separate patches):

If the caller asks about emergency service, give them the after-hours number.

After 6pm, route the call to the on-call technician.

Confirm the issue is an emergency before routing after hours.


After:

After 6pm, confirm the issue is an emergency, then route to the on-call technician.
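One habit that makes deletion less scary, offered as a suggestion rather than part of the workflow described here: record, for each rule, the test call that made you add it. When that test is retired or rewritten, the rule becomes a candidate for review. The `Rule` class, `stale_rules`, and the test IDs below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    text: str
    section: str
    added_for: str  # ID of the test call whose failure prompted the rule (hypothetical)


def stale_rules(rules: list[Rule], active_tests: set[str]) -> list[Rule]:
    """Rules whose motivating test is no longer in the suite: review these for rewrite or deletion."""
    return [rule for rule in rules if rule.added_for not in active_tests]


# The three patched emergency rules from the "Before" example, tagged with made-up test IDs.
rules = [
    Rule("If the caller asks about emergency service, give them the after-hours number.", "Flow", "T07"),
    Rule("After 6pm, route the call to the on-call technician.", "Flow", "T12"),
    Rule("Confirm the issue is an emergency before routing after hours.", "Flow", "T15"),
]

# Suppose T07 was retired once after-hours calls started going to a technician.
for rule in stale_rules(rules, active_tests={"T12", "T15"}):
    print(f"[{rule.section}] {rule.text}")
```

A stale rule is a prompt to look, not an automatic delete. Merging the remaining two rules into one, as in the example above, is still a judgement call.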


#5 Trust the obvious. 

If you ask for "full name," you do not need a parenthetical explaining that the caller might give first and last name in one go.

The model handles that without help. Every parenthetical you add is one more thing competing for attention later.


Before:

Ask the caller for their full name (this could be first and last name spoken together, or just first name; handle both cases gracefully).


After:

Ask the caller for their full name.
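Long parentheticals are easy to spot mechanically. This minimal sketch lists every (...) aside longer than a word limit. The limit of four words is an arbitrary assumption, and short asides are left alone because they are often fine.

```python
import re


def long_parentheticals(prompt: str, max_words: int = 4) -> list[str]:
    """Return the text of every (...) aside longer than max_words words."""
    return [
        match.group(1)
        for match in re.finditer(r"\(([^()]*)\)", prompt)
        if len(match.group(1).split()) > max_words
    ]


BEFORE = (
    "Ask the caller for their full name (this could be first and last name spoken "
    "together, or just first name; handle both cases gracefully)."
)

print(long_parentheticals(BEFORE))
print(long_parentheticals("Ask the caller for their full name."))  # []
```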

When do I run a clean-up pass?

I run a clean-up pass roughly every 8 to 10 edits. If you run one after every edit, you can get stuck in a testing loop.

This is because after every clean-up, you need to re-run all prior tests. I have wasted whole afternoons trying to clean up after every edit.
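To keep to that cadence without counting in your head, here is a minimal sketch of an edit log. The class name, the default of 8 (the low end of the range above), and representing each test as a Python callable that returns pass or fail are all assumptions; in practice, a test would mean replaying a call against the agent.

```python
from typing import Callable


class PromptEditLog:
    """Counts prompt patches and says when a clean-up pass is due."""

    def __init__(self, cleanup_every: int = 8):
        self.cleanup_every = cleanup_every
        self.edits_since_cleanup = 0
        # Every regression test collected so far; each returns True on pass.
        self.tests: list[Callable[[], bool]] = []

    def record_edit(self, regression_test: Callable[[], bool]) -> bool:
        """Log one patch with the test that motivated it; return True when a clean-up is due."""
        self.tests.append(regression_test)
        self.edits_since_cleanup += 1
        return self.edits_since_cleanup >= self.cleanup_every

    def finish_cleanup(self) -> list[int]:
        """Re-run every prior test; reset the counter only if all pass. Returns failing indices."""
        failures = [i for i, test in enumerate(self.tests) if not test()]
        if not failures:
            self.edits_since_cleanup = 0
        return failures


log = PromptEditLog(cleanup_every=8)
for n in range(1, 9):
    if log.record_edit(lambda: True):  # stand-in for replaying a real test call
        print(f"clean-up due after edit {n}")
print(log.finish_cleanup())  # [] means the clean-up kept behaviour intact
```

The counter only resets when every earlier test passes, on the reasoning that a clean-up which changes behaviour is not finished yet.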