“Some days you’re debugging code. Other days you’re debugging your life choices. This is a story about both.”

Two separate failures ended up in one post because that’s how the month went — back to back, no breathing room. They’re different stories with different lessons, so they’re marked accordingly. You can read one, skip the other, come back. Up to you.

Part I: The CI/CD Ghost Endpoint

Infrastructure Engineering — HNG Internship

The Problem

Everything was working. Then it wasn’t.

@app.get("/stage2")
async def stage2():
    return {"message": "welcome to stage 2"}

Endpoint implemented. Tests passing locally. Code committed and pushed. 404 in production.

Commit e25ac36 — working:

GET /stage2
Response: 200 OK
{"message": "welcome to stage 2"}

Commit f33d3c5 — broken:

GET /stage2
Response: 404 Not Found

CI still green on f33d3c5. Pipeline reporting success. Production returning 404. Something was lying, and it wasn’t the production server.

The Investigation

I did what any reasonable engineer does: checked the logs, questioned the automation, and eventually found the actual culprit buried in the workflow file.

# .github/workflows/deploy.yml
jobs:
  deploy:
    steps:
      - name: Health Check
        run: |
          if ! curl -f $PRODUCTION_URL/health; then
            echo "Health check failed, reverting..."
            git revert HEAD
          fi

There it is. An automated reversion policy designed to protect production — which kicked in because I broke the /health endpoint in the same commit that added /stage2. The pipeline checked health, health failed, pipeline reverted everything. Silently. CI reported green because the revert itself succeeded.

The automation worked exactly as designed. The design was wrong.
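To see why the step stayed green, it helps to run the same shell shape with stand-ins. This is a minimal sketch, assuming a POSIX `sh` is on the machine: `false` plays the failing `curl -f` and `true` plays a `git revert` that succeeds. Both stand-ins and the `run_step` helper are illustrative assumptions, not part of the real workflow.

```python
import subprocess

# Same control flow as the workflow step. `false` stands in for the failing
# health check and `true` for a revert that succeeds (both are stand-ins).
STEP = """
if ! false; then
  echo "Health check failed, reverting..."
  true
fi
"""


def run_step(script: str) -> int:
    """Run a script with errexit on, as a CI step typically would, and return its exit code."""
    result = subprocess.run(["sh", "-e", "-c", script], capture_output=True, text=True)
    return result.returncode


if __name__ == "__main__":
    # The `if` block's status is that of the last command it ran (the revert),
    # and a failing command in the `if` condition does not trigger errexit,
    # so the step exits 0 and the job is reported as successful.
    print("step exit code:", run_step(STEP))
```

The failing check never surfaces as a non-zero exit code because it sits in the `if` condition. Whatever the last command in the branch returns becomes the step's result.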

The Fix

- name: Health Check
  run: |
    # Check multiple critical endpoints
    curl -f "$PRODUCTION_URL/health" || exit 1
    curl -f "$PRODUCTION_URL/api/v1/status" || exit 1

    # Fail the deployment; don't auto-revert
    # Let a human make the revert call

Two changes that matter:

Check more than one endpoint. A single health check is a single point of failure for your entire reversion logic. If that endpoint breaks for any reason — including reasons unrelated to your new code — you silently lose everything you just deployed.

Don’t auto-revert. Fail loudly instead. Automated rollbacks feel safe until they start hiding problems. A failed deployment that pages someone is better than a silent reversion that nobody notices until a user reports a missing endpoint. Make the pipeline fail, make noise, let a human decide what to roll back and what to keep.
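The same gate can live in a small script instead of inline `curl` calls, which makes it easier to report every failing endpoint at once. This is a sketch under stated assumptions: it uses only the Python standard library, the function names (`http_status`, `failed_endpoints`, `main`) are hypothetical, and the paths are the two from the workflow above.

```python
import sys
import urllib.error
import urllib.request
from typing import Callable, Iterable, List


def http_status(url: str, timeout: float = 5.0) -> int:
    """Return the HTTP status for url, or 0 if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # 4xx/5xx responses land here
        return err.code
    except (urllib.error.URLError, OSError):  # DNS failure, refused, timeout
        return 0


def failed_endpoints(
    base_url: str,
    paths: Iterable[str],
    get_status: Callable[[str], int] = http_status,
) -> List[str]:
    """Check every path (no early exit) and describe each one that is not 2xx."""
    failures = []
    for path in paths:
        status = get_status(base_url.rstrip("/") + path)
        if not 200 <= status < 300:
            failures.append(f"{path} -> {status or 'no response'}")
    return failures


def main(argv: List[str]) -> int:
    """Return the exit code for the step: non-zero if any endpoint failed."""
    base = argv[1] if len(argv) > 1 else "http://localhost:8000"  # assumed default
    failures = failed_endpoints(base, ["/health", "/api/v1/status"])
    for line in failures:
        print(f"FAILED {line}", file=sys.stderr)
    return 1 if failures else 0
```

Ending the script with `sys.exit(main(sys.argv))` would make the step fail whenever any endpoint is unhealthy. Nothing here reverts anything; the non-zero exit is the noise.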

What This Actually Taught Me

CI/CD pipelines need the same scrutiny as application code. A green checkmark means the pipeline ran successfully — it does not mean your application is working. Verify actual endpoint behavior after every deployment, not just pipeline status.
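Verifying behavior means checking the response itself, not only that something answered. A minimal sketch, assuming the `/stage2` endpoint and body shown earlier; `fetch`, `evaluate`, and `verify_stage2` are hypothetical helper names, and only the standard library is used.

```python
import json
import urllib.error
import urllib.request
from typing import Tuple


def fetch(url: str, timeout: float = 5.0) -> Tuple[int, str]:
    """Return (status, body text); status is 0 when no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as err:
        return err.code, ""
    except (urllib.error.URLError, OSError):
        return 0, ""


def evaluate(status: int, body: str, expected: dict) -> Tuple[bool, str]:
    """Pass only when the status is 200 and the JSON body equals expected."""
    if status != 200:
        return False, f"expected 200, got {status or 'no response'}"
    try:
        actual = json.loads(body)
    except ValueError:
        return False, "body is not valid JSON"
    if actual != expected:
        return False, f"unexpected body: {actual!r}"
    return True, "ok"


def verify_stage2(base_url: str) -> Tuple[bool, str]:
    """Post-deploy smoke test for the endpoint that went missing."""
    status, body = fetch(base_url.rstrip("/") + "/stage2")
    return evaluate(status, body, {"message": "welcome to stage 2"})
```

Run against production right after a deploy, a check like this would have reported the 404 on f33d3c5 even though the pipeline itself was green.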

One endpoint health check is not a health check. It’s a single bet. Cover your critical paths, and when those fail, make noise — don’t quietly undo work.

Full implementation on GitHub.

Part II: The Hultz Prize

Startup — StudentPad Pitch

The Setup

StudentPad: a platform helping students find affordable housing and roommates. We had an MVP, 60 users on a waitlist, a pitch deck we’d rehearsed over 20 times, and a backup demo video because we weren’t completely naive.

50,000 EUR prize pool. Real judges. Real stakes.

What Happened

We did the preparation correctly. Tested the demo laptop the night before. Verified connectivity. Printed backup slides. Ran through Q&A scenarios. Felt ready.

Then the day arrived.

The venue projector couldn’t talk to the laptop. We switched to the backup laptop. The backup laptop couldn’t connect to the venue WiFi — aggressive firewall, API endpoints blocked. We switched to a mobile hotspot. The hotspot was too slow for a live demo. The audio kept cutting out. The demo site sat on a loading spinner while the audience sat watching it.

We finished on the backup slides. We didn’t win.

The judge feedback afterward: “The idea is solid, but your presentation lacked confidence after the technical issues. In a pitch, perception matters as much as substance.”

That stung because they weren’t wrong.

What the Preparation Actually Missed

We prepared for the product. We didn’t prepare for the room falling apart.

We had a backup laptop but no offline demo. We had a mobile hotspot but no pre-cached version of the app that didn’t need an API call to render. When the connectivity failed, the live demo was dead — and we had no path back to momentum that didn’t involve visibly scrambling.
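One way to get a pre-cached version is to have the demo read through a local snapshot. This is a rough sketch, not how StudentPad was built: `load_with_snapshot` and its arguments are hypothetical, and it assumes the data the demo needs can be serialized as JSON.

```python
import json
from pathlib import Path
from typing import Any, Callable, Tuple


def load_with_snapshot(fetch_live: Callable[[], Any], snapshot: Path) -> Tuple[Any, str]:
    """Return (data, source). Prefer live data; fall back to the saved snapshot."""
    try:
        data = fetch_live()
    except Exception:
        # Any live failure (firewall, slow hotspot, DNS) falls back to disk.
        # If no snapshot exists yet, reading it raises and the failure is visible.
        return json.loads(snapshot.read_text(encoding="utf-8")), "snapshot"
    # Refresh the snapshot on every successful live call so it stays current.
    snapshot.write_text(json.dumps(data), encoding="utf-8")
    return data, "live"
```

Running the demo once on a good connection the night before fills the snapshot. If `fetch_live` is given a short timeout, a blocked API at the venue becomes a quick fallback instead of a loading spinner.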

The technical issues weren’t the real failure. Losing control of the room after them was. The audience takes its cue from the presenter. When the presenter’s energy drops, so does the room’s. We let the broken projector set the tone for the next 30 minutes.

The backup plan hierarchy that should exist for any high-stakes demo:

1. Primary demo (live, connected)
2. Backup device
3. Mobile hotspot
4. Pre-recorded demo video (fully offline, no dependencies)
5. Slide deck with annotated screenshots
6. Verbal walkthrough you could do with nothing but a whiteboard

We had 1 through 3 and 5. The gap was 4 — a pre-recorded video that shows exactly what the product does with zero dependency on connectivity. That’s the one that saves you.

What I’d Do Differently

Practice the failure scenario, not just the pitch. We ran through the presentation 20+ times under ideal conditions. We never once practiced what happens when the projector dies in minute two. That’s the rehearsal that actually matters for high-stakes demos.

Don’t apologize for technical issues — acknowledge and move. Every second spent visibly stressed about a broken projector is a second the audience spends losing confidence. Say it once, move past it, keep your energy where it belongs.

The story is the product, not the demo. We got so focused on showing the live platform that when the live platform stopped cooperating, we didn’t have a strong enough verbal narrative to carry the room. If you can’t pitch your product compellingly without opening a browser, the product story isn’t sharp enough yet.

The Honest Takeaway

Good tech, real problem, solid preparation — and we still lost the room because we weren’t ready for the environment to fight us.

That’s a founder lesson as much as a technical one. The demo gods are not on your side. The WiFi will fail. The projector will fail. Plan for it specifically, not generally.

#DevOps #CI/CD #Startups #Infrastructure #Pitching

About the Author

Elijah Udom is an Infrastructure & Cloud Engineer and startup founder based in Lagos, Nigeria, working across AWS, Kubernetes, eBPF security, and AI/ML infrastructure.


Detective Work: Solving HNG’s Ghost Endpoints & Surviving Hultz Prize