Agents vs Humans: When AI Assistants Save the Day (And When They Don't)
Picture this: It's 11pm on a Tuesday. I've been debugging a mobile header for three days. Then an AI agent solves it in 12 minutes. This is the story of building PhysicianTechnologist with Claude, discovering when agents excel, when they fail spectacularly, and how collaboration shaped everything.

Dr. Chad Okay
NHS Resident Doctor & Physician-Technologist
Lessons from Building a Medical Platform with Claude
"The agent had been running for 12 minutes. When I checked back, it had found the CSS bug that had tormented me for three days. I stared at the screen, feeling a mixture of relief and embarrassment. How had I missed something so obvious?"
That was the moment I realised the true power (and limitations) of AI agents in development. Throughout the PhysicianTechnologist build, I used Claude with specialised tools extensively. This isn't a story about AI replacing developers. It's about discovering when agents excel, when they fail spectacularly, and how our collaboration shaped the entire development process.
Let me take you through what actually happened, bug by bug, failure by failure, and victory by victory.
Chapter 1: Where Agents Dominated
The CSS Stacking Context Investigation
Picture this: It's 11pm on a Tuesday. I've been debugging the mobile header for three straight days. The header was invisible on mobile Safari even though the component state was correct. I'd checked everything twice, then checked it again. My browser history was a graveyard of Stack Overflow posts and obscure CSS documentation.
In desperation, I turned to the agent with this directive:
# scripts/debug-mobile-header.sh - My last resort at 11pm
"Perform a deep dive into the codebase to understand why the
header isn't showing on mobile browsers. Analyse CSS files,
component hierarchy, and mobile-specific behaviours."
Twelve minutes. That's all it took. The agent systematically scanned all CSS files for transform properties, mapped component hierarchies, identified stacking context creation patterns, cross-referenced with mobile Safari documentation, and found the exact issue: transforms were creating stacking contexts that broke position:fixed children.
I could have found this eventually, but the agent's ability to systematically analyse hundreds of files in seconds was humbling. It didn't get tired, didn't skip files because they "looked fine", and didn't assume anything.
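For anyone who hasn't hit this particular trap, here's a minimal sketch of the pattern the agent flagged. The component names and Tailwind classes are illustrative, not the actual PhysicianTechnologist code: any non-none transform on an ancestor becomes the containing block for its position:fixed descendants, so the "fixed" header stops pinning to the viewport and can end up clipped inside the animated wrapper on mobile Safari.

// Illustrative sketch only - component names and classes are not from the real codebase.

// Broken: the animated wrapper applies a transform, which becomes the containing
// block for the "fixed" header, so the header no longer pins to the viewport.
export function BrokenLayout() {
  return (
    <div className="translate-y-0 transition-transform">
      <header className="fixed top-0 inset-x-0 z-50">PhysicianTechnologist</header>
      <main>{/* page content */}</main>
    </div>
  );
}

// Fixed: keep the fixed header outside any transformed ancestor.
export function FixedLayout() {
  return (
    <>
      <header className="fixed top-0 inset-x-0 z-50">PhysicianTechnologist</header>
      <div className="translate-y-0 transition-transform">
        <main>{/* page content */}</main>
      </div>
    </>
  );
}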
The Great API Audit of 2am
The API consistency audit happened at 2am on a Thursday (why do I always tackle these tasks at ungodly hours?). I had 22 API endpoints that needed checking for response format consistency. My human estimate was 2-3 hours of tedious, mind-numbing checking.
The agent completed it in 3 minutes.
Not only did it finish quickly, but it produced a comprehensive report identifying 8 different response formats, field naming inconsistencies, and missing error handling in places I hadn't even thought to check. The systematic nature meant it caught patterns I would have missed even after three cups of coffee.
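The fix that came out of that report was boring but effective: one response envelope for every route. Here's a minimal sketch of the shape I settled on, with illustrative field names rather than the exact PhysicianTechnologist contract:

// Illustrative sketch - field names are assumptions, not the production contract.
type ApiResponse<T> =
  | { success: true; data: T }
  | { success: false; error: { code: string; message: string } };

// Every route handler returns through one of these helpers, so clients can
// discriminate on `success` instead of guessing between eight formats.
function ok<T>(data: T): ApiResponse<T> {
  return { success: true, data };
}

function fail(code: string, message: string): ApiResponse<never> {
  return { success: false, error: { code, message } };
}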
The Colour Hunt That Saved My Sanity
I needed to find every hardcoded colour in the codebase to implement proper theme support. The codebase had grown to 150+ files. The thought of manually checking each one made me consider a career change.
The agent found them all: bg-green-600 hiding in a forgotten component, text-red-500 in error messages I'd written months ago, border-amber-200 in a tooltip I didn't even remember creating. Every single hardcoded colour, even in files I'd forgotten existed. Perfect performance, zero complaints, no coffee breaks needed.
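The replacement itself is mechanical, which is exactly why it suits an agent. A hedged sketch of one before/after, with a made-up semantic class name standing in for the project's real theme token:

// Illustrative only - "text-destructive" stands in for whatever semantic token the theme defines.

// Before: a hardcoded palette colour the theme system can't touch.
export function UploadErrorBefore() {
  return <p className="text-red-500">Upload failed. Please try again.</p>;
}

// After: a semantic token that resolves to different values in light and dark mode.
export function UploadErrorAfter() {
  return <p className="text-destructive">Upload failed. Please try again.</p>;
}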
Chapter 2: Where Humans Excelled
The Scrollytelling Vision
No agent could have conceived the scrollytelling hero concept. That moment of inspiration ("what if medical content could be as engaging as a parallax gaming website?") came at 3am while I was procrastinating by browsing award-winning web designs. The creative vision of making dense medical content engaging through parallax effects and dynamic headers was purely human imagination.
The agent could implement it once I described it, but that initial "what if..." moment? That spark of connecting two unrelated concepts? Pure human creativity fuelled by too much caffeine and not enough sleep.
The Theme Colour Psychology
Choosing colours for a medical platform isn't just about hex codes. I spent hours considering what colours convey trust in healthcare, how medical professionals perceive different shades, and the cultural context of the UK NHS.
The agent could tell me that teal is "associated with healthcare" but couldn't grasp why I chose that specific shade of teal (#0F766E) that subtly reminded users of NHS scrubs without being too obvious. It couldn't understand the psychological weight of colour in a medical context where trust literally saves lives.
The "It Feels Wrong" Moments
Something felt off about the header appearing at 0.75 viewport heights. No metric said it was wrong. The agent thought it was fine. But it felt too early, too eager, like someone interrupting a conversation.
I changed it to 2.0 viewport heights at 4am on a Saturday. This kind of intuitive UX decision, based on nothing but gut feeling and years of scrolling websites, was beyond any agent's capability.
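For the curious, the threshold itself is a one-line change; the judgement is knowing which number feels right. A minimal sketch of the kind of hook involved, assuming a React implementation; the real component may differ:

import { useEffect, useState } from "react";

// Show the header only after the user has scrolled past a given number of
// viewport heights (0.75 felt pushy; 2.0 felt right).
export function useShowHeader(thresholdViewports = 2.0) {
  const [visible, setVisible] = useState(false);

  useEffect(() => {
    const onScroll = () =>
      setVisible(window.scrollY > window.innerHeight * thresholdViewports);
    onScroll(); // initialise on mount
    window.addEventListener("scroll", onScroll, { passive: true });
    return () => window.removeEventListener("scroll", onScroll);
  }, [thresholdViewports]);

  return visible;
}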
Chapter 3: The Spectacular Failures
The Dark Mode Visibility Disaster
Here's where things got embarrassing. The agent successfully converted all colours to theme-aware variables. I was impressed. The code looked clean. The agent was confident. I deployed to staging.
Then I switched to dark mode.
The Twitter thread UI was completely invisible. White text on white background. The agent had been working with code, not visual output. It couldn't "see" that the mathematically correct colour transformation had created an unusable interface. I spent another hour at 1am manually testing every component in both themes.
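The eventual guard against a repeat was embarrassingly simple. Here's a minimal sketch of the kind of check that would have caught it, assuming a flat token map; the token names and hex values are illustrative, not the real theme:

// Illustrative sketch - token names and values are assumptions, not the production theme.
const themes = {
  light: { background: "#ffffff", foreground: "#0f172a" },
  dark: { background: "#0b1120", foreground: "#f8fafc" },
};

// Relative luminance per the sRGB formula.
function luminance(hex: string): number {
  const [r, g, b] = [1, 3, 5]
    .map((i) => parseInt(hex.slice(i, i + 2), 16) / 255)
    .map((c) => (c <= 0.03928 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrast(a: string, b: string): number {
  const [hi, lo] = [luminance(a), luminance(b)].sort((x, y) => y - x);
  return (hi + 0.05) / (lo + 0.05);
}

// Fail loudly if any theme pairs text and background too closely (WCAG AA is 4.5:1).
// White-on-white, as in the Twitter thread UI, scores 1:1 and fails immediately.
for (const [name, { background, foreground }] of Object.entries(themes)) {
  const ratio = contrast(background, foreground);
  if (ratio < 4.5) throw new Error(`${name}: contrast ${ratio.toFixed(2)}:1 is below 4.5:1`);
}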
Chapter 4: The Perfect Collaboration Pattern
Through trial and error (emphasis on error), I discovered the optimal collaboration pattern.
First, I define the vision. "I need a scrollytelling homepage that showcases medical articles," I tell the agent. It implements the technical details while I focus on the creative aspects.
When something breaks, I provide the context: "The header isn't showing on mobile." The agent analyses the entire codebase for CSS issues while I test the suggested fixes on real devices.
For subjective decisions, I make the call. The agent presents five technically correct colour options. I choose the teal that "feels more medical" based on intuition it could never possess.
For repetitive refactoring, the agent takes over completely. "Replace all hardcoded colours with semantic variables," I instruct, and it systematically updates 47 instances across 9 files while I grab another coffee.
The Data: What Actually Happened
My git history tells the real story.
Tasks the agent completed successfully included CSS stacking context analysis (saving 3 days), API consistency audit (saving 3 hours), hardcoded colour replacement (saving 2 hours), TypeScript type generation from database schemas, systematic file renaming across the project, and dependency vulnerability scanning.
Tasks requiring my intervention included the scrollytelling concept and UX design, theme colour selection based on medical context, mobile device testing on real phones, database schema verification against business logic, visual bug identification that required human eyes, and business logic decisions that needed domain knowledge.
The Productivity Reality Check
Before using agents, I averaged 3-4 bugs fixed per day, refactored code at about 100 lines per hour, and took hours to days to identify systematic issues.
With agents, I now average 8-10 bugs fixed per day, refactor at about 500 lines per hour, and identify systematic issues in minutes.
But velocity isn't everything. Every piece of agent-generated code required my review. About 30% of suggestions needed modifications. Around 5% of changes introduced subtle bugs that only appeared later.
The Cost-Benefit Analysis
The time saved was significant. CSS debugging went from 3 days to 12 minutes. API audit dropped from 3 hours to 3 minutes. Colour replacement fell from 2 hours to 5 minutes. Total time saved in September: approximately 40 hours.
But there were hidden costs. I spent about 8 hours reviewing agent suggestions, 4 hours fixing agent-introduced bugs, and 3 hours re-explaining context when the agent lost track. Total overhead: approximately 15 hours.
Net benefit: 25 hours saved with demonstrably higher code quality and consistency.
The Philosophy That Emerged
After a month of intensive agent-human collaboration, working through late nights and early mornings, I've developed this philosophy:
Agents are tools, not teammates. They excel at mechanical tasks but lack context, intuition, and the ability to understand why something matters to humans. They don't know that medical professionals trust certain colours or that a header appearing too early feels pushy.
The best code comes from human creativity augmented by agent capability. I provide vision and judgement; agents provide speed and consistency. Neither of us could build PhysicianTechnologist alone.
Trust but verify has become my mantra. Agents can be confidently wrong. Every suggestion needs human review, especially when dealing with external systems or visual outputs. That dark mode disaster taught me this lesson permanently.
Lessons Learned (The Hard Way)
- Agents excel at systematic analysis - Let them find patterns across large codebases. They don't get tired at 3am.
- Humans excel at creative vision - Don't ask agents to innovate. They can't imagine a scrollytelling medical blog.
- Collaboration multiplies capability - One human plus one agent achieves more than two humans or two agents ever could.
- Context is everything - Agents fail when they lack access to visual output or cultural understanding.
- Verification is mandatory - Never deploy agent code without testing, especially UI changes.
The Bottom Line
Building PhysicianTechnologist with agent assistance was transformative. I shipped in weeks what might have taken months. But the agents didn't build it. I didn't build it alone either. We built it together, each doing what we do best.
The future of development isn't human versus AI. It's human with AI, each compensating for the other's weaknesses. The agent never gets tired at 2am. I never lose sight of why the user experience matters.
And sometimes, at 11pm on a Tuesday, when an agent finds in 12 minutes what took me three days to miss, that collaboration feels like magic.

Dr. Chad Okay
I am a London‑based NHS Resident Doctor with 8+ years' experience in primary care, emergency and intensive care medicine. I'm developing an AI‑native wearable to tackle metabolic disease. I combine bedside insight with end‑to‑end tech skills, from sensor integration to data visualisation, to deliver practical tools that extend healthy years.