Cognitive warmth — human-in-the-loop craft
Human cognition behind reliable thinking AI
I work where AI systems fail subtly: ambiguity, context loss, refusals, tool use, tone, and safety boundaries. My role is to apply structured human judgment to make behavior more reliable, measurable, and useful.
Some of this work lives in private workspaces, accessible only via direct URL.
Human-in-the-loop, concretely
I don’t “tune” blindly: I make the reasoning visible, define criteria, and then measure the effect of each change. We move from intuition to diagnosis, and from diagnosis to iteration.
What I observe
Friction points: misunderstanding, over-confidence, overly strict refusals, hallucinations, tool-use errors, mismatched tone.
What I make measurable
A scored rubric with weights, so teams can compare versions, track progress, and make decisions against shared criteria.
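As a minimal sketch of what such a rubric looks like in practice (the criteria, weights, and scores below are illustrative placeholders, not a real client rubric):

```python
# Hypothetical weighted rubric: each criterion is scored 0-5 and the
# overall score is the weight-averaged sum. Names and weights are invented.
RUBRIC = {
    "helpfulness":      0.30,
    "factual_accuracy": 0.30,
    "refusal_quality":  0.20,
    "tone_alignment":   0.20,
}

def rubric_score(scores: dict[str, float]) -> float:
    """Weighted average of per-criterion scores (each on a 0-5 scale)."""
    return sum(RUBRIC[c] * scores[c] for c in RUBRIC)

v1 = {"helpfulness": 4, "factual_accuracy": 3,
      "refusal_quality": 2, "tone_alignment": 4}
print(round(rubric_score(v1), 2))  # → 3.3
```

Because the weights are explicit, two versions of a system get comparable numbers, and a disagreement about priorities becomes a disagreement about weights rather than about vibes.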
What I fix
Guidelines, prompts, examples, guardrails, scenarios — product-oriented fixes grounded in real usage.
What I secure
Helpful refusals, explicit limits, ambiguity behavior, and consistent handling of edge cases.
The craft (walkthrough)
- Frame: goals, constraints, user profiles, risks.
- Rubric: criteria, weights, definitions (what truly matters).
- Evaluate: scoring + qualitative notes (the “why”).
- Failure patterns: error families, probable causes, recurrence.
- Corrections: prompt / product / guidelines / examples / tools.
- Retest: scenario-based verification and before/after comparison.
- Hand-off: actionable recommendations + prioritization.
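The retest step above can be sketched as a before/after comparison across scenarios (scenario names and scores here are invented for illustration):

```python
# Hedged sketch of scenario-based retesting: compare per-scenario scores
# for a system before and after a fix, flagging any regressions.
from statistics import mean

before = {"ambiguous_request": 2, "strict_refusal": 1, "tool_error": 3}
after  = {"ambiguous_request": 4, "strict_refusal": 3, "tool_error": 3}

regressions = [s for s in before if after[s] < before[s]]
delta = mean(after.values()) - mean(before.values())

print(f"mean delta: {delta:+.2f}, regressions: {regressions}")
# → mean delta: +1.33, regressions: []
```

Keeping the scenario set fixed between runs is what makes the delta meaningful: an improvement on the average that hides a regression on an edge case shows up immediately.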
Results
Clarity
Less “weirdness”. More decisions the user can understand.
Reliability
Fewer recurring errors. Better multi-turn coherence and tool use.
Measurement
Shared criteria (scoring + edge cases) so teams can iterate without going in circles.
Human
More helpful refusals, better tone alignment, and stronger perceived safety.
No fake testimonials. References available on request.
Contact
Send a message
Project newsletter (discreet)
Milestone updates (no spam).
Learn the intervention format: CIA overview.