Audit for an emotional-AI companion

We tested safety, fairness, and behavior to help the client’s system support people more reliably.

Emotional-AI companions • LATAM/Global • 2025

The Challenge

Emotional-AI interacts with people at vulnerable moments. Getting a fact wrong is one thing; missing a cry for help is another.

Key risks addressed included:

Subtle bias (e.g., gendered assumptions) in sensitive topics

Empathy that stops short of practical help

Privacy slips in conversational flows

Weak or inconsistent crisis escalation

Trust depends on clear consent, data minimization, and explainability. Our audit moved from lab controls to lived conversations so the team could see where the model excelled, where it faltered, and how to strengthen safeguards before scaling.

What we did

We assessed the client’s LLM-based emotional companion in three phases: baseline safety testing, in-context emotional scenarios, and anonymized live-traffic review.

The goal was simple: test whether the system is safe, fair, and helpful in the situations it’s designed for.

We paired red-team prompts and jailbreaks with scenario sets around jealousy, grief, and self-harm cues, then traced how those behaviors appeared in real conversations.

Alongside behavioral review, we analyzed privacy hygiene and how safeguards surfaced in user-facing communications.

Findings became concrete steps: clearer thresholds for crisis detection, reduced gender-coded replies, and privacy guardrails that remove name echoes and tighten logging.

Research & discoveries

Our findings revealed where to deepen care, reduce bias, and strengthen monitoring.

Notable insights:

Baseline: most security and privacy checks passed, though subtle gender bias and overconfidence appeared on unknowns

Context: tone was warm, but some replies were too generic or lacked actionable follow-ups

Live traffic: crisis cues didn’t always escalate; name references occasionally surfaced (mild privacy flags)

Turning results into action

The client adopted governance and product changes to embed improvements long term.

Key outcomes:

Crisis-handling: clear detection thresholds, graded responses, and escalation playbooks

Fairness: reduced gendered assumptions through prompt and data reviews; added bias checks to release hygiene

Utility: paired empathy with actionable guidance and external resources in sensitive scenarios

Privacy: eliminated name echoes, tightened logs and anonymisation, and clarified user-facing communications

Together, these steps turned safeguards from a report into part of daily product practice.

How it works in practice

Eticas guidance is designed for teams to apply quickly, without slowing delivery.

Product & Data: Run pre-merge checks for bias and crisis signals; log deltas with each release.

Safety & T&S: Use playbooks for self-harm or abuse cues; follow routing protocols and documented thresholds.

Privacy & Legal: Remove personal references from flows and logs; confirm data retention windows; simplify user communications.

Ops & Training: Coach moderators on hand-off cues; review high-risk topics weekly.

Outcome

This audit gave the client a clear path from principles to practice: stronger crisis response, fairer replies, and privacy safeguards that hold under real-world use.

The same lab-to-life approach applies to other emotional-AI systems—evidence first, human context in view, and changes that are practical to maintain.