🩺 El bisturí no firma la cirugía

Un equipo conecta un agente a su repositorio, a la base de datos y al tablero ejecutivo. La demo tarda minutos, corrige un campo, abre un cambio y deja a todos con la sensación de haber visto el futuro. Falta una pregunta antes del aplauso.

Usar un agente no es lo mismo que gobernarlo cuando puede tocar código, datos, APIs, infraestructura, reportes y procesos de negocio.

La mano sigue siendo responsable.

Un agente deja de ser un asistente pasivo cuando puede ejecutar pasos, llamar herramientas, modificar archivos, consultar bases, abrir solicitudes, actualizar tableros o disparar pipelines. En una empresa colombiana eso puede sonar muy atractivo para acelerar soporte, analítica, migraciones o gobierno de datos. También puede convertirse en un riesgo operativo si se le entregan permisos sin límites, trazabilidad ni plan de reversa.

NIST publicó en 2024 el perfil de IA generativa para su AI Risk Management Framework. El documento no trata la IA como truco de productividad, sino como sistema que debe ser diseñado, evaluado, gobernado y monitoreado durante su ciclo de vida. Esa mirada importa porque un agente no solo produce texto. Produce acciones encadenadas con efectos acumulados.

OWASP llevó el punto a seguridad aplicada. En su Top 10 para aplicaciones con LLM de 2025, el riesgo de agencia excesiva aparece cuando un sistema recibe demasiada funcionalidad, demasiados permisos o demasiada autonomía. La definición es sencilla y fuerte. El daño no depende solo de que el modelo alucine. También puede venir de una instrucción ambigua, una inyección de prompt, una herramienta comprometida o una cadena de agentes que amplifica el error.

AWS, en su guía prescriptiva de seguridad para IA agéntica, mapea ese riesgo a controles concretos como delimitar el alcance del agente, hacer modelado de amenazas, aplicar autenticación adaptativa, restringir operaciones contra sistemas sensibles, monitorear comportamiento y tener apagado de emergencia para escenarios de alto riesgo. El asunto ya no cabe en prompt engineering. Entra en arquitectura, seguridad y operación.

Pensemos en un quirófano. Un bisturí muy preciso no decide dónde cortar, cuánto cortar ni cuándo detenerse. El cirujano responde por diagnóstico, consentimiento, esterilidad, equipo, monitoreo, sangrado, cierre y recuperación. En tecnología, un agente poderoso es ese bisturí. Puede acelerar una intervención, pero no firma la cirugía.

Gobernar agentes significa definir qué puede hacer, sobre qué sistemas, con qué permisos, bajo qué evidencia, con qué supervisión humana y con qué mecanismo de reversa. Permiso es la autorización concreta para ejecutar una acción. Trazabilidad es poder reconstruir qué pasó, quién lo pidió, qué herramienta se invocó y qué cambió. Linaje es saber de dónde viene un dato y cómo se transformó antes de llegar a un reporte o modelo.

Sin esas bases, el agente se vuelve un practicante con acceso a sala de cirugía. Puede parecer útil porque se mueve rápido, pero nadie serio confunde movimiento con control. La mano sigue siendo responsable.

La escena se ve con facilidad en datos. Un equipo quiere que un agente corrija automáticamente tablas de clientes antes de cargar un reporte ejecutivo. La versión apresurada le entrega una función que escribe directo sobre la base. La versión con criterio separa lectura, propuesta, validación y aprobación.

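# `db` es un objeto de acceso a datos ilustrativo con métodos execute, query e insert

# Sin gobierno: el agente escribe directo sobre la tabla maestra
def corregir_cliente(db, cliente_id, nuevo_email):
    db.execute(
        "update clientes set email = ? where id = ?",
        [nuevo_email, cliente_id]
    )

# Con gobierno: el agente solo registra una propuesta para revisión humana
def proponer_correccion(db, usuario, cliente_id, nuevo_email):
    actual = db.query("select email from clientes where id = ?", [cliente_id])
    if not actual:
        raise LookupError(f"cliente {cliente_id} no existe")
    cambio = {
        "tabla": "clientes",
        "campo": "email",
        "cliente_id": cliente_id,  # sin este dato la propuesta no se puede aplicar después
        "antes": actual[0]["email"],
        "despues": nuevo_email,
        "solicitado_por": usuario,
        "requiere_aprobacion": True
    }
    db.insert("cambios_pendientes", cambio)
    return cambio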

La primera función parece eficiente, pero mezcla recomendación y ejecución. Si el agente interpreta mal una instrucción, recibe un dato contaminado o actúa sobre el cliente equivocado, el error entra directo a producción. La segunda no frena la productividad. La encauza. El agente prepara una propuesta auditable, deja evidencia del antes y el después, identifica al solicitante y exige aprobación antes de tocar el dato maestro.

Ese patrón aplica igual en código, infraestructura y BI. Un agente puede abrir un pull request, pero no debería desplegar a producción sin pruebas. Puede sugerir cambios en un pipeline, pero no debería borrar históricos sin respaldo. Puede generar un indicador, pero no debería publicar un tablero sin validar la regla del negocio. La mano sigue siendo responsable.

El antecedente histórico más útil no viene de IA, sino de trading algorítmico. En 2013, la SEC sancionó a Knight Capital por fallas asociadas a un incidente de agosto de 2012. La investigación encontró salvaguardas insuficientes, controles inadecuados de despliegue y pruebas, y millones de órdenes erróneas enviadas al mercado. El sistema actuó rápido. Justamente por eso el daño también fue rápido.

La lección para agentes es directa. Cuando una automatización tiene acceso real a sistemas críticos, no basta preguntar si la lógica parece inteligente. Hay que saber qué pasa cuando falla, quién la detiene, qué límite tenía, qué bitácora deja y cómo se recupera el estado anterior.

Recursos recomendados

NIST AI Risk Management Framework: Generative AI Profile (informe): https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
OWASP Agentic Skills Top 10 (guía de seguridad): https://owasp.org/www-project-agentic-skills-top-10/
AWS Security for agentic AI (guía técnica): https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-security/owasp-top-ten.html
OpenAI Agents SDK Guardrails (documentación técnica): https://openai.github.io/openai-agents-python/guardrails/

En Colombia y LATAM el riesgo se vuelve muy concreto. Hay equipos pequeños con presión por reducir tiempos, automatizar reportes, acelerar migraciones o atender usuarios internos sin crecer nómina. El agente promete aliviar esa presión. Pero si se conecta a una base productiva con un usuario administrador compartido, si nadie registra sus llamadas, si no hay ambiente de pruebas y si las aprobaciones quedan por chat, el supuesto ahorro se convierte en deuda operativa.

En BI, el riesgo no siempre es borrar una tabla. A veces es más sutil. Un agente puede cambiar una métrica de ventas, mezclar estados contables, duplicar clientes o publicar una lectura ejecutiva sin explicar supuestos. En ingeniería de datos puede reintentar un pipeline hasta duplicar cargas. En cloud puede crear recursos sin etiquetas ni límites. En arquitectura puede integrar servicios sin revisar identidad, red o costo.

La capacidad humana que marca la diferencia no es escribir el prompt más bonito. Es saber diseñar límites. Un buen profesional pregunta qué acciones son reversibles, cuáles requieren aprobación, qué datos son sensibles, qué logs se guardan, qué pruebas bloquean el cambio, qué permisos mínimos necesita el agente y qué alerta indica que algo salió mal.

La evaluación de talento también cambia. Preguntar si alguien ha usado agentes dice poco. Mejor pedirle que diseñe uno para una tarea real y que explique permisos, pruebas, observabilidad, recuperación y responsabilidad. Observabilidad significa poder ver el comportamiento del sistema mediante registros, métricas y eventos. Si la persona solo habla de velocidad y no de control, todavía está pensando en demo, no en operación.

Los agentes ya empiezan a entrar a datos, desarrollo, soporte, seguridad, finanzas y operaciones. Evitarlos no es una estrategia técnica. Impedir que un flujo experimental actúe como autoridad sí lo es. En un quirófano bien gobernado, cada instrumento tiene propósito, esterilización, responsable y protocolo. En una plataforma bien gobernada, cada agente debe tener alcance, permisos, bitácora, pruebas y botón de parada.

Primero clasifica tus agentes por nivel de daño posible, luego separa propuesta de ejecución en los procesos críticos, después revisa los controles de NIST, OWASP y AWS para convertirlos en checklist técnico, por último practica con un caso pequeño donde el agente solo proponga cambios y una persona apruebe antes de tocar producción. El bisturí puede estar listo sobre la mesa. La firma sigue siendo humana.

¿Qué control mínimo exigirías antes de permitir que un agente toque datos, código o infraestructura productiva en tu empresa? 🧠

Versión en inglés

🩺 The Scalpel Doesn't Sign the Surgery

A team connects an agent to its repository, its database, and its executive dashboard. The demo takes minutes, corrects a field, opens a change, and leaves everyone feeling they have seen the future. One question is missing before the applause.

Using an agent is not the same as governing it when it can touch code, data, APIs, infrastructure, reports, and business processes.

The hand still carries responsibility.

An agent stops being a passive assistant when it can execute steps, call tools, modify files, query databases, open requests, update dashboards, or trigger pipelines. At a Colombian company that can sound very attractive for accelerating support, analytics, migrations, or data governance. It can also become an operational risk if it is handed permissions without limits, traceability, or a rollback plan.

In 2024, NIST published the Generative AI Profile for its AI Risk Management Framework. The document treats AI not as a productivity trick but as a system that must be designed, evaluated, governed, and monitored throughout its lifecycle. That perspective matters because an agent doesn't just produce text. It produces chained actions with cumulative effects.

OWASP carried the point into applied security. In its 2025 Top 10 for LLM Applications, the excessive agency risk arises when a system is given too much functionality, too many permissions, or too much autonomy. The definition is simple and forceful. Damage doesn't depend only on the model hallucinating. It can also come from an ambiguous instruction, a prompt injection, a compromised tool, or a chain of agents that amplifies the error.

AWS, in its prescriptive guidance on security for agentic AI, maps that risk to concrete controls such as limiting the agent's scope, threat modeling, applying adaptive authentication, restricting operations against sensitive systems, monitoring behavior, and keeping an emergency shutdown for high-risk scenarios. The issue no longer fits inside prompt engineering. It belongs to architecture, security, and operations.

Picture an operating room. A very precise scalpel doesn't decide where to cut, how much to cut, or when to stop. The surgeon answers for diagnosis, consent, sterility, equipment, monitoring, bleeding, closure, and recovery. In technology, a powerful agent is that scalpel. It can speed up an intervention, but it doesn't sign the surgery.

Governing agents means defining what each agent can do, on which systems, with which permissions, under what evidence, with what human oversight, and with what rollback mechanism. Permission is the concrete authorization to execute an action. Traceability is being able to reconstruct what happened, who asked for it, which tool was invoked, and what changed. Lineage is knowing where a piece of data comes from and how it was transformed before reaching a report or model.
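To make the definitions of permission and traceability concrete, here is a minimal sketch of an allowlist check that logs every decision. The names `AgentPolicy`, `AuditLog`, and `authorize`, as well as the system and action labels, are illustrative assumptions and do not come from any specific framework.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class AgentPolicy:
    """Hypothetical allowlist: which actions an agent may run on which systems."""
    agent_id: str
    allowed: frozenset  # set of (system, action) pairs


@dataclass
class AuditLog:
    """In-memory stand-in for an append-only audit trail."""
    entries: list = field(default_factory=list)

    def record(self, **event):
        # Timestamp every event so the sequence can be reconstructed later.
        event["at"] = datetime.now(timezone.utc).isoformat()
        self.entries.append(event)


def authorize(policy, log, requested_by, system, action):
    """Return True only if the (system, action) pair is explicitly allowed.

    Every decision, allowed or denied, is written to the audit log.
    """
    decision = (system, action) in policy.allowed
    log.record(
        agent=policy.agent_id,
        requested_by=requested_by,
        system=system,
        action=action,
        allowed=decision,
    )
    return decision


policy = AgentPolicy(
    agent_id="reporting-agent",
    allowed=frozenset({("warehouse", "read"), ("pending_changes", "insert")}),
)
log = AuditLog()

can_read = authorize(policy, log, "ana", "warehouse", "read")
can_update = authorize(policy, log, "ana", "clientes", "update")
```

The default here is denial: anything not listed is refused, and the refusal itself leaves a trace. A real deployment would presumably enforce the same rule at the database or IAM layer rather than only in application code.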

Without those foundations, the agent becomes an intern with access to the operating room. It can seem useful because it moves fast, but nobody serious confuses movement with control. The hand still carries responsibility.

The scenario is easy to see in data work. A team wants an agent to automatically correct customer tables before loading an executive report. The rushed version hands it a function that writes straight to the database. The version with judgment separates reading, proposing, validating, and approving.

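# `db` is an illustrative data-access object with execute, query, and insert methods

# Without governance: the agent writes straight to the master table
def corregir_cliente(db, cliente_id, nuevo_email):
    db.execute(
        "update clientes set email = ? where id = ?",
        [nuevo_email, cliente_id]
    )

# With governance: the agent only records a proposal for human review
def proponer_correccion(db, usuario, cliente_id, nuevo_email):
    actual = db.query("select email from clientes where id = ?", [cliente_id])
    if not actual:
        raise LookupError(f"customer {cliente_id} not found")
    cambio = {
        "tabla": "clientes",
        "campo": "email",
        "cliente_id": cliente_id,  # without this the proposal cannot be applied later
        "antes": actual[0]["email"],
        "despues": nuevo_email,
        "solicitado_por": usuario,
        "requiere_aprobacion": True
    }
    db.insert("cambios_pendientes", cambio)
    return cambio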

The first function seems efficient, but it mixes recommendation and execution. If the agent misinterprets an instruction, receives contaminated data, or acts on the wrong customer, the error goes straight into production. The second doesn't stop productivity. It channels it. The agent prepares an auditable proposal, leaves evidence of the before and after, identifies the requester, and requires approval before the master data is touched.
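The approval half of that pattern could look like the sketch below. It uses Python's standard `sqlite3` module with an in-memory database. The function `apply_approved_change` and the columns `cliente_id`, `estado`, and `aprobado_por` are assumptions added for illustration, on a simplified version of the `cambios_pendientes` table.

```python
import sqlite3


def apply_approved_change(conn, change_id, approver):
    """Apply one pending email change, but only after a human approves it."""
    row = conn.execute(
        "select cliente_id, antes, despues, solicitado_por, estado "
        "from cambios_pendientes where id = ?",
        (change_id,),
    ).fetchone()
    if row is None:
        raise LookupError(f"change {change_id} not found")
    cliente_id, antes, despues, solicitado_por, estado = row
    if estado != "pendiente":
        raise ValueError(f"change {change_id} is already {estado}")
    if approver == solicitado_por:
        raise PermissionError("requester cannot approve their own change")

    # `with conn` makes both updates one transaction: commit on success,
    # rollback if any exception is raised inside the block.
    with conn:
        updated = conn.execute(
            "update clientes set email = ? where id = ? and email = ?",
            (despues, cliente_id, antes),
        ).rowcount
        if updated != 1:
            raise RuntimeError("record changed since the proposal was made")
        conn.execute(
            "update cambios_pendientes set estado = 'aplicado', aprobado_por = ? "
            "where id = ?",
            (approver, change_id),
        )


# Hypothetical setup: one customer and one proposal created by an agent.
conn = sqlite3.connect(":memory:")
conn.execute("create table clientes (id integer primary key, email text)")
conn.execute(
    "create table cambios_pendientes (id integer primary key, cliente_id integer, "
    "antes text, despues text, solicitado_por text, "
    "estado text default 'pendiente', aprobado_por text)"
)
conn.execute("insert into clientes values (1, 'old@example.com')")
conn.execute(
    "insert into cambios_pendientes (cliente_id, antes, despues, solicitado_por) "
    "values (1, 'old@example.com', 'new@example.com', 'agent-01')"
)
conn.commit()

apply_approved_change(conn, change_id=1, approver="ana")
email = conn.execute("select email from clientes where id = 1").fetchone()[0]
```

The update is conditioned on the recorded "before" value, so a proposal that has gone stale fails instead of silently overwriting a newer edit. Keeping the applied row in `cambios_pendientes` preserves the evidence of who asked and who approved.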

That pattern applies equally to code, infrastructure, and BI. An agent can open a pull request, but it shouldn't deploy to production without tests. It can suggest pipeline changes, but it shouldn't delete historical data without a backup. It can generate an indicator, but it shouldn't publish a dashboard without validating the business rule. The hand still carries responsibility.

The most useful historical precedent doesn't come from AI, but from algorithmic trading. In 2013, the SEC penalized Knight Capital for failures associated with an incident in August 2012. The investigation found insufficient safeguards, inadequate deployment and testing controls, and millions of erroneous orders sent to the market. The system acted fast. That is precisely why the damage was fast too.

The lesson for agents is direct. When an automation has real access to critical systems, it's not enough to ask whether the logic seems intelligent. You have to know what happens when it fails, who stops it, what limit it had, what log it leaves, and how the previous state is restored.
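Those questions (who stops it, what limit it had, what log it leaves, how state is restored) can be sketched as a small wrapper around agent actions. `GuardedRunner` and `AgentHalted` are hypothetical names, and the price dictionary is a toy stand-in for a critical system.

```python
class AgentHalted(Exception):
    """Raised when the agent is stopped or exceeds its action budget."""


class GuardedRunner:
    """Hypothetical wrapper: stop flag, hard cap on actions, and a log of every call."""

    def __init__(self, max_actions):
        self.max_actions = max_actions
        self.executed = 0
        self.stopped = False
        self.log = []

    def stop(self, reason):
        """Emergency stop: once set, no further action runs."""
        self.stopped = True
        self.log.append(("stopped", reason))

    def run(self, name, action, undo):
        """Run one action if allowed and record how to undo it."""
        if self.stopped:
            raise AgentHalted("stop switch is on")
        if self.executed >= self.max_actions:
            self.stop("action budget exhausted")
            raise AgentHalted("action budget exhausted")
        result = action()
        self.executed += 1
        self.log.append(("ran", name, undo))
        return result

    def roll_back(self):
        """Undo executed actions in reverse order."""
        for entry in reversed(self.log):
            if entry[0] == "ran":
                entry[2]()


state = {"price": 100}
runner = GuardedRunner(max_actions=2)


def set_price(value):
    previous = state["price"]  # captured so the change can be reversed
    runner.run(
        f"set price to {value}",
        action=lambda: state.update(price=value),
        undo=lambda: state.update(price=previous),
    )


set_price(110)
set_price(120)
try:
    set_price(130)  # third action exceeds the budget of two
except AgentHalted:
    runner.roll_back()
```

The design choice worth noting is that the limit and the undo path exist before the first action runs. They are not reconstructed after an incident.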

Recommended Resources

NIST AI Risk Management Framework: Generative AI Profile (report): https://www.nist.gov/publications/artificial-intelligence-risk-management-framework-generative-artificial-intelligence
OWASP Agentic Skills Top 10 (security guide): https://owasp.org/www-project-agentic-skills-top-10/
AWS Security for agentic AI (technical guide): https://docs.aws.amazon.com/prescriptive-guidance/latest/agentic-ai-security/owasp-top-ten.html
OpenAI Agents SDK Guardrails (technical documentation): https://openai.github.io/openai-agents-python/guardrails/

In Colombia and LATAM the risk becomes very concrete. Small teams are under pressure to cut turnaround times, automate reports, accelerate migrations, or serve internal users without adding headcount. The agent promises to ease that pressure. But if it is connected to a production database through a shared admin user, if no one logs its calls, if there is no test environment, and if approvals happen over chat, the supposed savings become operational debt.

In BI, the risk isn't always deleting a table. Sometimes it's more subtle. An agent can change a sales metric, mix up accounting statements, duplicate customers, or publish an executive summary without explaining its assumptions. In data engineering it can retry a pipeline until it duplicates loads. In cloud it can create resources without tags or limits. In architecture it can integrate services without reviewing identity, network, or cost.
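The duplicated-load case has a well-known mitigation: make the load idempotent so a retry of the same batch is a no-op. The sketch below is one way to do that with a content fingerprint. `batch_key` and `load_once` are hypothetical helpers, and the in-memory list and set stand in for a real table and a load registry.

```python
import hashlib
import json


def batch_key(rows):
    """Deterministic fingerprint of a batch, used as an idempotency key."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def load_once(target, loaded_keys, rows):
    """Append rows to target unless this exact batch was already loaded.

    Returns True when the batch was written, False when it was skipped.
    """
    key = batch_key(rows)
    if key in loaded_keys:
        return False
    target.extend(rows)
    loaded_keys.add(key)
    return True


sales = []
seen = set()
batch = [{"order": 1, "amount": 50}, {"order": 2, "amount": 75}]

first = load_once(sales, seen, batch)
retry = load_once(sales, seen, batch)  # simulated retry of the same batch
```

With this guard an agent can retry as often as it likes without inflating the sales table. In a real pipeline the key would more likely be stored alongside the data, inside the same transaction as the load.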

The human skill that makes the difference is not writing the prettiest prompt. It's knowing how to design limits. A good professional asks which actions are reversible, which require approval, which data is sensitive, which logs are kept, which tests block a change, which minimum permissions the agent needs, and which alert indicates that something went wrong.

Talent evaluation also changes. Asking whether someone has used agents says little. It's better to ask them to design one for a real task and explain its permissions, testing, observability, recovery, and accountability. Observability means being able to see system behavior through logs, metrics, and events. If the person talks only about speed and not about control, they're still thinking about the demo, not about operations.

Agents are already starting to enter data, development, support, security, finance, and operations. Avoiding them is not a technical strategy. Preventing an experimental flow from acting as an authority is. In a well-governed operating room, every instrument has a purpose, a sterilization routine, a responsible person, and a protocol. On a well-governed platform, every agent must have a scope, permissions, a log, tests, and a stop button.

First, classify your agents by the level of damage they could cause. Then separate proposal from execution in critical processes. Next, review the NIST, OWASP, and AWS controls and turn them into a technical checklist. Finally, practice with a small case where the agent only proposes changes and a person approves them before anything touches production. The scalpel can be ready on the table. The signature is still human.
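The classification step can start as a small table in code. The sketch below is one possible scheme, not a standard: the three levels, the `classify` rules, and the control names in `REQUIRED_CONTROLS` are all assumptions that a team would replace with its own checklist.

```python
from enum import IntEnum


class Damage(IntEnum):
    LOW = 1     # read-only, output is easy to discard
    MEDIUM = 2  # writes to non-production, reversible targets
    HIGH = 3    # writes to production or performs irreversible actions


# Hypothetical mapping from damage level to the minimum controls required.
REQUIRED_CONTROLS = {
    Damage.LOW: {"audit_log"},
    Damage.MEDIUM: {"audit_log", "tests", "rollback_plan"},
    Damage.HIGH: {
        "audit_log", "tests", "rollback_plan", "human_approval", "stop_switch",
    },
}


def classify(writes, production, reversible):
    """Assign a damage level from three yes/no facts about the agent."""
    if not writes:
        return Damage.LOW
    if production or not reversible:
        return Damage.HIGH
    return Damage.MEDIUM


def missing_controls(level, controls_in_place):
    """List, in alphabetical order, the required controls that are absent."""
    return sorted(REQUIRED_CONTROLS[level] - set(controls_in_place))


level = classify(writes=True, production=True, reversible=True)
gaps = missing_controls(level, {"audit_log", "tests"})
```

Here an agent that writes to production lands in the highest tier even if its changes are reversible, and `gaps` names what must exist before it is allowed to run.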

What minimum control would you demand before allowing an agent to touch data, code, or production infrastructure in your company? 🧠
