🧰 El que conecta la API no es el que construyó el modelo

En foros de tecnología colombianos y en LinkedIn regional es frecuente encontrar perfiles que se presentan como "expertos en inteligencia artificial" con uno o dos años de experiencia usando herramientas de automatización como n8n, Make o conectores hacia OpenAI. La experiencia es real y útil para lo que hace. Pero llamarla con el mismo nombre que describe décadas de trabajo en machine learning, redes neuronales y sistemas probabilísticos no es un matiz menor.

Tener la herramienta no es tener el oficio.

La inteligencia artificial es un campo con cerca de setenta años de historia documentada. El término se acuñó para el taller de Dartmouth de 1956, y desde entonces el campo creció por capas que no se reemplazaron entre sí. La primera generación de soluciones con tracción real fue la de los sistemas expertos en los años setenta y ochenta, programas que codificaban el conocimiento de un especialista en reglas lógicas. MYCIN, desarrollado en Stanford a comienzos de los años setenta, asistía en el diagnóstico de infecciones bacterianas y demostró que una máquina podía realizar razonamiento estructurado antes de que existiera ninguna interfaz de chat. Después llegó el machine learning, ML, disciplina en la que los modelos aprenden patrones desde datos sin que nadie les programe explícitamente cada regla. Dentro del ML surgió el deep learning, DL, basado en redes neuronales con múltiples capas que aprenden representaciones de datos no estructurados como imágenes o texto. La IA generativa, que hoy copa los titulares, es una subcategoría del DL que produce nuevos objetos que se parecen a los datos de entrenamiento. Un modelo que clasifica si un correo es fraude es ML. Un modelo que escribe el correo es IA generativa. Son herramientas distintas para problemas distintos, y confundirlas no es un error inocuo.

El estudiante que sabe buscar la respuesta al final del libro responde bien cuando el ejercicio está ahí. El que entiende por qué esa respuesta es correcta puede resolver el que no aparece en ninguna página. Las empresas contratan el segundo tipo aunque el primero sea más abundante.

El ML supervisado, no supervisado y por refuerzo sigue siendo el tipo de IA más frecuente en producción dentro de los sectores colombianos que trabajan con datos desde hace más de cinco años. Detección de fraude en bancos, modelos de churn en telecomunicaciones, predicción de demanda en retail, clasificación de defectos en manufactura. Esos modelos no generan texto; predicen, clasifican o agrupan. Requieren saber leer métricas de evaluación como precisión, recall y AUC-ROC, gestionar el desbalanceo de clases, validar contra sobreajuste y monitorear el drift, que es la degradación que sufre un modelo cuando los datos del mundo real empiezan a diferir de los datos con que fue entrenado. Un practicante que solo conoce herramientas de IA generativa no sabe leer una matriz de confusión, no puede evaluar si un modelo con noventa y cinco por ciento de exactitud es en realidad inservible para detectar fraude en una clase con uno por ciento de positivos, y no puede decidir si el sistema existente en la empresa necesita reemplazarse o simplemente recalibrarse. Tener la herramienta no es tener el oficio.

El siguiente ejemplo muestra dos enfoques para clasificar tickets de soporte técnico de una empresa de servicios. El primero es representativo de alguien que llega al problema con experiencia únicamente en herramientas de IA generativa. El segundo refleja el criterio de alguien con base en ML.

# Sin base técnica: LLM para clasificar tickets
import openai

def clasificar(texto):
    r = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Clasifica en facturacion, tecnico o comercial.\n{texto}"}]
    )
    return r.choices[0].message.content

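# Con base técnica: modelo entrenado sobre históricos
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Entradas supuestas: textos_historicos es una lista de textos de tickets pasados,
# etiquetas son sus categorías y nuevo_ticket es el texto de un ticket entrante.
modelo = Pipeline([
    ("vec", TfidfVectorizer(max_features=5000)),          # texto -> vector TF-IDF
    ("clf", LogisticRegression(class_weight="balanced"))  # compensa clases poco frecuentes
])
modelo.fit(textos_historicos, etiquetas)
pred = modelo.predict([nuevo_ticket])        # etiqueta predicha
prob = modelo.predict_proba([nuevo_ticket])  # probabilidad por clase, en el orden de modelo.classes_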

La primera versión funciona en demos y volúmenes bajos, pero en producción con miles de tickets diarios genera costo por cada llamada al API, latencia impredecible y salidas inconsistentes cuando cambia la versión del modelo del proveedor. La segunda entrena una vez sobre los datos reales de la empresa, clasifica en milisegundos, cuesta centavos en infraestructura y expone la probabilidad de cada predicción para auditar errores. La decisión entre ambas no depende de cuál sea más moderna sino de cuál resuelve el problema con el menor riesgo y costo. Saber eso requiere conocer el mapa completo del campo, no solo el capítulo que más circula.

En los años ochenta ocurrió algo parecido con los sistemas expertos. Cientos de consultores construyeron soluciones de "IA" codificando reglas sin entender el razonamiento probabilístico que sostenía el campo. Cuando las limitaciones aparecieron y llegó el invierno de la IA de finales de los ochenta, el segundo que vivió el campo, esos consultores desaparecieron; los investigadores que entendían los fundamentos construyeron el ciclo siguiente. Hay más de un verano en la historia de la IA, y tener la herramienta no es tener el oficio.

Recursos recomendados

IBM Think - AI vs. Machine Learning vs. Deep Learning vs. Neural Networks (explicador) https://www.ibm.com/think/topics/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks
MIT News - Explained - Generative AI (artículo, noviembre 2023) https://news.mit.edu/2023/explained-generative-ai-1109
Stanford HAI - AI Index Report 2025 (informe anual) https://hai.stanford.edu/research/ai-index-report
fast.ai - Practical Deep Learning for Coders (curso gratuito) https://course.fast.ai

[Video de YouTube: Rethinking API Architecture for the AI Era]

MIT News documentó en noviembre de 2023 que, antes del auge de la IA generativa, cuando la gente hablaba de IA normalmente se refería a modelos de ML que aprendían a hacer predicciones sobre datos. El artículo, que recoge las explicaciones de investigadores del MIT, señala que a pesar del ruido que generó el lanzamiento de ChatGPT, la tecnología subyacente no es nueva y se apoya en investigación y avances computacionales que llevan más de cincuenta años construyéndose. La distinción técnica importa porque define qué tipo de formación se necesita para diseñar, evaluar y mantener cada sistema.

El AI Index de Stanford HAI, que desde 2017 rastrea el estado global del campo, documenta cada año que el inventario de tipos de IA en producción es mucho más amplio que los modelos de lenguaje de gran escala. Los sistemas de ML clásico, visión por computador, procesamiento de lenguaje natural pre-LLM y modelos de series de tiempo siguen operando en la mayoría de organizaciones que adoptaron IA antes de 2022. Abrir el libro en la última página no garantiza haber leído los capítulos anteriores.

Primero, identifica qué tipo de problema tiene tu empresa: predicción, clasificación, agrupación o generación. Luego pregunta si los perfiles que se presentan como expertos en IA pueden explicar cuándo no usarían IA generativa. Después revisa el explicador de IBM y el artículo del MIT referenciados arriba para construir un criterio básico de evaluación. Por último, usa el curso gratuito de fast.ai para contrastar lo que sabes con lo que el campo lleva décadas enseñando.

¿En tu empresa hay alguien que pueda decidir cuándo un problema necesita ML clásico y cuándo necesita IA generativa, o se usa GenAI para todo porque es lo que más se oye? 🧰

Versión en inglés

🧰 The One Who Connects the API Is Not the One Who Built the Model

On Colombian technology forums and on regional LinkedIn, it is common to find profiles that present themselves as "artificial intelligence experts" on the strength of one or two years of experience with automation tools such as n8n, Make, or connectors to OpenAI. That experience is real and useful for what it does. But calling it by the same name that describes decades of work in machine learning, neural networks, and probabilistic systems is not a minor nuance.

Having the tool is not having the craft.

Artificial intelligence is a field with roughly seventy years of documented history. The term was coined for the 1956 Dartmouth workshop, and since then the field has grown in layers that did not replace one another. The first generation of solutions with real traction was the expert systems of the seventies and eighties: programs that encoded a specialist's knowledge as logical rules. MYCIN, developed at Stanford in the early 1970s, assisted in diagnosing bacterial infections and showed that a machine could perform structured reasoning long before any chat interface existed. Then came machine learning, ML, a discipline in which models learn patterns from data without anyone explicitly programming each rule. Within ML emerged deep learning, DL, based on neural networks with multiple layers that learn representations of unstructured data such as images or text. Generative AI, which dominates headlines today, is a subcategory of DL that produces new objects resembling its training data. A model that classifies whether an email is fraudulent is ML. A model that writes the email is generative AI. They are different tools for different problems, and confusing them is not a harmless error.

The student who knows how to look up the answer in the back of the book does well when the exercise is listed there. The one who understands why that answer is correct can solve the exercise that appears on no page. Companies hire the second type, even though the first is more abundant.

Supervised, unsupervised, and reinforcement learning are still the most common types of AI in production within Colombian sectors that have worked with data for more than five years: fraud detection in banks, churn models in telecommunications, demand forecasting in retail, defect classification in manufacturing. Those models don't generate text; they predict, classify, or cluster. They require knowing how to read evaluation metrics such as precision, recall, and AUC-ROC, how to handle class imbalance, how to validate against overfitting, and how to monitor drift, which is the degradation a model suffers when real-world data starts to differ from the data it was trained on. A practitioner who only knows generative AI tools can't read a confusion matrix, can't judge whether a model with ninety-five percent accuracy is actually useless for detecting fraud when only one percent of cases are positive, and can't decide whether the company's existing system needs replacing or just recalibrating. Having the tool is not having the craft.
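To make the accuracy point concrete, here is a minimal sketch using made-up data: 1,000 transactions of which 1% are fraud, and a trivial "model" that never flags anything. The data, the helper names confusion_counts and safe_div, and the always-negative baseline are all illustrative assumptions, not taken from any real system.

```python
# Hypothetical data: 1,000 transactions, 1% of them fraud (label 1).
y_true = [1] * 10 + [0] * 990

# A trivial baseline that always predicts "not fraud" (label 0).
y_pred = [0] * 1000


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


tp, fp, fn, tn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
precision = safe_div(tp, tp + fp)
recall = safe_div(tp, tp + fn)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.99 precision=0.00 recall=0.00
```

The baseline reaches 99% accuracy while catching none of the ten fraud cases, so a model reporting 95% accuracy on the same data could be doing worse than predicting "no fraud" every time. Recall and precision on the positive class are what reveal that.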

The following example shows two approaches to classifying support tickets at a services company. The first is representative of someone who comes to the problem with experience only in generative AI tools. The second reflects the judgment of someone with an ML foundation.

# Without foundation: LLM to classify tickets
import openai

def clasificar(texto):
    r = openai.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user",
                   "content": f"Classify into billing, technical, or commercial.\n{texto}"}]
    )
    return r.choices[0].message.content

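# With foundation: model trained on historical data
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Assumed inputs: textos_historicos is a list of past ticket texts,
# etiquetas holds their labels, and nuevo_ticket is the text of an incoming ticket.
modelo = Pipeline([
    ("vec", TfidfVectorizer(max_features=5000)),          # text -> TF-IDF vector
    ("clf", LogisticRegression(class_weight="balanced"))  # compensates for rare classes
])
modelo.fit(textos_historicos, etiquetas)
pred = modelo.predict([nuevo_ticket])        # predicted label
prob = modelo.predict_proba([nuevo_ticket])  # per-class probabilities, ordered as modelo.classes_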

The first version works in demos and at low volumes, but in production with thousands of tickets a day it incurs a cost for every API call, unpredictable latency, and inconsistent outputs when the provider changes the model version. The second trains once on the company's real data, classifies in milliseconds, costs cents in infrastructure, and exposes the probability of each prediction so that errors can be audited. The choice between them doesn't depend on which is more modern but on which solves the problem with the least risk and cost. Knowing that requires understanding the full map of the field, not just the chapter that circulates most.
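One way those per-prediction probabilities can be used for auditing is to send low-confidence tickets to a person instead of trusting the label. The sketch below assumes the probabilities have already been paired with class names in a dictionary; the function name route_ticket and the 0.7 threshold are hypothetical choices for illustration, not part of scikit-learn or of the example above.

```python
def route_ticket(class_probs, threshold=0.7):
    """Pick the most likely class and flag the ticket for human review
    when the model's confidence falls below the threshold.

    class_probs: dict mapping class name -> predicted probability.
    Returns (label, confidence, needs_review).
    """
    label = max(class_probs, key=class_probs.get)
    confidence = class_probs[label]
    needs_review = confidence < threshold
    return label, confidence, needs_review


# A confident prediction goes straight to the billing queue.
clear = route_ticket({"billing": 0.9, "technical": 0.06, "commercial": 0.04})

# An ambiguous prediction is flagged so that a person checks it.
ambiguous = route_ticket({"billing": 0.45, "technical": 0.40, "commercial": 0.15})

print(clear)      # ('billing', 0.9, False)
print(ambiguous)  # ('billing', 0.45, True)
```

With the pipeline above, such a dictionary could be built with dict(zip(modelo.classes_, prob[0])). The threshold is a business decision: lowering it sends fewer tickets to manual review, at the price of letting more misclassifications through.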

Something similar happened in the eighties with expert systems. Hundreds of consultants built "AI" solutions by encoding rules without understanding the probabilistic reasoning the field was grounded in. When the limitations surfaced and the AI winter of the late eighties arrived, the second the field had lived through, those consultants disappeared; the researchers who understood the fundamentals built the next cycle. There is more than one summer in the history of AI, and having the tool is not having the craft.

Recommended Resources

IBM Think - AI vs. Machine Learning vs. Deep Learning vs. Neural Networks (explainer) https://www.ibm.com/think/topics/ai-vs-machine-learning-vs-deep-learning-vs-neural-networks
MIT News - Explained - Generative AI (article, November 2023) https://news.mit.edu/2023/explained-generative-ai-1109
Stanford HAI - AI Index Report 2025 (annual report) https://hai.stanford.edu/research/ai-index-report
fast.ai - Practical Deep Learning for Coders (free course) https://course.fast.ai

[YouTube embed: Rethinking API Architecture for the AI Era]

MIT News documented in November 2023 that before the generative AI boom, when people talked about AI they usually meant ML models that learned to make predictions from data. The article, which draws on explanations from MIT researchers, notes that despite the noise generated by ChatGPT's launch, the underlying technology is not new and rests on research and computational advances that go back more than fifty years. The technical distinction matters because it defines what kind of training is needed to design, evaluate, and maintain each system.

Stanford HAI's AI Index, which has tracked the global state of the field since 2017, documents each year that the inventory of AI types in production is much broader than large language models. Classical ML systems, computer vision, pre-LLM natural language processing, and time series models continue to operate in most organizations that adopted AI before 2022. Opening the book at the last page doesn't guarantee having read the earlier chapters.

First, identify what type of problem your company has: prediction, classification, clustering, or generation. Then ask whether the profiles presenting themselves as AI experts can explain when they would not use generative AI. Next, review IBM's explainer and the MIT article referenced above to build basic evaluation criteria. Finally, use fast.ai's free course to contrast what you know with what the field has been teaching for decades.

Is there someone in your company who can decide when a problem needs classical ML and when it needs generative AI, or is GenAI used for everything because it's what people hear about most? 🧰