AutoML on Autopilot - Provide.ai

PyCaretAgent lets you run end-to-end machine learning pipelines with a single English sentence — no boilerplate, no babysitting.

Figure 1 — From a plain-English prompt to a fully tracked MLflow experiment, autonomously.

TL;DR

Wraps PyCaret’s AutoML engine in a Google ADK agent hierarchy
One natural language prompt → plan → code → execution → MLflow tracking
Self-corrects up to 10 times on failure; isolates artifacts per session
Covers Classification, Regression, Clustering, Anomaly Detection, Time Series

If you’ve used PyCaret, you know it already cuts ML boilerplate dramatically. PyCaretAgent goes further: a Root Agent reads your intent, a Planner designs the pipeline, and an Executor writes and runs the code — all without you touching a line of Python.

How It Works

Three layers. The Root Agent validates your CSV and routes to the right specialist. Each specialist is a SequentialAgent: a Planner designs the pipeline and mints a session ID; an Executor writes the code, runs it, and logs everything to MLflow.

Figure 2 — Root routes; each SequentialAgent runs Planner → Executor in strict order.
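To make the three layers concrete, here is a minimal plain-Python sketch of the routing idea. It deliberately does not use the Google ADK classes; `SequentialSpecialist`, `planner`, `executor`, and the task labels are hypothetical stand-ins, and the real agents are LLM-driven rather than hard-coded functions.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

State = Dict[str, str]
Step = Callable[[State], None]


@dataclass
class SequentialSpecialist:
    """Runs its steps in strict order, sharing one state dict."""
    name: str
    steps: List[Step] = field(default_factory=list)

    def run(self, state: State) -> State:
        for step in self.steps:
            step(state)
        return state


def planner(state: State) -> None:
    # Stand-in for the LLM Planner: records a plan and a session ID.
    state["session_id"] = "AB1X9Z"
    state["plan"] = f"pipeline plan for task {state['task']}"


def executor(state: State) -> None:
    # Stand-in for the LLM Executor: would write and run code here.
    state["status"] = f"executed plan for session {state['session_id']}"


# Hypothetical task labels, one specialist per supported task type.
SPECIALISTS = {
    task: SequentialSpecialist(task, [planner, executor])
    for task in ("classification", "regression", "clustering",
                 "anomaly", "time_series")
}


def root_agent(task: str, state: State) -> State:
    """Route the request to the specialist for the given task type."""
    if task not in SPECIALISTS:
        raise ValueError(f"unsupported task: {task}")
    state["task"] = task
    return SPECIALISTS[task].run(state)


result = root_agent("classification", {})
print(result["status"])
```

The point of the shared `state` dict is that the Executor can only run after the Planner has filled in the session ID, which is the ordering guarantee a sequential agent provides.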

The Smart Bits

Session IDs via callback. The Planner outputs a free-text plan with a SESSION_ID: AB1X9Z token. A regex callback extracts it and drops it into shared session state — no structured output format needed.
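The extraction step can be sketched in a few lines. This is an illustration, not the repository's code: the six-character uppercase alphanumeric format is inferred from the example token AB1X9Z, the function names are hypothetical, and `state` is a plain dict standing in for the shared session state.

```python
import re
from typing import Optional

# Assumed format: SESSION_ID: followed by six uppercase letters or digits.
SESSION_ID_PATTERN = re.compile(r"SESSION_ID:\s*([A-Z0-9]{6})\b")


def extract_session_id(plan_text: str) -> Optional[str]:
    """Return the session ID found in the free-text plan, or None."""
    match = SESSION_ID_PATTERN.search(plan_text)
    return match.group(1) if match else None


def store_session_id(state: dict, plan_text: str) -> None:
    """Copy the session ID from the plan into shared state, if present."""
    session_id = extract_session_id(plan_text)
    if session_id is not None:
        state["session_id"] = session_id


state = {}
plan = "Plan: run setup(), then compare_models().\nSESSION_ID: AB1X9Z\nLog to MLflow."
store_session_id(state, plan)
print(state)  # {'session_id': 'AB1X9Z'}
```

Because the token is pulled out with a regex, the Planner can keep writing ordinary prose and nothing downstream depends on it producing valid JSON.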

10-retry self-correction. UnsafeLocalCodeExecutor(error_retry_attempts=10) automatically re-runs generated code on failure, letting the model diagnose and fix its own bugs.
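Conceptually, the retry behavior is a loop that feeds the error back to whatever produces the next version of the code. The sketch below is a plain-Python illustration of that loop, not the ADK executor's internals. `run_with_retries` and `toy_fixer` are hypothetical names, and treating the limit as "one initial run plus up to 10 retries" is an assumption.

```python
import traceback
from typing import Callable, Tuple


def run_with_retries(
    code: str,
    fix: Callable[[str, str], str],
    max_retries: int = 10,
) -> Tuple[bool, int, str]:
    """Run `code`; on failure, ask `fix` for a new version and try again.

    Returns (succeeded, retries_used, final_code).
    """
    for retries_used in range(max_retries + 1):  # first run plus retries
        try:
            exec(code, {})  # runs unsandboxed, so only use trusted code
            return True, retries_used, code
        except Exception:
            if retries_used == max_retries:
                break
            # Hand the failing code and its traceback to the fixer.
            code = fix(code, traceback.format_exc())
    return False, max_retries, code


def toy_fixer(code: str, error: str) -> str:
    # Stand-in for the LLM: repairs one known typo when it sees a NameError.
    if "NameError" in error:
        return code.replace("valeus", "values")
    return code


broken = "values = [1, 2, 3]\ntotal = sum(valeus)\n"
ok, retries_used, final_code = run_with_retries(broken, toy_fixer)
print(ok, retries_used)  # True 1
```

In the real agent the fixer is the model itself, which sees the traceback and rewrites the pipeline code before the next attempt.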

Failure short-circuit. A before_model_callback checks a check_failure_status flag and skips re-runs if the task already succeeded — no wasted API calls.
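The short-circuit can be pictured as a guard that runs before every model call. The sketch below uses a plain dict for state and a plain string for the reply, so it is not the ADK callback signature. It also assumes, without confirmation from the source, that `check_failure_status` is set to `False` once the run has succeeded.

```python
from typing import Callable, Optional


def before_model_guard(state: dict) -> Optional[str]:
    """Return a canned reply to skip the model call, or None to proceed."""
    # Assumption: the flag is False once the Executor's run has succeeded.
    if state.get("check_failure_status") is False:
        return "Task already succeeded; skipping further model calls."
    return None


def call_model(state: dict, llm: Callable[[dict], str]) -> str:
    """Call the model only when the guard does not short-circuit."""
    canned = before_model_guard(state)
    if canned is not None:
        return canned
    return llm(state)


calls = []


def fake_llm(state: dict) -> str:
    # Stand-in for a paid API call; counts how often it is invoked.
    calls.append(1)
    return "model reply"


print(call_model({"check_failure_status": True}, fake_llm))   # model reply
print(call_model({"check_failure_status": False}, fake_llm))  # canned reply
print(len(calls))  # 1
```

Only the first call reaches `fake_llm`; the second returns the canned reply, which is where the saved API calls come from.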

Figure 3 — Every metric and param is auto-logged. Named classification_AB1X9Z for instant retrieval.
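The naming scheme itself is simple to reproduce. Here is a small sketch that builds a name like classification_AB1X9Z from a task label and a session ID. Only the classification example comes from the article; the other task labels and the six-character ID format are assumptions, and `experiment_name` is a hypothetical helper.

```python
import re

# Assumed task labels; only "classification" appears in the article.
VALID_TASKS = {"classification", "regression", "clustering",
               "anomaly", "time_series"}


def experiment_name(task: str, session_id: str) -> str:
    """Build the MLflow experiment name as <task>_<session_id>."""
    if task not in VALID_TASKS:
        raise ValueError(f"unknown task: {task}")
    if not re.fullmatch(r"[A-Z0-9]{6}", session_id):
        raise ValueError(f"malformed session ID: {session_id}")
    return f"{task}_{session_id}"


print(experiment_name("classification", "AB1X9Z"))  # classification_AB1X9Z
```

PyCaret's `setup()` accepts `log_experiment` and `experiment_name` arguments, so a name like this could be passed straight through; whether the repository does exactly that is an assumption.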

The agent doesn’t just run your ML pipeline — it tracks, isolates, and self-heals through every failure.

Run It

git clone https://github.com/Rishav1996/PyCaretAgent.git
cd PyCaretAgent && uv pip install .
# Terminal 1: start the MLflow UI (this command keeps running)
uv run mlflow ui --port 5000
# Terminal 2: start the agent
uv run adk run pycaretagent

Prompt: “Classify heart.csv where the target is ‘target’.” That’s the entire interface. The agent validates the file, plans, codes, executes, and delivers a tracked experiment.
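The article does not spell out what "validates the file" involves, so the following is only a guess at the kind of check a Root Agent might run: the CSV has a header, has at least one data row, and contains the target column when one is given. `validate_csv` is a hypothetical helper, and it reads from any text handle so the example runs without a file on disk.

```python
import csv
import io
from typing import List, Optional, TextIO


def validate_csv(handle: TextIO, target: Optional[str] = None) -> List[str]:
    """Check header, data rows, and (optionally) the target column."""
    reader = csv.reader(handle)
    header = next(reader, None)
    if not header:
        raise ValueError("CSV has no header row")
    if next(reader, None) is None:
        raise ValueError("CSV has no data rows")
    # Unsupervised tasks pass target=None and skip this check.
    if target is not None and target not in header:
        raise ValueError(f"target column {target!r} not found in {header}")
    return header


sample = io.StringIO("age,chol,target\n63,233,1\n37,250,0\n")
print(validate_csv(sample, target="target"))  # ['age', 'chol', 'target']
```

With a real file, the same function would be called as `validate_csv(open("heart.csv", newline=""), target="target")`.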

Figure 4 — Real-time terminal output. Session ID, retry events, and success signal are all visible in the agent’s log stream.

What’s Next

This article is the first in a series. Each subsequent piece takes a deep dive into one task type, walking through a real dataset end to end: prompt, plan, generated code, and final MLflow results.

Figure 5 — Each article in the series covers one task type with a real dataset and annotated agent output.

Classification Deep-Dive (Coming Soon)

Heart disease prediction with heart.csv. We trace the full agent run — from CSV validation to compare_models() — and annotate every decision the Planner makes.

Regression Deep-Dive (Coming Soon)

House price prediction. How the Executor tunes via tune_model(), and why the 10-retry mechanism matters when XGBoost hits a dependency mismatch mid-run.

Clustering Deep-Dive (Coming Soon)

Customer segmentation without a target column. Watch the Root Agent skip target validation entirely and route straight to the unsupervised pipeline.

Anomaly Detection Deep-Dive (Coming Soon)

Fraud detection on a transactions dataset. The Planner picks Isolation Forest; we break down why, and show how anomaly scores surface as MLflow metrics.

Time Series Deep-Dive (Coming Soon)

Sales forecasting with seasonality detection. The most complex setup — index parsing, horizon selection, and MASE vs. MAPE in the MLflow comparison table.

Future: Deploy Directly to Cloud

The current version trains, tracks, and saves models locally. The next major milestone closes the loop — pushing finalized models to cloud storage and inference endpoints using PyCaret’s built-in deploy_model(), triggered directly by the agent with no manual steps.

The target UX is a single extra sentence in the user prompt: “Classify heart.csv, target=‘target’, deploy to AWS.” The Root Agent will parse the platform and pass it as a session state variable, and the Executor will append a deploy_model() call after finalize_model(), with credentials injected from environment variables. A dedicated article in this series will cover the full credential handoff pattern and multi-cloud configuration.
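Since this feature is not built yet, the following is purely a sketch of how the parsing and handoff might look. The regex, the state keys, and the `DEPLOY_BUCKET` environment variable are all hypothetical. The platform names match the ones PyCaret's `deploy_model()` accepts (`aws`, `gcp`, `azure`).

```python
import os
import re
from typing import Dict, Mapping, Optional

# Hypothetical phrasing rule: "deploy to <platform>" anywhere in the prompt.
PLATFORM_PATTERN = re.compile(r"deploy to (aws|gcp|azure)\b", re.IGNORECASE)


def parse_platform(prompt: str) -> Optional[str]:
    """Return the lowercase platform name, or None if no deploy request."""
    match = PLATFORM_PATTERN.search(prompt)
    return match.group(1).lower() if match else None


def build_deploy_state(prompt: str,
                       env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Collect what an Executor would need to append a deploy step."""
    platform = parse_platform(prompt)
    if platform is None:
        return {}  # no deployment requested, train and track only
    bucket = env.get("DEPLOY_BUCKET")  # hypothetical variable name
    if not bucket:
        raise RuntimeError("DEPLOY_BUCKET is not set")
    return {"deploy_platform": platform, "deploy_bucket": bucket}


prompt = "Classify heart.csv, target='target', deploy to AWS."
print(build_deploy_state(prompt, env={"DEPLOY_BUCKET": "my-models"}))
```

Keeping credentials and bucket names in the environment means they never appear in the prompt, the plan, or the generated code that gets logged.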

PyCaretAgent is a clean, reusable template for any agent-wrapped AutoML system. The Planner/Executor pattern, state handoff via callbacks, and retry-based self-correction all generalize well beyond PyCaret.

GitHub link: https://github.com/Rishav1996/PyCaretAgent

AutoML on Autopilot was originally published in Towards AI on Medium.