How to Build Your First AI Agent with Python — No API Key Required

If you’ve been following tech news this week, you’ve seen the word “agent” everywhere. Patronus AI just raised $50 million to stress-test AI agents. Notion killed its email app because users prefer letting agents handle their inbox. OpenAI’s GPT-5.6 preview is all about agentic capabilities. It’s starting to feel like every company suddenly discovered the same word at the same time.

Python code on a screen — *Image: Thraea19 via Wikimedia Commons (CC BY-SA 4.0)*

But here’s the thing — agents aren’t magic. At their core, they’re just programs that combine an LLM with the ability to do things: run code, search the web, read files, send emails. The model thinks, decides what action to take, and the program executes it. That’s it.

I’ve been tinkering with this stuff since the first LangChain tutorials dropped, and I’ll be honest — most of the “agent frameworks” out there are overkill for what you actually need. You don’t need a 200MB Python package to call an LLM in a loop. Sometimes the simplest approach is the right one.

So let’s build one. No API keys. No cloud dependencies. Just Python, an open-source model running on your machine, and roughly a hundred lines of code.

What You’re Building

A command-line agent that can answer questions, do math, search the web, and read local files, with the model itself running on your machine. When you ask it “what’s 247 * 389 and who’s the current heavyweight champion,” it figures out it needs to calculate the math and look up the boxing info, does both, and gives you a single answer.

That’s the key difference between a chatbot and an agent: the agent decides what to do instead of just generating text.

Step 1: Install Ollama and Pull a Model

Ollama is the easiest way to run open-source LLMs on your own machine. It handles quantization, GPU acceleration, and model management without any configuration headaches.

# Install Ollama (Linux/WSL)
curl -fsSL https://ollama.com/install.sh | sh

# Or on macOS: brew install ollama
# Windows: download from ollama.com

# Pull a capable small model
ollama pull llama3.2:3b

The 3B parameter model runs fine on most laptops, even ones without a dedicated GPU. If you’ve got 8GB+ of VRAM, bump up to llama3.1:8b for better reasoning (plain llama3.2 with no size suffix is the same 3B model). The 3B version is about 2GB and will handle the tool-calling examples without breaking a sweat.

Quick test to make sure it works:

ollama run llama3.2:3b "Hello! What is 2+2?"

You should see a response within a few seconds; the very first run takes longer while the model loads into memory. Alright, the engine’s running. Time to give it hands.
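If you’d rather run the same sanity check from Python (the agent will talk to Ollama over HTTP anyway), here’s a minimal sketch that asks the local server which models are installed. It assumes Ollama’s default address, http://localhost:11434, and its /api/tags endpoint; model_is_available is a hypothetical helper name, not part of any library.

```python
import json
import urllib.request

def model_is_available(model: str, base_url: str = "http://localhost:11434") -> bool:
    """Return True if the local Ollama server responds and lists `model`."""
    try:
        # GET /api/tags returns the locally installed models
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=3) as resp:
            data = json.load(resp)
    except (OSError, ValueError):
        # Server not running, unreachable, or replied with something that isn't JSON
        return False
    names = {m.get("name") for m in data.get("models", [])}
    return model in names

if __name__ == "__main__":
    print(model_is_available("llama3.2:3b"))
```

A model pulled without a tag shows up as, for example, llama3.2:latest, so match the name you actually pulled.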

Step 2: Define Your Tools

An agent is only as useful as the tools you give it. Let’s start with three: a calculator, a simple web search, and a file reader.

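import html
import math
import re
import urllib.parse
import urllib.request

# Math names the calculator exposes to eval; nothing else is in scope
SAFE_NAMES = {name: getattr(math, name)
              for name in ("sqrt", "sin", "cos", "tan", "log", "exp", "pi", "e")}
SAFE_NAMES.update({"abs": abs, "round": round, "min": min, "max": max})

def calculate(expression: str) -> str:
    """Evaluate a math expression such as '247 * 389 + sqrt(144)' and return the result."""
    try:
        # Empty __builtins__ hides open(), __import__ and friends.
        # This is a convenience guard, not a security sandbox.
        result = eval(expression, {"__builtins__": {}}, SAFE_NAMES)
        return str(result)
    except Exception as e:
        return f"Error: {e}"

def web_search(query: str) -> str:
    """Search the web for information. Returns up to three result snippets."""
    # Using DuckDuckGo's HTML version: no API key needed
    url = f"https://html.duckduckgo.com/html/?q={urllib.parse.quote(query)}"
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    try:
        with urllib.request.urlopen(req, timeout=10) as resp:
            page = resp.read().decode("utf-8", errors="replace")
    except OSError as e:
        # Network failures become tool results instead of crashing the agent
        return f"Error searching the web: {e}"
    # Snippets live in <a class="result__snippet" ...>...</a>; the lazy group
    # needs the closing tag after it, otherwise it only ever matches ""
    snippets = re.findall(r'class="result__snippet"[^>]*>(.*?)</a>', page, re.DOTALL)
    results = []
    for s in snippets[:3]:
        # Strip inner tags like <b>, then decode entities such as &amp;
        clean = html.unescape(re.sub(r"<[^>]+>", "", s)).strip()
        results.append(clean)
    return "\n".join(results) if results else "No results found."

def read_file(path: str) -> str:
    """Read the contents of a text file and return them."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except Exception as e:
        return f"Error reading file: {e}"

# Registry: the agent knows about these
TOOLS = {
    "calculate": calculate,
    "web_search": web_search,
    "read_file": read_file,
}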

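A few things to note here. The calculate function calls eval with an empty __builtins__ dict, so names like open and __import__ aren’t directly available. That’s fine for personal use, but it is not a real sandbox: eval can still be escaped through object attributes like __class__ and __subclasses__, so never feed this tool input you don’t trust. The web search scrapes DuckDuckGo’s HTML endpoint, which doesn’t require an API key; if you fire off lots of queries in a row it may start blocking you, and its markup can change, so expect to adjust the snippet regex now and then.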
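If you ever want to expose the calculator to input you don’t fully trust, a common alternative (swapped in here, not what the calculate function above does) is to parse the expression with Python’s ast module and evaluate only whitelisted node types. This is a minimal sketch under that assumption; safe_calculate and the operator and function whitelist are my own illustrative choices.

```python
import ast
import math
import operator

# Only these operators, functions, and constants are accepted
BIN_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.FloorDiv: operator.floordiv,
    ast.Mod: operator.mod, ast.Pow: operator.pow,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}
FUNCS = {"sqrt": math.sqrt, "log": math.log, "abs": abs, "round": round}
CONSTS = {"pi": math.pi, "e": math.e}

def _eval(node: ast.AST):
    """Recursively evaluate a whitelisted AST node; reject everything else."""
    if isinstance(node, ast.Expression):
        return _eval(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BIN_OPS:
        return BIN_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand))
    if isinstance(node, ast.Name) and node.id in CONSTS:
        return CONSTS[node.id]
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCS and not node.keywords):
        return FUNCS[node.func.id](*[_eval(arg) for arg in node.args])
    raise ValueError(f"unsupported syntax: {type(node).__name__}")

def safe_calculate(expression: str) -> str:
    """Evaluate a math expression without eval(); same return style as calculate()."""
    try:
        return str(_eval(ast.parse(expression, mode="eval")))
    except Exception as e:
        return f"Error: {e}"

if __name__ == "__main__":
    print(safe_calculate("247 * 389 + sqrt(144)"))  # 96095.0
    print(safe_calculate("__import__('os')"))       # Error: unsupported syntax: Call
```

Because it returns the same kind of string as calculate, you could swap it into the TOOLS registry. It still doesn’t cap exponent size, so an input like 9**9**9 can tie up the CPU; add a limit if that matters to you.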

Each tool has a docstring. This matters — the agent uses these descriptions to figure out which tool to call. Write them clearly.
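Since the docstrings end up in the prompt anyway, you can go one step further and generate each tool’s description line from the function itself. Here’s a minimal sketch using the standard inspect module; describe_tools is a hypothetical helper, and its output mirrors the “- name(params): description” lines that the system prompt in the next step writes out by hand.

```python
import inspect

def describe_tools(tools: dict) -> str:
    """Build '- name(params): docstring' lines from each registered function."""
    lines = []
    for name, fn in tools.items():
        sig = inspect.signature(fn)
        # str(Parameter) renders e.g. "expression: str"
        params = ", ".join(str(p) for p in sig.parameters.values())
        # getdoc() strips the docstring's indentation
        doc = inspect.getdoc(fn) or "No description."
        lines.append(f"- {name}({params}): {doc}")
    return "\n".join(lines)

if __name__ == "__main__":
    def calculate(expression: str) -> str:
        """Evaluate a math expression and return the result."""
        return expression

    print(describe_tools({"calculate": calculate}))
```

Dropping {describe_tools(TOOLS)} into the system prompt would keep the tool list in sync with the registry automatically.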

Step 3: The Agent Loop

This is where the magic happens. The pattern is dead simple: the model gets a prompt with the available tools, it responds with either a final answer or a tool call, and the loop continues until we get a final answer.
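To see that cycle without waiting on a real model, here’s a stripped-down sketch that replaces the LLM with a scripted stand-in replaying canned JSON replies. scripted_model and tiny_loop are hypothetical names for illustration only; the full loop below adds the system prompt, error handling, and the Ollama call.

```python
import json

def scripted_model(replies):
    """Return a fake query_ollama that hands back canned replies in order."""
    it = iter(replies)
    return lambda messages: next(it)

def tiny_loop(model, tools: dict, user_query: str, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": user_query}]
    for _ in range(max_steps):
        reply = model(messages)                      # think: ask the model
        parsed = json.loads(reply)
        if "answer" in parsed:                       # done: final answer
            return parsed["answer"]
        tool = tools[parsed["tool"]]
        result = tool(**parsed["arguments"])         # act: run the tool
        messages.append({"role": "assistant", "content": reply})
        messages.append({"role": "user", "content": f"Tool result: {result}"})  # observe
    return "Gave up after max_steps."

if __name__ == "__main__":
    fake = scripted_model([
        '{"tool": "add", "arguments": {"a": 2, "b": 3}}',
        '{"answer": "2 + 3 = 5"}',
    ])
    print(tiny_loop(fake, {"add": lambda a, b: a + b}, "What is 2 + 3?"))  # prints: 2 + 3 = 5
```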

import json
import requests  # the one third-party dependency: pip install requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "llama3.2:3b"

def query_ollama(messages: list) -> str:
    """Send a chat request to the local Ollama server and return the reply text."""
    payload = {
        "model": MODEL,
        "messages": messages,
        "stream": False,
        "format": "json",  # ask Ollama to constrain the reply to valid JSON
    }
    resp = requests.post(OLLAMA_URL, json=payload, timeout=120)
    resp.raise_for_status()
    return resp.json()["message"]["content"]

def run_agent(user_query: str, max_steps: int = 5) -> str:
    """Main agent loop: think, act, observe, repeat."""

    system_prompt = f"""You are a helpful AI assistant with access to tools.

When you need to use a tool, respond with a JSON object in this exact format:
{{"tool": "tool_name", "arguments": {{"arg_name": "value"}}}}

Available tools:
- calculate(expression: str): {TOOLS['calculate'].__doc__}
- web_search(query: str): {TOOLS['web_search'].__doc__}
- read_file(path: str): {TOOLS['read_file'].__doc__}

When you have the final answer, respond with:
{{"answer": "your complete response to the user"}}

Call at most one tool per reply. Think step by step, and if you need information, use a tool."""

    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_query},
    ]

    for _ in range(max_steps):
        response = query_ollama(messages)

        # Try to parse as JSON tool call or final answer
        try:
            parsed = json.loads(response)
        except json.JSONDecodeError:
            # Model didn't return JSON: treat the raw text as the final answer
            return response

        if not isinstance(parsed, dict):
            return response

        if "answer" in parsed:
            return str(parsed["answer"])

        if "tool" in parsed:
            tool_name = parsed["tool"]
            args = parsed.get("arguments", {})
            messages.append({"role": "assistant", "content": response})

            if not isinstance(tool_name, str) or tool_name not in TOOLS or not isinstance(args, dict):
                # Tell the model what went wrong so it can retry
                messages.append({"role": "user",
                    "content": f"Invalid tool call. Available tools: {', '.join(TOOLS)}"})
                continue

            print(f"[Agent] Calling {tool_name}({args})")
            try:
                result = TOOLS[tool_name](**args)
            except Exception as e:
                # e.g. the model used the wrong argument name
                result = f"Error: {e}"

            messages.append({"role": "user",
                "content": f"Tool result from {tool_name}: {result}"})
        else:
            return response

    return "Agent reached maximum steps without a final answer."

# ── Try it ──
if __name__ == "__main__":
    answer = run_agent("What is 247 * 389 plus the square root of 144?")
    print(f"\nFinal answer:\n{answer}")

Run this and you’ll see the agent think. It gets the query, realizes it needs to calculate something, calls the calculate tool, gets the result, and wraps it into a natural response. The whole thing happens in 2-3 seconds on a decent laptop.

The max_steps parameter acts as a safety net. Without it, a confused model could loop forever asking for tools that don’t exist. Five steps is plenty for most queries — if the agent hasn’t figured it out by then, it probably won’t.

Step 4: Give It Something Real to Do

Let’s test it with a question that requires multiple tools:

query = """I need three things:
1. Calculate how many seconds are in 3.5 days
2. Search for the latest Philippine boxing news
3. Read the file /etc/hostname and tell me what it says

Combine all the answers into one response."""

# Reuses run_agent and TOOLS from the previous steps
print(run_agent(query))

When you run this, the agent should make three separate tool calls, one for each subtask, then collect the results and synthesize them into a single coherent response. This is what separates agents from chatbots. A regular chatbot would either hallucinate the answers or just tell you it can’t do those things.

I ran this exact query on my machine (a modest ThinkPad with integrated graphics) and the whole thing completed in about 8 seconds. The model correctly called calculate("60 * 60 * 24 * 3.5"), then web_search("latest Philippine boxing news 2026"), and finally read_file("/etc/hostname"). Not bad for a 3B model running on a laptop.
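You can sanity-check that first tool call without the model. This runs the same eval call the calculate tool makes (empty __builtins__, no extra names needed for plain arithmetic), and the second print mirrors the observation message run_agent appends for the model to read on its next step.

```python
# 60 s/min × 60 min/h × 24 h/day × 3.5 days, evaluated the way calculate() does
expression = "60 * 60 * 24 * 3.5"
result = eval(expression, {"__builtins__": {}}, {})
print(result)  # 302400.0 (a float, because 3.5 is a float)

# The observation the agent loop feeds back to the model
print(f"Tool result from calculate: {result}")
```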

What This Pattern Unlocks

Once you have this loop working, adding new tools is trivial. Need your agent to send emails? Add a send_email tool that calls your SMTP server. Want it to manage your calendar? Add a calendar_check function. The agent doesn’t need to be retrained: register the function in TOOLS, add a line describing it to the system prompt, and the model reads that description and figures out how to use it.
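As an example of how little a new tool needs, here’s a sketch of a hypothetical check_disk_space tool built on the standard library’s shutil.disk_usage. It only inspects the machine the agent runs on; checking remote servers would need something extra, like SSH, that this sketch doesn’t cover.

```python
import shutil

def check_disk_space(path: str = "/") -> str:
    """Report total, used, and free disk space (in GB) for the filesystem containing path."""
    try:
        usage = shutil.disk_usage(path)  # named tuple of bytes: total, used, free
    except OSError as e:
        return f"Error checking disk space: {e}"
    gb = 1024 ** 3
    pct_free = 100 * usage.free / usage.total if usage.total else 0.0
    return (f"{path}: {usage.total / gb:.1f} GB total, {usage.used / gb:.1f} GB used, "
            f"{usage.free / gb:.1f} GB free ({pct_free:.0f}% free)")

# Registering it next to the existing tools would be one line:
# TOOLS["check_disk_space"] = check_disk_space

if __name__ == "__main__":
    print(check_disk_space("."))
```

Like the original tools, it reports problems as a string instead of raising, so a bad path becomes something the model can read and react to.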

I’ve written about agent loops and swarms before, and this is essentially the pattern that Patronus AI’s $50 million testing platform validates at scale, what Anthropic is doing with Claude Tag, and what Notion is betting on when it says agents will handle your inbox. The difference is that you just built the core pattern in a single Python file, running entirely on your own hardware, with zero recurring costs.

I’ve been slowly replacing my collection of random shell scripts and cron jobs with a single agent that I can talk to in plain English. “Check if any of my servers are running low on disk space and email me if they are” is one query instead of three separate monitoring scripts. It’s not perfect — the 3B model occasionally confuses tool names or produces malformed JSON — but it’s right more often than it’s wrong, and the gap is closing with every model release.
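Two small helpers can paper over those failure modes. This is a sketch under my own assumptions, not part of the agent above: extract_json pulls the first JSON object out of a reply even when the model wraps it in a Markdown fence or adds chatter around it, and suggest_tool uses difflib to map a near-miss tool name like websearch onto web_search. Both names are hypothetical.

```python
import difflib
import json
from typing import Optional

def extract_json(text: str) -> Optional[dict]:
    """Return the first JSON object embedded in text, or None if there isn't one."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value and ignores whatever follows it
            obj, _ = decoder.raw_decode(text[start:])
            if isinstance(obj, dict):
                return obj
        except json.JSONDecodeError:
            pass
        start = text.find("{", start + 1)
    return None

def suggest_tool(name: str, tool_names) -> Optional[str]:
    """Return the closest registered tool name, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(name, list(tool_names), n=1, cutoff=0.6)
    return matches[0] if matches else None

if __name__ == "__main__":
    reply = 'Sure! ```json\n{"tool": "websearch", "arguments": {"query": "boxing"}}\n```'
    call = extract_json(reply)
    print(call)
    print(suggest_tool(call["tool"], ["calculate", "web_search", "read_file"]))
```

In run_agent you could try extract_json(response) when json.loads fails, and when a tool name is unknown, include suggest_tool’s result in the message you send back.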

A Quick Note on Privacy

Running the model locally means your prompts and files never leave your machine; the only thing that goes out is whatever query the web_search tool sends to DuckDuckGo. No API keys, no cloud processing, no “we may use your data to improve our models” fine print. For developers working with proprietary code or sensitive business logic, this is a big deal. The AI chatbots are not your friends, as I’ve written about before, but a local agent you control is a tool, not a confidant.

Where to Go From Here

Start small. Give your agent two or three tools that solve real problems for you. Let it handle the tedious stuff — checking logs, summarizing reports, searching documentation — while you focus on the work that actually needs a human brain. The prompt engineering is the hardest part; once you find a system prompt that produces reliable JSON, save it and reuse it — similar to how AI coding context files work for your projects.

And if you want to get fancy later, swap out the model. Mistral, Qwen, DeepSeek — Ollama supports dozens of them. Each has its own strengths. Some are better at reasoning, others at following strict formatting. Experiment. The agent loop stays the same; only the brain changes.
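One way to make that swap a configuration change rather than a code edit is to read the model name from an environment variable. AGENT_MODEL is a hypothetical variable name, and build_payload is an illustrative helper that produces a request body like the one query_ollama sends.

```python
import os

# AGENT_MODEL is a hypothetical variable name; defaults to the Step 1 model
MODEL = os.environ.get("AGENT_MODEL", "llama3.2:3b")

def build_payload(messages: list, model: str = MODEL) -> dict:
    """Build an Ollama /api/chat request body for the chosen model."""
    return {"model": model, "messages": messages, "stream": False}

if __name__ == "__main__":
    print(build_payload([{"role": "user", "content": "Hello!"}])["model"])
```

If your script is saved as agent.py, running AGENT_MODEL=mistral python agent.py would try Mistral without touching the code, assuming you’ve pulled it first with ollama pull mistral.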

That’s the beauty of building it yourself. You’re not locked into anyone’s ecosystem. When a better model drops tomorrow, you just change one line and keep going. And if you’re curious about where local AI hardware is heading, the custom AI chip race is worth watching.