What are AI Agents?

We will all have multiple AI agents working for us.

Oct 19, 2024

An AI Agent helping a business executive

AI agents are the current buzzword in the software world, and they’re quickly changing the way we live and work. By now, we have all interacted with AI, whether through a chatbot, a search engine, or a tool that helps with everyday tasks. As a software developer and writer, I have seen large language models like ChatGPT and Claude completely reshape the way I work. With AI agents, our lives will change further, and we will interact with AI in almost everything we do. AI agents are at the center of this shift.

The term ‘AI agent’ is still new. There are many definitions, and people often disagree on what counts as an AI agent. The meaning will keep evolving as more work happens in this field. My goal with this article is to take a slightly broader view of what people mean when they say something is an AI agent. So, let’s get started.

AI Agent - Definition

An AI agent is an intelligent software system designed to achieve a goal that requires multiple steps. These goals are tasks that could otherwise be done by a human agent such as a customer support executive, a travel agent, an HR executive, an IT administrator, or anyone else. The task may be simple or complex.

This sounds like any automated system, you may say. However, there are differences. You can ask an AI agent to do the tasks it is designed for, and it carries them out for you. This goes well beyond answering simple questions: it may involve understanding the user’s requirements, figuring out what’s involved, making decisions, and interacting with multiple systems.

AI Agents vs. LLM Chatbots

While both AI agents and Large Language Model (LLM) chatbots are powered by advanced AI, they differ significantly in their approach and design. LLM chatbots, like the ones we commonly interact with, are designed for quick responses. They generate answers based on patterns in their training data, often providing the first plausible response that comes to "mind." This allows for fast, fluid conversations but can sometimes lead to inconsistencies or less thoughtful answers.

In contrast, AI agents are designed to work more methodically. They don't just respond; they plan, reason, and then act. An AI agent will take time to break down a task, consider various approaches, and devise a strategy before taking action. This slower, more deliberate process allows AI agents to handle complex, multi-step tasks that require careful planning and execution. While this approach may not be as instantaneous as a chatbot, it often results in more accurate, comprehensive, and tailored solutions to user requests.
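To make the contrast concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: `stub_llm` stands in for a real model call, `stub_act` stands in for whatever carries out a step, and the prompts are made up. It only shows the difference in shape between a single reply and a plan, act, respond loop.

```python
from typing import Callable, List

def chatbot_reply(request: str, llm: Callable[[str], str]) -> str:
    # A chatbot makes one model call and returns the answer directly.
    return llm(request)

def agent_run(request: str, llm: Callable[[str], str], act: Callable[[str], str]) -> str:
    # 1) Plan: ask the model for the steps, one per line.
    raw_plan = llm(f"List the steps needed to: {request}")
    plan: List[str] = [s.strip() for s in raw_plan.splitlines() if s.strip()]
    # 2) Act: carry out each step and keep what came back.
    observations = [act(step) for step in plan]
    # 3) Respond: ask the model to turn the observations into a final answer.
    return llm("Summarize these results: " + "; ".join(observations))

def stub_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real model; returns canned text.
    if prompt.startswith("List the steps"):
        return "look up order\nissue refund"
    if prompt.startswith("Summarize these results: "):
        return "Done: " + prompt[len("Summarize these results: "):]
    return "I can help with that."

def stub_act(step: str) -> str:
    # Hypothetical stand-in for executing a step.
    return f"{step} -> ok"

quick_answer = chatbot_reply("refund order 42", stub_llm)
agent_answer = agent_run("refund order 42", stub_llm, stub_act)
```

The agent version calls the model twice and does work in between, which is why it is slower but can cover a multi-step task.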

AI Agents - Capabilities

Three key capabilities differentiate AI agents from other automation software or a simple chatbot. An AI agent can 1) reason, 2) act, and 3) remember.

Let’s look at each of these.

1. Reason

The first is their ability to reason. An AI agent can understand a user’s request, think it through, and break it into steps. This is a crucial step. An AI agent has a large language model in charge, which takes the user’s request and controls the agent’s workflow. This could be the model behind ChatGPT or Claude, or any other specialized LLM. As we know, LLMs are good at this kind of work, especially at understanding the user’s intent and breaking a complex task into smaller, more actionable steps. If programmed correctly, they can even reason with you or ask for more information. This is the core of how a problem will be solved: the model is prompted to come up with a plan and to reason about each step in the process.
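As a rough illustration of prompting a model for a plan, the sketch below asks for numbered steps and parses them into a list. The prompt wording, the function names, and `stub_llm` (a stand-in that returns a fixed plan) are all assumptions for this example; a real agent would call an actual model and would need to handle messier output.

```python
from typing import Callable, List

# Hypothetical planning prompt; real agents use far more detailed instructions.
PLANNING_PROMPT = (
    "You are a planning assistant. Break the user's request into short, "
    "numbered steps, one per line.\n\nRequest: {request}"
)

def parse_plan(text: str) -> List[str]:
    """Turn a numbered list such as '1. Do X' into ['Do X', ...]."""
    steps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading "1." or "1)" marker if present.
        head, sep, rest = line.partition(" ")
        if sep and head.rstrip(".)").isdigit():
            line = rest.strip()
        steps.append(line)
    return steps

def make_plan(request: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the model for a plan and return it as a list of steps."""
    return parse_plan(llm(PLANNING_PROMPT.format(request=request)))

def stub_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same canned plan.
    return "1. Check the calendar\n2. Search for flights\n3. Book the cheapest option"

plan = make_plan("Book me a flight to Paris next month", stub_llm)
```

Keeping the plan as a plain list makes it easy for the rest of the agent to walk through the steps one at a time.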

2. Act

The second is their ability to act. This is where agents shine and where the actual work happens. Based on the steps to solve a problem, the AI agent will now act upon each step and execute it. This is done using external programs which are referred to as tools in the AI industry. For example, if a step involves checking the weather in an area, the AI agent may fetch that information from a third-party weather API. This weather API is referred to as a tool.

One example of a tool is search: a program that searches the web or a database for information. Another is a calculator that performs math as required. Yet another could be a custom piece of code that interacts with a transactional database or performs some other operation. A tool could also be another language model or an API that does a specialized task such as translation. The model in charge of the AI agent orchestrates these tools: it decides which tools to use for a given task, when to call them, and how to call them. There are endless possibilities for creating and using these tools.
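One common way to wire this up is a small registry that maps tool names to functions, so the agent can execute whatever tool the model picks. The sketch below assumes the model's choice arrives as a dictionary like `{"tool": ..., "input": ...}`; that format, the tool names, and the canned weather data are all made up for illustration. No real weather API is called.

```python
from typing import Callable, Dict

def get_weather(city: str) -> str:
    # Placeholder: a real tool would call a third-party weather API here.
    canned = {"Paris": "18C, cloudy"}
    return canned.get(city, "unknown")

def calculate(expression: str) -> str:
    # Supports only "a op b" with + - * / so we never need eval().
    left, op, right = expression.split()
    a, b = float(left), float(right)
    results = {
        "+": a + b,
        "-": a - b,
        "*": a * b,
        "/": a / b if b else float("nan"),
    }
    return str(results[op])

# The registry: the model refers to tools by these names.
TOOLS: Dict[str, Callable[[str], str]] = {
    "weather": get_weather,
    "calculator": calculate,
}

def run_tool(call: dict) -> str:
    """Execute one tool call of the form {'tool': name, 'input': text}."""
    tool = TOOLS.get(call["tool"])
    if tool is None:
        # Report the problem as text so the model can react to it.
        return f"error: unknown tool {call['tool']!r}"
    return tool(call["input"])

weather = run_tool({"tool": "weather", "input": "Paris"})
product = run_tool({"tool": "calculator", "input": "2 * 21"})
missing = run_tool({"tool": "translate", "input": "hello"})
```

Adding a new capability then means writing one function and registering it under a name the model is told about.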

3. Memory

The third capability is memory. AI agents remember past conversations with users, then draw on that memory and learn from it while performing tasks, which makes the process smoother. This allows them to provide more personalized and context-aware responses over time.
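Here is a deliberately simple sketch of what such memory could look like: notes are stored per user and recalled by word overlap with the current request. The class name, the methods, and the matching rule are assumptions for illustration; real agents often use more sophisticated retrieval, but the idea of storing and then recalling relevant context is the same.

```python
from collections import defaultdict
from typing import Dict, List

class ConversationMemory:
    """Hypothetical per-user note store with naive keyword recall."""

    def __init__(self) -> None:
        self._notes: Dict[str, List[str]] = defaultdict(list)

    def remember(self, user_id: str, note: str) -> None:
        self._notes[user_id].append(note)

    def recall(self, user_id: str, query: str, limit: int = 3) -> List[str]:
        """Return up to `limit` notes sharing at least one word with the query."""
        query_words = set(query.lower().split())
        scored = []
        for note in self._notes.get(user_id, []):
            overlap = len(query_words & set(note.lower().split()))
            if overlap:
                scored.append((overlap, note))
        # Highest overlap first; the sort is stable, so ties keep insertion order.
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return [note for _, note in scored[:limit]]

memory = ConversationMemory()
memory.remember("alice", "Prefers window seats on flights")
memory.remember("alice", "Allergic to peanuts")
hits = memory.recall("alice", "find flights for Rome")
```

The recalled notes would typically be added to the model's prompt so that the next response takes them into account.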

Examples of AI Agents

AI Travel Agent

An AI travel planning agent could streamline your vacation planning by analyzing your calendar and travel history to suggest suitable dates and destinations. It would recommend flights, accommodations, and activities based on your preferences, make bookings, and add the itinerary to your calendar. The AI could also compile necessary documents, monitor for changes, and provide pre-trip information like weather forecasts and packing suggestions. While ambitious, this level of AI assistance in travel planning may soon be achievable.

AI Email Assistant

An AI email assistant could help a professional by drafting, proofreading, and organizing emails. It would analyze your writing style and past correspondence to compose messages for you, suggest responses to incoming emails, and even schedule follow-ups. The AI could prioritize your inbox, flag important messages, and summarize long email threads. It would also help maintain a consistent tone across your communications, offer language translations, and ensure emails are free from grammatical errors. Such an AI agent could significantly boost productivity and improve the quality of your communication.

Copilot vs Autonomous Agents

AI agents can operate autonomously or alongside other agents and human users. With varying degrees of autonomy, we have different labels for these agents.

Copilots are a type of AI agent that works alongside users rather than operating independently. Copilots provide suggestions and recommendations to assist users in completing tasks. An autonomous agent, on the other hand, works more independently, making decisions and taking actions with minimal human intervention.

For example, an AI coding copilot could suggest code snippets, functions, or algorithms while the developer writes code. It could search the project's codebase, offer optimizations, and explain complex code. The AI might autocomplete lines, propose variable names, or generate comments. Developers can accept, modify, or reject these suggestions, enhancing productivity while maintaining control over their code.

An autonomous coding agent could take development a step further by independently generating entire code blocks or features from high-level requirements, analyzing architecture, and integrating modules seamlessly. It might proactively refactor code, handle edge cases, and follow best practices. Developers guide the agent by setting goals, reviewing outputs, and fine-tuning implementations, enhancing productivity while ensuring code quality.
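One way to picture the difference in code is an approval gate. In the sketch below, the same loop behaves like a copilot when a human `approve` callback is supplied and like an autonomous agent when it is not. The function names and the example steps are hypothetical, and real products draw this line in more nuanced ways.

```python
from typing import Callable, List, Optional

def run_steps(
    steps: List[str],
    execute: Callable[[str], str],
    approve: Optional[Callable[[str], bool]] = None,
) -> List[str]:
    """Run each step; if `approve` is given, ask it before every step."""
    results = []
    for step in steps:
        if approve is not None and not approve(step):
            # Copilot mode: the human rejected this suggestion.
            results.append(f"skipped: {step}")
            continue
        results.append(execute(step))
    return results

def execute(step: str) -> str:
    # Stand-in for actually performing the step.
    return f"done: {step}"

def cautious_human(step: str) -> bool:
    # Stand-in for a developer who rejects anything destructive.
    return "delete" not in step

steps = ["rename variable", "delete old module"]
autonomous = run_steps(steps, execute)                       # no approval gate
copilot = run_steps(steps, execute, approve=cautious_human)  # human in the loop
```

In practice the level of autonomy is a dial rather than a switch: an agent might run low-risk steps on its own and ask for approval only on the risky ones.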

Conclusion

AI agents are transforming how we interact with technology, shifting from simple tools to proactive partners. To make them truly effective, we'll likely need multiple AI models working together, each specializing in different tasks. As AI continues to evolve, these agents will become even more capable, taking on more responsibilities to help us in our daily lives.

Modern Software
