Authored by: Suresh Bansal, Technical Manager – Xoriant
The journey of Artificial Intelligence (AI) and Machine Learning (ML) has been transformative. It all began when we shifted from manual coding to training computers with data. In the early days, AI could only handle specific tasks like classification and object identification—functions for which the models were explicitly trained.
But everything changed at the end of 2022 with the launch of ChatGPT by OpenAI. This groundbreaking tool could generate content and perform a wide range of tasks, quickly capturing the attention of millions worldwide. As noted in Gartner’s 2023 Hype Cycle for AI, Generative AI has reached the “peak of inflated expectations” and is expected to hit the “Plateau of Productivity” within the next 5 to 10 years.
Overcoming Challenges and Limitations
Reaching the Plateau of Productivity, according to Gartner, means that AI will become widely adopted, with its benefits well-defined and clear guidelines for implementation. To get there, we must first address the current limitations of AI technology and explore how agents can help overcome these challenges.
While today’s large language models (LLMs) excel at tasks like generating emails, writing essays, and conducting sentiment analysis, they still struggle with complex tasks, such as intricate math calculations or multi-step problem-solving. Additionally, LLMs have other notable limitations:
- Hallucinations or misleading outputs
- Technical constraints like limited context length and memory
- Bias in outputs
- Potential for toxic or harmful speech
- Limited knowledge (e.g., GPT-3.5’s knowledge cutoff is September 2021)
Interestingly, these challenges are not so different from those we humans face. We, too, are prone to mistakes, bias, limited memory, and occasionally harmful responses. To manage these shortcomings, we typically:
- Seek information online and use tools like Excel and Word.
- Revise our work multiple times to correct errors and improve quality.
- Seek feedback from peers and mentors and incorporate their insights.
- Collaborate in teams to achieve better results.
By applying similar strategies, we can improve the outputs from LLMs, leading us to the concept of Generative AI Agents.
What are Generative AI Agents?
Generative AI Agents are designed to overcome many of the limitations of current LLMs by executing complex tasks that standalone models cannot handle. For example, if you want to identify the top three companies by revenue from a dataset, an agent would:
- Retrieve revenue data for all companies.
- Sort the companies by revenue.
- Return the top three companies.
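The three steps above are simple enough to sketch directly. The snippet below shows the kind of code an agent might generate and run through a code-interpreter tool; the dataset, field names, and function name are illustrative, not from the article.

```python
# Illustrative sketch: the plan an agent might execute for this task.
# Step 1 (retrieve) is represented by the in-memory list below; a real
# agent would fetch this from a database or file via a tool.

def top_companies_by_revenue(records, n=3):
    """Sort company records by revenue (step 2) and return the top n names (step 3)."""
    ranked = sorted(records, key=lambda r: r["revenue"], reverse=True)
    return [r["name"] for r in ranked[:n]]

companies = [
    {"name": "Acme", "revenue": 120},
    {"name": "Globex", "revenue": 300},
    {"name": "Initech", "revenue": 90},
    {"name": "Umbrella", "revenue": 250},
]

print(top_companies_by_revenue(companies))  # ['Globex', 'Umbrella', 'Acme']
```

The point is not the sorting itself but the decomposition: the agent breaks one request into retrieval, transformation, and selection steps that a standalone LLM cannot reliably perform in a single generation.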
To accomplish this, agents combine LLMs with key components such as planning, memory, and tools:
- Planning: The agent outlines and executes a plan using an LLM.
- Memory: The agent retains information while performing multiple steps, allowing it to process complex tasks.
- Tools: Agents use various tools to perform specific tasks, which are discussed in more detail below.
Key Features of Generative AI Agents
Generative AI agents are designed to:
- Plan and execute tasks
- Reflect on outcomes
- Use tools to achieve specified goals
- Operate with minimal human intervention
Examples of such agents include website builders, data-analysis agents that surface insights from Excel sheets, and travel-planning agents that arrange trips based on user inputs.
The Role of Tools in Generative AI Agents
Tools are critical for agents, enabling them to perform their tasks effectively. In the realm of generative AI, tools allow an LLM agent to interact with external environments and applications, such as internet searches, code interpreters, and math engines. These tools can access databases, knowledge bases, and external models.
For instance, a travel agent would need tools to search and book flights, as well as search the internet. Other tools could include:
- Entity Extraction: Extract specific information from unstructured documents.
- Chat DB: Retrieve information from a database without needing SQL knowledge.
- Knowledge Bot: Uses Retrieval-Augmented Generation (RAG) to answer questions based on a custom knowledge repository.
- Internet Search: Fetches content from search engines based on user queries.
- Summarization: Provides summaries of large documents tailored to specific personas.
- Program Execution: Executes Python code to solve specific problems.
- Wikipedia Search: Retrieves content from Wikipedia based on user queries.
- Comparison: Answers comparative questions, like performance metrics or product recommendations.
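One common way to wire tools like these into an agent is a simple registry that maps tool names to functions, so the LLM controller can dispatch calls by name. The sketch below assumes this registry pattern; the two tool implementations are stand-ins, not real integrations.

```python
# Hedged sketch of a tool registry. Each tool is a named function the
# agent's controller can dispatch to; real tools would wrap APIs,
# search engines, or a sandboxed interpreter.

def wikipedia_search(query):
    # Stand-in: a real tool would call the Wikipedia API.
    return f"[stub] Wikipedia results for: {query}"

def program_execution(code):
    # Stand-in for the Program Execution tool. A real agent would run
    # this in a sandboxed interpreter, never a bare exec().
    namespace = {}
    exec(code, namespace)
    return namespace.get("result")

TOOLS = {
    "wikipedia_search": wikipedia_search,
    "program_execution": program_execution,
}

def call_tool(name, argument):
    """Dispatch a tool call by name, as an agent's controller might."""
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](argument)

print(call_tool("program_execution", "result = 6 * 7"))  # 42
```

Because the registry is just a dictionary, adding a new capability (say, Entity Extraction or Summarization) means registering one more function without changing the dispatch logic.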
Agentic Design Patterns
To perform complex tasks, agents must orchestrate these tools effectively. Based on lectures by Andrew Ng, several agentic design patterns have emerged:
- Reflection: The LLM evaluates its own work to improve it.
- Tool Use: The LLM utilizes tools like web searches or code execution to gather information and process data.
- Planning: The LLM devises a multi-step plan to achieve a goal and then executes it.
- Multi-Agent Collaboration: Multiple AI agents collaborate, dividing tasks and debating ideas to find better solutions.
While the first two patterns yield predictable outcomes, the latter two are still in the experimental phase.
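The Reflection pattern is the easiest to illustrate in code. In the sketch below, `generate()`, `critique()`, and `revise()` are stubs standing in for real LLM calls; only the generate-critique-revise loop itself is the pattern.

```python
# Minimal sketch of the Reflection pattern with a stubbed "LLM".
# In a real agent, each of these three functions would be a model call.

def generate(task):
    return f"Draft answer to: {task}"

def critique(draft):
    # A real critic would be a second LLM pass; this stub flags drafts
    # that still contain the placeholder word "Draft".
    return "remove the word 'Draft'" if "Draft" in draft else None

def revise(draft, feedback):
    return draft.replace("Draft ", "")

def reflect(task, max_rounds=3):
    """Generate an answer, then critique and revise until no feedback remains."""
    output = generate(task)
    for _ in range(max_rounds):
        feedback = critique(output)
        if feedback is None:
            break
        output = revise(output, feedback)
    return output

print(reflect("summarize Q3 revenue"))
```

The same loop structure generalizes: swap the critic for a tool-using evaluator and you move toward the Tool Use pattern; give the planner multiple specialized critics and you approach Multi-Agent Collaboration.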
The LLM Agent Framework
Building on agents, tools, and design patterns, the LLM agent framework can be seen as a variation of the planning pattern: the agent is given a task or goal, then iteratively plans and executes the next action, incorporating feedback at each step.
An LLM agent consists of core components:
- Brain/LLM: Acts as the coordinator.
- Memory (Vector DB): Stores intermediate steps and results.
  - Short-term memory: Holds context information within the context window.
  - Long-term memory: An external vector store providing relevant contextual information.
- Tools/Internet: Enable the agent to perform tasks like web searches or program execution.
- Policy: Ensures trust by design, preventing the processing of toxic inputs.
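Putting the components together, the framework's plan-act-feedback loop can be sketched as follows. Here `plan_next()` stands in for the LLM "brain", memory is a plain list rather than a vector store, and the three-step plan is an invented example.

```python
# Hedged sketch of the iterative plan-and-execute loop described above.
# All names and the hard-coded plan are illustrative assumptions.

def plan_next(goal, memory):
    """Stub planner: a real agent would ask the LLM for the next action,
    conditioned on the goal and everything in memory."""
    steps = ["retrieve data", "analyze data", "write report"]
    done = len(memory)
    return steps[done] if done < len(steps) else None

def execute(action):
    # A real agent would dispatch to a tool here (search, code execution, ...).
    return f"completed: {action}"

def run_agent(goal):
    memory = []  # stands in for short-term memory / a vector store
    while True:
        action = plan_next(goal, memory)
        if action is None:  # the planner decides the goal is met
            break
        result = execute(action)
        memory.append(result)  # feedback loop: results inform the next plan
    return memory

print(run_agent("quarterly report"))
```

A production version would add the Policy component as a filter on both inputs and planned actions, rejecting toxic or out-of-scope requests before they reach the tools.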
A Future with Intelligent Agents
The future of generative AI lies in the collaboration between intelligent agents and humans. Imagine a world where doctors, designers, and customer service representatives are supported by agents that enhance their capabilities. The possibilities are endless, from scientific discoveries to artistic creations.
For businesses, integrating generative AI agents into their operations offers a strategic advantage, unlocking new levels of efficiency, personalization, and problem-solving. These agents won’t replace human ingenuity; they’ll empower it, shaping a future rich with innovation and progress.
About Author:
Suresh Bansal is a Technical Manager at Xoriant with expertise in Generative AI and technologies such as Vector DB, LLM, Hugging Face, LlamaIndex, LangChain, Azure, and AWS. With experience in pre-sales and sales, he has excelled at creating compelling technical proposals and ensuring client success. Suresh has worked with clients from the US, UK, Japan, and Singapore, achieved advanced-level partnerships with AWS, and presented research recommendations to C-level leadership.