Multi-Agent LLM Systems: Architecture, Communication, and Coordination
Large language models (LLMs) have revolutionized autonomous agents by equipping them with advanced reasoning, planning, and communication capabilities. While single-agent systems have demonstrated impressive power, multi-agent LLM systems represent the next frontier—harnessing the strengths of multiple specialized agents to address complex tasks that exceed the capabilities of any individual agent.
These systems distribute work among “expert” agents, each with a distinct role and domain knowledge, enabling natural-language collaboration on challenging tasks such as collaborative problem-solving, simulation, and strategic planning. Unlocking emergent collective intelligence hinges on thoughtful team design, encompassing agent specialization, robust communication protocols, and effective memory-sharing mechanisms.
Architectural Patterns: From Flat Networks to Complex Hierarchies
Multi-agent LLM architectures can be categorized along several key dimensions, each suited to different types of tasks and collaboration needs. Below we outline five patterns (flat, hierarchical, team-based, central coordinator, hybrid) with their agent roles, message flows, and key functions (plan_task(), execute_task(), evaluate_output(), etc.).
Flat (Peer-to-Peer) Architecture
In a flat or “network” architecture, all agents are peers. Any agent can call or message any other agent. There is no central boss. Typically agents are specialized (e.g. “Analyzer”, “Retriever”, “Responder”) but collaborate directly. This allows flexible, many-to-many communication (emergent coordination) but requires careful protocol design.
# Example roles (all equal peers)
agents = [AnalyzerAgent(), RetrieverAgent(), ResponderAgent()]

# Shared message bus or direct messaging
message_bus = MessageBus()

# plan_task: any agent may initiate planning or take on parts of the task
def plan_task(task):
    # e.g., choose an expert by consensus or skill match
    for agent in agents:
        if agent.can_handle(task):
            agent.receive(task)
            return

# Agents execute tasks and can call others as needed
for agent in agents:
    if agent.has_task():
        subtask = agent.plan_task()       # e.g., break task or reformulate
        result = agent.execute_task(subtask)
        agent.publish(result)             # share output on bus

# Each agent can see others' outputs and optionally pass work on
while not all_tasks_complete():
    for agent in agents:
        msg = message_bus.retrieve_any()
        if msg:
            agent.receive(msg)            # receive peer's update
            agent.evaluate_output(msg)    # possibly refine or combine
Key Characteristics:
- Roles: All agents are domain experts or generalists (no supervisor).
- Communication: Any agent can send messages or tasks to any other (many-to-many). A shared bus or direct message passing implements this.
- Functions: Agents call plan_task() for subtasks and execute_task(), then evaluate_output() by sharing results. They may forward subtasks or insights to peers as needed, reflecting a collaborative, leaderless workflow.
- Use case: This democratic approach enables flexible interactions and works well for small-scale collaborative tasks where all agents contribute equally to the solution.
Hierarchical Design
In a hierarchical architecture, agents are organized in levels (a tree). A top-level Supervisor (or manager) delegates to mid-level managers, who in turn assign to Worker/Specialist agents. This mirrors a command-and-control structure: tasks flow down the hierarchy, results flow up. It scales to complex projects by subdividing authority.
# Top-level supervisor agent
class Supervisor:
    def plan_task(self, task):
        subtasks = self.decompose(task)  # break into parts
        for subtask in subtasks:
            # assign to appropriate mid-level manager or specialist
            specialist = select_specialist_for(subtask)
            specialist.receive_task(subtask)

    def evaluate_output(self, outputs):
        # integrate results from all sub-agents
        final_answer = self.combine(outputs)
        return final_answer

# Mid-level manager (optional) or direct specialists
class Manager:
    def __init__(self, team):
        self.team = team  # list of Worker agents

    def execute(self, subtask):
        worker = select_worker_for(subtask)
        worker.receive_task(subtask)
        result = worker.execute_task(subtask)
        return result

class Specialist:
    def receive_task(self, task):
        self.current_task = task

    def execute_task(self, task):
        # domain-specific processing
        return perform_domain_logic(task)

# Set up the hierarchy
top_supervisor = Supervisor()
team_lead = Manager(team=[WorkerA(), WorkerB()])  # optional middle layer
workers = [WorkerA(), WorkerB()]

# Example workflow
main_task = "Analyze dataset and report insights"
top_supervisor.plan_task(main_task)  # decompose and delegate downwards

# Workers do their tasks and return outputs up through managers
outputs = []
for worker in workers:
    if worker.has_task():
        output = worker.execute_task(worker.current_task)
        outputs.append(output)
final_result = top_supervisor.evaluate_output(outputs)
Key Characteristics:
- Roles: Supervisor (at top), optional mid-level Manager agents, and bottom-tier Specialist or Worker agents.
- Communication: The Supervisor sends tasks downward (e.g. assign(subtask) calls), and workers reply upward with results. The hierarchy may span two or more levels.
- Functions: The Supervisor’s plan_task() breaks down tasks and assigns them to managers or specialists. Specialists run execute_task(). The Supervisor (or managers) use evaluate_output() to collect and integrate sub-results into a final answer, ensuring clear control flow.
- Use case: These structures excel when tasks naturally decompose into layers—for example, distinguishing between high-level strategic planners and low-level execution specialists.
Team-Based (Society) Architecture
A team-based or “society” architecture groups agents into teams, often with a team lead. Each team has a Supervisor (Team Lead) and multiple Specialist Agents, plus shared state or memory. This mirrors a collaborative society of minds. The Supervisor coordinates team activities and routes subtasks, while specialists focus on domain tasks. A common design is to include a State Management System to maintain context across the team.
# Shared state/context for the team
class TeamState:
    def __init__(self):
        self.history = []
        self.next_agent = None

# Define team members
class TeamSupervisor:
    def __init__(self, team_members, state: TeamState):
        self.team_members = team_members
        self.state = state

    def plan_task(self, task):
        self.state.history.append(("Supervisor receives", task))
        # Decide which specialist handles which part
        for member in self.team_members:
            if member.specialty_matches(task):
                member.receive(task, self.state)

    def evaluate_output(self):
        # Merge team outputs
        combined = combine([member.output for member in self.team_members
                            if member.output])
        return combined

class SpecialistAgent:
    def __init__(self, domain):
        self.domain = domain
        self.output = None

    def receive(self, task, state: TeamState):
        self.current_task = task
        state.history.append((self.domain + " starts", task))

    def execute_task(self, state: TeamState):
        # Perform specialized work using state if needed
        self.output = f"Result({self.domain})"
        state.history.append((self.domain + " done", self.output))

# Set up a team
state = TeamState()
team = [SpecialistAgent(domain="Analysis"),
        SpecialistAgent(domain="Verification")]
team_lead = TeamSupervisor(team_members=team, state=state)

# Workflow
user_task = "Complex project request"
team_lead.plan_task(user_task)   # supervisor routes to specialists
for member in team:
    member.execute_task(state)   # specialists work, updating shared state
final_output = team_lead.evaluate_output()
Key Characteristics:
- Roles: TeamSupervisor (lead) and multiple SpecialistAgents. A TeamState object tracks conversation history and shared data.
- Communication: Supervisor broadcasts or assigns subtasks to specialists; specialists may read/write shared state. Feedback flows back via state or messages.
- Functions: TeamSupervisor.plan_task() decomposes work and dispatches it to specialists. Each SpecialistAgent performs execute_task(), updating the shared TeamState. Finally, the supervisor’s evaluate_output() merges specialist outputs, leveraging the team’s joint “society” knowledge.
- Use case: These structures organize agents into specialized functional groups, similar to departments in an organization. You might have coding agents, reasoning agents, and fact-checkers, each bringing their expertise to bear on different aspects of a problem.
Central Coordinator (Group Chat Model)
In this model, one central agent (or the user/human) acts as a Coordinator in a group-chat style environment. All specialists share a common “chat room” or forum. The coordinator posts tasks or queries, specialists post responses, and the coordinator picks or synthesizes the best results. This is essentially a supervisor pattern implemented as a mediated conversation (every participant sees relevant messages).
# Roles: Coordinator and several Agent participants
coordinator = CoordinatorAgent()
agents = [SpecialistAgent("TopicA"), SpecialistAgent("TopicB"),
          SpecialistAgent("TopicC")]
chat_room = ChatRoom()

# Coordinator posts initial task to group chat
task = "Plan marketing strategy"
chat_room.post(sender=coordinator, content=task)

# Each agent listens to the chat and replies when relevant
for agent in agents:
    if agent.relevant(task):
        response = agent.execute_task(task)  # each agent crafts a response
        chat_room.post(sender=agent, content=response)

# Coordinator collects all messages and evaluates
responses = chat_room.get_all_messages()
final_plan = coordinator.evaluate_output(responses)  # e.g. pick best or synthesize
Key Characteristics:
- Roles: A single CoordinatorAgent and multiple SpecialistAgents (possibly generic agents). The chat room acts as shared context.
- Communication: All messages go through the ChatRoom. The coordinator and all agents see the conversation (group chat). The coordinator gives directions or asks questions; each agent can respond to any message (horizontal communication via chat). This model is akin to a supervised group discussion.
- Functions: The coordinator invokes plan_task() by posting a task. Each agent’s execute_task() generates an answer to the group. The coordinator’s evaluate_output() reviews all agent messages and decides on a final answer. This yields a traceable dialog and collective problem-solving.
Hybrid (Hierarchy + Peer-to-Peer) – MetaGPT-Style
Hybrid architectures combine hierarchical control with peer-to-peer flexibility. For example, a MetaGPT-style framework may have high-level managers (e.g. Product Manager, Architect) who delegate to specialized agents (Engineers), while specialists can also consult one another (peer links) when needed. This blends clear chains of command with ad-hoc collaboration.
# Define roles
class Manager:
    def plan_task(self, task):
        subtasks = self.decompose(task)
        for sub in subtasks:
            team = select_team_for(sub)
            team_lead = team["lead"]
            team_lead.receive_task(sub)

class TeamLead:
    def __init__(self, members):
        self.members = members

    def receive_task(self, task):
        # delegate within team but allow members to self-organize
        for member in self.members:
            if member.can_handle(task):
                member.current_task = task

    def gather_results(self):
        results = [m.execute_task() for m in self.members if m.current_task]
        return combine(results)

class Agent:
    def can_handle(self, task): ...
    def execute_task(self):
        return f"Output of {self}"

# Example setup: managers with cross-team links
ceo = Manager()
team_alpha = TeamLead(members=[Agent(), Agent()])
team_beta = TeamLead(members=[Agent(), Agent()])
teams = {"alpha": {"lead": team_alpha}, "beta": {"lead": team_beta}}

# Workflow: CEO assigns to team leads
project = "Build new feature"
ceo.plan_task(project)  # breaks task, chooses teams

# Teams might also interact peer-to-peer (placeholder condition)
if need_cross_team_info:
    team_alpha.members[0].message(team_beta.members[1], "Quick question")

# Team leads gather and coordinate
alpha_result = team_alpha.gather_results()
beta_result = team_beta.gather_results()
final_product = ceo.evaluate_output([alpha_result, beta_result])
Key Characteristics:
- Roles: Multiple levels: e.g. Manager/Supervisor, TeamLeads, and Agents (specialists). Some agents may also act at peer-level.
- Communication: Primary flows are hierarchical (Manager→TeamLead→Agent), but agents or teams may also chat with or handoff to other agents directly (peer links), or share via a common platform.
- Functions: Manager.plan_task() splits work among teams. Within a team, the TeamLead assigns tasks but allows team members to coordinate. Agents can call execute_task() independently. The manager’s evaluate_output() assembles the results. This hybrid model enables structured oversight with cross-agent collaboration, a strength noted in the multi-agent literature.
Each pseudocode snippet above is a high-level template. In practice, agents would use LLM calls and data stores within these functions, but the patterns of delegation, messaging, and evaluation reflect the cited architectural concepts. These designs align with known multi-agent LLM architectures: peer-to-peer “network” models, hierarchical supervisors, team-based organization, central-coordinator group-chat models, and hybrid mixes. The pseudocode follows the collaborative delegation and communication patterns described in recent literature (refer to “Resources” section).
Communication and Coordination Protocols
The heart of any multi-agent system lies in how agents exchange information and coordinate their efforts. Several communication paradigms have emerged, each with distinct advantages.
Message Passing enables direct point-to-point conversations between agents, similar to email or instant messaging. Blackboard Systems create shared memory spaces where agents can post findings for others to discover. Speech-Act Protocols structure communication using defined “utterance” types—agents might label their messages as questions, proposals, votes, or critiques.
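To make these paradigms concrete, here is a minimal Python sketch of speech-act-labeled messages posted to a blackboard; the Performative labels and Blackboard class are illustrative names, not any particular framework's API.

from dataclasses import dataclass
from enum import Enum

class Performative(Enum):
    # Speech-act "utterance" types that label a message's intent
    QUESTION = "question"
    PROPOSAL = "proposal"
    VOTE = "vote"
    CRITIQUE = "critique"

@dataclass
class Message:
    sender: str
    performative: Performative
    content: str

class Blackboard:
    # Shared memory space: agents post findings for others to discover
    def __init__(self):
        self.entries = []

    def post(self, msg):
        self.entries.append(msg)

    def read(self, performative=None):
        # Agents can filter the shared space by utterance type
        return [m for m in self.entries
                if performative is None or m.performative == performative]

# Usage: one agent posts a proposal; a peer scans for proposals to critique
board = Blackboard()
board.post(Message("Planner", Performative.PROPOSAL, "Split the task into 3 steps"))
proposals = board.read(Performative.PROPOSAL)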
Communication can be synchronous (agents take turns speaking) or asynchronous (agents contribute when ready). Systems often implement scheduling mechanisms like round-robin or priority-based selection to manage turn-taking. AutoGen’s Group Chat pattern exemplifies this approach: a GroupChatManager selects the next speaker using strategies ranging from simple round-robin to sophisticated LLM-based selection, then broadcasts messages to all participants.
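A round-robin speaker scheduler of the kind described can be sketched in a few lines; this is a generic illustration of the pattern, not AutoGen's actual GroupChatManager interface.

from itertools import cycle

class GroupChatScheduler:
    # Picks the next speaker and broadcasts every message to all participants
    def __init__(self, agents):
        self.agents = agents
        self._order = cycle(agents)  # simple round-robin policy

    def next_speaker(self):
        # An LLM-based variant would instead ask a model to pick the most
        # relevant next speaker given the conversation so far
        return next(self._order)

    def broadcast(self, message, history):
        history.append(message)  # shared history: everyone sees everything
        return history

history = []
scheduler = GroupChatScheduler(["Planner", "Coder", "Reviewer"])
for _ in range(3):
    speaker = scheduler.next_speaker()
    scheduler.broadcast(f"{speaker}: (reply based on history)", history)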
Nested Chats allow agents to spawn internal sub-conversations to handle specific subtasks before returning results to the main group. This creates useful “information silos” where complex intermediate reasoning doesn’t clutter the primary discussion.
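A hedged sketch of the idea: the sub-conversation runs in its own history list, and only a summary is returned to the main chat. StubAgent is a stand-in for a real LLM-backed agent.

class StubAgent:
    # Stand-in for an LLM-backed agent (replace with real model calls)
    def respond(self, history):
        return f"intermediate thought #{len(history)}"
    def summarize(self, history):
        return f"summary of {len(history)} inner turns"

def nested_chat(agent, subtask, max_turns=3):
    # Run an internal sub-conversation; only the final summary escapes
    inner = [f"Subtask: {subtask}"]
    for _ in range(max_turns):
        inner.append(agent.respond(inner))  # detailed reasoning stays here
    return agent.summarize(inner)

main_history = ["User: estimate the project cost"]
main_history.append(nested_chat(StubAgent(), "verify the cost estimate"))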
Hierarchical Dialogues use higher-level agents for problem decomposition and lower-level agents for detailed implementation. The overall team might follow structured conversation patterns—sequential pipelines where problems pass from agent to agent, or iterative broadcast loops where each agent refines a shared solution.
Protocol design must balance rich communication with computational efficiency. AgentVerse distinguishes between horizontal communication structures (all agents collaboratively share and refine ideas) and vertical ones (a “solver” agent proposes solutions while others provide iterative critique).
Horizontal approaches work well for broad brainstorming or tool selection, while vertical patterns suit tasks requiring a single definitive answer, such as mathematical problem-solving. Advanced protocols may include credit allocation systems to reward valuable contributions and experience management to enable learning from past collaborations.
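The vertical pattern reduces to a solver-plus-critics loop, sketched below with plain callables standing in for LLM-backed agents.

def vertical_refinement(solver, critics, problem, rounds=2):
    # Solver proposes; critics critique; solver revises with their feedback
    solution = solver(problem, None)
    for _ in range(rounds):
        feedback = [critic(problem, solution) for critic in critics]
        solution = solver(problem, feedback)
    return solution

# Stand-in callables (swap in real LLM calls in practice)
solver = lambda problem, feedback: f"answer({problem!r}, given {feedback})"
critics = [lambda p, s: f"critique of {s}"] * 2
final = vertical_refinement(solver, critics, "solve the 24 game")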
Modular Design: Roles, Specialization, and Planning
Effective multi-agent systems decompose complex tasks through thoughtful role assignment and modular design. Each agent typically receives a specialized function—”Planner,” “Code Generator,” “Environment Simulator,” or “Database Agent”—allowing them to use prompts and tools optimized for their specific responsibilities.
Role assignment can be static (predefined agent types) or dynamic. AgentVerse implements an innovative “expert recruitment” stage where a recruiter agent dynamically selects appropriate expert agents based on the current task requirements. This automated role assignment mimics human collaboration patterns, using prompt-based generation to create expert descriptions and periodic reevaluation to adjust team composition.
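A simplified sketch of this recruitment loop follows; llm() is a placeholder for a real model call and the parsing assumes a numbered list, so treat this as an illustration of the idea rather than AgentVerse's implementation.

def llm(prompt):
    # Placeholder for a real LLM call
    return "1. Statistician\n2. Climate scientist\n3. Data engineer"

def recruit_experts(task, n=3):
    # Prompt-based generation of expert role descriptions
    listing = llm(f"List {n} expert roles best suited to: {task}")
    return [line.split(". ", 1)[1] for line in listing.splitlines()]

def run_with_reevaluation(task, rounds=2):
    experts = []
    for _ in range(rounds):
        experts = recruit_experts(task)  # team composition is re-evaluated
        # ... the recruited experts collaborate on the task here ...
    return experts

team = run_with_reevaluation("assess climate-model dataset quality")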
Chain-of-Agents addresses the challenge of processing long inputs by distributing work across multiple agents. Each worker agent analyzes a different segment of text and proposes partial results, which a manager agent then aggregates into a comprehensive final answer. This pipeline approach overcomes individual model context limitations:
# Pseudocode for Chain-of-Agents pipeline
chunks = split_long_input(document)
worker_summaries = []
for chunk in chunks:
    summary = worker_agent.chat(f"Analyze this segment: {chunk}")
    worker_summaries.append(summary)
final_answer = manager_agent.chat(
    "Combine the following insights: " + "\n".join(worker_summaries)
)
Sequential Brainstorming (agents A→B→C passing ongoing discussions) can generate richer, more nuanced solutions than parallel processing. AutoGen’s “StateFlow” enables developers to codify structured workflows—for example, requiring agents to plan before coding, then review their implementations.
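One way to picture such a state-gated workflow is as a generic sequential state machine, sketched below; this is not AutoGen's actual StateFlow API, just an illustration of the ordering constraint.

# Ordered workflow states: each role may act only in its own state,
# which enforces plan -> code -> review ordering
PIPELINE = [("plan", "Planner"), ("code", "Coder"), ("review", "Reviewer")]

def run_stateflow(agents, task):
    artifact = task
    for state, role in PIPELINE:
        artifact = agents[role](state, artifact)  # each stage transforms the artifact
    return artifact

# Stand-in agents (swap in LLM-backed callables in practice)
agents = {role: (lambda state, artifact, r=role: f"{r}[{state}]({artifact})")
          for _, role in PIPELINE}
result = run_stateflow(agents, "add pagination to the API")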
Multi-agent systems must balance autonomy with human oversight. AutoGen agents can be configured for full automation or human-in-the-loop operation, where agents require explicit user approval for critical actions. This flexibility allows systems to blend human expertise with agent efficiency, particularly valuable for high-stakes decisions.
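A minimal approval gate might look like the sketch below, assuming a console input() as the human channel:

def with_human_approval(action_desc, execute, auto_approve=False):
    # Run `execute` only after explicit user approval for critical actions
    if not auto_approve:
        answer = input(f"Agent wants to: {action_desc}. Approve? [y/N] ")
        if answer.strip().lower() != "y":
            return "Action rejected by user"
    return execute()

# Full automation sets auto_approve=True; human-in-the-loop leaves it False
result = with_human_approval("delete 3 stale records",
                             execute=lambda: "records deleted",
                             auto_approve=True)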
Communication Patterns: Voting, Debate, and Coordination
Group decision-making in multi-agent systems has drawn inspiration from social choice theory and structured debate formats. While many systems rely on simple majority voting or opinion averaging, recent research explores more sophisticated consensus mechanisms that can significantly improve both accuracy and robustness of collective decisions.
Zhao et al. (EMNLP 2024) conducted a comprehensive survey of 52 recent LLM-based multi-agent frameworks and uncovered a severe lack of diversity, with heavy reliance on dictatorial and plurality voting for collective decision-making. Their analysis revealed that most systems use basic voting mechanisms that can limit solution diversity and fail to leverage the full potential of multi-agent collaboration. To address these limitations, they introduced GEDI (Group Electoral Decision-making Integration), an electoral module that incorporates various ordinal preferential voting mechanisms using rank-based ballots. Their empirical evaluation across three benchmarks demonstrated that certain collective decision-making methods can markedly improve the reasoning capabilities and robustness of leading LLMs, with some mechanisms generating positive synergies even with as few as three agents.
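As a concrete example of the kind of ordinal, rank-ballot mechanism GEDI integrates (the module supports several; a Borda count is shown here purely for illustration):

from collections import defaultdict

def borda_count(ballots):
    # Each ballot ranks candidates best-first; higher placement earns more points
    scores = defaultdict(int)
    for ranking in ballots:
        for position, candidate in enumerate(ranking):
            scores[candidate] += len(ranking) - position  # 1st place scores highest
    return max(scores, key=scores.get)

# Three agents rank three candidate answers
ballots = [["A", "B", "C"], ["B", "A", "C"], ["A", "C", "B"]]
winner = borda_count(ballots)  # "A" wins on aggregate rank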
Multi-round Debate formats represent one of the most promising approaches for improving factual accuracy and reducing hallucinations in LLM outputs. Du et al. (2023) demonstrated that having multiple language model instances propose and debate their individual responses over multiple rounds significantly enhances mathematical and strategic reasoning while improving the factual validity of generated content. Recent work has shown that adversarial debate mechanisms can systematically reduce hallucinations across debate rounds, with agents presenting evidence for their solutions before aggregating final answers. This iterative refinement process mirrors classical methodological approaches like the Delphi method used in expert consensus building and jury deliberation processes, where repeated rounds of discussion and reflection lead to more robust collective judgments.
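The skeleton of such a debate loop is simple; ask() below is a placeholder for a per-agent LLM call, so treat this as a sketch of the protocol rather than the authors' code.

def ask(agent_id, prompt):
    # Placeholder for an LLM call on behalf of one debater
    return f"agent-{agent_id} answer to: {prompt[:40]}..."

def debate(question, n_agents=3, rounds=2):
    # Round 0: each agent proposes an answer independently
    answers = [ask(i, question) for i in range(n_agents)]
    for _ in range(rounds):
        peers = "\n".join(answers)
        # Each agent revises after reading every peer's current answer
        answers = [ask(i, f"{question}\nOther agents said:\n{peers}\n"
                          "Update or defend your answer.")
                   for i in range(n_agents)]
    return answers  # aggregate e.g. by majority vote over the final answers

final_answers = debate("What is 17 * 24?")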
The effectiveness of debate-based systems has been empirically validated across multiple domains. Cohen et al. (2023) found that cross-examination between ChatGPT instances leads to significant improvements over Chain-of-Thought prompting on factuality benchmarks, with language models more likely to generate inconsistent outputs when hallucinating. Multi-agent collaborative filtering frameworks have demonstrated that cross-verification among agents can effectively overcome hallucinations while selecting high-quality responses from diverse response spaces.
Competitive Scenarios introduce sophisticated game-theoretic dynamics where agents must balance cooperation with self-interest, providing valuable insights into coordination capabilities under realistic constraints. Abdelnabi et al. (2024) developed comprehensive testbeds of multi-agent, multi-issue negotiation games that require strong arithmetic, inference, exploration, and planning capabilities while integrating them in dynamic, multi-turn setups. These negotiation frameworks serve as practical evaluation tools for real-world scenarios where LLM agents might be used in customer service or business negotiations.
However, research has revealed important limitations and biases in LLM-agent communities that affect collective decision-making. Studies of LLM populations show that social conventions can spontaneously emerge through purely local interactions, but this process gives rise to collective biases that increase the likelihood of specific conventions developing over others. These biases are not easily deducible from analyzing isolated agents and manifest as preferences for conservative consensus and avoidance of extreme positions. Research on political bias in LLMs reveals that these systems often exhibit systematic preferences that can lead to echo chambers and reduce exposure to differing perspectives, potentially intensifying societal polarization.
The strength of these biases varies dramatically depending on the specific LLM model used, with some populations requiring committed minorities as small as 2% to overturn consensus, while others need up to 67% to achieve the same effect. Understanding these social dynamics is crucial for safe multi-agent design, as it helps identify and mitigate risks like echo chambers, groupthink, and the emergence of harmful consensus patterns in AI systems.
Memory Sharing and Collective Knowledge
Memory architecture significantly influences multi-agent collaboration effectiveness. Agents may maintain private memories, access shared knowledge bases, or use hybrid approaches that balance privacy with collective learning.
AutoGen’s Experience Management Protocol has agents log task experiences (inputs, outputs, outcomes) for collective learning. The Memory-Sharing framework proposed by Gao & Zhang (2024) creates central stores where agents contribute past queries and responses, enabling retrieval of relevant examples to improve in-context reasoning.
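A simplified version of such a central store, using word overlap as a toy relevance score in place of real embeddings (Gao & Zhang's actual retriever will differ):

class SharedMemoryPool:
    # Central store of (query, response) pairs contributed by all agents
    def __init__(self):
        self.entries = []

    def contribute(self, query, response):
        self.entries.append((query, response))

    def retrieve(self, query, k=2):
        # Toy relevance score: word overlap (use embeddings in practice)
        words = set(query.lower().split())
        ranked = sorted(self.entries,
                        key=lambda e: len(words & set(e[0].lower().split())),
                        reverse=True)
        return ranked[:k]

pool = SharedMemoryPool()
pool.contribute("summarize climate report", "Key finding: +1.5C by 2035")
examples = pool.retrieve("summarize new climate study")
# `examples` can be prepended to a prompt as in-context demonstrations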
More sophisticated approaches integrate retrieval-augmented generation (RAG) with multi-agent systems. Agents query knowledge bases before sharing insights with peers, or implement blackboard architectures where any agent can read from and write to a global workspace.
Two-tier Memory Systems distinguish between private observations (sensitive context) and shared knowledge (general insights). Fine-grained access control policies determine what information flows between agents, with permission graphs that can evolve based on trust and collaboration history.
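A minimal two-tier memory with a permission check; the permission graph here is a plain dict mapping each reader to the writers it may read, purely for illustration:

class TwoTierMemory:
    def __init__(self, permissions):
        self.private = {}               # agent -> sensitive observations
        self.shared = []                # (writer, insight) pairs
        self.permissions = permissions  # reader -> set of readable writers

    def remember_private(self, agent, item):
        self.private.setdefault(agent, []).append(item)

    def share(self, writer, insight):
        self.shared.append((writer, insight))

    def read_shared(self, reader):
        allowed = self.permissions.get(reader, set())
        return [insight for writer, insight in self.shared if writer in allowed]

# Analyst may read Researcher's shared insights, but not vice versa
mem = TwoTierMemory(permissions={"Analyst": {"Researcher"}})
mem.share("Researcher", "Source X looks unreliable")
mem.read_shared("Analyst")     # ['Source X looks unreliable']
mem.read_shared("Researcher")  # []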
Even simple memory mechanisms improve coherence—when agents remember prior messages and decisions, they avoid redundant work and build upon previous insights. Rollout Memory systems maintain logs of planning progress that each agent reads and updates, ensuring coordinated progression through multi-step problems.
Practical Implementation: Multi-Agent Task Execution
To illustrate these concepts, consider a plan-and-execute multi-agent workflow for a complex research task:
# Define agent roles with specialized capabilities
agents = {
    "Planner": Agent(model="gpt-4", role_desc="Decompose complex tasks into actionable steps"),
    "Researcher": Agent(model="gpt-4", role_desc="Gather and validate information from multiple sources"),
    "Coder": Agent(model="gpt-4", role_desc="Implement solutions and debug technical issues"),
    "UserProxy": Agent(model="user", role_desc="Provide human oversight and feedback")
}
conversation = Conversation()
conversation.add_system("You are a collaborative project team. Work together to address the user's request efficiently.")

# Initial user request
conversation.add_user("Build a comprehensive dataset of scientific articles about climate change models, including metadata and quality assessments.")

# Multi-agent dialogue loop with structured turn-taking
for round in range(max_rounds):
    for name, agent in agents.items():
        # Each agent sees full conversation history for context
        response = agent.chat(conversation.history)
        conversation.add_assistant(name, response)
    # Check for completion criteria or user intervention
    if conversation.task_completed() or user_intervention_requested():
        break

# Extract final deliverable from designated agent
final_output = conversation.get_structured_output("Planner")
Multi-agent LLM systems represent a paradigm shift toward more capable, specialized, and collaborative AI. As these systems mature, we can expect to see increasingly sophisticated communication protocols, more nuanced memory sharing mechanisms, and better integration of human expertise.
The key challenges ahead include managing computational complexity, ensuring robust coordination at scale, and developing safety mechanisms that prevent harmful emergent behaviors. Success in addressing these challenges will unlock AI systems capable of tackling humanity’s most complex problems through the power of intelligent collaboration. By thoughtfully designing agent architectures, communication protocols, and memory systems, we can create multi-agent systems that truly exceed the sum of their parts—achieving collective intelligence that surpasses what any individual agent could accomplish alone.
Resources
Large Language Model based Multi-Agents: A Survey of Progress and Challenges
AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors
Beyond Self-Talk: A Communication-Centric Survey of LLM-Based Multi-Agent Systems
Multi-LLM-Agent Systems: Techniques and Business Perspectives
Chain of Agents: Large Language Models Collaborating on Long-Context Tasks
LLM-Deliberation: Evaluating LLMs with Interactive Multi-Agent Negotiation Games
Collaborative Memory: Multi-User Memory Sharing in LLM Agents with Dynamic Access Control
A Dynamic LLM-Powered Agent Network for Task-Oriented Agent Collaboration
Rethinking the Bounds of LLM Reasoning: Are Multi-Agent Discussions the Key?
AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation