Agentic Knowledge Discovery Notebook
Emergency Response System: Intelligent Crisis Management: Unlocking Enterprise Data with MongoDB Vector Search, LangChain, and LangGraph
Use Case Overview
In today's complex technical environment, organizations face critical incidents—ranging from network outages and security breaches to infrastructure failures and service disruptions. When these crises occur, teams must rapidly mobilize the right expertise, access relevant knowledge resources, and coordinate response efforts under significant time pressure.
Imagine:
- a critical 5G network outage affecting multiple metropolitan areas,
- a data center hardware failure impacting enterprise customers,
- or a security breach requiring immediate containment.
Each crisis demands rapid response spanning multiple technical domains, requiring organizations to quickly assemble the right experts, access relevant procedures, and coordinate complex actions—all while business-critical services remain offline.
This solution transforms Emergency Response Management by:
- Accelerating crisis detection: Automatically parsing incident reports to extract critical parameters, affected systems, and required skill sets.
- Assembling optimal response teams: Identifying available experts with the precise skills needed for each unique crisis situation.
- Mobilizing knowledge resources: Retrieving relevant technical procedures, best practices, and previous incident documentation.
- Orchestrating coordinated response: Generating comprehensive response plans with prioritized action items, team assignments, and communication protocols.
Built on MongoDB Atlas Vector Search for high-performance semantic search and document retrieval, LangChain and LangGraph for agentic workflow orchestration, this approach delivers an intelligent emergency response system that dramatically reduces incident resolution time and business impact.
Key Components
- Crisis Detection: Analyzes unstructured incident reports to extract structured data about the crisis type, severity, affected systems, and required expertise.
- Expert Identification: Searches employee records using semantic matching to identify personnel with crisis-relevant skills and availability.
- Knowledge Resource Gathering: Retrieves technical documentation, recovery procedures, and best practices specifically relevant to the current crisis.
- Response Plan Generation: Creates comprehensive response plans with team assignments, prioritized action items, communication protocols, and estimated resolution timelines.
Business Impact
- Reduced Average Time to Resolution: Accelerates response time by automating the most time-consuming aspects of crisis management.
- Optimal Team Composition: Ensures the most qualified experts are engaged based on real-time availability and precise skill matching.
- Enhanced Decision Support: Provides response teams with only the most relevant knowledge resources and procedures.
- Improved Stakeholder Communication: Generates structured briefings and updates for both technical teams and business stakeholders.
This intelligent system transforms crisis management from a reactive, often chaotic process into a structured, data-driven workflow that minimizes business impact and accelerates service restoration.
Cross-Industry Applications This emergency response architecture can be readily adapted to various industries:
1. Healthcare
- Mobilizing specialized medical teams for rare conditions or mass casualty events
- Coordinating expertise during disease outbreaks or public health emergencies
2. Financial Services
- Assembling fraud response teams for complex financial incidents
- Coordinating technical and business experts during trading system failures
3. Energy and Utilities
- Mobilizing technical teams during power grid failures or outages
- Assembling environmental specialists during contamination events
4. Manufacturing
- Coordinating experts to minimize downtime on critical production equipment
- Assembling cross-functional teams for supply chain or quality control crises
5. Transportation
- Mobilizing aviation or maritime experts during system failures or safety incidents
- Coordinating response teams for logistics network disruptions
6. Government
- Assembling emergency management teams during natural disasters
- Mobilizing technical expertise for infrastructure failures or cybersecurity incidents
Objective:
Enable enterprise users to query and explore organizational knowledge across FAQs, project details, and employee expertise in natural language.
Key Benefits:
-
Reduced time-to-insight: Semantic search surfaces relevant results even when keywords differ.
-
Contextual reasoning: Agents chain multi-step queries (e.g., “Which engineer led Project P123?”).
-
Scalable architecture: Easily extend to new data sources (Confluence, emails, design documents).
Key Components:
-
MongoDB Atlas Vector Search: Dense vector indexing for semantic relevance.
-
Voyage AI: State of the art embedding models and rerankers
-
LangChain: Embedding pipelines and workflow management.
-
LangGraph: Agentic, graph-driven decision making for complex queries.
Enter your OPENAI API KEY: ··········
Part 0: Synthetic Data Creation
Part 1: Data Loading, Cleaning and Preparation
Enter your VOYAGE AI API key: ··········
Generating emebdding for datapoints
Part 2: Database Connection, Collection and Indexes
Connecting to MongoDB
MongoDB acts as both an operational and a vector database for the RAG system. MongoDB Atlas specifically provides a database solution that efficiently stores, queries and retrieves vector embeddings.
Setup
To use MongoDB as a toolbox, you will need to complete the following steps:
-
Register for a MongoDB Account:
- Go to the MongoDB website (https://www.mongodb.com/cloud/atlas/register).
- Click on the "Try Free" or "Get Started Free" button.
- Fill out the registration form with your details and create an account.
-
Create a MongoDB Cluster
-
Set Up Database Access:
- In the left sidebar, click on "Database Access" under "Security".
- Click "Add New Database User".
- Create a username and a strong password. Save these credentials securely.
- Set the appropriate permissions for the user (e.g., "Read and write to any database").
-
Configure Network Access:
- In the left sidebar, click on "Network Access" under "Security".
- Click "Add IP Address".
- To allow access from anywhere (not recommended for production), enter 0.0.0.0/0.
- For better security, whitelist only the specific IP addresses that need access.
-
Follow MongoDB’s steps to get the connection string from the Atlas UI. After setting up the database and obtaining the Atlas cluster connection URI, securely store the URI within your development environment.
Enter your MongoDB URI: ··········
Connection to MongoDB successful
Create collections
Create Indexes
Create the vector search indexes
Create the search indexes
Data Ingestion
Part 3: Creating and Testing Retrieval Methods With LangChain
Text Search
/tmp/ipython-input-3957023351.py:19: LangChainDeprecationWarning: The method `BaseRetriever.get_relevant_documents` was deprecated in langchain-core 0.1.46 and will be removed in 1.0. Use :meth:`~invoke` instead. result = full_text_search.get_relevant_documents(query)
[Document(metadata={'_id': '68c02694dc3b288b36954711', 'emp_id': 'employees-0', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['IP networking', 'routing and switching', 'fiber optics', 'network security', 'VoIP'], 'bio': 'Jordan Singh is a seasoned network engineer specializing in telecom network infrastructure with expertise in routing, switching, and optical transmission technologies.', 'manager': None, 'start_date': '2020-07-15', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': [], 'mentees': [], 'frequent_collaborators': [], 'score': 0.9914655685424805}, page_content='Jordan Singh'),
, Document(metadata={'_id': '68fa0cb57e65d1c84f9e0465', 'emp_id': 'employees-4', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Network Design', 'Cisco Routers', 'VoIP Implementation', 'VPN Configuration', 'Troubleshooting', 'Telecom Infrastructure'], 'bio': 'Jordan Kim is an experienced Network Engineer with expertise in designing, implementing, and maintaining large-scale telecommunication networks. Skilled in troubleshooting and optimizing network infrastructure.', 'manager': 'employees-2', 'start_date': '2021-05-17', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-1'], 'mentees': ['employees-3'], 'frequent_collaborators': ['employees-0'], 'score': 0.9914655685424805}, page_content='Jordan Kim'),
, Document(metadata={'_id': '68fa0cb57e65d1c84f9e0461', 'emp_id': 'employees-0', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Network Design', 'Cisco Routing & Switching', 'Fiber Optic Communication', 'Telecommunications Protocols', 'Network Security'], 'bio': 'Jordan Lee is an experienced network engineer specializing in the design and maintenance of robust telecommunications infrastructure. Proven expertise in optimizing large-scale networks and ensuring high reliability for carrier-grade operations.', 'manager': None, 'start_date': '2021-03-15', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': [], 'mentees': [], 'frequent_collaborators': [], 'score': 0.9914655685424805}, page_content='Jordan Lee')] Vector Search
0%| | 0/1 [00:00<?, ?it/s]
[(Document(id='68fa0cb57e65d1c84f9e0464', metadata={'_id': '68fa0cb57e65d1c84f9e0464', 'emp_id': 'employees-3', 'name': 'Priya Deshmukh', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Cisco networking', 'VoIP installation', 'Fiber optic infrastructure', 'Network security', 'BGP & OSPF routing', 'Troubleshooting WAN/LAN'], 'manager': 'employees-1', 'start_date': '2021-02-15', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-2'], 'mentees': ['employees-0'], 'frequent_collaborators': ['employees-2', 'employees-1']}, page_content='Priya is a seasoned Network Engineer with 7 years of experience in designing, implementing, and optimizing telecom network infrastructures. She specializes in VoIP systems and high-capacity fiber-optic deployments for enterprise clients.'),
, 0.7213080525398254),
, (Document(id='68921a051d77d2d9c2b14100', metadata={'_id': '68921a051d77d2d9c2b14100', 'emp_id': 'employees-7', 'name': 'Sophia Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['IP routing', 'network design', 'fiber optics', 'troubleshooting', 'VoIP'], 'manager': 'employees-3', 'start_date': '2021-04-12', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-6'], 'mentees': ['employees-4'], 'frequent_collaborators': ['employees-0', 'employees-1']}, page_content='Sophia Kim is a dedicated network engineer with more than 5 years’ experience designing and maintaining high-capacity telecom networks. She specializes in optimizing network infrastructure for performance and reliability, and has strong expertise with fiber optic systems.'),
, 0.7185817956924438),
, (Document(id='68921a051d77d2d9c2b140ff', metadata={'_id': '68921a051d77d2d9c2b140ff', 'emp_id': 'employees-6', 'name': 'Samantha Riley', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Network Design', 'Cisco Routers', 'VoIP Configuration', 'Telecommunications Protocols', 'Firewall Management', 'Network Troubleshooting'], 'manager': 'employees-1', 'start_date': '2019-04-15', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-1'], 'mentees': ['employees-0'], 'frequent_collaborators': ['employees-3', 'employees-5']}, page_content='Samantha Riley is a skilled network engineer with over 7 years of experience in designing and maintaining large-scale telecommunications networks. She specializes in VoIP solutions and has a strong background in network security and troubleshooting complex network issues.'),
, 0.7175770998001099)] Hybrid Search
0%| | 0/1 [00:00<?, ?it/s]
[Document(metadata={'_id': '68fa0cb57e65d1c84f9e0465', 'emp_id': 'employees-4', 'name': 'Jordan Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Network Design', 'Cisco Routers', 'VoIP Implementation', 'VPN Configuration', 'Troubleshooting', 'Telecom Infrastructure'], 'manager': 'employees-2', 'start_date': '2021-05-17', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-1'], 'mentees': ['employees-3'], 'frequent_collaborators': ['employees-0'], 'vector_score': 0.01639344262295082, 'rank': 0, 'fulltext_score': 0, 'score': 0.01639344262295082}, page_content='Jordan Kim is an experienced Network Engineer with expertise in designing, implementing, and maintaining large-scale telecommunication networks. Skilled in troubleshooting and optimizing network infrastructure.'),
, Document(metadata={'_id': '68921a051d77d2d9c2b140fd', 'emp_id': 'employees-4', 'name': 'Maya Patel', 'role': 'Network Engineer', 'department': 'Network Operations', 'skills': ['Network Design', 'Cisco Routers & Switches', 'VoIP', 'Telecommunications Infrastructure', 'Network Security', 'Fiber Optic Communication', 'Linux Administration'], 'manager': 'employees-1', 'start_date': '2017-03-12', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-1'], 'mentees': ['employees-2'], 'frequent_collaborators': ['employees-0', 'employees-3'], 'vector_score': 0.016129032258064516, 'rank': 1, 'fulltext_score': 0, 'score': 0.016129032258064516}, page_content='Maya Patel is a seasoned network engineer with over 7 years of experience in the telecommunications sector. Her expertise spans network design, deployment, and ongoing optimization, focusing on delivering high-availability communication platforms. Passionate about mentoring new engineers, she contributes to building robust technical teams.'),
, Document(metadata={'_id': '68c02694dc3b288b36954718', 'emp_id': 'employees-7', 'name': 'Anjali Patel', 'role': 'System Administrator', 'department': 'IT Operations', 'skills': ['Linux administration', 'Network security', 'Telecommunications systems', 'Firewall configuration', 'Cloud infrastructure', 'Incident response'], 'manager': 'employees-3', 'start_date': '2021-03-15', 'end_date': '', 'current_projects': [], 'past_projects': [], 'mentors': ['employees-5'], 'mentees': ['employees-6'], 'frequent_collaborators': ['employees-1', 'employees-4'], 'vector_score': 0.015873015873015872, 'rank': 2, 'fulltext_score': 0, 'score': 0.015873015873015872}, page_content='Anjali Patel is an experienced System Administrator specializing in telecom infrastructure. With a strong focus on network security and high-availability systems, Anjali ensures seamless IT operations and supports large-scale telecommunications environments.')] Graph Search
AIMessage(content='There are no entities related to the query about projects that share team members with good communication.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 18, 'prompt_tokens': 472, 'total_tokens': 490, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-2024-08-06', 'system_fingerprint': 'fp_cbf1785567', 'id': 'chatcmpl-CTni54CdmYkU5yr8PO7CNRmaSYXfV', 'service_tier': 'default', 'finish_reason': 'stop', 'logprobs': None}, id='run--dfb7c5f1-8ae2-4f2b-a559-8aaa151349dd-0', usage_metadata={'input_tokens': 472, 'output_tokens': 18, 'total_tokens': 490, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}) Cross-Team Project Knowledge Discovery
- Finding how different projects interconnect through shared team members
- Identifying knowledge transfer paths when employees move between projects
- Discovering dependencies between projects that aren't documented but exist through shared personnel
Expert Network Mapping
- Tracing expertise flows when experts collaborate on projects
- Finding indirect expertise paths (e.g., "Who can John reach out to for Android development help through his network?")
- Discovering emerging expertise clusters around specific technologies
Part 4: Automated Workflow and Agentic AI Implementation
AUTOMATION SCENARIO : Critical 5G Network Issue Response ( Workflow Automation)
-
Context: A major 5G network outage affects multiple regions. The system needs to quickly assemble an emergency response team with specific expertise.
-
Workflow Steps:
- Step 1: Crisis Detection and Skill Requirements
- Step 2: Expert Identification
- Step 3: Team Composition Analysis
- Step 4: Knowledge Asset Preparation
- Step 5: Team Activation and Brief
Overview
Create Collections and Indexes [Crisis]
crisis_events collection already exists
Data Models
Incident Report Parser
Example Incident Report
The incident report can be a text document such as a PDF, and if images and tables are included in the PDF then we advice leveraging voyage multimodal embedding models
Testing the Incident Response Parser
=== Processing Incident Report === Event ID: CRISIS-20250505-001 Type: CrisisType.NETWORK_OUTAGE Severity: SeverityLevel.CRITICAL Title: Critical 5G Network Failure Across Major US Cities Description: A complete 5G network outage affects the North America region, with service down in New York City, Boston, and Philadelphia due to equipment overheating during maintenance. Primary data center and majority of gNodeB stations have failed, causing major business and consumer disruptions. Affected Systems: 5G Network Service, Core Network, gNodeB stations, Primary Data Center Affected Regions: New York City metro area, Boston metropolitan region, Philadelphia and surrounding counties Customer Impact: Approximately 2 million customers unable to access 5G services; enterprise customers experience business-critical disruptions; mobile data speeds reduced to 4G in adjacent areas. Required Skills: 5G network engineering, Hardware repair, Crisis management, Customer communications
Issue Response Engine (Brings all processes together)
LangGraph State
Workflow Definiton
Workflow Excecution
=== Emergency Response System Activated ===
1. Beginning crisis detecting and parsing provided information...
Crisis event saved into records
Crisis Event Generated:
{
"event_id": "CRISIS-20250505-001",
"event_type": "Network Outage",
"severity": "critical",
"title": "Critical 5G Network Failure Across Major North American Cities",
"description": "A complete 5G network outage has affected the NYC, Boston, and Philadelphia regions. Core network is down due to equipment overheating during maintenance. Estimated 2 million customers impacted, including major enterprise clients. Mobile data speeds are degraded in surrounding areas. Immediate emergency technical and customer response required.",
"affected_systems": [
"5G Network Service",
"Core Network",
"gNodeB Stations",
"Primary Data Center"
],
"affected_regions": [
"New York City metro area",
"Boston metropolitan region",
"Philadelphia and surrounding counties"
],
"customer_impact": "Estimated 2 million customers without 5G access; business-critical disruptions reported; mobile data reduced to 4G in surrounding areas; significant revenue and SLA impacts.",
"required_skills": [
"5G network engineering",
"Hardware repair",
"Crisis management",
"Customer communications"
]
}
2. Identifying experts within records suitable to handle crisis event...
Search Query: Find experts with 5G network engineering, Hardware repair, Crisis management, Customer communications in their skills and experience
0%| | 0/1 [00:00<?, ?it/s]
Below are the experts identified ⬇️
[{'emp_id': 'employees-9', 'name': 'Aisha Patel', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['LTE/5G Networking', 'Network Security', 'Cisco Routers & Switches', 'RF Planning', 'Troubleshooting', 'Fiber Optic Communications', 'Data Center Networking'], 'current_projects': []}, {'emp_id': 'employees-5', 'name': 'Sarah Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['Network Design', 'Telecommunications Infrastructure', 'Fiber Optic Networking', 'Routing & Switching', 'VoIP', 'Troubleshooting', 'Cisco Certified'], 'current_projects': []}, {'emp_id': 'employees-7', 'name': 'Sophia Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['IP routing', 'network design', 'fiber optics', 'troubleshooting', 'VoIP'], 'current_projects': []}, {'emp_id': 'employees-4', 'name': 'Ravi Sharma', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['Network Design', 'Troubleshooting', 'Cisco Routers', 'Optical Fiber Communication', 'Telecommunications Protocols', 'Packet Switching'], 'current_projects': []}, {'emp_id': 'employees-2', 'name': 'Priya Raman', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['Network Design', 'Routing & Switching', 'Telecommunications Infrastructure', 'LAN/WAN Optimization', 'Fiber Optics', 'VoIP', 'Troubleshooting'], 'current_projects': []}]
3. Gathering knowledge assets to prep team on...
0%| | 0/1 [00:00<?, ?it/s]
Below are the knowledge assets gathered ⬇️
[{'asset_id': 'knowledge_assets-2', 'title': 'Technical Procedures and Best Practices for Cloud Data Migration', 'type': 'documentation', 'author': 'employees-6', 'content': '', 'creation_date': '2024-06-20'}, {'asset_id': 'knowledge_assets-4', 'title': 'Secure API Integration Procedures and Best Practices', 'type': 'best_practice', 'author': 'employees-3', 'content': '', 'creation_date': '2024-06-16T10:00:00Z'}, {'asset_id': 'knowledge_assets-3', 'title': 'Best Practices for API Deployment and Version Management', 'type': 'best_practice', 'author': 'employees-6', 'content': '', 'creation_date': '2024-06-18'}, {'asset_id': 'knowledge_assets-5', 'title': 'Automating Deployment Pipelines: Technical Procedures & Best Practices', 'type': 'documentation', 'author': 'employees-3', 'content': '', 'creation_date': '2024-05-12T09:30:00Z'}, {'asset_id': 'knowledge_assets-3', 'title': 'Standard Procedures and Best Practices for Secure API Development', 'type': 'best_practice', 'author': 'employees-4', 'content': '', 'creation_date': '2024-06-15T09:25:00Z'}]
4. Activating team and creating a response plan...
Briefing Generated:
---
## CRISIS RESPONSE TEAM BRIEFING
**Event ID:** CRISIS-20250505-001
**Event:** Critical 5G Network Failure Across Major North American Cities
**Severity:** CRITICAL
---
### 1. Executive Summary
A total outage of the 5G network has simultaneously impacted New York City, Boston, and Philadelphia metropolitan areas due to equipment overheating during scheduled maintenance at the core network. This outage affects approximately 2 million customers, severely degrading service for enterprise clients and reducing mobile data speeds to 4G in adjacent regions. Immediate resolution is vital to reduce further SLA, financial, and customer trust damages.
### 2. Team Assignments & Roles
| Name | Role | Assignment |
|----------------|--------------------|--------------------------------------------|
| **Aisha Patel** | Technical Support | Lead technical triage & data center escalation |
| **Sarah Kim** | Technical Support | Fault isolation: gNodeB & radio access analysis |
| **Sophia Kim** | Technical Support | Core network diagnostics & configuration review |
| **Ravi Sharma** | Technical Support | Overheating investigation and equipment coordination |
| **Priya Raman** | Technical Support | Customer/enterprise impact mapping & technical comms |
**All engineers**: On rotating shifts for continuous coverage, status escalation responsibility as per situation criticality.
### 3. Priority Action Items
1. **Immediate Core Network Restoration**
- Diagnose overheating incident, restore failed components, and reroute traffic if possible.
2. **Outage Containment**
- Isolate affected zones, stabilize gNodeB stations, and prevent cascading failures.
3. **Service Continuity**
- Deploy fallback/temporary solutions to partially restore service or escalate to 4G fallback.
4. **Customer Impact Minimization**
- Map affected business clients; prioritize mission-critical sectors.
5. **Root Cause Analysis**
- Collect forensic data for post-mortem and long-term remediation.
6. **Ongoing Status Updates**
- Maintain near real-time crisis dashboard for executives and customer service.
### 4. Available Resources & Documentation
- **Technical Procedures and Best Practices for Cloud Data Migration**
- **Secure API Integration Procedures and Best Practices**
- **Best Practices for API Deployment & Version Management**
- **Automating Deployment Pipelines: Technical Procedures & Best Practices**
- **Standard Procedures and Best Practices for Secure API Development**
(All resources are available on the shared crisis drive and may provide applicable guidance for rapid deployment or temporary reroute solutions.)
### 5. Expected Timeline & Milestones
- **0–1 hrs:** Situation triage, core isolation, and first technical update
- **1–3 hrs:** Action on preliminary fix and start of targeted restoration
- **3–6 hrs:** Progress update, phased service restoration (reprioritize if delays), post-outage impact assessment initiation
- **6+ hrs:** Full network restoration, incident review, and executive summary
### 6. Communication Protocols
- **Incident Command:** Led by Aisha Patel; status via secure team Slack #crisis-response and standby phone bridge
- **Update Frequency:** Every 30 minutes or at major milestone completion
- **Stakeholder Reports:** Every 1 hour to executive leadership and customer service liaisons
- **Customer Messaging:** Drafted by Priya Raman in sync with PR; distributed via website, SMS, and enterprise client portals
### 7. Success Criteria
- **Restoration:** 5G network service fully restored to impacted metro areas
- **Stabilization:** No lingering or recurrent outages detected for 24 hours
- **Root Cause:** Documented and communicated, with preventive actions identified
- **Customer Communication:** Timely, clear, and accurate updates delivered throughout
- **SLA Compliance:** Post-crisis SLA review completed and breach instances mitigated
---
**ALL HANDS: Be vigilant, submit all findings through assigned channels, and prepare escalation summaries at each milestone.**
There were minor errors so retrying...
["Error activating team: 'EmergencyResponseWorkflow' object has no attribute '_generate_action_items'"]
1. Beginning crisis detecting and parsing provided information...
Crisis event saved into records
Crisis Event Generated:
{
"event_id": "CRISIS-20250505-001",
"event_type": "Network Outage",
"severity": "critical",
"title": "Critical 5G Network Failure in North America Impacting Millions",
"description": "A complete 5G network failure has occurred across major North American cities due to equipment overheating during a maintenance window, causing core network and multiple gNodeB node failures. Business-critical outages and severe degradation of mobile data speeds are being reported.",
"affected_systems": [
"5G Network Service",
"Core Network",
"gNodeB Stations",
"Primary Data Center"
],
"affected_regions": [
"New York City metro area",
"Boston metropolitan region",
"Philadelphia and surrounding counties"
],
"customer_impact": "Approximately 2 million customers cannot access 5G services. Enterprise clients face business-critical disruptions; mobile data speed reduced to 4G in surrounding areas.",
"required_skills": [
"Network engineering (5G expertise)",
"Hardware repair",
"Crisis management",
"Customer communications"
]
}
2. Identifying experts within records suitable to handle crisis event...
Search Query: Find experts with Network engineering (5G expertise), Hardware repair, Crisis management, Customer communications in their skills and experience
0%| | 0/1 [00:00<?, ?it/s]
Below are the experts identified ⬇️
[{'emp_id': 'employees-9', 'name': 'Aisha Patel', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['LTE/5G Networking', 'Network Security', 'Cisco Routers & Switches', 'RF Planning', 'Troubleshooting', 'Fiber Optic Communications', 'Data Center Networking'], 'current_projects': []}, {'emp_id': 'employees-5', 'name': 'Sarah Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['Network Design', 'Telecommunications Infrastructure', 'Fiber Optic Networking', 'Routing & Switching', 'VoIP', 'Troubleshooting', 'Cisco Certified'], 'current_projects': []}, {'emp_id': 'employees-7', 'name': 'Sophia Kim', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['IP routing', 'network design', 'fiber optics', 'troubleshooting', 'VoIP'], 'current_projects': []}, {'emp_id': 'employees-4', 'name': 'Ravi Sharma', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['Network Design', 'Troubleshooting', 'Cisco Routers', 'Optical Fiber Communication', 'Telecommunications Protocols', 'Packet Switching'], 'current_projects': []}, {'emp_id': 'employees-0', 'name': 'Jordan Singh', 'role': 'Network Engineer', 'department': 'Network Operations', 'bio': None, 'skills': ['IP networking', 'routing and switching', 'fiber optics', 'network security', 'VoIP'], 'current_projects': []}]
3. Gathering knowledge assets to prep team on...
0%| | 0/1 [00:00<?, ?it/s]
Below are the knowledge assets gathered ⬇️
[{'asset_id': 'knowledge_assets-2', 'title': 'Technical Procedures and Best Practices for Cloud Data Migration', 'type': 'documentation', 'author': 'employees-6', 'content': '', 'creation_date': '2024-06-20'}, {'asset_id': 'knowledge_assets-4', 'title': 'Secure API Integration Procedures and Best Practices', 'type': 'best_practice', 'author': 'employees-3', 'content': '', 'creation_date': '2024-06-16T10:00:00Z'}, {'asset_id': 'knowledge_assets-3', 'title': 'Best Practices for API Deployment and Version Management', 'type': 'best_practice', 'author': 'employees-6', 'content': '', 'creation_date': '2024-06-18'}, {'asset_id': 'knowledge_assets-5', 'title': 'Automating Deployment Pipelines: Technical Procedures & Best Practices', 'type': 'documentation', 'author': 'employees-3', 'content': '', 'creation_date': '2024-05-12T09:30:00Z'}, {'asset_id': 'knowledge_assets-3', 'title': 'Standard Procedures and Best Practices for Secure API Development', 'type': 'best_practice', 'author': 'employees-4', 'content': '', 'creation_date': '2024-06-15T09:25:00Z'}]
4. Activating team and creating a response plan...
--------------------------------------------------------------------------- KeyboardInterrupt Traceback (most recent call last) /tmp/ipython-input-1171894299.py in <cell line: 0>() 60 # Execute the complete workflow starting from incident report 61 print("=== Emergency Response System Activated ===") ---> 62 result = emergency_workflow.respond_to_crisis(incident_report) # Empty dict to trigger detection & parsing 63 64 # Print results /tmp/ipython-input-1663767310.py in respond_to_crisis(self, incident_report) 297 # Run the workflow 298 config = {"configurable": {"thread_id": 1}} --> 299 final_state = self.workflow.invoke(initial_state, config) 300 301 return final_state /usr/local/lib/python3.12/dist-packages/langgraph/pregel/main.py in invoke(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, **kwargs) 3092 interrupts: list[Interrupt] = [] 3093 -> 3094 for chunk in self.stream( 3095 input, 3096 config, /usr/local/lib/python3.12/dist-packages/langgraph/pregel/main.py in stream(self, input, config, context, stream_mode, print_mode, output_keys, interrupt_before, interrupt_after, durability, subgraphs, debug, **kwargs) 2677 for task in loop.match_cached_writes(): 2678 loop.output_writes(task.id, task.writes, cached=True) -> 2679 for _ in runner.tick( 2680 [t for t in loop.tasks.values() if not t.writes], 2681 timeout=self.step_timeout, /usr/local/lib/python3.12/dist-packages/langgraph/pregel/_runner.py in tick(self, tasks, reraise, timeout, retry_policy, get_waiter, schedule_task) 165 t = tasks[0] 166 try: --> 167 run_with_retry( 168 t, 169 retry_policy, /usr/local/lib/python3.12/dist-packages/langgraph/pregel/_retry.py in run_with_retry(task, retry_policy, configurable) 40 task.writes.clear() 41 # run the task ---> 42 return task.proc.invoke(task.input, config) 43 except ParentCommand as exc: 44 ns: str = config[CONF][CONFIG_KEY_CHECKPOINT_NS] /usr/local/lib/python3.12/dist-packages/langgraph/_internal/_runnable.py in invoke(self, input, config, **kwargs) 654 # run in context 655 with set_config_context(config, run) as context: --> 656 input = context.run(step.invoke, input, config, **kwargs) 657 else: 658 input = step.invoke(input, config) /usr/local/lib/python3.12/dist-packages/langgraph/_internal/_runnable.py in invoke(self, input, config, **kwargs) 398 run_manager.on_chain_end(ret) 399 else: --> 400 ret = self.func(*args, **kwargs) 401 if self.recurse and isinstance(ret, Runnable): 402 return ret.invoke(input, config) /tmp/ipython-input-1663767310.py in _activate_team_and_create_plan(self, state) 174 175 # Use IssueResponseEngine to create team briefing --> 176 briefing_text = self.issue_engine.team_activation_and_brief( 177 crisis_event, selected_team, relevant_knowledge 178 ) /tmp/ipython-input-1941078503.py in team_activation_and_brief(self, crisis_data, experts_identified, knowledge_assets) 100 101 # Call GPT-4.1 to generate briefing --> 102 response = openai_client.responses.create( 103 model="gpt-4.1", 104 input=prompt, /usr/local/lib/python3.12/dist-packages/openai/resources/responses/responses.py in create(self, background, conversation, include, input, instructions, max_output_tokens, max_tool_calls, metadata, model, parallel_tool_calls, previous_response_id, prompt, prompt_cache_key, reasoning, safety_identifier, service_tier, store, stream, stream_options, temperature, text, tool_choice, tools, top_logprobs, top_p, truncation, user, extra_headers, extra_query, extra_body, timeout) 838 timeout: float | httpx.Timeout | None | NotGiven = not_given, 839 ) -> Response | Stream[ResponseStreamEvent]: --> 840 return self._post( 841 "/responses", 842 body=maybe_transform( /usr/local/lib/python3.12/dist-packages/openai/_base_client.py in post(self, path, cast_to, body, options, files, stream, stream_cls) 1257 method="post", url=path, json_data=body, files=to_httpx_files(files), **options 1258 ) -> 1259 return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls)) 1260 1261 def patch( /usr/local/lib/python3.12/dist-packages/openai/_base_client.py in request(self, cast_to, options, stream, stream_cls) 980 response = None 981 try: --> 982 response = self._client.send( 983 request, 984 stream=stream or self._should_stream_response_body(request=request), /usr/local/lib/python3.12/dist-packages/httpx/_client.py in send(self, request, stream, auth, follow_redirects) 912 auth = self._build_request_auth(request, auth) 913 --> 914 response = self._send_handling_auth( 915 request, 916 auth=auth, /usr/local/lib/python3.12/dist-packages/httpx/_client.py in _send_handling_auth(self, request, auth, follow_redirects, history) 940 941 while True: --> 942 response = self._send_handling_redirects( 943 request, 944 follow_redirects=follow_redirects, /usr/local/lib/python3.12/dist-packages/httpx/_client.py in _send_handling_redirects(self, request, follow_redirects, history) 977 hook(request) 978 --> 979 response = self._send_single_request(request) 980 try: 981 for hook in self._event_hooks["response"]: /usr/local/lib/python3.12/dist-packages/httpx/_client.py in _send_single_request(self, request) 1012 1013 with request_context(request=request): -> 1014 response = transport.handle_request(request) 1015 1016 assert isinstance(response.stream, SyncByteStream) /usr/local/lib/python3.12/dist-packages/httpx/_transports/default.py in handle_request(self, request) 248 ) 249 with map_httpcore_exceptions(): --> 250 resp = self._pool.handle_request(req) 251 252 assert isinstance(resp.stream, typing.Iterable) /usr/local/lib/python3.12/dist-packages/httpcore/_sync/connection_pool.py in handle_request(self, request) 254 255 self._close_connections(closing) --> 256 raise exc from None 257 258 # Return the response. Note that in this case we still have to manage /usr/local/lib/python3.12/dist-packages/httpcore/_sync/connection_pool.py in handle_request(self, request) 234 try: 235 # Send the request on the assigned connection. --> 236 response = connection.handle_request( 237 pool_request.request 238 ) /usr/local/lib/python3.12/dist-packages/httpcore/_sync/connection.py in handle_request(self, request) 101 raise exc 102 --> 103 return self._connection.handle_request(request) 104 105 def _connect(self, request: Request) -> NetworkStream: /usr/local/lib/python3.12/dist-packages/httpcore/_sync/http11.py in handle_request(self, request) 134 with Trace("response_closed", logger, request) as trace: 135 self._response_closed() --> 136 raise exc 137 138 # Sending the request... /usr/local/lib/python3.12/dist-packages/httpcore/_sync/http11.py in handle_request(self, request) 104 headers, 105 trailing_data, --> 106 ) = self._receive_response_headers(**kwargs) 107 trace.return_value = ( 108 http_version, /usr/local/lib/python3.12/dist-packages/httpcore/_sync/http11.py in _receive_response_headers(self, request) 175 176 while True: --> 177 event = self._receive_event(timeout=timeout) 178 if isinstance(event, h11.Response): 179 break /usr/local/lib/python3.12/dist-packages/httpcore/_sync/http11.py in _receive_event(self, timeout) 215 216 if event is h11.NEED_DATA: --> 217 data = self._network_stream.read( 218 self.READ_NUM_BYTES, timeout=timeout 219 ) /usr/local/lib/python3.12/dist-packages/httpcore/_backends/sync.py in read(self, max_bytes, timeout) 126 with map_exceptions(exc_map): 127 self._sock.settimeout(timeout) --> 128 return self._sock.recv(max_bytes) 129 130 def write(self, buffer: bytes, timeout: float | None = None) -> None: /usr/lib/python3.12/ssl.py in recv(self, buflen, flags) 1230 "non-zero flags not allowed in calls to recv() on %s" % 1231 self.__class__) -> 1232 return self.read(buflen) 1233 else: 1234 return super().recv(buflen, flags) /usr/lib/python3.12/ssl.py in read(self, len, buffer) 1103 return self._sslobj.read(len, buffer) 1104 else: -> 1105 return self._sslobj.read(len) 1106 except SSLError as x: 1107 if x.args[0] == SSL_ERROR_EOF and self.suppress_ragged_eofs: KeyboardInterrupt:
AUTONOMY SCENARIO : Critical 5G Network Issue Response (Agentic AI )
Overview
Define Tools
Aggregate Tools
LLM Defintion
Agent Definition
Node Definition
Autonomous Graph Agent Definition
<langgraph.graph.state.StateGraph at 0x78b771387710>
/tmp/ipython-input-3211739412.py:5: DeprecationWarning: AsyncMongoDBSaver is deprecated and will be removed in 0.3.0 release. Please use the async methods of MongoDBSaver instead. mongodb_checkpointer = AsyncMongoDBSaver(async_mongodb_client)
Executing the Agent
User: Can you run this incident report """ NETWORK CRISIS REPORT - PRIORITY CRITICAL Incident #: INC-20250505-3547 Service: 5G Network Service Status: ACTIVE OUTAGE SUMMARY: Complete 5G network failure reported across North America region AFFECTED AREAS: - New York City metro area - Boston metropolitan region - Philadelphia and surrounding counties IMPACT ASSESSMENT: - Estimated 2 million customers unable to access 5G services - Enterprise customers reporting business-critical service disruptions - Mobile data speeds degraded to 4G in surrounding areas TECHNICAL DETAILS: - Core Network Status: DOWN - gNodeB Stations: 3/5 nodes failed - Data Center: Primary facility shows hardware failures - Root Cause: Equipment overheating during maintenance window TIMELINE: 15:00 EST - Maintenance window begins 15:25 EST - First customer complaints received 15:30 EST - Network monitoring alerts triggered 15:45 EST - Service outage confirmed REQUIRED RESPONSE: - Network engineers with 5G expertise - Hardware repair technicians - Crisis management team - Customer communications team BUSINESS IMPACT: - Revenue impact: $5,000/minute - SLA breach: Yes (2-hour response requirement) - Media attention: High (local news coverage) NEXT STEPS: 1. Activate emergency response protocol 2. Dispatch on-site technicians 3. Prepare customer communications 4. Assess backup systems deployment """ Assistant:
/tmp/ipython-input-3900299550.py:20: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
result = AIMessage(**result.dict(exclude={"type", "name"}), name="assistant")
Crisis event saved into records
Crisis Event Generated:
{
"event_id": "CRISIS-20250505-001",
"event_type": "Network Outage",
"severity": "critical",
"title": "Critical 5G Network Outage in Major US Metro Areas - North America",
"description": "Complete 5G network failure across North American metro regions, with core network and gNodeB station failures due to equipment overheating during maintenance. Affects millions and enterprise customers, causing business-critical disruptions and media attention.",
"affected_systems": [
"5G Network Service",
"Core Network",
"gNodeB Stations",
"Primary Data Center"
],
"affected_regions": [
"New York City metro area",
"Boston metropolitan region",
"Philadelphia and surrounding counties"
],
"customer_impact": "Estimated 2 million customers unable to access 5G services; enterprise and business customers suffer critical disruptions; 4G data speeds in surrounding areas.",
"required_skills": [
"Network engineers with 5G expertise",
"Hardware repair technicians",
"Crisis management team",
"Customer communications team"
]
}
1. Crisis detected and parsed:
- Type: CrisisType.NETWORK_OUTAGE
- Severity: SeverityLevel.CRITICAL
- Required skills: Network engineers with 5G expertise, Hardware repair technicians, Crisis management team, Customer communications team
[Tool Used: detect_crisis]
Tool Call ID: call_d8DN8mKlas8SZv8NVVKoo5yv
Content: {"event_id": "CRISIS-20250505-001", "event_type": "Network Outage", "severity": "critical", "title": "Critical 5G Network Outage in Major US Metro Areas - North America", "description": "Complete 5G network failure across North American metro regions, with core network and gNodeB station failures due to equipment overheating during maintenance. Affects millions and enterprise customers, causing business-critical disruptions and media attention.", "affected_systems": ["5G Network Service", "Core Network", "gNodeB Stations", "Primary Data Center"], "affected_regions": ["New York City metro area", "Boston metropolitan region", "Philadelphia and surrounding counties"], "customer_impact": "Estimated 2 million customers unable to access 5G services; enterprise and business customers suffer critical disruptions; 4G data speeds in surrounding areas.", "required_skills": ["Network engineers with 5G expertise", "Hardware repair technicians", "Crisis management team", "Customer communications team"]}
Assistant: Incident detected and analyzed:
- Crisis Type: Network Outage (Critical)
- Title: Critical 5G Network Outage in Major US Metro Areas - North America
- Description: Complete failure of the 5G network across New York, Boston, and Philadelphia due to equipment overheating during maintenance. Affects millions of customers and businesses, with major media coverage.
- Impact: 2 million customers affected, significant business disruption, degraded services, and high revenue loss.
- Affected Systems: 5G Network Service, Core Network, gNodeB Stations, Primary Data Center
- Required Response:
- Network engineers with 5G expertise
- Hardware repair technicians
- Crisis management team
- Customer communications team
The crisis requires immediate activation of emergency protocols and multidisciplinary experts for rapid response. Would you like me to proceed gathering a crisis response team, related knowledge assets, and initiate response planning?/tmp/ipython-input-3900299550.py:20: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.11/migration/
result = AIMessage(**result.dict(exclude={"type", "name"}), name="assistant")
User: q Goodbye!