Gao et al., 2020 - Google Patents
Scouts: Improving the diagnosis process through domain-customized incident routingGao et al., 2020
View PDF- Document ID
- 5339482104783492288
- Author
- Gao J
- Yaseen N
- MacDavid R
- Frujeri F
- Liu V
- Bianchini R
- Aditya R
- Wang X
- Lee H
- Maltz D
- Yu M
- Arzani B
- Publication year
- Publication venue
- Proceedings of the Annual conference of the ACM Special Interest Group on Data Communication on the applications, technologies, architectures, and protocols for computer communication
External Links
Snippet
Incident routing is critical for maintaining service level objectives in the cloud: the time-to- diagnosis can increase by 10x due to mis-routings. Properly routing incidents is challenging because of the complexity of today's data center (DC) applications and their dependencies …
- 238000003745 diagnosis 0 title abstract description 13
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3409—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment for performance assessment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/34—Recording or statistical evaluation of computer activity, e.g. of down time, of input/output operation; Recording or statistical evaluation of user activity, e.g. usability assessment
- G06F11/3466—Performance evaluation by tracing or monitoring
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
- G06F11/0709—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0766—Error or fault reporting or storing
- G06F11/0775—Content or structure details of the error report, e.g. specific table structure, specific error fields
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Error detection; Error correction; Monitoring responding to the occurence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/30—Monitoring
- G06F11/3065—Monitoring arrangements determined by the means or processing involved in reporting the monitored data
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/06—Arrangements for maintenance or administration or management of packet switching networks involving management of faults or events or alarms
- H04L41/0631—Alarm or event or notifications correlation; Root cause analysis
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/04—Architectural aspects of network management arrangements
- H04L41/046—Aspects of network management agents
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/50—Network service management, i.e. ensuring proper service fulfillment according to an agreement or contract between two parties, e.g. between an IT-provider and a customer
- H04L41/5003—Managing service level agreement [SLA] or interaction between SLA and quality of service [QoS]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/02—Arrangements for maintenance or administration or management of packet switching networks involving integration or standardization
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/08—Configuration management of network or network elements
- H04L41/0803—Configuration setting of network or network elements
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/14—Arrangements for maintenance or administration or management of packet switching networks involving network analysis or design, e.g. simulation, network model or planning
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/14—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
- H04L63/1408—Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic by monitoring network traffic
- H04L63/1425—Traffic logging, e.g. anomaly detection
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/22—Arrangements for maintenance or administration or management of packet switching networks using GUI [Graphical User Interface]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/16—Network management using artificial intelligence
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance or administration or management of packet switching networks
- H04L41/12—Arrangements for maintenance or administration or management of packet switching networks network topology discovery or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING; COUNTING
- G06F—ELECTRICAL DIGITAL DATA PROCESSING
- G06F2201/00—Indexing scheme relating to error detection, to error correction, and to monitoring
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L43/00—Arrangements for monitoring or testing packet switching networks
- H04L43/08—Monitoring based on specific metrics
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L12/00—Data switching networks
- H04L12/02—Details
- H04L12/26—Monitoring arrangements; Testing arrangements
- H04L12/2602—Monitoring arrangements
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Gao et al. | Scouts: Improving the diagnosis process through domain-customized incident routing | |
Lin et al. | Microscope: Pinpoint performance issues with causal graphs in micro-service environments | |
Notaro et al. | A survey of aiops methods for failure management | |
Kandula et al. | Detailed diagnosis in enterprise networks | |
US9672085B2 (en) | Adaptive fault diagnosis | |
US8051028B2 (en) | Method and apparatus for generating configuration rules for computing entities within a computing environment using association rule mining | |
Mahimkar et al. | Detecting the performance impact of upgrades in large operational networks | |
US8370466B2 (en) | Method and system for providing operator guidance in network and systems management | |
US20200409781A1 (en) | Identifying root causes of performance issues | |
US9306806B1 (en) | Intelligent resource repository based on network ontology and virtualization | |
US10177984B2 (en) | Isolation of problems in a virtual environment | |
US9122784B2 (en) | Isolation of problems in a virtual environment | |
Tang et al. | An integrated framework for optimizing automatic monitoring systems in large IT infrastructures | |
Tang et al. | Optimizing system monitoring configurations for non-actionable alerts | |
Kalegele et al. | Four decades of data mining in network and systems management | |
US7469287B1 (en) | Apparatus and method for monitoring objects in a network and automatically validating events relating to the objects | |
Cherrared et al. | LUMEN: A global fault management framework for network virtualization environments | |
Tailhardat et al. | Leveraging knowledge graphs for classifying incident situations in ict systems | |
US12073248B2 (en) | Server groupings based on action contexts | |
Arefin et al. | Cloudinsight: Shedding light on the cloud | |
Chakor et al. | Proposing a Layer to Integrate the Sub-classification of Monitoring Operations Based on AI and Big Data to Improve Efficiency of Information Technology Supervision. | |
Wu | Automatic performance diagnosis and recovery in cloud microservices | |
Munawar | Adaptive monitoring of complex software systems using management metrics | |
Gulenko | Autonomic self-healing in cloud computing platforms | |
Kakadia | An Exploration of Optimization Models for User Network Experience and Operational Efficiency Improvement Via Automation and Machine Learning |