
Open Network Design

OND → Accelerating Interconnectedness


Network Logging Requirements


Goals

Background 

Because the network fabric is a service platform for the whole business, it is incumbent upon operations and engineering teams to manage network resources not just individually but in aggregate, and as effectively as possible, so that the service provided is reliable, predictable, and efficient. This entails standard network and security logging from elements for a wide array of events, triggers, and informational messages. Situational awareness is a key part of operations; without proper instrumentation and logging from devices, one might as well be flying blind.

In-Scope

Organisation_Name (NEMS), including but not limited to:

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) |
|---|---|---|---|
| SC01 | Cisco routers, switches | Mandatory | Existing footprint |
| SC02 | Cisco Call Managers, UC servers and Gatekeepers/CUBE | Mandatory | Existing footprint |
| SC03 | Cisco UCS servers, UCS Manager (and Fabric Interconnects) | Mandatory | Existing footprint |
| SC04 | Palo Alto firewalls (both hardware and software instantiations) | Mandatory | Existing footprint |
| SC05 | Opengear Terminal Servers | Mandatory | Existing footprint |
| SC06 | Cisco Wireless Controllers | Mandatory | Existing footprint |
| SC07 | Cisco Distributed vSwitch | Mandatory | Existing footprint |
| SC08 | Cisco Prime | Mandatory | Existing footprint |
| SC09 | Infoblox Appliances (DNS/DHCP/IPAM) | Mandatory | Existing footprint |
| SC10 | APC UPS and PDUs | Mandatory | Existing footprint |
| SC11 | Infoblox Appliances | Mandatory | Existing footprint |
| SC12 | Nimble Storage | Mandatory | Existing footprint |

Out-of-Scope

Assumptions

Note: The following requirements focus primarily on the format, content, style, configuration, and handling of emitted logs. The network team will emit all logs (where capable) in an RFC 5424 compatible syslog format with Facility Code 23 (local7) and Severity Code 5 (notifications).

Note: The header of the syslog message contains “priority”, “version”, “timestamp”, “hostname”, “application”, “process id”, and “message id”. It is followed by structured data: zero or more elements, each enclosed in square brackets “[]” and holding an element ID followed by key="value" pairs.
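The header layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the hostname, application name, and message text are made-up examples, and the helper names (`pri`, `format_rfc5424`, `parse_header`) are hypothetical. The priority value follows the RFC 5424 rule `facility * 8 + severity`, so local7 (23) at severity 5 yields 189.

```python
import re
from datetime import datetime, timezone

FACILITY_LOCAL7 = 23   # Facility Code 23 (local7)
SEVERITY_5 = 5         # Severity Code 5 (notifications)


def pri(facility: int, severity: int) -> int:
    """Priority value carried between angle brackets at the start of a message."""
    return facility * 8 + severity


def format_rfc5424(hostname, app, msg, *, facility=FACILITY_LOCAL7,
                   severity=SEVERITY_5, procid="-", msgid="-", sd="-", when=None):
    """Build one RFC 5424 line with a UTC, millisecond-precision timestamp."""
    when = (when or datetime.now(timezone.utc)).astimezone(timezone.utc)
    ts = when.isoformat(timespec="milliseconds").replace("+00:00", "Z")
    return f"<{pri(facility, severity)}>1 {ts} {hostname} {app} {procid} {msgid} {sd} {msg}"


# Header fields in order: priority, version, timestamp, hostname,
# application, process id, message id; the rest is structured data + message.
HEADER_RE = re.compile(
    r"<(?P<pri>\d{1,3})>(?P<version>\d{1,2}) (?P<timestamp>\S+) "
    r"(?P<hostname>\S+) (?P<app>\S+) (?P<procid>\S+) (?P<msgid>\S+) (?P<rest>.*)",
    re.DOTALL,
)


def parse_header(line: str) -> dict:
    """Split a line into header fields and decode facility/severity from PRI."""
    m = HEADER_RE.match(line)
    if m is None:
        raise ValueError("not an RFC 5424 formatted line")
    fields = m.groupdict()
    fields["facility"], fields["severity"] = divmod(int(fields.pop("pri")), 8)
    return fields


# Illustrative message; the device name and text are invented for the example.
example = format_rfc5424(
    "edge-rtr-01", "ospf", "Neighbour 192.0.2.1 changed state to FULL",
    sd='[origin ip="192.0.2.10"]',
    when=datetime(2024, 1, 2, 3, 4, 5, 678000, tzinfo=timezone.utc),
)
```

Parsing `example` with `parse_header` recovers facility 23 and severity 5 from the single priority value, which is what lets a collector group by facility without inspecting the message text.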

Architectural Requirements

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| AR01 | There shall be collectors/probes in each major geographical region (e.g. AMER, EMEA and APAC). | Mandatory | To minimise latency; for fault tolerance; to minimise inter-regional traffic and logging | |
| AR02 | The solution shall ingest logs on UDP port 514. | Mandatory | Standard connectionless UDP syslog | |
| AR03 | The solution shall ingest logs on TCP port 514. | Mandatory | Standard connection-orientated TCP syslog | |
| AR04 | Elements shall only log to regional ingestion collectors (until cross-regional de-duplication is available as standard). | Desirable | UDP syslog may not get recorded if a local regional collector is down | |
| AR05 | The solution will expect logs in UTC; all network elements will therefore be configured to log in UTC. | Mandatory | Standardised global logging allows simple correlation without parsing time zones or manipulating data | |
| AR06 | The solution will support and expect millisecond timestamps from network elements. | Mandatory | Standardisation; ordering of interdependent events at machine timescales (pre/post event dependencies) | |
| AR07 | The solution will provide plain text search. | Mandatory | | |
| AR08 | The solution shall allow logs to be split and tagged based upon delimiters and regex. | Mandatory | | |
| AR09 | The solution shall be able to ingest syslog on non-standard ports and filter said logs to different groupings or apply tags. | Desirable | | |
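As an illustration of AR08 and AR09, the sketch below splits a message on a delimiter and applies tags from regex rules and from the receiving port. Everything here is an assumption for the purpose of the example: the rule set, the port-to-tag map (including port 5514), and the function name `split_and_tag` are hypothetical, and the input is assumed to be the message part of a Cisco-style log beginning at the `%FACILITY-SEVERITY-MNEMONIC` token.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TagRule:
    tag: str
    pattern: re.Pattern


# Hypothetical rule set keyed on common Cisco IOS mnemonics.
RULES = [
    TagRule("link", re.compile(r"%(LINK|LINEPROTO)-\d-UPDOWN")),
    TagRule("routing", re.compile(r"%(OSPF-\d-ADJCHG|BGP-\d-ADJCHANGE)")),
    TagRule("auth", re.compile(r"%SEC_LOGIN-\d-LOGIN_(SUCCESS|FAILED)")),
]

# Hypothetical mapping of receiving port to a grouping tag (AR09).
PORT_TAGS = {514: "network", 5514: "firewall"}


def split_and_tag(message: str, port: int = 514, delimiter: str = ": ") -> dict:
    """Split on the first delimiter, then collect port and regex tags."""
    head, sep, body = message.partition(delimiter)
    if not sep:
        # Delimiter absent: treat the whole message as the body.
        head, body = "", message
    tags = [PORT_TAGS.get(port, "unclassified")]
    tags += [rule.tag for rule in RULES if rule.pattern.search(message)]
    return {"head": head, "body": body, "tags": tags}


record = split_and_tag(
    "%LINK-3-UPDOWN: Interface GigabitEthernet0/1, changed state to down"
)
```

Here `record["tags"]` holds the port tag followed by every matching rule tag, so one message can belong to several groupings at once.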

Functional Requirements (The ‘What’)

Note: Network Element Logging Requirements

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| FR01 | The logging device shall emit logs with facility local7 (23). | Mandatory | This will allow for rapid grouping of NEMs and network operational logs | |
| FR02 | The logging device shall emit logs with severity 5 (Notification). | Mandatory | This will allow for rapid grouping of NEMs and network operational logs at a severity that provides operational intelligence without too much verbosity | |
| FR03 | The logging device shall emit logs with millisecond timestamps. | Mandatory | This will allow for immediate and globalised correlation | |
| FR04 | The logging device shall emit logs for all authentication and authorisation events. | Mandatory | To gain insight into operators and agents interacting with elements for operational and security purposes (including audit logs) | |
| FR05 | The logging device shall source syslog and traps from its management loopback. | Mandatory | To instrument on the management plane from an always-up, consistent source IP address (even if traversing in-band) | |
| FR06 | The logging device shall log IGP and EGP neighbour state changes. | Mandatory | This allows distributed state to be tracked in relation to routing events and operational troubleshooting | |
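A collector-side check of FR01 to FR03 (plus the UTC expectation in AR05) could look roughly like the following. This is a sketch under stated assumptions: the function name `check_log` is hypothetical, the input is assumed to be an RFC 5424 line, and FR02 is read here as a threshold (severity 5 or more urgent, i.e. numerically 0 to 5), which is an interpretation rather than something the table states.

```python
import re

# Start of an RFC 5424 header: <PRI>VERSION SP TIMESTAMP SP
HEADER_RE = re.compile(r"^<(\d{1,3})>1 (\S+) ")
# RFC 3339 timestamp carrying at least millisecond precision.
MS_TIMESTAMP_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d{3,6}(Z|[+-]\d{2}:\d{2})$"
)


def check_log(line: str) -> list[str]:
    """Return a list of requirement violations; empty means compliant."""
    m = HEADER_RE.match(line)
    if m is None:
        return ["not RFC 5424: missing <PRI>VERSION TIMESTAMP header"]
    facility, severity = divmod(int(m.group(1)), 8)
    timestamp = m.group(2)

    violations = []
    if facility != 23:
        violations.append(f"FR01: facility {facility}, expected 23 (local7)")
    if severity > 5:  # assumption: FR02 treated as a threshold, not an exact match
        violations.append(f"FR02: severity {severity} is more verbose than 5")
    if not MS_TIMESTAMP_RE.match(timestamp):
        violations.append("FR03: timestamp lacks millisecond precision")
    if not timestamp.endswith(("Z", "+00:00")):
        violations.append("AR05: timestamp is not in UTC")
    return violations


good = check_log("<189>1 2024-01-02T03:04:05.678Z edge-rtr-01 ospf - - - up")
bad = check_log("<134>1 2024-01-02T03:04:05+01:00 edge-rtr-01 ospf - - - up")
```

A check like this is most useful during onboarding of a new device type, where it can flag elements whose logging configuration has drifted from the standard.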

Reporting Requirements

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| RP01 | The solution shall provide automated and scheduled reporting. | Mandatory | Automation and Continual Service Improvement | |
| RP02 | The solution shall provide manual and on-demand reporting. | Mandatory | Ad-hoc reporting and configuration of ‘one time’ reports. | |
| RP03 | The solution shall provide active and visually queried graphing of metrics. | Mandatory | GUI and mouse driven selection of graph subsections which result in new queries and results/graphs. | |
| RP04 | The solution shall provide for user configurable dashboards. | Mandatory | Custom dashboards for per user, team, and departmental views. | |

Security Requirements

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| SEC01 | The solution shall provide web based access via TLS based HTTPS for administration/reporting. | Mandatory | Basic security requirements. | |
| SEC02 | The solution shall support encrypted data ingestion (e.g. HTTPS, SSH, TLS). | Mandatory | Basic security requirements. | |
| SEC03 | The solution shall support strong ciphers for symmetric encryption. | Desirable | Basic security requirements. | |
| SEC04 | The solution shall support an internal audit log for AAA (Authentication, Authorization, and Accounting) which can be audited by administrators. | Mandatory | Basic auditing and least access principle. | |
| SEC05 | The solution shall support external authentication methods such as RADIUS (and/or Active Directory or SAML based authentication) for administration. | Desirable | To support AAA. | |
| SEC06 | The solution shall provide for 2FA administrator and operator access. | Desirable | Basic 2FA integration, preferably via Duo. | |
| SEC07 | The solution shall support encrypted data storage (Data at Rest). | Mandatory | Basic Security (Data at Rest) | |
| SEC08 | The solution shall support encrypted channels for instrumentation and telemetry shipping from collectors to analyzers/reporters (Data in Transit). | Mandatory | Basic Security (Data in Transit across untrusted/semi-trusted) | |
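For the TLS option in SEC02, one plausible approach is syslog over TLS as described in RFC 5425 (default TCP port 6514), where each message on the stream is prefixed with its length in octets. The requirements do not name that RFC, so treating it as the transport is an assumption, and the helper names below are hypothetical. The sketch shows a server-side TLS context with a TLS 1.2 floor and a parser for the octet-counted framing.

```python
import ssl


def server_tls_context(certfile=None, keyfile=None) -> ssl.SSLContext:
    """Server-side context that refuses anything older than TLS 1.2."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        # Paths are supplied by the caller; none are assumed here.
        ctx.load_cert_chain(certfile, keyfile)
    return ctx


def split_frames(buffer: bytes) -> tuple[list[bytes], bytes]:
    """Extract complete 'MSG-LEN SP SYSLOG-MSG' frames from a byte stream.

    Returns the complete messages and the unconsumed remainder, which the
    caller keeps and prepends to the next chunk read from the socket.
    """
    frames = []
    while buffer:
        length, sep, rest = buffer.partition(b" ")
        if not length.isdigit():
            raise ValueError("malformed MSG-LEN prefix")
        if not sep or len(rest) < int(length):
            break  # length prefix or message body not fully received yet
        size = int(length)
        frames.append(rest[:size])
        buffer = rest[size:]
    return frames, buffer


msg = b"<189>1 2024-01-02T03:04:05.678Z edge-rtr-01 ospf - - - up"
stream = b"%d " % len(msg) + msg + b"20 <189>1 part"
complete, remainder = split_frames(stream)
```

Octet counting keeps message boundaries unambiguous even when a message contains newlines, which is why the sketch does not split on line endings.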

Integration Requirements

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| GR01 | The solution shall provide a rich RESTful API to facilitate automation and custom event actions as part of wider workflows. | Mandatory | Basic API integrations and leveraging other systems and triggers. | |
| GR02 | The solution shall provide integration with PagerDuty. | Desirable | 3rd party ecosystem for operations, alerting, and escalations | |
| GR03 | The solution shall provide integration with Slack. | Desirable | 3rd party ecosystem for operations, alerting, and escalations (suppress alerts, update status). | |
| GR04 | The solution shall provide integration with DataDog. | Desirable | 3rd party ecosystem for operations, alerting, and escalations | |
| GR05 | The solution shall provide integration with Cisco Prime. | Desirable | Basic monitoring of other management platforms. | |

Use Cases (‘Stories’)

Network Operations Uses

| ID | Description | Importance (Mandatory/Desirable) | Rationale or Comments (if applicable) | Meets |
|---|---|---|---|---|
| ST01 | Network Operations: investigating a WAN or service outage and trying to determine how often an interface or interfaces were flapping over a time period. | Mandatory | Exact times of interface UP/DOWN | |
| ST02 | Network Operations: investigating a WAN or service outage and trying to correlate it with other events, faults, or outages on other platforms or endpoints that may provide root cause or additional context to service impact(s). | Mandatory | Heterogeneous systems are interconnected and can become force multipliers, exacerbating issues or even causing them in the first place | |
| ST03 | Network Operations: general troubleshooting of routing state changes to help isolate and determine faults or distributed issues in the RIB (Routing Information Base). | Mandatory | Standard operations | |
| ST04 | Network Operations: validating changes made by a specific operator via planned (or unplanned) Change Management. | Mandatory | Standard operations | |
| ST05 | Network Operations (ENVMON): visibility of environmental and hardware platform issues via syslog that are not available via SNMP. | Mandatory | Standard operations | |
| ST06 | Pre-Network Support: AirSupport (level 1) visibility of user device authentication issues (802.1X, RADIUS, etc.). | Mandatory | Standard operations | |
| ST07 | Pre-Network Support: AirSupport (level 1) custom dashboards for certain event frequencies and volumes that can be presented in a simple visual representation. | Mandatory | Data visualisation | |
| ST08 | Network Operations: escalation of performance issues in relation to specific event types detected/matched (via syslog regex?). | Mandatory | Standard operations | |
| ST09 | Network Operations: ability to define the ‘business logic’ or ‘operational logic’ for correlating different types of events based upon different scenarios or thresholds of multiple events, which themselves trigger events and alerts. | Desirable | Advanced operations | |
| ST10 | Network Operations: alerting on low light from transceivers (via syslog regex?). | Mandatory | Standard operations | |
| ST11 | Network Operations: alerting when certain user types/IDs/names log in (via syslog regex?). | Mandatory | Standard operations | |
| ST12 | Network Operations: segregation, separation, grouping, or tagging of logs based upon FACILITY. | Mandatory | Standard syslog filtering and local redirection | |
| ST13 | Network Operations: segregation or grouping of device types based upon: Local3 -> FW/Security, Local4 -> IAM/DNS/DHCP, Local5 -> ENV (Power/HVAC), Local6 -> Wireless LAN Controllers, Local7 -> Routers/Switches (via some manner e.g. FACILITY, SRC IP, hostname etc.). | Mandatory | Standard operations | |
| ST14 | Network Operations: ability to remap FACILITY based upon some arbitrary data such as received PORT, SRC IP, hostnames etc. | Desirable | Advanced operations | |
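Stories ST12 to ST14 can be sketched as a facility lookup plus a priority rewrite. The facility-to-group mapping below is taken from ST13 (local3 to local7 are facility codes 19 to 23). The remap rules, the source networks (documentation address ranges), and the function names are hypothetical, chosen only to show the mechanism.

```python
import ipaddress
import re

PRI_RE = re.compile(r"^<(\d{1,3})>")

# Grouping from ST13: local3..local7 correspond to facility codes 19..23.
FACILITY_GROUPS = {
    19: "FW/Security",
    20: "IAM/DNS/DHCP",
    21: "ENV (Power/HVAC)",
    22: "Wireless LAN Controllers",
    23: "Routers/Switches",
}

# Hypothetical ST14 rules: source network -> facility to assign.
REMAP_BY_SOURCE = [
    (ipaddress.ip_network("192.0.2.0/24"), 19),     # e.g. firewalls
    (ipaddress.ip_network("198.51.100.0/24"), 21),  # e.g. UPS and PDUs
]


def group_for(line: str) -> str:
    """Return the ST13 group for a line, based on the facility in its PRI."""
    m = PRI_RE.match(line)
    if m is None:
        return "unclassified"
    return FACILITY_GROUPS.get(int(m.group(1)) // 8, "unclassified")


def remap_facility(line: str, src_ip: str) -> str:
    """Rewrite the PRI so the facility reflects the sender, keeping severity."""
    m = PRI_RE.match(line)
    if m is None:
        return line
    facility, severity = divmod(int(m.group(1)), 8)
    addr = ipaddress.ip_address(src_ip)
    for network, new_facility in REMAP_BY_SOURCE:
        if addr in network:
            facility = new_facility
            break
    return f"<{facility * 8 + severity}>{line[m.end():]}"


line = "<189>1 2024-01-02T03:04:05.678Z fw-01 system - - - session table at 80%"
remapped = remap_facility(line, "192.0.2.7")
```

Remapping at the collector is useful for devices that cannot set their own facility; the severity bits are preserved so downstream severity filters behave the same before and after the rewrite.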