HummingBird
Developer's Guide
A Hamson Private Limited
Twelve, 3rd Floor, Al Baber Center, F-8 Markaz, Islamabad
0512818094-5
www.ahamson.com
info@ahamson.com
About HummingBird
AHamson Pvt Ltd, in collaboration with Cisco, has developed a software platform, under the
name HummingBird, for broadband traffic analysis, statistics, and reporting. It provides an
advanced IP broadband traffic analysis platform to fixed and mobile communication service
providers. It helps service providers collect the right data from network traffic for building
statistics over time, and also raises alerts on critical events for further analytics.
Goals
Traffic collection via Syslog/RADIUS protocols from network devices
Subscriber record correlation with RADIUS records through Syslog or API
Subscriber usage data processing for analytics on bandwidth usage trends
Integration of various Syslog and RADIUS devices for data collection
Integrated dashboards and search for traffic, usage, subscriber activity, and usage patterns
Detailed reporting and data export
Controlled access
Components
The high-level components of the proposed solution include:
RADIUS Log Forwarding Agent
Syslog Forwarding Agent
HB Server
RADIUS Consumer
Syslog Consumer
REST API
Search Engine (API)
Dashboard Application
The NAT session information is sent by the CGNAT appliances via Syslog messages to the
forwarding agents. These agents collect this information and queue a new job for every
packet/event received. The forwarding agents also perform basic preliminary checks in order to
filter and discard unwanted event types. The HB Server listens for new jobs in the queue; as
soon as a new job is available it is dequeued, parsed, and processed for correlation and
grouping, or forwarded to the respective consumer queue. The HB Server also purges any
queued events left over due to the unavailability of a matching event. This is implemented as a
workaround for a flaw on the firewall side, which sometimes does not send a matching Start or
Stop event. All such events are evicted to the consumer queues after a 90-minute interval. This
way the server ensures that Redis does not keep any dangling events in memory, saving the
machine from memory exhaustion.
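The eviction step described above can be sketched as follows. This is an illustrative Python model, not the actual server code: the pending map stands in for Redis, and the 5400-second threshold matches the documented 90-minute default.

```python
import time

PURGE_INTERVAL = 5400  # 90 minutes, the documented default

def purge_dangling(pending, consumer_queue, now=None):
    """Evict events that never received a matching Start/Stop event.

    `pending` maps an event key to (event, enqueue_time); anything older
    than PURGE_INTERVAL is moved to `consumer_queue` and dropped from the
    pending map so the cache does not accumulate dangling events.
    """
    now = time.time() if now is None else now
    for key in list(pending):
        event, enqueued_at = pending[key]
        if now - enqueued_at >= PURGE_INTERVAL:
            consumer_queue.append(event)
            del pending[key]
    return pending
```

In the real system this role is played by the server's periodic purge of Redis, with the interval configurable at startup.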
High-Level System Design (Old & New Comparison)
Old System Architecture
New System Architecture
Forwarding Agents
Agents listen to the traffic streams generated by the RADIUS and NAT devices in real time. There
are separate agents for RADIUS and NAT traffic, as each uses a different message protocol. The
RADIUS agent filters the RADIUS accounting messages, keeping only those sent when a new
session is started or an old one is stopped; it ignores all other accounting messages. The RADIUS
accounting messages are sent by the BRAS to the agent on UDP port 1813. When a session
request is received, the agent creates a new task and enqueues it to the job queue in temporary
storage. The Syslog agent behaves similarly, but filters out any DNS events before queuing to
the job queue.
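A minimal sketch of the agent's filter-and-enqueue step, assuming accounting messages are parsed into dicts keyed by attribute name. The attribute name `Acct-Status-Type` is the standard RADIUS accounting attribute; the task shape is an assumption, not taken from the HbParsers source.

```python
# Standard RADIUS accounting status values relevant to session tracking.
ACCT_STATUS_START = "Start"
ACCT_STATUS_STOP = "Stop"

def should_enqueue(message):
    """Keep only Start/Stop accounting events; any other accounting
    message is irrelevant to session tracking and is dropped."""
    return message.get("Acct-Status-Type") in (ACCT_STATUS_START, ACCT_STATUS_STOP)

def enqueue_task(job_queue, message):
    """Queue one job per accepted packet/event, as the agents do.
    Returns True if the message passed the filter and was enqueued."""
    if should_enqueue(message):
        job_queue.append({"type": message["Acct-Status-Type"], "raw": message})
        return True
    return False
```

The Syslog agent follows the same pattern, with the filter dropping DNS events instead of non-Start/Stop accounting messages.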
Server
The HummingBird server is a multi-threaded application that performs a number of tasks.
These include:
1. Task processing
2. Session grouping
3. Maintaining an IP map to quickly find a user via private IP address
4. Purging dangling sessions/events periodically
The server can be started with the desired number of workers for processing the job queue,
depending on the traffic load. The server ensures that each worker processes one and only one
task at a time. The server maintains an IP map of users to private IP addresses. This map is
automatically updated every time a new Radius session is received, so a user can be quickly and
accurately identified for a Syslog session. The server keeps track of all live sessions and
connections in temporary storage. On receiving a Stop event, the session is moved to a writing
queue for insertion into the database. The server is designed to manage high task loads by
initializing with n worker processes.
The server also purges the temporary cache of any dangling sessions that would otherwise
accumulate over time and increase system RAM usage. The purge interval, in seconds, can be
passed to the server at startup. By default this period is set to 5400 seconds (90 minutes).
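The IP-map correlation described above can be sketched like this. The real server keeps this map in Redis; a plain dict and dict-shaped sessions are used here purely for illustration.

```python
# Illustrative model of the server's IP map: private IP -> username.
def update_ip_map(ip_map, radius_session):
    """Record the user behind a private IP whenever a Radius session
    (Start event) arrives, overwriting any stale mapping."""
    ip_map[radius_session["private_ip"]] = radius_session["username"]

def user_for_syslog_event(ip_map, syslog_event):
    """Resolve the user for a Syslog connection via its private IP.
    Returns None when no Radius session has been seen for that IP."""
    return ip_map.get(syslog_event["private_ip"])
```

This is why Radius sessions must be ingested continuously: a Syslog connection can only be attributed to a subscriber if the map already holds the corresponding private IP.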
Consumers
HummingBird consumers perform the final step in processing Radius and Syslog sessions. Once
a Radius or Syslog session is ready to be written to the database, it is moved to the respective
consumer queue. Consumers are designed to run as multiple parallel instances at a time in
order to handle any amount of load. Moreover, each consumer inserts records in batches. The
batch size can be provided when starting a consumer; for changing the defaults see the
respective consumer source code.
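The batching behaviour can be sketched as below. A plain list stands in for the Redis stream and a callback stands in for the PostgreSQL insert; the default batch size of 100 matches the documented consumer default.

```python
def consume_in_batches(stream, insert_batch, batch_size=100):
    """Drain `stream` (a list standing in for a Redis stream) and hand the
    records to `insert_batch` in groups of at most `batch_size`, mirroring
    how each consumer writes rows to the database in batches.
    Returns the total number of records flushed."""
    batch, flushed = [], 0
    for record in stream:
        batch.append(record)
        if len(batch) >= batch_size:
            insert_batch(batch)
            flushed += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        insert_batch(batch)
        flushed += len(batch)
    return flushed
```

Batching amortizes the per-insert round trip to PostgreSQL, which is what lets a handful of consumer instances keep up with the stream rate.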
Data Storage and Retrieval
The data storage component is responsible for storing data from both parsers in the main store
and for retrieving it. It also serves queries from the other system components for data within a
given time period.
Data Stores
Data storage is divided into two components: a temporary store (Redis cache, deployed on the
app server) and a main store (PostgreSQL, deployed on the DB server). The temporary store is
used to ingest the high rate of traffic coming from the firewall and Radius devices.
Temporary Store
The first priority of HummingBird is to secure all incoming traffic and keep up with the high
data rates. All events from the forwarding agents are first stored in temporary storage as tasks.
All parsed data is stripped of irrelevant fields in order to keep only the minimum required
information and save storage space. This data is then ready to be exported to the main store;
consumers export it in batches from temporary storage to main storage.
Redis Objects
Name | Description
critical | Job queue
ars | Active Radius sessions key prefix
out-sessions | Stream for Radius sessions ready to be written to the DB
sc | Syslog active connections
out-cons | Stream for Syslog sessions ready to be written to the DB
radius-consumer-group | Radius stream consumer group
syslog-consumer-group | Syslog stream consumer group
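The names above compose into concrete Redis keys roughly as follows. The exact key layout beyond the listed prefixes is an assumption for illustration; only the prefix and stream names come from the table.

```python
# Names taken from the Redis objects table above.
JOB_QUEUE = "critical"
RADIUS_STREAM = "out-sessions"
SYSLOG_STREAM = "out-cons"

def active_radius_session_key(session_id):
    """Active Radius sessions live under the 'ars' key prefix;
    the colon separator is an assumed convention."""
    return f"ars:{session_id}"

def active_syslog_connection_key(connection_id):
    """Active Syslog connections live under the 'sc' key prefix."""
    return f"sc:{connection_id}"
```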
Main/Permanent Store
Data in the main store is kept in relational form for further processing, aggregation, and
filtering. To improve read performance as well as filtering, this data is stored in partitions based
on a date range: for Syslog each day has a separate partition, while for Radius each year has a
separate partition. These partitions of processed data can then be manually detached and
re-attached when required for archival, analysis, or search.
SQL Scripts
-- Firewall/NAT connections table (referenced below as public.connections)
CREATE TABLE IF NOT EXISTS public.connections
(
cid character varying(12) COLLATE pg_catalog."default" NOT NULL,
username character varying(15) COLLATE pg_catalog."default",
started timestamp without time zone NOT NULL DEFAULT '0001-01-01 00:00:00'::timestamp without time zone,
ended timestamp without time zone NOT NULL,
public_ip inet,
destination_ip inet,
private_ip inet NOT NULL,
public_port integer NOT NULL DEFAULT 0,
destination_port integer NOT NULL DEFAULT 0,
private_port integer NOT NULL DEFAULT 0,
protocol character varying(2) COLLATE pg_catalog."default",
CONSTRAINT "PK_connections" PRIMARY KEY (ended, cid)
) PARTITION BY RANGE (ended);
ALTER TABLE IF EXISTS public.connections
OWNER to postgres;
-- Index: IX_connections_destination_ip
-- DROP INDEX IF EXISTS public."IX_connections_destination_ip";
CREATE INDEX IF NOT EXISTS "IX_connections_destination_ip"
ON public.connections USING gist
(destination_ip inet_ops)
;
-- Index: IX_connections_ended
-- DROP INDEX IF EXISTS public."IX_connections_ended";
CREATE INDEX IF NOT EXISTS "IX_connections_ended"
ON public.connections USING brin
(ended)
;
-- Index: IX_connections_public_ip
-- DROP INDEX IF EXISTS public."IX_connections_public_ip";
CREATE INDEX IF NOT EXISTS "IX_connections_public_ip"
ON public.connections USING gist
(public_ip inet_ops)
;
-- Index: IX_connections_started
-- DROP INDEX IF EXISTS public."IX_connections_started";
CREATE INDEX IF NOT EXISTS "IX_connections_started"
ON public.connections USING brin
(started)
;
-- Index: IX_connections_username
-- DROP INDEX IF EXISTS public."IX_connections_username";
CREATE INDEX IF NOT EXISTS "IX_connections_username"
ON public.connections USING btree
(username COLLATE pg_catalog."default" ASC NULLS LAST)
;
-- Partitions SQL
CREATE TABLE public.connections_15_11_2021_to_16_11_2021 PARTITION OF
public.connections
FOR VALUES FROM ('2021-11-15 00:00:00') TO ('2021-11-16 00:00:00');
ALTER TABLE IF EXISTS public.connections_15_11_2021_to_16_11_2021
OWNER to postgres;
CREATE TABLE public.connections_16_11_2021_to_17_11_2021 PARTITION OF
public.connections
FOR VALUES FROM ('2021-11-16 00:00:00') TO ('2021-11-17 00:00:00');
ALTER TABLE IF EXISTS public.connections_16_11_2021_to_17_11_2021
OWNER to postgres;
-- Radius sessions table (referenced below as public.sessions)
CREATE TABLE IF NOT EXISTS public.sessions
(
macaddress character varying(20) COLLATE pg_catalog."default",
session_id character varying(15) COLLATE pg_catalog."default" NOT NULL,
username character varying(15) COLLATE pg_catalog."default",
started timestamp without time zone NOT NULL DEFAULT '0001-01-01 00:00:00'::timestamp without time zone,
ended timestamp without time zone NOT NULL,
in_octets bigint NOT NULL DEFAULT 0,
out_octets bigint NOT NULL DEFAULT 0,
duration bigint NOT NULL DEFAULT 0,
private_ip inet,
status_type character varying(5) COLLATE pg_catalog."default",
CONSTRAINT "PK_sessions" PRIMARY KEY (session_id, ended)
) PARTITION BY RANGE (ended);
ALTER TABLE IF EXISTS public.sessions
OWNER to postgres;
-- Index: IX_sessions_ended
-- DROP INDEX IF EXISTS public."IX_sessions_ended";
CREATE INDEX IF NOT EXISTS "IX_sessions_ended"
ON public.sessions USING brin
(ended)
;
-- Index: IX_sessions_private_ip
-- DROP INDEX IF EXISTS public."IX_sessions_private_ip";
CREATE INDEX IF NOT EXISTS "IX_sessions_private_ip"
ON public.sessions USING gist
(private_ip inet_ops)
;
-- Index: IX_sessions_started
-- DROP INDEX IF EXISTS public."IX_sessions_started";
CREATE INDEX IF NOT EXISTS "IX_sessions_started"
ON public.sessions USING brin
(started)
;
-- Index: IX_sessions_username
-- DROP INDEX IF EXISTS public."IX_sessions_username";
CREATE INDEX IF NOT EXISTS "IX_sessions_username"
ON public.sessions USING btree
(username COLLATE pg_catalog."default" ASC NULLS LAST)
;
-- Partitions SQL
CREATE TABLE public.sessions_01_01_2021_to_01_01_2022 PARTITION OF
public.sessions
FOR VALUES FROM ('2021-01-01 00:00:00') TO ('2022-01-01 00:00:00');
ALTER TABLE IF EXISTS public.sessions_01_01_2021_to_01_01_2022
OWNER to postgres;
CREATE TABLE public.sessions_01_01_2022_to_01_01_2023 PARTITION OF
public.sessions
FOR VALUES FROM ('2022-01-01 00:00:00') TO ('2023-01-01 00:00:00');
ALTER TABLE IF EXISTS public.sessions_01_01_2022_to_01_01_2023
OWNER to postgres;
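Because Syslog partitions are created per day, generating the DDL above can be scripted. A sketch that reproduces the dd_mm_yyyy_to_dd_mm_yyyy naming used in the scripts (actually executing the DDL against the database is left out):

```python
from datetime import date, timedelta

def daily_connections_partition_ddl(day):
    """Build the CREATE TABLE ... PARTITION OF statement for one day of
    connections, matching the naming convention used above."""
    nxt = day + timedelta(days=1)
    name = f"connections_{day:%d_%m_%Y}_to_{nxt:%d_%m_%Y}"
    return (
        f"CREATE TABLE public.{name} PARTITION OF public.connections\n"
        f"FOR VALUES FROM ('{day} 00:00:00') TO ('{nxt} 00:00:00');"
    )
```

A small scheduled job could call this ahead of each day so the partition exists before the first row for that day arrives.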
SAN Storage
All aging data is moved from hot storage to SAN storage for archival. The SAN consists of
multiple modular storage units of fixed capacity; more units can be added at any time to
increase the overall capacity. Archived data partitions can be restored from the SAN and
attached to the main store for search queries or data analysis. The SAN is connected to the
database server through a fast fiber transport link, enabling fast reads and writes to and from
the database server.
Deployment
The source code is divided into two main project folders, HbParsers and HbConsumers. The
HbParsers project includes the executables for the forwarding agents and the server. The
HbConsumers project includes the executables for the Radius and Syslog consumers. Build the
executable components, including:
1. Radius & Syslog Forwarding Agent
2. Server
3. Radius Consumer
4. Syslog Consumer
A Makefile has been included in each of the two projects, so building the executables is as
simple as running the make command from the terminal. The target path for copying the
executables can be changed in the Makefile.
Command line arguments:
HbParser
none
HbServer
-c {number of concurrent workers}
-e {purge interval for expiring sessions; default is 5400 seconds}
-v {info/debug}
Radcon
-i {consumer unique ID number; default is 1}
-e {number of entries to process; default is 100}
-v log verbosity {info/debug}; default is info
Syscon
-i {consumer unique ID number; default is 1}
-e {number of entries to process; default is 100}
-v log verbosity {info/debug}; default is info
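The HbServer flags above can be modeled with a standard argument parser. This is an illustrative mock of the documented interface, not the actual HbServer code; the default worker count shown is an assumption, since the document does not state one.

```python
import argparse

def build_hbserver_parser():
    """Parser mirroring the documented HbServer command-line flags."""
    p = argparse.ArgumentParser(prog="hbserver")
    p.add_argument("-c", type=int, default=1,
                   help="number of concurrent workers (default assumed)")
    p.add_argument("-e", type=int, default=5400,
                   help="purge interval for expiring sessions, in seconds")
    p.add_argument("-v", choices=["info", "debug"], default="info",
                   help="log verbosity")
    return p
```

The Radcon and Syscon consumers follow the same pattern with -i (instance ID) and -e (entries per batch) instead of worker count and purge interval.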
Backend Application Components
The application components serve requests from the Dashboard application. They consist of an
API component, based on REST services, that the Dashboard application interacts with. The API
has restricted access and is reachable only from the Dashboard application. Beyond serving
user search queries, it also provides functionality related to:
User authentication
User session management
User management
Role management
Destination management
Search
Data summarization and aggregation
Data export
Front-end Application
The front-end application runs on the intranet and consists of
Dashboard
Advanced Search
Destinations
User Management
Role Management
Data Export
Settings
The front-end application is the client application that connects to the backend API for all data
requests. It is a modular application with different components to manage each piece of
functionality. The User Management component allows creating and managing as many users
as needed. Each user has designated roles that govern his/her access rights to various
application areas. Using the combination of users and roles, access to the application's
functionality can be easily controlled. By default, two built-in roles are created:
Admin
Search
Admin is a full-rights role with access to the entire application. It can manage users and roles
and also create new roles with customized rights.
The Search role has restricted access; it enables a user to access only the Advanced Search page
after successful authentication.
HummingBird – Dashboard
HummingBird – Search
HummingBird – Configuration
HummingBird – Destinations/Hosts
HummingBird – Administration
HummingBird – Search Results Exported to Excel