DAC - DNS Analytics Collector

A high-performance DNS analytics tool that collects DNS traffic, processes it, and sends the results to a datastore. DAC reads DNS traffic from PCAP files and stores the data in ClickHouse for analysis.

📋 Requirements

  • A 64-bit platform

▶️ Usage

⚡ Basic Example

./dac --pcap-file input.pcap --ch-address localhost:9000

📈 With Metrics

./dac --pcap-file input.pcap \
      --metrics-file /var/lib/dac/metrics.json

🚀 High-Performance Configuration

./dac --pcap-file input.pcap \
      --ch-batch-size 20000 \
      --pcap-input-workers 32 \
      --worker-queue-length 50000

  • --ch-batch-size int controls ClickHouse insert performance (good values range from 1000 to 20000)
  • --pcap-input-workers int controls the CPU usage for processing raw network packets (more is better)
  • --worker-queue-length int controls the RAM consumed for buffering processed network packets in memory between input and output (no more than 4 × the batch size is recommended)
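Putting this guidance together, a tuned invocation might derive the worker count from the machine's CPU threads and the queue length from the batch size. A sketch, assuming `nproc` is available; the file path is a placeholder:

```shell
BATCH=20000                 # upper end of the recommended 1000-20000 range
WORKERS=$(nproc)            # one input worker per CPU thread
QUEUE=$((4 * BATCH))        # recommendation: at most 4 x the batch size

./dac --pcap-file input.pcap \
      --ch-batch-size "$BATCH" \
      --pcap-input-workers "$WORKERS" \
      --worker-queue-length "$QUEUE"
```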

⚙️ Configuration

CLI Options

Input Options

  • --input string - Input module to use (default: pcap)
  • --pcap-file string - PCAP file to read from
  • --pcap-input-workers int - Number of workers to process packets (default: thread count)

Output Options

  • --output string - Output module to use (default: clickhouse)
  • --ch-address string - ClickHouse address (default: localhost:9000)
  • --ch-database string - ClickHouse database name (default: default)
  • --ch-table-name string - ClickHouse table name (default: dns_packets)
  • --ch-username string - ClickHouse username (default: default)
  • --ch-password string - ClickHouse password
  • --ch-use-tls=bool - Enable TLS 1.3 for ClickHouse connection (default: false)
  • --ch-compression string - Compression algorithm for ClickHouse connection: disabled, lz4, zstd, none, lz4hc (default: disabled)
  • --ch-origin-server string - Origin server of the DNS packets
  • --ch-batch-size int - Batch size for ClickHouse insertion (default: 5000)
  • --ch-output-workers int - Number of workers for processing messages (default: thread count)
  • --ch-conn-pool-size int - Number of ClickHouse connections in the pool (default: output workers count)
  • --ch-migrate-schema=bool - Migrate ClickHouse schema on startup (default: true)
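For a remote ClickHouse server, TLS and compression can be combined. A sketch; the address, username, and password are placeholders (port 9440 is commonly used for ClickHouse's TLS native protocol, but yours may differ):

```shell
./dac --pcap-file input.pcap \
      --ch-address clickhouse.example.com:9440 \
      --ch-use-tls=true \
      --ch-compression zstd \
      --ch-username dac \
      --ch-password "$CH_PASSWORD"
```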

GDPR Options

  • --src-ipv4-prefix int - Mask source IPv4 addresses with the given prefix (default: 32)
  • --src-ipv6-prefix int - Mask source IPv6 addresses with the given prefix (default: 128)
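For example, to keep only the network part of source addresses, mask IPv4 to /24 and IPv6 to /48. A sketch; with these prefixes, a source like 192.0.2.55 would presumably be stored as 192.0.2.0 and 2001:db8:1:2::5 as 2001:db8:1:::

```shell
./dac --pcap-file input.pcap \
      --src-ipv4-prefix 24 \
      --src-ipv6-prefix 48
```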

Metrics Options

  • --metrics-file string - Path to JSON file for metrics persistence

Profiling Options

  • --profile string - Enable runtime profiling and write profiles to this directory (generates CPU and memory profiles)

General Options

  • --worker-queue-length int - In-memory worker queue size (default: 10000)
  • --verbose=bool - Enable verbose logging
  • --version - Print version and exit

📈 Metrics

DAC collects the following metrics:

  • packets_read - Packets successfully read from input source
  • packets_processed - Packets successfully processed and decoded
  • packets_sent - Packets successfully sent to ClickHouse
  • processing_errors - Errors during packet processing

Metrics File Format

All metrics are thread-safe. They are loaded on startup, persisted to disk every 10 seconds, and saved on graceful shutdown.

Metrics are stored in JSON format:

{
  "packets_read": 1000,
  "packets_processed": 950,
  "packets_sent": 945,
  "processing_errors": 5
}
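Assuming `jq` is installed, the persisted file can be inspected directly, e.g. to see how many read packets never reached ClickHouse (the path is whatever was passed to --metrics-file):

```shell
# packets read but not (yet) sent to ClickHouse
jq '.packets_read - .packets_sent' /var/lib/dac/metrics.json
```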

🛠️ Build

To benefit from profile-guided optimization (PGO), use the --profile flag to generate CPU and memory profiles. Rename the resulting CPU profile to default.pgo and place it in the same directory as the Makefile.

make build
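A possible PGO workflow following the recommendation above. A sketch: the profile filename written by --profile is assumed to be cpu.pprof here and may differ, and sample.pcap is a placeholder:

```shell
make build                                    # initial build without PGO
./dac --pcap-file sample.pcap --ch-address localhost:9000 --profile ./profiles
cp ./profiles/cpu.pprof default.pgo           # place next to the Makefile
make build                                    # rebuild; the Go toolchain picks up default.pgo
```

Before Go 1.26, prefix the build commands with GOEXPERIMENT=greenteagc (see Current Limitations).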

🧪 Testing

make test

🏗️ Architecture

  • Input Module: Reads packets from PCAP files
  • Output Module: Batched insertion into ClickHouse
  • Metrics System: Concurrent-safe metrics collection with optional file-based persistence

Schema

The ClickHouse schema used by DAC is defined in the ClickHouse output module.

🚧 Current Limitations

  • Only PCAP files are supported as input (no live capture yet)
  • Only ClickHouse is supported as output
  • TCP reassembly is not supported
  • Broken packets are discarded
  • At-most-once delivery semantic
  • No graceful panic handling
  • Edns0Present can't be determined with 100% accuracy due to limitations of the dns library (see numeric_encoder.IsEDNS0Present for the current implementation)
  • DAC uses the Green Tea garbage collector because of its high GC performance (before Go 1.26, GOEXPERIMENT=greenteagc must be set in your build/dev environment)

⚠️ Known Behaviors

  • DNS messages with multiple questions are duplicated into separate DNS database entries
    • Batches may occasionally exceed the configured batch size, because messages with multiple questions expand into multiple entries
  • Metrics integer overflow is possible in theory. It's recommended to delete the metrics file occasionally or to store it in a temporary directory. When an overflow happens, DAC doesn't panic; the counter wraps to 0 and continues to work without intervention.
  • Garbage collection consumes a lot of CPU time. It's recommended to set the environment variable GOGC to a value greater than 100 to reduce the number of GC cycles. Fewer GC cycles mean higher memory (RSS) usage.
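For example, raising GOGC reduces GC frequency at the cost of RSS. A sketch; the value 200 is illustrative, not a recommendation from the project:

```shell
GOGC=200 ./dac --pcap-file input.pcap --ch-address localhost:9000
```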

📄 License

See LICENSE file for details.


ℹ️ Disclaimer: GitHub Copilot was used for test creation.
