parse nginx access log into sqlite database
解析的日志格式
log_format vhosts '$remote_addr - $remote_user [$time_local] "$request" '
'$status $body_bytes_sent "$http_referer" '
'"$http_user_agent" "$http_x_forwarded_for" '
'$host $request_length $bytes_sent $upstream_addr '
'$upstream_status $request_time $upstream_response_time '
'$upstream_connect_time $upstream_header_time';
依赖 sqlite3 , alpine 内静态编译可 apk add sqlite-dev sqlite-static
以下三个版本均在nimlang/nim:latest-alpine-slim
中编译
nim --mm:arc -d:release --dynlibOverride:libsqlite3 --passC:-flto --passL:-flto --passL:-s --passL:-static --passL:-lsqlite3 --opt:speed c main
g++ -Wall -std=c++20 -O3 -lsqlite3 main.cpp
g++ -Wall -std=c++20 -flto=auto -static-libstdc++ -static-libgcc --static -Wl,-Bstatic,--gc-sections -O3 -ffunction-sections -fdata-sections main.cpp -lsqlite3 -o log2sqlite
mac
gcc -Wall -std=c17 -O3 -lsqlite3 main.c
linux 静态编译,因strptime函数的申明问题(https://stackoverflow.com/questions/43460876/trouble-including-function-declaration-for-strptime)
我们需要使用gnu17
标准
gcc -Wall -std=gnu17 -flto=auto -static-libstdc++ -static-libgcc --static -Wl,-Bstatic,--gc-sections -O3 -ffunction-sections -fdata-sections main.c -lsqlite3 -o log2sqlite
注意: -lsqlite3
必须在main.c
之后
每秒约处理20万条日志,c++版本比c版本慢约10%,nim版本和c++版本速度相当
测试数据700MB+, 约200万行
版本 | 时间 |
---|---|
c | 7.395s |
c++ | 10.791s |
nim | 10.024s |
可使用sqlite3命令对数据进行分析
sqlite3 nginx_log.db 'select count(1) as n,request from nginx_log group by request order by n desc limit 5;'
或者sql查询存储到sql文件
sqlite3 nginx_log.db < /tmp/1.sql
sqlite3 nginx_log.db 'select count(1) as n,request,http_user_agent from nginx_log group by http_user_agent,request order by n desc limit 5;'
sqlite3 nginx_log.db 'select count(1) as n,request,remote_addr from nginx_log group by remote_addr,request order by n desc limit 5;'