Package grokky is a pure Go, Grok-like pattern library that helps you parse log files and other text. It is based on RE2 regular expressions, which are much faster than Oniguruma in some cases. Check out the "much more faster" article to understand the difference.
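Because patterns end up in Go's regexp package, which implements RE2 syntax, constructs that need backtracking (backreferences, lookarounds) are rejected at compile time, unlike in Oniguruma. The minimal sketch below checks this with the standard library only; the helper name compiles is illustrative and not part of grokky.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether expr is accepted by Go's RE2-based regexp engine.
func compiles(expr string) bool {
	_, err := regexp.Compile(expr)
	return err == nil
}

func main() {
	// A plain alternation inside a named group is fine.
	fmt.Println(compiles(`(?P<hour>2[0123]|[01]?[0-9])`)) // true
	// Backreferences require backtracking and are not supported by RE2.
	fmt.Println(compiles(`(a)\1`)) // false
	// Lookahead assertions are not supported either.
	fmt.Println(compiles(`foo(?=bar)`)) // false
}
```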
The library was designed for creating many patterns and using each of them many times. Its behavior and capabilities are slightly different from those of the original Grok library. The goals of the library are:
- simplicity,
- speed,
- ease of use.
See also vjeantet/grok, another Go implementation that is closer to the original library.
The differences:
- grokky allows named captures only. The name of a pattern is just the name of a pattern and nothing more; you can treat it as an alias for a regular expression. It's impossible to use the name of a pattern as a capture group. In this respect grokky is similar to grok created with
  g, err := grok.NewWithConfig(&grok.Config{NamedCapturesOnly: true})
- grokky prefers the top named group. Suppose two patterns both define a named group with the same name, and the second pattern is nested into the first one. Then the named group of the first (outer) pattern is used. grok uses the last (closest to the tail) group in every case, but grok also has the ParseToMultiMap method. To see an explanation of the difference, get the package (using go get -t) and run the following command: go test -v -run the_difference github.com/logrusorgru/grokky. Or check out the source code of the test.
- grokky was designed as a factory of patterns: compile once and use many times.
Installation:
go get -u -t github.com/logrusorgru/grokky
Run the tests:
go test github.com/logrusorgru/grokky
Run the benchmark comparison with vjeantet/grok:
go test -bench=.* github.com/logrusorgru/grokky
Example:
package main

import (
	"fmt"
	"log"
	"time"

	"github.com/logrusorgru/grokky"
)

func createHost() grokky.Host {
	h := grokky.New()
	// add patterns to the Host
	h.Must("YEAR", `(?:\d\d){1,2}`)
	h.Must("MONTHNUM2", `0[1-9]|1[0-2]`)
	h.Must("MONTHDAY", `(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9]`)
	h.Must("HOUR", `2[0123]|[01]?[0-9]`)
	h.Must("MINUTE", `[0-5][0-9]`)
	h.Must("SECOND", `(?:[0-5]?[0-9]|60)(?:[:.,][0-9]+)?`)
	// RFC3339 uses "Z" for UTC and "+hh:mm" or "-hh:mm" otherwise
	h.Must("TIMEZONE", `Z|[+-]%{HOUR}:%{MINUTE}`)
	h.Must("DATE", "%{YEAR:year}-%{MONTHNUM2:month}-%{MONTHDAY:day}")
	h.Must("TIME", "%{HOUR:hour}:%{MINUTE:min}:%{SECOND:sec}")
	return h
}

func main() {
	h := createHost()
	// compile the pattern for RFC3339 time
	p, err := h.Compile("%{DATE:date}T%{TIME:time}%{TIMEZONE:tz}")
	if err != nil {
		log.Fatal(err)
	}
	for k, v := range p.Parse(time.Now().Format(time.RFC3339)) {
		fmt.Printf("%s: %v\n", k, v)
	}
	//
	// Yes, it's better to use time.Parse for time values,
	// but this is just an example.
	//
}
Don't complicate regular expressions; use the simplest ones possible. Here is an example for the Nginx access log in the combined format:
h := grokky.New()
h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +
h.Must("nginx", `%{NS:remote_addr}\s\-\s`+
	`%{NS:remote_user}\s\[`+
	`%{NLB:time_local}\]\s\"`+
	`%{NQ:request}\"\s`+
	`%{NS:status}\s`+
	`%{NS:body_bytes_sent}\s\"`+
	`%{NQ:http_referer}\"\s\"`+
	`%{NQ:user_agent}\"`)
nginx, err := h.Get("nginx")
if err != nil {
	panic(err)
}
// catLogFileLineByLineChannel stands for your own source of log lines
for logLine := range catLogFileLineByLineChannel {
	values := nginx.Parse(logLine)
	// stuff
}
Or there is this version (thanks to @nanjj):
h := grokky.New()
h.Must("NSS", `[^\s]*`) // not a space *
h.Must("NS", `[^\s]+`)  // not a space +
h.Must("NLB", `[^\]]+`) // not a right bracket +
h.Must("NQS", `[^"]*`)  // not a double quote *
h.Must("NQ", `[^"]+`)   // not a double quote +
h.Must("A", `.*`)       // all (get tail)
h.Must("nginx", `%{NS:clientip}\s%{NSS:ident}\s%{NSS:auth}`+
	`\s\[`+
	`%{NLB:timestamp}\]\s\"`+
	`%{NS:verb}\s`+
	`%{NSS:request}\s`+
	`HTTP/%{NS:httpversion}\"\s`+
	`%{NS:response}\s`+
	`%{NS:bytes}\s\"`+
	`%{NQ:referrer}\"\s\"`+
	`%{NQ:agent}\"`+
	`%{A:blob}`)
// [...]
Since grokky.Pattern embeds regexp.Regexp, it's possible to use the methods of regexp.Regexp directly. For example, you can use FindStringSubmatch instead of (grokky.Pattern).Parse, or any other method of regexp.Regexp.
Check out Benchmark_parse_vs_findStringSubmatch for an example.
On my machine, the result of this benchmark is as follows (map is Parse, and slice is FindStringSubmatch):
map-4 200000 9980 ns/op 1370 B/op 5 allocs/op
slice-4 200000 7508 ns/op 416 B/op 2 allocs/op
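A minimal sketch of the slice-based approach, using the standard library only (clockRe, clock, hourIdx and minuteIdx are illustrative names). It resolves group indices once with SubexpIndex (available since Go 1.15) and then reads fields from the slice returned by FindStringSubmatch, so no map is built per call, unlike a Parse-style map API. The same calls should work on a grokky.Pattern, since it embeds regexp.Regexp.

```go
package main

import (
	"fmt"
	"regexp"
)

// clockRe is an example pattern with two named groups.
var clockRe = regexp.MustCompile(`(?P<hour>\d\d):(?P<minute>\d\d)`)

// Group indices are resolved once, not on every line.
var (
	hourIdx   = clockRe.SubexpIndex("hour")
	minuteIdx = clockRe.SubexpIndex("minute")
)

// clock extracts hour and minute from line using the submatch slice
// directly; ok is false when line does not match.
func clock(line string) (hour, minute string, ok bool) {
	m := clockRe.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	return m[hourIdx], m[minuteIdx], true
}

func main() {
	for _, line := range []string{"at 12:34", "no time here", "07:05 UTC"} {
		if h, m, ok := clock(line); ok {
			fmt.Println(h, m)
		}
	}
}
```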
Copyright © 2016-2018 Konstantin Ivanov kostyarin.ivanov@gmail.com
This work is free. It comes without any warranty, to the extent
permitted by applicable law. You can redistribute it and/or modify
it under the terms of the Do What The Fuck You Want To Public License,
Version 2, as published by Sam Hocevar. See the LICENSE file for
more details.