This is where Grok patterning comes in.
But before I show you the grok pattern for Bitbucket logs, I would like to show you the difference between the customary nginx log format and Bitbucket logs.
Bitbucket writes several logs: atlassian-bitbucket-access.log, atlassian-bitbucket.log, atlassian-bitbucket-audit.log and a couple of others. For the sake of this post we will compare nginx's access.log with Atlassian's atlassian-bitbucket-access.log.
This assumes you are using nginx in front of your Bitbucket Server installation, either as a reverse proxy or for some form of TLS termination. Whatever your reason is, nginx fronts your service and hands communication over to the Bitbucket service running on the backend. So we have two access points: one is nginx, the other is Atlassian Bitbucket's Apache.
Here are some excerpts
NGINX Logs
172.16.68.113 - - [13/Apr/2019:11:43:24 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 401 5 "-" "git/2.17.1"
172.16.68.113 - user [13/Apr/2019:11:43:24 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 2656 "-" "git/2.17.1"
172.16.68.113 - - [13/Apr/2019:11:43:26 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 401 5 "-" "git/2.17.1"
172.16.68.113 - user [13/Apr/2019:11:43:26 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 2656 "-" "git/2.17.1"
172.16.68.133 - - [13/Apr/2019:11:43:31 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 401 5 "-" "git/2.17.1"
172.16.68.133 - user [13/Apr/2019:11:43:31 -0500] "GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" 200 2656 "-" "git/2.17.1"
Bitbucket Logs
172.16.68.113,127.0.0.1 | https | i@1Q4STNAx702x1743109x3 | - | 2019-04-13 11:42:49,034 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | "" "git/2.17.1" | - | - | - | - | - | - |
172.16.68.113,127.0.0.1 | https | o@1Q4STNAx702x1743109x3 | - | 2019-04-13 11:42:49,036 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | "" "git/2.17.1" | 401 | 0 | 0 | - | 2 | - |
172.16.68.113,0:0:0:0:0:0:0:1 | https | i@1Q4STNAx702x1743110x3 | - | 2019-04-13 11:42:49,038 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | "" "git/2.17.1" | - | - | - | - | - | - |
172.16.68.113,0:0:0:0:0:0:0:1 | https | o@1Q4STNAx702x1743110x3 | cicd | 2019-04-13 11:42:49,127 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | "" "git/2.17.1" | 200 | 0 | 2644 | cache:hit, refs | 89 | - |
These are log entries from the same session from both the frontend NGINX and backend BITBUCKET process.
If you want to read more on interpreting Atlassian access logs, you can read up here, and if you wonder why a session produces so many 401s, read up here.
Collecting NGINX logs is easy enough: if you run a Telegraf agent on the host, you can configure Telegraf with the inputs below to monitor the NGINX process and parse its access logs.
[[inputs.nginx]]
  ## An array of Nginx stub_status URI to gather stats.
  urls = ["https://localhost:80/nginx_status"]

## Stream and parse log file(s).
[[inputs.logparser]]
  files = ["/var/log/nginx/access.log"]
  ## Read file from beginning.
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{COMBINED_LOG_FORMAT}"]
    measurement = "nginx_access_log"
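To get a feel for what a combined-format pattern pulls out of each nginx line, here is a rough Python sketch. The regular expression is my own hand-written approximation, not Telegraf's actual COMBINED_LOG_FORMAT definition, and the group names (client_ip, auth, resp_code and so on) are assumptions chosen to resemble the field names Telegraf reports.

```python
import re

# Hand-written approximation of the combined log format.
# Assumption: group names only resemble Telegraf's field names.
COMBINED = re.compile(
    r'(?P<client_ip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<ts>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<http_version>[\d.]+)" '
    r'(?P<resp_code>\d{3}) (?P<resp_bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

# One of the nginx entries shown above.
line = (
    '172.16.68.113 - user [13/Apr/2019:11:43:24 -0500] '
    '"GET /scm/dt/repo.git/info/refs?service=git-upload-pack HTTP/1.1" '
    '200 2656 "-" "git/2.17.1"'
)

fields = COMBINED.match(line).groupdict()
for name, value in fields.items():
    print(f"{name:12} {value}")
```

Each named group becomes one column in the resulting measurement, which is the same idea the custom Bitbucket pattern relies on.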
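The Bitbucket access log does not follow the combined log format, so it needs a pattern of its own. This is where grok patterns come in.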
The Grok pattern for bitbucket based on the log strings above could look like this
%{IP:clientIP},%{IP:localIP} \| %{WORD:protocol:tag} \| %{DATA:requestID} \| %{USERNAME:username} \| %{TIMESTAMP_ISO8601:timestamp} \| "%{DATA:action} %{DATA:resource} %{DATA:http_version}" \| "" "%{DATA:request_details}" \| %{NUMBER:resp_code} \| %{NUMBER:bytes_read} \| %{NUMBER:bytes_written} \| %{DATA:labels} \| %{NUMBER:resp_time} \| %{DATA:session_id} \|
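If you want to sanity-check the field layout outside Telegraf, the pattern above can be approximated with a plain Python regular expression. This is only a sketch: the character classes standing in for IP, USERNAME, NUMBER and TIMESTAMP_ISO8601 are simplified assumptions of mine, not the real grok definitions. It also illustrates a side effect worth knowing about: the inbound ("i@") entries carry "-" in the numeric columns, so a pattern that demands numbers there only matches the outbound ("o@") entries.

```python
import re

# Simplified stand-ins for the grok building blocks (assumptions, not the
# real grok definitions): IP -> hex digits, dots, colons; NUMBER -> digits.
BITBUCKET = re.compile(
    r'(?P<clientIP>[0-9a-fA-F.:]+),(?P<localIP>[0-9a-fA-F.:]+) \| '
    r'(?P<protocol>\w+) \| '
    r'(?P<requestID>.*?) \| '
    r'(?P<username>[A-Za-z0-9._-]+) \| '
    r'(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) \| '
    r'"(?P<action>.*?) (?P<resource>.*?) (?P<http_version>.*?)" \| '
    r'"" "(?P<request_details>.*?)" \| '
    r'(?P<resp_code>\d+) \| '
    r'(?P<bytes_read>\d+) \| '
    r'(?P<bytes_written>\d+) \| '
    r'(?P<labels>.*?) \| '
    r'(?P<resp_time>\d+) \| '
    r'(?P<session_id>.*?) \|'
)

# Outbound and inbound entries taken from the Bitbucket excerpt above.
outbound = (
    '172.16.68.113,0:0:0:0:0:0:0:1 | https | o@1Q4STNAx702x1743110x3 | cicd | '
    '2019-04-13 11:42:49,127 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | '
    '"" "git/2.17.1" | 200 | 0 | 2644 | cache:hit, refs | 89 | - |'
)
inbound = (
    '172.16.68.113,127.0.0.1 | https | i@1Q4STNAx702x1743109x3 | - | '
    '2019-04-13 11:42:49,034 | "GET /scm/dt/repo.git/info/refs HTTP/1.0" | '
    '"" "git/2.17.1" | - | - | - | - | - | - |'
)

parsed = BITBUCKET.match(outbound)
skipped = BITBUCKET.match(inbound)

print(parsed.groupdict())
print("inbound entry matched:", skipped is not None)
```

Whether dropping the inbound entries is acceptable depends on what you want to chart; the outbound entries are the ones that carry the response code, byte counts and response time.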
I have tried to follow this format
%{<capture_syntax>[:<semantic_name>][:<modifier>]} (the almighty Grok format)
The semantic_name for each capture syntax closely follows Atlassian's recommendation and what is available in the combined log format. This allows for comparison between NGINX logs and Atlassian logs.
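To make the three parts of that format concrete, the short Python sketch below splits a shortened version of the Bitbucket pattern into its capture syntax, semantic name and optional modifier. The TOKEN regular expression is a helper I made up for illustration; it is not how Telegraf parses grok expressions internally.

```python
import re

# Hypothetical helper: picks apart %{SYNTAX[:name][:modifier]} expressions.
TOKEN = re.compile(
    r'%\{(?P<syntax>\w+)(?::(?P<name>\w+))?(?::(?P<modifier>[\w-]+))?\}'
)

# Shortened excerpt of the Bitbucket access-log pattern.
pattern = (
    r'%{IP:clientIP},%{IP:localIP} \| %{WORD:protocol:tag} \| '
    r'%{NUMBER:resp_code}'
)

tokens = [m.groupdict() for m in TOKEN.finditer(pattern)]
for token in tokens:
    print(token)
```

Only protocol carries a modifier here (tag), which tells Telegraf to store it as a tag rather than a field; the other captures have no modifier.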
Creating an input block to collect these logs should be pretty easy from here
Add this block to telegraf config
[[inputs.logparser]]
  files = ["/var/atlassian/application-data/stash/log/atlassian-bitbucket-access.log"]
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{HTTP}"]
    measurement = "bitbucket_access_log"
    custom_patterns = '''
HTTP %{IP:clientIP},%{IP:localIP} \| %{WORD:protocol:tag} \| %{DATA:requestID} \| %{USERNAME:username} \| %{TIMESTAMP_ISO8601:timestamp} \| "%{DATA:action} %{DATA:resource} %{DATA:http_version}" \| "" "%{DATA:request_details}" \| %{NUMBER:resp_code} \| %{NUMBER:bytes_read} \| %{NUMBER:bytes_written} \| %{DATA:labels} \| %{NUMBER:resp_time} \| %{DATA:session_id} \|
'''
You could also collect atlassian audit logs with this block in your telegraf configuration
[[inputs.logparser]]
  files = ["/var/atlassian/application-data/stash/log/audit/atlassian-bitbucket-audit.log"]
  from_beginning = false
  [inputs.logparser.grok]
    patterns = ["%{AUDIT_LOG}"]
    measurement = "stash_audit_log"
    custom_patterns = '''
AUDIT_LOG %{IP:clientIP},%{IP:local} \| %{WORD:eventType} \| %{USERNAME:user:tag} \| %{INT:msSinceJan11970} \| %{USERNAME:eventDetails} \| \{\"authentication-method\":\"%{WORD:authenticationMethod}\"\,\"error\"\:\"%{DATA:authError}\"\} \| %{DATA:requestID} \| %{DATA:sessionID}
'''