In this guide, we’ll explore how you can leverage two open source tools, Graylog and Ansible, to gain control over what’s happening in your IT infrastructure with remote logging, analytics, and monitoring.
Our research has identified that nearly 92% of small and mid-market businesses don’t have an existing log pipeline for real-time event monitoring.
Centralized log analytics platforms can serve as the key source of truth for effective IT operational and cyber incident response.
Follow along as we deploy a container solution and configure endpoints to push logs remotely.
Here’s what we’ll cover using Docker as the hosting platform of choice:
- Creating a MongoDB Container for Log Storage using Docker Compose
- ElasticSearch for Accelerated Log Analytics
- Deploying Graylog Web Server
- Configuring Graylog to Accept Syslog Input using Web Interface
- Setting up Clients for Remote Logging with Ansible
Follow along by creating the following tree structure with a boilerplate docker-compose.yaml:
```
graylog-ansible-playbook
├── client
└── server
    └── docker-compose.yaml
```
If you simply want the code, clone the GitHub repo @DigitalTransformation/docker-graylog-ansible, set your environment variables, and deploy in minutes.
Requirements for Deployment
The foundation of this guide is on-premise Docker infrastructure. Log servers are critical assets during incident response, so ensuring a compliant deployment is a prerequisite for getting started.
As a general rule, you should already be familiar with Linux, containerization, and networking technologies, but we explain the reasoning behind each configuration in depth for developers of all levels.
Docker Environment
Docker Docs
Docker is a containerization platform that uses atomic images built from Dockerfile recipes for preconfigured stacks.
IT Infrastructure Planning
In the event of a cyber incident, logs quickly become the organization’s most valuable risk-management asset, and having the right IT infrastructure is imperative.
While the Docker runtime itself can be virtualized on shared infrastructure, a dedicated always-on Unix host with power and network redundancy is strongly recommended.
Similarly, if you’re running Podman as a Docker-compatible runtime, some of the scripts mentioned may need to be adapted, as we make heavy use of docker-compose.
Security Considerations
To ensure compliance with PMO IT, we’ll reference the open source docker/docker-bench-security tool.
The host node should be configured with audit.rules and validated as meeting key security specifications.
Securing the log server is just as critical as client endpoints.
Configuring Clients with Ansible Playbooks
Ansible Docs
Ansible is a versatile framework for automating operational IT routines, known as Ansible Playbooks.
We’ll configure Unix clients to pipeline logs into our log analytics server using a standardized approach.
If this is your first time using Ansible, check out the ansible.setup.sh script deployable over SSH to configure Fedora/CentOS/RHEL compatible hosts and clients.
1. Creating a MongoDB Container for Log Storage using Docker Compose
In the docker-compose.yaml we’ll use the following notation in the header to describe the file, which is useful when inspecting the contents with the $ head command.
```yaml
# --------------------------------
# Infra::Docker::Graylog+MongoDB+ElasticSearch
# Docs: << docs link >>
# Github: << github repo >>
# --------------------------------
```
Graylog has out-of-the-box support for different database servers, but we’ve selected MongoDB for its recognized high availability and scalability in handling semi-structured log data.
Mongo is provided as an official image on Docker Hub, built and tested from source with a CI/CD pipeline. However, only the last major release of Mongo is supported by Graylog due to the database scaffolding scripts used.
docker-compose.yaml | Github Source
```yaml
version: '3.3'

services:
  # MongoDB Container
  mongodb:
    image: mongo:3
    container_name: qone.graylog.mongodb
    restart: always
    volumes:
      - ./mongodb:/data/db
      - ./mongodb/mongo-init.js:/docker-entrypoint-initdb.d/mongo-init.js:ro
    env_file:
      - mongo.env
    networks:
      - d1_graylog
    ports:
      - 9052:27017
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G
```
Let’s break down each component of the mongodb service section in more detail to understand the reasoning behind these properties.
restart
A restart policy must be configured as a precaution for failure management, such as I/O errors thrown from a remounting disk.
Note that simply restarting the container itself is not sufficient and should be paired with other approaches like database and volume cloning (i.e. RAID 1) for redundancy.
volumes
For the purposes of this guide, we’ll use an isolated filesystem mounted via /etc/fstab for MongoDB data; the bind mount can be any arbitrary subdirectory on the host. Run the following shell command to prevent accidental deletion of the path:

```shell
$ sudo chattr +i ./mongodb
```
Note: Docker security specs recommend isolating the runtime from both data storage and the operating system. To define another storage medium, change the rules in /etc/docker/daemon.json as mentioned in the Docker docs; otherwise proceed with caution.
env_file: mongo.env | Github Source
To manage securables in docker-compose, place the credentials in a separate mongo.env file. This can be excluded globally with **/*.env in .gitignore to prevent leakage of secrets in code repositories.
The env file contains the root credentials for the MongoDB instance, used by external connections outside of Graylog. Replace the following <<placeholders>> accordingly.
```
MONGO_INITDB_DATABASE=<<root_db>>
MONGO_INITDB_ROOT_USERNAME=<<root_username>>
MONGO_INITDB_ROOT_PASSWORD=<<secure_password>>
```
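It’s easy to leave one of these placeholders unfilled before a deploy. A quick sanity check — a hypothetical helper of our own, not part of the repo — can scan mongo.env for any remaining `<<placeholders>>`:

```python
import re

def unresolved_placeholders(env_text):
    """Return the keys of any KEY=<<value>> lines not yet filled in."""
    return [line.split("=", 1)[0]
            for line in env_text.splitlines()
            if re.search(r"<<.*?>>", line)]

sample = (
    "MONGO_INITDB_DATABASE=admin\n"
    "MONGO_INITDB_ROOT_USERNAME=root\n"
    "MONGO_INITDB_ROOT_PASSWORD=<<secure_password>>\n"
)
print(unresolved_placeholders(sample))  # ['MONGO_INITDB_ROOT_PASSWORD']
```

Running this against the real file before `docker-compose up` catches the most common misconfiguration early.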
mongo-init.js | Github Source
Since Mongo does not include a default database constructor, we’ll need to clone mongo-init.js to generate a custom database for Graylog and place it in the root directory from which docker-compose.yaml is going to be executed on the host.
Let’s take a closer look at mongo-init.js, which requires setting custom credentials for the new Graylog database.
```javascript
db.createUser(
  {
    user: "<<graylog_user>>",
    pwd: "<<password>>",
    roles: [
      {
        role: "readWrite",
        db: "graylog" // <<graylog_database>>
      }
    ]
  }
);
```
Take careful note of these values, as they’ll be used to configure the graylog.conf connection to the database later in the guide.
networks
While Docker creates a default virtual network on the host, we recommend setting up an isolated network with a meaningful name that can be inspected with the $ docker network ls command.
The same network must be declared on all services and also globally, outside the services section of the YAML file. Refer to the Docker Docs: Compose File v3 Reference - Network Driver when selecting the driver type (overlay for swarm, bridge by default) suitable for your host environment.
```yaml
services:
  mongodb:
    ...
    networks:
      - d1_graylog
  elasticsearch:
    ...
    networks:
      - d1_graylog
  graylog:
    ...
    networks:
      - d1_graylog

networks:
  d1_graylog:
    driver: bridge
```
ports
The default MongoDB port is 27017, which can be referenced under expose or published with host forwarding (0.0.0.0:{{HOST_PORT}} on the host forwarded to port 27017 in the container).
We’ll expose the database port as it’ll allow us to query Mongo directly for services such as a Grafana Dashboard or for Machine Learning use cases on scalable datasets.
Note that the internal Mongo port 27017 takes precedence in graylog.conf because the connection happens over the internal Docker network.
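To make the internal addressing concrete, here’s a hypothetical helper (the function name and defaults are ours, not from the repo) that assembles the MongoDB connection URI graylog.conf expects, using the compose service name mongodb and the internal port 27017 rather than the host-mapped 9052:

```python
from urllib.parse import quote_plus

def mongodb_uri(user, password, host="mongodb", port=27017, db="graylog"):
    """Build a mongodb:// URI; credentials are percent-encoded so special
    characters survive in the connection string."""
    return f"mongodb://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{db}"

print(mongodb_uri("graylog_user", "s3cret/pass"))
# mongodb://graylog_user:s3cret%2Fpass@mongodb:27017/graylog
```

Inside the d1_graylog network, the service name mongodb resolves directly, which is why no host port appears in the URI.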
```yaml
services:
  mongodb:
    ...
    ports:
      - {{HOST_PORT}}:27017
```
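The HOST:CONTAINER shorthand reads left to right: the host port on the left forwards to the container port on the right, with an optional protocol suffix. A small sketch of ours, for illustration only, showing how the notation decomposes:

```python
def parse_port_mapping(mapping):
    """Split a docker-compose 'HOST:CONTAINER[/PROTO]' string into parts."""
    host, _, rest = mapping.partition(":")
    container, _, proto = rest.partition("/")
    return {"host": int(host), "container": int(container),
            "protocol": proto or "tcp"}

print(parse_port_mapping("9052:27017"))
# {'host': 9052, 'container': 27017, 'protocol': 'tcp'}
print(parse_port_mapping("514:514/udp"))
# {'host': 514, 'container': 514, 'protocol': 'udp'}
```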
Be careful exposing this port on public servers that aren’t behind a firewall, as vulnerabilities can lead to exposing potentially sensitive data.
Clients will continue to push logs into Graylog server as long as they’re on the same internal network or connected over corporate VPN.
deploy
Next, to better comply with Docker audit rules, we provision soft resource limits on each of the container services. Generally, the following limits are sufficient to intake logs from up to 50 clients in real time.
Resource limits prevent runaway consumption of system resources, such as during a Distributed Denial of Service (DDoS) attack, from crashing the container host.
```yaml
deploy:
  resources:
    limits:
      cpus: '0.5'
      memory: '1G'
```
The same convention, with modified values, can be used for the graylog and elasticsearch container services.
2. ElasticSearch for Accelerated Log Analytics
ElasticSearch provides hot-cache acceleration of search on large datasets such as unstructured logs, and we’ll deploy it alongside Graylog to improve our analytics workflow.
Here’s how you can define the container in your docker-compose.yaml.
```yaml
# ---------------------------
# Elasticsearch::Cache
# Docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/docker.html
# ---------------------------
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.10
    container_name: graylog.escache
    restart: always
    volumes:
      - ./escache:/usr/share/elasticsearch/data
    networks:
      - d1_graylog
    environment:
      - http.host=0.0.0.0
      - http.port=9200
      - transport.tcp.port=9300
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G
```
If you’re already familiar with Elasticsearch, feel free to skip this section, in which we break down each component of the deployment.
volumes
ElasticSearch requires a local data directory, which can be placed on the same relative volume as MongoDB. For instances with many clients, use a dedicated flash storage medium and link the path with $ ln -s <<source>> ./escache.
```yaml
volumes:
  - ./escache:/usr/share/elasticsearch/data
```
ulimits
To learn more about memlock, refer to the SLES Docs - Memlock, which provides an in-depth breakdown.
environment
Since there are no securables in Elasticsearch, the network is internal only. These are default values already set in the generated graylog.conf and don’t need to be changed; more on that below.
networks
Recall from the MongoDB breakdown that we’re using the same network for this service.
3. Deploying Graylog Web Server using Docker Compose
The Graylog image can be found on Docker Hub with extensible configuration. As of this guide, there are three release cadence branches, but it is strongly recommended to use the latest production variant.
Graylog does require setting properties manually in a number of configuration files. Here’s the source folder on GitHub with the required scripts.
```yaml
# ---------------------------
# Graylog::Production
# Docs: https://hub.docker.com/r/graylog/graylog/
# ---------------------------
  graylog:
    image: graylog/graylog:3.3
    container_name: graylog.server
    restart: always
    volumes:
      - ./gldata/config:/usr/share/graylog/data/config
    networks:
      - d1_graylog
      - default
    depends_on:
      - mongodb
      - elasticsearch
    ports:
      # Host:Container
      # Graylog Web Interface and REST API
      - 9050:9050
      # Syslog TCP
      - 514:514
      # Syslog UDP
      - 514:514/udp
      # GELF TCP
      - 9051:12201
      # GELF UDP
      - 9051:12201/udp
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G
```
container_name
Following the convention we’ve been using in this guide, set a developer-friendly container_name.
restart
Docker audit recommends that a detailed restart policy be limited with a maximum retry count. In case of failure of the database container, an asynchronous approach that awaits the dependency’s restart can be padded with a buffer delay such as 15000ms. However, for always-on log collection servers, an exception can be made to use always for all services.
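As a sketch of what such a bounded policy could look like under the compose v3 deploy key in swarm mode (the values here are illustrative, not from the repo):

```yaml
deploy:
  restart_policy:
    condition: on-failure   # only restart after a failure exit code
    delay: 15s              # buffer delay between restart attempts
    max_attempts: 5         # cap retries instead of looping forever
```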
volumes
Clone the file config.sh to generate a runtime configuration for Graylog. It includes default values where you can populate the MongoDB connection-string securables. After generating it with $ ./config.sh, place the edited graylog.conf in the volume path ./gldata/config, where the container is going to be deployed from.
graylog.conf
Here are the sections to edit in the generated graylog.conf, along with detailed explanations below:
```
password_secret = << 96_char_token >>

root_username = << admin_user >>
root_password_sha2 = << shasum >>
root_email = << consulting@quant.one >>

# EST: New York / Toronto
root_timezone = America/Atikokan
```
Graylog uses a 96-character hash for securable key rotation, which can be generated using rotate_key.sh and stored in password_secret.
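The contents of rotate_key.sh aren’t reproduced here; assuming it emits a random alphanumeric token, an equivalent sketch in Python would be:

```python
import secrets
import string

def generate_password_secret(length=96):
    """Random 96-character token for graylog.conf's password_secret."""
    alphabet = string.ascii_letters + string.digits
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password_secret())
```

The secrets module draws from the OS entropy pool, which is the appropriate source for key material (unlike the random module).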
Set the credentials for the root user that will be used to log in to the web interface. As graylog.conf is unencrypted, generate a SHA-256 checksum of your input password using shasum.sh and store the hash in root_password_sha2.
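The contents of shasum.sh are likewise an assumption on our part; the equivalent of `echo -n <password> | shasum -a 256` in Python is:

```python
import hashlib

def root_password_sha2(password):
    """SHA-256 hex digest of the root password for root_password_sha2."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()

print(root_password_sha2("admin"))
# 8c6976e5b5410415bde908bd4dee15dfb167a9c873fc4bb8a81f6f2ab448a918
```

Note the `-n` in the shell version: hashing a trailing newline by mistake produces a digest Graylog will never match against your typed password.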
The timezone should be localized to the clients feeding inputs and will be rendered in log dashboards. If you’re managing geo-distributed systems, use the local time for the PMO office.
depends_on
To allow the MongoDB and Elasticsearch container services to complete entrypoint initialization, we’ll use the docker-compose asynchronous await model in depends_on. Otherwise, a fault may occur while Graylog attempts to connect to an uninitialized container.
deploy
Modified values have been applied for streaming data and input workers. Refer to the explanation in the MongoDB section above.
docker-compose.yaml
Finally, if you’ve followed along, you should end up with a completed docker-compose.yaml covering the sections mentioned (services, networks, volumes, and compliance) that looks something like this.
MongoDb + ElasticSearch + Graylog Container Stack | Github Source
```yaml
# --------------------------------
# Infra::Docker::Graylog+MongoDB+ElasticSearch
# Docs: https://docs.graylog.org/en/3.3/pages/installation/docker.html
# Github: https://github.com/DigitalTransformation/Log.Analytics.Graylog.Ansible/
# --------------------------------

version: '3.3'

services:
  # ---------------------------
  # MongoDB::Data
  # Docs: https://hub.docker.com/_/mongo/
  # ---------------------------
  mongodb:
    image: mongo:3
    container_name: graylog.mongodb
    restart: always
    volumes:
      - ./mongodb:/data/db
      - ./mongodb/mongo-init.js:/docker-entrypoint-initdb.d/mongo-init.js:ro
    env_file:
      - mongo.env
    networks:
      - d1_graylog
    ports:
      - 9052:27017
    deploy:
      resources:
        limits:
          cpus: '1'
          memory: 1G

  # ---------------------------
  # Elasticsearch::Cache
  # Docs: https://www.elastic.co/guide/en/elasticsearch/reference/6.x/docker.html
  # ---------------------------
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch-oss:6.8.10
    container_name: graylog.escache
    restart: always
    volumes:
      - ./escache:/usr/share/elasticsearch/data
    networks:
      - d1_graylog
    environment:
      - http.host=0.0.0.0
      - http.port=9200
      - transport.tcp.port=9300
      - transport.host=localhost
      - network.host=0.0.0.0
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 2G

  # ---------------------------
  # Graylog::Production
  # Docs: https://hub.docker.com/r/graylog/graylog/
  # ---------------------------
  graylog:
    image: graylog/graylog:3.3
    container_name: graylog.server
    restart: always
    volumes:
      - ./gldata/config:/usr/share/graylog/data/config
    networks:
      - d1_graylog
      - default
    depends_on:
      - mongodb
      - elasticsearch
    ports:
      # Host:Container
      # Graylog Web Interface and REST API
      - 9050:9050
      # Syslog TCP
      - 514:514
      # Syslog UDP
      - 514:514/udp
      # GELF TCP
      - 9051:12201
      # GELF UDP
      - 9051:12201/udp
    deploy:
      resources:
        limits:
          cpus: '2'
          memory: 4G

networks:
  d1_graylog:
    driver: bridge
```
```
graylog-ansible-playbook
└── server
    ├── config.sh
    ├── mongo.env
    ├── mongo-init.js
    ├── docker-compose.yaml
    ├── rotatekey.sh
    └── shasum.sh
```
With the server configuration completed, go ahead and deploy the infrastructure to your Docker host using the docker-compose command.

```shell
$ docker-compose -f docker-compose.yaml up
```
Once the images are pulled and deployed, you should end up with a running Graylog, MongoDb, and ElasticSearch stack.
We strongly recommend using a failover cluster with a reverse proxy or load balancing for scalable event handling in large-volume client situations.
The deployment can be adapted for Docker Swarm multi-node clusters by updating the spec to version: '3.7' and referencing the compose file specification.
4. Configuring Graylog to Accept Syslog Input using the Web Interface
To start accepting new data inputs, open the Graylog web interface running at https://node_ip:graylog_port/ and authenticate using the credentials previously defined in the configuration.
Create a new input to accept 514/tcp and 514/udp by following the guide provided by Graylog.
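To confirm the input is listening, you can fire a single test message at it over UDP. This sketch uses Python’s standard SysLogHandler; GRAYLOG_HOST is a placeholder of ours that you’d replace with your server’s address:

```python
import logging
import logging.handlers

GRAYLOG_HOST = "127.0.0.1"  # assumption: replace with your Graylog node's IP

# UDP syslog handler pointed at the new 514/udp input
handler = logging.handlers.SysLogHandler(address=(GRAYLOG_HOST, 514))
logger = logging.getLogger("graylog-smoke-test")
logger.setLevel(logging.INFO)
logger.addHandler(handler)

# The message should appear in Graylog's search stream within seconds
logger.info("Hello from the syslog input smoke test")
```

Because UDP is fire-and-forget, a silent failure here usually means a firewall rule or a wrong address rather than an exception.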
5. Setting up Clients for Remote Logging with Ansible
This last part configures Unix clients to push system log events into the Graylog server on the 514/tcp and 514/udp ports.
One such Linux facility is rsyslog, which runs as a common systemd service for event handling and can emit to virtually any output when a condition is met.
We’ll configure it on clients using Ansible by creating an operational routine known as a playbook. Here’s an example below, defined in YAML markup, that installs the rsyslog package, copies the config file, and launches it as a startup service.
Replace the placeholder <<USER>> with the username on the client machine. If you’re configuring this on more than one client, check out the docs on Ansible inventories.
```yaml
# --------------------------
# Ansible Playbook
# Configure unix daemon services for rsyslog
# Creates pipeline into Graylog from client endpoints
# systemd: rsyslog.unit
# --------------------------

- hosts: all
  remote_user: <<USER>>
  become: yes

  tasks:
    - name: Configure services for rsyslog
      block:
        - name: Package dependencies
          yum:
            state: present
            name:
              - rsyslog

        - name: Set Default Config Params
          copy:
            src: rsyslog.conf
            dest: /etc/rsyslog.conf
            force: true

        - name: Configure System Service
          systemd:
            name: rsyslog
            enabled: yes
            state: started

# --------------------------
```
rsyslog.conf | Github Source
Change the daemon settings in rsyslog.conf to match the Graylog server as the target. For the purposes of this guide, we’ve set up rsyslog to push all logs with the following rule.
```
# OnMessage: Forward Default Message
*.* action(
  type="omfwd"
  target="<<IP_ADDRESS>>"
  port="514"
  protocol="udp"
  template="RSYSLOG_SyslogProtocol23Format"
)
```
If you’re satisfied with the condition, simply deploy the playbook on clients using the bash command:
```shell
$ ansible-playbook -i target.env \
    -k --ask-become-pass \
    ./configure_rsyslog.yaml
```
Once it’s done running the tasks, you should start to see data streaming into Graylog within seconds of log events occurring on clients.
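On any configured client, you can generate a test event through the local syslog facility to exercise the whole pipeline (Unix-only; Python’s syslog module writes to the system log socket, which rsyslog then forwards):

```python
import syslog

def emit_test_event(message="end-to-end rsyslog to Graylog test event"):
    """Write one INFO-level event to the local syslog socket."""
    syslog.openlog(ident="graylog-pipeline-test", facility=syslog.LOG_USER)
    syslog.syslog(syslog.LOG_INFO, message)
    syslog.closelog()
    return message

print(emit_test_event())
```

Searching Graylog for the ident graylog-pipeline-test should surface the event, confirming the client-to-server path.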
Here’s the completed tree structure for both client and server deployments that’s reusable across your IT infrastructure portfolio.
```
graylog-ansible-playbook
├── client
│   ├── configure_rsyslog.yaml
│   └── target.env
└── server
    ├── config.sh
    ├── mongo.env
    ├── mongo-init.js
    ├── docker-compose.yaml
    ├── rotatekey.sh
    └── shasum.sh
```
For the full code sample, check out the GitHub repository.
Conclusion
Graylog provides a powerful platform for log analytics, setting up monitoring alerts, and visualizing data in dashboards.
Now that you’re collecting data, the quest begins to analyze it! With streaming data pipelines comes great complexity; some of our clients generate millions of log events an hour.
Get in touch to learn how we can deliver tailored insights to optimize your IT budget spend and mitigate cyber risk.
In a follow-up post, we’ll explore using Rundeck and Ansible Playbooks for automated threat incident response. Subscribe to stay notified.