Getting Started With Fluentd For Data Collection

  • aster.cloud
  • December 26, 2022
  • 9 minute read

Fluentd is an open source data collector capable of retrieving and receiving event data from several sources and then filtering, buffering, and routing data to different compatible destinations. It utilizes a plugin system to help you quickly set up specific inputs, apply any required filtering, and send data to your preferred data ingestion platform.

Fluentd supports multiple sources and destinations, and it can be deployed to multiple operating systems, including Windows, Linux, and macOS.


In this article, you’ll learn about Fluentd and how to install it on an Ubuntu Linux system, receive data from a syslog source, and then send that syslog event data to an InfluxDB instance.

Fluentd overview

In addition to supporting multiple sources and destinations, Fluentd has a few benefits that make it stand out from other data collection tools, including the following:

Unified logging layer

Fluentd endeavors to be the middleman for all your data sources and destinations. By treating all logs as JSON, Fluentd lets you apply the same rules to different data sources and make sure that the result is readable by your destination system, whether that system is a MongoDB instance or an OpenSearch cluster.

Fluentd is also horizontally scalable and has been used in organizations to collect data from thousands of machines, so you should be able to safely implement it as the centralized log event routing engine for your organization.

Source and destination agnostic

The unified logging layer is enabled because Fluentd supports receiving data from dozens of different sources. Those same source events can then be filtered, formatted, modified, and sent to one of the dozens of supported destinations.

This is highlighted by the robust plugin system that Fluentd uses to receive, transform, and send data to multiple destination types. These plugins support everything from specific tools like NGINX to general network protocols like MQTT or HTTP.

Low memory footprint

Fluentd has been developed using a combination of the C programming language and Ruby, which in turn enables it to use very little memory (about 40 MB) when it’s installed and run on a system.

The Fluentd project claims that in its default configuration, a Fluentd instance should be able to handle about 13,000 events/sec/core. There is also a lightweight event forwarder version called Fluent Bit, in case even 40 MB is more memory than you can spare.

Strong community

When choosing any tool, you want to know whether it will be supported long term. Fluentd is a graduated CNCF project, with many companies contributing to it and running it in production. The Fluent Bit component of the project recently crossed 3 billion downloads, and growth is accelerating rapidly: daily downloads went from around 3 million at the start of 2022 to around 14 million in October of that year.

All this is simply to show that you can rest easy if you choose to use Fluentd as part of your application’s architecture.

InfluxDB overview

InfluxDB is an open source database purpose-built for storing and querying time series data.

Here are some advantages of using InfluxDB over a more general-purpose database for time series data:

  • Performance optimized for writing large volumes of data and querying ranges of data based on timestamp
  • Improved data compression thanks to column-oriented storage, which lowers storage costs
  • Built-in tools for working with time series data that improve developer productivity
  • Flexible deployment options that can be optimized for edge computing and IoT environments, hybrid architectures, or pure cloud deployments
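
For context, InfluxDB ingests points in a simple line protocol format: measurement, tags, fields, and a timestamp. A hypothetical syslog event stored this way might look like the following (the field names here are illustrative only):

syslog,host=myhost severity="debug",facility="local0",message="Testing" 1672063200000000000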

Implementing Fluentd with InfluxDB

In this tutorial, you’ll be making use of a cloud-based instance of InfluxDB to simplify setting up an InfluxDB database. You could set up an open source instance of InfluxDB on your local desktop or laptop, but that won’t be the focus of this tutorial.

You can view the contents of the configuration files in this GitHub repository.

Create an InfluxDB Cloud instance

To begin, you need to create an InfluxDB Cloud instance. If you don’t already have one, go ahead and sign up for one on InfluxDB’s website:

Once you’ve filled in the details and entered a secure password, you’ll be asked to choose where you want to store your data:

When choosing a provider and region, try to find a combination that is geographically closest to you. Take note of which region and provider you selected, as you will need this information later.

Next, you’ll be asked which plan you want to sign up for. For the purpose of this tutorial, the free plan is sufficient:

Once you’ve selected your plan, you’ll be redirected to your InfluxDB Cloud instance’s dashboard. From here, you’ll be able to monitor your data sources, create dashboards to visualize your data, and more:

Create a bucket

After reviewing your dashboard, it’s time to set up a bucket where you will store your data. A bucket is a logical store, like a database, where you store events of a similar nature. Often, individuals choose to split different application logs into different buckets. On the free tier, you’re allowed a total of two buckets.

Now you need to create a bucket that will hold your syslog data. Begin by selecting Load Data on the left navigation menu:

Then select Buckets > CREATE BUCKET:

Give your bucket a name and set a retention period (the default of thirty days is fine). Select CREATE when you’re done:

Now you should see your newly created bucket listed with the other two built-in system buckets:
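
As an aside, buckets can also be created without the UI. For example, here is a sketch using the influx CLI (assuming you have the CLI installed and authenticated against your Cloud instance):

influx bucket create --name syslog --retention 30d

The rest of this tutorial sticks to the web UI.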

Generate an API token

Next, you need to create an API token so that Fluentd can authenticate and send data to your newly created bucket. Navigate to the Load Data screen again and click on API TOKENS > GENERATE API TOKEN. Then you need to select All Access API Token:

Please note: in a production environment, you should scope your tokens more narrowly, following the principle of least privilege.
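
For instance, a token scoped to write-only access on a single bucket could be created with the influx CLI instead (a sketch; the bucket ID is a placeholder you would look up first):

influx auth create --org your-org-name --write-bucket your-bucket-id

Fluentd only needs write access to the bucket, so a token like this would be a better fit for production.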

Add a description for your token, and then click SAVE:

The final screen will display your token. Copy and paste the token somewhere safe before continuing:

Remember that you don’t have to sign up for the cloud-based instance of InfluxDB. If you’re comfortable installing your own instance on your own machine, go ahead and do that. The rest of the instructions shouldn’t differ too much if you decide to install a local copy of InfluxDB.

Install and configure Fluentd

Using the RubyGems installation method should work on almost any mainstream Linux distribution. RubyGems is Ruby’s package manager and a standard way of distributing Ruby libraries.

The following commands are all entered into the terminal application of your choice. In Ubuntu, for example, most people would use the GNOME Terminal that comes installed with the operating system.

Before continuing, you need to have Ruby installed on your OS. If, for some reason, you don’t have it installed already, you can follow these instructions to install it.

You’ll also need the ruby-dev package available:

sudo apt install ruby ruby-dev

After installing Ruby, use it to install the Fluentd agent with this command:

gem install fluentd --no-doc --user-install

This will install the Fluentd agent as a Ruby gem for your current user.

Depending on your shell, you may need to add the RubyGems directory to your path so that the executables can be run from anywhere:

export PATH="$HOME/.local/share/gem/ruby/3.0.0/bin:$PATH"

Make sure that the Ruby version in the $PATH entry corresponds to the version of Ruby installed on your system. On a recent Ubuntu release, that will be Ruby 3.0.
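
To make the change survive new shell sessions, you can append it to your shell profile (a sketch for bash; adjust the Ruby version in the path to match your system):

echo 'export PATH="$HOME/.local/share/gem/ruby/3.0.0/bin:$PATH"' >> ~/.bashrc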

Now you need to configure the Fluentd agent with a default configuration in your user’s home directory and then test to make sure it’s working:

fluentd --setup ~/fluent
fluentd -c ~/fluent/fluent.conf -vv &
echo '{"json":"message"}' | fluent-cat debug.test

The previous commands set up Fluentd with a clean config inside the ~/fluent directory, run Fluentd in the background using that bare configuration file, and send a test message to the running daemon.

If you get an error, then that means there’s still something wrong with your $PATH variable.

If the configuration goes well, you should see the test message echoed in the console.
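
The output format is the timestamp, the tag, and then the JSON record; a representative example (your timestamp will differ) looks like this:

2022-12-26 10:30:00.000000000 +0000 debug.test: {"json":"message"}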

You may notice that all sorts of messages from the daemon are flooding your console. You can stop it with CTRL-C or by running killall fluentd from another terminal. The daemon will exit gracefully, and you can continue setting everything up.

Now you need to grab the InfluxDB output plug-in. Conveniently, Fluentd has its own RubyGems-like system called fluent-gem for managing and downloading plug-ins. You need v2 of the InfluxDB plug-in since your hosted InfluxDB instance uses the newer version.

Install the plug-in with the following command:

fluent-gem install influxdb-plugin-fluent --user-install

Then configure Fluentd with a clean configuration so that it only does what you need it to do. Edit your .conf file using your text editor of choice. This example uses vi:

vi ~/fluent/fluent.conf

Replace the contents of the file with the following:

<source>
  @type syslog
  port 41514
  source_address_key src_ip
  facility_key facility
  severity_key severity
  <parse>
    message_format auto
  </parse>
  tag influxdb2
</source>

<match influxdb2.*.*>
  @type influxdb2
  url https://westeurope-1.azure.cloud2.influxdata.com
  org 1234abcd4567ef00
  token your-api-token-you-created-earlier
  bucket syslog
  flush_interval 30s
</match>

Here, the <source> section sets up a listener on UDP port 41514. You can use any high-numbered available port in your config.

Then the source_address_key directive adds a field (named src_ip here) containing the IP address of the host that sent the syslog event. Likewise, facility_key and severity_key add fields whose values are retrieved from the syslog message itself.

Next, the <parse> section determines which syslog RFC the message follows and parses it accordingly. Finally, events are given an internal tag of influxdb2 so that you can target them later in the <match> section.

In the <match> section, the header pattern influxdb2.*.* matches all events tagged by the syslog source. (The syslog input appends the facility and severity to the tag, which is why the two wildcard segments are needed.)

@type tells Fluentd which plug-in to use, and the url can vary depending on where you selected to host your instance when you first set up your InfluxDB account. Take a look at the InfluxDB docs to get the correct URL for your instance.

The org is the organization that you set up earlier. However, the value is not the organization’s name: if you look at the URL of your InfluxDB Cloud instance, the sixteen-character code after .../orgs/ is the organization ID that you need to put into the configuration file.

Then the token is the token you created earlier in the InfluxDB Cloud frontend, and the bucket value references the bucket you created in the frontend. Finally, flush_interval 30s tells Fluentd that it should buffer events but send whatever it has in the buffer every thirty seconds.

Now save this file and start Fluentd in the background again, this time with the new configuration file:

fluentd -c ~/fluent/fluent.conf -vv &

Fluentd will set up a listener on the port you specified and make a connection to the InfluxDB Cloud instance.
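
If Fluentd refuses to start, you can check the configuration file for errors without starting any listeners by using Fluentd’s dry-run mode:

fluentd --dry-run -c ~/fluent/fluent.conf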

Configure Syslog to send to Fluentd

There is a very good chance that your Linux distribution already has a syslog daemon installed. Ubuntu ships with rsyslog, a popular choice for a syslog server. If you don’t have a syslog package installed, you may have to look up the installation documentation for your particular distribution.

After you install syslog, you can go ahead and edit the default rsyslog configuration file:

sudo vi /etc/rsyslog.conf

At the bottom of the file, add the following:

# forward everything to fluentd
*.* @127.0.0.1:41514

This sets up rsyslog to send every syslog message that it receives to the port that Fluentd is listening on. (The single @ tells rsyslog to forward over UDP; a double @@ would forward over TCP.)

Once you’ve saved the file, go ahead and restart the rsyslog daemon:

sudo systemctl restart rsyslog
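
To confirm that the daemon came back up cleanly, you can check its status with standard systemd tooling:

sudo systemctl status rsyslog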

Send a test Syslog message

If you’re lucky, you may already see some system-level syslog messages arrive in your bucket.

If not, you can send a test message to see whether everything is working. To do so, enter the following command in your terminal:

logger -d -p local0.debug "Testing Syslog to Influxdb cloud"

Here, -d tells logger to use a datagram socket, and -p local0.debug sets the facility and severity of the test message.

Confirm that your data arrived

To confirm that your data arrived, navigate to your InfluxDB Cloud home page and select Data Explorer on the left-hand navigation menu:

To speed things up, switch from the default Query Builder view to Script Editor:

Inside the Script Editor window, copy and paste the following query:

from(bucket: "syslog")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_measurement"] == "influxdb2.local0.debug")
  |> filter(fn: (r) => r["_field"] == "facility" or r["_field"] == "host" or r["_field"] == "severity")
  |> unique()
  |> yield(name: "unique")

This script is written in Flux, InfluxDB’s querying and scripting language, which can be used for simple queries as well as for building complex data transformation and manipulation scripts. InfluxDB also recently added support for querying data with SQL.

Then toggle the View Raw Data switch on:

Finally, click SUBMIT:

Now, you should see the results of your query:

While this example query might not seem exciting, it does show that you have received the syslog events, and at this point, you’ve successfully set up Fluentd to forward messages to an InfluxDB Cloud instance!
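
If you want to go one step further, a sketch like the following (assuming the syslog message text lands in a field named message) counts how many events arrived over the selected time range:

from(bucket: "syslog")
  |> range(start: v.timeRangeStart, stop: v.timeRangeStop)
  |> filter(fn: (r) => r["_field"] == "message")
  |> count()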

Conclusion

In this tutorial, you successfully registered and configured an InfluxDB Cloud instance, installed and configured Fluentd to listen for syslog messages, configured syslog to forward messages to Fluentd, and confirmed that data is flowing from your system to your InfluxDB Cloud instance.

You can view the contents of the configuration files on this GitHub repository.

Storing your data is just the first step; once it’s inside InfluxDB, there is much more you can do with it to extract value. Take a look at these additional resources to see what you can do next:

  • Analyze your data
  • Make forecasts with your data
  • Visualize your data
  • Create no-code automated workflows

About the author

Thinus Swart is a longtime data geek. More recently, Thinus has been working with Splunk specifically but gets excited about anything related to big data.

Guest post originally published on InfluxData’s blog by Thinus Swart
Source: CNCF Blog

