Hi, my team just released the gcloud transfer command-line tool, and this tutorial will show you how to use it for a common task: uploading logs to the cloud.

Setup

You’ll need a device running a Linux operating system with at least 8 GB of RAM to continue. If you don’t have one lying around, it’s easy to spin up a Compute Engine virtual machine.

Let’s create some logs to upload. In the real world, Google Cloud’s transfer service is a great tool if you have large amounts of data (terabytes or more). But tutorials don’t usually ask people to create multiple hard drives’ worth of fake data. So let’s do this:

$ mkdir my-logs
$ cd my-logs
$ echo "i am a petabyte" > logs.txt

Perfect. That will fool them.

On to the gcloud CLI. If you haven’t already, install the gcloud CLI. You should be prompted to log into Google during the installation process.

You’re probably wondering how much this tutorial will cost to complete in your Google Cloud project. At the time of writing, transfer jobs cost “$0.0125 per GB transferred to the destination successfully.” Here’s the current price table.
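To put that rate in perspective, here is a rough sketch of the arithmetic in shell. The function name estimate_transfer_cost_usd is made up for this example, and the rate is simply the figure quoted above, which may have changed since; check the price table before relying on it.

```shell
# Hypothetical helper: estimate the transfer charge for a given number of GB.
# RATE_PER_GB is the figure quoted in this tutorial, not a live price.
RATE_PER_GB=0.0125

estimate_transfer_cost_usd() {
  # $1 is the amount of data in GB (decimals allowed).
  # LC_ALL=C keeps the decimal separator a dot regardless of locale.
  LC_ALL=C awk -v gb="$1" -v rate="$RATE_PER_GB" \
    'BEGIN { printf "%.4f\n", gb * rate }'
}

# Example: a thousand gigabytes of logs.
estimate_transfer_cost_usd 1000
```

At that rate, 1,000 GB works out to $12.50, and the single tiny file in this tutorial rounds to nothing.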

Next, you’ll need a Google Cloud Storage bucket to upload to. Object storage also shouldn’t be very expensive, but please save resource names for cleanup at the end of the tutorial. Here’s the price table. You can create a bucket by running:

$ gsutil mb gs://[globally unique bucket ID]
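If bucket creation fails, the name itself may be the problem. The sketch below checks a candidate against a subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit). The function name is_plausible_bucket_name is hypothetical, and the check cannot tell you whether someone else already owns the name.

```shell
# Hypothetical helper: checks only a subset of the bucket naming rules.
# It does not contact Google Cloud, so it says nothing about uniqueness.
is_plausible_bucket_name() {
  # $1 is the candidate bucket ID, without the gs:// prefix.
  printf '%s\n' "$1" |
    LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$'
}

# Example: prints a message only when the name passes the basic checks.
if is_plausible_bucket_name "my-logs-bucket-123"; then
  echo "name looks plausible"
fi
```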

Using gcloud transfer

To begin, let’s grant ourselves the permissions necessary to use all gcloud transfer features:

$ gcloud transfer authorize

Creating transfers from one cloud bucket to another is straightforward with gcloud transfer. Setting up your local file system to handle transfer jobs requires a little more work. Specifically, you need to install an “agent.” An agent is basically a Docker container that runs a program dedicated to copying files.

Before installing any agents, you need an agent pool. When a transfer job assigns work to an agent pool, any agent in that pool might end up copying files. Use agent pools to make sure that only agents with access to the files you want to copy execute a transfer job.

$ gcloud transfer agent-pools create [pool ID]

Now, to install an agent on your system, run:

$ gcloud transfer agents install --pool=[pool ID]

All right, now we can upload our fake logs! Storage Transfer Service works best with absolute paths, so use the “pwd” command to get the path to your current folder—you should be inside the “my-logs” folder from earlier.

We require a “posix://” scheme for uploading from a POSIX file system (Linux & Mac). I know it’s a bit odd, but it’s to leave space open if we support transfer jobs dedicated to other file system types in the future (e.g. “ntfs://”).
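If you would rather not assemble the source URL by hand, a small helper can do it. This is only a sketch: to_posix_url is a name invented for this example, and it assumes the directory already exists on the machine running the agent.

```shell
# Hypothetical helper: turn a directory path into a posix:// source URL.
to_posix_url() {
  # $1 is a relative or absolute directory path.
  # Resolve it to an absolute path; fail if the directory does not exist.
  abs_path=$(cd "$1" && pwd) || return 1
  printf 'posix://%s\n' "$abs_path"
}

# Example: the root directory becomes posix:/// (scheme plus "/").
to_posix_url /
```

Inside the my-logs folder, "$(to_posix_url .)" gives the same value as posix://$(pwd).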

$ gcloud transfer jobs create posix://$(pwd) gs://[bucket ID] --source-agent-pool=[pool ID]

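Great, the above should return your new transfer job’s metadata. To monitor the transfer, run the first command below with the value for the “name” key returned above. Once the transfer finishes, the second command lists the bucket so you can confirm that logs.txt arrived: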

$ gcloud transfer jobs monitor [transfer job ID]
$ gsutil ls gs://[bucket ID]

Automation

Say we wanted to upload logs every midnight from 2022 to 2023. The ability to schedule regular transfers for large amounts of data differentiates gcloud transfer from tools like gcloud storage or gsutil. To do this, we just need to update the schedule properties of our job:

$ gcloud transfer jobs update [transfer job ID] --schedule-repeats-every=24h  \
      --schedule-starts=2022-01-01  \
      --schedule-repeats-until=2023-01-01
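Before sending dates to the service, it can help to sanity-check them locally. The sketch below assumes plain YYYY-MM-DD dates like the ones used above and only checks their shape and order, not whether the day actually exists on the calendar. Both function names are made up for this example.

```shell
# Hypothetical helpers: sanity-check a schedule window before updating a job.
is_iso_date() {
  # Accepts strings shaped like YYYY-MM-DD; does not validate the calendar.
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
}

valid_schedule_window() {
  # $1 is the start date, $2 is the end date.
  is_iso_date "$1" && is_iso_date "$2" || return 1
  # With the dashes removed, the dates compare correctly as integers.
  start_num=$(printf '%s' "$1" | tr -d '-')
  end_num=$(printf '%s' "$2" | tr -d '-')
  [ "$start_num" -lt "$end_num" ]
}

# Example: the window used in this tutorial.
if valid_schedule_window 2022-01-01 2023-01-01; then
  echo "schedule window looks sane"
fi
```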

If you have another machine, and you do not care which one uploads logs, you could install an agent on that machine in the same pool as before.

More realistically, if you want each machine in your fleet to upload logs to a different cloud destination, you can write a script to run once on each device. Just make sure the agent pool and destination arguments are different for each device, or more than one machine may upload to the same location.

You don’t have to go around running this script on multiple computers to complete the tutorial, but for demonstrative purposes:

#!/bin/bash
# First argument $1 is agent pool ID. Ex: "pool1".
# Second argument $2 is the source path. Ex: "posix:///tmp/logs"
# Third argument $3 is the destination path. Ex: "gs://my-bucket/log-dir1"

gcloud transfer agent-pools create "$1"
gcloud transfer agents install --pool="$1"
# Trailing backslashes continue the command so the flags below apply to it.
gcloud transfer jobs create "$2" "$3" \
  --source-agent-pool="$1" \
  --schedule-repeats-every=24h \
  --schedule-starts=2022-01-01 \
  --schedule-repeats-until=2023-01-01
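When a script like this runs unattended across a fleet, a little argument checking goes a long way. The variant below is a sketch, not the official way to do it: setup_log_uploader and the GCLOUD override are names invented for this example. Setting GCLOUD=echo prints the commands instead of running them, which is a cheap way to preview what each device would do.

```shell
# Sketch: a more defensive take on the per-device setup script.
# GCLOUD is a hypothetical override; GCLOUD=echo previews the commands.
GCLOUD=${GCLOUD:-gcloud}

setup_log_uploader() {
  if [ "$#" -ne 3 ]; then
    echo "usage: setup_log_uploader POOL_ID POSIX_SOURCE GS_DESTINATION" >&2
    return 2
  fi
  pool_id=$1
  source_url=$2
  dest_url=$3

  # Catch swapped or malformed arguments before anything is created.
  case $source_url in
    posix://*) ;;
    *) echo "source must start with posix://" >&2; return 2 ;;
  esac
  case $dest_url in
    gs://*) ;;
    *) echo "destination must start with gs://" >&2; return 2 ;;
  esac

  # Stop at the first command that fails.
  "$GCLOUD" transfer agent-pools create "$pool_id" &&
    "$GCLOUD" transfer agents install --pool="$pool_id" &&
    "$GCLOUD" transfer jobs create "$source_url" "$dest_url" \
      --source-agent-pool="$pool_id" \
      --schedule-repeats-every=24h \
      --schedule-starts=2022-01-01 \
      --schedule-repeats-until=2023-01-01
}

# To use this as a standalone script, end the file with:
#   setup_log_uploader "$@"
```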

If you’re interested in more complex scripting, the “jobs create” and “jobs run” commands have a “--no-async” flag you can use to block until a transfer completes.
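As one possible use of that flag, the sketch below runs a job, waits for it to finish, and then lists the bucket. The function name run_and_list and the GCLOUD and GSUTIL overrides are assumptions made for this example; setting both to echo shows the commands without running them.

```shell
# Sketch: run a transfer job, wait for it, then list the destination bucket.
# GCLOUD and GSUTIL are hypothetical overrides for previewing the commands.
GCLOUD=${GCLOUD:-gcloud}
GSUTIL=${GSUTIL:-gsutil}

run_and_list() {
  # $1 is the transfer job ID, $2 is the bucket ID (without gs://).
  # --no-async makes "jobs run" wait until the transfer completes,
  # so the listing only happens after a successful run.
  "$GCLOUD" transfer jobs run "$1" --no-async &&
    "$GSUTIL" ls "gs://$2"
}
```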

Teardown

This is the part where we delete everything to save you monthly costs.

First, let’s delete the transfer job:

$ gcloud transfer jobs delete [transfer job ID]
# If you lost your transfer job ID, you can try to find it by running the below command.
$ gcloud transfer jobs list --expand-table

Next, follow the instructions provided by this command to delete any agents you installed:

$ gcloud transfer agents delete --all

Now, let’s delete the empty agent pool:

$ gcloud transfer agent-pools delete [agent pool ID]
# If you lost your agent pool ID, you can try to find it by running the below command.
$ gcloud transfer agent-pools list

Lastly, let’s delete the Google Cloud Storage bucket and the fake logs on your device:

$ gsutil rm -r gs://[bucket ID]
# If you lost your bucket ID, you can try to find it by running the below command.
$ gsutil ls -b
$ rm logs.txt
$ cd ..
$ rmdir my-logs

Conclusion

Superb—you learned how to build an automated log uploader!

If you’re comparing gcloud transfer to other tools like gsutil, I linked some helpful articles in the “Related” section. TLDR: gcloud transfer is for copying huge amounts of data (even petabytes!) and automating recurring copies. gsutil is better for less than a terabyte of data, and recurring copies have to be manually scripted (e.g. cron job calls gsutil).

If you’re copying files between clouds, we also support Amazon S3 and Azure Storage sources.

Congratulations on adding another tool to your Google toolkit!

By Nicholas Hartunian, Software Engineer
Source Google Cloud
