Nvidia SMI

Ship your Nvidia SMI Metrics via Telegraf to your Logit.io Stack

Follow the steps below to send your observability data to Logit.io

Metrics

Configure Telegraf to ship Nvidia SMI metrics to your Logit.io stack over HTTP.

Install Integration

Please click on the Install Integration button to configure your stack for this source.

Install Telegraf

This integration allows you to configure a Telegraf agent to send your metrics to Logit.io.

Choose the installation method for your operating system:

On Windows, paste the command below into PowerShell to download the Telegraf zip file. Once the download completes, press Enter again to extract the zip file into C:\Program Files\InfluxData\telegraf\telegraf-1.34.1.

wget https://dl.influxdata.com/telegraf/releases/telegraf-1.34.1_windows_amd64.zip -UseBasicParsing -OutFile telegraf-1.34.1_windows_amd64.zip 
Expand-Archive .\telegraf-1.34.1_windows_amd64.zip -DestinationPath 'C:\Program Files\InfluxData\telegraf'

Alternatively, in PowerShell 7 use:

# Download the Telegraf ZIP file
Invoke-WebRequest -Uri "https://dl.influxdata.com/telegraf/releases/telegraf-1.34.1_windows_amd64.zip" `
                -OutFile "telegraf-1.34.1_windows_amd64.zip" `
                -UseBasicParsing
 
# Extract the contents of the ZIP file
Expand-Archive -Path ".\telegraf-1.34.1_windows_amd64.zip" `
            -DestinationPath "C:\Program Files\InfluxData\telegraf"
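
Before moving on, it can help to confirm that the archive was extracted where the later steps expect it. The sketch below assumes the default destination path used above; $telegrafDir is an illustrative variable name, not something Telegraf defines.

```powershell
# Assumed install directory, taken from the extraction step above
$telegrafDir = 'C:\Program Files\InfluxData\telegraf\telegraf-1.34.1'

# Returns True if telegraf.exe was extracted where expected
Test-Path (Join-Path $telegrafDir 'telegraf.exe')

# Print the version of the extracted Telegraf binary
& (Join-Path $telegrafDir 'telegraf.exe') --version
```

If Test-Path returns False, re-run the extraction step or adjust the path to match where the archive was unpacked.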

Configure Telegraf

The configuration below is pre-configured to collect Nvidia SMI and system metrics from your host. Add the following code to the telegraf.conf configuration file from the previous step.

# Pulls statistics from nvidia GPUs attached to the host
[[inputs.nvidia_smi]]
  ## Optional: path to the nvidia-smi binary, defaults to "/usr/bin/nvidia-smi".
  ## Telegraf first looks for the binary at the explicitly specified (or default) path.
  ## If it is not found there, Telegraf searches PATH (exec.LookPath); if it is still
  ## not found, an error is returned.
  ## On Windows the binary is typically at
  ## "C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe" or "C:\Windows\System32\nvidia-smi.exe".
  # bin_path = "/usr/bin/nvidia-smi"
 
  ## Optional: timeout for GPU polling
  # timeout = "5s"
 
### System metrics
[[inputs.disk]]
[[inputs.net]]
[[inputs.mem]]
[[inputs.system]]
[[inputs.cpu]]
  percpu = false
  totalcpu = true
  collect_cpu_time = true
  report_active = true
 
### Output
[[outputs.http]]
  url = "https://@metricsUsername:@metricsPassword@@metrics_id-vm.logit.io:@vmAgentPort/api/v1/write"
  data_format = "prometheusremotewrite"
 
  [outputs.http.headers]
    Content-Type = "application/x-protobuf"
    Content-Encoding = "snappy"

Read more about how to configure data scraping and configuration options for Nvidia SMI.
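
Before starting the agent, the new input can be checked on its own. The following is a minimal sketch, assuming it is run from the Telegraf install directory and that telegraf.conf sits alongside telegraf.exe. It first checks whether nvidia-smi can be found on PATH, then uses Telegraf's test mode to gather metrics once and print them to the console without sending anything to the configured output.

```powershell
# Show the full path of nvidia-smi if it is discoverable on PATH (prints nothing otherwise)
Get-Command nvidia-smi -ErrorAction SilentlyContinue | Select-Object -ExpandProperty Source

# Gather metrics once from the nvidia_smi input only and print them to the console
.\telegraf.exe --config telegraf.conf --input-filter nvidia_smi --test
```

If the first command prints nothing and the second reports that the binary cannot be found, setting bin_path in the [[inputs.nvidia_smi]] section to the actual location of nvidia-smi.exe is a reasonable next step.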

Start Telegraf

From the location where Telegraf was installed (C:\Program Files\InfluxData\telegraf\telegraf-1.34.1), run the program, providing the configuration file as a parameter:

.\telegraf.exe --config telegraf.conf

Once Telegraf is running you should see output similar to the following, which confirms the inputs, output and basic configuration the application has been started with:

[Screenshot: Powershell Telegraf information]

Launch Grafana to View Your Data

Launch Grafana

How to diagnose no data in Stack

If you don't see data appearing in your stack after following this integration, take a look at the troubleshooting guide for steps to diagnose and resolve the problem or contact our support team and we'll be happy to assist.

Telegraf Nvidia SMI Overview

To monitor and analyze Nvidia SMI metrics across different systems, you need a reliable way to collect them. Telegraf, an open-source server agent designed for collecting and reporting metrics, fits this role well. Its nvidia_smi input plugin runs the nvidia-smi binary to pull statistics from the Nvidia GPUs attached to the host.

Telegraf's other input plugins collect host metrics such as CPU usage, memory utilization, and network traffic, which give context for GPU performance. To store and analyze these metrics, organizations can use Prometheus, an open-source monitoring and alerting toolkit known for its flexible query language (PromQL).

To send Nvidia SMI metrics from Telegraf to Prometheus, configure Telegraf to output metrics in the Prometheus format, then set up Prometheus to scrape them from the Telegraf server. The data can then be explored with Prometheus's query and graphing tools. The configuration in this guide takes a push-based route instead: the outputs.http plugin sends metrics in the prometheusremotewrite data format to your Logit.io stack.

Once the metrics are in Prometheus, further analysis and visualization can be performed using Grafana. Grafana, a leading open-source monitoring and observability platform, is fully compatible with Prometheus. It lets users build interactive dashboards to explore the metrics data, revealing performance trends and potential issues across your Nvidia GPUs.

If you need any further assistance with shipping your metrics to Logit.io, we're here to help you get started. Feel free to contact our support team by sending us a message via live chat and we'll be happy to assist.