Jetson Nano
The Jetson Nano is a powerful yet compact AI computing platform designed for edge AI projects. It provides the performance and flexibility needed for real-world AI applications while remaining accessible to hobbyists, students, and developers alike. Below, you'll find essential resources to get started with the Jetson Nano:
Resources
- Jetson Nano Datasheet: Comprehensive technical specifications for the Jetson Nano.
- Jetson Nano Developer Kit: Official page for setup guides, tutorials, and detailed documentation.
- NVIDIA Jetson Projects: Explore a collection of projects built by the Jetson community for inspiration and guidance.
- YOLO Object Detection: Guide to setting up and running YOLO models on the Jetson Nano (covered in the Docker-based setup below).
SSH into Jetson Nano via Ubuntu WSL
- Identify the Jetson Nano's IP Address
  On the Jetson Nano, run:
  hostname -I
  Note the IP address.
- Check Connection from WSL
  Open the Ubuntu WSL terminal on your Windows machine and ping the Jetson Nano to confirm connectivity:
  ping <Jetson-IP>
  Replace <Jetson-IP> with the IP address of the Jetson Nano.
- SSH into the Jetson Nano
  Use the ssh command in your WSL terminal:
  ssh jetson@<Jetson-IP>
  Replace <Jetson-IP> with the IP address of the Jetson Nano. The default password is jetson.
- Save SSH Fingerprint
  During the first connection, you may be prompted to confirm the SSH fingerprint. Type yes and press Enter to save it for future connections.
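To skip the password prompt on later logins, you can optionally set up key-based authentication from WSL. A minimal sketch using standard OpenSSH commands (the key type and comment are just example choices):

ssh-keygen -t ed25519 -C "wsl-to-jetson"   # generate a key pair (accept the default path)
ssh-copy-id jetson@<Jetson-IP>             # install the public key on the Nano (asks for the jetson password once)
ssh jetson@<Jetson-IP>                     # future logins use the key instead of a password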
Image Processing (Hello AI World)
The following notes are based on NVIDIA's introductory Jetson Nano project, Hello AI World.
ImageNet
ImageNet is a large-scale dataset containing millions of labeled images across thousands of categories. It is widely used for training and benchmarking computer vision models and was instrumental in the rise of deep learning, starting with the success of convolutional neural networks (CNNs) like AlexNet in 2012.
Why It Matters to the Jetson Nano:
The Jetson Nano, a low-power AI development board for edge computing, benefits from ImageNet in several ways:
- Pretrained Models: Deploy CNNs like ResNet or MobileNet for tasks such as object detection, image classification, and feature extraction.
- Transfer Learning: Fine-tune pretrained models on custom datasets with accelerated training and inference.
- Benchmarking: Use ImageNet to evaluate the Nano’s performance in handling computer vision workloads.
- Edge AI Applications: Enable real-time object recognition for robotics, IoT devices, and autonomous systems.
Learning Material: Jetson ImageNet Project
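As a quick illustration of deploying an ImageNet-pretrained model on the Nano, here is a minimal sketch using the jetson-inference Python bindings from Hello AI World (the image filename is a placeholder):

from jetson_inference import imageNet
from jetson_utils import loadImage

# load a network pretrained on ImageNet (GoogleNet here), optimized with TensorRT
net = imageNet("googlenet")

# load an image from disk into shared CPU/GPU memory
img = loadImage("my_image.jpg")

# classify the image and look up the human-readable label
class_id, confidence = net.Classify(img)
print(f"{confidence * 100:.2f}% {net.GetClassDesc(class_id)}")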
TensorRT
TensorRT is an NVIDIA library for optimizing and accelerating deep learning inference on GPUs. It supports model optimization (e.g., precision calibration, layer fusion), multiple precisions (FP32, FP16, INT8), and works with models from frameworks like TensorFlow, PyTorch, and ONNX. TensorRT is ideal for high-throughput, low-latency applications in real-time AI tasks such as object detection, NLP, and autonomous systems.
An example project using the GoogleNet image recognition network with TensorRT can be found here.
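To make the optimization step concrete, here is a minimal sketch of building an FP16 engine from an ONNX model with TensorRT's Python API (assuming TensorRT 8 or newer; model.onnx and model.engine are placeholder filenames):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

# parse the ONNX model into a TensorRT network definition
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError("failed to parse ONNX model")

# enable FP16 precision and build a serialized engine
config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)
engine = builder.build_serialized_network(network, config)
if engine is None:
    raise RuntimeError("engine build failed")

with open("model.engine", "wb") as f:
    f.write(engine)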
Sigmoid and Softmax in Classification
Softmax is used for single-class classification, where each instance belongs to exactly one class. It converts logits z_1, ..., z_K into a probability distribution using the formula:

\text{softmax}(z_i) = \frac{e^{z_i}}{\sum_{j=1}^{K} e^{z_j}}
This ensures that probabilities across all classes sum to 1. Softmax is suitable for tasks where classes are mutually exclusive, like predicting whether an image contains a "cat," "dog," or "bird" (but only one of them).
Sigmoid is used for multi-label classification, where each instance can belong to multiple classes. It applies the function:

\sigma(x) = \frac{1}{1 + e^{-x}}
to each output independently, producing probabilities between 0 and 1. For example, in an image containing both "cat" and "dog," sigmoid allows both classes to be true since they are not mutually exclusive. The probabilities for each class do not sum to 1.
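A small numeric sketch of the difference, in plain Python:

import math

logits = [2.0, 1.0, 0.5]  # raw scores for "cat", "dog", "bird"

# softmax: probabilities compete and always sum to 1 (mutually exclusive classes)
exps = [math.exp(z) for z in logits]
softmax = [e / sum(exps) for e in exps]
print(softmax, sum(softmax))   # ~[0.63, 0.23, 0.14], sums to 1.0

# sigmoid: each class is scored independently (multi-label), so the sum is unconstrained
sigmoid = [1.0 / (1.0 + math.exp(-z)) for z in logits]
print(sigmoid, sum(sigmoid))   # ~[0.88, 0.73, 0.62], sums to ~2.23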
Here's an example in this spirit that prints out the top 5 classifications of a JPEG image:
#include <jetson-inference/imageNet.h>
#include <jetson-utils/loadImage.h>

#include <cstdio>
#include <vector>

int main(int argc, char** argv)
{
    // Ensure an image filename is provided as a command-line argument
    if (argc < 2)
    {
        printf("my-recognition: expected image filename as argument\n");
        printf("example usage: ./my-recognition my_image.jpg\n");
        return 0;
    }

    // Retrieve the image filename from the command-line arguments
    const char* imgFilename = argv[1];

    // Variables for the image data pointer and dimensions
    uchar3* imgPtr = NULL;   // shared CPU/GPU pointer to the image
    int imgWidth = 0;        // width of the image (in pixels)
    int imgHeight = 0;       // height of the image (in pixels)

    // Load the image from disk as uchar3 RGB (24 bits per pixel)
    if (!loadImage(imgFilename, &imgPtr, &imgWidth, &imgHeight))
    {
        printf("failed to load image '%s'\n", imgFilename);
        return 0;
    }

    // Load the GoogleNet image recognition network with TensorRT
    imageNet* net = imageNet::Create("googlenet");

    // Ensure the network model loaded properly
    if (!net)
    {
        printf("failed to load image recognition network\n");
        return 0;
    }

    // Store classifications and specify the number of top results (topK)
    imageNet::Classifications classifications;   // vector of (classID, confidence) pairs
    const int topK = 5;                          // modify this for more or fewer top results

    // Classify the image and retrieve the topK results
    if (net->Classify(imgPtr, imgWidth, imgHeight, classifications, topK) < 0)
    {
        printf("failed to classify image\n");
        delete net;
        return 0;
    }

    // Print out the classification results
    printf("Classification results (top %d):\n", topK);

    for (size_t n = 0; n < classifications.size(); n++)
    {
        const uint32_t classID = classifications[n].first;
        const char* classLabel = net->GetClassLabel(classID);
        const float confidence = classifications[n].second * 100.0f;

        printf("  %2.5f%% class #%u (%s)\n", confidence, classID, classLabel);
    }

    // Free the network's resources before shutting down
    delete net;
    return 0;
}
Example project: Multi-Label Classification for Image Tagging
Object Detection with DetectNet
DetectNet is a deep learning framework designed for object detection tasks, optimized for NVIDIA Jetson devices. It identifies objects in an image and provides their locations using bounding boxes.
Key Features:
- Input: Processes an image through a pre-trained neural network (e.g., SSD or Faster R-CNN).
- Output: Bounding boxes, object classes, and confidence scores
- Applications: Pedestrian detection, vehicle detection, and custom object detection (after fine-tuning)
DetectNet utilizes TensorRT for high-speed inference and supports training on custom datasets to detect specific object classes.
Reference for images
Reference for video
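For a minimal programmatic sketch, the jetson-inference Python bindings expose detectNet as well (assuming the ssd-mobilenet-v2 model; street.jpg is a placeholder input):

from jetson_inference import detectNet
from jetson_utils import loadImage

# load an SSD-MobileNet-v2 detection model, keeping detections above 50% confidence
net = detectNet("ssd-mobilenet-v2", threshold=0.5)

img = loadImage("street.jpg")

# run inference; each detection carries a class, confidence, and bounding box
detections = net.Detect(img)
for d in detections:
    print(f"{net.GetClassDesc(d.ClassID)}  {d.Confidence * 100:.1f}%  "
          f"box=({d.Left:.0f}, {d.Top:.0f}, {d.Right:.0f}, {d.Bottom:.0f})")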
Semantic Segmentation
Semantic segmentation classifies every pixel in an image into categories (e.g., road, car, sky), offering detailed scene understanding. Unlike object detection, which provides bounding boxes, semantic segmentation delivers pixel-level precision. Two prominent architectures are Fully Convolutional Networks (FCNs) and SegNet.
Fully Convolutional Networks (FCNs)
- Architecture: Replace fully connected layers with convolutional layers to preserve spatial relationships.
- Upsampling: Use transposed convolutions (deconvolution) to restore spatial resolution.
- Skip Connections: Combine high-level features with low-level details for better accuracy.
- Use Case: High-accuracy, detailed segmentation tasks, though computationally intensive.
SegNet
- Encoder-Decoder Structure:
  - Encoder: Extracts features with convolution and pooling layers, reducing spatial resolution.
  - Decoder: Upsamples using saved pooling indices from the encoder, maintaining spatial accuracy while reducing memory and computation.
- Optimized for Efficiency: Designed for lightweight, real-time applications.
- Use Case: Real-time tasks on resource-constrained devices, such as the Jetson Nano.
Applications
- Urban Environments (Cityscapes): Segment urban scenes to identify roads, buildings, pedestrians, and vehicles for autonomous driving and smart city systems. Dataset: Cityscapes
- Off-Road Navigation (DeepScene): Classify forest trails, vegetation, and off-road elements for robotic path-following in outdoor environments. Dataset: DeepScene
- Multi-Human Parsing (MHP): Segment people into detailed body parts like arms, legs, and clothing for pose estimation or fashion analytics. Dataset: Multi-Human Parsing
- Object Segmentation (Pascal VOC): Classify and segment various objects, including people, animals, and vehicles, for general-purpose object recognition. Dataset: Pascal VOC
- Indoor Scene Understanding (SUN RGB-D): Recognize furniture, appliances, and spaces in office and home environments for robotics and augmented reality. Dataset: SUN RGB-D
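As a sketch of how this looks in code, jetson-inference ships Python bindings for segmentation models (the fcn-resnet18-cityscapes model name and street.jpg input are assumptions; check the project's docs for the exact model identifiers, which may include a resolution suffix):

from jetson_inference import segNet
from jetson_utils import loadImage, cudaAllocMapped, saveImage

# load an FCN-ResNet18 model trained on Cityscapes
net = segNet("fcn-resnet18-cityscapes")

img = loadImage("street.jpg")

# allocate an output buffer and run pixel-wise classification
overlay = cudaAllocMapped(width=img.width, height=img.height, format=img.format)
net.Process(img)
net.Overlay(overlay)  # blend the class-color mask onto the image

saveImage("overlay.jpg", overlay)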
Setting Up YOLO on Jetson Nano with Docker
Install the JetPack Docker Image
To get started, visit the Ultralytics NVIDIA Jetson Guide and follow the instructions to install the Ultralytics YOLO11 JetPack 4 Docker environment.
Running the Docker Environment
Once the container is downloaded, run it with:
sudo docker run -it --ipc=host --runtime=nvidia ultralytics/ultralytics:latest-jetson-jetpack4
After running this command, you should see a prompt similar to:
root@<container_id>:/ultralytics#
Basic Docker Commands
Check Running Containers
sudo docker ps
Check All Containers (Including Stopped Ones)
sudo docker ps -a
Restart a Stopped Container
sudo docker start <CONTAINER_ID>
Enter a Running Container
sudo docker exec -it <CONTAINER_ID> bash
Exit the Container
exit
Copy Files From a Stopped Container to Your Jetson Nano
sudo docker cp <CONTAINER_ID>:/ultralytics/runs/detect/predict ~/yolo_results
Running YOLO Inside the Docker Container
Verify YOLO Installation
python3 -c "import ultralytics; print(ultralytics.__version__)"
Run a YOLO Detection on a Sample Image
yolo detect predict model=yolo11n.pt source=https://ultralytics.com/images/zidane.jpg
Results saved to /ultralytics/runs/detect/predict
Check GPU Availability
python3 -c "import torch; print(torch.cuda.is_available())"
If the output is True, YOLO is using CUDA.
Run YOLO on a Custom Image
Make sure the image is accessible inside the container and run:
yolo detect predict model=yolo11n.pt source=/path/to/your/image.jpg
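The same detection can also be run from Python inside the container. A minimal sketch using the Ultralytics Python API (the image path is a placeholder):

from ultralytics import YOLO

# load the same pretrained nano model used in the CLI examples
model = YOLO("yolo11n.pt")

# run inference and save the annotated image (same runs/detect/predict output layout)
results = model.predict(source="/path/to/your/image.jpg", save=True)

# print the detected class names and confidences
for r in results:
    for box in r.boxes:
        print(model.names[int(box.cls)], float(box.conf))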
Accessing Results
YOLO saves detected images inside /ultralytics/runs/detect/predict. To access these files:
- Re-enter the container:
sudo docker exec -it <CONTAINER_ID> bash
- Navigate to the results folder:
cd /ultralytics/runs/detect/predict
ls
- Copy results to your Jetson Nano:
sudo docker cp <CONTAINER_ID>:/ultralytics/runs/detect/predict ~/yolo_results
Now you can access them in ~/yolo_results.
Training YOLO (untested)
If you want to train YOLO on your own dataset, use:
yolo train model=yolo11n.pt data=your_dataset.yaml epochs=50 imgsz=640
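The equivalent call through the Ultralytics Python API (your_dataset.yaml is a placeholder for your own dataset description file):

from ultralytics import YOLO

# fine-tune the pretrained model on a custom dataset described by a YAML file
model = YOLO("yolo11n.pt")
model.train(data="your_dataset.yaml", epochs=50, imgsz=640)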
Cleaning Up Docker Containers
If you have too many stopped containers, you can remove them with:
sudo docker container prune