Part I: Real-Time Object Detection using YOLOv11n
Objective
This lab introduces students to YOLOv11n (You Only Look Once version 11, nano variant), the smallest member of a family of high-performance real-time computer vision models. The primary focus is on deploying these models inside a Docker container on the Jetson Orin Nano platform to perform object detection, instance segmentation, image classification, pose/keypoint estimation, and oriented bounding box (OBB) detection.
Learning Outcomes
- Understand the architecture and deployment pipeline for YOLO-based models.
- Convert YOLOv11n models into TensorRT-optimized engines for faster inference on edge devices.
- Perform various computer vision tasks using YOLOv11n Python modules.
- Handle input/output binding with mounted volumes and webcam feeds inside Docker.
Lab Tasks
- YOLO Docker Setup: Pull and run the YOLOv11n Docker image with the NVIDIA runtime. Mount the local workspace for data persistence and connect the webcam device.
- Workspace Configuration: Create a local directory yolo_workspace and mount it into the container for storing processed output and Python scripts.
- Model Conversion: Use ModelConversion.py to convert the pretrained YOLOv11n models into TensorRT engines for efficient inference (a minimal conversion sketch follows this list).
- Execution of Tasks (a representative script skeleton is sketched after this list):
  - ObjectDetect.py for object detection
  - Segmentation.py for instance segmentation
  - Classification.py for image classification
  - Pose.py for keypoint estimation and oriented bounding box (OBB) detection
- Output Handling: All frames will be saved in appropriately named directories like processed_frames, pose_frames, etc.
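The conversion step itself is small when written against the Ultralytics Python API. The following is a minimal sketch of what ModelConversion.py might contain; the checkpoint name follows Ultralytics' yolo11n naming and the FP16 setting is an assumption, so adapt it to the script actually provided with the lab.

```python
# Minimal sketch of a YOLO11n-to-TensorRT conversion (assumed; the lab's
# ModelConversion.py may differ).
from ultralytics import YOLO

# Load a pretrained nano checkpoint. The -seg, -cls, -pose, and -obb task
# variants (e.g. yolo11n-seg.pt) are converted the same way.
model = YOLO("yolo11n.pt")

# Export a TensorRT engine; FP16 is a common choice on the Jetson Orin Nano
# for a good speed/accuracy trade-off. The export writes yolo11n.engine
# next to the .pt file.
model.export(format="engine", half=True)
```

The resulting .engine file is loaded exactly like a .pt checkpoint, for example YOLO("yolo11n.engine"), as used in the skeleton below.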
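For the task scripts, a representative skeleton for ObjectDetect.py is sketched below, assuming the Ultralytics API and OpenCV for webcam capture. The output directory name, camera index, and frame limit are assumptions chosen to match the lab's conventions; the other scripts differ mainly in which engine they load and where they write frames.

```python
# Hypothetical skeleton for ObjectDetect.py (the other task scripts differ
# mainly in the model file they load, e.g. yolo11n-seg.engine,
# yolo11n-cls.engine, yolo11n-pose.engine, and in the directory they write to).
import os

import cv2
from ultralytics import YOLO

OUTPUT_DIR = "processed_frames"   # assumed output directory from the lab
MAX_FRAMES = 100                  # stop after a fixed number of frames

os.makedirs(OUTPUT_DIR, exist_ok=True)

model = YOLO("yolo11n.engine")    # TensorRT engine produced by ModelConversion.py
cap = cv2.VideoCapture(0)         # webcam passed through to the container

frame_idx = 0
while cap.isOpened() and frame_idx < MAX_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)            # run inference on the BGR frame
    annotated = results[0].plot()     # draw boxes (or masks/keypoints) on a copy
    cv2.imwrite(os.path.join(OUTPUT_DIR, f"frame_{frame_idx:05d}.jpg"), annotated)
    frame_idx += 1

cap.release()
```

Running the segmentation, classification, pose, or OBB task is then a matter of swapping in the corresponding engine file and output directory.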
Part II: Vision Transformers using NanoOWL
Objective
This lab enables students to explore the power of Vision Transformers (ViTs) for multi-object hierarchical detection using NanoOWL, a lightweight, TensorRT-accelerated adaptation of the OWL-ViT open-vocabulary detector for the Jetson platform. Students will run inference through a browser interface and write prompt-based queries to locate complex scene elements.
Learning Outcomes
- Understand the basics of ViT-based visual reasoning and hierarchical object parsing.
- Utilize prompt-based semantic querying to identify nested and compound visual entities.
- Set up and mount external SSDs to handle large vision models efficiently.
- Deploy and interact with a browser-based NanoOWL ViT model via Jetson Containers.
Lab Tasks
- SSD Setup: Format and mount an external SSD to /mnt/nvme/my_storage for storing model files and data.
- NanoOWL Installation: Clone the jetson-containers repository and install necessary dependencies. Enter the NanoOWL container using jetson-containers run.
- Module Execution: Navigate to the examples/tree_demo directory and run tree_demo.py with a live camera feed.
- Prompt-Based Tasks (a Python sketch of prompt-based querying follows this list):
  - Identify facial parts with prompts like: [a face [an eye, a nose]]
  - Identify body parts: [a person [a t-shirt, trousers]]
  - Detect workstation components: [a desk [a monitor, a keyboard, a mouse]]
- Visualization: Access the model’s predictions in a browser-based UI via a local IP link.
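Beyond the browser demo, NanoOWL can also be queried directly from Python inside the container. The sketch below is adapted from the usage example in the NanoOWL README; treat the class and argument names, engine path, and image path as assumptions to verify against the version installed in your container. tree_demo.py layers hierarchical prompt parsing (e.g. [a face [an eye, a nose]]) and the browser UI on top of this predictor.

```python
# Sketch of a flat (non-hierarchical) NanoOWL query from Python, adapted from
# the NanoOWL README; verify names and paths against your installed version.
import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",                                  # OWL-ViT backbone
    image_encoder_engine="data/owl_image_encoder_patch32.engine",  # TensorRT image encoder
)

image = PIL.Image.open("test_scene.jpg")   # placeholder test image path

text = ["a face", "an eye", "a nose"]      # labels from the lab's face prompt
text_encodings = predictor.encode_text(text)

output = predictor.predict(
    image=image,
    text=text,
    text_encodings=text_encodings,
    threshold=0.1,
)
print(output)  # detections with labels, scores, and boxes
```

The hierarchical prompts used in tree_demo.py express the same labels as a nested tree, so that child queries (an eye, a nose) are only searched for inside regions matched by the parent (a face).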
Expected Deliverables
- Screenshots and example outputs from each YOLOv11n task.
- Sample prompt structures and visual outputs from NanoOWL.
- Reports on model performance, accuracy, and responsiveness.
- Summary of challenges and suggestions for enhancing model accuracy.