Part I: Real-Time Object Detection using YOLOv11n
Objective
This lab introduces students to YOLOv11n (You Only Look Once version 11, nano variant), the smallest member of a family of high-performance real-time computer vision models. The primary focus is on deploying these models inside a Docker container on the Jetson Orin Nano platform to perform object detection, instance segmentation, image classification, pose/keypoint estimation, and oriented bounding box (OBB) detection.
Learning Outcomes
- Understand the architecture and deployment pipeline for YOLO-based models.
- Convert YOLOv11n models into TensorRT-optimized engines for faster inference on edge devices.
- Perform various computer vision tasks using YOLOv11n Python modules.
- Handle input/output binding with mounted volumes and webcam feeds inside Docker.
Lab Tasks
- YOLO Docker Setup: Pull and run the YOLOv11n Docker image with the NVIDIA runtime. Mount the local workspace for data persistence and connect the webcam device.
- Workspace Configuration: Create a local directory yolo_workspace and mount it into the container for storing processed output and Python scripts.
- Model Conversion: Use ModelConversion.py to convert the pretrained YOLOv11n models into TensorRT engines for efficient inference (a minimal conversion sketch follows this list).
- Execution of Tasks (a representative script skeleton is sketched after this list):
  - ObjectDetect.py for object detection
  - Segmentation.py for instance segmentation
  - Classification.py for image classification
  - Pose.py for keypoint estimation and oriented bounding box (OBB) detection
- Output Handling: All frames will be saved in appropriately named directories like processed_frames, pose_frames, etc.
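The conversion step itself is small when written against the Ultralytics Python API. The following is a minimal sketch of what ModelConversion.py might contain; the checkpoint name follows Ultralytics' yolo11n naming and the FP16 setting is an assumption, so adapt it to the script actually provided with the lab.

```python
# Minimal sketch of a YOLO11n-to-TensorRT conversion (assumed; the lab's
# ModelConversion.py may differ).
from ultralytics import YOLO

# Load a pretrained nano checkpoint. The -seg, -cls, -pose, and -obb task
# variants (e.g. yolo11n-seg.pt) are converted the same way.
model = YOLO("yolo11n.pt")

# Export a TensorRT engine; FP16 is a common choice on the Jetson Orin Nano
# for a good speed/accuracy trade-off. The export writes yolo11n.engine
# next to the .pt file.
model.export(format="engine", half=True)
```

The resulting .engine file is loaded exactly like a .pt checkpoint, for example YOLO("yolo11n.engine"), as used in the skeleton below.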
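For the task scripts, a representative skeleton for ObjectDetect.py is sketched below, assuming the Ultralytics API and OpenCV for webcam capture. The output directory name, camera index, and frame limit are assumptions chosen to match the lab's conventions; the other scripts differ mainly in which engine they load and where they write frames.

```python
# Hypothetical skeleton for ObjectDetect.py (the other task scripts differ
# mainly in the model file they load, e.g. yolo11n-seg.engine,
# yolo11n-cls.engine, yolo11n-pose.engine, and in the directory they write to).
import os

import cv2
from ultralytics import YOLO

OUTPUT_DIR = "processed_frames"   # assumed output directory from the lab
MAX_FRAMES = 100                  # stop after a fixed number of frames

os.makedirs(OUTPUT_DIR, exist_ok=True)

model = YOLO("yolo11n.engine")    # TensorRT engine produced by ModelConversion.py
cap = cv2.VideoCapture(0)         # webcam passed through to the container

frame_idx = 0
while cap.isOpened() and frame_idx < MAX_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    results = model(frame)            # run inference on the BGR frame
    annotated = results[0].plot()     # draw boxes (or masks/keypoints) on a copy
    cv2.imwrite(os.path.join(OUTPUT_DIR, f"frame_{frame_idx:05d}.jpg"), annotated)
    frame_idx += 1

cap.release()
```

Running the segmentation, classification, pose, or OBB task is then a matter of swapping in the corresponding engine file and output directory.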
Part II: Vision Transformers using NanoOWL
Objective
This lab enables students to explore the power of Vision Transformers (ViTs) for multi-object hierarchical detection using NanoOWL, a lightweight, TensorRT-accelerated adaptation of the OWL-ViT open-vocabulary detector for the Jetson platform. Students will run inference through a browser interface and write prompt-based queries to locate complex scene elements.
Learning Outcomes
- Understand the basics of ViT-based visual reasoning and hierarchical object parsing.
- Utilize prompt-based semantic querying to identify nested and compound visual entities.
- Set up and mount external SSDs to handle large vision models efficiently.
- Deploy and interact with a browser-based NanoOWL ViT model via Jetson Containers.
Lab Tasks
- SSD Setup: Format and mount an external SSD to /mnt/nvme/my_storage for storing model files and data.
- NanoOWL Installation: Clone the jetson-containers repository and install necessary dependencies. Enter the NanoOWL container using jetson-containers run.
- Module Execution: Navigate to the examples/tree_demo directory and run tree_demo.py with a live camera feed.
- Prompt-Based Tasks (a Python sketch of prompt-based querying follows this list):
  - Identify facial parts with prompts like: [a face [an eye, a nose]]
  - Identify body parts: [a person [a t-shirt, trousers]]
  - Detect workstation components: [a desk [a monitor, a keyboard, a mouse]]
- Visualization: Access the model’s predictions in a browser-based UI via a local IP link.
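Beyond the browser demo, NanoOWL can also be queried directly from Python inside the container. The sketch below is adapted from the usage example in the NanoOWL README; treat the class and argument names, engine path, and image path as assumptions to verify against the version installed in your container. tree_demo.py layers hierarchical prompt parsing (e.g. [a face [an eye, a nose]]) and the browser UI on top of this predictor.

```python
# Sketch of a flat (non-hierarchical) NanoOWL query from Python, adapted from
# the NanoOWL README; verify names and paths against your installed version.
import PIL.Image
from nanoowl.owl_predictor import OwlPredictor

predictor = OwlPredictor(
    "google/owlvit-base-patch32",                                  # OWL-ViT backbone
    image_encoder_engine="data/owl_image_encoder_patch32.engine",  # TensorRT image encoder
)

image = PIL.Image.open("test_scene.jpg")   # placeholder test image path

text = ["a face", "an eye", "a nose"]      # labels from the lab's face prompt
text_encodings = predictor.encode_text(text)

output = predictor.predict(
    image=image,
    text=text,
    text_encodings=text_encodings,
    threshold=0.1,
)
print(output)  # detections with labels, scores, and boxes
```

The hierarchical prompts used in tree_demo.py express the same labels as a nested tree, so that child queries (an eye, a nose) are only searched for inside regions matched by the parent (a face).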
Expected Deliverables
- Screenshots and example outputs from each YOLOv11n task.
- Sample prompt structures and visual outputs from NanoOWL.
- Reports on model performance, accuracy, and responsiveness.
- Summary of challenges and suggestions for enhancing model accuracy.