02 · Computer Vision

Classroom Behavior Detection

Research project designing and training a multi-class classroom behavior detection model using YOLOv8 and PyTorch, with Grad-CAM integration for model explainability.

YOLOv8 · PyTorch · Grad-CAM · YAML · OpenCV · Python
Behavior Classes: 11+
Metric: mAP@50
Training Env: CPU
Type: Computer Vision
Status: Active Research
Year: 2024
Role: Research Assistant
01

System Architecture · 3D View

02

Architecture Diagram

Raw Dataset: Images + Annotations
→ Augmentation: Resize · Flip · Crop
→ YOLOv8 Model: PyTorch Backbone
→ Training Loop: YAML Config · GPU/CPU
→ Evaluation: mAP@50 · Precision
→ Grad-CAM: Explainability
→ Detection Output: Bounding Boxes
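The Flip step in the augmentation stage has to transform the annotations along with the pixels. A minimal sketch, assuming labels use the normalized YOLO format (class, x_center, y_center, width, height); the helper name `hflip_yolo_box` is hypothetical and not taken from the project code:

```python
from typing import Tuple

# (class_id, x_center, y_center, width, height), all coordinates normalized to [0, 1]
Box = Tuple[int, float, float, float, float]


def hflip_yolo_box(box: Box) -> Box:
    """Mirror a normalized YOLO box across the vertical axis of the image."""
    class_id, xc, yc, w, h = box
    # Only the horizontal center moves; size and vertical position are unchanged.
    return (class_id, 1.0 - xc, yc, w, h)


print(hflip_yolo_box((0, 0.25, 0.40, 0.10, 0.20)))
```

Resize leaves normalized boxes untouched, while Crop needs clipping logic that depends on the crop window, so it is left out of this sketch.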
03

Screenshots & Output

terminal
$ yolo detect train data=classroom.yaml model=yolov8n.pt epochs=100
Epoch 1/100: box_loss=3.21 cls_loss=2.18 dfl_loss=1.03
Epoch 25/100: box_loss=1.84 cls_loss=1.12 dfl_loss=0.87
Epoch 50/100: mAP50=0.64 Precision=0.71 Recall=0.68
Epoch 75/100: mAP50=0.81 Precision=0.84 Recall=0.79
Epoch 100/100: mAP50=0.91 Precision=0.89 Recall=0.87
✓ Training complete. Best weights → runs/detect/best.pt
✓ Grad-CAM heatmaps saved → /gradcam/outputs/
Training Output
YOLO training terminal logs
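The same run can be launched from Python instead of the CLI. This is a sketch using the Ultralytics Python API, not the project's actual `train.py`; the helper names are hypothetical, `imgsz` and `batch` are taken from the comment in the YAML card, and `device="cpu"` is an assumption based on the CPU training environment:

```python
def build_train_args() -> dict:
    """Collect the hyperparameters shown in the CLI command and the config comment."""
    return {
        "data": "classroom.yaml",
        "epochs": 100,
        "imgsz": 640,
        "batch": 8,
        "device": "cpu",  # assumption: CPU-only training environment
    }


def run_training(weights: str = "yolov8n.pt"):
    """Start training; requires the ultralytics package to be installed."""
    # Imported lazily so build_train_args() works without ultralytics.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(**build_train_args())


print(build_train_args())
```

Calling `run_training()` starts the run; keeping the arguments in one function makes it easier to compare configurations between experiments.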
Class mAP Scores
mAP@50: 91%
Precision: 89%
Recall: 87%
Writing: 79%
Using Phone: 85%
Per-class detection performance
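The "@50" in mAP@50 means a predicted box counts as a correct detection only when its intersection-over-union (IoU) with a ground-truth box of the same class is at least 0.5. A minimal sketch of that matching test, assuming boxes in corner format (x1, y1, x2, y2); the function names are illustrative:

```python
from typing import Tuple

Corners = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Corners, b: Corners) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Width and height clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Corners, truth: Corners, threshold: float = 0.5) -> bool:
    """True when the overlap is high enough to count at the given IoU threshold."""
    return iou(pred, truth) >= threshold
```

Average precision is then computed per class from the ranked matches, and mAP@50 is the mean over classes; that part is handled by the evaluation tooling and is not reproduced here.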
YAML Config
# classroom.yaml
path: ./dataset
train: images/train
val: images/val
nc: 11
names: ['writing', 'phone', 'sleeping',
 'reading', 'raising_hand', 'talking'...]
# epochs=100 imgsz=640 batch=8
Config YAML
YAML training configuration
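A common slip in a config like this is letting `nc` drift out of sync with the `names` list as classes are added. The check below is a sketch that operates on an already-parsed config dictionary; `check_class_config` is a hypothetical helper, and the three-class example is illustrative rather than the project's real class list:

```python
def check_class_config(cfg: dict) -> list:
    """Return a list of human-readable problems found in a dataset config."""
    problems = []
    names = cfg.get("names", [])
    nc = cfg.get("nc")
    if nc != len(names):
        problems.append(f"nc={nc} but {len(names)} names listed")
    if len(set(names)) != len(names):
        problems.append("duplicate class names")
    for key in ("train", "val"):
        if not cfg.get(key):
            problems.append(f"missing '{key}' split")
    return problems


# Illustrative config with a deliberate mismatch between nc and names.
example = {
    "path": "./dataset",
    "train": "images/train",
    "val": "images/val",
    "nc": 4,
    "names": ["writing", "phone", "sleeping"],
}
print(check_class_config(example))
```

Running a check like this before training catches the mismatch early instead of partway through a long CPU run.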
Project Structure
📁 classroom-behavior-detection/
├─ train.py            # YOLOv8 training
├─ evaluate.py         # mAP evaluation
├─ gradcam.py          # Explainability
├─ dataset/
│ ├─ images/train/     # Training set
│ └─ labels/           # Annotations
├─ classroom.yaml      # Training config
└─ runs/detect/        # Results
Dataset Structure
Directory organization
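With images and labels in separate folders, it is easy for an image to end up without an annotation file. The sketch below lists such images, assuming each label is a `.txt` file sharing the image's stem (the usual YOLO convention); the function name and the set of image suffixes are assumptions:

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumption: formats used in the dataset


def find_unlabeled(images_dir: Path, labels_dir: Path) -> list:
    """Return names of image files that have no same-stem .txt label file."""
    missing = []
    for image in sorted(images_dir.iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue  # skip non-image files
        if not (labels_dir / f"{image.stem}.txt").exists():
            missing.append(image.name)
    return missing
```

It would be called with the training image folder and its matching label folder, for example `find_unlabeled(Path("dataset/images/train"), Path("dataset/labels/train"))`, where the `labels/train` subfolder is itself an assumption about the layout.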
04

What I Built

Designed and trained a multi-class detection model with YOLOv8 and PyTorch, covering 11+ behavior classes.

Structured and validated a custom dataset, with consistent directory organization and annotation quality control.

Configured the YAML training pipeline and tuned hyperparameters (epochs, image size, batch size).

Improved mAP@50 through structured debugging and performance analysis.

Explored Grad-CAM integration for model explainability and responsible AI.

Managed CPU-based training inside virtual environments.

05

Project Insights

Personal Notes & Learnings

Research Context

This is my primary research assistantship project at Lawrence Tech.

Technical Challenges

  • 11+ behavioral classes with significant visual overlap
  • CPU-only training environment requiring careful batch size tuning
  • Annotation quality control across hundreds of images
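Part of annotation quality control can be automated by checking each label line for structural errors. A sketch, assuming YOLO-format labels (`class x_center y_center width height`, normalized to the image size) and the 11 classes declared in the config; `validate_label_line` is a hypothetical helper, not the project's own tooling:

```python
def validate_label_line(line: str, num_classes: int) -> list:
    """Return a list of problems found in one YOLO-format label line."""
    parts = line.split()
    if len(parts) != 5:
        return [f"expected 5 fields, got {len(parts)}"]
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append(f"class id {class_id} outside 0..{num_classes - 1}")
    if any(not 0.0 <= c <= 1.0 for c in coords):
        problems.append("coordinate outside [0, 1]")
    if coords[2] <= 0 or coords[3] <= 0:
        problems.append("non-positive width or height")
    return problems


print(validate_label_line("11 0.5 0.5 0.2 0.3", num_classes=11))
```

Checks like this catch malformed files; whether a box actually surrounds the right behavior still needs a visual review.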

Key Findings

  • Grad-CAM revealed the model focused on posture over facial features
  • Smaller batch sizes with more epochs outperformed the opposite configuration (larger batches, fewer epochs)
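For context on the Grad-CAM finding: the heatmap is built by weighting each feature map of a chosen convolutional layer by the average gradient of the target score with respect to that map, summing, and applying a ReLU. The sketch below shows only that combination step on plain nested lists; how activations and gradients are extracted from YOLOv8 in this project is not described here, so those inputs are assumed to be given:

```python
from typing import List

Map2D = List[List[float]]


def grad_cam(activations: List[Map2D], gradients: List[Map2D]) -> Map2D:
    """Combine K feature maps into one heatmap, weighting each by its mean gradient."""
    height, width = len(activations[0]), len(activations[0][0])
    cam = [[0.0] * width for _ in range(height)]
    for act, grad in zip(activations, gradients):
        # Channel weight: global average of the gradient over all spatial positions.
        weight = sum(sum(row) for row in grad) / (height * width)
        for i in range(height):
            for j in range(width):
                cam[i][j] += weight * act[i][j]
    # ReLU keeps only the regions that increase the target score.
    return [[max(0.0, value) for value in row] for row in cam]


# Toy input: two 2x2 channels, one with positive and one with negative gradients.
acts = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
grads = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heatmap = grad_cam(acts, grads)
```

In a real pipeline the resulting map is upsampled to the input resolution and overlaid on the image, which is how a bias toward posture regions over faces would become visible.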

Next Steps

  • GPU-accelerated training for faster iteration
  • Real-time video stream inference pipeline