
How to Train Your Own Dataset for Classification using PyTorch?

Jetson AGX Xavier | Jetson Nano | Jetson TX2 NX | Jetson Xavier NX

13 August 2021
ENVIRONMENT

Hardware: DSBOARD-NX2

OS: Jetpack 4.5

Camera: MIPI CSI or V4L2


In this blog post, we will explain how to create your own custom dataset, train a classification model on it, and then classify images using the imagenet program.

We will be using the jetson-inference project in this example. If you haven’t downloaded it yet, click here. While building the project, do not forget to install PyTorch as well. If you haven’t installed PyTorch, you can type the following commands.


cd jetson-inference/build 
./install-pytorch.sh


If you use the Docker container, PyTorch is installed automatically. We will use the Docker container in this example, but the same commands also work if you prefer to build the project from source.


You should also verify that torch and torchvision can be imported. Start a Python interpreter by typing python or python3 in the terminal, then execute the following commands.


import torch 
import torchvision
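
To confirm which versions were installed and that CUDA is visible to PyTorch, you can also run a quick optional check:

import torch
import torchvision

print(torch.__version__)          # installed PyTorch version
print(torchvision.__version__)    # installed torchvision version
print(torch.cuda.is_available())  # should print True on a Jetson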

How to Collect Your Own Dataset?


First, go to the jetson-inference directory and run the Docker container. Then, go to the python/training/classification directory. Make sure a camera is connected; we will use it to take pictures for our dataset.

cd jetson-inference 
docker/run.sh
cd python/training/classification


Open the Data Capture Tool (camera-capture) to collect your data with a camera. Use the first command for a V4L2 (USB) camera and the second for a MIPI CSI camera.

camera-capture /dev/video0   # V4L2 (USB) camera
camera-capture csi://0       # MIPI CSI camera
Create your data directory at jetson-inference/python/training/classification/data. In this example we will classify computer devices, so we named the directory devices. Then, create a labels.txt file that lists your class names, one per line.
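
For the devices example, labels.txt simply lists the class names one per line, in alphabetical order (so they match the sorted class folders used during training):

headphones
keyboard
mouse

After capturing, the dataset is expected to end up with a layout along these lines:

data/devices/
├── labels.txt
├── train/
│   ├── headphones/
│   ├── keyboard/
│   └── mouse/
├── val/
└── test/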


In the Data Capture Tool, set the Dataset Path to where the devices directory is and select the labels.txt file as the Class Labels.



Now you can take pictures for the train, validation, and test sets. Make sure you have at least 100 training images taken from different angles and under different lighting conditions. A common 80-10-10 rule applies to the splits: 80% of your dataset should be training images, 10% validation, and 10% test. For example, out of 500 images, use 400 for training, 50 for validation, and 50 for testing.


You can also download a dataset from Google with this GitHub image-download tool.

https://github.com/RiddlerQ/simple_image_download


First, clone the GitHub repository into a simple_image_download directory.


git clone https://github.com/RiddlerQ/simple_image_download 


Then, go into the directory and run the following command to finish the installation.

cd simple_image_download 
pip install simple_image_download


Copy Test1.py into the simple_image_download subdirectory so it is easy to locate.

You can edit the Test1.py script to set the keywords to search for and the number of pictures to download per keyword, as sketched below.
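
Here is a minimal sketch of what Test1.py can look like, assuming simple_image_download exposes a download(keyword, limit) method; the keywords and limit below are placeholders for your own classes:

from simple_image_download import simple_image_download as simp

# Placeholder keywords and count -- change these to your own classes.
keywords = ['keyboard', 'mouse', 'headphones']
limit = 100  # how many pictures to download per keyword

response = simp.simple_image_download()
for keyword in keywords:
    # Downloaded images are saved under simple_images/<keyword>/
    response.download(keyword, limit)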



Then, run the Test1.py script to download the dataset. 


python3 Test1.py


Our project only works with .jpg images, so we must convert any .png and .jpeg files to the .jpg format.


Install the ImageMagick software to convert image formats easily.


sudo apt install imagemagick


Now, convert the images and remove the old files for each class. The keyboard class is shown below; repeat the same commands for mouse and headphones.


cd keyboard
mogrify -format jpg *.jpeg
rm *.jpeg
mogrify -format jpg *.png
rm *.png
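
If you prefer to convert every class in one pass, a rough Python equivalent using Pillow would look like the sketch below (this assumes Pillow is installed and that the images sit under the download path used earlier):

import os
from PIL import Image

root = os.path.expanduser('~/simple_image_download/simple_image_download/simple_images')

for cls in os.listdir(root):
    cls_dir = os.path.join(root, cls)
    if not os.path.isdir(cls_dir):
        continue
    for name in os.listdir(cls_dir):
        if name.lower().endswith(('.png', '.jpeg')):
            path = os.path.join(cls_dir, name)
            # Convert to RGB (drops any alpha channel) and save as .jpg
            Image.open(path).convert('RGB').save(os.path.splitext(path)[0] + '.jpg', 'JPEG')
            os.remove(path)  # remove the original file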


Make sure you have also created a directory for validation images and moved some of the images there.

We must copy the dataset into our training and validation directories. To do this, run the Docker container with the folder where the images were downloaded mounted as a volume.

 

cd jetson-inference
docker/run.sh --volume ~/simple_image_download/simple_image_download/simple_images:/simple_image_download/simple_image_download/simple_images


Go to the directory where the images are stored inside the container. Then, using the following commands, copy the images into the train directory of each class. Do the same for the val directories with the images you set aside for validation.


cd /simple_image_download/simple_image_download/simple_images/

cp keyboard/* /jetson-inference/python/training/classification/data/devices/train/keyboard

cp mouse/* /jetson-inference/python/training/classification/data/devices/train/mouse

cp headphones/* /jetson-inference/python/training/classification/data/devices/train/headphones
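
Alternatively, the copy and the train/val split can be automated with a short script. The sketch below assumes the container paths used above and carves off roughly 10% of each class for validation:

import os, random, shutil

# Paths inside the container -- adjust if your layout differs.
src_root = '/simple_image_download/simple_image_download/simple_images'
dst_root = '/jetson-inference/python/training/classification/data/devices'

for cls in ['keyboard', 'mouse', 'headphones']:
    images = os.listdir(os.path.join(src_root, cls))
    random.shuffle(images)
    n_val = max(1, len(images) // 10)  # roughly 10% for validation
    for i, name in enumerate(images):
        split = 'val' if i < n_val else 'train'
        dst_dir = os.path.join(dst_root, split, cls)
        os.makedirs(dst_dir, exist_ok=True)
        shutil.copy(os.path.join(src_root, cls, name), dst_dir)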

How to Train Your Own Data?


After collecting enough data, the next step is to train a model on our dataset. Use the following command; its arguments set how many images are processed at once (--batch-size), the number of data-loading workers (--workers), and the number of passes over the entire dataset (--epochs).


python3 train.py --model-dir=models/devices --batch-size=4 --workers=1 --epochs=30 data/devices


OR


python3 train.py --model-dir=models/ data/


To test and run the trained PyTorch model with TensorRT, we need to export it to ONNX, a framework-independent model format. Type the following command for this operation; it writes a resnet18.onnx file into the model directory.

python3 onnx_export.py --model-dir=models/devices


OR


python3 onnx_export.py --model-dir=models/

How to Test Your Own Dataset?


To test our model, we need to create test directories to store the classified output images. You can copy the images you previously downloaded into the dataset's test directories.


mkdir data/test_keyboard data/test_headphones data/test_mouse 


Then, run the imagenet program to classify the pictures we added to the test folders.


imagenet --model=models/devices/resnet18.onnx --labels=data/devices/labels.txt --input_blob=input_0 --output_blob=output_0 data/devices/test/mouse data/test_mouse

imagenet --model=models/devices/resnet18.onnx --labels=data/devices/labels.txt --input_blob=input_0 --output_blob=output_0 data/devices/test/keyboard data/test_keyboard

imagenet --model=models/devices/resnet18.onnx --labels=data/devices/labels.txt --input_blob=input_0 --output_blob=output_0 data/devices/test/headphones data/test_headphones

You can also classify the images with a camera.


V4L2 Camera (USB Camera):


imagenet --model=models/devices/resnet18.onnx --labels=data/devices/labels.txt --input_blob=input_0 --output_blob=output_0 /dev/video0


CSI Camera:


imagenet --model=models/devices/resnet18.onnx --labels=data/devices/labels.txt --input_blob=input_0 --output_blob=output_0 csi://0

Thank you for reading our blog post.