ParticleDetection
ParticleDetection is a library for detecting and tracking particles in stereo-camera images. It customizes the training, inference and visualization functionality of the Detectron2 framework to accurately detect rod-like particles. It additionally provides functionality to match and track the detected particles over multiple frames and to reconstruct 3D representations of the particle ensembles (granular gases). The main goal is to enable (semi-)automatic data extraction from microgravity experiments with granular gases, in which many particles float and interact in space. Different shapes can be chosen for these particles, but for now the library focuses on rod-like particles. Support for multiple shapes is planned for later versions.
Model training
For automatic detection of particles, a model must be trained. Here we focus on training an R-CNN network that yields segmentation masks and class predictions.
Training dataset
The training process requires at least two datasets: one for the actual training and one for testing during training. An additional validation dataset is not enforced by this package.
Each dataset consists of image files (*.jpg, *.jpeg, *.png) and a metadata file in JSON format.
The metadata describes the particles in each image that the trained network should be able to detect. Each of these particles therefore needs a class and a polygon defining its extent in the image. The classes must be integers, e.g. class 1 represents thick, red rods.
Example metadata file:
{
    "arbitrary_id0": {
        "filename": "file0.jpg",
        "regions": [
            {
                "shape_attributes": {
                    "name": "polygon",
                    "all_points_x": [0, 1, 2, 3],
                    "all_points_y": [0, 1, 2, 3]
                },
                "region_attributes": {
                    "rod_col": "class_number"
                }
            },
            {"..."}
        ]
    },
    "arbitrary_id1": {"..."},
    "arbitrary_id2": {"..."}
}
See also load_custom_data for more information on what the resulting format is.
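Before starting a long training run it can help to check a metadata file against the rules stated above. The following is only a minimal sketch using the standard library; `validate_metadata` is a hypothetical helper and not part of ParticleDetection, and the example entry is made up for illustration.

```python
import json


def validate_metadata(metadata: dict) -> list:
    """Return a list of human-readable problems found in the metadata."""
    problems = []
    for entry_id, entry in metadata.items():
        if "filename" not in entry:
            problems.append(f"{entry_id}: missing 'filename'")
        for i, region in enumerate(entry.get("regions", [])):
            shape = region.get("shape_attributes", {})
            xs = shape.get("all_points_x", [])
            ys = shape.get("all_points_y", [])
            if shape.get("name") != "polygon":
                problems.append(f"{entry_id}[{i}]: shape is not a polygon")
            # A polygon needs matching x/y lists with at least three points
            if len(xs) != len(ys) or len(xs) < 3:
                problems.append(f"{entry_id}[{i}]: malformed polygon")
            cls = region.get("region_attributes", {}).get("rod_col")
            try:
                int(cls)
            except (TypeError, ValueError):
                problems.append(
                    f"{entry_id}[{i}]: class {cls!r} is not an integer")
    return problems


# Made-up example entry following the structure shown above
example = json.loads("""
{
  "arbitrary_id0": {
    "filename": "file0.jpg",
    "regions": [
      {
        "shape_attributes": {
          "name": "polygon",
          "all_points_x": [0, 1, 2, 3],
          "all_points_y": [0, 1, 2, 3]
        },
        "region_attributes": {"rod_col": "1"}
      }
    ]
  }
}
""")
print(validate_metadata(example))  # prints []
```

An empty list means that no problems were found by these particular checks; it does not guarantee that the file can be loaded by the library.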
Training
The script below shows part of a training procedure used to train a model for rod detection. It starts from a pre-trained network and adapts it to the specific use case. It also shows how a multi-stage training process can be realized, which configuration values might need adjusting, and how to train only certain portions of the model while keeping the others fixed. To learn more about the model settings used here, refer to the Detectron2 documentation.
import os
import numpy as np
from detectron2.config import CfgNode
import detectron2.data.transforms as T
from ParticleDetection.modelling.runners import training
import ParticleDetection.utils.datasets as ds
import ParticleDetection.modelling.datasets as mod_ds
import ParticleDetection.modelling.configs as mod_config
import ParticleDetection.modelling.augmentations as ca
def init_cfg() -> dict:
    """Initialize the configuration for training.

    Returns
    -------
    dict
        Configuration that can be used for training as:
        `training.run_training(**configuration)`
    """
    # Set up known dataset(s) for use with Detectron2 #########################
    data_folder = "./test_dataset"
    metadata_file = "/metadata.json"
    train_data = ds.DataSet("dataset_training", data_folder + "/training",
                            metadata_file)
    val_data = ds.DataSet("dataset_validation", data_folder + "/validation",
                          metadata_file)
    # Register datasets to Detectron2
    classes = ["blue", "green", "orange", "purple", "red", "yellow",
               "black", "lilac", "brown"]
    try:
        mod_ds.register_dataset(train_data, classes=classes)
        mod_ds.register_dataset(val_data, classes=classes)
    except AssertionError as e:
        # Datasets are already registered
        print(e)
    image_no = mod_ds.get_dataset_size(train_data)
    # Set up training configuration ###########################################
    # Load a *.yaml file with static configurations
    cfg = CfgNode(CfgNode.load_yaml_with_base(
        "your_starting_configuration.yaml"))
    cfg.DATASETS.TRAIN = [train_data.name]
    cfg.DATASETS.TEST = [val_data.name]
    # Control the GPU memory load
    cfg.SOLVER.IMS_PER_BATCH = 1
    # No warm-up and constant lr
    cfg.SOLVER.WARMUP_ITERS = 0
    cfg.SOLVER.STEPS = ()
    cfg.SOLVER.CHECKPOINT_PERIOD = 5 * image_no
    # Add computed values to the configuration
    cfg.MODEL.ROI_HEADS.NUM_CLASSES = len(
        ds.get_dataset_classes(train_data))
    cfg.MODEL.BACKBONE.FREEZE_AT = 0
    cfg.TEST.EVAL_PERIOD = int(image_no / cfg.SOLVER.IMS_PER_BATCH)
    counts = ds.get_object_counts(val_data)
    cfg.TEST.DETECTIONS_PER_IMAGE = int(1.5 * np.max(counts))
    # Create a list of image augmentations to use #############################
    augmentations = [
        ca.SomeOf([
            T.RandomFlip(horizontal=True, vertical=False),
            T.RandomFlip(horizontal=False, vertical=True),
            T.RandomRotation([90, 180, 270], expand=False,
                             sample_style="choice"),
            T.RandomRotation([15, 30, 45, 60, 75], expand=False,
                             sample_style="choice"),
            ca.MultiplyAugmentation(mul=(0.85, 1.15)),
            ca.GaussianBlurAugmentation(sigmas=(0.0, 2.0)),
            ca.SharpenAugmentation(alpha=(0.0, 0.5), lightness=(0.8, 1.15))
        ],
            lower=0, upper=3)
    ]
    return {
        "train_set": train_data,
        "val_set": val_data,
        "configuration": cfg,
        "output_dir": "",
        "resume": True,
        "visualize": False,
        "img_augmentations": augmentations,
        "freeze_layers": []
    }
def train_heads(output_base: str, config: dict, weights: str = None) -> dict:
    """Training Step 1

    Trains only the heads of the model, i.e. excludes backbone and box
    predictor from updating during training.

    Parameters
    ----------
    output_base : str
        Path to the main output directory. Will serve as the location for the
        output folder generated by this training step.
    config : dict
        Basic configuration of the model training. Will be adapted in this
        function.
    weights : str, optional
        Path to a *.pkl file containing weights to be used for the trained
        model.

    Returns
    -------
    dict
        Configuration used for training the model in this training step.
    """
    config["freeze_layers"] = ["backbone", "box_predictor"]
    previous_output = config["output_dir"]
    config["output_dir"] = os.path.join(output_base, "heads")
    if weights is None:
        # Continue from the final weights of the previous training step
        config["configuration"].MODEL.WEIGHTS = os.path.join(
            previous_output, "model_final.pth")
    else:
        config["configuration"].MODEL.WEIGHTS = weights
    image_no = mod_ds.get_dataset_size(config["train_set"])
    config["configuration"].SOLVER.MAX_ITER = int(
        mod_config.get_iters(config["configuration"], image_no,
                             desired_epochs=300))
    config["configuration"].SOLVER.BASE_LR = 0.001
    config["configuration"].INPUT.CROP.ENABLED = True
    config["configuration"].INPUT.MIN_SIZE_TRAIN = 512
    training.run_training(**config)
    return config
def train_all_s1(output_base: str, config: dict, weights: str = None) -> dict:
    """Training Step 2

    Trains all layers of the model with the same learning rate as in step 1.

    Parameters
    ----------
    output_base : str
        Path to the main output directory. Will serve as the location for the
        output folder generated by this training step.
    config : dict
        Basic configuration of the model training. Will be adapted in this
        function.
    weights : str, optional
        Path to a *.pkl file containing weights to be used for the trained
        model.

    Returns
    -------
    dict
        Configuration used for training the model in this training step.
    """
    config["freeze_layers"] = []
    previous_output = config["output_dir"]
    config["output_dir"] = os.path.join(output_base, "all_1")
    if weights is None:
        # Continue from the final weights of the previous training step
        config["configuration"].MODEL.WEIGHTS = os.path.join(
            previous_output, "model_final.pth")
    else:
        config["configuration"].MODEL.WEIGHTS = weights
    image_no = mod_ds.get_dataset_size(config["train_set"])
    config["configuration"].SOLVER.MAX_ITER = int(
        mod_config.get_iters(config["configuration"], image_no,
                             desired_epochs=450))
    training.run_training(**config)
    return config
if __name__ == "__main__":
    output = "./example_detector"
    init_weights = "https://dl.fbaipublicfiles.com/detectron2/COCO-InstanceSegmentation/mask_rcnn_R_101_FPN_3x/138205316/model_final_a3ec72.pkl"
    cfg = init_cfg()
    cfg = train_heads(output, cfg, init_weights)
    cfg = train_all_s1(output, cfg)
The init_cfg() function, as the name suggests, initializes the configuration object for the network. For this it loads a *.yaml file with previously prepared configurations, e.g. a default configuration obtained from Detectron2.
Individual values of this configuration are then adjusted further. Additionally, a list of image augmentations to be used during training is generated.
The first training step is then performed in train_heads(). Here the initialized configuration is modified further, e.g. to freeze certain layers of the model for this training step. Furthermore, the model weights are set, here by loading those of a pre-trained network. If some of the new model’s layers differ from those of the pre-trained one, only the matching layers receive the pre-trained weights.
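The idea that only matching layers receive pre-trained weights can be illustrated without any deep-learning framework. The sketch below compares parameter names and shapes only; it is not Detectron2's checkpoint loading code, and the layer names and shapes are hypothetical.

```python
def split_by_compatibility(model_shapes: dict, pretrained_shapes: dict):
    """Sort pre-trained parameters into loadable and skipped ones.

    A parameter counts as matching if the model has a parameter with the
    same name and the same shape.
    """
    loaded, skipped = [], []
    for name, shape in pretrained_shapes.items():
        if model_shapes.get(name) == shape:
            loaded.append(name)
        else:
            skipped.append(name)
    return loaded, skipped


# Hypothetical shapes: the classification layer was resized for a new
# number of classes, so it no longer matches the pre-trained checkpoint.
model_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "box_predictor.cls_score.weight": (10, 1024),
}
pretrained_shapes = {
    "backbone.conv1.weight": (64, 3, 7, 7),
    "box_predictor.cls_score.weight": (81, 1024),
}
loaded, skipped = split_by_compatibility(model_shapes, pretrained_shapes)
print(loaded)   # ['backbone.conv1.weight']
print(skipped)  # ['box_predictor.cls_score.weight']
```

Skipped layers keep their freshly initialized weights and therefore have to be learned during training.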
The last step shown, train_all_s1(), is another training step. It starts from the final weights of train_heads() and this time trains the whole network.
The end result is a model_final.pth file containing the trained weights and a configuration.yaml file containing the model structure. Together they can be used to obtain particle segmentations in new images.
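Both training steps express their length in epochs, while Detectron2's SOLVER.MAX_ITER counts iterations. The relation between the two can be sketched as follows. This is an assumption about the arithmetic, not the implementation of mod_config.get_iters(), which may round differently; `iterations_for_epochs` is a hypothetical helper.

```python
import math


def iterations_for_epochs(image_count: int, ims_per_batch: int,
                          desired_epochs: int) -> int:
    """Number of iterations needed to see every image `desired_epochs` times."""
    # One iteration processes `ims_per_batch` images
    iters_per_epoch = math.ceil(image_count / ims_per_batch)
    return iters_per_epoch * desired_epochs


# Made-up dataset size of 120 images
print(iterations_for_epochs(120, 1, 300))  # 36000
print(iterations_for_epochs(120, 4, 300))  # 9000
```

With IMS_PER_BATCH = 1, as in the script above, one epoch corresponds to one iteration per training image.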
Note
If you are using extensions to the default Detectron2 models, e.g. the PointRend project, it is necessary to import/register those before loading the model (configuration). For the extensions from the Detectron2 projects this can usually be done by importing their module:
from detectron2.projects import point_rend
Visualization of training metrics using TensorBoard
During training, logs are written that allow the training process to be monitored. These logs contain key performance indicators of the current model state and can be visualized with TensorBoard during and after training. Run the following command to visualize the training data with TensorBoard:
tensorboard --logdir "path\to\output\folder(s)"
Exporting of a trained model
It might be necessary to transfer the trained model(s) to systems on which Detectron2 cannot be installed, e.g. Windows computers, or to an environment that should be kept as lean as possible. In these cases the models can be exported to a format that torch can read and use directly.
The RodTracker also uses only the exported version of the models.
from pathlib import Path
import torch
from ParticleDetection.modelling import export
def test_export(version: str):
    # Load the exported model and run it on a single sample image
    model = torch.jit.load(f"./model_{version}.pt")
    sample = Path("./your_dataset/test_image.jpg")
    sample_input = export.get_sample_img(sample)
    with torch.no_grad():
        testing = model.forward(sample_input)
    print(testing)
version = "cpu"
config = Path("./your_model/config.yaml").resolve()
weights = Path("./your_model/model_final.pth").resolve()
sample = Path("./your_dataset/test_image.jpg")
export.export_model(config, weights, sample, version)
test_export(version)
Note
If you are using extensions to the default Detectron2 models, e.g. the PointRend project, it is necessary to import/register those before loading the model (configuration). For the extensions from the Detectron2 projects this can usually be done by importing their module:
from detectron2.projects import point_rend
Particle Detection
The trained model is now used to detect the trained classes of objects/particles as shown in the image below.
Visualized detection result
The model that produced the image was trained with an extended version of the script shown above, whose last step exchanges the standard mask head for a PointRend network for segmentation mask generation.
The detected particles in this image are drawn as a border around their returned segmentation mask, with the border color indicating the object class. Additionally, the confidence score of each detected particle is plotted. Note that the border colors are chosen arbitrarily and do not correspond to the names of the particle classes, i.e. rod colors.
An example script showing how to run detections with an exported model is given below. Please refer to ParticleDetection.modelling.runners.detection for how to run models from their model_final.pth and configuration.yaml files without prior exporting.
The script below assumes a working folder with the following structure, where the image folders were obtained from a stereo-camera setup:
.
├── your_images
│   ├── gp1
│   │   ├── 0001.jpg
│   │   ├── 0002.jpg
│   │   ...
│   │   └── 0321.jpg
│   └── gp2
│       ├── 0001.jpg
│       ├── 0002.jpg
│       ...
│       └── 0321.jpg
├── your_model
│   └── model_cuda.pt
└── your_output
    └── ...
from pathlib import Path
import torch
from ParticleDetection.utils import detection
import ParticleDetection.utils.datasets as ds
# Don't remove the following import, see GitHub issue as reference
# https://github.com/pytorch/pytorch/issues/48932#issuecomment-803957396
import cv2
import torchvision
import ParticleDetection
# Setup
cam1 = 1
cam2 = 2
frames = list(range(1, 321))
model_path = Path("./your_model/model_cuda.pt").resolve()
data_path = Path("./your_images").resolve()
out_path = Path("./your_output").resolve()
classes = ds.DEFAULT_CLASSES
# Detection
model = torch.jit.load(str(model_path))
dataset_format = str(data_path / "{cam_id:s}/{frame:04d}.jpg")
out_path.mkdir(parents=True, exist_ok=True)
detection.run_detection(model, dataset_format, classes, out_path,
frames=frames, cam1_name=f"gp{cam1}",
cam2_name=f"gp{cam2}", threshold=0.7)
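The dataset_format string in the script above is an ordinary Python format string with the placeholders cam_id and frame. The sketch below only demonstrates how such a string expands to concrete file names; how run_detection() fills it in internally is an assumption, and the base path is made up.

```python
from pathlib import PurePosixPath

# Made-up base folder, POSIX-style for a reproducible result
data_path = PurePosixPath("/data/your_images")
dataset_format = str(data_path / "{cam_id:s}/{frame:04d}.jpg")

# Frame numbers are zero-padded to four digits by the `04d` format spec
first = dataset_format.format(cam_id="gp1", frame=1)
last = dataset_format.format(cam_id="gp2", frame=321)
print(first)  # /data/your_images/gp1/0001.jpg
print(last)   # /data/your_images/gp2/0321.jpg
```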
Note
The output might contain particles from classes that are not actually present. Select only the classes that are known to be present in the images to avoid problems.
Not all particles might be detected by the network. Make sure that ‘dummy’ particles are inserted in place of missing ones to avoid problems in the tracking step. See ParticleDetection.utils.helper_funcs.rod_endpoints() on how to define the expected number of particles per frame.
This script yields multiple *.csv files in the your_output directory. Each detected particle class is saved to a rods_df_{classname}.csv file, e.g. rods_df_red.csv, and all particles together are saved to rods_df.csv. From each detected segmentation mask, two endpoints are generated that represent the rod from now on.
Below you can see the structure of these files:
idx | x1 | y1 | z1 | x2 | y2 | z2 | x | y | z | l | x1_cam1 | y1_cam1 | x2_cam1 | y2_cam1 | x1_cam2 | y1_cam2 | x2_cam2 | y2_cam2 | seen_cam1 | seen_cam2 | particle | frame | (color)
Only 2D data is extracted here, so the columns reserved for 3D data, which are generated by the steps described in the next section, are set to NaN, i.e. are empty.
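As mentioned above, each rod is reduced from its segmentation mask to two endpoints. One way such a reduction could work is to project the mask pixels onto their principal axis and take the two extremes. The sketch below illustrates this idea with the standard library only; it is not the method implemented in ParticleDetection, and `endpoints_from_pixels` is a hypothetical helper.

```python
import math


def endpoints_from_pixels(pixels):
    """Estimate two endpoints of an elongated blob from its (x, y) pixels."""
    n = len(pixels)
    mean_x = sum(x for x, _ in pixels) / n
    mean_y = sum(y for _, y in pixels) / n
    # Second central moments of the pixel cloud
    sxx = sum((x - mean_x) ** 2 for x, _ in pixels) / n
    syy = sum((y - mean_y) ** 2 for _, y in pixels) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pixels) / n
    # Orientation of the principal (longest) axis
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    dx, dy = math.cos(theta), math.sin(theta)
    # Project every pixel onto the axis and keep the two extremes
    proj = [(x - mean_x) * dx + (y - mean_y) * dy for x, y in pixels]
    lo, hi = min(proj), max(proj)
    return ((mean_x + lo * dx, mean_y + lo * dy),
            (mean_x + hi * dx, mean_y + hi * dy))


# Made-up mask: a horizontal rod, 11 pixels long and 2 pixels thick
rod = [(x, y) for x in range(11) for y in (0, 1)]
p1, p2 = endpoints_from_pixels(rod)
print(p1, p2)
```

For this horizontal test rod the endpoints lie on the center line of the rod, at its left and right end.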
Note
These files can be used with the RodTracker.
3D-Reconstruction
The reconstruction of 3D coordinates works by associating particles detected in the first camera with those in the second camera. For that, each detected particle is given an ID (a number) and has two endpoints on each camera in each frame. The tracking functions then reassign the IDs such that a combination of particles on cameras one and two is found that minimizes the reprojection error of their calculated 3D coordinates.
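The matching described above is an assignment problem: every rod seen by camera one is paired with exactly one rod seen by camera two such that the summed reprojection error is minimal. The sketch below solves a tiny instance by brute force. The cost values are made up, `best_assignment` is a hypothetical helper, and the library may well use a more efficient solver, since trying all permutations scales factorially with the number of rods.

```python
from itertools import permutations


def best_assignment(cost):
    """Find the pairing with minimal total cost.

    cost[i][j] is the reprojection error obtained when rod i of camera one
    is paired with rod j of camera two.
    """
    n = len(cost)
    best_perm, best_total = None, float("inf")
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_perm, best_total = perm, total
    return list(best_perm), best_total


# Made-up reprojection errors for three rods per camera
cost = [
    [0.2, 5.0, 4.1],
    [3.9, 4.4, 0.3],
    [4.8, 0.1, 3.7],
]
pairing, total = best_assignment(cost)
print(pairing)  # [0, 2, 1]
```

Here rod 0 of camera one keeps its ID, while rods 1 and 2 of camera two swap IDs to match their counterparts.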
Camera Calibration
For the reconstruction of 3D points, the correspondence between points in the first and second camera’s images must be known. Please refer to the OpenCV documentation for more information. An example stereo calibration script is shown below.
import json
from ParticleDetection.reconstruct_3D.calibrate_cameras import stereo_calibrate
cam1 = "./datasets/calibration_imgs/camera1"
cam2 = "./datasets/calibration_imgs/camera2"
results = stereo_calibrate(cam1, cam2)
# Collect the calibration results in a JSON-serializable dictionary
to_json = {
    "CM1": results[1].tolist(),
    "dist1": results[2].tolist(),
    "CM2": results[3].tolist(),
    "dist2": results[4].tolist(),
    "R": results[5].tolist(),
    "T": results[6].tolist(),
    "E": results[7].tolist(),
    "F": results[8].tolist(),
}
with open("calibration_cam12.json", "w") as f:
    json.dump(to_json, f, indent=2)
World vs. Camera coordinates
After 3D reconstruction, it is usually useful to transform the particle positions from the first camera’s coordinate system to the world/experiment coordinate system. Usually, this is the coordinate system with its axes parallel to the container walls and its origin at the geometrical center of the experimental box. The transformation must be represented as a rotation followed by a translation. An example script for determining this transformation is shown below.
import numpy as np
from pathlib import Path
from ParticleDetection.utils.helper_funcs import find_world_transform
# Input - path to stereo camera calibration file
calibration_file = Path(
"ParticleDetection/src/ParticleDetection/reconstruct_3D/example_calibration/Matlab/gp12.json"
).resolve()
# Output - path to resulting transformation file
transformation_file = Path(
"ParticleDetection/src/ParticleDetection/reconstruct_3D/example_calibration/Matlab/world_transformation_gp12.json"
).resolve()
# 2D pixel coordinates of box edges on first camera
# [front: left up, left down, right up, right down,
# back: left up, left down, right up, right down]
edges_cam1_dist = np.array(
[
[27, 36],
[30, 904],
[1235, 27],
[1240, 903],
[183, 149],
[188, 900],
[1096, 140],
[1098, 790],
]
).astype(float)
# 2D pixel coordinates of box edges on second camera
# [front: left up, left down, right up, right down,
# back: left up, left down, right up, right down]
edges_cam2_dist = np.array(
[
[26, 923],
[149, 834],
[1243, 916],
[1118, 833],
[30, 63],
[149, 146],
[1245, 57],
[1120, 144],
]
).astype(float)
# Corresponding 3D world coordinates of the box edges
# ([0,0,0] is the center of the box)
edges_3D = np.array(
[
[-58, 40, 40],
[-58, -40, 40],
[58, 40, 40],
[58, -40, 40],
[-58, 40, -40],
[-58, -40, -40],
[58, 40, -40],
[58, -40, -40],
]
).astype(float)
if __name__ == "__main__":
    rot_comb, trans_vec = find_world_transform(
        str(calibration_file),
        edges_cam1_dist,
        edges_cam2_dist,
        edges_3D,
        str(transformation_file),
    )
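A rotation followed by a translation maps a point p from camera coordinates to world coordinates as R·p + t. The sketch below applies such a transformation with plain Python lists. It assumes the rotation is available as a 3x3 matrix and the translation as a 3-vector; the actual types returned by find_world_transform() are not specified here, and the numbers are made up.

```python
def to_world(point_cam, rotation, translation):
    """Apply a rotation (3x3 matrix) followed by a translation (3-vector)."""
    rotated = [
        sum(rotation[r][c] * point_cam[c] for c in range(3))
        for r in range(3)
    ]
    return [rotated[r] + translation[r] for r in range(3)]


# Made-up transformation: 90 degree rotation about the z-axis, then a shift
rotation = [
    [0.0, -1.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0],
]
translation = [10.0, 0.0, -5.0]
print(to_world([1.0, 2.0, 3.0], rotation, translation))  # [8.0, 1.0, -2.0]
```

The order matters: translating first and rotating afterwards would generally yield a different point.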
Tracking
With the calibration data from above it is now possible to reconstruct the 3D positions of the detected particles. Additionally, the function used in the script below tracks the objects over the given frames, reassigning particle IDs where necessary.
The function here requires the output *.csv files from the Detection example script.
from pathlib import Path
import numpy as np
from ParticleDetection.reconstruct_3D import matchND
output_main_folder = Path("./out").resolve()
calibration_file = "./calibration_data/calibration_cam12.json"
transformation_file = "./calibration_data/transformation_cam12.json"
colors = ["blue", "brown", "green", "red", "yellow"]
frame_numbers = np.arange(1, 321)
base_folder = str(Path(".").resolve())
out_folder = str(Path("./out").resolve())
errs, lens = matchND.assign(base_folder, out_folder, colors, "cam1", "cam2",
frame_numbers, calibration_file,
transformation_file)
The output consists of *.csv files similar to those produced by the detection. The only difference is that the columns with 3D coordinates are now filled:
idx | x1 | y1 | z1 | x2 | y2 | z2 | x | y | z | l | x1_cam1 | y1_cam1 | x2_cam1 | y2_cam1 | x1_cam2 | y1_cam2 | x2_cam2 | y2_cam2 | seen_cam1 | seen_cam2 | particle | frame | (color)
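The meaning of the individual columns is not spelled out here. Assuming that x, y, z hold the rod center and l the rod length, both follow directly from the two 3D endpoints (x1, y1, z1) and (x2, y2, z2), as the sketch below shows. This interpretation is an assumption, and `rod_center_and_length` is a hypothetical helper.

```python
import math


def rod_center_and_length(p1, p2):
    """Midpoint and Euclidean distance of two 3D endpoints."""
    center = tuple((a + b) / 2.0 for a, b in zip(p1, p2))
    length = math.dist(p1, p2)
    return center, length


# Made-up endpoints
center, length = rod_center_and_length((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))
print(center)  # (1.5, 2.0, 6.0)
print(length)  # 13.0
```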
Note
These files can be used with the RodTracker.