ParticleDetection.utils.datasets

Functions and classes for dataset information and manipulation.

Author: Adrian Niemann (adrian.niemann@ovgu.de)

Date: 31.10.2022

DEFAULT_CLASSES = {0: 'blue', 1: 'green', 2: 'orange', 3: 'purple', 4: 'red', 5: 'yellow', 6: 'black', 7: 'lilac', 8: 'brown'}: Class-color correspondences most commonly used by the trained networks.

DEFAULT_COLUMNS = ['x1', 'y1', 'z1', 'x2', 'y2', 'z2', 'x', 'y', 'z', 'l', 'x1_{id1:s}', 'y1_{id1:s}', 'x2_{id1:s}', 'y2_{id1:s}', 'x1_{id2:s}', 'y1_{id2:s}', 'x2_{id2:s}', 'y2_{id2:s}', 'frame', 'seen_{id1:s}', 'seen_{id2:s}', 'color']: Columns of rod position datasets used, e.g. in the RodTracker app.
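The `{id1:s}` and `{id2:s}` entries look like Python format fields, so concrete column names can presumably be derived with `str.format`. The following is a minimal sketch under that assumption. It uses the camera IDs "gp1" and "gp2", which appear as defaults elsewhere in this module, and repeats the list only to stay self-contained.

```python
# Copy of DEFAULT_COLUMNS as documented above, repeated so the sketch runs on its own.
DEFAULT_COLUMNS = [
    "x1", "y1", "z1", "x2", "y2", "z2", "x", "y", "z", "l",
    "x1_{id1:s}", "y1_{id1:s}", "x2_{id1:s}", "y2_{id1:s}",
    "x1_{id2:s}", "y1_{id2:s}", "x2_{id2:s}", "y2_{id2:s}",
    "frame", "seen_{id1:s}", "seen_{id2:s}", "color",
]

# Fill in the camera ID placeholders; entries without placeholders stay unchanged.
columns = [col.format(id1="gp1", id2="gp2") for col in DEFAULT_COLUMNS]

print(columns[10], columns[19])  # x1_gp1 seen_gp1
```

The resulting list can then be passed as the `columns` argument when creating an empty DataFrame for rod data.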

class DataGroup(train: DataSet, val: DataSet)

Bases: object

Collection of a training set and a validation set for training a network.

train: DataSet

val: DataSet

class DataSet(name: str, folder: str, annotation_file: str)

Bases: object

Representation of a dataset for training a network.

annotation: str

folder: str

name: str

class DetectionResult

Bases: dict

Results of detecting particles in an image file.

input_size: List[int]

pred_boxes: Tensor

pred_classes: Tensor

pred_masks: Tensor

scored: Tensor

RNG_SEED = 1: Seed that makes results depending on random number generation reproducible.
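How the seed is consumed inside the package (for example by NumPy or PyTorch) is not stated here. As a generic illustration of why a fixed seed gives reproducible results, the sketch below uses only the standard library `random` module, which is an assumption made for this example.

```python
import random

RNG_SEED = 1  # value documented above

# Two generators seeded identically produce identical pseudo-random sequences.
rng_a = random.Random(RNG_SEED)
rng_b = random.Random(RNG_SEED)

order_a = rng_a.sample(range(10), k=10)
order_b = rng_b.sample(range(10), k=10)

print(order_a == order_b)  # True
```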

add_points(points: Dict[str, ndarray], data: DataFrame, cam_id: str, frame: int)

Updates a DataFrame with new rod endpoint data for one camera and frame.

Parameters:

points (Dict[str, np.ndarray]) – Rod endpoints in the format obtained from rod_endpoints().
data (DataFrame) – DataFrame for the rods to be saved in.
cam_id (str) – ID/name of the camera that produced the image on which the rod endpoints were computed.
frame (int) – Frame number in the dataset.

Returns:

pd.DataFrame – The updated data.

get_dataset_classes(dataset: DataSet) → Set[int]: Retrieve the number and IDs of thing classes in the dataset.

get_dataset_size(dataset: DataSet) → int: Compute the number of annotated images in a dataset (excluding augmentation).

get_files(dataset: DataSet) → List[str]

Retrieve the paths of the files in a dataset that have associated annotations.

Parameters: dataset (DataSet)
Returns: List[str] – List of file paths to images that have annotations associated with them.

get_object_counts(dataset: DataSet) → List[int]: Returns a list of the number of objects in each image in the dataset.

get_pixel_stats(files: List[str]) → Tuple[ndarray, ndarray]

Get the mean and standard deviation of each color channel for a list of image files.

Parameters:

files (List[str]) – List of file paths to images that shall be included in the calculation.

Returns:

means (ndarray) – Mean pixel values of the given dataset for each color channel in BGR order. Shape: (3, 1)
standard-deviations (ndarray) – Standard deviation of pixel values for the given dataset for each color channel in BGR order. Shape: (3, 1)
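The computation described above can be sketched with NumPy as follows. This is not the package's implementation: the helper name `pixel_stats` is hypothetical, it works on in-memory arrays instead of file paths, and it assumes that the statistics are pooled over all pixels of all images (the documentation does not say whether per-image values are averaged instead).

```python
import numpy as np

def pixel_stats(images):
    """Per-channel mean and standard deviation over a list of H x W x 3 arrays."""
    # Flatten every image to (n_pixels, 3) and pool the pixels of all images.
    pixels = np.concatenate(
        [img.reshape(-1, 3) for img in images], axis=0
    ).astype(np.float64)
    means = pixels.mean(axis=0).reshape(3, 1)
    stds = pixels.std(axis=0).reshape(3, 1)
    return means, stds

# Two tiny synthetic "images"; the last axis is assumed to be in BGR order.
img_a = np.zeros((2, 2, 3), dtype=np.uint8)
img_b = np.full((2, 2, 3), 10, dtype=np.uint8)
img_b[..., 2] = 20  # brighter third (red) channel

means, stds = pixel_stats([img_a, img_b])
print(means.shape, stds.shape)  # (3, 1) (3, 1)
```

Such per-channel values are typically used to normalize network inputs, so the channel order has to match the order in which the images are loaded.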

insert_missing_rods(dataset: DataFrame, expected_rods: int, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame

Inserts empty rods into a dataset, depending on how many are expected.

Parameters:

dataset (pd.DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.
expected_rods (int) – The expected number of rods per frame (and color).
cam1_id (str) – Default is "gp1".
cam2_id (str) – Default is "gp2".

Returns:

DataFrame

randomize_endpoints(file: Path, cam_ids: List[str] | None = None) → None

Randomize the order of particle endpoints in a dataset file.

The dataset with randomized endpoints is saved with 'rand_endpoints_' as a prefix to the file’s name.

Parameters:

file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS.
cam_ids (List[str]) – Cam IDs present in the dataset. If None is given, ["gp1", "gp2"] is used.

Returns:

None

randomize_particles(file: Path) → None

Randomizes particle numbers per frame of a given *.csv dataset.

The dataset with randomized particle numbers is saved with 'rand_particles_' as a prefix to the file’s name.

Parameters: file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS, but at minimum with column 'frame'.
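The output naming described for the two randomization functions can be illustrated with `pathlib`. The helper `prefixed_path` and the example file name are hypothetical, and the sketch assumes that the output is written to the same folder as the input, which the documentation does not state explicitly.

```python
from pathlib import Path

def prefixed_path(file: Path, prefix: str) -> Path:
    """Hypothetical helper: prepend `prefix` to the file name, keep the folder."""
    return file.with_name(prefix + file.name)

# Illustrative input path, not a file shipped with the package.
out = prefixed_path(Path("data/rods_df_red.csv"), "rand_particles_")
print(out.as_posix())  # data/rand_particles_rods_df_red.csv
```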

replace_missing_rods(dataset: DataFrame, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame

Fills missing data in 'seen_...' and '[xy][12]_...' columns.

Replaces NaN values in columns of the format 'seen_...' and '[xy][12]_...', see DEFAULT_COLUMNS for more information. NaNs in 'seen_...' are replaced by 0, NaNs in '[xy][12]_...' are replaced by -1.

Parameters:

dataset (DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.
cam1_id (str) – Default is "gp1".
cam2_id (str) – Default is "gp2".

Returns:

DataFrame
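The replacement rule described for replace_missing_rods() can be sketched with pandas `fillna`. The function `fill_missing` below is an illustrative stand-in written for this example, not the library function, and the small input frame is made up.

```python
import numpy as np
import pandas as pd

def fill_missing(df: pd.DataFrame, cam_ids=("gp1", "gp2")) -> pd.DataFrame:
    """Illustrative stand-in: NaN -> 0 in 'seen_*', NaN -> -1 in '[xy][12]_*'."""
    out = df.copy()
    for cam in cam_ids:
        seen_col = f"seen_{cam}"
        # Yields x1_<cam>, y1_<cam>, x2_<cam>, y2_<cam>.
        pos_cols = [f"{axis}{end}_{cam}" for end in "12" for axis in "xy"]
        out[seen_col] = out[seen_col].fillna(0)
        out[pos_cols] = out[pos_cols].fillna(-1)
    return out

# Made-up data: frame 1 is complete, frame 2 lacks all 2D information.
data = {"frame": [1, 2]}
for cam in ("gp1", "gp2"):
    for name in ("x1", "y1", "x2", "y2"):
        data[f"{name}_{cam}"] = [5.0, np.nan]
    data[f"seen_{cam}"] = [1.0, np.nan]

filled = fill_missing(pd.DataFrame(data))
print(filled.loc[1, ["seen_gp1", "x1_gp1"]].tolist())  # [0.0, -1.0]
```

Working on a copy keeps the input frame untouched; whether the library function modifies its argument in place is not documented here.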