ParticleDetection.utils.datasets

Functions and classes for dataset information and manipulation.

Author: Adrian Niemann (adrian.niemann@ovgu.de)

Date: 31.10.2022

DEFAULT_CLASSES = {0: 'blue', 1: 'green', 2: 'orange', 3: 'purple', 4: 'red', 5: 'yellow', 6: 'black', 7: 'lilac', 8: 'brown'}

Class-color correspondences most commonly used by the trained networks.
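
As an illustration of how such a mapping is typically used, the following sketch translates class IDs into color names and builds the reverse lookup. The dictionary is copied from the listing above so the snippet runs on its own; the `predicted` list and the `CLASS_IDS` name are hypothetical and not part of the module.

```python
# Copy of DEFAULT_CLASSES as documented above.
DEFAULT_CLASSES = {0: "blue", 1: "green", 2: "orange", 3: "purple",
                   4: "red", 5: "yellow", 6: "black", 7: "lilac",
                   8: "brown"}

# Reverse lookup (color name -> class ID); hypothetical helper name.
CLASS_IDS = {color: class_id for class_id, color in DEFAULT_CLASSES.items()}

# Hypothetical class IDs, e.g. as they might come out of a detection.
predicted = [4, 0, 8]
colors = [DEFAULT_CLASSES[class_id] for class_id in predicted]
print(colors)
```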

DEFAULT_COLUMNS = ['x1', 'y1', 'z1', 'x2', 'y2', 'z2', 'x', 'y', 'z', 'l', 'x1_{id1:s}', 'y1_{id1:s}', 'x2_{id1:s}', 'y2_{id1:s}', 'x1_{id2:s}', 'y1_{id2:s}', 'x2_{id2:s}', 'y2_{id2:s}', 'frame', 'seen_{id1:s}', 'seen_{id2:s}', 'color']

Columns of rod position datasets used, e.g. in the RodTracker app.
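
The entries containing `{id1:s}` and `{id2:s}` are format templates for the two camera IDs. A minimal sketch of how they can be resolved with `str.format`, assuming the camera IDs "gp1" and "gp2" that appear as defaults elsewhere in this module; `resolve_columns` is a hypothetical helper, not a function of the package.

```python
from typing import List

# Copy of DEFAULT_COLUMNS as documented above.
DEFAULT_COLUMNS = [
    "x1", "y1", "z1", "x2", "y2", "z2", "x", "y", "z", "l",
    "x1_{id1:s}", "y1_{id1:s}", "x2_{id1:s}", "y2_{id1:s}",
    "x1_{id2:s}", "y1_{id2:s}", "x2_{id2:s}", "y2_{id2:s}",
    "frame", "seen_{id1:s}", "seen_{id2:s}", "color",
]


def resolve_columns(cam1_id: str = "gp1", cam2_id: str = "gp2") -> List[str]:
    """Fill the camera ID placeholders of every column template."""
    # Columns without placeholders pass through str.format unchanged.
    return [col.format(id1=cam1_id, id2=cam2_id) for col in DEFAULT_COLUMNS]


columns = resolve_columns()
print(columns[10:18])  # the eight camera-specific 2D coordinate columns
```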

class DataGroup(train: DataSet, val: DataSet)

Bases: object

Collection of a training set (train) and a test set (val) for training a network.

train: DataSet
val: DataSet

class DataSet(name: str, folder: str, annotation_file: str)

Bases: object

Representation of a dataset for training a network.

annotation: str
folder: str
name: str

class DetectionResult

Bases: dict

Results of detecting particles in an image file.

input_size: List[int]
pred_boxes: Tensor
pred_classes: Tensor
pred_masks: Tensor
scored: Tensor

RNG_SEED = 1

Seed that makes results depending on random number generation reproducible.
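
The effect of a fixed seed can be sketched as follows. Which random number generators the package actually seeds with `RNG_SEED` is not stated here; this example uses Python's standard `random` module purely for illustration, and `shuffled_ids` is a hypothetical helper.

```python
import random
from typing import List

RNG_SEED = 1  # value documented above


def shuffled_ids(n: int, seed: int = RNG_SEED) -> List[int]:
    """Return the IDs 0..n-1 in a seed-dependent, repeatable order."""
    rng = random.Random(seed)  # private generator, global state untouched
    ids = list(range(n))
    rng.shuffle(ids)
    return ids


first = shuffled_ids(5)
second = shuffled_ids(5)
print(first == second)  # same seed, same order
```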

add_points(points: Dict[str, ndarray], data: DataFrame, cam_id: str, frame: int)

Updates a DataFrame with new rod endpoint data for one camera and frame.

Parameters
  • points (Dict[str, np.ndarray]) – Rod endpoints in the format obtained from rod_endpoints().

  • data (DataFrame) – DataFrame for the rods to be saved in.

  • cam_id (str) – ID/name of the camera that produced the image on which the rod endpoints were computed.

  • frame (int) – Frame number in the dataset.

Returns

pd.DataFrame – The updated data.

get_dataset_classes(dataset: DataSet) → Set[int]

Retrieve the number and IDs of thing classes in the dataset.

get_dataset_size(dataset: DataSet) → int

Compute the number of annotated images in a dataset (excluding augmentation).

get_files(dataset: DataSet) → List[str]

Retrieve the paths of a dataset's image files that have associated annotations.

Parameters

dataset (DataSet) – Dataset to retrieve the annotated image files from.

Returns

List[str] – List of file paths to images that have annotations associated with them.

get_object_counts(dataset: DataSet) → List[int]

Returns a list of the number of objects in each image in the dataset.

get_pixel_stats(files: List[str]) → Tuple[ndarray, ndarray]

Get the mean and standard deviation of each color channel for a list of image files.

Parameters

files (List[str]) – List of file paths to images that shall be included in the calculation.

Returns

  • means (ndarray) – Mean pixel values of the given dataset for each color channel in BGR order. Shape: (3, 1)

  • standard-deviations (ndarray) – Standard deviation of pixel values for the given dataset for each color channel in BGR order. Shape: (3, 1)
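
The computation described above can be sketched with NumPy as follows. This is not the package's implementation: it works on in-memory arrays instead of image files, and it assumes that the statistics are taken over the pooled pixels of all images, which the documentation does not specify. `pixel_stats` is a hypothetical name.

```python
from typing import List, Tuple

import numpy as np


def pixel_stats(images: List[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and standard deviation over all pixels of all images."""
    # Flatten every (H, W, 3) image to (H*W, 3) and pool all pixels.
    pixels = np.concatenate([img.reshape(-1, 3) for img in images])
    pixels = pixels.astype(np.float64)
    means = pixels.mean(axis=0).reshape(3, 1)
    stds = pixels.std(axis=0).reshape(3, 1)
    return means, stds


# Two synthetic 2x2 "images"; the last axis is taken to be in BGR order.
img_a = np.zeros((2, 2, 3), dtype=np.uint8)
img_b = np.full((2, 2, 3), 10, dtype=np.uint8)
img_b[..., 2] = 20  # third channel (R in BGR order)

means, stds = pixel_stats([img_a, img_b])
print(means.shape, stds.shape)
```

The channel order of the result simply follows the channel order of the input arrays, so images loaded in RGB order would yield RGB-ordered statistics.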

insert_missing_rods(dataset: DataFrame, expected_rods: int, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame

Inserts empty rods into a dataset, depending on how many are expected.

Parameters
  • dataset (pd.DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.

  • expected_rods (int) – The expected number of rods per frame (and color).

  • cam1_id (str) – Default is "gp1".

  • cam2_id (str) – Default is "gp2".

Returns

DataFrame

randomize_endpoints(file: Path, cam_ids: Optional[List[str]] = None) → None

Randomize the order of the particle endpoints in a dataset file.

The dataset with randomized endpoints is saved with 'rand_endpoints_' as a prefix to the file’s name.

Parameters
  • file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS.

  • cam_ids (List[str]) – Cam IDs present in the dataset. Default is ["gp1", "gp2"].

Returns

None

randomize_particles(file: Path) → None

Randomizes particle numbers per frame of a given *.csv dataset.

The dataset with randomized particle numbers is saved with 'rand_particles_' as a prefix to the file’s name.

Parameters

file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS, but at minimum with column 'frame'.
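
The output naming used by randomize_particles() and randomize_endpoints() can be sketched with `pathlib`. The sketch assumes that the result is written next to the input file, which the documentation does not state; `prefixed_path` and the example file name are hypothetical.

```python
from pathlib import Path


def prefixed_path(file: Path, prefix: str) -> Path:
    """Return a sibling path whose file name starts with the given prefix."""
    return file.with_name(prefix + file.name)


# Hypothetical input file name.
source = Path("data") / "rods_df_red.csv"
particles_out = prefixed_path(source, "rand_particles_")
endpoints_out = prefixed_path(source, "rand_endpoints_")
print(particles_out.name, endpoints_out.name)
```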

replace_missing_rods(dataset: DataFrame, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame

Fills missing data in 'seen_...' and '[xy][12]_...' columns.

Replaces NaN values in columns of the format 'seen_...' and '[xy][12]_...', see DEFAULT_COLUMNS for more information. NaNs in 'seen_...' are replaced by 0, NaNs in '[xy][12]_...' are replaced by -1.

Parameters
  • dataset (DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.

  • cam1_id (str) – Default is "gp1".

  • cam2_id (str) – Default is "gp2".

Returns

DataFrame
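
The replacement rule described above can be sketched with pandas as follows. This is an illustration of the documented behavior, not the package's code: `fill_missing` is a hypothetical name, and returning a modified copy rather than changing the input in place is an assumption.

```python
import numpy as np
import pandas as pd


def fill_missing(dataset: pd.DataFrame, cam1_id: str = "gp1",
                 cam2_id: str = "gp2") -> pd.DataFrame:
    """Replace NaN by 0 in 'seen_...' and by -1 in '[xy][12]_...' columns."""
    out = dataset.copy()  # assumption: leave the caller's data untouched
    for cam in (cam1_id, cam2_id):
        seen_col = f"seen_{cam}"
        pos_cols = [f"{axis}{end}_{cam}" for end in (1, 2) for axis in "xy"]
        out[seen_col] = out[seen_col].fillna(0)
        out[pos_cols] = out[pos_cols].fillna(-1)
    return out


# Small synthetic dataset: the second row lacks all 2D data.
cols = [f"{axis}{end}_{cam}"
        for cam in ("gp1", "gp2") for end in (1, 2) for axis in "xy"]
df = pd.DataFrame({col: [5.0, np.nan] for col in cols})
df["seen_gp1"] = [1.0, np.nan]
df["seen_gp2"] = [1.0, np.nan]

filled = fill_missing(df)
print(filled.loc[1, ["x1_gp1", "seen_gp1"]].tolist())
```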