ParticleDetection.utils.datasets
Functions and classes for dataset information and manipulation.
Author: Adrian Niemann (adrian.niemann@ovgu.de)
Date: 31.10.2022
- DEFAULT_CLASSES = {0: 'blue', 1: 'green', 2: 'orange', 3: 'purple', 4: 'red', 5: 'yellow', 6: 'black', 7: 'lilac', 8: 'brown'}
Class-color correspondences most commonly used by the trained networks.
- DEFAULT_COLUMNS = ['x1', 'y1', 'z1', 'x2', 'y2', 'z2', 'x', 'y', 'z', 'l', 'x1_{id1:s}', 'y1_{id1:s}', 'x2_{id1:s}', 'y2_{id1:s}', 'x1_{id2:s}', 'y1_{id2:s}', 'x2_{id2:s}', 'y2_{id2:s}', 'frame', 'seen_{id1:s}', 'seen_{id2:s}', 'color']
Columns of rod position datasets, as used e.g. in the RodTracker app. The placeholders {id1:s} and {id2:s} stand for camera IDs (e.g. "gp1" and "gp2").
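A minimal sketch of turning these templates into concrete column names. It assumes the placeholders are filled with Python's str.format (which the {id1:s} syntax suggests) and uses "gp1"/"gp2", the default camera IDs of other functions in this module. concrete_columns is a hypothetical helper, not part of the module.

```python
from typing import List

# Copy of DEFAULT_COLUMNS as documented above.
DEFAULT_COLUMNS = [
    "x1", "y1", "z1", "x2", "y2", "z2", "x", "y", "z", "l",
    "x1_{id1:s}", "y1_{id1:s}", "x2_{id1:s}", "y2_{id1:s}",
    "x1_{id2:s}", "y1_{id2:s}", "x2_{id2:s}", "y2_{id2:s}",
    "frame", "seen_{id1:s}", "seen_{id2:s}", "color",
]


def concrete_columns(id1: str = "gp1", id2: str = "gp2") -> List[str]:
    """Hypothetical helper: fill the camera-ID placeholders.

    Names without placeholders (e.g. "frame") are returned unchanged,
    because str.format ignores unused keyword arguments.
    """
    return [col.format(id1=id1, id2=id2) for col in DEFAULT_COLUMNS]


print(concrete_columns())
```

The 3D columns (x1 … l) carry no camera suffix, while every 2D endpoint and seen_ column exists once per camera.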
- class DataGroup(train: DataSet, val: DataSet)
Bases: object
Collection of the training and validation set used for training a network.
- class DataSet(name: str, folder: str, annotation_file: str)
Bases: object
Representation of a dataset for training a network.
- annotation: str
- folder: str
- name: str
- class DetectionResult
Bases: dict
Results of detecting particles in an image file.
See also: ParticleDetection.utils.detection._run_detection(), ParticleDetection.modelling.runners.detection.detect()
- input_size: List[int]
- pred_boxes: Tensor
- pred_classes: Tensor
- pred_masks: Tensor
- scored: Tensor
- RNG_SEED = 1
Seed that makes results depending on random number generation reproducible.
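The reason a fixed seed gives reproducibility can be shown with a short sketch: generators seeded with the same value produce the same sequence of draws. Which generators the module actually seeds with RNG_SEED is not stated here; Python's random.Random and NumPy's default_rng are used purely for illustration, and sample is a hypothetical helper.

```python
import random

import numpy as np

RNG_SEED = 1  # value documented above


def sample(seed: int) -> list:
    """Hypothetical helper: draw a few numbers from freshly seeded generators."""
    py_rng = random.Random(seed)
    np_rng = np.random.default_rng(seed)
    return [py_rng.random(), *np_rng.integers(0, 100, size=3).tolist()]


# Identical seeds yield identical draws, so dependent results are reproducible.
assert sample(RNG_SEED) == sample(RNG_SEED)
```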
- add_points(points: Dict[str, ndarray], data: DataFrame, cam_id: str, frame: int)
Updates a DataFrame with new rod endpoint data for one camera and frame.
- Parameters:
points (Dict[str, np.ndarray]) – Rod endpoints in the format obtained from rod_endpoints().
data (DataFrame) – DataFrame in which the rods are saved.
cam_id (str) – ID/name of the camera that produced the image the rod endpoints were computed on.
frame (int) – Frame number in the dataset.
- Returns:
pd.DataFrame – The updated data.
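A hedged sketch of what "updating a DataFrame for one camera and frame" can look like. Several details are assumptions, not taken from the module: the points dictionary is assumed to map a color name to an array of shape (n_rods, 2, 2) (rod, endpoint, [x, y]), rods are identified by a hypothetical "particle" column, and new values overwrite existing ones via combine_first. add_points_sketch is a hypothetical name.

```python
from typing import Dict

import numpy as np
import pandas as pd

# Hypothetical key columns; "particle" is an illustrative rod index.
KEYS = ["frame", "color", "particle"]


def add_points_sketch(points: Dict[str, np.ndarray], data: pd.DataFrame,
                      cam_id: str, frame: int) -> pd.DataFrame:
    """Merge one camera's rod endpoints for one frame into `data`."""
    rows = []
    for color, endpoints in points.items():
        # ASSUMPTION: endpoints has shape (n_rods, 2, 2) = (rod, endpoint, [x, y]).
        for particle, ((x1, y1), (x2, y2)) in enumerate(endpoints):
            rows.append({"frame": frame, "color": color, "particle": particle,
                         f"x1_{cam_id}": x1, f"y1_{cam_id}": y1,
                         f"x2_{cam_id}": x2, f"y2_{cam_id}": y2})
    if not rows:
        return data
    new = pd.DataFrame(rows).set_index(KEYS)
    if data.empty:
        return new.reset_index()
    # Values from `new` take precedence; rows and columns only in `data` are kept.
    return new.combine_first(data.set_index(KEYS)).reset_index()


# Usage: the same rod seen by two cameras ends up in a single row.
df = add_points_sketch({"blue": np.array([[[0.0, 1.0], [2.0, 3.0]]])},
                       pd.DataFrame(), cam_id="gp1", frame=0)
df = add_points_sketch({"blue": np.array([[[10.0, 11.0], [12.0, 13.0]]])},
                       df, cam_id="gp2", frame=0)
```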
- get_dataset_classes(dataset: DataSet) → Set[int]
Retrieve the number and IDs of thing classes in the dataset.
- get_dataset_size(dataset: DataSet) → int
Compute the number of annotated images in a dataset (excluding augmentation).
- get_files(dataset: DataSet) → List[str]
Retrieve the paths of all files in a dataset that have associated annotations.
- Parameters:
dataset (DataSet)
- Returns:
List[str] – List of file paths to images that have annotations associated with them.
- get_object_counts(dataset: DataSet) → List[int]
Returns a list of the number of objects in each image in the dataset.
- get_pixel_stats(files: List[str]) → Tuple[ndarray, ndarray]
Get the mean and standard deviation of each color channel for a list of image files.
- Parameters:
files (List[str]) – List of file paths to images that shall be included in the calculation.
- Returns:
means (ndarray) – Mean pixel values of the given dataset for each color channel in BGR order. Shape: (3, 1)
standard-deviations (ndarray) – Standard deviation of pixel values for the given dataset for each color channel in BGR order. Shape: (3, 1)
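The statistics described above can be sketched with NumPy. To stay self-contained, this version takes already loaded images instead of file paths; it assumes HxWx3 arrays in BGR order (as cv2.imread would return) and pools all pixels of all images before computing mean and population standard deviation. The module may aggregate differently (e.g. per image), so treat pixel_stats_sketch as a hypothetical illustration.

```python
from typing import List, Tuple

import numpy as np


def pixel_stats_sketch(images: List[np.ndarray]) -> Tuple[np.ndarray, np.ndarray]:
    """Per-channel mean and std over all pixels of all images, each of shape (3, 1).

    ASSUMPTION: images are HxWx3 arrays in BGR channel order.
    """
    # Stack every pixel of every image into one (n_pixels, 3) array.
    pixels = np.concatenate([img.reshape(-1, 3) for img in images]).astype(np.float64)
    means = pixels.mean(axis=0).reshape(3, 1)
    stds = pixels.std(axis=0).reshape(3, 1)  # population std (ddof=0)
    return means, stds


# Usage: half the pixels are 0, half are 10, so mean = 5 and std = 5 per channel.
imgs = [np.zeros((2, 2, 3), np.uint8), np.full((2, 2, 3), 10, np.uint8)]
means, stds = pixel_stats_sketch(imgs)
```

Casting to float64 before reducing avoids overflow and precision loss from uint8 image data.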
- insert_missing_rods(dataset: DataFrame, expected_rods: int, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame
Inserts empty rods into a dataset, depending on how many are expected.
- Parameters:
dataset (pd.DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.
expected_rods (int) – The expected number of rods per frame (and color).
cam1_id (str) – ID of the first camera. Default is "gp1".
cam2_id (str) – ID of the second camera. Default is "gp2".
- Returns:
DataFrame
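A minimal pandas sketch of padding a dataset with empty rods, assuming "empty" means rows whose data columns are all NaN and that every combination of the frames and colors present in the dataset should reach expected_rods. The camera ID parameters are omitted here; how the module uses them is not described. insert_missing_rods_sketch is a hypothetical name.

```python
import pandas as pd


def insert_missing_rods_sketch(dataset: pd.DataFrame, expected_rods: int) -> pd.DataFrame:
    """Pad each (frame, color) combination with all-NaN rods up to expected_rods."""
    frames = dataset["frame"].unique()
    colors = dataset["color"].unique()
    # Count existing rods; combinations that never occur get a count of 0.
    counts = dataset.groupby(["frame", "color"]).size().reindex(
        pd.MultiIndex.from_product([frames, colors], names=["frame", "color"]),
        fill_value=0)
    padding = []
    for (frame, color), n in counts.items():
        missing = expected_rods - int(n)
        if missing > 0:
            # Only frame and color are set; concat fills all other columns with NaN.
            padding.append(pd.DataFrame({"frame": [frame] * missing,
                                         "color": [color] * missing}))
    if not padding:
        return dataset.copy()
    out = pd.concat([dataset, *padding], ignore_index=True)
    return out.sort_values(["frame", "color"], ignore_index=True)


# Usage: frame 1 has only one of two expected blue rods.
df = pd.DataFrame({"frame": [0, 0, 1], "color": ["blue"] * 3, "x1": [1.0, 2.0, 3.0]})
padded = insert_missing_rods_sketch(df, expected_rods=2)
```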
- randomize_endpoints(file: Path, cam_ids: List[str] | None = None) → None
Randomize the order of particles/endpoints in a dataset file.
The randomized dataset is saved with 'rand_endpoints_' as a prefix to the file’s name.
- Parameters:
file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS.
cam_ids (List[str]) – Cam IDs present in the dataset. Default is ["gp1", "gp2"].
- Returns:
None
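One plausible reading of "randomizing the order of endpoints" is swapping endpoint 1 and endpoint 2 of each rod at random, independently per camera. The following sketch assumes exactly that and works on an in-memory DataFrame instead of a *.csv file; the real function's behavior and file handling may differ. randomize_endpoints_sketch and its seed parameter are hypothetical.

```python
from typing import List, Optional

import numpy as np
import pandas as pd


def randomize_endpoints_sketch(data: pd.DataFrame, cam_ids: Optional[List[str]] = None,
                               seed: int = 1) -> pd.DataFrame:
    """Randomly swap endpoint 1 and endpoint 2 of each rod, per camera (assumed behavior)."""
    if cam_ids is None:
        cam_ids = ["gp1", "gp2"]
    rng = np.random.default_rng(seed)
    out = data.copy()
    for cam in cam_ids:
        first = [f"x1_{cam}", f"y1_{cam}"]
        second = [f"x2_{cam}", f"y2_{cam}"]
        swap = rng.random(len(out)) < 0.5  # coin flip per rod
        # .to_numpy() stops pandas from re-aligning the swapped columns by label.
        out.loc[swap, first + second] = out.loc[swap, second + first].to_numpy()
    return out


df = pd.DataFrame({"x1_gp1": [0.0] * 10, "y1_gp1": [0.0] * 10,
                   "x2_gp1": [1.0] * 10, "y2_gp1": [1.0] * 10})
shuffled = randomize_endpoints_sketch(df, cam_ids=["gp1"])
```

Swapping both coordinates of an endpoint together keeps every rod geometrically identical; only the labeling of its two ends changes.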
- randomize_particles(file: Path) → None
Randomizes the particle numbers per frame of a given *.csv dataset.
The dataset with randomized particle numbers is saved with 'rand_particles_' as a prefix to the file’s name.
- Parameters:
file (Path) – Path to a *.csv file containing data in the format of DEFAULT_COLUMNS, but at minimum with the column 'frame'.
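A hedged sketch of per-frame randomization and of the output file naming. It assumes particle numbers are stored in a hypothetical "particle" column, which is (re)created here as a random permutation of 0..n-1 within each frame; this would also work when only the 'frame' column exists. The module's actual storage of particle numbers is not documented above. Both function names are hypothetical.

```python
from pathlib import Path

import numpy as np
import pandas as pd


def randomize_particle_numbers(data: pd.DataFrame, seed: int = 1) -> pd.DataFrame:
    """Assign a random permutation of 0..n-1 as particle numbers within each frame."""
    rng = np.random.default_rng(seed)
    out = data.copy()
    # ASSUMPTION: particle numbers live in a "particle" column; it is (re)created here.
    out["particle"] = -1
    for _, idx in out.groupby("frame").groups.items():
        out.loc[idx, "particle"] = rng.permutation(len(idx))
    return out


def prefixed_path(file: Path, prefix: str = "rand_particles_") -> Path:
    """Same directory, file name with the given prefix."""
    return file.with_name(prefix + file.name)


df = pd.DataFrame({"frame": [0, 0, 0, 1, 1]})
numbered = randomize_particle_numbers(df)
```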
- replace_missing_rods(dataset: DataFrame, cam1_id: str = 'gp1', cam2_id: str = 'gp2') → DataFrame
Fills missing data in 'seen_...' and '[xy][12]_...' columns.
Replaces NaN values in columns of the format 'seen_...' and '[xy][12]_...' (see DEFAULT_COLUMNS for more information). NaNs in 'seen_...' are replaced by 0, NaNs in '[xy][12]_...' are replaced by -1.0.
- Parameters:
dataset (DataFrame) – Dataset with the column format from DEFAULT_COLUMNS.
cam1_id (str) – ID of the first camera. Default is "gp1".
cam2_id (str) – ID of the second camera. Default is "gp2".
- Returns:
DataFrame
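The replacement rule above maps directly onto DataFrame.fillna with a per-column dictionary. This minimal sketch builds the column names from the two camera IDs, following the DEFAULT_COLUMNS templates; columns absent from the DataFrame are simply skipped by fillna. replace_missing_rods_sketch is a hypothetical name, not the module's implementation.

```python
import numpy as np
import pandas as pd


def replace_missing_rods_sketch(dataset: pd.DataFrame, cam1_id: str = "gp1",
                                cam2_id: str = "gp2") -> pd.DataFrame:
    """Fill NaNs: 0 in 'seen_<cam>' columns, -1.0 in '[xy][12]_<cam>' columns."""
    fill = {}
    for cam in (cam1_id, cam2_id):
        fill[f"seen_{cam}"] = 0
        for coord in ("x1", "y1", "x2", "y2"):
            fill[f"{coord}_{cam}"] = -1.0
    # fillna returns a new DataFrame; the 3D columns (x1, y1, ...) stay untouched.
    return dataset.fillna(value=fill)


df = pd.DataFrame({"x1_gp1": [np.nan, 2.0], "seen_gp1": [np.nan, 1.0],
                   "x1": [np.nan, 0.5]})
filled = replace_missing_rods_sketch(df)
```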