dataset¶
A Dataset is the collection of data produced during one contiguous period of time
(or, informally, a “session”).
It can include multiple Recordings,
each of which is the data produced by a single device during the collection of the dataset.
A Recording may consist of several different modalities of data,
or “streams”,
like a Video, a raw binary stream, and an accompanying metadata CSV.
The items within a recording are assumed to share the same timebase. If a device produces multiple streams of data in different timebases (e.g. video and electrophysiology), those should be treated as separate recordings.
A dataset might consist of multiple recordings from different devices that need to be aligned
(e.g. multiple cameras from multiple angles, multiple sensors receiving the same stream, etc.).
The dataset can contain an alignment_map that maps a common, contiguous, monotonic index
onto the indexes of individual recordings.
Recordings may be related to or derived from other recordings: e.g. a video can be indicated as being derived from a binary stream, a preprocessed or denoised video can be derived from the raw video, and so on. A derivation is indicated by a reference from the derived recording to its source recording, along with the transformation that was applied.
Timestamps within a dataset are assumed to be in the same unit (e.g. datetimes or unix epoch floats) and in the same timezone, but not necessarily entirely equivalent (e.g. multiple machines with system clocks synchronized with NTP).
A dataset is assumed to live on disk, and only small, text-based streams are loaded into memory. The recordings within a dataset are therefore represented primarily as paths, but provide iterators and slicing accessors to retrieve their contents.
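The path-backed, lazy-access pattern described above can be sketched generically. The `FrameSource` class below is a hypothetical illustration of the idea (hold only a path in memory, load contents on slice access), not part of mio's API:

```python
from pathlib import Path
import tempfile

class FrameSource:
    """Hypothetical path-backed stream: keeps only a path, loads on access."""

    def __init__(self, path: Path):
        self.path = path  # only the path is held in memory

    def __getitem__(self, index):
        # Read lazily on access; text lines stand in for frames here.
        lines = self.path.read_text().splitlines()
        return lines[index]

# Demo with a throwaway "stream" file
tmp = Path(tempfile.mkdtemp()) / "stream.txt"
tmp.write_text("\n".join(f"frame-{i}" for i in range(10)))

source = FrameSource(tmp)
print(source[0])    # frame-0
print(source[2:4])  # ['frame-2', 'frame-3']
```

Slicing re-reads from disk each time in this sketch; a real implementation would typically cache or seek, but the interface contract is the same.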
- class mio.models.dataset.Dataset(*, path: Path, recordings: dict[str, ~mio.models.dataset.Recording]=<factory>, alignment_map: DataFrame | None = None)¶
A single capture from a mio device, including any videos, metadata tables, and other byproducts
- align(recordings: list[Recording] | list[str], write: bool = False) → Self¶
Create an alignment map, or return an already-existing alignment map
- alignment_map: DataFrame | None¶
A dataframe with an “index” column that is the common index for frames across recordings, and one column per recording name containing the frame index that the common index maps to, such that all frames within a row were captured at the same time.
Stored as alignment_map.csv in the dataset directory
E.g. if a dataset contains two videos “a” and “b”, and “b” started 5 frames before “a”, then the alignment map would look like:
| index | a | b |
| ----- | - | - |
| 0     | 0 | 5 |
| 1     | 1 | 6 |
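The example mapping above can be reproduced as a pandas DataFrame. This is only an illustration of the mapping's shape and how a lookup across it works, not mio's construction logic:

```python
import pandas as pd

# "b" started 5 frames before "a": row i pairs a's frame i with b's frame i + 5
alignment_map = pd.DataFrame(
    {
        "index": [0, 1],
        "a": [0, 1],
        "b": [5, 6],
    }
)

# Look up which frame of "b" was captured at the same time as frame 1 of "a"
row = alignment_map.loc[alignment_map["a"] == 1]
print(int(row["b"].iloc[0]))  # 6
```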
- classmethod from_recordings(recordings: list[Recording]) → Dataset¶
Instantiate a dataset from recordings, loading any alignment map found.
- get_stitched(recordings: list[Recording] | list[str]) → StitchedRecording¶
Get a stitched recording of a set of recordings if it exists, otherwise raise a KeyError
- model_config = {'arbitrary_types_allowed': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mio.models.dataset.RawVideoRecording(*, name: str, type: Literal['raw'] = 'raw', video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)], metadata: DataFrame[StreamBufferTable] | None = None, timestamps: DataFrame[TimestampTable] | None = None, noise: DataFrame[NoiseTable] | None = None, binary: Path | None = None, derived_from: RecordingDerivation | None = None)¶
A raw video
- model_config = {'arbitrary_types_allowed': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mio.models.dataset.Recording(*, name: str, type: Literal['raw', 'stitched'], video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)], metadata: DataFrame[StreamBufferTable] | None = None, timestamps: DataFrame[TimestampTable] | None = None, noise: DataFrame[NoiseTable] | None = None, binary: Path | None = None, derived_from: RecordingDerivation | None = None)¶
A single set of matching data streams from a device within a dataset.
- derived_from: RecordingDerivation | None¶
- classmethod from_video(path: Path) → Annotated[Annotated[RawVideoRecording, Tag(tag=raw)] | Annotated[StitchedRecording, Tag(tag=stitched)], Discriminator(discriminator=_recording_discriminator, custom_error_type=None, custom_error_message=None, custom_error_context=None)]¶
Find the adjoining files from the video path
- metadata: DataFrame[StreamBufferTable] | None¶
Metadata for frames within the video
- model_config = {'arbitrary_types_allowed': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- noise: DataFrame[NoiseTable] | None¶
Framewise noise measurements (created with score_noise()).
- property paths: RecordingPaths¶
Given some video, the expected paths for its related components
- score_noise(config: NoisePatchConfig | None = None, progress: bool = False, force: bool = False) → DataFrame¶
Score the noise level in each frame with score_noise(), saving the result as a csv named {name}_noise.csv
- timestamps: DataFrame[TimestampTable] | None¶
Timestamps table, (currently) stored as {video_name}_timestamps.csv next to the video. When instantiating a recording, if a metadata file exists but timestamps do not, they are automatically generated.
- video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)]¶
A video created as part of this recording
- class mio.models.dataset.RecordingDerivation(*, type: Literal['stitched'], sources: set[str])¶
How a recording was derived from other recordings
- model_config = {}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class mio.models.dataset.RecordingPaths¶
Filenames for potential parts of a recording
- class mio.models.dataset.StitchedRecording(*, name: str, type: Literal['stitched'] = 'stitched', video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)], metadata: DataFrame[StreamBufferTable], timestamps: DataFrame[TimestampTable] | None = None, noise: DataFrame[NoiseTable] | None = None, binary: Path | None = None, derived_from: RecordingDerivation, scores: DataFrame[StitchTable], debug_video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | None = None)¶
Multiple video recordings stitched together, picking the best-aligned frame from each
- debug_video: Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | Annotated[VideoProxy, GetPydanticSchema(get_pydantic_core_schema=__get_pydantic_core_schema__, get_pydantic_json_schema=None)] | None¶
An optional debug video that shows the source videos side by side with differences marked
- derived_from: RecordingDerivation¶
A derivation reference that indicates which videos this stitch was derived from
- classmethod from_video(path: Path) → StitchedRecording¶
Determine which videos this recording was derived from using the path name
- metadata: DataFrame[StreamBufferTable]¶
Metadata for frames within the video
- model_config = {'arbitrary_types_allowed': True, 'validate_default': True}¶
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- scores: DataFrame[StitchTable]¶
A csv that indicates which recording each stitched frame was selected from
- mio.models.dataset.paths_from_video(video: Path) → RecordingPaths¶
Given some path to a root video, create the expected paths for its components
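The sibling-file convention implied throughout these docs (timestamps and noise CSVs named after the video) can be sketched with pathlib. The `sibling_paths` helper below is a hypothetical illustration of the documented naming pattern, not mio's actual RecordingPaths implementation:

```python
from pathlib import Path

def sibling_paths(video: Path) -> dict[str, Path]:
    """Derive expected sibling files from a root video path.

    Naming follows the conventions stated in the docs above:
    {name}_timestamps.csv and {name}_noise.csv next to the video.
    """
    name = video.stem
    parent = video.parent
    return {
        "video": video,
        "timestamps": parent / f"{name}_timestamps.csv",
        "noise": parent / f"{name}_noise.csv",
    }

paths = sibling_paths(Path("data/session1/cam0.avi"))
print(paths["timestamps"].name)  # cam0_timestamps.csv
```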