mdf_reader package

Submodules

mdf_reader.mdf_blocks module

This module is part of the mdf_parser.

The mdf_blocks module contains classes for all required blocks of an MDF file.

Objects of the different mdf format blocks are instantiated by the MDFParser or by the classes within mdf_blocks itself.

Short description of the block classes

MDFBlock

Is the base class for all other classes

Provides methods for the interpretation of the mdf standard

Provides methods to manipulate strings

Methods in this class are common to all other classes

MDHFileHeader

Name in MDF: Identification block

Identification of the file as MDF file and MDF version

class mdf_reader.mdf_blocks.DataFormat[source]

Bases: object

Definition of format names according to MDF manual.

Notes

  • The coding is one-index-based and cyclic on 256, meaning that 256 = 1, 257 = 2,…

  • To obtain the correct name, do : DataFormat[ data_format_nr % 256 - 1]
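The index expression from the notes can be sketched as follows. The list FORMAT_NAMES is a short hypothetical stand-in and not the actual MDF name table; the sketch only evaluates the expression given in the note and makes no claim about how DataFormat is implemented.

```python
# Hypothetical stand-in for the ordered list of MDF format names.
FORMAT_NAMES = ["FIRST", "SECOND", "THIRD"]


def format_index(data_format_nr: int) -> int:
    """Evaluate the index expression from the note: nr % 256 - 1."""
    return data_format_nr % 256 - 1


print(format_index(1))                  # 0
print(format_index(2))                  # 1
print(FORMAT_NAMES[format_index(258)])  # SECOND
```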

bytes = {'BOOL': 2, 'BOOL16': 2, 'BOOL32': 4, 'BOOL64': 8, 'BOOL8': 1, 'CHAR': 1, 'DOUBLE': 8, 'FLOAT': 4, 'INT16': 2, 'LINK': 4, 'LONG': 4, 'LONG DOUBLE': 10, 'LONGLONG': 8, 'REAL': 6, 'REAL48': 6, 'SHORT': 2, 'UCHAR': 1, 'UINT16': 2, 'UINT32': 4, 'UINT64': 8, 'UINT8': 1, 'ULONG': 4, 'ULONGLONG': 8, 'USHORT': 2, 'text': None, 'ymdhms': 4}
name = 'LINK'
np_dtypes = {'BOOL': dtype('bool'), 'BOOL16': dtype('bool'), 'BOOL32': dtype('bool'), 'BOOL64': dtype('bool'), 'BOOL8': dtype('bool'), 'CHAR': dtype('int64'), 'DOUBLE': dtype('float64'), 'FLOAT': dtype('float64'), 'INT16': dtype('int16'), 'LINK': dtype('int64'), 'LONG': dtype('int64'), 'LONG DOUBLE': dtype('int64'), 'LONGLONG': dtype('int64'), 'REAL': dtype('float64'), 'REAL48': dtype('float64'), 'SHORT': dtype('int16'), 'UCHAR': dtype('int64'), 'UINT16': dtype('uint16'), 'UINT32': dtype('uint32'), 'UINT64': dtype('uint64'), 'UINT8': dtype('int64'), 'ULONG': dtype('uint64'), 'ULONGLONG': dtype('uint64'), 'USHORT': dtype('uint16'), 'text': dtype('int64'), 'ymdhms': dtype('uint64')}
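As a rough illustration of how the bytes table might be used, the sketch below sums the byte sizes of a few channels. The subset of sizes is copied from the table above; the frame layout and the helper name frame_size are hypothetical, and treating a frame as a plain concatenation of channels is an assumption.

```python
# Byte sizes copied from the DataFormat.bytes table above (subset only).
BYTES = {"CHAR": 1, "SHORT": 2, "FLOAT": 4, "DOUBLE": 8, "ymdhms": 4}


def frame_size(format_names):
    """Sum the byte sizes of the channels assumed to make up one frame."""
    return sum(BYTES[name] for name in format_names)


# Hypothetical frame layout: a timestamp followed by two FLOAT channels.
print(frame_size(["ymdhms", "FLOAT", "FLOAT"]))  # 12
```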
class mdf_reader.mdf_blocks.DataSetRecord(file_pointer, verbose=1)[source]

Bases: MDFBlock

DataSetRecord contains all information specified by MDF for the Dataset Record

Parameters:
  • file_pointer (object) – Pointer to the file stream

  • verbose (int) – verbosity level

type

type of the data record

Type:

int

size

size of the data record

Type:

int

version

verbosity level

Type:

int

frame_offset

offset of the current frame

Type:

int

byte_to_ndarray(byte_array: object, frame_size: int, n_frames_to_read: int) → ndarray[source]

Turn a byte array into a numpy array

Parameters:
  • byte_array (object) – Binary array containing all the records

  • frame_size (int) – Size of a single frame

  • n_frames_to_read (int) – Number of records to read

Returns:

The numpy array with the converted data

Return type:

ndarray
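A minimal sketch of the general idea, assuming the byte array is a flat buffer of equally sized frames. This is not the implementation of byte_to_ndarray; the helper name bytes_to_frames and the choice of a uint8 view are assumptions for illustration.

```python
import numpy as np


def bytes_to_frames(byte_array: bytes, frame_size: int, n_frames: int) -> np.ndarray:
    """View a flat byte buffer as an (n_frames, frame_size) array of uint8."""
    raw = np.frombuffer(byte_array, dtype=np.uint8, count=frame_size * n_frames)
    return raw.reshape(n_frames, frame_size)


# Twelve bytes split into three frames of four bytes each.
frames = bytes_to_frames(bytes(range(12)), frame_size=4, n_frames=3)
print(frames.shape)  # (3, 4)
```

Each row then holds the raw bytes of one frame, which can be decoded per channel in a later step.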

class mdf_reader.mdf_blocks.MDFBlock[source]

Bases: object

Base class to define a block in the mdf file

static pretty(string)[source]

Removes trailing null (zero) characters from a string

Parameters:

string (str) – The string to clean

Returns:

The string with trailing null (zero) characters removed

Return type:

str
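Fixed-width string fields in binary files are often padded with null characters. The sketch below shows one way to strip such padding, assuming that "zero strings" refers to the "\x00" character; the helper name is hypothetical and the actual pretty method may differ.

```python
def strip_trailing_nulls(text: str) -> str:
    """Remove null characters from the end of a fixed-width string."""
    return text.rstrip("\x00")


print(repr(strip_trailing_nulls("MRU_Roll\x00\x00\x00")))  # 'MRU_Roll'
```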

read_format(fp, data_type, number=1)[source]
Parameters:
  • fp (file object) – pointer to the current file

  • data_type (dtype) – Type of the data to read

  • number (int, optional) – Number of items to read. Default value = 1

Returns:

Unpacked data

Return type:

dtype
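Reading typed values from a binary stream can be sketched with the standard struct module. This is an illustration only: the helper read_values, the use of struct format characters for data_type, and the little-endian byte order are all assumptions and are not taken from the read_format source.

```python
import io
import struct


def read_values(fp, fmt_char: str, number: int = 1):
    """Read `number` little-endian values of struct type `fmt_char` from fp."""
    fmt = f"<{number}{fmt_char}"
    data = fp.read(struct.calcsize(fmt))
    return struct.unpack(fmt, data)


# Two unsigned 16-bit integers packed into an in-memory stream.
stream = io.BytesIO(struct.pack("<2H", 2, 1))
print(read_values(stream, "H", number=2))  # (2, 1)
```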

read_string(fp, n_characters)[source]

Read a string consisting of n_characters starting at the current position of the file pointer

Parameters:
  • fp (IOStream) – file pointer to the current data

  • n_characters (int) – Number of characters to read

Returns:

The string read from the fp file pointer

Return type:

str

Notes

It is also ensured that the file pointer is positioned at the end of the n_characters after reading
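The behaviour described in the note can be sketched as follows: read a fixed number of bytes, then explicitly position the stream at the end of the field. The helper name and the ASCII decoding are assumptions; the actual read_string method may decode differently.

```python
import io


def read_fixed_string(fp, n_characters: int) -> str:
    """Read n_characters bytes from fp and decode them as text."""
    start = fp.tell()
    raw = fp.read(n_characters)
    # Move to the end of the field even if fewer bytes were returned.
    fp.seek(start + n_characters)
    return raw.decode("ascii", errors="replace")


stream = io.BytesIO(b"MDF 2.1\x00rest")
print(repr(read_fixed_string(stream, 8)))  # 'MDF 2.1\x00'
print(stream.tell())                       # 8
```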

static unpack_byte_array(byte_array, data_type)[source]
Parameters:
  • byte_array (ndarray) – Array with the data to unpack

  • data_type (dtype) – Type of the data

Returns:

Unpacked data

Return type:

ndarray

class mdf_reader.mdf_blocks.MDHFileHeader(file_pointer)[source]

Bases: MDFBlock

MDHFileHeader contains all information specified by MDF for the ID BLOCK

Parameters:

file_pointer (IOStream) – reference to the MDF file

version

Version string, such as 2.1

Type:

str

version_minor

Minor version (1)

Type:

UINT16

version_major

Major version (2)

Type:

UINT16

status_record_position

Refers to status of type 10. Must be 0 if unused

Type:

LONG

created_by

Generated by: 0=User, 1=MLab

Type:

LONG

mdf_header_size

Size of the header (normally 72)

Type:

LONG

store_type

Storage method: 0 = multiplexed, 1 = block-wise (currently unused)

Type:

LONG

file_type

File type

Type:

LONG

frame_size

Size of the data frame in bytes

Type:

LONG

no_of_data_sets

Number of datasets in the file

Type:

LONG

day

Day of the date at which the MDF file was created

Type:

LONG

month

Month of the date at which the MDF file was created

Type:

LONG

year

Year of the date at which the MDF file was created

Type:

LONG

hour

Hour of the time at which the MDF file was created

Type:

LONG

minute

Minute of the time at which the MDF file was created

Type:

LONG

second

Second of the time at which the MDF file was created

Type:

LONG

mdf_reader.mdf_blocks.set_logging_level(logger, verbose=1)[source]

function to set the level of the logger

Parameters:
  • logger – handle to the logger

  • verbose – 0=silent, 1=info, 2=debug (Default value = 1)

Raises:

AssertionError – If an invalid option is passed
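A minimal sketch of how a verbosity value could be mapped onto the standard logging levels. The mapping of 0 (silent) to WARNING and the helper name set_level are assumptions; only the three verbosity values and the AssertionError come from the description above.

```python
import logging


def set_level(logger: logging.Logger, verbose: int = 1) -> None:
    """Map a verbosity value onto a standard logging level."""
    # Assumed mapping: 0 = silent, 1 = info, 2 = debug.
    levels = {0: logging.WARNING, 1: logging.INFO, 2: logging.DEBUG}
    assert verbose in levels, f"invalid verbosity: {verbose}"
    logger.setLevel(levels[verbose])


log = logging.getLogger("mdf_example")
set_level(log, verbose=2)
print(log.level == logging.DEBUG)  # True
```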

mdf_reader.mdf_parser module

A module for reading microlab MDF files. Usage:

import mdf_reader.mdf_parser as mdf

mdf_object = mdf.MDFParser(file_name)

Author: Eelco van Vliet 29-2-2015

class mdf_reader.mdf_parser.MDFParser(mdf_file, import_data=True, include_columns=None, exclude_columns=None, verbose=1, convert_datetime=True, resample_data=False, constant_sample_rate=True, replace_record_names={}, log_level=30, date_time_label='DateTime', date_time_match_string='^_DateTime32$', load_date_time=True, set_relative_time_column=False, include_date_time=False)[source]

Bases: object

The MDFParser class contains methods for reading mdf files.

Parameters:
  • mdf_file (str) – Path to a binary file that follows the MDF 3.3 specification

  • import_data (bool , optional) – Flag to enable to import the data, default = True. If False, only the header information is read.

  • include_columns (list) – List with columns to import. Default value = []. If empty, all columns are included

  • exclude_columns (list) – List with columns to exclude. Default value = []. If empty, none are excluded

  • verbose (int) – Set the logging level. Obsolete. 0 = silent, 1 = normal info, 2 = debugging

  • convert_datetime (int, optional) – Translate the ymdhms integer into a date time string

  • resample_data (bool, optional, False) – The sampled data is not completely uniformly sampled. To enforce an equidistant sampling, set this flag to true

  • constant_sample_rate (bool, optional) – If true, use the sample rate for the clock, otherwise, the ymdhms is leading. Defaults to True

  • replace_record_names (dict, optional) – A dictionary of record names to replace, mapping each old name A1 to its new name B1

  • date_time_label (str, optional) – Default label to assign to the Date time string. Default = “DateTime”

  • date_time_match_string (str, optional) – The date time column is selected based on this match string. Default = “^_DateTime32$”

  • load_date_time (bool, optional) – Always read the date time information channel, even if it is not explicitly mentioned in the filter list. Defaults to True

  • set_relative_time_column (bool, optional) – If true, create a column time_r in seconds with the relative time starting at t=0 s. Defaults to False

Examples

Reading an MDF file is done by creating an MDFParser object with a file name as the first argument.

>>> file_name = "../data/AMS_BALDER_110225T233000_UTC222959.mdf"
>>> header_object = MDFParser(mdf_file=file_name, import_data=False)
>>> names = header_object.make_report()

If the import_data flag had been set to True, all MDF data would have been read into the data frame header_object.data when the object was created. In this example, however, we only read the header information of the MDF file first. As a next step, we can select the data columns we want to import. This can reduce the reading time of an MDF data file significantly, as only the selected data needs to be imported. The data available in the mdf file can be explored with the make_report() method, which writes all channels to screen. Now, we select the MRU_Roll data first.

>>> from tabulate import tabulate
>>> names_labels_and_groups = header_object.set_column_selection(
...     filter_list=["MRU_Roll"], include_date_time=True)
>>> header_object.import_data()
>>> print(tabulate(header_object.data.head(5), headers="keys", tablefmt="psql"))
+----------------------------+------------+
| DateTime                   |   MRU_Roll |
|----------------------------+------------|
| 2011-02-25 23:30:00        |    0.01207 |
| 2011-02-25 23:30:00.040000 |    0.01207 |
| 2011-02-25 23:30:00.080000 |    0.01207 |
| 2011-02-25 23:30:00.120000 |    0.01204 |
| 2011-02-25 23:30:00.160000 |    0.01204 |
+----------------------------+------------+

The names_labels_and_groups now contains 3 lists, but we don’t use it now. For more information about the return values, look at the docstring of the set_column_selection method.

Because we have added the include_date_time flag, the DateTime column is read by default and set as the index of the DataFrame. You can call set_column_selection multiple times if you want to add more columns. The include_date_time flag does not have to be given again, as we have already imported the DateTime column. So let’s import the MRU roll, pitch and heave channels as well. We do this with a regular expression matching all the channel names starting with MRU_R, MRU_P, or MRU_H.

>>> names_labels_and_groups = header_object.set_column_selection(
...    filter_list=["MRU_[RPH]"])
>>> header_object.import_data()
>>> print(tabulate(header_object.data.head(5), headers="keys", tablefmt="psql"))
+----------------------------+------------+-------------+-------------+
| DateTime                   |   MRU_Roll |   MRU_Heave |   MRU_Pitch |
|----------------------------+------------+-------------+-------------|
| 2011-02-25 23:30:00        |    0.01207 |     -0.1051 |  -0.0001869 |
| 2011-02-25 23:30:00.040000 |    0.01207 |     -0.1051 |  -0.0001869 |
| 2011-02-25 23:30:00.080000 |    0.01207 |     -0.1051 |  -0.0001869 |
| 2011-02-25 23:30:00.120000 |    0.01204 |     -0.1078 |  -0.0002593 |
| 2011-02-25 23:30:00.160000 |    0.01204 |     -0.1078 |  -0.0002593 |
+----------------------------+------------+-------------+-------------+

Since all data is stored in the Pandas Dataframe header_object.data we can plot the data using all Pandas/matplotlib plotting capabilities. This is demonstrated in the example notebook.

import_data(set_relative_time_column=None)[source]

Import the binary data from the dta file

Parameters:

set_relative_time_column (bool or None, optional) – If true, store the relative time in the time_r column. Default is None, which means that the value as stored during initialization of the class is taken. This is False by default, but can also be passed through the constructor arguments.

import_header(mdf_file)[source]

Read the header data from the mdf file

Parameters:

mdf_file – the name of the mdf header file

Returns:

nothing

Return type:

type

make_report(show_loaded_data_only=False)[source]

Make a report of the records available in the mdf file

Parameters:

show_loaded_data_only (bool, optional) – If True, only show the data columns that have been loaded. Default = False, which means that all channels are shown

Returns:

List of the reported columns. We can use this list to obtain the channel name by the index

Return type:

list

set_column_selection(filter_list, set_on_exclude_list=False, include_date_time=None)[source]

Select the data to import based on a list of regular expressions

Parameters:
  • filter_list (list) – A list of regular expressions in which the first filter is always applied to the name field and the next filters are all applied to the label field of the record.

  • set_on_exclude_list (bool, optional) – By default, the selected columns are added to the include list. If this value is true, set the selection on the excluded list. Defaults to False

  • include_date_time (bool, optional) – Include the date time field by default (without specification in the filter_list). Handy for the examples as you don’t have to specify the DateTime explicitly. Defaults to None, implying that the setting is taken from the constructor and is set to False.

Returns:

Selection of name columns along with a list of the () group selection

Return type:

tuple (name_list, label_list, group_list)

Notes

The data reader accepts lists of exclude_columns and include_columns with which you can select which columns are actually read. With this routine, these lists can be created from a regular expression filter.
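Selecting columns by regular expression can be sketched with the standard re module. The channel names MRU_Roll, MRU_Heave and MRU_Pitch are taken from the examples above, while GPS_Speed and the helper select_columns are hypothetical; whether the library anchors patterns at the start of the name, as re.match does here, is an assumption.

```python
import re

# MRU_* names appear in the examples above; GPS_Speed is invented for contrast.
channel_names = ["MRU_Roll", "MRU_Heave", "MRU_Pitch", "GPS_Speed"]


def select_columns(names, pattern):
    """Return the names that match the regular expression from the start."""
    regex = re.compile(pattern)
    return [name for name in names if regex.match(name)]


print(select_columns(channel_names, "MRU_[RPH]"))
# ['MRU_Roll', 'MRU_Heave', 'MRU_Pitch']
```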

mdf_reader.mdf_parser.convert_ymdhms_to_data_time(ymdhrs_array, sample_rate=1, constant_sample_rate=True)[source]

Convert the binary year month day hour minutes seconds representation into a readable date/time string

Parameters:
  • ymdhrs_array (binary) – array with the ymdhrs datetime integers

  • sample_rate (float, optional) – the sampling rate of the signal (Default value = 1)

  • constant_sample_rate (bool, optional) – If True, assume that the sample rate is leading (Default value = True)

Returns:

DateTime pandas index array

Return type:

type

Notes

In the first version of the script, the ymdhrs value was taken as leading, and the number of samples per second was corrected to account for missing or surplus samples within a second. It appears that the sample rate is really constant and that the clock time may vary. Setting this flag to True makes the sample rate leading.
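When the sample rate is leading, the timestamps follow from the start time and the sample spacing alone. The sketch below illustrates this with the standard datetime module; the helper name is hypothetical, and the 25 Hz rate is inferred from the 0.04 s spacing in the example tables above rather than from the source code.

```python
from datetime import datetime, timedelta


def constant_rate_times(start: datetime, sample_rate: float, n_samples: int):
    """Return n_samples timestamps spaced 1 / sample_rate seconds apart."""
    return [start + timedelta(seconds=i / sample_rate) for i in range(n_samples)]


times = constant_rate_times(datetime(2011, 2, 25, 23, 30, 0), 25.0, 3)
print(times[1].isoformat(sep=" "))  # 2011-02-25 23:30:00.040000
```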

mdf_reader.mdf_parser.decode_ymdhms(ymdhms)[source]

The year month day hour minute seconds are stored in the 4-byte integer

Parameters:

ymdhms (int) – A 4-byte integer containing the date time according to the MDF manual

Returns:

The ISO date time string

Return type:

type

mdf_reader.mdf_parser.main(args)[source]

The main routine for testing purposes

Parameters:

args (list) – Command line arguments

mdf_reader.mdf_parser.parse_args(args)[source]

Parse command line parameters

Parameters:

args (list) – Command line parameters as a list of strings

Returns:

command line parameters

Return type:

argparse.Namespace

mdf_reader.mdf_parser.run()[source]

Module contents