pepys_import.file.highlighter.support package

Submodules

pepys_import.file.highlighter.support.char module

class pepys_import.file.highlighter.support.char.Char(letter)[source]

Bases: object

Object used to store information on a specific character.

Stores the character letter itself, plus a list of usages of the character.

A list of these is kept in HighlightedFile.chars (and also available through SubToken.chars), and iterating through this list is used to create the final highlighted file.

letter
usages

pepys_import.file.highlighter.support.color_picker module

pepys_import.file.highlighter.support.color_picker.color_for(hash_code, color_dict)[source]

Get a color for a specific ‘hash code’ by either taking one we’ve already recorded for this hash code, or generating a new random one.
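The cache-or-generate behaviour described above can be sketched as follows. This is an illustrative stand-in, not the library's implementation; the function name and the RGB-triple representation are assumptions.

```python
import random

def color_for_sketch(hash_code, color_dict):
    # Return the color already recorded for this hash code, if any;
    # otherwise generate a new random RGB triple and record it.
    if hash_code not in color_dict:
        color_dict[hash_code] = tuple(random.randint(0, 255) for _ in range(3))
    return color_dict[hash_code]
```

Because the dictionary is updated in place, repeated calls with the same hash code return the same color.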

pepys_import.file.highlighter.support.color_picker.html_color_for(rgb)[source]

Convert a 3-element RGB structure to an HTML color definition

pepys_import.file.highlighter.support.color_picker.mean_color_for(color_arr)[source]

Find the mean of the provided colors.

Args:

color_arr: list of colors to average, each a three-element (R, G, B) structure
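Minimal sketches of the two helpers above, assuming colors are (R, G, B) triples in the 0-255 range; the `_sketch` names mark these as illustrations rather than the library's actual code.

```python
def html_color_for_sketch(rgb):
    # Format an (R, G, B) triple as an HTML hex color string.
    return "#%02x%02x%02x" % tuple(rgb)

def mean_color_for_sketch(color_arr):
    # Average each channel (R, G, B) across the supplied colors,
    # using integer division.
    n = len(color_arr)
    return tuple(sum(color[i] for color in color_arr) // n for i in range(3))
```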

pepys_import.file.highlighter.support.combine module

pepys_import.file.highlighter.support.combine.combine_tokens(*tokens)[source]

Combine multiple tokens into one new Token, so that one single usage can be given for these tokens.
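The combining step can be sketched like this: the children of every input token are gathered into one new token, so a single `record()` call on the result covers them all. `TokenSketch` is a hypothetical stand-in for the real Token class.

```python
class TokenSketch:
    # Minimal stand-in for Token: just holds a list of SubToken children.
    def __init__(self, children):
        self.children = children

def combine_tokens_sketch(*tokens):
    # Gather the children of every input token into one new token, so a
    # single usage can be recorded against all of them at once.
    combined = []
    for token in tokens:
        combined.extend(token.children)
    return TokenSketch(combined)
```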

pepys_import.file.highlighter.support.export module

pepys_import.file.highlighter.support.export.export_report(filename, chars, dict_colors, include_key=False)[source]

Export an HTML report showing all the extraction usages for the file.

Parameters
  • filename – Output filename

  • chars – Characters array (should be HighlightedFile.chars)

  • dict_colors – Dictionary specifying colors to use (should be HighlightedFile.dict_colors)

  • include_key – Whether to include a key at the bottom defining the usages of the colors

This loops through all of the characters in the characters array, creating the relevant <span> tags for each character based on the usages stored for that character.
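The per-character span-generation loop can be sketched as below. The data shapes are simplified assumptions: each character is a (letter, usages) pair rather than a full Char object, and usages are plain tool_field strings.

```python
def render_spans_sketch(chars, dict_colors):
    # chars: list of (letter, usages) pairs, where usages is a list of
    # tool_field strings; dict_colors maps tool_field -> HTML color.
    parts = []
    for letter, usages in chars:
        if usages:
            # Highlight used characters with the color for their first usage.
            color = dict_colors.get(usages[0], "#ffff00")
            parts.append(
                '<span style="background-color: {}">{}</span>'.format(color, letter)
            )
        else:
            parts.append(letter)
    return "".join(parts)
```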

pepys_import.file.highlighter.support.line module

class pepys_import.file.highlighter.support.line.Line(list_of_subtokens, hf_instance)[source]

Bases: object

Object representing a line from a HighlightedFile.

Has methods to get a list of Tokens in the line, and to record a usage of the whole line.

CSV_TOKENISER = '(?:,"|^")(""|[\\w\\W]*?)(?=",|"$)|(?:,(?!")|^(?!"))([^,]*?)(?=$|,)|(\\r\\n|\\n)'
QUOTED_NAME_REGEX = '([\\"\'])(?:(?=(\\\\?))\\2.)*?\\1'
WHITESPACE_TOKENISER = '\\S+'
children
highlighted_file
record(tool: str, field: str, value: str, units: Optional[str] = None)[source]

Record a usage of the whole line

Parameters
  • tool – Name of the importer handling the import (e.g. "NMEA Importer"). Should be set to self.name when called from an importer

  • field – The field that the token is being interpreted as (e.g. "speed")

  • value – The parsed value of the token (e.g. "5 knots"); where possible, pass a Quantity object with associated units

  • units – The units that the field was interpreted as using (optional; do not include if the value was a Quantity, as that holds unit information itself)

Adds a SingleUsage object to each of the relevant characters in the char array referenced by each SubToken child.
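The usage-recording step described above can be sketched as follows, with simplified stand-in structures: subtokens become (start, end) index pairs and each Char becomes a dict with a "usages" list. The tool_field and message formats are assumptions for illustration.

```python
def record_sketch(subtokens, chars, tool, field, value, units=None):
    # subtokens: list of (start, end) index pairs into the chars array.
    # chars: list of dicts, each with a "usages" list (stand-in for Char).
    tool_field = "{}/{}".format(tool, field)
    if units is None:
        message = "Value: {}".format(value)
    else:
        message = "Value: {} {}".format(value, units)
    # Attach a usage record to every character covered by each subtoken.
    for start, end in subtokens:
        for i in range(start, end):
            chars[i]["usages"].append((tool_field, message))
```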

property text

Returns the entire text of the Line

Returns

Entire text content of the Line

Return type

String

tokens(reg_exp='\\S+', strip_char='', quoted_name='([\\"\'])(?:(?=(\\\\?))\\2.)*?\\1')[source]

Generates a list of Token objects, one for each token in the line.

Parameters
  • reg_exp (String, optional) – Regular expression used to split the line into tokens. Useful constants are defined in this class, including CSV_TOKENISER, defaults to WHITESPACE_TOKENISER. See notes below.

  • strip_char (String, optional) – Characters to strip after splitting, defaults to “”

Returns

List of Token objects

Return type

List

Notes: The reg_exp given to this function should be a regular expression that extracts the individual tokens from the line, not one that identifies the characters to split by. Thus, the WHITESPACE_TOKENISER regex is simply \S+, which matches any run of non-whitespace characters. The CSV_TOKENISER is more complex, as it deals with quotes and other issues that cause problems in CSV files. The regular expression can use groups, but the entire match of the regular expression should be the token; there is currently no capacity for extracting particular groups of the regular expression. Look-ahead and look-behind expressions can be used in the regex to constrain it so that the entire match covers just the token and nothing else. (For a good example of this, see the SLASH_TOKENISER in the Nisida importer.)
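The "entire match is the token" convention can be demonstrated with the whitespace tokeniser; the sample line below is invented for illustration.

```python
import re

WHITESPACE_TOKENISER = r"\S+"

line = "100112 115800 SUBJECT VC 60.0"
# Each token is the entire match of the regex, not a captured group.
tokens = [m.group(0) for m in re.finditer(WHITESPACE_TOKENISER, line)]
```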

pepys_import.file.highlighter.support.test_utils module

class pepys_import.file.highlighter.support.test_utils.FakeDatafile[source]

Bases: object

pepys_import.file.highlighter.support.test_utils.create_test_line_object(line_str)[source]

Create a Line object for the given string, with all of the various member variables set properly, so that it can be passed to code expecting a Line object and work correctly.

Used for tests, particularly of the REP line parser

pepys_import.file.highlighter.support.test_utils.delete_entries(d, keys_to_delete)[source]

pepys_import.file.highlighter.support.token module

class pepys_import.file.highlighter.support.token.SubToken(span, text, line_start, chars)[source]

Bases: object

Object representing a single token at a lower level than Token.

Usually there is a single SubToken object as a child of each Token object, but when tokens are combined (with the combine_tokens function) then there will be multiple SubToken children.

Each SubToken object keeps track of the span (start and end characters) of the SubToken, the text that is contained within the SubToken, the character index that the line starts at and a reference to the overall character array created by HighlightedFile.

chars
end()[source]

Returns the index into the character array that this SubToken ends at

line_start
span
start()[source]

Returns the index into the character array that this SubToken starts at

text
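The span arithmetic described above can be sketched as follows, assuming span holds (start, end) offsets relative to the line and line_start is the line's offset into the whole-file character array; this is an illustration, not the library's implementation.

```python
class SubTokenSketch:
    # Minimal sketch of SubToken, under the assumptions stated above.
    def __init__(self, span, text, line_start):
        self.span = span
        self.text = text
        self.line_start = line_start

    def start(self):
        # Index into the overall character array where this token begins.
        return self.line_start + self.span[0]

    def end(self):
        # Index into the overall character array where this token ends.
        return self.line_start + self.span[1]
```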
class pepys_import.file.highlighter.support.token.Token(list_of_subtokens, hf_instance)[source]

Bases: object

Object representing a single token extracted from a Line.

This is the main object that the user will interact with, running the record method to record that this token has been used for a specific purpose.

The children of this token are SubToken objects. Most of the time there will just be one SubToken object as a child of a Token object - however, when tokens are combined there can be multiple children.

children
highlighted_file
record(tool: str, field: str, value: str, units: Optional[str] = None)[source]

Record the usage of this token for a specific purpose

Parameters
  • tool – Name of the importer handling the import (e.g. "NMEA Importer"). Should be set to self.name when called from an importer

  • field – The field that the token is being interpreted as (e.g. "speed")

  • value – The parsed value of the token (e.g. "5 knots"); where possible, pass a Quantity object with associated units

  • units – The units that the field was interpreted as using (optional; do not include if the value was a Quantity, as that holds unit information itself)

This adds SingleUsage objects to each of the relevant characters in the character array stored by the SubToken objects that are children of this object.

property text

Returns the entire text of the Token

Returns

Entire text content of the Token

Return type

String

property text_space_separated

Returns the entire text of the Token, with spaces separating the different subtokens

Returns

Entire text content of the Token

Return type

String

pepys_import.file.highlighter.support.usages module

class pepys_import.file.highlighter.support.usages.SingleUsage(tool_field, message)[source]

Bases: object

Stores information on a single usage of a character.

Contains two fields: tool_field and message.

Objects created from this class are stored in the usages list on Char objects.

message
tool_field
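The relationship between SingleUsage and Char described above can be sketched like this; the `Sketch` classes are simplified stand-ins, and the tool_field and message values are invented examples.

```python
class SingleUsageSketch:
    # One recorded use of a character: which tool/field used it, plus a
    # human-readable message about the parsed value.
    def __init__(self, tool_field, message):
        self.tool_field = tool_field
        self.message = message

class CharSketch:
    # Stand-in for Char: the letter itself plus its accumulated usages.
    def __init__(self, letter):
        self.letter = letter
        self.usages = []

# Recording a usage appends to the relevant Char's usages list:
char = CharSketch("5")
char.usages.append(SingleUsageSketch("NMEA Importer/speed", "Value: 5 knots"))
```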

Module contents