pepys_import.file.highlighter.support package¶
Submodules¶
pepys_import.file.highlighter.support.char module¶
- class pepys_import.file.highlighter.support.char.Char(letter)[source]¶
Bases:
object
Object used to store information on a specific character.
Stores the character letter itself, plus a list of usages of the character.
A list of these is kept in HighlightedFile.chars (and also available through SubToken.chars), and iterating through this list is used to create the final highlighted file.
- letter¶
- usages¶
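The structure described above can be sketched as a minimal stand-in (illustrative only; the attribute names follow the listing above, but this is not the library's implementation):

```python
class Char:
    """Minimal sketch of the Char structure described above."""

    def __init__(self, letter):
        self.letter = letter  # the character itself
        self.usages = []      # usage records appended as importers extract data


# A file's text becomes one Char per character; producing the highlighted
# output iterates over this list
chars = [Char(c) for c in "HELLO"]
chars[0].usages.append(("NMEA Importer/speed", "Value: 5 knots"))
print(chars[0].letter, len(chars[0].usages))  # → H 1
```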
pepys_import.file.highlighter.support.color_picker module¶
- pepys_import.file.highlighter.support.color_picker.color_for(hash_code, color_dict)[source]¶
Get a color for a specific ‘hash code’ by either taking one we’ve already recorded for this hash code, or generating a new random one.
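The caching behaviour described here can be sketched as follows (an illustrative re-implementation, not the library's code; the random RGB scheme is an assumption):

```python
import random


def color_for(hash_code, color_dict):
    """Return the colour already recorded for this hash code, or generate
    and record a new random RGB tuple (sketch of the behaviour above)."""
    if hash_code not in color_dict:
        color_dict[hash_code] = tuple(random.randint(0, 255) for _ in range(3))
    return color_dict[hash_code]


colors = {}
first = color_for("NMEA Importer/speed", colors)
second = color_for("NMEA Importer/speed", colors)
print(first == second)  # → True: the same hash code always gets the same colour
```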
pepys_import.file.highlighter.support.combine module¶
pepys_import.file.highlighter.support.export module¶
- pepys_import.file.highlighter.support.export.export_report(filename, chars, dict_colors, include_key=False)[source]¶
Export a HTML report showing all the extraction usages for the file.
- Parameters
filename – Output filename
chars – Characters array (should be HighlightedFile.chars)
dict_colors – Dictionary specifying colors to use (should be HighlightedFile.dict_colors)
include_key – Whether to include a key at the bottom defining the usages of the colors
This loops through all of the characters in the characters array, creating the relevant <span> tags for each character based on the usages stored for that character.
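That per-character loop can be sketched like this (illustrative only; the real export_report writes a complete HTML report, and the colour lookup shown here is an assumption):

```python
import html
from types import SimpleNamespace


def chars_to_html(chars, dict_colors):
    """Wrap each character that has usages in a <span> styled with the
    colour for its first usage (sketch of the loop described above)."""
    parts = []
    for char in chars:
        if char.usages:
            tool_field = char.usages[0][0]
            color = dict_colors.get(tool_field, "#ffff00")
            parts.append(
                f'<span style="background: {color}" '
                f'title="{html.escape(tool_field)}">{html.escape(char.letter)}</span>'
            )
        else:
            parts.append(html.escape(char.letter))
    return "".join(parts)


# Stand-in for HighlightedFile.chars: one object per character
chars = [SimpleNamespace(letter=c, usages=[]) for c in "A,5"]
chars[2].usages.append(("Importer/speed", "Value: 5"))
print(chars_to_html(chars, {"Importer/speed": "#ffcc00"}))
```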
pepys_import.file.highlighter.support.line module¶
- class pepys_import.file.highlighter.support.line.Line(list_of_subtokens, hf_instance)[source]¶
Bases:
object
Object representing a line from a HighlightedFile.
Has methods to get a list of Tokens in the line, and to record a usage of the whole line.
- CSV_TOKENISER = '(?:,"|^")(""|[\\w\\W]*?)(?=",|"$)|(?:,(?!")|^(?!"))([^,]*?)(?=$|,)|(\\r\\n|\\n)'¶
- QUOTED_NAME_REGEX = '([\\"\'])(?:(?=(\\\\?))\\2.)*?\\1'¶
- WHITESPACE_TOKENISER = '\\S+'¶
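These constants can be tried directly with Python's re module. For example, QUOTED_NAME_REGEX (copied verbatim from the constant above) matches an entire quoted string, with escaped quotes handled via the look-ahead group:

```python
import re

# Copied from the QUOTED_NAME_REGEX class constant above
QUOTED_NAME_REGEX = r"""([\"'])(?:(?=(\\?))\2.)*?\1"""

match = re.search(QUOTED_NAME_REGEX, 'NAME "HMS Example" 12.5')
print(match.group())  # → "HMS Example"

# An escaped quote inside the string does not end the match
match = re.search(QUOTED_NAME_REGEX, r'"a \" b" rest')
print(match.group())
```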
- children¶
- highlighted_file¶
- record(tool: str, field: str, value: str, units: Optional[str] = None)[source]¶
Record a usage of the whole line
- Parameters
tool – Name of the importer handling the import (e.g. “NMEA Importer”). Should be set to self.name when called from an importer
field – The field that the token is being interpreted as (e.g. “speed”)
value – The parsed value of the token (e.g. “5 knots”). Where possible, pass a Quantity object with associated units
units – The units that the field was interpreted as using (optional; do not include if the value is a Quantity, as that holds unit information itself)
Adds a SingleUsage object to each of the relevant characters in the char array referenced by each SubToken child.
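The mechanism described here can be sketched as follows. The Char and SingleUsage stand-ins mirror the attribute lists documented on this page, but record_span and its span handling are illustrative assumptions, not the library's code:

```python
class Char:
    def __init__(self, letter):
        self.letter = letter
        self.usages = []


class SingleUsage:
    """Stand-in for the usages-module class documented below."""

    def __init__(self, tool_field, message):
        self.tool_field = tool_field
        self.message = message


def record_span(chars, start, end, tool, field, value):
    """Attach a SingleUsage to every character covered by a span
    (illustrative sketch of what record() is described as doing)."""
    usage = SingleUsage(f"{tool}/{field}", f"Value: {value}")
    for char in chars[start:end]:
        char.usages.append(usage)


chars = [Char(c) for c in "SPEED 5"]
record_span(chars, 6, 7, "NMEA Importer", "speed", "5 knots")
print([c.letter for c in chars if c.usages])  # → ['5']
```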
- property text¶
Returns the entire text of the Line
- Returns
Entire text content of the Line
- Return type
String
- tokens(reg_exp='\\S+', strip_char='', quoted_name='([\\"\'])(?:(?=(\\\\?))\\2.)*?\\1')[source]¶
Generates a list of Token objects for each token in the line.
- Parameters
reg_exp (String, optional) – Regular expression used to split the line into tokens. Useful constants are defined in this class, including CSV_TOKENISER, defaults to WHITESPACE_TOKENISER. See notes below.
strip_char (String, optional) – Characters to strip after splitting, defaults to “”
quoted_name (String, optional) – Regular expression used to match quoted strings so that they are treated as single tokens, defaults to QUOTED_NAME_REGEX
- Returns
List of Token objects
- Return type
List
Notes: The reg_exp given to this function should be a regular expression that extracts the individual tokens from the line, not one that identifies the characters to split by. Thus, the WHITESPACE_TOKENISER regex is simply \S+, which matches a run of one or more non-whitespace characters. The CSV_TOKENISER is more complex, as it deals with quotes and other issues that cause problems in CSV files. The regular expression can use groups, but the entire match of the regular expression should be the token - there is currently no capacity for extracting particular groups of the regular expression. Look-ahead and look-behind expressions can be used in the regex to constrain it so that the entire match covers just the token and nothing else. (For a good example of this, see the SLASH_TOKENISER in the Nisida importer.)
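These notes can be demonstrated with Python's re module. The whitespace tokeniser behaves as documented; the slash-separated pattern below is a hypothetical illustration of the look-ahead technique, not the actual SLASH_TOKENISER from the Nisida importer:

```python
import re

line = "100112 120800 SUBJECT VC 60 23 40.25 N"

# WHITESPACE_TOKENISER: each whole match is itself a token
tokens = [m.group() for m in re.finditer(r"\S+", line)]
print(tokens[:3])  # → ['100112', '120800', 'SUBJECT']

# A look-ahead keeps the delimiter out of the match, so the entire
# match still covers just the token (hypothetical slash-separated data)
slash_tokens = [m.group() for m in re.finditer(r"[^/]+(?=/|$)", "12.5/N/60")]
print(slash_tokens)  # → ['12.5', 'N', '60']
```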
pepys_import.file.highlighter.support.test_utils module¶
- pepys_import.file.highlighter.support.test_utils.create_test_line_object(line_str)[source]¶
Create a Line object for the given string, with all of the relevant member variables set correctly, so that it can be passed to any code expecting a Line object.
Used for tests, particularly of the REP line parser
pepys_import.file.highlighter.support.token module¶
- class pepys_import.file.highlighter.support.token.SubToken(span, text, line_start, chars)[source]¶
Bases:
object
Object representing a single token at a lower level than Token.
Usually there is a single SubToken object as a child of each Token object, but when tokens are combined (with the combine_tokens function) then there will be multiple SubToken children.
Each SubToken object keeps track of the span (start and end characters) of the SubToken, the text that is contained within the SubToken, the character index that the line starts at and a reference to the overall character array created by HighlightedFile.
- chars¶
- line_start¶
- span¶
- text¶
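A minimal sketch of this structure follows; the absolute-index calculation is an assumption based on the description of span and line_start above, and absolute_span is a hypothetical helper, not a documented method:

```python
class SubToken:
    """Minimal stand-in mirroring the attributes documented above."""

    def __init__(self, span, text, line_start, chars):
        self.span = span              # (start, end) offsets within the line
        self.text = text              # the text covered by this SubToken
        self.line_start = line_start  # index of the line's first character in the file
        self.chars = chars            # reference to the file-wide character array

    def absolute_span(self):
        # Assumed mapping: line-relative offsets plus the line's start index
        # give positions into the shared chars array
        start, end = self.span
        return self.line_start + start, self.line_start + end


sub = SubToken((6, 13), "SUBJECT", 100, chars=[])
print(sub.absolute_span())  # → (106, 113)
```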
- class pepys_import.file.highlighter.support.token.Token(list_of_subtokens, hf_instance)[source]¶
Bases:
object
Object representing a single token extracted from a Line.
This is the main object that the user will interact with, running the record method to record that this token has been used for a specific purpose.
The children of this token are SubToken objects. Most of the time there will just be one SubToken object as a child of a Token object - however, when tokens are combined there can be multiple children.
- children¶
- highlighted_file¶
- record(tool: str, field: str, value: str, units: Optional[str] = None)[source]¶
Record the usage of this token for a specific purpose
- Parameters
tool – Name of the importer handling the import (e.g. “NMEA Importer”). Should be set to self.name when called from an importer
field – The field that the token is being interpreted as (e.g. “speed”)
value – The parsed value of the token (e.g. “5 knots”). Where possible, pass a Quantity object with associated units
units – The units that the field was interpreted as using (optional; do not include if the value is a Quantity, as that holds unit information itself)
This adds SingleUsage objects to each of the relevant characters in the character array stored by the SubToken objects that are children of this object.
- property text¶
Returns the entire text of the Token
- Returns
Entire text content of the Token
- Return type
String
- property text_space_separated¶
Returns the entire text of the Token, with spaces separating the different subtokens
- Returns
Entire text content of the Token, with subtokens separated by spaces
- Return type
String
pepys_import.file.highlighter.support.usages module¶
- class pepys_import.file.highlighter.support.usages.SingleUsage(tool_field, message)[source]¶
Bases:
object
Stores information on a single usage of a character.
Contains two fields: tool_field and message.
Objects created from this class are stored in the usages list on Char objects.
- message¶
- tool_field¶