Python plac.annotations() Examples
The following are 3 code examples of plac.annotations().
Example #1
Source File: simple_training.py, from Blackstone (Apache License 2.0)
import re

def trim_entity_spans(data: list) -> list:
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])
    return cleaned_data

# training data
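To show what the function above actually does, here is a small self-contained usage sketch; the function body is reproduced from Example #1, while the sample sentence and the LEGISLATION label are invented for illustration. The entity span (3, 8) deliberately includes the spaces around "Act", and the function shrinks it to (4, 7):

```python
import re


def trim_entity_spans(data: list) -> list:
    """Shrink each (start, end) entity span so it no longer covers
    leading or trailing whitespace in the underlying text."""
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        valid_entities = []
        for start, end, label in annotations['entities']:
            valid_start, valid_end = start, end
            # Advance past leading whitespace inside the span.
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            # Retreat past trailing whitespace inside the span.
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])
    return cleaned_data


# Invented sample: the span (3, 8) covers " Act " with surrounding spaces.
sample = [("The Act of 1998", {'entities': [(3, 8, 'LEGISLATION')]})]
print(trim_entity_spans(sample))
# → [['The Act of 1998', {'entities': [[4, 7, 'LEGISLATION']]}]]
```

After trimming, the span (4, 7) selects exactly "Act", which is the form spaCy's training pipeline expects, since spans with errant whitespace can cause alignment errors.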
Example #2
Source File: custom_train.py, from pyresparser (GNU General Public License v3.0)
import re

def trim_entity_spans(data: list) -> list:
    """Removes leading and trailing white spaces from entity spans.

    Args:
        data (list): The data to be cleaned in spaCy JSON format.

    Returns:
        list: The cleaned data.
    """
    invalid_span_tokens = re.compile(r'\s')
    cleaned_data = []
    for text, annotations in data:
        entities = annotations['entities']
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            while valid_start < len(text) and invalid_span_tokens.match(
                    text[valid_start]):
                valid_start += 1
            while valid_end > 1 and invalid_span_tokens.match(
                    text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {'entities': valid_entities}])
    return cleaned_data
Example #3
Source File: train_ner.py, from Blackstone (Apache License 2.0)
import re

def trim_entity_spans(data: list) -> list:
    """The training data is derived from sources that have a fair bit
    of errant whitespace. This function takes a list of annotations and
    trims naughty bits of whitespace from the entity spans. Better safe
    than sorry."""
    invalid_span_tokens = re.compile(r"\s")
    cleaned_data = []
    for text, annotations in data:
        entities = annotations["entities"]
        valid_entities = []
        for start, end, label in entities:
            valid_start = start
            valid_end = end
            while valid_start < len(text) and invalid_span_tokens.match(
                text[valid_start]
            ):
                valid_start += 1
            while valid_end > 1 and invalid_span_tokens.match(text[valid_end - 1]):
                valid_end -= 1
            valid_entities.append([valid_start, valid_end, label])
        cleaned_data.append([text, {"entities": valid_entities}])
    return cleaned_data