Checking empty descriptions¶
In this example, we use fastobo to create a small validation script which will report empty definitions in an OBO file. We also use requests to connect to the OBO library.
[1]:
import fastobo
import requests
fastobo.load takes a file-handle, which can be accessed using the raw property of the Response object returned by requests.get:
[2]:
res = requests.get("http://purl.obolibrary.org/obo/ms.obo", stream=True)
doc = fastobo.load(res.raw)
Header¶
Now, we can check the header for empty descriptions in definition clauses:
[3]:
for clause in doc.header:
    if isinstance(clause, fastobo.header.SynonymTypedefClause) and not clause.description:
        print("Empty description in definition of", clause.typedef)
    elif isinstance(clause, fastobo.header.SubsetdefClause) and not clause.description:
        print("Empty description in definition of", clause.subset)
Note that we are using isinstance a lot compared to what you may be used to in other Python libraries: this is because fastobo is based on a strongly-typed Rust library, and that typing is reflected in the Python library that wraps it. We could use the strong typing to write the same snippet with type-specific callbacks stored in a dict, keyed by clause type:
[4]:
def check_synonym_typedef(clause):
    if not clause.description:
        print("Empty description in definition of", clause.typedef, "in header")

def check_subsetdef(clause):
    if not clause.description:
        print("Empty description in definition of", clause.subset, "in header")

# Map each clause type to the function that validates it.
CALLBACKS = {
    fastobo.header.SynonymTypedefClause: check_synonym_typedef,
    fastobo.header.SubsetdefClause: check_subsetdef,
}

for clause in doc.header:
    # Clause types without a registered callback are skipped.
    callback = CALLBACKS.get(type(clause))
    if callback is not None:
        callback(clause)
Such a construct can be used to process all possible clauses while reducing the number of if/elif branches, in particular when many different clauses are processed at the same time.
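To try the dispatch pattern without downloading an ontology, here is a minimal sketch that runs on stand-in objects. The Fake* classes and the check_header helper are hypothetical and are not part of fastobo; they only mimic the typedef, subset and description attributes used above. The callbacks return a message instead of printing it, which makes the result easier to inspect.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for fastobo header clauses: they only mimic the
# attributes used in this example and are NOT part of the fastobo API.
@dataclass
class FakeSynonymTypedefClause:
    typedef: str
    description: str


@dataclass
class FakeSubsetdefClause:
    subset: str
    description: str


@dataclass
class FakeOtherClause:
    value: str


def check_synonym_typedef(clause):
    """Return a message if the synonym typedef has an empty description."""
    if not clause.description:
        return f"Empty description in definition of {clause.typedef}"
    return None


def check_subsetdef(clause):
    """Return a message if the subset definition has an empty description."""
    if not clause.description:
        return f"Empty description in definition of {clause.subset}"
    return None


# One entry per clause type: the keys must all be distinct.
CALLBACKS = {
    FakeSynonymTypedefClause: check_synonym_typedef,
    FakeSubsetdefClause: check_subsetdef,
}


def check_header(header):
    """Run the callback registered for each clause type and collect messages."""
    messages = []
    for clause in header:
        callback = CALLBACKS.get(type(clause))
        if callback is None:
            continue  # no check registered for this clause type
        message = callback(clause)
        if message is not None:
            messages.append(message)
    return messages


# Made-up header content, for illustration only.
header = [
    FakeSynonymTypedefClause("EXAMPLE_SYN", ""),
    FakeSubsetdefClause("documented_subset", "A documented subset"),
    FakeSubsetdefClause("undocumented_subset", ""),
    FakeOtherClause("1.4"),
]

for message in check_header(header):
    print(message)
```

One design point to keep in mind: looking up type(clause) in the dict matches the exact class only, whereas isinstance also accepts subclasses. For a flat set of clause types this makes no difference.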
Entities¶
Checking for definitions in entity frames is straightforward: all definition clauses have a definition
property that returns the textual definition of the entity. We can use duck-typing here to check for empty definitions:
[5]:
for frame in doc:
    for clause in frame:
        try:
            if not clause.definition:
                print("Empty definition of", frame.id)
        except AttributeError:
            pass
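A possible variation on the duck-typing loop above uses getattr with a sentinel instead of try/except, so that an unrelated AttributeError raised elsewhere in the loop body is not silently swallowed. The sketch below runs on stand-in objects built with types.SimpleNamespace rather than on real fastobo frames; the empty_definition_ids helper, the (id, clauses) pairs and the EX: identifiers are all hypothetical.

```python
from types import SimpleNamespace

# Sentinel that distinguishes "no definition attribute" from "empty definition".
_MISSING = object()


def empty_definition_ids(frames):
    """Return the ids of frames holding a clause with an empty definition.

    `frames` is an iterable of (frame_id, clauses) pairs. This layout is a
    simplification for the sketch and is not how fastobo exposes frames.
    """
    ids = []
    for frame_id, clauses in frames:
        for clause in clauses:
            definition = getattr(clause, "definition", _MISSING)
            if definition is _MISSING:
                continue  # not a definition clause
            if not definition:
                ids.append(frame_id)
    return ids


# Made-up frames: the first has an empty definition, the second a filled one.
frames = [
    ("EX:0000001", [SimpleNamespace(name="first term"), SimpleNamespace(definition="")]),
    ("EX:0000002", [SimpleNamespace(definition="A properly defined term.")]),
]

print(empty_definition_ids(frames))
```

With real fastobo frames, the same getattr call could replace the try/except block inside the loop shown earlier.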