{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Checking empty descriptions" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this example, we use `fastobo` to create a small validation script which will report empty definitions in an OBO file. We also use `requests` in order to connect to the OBO library." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import fastobo\n", "import requests" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "`fastobo.load` takes a file-handle, which can be accessed using the `raw` property of the `Response` object returned by `requests.get`:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "res = requests.get(\"http://purl.obolibrary.org/obo/ms.obo\", stream=True)\n", "doc = fastobo.load(res.raw)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Header\n", "\n", "Now, we can check the header for empty descriptions in definition clauses: " ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "for clause in doc.header:\n", " if isinstance(clause, fastobo.header.SynonymTypedefClause) and not clause.description:\n", " print(\"Empty description in definition of\", clause.typedef)\n", " elif isinstance(clause, fastobo.header.SubsetdefClause) and not clause.description:\n", " print(\"Empty description in definition of\", clause.subset)\n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Note that we are using `isinstance` a lot compared to what you may be used to in other Python library: this is because `fastobo` is based on a Rust library which is strongly-typed, so that is reflected in the Python library that wraps it. We could use the strong typing to write the same snippet using type-specific callback wrapped in a `dict`:" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "def check_synonym_typedef(clause):\n", " if not clause.description:\n", " print(\"Empty description in definition of\", clause.typedef, \"in header\")\n", "\n", "def check_subsetdef(clause):\n", " if not clause.description:\n", " print(\"Empty description in definition of\", clause.subset, \"in header\")\n", " \n", "CALLBACKS = {\n", " fastobo.header.SynonymTypedefClause: check_synonym_typedef,\n", " fastobo.header.SynonymTypedefClause: check_subsetdef,\n", "}\n", "\n", "for clause in doc.header:\n", " callback = CALLBACKS.get(type(clause))\n", " if callback is not None:\n", " callback(clause)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Such a construct can be used to process all possible clauses while reducing the number of `if`/`elif` branches, in particular when many different clauses are processed at the same time." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Entities" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Checking for definitions in entity frames is straightforward: all definition clauses have a `definition` property that returns the textual definition of the entity. We can use duck-typing here to check for empty definitions:" ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "for frame in doc:\n", " for clause in frame:\n", " try:\n", " if not clause.definition:\n", " print(\"Empty definition of\", frame.id)\n", " except AttributeError:\n", " pass" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.1" } }, "nbformat": 4, "nbformat_minor": 4 }