Organization and Documentation
Cookie Cutter
I use cookiecutter, a Python package, to create a directory structure for each data science project.
cookiecutter https://github.com/macadology/cookie-cutter-compbio
IDE
I use Atom.
It combines a Jupyter notebook interface (through the Hydrogen package) with a traditional IDE for scripting. The result is a hybrid that is primarily script-focused, with elements that let you document your code and export Jupyter notebooks with ease. It also has a large collection of packages, so users can add extra functionality where necessary without cluttering the IDE with bloat.
One important feature is the ability to code remotely, which relies on two packages. To edit scripts directly on the cluster, I installed the Remote FTP package. To run code on the remote server, I connect Atom to a remote Jupyter kernel on the server.
Script vs Markdown vs Notebook
Scripts are generally written to run a series of functions. To improve code readability, comment the script where appropriate. A script is the cleanest way to write software because it can be run as is. We can also leverage the linters and debuggers in IDEs to correct a script with ease. Generally, comments in scripts should be short and succinct to preserve code readability. Long comments should be written in a separate document.
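As a rough illustration of the layout described above, here is a minimal sketch of a script that runs a series of small functions with short comments. The function names and the tab-separated sample data are hypothetical and chosen only for the example.

```python
"""Summarize read counts per sample (illustrative script, hypothetical data)."""


def parse_counts(lines):
    # Each line is assumed to look like "sample_name<TAB>count".
    counts = {}
    for line in lines:
        name, value = line.strip().split("\t")
        counts[name] = int(value)
    return counts


def total_reads(counts):
    # Sum the counts across all samples.
    return sum(counts.values())


def main():
    lines = ["sample_a\t120", "sample_b\t80"]
    counts = parse_counts(lines)
    print(f"Total reads: {total_reads(counts)}")


if __name__ == "__main__":
    main()
```

Because everything lives in plain functions, the file can be run directly or imported elsewhere, and any longer explanation can go in a separate document.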
Markdown is a great language for writing documentation quickly. Its best feature is that inline code and code blocks are rendered in the right font with syntax highlighting for very little effort. However, Markdown files generally cannot be run themselves, with the exception of R Markdown files.
Jupyter notebooks break up a script into multiple cells. The cells contain small blocks of code that are meant to be run in order. In addition, notebooks contain Markdown cells that can be used to write extensive documentation. A notebook is essentially a combination of a script and its documentation. For tutorials on how to perform a specific task, there is nothing better than a standalone Jupyter notebook. However, it is inadvisable to integrate code blocks from multiple notebooks (even though there are notebook extensions that allow such functionality). Instead of linking multiple notebooks, write packages with proper docstrings and import them into a notebook. Note that a big disadvantage of .ipynb files is that they are JSON files, which cannot be run as is, unlike scripts. You also need to start a Jupyter notebook server before you can work on a notebook.
Atom + Hydrogen lets us combine the functionality of Jupyter kernels (the backend of Jupyter) and scripts. When writing code in Atom, we can run segments of code easily (not unlike cells) and view their results without having to copy and paste code into a separate interactive shell. Atom is essentially linking the script to a Jupyter kernel / interactive shell and running commands on it. Additionally, we can add Watch panels to track changes to variables and figures in real time as the code is written. The final file is a script written in the native language and not a JSON .ipynb file.
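To make the point about .ipynb files being JSON concrete, the sketch below builds a tiny hand-written notebook in the nbformat 4 layout using only the standard library, then pulls the code cells back out as a plain script. The cell contents and the helper name `code_from_notebook` are made up for illustration; real notebooks carry more metadata.

```python
import json

# A minimal, hand-written notebook in the nbformat 4 layout (illustrative content).
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Load data"]},
        {
            "cell_type": "code",
            "execution_count": None,
            "metadata": {},
            "outputs": [],
            "source": ["x = 1\n", "y = x + 1"],
        },
    ],
}

# What sits on disk in an .ipynb file is this JSON text, not Python source.
ipynb_text = json.dumps(notebook, indent=1)


def code_from_notebook(text):
    """Return the code cells of a notebook JSON string as one script string."""
    nb = json.loads(text)
    chunks = [
        "".join(cell["source"])
        for cell in nb["cells"]
        if cell["cell_type"] == "code"
    ]
    return "\n\n".join(chunks)


script = code_from_notebook(ipynb_text)
print(script)
```

The Python interpreter cannot execute `ipynb_text` directly; only the extracted `script` string is valid Python source.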
While Atom does not render Jupyter notebooks natively, it can export scripts to .ipynb files. By using the markers # %% and # %% md, we can define cells within a script and export them to .ipynb files using Atom. While this is appealing in theory, Atom cannot propagate changes in a script to the exported notebook in real time, which makes this approach less useful than Markdown for documentation. The script also ends up with many lines of clutter from the # %% and # %% md markers.
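For illustration, a script marked up with these cell markers might look like the sketch below. It assumes that `# %% md` opens a Markdown cell whose text is written as comments and that `# %%` opens a code cell; the variable names and numbers are hypothetical. The file remains an ordinary Python script either way, since the markers are just comments.

```python
# %% md
# # Summary statistics
# This cell is assumed to be exported as a Markdown cell.

# %%
import statistics

samples = [2.0, 4.0, 4.0, 5.0]

# %%
mean_value = statistics.mean(samples)
print(mean_value)
```

The marker lines are what the text above refers to as clutter: they add nothing when the file is run as a script.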
The last option for documenting code is to write Markdown files with code blocks and generate .ipynb files using jupytext. The code within the Markdown code blocks is automatically converted to executable code cells in the notebook. The advantage of this method is that we can check the formatting of the documentation in real time. Additionally, we can run the code within the code blocks in Atom as well.
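The actual conversion is done by jupytext (for example, `jupytext --to notebook` followed by the Markdown file name). The standard-library sketch below is not jupytext's implementation; it only illustrates the idea that prose becomes Markdown cells and fenced code blocks become code cells. The helper `split_cells` and the sample text are hypothetical.

```python
# Build the fence marker programmatically so this example can itself sit in a fenced block.
FENCE = "`" * 3

markdown_text = "\n".join([
    "# Demo",
    "Some explanation.",
    FENCE + "python",
    "x = 2",
    "print(x * 3)",
    FENCE,
    "More prose.",
])


def split_cells(text):
    """Split Markdown into ("markdown" | "code", source) pairs, one per cell."""
    cells, buffer, in_code = [], [], False
    for line in text.splitlines():
        if line.startswith(FENCE):
            # A fence line closes the current cell and toggles the cell type.
            if buffer:
                cells.append(("code" if in_code else "markdown", "\n".join(buffer)))
            buffer, in_code = [], not in_code
        else:
            buffer.append(line)
    if buffer:
        cells.append(("code" if in_code else "markdown", "\n".join(buffer)))
    return cells


cells = split_cells(markdown_text)
for kind, source in cells:
    print(kind, repr(source))
```

This simplified splitter ignores nested fences and language tags, which a real converter has to handle.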
Documentation
I use Markdown and mkdocs, combined with GitHub and Read the Docs, to create online documentation for common processes. This guide was created using this exact process.
To serve the documentation locally on 127.0.0.1:8000, activate the conda environment for mkdocs and run mkdocs serve.
conda activate markdown_env
mkdocs serve
If serving the documentation from a remote server, run mkdocs serve with --dev-addr to choose the address and port, and set up port forwarding locally.
# On a remote server
mkdocs serve --dev-addr '0.0.0.0:8829'
# Locally
ssh <username>@<server> -N -L localhost:8829:<node>:8829
# Access site on http://127.0.0.1:8829
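The same port has to appear in the remote command, the SSH tunnel, and the local URL, and it is easy to mistype one of them. As a small sketch, the hypothetical helper below builds all three strings from a single port value; the username, server, and node names in the usage are placeholders, not real hosts.

```python
def forwarding_commands(port, username, server, node):
    """Return the remote command, the local ssh command, and the URL to open."""
    remote = f"mkdocs serve --dev-addr '0.0.0.0:{port}'"
    local = f"ssh {username}@{server} -N -L localhost:{port}:{node}:{port}"
    url = f"http://127.0.0.1:{port}"
    return remote, local, url


# Placeholder values for illustration only.
remote_cmd, local_cmd, url = forwarding_commands(8829, "alice", "login.example.org", "node01")
print(remote_cmd)
print(local_cmd)
print(url)
```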