Reporting is often seen as tedious and time-consuming. But if you want others to use what you have built, there is no way around it.
Let’s say you’ve trained some exciting new model and want to share it with the world. Just uploading it to GitHub without any documentation won’t do the trick: no one will understand what the model’s purpose is, what its limitations are, and so on. The same goes for datasets.
Here is a quick and distilled list of references for how to report on your machine learning “artifacts”:
Use Model Cards to report on your model. Used by HuggingFace and many others, they are already well established. Model cards allow others to understand and reuse your model (a small sketch of what a minimal card can look like follows after this list).
Datasheets for Datasets is a reporting approach for datasets. A datasheet is a checklist that guides you through all the relevant questions: the motivation for creating the dataset, its composition, the collection process, and so on.
Use the REFORMS checklist when reporting on research based on machine learning, typically when writing a paper. REFORMS is agnostic to the field of application, so it’s a rather broad checklist.
For more “special cases”:
Writing a paper on a clinical prediction model? Use the TRIPOD+AI checklist.
Writing a paper on medical imaging? Use CLAIM, the Checklist for Artificial Intelligence in Medical Imaging.
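To make the model card idea concrete, here is a minimal sketch of creating one programmatically with the huggingface_hub library. The model name, tags, and section contents are placeholders, and writing the README.md by hand works just as well:

```python
# Minimal sketch: a model card is a README.md with YAML metadata on top,
# followed by Markdown sections describing the model.
# All names, tags, and section texts below are placeholders.
from huggingface_hub import ModelCard

content = """---
language: en
license: mit
tags:
- text-classification
---

# my-exciting-model

## Intended use
Describe what the model is for and who should (and should not) use it.

## Training data
Summarize the data the model was trained on.

## Limitations
Document known limitations and out-of-scope uses.
"""

card = ModelCard(content)  # parses the YAML metadata and Markdown body
card.save("README.md")     # saved next to the model weights and shared with them
```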
Are there any checklists or guidelines you have been using for your project?
Update on ML for Science book
Reporting will also be one of the chapters in our new book, Supervised Machine Learning for Science. I’m working on this chapter while Timo is working on the Uncertainty chapter. Once these two chapters are done, we’ll have all the chapters we want to include in the first edition. 🥳
The next step will be “fine-tuning”: editing, proofreading, and making sure everything looks good. We plan to finish the book this autumn and have the paperback, EPUB, and PDF versions ready. But you can already check out the in-progress version for free online: ml-science-book.com