# Computing MiniProject Rubric

Note: This rubric is derived from the MulQuaBio MiniProject appendix.

The MiniProject asks students to answer the biological question “What mathematical models best fit an empirical dataset?” in a fully reproducible way. Students choose (or are given) an empirical dataset, fit and compare at least two alternative mathematical models (including at least one nonlinear/mechanistic model), and produce a LaTeX report. The entire workflow must be runnable end-to-end.

Summative marking rubric — total = 100 marks (Part A: 50 marks; Part B: 50 marks)


## Part A — Computing & Workflow (50 marks)

### A1: Project organisation & README (10 marks)

What earns full marks:
- MiniProject/ directory at the same level as the Week*/ directories.
- Expected subdirectories (code/, data/, results/) present and correctly populated.
- results/ is empty in the repo (outputs are generated on each run).
- README states language versions, dependencies, and what each package is for.
- Sensible .gitignore; no large binary/output files committed.

Typical reasons for lost marks:
- Missing or misnamed subdirectories.
- results/ contains committed outputs.
- README absent, sparse, or missing required content.
- Large data/output files committed without justification.

### A2: Single-script reproducibility (15 marks)

What earns full marks:
- A single run script (run_MiniProject.py or run_MiniProject.sh) orchestrates the full pipeline: data preparation → model fitting → plotting → LaTeX compilation.
- The script completes without errors in a clean Linux environment.
- All expected outputs (PDF report, results CSVs) are produced.
- Pipeline runtime is reasonable for the submitted dataset and avoids obvious redundant recomputation.

Typical reasons for lost marks:
- Run script absent or empty.
- Pipeline fails with errors (broken paths, missing packages, unhandled exceptions).
- LaTeX compilation step missing or failing.
- Environment-specific issues (hard-coded paths, OS-specific calls).
- Functionally correct pipeline that is avoidably slow due to repeated full recomputation or unnecessary heavy I/O.
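A minimal sketch of what such a run script could look like in Python (the step script names — prepare_data.py, fit_models.R, make_plots.py, compile_report.sh — are illustrative placeholders, not prescribed names):

```python
#!/usr/bin/env python3
"""Sketch of a single-entry pipeline runner. The step scripts named in STEPS
are hypothetical placeholders for a student's own code."""
import subprocess
import sys

STEPS = [
    ["python3", "prepare_data.py"],   # data preparation
    ["Rscript", "fit_models.R"],      # model fitting (e.g. NLLS)
    ["python3", "make_plots.py"],     # figure generation
    ["bash", "compile_report.sh"],    # pdflatex + bibtex passes
]

def run_pipeline(steps):
    """Run each step in order; stop at, and report, the first failure."""
    for cmd in steps:
        print("Running:", " ".join(cmd))
        if subprocess.run(cmd).returncode != 0:
            print("Step failed:", " ".join(cmd), file=sys.stderr)
            return False
    return True

# Entry point: run_pipeline(STEPS) executes the full workflow.
```

Keeping each stage in its own script (and stopping at the first failure) makes it easy for a marker to rerun one stage without repeating the whole pipeline.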

### A3: Code quality & style (10 marks)

What earns full marks:
- Code is readable: meaningful variable/function names, consistent style (PEP 8 for Python, tidyverse/Google style for R).
- Functions used to avoid repetition; scripts modularised by task (e.g. separate data preparation, fitting, and plotting scripts).
- Helpful inline comments explaining logic; docstrings/headers on functions.
- Language choice justified and used appropriately.
- Basic efficiency-aware practices used where appropriate (e.g. vectorisation, avoiding unnecessary loops/reloads, sensible intermediate caching).

Typical reasons for lost marks:
- Meaningless variable names or no comments.
- Monolithic scripts with excessive copy-paste.
- Inconsistent or absent code style.
- Unjustified or inappropriate language choices.
- Clear, avoidable inefficiencies that materially increase runtime or memory use.
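As one concrete illustration of the efficiency practices mentioned above, replacing an explicit Python loop with a vectorised NumPy call gives the same numerical result far faster (the array contents and size here are arbitrary):

```python
import time
import numpy as np

x = np.random.default_rng(1).random(500_000)

# Loop version: accumulate the sum of squares element by element
t0 = time.perf_counter()
total_loop = 0.0
for v in x:
    total_loop += v * v
t_loop = time.perf_counter() - t0

# Vectorised version: a single call into optimised compiled code
t0 = time.perf_counter()
total_vec = float(np.dot(x, x))
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s  vectorised: {t_vec:.4f}s")
```

The same principle applies to reloading data: read a dataset once and pass it between functions, rather than re-reading it inside a loop.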

### A4: Model fitting & statistical analysis (10 marks)

What earns full marks:
- ≥2 mathematical models fitted (at least one nonlinear/mechanistic model via NLLS or equivalent).
- Starting values estimated and documented; convergence failures handled (e.g. try/except in Python, tryCatch in R).
- Model comparison uses appropriate metrics (AIC, BIC, R², etc.).
- Results exported to CSV for downstream analysis and plotting.
- Fitting workflow is computationally sensible (bounded iterations/tolerances where relevant; model scope justified for the dataset size).

Typical reasons for lost marks:
- Only trivial linear models fitted (no NLLS attempted).
- No model comparison metrics computed.
- Starting values absent, or arbitrary with no justification.
- Convergence failures not handled.
- Excessive or unstable fitting strategy, without justification, that causes avoidable compute overhead.
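A sketch of the expected fitting pattern, using synthetic data and SciPy's curve_fit (the exponential model and the starting-value heuristics are illustrative only, not the required models):

```python
import numpy as np
from scipy.optimize import curve_fit

def exp_model(x, a, b):
    """Illustrative mechanistic stand-in: exponential growth."""
    return a * np.exp(b * x)

def aic(n, rss, k):
    """AIC for a least-squares fit with k parameters (incl. the variance)."""
    return n * np.log(rss / n) + 2 * k

# Synthetic data standing in for one empirical curve
rng = np.random.default_rng(0)
x = np.linspace(0, 2, 30)
y = 1.5 * np.exp(0.8 * x) + rng.normal(0, 0.1, x.size)

# Starting values estimated from the data, and documented:
# a ~ value at x = 0; b ~ slope of log(y) against x
p0 = [y[0], np.polyfit(x, np.log(y), 1)[0]]

try:
    popt, _ = curve_fit(exp_model, x, y, p0=p0, maxfev=5000)
    rss_exp = float(np.sum((y - exp_model(x, *popt)) ** 2))
    aic_exp = aic(x.size, rss_exp, k=3)
except RuntimeError:                 # convergence failure handled, not fatal
    aic_exp = float("inf")

# Linear comparison model
coef = np.polyfit(x, y, 1)
rss_lin = float(np.sum((y - np.polyval(coef, x)) ** 2))
aic_lin = aic(x.size, rss_lin, k=3)

# In a real pipeline these values would be written to results/*.csv
print(f"AIC exponential: {aic_exp:.2f}  AIC linear: {aic_lin:.2f}")
```

The try/except around curve_fit is the point: a convergence failure on one curve is recorded (here as an infinite AIC) and the pipeline moves on, rather than crashing.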

### A5: Version control & workflow discipline (5 marks)

What earns full marks:
- Regular commits throughout development, with descriptive messages.
- Git history shows iterative, incremental progress (not bulk end-of-project commits).
- No unnecessary or generated files committed.

Typical reasons for lost marks:
- Generic or absent commit messages.
- A single commit or very few commits (a dump of finished work).
- Generated outputs or data committed without justification.


## Part B — Written Report (50 marks)

The report must be written in LaTeX (article class, 11pt, 1.5-spaced, continuous line numbers, ≤3500 words excluding title page, references, and captions). It must include a separate Title page (title, author, affiliation, word count), Abstract, and sections: Introduction, Methods (with a Computing Tools sub-section), Results, and Discussion. References must use a non-numeric in-text citation format (e.g. apalike) compiled with BibTeX.

Key Principle: The narrative must flow coherently from title through discussion, with hypotheses/questions naturally emerging from biological context rather than appearing disconnected. Display items (4–6 figures/tables) should tell most of the story on their own.

### B1: Report format & presentation (10 marks)

What earns full marks:
- LaTeX article class at 11pt, 1.5-spaced, with continuous line numbers via the lineno package.
- Title page present with title, author, affiliation, and word count.
- Word count ≤3500 (excluding title page, references, and captions).
- All figures/tables have informative captions and legends; vector graphics used where possible.
- References correctly cited in-text (non-numeric format) and formatted via BibTeX.

Typical reasons for lost marks:
- Missing or incorrectly configured LaTeX formatting.
- No word count, or count not tracked.
- Figures low-resolution or unlabelled.
- Bibliography style numeric or missing.
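A minimal preamble skeleton consistent with these requirements might look like the following (the bibliography file name refs.bib is a placeholder; package choices such as setspace are one common way to meet the spacing requirement, not the only one):

```latex
\documentclass[11pt]{article}
\usepackage{setspace}   % for 1.5 line spacing
\usepackage{lineno}     % for continuous line numbers
\usepackage{graphicx}   % figures; prefer vector formats (PDF/EPS)

\begin{document}
\onehalfspacing
\linenumbers
% Title page (title, author, affiliation, word count), then Abstract,
% Introduction, Methods (with a Computing Tools sub-section), Results,
% and Discussion go here.
\bibliographystyle{apalike}  % non-numeric, author-year citations
\bibliography{refs}
\end{document}
```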

### B2: Introduction & objectives (10 marks)

What earns full marks:
- Opens with sufficient biological context, with citations, that motivates the study topic.
- The narrative funnels logically from general context to specific focus, so that the stated hypotheses/questions emerge naturally (not abruptly) by the end.
- Biological question(s) or hypotheses stated clearly and backed by logical/theoretical arguments (if presenting hypotheses, brief explanatory statements help).
- Biological objectives clearly distinguished from methodological ones.
- The chosen modelling approach is justified as appropriate for the biological question.

Typical reasons for lost marks:
- Context too brief, too generic, or disconnected from the study focus.
- Hypotheses stated without logical build-up or theoretical grounding.
- Introduction consists only of methodological aims (“we will fit X models”).
- No citations, or poorly integrated citations.
- Hypotheses appear “out of the blue” rather than emerging naturally from the narrative.

### B3: Methods, including Computing Tools (10 marks)

What earns full marks:
- Data and their provenance clearly described (source, units, how unique datasets/curves are identified).
- Model forms explicitly stated (with equations where relevant); fitting procedures described clearly and reproducibly (starting values/estimation, convergence criteria, comparison metrics).
- A Computing Tools sub-section is mandatory: it explicitly states which languages (bash, Python, R) were used for each task and which packages/libraries were employed, and justifies why each was chosen (e.g. “Python/SciPy was used for NLLS fitting because…”).
- Level of detail is appropriate: the section does not recite code line by line, but gives enough for independent reproduction.

Typical reasons for lost marks:
- Data provenance or description absent.
- Model forms or fitting approach not described clearly enough to reproduce.
- Computing Tools sub-section missing, incomplete, or lacking justification for tool choices.
- Either vastly over-detailed (reciting code) or too vague to reproduce.

### B4: Results & display items (10 marks)

What earns full marks:
- Results presented clearly and in the same logical order as the objectives (Introduction→Results alignment).
- 4–6 well-designed figures/tables, with captions that explain what is shown and convey the take-home messages.
- Model fits plotted over the data; a model comparison summary presented (AIC/BIC table or equivalent).
- No discussion of results in this section.

Typical reasons for lost marks:
- Results not related back to the stated objectives.
- Figures absent, poorly designed, or without meaningful captions.
- Model comparison not shown, or shown incompletely.
- Results section contains discussion or interpretation beyond factual reporting.

### B5: Discussion, conclusions & abstract (10 marks)

What earns full marks:
- Opens by reminding the reader of the original goals; key findings stated succinctly.
- Findings interpreted in biological context, with additional citations beyond the Introduction; implications discussed in a wider scientific context.
- Mandatory: at least one substantive paragraph engaging with advanced statistical methods (MLE, Bayesian inference, machine learning) that clearly explains what additional biological insight such methods would provide, even if not implemented. This demonstrates understanding of methodological scope.
- Caveats and limitations explicitly discussed; specific, concrete future directions suggested (not just “more work needed”).
- Concluding take-home messages stated clearly and distinctly.
- Abstract present (~200 words); self-contained, with background, objectives, methods, key results, and main conclusions; specific about findings (not vague).

Typical reasons for lost marks:
- Discussion fails to return to the original objectives or biological context.
- No engagement with advanced methods; only describes the work actually done.
- Caveats absent or superficial.
- Abstract vague or missing concrete findings (e.g. “this study shows model selection is important”).
- Conclusion absent, or fails to deliver a clear take-home message.


## Mark classification

| Total mark | Classification |
|---|---|
| 70–100 | Distinction |
| 60–69 | Merit |
| 50–59 | Pass |
| < 50 | Below Pass threshold |

Provisional mark format (for assessor use):

```
Part A (Computing): XX/50
Part B (Report):    XX/50
Total Mark:         XX/100
Classification:     Distinction / Merit / Pass / Below Pass threshold
```

## Engagement-level anchors

- **Strong Distinction (75–90):** Complete end-to-end reproducible workflow with no errors; NLLS correctly implemented with ≥2 models (including ≥1 mechanistic); appropriate model comparison metrics; well-crafted Introduction with a natural narrative funnel to the hypotheses; substantive, concrete Discussion engagement with advanced methods; well-structured LaTeX report showing original synthesis; professional display items (4–6 figures with effective visual communication); clean project organisation; excellent Git history.
- **Solid Distinction (70–74):** Complete or near-complete reproducible workflow; NLLS with ≥2 models and appropriate comparison; Introduction logically structured with clear hypotheses; Discussion explicitly engages advanced methods with concrete reasoning; all required sections present with good depth; clear Computing Tools justification; reasonable display items; solid organisation.
- **Solid Merit (62–69):** Working workflow (possibly with minor issues); ≥2 models fitted with comparison metrics; Introduction covers biological context and hypotheses; Discussion acknowledges advanced methods; adequate report with all sections present; Computing Tools section included; reasonable display items; competent organisation.
- **Pass (50–61):** Partially working workflow; some model fitting and comparison attempted; report present with Introduction/Results/Discussion but lacking depth or narrative flow; limited advanced-methods engagement; minimal display items; basic organisation; some Computing Tools documentation.
- **Below Pass (<50):** Workflow broken or absent; minimal model fitting; report missing, critically incomplete, or incoherent; no advanced-methods engagement; poor project organisation; computing workflow unclear.


## Important Note: Ambition vs. Coherence Trade-off

While extra credit is available for attempting more challenging models (multiple nonlinear/mechanistic models), choosing overly ambitious projects risks losing marks overall. Students who spend excessive time on complex model fitting and run out of time to write a coherent, well-structured report with clear narrative flow will score lower than those who tackle a simpler problem well. Coherence and completeness take priority over model complexity. Start with a tractable problem (e.g., two linear models), establish a working workflow end-to-end, then iteratively add model complexity.


## Missing submissions policy

| Situation | Deduction |
|---|---|
| run_MiniProject.* absent or empty | Treat A2 as 0 |
| MiniProject/ directory missing or misnamed | Up to −10 marks (A1) |
| LaTeX report absent | All B criteria scored 0 |
| Required report section absent | Up to −3 marks per section (B2–B5) |
| Results committed to repo | −2 marks (A1) |

Partial credit is always available where effort is clearly demonstrated.

## Efficiency fairness note

Computational efficiency is assessed proportionately and in context: correctness and reproducibility remain primary, and minor runtime differences are not heavily penalised. Efficiency judgements should be made relative to project scope, dataset size, and model complexity.