milestone 4
worksheet
I306
Statistics for Informatics
Spring 2025

Published

April 25, 2025

Goal

For the final milestone, you’ll write a comprehensive report as if you’d put it on a manager’s desk. This report should flow well and be free of distractions. It should include the following sections plus an appendix of the R commands you used. (There should not be R commands in the body, just results!)

Choose a dataset from the published table that we haven’t analyzed so far in this class. A good choice will have at least 100 or so rows and more than a small handful (5 or so) of columns. Explore a few datasets by looking at their documentation (under the ‘Doc’ link). Then write a report according to the outline below. As always, submit both your .qmd and .html files.

Outline

  1. Disclaimer
  2. Introduction
  3. Obtaining the data
  4. Data dictionary
  5. Data description
    1. Numerical Description
    2. Visual Description
  6. Regression to predict (chosen output variable)
    1. Trial and error
    2. Best subsets
    3. Final regression model
  7. Regression diagnostics for the final model
  8. Conclusion
  9. Appendix

You can find plenty of resources on paper writing to tell you what should be in an introduction and a conclusion.

Note that you should only include the Disclaimer section if you used any LLMs to generate any of your material. It should be clear in the disclaimer which specific things were produced by the generative AI tool.

Tips to prepare the report

  • Use the title “Final Report”
  • Name the files m4.qmd and m4.html
  • Use the keyword today as the date in the front matter (date: today), so it resolves to the date the document is rendered
  • Don’t keep reading the original file over and over again
  • Don’t keep assigning the same variable over and over again
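
The last two tips can be sketched as follows. This is only an illustration in base R: the temporary CSV built from mtcars stands in for your own dataset, and the names cars_raw, cars_heavy, and cars_model are hypothetical.

```r
# Hypothetical stand-in for your dataset: write a built-in data frame to a
# temporary CSV so the example runs anywhere.
csv_path <- tempfile(fileext = ".csv")
write.csv(mtcars, csv_path, row.names = FALSE)

# Read the file exactly once, near the top of the document.
cars_raw <- read.csv(csv_path)

# Derive new objects under new names instead of overwriting cars_raw
# or calling read.csv() again.
cars_heavy <- subset(cars_raw, wt > 3)
cars_model <- lm(mpg ~ wt + hp, data = cars_raw)
```

Every later chunk can then refer to cars_raw without touching the file again, and each name is assigned only once.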

Checking your code

  • Open a terminal and say grep -n '<-' m4.qmd to get the line numbers of all your assignments
  • Say grep -n '<-' m4.qmd | sort -t ':' -k 2 to sort the output by variable so you can easily see if you assign the same variable twice (grep -n separates the line number from the line with a colon, so the second field starts with the variable name)
  • Use ls() in the RStudio console to list the objects in the R environment; length(ls()) tells you how many there are
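
A similar check can be sketched in R itself. The helper below, find_repeated_assignments(), is a hypothetical name, not part of any package. It only recognizes simple assignments of the form name <- value at the start of a line, so it will miss things like df$col <- value. The temporary file stands in for m4.qmd.

```r
# Hypothetical helper: report variables that are assigned more than once,
# together with the line numbers of each assignment.
find_repeated_assignments <- function(path) {
  lines <- readLines(path, warn = FALSE)
  # A plain name at the start of the line, followed by "<-".
  pattern <- "^\\s*([A-Za-z.][A-Za-z0-9._]*)\\s*<-"
  hits <- grepl(pattern, lines)
  vars <- sub(paste0(pattern, ".*$"), "\\1", lines[hits])
  line_numbers <- which(hits)
  repeated <- unique(vars[duplicated(vars)])
  keep <- vars %in% repeated
  split(line_numbers[keep], vars[keep])
}

# Small demonstration file standing in for m4.qmd.
demo_path <- tempfile(fileext = ".qmd")
writeLines(c("x <- 1", "y <- 2", "x <- 3"), demo_path)
repeated_vars <- find_repeated_assignments(demo_path)
repeated_vars
```

For the demonstration file, the result lists x with lines 1 and 3. On your own report you would call find_repeated_assignments("m4.qmd") instead.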

Use a consistent theme

Don’t just use the defaults. Pick a theme and stick with it. There are some built-in themes and some in packages you can load.

Following are some URLs where you can find examples of themes and code to generate them.
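
One way to stick with a single theme is to set it once near the top of the document. The sketch below assumes your plots are made with ggplot2, which this worksheet does not state, and theme_minimal() is only an illustrative choice among the built-in themes.

```r
library(ggplot2)

# Set one theme for every plot that follows in the document.
theme_set(theme_minimal(base_size = 12))

# This plot picks up the theme without any further theme code.
p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon")
p
```

Setting the theme globally avoids repeating the same theme call in every chunk and keeps the figures consistent.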

Use psychometrically proven palettes

  • RColorBrewer for discrete data
  • Viridis for continuous data
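
As a rough sketch of how these palettes might be applied, again assuming ggplot2: scale_color_brewer() draws on the RColorBrewer palettes and scale_color_viridis_c() provides the continuous viridis scale. The palette "Dark2" and the mtcars variables are illustrative choices only.

```r
library(ggplot2)

# Discrete variable: a ColorBrewer qualitative palette.
p_discrete <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_brewer(palette = "Dark2", name = "Cylinders")

# Continuous variable: the viridis scale that ships with ggplot2.
p_continuous <- ggplot(mtcars, aes(x = wt, y = mpg, color = hp)) +
  geom_point() +
  scale_color_viridis_c(name = "Horsepower")
```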

Create an appendix

```{r ref.label=knitr::all_labels()}
#| echo: true
#| eval: false
```

This is explained at https://github.com/quarto-dev/quarto-cli/discussions/6650

Suppress code display in the body

You can suppress the display of code globally in the front matter and then override that later.

---
title: "My Document"
execute:
    echo: false
---

You can see more explanation of execution options at https://quarto.org/docs/computations/execution-options.html
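
To illustrate the override, a single chunk can switch echo back on while the global setting stays off. This is a minimal sketch: the label showThisOne and the use of mtcars are hypothetical, and in the .qmd file the lines below would sit inside an ordinary R chunk.

```r
#| label: showThisOne
#| echo: true
# With echo: false in the front matter, only this chunk shows its code.
mpg_summary <- summary(mtcars$mpg)
mpg_summary
```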

An alternative is to include the following chunk at the beginning of the document. Notice that this version also suppresses messages. The front matter option above can do the same if you add message: false below the line that reads echo: false, being sure to use the same indentation.

```{r}
#| label: optionsSetup
#| include: false
knitr::opts_chunk$set(
  message=FALSE,
  echo=FALSE
)
```