Literature survey

Steps to find a research question.


Set up Zotero

Collect relevant papers

Search for keywords on academic search engines.

Add them to a Zotero collection using the Zotero Connector browser extension.


Abstracts

Abstracts follow a similar structure to the thesis itself:

  1. Field
  2. Research gap
  3. Novelty
  4. Evidence
  5. Conclusion
  6. Keywords (optional)

To start with, choose the 3 papers that you find most interesting or relevant to your problem. For each of them, assign each sentence of the abstract to one of the above categories. Some sentences may fall into multiple categories and some into none; use your judgement.
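As a minimal sketch of this exercise, the annotation can be kept as plain data: each sentence of an abstract is paired with zero or more category labels. The abstract text, the labels assigned to it, and the names `CATEGORIES`, `split_sentences` and `group_by_category` are illustrative assumptions. The sentence splitter is deliberately naive, and the assignment itself remains a manual judgement.

```python
import re
from collections import defaultdict

# The five categories from the list above (keywords are handled separately).
CATEGORIES = ["field", "research gap", "novelty", "evidence", "conclusion"]


def split_sentences(abstract):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", abstract) if s.strip()]


def group_by_category(sentences, labels):
    """Map each category to the sentences assigned to it.

    labels[i] is the (possibly empty) list of categories chosen by hand
    for sentences[i].
    """
    if len(sentences) != len(labels):
        raise ValueError("need exactly one label list per sentence")
    grouped = defaultdict(list)
    for sentence, cats in zip(sentences, labels):
        for cat in cats:
            if cat not in CATEGORIES:
                raise ValueError(f"unknown category: {cat}")
            grouped[cat].append(sentence)
    return dict(grouped)


# Hypothetical abstract, invented for illustration only.
abstract = (
    "Object detection is a core task in computer vision. "
    "However, existing detectors are slow on mobile devices. "
    "We propose a lightweight detector. "
    "It runs faster than the baseline on a public benchmark. "
    "Code is available online."
)
sentences = split_sentences(abstract)
# Manual assignment: one list of categories per sentence (may be empty).
labels = [["field"], ["research gap"], ["novelty"], ["evidence", "conclusion"], []]
annotated = group_by_category(sentences, labels)
for category in CATEGORIES:
    print(category, "->", annotated.get(category, []))
```

Note that the fourth sentence is assigned to two categories and the last one to none, which is exactly the situation described above.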

The aim of this exercise is to understand broadly which research questions have already been answered, and what the field currently focuses on. It will also encourage you to search for the 'real' novelty as opposed to the claims made by the authors. Finally, it provides structure so you can compare various abstracts and quickly judge which ones are relevant to you.

At the end, you should ideally have a table like:

title | abstract | field | research gap | novelty | evidence | conclusion | keywords
.
.
.

You can copy and paste the above into Excel, Excel 365, Google Sheets, or other spreadsheet software.
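If you prefer to build the table programmatically, one possible approach is to write it as a CSV file, which spreadsheet software can open directly. The sketch below uses Python's standard `csv` module; the column names are taken from the table above, while the row contents, the function name `to_csv` and the file name in the note below are placeholders.

```python
import csv
import io

# Column names as listed in the table above.
COLUMNS = ["title", "abstract", "field", "research gap", "novelty",
           "evidence", "conclusion", "keywords"]


def to_csv(rows):
    """Serialise a list of dicts to CSV text; missing columns become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Placeholder row, invented for illustration only.
rows = [
    {
        "title": "Hypothetical paper A",
        "abstract": "Full abstract text goes here.",
        "field": "Sentence(s) describing the field.",
        "research gap": "However, ...",
        "novelty": "We propose ...",
        "evidence": "Experiments show ...",
        "conclusion": "Therefore, ...",
        # "keywords" is optional and left out here on purpose.
    },
]
csv_text = to_csv(rows)
print(csv_text)
```

To save the result, write `csv_text` to a file such as `abstracts.csv` (a hypothetical name) and open that file in your spreadsheet software.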

Tips

  • Take a look at the rubric's section on originality of research to get an intuition for what constitutes original research.
  • While reading, keep in mind that some papers may not be well written. Think about what is lacking and how it could be improved.
  • Think about how you would pose the research question for the paper you are reading.
  • One clear sign of a research gap is the phrase 'However, ...'.
  • Try to imagine the graphs/tables that, in your opinion, would answer the research question.
  • Look for concrete metrics to establish what is currently possible.
  • Examine the validity of the conclusions based on the data and methodology used.
  • Get a feel for benchmarks and associated metrics that are commonly used.
  • Survey papers are a good source of public datasets and code.
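The tip about 'However, ...' can be turned into a rough first-pass filter. The sketch below flags sentences that contain a cue phrase as candidate research-gap sentences. Only "however" comes from the tip above; the other cue phrases, the example abstract and the function name `find_gap_candidates` are assumptions. Treat the result as a list of sentences to read more carefully, not as a reliable detector.

```python
import re

# "however" is the cue from the tip above; the remaining phrases are assumptions.
GAP_CUES = ("however", "nevertheless", "remains unclear", "has not been")


def find_gap_candidates(abstract):
    """Return the sentences of `abstract` that contain one of the cue phrases."""
    sentences = re.split(r"(?<=[.!?])\s+", abstract.strip())
    return [s for s in sentences if any(cue in s.lower() for cue in GAP_CUES)]


# Hypothetical abstract, invented for illustration only.
abstract = (
    "Segmentation models achieve high Dice scores on public benchmarks. "
    "However, they degrade on images from unseen scanners. "
    "We study this failure mode."
)
for sentence in find_gap_candidates(abstract):
    print("candidate gap:", sentence)
```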

Examples


Evidence

To understand the methods used to investigate a research question, we need to understand the evidence presented. This is usually found in the Results section. For each of the papers above, record the evidence that was presented. You can additionally take screenshots of the tables/graphs and their captions to create a visual impression of the paper.

At the end, you should ideally have a table like:

title                       | models       | datasets | metrics    | visualization           | evidence
Object Detection in ...     | ResNet       | ImageNet | F1         | (screenshots of figures | (screenshots of tables/graphs
                            | MobileNet    | Cifar100 | Dice score |  from the paper)        |  from the paper)
Image segmentation with ... | VGG          | Cifar100 | F1         |                         |
                            | MobileNet    | Cifar10  | Dice score |                         |
                            | EfficientNet |          |            |                         |

You can copy and paste the above into Excel, Excel 365, Google Sheets, or other spreadsheet software.
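Once a few papers are recorded, the same information can be used to see which models, datasets and metrics recur across papers. The sketch below is one possible way to do this with `collections.Counter`. The entries are copied from the example table above (with the titles truncated as they are there); the variable `evidence` and the function `count_usage` are illustrative names.

```python
from collections import Counter

# Entries copied from the example table above.
evidence = [
    {"title": "Object Detection in ...",
     "models": ["ResNet", "MobileNet"],
     "datasets": ["ImageNet", "Cifar100"],
     "metrics": ["F1", "Dice score"]},
    {"title": "Image segmentation with ...",
     "models": ["VGG", "MobileNet", "EfficientNet"],
     "datasets": ["Cifar100", "Cifar10"],
     "metrics": ["F1", "Dice score"]},
]


def count_usage(records, column):
    """Count in how many papers each entry of `column` appears."""
    counts = Counter()
    for record in records:
        counts.update(set(record[column]))  # set(): count each paper at most once
    return counts


for column in ("models", "datasets", "metrics"):
    print(column, dict(count_usage(evidence, column)))
```

In this example, Cifar100 and MobileNet each appear in both papers, which hints that they are commonly used reference points.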

Reflection

Once the table above is complete, reflect on the following:

Models:

  • Save the reference to the paper that introduced the model.
  • Is the code available? If so, save a link to the code.
  • Concretely, what are the inputs and outputs? (exact shapes of arrays/tensors)
  • What is the size of the model, i.e. number of trainable parameters?
  • What are the hyperparameters?
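To make the questions about shapes and model size concrete, here is a minimal sketch that counts the trainable parameters of a small fully connected network by hand. The architecture (a flattened 32x32x3 image, 256 hidden units, 10 output classes) and the function name `count_mlp_parameters` are assumptions chosen for illustration; they do not come from any particular paper.

```python
def count_mlp_parameters(layer_sizes):
    """Trainable parameters of a fully connected network with biases.

    layer_sizes = [inputs, hidden_1, ..., outputs]; each layer contributes
    a weight matrix (n_in * n_out) plus a bias vector (n_out).
    """
    if len(layer_sizes) < 2:
        raise ValueError("need at least an input and an output size")
    return sum(n_in * n_out + n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# Hypothetical model: flattened 32x32x3 image -> 256 hidden units -> 10 classes.
input_shape = (32, 32, 3)
n_inputs = input_shape[0] * input_shape[1] * input_shape[2]  # 3072 values
layer_sizes = [n_inputs, 256, 10]

print("input shape :", input_shape)
print("output shape:", (layer_sizes[-1],))
print("parameters  :", count_mlp_parameters(layer_sizes))  # 789258
```

For a model defined in PyTorch, the same number is commonly obtained with `sum(p.numel() for p in model.parameters() if p.requires_grad)`.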

Datasets:

  • Is it a custom dataset or a well-known benchmark?
  • What is the modality? Concretely, what are the inputs and what is the supervision?
  • Is the data/supervision synthetically generated?
  • Is it downloadable? If so, download it.
  • Can you train a naive baseline model on it, supervised or unsupervised?
  • What does the data say about the validity of the evidence in general?
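One of the simplest supervised baselines is to always predict the most frequent training label. The sketch below shows this majority-class baseline in plain Python. The labels are invented for illustration, and the function names are assumptions; for a real dataset you would substitute its actual training and test labels.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return the most frequent training label; the 'model' always predicts it."""
    if not train_labels:
        raise ValueError("need at least one training label")
    return Counter(train_labels).most_common(1)[0][0]


def accuracy(predictions, targets):
    """Fraction of predictions that match the targets."""
    correct = sum(p == t for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical labels, invented for illustration only.
train_labels = ["cat", "dog", "cat", "cat", "bird", "cat"]
test_labels = ["cat", "dog", "cat", "bird"]

constant = majority_baseline(train_labels)
predictions = [constant] * len(test_labels)
print("always predict:", constant)                                # cat
print("baseline accuracy:", accuracy(predictions, test_labels))   # 0.5
```

Any model reported in a paper should beat this number clearly; if it does not, that says something about the dataset or the evidence.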

Metrics:

  • Is it a custom metric or a well-known one used for this task?
  • Are the metrics bounded?
    • Accuracy is bounded between 0.0 and 1.0.
    • Mean squared error is bounded below by 0 but unbounded above.
  • How close is the problem to being solved?
    • Even if a metric seems arbitrary, whether the score is 30, 70 or 99 out of 100 is informative: metrics are not random, but are designed to reflect progress on the defined task.
  • What is a good baseline, and what is the state of the art?
    • As above, exact numbers are useful.
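The difference between a bounded and an unbounded metric can be seen in a short sketch. Accuracy stays between 0.0 and 1.0 no matter how wrong the predictions are, while mean squared error keeps growing as predictions move away from the targets. The data and function names below are invented for illustration.

```python
def accuracy(predictions, targets):
    """Fraction of exact matches; always between 0.0 and 1.0."""
    return sum(p == t for p, t in zip(predictions, targets)) / len(targets)


def mean_squared_error(predictions, targets):
    """Average squared difference; at least 0.0, with no upper limit."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Classification: the worst and best cases hit the two bounds of accuracy.
labels = [0, 1, 1, 0]
print(accuracy([1, 0, 0, 1], labels))  # 0.0 (everything wrong)
print(accuracy([0, 1, 1, 0], labels))  # 1.0 (everything right)

# Regression: the error grows without limit as the predictions drift away.
targets = [0.0, 1.0, 1.0, 0.0]
for offset in (0.0, 1.0, 10.0, 100.0):
    predictions = [t + offset for t in targets]
    print(offset, mean_squared_error(predictions, targets))
```

This is why a reported accuracy can be judged on its own scale, whereas a mean squared error only becomes meaningful next to a baseline or the scale of the data.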

Examples