Literature survey

Steps to find a research question.

Set up Zotero

Install the Zotero desktop application.
Install the Zotero Connector browser extension.
(Optional) Create a Zotero account to sync your library.
Start Zotero and create a collection for your field.

Collect relevant papers

Search academic search engines (e.g. Google Scholar, arXiv) for your keywords.

Add relevant papers to your Zotero collection using the Zotero Connector browser extension.
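If arXiv is one of the search engines you use, a keyword search can also be written as a URL for the public arXiv API. The sketch below assumes the documented endpoint http://export.arxiv.org/api/query and its search_query, start and max_results parameters; the helper name build_arxiv_query is hypothetical. Field prefixes such as "ti" (title), "abs" (abstract) and "all" (all fields) select where to search.

```python
from urllib.parse import urlencode

# Public arXiv API endpoint (assumption: you search arXiv; other engines differ).
ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_query(keywords, field="abs", max_results=20):
    """Return an arXiv API URL requiring every keyword to appear in `field`.

    Multi-word keywords are wrapped in double quotes so they match as phrases.
    This helper is a hypothetical illustration, not part of any library.
    """
    terms = []
    for kw in keywords:
        kw = kw.strip()
        terms.append(f'{field}:"{kw}"' if " " in kw else f"{field}:{kw}")
    params = {
        "search_query": " AND ".join(terms),  # arXiv boolean operator
        "start": 0,
        "max_results": max_results,
    }
    # urlencode percent-encodes ':' and '"' and turns spaces into '+'.
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_query(["object detection", "transformer"], max_results=5)
print(url)
```

Opening the URL returns an Atom XML feed of matching papers; the ones worth keeping can still be saved to Zotero with the Connector.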

Abstracts

Abstracts follow a similar structure to the thesis itself:

Field
Research gap
Novelty
Evidence
Conclusion
Keywords (optional)

To start with, choose the 3 papers you find most interesting or relevant to your problem. For each of them, assign each sentence of the abstract to one of the categories above. Some sentences may fit several categories and others none; use your judgement.

The aim of this exercise is to understand broadly which research questions have already been answered and what the field currently focuses on. It also encourages you to look for the 'real' novelty rather than the novelty the authors claim. Finally, it gives you a structure for comparing abstracts and quickly judging which ones are relevant to you.

At the end, you should ideally have a table like this:

title	abstract	field	research gap	novelty	evidence	conclusion	keywords
.
.
.

You can copy and paste the above into Excel (including Microsoft 365), Google Sheets or other spreadsheet software.
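If you prefer to keep your notes as text, the same table can be generated from hand-labelled sentences. This is a minimal sketch using only the standard library; the functions abstract_row and to_tsv and the example abstract are hypothetical. Each sentence carries a set of categories, so a sentence can land in several columns or in none, as described above.

```python
import csv
import io

# Column names follow the table above; "title" and "abstract" come first.
CATEGORIES = ["field", "research gap", "novelty", "evidence", "conclusion", "keywords"]


def abstract_row(title, abstract, labelled_sentences):
    """Build one table row from (sentence, set_of_categories) pairs.

    A sentence may carry several categories or none at all.
    """
    row = {"title": title, "abstract": abstract}
    for cat in CATEGORIES:
        # Concatenate every sentence labelled with this category into one cell.
        row[cat] = " ".join(s for s, cats in labelled_sentences if cat in cats)
    return row


def to_tsv(rows):
    """Serialise rows as tab-separated text that pastes into a spreadsheet."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["title", "abstract"] + CATEGORIES,
                            delimiter="\t", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Hypothetical abstract, labelled by hand.
labelled = [
    ("Object detection is a core computer vision task.", {"field"}),
    ("However, small objects are still frequently missed.", {"research gap"}),
    ("We propose a multi-scale detection head.", {"novelty"}),
    ("Experiments on a public benchmark show improved accuracy.", {"evidence", "conclusion"}),
]
abstract = " ".join(s for s, _ in labelled)
print(to_tsv([abstract_row("A hypothetical paper", abstract, labelled)]))
```

Tab separation is used because spreadsheet software splits pasted tab-separated text into cells, and abstracts often contain commas.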

Tips

Take a look at the rubric's section on originality of research to get an intuition of what constitutes original research.
While reading, keep in mind that some papers may not be well written. Think about what is lacking and how it could be improved.
Think about how you would pose the research question for the paper you are reading.
One clear sign of a research gap is the phrase 'However, ...'.
Try to imagine the graphs/tables that would, in your opinion, answer the research question.
Look for concrete metrics to establish what is currently possible.
Examine the validity of the conclusions based on the data and methodology used.
Get a feel for benchmarks and associated metrics that are commonly used.
Survey papers are a good source of public datasets and code.
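The 'However, ...' tip can be turned into a quick first-pass filter that flags candidate research-gap sentences in an abstract. This is a heuristic sketch, not a reliable classifier: the cue words other than "however" are illustrative assumptions, and the sentence splitter is deliberately naive. The flagged sentences still need your judgement.

```python
import re

# Cue words assumed to signal a research gap. "However" comes from the tip
# above; the others are illustrative guesses, not an established list.
GAP_CUES = re.compile(r"\b(however|despite|remains?|lacks?|lacking|limited)\b", re.IGNORECASE)


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    It will also split after abbreviations such as "e.g.", which is
    acceptable for a quick first pass.
    """
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def candidate_gaps(abstract):
    """Return the sentences of an abstract that contain a gap cue."""
    return [s for s in split_sentences(abstract) if GAP_CUES.search(s)]


example = ("Object detection is widely studied. However, small objects are "
           "still missed. We propose a new loss function.")
print(candidate_gaps(example))
```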

Examples

Computer vision

Evidence

To understand how a research question was investigated, we need to understand the evidence presented, which usually appears in the Results section. For each of the three papers chosen above, record the evidence that was presented. You can also take screenshots of the tables, graphs and their captions to build a visual impression of the paper.

At the end, you should ideally have a table like this:

title	models	datasets	metrics	visualization	evidence
Object Detection in ...	ResNet	ImageNet	F1	(screenshots of figures from the paper)	(screenshots of tables/graphs from the paper)
	MobileNet	CIFAR-100	Dice score
Image segmentation with ...	VGG	CIFAR-100	F1	(screenshots of figures from the paper)	(screenshots of tables/graphs from the paper)
	MobileNet	CIFAR-10	Dice score
	EfficientNet

You can copy and paste the above into Excel (including Microsoft 365), Google Sheets or other spreadsheet software.
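In this layout, continuation rows leave the title cell empty, so if you later export the table as text, the title has to be carried forward before you can group models by paper. A minimal standard-library sketch, assuming the table is saved as tab-separated text with the header row shown above; read_evidence_table, models_per_paper and the sample titles are hypothetical.

```python
import csv
import io


def read_evidence_table(tsv_text):
    """Parse the tab-separated evidence table into one dict per row.

    Continuation rows have an empty title cell, so the last seen title is
    carried forward (a forward fill).
    """
    rows = []
    last_title = ""
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        title = (row.get("title") or "").strip()
        if title:
            last_title = title
        else:
            row["title"] = last_title
        rows.append(row)
    return rows


def models_per_paper(rows):
    """Group model names by paper title, skipping rows without a model."""
    grouped = {}
    for row in rows:
        # Short rows yield None for missing columns, hence the `or ""`.
        model = (row.get("models") or "").strip()
        if model:
            grouped.setdefault(row["title"], []).append(model)
    return grouped


sample = (
    "title\tmodels\tdatasets\tmetrics\n"
    "Paper A\tResNet\tImageNet\tF1\n"
    "\tMobileNet\tCIFAR-100\tDice score\n"
    "Paper B\tVGG\tCIFAR-100\tF1\n"
    "\tEfficientNet\n"
)
print(models_per_paper(read_evidence_table(sample)))
```

With pandas, reading the file with pd.read_csv(path, sep="\t") and calling df["title"].ffill() achieves the same forward fill, since empty cells are read as missing values.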

Reflection

Once the table above is complete, reflect on the following:

Models:

Save the reference to the paper that introduced the model.
Is the code available? If so, save a link to the code.
Concretely, what are the inputs and outputs? (exact shapes of arrays/tensors)
What is the size of the model, i.e. the number of trainable parameters?
What are the hyperparameters?
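Input/output shapes and parameter counts can usually be checked by hand from a paper's architecture description. The sketch below uses the standard output-size formula for a 2D convolution (the same one given in the PyTorch Conv2d documentation) and the usual parameter counts for convolutional and fully connected layers; the function names are hypothetical and the square-kernel, groups=1 setting is an assumption.

```python
def conv2d_output_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial axis of a 2D convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv2d_params(in_channels, out_channels, kernel, bias=True):
    """Trainable parameters of a 2D convolution with a square kernel (groups=1)."""
    return out_channels * (in_channels * kernel * kernel + (1 if bias else 0))


def linear_params(in_features, out_features, bias=True):
    """Trainable parameters of a fully connected layer."""
    return out_features * (in_features + (1 if bias else 0))


# A 224x224 RGB image through a 7x7, stride-2, padding-3 convolution with
# 64 filters and no bias, printed as (channels, height, width) and a count.
h = conv2d_output_size(224, kernel=7, stride=2, padding=3)
print((64, h, h), conv2d_params(3, 64, 7, bias=False))
```

This example layer matches the first convolution of ResNet, so the numbers can be compared against published figures. For a model you can load in PyTorch, sum(p.numel() for p in model.parameters() if p.requires_grad) gives the total number of trainable parameters directly.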

Datasets:

Is it a custom dataset or a well-known benchmark?
What is the modality? Concretely, what are the inputs and what is the supervision?
Is the data/supervision synthetically generated?
Is it downloadable? If so, download it.
Can you train a naive supervised or unsupervised baseline model on it?
What does the data say about the validity of the evidence in general?
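Before training anything, it helps to know the score of the most naive baselines. The sketch below shows two common ones: always predicting the most frequent training label (classification) and always predicting the training mean (regression). The function names are hypothetical; labels and targets are assumed to be plain Python lists.

```python
from collections import Counter
from statistics import mean


def majority_class_baseline(train_labels, test_labels):
    """Accuracy of always predicting the most frequent training label."""
    majority, _ = Counter(train_labels).most_common(1)[0]
    correct = sum(1 for y in test_labels if y == majority)
    return correct / len(test_labels)


def mean_predictor_mse(train_targets, test_targets):
    """Mean squared error of always predicting the training-set mean."""
    m = mean(train_targets)
    return mean((y - m) ** 2 for y in test_targets)


print(majority_class_baseline(["cat", "cat", "dog"], ["cat", "dog", "cat", "cat"]))
print(mean_predictor_mse([1.0, 2.0, 3.0], [2.0, 4.0]))
```

If a paper's reported score is close to such a baseline, for example on a heavily imbalanced dataset, that weakens the evidence.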

Metrics:

Is it a custom metric or a well-known one used for this task?
Are the metrics bounded?
- Accuracy is bounded between 0 and 1.
- Mean squared error is non-negative but has no upper bound.
How close is the problem to being solved?
- Even if a metric seems arbitrary, whether a score is 30, 70 or 99 out of 100 tells you something valuable, since metrics are designed to reflect progress on the defined task rather than being random.
What is a good baseline, and what is the state of the art?
- As above, record exact numbers.
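The difference between bounded and unbounded metrics is easy to see on a toy example: accuracy and the Dice score always lie in [0, 1], while mean squared error depends on the scale of the targets. This is a plain-Python sketch with hypothetical function names; treating two empty masks as a perfect Dice score of 1.0 is an assumed convention.

```python
def accuracy(y_true, y_pred):
    """Fraction of exact matches; always in [0, 1]."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean squared error; >= 0 with no upper bound."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def dice(pred_mask, true_mask):
    """Dice score of two binary masks; equals F1 for binary labels, in [0, 1]."""
    tp = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    fp = sum(1 for p, t in zip(pred_mask, true_mask) if p and not t)
    fn = sum(1 for p, t in zip(pred_mask, true_mask) if t and not p)
    denom = 2 * tp + fp + fn
    # Assumed convention: two empty masks count as a perfect match.
    return 1.0 if denom == 0 else 2 * tp / denom


y_true, y_pred = [1.0, 2.0, 3.0], [1.5, 2.0, 2.0]
# Multiplying the targets by 10 multiplies the MSE by 100.
print(mse(y_true, y_pred), mse([10 * v for v in y_true], [10 * v for v in y_pred]))
print(accuracy([0, 1, 1], [0, 1, 0]), dice([1, 1, 0, 0], [1, 0, 1, 0]))
```

Because MSE changes with the units of the data, an MSE value is only meaningful next to a baseline on the same dataset, whereas a bounded score can be read on its own.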

Examples

Computer vision