Statistical Analysis in 5 steps: A step by step guide

Statistical analysis means investigating trends, patterns, and relationships using quantitative data. It is an important research tool used by researchers, scientists, governments, businesses, NGOs, and other organizations.

To draw meaningful conclusions, statistical analysis requires careful planning from the very beginning of the research process. You first need to understand and state your hypotheses, and then decide on the research design, sample size, and sampling procedure.

After collecting data from your sample, you can organize and summarize the data using descriptive statistics. The next step is to use inferential statistics to formally test hypotheses and make estimates about the population. The last phase is interpreting and generalizing the findings.

This step-by-step guide is a practical introduction for beginners and intermediate analysts, including students and researchers. Each step is illustrated with two kinds of examples, experimental and correlational. The experimental examples investigate a potential cause-and-effect relationship between variables, while the correlational examples investigate a potential correlation between variables.

Step 1: Research hypotheses and research design

In this first step, you write your hypotheses and plan your research design. Specifying both in advance helps you collect valid data.

Write the Statistical Hypotheses

The goal of research is often to investigate a relationship between variables in a target population. You start with a prediction and then use statistical analysis to test it. A statistical hypothesis is a formal way of writing a prediction about a population. Begin by rephrasing your research prediction into null and alternative hypotheses, worded so that they can be tested using sample data.

The null hypothesis predicts no effect or no relationship between variables. The alternative hypothesis states your research prediction of an effect or a relationship.

Example of Hypotheses: Statistical hypotheses to test an effect

Null hypothesis: Drinking 5 liters of water has no effect on hydration levels of teenagers

Alternative hypothesis: Drinking 5 liters of water improves hydration levels of teenagers

Example of Hypotheses: Statistical Hypotheses to test a correlation

Null hypothesis: Students' comprehension levels and their GPA scores have no relationship

Alternative hypothesis: Students' comprehension levels and their GPA scores are positively correlated
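To make the correlational hypotheses concrete, here is a minimal sketch of how a sample correlation coefficient could be computed. The data values and the helper name `pearson_r` are invented for illustration and are not part of this guide's examples. The coefficient alone does not decide between the null and alternative hypotheses; a significance test is still needed for that.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of cross-products and sums of squared deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    # Note: this divides by zero if either variable has no variation
    return sxy / sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
comprehension = [55, 60, 65, 70, 80, 85]
gpa = [2.4, 2.9, 2.8, 3.1, 3.6, 3.7]

r = pearson_r(comprehension, gpa)
print(f"sample correlation r = {r:.2f}")
```

A value of r close to +1 in the sample is consistent with the alternative hypothesis of a positive correlation, while a value near 0 is consistent with the null hypothesis.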

Plan the Research Design

The research design is the overall strategy used for data collection and analysis. It determines the statistical tests you can apply to test your hypotheses later.

At this point, you decide whether your research will use a descriptive, correlational, or experimental design.

In an experiment, you directly manipulate variables, while in descriptive and correlational studies you only measure them.

Experimental Design Vs. Correlation Design Vs. Descriptive Design

  • Experimental design: helps you assess a cause-and-effect relationship using statistical tests of comparison or regression (e.g., the effect of drinking water on hydration levels).
  • Correlational design: lets you explore relationships between variables, without assuming causality, using correlation coefficients and significance tests (e.g., students' comprehension levels and GPA scores).
  • Descriptive design: lets you study the characteristics of a target population or phenomenon, using statistical tests to draw inferences from sample data (e.g., the prevalence of malaria in tropical regions).

The choice of research design also depends on whether you will compare participants at the group level, at the individual level, or both.

Between-subjects design: you compare group-level outcomes of participants who were exposed to different treatments (e.g., students who drink 5 liters of water vs. those who do not).

Within-subjects design: you compare repeated measures from participants who have taken part in all treatments of the study (e.g., the hydration levels of students before and after drinking 5 liters of water).

Mixed (factorial) design: one variable is altered between subjects and another is altered within subjects (e.g., pretest and posttest scores from students who did or did not drink the 5 liters of water).
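The difference between a group-level and an individual-level comparison can be sketched in a few lines. All scores below are hypothetical and invented for illustration; the sketch only computes the raw differences and is not a hypothesis test.

```python
from statistics import mean

# Hypothetical hydration scores, invented for illustration only.

# Between-subjects: two separate groups, compared through their group means
water_group = [72, 75, 78, 80, 74]
control_group = [70, 71, 69, 73, 72]
between_difference = mean(water_group) - mean(control_group)

# Within-subjects: the same students measured twice,
# compared through each student's own before/after difference
before = [68, 70, 72, 65, 71]
after = [73, 74, 75, 70, 76]
paired_differences = [a - b for a, b in zip(after, before)]
within_difference = mean(paired_differences)

print(f"difference between group means: {between_difference:.1f}")
print(f"mean of paired differences:     {within_difference:.1f}")
```

In a mixed design, both kinds of comparison would appear together: paired before/after differences within each student, and a comparison of those differences between the two groups.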

Decide on how to measure the Variables

When planning the research design, you should also operationalize your variables, that is, decide exactly how each variable will be measured.

In statistical analysis, it is necessary to consider the level of measurement of each variable, which tells you what type of data it contains. There are two main types:

Categorical: data that represents groupings. These may be nominal (e.g., gender) or ordinal (e.g., class level).

Quantitative: data that represents amounts. These may be on an interval scale (e.g., test score) or a ratio scale (e.g., age).

Variables may be measured at different levels of precision. For instance, age data may be quantitative (10 years old) or categorical (young). Likewise, coding a variable numerically, such as level of agreement on a scale of 1 to 5, does not automatically make it quantitative rather than categorical.

Identifying the measurement level is necessary for selecting the right statistics and hypothesis tests. For instance, you can calculate a mean score with quantitative data but not with categorical data. You can, however, find the mode (the most frequent value) of categorical data.
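As a small illustration of how the measurement level limits which statistics make sense, the sketch below uses Python's standard `statistics` module. The variable names and values are hypothetical.

```python
from statistics import mean, mode

# Hypothetical data, invented for illustration only
test_scores = [62, 75, 75, 81, 90]                          # quantitative (interval)
class_levels = ["freshman", "junior", "junior", "senior"]   # categorical (ordinal)

# A mean is meaningful for quantitative data
mean_score = mean(test_scores)

# A mode (most frequent value) also works for categorical data
modal_level = mode(class_levels)

print(f"mean test score: {mean_score}")
print(f"modal class level: {modal_level}")
```

Asking for the mean of `class_levels` would not produce a meaningful number, which mirrors the point above: the labels have no arithmetic value.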

In any research study, along with measures of your variables of interest, you will often collect data on relevant participant characteristics.

Step 2: Data Collection from the Decided Sample

In most cases, it is too difficult or costly to collect data from every member of the population the study is interested in. Instead, data is collected from a sample, which needs to be representative of the population.

Sampling for statistical analysis

There are two primary approaches to selecting a sample.

  • Probability sampling: every member of the population has a chance of being selected for the study through random selection.
  • Non-probability sampling: some members of the population are more likely than others to be selected, because selection is based on criteria such as convenience or voluntary participation.

In theory, if you want highly generalizable findings, you should use a probability sampling method. Random selection reduces several forms of bias, such as sampling bias, and makes it more likely that the data from your sample is representative of the population. When data is collected using probability sampling, parametric tests can be used to make strong statistical inferences.
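Simple random sampling is one common form of probability sampling. A minimal sketch, assuming you already have a complete list of the population (a sampling frame), here represented by hypothetical ID numbers:

```python
import random

# Hypothetical sampling frame: ID numbers for a population of 1,000 students
population = list(range(1, 1001))

# A fixed seed is used only so that this sketch is reproducible
rng = random.Random(42)

# Draw 50 members without replacement; every member has the same chance
sample = rng.sample(population, k=50)

print(sorted(sample)[:10])  # first few selected IDs
```

In a real study, the hard part is usually obtaining the complete sampling frame, not the random draw itself.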

In practice, however, it is rarely possible to obtain an ideal sample. Non-probability samples are more at risk of biases such as self-selection bias, but they are much easier to recruit and collect data from. Non-parametric tests are more appropriate for non-probability samples, though they result in weaker inferences about the population.

If you want to use parametric tests on non-probability samples, you have to make the case that:

  1. Your sample is representative of the population you are generalizing your findings to.
  2. The sample lacks systematic bias.

Keep in mind that external validity means you can only generalize your conclusions to others who share the characteristics of your sample. For example, results from college students in the U.S. may not generalize to all students globally, because their characteristics and demographics differ.

If you apply parametric tests to data from non-probability samples, elaborate in your discussion section on the extent to which the results can be generalized.

Come up with the right sampling procedure

Based on the resources available for your research, decide how you will recruit participants. Ask yourself:

  • Do you have the resources to advertise your study widely, including outside your immediate setting?
  • Do you have the means to recruit a diverse sample that represents a broad population?
  • Do you have time to contact and follow up with members of hard-to-reach groups?

Variable               Type of data
Age                    Quantitative (ratio)
Gender                 Categorical (nominal)
Race or ethnicity      Categorical (nominal)
Baseline test scores   Quantitative (interval)
Final test scores      Quantitative (interval)

Calculate sufficient sample size

Your sample size should be determined before you begin recruiting participants. You can do this by looking at existing studies in your field or by using statistics. A sample that is too small may not be representative of the population, while a sample that is too large will cost more than necessary.

Many sample size calculators are available online. They use different formulas depending on whether the study includes subgroups and on how rigorous the study needs to be (e.g., in clinical research). As a rule of thumb, a minimum of 30 units per subgroup is commonly recommended.

To use these calculators, you have to understand and input these key components:

  • Significance level (alpha): the risk of rejecting a true null hypothesis that you are willing to take; in most cases, this value is set at 5%.
  • Statistical power: the probability that your study will detect an effect of a certain size if such an effect exists; this is usually set at 80% or higher.
  • Expected effect size: a standardized indication of how large the expected result of your study will be, typically based on similar previous studies.
  • Population standard deviation: an estimate of the population parameter based on a previous study or on a pilot study of your own.
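To show how these components fit together, here is a rough sketch of one common textbook approximation for comparing the means of two equally sized groups with a two-sided test. It uses the normal approximation, so online calculators based on the t distribution may return slightly larger numbers. The function name `sample_size_per_group` is an illustrative assumption, and the effect size is assumed to be standardized (the expected difference in means divided by the population standard deviation).

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate participants needed per group (two groups, two-sided test).

    effect_size: standardized difference in means, i.e. the expected
                 difference divided by the population standard deviation.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = standard_normal.inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) / effect_size) ** 2
    return ceil(n)  # round up to a whole participant


# A medium standardized effect (0.5) with the usual 5% alpha and 80% power
print(sample_size_per_group(0.5))
```

Smaller expected effects, a stricter significance level, or higher power all increase the required sample size, which is why these inputs should be justified before recruitment starts.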