What is Inferential Statistics?
Sometimes we need to learn something about a large group (of people, things, etc.) but only have access to a small group within it. Inferential Statistics is the branch of Statistics that tells us how we can say something about the large group based on what we learn from the smaller one. The larger group is called the population, and the smaller group is called a sample.
For example, suppose we want to know the expected reliability of a specific component. To know this with absolute certainty, we would have to test every such component in existence (the population). Since this is impossible (it would take too long and cost too much), suppose we can only test ten identical components (the sample). Using the information obtained from our sample, the methods provided by Inferential Statistics can help us estimate the reliability of the component.
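The component example can be sketched as a small simulation. This is only an illustration under stated assumptions: the "true" reliability of 0.9, the population size, and the helper name `estimate_reliability` are hypothetical and do not come from any real data. In practice the true value is unknown, which is the whole point of estimating it.

```python
import random

def estimate_reliability(sample):
    """Fraction of tested components that survived (True = survived)."""
    return sum(sample) / len(sample)

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical values, chosen only for illustration.
TRUE_RELIABILITY = 0.9      # unknown in a real study
POPULATION_SIZE = 100_000   # stands in for "every component in existence"

# The population: one survive/fail outcome per component.
population = [rng.random() < TRUE_RELIABILITY for _ in range(POPULATION_SIZE)]

# The sample: ten components chosen at random from the population.
sample = rng.sample(population, 10)

estimate = estimate_reliability(sample)
print(f"Estimated reliability from {len(sample)} components: {estimate:.2f}")
```

With only ten components, the estimate can only take the values 0.0, 0.1, ..., 1.0, so it will usually differ from the true value. Quantifying that uncertainty is what the methods of inferential statistics are for.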
Since the group we actually study (the sample) is not the same as the one we are interested in (the population), the conclusions we reach about the population are generalizations and are subject to uncertainty. We deal with this uncertainty by using probability: the population is assumed to have a distribution with density $f(x)$, and the elements contained in the sample are assumed to be random variables drawn from that distribution.
We can state the goal of inferential statistics more formally: given a population with density $f(x)$, we want to make probability statements about $f(x)$ based on sample observations.
Before moving on, there are a few things you need to know about populations and samples. The first is the difference between target and sampled populations.
DEFINITION: the target population is “the totality of elements which are under discussion and about which information is desired.” (Mood and Graybill, 1963)
All voters in an election
All parts manufactured in a plant
All CubeSats launched in the last year
DEFINITION: the sampled population is the population from which the sample observations are taken.
In a phone-based poll, all voters who have a phone (not the same as all voters)
All parts manufactured on a certain day (not the same as all the parts manufactured)
All CubeSats launched in the U.S. in the last year (not the same as all CubeSats launched last year)
BEWARE! Always check that the sampled population is in fact the same as the target population. For example, suppose you are interested in finding the failure rate of all CubeSat missions in the world. Your target population, then, is every single CubeSat mission, regardless of place of origin. Now suppose you look only at data from CubeSat missions originating in the U.S. In that case, the population you sampled from is not the same as the one you are targeting! Your conclusions will be valid only for CubeSat missions originating in the U.S., not for all CubeSats in the world!
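The CubeSat warning can be made concrete with a short sketch. All mission counts and failure rates below are invented for illustration only and are not real CubeSat statistics. The names `missions` and `failure_rate` are likewise hypothetical.

```python
import random

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical mission records as (origin, failed) pairs.
# The counts are made up: 20% failures for "US", 40% for "other".
missions = (
    [("US", True)] * 100 + [("US", False)] * 400
    + [("other", True)] * 200 + [("other", False)] * 300
)

def failure_rate(records):
    """Fraction of records whose 'failed' flag is True."""
    return sum(failed for _, failed in records) / len(records)

target_population = missions                                  # all missions
sampled_population = [m for m in missions if m[0] == "US"]    # U.S. only

target_rate = failure_rate(target_population)    # 300 / 1000
sampled_rate = failure_rate(sampled_population)  # 100 / 500

# A random sample drawn from the sampled population estimates
# sampled_rate, not target_rate, no matter how carefully it is drawn.
sample = rng.sample(sampled_population, 50)
sample_rate = failure_rate(sample)

print(f"Target population failure rate:  {target_rate:.2f}")
print(f"Sampled population failure rate: {sampled_rate:.2f}")
print(f"Estimate from a sample of 50:    {sample_rate:.2f}")
```

In this made-up setting, increasing the sample size would not help: the estimate would settle around the U.S.-only rate rather than the worldwide one.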
Concerning samples, the requirement is that the sample you collect should be what is called a random sample.
DEFINITION: a random sample is a collection of random variables $X_1, \ldots, X_n$ with the following properties:
- The random variables are independent
- All have the same probability distribution
This is also referred to as the variables being independent and identically distributed (iid).
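One way to read the two properties together, assuming the population has a density $f(x)$ as above, is that the joint density of the sample factorizes into a product of identical terms. The symbol $f_{X_1, \ldots, X_n}$ for the joint density is notation introduced here for illustration:

$$f_{X_1, \ldots, X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f(x_i)$$

Independence is what allows the joint density to be written as a product, and identical distribution is what makes every factor the same function $f$.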
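NOTE: True random samples are rarely encountered in practice. For example, sampling without replacement from a finite population makes the observations slightly dependent. However, if the sample size is small compared to the population size (at most about 5%, according to Devore, 2015), then the sample can be treated as random for practical purposes.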
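The 5% rule of thumb in the note above can be checked numerically. The sketch below assumes a finite population of $N$ items, $K$ of which are "successes", sampled without replacement. The number of successes in the sample then follows a hypergeometric distribution, whereas a true iid sample would follow a binomial distribution. The population sizes and the helper name `max_pmf_gap` are illustrative choices, not taken from Devore.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(k successes in n iid draws, each with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def hypergeom_pmf(k, n, N, K):
    """P(k successes in n draws without replacement from N items, K successes)."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def max_pmf_gap(n, N, K):
    """Largest pointwise difference between the two distributions."""
    p = K / N
    return max(
        abs(hypergeom_pmf(k, n, N, K) - binomial_pmf(k, n, p))
        for k in range(n + 1)
    )

# Same sample size (n = 10) and same success fraction (30%) in both cases.
gap_20_percent = max_pmf_gap(n=10, N=50, K=15)     # sample is 20% of population
gap_1_percent = max_pmf_gap(n=10, N=1000, K=300)   # sample is 1% of population

print(f"Max gap when the sample is 20% of the population: {gap_20_percent:.4f}")
print(f"Max gap when the sample is 1% of the population:  {gap_1_percent:.4f}")
```

The gap shrinks as the sample becomes a smaller fraction of the population, which is why treating the sample as iid is a reasonable approximation in that regime.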