What is the difference between "probability density function" and "probability distribution function"?
Distribution Function
- The terms "probability distribution function" and "probability function" have ambiguous definitions. They may refer to:
- Probability density function (PDF)
- Cumulative distribution function (CDF)
- or probability mass function (PMF) (statement from Wikipedia)
- What is unambiguous is:
- Discrete case: Probability Mass Function (PMF)
- Continuous case: Probability Density Function (PDF)
- Both cases: Cumulative distribution function (CDF)
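- The probability at a particular value, $P(X = x)$, can be read directly from:
- The PMF in the discrete case
- Not from the PDF in the continuous case: there $P(X = x) = 0$ for every $x$, and the PDF value $f(x)$ is a density (probability per unit of $x$), not a probability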
- The probability of values up to $x$, $P(X \le x)$, can be read directly from the CDF $F$, and the probability of a range from $a$ to $b$ follows from it as $P(a < X \le b) = F(b) - F(a)$:
- CDF for both the discrete and continuous cases
- The term "distribution function" usually refers to the CDF, also called the cumulative frequency function.
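A minimal sketch of which function answers which question, using only Python's standard library. The fair die and the standard normal are examples chosen here for illustration, not part of the original answer; `statistics.NormalDist` supplies the continuous PDF and CDF.

```python
from fractions import Fraction
from statistics import NormalDist

# Discrete case: PMF of a fair six-sided die (example chosen for illustration).
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def cdf(x):
    """Discrete CDF: F(x) = P(X <= x), the sum of PMF values at points <= x."""
    return sum(p for k, p in pmf.items() if k <= x)

print(pmf[3])           # P(X = 3), read directly from the PMF: 1/6
print(cdf(4))           # P(X <= 4), read directly from the CDF: 2/3
print(cdf(5) - cdf(2))  # P(2 < X <= 5) = F(5) - F(2): 1/2

# Continuous case: standard normal (example chosen for illustration).
z = NormalDist(0, 1)
print(z.pdf(0))              # density at 0, about 0.3989; not a probability
print(z.cdf(1) - z.cdf(-1))  # P(-1 < X <= 1), about 0.6827
```

Exact fractions are used for the die so the discrete results print as `1/6`, `2/3` and `1/2` rather than rounded floats.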
In Terms of Data Acquisition and Plot Generation
- Collected data appear discrete when:
- The measured quantity is naturally discrete, such as the number shown on a rolled die or a count of people.
- The measurement is digitized machine data, which has no intermediate values between quantization levels because of the analog-to-digital conversion.
- In the latter case, the higher the resolution, the closer the measurement is to a continuous (analog) variable.
- To generate a PMF from discrete data:
- Plot a histogram of the data over all values of $x$; the $y$-axis is the frequency (count) at each $x$.
- Scale the $y$-axis by dividing each count by the total number of data points (sample size) $\longrightarrow$ this is the PMF.
- To generate a PDF from discrete / continuous data:
- Choose a continuous distribution that models the collected data, say the normal distribution.
- Estimate the parameters the distribution requires from the collected data. For example, the parameters of the normal distribution are the mean and standard deviation, so calculate them from the data.
- Using those parameters, plot the density over continuous $x$ values $\longrightarrow$ this is the (fitted) PDF.
- To generate a CDF:
- In the discrete case, the CDF at each $x$ is the sum of the PMF values at all points less than or equal to $x$. Repeating this for every $x$ gives a non-decreasing step function that reaches $1$ at the largest $x$ $\longrightarrow$ this is the discrete CDF.
- In the continuous case, integrate the PDF from $-\infty$ up to $x$; the result is a continuous CDF.
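The PMF steps above can be sketched in Python as follows, assuming the data are already a list of discrete observations. The dice rolls are made up for illustration, and the histogram is represented by `collections.Counter` counts rather than an actual plot.

```python
from collections import Counter

def empirical_pmf(data):
    """Histogram counts divided by the sample size: an empirical PMF."""
    counts = Counter(data)  # frequency at every observed x
    n = len(data)           # total number of data points
    return {x: c / n for x, c in sorted(counts.items())}

rolls = [1, 3, 3, 6, 2, 3, 5, 6, 1, 4]  # made-up dice rolls for illustration
pmf = empirical_pmf(rolls)
print(pmf)                # {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.1, 5: 0.1, 6: 0.2}
print(sum(pmf.values()))  # 1, up to floating-point rounding
```

Values that never occur in the data (none here) would simply be missing from the dictionary, i.e. they get an empirical probability of zero.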
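The PDF steps can be sketched with Python's `statistics.NormalDist`, assuming a normal model is appropriate for the data. The height sample is hypothetical. The sketch evaluates the fitted density on a grid instead of drawing it; with matplotlib installed, `plt.plot(xs, ys)` would draw the curve.

```python
from statistics import NormalDist, fmean, stdev

# Hypothetical sample of heights in cm (made up for illustration).
heights = [162.0, 175.5, 168.2, 171.0, 158.4, 180.1, 166.7, 173.3]

# Step 1: assume a normal model.
# Step 2: estimate its parameters (mean, standard deviation) from the data.
mu = fmean(heights)
sigma = stdev(heights)  # sample standard deviation (n - 1 denominator)
model = NormalDist(mu, sigma)

# Step 3: evaluate the density over a grid of x values from 150 cm to 190 cm;
# plotting the (xs, ys) pairs gives the fitted PDF curve.
xs = [150 + 0.5 * i for i in range(81)]
ys = [model.pdf(x) for x in xs]
print(f"mu={mu:.2f} cm, sigma={sigma:.2f} cm")
```

`NormalDist.from_samples(heights)` would produce the same fitted model in one call; the explicit version just mirrors the steps listed above.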
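Both CDF constructions can be sketched in Python. The discrete part takes a running sum of an example PMF with `itertools.accumulate`. The continuous part approximates the integral with the trapezoidal rule, under the assumption that a finite lower bound (here a hypothetical `lower=-10.0`) stands in for $-\infty$, and compares the result with the exact normal CDF.

```python
from itertools import accumulate
from statistics import NormalDist

# Discrete: running sum of PMF values over sorted x gives the CDF.
pmf = {1: 0.2, 2: 0.1, 3: 0.3, 4: 0.1, 5: 0.1, 6: 0.2}  # example PMF
xs = sorted(pmf)
cdf = dict(zip(xs, accumulate(pmf[x] for x in xs)))
print(cdf)  # non-decreasing; the last value is 1 up to rounding

# Continuous: integrate the PDF from far left up to x (trapezoidal rule).
def cdf_by_integration(pdf, x, lower=-10.0, steps=10_000):
    """Approximate F(x) = integral of pdf from lower to x; lower stands in for -infinity."""
    h = (x - lower) / steps
    total = 0.5 * (pdf(lower) + pdf(x))
    total += sum(pdf(lower + i * h) for i in range(1, steps))
    return total * h

z = NormalDist()  # standard normal, used only as an example
print(cdf_by_integration(z.pdf, 1.0), z.cdf(1.0))  # both about 0.8413
```

The lower bound of $-10$ works here because the standard normal density is negligible below it; a heavier-tailed distribution would need a further-out bound or an analytic CDF.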
Why PMF, PDF and CDF?
- PMF is preferred when
- The probability at each $x$ value is of interest. This makes sense for discrete data, e.g., the probability of getting a certain number from a die roll.
- PDF is preferred when
- We wish to model the collected data with a continuous function described by a few parameters, such as the mean, in order to estimate the population distribution.
- CDF is preferred when
- The cumulative probability over a range is of interest.
- Especially for continuous data, the CDF makes more sense than the PDF. E.g., the probability that a student's height is less than $170$ cm (from the CDF) is more informative than the density at exactly $170$ cm (from the PDF); the probability of a height of exactly $170$ cm is zero.
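To make the height example concrete, here is a small sketch assuming heights follow a normal distribution with a hypothetical mean of $168$ cm and standard deviation of $7$ cm (these parameters are made up for illustration). It contrasts the CDF-based probabilities with the PDF value at $170$ cm.

```python
from statistics import NormalDist

# Hypothetical model of student heights in cm (parameters chosen only for illustration).
heights = NormalDist(mu=168, sigma=7)

print(heights.cdf(170))                     # P(height <= 170 cm), about 0.61
print(heights.cdf(171) - heights.cdf(169))  # P(169 < height <= 171), about 0.11
print(heights.pdf(170))                     # density at 170 cm, about 0.055 per cm; not a probability
```

The density value only becomes a probability once it is multiplied by an interval width (roughly $0.055 \times 2 \approx 0.11$ for the 2 cm window), which is exactly what the CDF difference computes.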
The relation between the probability density function $f$ and the cumulative distribution function $F$ is as follows.
if $f$ is discrete (i.e., a PMF): $$ F(k) = \sum_{i \le k} f(i) $$
if $f$ is continuous: $$ F(x) = \int_{y \le x} f(y)\,dy $$
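As a worked check of both formulas (the fair die and the exponential density with rate $\lambda > 0$ are chosen here only as examples): for a fair die, $f(i) = 1/6$ for $i = 1, \dots, 6$, so
$$ F(k) = \sum_{i \le k} f(i) = \frac{k}{6}, \quad k = 1, \dots, 6 $$
For the exponential density, $f(y) = \lambda e^{-\lambda y}$ for $y \ge 0$ and $f(y) = 0$ for $y < 0$, so
$$ F(x) = \int_{y \le x} f(y)\,dy = \int_{0}^{x} \lambda e^{-\lambda y}\,dy = 1 - e^{-\lambda x}, \quad x \ge 0 $$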