STATISTICS

Statistics deals with everything about data. It starts with collecting the right data using proper sampling techniques, then organizing it in an appropriate format, visualizing it with charts, analyzing it to generate valuable and actionable insights, and finally presenting the results to end users.

Types of Statistics:

  • Descriptive Statistics: These methods summarize and present data. It’s what we all do on a regular basis. Whether you are calculating the average marks of a class on a math test or telling your store manager the total number of sanitizers sold last month, you are describing the data a little better, which aids further decision-making.
  • Inferential Statistics: These methods find trends, test hypotheses, and make predictions about a larger set of data, called the population, by studying a smaller set of data called a sample. This usually involves more complex calculations. For instance, a business owner might study the trends and patterns in last year’s sanitizer sales to predict sales for the coming months.
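To make the difference between the two types concrete, here is a minimal sketch using only Python's standard library. The sales figures are made up for illustration, and the 95% interval uses a normal approximation, which is an assumption on my part (for a small sample, a t-interval would be slightly wider).

```python
import random
import statistics

# Hypothetical population: daily sanitizer sales for one year (made-up numbers).
random.seed(42)
population = [random.randint(20, 80) for _ in range(365)]

# We only get to look at a sample of 40 days.
sample = random.sample(population, 40)

# Descriptive statistics: summarize the data we actually have.
sample_total = sum(sample)
sample_mean = statistics.mean(sample)

# Inferential statistics: use the sample to say something about the population.
# Here: an approximate 95% confidence interval for the population mean,
# based on a normal approximation (an assumption of this sketch).
z = statistics.NormalDist().inv_cdf(0.975)
std_error = statistics.stdev(sample) / len(sample) ** 0.5
ci_low = sample_mean - z * std_error
ci_high = sample_mean + z * std_error

print(f"Total sold in the sampled days: {sample_total}")
print(f"Average per sampled day: {sample_mean:.1f}")
print(f"Estimated yearly average per day: {ci_low:.1f} to {ci_high:.1f}")
```

The first two printed lines only describe the 40 sampled days. The last line makes a claim about the whole year, which is what makes it inferential.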

All things Data:

  • Collecting Data: Collecting data is one of the most crucial tasks in dealing with data. You can collect data from primary sources, where you gather your own data, or from secondary sources, where you work with data collected by someone else. You collect data by selecting a sample. There are two ways to select a sample:
  1. Non-Probability Sampling: Here we select the items without knowing about their probability of selection. These include judgement sampling and convenience sampling.
  2. Probability Sampling: Here we select items for our sample based on their known probabilities. These include Simple Random Sampling, Systematic Sampling, Stratified Sampling and Cluster Sampling.
  • Defining Data: Labeling the variables in a dataset as numerical or categorical seems simple, but it is a tedious process as well as an important one. Categorical variables either take the form of ‘Yes’ / ‘No’ (binary, or dichotomous, variables) or represent categories such as ‘Beginner’, ‘Intermediate’, or ‘Advanced’. Numerical variables can be either discrete, such as the number of likes this post will get, or continuous, such as the time taken to read this post. We can further classify variables using scales of measurement (nominal, ordinal, interval, and ratio).
  • Visualizing Data: Categorical variables can be visualized with pivot tables, bar charts, pie charts, Pareto charts, and side-by-side bar charts. Numerical variables can be visualized with frequency distributions, histograms, frequency polygons, boxplots, probability plots, scatter plots, and time series plots.
  • Analyzing Data: We can run a descriptive summary on the data we collect, which includes calculating the mean, median, mode, quartiles, variance, and covariance. This tells us about trends, patterns, and variability in the data. At this stage, one also builds models and studies how outliers behave. Doing so gives a deeper understanding of the data, backed by statistical evidence.
  • Data Presentation: Without understanding the data properly, one cannot present it effectively. Reaching this stage requires going through all the steps above. Various tools are available that help in making attractive and effective visuals.
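The collecting and analyzing steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 100 readers is made up, and the helper names (`simple_random_sample`, `systematic_sample`, `stratified_sample`) are hypothetical functions written for this post, not part of any library. It shows three of the probability sampling techniques and then a descriptive summary of one sample.

```python
import random
import statistics

random.seed(7)

# Hypothetical population: 100 readers, each with a skill level (categorical)
# and a reading time in minutes (numerical, continuous). All values are made up.
levels = ["Beginner", "Intermediate", "Advanced"]
population = [
    {"id": i, "level": levels[i % 3], "minutes": round(random.uniform(2, 10), 1)}
    for i in range(100)
]

def simple_random_sample(items, n):
    """Every item has the same chance of being selected."""
    return random.sample(items, n)

def systematic_sample(items, n):
    """Pick every k-th item after a random starting point."""
    k = len(items) // n
    start = random.randrange(k)
    return items[start::k][:n]

def stratified_sample(items, key, per_group):
    """Split items into groups by `key`, then sample randomly within each group."""
    groups = {}
    for item in items:
        groups.setdefault(item[key], []).append(item)
    chosen = []
    for members in groups.values():
        chosen.extend(random.sample(members, per_group))
    return chosen

srs = simple_random_sample(population, 12)
systematic = systematic_sample(population, 10)
stratified = stratified_sample(population, "level", 4)

# Descriptive summary of the reading times in the simple random sample.
minutes = [reader["minutes"] for reader in srs]
summary = {
    "mean": statistics.mean(minutes),
    "median": statistics.median(minutes),
    "variance": statistics.variance(minutes),        # sample variance
    "quartiles": statistics.quantiles(minutes, n=4),  # three cut points: Q1, Q2, Q3
}

for name, value in summary.items():
    print(name, value)
```

Cluster sampling is left out to keep the sketch short. The same summary could be run on `systematic` or `stratified` to compare how the choice of sampling technique changes the numbers.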

In the next post, I will cover descriptive statistics. Till then, stay safe and healthy. I hope you enjoyed reading!

Pearl Seth

Data Science Enthusiast | Ex-Banker | Decent Photographer | Loves to Travel | Devoted Dog-Mom