Data Analysis with R. by

Starting March 30, Nikolay Pavlov, Data Scientist @, will run the Data Analysis with R workshop series every Wednesday.


Data Analysis with R consists of 8 two-hour sessions devoted to data science.


Every Wednesday from March 30 to May 18, 19:00 to 21:00.

Course outline

Introduction to data

  • R programming language
  • Observations and variables
  • Relationship between variables
  • Population and sample
  • Dependent and independent variables
  • Experimental design and sampling methods

Data exploration, visualization and cleaning

  • Data import, cleaning and manipulations
  • Scatter plot
  • Histogram, mean, variance and standard deviation
  • Box plots, quartiles, median and outliers
  • Data transformations
  • Categorical data, contingency tables and bar plots

Probability

  • Outcome, random process and Law of Large Numbers
  • Disjoint/joint outcomes, addition rule
  • Independence
  • Conditional, marginal and joint probabilities
  • Multiplication Rule
  • Bayes' theorem
  • Random variables, Expected Value, Variance
  • Probability distributions: PDF, CDF
  • Normal distribution
  • Geometric distribution
  • Binomial distribution

Statistical Inference

  • Point estimates
  • Confidence interval
  • Hypothesis testing
  • Type I, type II errors, power
  • Paired data, difference of two means
  • T-distribution
  • Inference for categorical data

Regression analysis

  • Linear regression and least squares (LS)
  • Conditions for fitting a regression line
  • Residual analysis, R^2
  • Interpretation and inference
  • Multiple regression
  • Model selection
  • Logistic regression

Predictive Analytics

  • Machine learning and Supervised learning
  • Regression / Classification
  • Error functions
  • Linear model
  • Gradient descent, SGD, mini-batches
  • Decision Trees, Random Forest, Neural Networks, SVM
  • Bias-Variance tradeoff, regularization L1/L2
  • Cross-validation
  • Hyperparameter tuning

Big Data, R and Apache Spark

  • Resilient Distributed Datasets (RDD)
  • Map-Reduce
  • SparkR, Data Frame operations
  • Machine Learning in Spark

Please bring a laptop with R and the RStudio IDE preinstalled.