My Data Science & Programming Skills & Tools

Here is a selection of the Data Science and programming skills and tools that have proved helpful in my work and that I believe are important for any Data Scientist.

This post was last updated on 2017-07-21.

Data Science Skills

  1. Data Manipulation
    • Efficient data manipulation in R;
      dplyr, data.table, reshape2.
    • Working with dates and time-series;
      lubridate, xts.
  2. Reproducible Analyses
    • RStudio IDE, Markdown, LaTeX;
      RMarkdown, knitr.
  3. Databases and Data Formats
    • MySQL, MongoDB, JSON, XML, Excel, CSV, shapefiles, Linked Open Data / SPARQL, etc.
      mongolite, jsonlite.
  4. Data Acquisition
    • parallel web scraping; working with REST APIs;
      httr, RCurl, rvest, RSelenium.
  5. Text-mining
    • regular expressions; stemming with dictionary method;
      stringr, tm, genderizeR.
  6. Qualitative Analyses
    • RQDA.
  7. On-Line Surveys (CAWI)
    • advanced on-line surveys in LimeSurvey;
      LimeRick.
  8. Network Analysis
    • visualizing and analyzing networks in Gephi;
    • programming dynamic network simulations in NetLogo.
  9. Data Visualization
    • static and interactive data visualization;
      ggplot2, ggvis, waffle.
    • dashboards and data reporting via WWW apps;
      Shiny, shinydashboard, shinyjs.
  10. Communicating Results
    • reports and presentations in Tableau, PowerPoint, and HTML5;
      RMarkdown, ioslides.
  11. Statistical Analysis & Machine Learning
    • multiple linear regression, logistic regression, ordinal logistic regression, Principal Component Analysis (PCA), factor analysis, segmentation analysis, cluster analysis (k-means, hierarchical clustering), supervised / unsupervised learning;
    • neural networks, Support Vector Machines (SVM), decision trees, Random Forests, Ridge Regression, LASSO, elastic net regularization, K-fold cross-validation, bootstrapping with caret;
    • survival analysis with survival;
    • time series analysis (ARIMA, TBATS) with forecast.
  12. Big Data Tools
    • familiarity with Spark and sparklyr in R;
    • familiarity with MapReduce, Hadoop, Pig.
  13. Other Data Science Tools
    • familiarity with SPSS, SAS, Octave, RapidMiner, Excel Visual Basic (VBA), Revolution R Enterprise / Microsoft R.

Programming Skills

  1. Programming in R
    • code versioning with RStudio IDE, Git, GitHub;
    • functional programming with purrr;
    • reactive programming with Shiny, shinyjs;
    • parallel programming with parallel;
  2. R package development
    • package distribution with GitHub, CRAN;
    • preparing package documentation with roxygen2;
    • unit testing with testthat;
  3. Internet technologies
    • HTML5, CSS3, JavaScript;
    • familiarity with MongoDB, MEAN Stack, Bootstrap, and Linux servers;
  4. Other programming languages
    • familiarity with Python, C++, Java for Android & Android Studio.