RichardOnData
  • Videos: 128
  • Views: 924,602
Classification Metrics Explained | Sensitivity, Precision, AUROC, & More
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A
In this video, I go through the different types of binary classification metrics. These include: accuracy, prevalence, confusion matrices, sensitivity (aka recall or true positive rate), specificity (aka true negative rate), precision (aka positive predictive value), F1 score, and the areas under the precision-recall curve and the receiver operating characteristic curve, that is: AUPRC and AUROC. We close with how to implement these using the scikit-learn package in Python, going through a Jupyter notebook.
Code can be found here: github.com/RichardOnData/RUclips/blob/main/Python%20Notebooks/classification_metrics...
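
As a rough illustration of the metrics listed above (this is not the code from the linked notebook), the sketch below computes them with scikit-learn on a small made-up example. The arrays `y_true` and `y_score` and the 0.5 decision threshold are assumptions chosen purely for illustration.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    average_precision_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground-truth labels and predicted probabilities (illustrative only).
y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
y_score = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.35, 0.8, 0.9, 0.65])

# Turn probabilities into hard predictions with an assumed 0.5 threshold.
y_pred = (y_score >= 0.5).astype(int)

# For binary labels, ravel() returns the cells in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

prevalence = y_true.mean()                      # share of actual positives
accuracy = accuracy_score(y_true, y_pred)       # (tp + tn) / total
sensitivity = recall_score(y_true, y_pred)      # tp / (tp + fn), aka recall / TPR
specificity = tn / (tn + fp)                    # aka true negative rate
precision = precision_score(y_true, y_pred)     # tp / (tp + fp), aka PPV
f1 = f1_score(y_true, y_pred)                   # harmonic mean of precision and recall

# Threshold-free metrics are computed from the scores, not the hard predictions.
auroc = roc_auc_score(y_true, y_score)
auprc = average_precision_score(y_true, y_score)  # a common estimate of AUPRC

for name, value in [
    ("prevalence", prevalence), ("accuracy", accuracy),
    ("sensitivity", sensitivity), ("specificity", specificity),
    ("precision", precision), ("F1", f1),
    ("AUROC", auroc), ("AUPRC", auprc),
]:
    print(f"{name}: {value:.3f}")
```

Note that specificity has no dedicated scikit-learn function here, so it is derived from the confusion matrix; `recall_score(y_true, y_pred, pos_label=0)` would be another way to get it.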
Views: 435

Videos

SHAP Values: An Overview
627 views, 2 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video, I talk about SHAP values and how these can be used for explainable AI and explaining how features contribute to a machine learning model's predictions for each observation. These are great tools when your goal isn't (only) prediction, but is also inference - that is, understanding the most important features ...
Is ChatGPT-4 Worth It?
972 views, 3 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A NOTE: Sorry about the bad audio quality on this one. I switched microphones when I upgraded phones recently, and thought during testing that it would be a lot better than it was here. Looking into a REAL microphone upgrade here. NOTE 2: I didn't talk about DALL-E on this one, which is another feature to GPT-4. The foc...
Follow THESE 5 Tips to Get a Data Job
944 views, 5 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video I'll break down some tips that I have to get data jobs. This is going to be broad and apply to all types of positions, whether those are data analyst, data science, or data engineering jobs! To summarize: 1) Have good education in a field like statistics, computer science, math, engineering, business, or...
Learn (and Do) Data Science FAST with ChatGPT
1K views, 5 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video I show some ways I've used ChatGPT to both learn and do data science faster. ChatGPT can be an excellent tool if you're responsible with it. It can provide great ideas to help get through creative roadblocks, as well as generate great coding examples that you can turn around and use to learn. You DON...
The Data Job Market in 2024
8K views, 6 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A My thoughts on the data job market in 2024. I looked at data scientist, data analyst, data engineer, and machine learning engineer jobs. In particular we talk about some broader trends in tech more recently, the recent tech layoffs, and what hiring and salaries are looking like for these positions. Crunchbase: news.cr...
No, AI (Probably) Won’t Take Your Data Job Soon
609 views, 6 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A NOTE: The beginning of this video is somewhat tongue in cheek. Certain things, you just have to let yourself have fun with. Some of the articles and videos I reference make very different points, specifically regarding the rise of data engineering and constructing end-to-end machine learning pipelines. Those are valid...
R or Python: Which Should You Learn in 2024?
6K views, 6 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video we're revisiting the R vs Python comparison in the year 2024. How do they stand in recent job reports and in indices like PyPL or the TIOBE index?
Four Data Science Jobs: My Experiences
559 views, 7 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video I talk about every data science job I've had, how each job was dramatically different from the others, and how each one sort of led to the next.
10 Python Packages You Should Know (in 2024)
908 views, 7 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video I'm going to provide a recommended 10 packages that you should know and focus on to get strong at Python programming, in the context of data science. Recommended book "Python for Data Analysis": amzn.to/3cDXKcE 1. pandas pandas.pydata.org/Pandas_Cheat_Sheet.pdf 2. numpy images.datacamp.com/image/upload/v...
What Is Survival Analysis?
510 views, 7 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video I cover survival analysis. Specifically what it is, and why it's useful when the time until an event is important and when you have "censored" data. I talk about what censored data is and provide definitions of the survival and hazard functions. This is illustrated visually by showing a Kaplan-Meier curv...
How I Would Learn Data Science in 2024 (If I Had to Start Over)
2K views, 7 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A ChatGPT: Bri Does AI: ruclips.net/video/MnDudvCyWpc/видео.html Ryan Scribner: ruclips.net/video/X9ksiScY7hM/видео.html Statistics: Duke: www.coursera.org/specializations/statistics Johns Hopkins: www.coursera.org/specializations/jhu-data-science University of Amsterdam: www.coursera.org/specializations/social-science S...
How to Setup Your Python Environment (With VSCode & Anaconda)
7K views, 8 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A In this video, I walk you through how to set up your Python development environment. If you're a complete beginner, you'll probably be good with just Anaconda/JupyterLab/Jupyter Notebooks. If you're going to be a serious developer, you'll want to use Visual Studio Code and as a best practice set up virtual environment...
How I Passed the Google Cloud Professional ML Engineer Exam
9K views, 8 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A 'Journey to Become a Google Cloud Machine Learning Engineer': amzn.to/3TjwmYT Exam guide: cloud.google.com/learn/certification/guides/machine-learning-engineer Github compilation: github.com/sathishvj/awesome-gcp-certifications/blob/master/professional-machine-learning-engineer.md Medium articles: towardsdatascience.c...
Update | Where I’ve Been
757 views, 8 months ago
Subscribe to RichardOnData here: ruclips.net/channel/UCKPyg5gsnt6h0aA8EBw3i6A Hi everyone. It's been a while.
Tufte's Principles of Graphical Integrity
4.5K views, 2 years ago
Why Is It SO HARD to Get a Data Science Job?
4.8K views, 2 years ago
I Quit My Data Science Job. Here’s Why
7K views, 2 years ago
10 Good Coding Practices for Data Science
3.8K views, 2 years ago
Data Science Advice for College Students
3.1K views, 2 years ago
The TRUTH About Learning Data Science
4.2K views, 2 years ago
Is the Future of Data Work Remote?
1.5K views, 2 years ago
The State of Data Science in 2021 | Anaconda's Annual Report
2.5K views, 3 years ago
What Is Data Engineering?
2.4K views, 3 years ago
When Should You Use Regression Methods?
5K views, 3 years ago
Tuning hyperparameters and stacking models with "tidymodels" | R Tutorial (2021)
2.3K views, 3 years ago
Evaluating ML Performance, Resampling, and Workflows in "tidymodels" | R Tutorial (2021)
1.9K views, 3 years ago
Intro to machine learning in R with "tidymodels" | R Tutorial (2021)
8K views, 3 years ago
20 R Packages You Should Know
40K views, 3 years ago
Creating ROC curves and ensembling models in R with "caret" | R Tutorial (2021)
4.4K views, 3 years ago

Comments

  • @TheRealDCoy
    @TheRealDCoy 8 days ago

    Very helpful. As a social scientist (not working in text-as-data), R is straightforwardly more useful. I superficially learned Python first. Then, I learned R and found it more useful for just about everything I need to do. One consideration I'd add (unless you said it and I missed it) is that R users tend to use R Studio as the IDE, which makes things easier while getting started and remains useful (knitr, markdown) as you gain skill. With R Studio, you are able to see all the objects in memory, as well as storage type. You can open a dataframe as a spreadsheet or pop it out as a new window and look at it side-by-side with any section of your code. When I learned Python (I really only use R now, so I'm probably biased), the best we had was Jupyter notebooks. I don't know if people are using something better, but when I learned Python, I found it pretty frustrating to have to constantly print things to check on objects' attributes and contents. My understanding is R Studio runs Python code now, but I don't see much evidence of people using it. Have Python's IDE options improved since I learned five years ago?

  • @CharlesMartel829
    @CharlesMartel829 9 days ago

    Fiats are more common than tractors, but would you use a Fiat instead of a tractor as a farmer just because Fiats are more widely used?

  • @user-rw8db7tc2s
    @user-rw8db7tc2s 14 days ago

    Please reply

  • @user-rw8db7tc2s
    @user-rw8db7tc2s 14 days ago

    I am from India

  • @user-rw8db7tc2s
    @user-rw8db7tc2s 14 days ago

    Sir, which is your country? 😇

  • @somayverma6001
    @somayverma6001 16 days ago

    Is this certificate worth it? Like, does it add weight to a resume?

  • @Godsontechy
    @Godsontechy 20 days ago

    I will be taking the GCP Machine Learning certification exam in the coming weeks and will be back with some good news. Thank you so much for this video.

  • @Suchen_Wahrheit
    @Suchen_Wahrheit 23 days ago

    I usually don't subscribe, to avoid unnecessary suggestions. But with the current RUclips algorithm, it doesn't really matter whether I am subscribed or not. You should mention this as well. It doesn't hurt in any way to subscribe to a channel. So hit the subscribe button😂😅 So why not.... I subscribed 😉👍

  • @DivyanshSrivastava-u8t
    @DivyanshSrivastava-u8t 29 days ago

    When did you receive the certificate after passing? Also, are Google's badges and certificates the same?

  • @Letslearntogetheruzh7
    @Letslearntogetheruzh7 1 month ago

    1:40

  • @dijanaostojic5077
    @dijanaostojic5077 1 month ago

    Hi Richard, thanks so much for your videos; they've been incredibly helpful! I’m currently working with a highly imbalanced dataset (1% positive class and 99% negative class), and I'm interested in adjusting the classification threshold using tidymodels. I’ve read that direct threshold modification might not be supported yet. Is there any workaround for this, or any alternative methods you recommend for handling this kind of class imbalance? I’d appreciate any advice or resources you could share!

  • @EuTomcosta
    @EuTomcosta 1 month ago

    👏👏👏👏

  • @halecj1
    @halecj1 1 month ago

    Both. I mainly do financial statistics. My main codebase is in R but I have Python helper files that can be loaded into an R script when needed, and then call those Python functions directly in the R script.

  • @nxronite9994
    @nxronite9994 1 month ago

    Out of all of the realms of IT and CS, Data Analysis was the one that piqued my interest in school. I hope this is just a cycle that will come to pass, because finally landing on something I like within the subfields of my major, only for it to turn into a nightmare when it comes to career prospects, would feel so defeating.

  • @sams1078
    @sams1078 1 month ago

    Thanks so much Richie! Loved your description of the exam!! Well done bro for an honest assessment..

  • @samruddhichaodhari3028
    @samruddhichaodhari3028 1 month ago

    Thanks a lot for this video

  • @user-sb9oc3bm7u
    @user-sb9oc3bm7u 1 month ago

    lol Rust 2% 11:10 there is a great implementation of pytorch in R, called torch

  • @bernardsolomon79
    @bernardsolomon79 1 month ago

    Re people eventually getting bored by machine learning: isn't it in part because the standard machine learning task of point prediction (which I understand is what people mean by prediction 99% of the time) is actually quite limited compared to scientific explanation and statistical inference? Finding a y_hat that minimises something like MSE is a very limited way of doing 'science'. In contrast, trying to infer something close to an approximate data generating process, or a probabilistic causal model, is fundamentally interesting if you care about economic, business, or marketing problems (or medical/epidemiological problems, if you care about that stuff). And the thing is, from a decision science perspective the focus on prediction is also quite limiting. In decision science, you want probabilistic scenarios and causal effects. Data science would be more interesting if there was more probabilistic causal data science (there is some, but it seems to me it's still a much smaller segment than standard predictive machine learning and data analysis, and again, there's only so much you can learn about how a business or medical situation works from EDA). The thing is, if you do more probabilistic causal model inference based on domain knowledge, is that really just data science? Isn't it more like applied empirical economics or sociology or epidemiology? Isn't data science potentially boring because the focus on just data is a limiting way to solve business problems?

  • @JSmithRecords
    @JSmithRecords 1 month ago

    Don't get into the typical data analyst, scientist, engineer, etc. roles. Get into DATABASE work (developer, administration, etc.). Less competition and higher demand. You can't get data to analyze, ETL, or make predictions without the database. Master database development and you'll never worry about a job again.

  • @Asmonix
    @Asmonix 1 month ago

    the bro is like: 🤌👁👄👁🤌

  • @蕭俊煒
    @蕭俊煒 1 month ago

    This is more useful than other channels saying that a degree is useless or something like that for data science. I have seen that most real-world data science jobs require a PhD.

  • @ndz7372
    @ndz7372 1 month ago

    Thanks!

  • @MindfulInsights4u
    @MindfulInsights4u 1 month ago

    Pls mic fix

  • @adamf5018
    @adamf5018 1 month ago

    I am a PhD candidate in data analytics and have been looking for a job for eight months now without even a single interview. It’s very tough😫

  • @wb7779
    @wb7779 2 months ago

    That was a really good explanation. Short and powerful.

  • @matthewson8917
    @matthewson8917 2 months ago

    I used Python during my PhD and ended up shifting to R. Python's statistical packages are lackluster (maybe not surprisingly). I'm not a big fan of dot chains and pandas' index system, and the deal breaker was that it was so sluggish and ran out of memory so often with medium-size data (20GB+), even on a machine with 60GB+ of RAM. I tried Dask, but it's pandas-based and slow; with DuckDB and polars, I think the Dask project will become less popular. I picked up tidyverse and data.table from R, and they did the job without a problem, and I kinda regretted learning Python. R has the fixest package, which is really fast for high-dimensional fixed effects regressions, and Python doesn't seem to support large-scale regressions very well.

  • @lorenzopeiyang6934
    @lorenzopeiyang6934 2 months ago

    R is more capable of doing amazing things better than python

  • @yoyo-ue5pf
    @yoyo-ue5pf 2 months ago

    I am AWS ML certified

  • @narayanasrikanthreddyg
    @narayanasrikanthreddyg 2 months ago

    Good to hear that you learnt R and then created the video. I can understand @6:41. After learning C, C++, BASIC, and COBOL, i.e. having a programming background, R really felt funny and weird because there are multiple ways you can do the same thing. But later I fell in love with R. I have heard numpy and pandas are inspired by R data structures. You have computer engineers backing the development and usage of Python, whereas it's a bunch of academicians and statisticians for R. R initially looked like a hotchpotch, but after looking at numpy and pandas with basic Python... I just laugh at my judgements reversing over time. Python seems to be more in line with traditional expectations of OOP syntax... I can go on... but both could have been more streamlined for the data science workflow.

  • @chrishardy2909
    @chrishardy2909 2 months ago

    I can't believe FORTRAN is #12! I programmed my master's thesis project in 1995 in FORTRAN and I thought nobody used it anymore. As a statistician I'm guessing R is the way to go.

  • @1wuniverse675
    @1wuniverse675 2 months ago

    Nice honest and informative video. Thank you.

  • @BSTDeepaneeshRV
    @BSTDeepaneeshRV 2 months ago

    Helpful video, thank you sir 🌟

  • @Sonntagssoziologe
    @Sonntagssoziologe 2 months ago

    It remains vague. What exactly can you do with R that is not possible with Python?

    • @narayanasrikanthreddyg
      @narayanasrikanthreddyg 2 months ago

      You mean to say Python along with packages like numpy, pandas, scikit-learn, etc....

  • @JimjnrPrince
    @JimjnrPrince 2 months ago

    I like how you present the data/ideas 😂... Thanks for the information ❤

  • @djangoworldwide7925
    @djangoworldwide7925 2 months ago

    I dislike videos that make reproducibility challenging. You could demonstrate the exact same concepts using a simple data frame that can be found in seaborn (or any other imported package for that matter). Nice video otherwise

    • @RichardOnData
      @RichardOnData 2 months ago

      That's a totally fair point. If I'm understanding you correctly here, the issue is basically that this dataset requires an API key and a few steps overall to get ahold of. I do find these concepts easy to understand through the lens of a disease, but I totally see what you mean here. I have a video coming out soon on bagging vs boosting, and I'll use a dataset for that one that's simpler to get your hands on.

    • @djangoworldwide7925
      @djangoworldwide7925 2 months ago

      @RichardOnData That's cool man. I appreciate you replying and enjoy your overall content (I've been subscribed for quite some time now). To be clear, I didn't "dislike" as in the dislike button; I mean in general that I don't like the idea that [...]. Cheers!

  • @mugomuiruri2313
    @mugomuiruri2313 2 months ago

    good.mugo on data

    • @RichardOnData
      @RichardOnData 2 months ago

      Thanks for watching as always

  • @firstname4337
    @firstname4337 2 months ago

    how do you LEARN this stuff -- I mean really LEARN -- I took a course in biostatistics where we covered this and for every problem I had to keep referring back to a page where I had written all the formulas -- there was no way I could tell you the formula for specificity or sensitivity -- I understood the consequences and reasons for them (telling someone they have diabetes when they don't leads them to spending money on drugs they don't need) -- but as for applying the correct measure and formula to every scenario I was totally lost -- if we weren't allowed to use a page of formulas for the final I would have failed spectacularly

    • @RichardOnData
      @RichardOnData 2 months ago

      There's really no substitute for repetition and experience. Years ago I had to give multiple presentations for a sepsis prediction model and had to use a ton of these metrics and then answer questions. It went from always mixing them up, to being able to rattle them off in my sleep, but it did take a lot of time.

  • @silvertube52
    @silvertube52 2 months ago

    Thanks Richard, that was a good overview of classification metrics.

  • @mugomuiruri2313
    @mugomuiruri2313 2 months ago

    good.mugo on data

  • @dimitrioskioroglou4316
    @dimitrioskioroglou4316 2 months ago

    For me the greatest difference between the two languages is the mentality. R users are taught basic programming fundamentals and learn that for every solution there is a package they can use. Python users are taught programming first and how the language is used to create packages. So R users learn to use the language at a higher level, and when they go deeper, things get messy. Also, in 2024 I wouldn't keep applying labels such as "R for statistics" and "Python for general purpose". These kinds of labels are absolute nonsense.

  • @matteolatinov6630
    @matteolatinov6630 2 months ago

    Nice one! This is a topic I need to start getting into. Would love more content on XAI!

  • @KN-tx7sd
    @KN-tx7sd 2 months ago

    Awesome. Instead of Python, can this be done using R?

  • @JackieReu
    @JackieReu 2 months ago

    Thanks for the great video! Should I reverse the RemoteSigned execution policy after finishing the install and setup, or does it have to stay on Y permanently?

  • @rccola362
    @rccola362 2 months ago

    This was awesome. Thanks. I’m gonna go see how I can apply these when presenting to stakeholders

  • @Daniel83021
    @Daniel83021 2 months ago

    You are awesome bro, thanks

  • @mart1484
    @mart1484 3 months ago

    Well damn 1yr left to learn

  • @asdnmr6858
    @asdnmr6858 3 months ago

    How many weeks or months did you take to study for the exam?

  • @Antowan
    @Antowan 3 months ago

    My university economics program uses R. I learned both for obvious reasons

  • @moviezone8130
    @moviezone8130 3 months ago

    Thanks for the great video. It was an awesome comparison. Are you practicing data science? I am looking for a small role in data analysis with R; do you have any advice? I have a master's degree in environmental science from Addis Ababa University. By the way, are you on LinkedIn? I would like to follow you. Thanks.

  • @moviezone8130
    @moviezone8130 3 months ago

    Hi Sir, thanks for yet another great video. Can you make a video on the most widely used ML tools? I have a background in chemistry and environmental science at the master's level, and I have started learning R by reading books and watching YouTube videos. Do you think I have a future in data science? I'm from Ethiopia.