Sebastian Raschka
Sebastian Raschka
  • Videos: 292
  • Views: 1,603,322
Understanding PyTorch Buffers
Sebastian's books: sebastianraschka.com/books/
This video explains what PyTorch buffers are, a concept that is particularly useful for GPU computations and for implementing large models like LLMs.
Code notebook: github.com/rasbt/LLMs-from-scratch/blob/main/ch03/03_understanding-buffers/understanding-buffers.ipynb
GitHub discussion about "triu" in the forward pass: github.com/rasbt/LLMs-from-scratch/discussions/282
Link to the Studio GPU environment to follow along: lightning.ai/seraschka/studios/understanding-pytorch-buffers?section=recent
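For reference, a minimal, hedged sketch of the pattern the video discusses (the class and variable names below are illustrative, not the notebook's exact code): the causal mask is created once with torch.triu and registered as a buffer, so it moves with the module on .to(device) without being treated as a trainable parameter.

```python
import torch
import torch.nn as nn

class CausalSelfAttentionSketch(nn.Module):
    """Illustrative causal self-attention module; not the notebook's exact code."""
    def __init__(self, d_in, d_out, context_length):
        super().__init__()
        self.W_query = nn.Linear(d_in, d_out, bias=False)
        self.W_key = nn.Linear(d_in, d_out, bias=False)
        self.W_value = nn.Linear(d_in, d_out, bias=False)
        # Registered as a buffer: moved by .to(device) and saved in state_dict,
        # but not returned by .parameters(), so the optimizer never updates it
        self.register_buffer(
            "mask",
            torch.triu(torch.ones(context_length, context_length), diagonal=1)
        )

    def forward(self, x):
        b, num_tokens, _ = x.shape
        queries, keys, values = self.W_query(x), self.W_key(x), self.W_value(x)
        attn_scores = queries @ keys.transpose(1, 2)
        attn_scores.masked_fill_(self.mask.bool()[:num_tokens, :num_tokens], float("-inf"))
        attn_weights = torch.softmax(attn_scores / keys.shape[-1] ** 0.5, dim=-1)
        return attn_weights @ values
```

Calling CausalSelfAttentionSketch(...).to("cuda") then moves self.mask to the GPU along with the weights, which is the behavior the video contrasts with storing the mask as a plain tensor attribute.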
---
To support this channel, please consider purchasing a copy of my books: sebastianraschka.com/books/
---
rasbt
linkedin.com/in/sebastianra...
Views: 4,097

Videos

Developing an LLM: Building, Training, Finetuning
Views: 35K · 2 months ago
REFERENCES: 1. Build an LLM from Scratch book: mng.bz/M96o 2. Build an LLM from Scratch repo: github.com/rasbt/LLMs-from-scratch 3. Slides: sebastianraschka.com/pdf/slides/2024-build-llms.pdf 4. LitGPT: github.com/Lightning-AI/litgpt 5. TinyLlama pretraining: lightning.ai/lightning-ai/studios/pretrain-llms-tinyllama-1-1b DESCRIPTION: This video provides an overview of the three stages of develo...
Managing Sources of Randomness When Training Deep Neural Networks
Views: 2.2K · 4 months ago
Sebastian's books: sebastianraschka.com/books/ REFERENCES: 1. Link to the code on GitHub: github.com/rasbt/MachineLearning-QandAI-book/tree/main/supplementary/q10-random-sources 2. Link to the book mentioned at the end of the video: nostarch.com/machine-learning-q-and-ai DESCRIPTION: In this video, we discuss managing common sources of randomness when training deep neural networks. We cover sources of ...
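As a quick, hedged sketch of the kind of settings such a discussion typically covers (the exact coverage may differ from the video; the function name is illustrative), the common PyTorch knobs look roughly like this:

```python
import random
import numpy as np
import torch

def set_deterministic(seed=123):
    # Seed the Python, NumPy, and PyTorch (CPU and CUDA) random number generators
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Prefer deterministic cuDNN kernels and disable auto-tuning (may reduce speed)
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
```

Note that nondeterministic dataloader shuffling and GPU ops can still introduce run-to-run differences beyond these seeds.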
Insights from Finetuning LLMs with Low-Rank Adaptation
Views: 5K · 8 months ago
Sebastian's books: sebastianraschka.com/books/ Links: - LoRA: Low-Rank Adaptation of Large Language Models, arxiv.org/abs/2106.09685 - LitGPT: github.com/Lightning-AI/lit-gpt - LitGPT LoRA Tutorial: github.com/Lightning-AI/lit-gpt/blob/main/tutorials/finetune_lora.md Low-rank adaptation (LoRA) stands as one of the most popular and effective methods for efficiently training custom Large Language...
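As a rough, hedged sketch of the LoRA idea described in the paper linked above (class names and hyperparameter values are illustrative, not LitGPT's implementation): the pretrained weight is kept frozen, and a small low-rank update B·A is trained on top of it.

```python
import torch
import torch.nn as nn

class LoRALayer(nn.Module):
    # Low-rank update: (alpha / rank) * x @ A @ B, where A and B are much smaller than W
    def __init__(self, in_dim, out_dim, rank=8, alpha=16):
        super().__init__()
        self.A = nn.Parameter(torch.randn(in_dim, rank) * 0.01)
        self.B = nn.Parameter(torch.zeros(rank, out_dim))  # zero init: no change at start
        self.scaling = alpha / rank

    def forward(self, x):
        return self.scaling * (x @ self.A @ self.B)

class LinearWithLoRA(nn.Module):
    # Wraps a frozen pretrained linear layer and adds the trainable LoRA update
    def __init__(self, linear, rank=8, alpha=16):
        super().__init__()
        self.linear = linear
        self.lora = LoRALayer(linear.in_features, linear.out_features, rank, alpha)

    def forward(self, x):
        return self.linear(x) + self.lora(x)
```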
Finetuning Open-Source LLMs
Views: 31K · 10 months ago
Sebastian's books: sebastianraschka.com/books/ This video offers a quick dive into the world of finetuning Large Language Models (LLMs). This video covers - common usage scenarios for pretrained LLMs - parameter-efficient finetuning - a hands-on guide to using the 'lit-GPT' open-source repository for LLM finetuning #FineTuning #LargeLanguageModels #LLMs #OpenAI #DeepLearning Useful links to res...
Scaling PyTorch Model Training With Minimal Code Changes
Views: 2.7K · 1 year ago
Sebastian's books: sebastianraschka.com/books/ Code examples: github.com/rasbt/cvpr2023 In this short tutorial, I will show you how to accelerate the training of LLMs and Vision Transformers with minimal code changes using open-source libraries. To support this channel, please consider purchasing a copy of my books: sebastianraschka.com/books/ x.com/rasbt linkedin.com/in/sebastianraschka/ magaz...
L13.5 What's The Difference Between Cross-Correlation And Convolution?
Views: 7K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ This replaces a previous video where the video & audio were out of sync. Slides: sebastianraschka.com/pdf/lecture-notes/stat453ss21/L13_intro-cnn slides.pdf Code: github.com/rasbt/stat453-deep-learning-ss21/blob/main/L13/code/notes/cross-correlation.ipynb
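The punchline can be checked in a few lines: what deep learning libraries call "convolution" is really cross-correlation, and a true convolution flips the kernel first. A small illustrative check (assuming SciPy is available; variable names are mine, not the notebook's):

```python
import numpy as np
from scipy.signal import correlate2d, convolve2d

x = np.arange(9.0).reshape(3, 3)
k = np.array([[1.0, 2.0], [3.0, 4.0]])

# Convolution equals cross-correlation with the kernel flipped along both axes
cc = correlate2d(x, np.flip(k), mode="valid")
cv = convolve2d(x, k, mode="valid")
print(np.allclose(cc, cv))  # True
```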
Conditional Ordinal Regression for Neural Networks (CORN) With Examples in PyTorch
Views: 4.7K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ Using deep neural networks for prediction problems where the labels have a natural order. Link to the code and slides: github.com/rasbt/scipy2022-talk
The Three Elements of PyTorch
Views: 5K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ Code: github.com/rasbt/machine-learning-notes/blob/main/demos/basic-pytorch-cnn-for-3-ele-pytorch-video.ipynb Slides: sebastianraschka.com/pdf/slides/2022-05_three-elements-pytorch.pdf 00:00 Three elements of PyTorch 02:10 (1) Tensor library 05:56 (2) Automatic differentiation engine 13:32 (3) Deep learning library 14:27 PyTorch in 3 Steps 15:17 St...
Ratings and Rankings -- Using Deep Learning When Class Labels Have A Natural Order
Views: 1.1K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ Deep learning offers state-of-the-art results for classifying images and text. Common deep learning architectures and training procedures focus on predicting unordered categories, such as recognizing a positive and negative sentiment from written text or indicating whether images contain cats, dogs, or airplanes. However, in many real-world problem...
13.4.5 Sequential Feature Selection -- Code Examples (L13: Feature Selection)
Views: 11K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ This final video in the "Feature Selection" series shows you how to use Sequential Feature Selection in Python using both mlxtend and scikit-learn. Jupyter notebook: github.com/rasbt/stat451-machine-learning-fs21/blob/main/13-feature-selection/08_sequential-feature-selection.ipynb Timestamps: 00:00 Dataset setup and KNN baseline 04:08 Selecting the...
13.4.4 Sequential Feature Selection (L13: Feature Selection)
Views: 11K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ This video explains how sequential feature selection works. Sequential feature selection is a wrapper method for feature selection that uses the performance (e.g., accuracy) of a classifier to select good feature subsets in an iterative fashion. You can think of sequential feature selection method as an efficient approximation to an exhaustive feat...
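For a quick taste of the technique described above, here is a hedged sketch using scikit-learn's SequentialFeatureSelector (the videos also show mlxtend's version; dataset and hyperparameter choices below are illustrative):

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

X, y = load_wine(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Greedy forward selection: add one feature at a time based on cross-validated accuracy
sfs = SequentialFeatureSelector(knn, n_features_to_select=5, direction="forward", cv=5)
sfs.fit(X, y)
print(sfs.get_support())  # boolean mask of the selected features
```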
13.4.3 Feature Permutation Importance Code Examples (L13: Feature Selection)
Views: 9K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ This video shows code examples for computing permutation importance in mlxtend and scikit-learn. Permutation importance is a model-agnostic, versatile way for computing the importance of features based on a machine learning classifier or regression model. Code notebooks: Wine data example: github.com/rasbt/stat451-machine- learning-fs21/blob/main/1...
13.4.2 Feature Permutation Importance (L13: Feature Selection)
Views: 13K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ This video introduces permutation importance, which is a model-agnostic, versatile way for computing the importance of features based on a machine learning classifier or regression model. Slides: sebastianraschka.com/pdf/lecture-notes/stat451fs21/13_feat-sele slides.pdf Random forest importance video: ruclips.net/video/ycyCtxZ0a9w/видео.html This v...
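A hedged sketch of the idea in scikit-learn (dataset and model choices are illustrative, not necessarily the lecture's): each feature column of a held-out set is shuffled in turn, and the resulting drop in score is that feature's importance.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=123, stratify=y)

model = RandomForestClassifier(random_state=123).fit(X_train, y_train)

# Shuffle each feature column on the test set and measure the accuracy drop
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=123)
print(result.importances_mean)
```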
13.4.1 Recursive Feature Elimination (L13: Feature Selection)
Views: 11K · 2 years ago
Sebastian's books: sebastianraschka.com/books/ In this video, we start our discussion of wrapper methods for feature selection. In particular, we cover Recursive Feature Elimination (RFE) and see how we can use it in scikit-learn to select features based on linear model coefficients. Slides: sebastianraschka.com/pdf/lecture-notes/stat451fs21/13_feat-sele slides.pdf Code: github.com/rasbt/stat45...
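A minimal, hedged RFE sketch along the lines described above (illustrative dataset and estimator; not the lecture's exact code): the model is refit repeatedly, and the feature with the smallest coefficient magnitude is dropped each round.

```python
from sklearn.datasets import load_wine
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression

X, y = load_wine(return_X_y=True)

# Recursively eliminate the weakest feature (by coefficient magnitude) until 5 remain
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=5)
rfe.fit(X, y)
print(rfe.support_)   # boolean mask of selected features
print(rfe.ranking_)   # 1 = selected; larger values were eliminated earlier
```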
13.3.2 Decision Trees & Random Forest Feature Importance (L13: Feature Selection)
Views: 13K · 2 years ago
13.3.2 Decision Trees & Random Forest Feature Importance (L13: Feature Selection)
13.3.1 L1-regularized Logistic Regression as Embedded Feature Selection (L13: Feature Selection)
Views: 4.8K · 2 years ago
13.3.1 L1-regularized Logistic Regression as Embedded Feature Selection (L13: Feature Selection)
13.2 Filter Methods for Feature Selection -- Variance Threshold (L13: Feature Selection)
Views: 6K · 2 years ago
13.2 Filter Methods for Feature Selection Variance Threshold (L13: Feature Selection)
13.1 The Different Categories of Feature Selection (L13: Feature Selection)
Views: 5K · 2 years ago
13.1 The Different Categories of Feature Selection (L13: Feature Selection)
13.0 Introduction to Feature Selection (L13: Feature Selection)
Views: 5K · 2 years ago
13.0 Introduction to Feature Selection (L13: Feature Selection)
Introduction to Generative Adversarial Networks (Tutorial Recording at ISSDL 2021)
Views: 2.2K · 2 years ago
Introduction to Generative Adversarial Networks (Tutorial Recording at ISSDL 2021)
Designing Generative Adversarial Networks for Privacy-enhanced Face Recognition (Conference rec.)
Views: 599 · 2 years ago
Designing Generative Adversarial Networks for Privacy-enhanced Face Recognition (Conference rec.)
L19.5.2.2 GPT-v1: Generative Pre-Trained Transformer
Views: 10K · 3 years ago
L19.5.2.2 GPT-v1: Generative Pre-Trained Transformer
L19.5.2.4 GPT-v2: Language Models are Unsupervised Multitask Learners
Views: 4.4K · 3 years ago
L19.5.2.4 GPT-v2: Language Models are Unsupervised Multitask Learners
L19.5.2.7: Closing Words -- The Recent Growth of Language Transformers
Views: 1.9K · 3 years ago
L19.5.2.7: Closing Words The Recent Growth of Language Transformers
L19.5.2.6 BART: Combining Bidirectional and Auto-Regressive Transformers
Views: 4.9K · 3 years ago
L19.5.2.6 BART: Combining Bidirectional and Auto-Regressive Transformers
L19.5.2.5 GPT-v3: Language Models are Few-Shot Learners
Views: 3.7K · 3 years ago
L19.5.2.5 GPT-v3: Language Models are Few-Shot Learners
L19.6 DistilBert Movie Review Classifier in PyTorch -- Code Example
Views: 7K · 3 years ago
L19.6 DistilBert Movie Review Classifier in PyTorch Code Example
L19.5.2.3 BERT: Bidirectional Encoder Representations from Transformers
Views: 8K · 3 years ago
L19.5.2.3 BERT: Bidirectional Encoder Representations from Transformers
L19.5.2.1 Some Popular Transformer Models: BERT, GPT, and BART -- Overview
Views: 13K · 3 years ago
L19.5.2.1 Some Popular Transformer Models: BERT, GPT, and BART Overview

Comments

  • @nikosterizakis
    @nikosterizakis 1 day ago

    Hi Sebastian. I guess this was recorded before 'flatten' was adopted by PyTorch?

  • @imfeelindirectionles
    @imfeelindirectionles 1 day ago

    would be great to add a video explaining the vanishing/exploding gradient problem in detail! thanks so much!

  • @Xnaarkhoo
    @Xnaarkhoo 2 days ago

    @16:37 when you say Llama was trained on 1T tokens, do you still mean there were 32K unique tokens? Because in your blog post you have "They also have a surprisingly large 151,642 token vocabulary (for reference, Llama 2 uses a 32k vocabulary, and Llama 3.1 uses a 128k token vocabulary); as a rule of thumb, increasing the vocab size by 2x reduces the number of input tokens by 2x so the LLM can fit more tokens into the same input. Also it especially helps with multilingual data and coding to cover words outside the standard English vocabulary."

    • @SebastianRaschka
      @SebastianRaschka 2 days ago

      Thanks for the comment! So in the talk these are the dataset sizes using the respective tokenizer that was used during model training. The vocabulary sizes that the models used are 32k for Llama 2 and 128k for Llama 3.1. So, regarding "do you still mean there was 32K unique token", the vocabulary was 32k unique tokens (but there could be more unique tokens in the dataset). I hope this helps. Otherwise, please let me know, happy to explain more!

  • @user-eq6xn6mg3k
    @user-eq6xn6mg3k 3 days ago

    Your video was incredibly clear and engaging! Thank you for the awesome explanation!

    • @SebastianRaschka
      @SebastianRaschka 2 days ago

      That's awesome to hear! Glad it was clear and helpful!

  • @DeepSingh-bi5sd
    @DeepSingh-bi5sd 3 days ago

    Thanks for explaining

  • @yarasultan3433
    @yarasultan3433 3 days ago

    good

  • @yarasultan3433
    @yarasultan3433 3 days ago

    watched all 3 vids came back to leave a comment. better explained than all those 1 hour videos

    • @SebastianRaschka
      @SebastianRaschka 3 days ago

      Thanks for the comment! Glad to hear those were helpful!

  • @alokranjansrivastava623
    @alokranjansrivastava623 7 days ago

    Nice video. Does LLM mean only auto-regressive models (not BERT)?

    • @SebastianRaschka
      @SebastianRaschka 7 days ago

      Yes, here LLM is basically synonymous with decoder-style autoregressive model like Llama, GPT, Gemma, etc.

    • @alokranjansrivastava623
      @alokranjansrivastava623 7 days ago

      @@SebastianRaschka BERT has a stack of transformer encoders, but it is not an LLM. Am I correct here?

    • @SebastianRaschka
      @SebastianRaschka 7 days ago

      @@alokranjansrivastava623 Architecture-wise, it's kind of the same thing though, except it doesn't have the causal mask, and the pretraining task is not next-token prediction but predicting masked tokens (plus sentence order prediction).

    • @alokranjansrivastava623
      @alokranjansrivastava623 7 days ago

      @@SebastianRaschka Just one question: how do we define an LLM? When can we say that a particular language model falls into the LLM category?

  • @pe6649
    @pe6649 7 days ago

    Thanks!

  • @adityasamalla3251
    @adityasamalla3251 9 days ago

    You are the best! Thanks a lot for sharing your knowledge to the world.

  • @tosinadekunle646
    @tosinadekunle646 10 days ago

    Dear Dr. Sebastian, please kindly advise. @5:40, you mentioned that the first feature map has 96 channels. So I want to assume that since it is a coloured image, the input must have 3 channels and the total filters based on the first output feature maps = 96/3 =< 32 with the assumptions that the strides = 1 and no padding. Could this also contribute to the success of the architecture, researching and implementing different features in the convolution processes to capture as many features as possible? Could they have succeeded if they used half the feature size? Thank you.

  • @haribhauhud8881
    @haribhauhud8881 10 days ago

    Thank you, Sir. Your lessons are beneficial for the community. Appreciate your hard work..!! 😊

  • @admercs
    @admercs 10 days ago

    You are a true educator. Honored to be a contributor to one of your libraries.

  • @JR-gy1lh
    @JR-gy1lh 10 days ago

    I know you don't do many tutorials, but personally I love them, especially from you!

  • @nithinma8697
    @nithinma8697 10 days ago

    00:03 PyTorch buffers are essential for implementing large models
    01:39 Instantiating a new causal attention without buffers
    03:12 Transferring data to GPU using PyTorch Cuda
    04:56 Optimizing memory usage during forward pass
    06:36 Explanation of creating mask for efficiency in PyTorch Buffers
    08:07 Parameters are automatically transferred to GPU, but torch tensors need to be made parameters to be transferred.
    10:05 The mask is made a buffer so it's not learned by the optimizer.
    11:50 PyTorch buffers facilitate easy transfer of parameters between GPU and CPU
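To make the device-transfer point from these chapter markers concrete, here is a small hedged sketch (illustrative class names, assuming a recent PyTorch version): a tensor stored as a plain attribute stays on the CPU when the module is moved, while a registered buffer follows the module.

```python
import torch
import torch.nn as nn

class WithPlainTensor(nn.Module):
    def __init__(self):
        super().__init__()
        self.mask = torch.triu(torch.ones(4, 4), diagonal=1)  # plain attribute

class WithBuffer(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("mask", torch.triu(torch.ones(4, 4), diagonal=1))

if torch.cuda.is_available():
    print(WithPlainTensor().to("cuda").mask.device)  # cpu   -> device-mismatch errors later
    print(WithBuffer().to("cuda").mask.device)       # cuda:0 -> moves with the module
```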

  • @nithinma8697
    @nithinma8697 11 days ago

    00:02 Three common ways of using large language models
    02:39 Developing LLM involves building, pre-training, and fine-tuning.
    07:11 LLM predicts the next token in the text
    09:30 Training LLM involves sliding fixed size inputs over text data to create batches.
    14:22 Byte pair encoding and sentence piece variations allow LLMs to handle unknown words
    16:42 Training sets are increasing in size
    21:09 Developing an LM involves architecture, pre-training, model evaluation, and fine-tuning.
    23:14 The Transformer block is repeated multiple times in the architecture.
    27:22 Pre-training creates the Foundation model for fine-tuning
    29:28 Training LLMs typically done for one to two epochs
    33:44 Pre-training is not usually necessary for adapting LLM for a certain task
    35:51 Replace the output layer for efficient classification.
    39:54 Classification fine-tuning is key for practical business tasks.
    42:01 LLM instruction data set and preference tuning
    45:58 Evaluating LLMs is crucial, with MML being a popular metric.
    48:07 Multiple choice questions are not sufficient to measure an LM's performance
    52:34 Comparing LLM models for performance evaluation
    54:32 Continued pre-training is effective for instilling new knowledge in LLMs
    58:28 Access slides on the website for more details
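The sliding-window batching step mentioned around 09:30 can be sketched roughly as follows (an illustrative simplification under my own naming, not a verbatim copy of the book's dataset code): the target sequence is simply the input sequence shifted by one token.

```python
import torch
from torch.utils.data import Dataset

class SlidingWindowDataset(Dataset):
    # Turns one long sequence of token IDs into (input, target) pairs,
    # where the target is the input shifted by one position
    def __init__(self, token_ids, max_length=4, stride=4):
        self.inputs, self.targets = [], []
        for i in range(0, len(token_ids) - max_length, stride):
            self.inputs.append(torch.tensor(token_ids[i:i + max_length]))
            self.targets.append(torch.tensor(token_ids[i + 1:i + max_length + 1]))

    def __len__(self):
        return len(self.inputs)

    def __getitem__(self, idx):
        return self.inputs[idx], self.targets[idx]

ds = SlidingWindowDataset(list(range(20)), max_length=4, stride=4)
print(ds[0])  # (tensor([0, 1, 2, 3]), tensor([1, 2, 3, 4]))
```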

  • @kyokushinfighter78
    @kyokushinfighter78 11 days ago

    One of the best 60 minutes of my time. Really thankful for this..

  • @berlinbrown03
    @berlinbrown03 11 days ago

    This is still great stuff, still learning

    • @SebastianRaschka
      @SebastianRaschka 11 days ago

      Nice! Glad to hear that it's still relevant!

  • @baburamchaudhary159
    @baburamchaudhary159 13 days ago

    Thanks Sebastian! I got what the buffer is for. Great lecture.

  • @CrickBritney
    @CrickBritney 13 days ago

    If I am getting this right, for model evaluation using CV, it is done on the whole dataset and the model performance is then averaged, right? Anyway, your videos are so good that I bought your book "Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python", even though I use R for my analysis 🤣.

  • @dr_greg_mouse4125
    @dr_greg_mouse4125 19 days ago

    This is such a great walkthrough, with a clear explanation of each step of the algorithm and examples. Thank you!!

  • @raiszakirdzhanov2148
    @raiszakirdzhanov2148 20 days ago

    Hi Sebastian, I really respect what you are doing. I like your GitHub repository; there are a lot of helpful tutorials. I'm going to buy your next book, Build a Large Language Model (From Scratch). I have one question: what minimal GPU do you recommend for exploring and working through all the examples from your next book?

    • @SebastianRaschka
      @SebastianRaschka 20 days ago

      Thanks @raiszakirdzhanov2148! Actually, you don't need anything powerful -- I made sure all the examples run on minimal hardware. The other day, there was a reader who got it to work on an RTX3060 Laptop GPU with ~6GB of RAM (by decreasing the batch size). That being said, for some chapters, if you don't have a GPU, I would recommend an A10G or L4 GPU, which cost around 50 cents / hour on a cloud platform. I have some recommendations here: github.com/rasbt/LLMs-from-scratch/tree/main/setup#cloud-resources

    • @raiszakirdzhanov2148
      @raiszakirdzhanov2148 20 days ago

      @@SebastianRaschka thanks a lot!)

  • @andrei_aksionau
    @andrei_aksionau 22 days ago

    Actually learned something new. Thanks Sebastian!

    • @SebastianRaschka
      @SebastianRaschka 22 days ago

      Wow thanks @andrei_aksionau! The fact that even you as a PyTorch expert learned something new is probably the biggest compliment 😊

  • @SHAMIKII
    @SHAMIKII 22 days ago

    Thank you very much for this explanation.

  • @stephanembatchou5300
    @stephanembatchou5300 22 days ago

    Great content...thanks a lot

  • @putskan
    @putskan 23 days ago

    Cheers, great video. I'd suggest being slightly more concise. Either way, great video.

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      @putskan This is useful feedback! I also often wish the videos would be more concise, but it's hard to know how long they actually are until the recording is finished, and then it's already too late 😅

  • @orrimoch5226
    @orrimoch5226 23 days ago

    Great Work! I like your LLM notebooks as well!

  • @noureddineadjir
    @noureddineadjir 23 days ago

    Thank you for the details. You haven't used the validation set.

  • @kevindelnoye9641
    @kevindelnoye9641 23 days ago

    Another advantage is that the buffer gets saved in the state_dict when saving the model

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      Yes good point! In this case, if you'd modify the mask during the usage, then this would be super useful.

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      @kevindelnoye9641 thanks again for the suggestion, I added a section on this to the code notebook

    • @kevindelnoye9641
      @kevindelnoye9641 22 days ago

      @@SebastianRaschka great! Thanks for the great tutorials, keep them coming
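A small hedged sketch of the point raised in this thread (illustrative module, not the notebook's code): buffers appear in state_dict() alongside parameters, so a mask that was modified at runtime would be saved and restored with the model.

```python
import torch
import torch.nn as nn

class MaskedModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(4, 4)
        self.register_buffer("mask", torch.triu(torch.ones(4, 4), diagonal=1))

m = MaskedModule()
print(m.state_dict().keys())  # includes 'mask' along with 'linear.weight' and 'linear.bias'

torch.save(m.state_dict(), "masked_module.pth")
m2 = MaskedModule()
m2.load_state_dict(torch.load("masked_module.pth"))  # the mask is restored too
```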

  • @anishbhanushali
    @anishbhanushali 23 days ago

    It's indeed a clean way to do things, but can't we do the same thing by adding them as a parameter and setting .requires_grad = False?

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      This might achieve the same thing, but at the same time, it would also be a bit more work 😅
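For comparison, a hedged sketch of the two options discussed in this thread (illustrative class names): a frozen nn.Parameter achieves the non-trainable behavior but still shows up in model.parameters() and gets handed to the optimizer, whereas a buffer does not.

```python
import torch
import torch.nn as nn

class FrozenParamVersion(nn.Module):
    def __init__(self):
        super().__init__()
        self.mask = nn.Parameter(torch.triu(torch.ones(4, 4), diagonal=1),
                                 requires_grad=False)

class BufferVersion(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("mask", torch.triu(torch.ones(4, 4), diagonal=1))

# The frozen parameter is still listed among the parameters (just without gradients);
# the buffer is excluded from .parameters() but still moves with .to(device)
print(len(list(FrozenParamVersion().parameters())))  # 1
print(len(list(BufferVersion().parameters())))       # 0
```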

  • @RohanPaul-AI
    @RohanPaul-AI 23 days ago

    Awesome tutorial🔥

  • @mkamp
    @mkamp 24 days ago

    Back to basics. Love it. ❤

  • @user-xk3tj5cj8p
    @user-xk3tj5cj8p 24 days ago

    The man is back more videos please ❤

  • @ashishgoyal4958
    @ashishgoyal4958 24 days ago

    I always see this register_buffer code in transformer networks and never thought the reason would be so simple. Thanks for explaining such an often-overlooked concept of PyTorch.

    • @SebastianRaschka
      @SebastianRaschka 24 days ago

      Great, I'm glad to hear that I was able to finally shed some light on this 😊

  • @CRTagadiya
    @CRTagadiya 24 days ago

    I recently purchased Build a Large Language Model (From Scratch) from Manning. Amazing learning experience so far.

    • @SebastianRaschka
      @SebastianRaschka 24 days ago

      Thanks for getting a copy. And I’m really happy to hear that you are getting lots out of the book :)

    • @2dapoint424
      @2dapoint424 23 days ago

      @@SebastianRaschka is the book released or is it just the pre-order?

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      @@2dapoint424 Currently preorder but the publisher is currently wrapping up the layouting, so it shouldn't be too long...

    • @mainakkundu2103
      @mainakkundu2103 23 days ago

      Can I purchase it at this point from Manning directly? Please let me know; I am eager to purchase it.

    • @SebastianRaschka
      @SebastianRaschka 23 days ago

      @@mainakkundu2103 Yes you could! 😊

  • @helpfuldude3778
    @helpfuldude3778 24 days ago

    More videos please

  • @ricardogomes9528
    @ricardogomes9528 24 days ago

    Very useful tip 💪💪

    • @SebastianRaschka
      @SebastianRaschka 24 days ago

      Thanks, glad to hear!

    • @ricardogomes9528
      @ricardogomes9528 24 days ago

      @@SebastianRaschka do you have any book on PyTorch coding that would somehow resemble “Deep Learning with Python” from François Chollet?

    • @SebastianRaschka
      @SebastianRaschka 24 days ago

      @@ricardogomes9528 My "Machine Learning with PyTorch and Scikit-Learn" books perhaps: www.amazon.com/Machine-Learning-PyTorch-Scikit-Learn-scikit-learn-ebook-dp-B09NW48MR1/dp/B09NW48MR1/

    • @ricardogomes9528
      @ricardogomes9528 24 days ago

      @@SebastianRaschka thank you for your prompt reply. Hope I can master it 🙏 keep up with the good videos 💪🙏

  • @PtYt24
    @PtYt24 25 days ago

    I really wish people would stop putting their X link and start sharing something like Mastodon or Threads; as a free user, X is where you go to feel like a second-class citizen.

    • @SebastianRaschka
      @SebastianRaschka 25 days ago

      I hear you. On that note, I do have Threads and Mastodon accounts 😅. Just not using them much; somehow all the AI folks are still on X :(. I think the days of this type of social media are numbered ...

    • @PtYt24
      @PtYt24 25 days ago

      @@SebastianRaschka Haha, I get it. I feel the "all the AI folks are still on X" issue is somewhat a the-buck-starts-with-you problem; if more people start sharing elsewhere, it will eventually move there, I guess.

  • @tilkesh
    @tilkesh 28 days ago

    Thank you

  • @xpt5oo186
    @xpt5oo186 1 month ago

    best explanation ever.

  • @RachitSengupta
    @RachitSengupta 1 month ago

    Coming to this video 3 years late, now with LLMs being the hot thing around, just wondering: are CNNs still the standard for vision, or are companies now using transformers for vision?

    • @SebastianRaschka
      @SebastianRaschka 1 month ago

      Good question! Based on what I know (from interacting with other people in industry), CNNs are still the most widely used models in CV. However, depending on the application and company, vision transformers are becoming more and more common. I.e., at faster-moving startups, you'll find more people using vision transformers these days. Btw I made a short video on training ViTs in PyTorch that you might find helpful: ruclips.net/video/5vVYXhvjEsk/видео.html

  • @nithinma8697
    @nithinma8697 1 month ago

    God Level Explanation From Scratch

  • @vipulsangode8612
    @vipulsangode8612 1 month ago

    In the 4th tip, it was advised to train them separately. But you only calculated the losses separately; the combined loss was then used to train the parameters. Shouldn't the training be done separately as well, first with the real-image loss and then with the fake-image loss?

  • @tilkesh
    @tilkesh 1 month ago

    Thank you

  • @ShreyashKasar
    @ShreyashKasar 1 month ago

    Thank You Sir 🙏🙏

  • @havard8031
    @havard8031 1 month ago

    Can you cite the formula at the end, around 09:15?

  • @Nikhil-q8p
    @Nikhil-q8p 1 month ago

    Can anyone explain why the derivative is 2*(y-yhat)? I cannot understand anything after it.

  • @ZavierBanerjea
    @ZavierBanerjea 1 month ago

    What wonderful tech minds: {Sebastian Raschka, Yann LeCun, Andrej Karpathy, ...} who share their work and beautiful ideas for mere mortals like me... Sebastian's teachings are so, so fundamental that they take the fear out of my clogged mind... 🙏 Although I am struggling to build LLMs for specific and niche areas, I am confident of cracking them with great resources like Build a Large Language Model (From Scratch)!!!

  • @guis487
    @guis487 1 month ago

    I am your fan, I have most of your books; thanks for this excellent video! Another evaluation approach that I found interesting on another channel was to make the LLMs play chess against each other 10 times.

    • @SebastianRaschka
      @SebastianRaschka 1 month ago

      Hah nice, that's a fun one. How do you evaluate who's the winner, do you use a third LLM for that?

  • @sindijagrisle6551
    @sindijagrisle6551 1 month ago

    Very good explanation.