Deploying AI Smarter - LLM Scalability, ML-Ops, and Cost Efficiency
- 1. Introduction
- 1. Introduction & Welcome
- 2. Getting Started
- 1. Course Structure: How to Get the Most out of This Course
  - 2. Environment Setup: Preparing and Using the Course Resources Correctly
- 3. Pre-Deployment Strategies
- 1. Ensuring Model Correctness: Evaluation Techniques
  - 2. Performance Optimization: Exploring Key Dimensions
  - 3. Balancing Speed and Accuracy: Best Practices
- 4. Advanced Model Management with ML-Ops
- 1. Fundamentals of ML Model Management and ML-Ops
- 2. Overview of Effective ML-Ops Frameworks
- 3. Setting up an ML-Ops Framework: Introduction to MLflow (Practical)
  - 4. Getting Started with MLflow: A Practical Approach (Practical)
  - 5. Training Models with MLflow: A Hands-On Guide (Practical)
  - 6. MLflow for Model Inference: Techniques and Practices (Practical)
  - 7. Advanced Techniques in MLflow: Extending Functionality (Practical)
- 5. Advanced Model Deployment Techniques
- 1. Efficiency through Batching and Dynamic Batches
- 2. Hands-on Application of Batching Techniques (Practical)
- 3. The Role of Sorting in Model Deployment (Practical)
- 4. Leveraging Quantization for Model Efficiency (Practical)
- 5. Inference Strategies: Parallelism, Flash Attention, GPTQ & AWQ
  - 6. Next-Gen Scaling: LoRA, Paged Attention, ZeRO
- 6. The Economics of Machine Learning Inference
- 1. The Broader Context of AI: A Wider Perspective
  - 2. Measuring Performance: Key Metrics for Large AI Projects
- 3. Evaluating Deployment Strategies for Cost & Efficiency
- 4. Real-World Benchmarks for Success: Case Studies and Insights
- 7. Effective Cluster Management for Large-Scale ML Deployments
- 1. Basic Inference - First Levels of Deployment (Practical)
- 2. Entering Optimizations - Advanced Levels of Deployment (Practical)
- 3. Setting Up Data Access in Distributed Environments (Practical)
- 4. Distributing Data Across a Cluster with RabbitMQ (Practical)
- 5. Foundations of Distributed Computing with Ray (Practical)
- 6. Scaling Large Language Models on a Cluster (Practical)
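As a small preview of the dynamic-batching topic covered in Section 5, the sketch below groups incoming requests into batches bounded by both batch size and total input size. It is a minimal, pure-Python illustration, not code from the course: `run_model` is a hypothetical stand-in for a real model call, and character counts stand in for token counts.

```python
# Minimal dynamic-batching sketch (illustrative only, not course code).
from typing import Iterator, List


def run_model(batch: List[str]) -> List[int]:
    # Hypothetical "model": returns the length of each input string.
    return [len(x) for x in batch]


def dynamic_batches(
    requests: List[str], max_batch_size: int, max_tokens: int
) -> Iterator[List[str]]:
    """Group requests into batches bounded by item count and total size."""
    batch: List[str] = []
    batch_tokens = 0
    for req in requests:
        # Flush the current batch if adding this request would exceed a limit.
        if batch and (
            len(batch) >= max_batch_size or batch_tokens + len(req) > max_tokens
        ):
            yield batch
            batch, batch_tokens = [], 0
        batch.append(req)
        batch_tokens += len(req)
    if batch:
        yield batch


# Usage: batch four requests, at most 2 per batch and 10 "tokens" total.
results: List[int] = []
for b in dynamic_batches(["hi", "hello", "hey there", "yo"], 2, 10):
    results.extend(run_model(b))
# results == [2, 5, 9, 2]
```

Batching amortizes per-call overhead across requests; real serving stacks additionally batch *dynamically over time*, waiting a short window for more requests to arrive before dispatching.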