AWS AI Practitioner - Preparation / Last Minute Revision Sheet

Are you ready to take the AWS AI Practitioner Certification Exam?

Below is the list of all the material and prep notes that helped me pass the exam.

Hope it will be helpful to you.

KEY NOTES: PLEASE READ FIRST

  1. Below are my personal notes from the free AWS Skill Builder course I took for this exam. You can take it too on AWS Skill Builder.

  2. This is not an exhaustive list and may not follow a specific sequence (these are my personal notes and follow my own way of noting and remembering), so use it as an additional aid in your planning.

  3. My approach was to write everything down while learning and then go through it again; if I could not recall a topic, I would study it in more depth.

  4. The material below should be read according to its indentation levels. If you don't see proper indentation, message me directly and I can email it to you.

  5. I hope it helps your preparation in whatever way it can and, most importantly, I WISH YOU GOOD LUCK.

 

Domain Level Revision below From the Course on AWS Skill Builder

Domain 1: Fundamentals of AI and ML 

Known Data -> Features -> Algorithm -> Output

Adjustments

Inference

ML models can be trained on various types of data.

Structured data on RDS, S3 or Redshift

S3 is primary source of training data

Semi-structured data = DynamoDB & DocumentDB

Unstructured data - tokenization

Timeseries - sequential data

Model Training - Algorithm

Inference 2 options

  - Real time

         Low Latency

        High throughput

        persistent endpoint

  - Batch Transform

       Offline

       Large datasets

       Infrequent use

 

ML Types

  Supervised Learning

      Amazon SageMaker Ground Truth -> Amazon Mechanical Turk

  Unsupervised Learning

  Reinforcement Learning

      Reward - AWS DeepRacer

 

Overfitting

  Model does well on training data but not outside it

Underfitting

  Model cannot find meaningful patterns. It performs poorly on both training data and new inputs
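A minimal sketch of how the two conditions above can be told apart by comparing training error with validation error. The function name `diagnose_fit` and both thresholds are my own illustrative assumptions, not values from the course.

```python
def diagnose_fit(train_error: float, val_error: float,
                 high_error: float = 0.30, max_gap: float = 0.10) -> str:
    """Classify a model's fit from its training and validation error rates.

    high_error and max_gap are illustrative assumptions, not standard values.
    """
    if train_error >= high_error:
        # Poor even on data the model has already seen: too simple.
        return "underfitting"
    if val_error - train_error > max_gap:
        # Good on training data but much worse on unseen data.
        return "overfitting"
    return "good fit"


print(diagnose_fit(train_error=0.02, val_error=0.25))  # overfitting
print(diagnose_fit(train_error=0.40, val_error=0.42))  # underfitting
print(diagnose_fit(train_error=0.08, val_error=0.10))  # good fit
```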

Bias and fairness

  Diversity of training data

  Feature importance

  Fairness constraints

Deep Learning

  Neural Networks

  Input Layer -> Hidden Layers -> Output Layer

Machine Learning vs Deep Learning

Consider alternatives when

  Costs outweigh the benefits

  Models cannot meet the interpretability requirements

  Systems must be deterministic rather than probabilistic

ML Models are probabilistic

 

Supervised learning -

  Classification

     Binary          - Diabetic or not diabetic

     MultiClass

  Regression

      Simple Linear regression

      Multiple Linear regression

      Logistic regression (despite the name, typically used for classification)

Unsupervised Learning

  Clustering

     Define features

     Similarity function

     Number of clusters

  Anomaly detection

      Data points that diverge

   

 

Amazon Rekognition

   Facial comparison and analysis

   Text detection

   Object detection and labelling

   Content moderation

   Can detect explicit content in images and videos

 

Amazon Textract

  Extract text from scanned documents

 

Amazon Comprehend

  Extract key phrases, entities and sentiment.

  Key use: detecting PII data

 

Amazon Lex

   Conversational voice and text

 

Amazon Transcribe

   Converts speech to text

 

Amazon Polly

   Converts Text to speech

 

Amazon Kendra

   Intelligent document search

 

Amazon Personalize

   Personalized product recommendations

 

Amazon Translate

  Translates between 75 languages

 

Amazon Forecast

   Predicts future points in time-series data

 

Amazon Fraud Detector

   Detects fraud and fraudulent activities

 

Amazon Bedrock

 

Amazon SageMaker

 

ML Pipeline

Identify Business Goal -> Frame ML Problem -> Collect Data -> Pre-process Data -> Engineer Features -> Train, Tune, Evaluate -> Deploy -> Monitor

 

Collect Data

   AWS Glue -

      Cloud optimized ETL service

      Contains its own data catalog

      Built in transformations

  AWS Glue DataBrew

      Point and click data transformation

      200+ transformations

  Amazon SageMaker Ground Truth

     Uses ML to label your training data

     Can automatically label

Amazon SageMaker Canvas

     Import, Prepare, Transform, Visualize and analyze

Amazon SageMaker Feature Store

     Processes raw data into features by using a processing workflow

Amazon SageMaker Experiments

     visual interface

Amazon SageMaker automatic model tuning

 

Deploy

     Batch inference

     Real-time inference

     Self-managed

     Hosted

 

Amazon SageMaker inference

    Batch Transform

               Offline inference

               Large datasets

   Asynchronous

               Long processing times

               Large payloads

   Serverless

               Intermittent traffic

               Periods of no traffic

   Real-time

               Live predictions

               Sustained traffic

               Low latency

               Consistent

 

Monitor the model

             Configure alerts to notify and initiate actions if any drift

             data drift / concept drift

 

Amazon SageMaker Model Monitor

 

MLOps

      Amazon SageMaker Model Building Pipelines

      Repository Options

             AWS CodeCommit

             Amazon SageMaker Feature Store

             Amazon SageMaker Model Registry

            3rd party repository

      Orchestration options

             Amazon SageMaker Pipelines

             Amazon Managed Workflows for Apache Airflow (MWAA)

             AWS Step functions

 

Accuracy = (True Positives + True Negatives) / Total

Precision = True Positives / (True Positives + False Positives)

Recall = True Positives / (True Positives + False Negatives)

F1 = 2 * Precision * Recall / (Precision + Recall)

False Positive Rate FPR = False Positives / (True Negatives + False Positives)

True Negative Rate = True Negatives / (True Negatives + False Positives)
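The formulas above can be checked against a small confusion matrix. This is a sketch only; the function name `classification_metrics` and the counts are made up for illustration.

```python
def classification_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics listed above from confusion-matrix counts."""
    total = tp + tn + fp + fn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
        "fpr": fp / (tn + fp),   # False Positive Rate
        "tnr": tn / (tn + fp),   # True Negative Rate (specificity)
    }


# Hypothetical counts: 40 TP, 45 TN, 5 FP, 10 FN (100 predictions in total).
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that FPR and TNR share a denominator, so they always add up to 1.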

Area Under Curve - AUC

Regression Model Errors

      Mean Squared Error

       Root mean squared error

       Mean absolute error
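A short sketch of the three regression errors above, using made-up actual and predicted values. The function name `regression_errors` is an assumption for illustration.

```python
import math


def regression_errors(actual, predicted):
    """Return MSE, RMSE and MAE for two equal-length lists of numbers."""
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / len(errors)      # Mean Squared Error
    mae = sum(abs(e) for e in errors) / len(errors)     # Mean Absolute Error
    return {"mse": mse, "rmse": math.sqrt(mse), "mae": mae}


# Hypothetical data: the errors are 1, 0, -2 and -1.
result = regression_errors(actual=[3.0, 5.0, 2.0, 7.0],
                           predicted=[2.0, 5.0, 4.0, 8.0])
print(result)  # mse is 1.5 and mae is 1.0 for this data
```

MSE and RMSE penalize large errors more heavily than MAE because each error is squared before averaging.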

 

 

Domain 2: Fundamentals of Generative AI

 

AI - ML - DL - GAI

Model

In-context learning

Prompts, prompt tuning, prompt engineering

Every NLP model has a tokenizer that converts text into token IDs.

Vector - ordered list of numbers.

Ability to encode related relationships and collect associations

Embeddings

Numerical vectorized representations of tokens that capture their semantic meaning
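A toy sketch of the tokenizer -> token IDs -> embeddings flow described above. It assumes a simple whitespace tokenizer, whereas real LLM tokenizers work on subwords. The vocabulary, the `unk_id` value and the 2-dimensional embedding vectors are all hypothetical.

```python
def build_vocab(corpus):
    """Assign each distinct lowercase word the next free integer ID."""
    vocab = {}
    for sentence in corpus:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def tokenize(text, vocab, unk_id=-1):
    """Convert text to token IDs; unknown words map to unk_id (an assumption)."""
    return [vocab.get(word, unk_id) for word in text.lower().split()]


vocab = build_vocab(["the cat sat", "the dog sat"])
token_ids = tokenize("the dog sat", vocab)
print(token_ids)  # [0, 3, 2]

# Hypothetical embedding table: one small vector per token ID.
# Related tokens ("cat" = 1, "dog" = 3) are given nearby vectors on purpose.
embedding_table = {0: [0.1, 0.0], 1: [0.9, 0.8], 2: [0.2, 0.7], 3: [0.8, 0.9]}
embeddings = [embedding_table[i] for i in token_ids]
print(embeddings)
```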

Self-attention

 

LLMs

Deep learning foundation models

Transformers

Unimodal or multimodal

Multimodal use cases

Multimodal tasks

Diffusion Models

Forward Diffusion

Reverse Diffusion

Stable Diffusion

Does not use pixel space of the image, uses a reduced-definition latent space

 

SageMaker + Amazon Q Developer

Amazon Nimble Studio and Amazon Sumerian

 

Gen AI Architectures

Generative Adversarial Networks GANs

Variational autoencoders VAE

Transformers

 

AI Project lifecycle

Identify use case

Experiment and select

Adapt, align and augment

Evaluate

Deploy and integrate

Monitor

 

Interpretability

Intrinsic analysis

Post hoc analysis

 

Traditional ML outputs are deterministic (the same input gives the same output)

Gen AI outputs are non-deterministic

 

Gen AI Performance metrics

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

Bilingual Evaluation Understudy (BLEU)

 

Transfer learning

 

SageMaker JumpStart

 

 

Domain 3: Applications of Foundation Models

 

Considerations

Architecture

Complexity

Availability

Compatibility

Explainability

Interpretability

 

Inference

It is the process of generating an output from an input that you provided to the model.

Input = Prompt and inference parameters

Randomness and Diversity

Temperature  (Lower value = favors high-probability outputs, more predictable; Higher value = allows low-probability outputs, more random)

Top K (Lower value = decrease the size of pool)

Top P

Length

Response Length

Penalties

Stop sequences
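A rough sketch of how temperature, Top K and Top P reshape the pool of candidate next tokens. This is a simplified illustration under my own assumptions (the function name, the order of the filters and the logit values are hypothetical); actual model providers may implement these parameters differently.

```python
import math


def next_token_distribution(logits, temperature=1.0, top_k=None, top_p=None):
    """Return the renormalized probabilities of the tokens left in the pool."""
    # Temperature: divide logits before softmax. Lower values sharpen the
    # distribution, higher values flatten it.
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())
    exps = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(exps.values())
    ranked = sorted(((tok, e / total) for tok, e in exps.items()),
                    key=lambda pair: pair[1], reverse=True)
    # Top K: keep only the K most likely tokens.
    if top_k is not None:
        ranked = ranked[:top_k]
    # Top P: keep the smallest set of tokens whose cumulative probability
    # reaches P.
    if top_p is not None:
        kept, cumulative = [], 0.0
        for tok, prob in ranked:
            kept.append((tok, prob))
            cumulative += prob
            if cumulative >= top_p:
                break
        ranked = kept
    norm = sum(prob for _, prob in ranked)
    return {tok: prob / norm for tok, prob in ranked}


logits = {"cat": 2.0, "dog": 1.0, "car": 0.0}  # hypothetical model scores
print(next_token_distribution(logits, temperature=0.5))
print(next_token_distribution(logits, temperature=2.0))
print(next_token_distribution(logits, top_k=2))
print(next_token_distribution(logits, top_p=0.6))  # {'cat': 1.0}
```

With these scores, the lower temperature puts more probability on "cat", while Top K and Top P shrink the pool of tokens that can be sampled at all.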

Prompt

A specific set of inputs to guide LLMs to generate an appropriate output or completion

RAG - Retrieval-Augmented Generation

Prompt enrichment and appending external data to your prompt

Vector Database

Collection of data stored as mathematical representations
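A minimal sketch of what a vector search does for RAG: documents are stored as vectors, the query is embedded the same way, and the closest vectors are retrieved to enrich the prompt. The 3-dimensional vectors, the document names and the `search` helper are hypothetical; a real vector database uses an embedding model and an approximate nearest-neighbor index.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(query_vector, store, top_n=2):
    """Return the top_n (document, score) pairs most similar to the query."""
    scored = [(doc, cosine_similarity(query_vector, vec))
              for doc, vec in store.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical pre-computed embeddings for three documents.
store = {
    "refund policy": [0.9, 0.1, 0.0],
    "shipping times": [0.1, 0.9, 0.1],
    "return window": [0.8, 0.2, 0.1],
}
query_vector = [1.0, 0.0, 0.0]  # pretend embedding of "how do I get a refund?"

matches = search(query_vector, store)
print([doc for doc, _ in matches])  # ['refund policy', 'return window']

# RAG step: append the retrieved text to the prompt before calling the model.
context = ", ".join(doc for doc, _ in matches)
prompt = f"Context: {context}\nQuestion: how do I get a refund?"
print(prompt)
```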

 

AWS Services for Vector search databases

Amazon OpenSearch Service

Amazon OpenSearch Serverless

Amazon Aurora PostgreSQL

Amazon RDS PostgreSQL

Amazon Aurora

Amazon Neptune

Amazon DocumentDB [with MongoDB compatibility]

 

Amazon Bedrock AGENTS

Orchestrate prompt completion workflows

 

Prompt

Zero shot prompting

Few shot prompting

Prompt Template

Chain-of-thought prompting

Prompt tuning
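A small sketch of the difference between zero-shot and few-shot prompting, built from one prompt template. The sentiment task, the example reviews and the `build_prompt` helper are my own assumptions for illustration.

```python
def build_prompt(task, query, examples=()):
    """Fill a simple prompt template.

    With no examples this is a zero-shot prompt; each (text, label) pair
    added turns it into a few-shot prompt.
    """
    parts = [task]
    for text, label in examples:
        parts.append(f"Review: {text}\nSentiment: {label}")
    # The final entry is left open for the model to complete.
    parts.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(parts)


task = "Classify the sentiment of each review as Positive or Negative."

zero_shot = build_prompt(task, "The battery died after a week.")
few_shot = build_prompt(
    task,
    "The battery died after a week.",
    examples=[("Great sound quality!", "Positive"),
              ("Stopped working on day two.", "Negative")],
)
print(zero_shot)
print("-----")
print(few_shot)
```

For chain-of-thought prompting, the same template could end with an instruction such as "Think step by step before answering" instead of an open label.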

 

Latent space

The encoded knowledge of language in LLMs or the stored patterns of data that capture relationships and reconstruct the language from the patterns when prompted

Statistical database

 

Prompt Engineering risks and limitations

Exposure

Prompt Injection

Jailbreaking

Hijacking

Poisoning

 

Training process for foundation models

Pretraining         - Self supervised learning

Fine-tuning        - Supervised learning            :: Catastrophic forgetting

Continuous pre-training

 

Fine-tuning techniques

Parameter-efficient fine-tuning (PEFT)

Low-Rank Adaptation (LoRA)

Representation fine-tuning (ReFT)

Multitask fine-tuning

Domain adaptation fine-tuning

Reinforcement learning from human feedback (RLHF)

 

Data preparation fine-tuning

Prepare your training data

Select prompts

Calculate loss

Update weights

Define evaluation steps

 

Data preparation AWS Services

Amazon SageMaker Canvas

Open-source frameworks

Amazon SageMaker Studio - integrates with Amazon EMR, can use JupyterLab

AWS Glue

Amazon SageMaker Feature Store

Amazon SageMaker Clarify  -- if you have bias in your data

Amazon SageMaker Ground Truth  -- manage data labelling

 

Model performance

One option to reduce inference latency is to decrease the size of the LLM, but this might decrease its performance

 

Gen AI Performance Metrics

Recall-Oriented Understudy for Gisting Evaluation (ROUGE)

Automatic summarization tasks

Machine translation software

Bilingual Evaluation Understudy (BLEU)

Used for translation tasks

General Language Understanding Evaluation (GLUE)

Compare against benchmarks set by the experts

Assess model generalization across multiple tasks

Holistic Evaluation of Language Models (HELM)

Help improve model transparency

Massive Multitask Language Understanding (MMLU)

Evaluates knowledge and problem solving capabilities of the model

Tested against history, mathematics, law, computer science and more

Beyond the Imitation Game Benchmark (BIG-bench)

Focuses on tasks that are beyond the capabilities of the current language models
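To make ROUGE and BLEU from the list above more concrete, here is a simplified sketch based on single-word overlap only. ROUGE-1 recall asks how much of the reference appears in the generated text, while BLEU is built on precision (how much of the generated text appears in the reference). Real BLEU also uses longer n-grams and a brevity penalty, which this sketch leaves out; the function names and sentences are hypothetical.

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Count words shared by both texts, respecting repeated words."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    return sum((cand_counts & ref_counts).values())


def rouge1_recall(candidate, reference):
    """Share of reference words that also appear in the candidate."""
    return unigram_overlap(candidate, reference) / len(reference.split())


def unigram_precision(candidate, reference):
    """Share of candidate words that also appear in the reference."""
    return unigram_overlap(candidate, reference) / len(candidate.split())


reference = "the cat sat on the mat"   # human-written text
candidate = "the cat sat"              # model output

print(rouge1_recall(candidate, reference))      # 0.5
print(unigram_precision(candidate, reference))  # 1.0
```

The short candidate scores perfectly on precision but only 0.5 on recall, which is why summarization (ROUGE) and translation (BLEU) are evaluated from different angles.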

 

AWS Services for model evaluation

Amazon SageMaker JumpStart

Amazon SageMaker Clarify

 

 

 

Domain 4: Guidelines for Responsible AI

 

Responsible AI

Fairness

Explainability

Robustness

Privacy and security

Governance

Transparency

 

Effects of bias and variance

Demographic disparities

Inaccuracy

Overfitting

Underfitting

User Trust

 

Responsible datasets

Inclusivity

Diversity

Balanced datasets

Privacy protection

Consent and transparency

Regular audits

 

Responsible practices

Environmental considerations

Sustainability

Transparency

Accountability

Stakeholder engagement

 

AWS service for this

Amazon SageMaker Clarify

Detect bias

Explainability

SageMaker Processing jobs

 

SageMaker Clarify bias analysis (pre-training and post-training metrics)

Class imbalance

Label imbalance

Demographic disparity

Difference in positive proportions

Specificity difference

Recall difference

Accuracy difference

Treatment equality
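A hedged sketch of two of the data-level bias checks named above, as I understand them: class imbalance compares group sizes, and the difference in positive proportions compares how often each group gets the positive label. The function names and the counts are my own assumptions, and the official SageMaker Clarify definitions should be checked before relying on this.

```python
def class_imbalance(n_a: int, n_d: int) -> float:
    """(n_a - n_d) / (n_a + n_d): 0 means balanced groups, values near
    +1 or -1 mean one group dominates the dataset."""
    return (n_a - n_d) / (n_a + n_d)


def difference_in_positive_proportions(pos_a: int, n_a: int,
                                       pos_d: int, n_d: int) -> float:
    """Positive-label rate of group a minus positive-label rate of group d."""
    return pos_a / n_a - pos_d / n_d


# Hypothetical loan dataset: 800 applicants in group a, 200 in group d;
# 400 approvals in group a and 50 approvals in group d.
print(class_imbalance(n_a=800, n_d=200))                      # 0.6
print(difference_in_positive_proportions(400, 800, 50, 200))  # 0.25
```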

 

Gen AI Risks

Hallucinations

Intellectual Property

Bias

Toxicity

Data privacy

 

Guardrails for Amazon Bedrock

Hate

Insults

Sexual

Violence

+ Denied topics

 

Model transparency

Interpretability   - Deep analysis

Explainability      - black box analysis

 

AI Service Card

Amazon SageMaker Model Cards

Sagemaker provides

Feature attributions - SHAP Values

Partial dependence plots

Amazon Augmented AI (A2I) - send data to human reviewers to review random predictions.

Use your own reviewers or use Amazon Mechanical Turk

 

 

Domain 5: Security, Compliance, and Governance for AI Solutions 

IAM Identity Center

Workforce users, Workforce identities

Logging with CloudTrail

Captures API calls and related events

Integrated with SageMaker

Amazon SageMaker Role Manager

Preconfigured permissions for 12 activities

 

Encryption at rest

Amazon SageMaker

Data is encrypted by default on ML storage volumes

Notebook instances, SageMaker jobs, and endpoints

 

AWS Key Management Service - KMS

Amazon Macie

Identifies and alerts you to sensitive data

Remove PII during ingestion

 

AI System Vulnerabilities

Training Data

Input Data

Output Data

Models

Inversion

Theft

LLMs

Prompt Injection

 

Amazon SageMaker Model Monitor

Capture data

Create a baseline

Define data quality monitoring jobs

Evaluate statistics
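The four steps above (capture data, create a baseline, run monitoring jobs, evaluate statistics) can be pictured with a toy drift check. This is only a conceptual sketch under my own assumptions and is not how SageMaker Model Monitor is implemented; the function names, the `max_z` threshold and the data are hypothetical.

```python
import statistics


def make_baseline(values):
    """Summarize the training-time values of one feature."""
    return {"mean": statistics.mean(values), "stdev": statistics.stdev(values)}


def has_drifted(baseline, captured, max_z=3.0):
    """Flag drift when the captured mean moves more than max_z baseline
    standard deviations away from the baseline mean (assumed rule)."""
    shift = abs(statistics.mean(captured) - baseline["mean"])
    return shift > max_z * baseline["stdev"]


# Hypothetical feature values seen during training.
baseline = make_baseline([10, 12, 11, 9, 10, 11, 12, 10])

# Hypothetical values captured from the live endpoint.
print(has_drifted(baseline, [10, 11, 12, 10]))  # False
print(has_drifted(baseline, [20, 22, 21, 19]))  # True
```

In practice a drift result like the second one is what would trigger an alert or a retraining action.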

 

Amazon SageMaker Model Registry

Amazon SageMaker Model Cards

Amazon SageMaker ML Lineage Tracking

Amazon SageMaker Feature Store

Amazon SageMaker Model Dashboard

 

Emerging AI compliance standards

ISO 42001 and ISO 23894

EU Artificial Intelligence Act

NIST AI Risk Management Framework (RMF)

 

AI Risk Management

Probability of occurrence

Severity of occurrence

 

Algorithmic Accountability Act

Transparency and explainability

Monitor for Bias

 

AWS Audit Manager

Audits AWS usage to assess compliance

Choose a framework

Gen AI

Customer frameworks

Collect evidence and add to audit report

 

Guardrails for Amazon Bedrock

Apply guardrails to any foundation model and agents for Amazon Bedrock

Configure harmful content filtering

Define and disallow denied topics

PII data

 

AWS Config

Continuously monitors and records configurations

AWS Config rules

Conformance packs

Operational best practices for AI and ML

Security best practices for Amazon SageMaker

 

Amazon Inspector

Works at application level

Performs automated security assessments on your applications

 

AWS Trusted Advisor

Provides guidance to help you

Reduce cost

Increase performance

Improve security

 

Data Governance

Curation

Discovery and understanding

Protection

  Define roles

Data steward

Data owner

IT Roles

 

AWS Glue DataBrew for data governance

Data profiling

Data Lineage

AWS Glue Data Catalog

AWS Glue Data Quality

 

Curation

Data Quality Management

Data Integration

Data Management

Protection

Data Security

Data Compliance

Data Lifecycle management

 

 

GENERAL LINKS - For Revision

What are Transformers in Artificial Intelligence? -> aws.amazon.com/what-is/transformers-in-artificial-intelligence/

What are Foundation Models? -> aws.amazon.com/what-is/foundation-models/

What is Artificial Intelligence (AI)? -> aws.amazon.com/what-is/artificial-intelligence/ 

What is Machine Learning? -> aws.amazon.com/what-is/machine-learning/ 

What is Deep Learning? -> aws.amazon.com/what-is/deep-learning/ 

What is Generative AI? -> aws.amazon.com/what-is/generative-ai/

What’s the Difference Between Supervised and Unsupervised Learning? -> aws.amazon.com/compare/the-difference-between-machine-learning-supervised-and-unsupervised/

Machine Learning Concepts -> docs.aws.amazon.com/machine-learning/latest/dg/machine-learning-concepts.html

AWS AI Use Case Explorer -> aws.amazon.com/machine-learning/ai-use-cases/?use-cases

What is Amazon SageMaker? -> docs.aws.amazon.com/sagemaker/latest/dg/whatis.html

AWS Services - Machine Learning (ML) and Artificial Intelligence (AI) -> docs.aws.amazon.com/whitepapers/latest/aws-overview/machine-learning.html

AWS Deploy Serverless ML -> aws.amazon.com/blogs/machine-learning/deploy-a-serverless-ml-inference-endpoint-of-large-language-models-using-fastapi-aws-lambda-and-aws-cdk/

Amazon SageMaker - API Gateway - AWS Lambda -> aws.amazon.com/blogs/machine-learning/call-an-amazon-sagemaker-model-endpoint-using-amazon-api-gateway-and-aws-lambda/

Inference parameters -> docs.aws.amazon.com/bedrock/latest/userguide/inference-parameters.html

Amazon Bedrock or Amazon SageMaker? -> docs.aws.amazon.com/decision-guides/latest/bedrock-or-sagemaker/bedrock-or-sagemaker.html 

Choosing a generative AI service -> docs.aws.amazon.com/decision-guides/latest/generative-ai-on-aws-how-to-choose/guide.html

AWS Bedrock Agents -> aws.amazon.com/bedrock/agents/

What is RAG? - Retrieval-Augmented Generation AI Explained - AWS (amazon.com)

docs.aws.amazon.com/awscloudtrail/latest/userguide/how-cloudtrail-works.html

docs.aws.amazon.com/bedrock/latest/userguide/usingVPC.html

aws.amazon.com/blogs/machine-learning/use-aws-privatelink-to-set-up-private-access-to-amazon-bedrock/
