Confoo 2024 Getting started with OpenAI and data science

Getting started with
OpenAI and Data
science
SUSAN IBACH | HOCKEYGEEKGIRL
SUSAN.IBACH@LIVE.COM

You can't go
anywhere these
days without
hearing about
Generative AI

AI won't replace you, but someone with your skills + AI might

Coders are more productive when they
use AI to help them code
- Over 80% of coders say they are more productive when they use a code helper
such as GitHub Copilot
- 74% say it enables them to focus on more satisfying work
- 96% say they complete repetitive tasks faster
- In a study comparing two groups, the group using a built-in AI coding assistant
completed their tasks 50% faster

Okay, I get it, Susan: this AI thing
looks useful. How do I get
started using it for data
science?

You could just
open up ChatGPT,
ask it to write code
for you, then copy
& paste

But the real win is doing it inside your IDE!

Step 1
Find a Large
Language Model
(LLM) you can install
inside your IDE

This takes a bit
of research
OpenAI – backed by Microsoft, a major investor
Codeium – VS Code, Vim, Jupyter Notebook, Eclipse
GitHub Copilot – comes as an extension for VS Code, Visual
Studio, JetBrains
Obsidian Integration, heroml, Superpower extension,
llmops.space, cursor.so, ChatGPT, CometLLM, Cohere

I use Jupyter
notebooks so
I'm going with
Jupyter AI

Jupyter AI is
vendor neutral
and can
connect to
different LLMs
- AI21
- Anthropic
- AWS
- Cohere
- HuggingFace Hub
- OpenAI

I chose OpenAI
because I had
played with it a
bit already

Step 2
Install the
extension or
library in your IDE

If you want to use Jupyter AI with OpenAI
in a Jupyter Notebook
Software versions required
- JupyterLab 4
- Python 3.8 – 3.11 (I installed Python 3.11.6 64 bit)
Accounts required (you can start with the free version)
- OpenAI

If you want to use Jupyter AI with OpenAI
in a Jupyter Notebook
Install the openai library
- pip install openai
Create an environment variable and set it to the API key for your OpenAI account
- OPENAI_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxx
- Each supported LLM provider has its own environment variable name
Install the Jupyter AI extension, then load it in a notebook cell
- pip install jupyter-ai
- %load_ext jupyter_ai
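Before sending a first prompt, it can help to confirm the notebook actually sees the API key. The following is a minimal sketch using only the standard library; the helper name has_api_key is hypothetical and is not part of Jupyter AI or the openai library.

```python
import os


def has_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return True if the named variable is set to a non-empty value."""
    return bool(env.get(name, "").strip())


# Check the real environment of the running kernel.
if has_api_key():
    print("OPENAI_API_KEY is set")
else:
    print("OPENAI_API_KEY is missing; set it and restart Jupyter")
```

If the check fails right after you set the variable, the Jupyter server was probably started before the variable existed and needs a restart.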

Not all OpenAI models are created equal
Version                 GPT-3.5 Turbo   GPT-4.0
Speed                   Faster          Slower
Database size           -               10X size of ChatGPT 3.5 and can handle images
Quality of output       -               40% more likely to produce factual responses than 3.5, better at dialects
$ Input / 1000 tokens   $0.0005         $0.03
$ Output / 1000 tokens  $0.0015         $0.06
You can find more information on pricing at openai.com
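To see what per-1000-token pricing means for a single call, here is a rough sketch of the arithmetic. The function name estimate_cost and the token counts are made up for illustration; the two prices are the GPT-4.0 figures quoted on this slide, and current prices should be checked at openai.com.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1k, output_price_per_1k):
    """Estimate the cost in USD of one call from token counts and prices."""
    input_cost = input_tokens / 1000 * input_price_per_1k
    output_cost = output_tokens / 1000 * output_price_per_1k
    return input_cost + output_cost


# Hypothetical call: a 500-token prompt that produces a 1,500-token answer,
# priced with the GPT-4.0 figures from the slide ($0.03 in, $0.06 out).
cost = estimate_cost(500, 1500, input_price_per_1k=0.03, output_price_per_1k=0.06)
print(f"Estimated cost: ${cost:.3f}")
```

Output tokens cost more than input tokens, so long generated answers usually dominate the bill.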

So what is a token anyway?
You can think of tokens as pieces of words
Wayne Gretzky’s quote "You miss 100% of the shots you don't take" contains 11 tokens
1 token is about 4 characters in English
1 English word is typically 1.3 tokens
1 French word is typically 2 tokens
Punctuation marks are counted as one token
Special characters are one to three tokens
Emojis are two to three tokens
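The rules of thumb above can be turned into a quick estimator. This is only a sketch of the two heuristics (about 4 characters per token, about 1.3 tokens per English word); the function names are hypothetical, and the real count depends on the model's tokenizer.

```python
def estimate_tokens_from_chars(text, chars_per_token=4):
    """Estimate tokens using the 'about 4 characters per token' rule."""
    return round(len(text) / chars_per_token)


def estimate_tokens_from_words(text, tokens_per_word=1.3):
    """Estimate tokens using the 'about 1.3 tokens per English word' rule."""
    return round(len(text.split()) * tokens_per_word)


quote = "You miss 100% of the shots you don't take"
print(estimate_tokens_from_chars(quote), estimate_tokens_from_words(quote))
```

The two heuristics will not always agree with each other or with the tokenizer, which is why they are best used for budgeting rather than exact billing.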

Step 3
Try a hello world
type command

Ask the AI to create "Hello World"
%%ai chatgpt --format code
display a message that says hello world
Possible successful outputs include
print("Hello World")
System.out.println("Hello World");
console.log("Hello World");
echo "Hello World";

Step 4
Evaluate the
suggested code

AI does not replace programmers.
Programmers with AI replace programmers
- There is more than one way to write code to complete a task
- LLMs make an educated guess based on code they have seen in the past
- The coder provides the knowledge to evaluate the AI's suggestion and modifies
the prompt as needed (referred to as prompt engineering)

How many tokens and calls was it?

Maybe I need a
dataframe with
some sample
data

Maybe I forgot the
syntax for returning
entries that start
with a particular
letter
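The slides do not show the code the AI generated in the demo, so the following is only a sketch of what such a request typically produces. It assumes pandas is installed; the column names and values are invented sample data.

```python
import pandas as pd

# Made-up sample data, standing in for whatever the AI generated in the demo.
df = pd.DataFrame({
    "name": ["Alice", "Adam", "Bob", "Cara"],
    "points": [12, 7, 9, 15],
})

# Keep only the rows whose name starts with the letter "A".
starts_with_a = df[df["name"].str.startswith("A")]
print(starts_with_a)
```

This is exactly the kind of syntax that is easy to forget and quick to verify once the AI suggests it.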

Let's read a .csv file
and then do some
linear regression

ValueError: Input y
contains NaN

AI does not replace
programmers.
Programmers with AI
replace
programmers

What would a
coder do? We'd
get rid of the rows
with Nulls and try
again!
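As a sketch of that fix, the example below drops incomplete rows before fitting a line. It is not the demo code: it swaps in the standard library (csv plus a hand-written least-squares fit) for the libraries used in the talk, and the CSV contents and column names are invented.

```python
import csv
import io
import math

# Invented sample data standing in for the demo's .csv file;
# the third data row is missing its target value.
CSV_TEXT = """hours,score
1,52
2,54
3,
4,58
5,60
"""


def load_complete_rows(text):
    """Return (x, y) pairs, skipping rows with missing or NaN values."""
    pairs = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            x, y = float(row["hours"]), float(row["score"])
        except (TypeError, ValueError):
            continue  # empty or non-numeric cell
        if math.isnan(x) or math.isnan(y):
            continue
        pairs.append((x, y))
    return pairs


def fit_line(pairs):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


pairs = load_complete_rows(CSV_TEXT)
slope, intercept = fit_line(pairs)
print(f"kept {len(pairs)} rows, score = {slope:.2f} * hours + {intercept:.2f}")
```

Dropping rows is the simplest choice; whether it is the right one depends on how much data is missing and why, which is the judgment the coder brings.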

Victory!
I have successfully
produced a plot but if
you don't know how
to read it this isn't
going to help you

AI does not replace
data scientists. Data
scientists with AI
replace data
scientists

Until today, I have never done a
live code demo
- with this much code
- in a session this short
- without having to look up
method names and parameters
- without spending time in the
session having the audience
help me find my typing mistakes

AI doesn't replace
presenters.
Presenters with AI
replace presenters

References
ChatGPT
OpenAI
Project Jupyter | Installing Jupyter
Generative AI in Jupyter. Jupyter AI, a new open source project… | by Jason Weill | Jupyter Blog
GitHub - jupyterlab/jupyter-ai: A generative AI extension for JupyterLab
What are tokens and how to count them
OpenAI Pricing

Questions?
SUSAN IBACH | HOCKEYGEEKGIRL
SUSAN.IBACH@LIVE.COM
