Exploring the AI-Generated Image Landscape
Imagine three cute baby kākāpō in the New Zealand native bush, 3D cartoon style.


TLDR: In this post I reflect on my early experiences with digitising images, computer vision and AI-generated content (AIGC). I discuss concerns around AIGC and conduct my own experiment across Midjourney and DALL.E 2 to generate images of New Zealand native birds.

When I was 16 years old, I first experienced a flatbed scanner that gave me the power to digitise photographs and load images into a computer. This technology set me on a path to study computer science and computer vision at university almost 30 years ago.

The morphing sequence in Michael Jackson's Black or White video in 1991 inspired me to go further, and I was soon making my own morphing videos on my home computer, photographing all my friends and then digitising the images. I remember it rendering for days to generate the frames for a 5 second video. When I showed the video to my friends and teacher at school, they couldn't believe what they were seeing. This was before the Internet!

Computer vision went through a long winter, with few large advances, until the launch of the annual ImageNet Large Scale Visual Recognition Challenge (ILSVRC) in 2010. ILSVRC encouraged research in computer vision and benchmarked the progress of image recognition algorithms.

ImageNet was conceived in 2006 by Fei-Fei Li, now a Stanford University professor. The project aimed to create a vast database of labeled images to facilitate the development of computer vision algorithms. Spanning over 14 million images with annotations, ImageNet covers a diverse range of objects, scenes, and concepts, organised according to the WordNet hierarchy. The labeling exercise was outsourced to people around the world using Amazon Mechanical Turk.

ILSVRC led to the emergence of deep learning and rapid progress in image recognition. Five years ago, I wrote a post about this.

We are now entering a new phase in the advancement of AI-Generated Content (AIGC).

The history of Generative AI in CV, NLP and VL.
Statistics of model size and training speed across different models.
The general structure of generative vision language.

One of the most notable breakthroughs in AI image generation was the development of Generative Adversarial Networks (GANs) by Ian Goodfellow and his team in 2014. A GAN consists of two neural networks, the generator and the discriminator, that are trained against each other: the generator creates images and the discriminator tries to tell them apart from real ones, which pushes the generator towards ever more realistic images.
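The interplay between the two networks can be made concrete by looking at the loss each one tries to minimise. The following is a minimal sketch in plain Python, assuming the discriminator outputs a probability that an image is real. The function names are illustrative only, and the generator loss shown is the commonly used non-saturating form rather than anything specific to the tools discussed in this post.

```python
import math


def bce(prediction: float, target: float) -> float:
    """Binary cross-entropy for a single probability and a 0/1 target."""
    eps = 1e-12  # clamp to avoid log(0)
    p = min(max(prediction, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def discriminator_loss(d_real: float, d_fake: float) -> float:
    """The discriminator wants to score real images as 1 and generated images as 0."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def generator_loss(d_fake: float) -> float:
    """The generator wants the discriminator to score its images as real (1)."""
    return bce(d_fake, 1.0)


if __name__ == "__main__":
    # d_fake is the discriminator's "probability real" for a generated image.
    print("generator loss when easily caught:", generator_loss(0.1))
    print("generator loss when fooling the discriminator:", generator_loss(0.9))
    print("discriminator loss when it cannot tell:", discriminator_loss(0.5, 0.5))
```

The generator's loss falls as the discriminator is fooled more often, while the discriminator's loss falls as it separates real from generated images more confidently. Training alternates between the two, so each improvement on one side raises the bar for the other.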

Building on the success of GANs, researchers at NVIDIA developed StyleGAN and its successors, StyleGAN2 and StyleGAN3. These models improved the quality of generated images and introduced new features, such as style mixing and the ability to control different aspects of an image.

The advancements of Large Language Models (LLMs) have brought about an interesting phenomenon: the creation of images from text prompts. Three of the most popular services leveraging this technology are OpenAI's DALL.E 2 (which is behind Microsoft's new Image Creator), Midjourney and Stable Diffusion.

As with any new technology, there are concerns around AI-generated image creation:

  1. Ethical concerns - As AI-generated images become more realistic, concerns regarding ethics and misuse have emerged. One such concern is deepfakes, which involve manipulating images or videos to make it appear as though someone is doing or saying something they did not. These can be used for nefarious purposes, such as spreading misinformation or defaming individuals. Recent examples include the Fake Trump Arrest Photos.

  2. Bias in AI Systems - AI-generated images can inadvertently perpetuate harmful stereotypes or biases. Since AI models learn from existing data, they can absorb and reproduce the biases present in that data. It is crucial for researchers and developers to actively work towards reducing biases in AI-generated images to ensure a more inclusive and diverse representation of people and objects. Jenka Gurfinkel does a great job of describing the influence of the American Smile on Midjourney.

  3. Environmental Impact - Training large AI models like GANs requires substantial computational resources, which can have a significant environmental impact. As the field advances, it is essential to consider the ecological footprint of these technologies and develop more energy-efficient models.

  4. Copyright and Ownership - As AI-generated content becomes more prevalent, questions of copyright and ownership arise. Determining the intellectual property rights for images generated by AI systems is a complex issue that will require new legal frameworks and guidelines. This is also shaking up the music industry with a strong online debate.

Before you can use the Microsoft Azure OpenAI Service you need to agree to the company's Responsible AI policies. I worked with Natasha Crampton at Microsoft New Zealand for many years. Natasha moved to Redmond in 2018 to take up the role of chief counsel to the AETHER Committee. She is now Microsoft's Chief Responsible AI Officer and posts often on the responsible AI program.

With all this as a backdrop, this morning I conducted my own experiment across Midjourney and DALL.E 2, carrying on the theme of native birds from Aotearoa that I started five years ago.

Building an image classifier for NZ native birds.

My first prompt: "three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style"

DALL.E 2 three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style
Midjourney three cute baby kākāpō in the New Zealand native bush, 3D Pixar cartoon style

Some observations:

  • Midjourney did a better job of imagining the background of NZ native bush.

  • DALL.E put three of the birds up a tree despite the fact that the kākāpō is flightless.

I then turned to a more photorealistic prompt: "kākāpō on the forest floor in the New Zealand native bush."

DALL.E 2 kākāpō on the forest floor in the New Zealand native bush
Midjourney kākāpō on the forest floor in the New Zealand native bush

Some observations:

  • Again, Midjourney did a better job of imagining the background of NZ native bush including ferns.

  • DALL.E produced a bird that appeared to be a hybrid of many NZ native parrots including the kākāpō, the kea and the kākā.

If you search for images of kākāpō you often get a photo of a kea instead with its distinctive hooked beak. This feature seems to have made its way incorrectly into both Midjourney and DALL.E's models.

Image search with kea incorrectly included alongside kākāpō

Incorrect and correct beak styling for the kākāpō

I was very impressed by this imagined image from Midjourney:

Midjourney kākāpō on the forest floor in the New Zealand native bush

That said, if you zoom in, you will see what appears to be jumbled copyright text at the bottom of the image.

what appears to be jumbled copyright text at the bottom of the image.

When I was building my own kākāpō classifier five years ago, I noticed that most of the high-quality photos of kākāpō on the web had copyright notices watermarked into the images. A visual image search from the generated image links to copyrighted photos like this one.

The morphing between the kākāpō and the kea got me thinking and I created the same prompt for a kea.
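The prompts in this experiment share a simple shape: a subject, a setting and an optional style, each tried in both tools. For anyone wanting to repeat the comparison with other birds, here is a minimal sketch of building that grid of prompts, assuming each prompt is then pasted into the tools by hand. `build_prompt`, `comparison_grid` and the `TOOLS` list are hypothetical helper names for illustration and are not part of DALL.E 2 or Midjourney.

```python
from itertools import product

# The two tools compared in this post (labels only, no API calls are made).
TOOLS = ["DALL.E 2", "Midjourney"]


def build_prompt(subject: str, setting: str, style: str = "") -> str:
    """Join subject and setting, appending the style after a comma if given."""
    prompt = f"{subject} {setting}"
    return f"{prompt}, {style}" if style else prompt


def comparison_grid(subjects, setting, style=""):
    """Return (tool, prompt) pairs so every subject is tried in every tool."""
    return [
        (tool, build_prompt(subject, setting, style))
        for subject, tool in product(subjects, TOOLS)
    ]


if __name__ == "__main__":
    setting = "on the forest floor in the New Zealand native bush"
    for tool, prompt in comparison_grid(["kākāpō", "kea"], setting):
        print(f"{tool}: {prompt}")
```

Keeping the setting and style fixed while changing only the subject makes it easier to see which differences come from the bird and which come from the tool.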

DALL.E 2 kea on the forest floor in the New Zealand native bush
Midjourney kea on the forest floor in the New Zealand native bush

The distinctive curved beak certainly came through in these examples. Again, the images produced by DALL.E 2 were out of proportion and less accurate. The following imagined kea photo is fantastic.

Imagined kea photo

The next thing I thought I would try was generating an image from a reference photo.

I picked this reference photo from Rob Pine to "re-imagine".

DALL.E 2 reimagining Rob Pine's photo of two keas playing in the snow
Midjourney reimagining Rob Pine's photo of two keas playing in the snow

Where things got really interesting was when I used that image as a reference and dialed up the emotion by editing my prompt to include things like expressive human eyes. When I did this, the strangest thing happened: Midjourney imagined what appear to be two bird-like children, focusing in on the emotion in the eyes.

Midjourney imagined what appear to be two bird-like children morphed with two kea in the snow, focusing in on the emotion in the eyes.

The landscape of AI-generated images is rapidly evolving. As researchers, developers, and policymakers navigate it, striking a balance between innovation and responsible development will be key to unlocking the full potential of AI-generated images.

I am only just scratching the surface of what these tools can do. If you are keen to start experimenting for yourself, I recommend the Introduction to Prompt Engineering for Generative AI course on LinkedIn Learning, which is currently free.

Update June 2025: a lot has happened in a year with the advancement of these technologies. The key advancement is that short video generation from prompts and images has come to life.

Created by Google Flow from generated image Veo 3 of three cute baby kākāpō in the New Zealand native bush playfully climbing and eating the fruit of the Rimu tree in a 3D cartoon style - note the fourth kākāpō appearing and disappearing.
Created by Google Flow from generated image Veo 2 of three cute baby kākāpō in the New Zealand native bush playfully climbing and eating the fruit of the Rimu tree in a 3D cartoon style
Created by OpenAI Sora three cute baby kākāpō in the New Zealand native bush playfully climbing and eating the fruit of the rimu tree in a 3D cartoon style

