Scrap the Manual: Generative AI
Generative AI has taken the creative industry by storm, flooding our social feeds with beautiful creations powered by the technology. But is it here to stay? And what should creators keep in mind?
In this episode of Scrap the Manual, host Angelica Ortiz is joined by fellow Creative Technologist Samuel Snider-Held, who specializes in machine learning and Generative AI. Together, Sam and Angelica answer questions from our audience—breaking down the buzzword into tangible considerations and takeaways—and why embracing Generative AI could be a good thing for creators and brands.
Read the discussion below or listen to the episode on your preferred podcast platform.
Angelica: Hey everyone. Welcome to Scrap the Manual, a podcast where we prompt "aha" moments through discussions of technology, creativity, experimentation and how all those work together to address cultural and business challenges. My name's Angelica, and I'm joined today by a very special guest host, Sam Snider-Held.
Sam: Hey, great to be here. My name's Sam. We're both Senior Creative Techs with Media.Monks. I work out of New York City, specifically on machine learning and Generative AI, while Angelica's working from the Netherlands office with the Labs.Monks team.
Angelica: For this episode, we're going to be switching things up a bit and introducing a new segment where we bring a specialist and go over some common misconceptions on a certain tech.
And, oh boy, are we starting off with a big one: Generative AI. You know, the one that's inspired the long scrolls of Midjourney, Stable Diffusion and DALL-E images and the tech that people just can't seem to get enough of the past few months. We just recently covered this topic on our Labs Report, so if you haven't already checked that out, definitely go do that. It's not needed to listen to this episode, of course, but it'll definitely help in covering the high level overview of things. And we also did a prototype that goes more in depth on how we at Media.Monks are looking into this technology and how it implements within our workflows.
For the list of misconceptions we’re busting or confirming today, we gathered this list from across the globe–ranging from art directors to technical directors–to get a variety of what people are thinking about on this topic. So let's go ahead and start with the basics: What in the world is Generative AI?
Sam: Yeah, so from a high level sense, you can think about generative models as AI algorithms that can generate new content based off of the patterns inherent in its training data set. So that might be a bit complex. So another way to explain it is since the dawn of the deep learning revolution back in 2012, computers have been getting increasingly better at understanding what's in an image, the contents of an image. So for instance, you can show a picture of a cat to a computer now and it will be like, "oh yeah, that's a cat." But if you show it, perhaps, a picture of a dog, it'll say, "No, that's not a cat. That's a dog."
So you can think of this as discriminative machine learning. It is discriminating whether or not that is a picture of a dog or a cat. It's discriminating what group of things this picture belongs to. Now with Generative AI, it's trying to do something a little bit different: It's trying to understand what “catness” is. What are the defining features of what makes up a cat image in a picture?
And once you can do that, once you have a function that can describe “catness”, well, then you can just sample from that function and turn it into all sorts of new cats. Cats that the algorithm's actually never seen before, but it just has this idea of “catness” creativity that you can use to create new images.
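The contrast Sam draws between discriminating and generating can be shown as a minimal toy sketch. Everything in it is an assumption made for illustration: the single made-up "ear pointiness" score, the sample values, and the function names are hypothetical, and real image models learn from pixels with far richer functions than a bell curve.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Made-up "ear pointiness" scores standing in for training images (hypothetical data).
cat_scores = [0.82, 0.91, 0.78, 0.88, 0.85]
dog_scores = [0.31, 0.42, 0.27, 0.38, 0.35]

# Discriminative view: learn a boundary that separates the two groups.
boundary = (statistics.mean(cat_scores) + statistics.mean(dog_scores)) / 2

def classify(score):
    """Answer the question 'which group does this belong to?'"""
    return "cat" if score > boundary else "dog"

# Generative view: describe what "catness" looks like, then sample from it.
cat_mean = statistics.mean(cat_scores)
cat_spread = statistics.stdev(cat_scores)

def sample_new_cats(n):
    """Draw n new cat-like scores from the fitted bell curve."""
    return [random.gauss(cat_mean, cat_spread) for _ in range(n)]

new_cats = sample_new_cats(3)
print(classify(0.9), classify(0.3))
print(new_cats)
```

The classifier can only label what it is shown, while the sampler produces new values that resemble the cats it learned from.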
Angelica: I've heard AI generally described as a child, where you pretty much have to teach it everything. It's starting from a blank slate, but over the course of the years, it is no longer a blank slate. It's been learning from all the different types of training sets that we've been giving it. From various researchers, various teams over the course of time, so it's not blank anymore, but it's interesting to think about what we as humans take for granted and being like, "Oh that's definitely a cat." Or what's a cat versus a lion? Or a cat versus a tiger? Those are the things that we know of, but we have to actually teach AI these things.
Sam: Yeah. They're getting to a point where they're moving past that. They all started with this idea of being these expert systems. These things that could only generate pictures of cats...could only generate pictures of dogs.
But now we're in this new sort of generative pre-training paradigm, where you have these models that are trained by these massive corporations and they have the money to create these things, but then they often open source them to someone else, and those models are actually very generalized. They can very quickly turn their knowledge into something else.
So if it was trained on generating this one thing, you do what we call “fine tuning”, where you train it on another data set to very quickly learn how to generate specifically Bengal cats or tigers or stuff like that. But that is moving more and more towards what we want from artificial intelligence algorithms.
We want them to be generalized. We don't want to have to train a new model for every different task. So we are moving in that direction. And of course they learn from the internet. So anything that's on the internet is probably going to be in those models.
Angelica: Yeah. Speaking of fine tuning, that reminds me of when we were doing some R&D for a project and we were looking into how to fine tune Stable Diffusion for a product model. They wanted to be able to generate these distinctive backgrounds, but have the product always be consistent first and foremost. And that's tricky, right? When thinking about Generative AI and it wanting to do its own thing because either it doesn't know better or you weren't necessarily very specific on the prompts to be able to get the product consistent. But now, because of this fine tuning, I feel like it's actually making it more viable of a product because then we don't feel like it's this uncontrollable platform. It's something that we could actually leverage for an application that is more consistent than it may have been otherwise.
So the next question we got is: with all of the focus on Midjourney prompts being posted on LinkedIn and Twitter, is Generative AI simply just a pretty face? Is it only for generating cool images?
Sam: I would definitely say no. It's not just images. It's audio. It's text. Any type of data set you put into it, it should be able to create that generative model on that dataset. It's just the amount of innovation in the space is staggering.
Angelica: What I think is really interesting about this field is not only just how quickly it's advanced in such a short period of time, but also the implementation has been so wide and varied.
So we talked about generating images, generating text and audio and video, but I had seen that Stable Diffusion is being used for generating different types of VR spaces, for example. Or it's Stable Diffusion powered processes, or not even just Stable Diffusion... just different types of Generative AI models to create 3D models and being able to create all these other things that are outside of images. There's just so much advancement within a short period of time.
Sam: Yeah, a lot of this stuff you can think about like LEGO blocks. You know, a lot of these models that we're talking about are past this generative pre-training paradigm shift where you're using these amazingly powerful models trained by big companies and you're pairing them together to do different sorts of things. One of the big ones that's powering this, which came from OpenAI, was CLIP. This is the model that allows you to really map text and images into the same vector space. So that if you put in an image and a text, it will understand that those are the same things from a very mathematical standpoint. These were some of the first things that people were like, "Oh my gosh, it can really generate text and it looks like a human wrote it and it's coherent and it circles back in on itself. It knows what it wrote five paragraphs back." And so, people started to think, "What if we could do this with images?" And then maybe instead of having the text and the images mapped to the same space, it's text to song, or text to 3D models?
And that's how all of this started. You have people going down the evolutionary tree of AI and then all of a sudden, somebody comes out with something new and people abandon that tree and move on to another branch. And this is what's so interesting about it: Whatever it is you do, there's some cool way to incorporate Generative AI into your workflow.
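The idea of mapping text and images into the same vector space can be sketched in a few lines. This is not CLIP itself: the three-number vectors below are hand-written, hypothetical stand-ins for embeddings, and `best_caption` is an illustrative helper name. The only real technique shown is cosine similarity, a common way to compare such vectors.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings; a real model would produce much longer vectors.
text_embeddings = {
    "a photo of a cat": [0.9, 0.1, 0.2],
    "a photo of a dog": [0.1, 0.9, 0.2],
}
image_embedding = [0.8, 0.2, 0.1]  # pretend this came from a picture of a cat

def best_caption(image_vec, captions):
    """Pick the caption whose vector points most nearly the same way as the image's."""
    return max(captions, key=lambda text: cosine_similarity(image_vec, captions[text]))

print(best_caption(image_embedding, text_embeddings))
```

Because text and image land in one space, "which caption fits this picture?" reduces to finding the nearest vector.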
Angelica: Yeah, that reminds me of another question that we got that's a little bit further down the list, but I think it relates really well with what you just mentioned. Is Generative AI gonna take our jobs? I remember there was a conversation a few years ago, and it still happens today as well, where they were saying the creative industry is safe from AI. Because it's something that humans take creativity from a variety of different sources, and we all have different ways of how we get our creative ideas. And there's a problem solving thing that's just inherently human. But with seeing all of these really cool prompts being generated, it's creating different things that even go beyond what we would've thought of. What are your thoughts on that?
Sam: Um, so this is a difficult question. It's really hard to predict the future of this stuff. Will it? I don't know.
I like to think about this in terms of “singularity light technology.” So what I mean by singularity light technology is a technology that can zero out entire industries. The one we're thinking about right now is stock photography and stock video. You know, it's hard to tell those companies that they're not facing an existential risk when anybody can download an algorithm that can basically generate the same quality of images without a subscription.
And so if you are working for one of those companies, you might be out of a job because that company's gonna go bankrupt. Now, is that going to happen? I don't know. Instead, try to understand how you incorporate it into your workflow. I think Shutterstock is incorporating this technology into their pipeline, too.
I think within the creative industry, we should really stop thinking that there's something that a human can do that an AI can't do. I think that's just not gonna be a relevant idea in the near future.
Angelica: Yeah. My perspective from it would be: not necessarily it's going to take our jobs, but it's going to evolve how we approach our jobs. We could think of a classic example of film editors where they had physical reels to cut. And then when Premiere and After Effects came out, that process became digitized.
And then further and further and further, right? So there's still video editors, it's just how they approach their job is a little bit different.
And same thing here. Where there'll still be art directors, but it'll be different on how they approach the work. Maybe it'll be a lot more efficient because they don't necessarily have to scour the internet for inspiration. Generative AI could be a part of that inspiration finding. It'll be a part of the generating of mockups and it won't be all human made. And we don't necessarily have to mourn the loss of it not being a hundred percent human made. It'll be something where it will allow art directors, creatives, creators of all different types to be able to even supercharge what they currently can do.
Sam: Yeah, that's definitely true. There's always going to be a product that comes out from NVIDIA or Adobe that allows you to use this technology in a very user friendly way.
Last month, a lot of blog posts brought up a good point: if you are an indie games company and you need some illustrations for your work, normally you would hire somebody to do that. But this is an alternative and it's cheaper and it's faster. And you can generate a lot of content in the course of an hour, way more than a hired illustrator could do.
It's probably not as good. But for people at that budget, at that level, they might take the dip in quality for the accessibility, the ease of use. There's places where it might change how people are doing business, what type of business they're doing.
Another thing is that sometimes we get projects that for us, we don't have enough time. It's not enough money. If we did do it, they would basically take our entire illustration team off the bench to work on this one project. And normally if a company came to us and we passed on it, they would go to another one. But perhaps now that we are investing more and more on this technology, we say, "Hey, listen, we can't put real people on it, but we have this team of AI engineers, and we can build this for you.” For our prototype, that's what we were really trying to understand is how much of this can we use right now and how much benefit is that going to give us? And the benefit was to allow this small team to start doing things that large teams could do for a fraction of the cost.
I think that's just going to be the nature of this type of acceleration. More and more people are going to be using it to get ahead. And because of that, other companies will do the same. Then it becomes sort of an AI creativity arms race, if you will. But I think that companies that have the ability to hire people that can go to their artists and say, "Hey, what things are you having problems with? What things do you not want to do? What things take too much time?" And then they can look at all the research that's coming out and say, "Hey, you know what? I think we can use this brand new model to make us make better art faster, better, cheaper." It protects them from any sort of tool that comes out in the future that might make it harder for them to get business. At the very least, just understanding how these things work and not from a black box perspective, but having an understanding of how they work.
Angelica: It seems like a safe bet, at least for the short term, is just to understand how the technology works. Like listening to this podcast is actually a great start.
Sam: If you are an artist and you're curious, you can play around with it by yourself. Google Colab is a great resource. And Stable Diffusion is designed to run on a cheap GPU. Or you can start to use these services like Midjourney, to have a better handle on what's happening with it and how fast it's moving.
Angelica: Yeah, exactly. Another question that came through is: if I create something with Generative AI through Prompt Engineering, is that work really mine?
Sam: So this is starting to get into a little bit more of a philosophical question. Is it mine in the sense that I own it? Well, if the model says so, then yes. Stable Diffusion, I believe, comes with an MIT license. So that is like the most permissive license. If you generate an image with that, then it is technically yours, provided somebody doesn't come along and say, "The people who made Stable Diffusion didn't have the rights to offer you that license."
But until that happens, then yes, it is yours from an ownership point of view. Are you the creator? Are you the creative person generating that? That's a bit of a different question. That becomes a little bit murkier. How different is that between a creative director and illustrator going back and forth saying:
"I want this."
"No, I don't want that."
"No, you need to fix this."
"Oh, I liked what you did there."
"That's really great. I didn't think about that."
Who's the owner in that solution? Ideally, it's the company that hires both of them. This is something that's gonna have to play out in the legal courts if they get there. I know a lot of people already have opinions on who is going to win all the legal challenges, and that is just starting to happen right now.
Angelica: Yeah, from what I've seen in a lot of discussion, it's a co-creation platform of sorts, where you have to know what to say in order to get it to be the right outcome. So if you say, “I want an underwater scene that has mermaids floating and gold neon coral,” it'll generate certain types of visuals based off of that, but it may not be the visuals you want.
Then that's where it gets nitpicky into styles and references. That's where the artists come into play, where it's a Dali or Picasso version of an underwater scene. We've even seen prompts that use Unreal as a way to describe artistic styles. Generative AI could create things from a basic prompt. But there's a back and forth, kinda like you were describing with a director and illustrator, in order to know exactly what outcomes to have and using the right words and key terms and fine tuning to get the desired outcome.
Sam: Definitely, and I think this is a very specific question to this generation of models. They are designed to work with text to image. There's a lot of reasons for why they are this way. A lot of this research is built on the backs of transformers, which were initially language generation models. If you talk to any sort of artist, the idea that you're creating art by typing is very counterintuitive to what they spent years learning and training to do. You know, artists create images by drawing or painting or manipulating creative software, and it's a way more gestural interface. And I think that as technology evolves–and definitely how we want to start building more and more of these technologies to make it more engineered with the artist in mind–I think we're gonna see more of these image interfaces.
And Stable Diffusion has that: you can draw sort of an MS Paint type image and then say, "Alright, now I want this to be an image of a landscape, but in the style of a specific artist." So then it's not just writing text and waiting for the output to come in, I'm drawing into it too. So we're both working more collaboratively. But I think also in the future, you might find algorithms that are way more in tune with specific artists. Like the person who's making it, how they like to make art. I think this problem's gonna be less of a question in the future. At one point, all of these things will be in your Photoshop or your creative software, and at that point, we don't even think about it as AI anymore. It's just a tool that's in Photoshop that we use. They already have neural filters in Photoshop–the Content-Aware Fill. No one really thinks about these questions when they're already using them. It's just this area we are right now where it's posing a lot of questions.
Angelica: Yeah. The most interesting executions of technology have been when it fades into the background. Or to your point, we don't necessarily say, "Oh, that's AI", or "Yep, that's AR". That's a classic one too. We just know it from the utility it provides us. And like Google Translate, for example, that could be associated with AR if you use the camera and it actually overlays the text in front. But the majority of people aren't thinking, oh, this is Google Translate using AR. We don't think about it like that. We're just like, "Oh, okay, cool. This is helping me out here."
Sam: Yeah, just think about all the students that are applying to art school this year and they're going into their undergrad art degree and by next year it's gonna be easier to use all this technology. And I think their understanding of it is gonna be very different than our understanding of people who never had this technology when we were in undergrad. You know, it's changing very quickly. It's changing how people work very rapidly too.
Angelica: Right. Another question came relating to copyright usage, which you touched on a little bit, and that's something that's an evolving conversation already in the courts, or even out of court–or if you're looking in the terms and conditions of Midjourney and DALL-E and Stable Diffusion.
Sam: When you download the model from Hugging Face, you have to agree to certain Terms and Conditions. I think it's basically a legal stop gap for them.
If I use these, am I going to get sued? You want to talk to a copyright lawyer or attorney, but I don't think they know the answer just yet either. What I will say is that many of the companies that create these algorithms–your OpenAIs, your Googles, your NVIDIAs–a lot of these companies also have large lobbying teams and they're going to try to push the law in a way that doesn't get them sued. Now, you might see that in the near future because these companies can throw so much money at the legal issue that, by virtue of protecting themselves, they protect all the people who use their software. The way I like to talk about it is, and maybe I'm dating myself, but if you think about all the way to the early 2000s with Napster and file sharing, it didn't work out so well for the artists. And that technology has completely changed their industry and how they make money. Artists do not make money off of selling records anymore because anyone can get them for free. They make money now primarily through merchandise and touring. Perhaps something like that is going to happen.
Angelica: Yeah. When you brought up Napster, that reminded me of a sidetrack story where I got Napster and it was legitimate at that time, but every time I was like, "Oh yeah, I have this song on Napster." They were like, "Mmmm?" They're giving me a side eye because of where Napster came from and the illegal downloading. It's like, "No, it's legit. I swear I just got a gift card."
Sam: [laughter] Well, yeah, many of us now listen to all of our music on Spotify. That evolved in a way where they are paying artists in a specific way that sometimes is very predatory and something like that could happen to artists in these models. It doesn't look like history provides good examples where the artists win or come out on top. So again, something to think about if you are one of these artists. How do I prepare for this? How do I deal with it? At the end of the day, people are still gonna want your top fantasy illustrator to work on their project, but maybe people that aren't as famous, maybe those people are going to suffer a bit more.
Angelica: Right. There's also been a discussion on: can artists be exempted from being a part of prompts? For example, there was a really long Twitter thread, we'll link it in the show notes, but it was pretty much discussing how there was a lot of art that was being generated using her name in the prompt, and it looked very similar to what she would create. Should she get a commission because it used her name and her style to be able to generate that? Those are the questions there. Or if they're able to get exempt, does that also prevent the type of creative output Generative AI is able to create? Because now it's not an open forum anymore where you can use any artist. And now we're gonna see a lot of Picasso uses because that one hasn't been exempted. Or more indie artists aren't represented because they don't want to be.
Sam: I don't think the companies creating these exemptions are really going to work. One of my favorite things about artificial intelligence is that it's one of the most advanced high tech technologies that's ever existed, and it's also one of the most open. So it's going to work on their platforms because they can control it, but it's an extremely open technology. All these companies are putting out some of their most stellar code and trained models. There's DreamBooth now where you can basically take Stable Diffusion and then fine tune it on specific artists using a hundred or less images or so.
Even if a company does create these exemptions, so you can't create images on Midjourney or DALL-E 2 in the style of Yoshitaka Amano or something like that, it wouldn't be so hard for somebody to just download all the free trained models, train them on Yoshitaka Amano images, and then create art like that. The barrier to entry to do these things isn't high enough that this is a solution for that.
Angelica: Yeah, the mainstream platforms could help to get exempt, but if someone was to train their own model, then they could still do that.
Sam: It's starting to become kind of a wild west, and I can understand why certain artists are angry and nervous. It's just...it's something that's happening and if you wanna stop it, how do we stop it? It has to come from a very concerted legal sort of idea. Bunch of people getting together saying, "We need to do this now and this is how we want it to work." But can they do that faster than corporations can lobby to say, "No, we can do this." You know, it's very hard for small groups of artists to combat corporations that basically run all of our technologies.
It's an interesting thing. I don't know what the answer is. We should probably talk to a lawyer about it.
Angelica: Yeah. There's other technologies that have a similar conundrum as well. It's hard with emerging tech to control these things, especially when it is so open and anyone's able to contribute in either a good or a bad way.
Sam: Yeah, a hundred percent.
Angelica: That actually leads to our last question. It's not really a question, more of a statement. They mentioned that Generative AI seems like it's growing so fast and that it will get outta control soon. From my perspective, it's already starting to because of just the rapid iteration that's happening within this short period of time.
Sam: Even for us, we were spending time engineering these tools, creating these projects that use these and we'll be halfway through it and there's all these new technologies that might be better to use. Yeah, it does give a little bit of anxiety like, "Am I using the right one? What's it going to take to change technology right now?" Do you wait for the technology to advance, to become cheaper?
If you think about a company like Midjourney spending all this investment money on creating this platform, because theoretically only you can make this and it's very hard for other companies to recreate your business. But then six months later, Stable Diffusion comes out. It's open source, anyone can download it. And then two months later somebody open sources a full on scalable web platform. It's just that sort of thing where it evolves so fast. And how do you make business decisions about it? It's changing month to month at this point. Whereas before, it was changing every year or so, but now it's too fast. It does seem like it is starting to, again, become that singularity light type technology. Who’s to say that it's going to continue like that? It's just so hard to predict the future with this stuff. It's more what can I do right now and is it going to save me money or time? If not, don't do it. If yes, then do it.
Angelica: Yeah. The types of technologies that get the most excitement are the ones that get different types of people more mobilized that then makes the technology advance a lot faster. It just feels like towards the beginning of the summer, we were hearing like, "Oh, DALL-E 2, yay! Awesome." And then it seemed like it went exponentially fast from there based on a lot of the momentum. There was probably a lot of stuff behind the scenes that made it feel exponential. Would you say that it was because of a lot of interest that brought a lot of people into the same topic at one point? Or do you feel like it might have always been coming to this point?
Sam: Yeah, I think so. Whenever you start to see technology that is really starting to deliver on its promise, I think, again, a lot of people become interested in it. The big thing about Stable Diffusion was that it was able to use a different type of model to compress the actual training size of the images, which then allowed it to train faster and then be able to be trained and executed on a single GPU. That type of thing is how a lot of this stuff goes. There's generally one big company that creates the, "We figured out how to do this." And then all these other companies and groups and researchers say, "Alright, now we know how to do this. How do we do it cheaper, faster, with less data, and more powerful?" And any time there's something that comes out like that, people start spending a lot of time and money on it.
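The compression Sam mentions can be put into rough numbers. The figures in this sketch are not from the episode: they assume the publicly described Stable Diffusion v1 setup, where a 512 by 512 RGB image is encoded into a latent grid 8 times smaller on each side with 4 channels. The helper name `value_count` is hypothetical.

```python
def value_count(height, width, channels):
    """Number of values a model has to process for one image-like grid."""
    return height * width * channels

# Assumed figures (Stable Diffusion v1 as publicly described, not stated in the episode).
IMAGE_SIDE = 512
DOWNSAMPLE = 8
LATENT_CHANNELS = 4

pixel_values = value_count(IMAGE_SIDE, IMAGE_SIDE, 3)  # RGB pixels
latent_values = value_count(IMAGE_SIDE // DOWNSAMPLE, IMAGE_SIDE // DOWNSAMPLE, LATENT_CHANNELS)

print(pixel_values, latent_values, pixel_values // latent_values)
```

Under those assumptions the model works on roughly 48 times fewer values per image, which is the kind of saving that makes training and running on a single GPU practical.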
DALL-E was this thing that I like to say really demonstrated creative arithmetic. When you say, I want you to draw me a Pikachu sitting on a goat. And not only does it know what Pikachu and a goat look like, but it understands that in order for us to believe that it's sitting on it, you have to have it sitting in a very specific space. Pikachu's legs are on either side of it.
The idea that a machine can do that, something so similar to the way humans think, got a lot of people extremely excited. And at the time, I think it was like 256 pixels by 256. But now we are doing 2048 by 24... whatever size you want. And that's only two years later. So yeah, a lot of excitement, obviously.
I think it is one of those technologies that really gets people excited because it is starting to deliver on the promise of AI. Just like self-driving cars–AI doing protein folding–you're starting to see more and more examples of what it could be and how exciting and how beneficial it can be.
Angelica: Awesome! Well, we've covered quite a bit, lots of great info here. Thanks again, Sam, for coming on the show.
Sam: Yeah, thanks for having me.
Angelica: Thanks everyone for listening to the Scrap The Manual podcast!
If you like what you hear, please subscribe and share! You can find us on Spotify, Apple Podcasts and wherever you get your podcasts. If you want to suggest topics, segment ideas, or general feedback, feel free to email us at email@example.com. If you want to partner with Labs.Monks, feel free to reach out to us at that same email address. Until next time!