author | msglm <msglm@techchud.xyz> | 2023-01-14 05:31:48 -0600 |
---|---|---|
committer | msglm <msglm@techchud.xyz> | 2023-01-14 05:31:48 -0600 |
commit | 9d53d8857eaa1c9405894a88ca75bc4657e42f35 (patch) | |
tree | eb1efc1d028b949dd83bb710c68be8eff58f26e7 /LaTeX/ResearchPaper/ResearchPaper.tex | |
Diffstat (limited to 'LaTeX/ResearchPaper/ResearchPaper.tex')
-rw-r--r-- | LaTeX/ResearchPaper/ResearchPaper.tex | 264 |
1 file changed, 264 insertions, 0 deletions
diff --git a/LaTeX/ResearchPaper/ResearchPaper.tex b/LaTeX/ResearchPaper/ResearchPaper.tex
new file mode 100644
index 0000000..67aa726
--- /dev/null
+++ b/LaTeX/ResearchPaper/ResearchPaper.tex
@@ -0,0 +1,264 @@
+\documentclass[stu]{apa7}
+\usepackage[style=apa,backend=biber]{biblatex}
+\usepackage{blindtext}
+%for URL embedding
+\usepackage{animate}
+\usepackage{hyperref}
+\hypersetup{
+    colorlinks=true,
+    linkcolor=black,
+    urlcolor=blue,
+    pdfborderstyle={/S/U/W 1},
+    citecolor=black
+}
+
+\addbibresource{Biblio.bib}
+
+\usepackage{caption}
+
+\begin{document}
+\title{Machine Learning Generative Adversarial Networks for Comic Book Art Synthesis on Consumer-Grade Hardware}
+\course{ENGL-2053}
+\professor{Anonymous Anonymous}
+\duedate{December 2nd, 2021}
+\authornote{This work is licensed under the \href{https://creativecommons.org/licenses/by-sa/4.0/}{CC-BY-SA 4.0} with a willingness to sell exceptions.}
+\author{msglm}
+\authorsaffiliations{Anonymized College Name}
+\maketitle
+\tableofcontents
+\pagebreak
+\listoffigures
+\listoftables
+
+\pagebreak
+\section{Abstract}
+Machine learning algorithms can be used for the generation of art. One implementation of this is called a Generative Adversarial Network (GAN). GANs have been simplified and optimized for use on lower-grade hardware, including consumer hardware. Through various tweaks and optimizations to a lightweight implementation of a GAN, a fully trained GAN capable of art generation can be created. To test the extent of this capability, comic book pages are supplied to the GAN as training material. At various stages of training, images are analyzed for their quality and closeness to the input images.
+\clearpage
+\section{Introduction}
+Machine learning is a process whereby machines take in data about a task so that they can replicate it or otherwise make inferences about new data related to the input data.
For example, a machine learning algorithm could be trained to identify different flower breeds. In this use case, a machine will learn to produce comic book pictures by using a Generative Adversarial Network.
+\subsection{How AI Art Generation Works}
+Generative Adversarial Networks (also known as GANs) are a type of machine learning algorithm used to create images similar to input data. These networks work by training two algorithms at the same time: one that generates and one that discriminates. The generative algorithm's goal is to fool the discriminator algorithm into thinking that what it generates is a part of the input data set. The discriminator algorithm's goal is to correctly identify what is a real part of the data set and what is not. \autocite{GANs}
+\subsubsection{How Generation is Measured}
+For the purposes of this paper, it is assumed that 100\% completion is 150000 iterations (the default amount that the program sets automatically) \autocite{lightweight-gan}. One iteration, on average, takes 5 seconds to complete, which means the estimated completion time is 750000 seconds, or 208 hours (rounded down). This paper will use ``percentage done'' as a measurement of both time and iterations.
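As an illustrative sketch (not part of the paper's toolchain), the percentage/iteration/time conversion described above can be expressed as:

```python
# Sketch of the "percentage done" <-> iterations <-> wall-clock time
# conversion. The constants (150000 total iterations, 5 seconds per
# iteration) are taken from the text above, not measured here.
TOTAL_ITERATIONS = 150_000
SECONDS_PER_ITERATION = 5

def iterations_done(percent):
    """Iterations completed at a given 'percentage done'."""
    return round(TOTAL_ITERATIONS * percent / 100)

def hours_taken(percent):
    """Estimated wall-clock hours at a given 'percentage done'."""
    return iterations_done(percent) * SECONDS_PER_ITERATION / 3600

print(iterations_done(10), round(hours_taken(10), 3))  # 15000 20.833
```

For example, 10\% corresponds to 15000 iterations and roughly 20.8 hours of training.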
A table for translating percentages into iterations and time is provided below:
+\pagebreak
+\def \tenpercentiteration{15000}
+\def \tenpercenttimeinseconds{75000}
+\begin{center}
+    \captionof{table}{Generation Measurement}
+    \resizebox{0.8\pdfpagewidth}{!}{
+    \begin{tabular}{ |c|c|c| }
+    \hline
+    Percentage Done & Iterations Made & Real Time Taken \\
+    \hline
+    0.00067\% & 1 & 5 sec \\
+    \hline
+    1\% & 1500 & 2.083 hrs \\
+    \hline
+    10\% & \the\numexpr \tenpercentiteration\relax & \the\numexpr ((\tenpercenttimeinseconds)/60)/60\relax\ hrs \\
+    \hline
+    20\% & \the\numexpr \tenpercentiteration*2\relax & \the\numexpr ((\tenpercenttimeinseconds*2)/60)/60\relax\ hrs \\
+    \hline
+    30\% & \the\numexpr \tenpercentiteration*3\relax & \the\numexpr ((\tenpercenttimeinseconds*3)/60)/60\relax\ hrs \\
+    \hline
+    40\% & \the\numexpr \tenpercentiteration*4\relax & \the\numexpr ((\tenpercenttimeinseconds*4)/60)/60\relax\ hrs \\
+    \hline
+    50\% & \the\numexpr \tenpercentiteration*5\relax & \the\numexpr ((\tenpercenttimeinseconds*5)/60)/60\relax\ hrs \\
+    \hline
+    60\% & \the\numexpr \tenpercentiteration*6\relax & \the\numexpr ((\tenpercenttimeinseconds*6)/60)/60\relax\ hrs \\
+    \hline
+    70\% & \the\numexpr \tenpercentiteration*7\relax & \the\numexpr ((\tenpercenttimeinseconds*7)/60)/60\relax\ hrs \\
+    \hline
+    80\% & \the\numexpr \tenpercentiteration*8\relax & \the\numexpr ((\tenpercenttimeinseconds*8)/60)/60\relax\ hrs \\
+    \hline
+    90\% & \the\numexpr \tenpercentiteration*9\relax & \the\numexpr ((\tenpercenttimeinseconds*9)/60)/60\relax\ hrs \\
+    \hline
+    100\% & \the\numexpr \tenpercentiteration*10\relax & \the\numexpr ((\tenpercenttimeinseconds*10)/60)/60\relax\ hrs \\
+    \hline
+    \end{tabular}
+}
+\end{center}
+
+
+\subsection{Statement of Purpose}
+The state of GAN-based art generation has progressed significantly since the first papers on it were published \autocite{anonymous2021towards}. Because of this, the ever-looming question of ``are we there yet'' with regard to consumer usage of these tools is worth asking.
The purpose of this paper is to explore the capabilities of machine learning algorithms for art generation on consumer hardware, in order to find out whether we have reached a point in data science where this sort of usage is feasible.
+
+More technically, this paper's goal is to see whether a lightweight GAN, as defined by the paper \textit{Towards Faster and Stabilized {\{}GAN{\}} Training for High-fidelity Few-shot Image Synthesis}, is capable of producing artwork after 150000 iterations.
+
+\subsection{Audience}
+The intended audience consists mainly of those who enjoy comic books, own consumer-grade hardware, or are interested in some way in art generation influenced by data science. This paper is also a class project, so my professor, Anonymous Anonymous, is in particular a part of the list of people this paper targets (if he wasn't already covered by one of the categories above).
+
+\subsection{Tools and Tool Modifications}
+Various tooling was used throughout this project to obtain the results needed for analysis.
+\subsubsection{Input}
+The input data consists of 4340 JPEG image files of comic book pages harvested from a publicly accessible collection of DC comics made before the 1960s. The comics were either photographed or scanned. Most comics are in color. The comics themselves come from various series that include, but are not limited to, Wonder Woman, Movie Comics, and All Flash. Along with the standard comic books, several advertisement comics are included.
+\subsubsection{Machine}
+The hardware and software used for this project influence the speed, quality, and capabilities of the GAN's image production.
Because of this, an enumeration of the tools being used is required:
+\clearpage
+\begin{center}
+    \captionof{table}{Tooling Overview}
+    \resizebox{0.8\pdfpagewidth}{!}{
+    \begin{tabular}{ |c|c|p{7cm}|c| }
+    \hline
+    Component Type & Tool Abbreviated Name & Technical Notes & Hardware or Software \\
+    \hline
+    Machine Learning Software & lightweight\_gan & ``A PyTorch implementation of the paper Towards Faster and Stabilized GAN Training for High-fidelity Few-shot Image Synthesis'' \autocite{lightweight-gan} & Software \\
+    \hline
+    Programming Language & Python & Python version 3.9.9 is being used & Software \\
+    \hline
+    Parallel Computing Platform & CUDA & Release V11.3.109 is being used. & Software \\
+    \hline
+    Operating System & Devuan GNU/Linux Ceres (Unstable) & Linux user 5.14.0-4-amd64 \#1 SMP Debian 5.14.16-1 (2021-11-03) x86\_64 GNU/Linux & Software \\
+    \hline
+    CPU & AMD Ryzen 5 1600 & Comes with 12 threads, 6 cores, and a base clock speed of 3.2GHz. \autocite{amd-ryzen-5-1600} & Hardware \\
+    \hline
+    GPU & NVIDIA GeForce GTX 1050 Ti & Comes with 768 CUDA cores at a base clock speed of 1290 MHz. Has 4GB of video memory. \autocite{geforce-gtx-1050} & Hardware \\
+    \hline
+    \end{tabular}
+}
+\end{center}
+
+All hardware is consumer grade and can be purchased from white-market or gray-market sources. Most of the software (GNU/Linux, Python, and lightweight\_gan) is free software as defined by the Free Software Foundation \autocite{what-is-free-software}. The CUDA toolkit is free of charge; however, it comes with a number of licensing restrictions that allow for discrimination in the downloading, usage, or sharing of the software. All projects that can be installed on a Debian-based operating system provide a copy of their license when installed. The only exception is lightweight\_gan; however, this software comes with a LICENSE file in its git repository.
+
+The machine was also used as a general-purpose computer while generation happened.
The purpose of this was twofold: to allow the hardware to remain useful for casual use, and to test whether average consumer usage of the hardware would negatively impact the performance or stability of the machine learning software. This practice ceased at times of user inactivity, giving the system dedicated time with only minimal operations being run (for example, background downloads and a lightweight desktop environment). Even with this reduction in intensity, there were still times when the program would crash due to memory constraints. See the section on Complications for more detail.
+
+\subsubsection{Parameters}
+The program has been run with a number of parameters for the purpose of optimization; these are listed in the following table:
+\begin{center}
+    \captionof{table}{Parameters Overview}
+    \resizebox{0.8\pdfpagewidth}{!}{
+    \begin{tabular}{ |c|c|p{7cm}|c| }
+    \hline
+    Parameter Name & Value & \centering{Description} & Command-line Parameter \\
+    \hline
+    Name & DC & Sets the name of the output folder and files for images and AI weights (models) & \texttt{--name DC} \\
+    \hline
+    Batch Size & 8 & Sets the number of input images the generator algorithm processes at once in each training step & \texttt{--batch-size 8} \\
+    \hline
+    Image Size & 512 & Sets the size of the images that the program will generate. Image size must be a power of 2 \autocite{lightweight-gan.py}. & \texttt{--image-size 512} \\
+    \hline
+    Discriminator Output Size & 5 & ``The Discriminator output size . . . leads to different results on different datasets . . . 5x5 works better for art than for faces, as an example...'' \autocite{lightweight-gan}. ``Discriminator output dimensions can only be 5x5 or 1x1'' \autocite{lightweight-gan.py}. & \texttt{--disc-output-size 5} \\
+    \hline
+    Gradient Accumulation & 4 & Sets the number of passes over which the AI accumulates gradients before updating its weights, reducing memory usage.
& \texttt{--gradient-accumulate-every 4} \\
+    \hline
+    Data Directory & ./data/DC & Sets the location of the training image data for the AI & \texttt{--data ./data/DC}\\
+    \hline
+    \end{tabular}
+}
+\end{center}
+
+Modifications that affect the GAN's manipulation of, or ability to work with, the images were made to allow the program to run on consumer-grade hardware. This, in effect, lowered resource consumption, especially of GPU memory, and allowed the program to work in most cases. If one were to run the lightweight\_gan program without any parameters on the same hardware, the program would likely crash due to memory constraints.
+
+Parameters such as ``Name'' and ``Data Directory'' were purely for categorization purposes and had negligible to no effect on the speed of the tools.
+
+\section{Results}
+The results, or output, of the machine learning art generator depend heavily on the amount of time the AI trains. As there exists little way to objectively measure the ``quality'' of the output images except by the percentage towards completion defined by the original developers, this section consists mostly of showing produced images along with descriptions of them. As a general rule, the less progress an image represents, the lower the likelihood that there will be anything decipherable in it.
+\subsection{Complications}
+Throughout the training process, multiple crashes happened due to a lack of GPU video memory. These were caused by the computer using high amounts of GPU memory when rendering images, videos, or other graphically taxing workloads such as web browsers, video games, or heavy chat applications. These crashes did lead to loss of training progress made since the last save, but the losses were not significant enough to drastically reduce overall speed. At times, the training process had to be disabled because of performance issues.
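For reference, the parameters in the table above would combine into an invocation along the following lines. This is a hypothetical reconstruction rather than a command quoted from the paper; the exact syntax depends on the installed version of lightweight\_gan.

```shell
# Hypothetical reconstruction of the training command from the parameters
# table; assumes the lightweight_gan package is installed and a CUDA-capable
# GPU is available.
lightweight_gan --data ./data/DC --name DC \
    --batch-size 8 --image-size 512 \
    --disc-output-size 5 --gradient-accumulate-every 4
```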
When other software didn't crash the training process by taking too many resources, that software itself ran quite poorly due to the limited graphical resources left at its disposal. At times, the whole system needed to be shut down for hardware driver updates, security patches, or weather concerns. This impacted speed slightly, in the same way the crashes did.
+\subsection{Generated Images}
+Images are rendered in two ways, `ema' and `default' \autocite{lightweight-gan}. `EMA' (exponential moving average) uses a ``rolling average'' for the weights, meaning that it takes an average from each part of the training to construct the final model file. This is opposed to the `default' behavior, which only accumulates weights over time and does not take past data into account \autocite{lightweight-gan.py}.
+
+The pictures that follow are handpicked examples of art generated by the AI:
+\clearpage
+
+%\captionof{figure}{}
+%\includegraphics[width=\linewidth]{./Pictures/DC}
+%\pagebreak
+
+\subsection{Compilations}
+
+\captionof{figure}{A compilation of the AI's first generated images in `regular' mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/0.jpg}
+\pagebreak
+
+\captionof{figure}{A compilation of generated images at 32\% completion in ema mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/48-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A compilation of generated images at 64\% completion in ema mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/96-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A compilation of generated images at 83\% completion in ema mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/125-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A compilation of generated images at 93\% completion in ema mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/137-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A compilation of generated images at 100\% completion in ema mode.}
+\includegraphics[width=\linewidth]{./Pictures/DC/150-ema.jpg}
+\pagebreak
+
+\subsection{Hand Picked}
+
+\captionof{figure}{A generated ``comic'' that the AI created in regular mode (regular) (83\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-125/generated-11-27-2021_21-15-07-1.jpg}
+\pagebreak
+
+\captionof{figure}{A vaguely dreary backdrop with facial features melded into it (ema) (83\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-125/generated-11-27-2021_18-17-56-1-ema.jpg}
+\pagebreak
+
+\captionof{figure}{An extraordinarily pink comic (ema) (83\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-125/generated-11-27-2021_21-15-07-93-ema}
+\pagebreak
+
+\captionof{figure}{A comic that seems to depict either trees or fire (ema) (92\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-137/generated-11-30-2021_02-51-00-33-ema.jpg}
+\pagebreak
+
+\captionof{figure}{What seems to be the ``cover page'' of a comic book (ema) (92\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-137/generated-11-30-2021_02-49-12-6-ema.jpg}
+\pagebreak
+
+\captionof{figure}{Another comic book cover page (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_01-56-50-17-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A yellow comic with very humanoid characters (regular) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_01-43-15-6.jpg}
+\pagebreak
+
+%\captionof{figure}{() (100\%)}
+%\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/}
+%\pagebreak
+
+\captionof{figure}{An image filled with a high amount of detail that doesn't resemble anything at all (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_01-56-50-39-ema.jpg}
+\pagebreak
+
+\captionof{figure}{Vaguely defined orbs (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_01-43-15-1-ema.jpg}
+\pagebreak
+
+\captionof{figure}{More deformed humanoid figures (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_01-56-50-45-ema.jpg}
+\pagebreak
+
+\captionof{figure}{A blue crystal discussing something on a yellow background (regular) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_02-09-09-146.jpg}
+\pagebreak
+
+\captionof{figure}{What seems to be a part of Superman is in the left-middle panel (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_02-09-09-34-ema.jpg}
+\pagebreak
+
+\captionof{figure}{An attempt at a cover page mixed with text gibberish (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_02-09-09-43-ema.jpg}
+\pagebreak
+
+\captionof{figure}{What seems to be Wonder Woman is depicted on the right (ema) (100\%)}
+\includegraphics[width=\linewidth]{./Pictures/DC-generated-150/generated-12-01-2021_02-09-09-131-ema.jpg}
+\pagebreak
+
+\section{Conclusion}
+
+The technology presented in this paper is quite capable of producing interesting mockeries of the art forms given as input data. However, what it makes is just that: a mockery. The program does not seem capable of creation with intent, as the machine is merely trying to deceive another program. Because of this, everything it created fits most definitions of what a comic should be from a visual standpoint, yet is missing the direction needed to produce a story, humanoid characters, or anything consistent. A human, if presented with this many comics and the ability to learn, would come up with something closer to the input data than what the machine made. The machine's limitations are especially apparent given the technical restrictions of consumer-grade hardware.
Despite these critiques, the machine is an expert at making general shapes, themes, and motifs. Given simpler input data to copy, it is reasonable to hypothesize that the machine would be able to create something that looks as if a human made it.
+
+\printbibliography
+\end{document}