Spacecraft:
Brains,
GANs,
Finnegans
with Metahaven and Riccardo Petrini
We stage three tightly linked case studies to illuminate the deep analogies between latent spaces that appear natural/biological and artificial/synthetic, peeking into the preconceptual boundaries between biology, technology, and culture and into how these concepts, once realized, ultimately lead to one another.
● Speech-producing brains paralleled with speech-generating generative adversarial networks (GANs). Drawing on developmental phonology, we show that a speech‑generating Featural InfoWaveGAN (fiwGAN) trained on English words mimics, at least at some stages of learning, child language acquisition and that the latent space reflections of speech sounds closely mimic brain responses to those same sounds. We demonstrate that fiwGAN learns by informative imitation, a motivation in the inner networks of GANs to convey the necessary information with linguistically useful distinctions, and that fiwGAN performs imagitation, a blend of imitative and imaginary production that exceeds mere replication of the training set and results in spontaneous evolution of simple syntax.
● Finnegans Wake as latent cartography of language. Joyce’s polyglot novel is read as a deliberately engineered linguistic map of latent space. Its nonce words and associative syntax mirror the continuous geometry by which neural models store semantic relations in vectors from the sound to the symbol. This part explores the boundary where signal becomes noise and meaning destabilizes, arguing that Joyce portrays the prenarrative stage of language, thus revealing its interiority.
● Finnegans Wake’s speech-generating model, FinneGAN. We present a fiwGAN trained on Finnegans Wake audio, which generates speech shaped chiefly by Joyce’s idiom in this novel. Automatic transcriptions oscillate between comprehensible English, Irish, and glossolalia, foregrounding the limits of comprehensibility and interpretability of the pre-speech phase. By probing the space of the novel further through its audio, we treat FinneGAN as a language-forming model with higher interpretative entropy than the novel it was trained on, shoving the language further into the pre-narration phase. Exploring FinneGAN’s intermediate layers between the most abstract layer and the actual language makes the latent space interpretably navigable.
The thick cursor hovers over the dimensions of the GAN model’s latent space, which are laid out like a flat surface. The model’s interior is thus explored as a physical site rather than as the mathematical abstraction that it also is.
A latent space is not narratable, and we are trying to make it navigable.
All latent spaces are precategorical. We compare biological and technological latent spaces, assuming that artificial neural networks imperfectly mimic actual biological networks. Latent spaces require neural computations across categories.
Across these isomorphs, we demonstrate an unprecedented, cross‑modal exploration of language and thought, navigating this new airspace through linguistics, neuroscience, machine learning, and literary writing. We hypothesize that language is only the final layer that linearizes a complex epiphenomenally symbolic latent space. We offer concrete examples (design, literature, internal interpretability) to access this prelinguistic space and argue that these examples are simultaneously utilitarian and aesthetic. We also argue that architecture is the right metaphor for this interior: an armature whose parts can be named, tested, and explored.
The exploration of the preverbal world in latent spaces enables us to argue that the seeming abstraction of the symbolic layer is often physical, all the way to the deepest layers of the brain and of artificial neural networks. Architecture as physical thus serves as the interior legibility of an otherwise unobservable and unimaginable space.
With the invention of deep learning and the rapid expansion of its computational capabilities, we are entering into a new kind of machinic space: latent space. Latent spaces, also known as latent feature spaces, vector spaces, or embedding spaces, represent the transition from the physical world to the abstract spaces of symbolic-like representations. Latent spaces can be laid out as an architectural map, which in turn makes it possible to physically explore abstract concepts that emerge in both human and nonhuman intelligence.
Bernard Stiegler, in Acting Out, talks about flying fish experiencing the intermittent moment when they are able to leave their everyday aquatic milieu to experience air (Figure 1). ● 1
The flying fish serve as an allegory for transcending a philosophical boundary: an experience by which we, exceeding our environment, can think outside of the immediate reality by means of extraction and abstraction (noēsis). Latent spaces, as mathematical concepts, unobserved spaces, and architectures, allow precisely this: to examine the transformation from the physical to the abstract. Latent space is not a separate realm but a transitional cycle, transforming matter into numbers in a geometrical space and back into matter. We differ from Stiegler’s metaphor of flying fish by pointing out that abstraction, as the symbolic layer, also involves the physical.
Latent spaces extend the possibility of stepping outside of one’s milieu to reflect on it to machines: What is it like to be abstract? ● 2 Machines use embeddings (which is why latent spaces are sometimes called embedding spaces), learning to assign a set of numbers to concrete sights, words, and sounds, reexpressed as vector points in a multidimensional landscape. Embeddings encode distance patterns, mirroring the data and their meaning—what Buckminster Fuller ([1975] 1997), referring to humans, calls “geometry as thinking.” Latent spaces therefore exteriorize patterns, render them manipulable, and feed the results back into perception, letting novel forms enter culture.
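The distance patterns that embeddings encode can be sketched in a few lines. The three-dimensional vectors below are invented for illustration only (real embeddings have hundreds or thousands of dimensions), and cosine similarity is one common, not the only, way of measuring these distances:

```python
import math

# Hypothetical three-dimensional embeddings, invented for illustration;
# real latent spaces assign far longer vectors to words, sights, and sounds.
embeddings = {
    "water": [0.9, 0.1, 0.2],
    "ocean": [0.8, 0.2, 0.3],
    "carry": [0.1, 0.9, 0.4],
}

def cosine(u, v):
    """Cosine similarity: one distance pattern an embedding space encodes."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Related meanings sit closer together in the geometry than unrelated ones.
print(cosine(embeddings["water"], embeddings["ocean"]))  # high similarity
print(cosine(embeddings["water"], embeddings["carry"]))  # low similarity
```

Here the relatedness of water and ocean is expressed purely as geometry, which is the sense in which such spaces render patterns manipulable.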
For French philosopher Stiegler, technics is the defining feature of what makes us human. Technics consist of “organized inorganic matter” (Stiegler [1994] 1998, 17) that is relational to us humans; they are coevolved human artifacts in which life exteriorizes itself and through which it remembers, anticipates, and becomes. Do technics expand our philosophical abilities? Have some forms of AI created the flying fish moment? It depends—technology is a pharmakon, both a remedy and a poison, an enlargement and a threat (Stiegler 2010a).
If AI enables us to advance new forms of being, action, and learning, to expand critical debates and extend the landscape of imagination, then yes, we are indeed experiencing a shift in knowledge. If AI accelerates short-term thinking and renders us passive to machine outputs, we are in quite the opposite experience: trapped and forgetting about the air.
Figure 1: Flying fish (Exocoetidae).
Let us stay with the flight analogy for a bit. Flight evolved on our planet independently at least four times (Anderson and Ruxton 2020) and was invented once. It evolved for the first time in insects, then pterosaurs, birds, and bats—as well as flying fish; later, humans built machines to fly. Like flight, latent spaces evolved or were invented multiple times. At first, they were a biological invention driven by the differentiation of cells into the central nervous system, which started processing increasingly complex information. Early neural circuits began to compress sensory information into distributed codes. These implicit codes are physical patterns of spiking activity in the brain, mapped as best guesses about the causes in the world. The brain structure is able to retain information (as memory), process it, and ultimately evolve it (change it, upgrade it, complexify it) and transfer it to another system or technics (another brain, a speech, a book, a machine). We call all these activities learning, and it has been evolutionarily advantageous despite the larger energy demand of sustaining the brain.
The second time, latent space was invented—or, one could also argue, discovered—as a computational approximation of human neural processing. In machines, we recreate these functions in silicon—and yet, the abstract geometry of latent space emerges in both biological and technological systems. Latent spaces are thus a novelty only in machines, offering a new space of exploration and a new parallel to already existing biological latent spaces.
Imagine explaining the invention of flight to someone from the nineteenth century. Once humans figured out the physics of flight, we gradually gained access to a number of different options for how to conduct it: by paraglider, airplane, helicopter, zeppelin, balloon, drone, rocket, jet pack, and so forth. Inventing AI is analogous to inventing flight. It opens a new space of possibility, experience, and thinking, even though, like flying, it can be dangerous. We risk new vulnerabilities each time we off-load our cognition into silicon. In this project, we reveal the hidden structures of virtual spaces that make abstraction possible.
Latent spaces are abstract, organized structures of vectors with hundreds or thousands of dimensions, created by training procedures for further operationalization. In latent spaces, textual, visual, or other kinds of data are turned into numbers that make up vectors and the weights that map between them. Data is preserved in fragmented form ● 3 , evoked, and eventually contextualized with each vector activation—similar to a neuron spiking in the brain.
Latent space is a common property of all existing deep neural network models and an intricate component of the images, words, sounds, and other outputs they produce. There is not a single latent space, but each model has its own, which evolves across successive versions. Today, these distinct latent spaces “compete for user attention and for the power to impose their default aesthetics. Controlling a latent space means controlling imagination and the bounds of the visible” (Somaini 2024, 56) ● 4 . Latent spaces are finite, creating novelty built completely on the old, and limited by their training data, ultimately filtering cultural tokens into newly assembled outputs.
Latent spaces emerge from learned structuring. The mathematical form of latent spaces is a learned representation space whose actual layout is not scripted in advance. The algorithm discovers the geometry during the training process, based on the network’s architecture, the optimization procedure, and—crucially—the statistical structure of training data. Once this geometry stabilizes, language can emerge from the latent space in the form of words, syntax, and coherence. A double emergence occurs: the latent space is algorithmically emergent from machine learning, and the cognitive-like capacities such as language are emergent from the latent space.
Our coinage of an architecture of deep learning networks, with a focus on generative adversarial networks (GANs), differs from the broader use of architecture pertaining to the structure of deep learning models. The inner structure (i.e., architecture) of deep learning models corresponds to an instrumentalized idea of design or architecture, whereas by architecture we refer to the interiority of the models, corresponding to an explanatory or revelatory idea of design or architecture.
Latent spaces are architecturally singular and unique emergent structures that are both aesthetic and utilitarian—akin to physical architecture. They shape the world of the model and are highly generative and potential in nature: trained on a particular set of data, they enable new possibilities based on what already exists. We argue that AI makes visible an unobserved variance before it becomes outputs, products, and outsides.
Whether the substrate of these latent operations is silicon or living tissue (brains), the numbers constituting a latent vector are physically instantiated. There is little abstraction in the processing of the world. We have long known that in the brain, for example in the primate motor cortex, each neuron contributes a vector in movement space to the overall pattern of activation (Georgopoulos et al. 1986). The vector sum is decoded analogously to coordinate dimensions in machine learning. The process is not a reification from the abstract to the concrete. Rather, the latent geometry we treat as purely mathematical is literally etched into, and animated by, matter. Model weights are the material geometry of a latent space, and learning is the process by which that geometry is sculpted.
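The population-vector finding can be sketched numerically. The preferred directions and firing rates below are invented for illustration; the point is that decoding is a weighted vector sum, the same linear-algebraic operation by which coordinates are read out of a machine latent space:

```python
import math

# Toy population-vector decoding in the spirit of Georgopoulos et al. (1986).
# Preferred directions and firing rates are invented numbers.
neurons = [
    {"preferred": (1.0, 0.0), "rate": 10.0},   # tuned to 0 degrees
    {"preferred": (0.0, 1.0), "rate": 30.0},   # tuned to 90 degrees
    {"preferred": (-1.0, 0.0), "rate": 5.0},   # tuned to 180 degrees
]

# Each neuron contributes its preferred-direction vector, weighted by its
# firing rate; the sum of these contributions decodes the movement direction.
x = sum(n["preferred"][0] * n["rate"] for n in neurons)
y = sum(n["preferred"][1] * n["rate"] for n in neurons)
angle = math.degrees(math.atan2(y, x))
print(round(angle, 1))  # decoded direction in degrees
```

The decoded angle is carried entirely by physically instantiated firing rates, which is the sense in which the latent vector is etched into matter.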
Thus, geometry becomes computation:
● Model weights instantiate the geometry of a latent space by storing the vectors and computing the mappings.
● Numeric representations depend on the conceptual geometry of the space: its architecture, its training, and its relation to the world through data.
● Abstract relations among numbers are inseparable from their embodiments: no vector lives outside the voltage patterns—both in silicon and in biological tissues—that realize it.
Seemingly invisible to humans, placed in unimaginable, opaque, heterogeneous dimensions, latent spaces are porous interiors, permissive to interpretation and mapping. Our intermittent flights to the architecture of latent spaces are enabled by various interpretative approaches: technical and conceptual, analogous and speculative, poetic and narrative.
Let us peek into them through language.
Human language evolved sometime after the split in the evolutionary tree from the last ancestor we have in common with chimpanzees and bonobos. Human language also arises every time a human baby acquires language. And now, an imitation of human language arises in all neural computation trained on language.
With GANs generating speech, language in all its complexity can obtain a new definition: as informative imitation, that is, “learning by imitation and with the production-perception loop from raw spoken language data (ciwGAN [Categorical InfoWave GAN])” (Beguš, Lu, Wang 2023, 3). In a similar process during human language acquisition, the child learns to produce sounds that resemble the sounds it hears in the environment and that carry meaning—the child learns to encode information into those imitative sounds.
When GANs were invented in 2014 (Goodfellow et al. 2014), MIT Technology Review journalist Martin Giles (2018) called their inventor, Ian Goodfellow, “the GANfather: the man who’s given machines the gift of imagination.” Initially known for their aptitude at creating deepfake images, ● 5 GANs indeed appear as extremely imaginative neural network architectures, although they work with mere imitation. When used in speech and language, GANs work with what Beguš terms imagitation—a motley of imitation and imagination. Imagitation conveys that although GANs learn by imitation, this process exceeds iterating identical copies: it transgresses into making up—imagitating—new words. Crucially, GANs are the only deep learning architecture in which the part of the network that generates data never accesses training data directly.
Initially, a Featural InfoWaveGAN (fiwGAN) was given only eight words of the English language, chosen from among the most frequently recorded content words in the TIMIT speech corpus (Garofolo et al. 1993): ask, carry, dark, greasy, like, suit, water, year (Beguš 2021, 315). By the scales of neural network training, the data for speaking GANs was rather scarce: altogether, it consisted of approximately 600 recordings of each of these eight words. After training, GANs produced the first new word: start. The word start was not part of the training data and was formed from the training words suit and dark. The fiwGAN continued with carrot, dust, watery, and some much less distinguishable noisy speech. The fiwGAN also utters sart, wargi, and greachy.
GANs produce both words and nonce words—i.e., existing words and words that could exist but do not, respectively. Impossible words, on the other hand, are words that cannot function as words since they defy the phonology of the language of the training corpus, such as juaoioi or gfzk. Possible words, such as squart or bulldy, each occurring a single time in the outputs, could potentially be words in (in this case) the English language, but humans have not yet invented and used them. GANs are extremely generative in all respects, and also in stretching the space of English phonology.
Unlike many other neural architectures, GANs are not autoregressive. They do not predict future values based on past values. Their training rests on inner workings in which one agent has to transmit a message to another agent (as in fiwGAN) and must thus find the best way to communicate the necessary information to produce the imitative output, a process termed informative imitation (Figure 2).
Figure 2: FiwGAN language modeling as informative imitation.
A GAN’s inner structure forces its inner network, called the generator, to produce viable speech through a minimax game with the discriminator and to do so informatively by exchanging messages with a Q-network. In other words, the generator learns to produce speech from a small vector of noise. The minimax game means that the generator needs to maximize the discriminator’s error rate, whereas the discriminator needs to minimize its own error rate. At the same time, the generator learns to encode information (a binary code or a one-hot vector) into speech such that the Q-network will be able to decode the same information from the generator’s outputs (Beguš 2021).
The generator works as a speaker and the discriminator with the Q-network as a perceiver-listener. “Since learning is completely unsupervised, the Generator could in principle encode any information about speech into its latent space, but the requirement to be maximally informative causes it to encode linguistically meaningful properties (both lexical and sublexical information)” (Beguš and Zhou 2022a). FiwGAN, based on InfoGAN, can be formalized as follows (based on Donahue et al. 2019; Chen et al. 2016; Arjovsky et al. 2017; Gulrajani et al. 2017):
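The objective can be reconstructed from the cited sources as an InfoGAN-style mutual-information term added to a Wasserstein GAN value function; the rendering below is a standard sketch of that combined objective, not necessarily the original's exact notation:

```latex
% Reconstruction from the cited sources: a Wasserstein value function
% (Arjovsky et al. 2017; with gradient penalty per Gulrajani et al. 2017,
% omitted here for brevity), extended by InfoGAN's mutual-information
% lower bound (Chen et al. 2016), as in WaveGAN (Donahue et al. 2019).
\min_{G,Q}\ \max_{D}\quad
  \underbrace{\mathbb{E}_{x \sim p_{\mathrm{data}}}\big[D(x)\big]
  - \mathbb{E}_{z,c}\big[D(G(z,c))\big]}_{\text{minimax game}}
  \;-\; \lambda\,
  \underbrace{\mathbb{E}_{z,c}\big[\log Q(c \mid G(z,c))\big]}_{\text{informativeness: } L_I(G,Q)}
```

The first bracketed term is the minimax game between generator and discriminator; the second rewards the generator for encoding the latent code c into speech so that the Q-network can recover it, which is the formal counterpart of informative imitation.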
FiwGAN [Featural InfoWaveGAN] and ciwGAN [Categorical InfoWaveGAN] are, for now, the only deep neural architectures with baked-in production and perception. The difference between the fiwGAN and the ciwGAN model is in the compositionality of their latent space: fiwGAN’s generator takes a binary code (compositional) as its initial vector, while ciwGAN takes a one-hot vector (non-compositional). The generator never sees the actual data: it begins with producing complete noise and gradually molds this into increasingly structured and thus information-rich speech sounds. The model learns by producing and perceiving speech.
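The difference between the two code schemes can be sketched as follows; the sizes (eight classes, three bits) are chosen here purely for illustration and do not correspond to the trained models' exact dimensions:

```python
# Sketch of the two latent-code schemes with invented sizes.
n_classes = 8

def one_hot(i, n=n_classes):
    """ciwGAN-style code: one dimension per class, non-compositional."""
    return [1.0 if j == i else 0.0 for j in range(n)]

def binary_code(i, bits=3):
    """fiwGAN-style code: bits jointly encode classes and can recombine."""
    return [float(b) for b in format(i, "0{}b".format(bits))]

# Eight classes need eight one-hot dimensions but only three binary bits,
# and each bit (feature) can be varied independently of the others.
print(one_hot(5))      # [0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
print(binary_code(5))  # [1.0, 0.0, 1.0]
```

Because individual bits can be flipped independently, the binary scheme gives the latent space featural structure, which is what "compositional" means here.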
A loose analogy to phonologist Dr. Higgins “directing” Eliza Doolittle’s speech in George Bernard Shaw’s stage play Pygmalion applies to the discriminator network directing the generator towards a successful output. One major difference in GANs is that both roles are united in a single artificial neural network model (more on this analogy in Beguš 2025, 65–67). Conceptually, on the surface, these neural networks—the discriminator (“Higgins”) and the generator (“Eliza”)—seem to work very much like the Turing test, because they appear to strive to make a computational output sound as human as possible. Under the surface, however, the generator is motivated to learn by imitation and ultimately produce outputs indistinguishable from the initial training data. The discriminator is the only network with access to the real data, and thus the only one that can compare it to the imitations by the generator. Once the discriminator cannot differentiate between synthetic speech and human input, the generator has succeeded in imitating human speech.
To sum up: on a technical level, the generator learns to associate each word with a unique code only because it is forced to generate informative data. The fiwGAN generator follows the general principles of learning in the form of teacher–student training—with the exception that the generator never sees the data (only the discriminator does), which provides the generator with a new space to create new words and their assemblages.
There is an intense, although seemingly subtle, difference in GANs’ deep learning technique: GANs are imitative, not replicative. Autoencoders learn by replication, creating their output by copying their input. Transformers learn by predicting the next word. GANs learn by imitation, where the generator effectively convinces the discriminator that there is no difference between the original, real data and the imitative, fake data. In this second-order learning, imitation is more than a replication: GANs can imitate all training data and add new data that aligns with the initial imitative and generative capacity. Hence, imagitation.
Figure 3: Structure of the Generator Network.
Another significant difference between GANs (implemented as convolutional neural networks, or CNNs) and other models, particularly transformers used for large language models (LLMs), is that the former do not work with tokenization in the internal layers. Discrete units, such as textual words, need to be tokenized for neural computation. Sound and speech, however, are not discrete data, meaning that GANs’ end-to-end sound model has analogues to tokenization only in the final layer of about a hundred variables. Learning from continuous raw audio is the primary mode in language acquisition, whereas text is a derivative, an analysis or theory of language. Language as speech is raw audio and data, and gets to abstract representation through layers. The continuous, unsupervised, and imitative aspects of GANs might reflect the human brain more closely than LLMs, which work with discrete units turned into continuous data by virtue of tokenization and back into discrete units. This presents a radically different setting for language production.
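A minimal sketch of the contrast, with an invented toy vocabulary: text must first pass through discrete tokens, while audio enters a waveform model as a continuous stream of amplitude samples:

```python
import math

# A transformer-style pipeline discretizes text into tokens before any
# computation; this two-word vocabulary is invented for illustration.
vocab = {"greasy": 0, "water": 1}
tokens = [vocab[w] for w in "greasy water".split()]

# A waveform GAN instead consumes raw, continuous audio: one second of a
# 440 Hz tone at a 16 kHz sampling rate is 16,000 real-valued samples.
rate = 16000
samples = [math.sin(2 * math.pi * 440 * t / rate) for t in range(rate)]

print(tokens)        # two discrete units: [0, 1]
print(len(samples))  # 16000 continuous amplitude values, no tokens
```

Nothing in the sample list is a symbol; whatever word-like units emerge must be discovered by the network in its internal layers.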
This internal approach results in an exterior difference, with LLMs having a polished and glib exterior and sometimes producing overfitted outputs. GANs, on the other hand, are noisy and unfinished.
Noisiness is a strong characteristic of speech GANs, both literally, in the old-school gritty sound they produce, and technically, in their inner neural layers of latent space. As a result of this inherent noisiness, the outputs of GANs often retain ambiguity. We cannot always rely on our ears to confirm what GANs “say,” and have to help ourselves with visual renditions of their outputs, such as spectrograms. ● 6
FiwGAN allows us to study how language is acquired in neural computation. In the process of training, the generator implicitly learns the mapping of the latent space that is required to succeed. The generator is, as it were, forced to trace the contours of an object or space it cannot see directly. “Success” for the generator is acquiring enough English phonology to produce new words and nonce words merely on the basis of eight English words, hearing each word pronounced differently approximately 600 times. With the self-set goal of imitating informatively, the two networks inside the GAN are motivated to encode information into a gradually growing sense of what the end goal of a GAN output should be like. Imagitation, that is, the making of new words not in the training data, thus seems to stem from the generator’s probe of the latent space.
Linguistics has long operated with the concept of underlying representations. It was hypothesized that the surface form of language is not what is stored in the lexicon. For example, the English plural suffix is stored as /-z/ in the underlying representation and gets variably realized as [-s] in cats, [-z] in dogs, or [-iz] in kisses, depending on the context in which it surfaces. Latent space exploration is useful for accessing the underlying representations of language in a fully connectionist model, such as fiwGAN. In fiwGAN, the underlying representation can be probed by extreme value interpolation (Beguš 2020). During training, all latent space variables in the final layers are limited to the interval (-1, 1). A single latent variable in the latent space can be set to an extreme value during generation (e.g., -15 or 15), which elucidates its underlying value. Beguš (2020) and Beguš, Leban, and Gero (2023) show that with this technique, we can causally extract what each variable in the latent space represents—the so-called monosemanticity. The final layer in the latent space of fiwGAN approximates the epiphenomenally symbolic nature of language. The layers between the final output and these deepest layers of the latent space are thus located on a continuum from the symbolic to the physical.
Listening to the GAN is like encountering an unobserved, untapped realm of potential in language that feels, culturally, strangely familiar. Why is this?
First, there is a seemingly straightforward analogy between the fiwGAN’s imitative setup and children learning language by imitating the words they hear around them. Children babble. Babbling could be seen as cognitive noise full of coarsely rendered proto-words. More interestingly, a further analogy between fiwGAN and children suggests that the mistakes—the imperfections at mere imitation and the nonce words thus generated—are productive, necessary, and mimic the behavior of language-acquiring children (Beguš 2020). Children practice so-called speech play. That is, in acquiring language competence, vocabulary, and expressive gusto, children sometimes intentionally deviate from the rules and regulations of proper language, permeating their speech with nonce words, novel words, and illogicisms. Speech play occurs as children “work out, experiment, exercise, and define the properties of their languages, cultures, and societies, and especially the intersections and relations among them” (Sherzer and Webster 2015, 3). These probes can take on an appearance of philosophical depth. Consider, for instance, when Ukrainian-Brazilian author Clarice Lispector’s young son Pedro exclaimed, to his mother’s reported amazement, that “The word ‘word’ is ex-possible!” (Moser 2009, 182). Children have no access to formalism in speech play. They are getting all their information in the wild, from their environment, and use their own devices and processes to put together and take apart words, associating and dissociating.
Aspects of children’s speech play may be seen as roughly analogous to a GAN’s evocative mixture of imitation and imagination, or imagitation. Yet, while a GAN “imagitates” to make itself better at its foundational task—imitation—human subjectivity can derive distinct aesthetic and intellectual pleasures from the, let’s say, accidents and mistakes of imagitation. They seem to juxtapose a less edited and smooth interior with the exterior of correctly executed output. In twentieth-century art and literature, the use of speech play may have signaled various things at the same time. One motif is the idea of art for art’s sake; art for the sake of its play. Another one is class—think Doolittle versus Higgins. There also may have been overtones of resistance to dominant and normative rhetorical frames, such as political phraseology or the kind of talk coming from an autocratic government. Cognitively, there may have been ripples of Freud’s unconscious, as amplified by the Surrealists. Yet, to remain focused on the essentials of the GAN, is latent space a viable way of describing that untapped interior of the outer appearance of language?
Exploring the latent space of fiwGAN helps us model the evolution of language from a simple single call unit to a complex, compositional stage. Imagitation is a powerful concept in this process. A model trained only on single words spontaneously started to concatenate words (Beguš, Lu, and Wang 2023). For example, a model that is trained on isolated words such as box, suit, greasy, water, and under starts generating outputs with two or three words combined, such as suit greasy or box under water. The concatenation shows aspects of compositionality (Beguš, Lu, and Wang 2023). Exploration of the latent space provided the mechanistic explanation for concatenation: negative values in the latent space result in concatenated words. We can thus trace and simulate in silico one of the most important developments in the evolution of human language, the development from holistic single-call utterances to semi-compositional syntax. The latent space thus holds a key for finding connectionist, neural underpinnings of symbolic-like operations such as compositional concatenation, which is the precursor to human syntax and language.
The exploration suggests that inhibition of inhibitory neurons, or disinhibition, can give rise to semi-compositional concatenated outputs, a necessary step in the evolution and acquisition of language. Negative latent values combined with negative learned weights (inhibition of inhibition, or disinhibition) thus yield concatenated outputs. Disinhibition is a biologically plausible process in human neuroscience. One of the most intriguing symbolic operations, Merge in Chomsky’s terms, thus receives a connectionist basis. The concatenated signal is the condition for the complexity of language. It is the first leap that leads to the other end of the complexity spectrum: Finnegans Wake and further to FinneGAN.
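The arithmetic of disinhibition can be sketched directly; the weight below is invented for illustration, and a rectified linear unit stands in for a neuron's threshold:

```python
# Arithmetic sketch of disinhibition with an invented weight: a negative
# latent value passed through a negative learned weight turns excitatory.
def relu(x):
    """Rectified activation: negative input leaves the unit silent."""
    return max(0.0, x)

inhibitory_weight = -0.7  # a learned inhibitory connection (made up here)

# A positive latent value is inhibited: the downstream unit stays off.
suppressed = relu(inhibitory_weight * 1.0)   # 0.0

# A negative latent value inhibits the inhibition (disinhibition): the
# unit activates, the regime in which concatenated outputs appear.
activated = relu(inhibitory_weight * -1.0)   # 0.7
print(suppressed, activated)
```

Negative times negative is positive: the same elementary sign flip that, at scale and through learned weights, turns negative latent values into concatenated outputs.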
Preexisting and newly made analogies to latent spaces are explored in what follows through a series of isomorphs of the latent space in speaking GANs.
● The first isomorphs are biological neural networks of the human brain that originated language (from a million years ago to no later than 135,000 years ago; Miyagawa et al. 2025).
● The second isomorph is a well-known literary narration, James Joyce’s Finnegans Wake (1939).
● The third isomorph is an artificial neural network of GANs, trained on Finnegans Wake exclusively, named FinneGAN (2025).
GANs begin in a unique latent space of pure noise and work as a realistic model of human speech, especially with regard to language acquisition: they go from noise to babble to actual phonetic words and finally to multiple words, modeling the human acquisition of syntax (Beguš, Lu, et al. 2023). This progression is not a consequence of human-specific mechanisms but an emergent property of general cognitive abilities. “The observed spontaneous concatenation is not an idiosyncratic property of a single model, but emerges in several replicated models as well as in models that feature several changes in the architecture” (Beguš, Lu, et al. 2023). There are also no human-specific language mechanisms in the latent space of GANs: no “language organ” in Chomsky’s terms. Informative imagitation alone works as a precursor to compositional syntax.
The concept of latent representation long predates machine learning. From the past century of linguistics, neuroscience, and cognitive science, we have learned a lot about how language works in humans. This knowledge can be applied to studying how language works in (computational) machines. For example, we know that children in the process of language acquisition do not yet apply grammatical rules as adults do. The same holds for phonetics: when a child says pit, the p in this word is aspirated, pʰ. A child overextends this phonetic feature as a rule to p-phonemes in other contexts, so that the word spit is also produced with an aspirated pʰ, making it spʰit and resulting in overaspiration. A child eventually learns when p’s are aspirated and when they are not, abiding by the grammatical rule mastered by adult speakers. GANs make the same mistake when acquiring English.
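The overaspiration error can be sketched as two toy rules. This drastically simplifies English phonology (aspiration actually depends on stress and syllable position, not only on a preceding s), and both functions are illustrative only:

```python
# Toy model of aspiration acquisition; the aspiration diacritic is pʰ.
ASP = "\u02b0"  # MODIFIER LETTER SMALL H

def child_rule(word):
    """The child's overgeneralization: aspirate every p, in any context."""
    return word.replace("p", "p" + ASP)

def adult_rule(word):
    """Simplified adult grammar: aspirate p except immediately after s."""
    out = []
    for i, ch in enumerate(word):
        out.append(ch)
        if ch == "p" and not (i > 0 and word[i - 1] == "s"):
            out.append(ASP)
    return "".join(out)

print(child_rule("pit"), adult_rule("pit"))    # both aspirate: pʰit
print(child_rule("spit"), adult_rule("spit"))  # the child overaspirates spit
```

The child's rule and the adult's rule agree on pit but diverge on spit, which is exactly the overaspiration stage that both children and GANs pass through.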
The learning progression in GANs makes it possible to model human speech and machine-generated speech comparatively and results in findings such as the uncannily similar perception of the same sound in human brains and in artificial neural networks (Beguš, Zhou, and Zhao 2023; see Figure 4). Deep in the brainstem, speech is not perceived as sound but as electrical activity. The sound travels through the air to the eardrum, into the cochlea, where it is translated into neural impulses. From there, it moves through the brainstem as spikes of electricity. The sound is translated into timing patterns, frequencies, and firing rates. What we hear as sound is a final interpretation of early voltage patterns. It appears that the abstraction in the brain has a physical substrate.
Throughout the auditory processing pathway, the brain’s electric activity closely mimics the physicality of sound. In the brainstem, sound pressure levels directly translate into voltage patterns (Zhao and Kuhl 2018). Even deep in the cortex, the electric activity follows derivatives of the acoustic envelope (Oganian and Chang 2019). What appears to be abstract symbolic processing of human language is in fact physical dimensionality reduction that biological neurons learn to perform. The same process applies in convolutional neural network-based GANs: the brain signal and the GAN signal in intermediate layers are highly similar in raw form, with no further transformations (Beguš, Zhou, and Zhao 2023). Symbolic language, both artificial and biological, might thus be much more physical and less abstract than previously believed. A synthesized stimulus of the syllable ba (from Zhao and Kuhl 2018) even sounds similar in the brainstem (averaged across speakers; Zhao and Kuhl 2018) and in the second convolutional layer of the discriminator (data and techniques based on Beguš, Zhou, and Zhao 2023).
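The raw-form similarity between an averaged brain signal and an averaged GAN layer can be quantified with something as simple as a correlation coefficient. A minimal sketch with synthetic stand-in signals (the actual comparison in Beguš, Zhou, and Zhao 2023 uses real brainstem recordings and fiwGAN activations, not the toy waveforms below):

```python
import numpy as np

def pearson_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Pearson correlation between two equal-length time series."""
    return float(np.corrcoef(a, b)[0, 1])

# Synthetic stand-ins: a decaying periodic "brainstem response"
# and a noisy copy of it, standing in for an averaged GAN layer.
rng = np.random.default_rng(0)
t = np.linspace(0, 4.096, 65536)                    # ~4 s at 16 kHz
brain = np.sin(2 * np.pi * 110 * t) * np.exp(-t)    # decaying periodicity
layer = brain + 0.05 * rng.standard_normal(t.size)  # similar but noisy

print(round(pearson_similarity(brain, layer), 2))   # close to 1.0
```

The point is not the number itself but that two signals can be "highly similar in raw form" without any further transformation beyond averaging.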
Figure 4: Similar internal perception of the same sound in human brains and in artificial neural networks (Beguš, Zhou, and Zhao 2023a).
The map-and-territory analogy, famously rendered as parable in J. L. Borges’s (1999) story “On Exactitude in Science,” in which a map is made so accurately that it becomes a double of the territory it describes, loosely translates to the brain. The brain is both physical (the territory) and symbolic (the map). Brains hold compressed, lossy maps rather than perfect replicas. The brain does not contain a picture-perfect duplicate of the world: when you hear the word apple, you may imagine an internal simulation of a shiny red apple, afforded by the sparkling of your neurons but not with the visual clarity your sight affords.
If language is partly a physical reflection of reality, then its complexity stems also from the complexity of the physical world. For example, in the Nunavut territory in Canada, we find a multiplicity of recursive occurrences of “an island in a lake…on an island in a lake…on an island in a lake,” with Yathkyed Lake being the only known fifth-order instance (Danyluk 2022). The complexity of our language allows us to describe this lake and reflect the complexity of the world. The description of a recursive lake requires recursive language. The physical necessitates the symbolic-like.
Both biological and artificial systems rely on compressed, lossy representations, i.e., they reduce high-dimensional sensory input into manageable internal codes, discarding details. Large language models sit on this continuum: they are neither literal brains nor perfect linguistic mirrors, but pragmatic sketches that expose the structure of language precisely because they leave most of reality out. As Blaise Agüera y Arcas and James Manyika remind us, a (large language) model “is a map, not the actual territory” (2025). It is the latent space of the model that serves as the map (Yee-King 2022, 137). A model’s map cannot be a literal cartographic duplicate of language but is rather an approximate representation—and an instrument for scientific probing. A language model enables linguists to isolate distinct features of language and language acquisition from their wider context, so that the elements of the map become better defined, whichever element has precedence, as long as close relationships between the computational model and the physical substrate of the biological brain are maintained.
Adding embodiment to the model builds another bridge between the physical and the symbolic-like. To the differences and resemblances between producing speech in the brain and in GANs we can add that the inner networks of fiwGANs are tied into a game of creating human speech with a vocal apparatus completely different from the human one. GANs’ innovativeness in English speech improved when the networks were given mouth articulators in ciwaGAN [Categorical InfoArticulationWave GAN] (Beguš, Lu, Zhou, et al. 2023). The ciwaGAN learned, without supervision, to use the articulators the way humans use their mouth muscles and vocal folds.
Like fiwGAN, the brain confronts a listener with the physicality of the abstract. This confrontation is at its most poignant when the role of noise—represented as an incompressible set of random numbers—is properly appreciated, transcending its traditional opposition to signal and information. The abstraction of noise is at the same time deeply physical, in the child’s babbling and in the “field radio” sound of the GAN’s incomprehensible intermezzi. The invalid outputs, the invented words, are like a direct bridge between the abstraction of the underlying structures and the physicality of language and its discrete infinity. They show the abstraction because we do not have pictures for these words yet.
Terms like “subconscious,” or “deep learning,” or “embedding” strongly suggest that cognitive structures—say, brains, both organic and synthetic—have depth. In an inversion of the flying fish analogy discussed previously, this depth appears like some kind of oceanic unknown; the surface between air and sea is precisely where the general public’s understanding of the outputs of cognitive systems stops working, and where linguists, cognitive scientists, and mathematicians come in to understand them from the inside. In popular culture, there is already an almost exclusive focus on AI’s useful and less useful above-surface outputs, rather than any serious attempt at understanding its “submarine” inner workings.
Making the latent vector navigable by mapping it onto a two-dimensional plane that we can crisscross is a straightforward way of showing that the embeddings are somewhere, and that there is thus a partially walkable interior rather than something that must remain mysteriously unknown. The key is that we do not represent the latent space as a three-dimensional space, which falsely suggests experiential spatiality to the detriment of many dimensions that are neither represented nor representable in pictures.
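Such a walkable two-dimensional slice can be produced by holding all dimensions of one latent vector fixed and varying only two. A minimal numpy sketch, assuming a 100-dimensional latent vector (the function name and grid parameters are our own illustration, not the actual interface code):

```python
import numpy as np

def latent_plane(base: np.ndarray, dim_x: int, dim_y: int,
                 span: float = 3.0, steps: int = 5) -> np.ndarray:
    """Copies of `base` in which only dimensions dim_x and dim_y vary
    over [-span, span]; returns a (steps, steps, D) walkable grid."""
    ticks = np.linspace(-span, span, steps)
    grid = np.tile(base, (steps, steps, 1)).astype(float)
    for i, x in enumerate(ticks):
        for j, y in enumerate(ticks):
            grid[i, j, dim_x] = x
            grid[i, j, dim_y] = y
    return grid

rng = np.random.default_rng(42)
base = rng.standard_normal(100)              # one point in a 100-D latent space
plane = latent_plane(base, dim_x=0, dim_y=1)
print(plane.shape)                           # (5, 5, 100): 25 positions to visit
```

Feeding each grid position to the generator yields the crisscrossable map of outputs; the 98 untouched dimensions are precisely what the two-dimensional picture cannot show.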
Looking at the latent vector as a plane to walk makes the walker or traveler across those vectors a latent spacecraft: a vessel, a craft indeed, taking us along for a ride that procures an intuitive and erratic index of the inside.
In the negative values of the latent space, fiwGAN starts spontaneously concatenating words, despite never being trained on multiple-word utterances. The development from single-word utterances to concatenated, semi-compositional complex signs is a crucial step in every child’s language acquisition as well as in the evolution of language. The interface lets you navigate through the space and generate concatenated outputs from the negative regions of the latent space of a model trained for 8,956 steps (Beguš, Lu, and Wang 2023). The GAN’s outputs are transcribed using OpenAI’s Whisper Base speech-to-text model.
A way a lone a last a loved a long the
—
riverrun, past Eve and Adam’s, from swerve of shore to bend of bay, brings us by a commodius vicus of recirculation back to Howth Castle and Environs.
—James Joyce, Finnegans Wake (1939), the ending and beginning sentence
Well, you know or don’t you kennet or haven’t I told you every telling has a taling and that’s the he and the she of it. Look, look, the dusk is growing! My branches lofty are taking root. And my cold cher’s gone ashley. Fieluhr? Filou! What age is at? It saon is late. ’Tis endless now senne eye or erewone last saw Waterhouse’s clogh. They took it asunder, I hurd thum sigh. When will they reassemble it? O, my back, my back, my bach!
—James Joyce reading Finnegans Wake, Chapter 8
The novel is the ultimate human genre, informing the reader of the inner subjectivity of its characters. Yet it has also, at times, served as a way of exceeding the boundaries of the speakable. Exploring and instructing us how to be human, Joyce’s Finnegans Wake tests the limits of the novel with the limits of human language. The work starts not merely in medias res but also mid-sentence. What follows is a sprawling yet sometimes barely understandable story, of which the first page alone features, after riverrun, invented words like passencore, scraggy, penisolate, themselse, gorgios, mumper, avoice, bellowsed, venissoon, kidscad, and many others. Finnegans’ kidscad is fiwGAN’s sart. This seamless passage between valid and invalid words signals the literary and neurocognitive landmark that the novel established.
Finnegans Wake is a literary solitary extreme, a hard-to-follow narration that the reader reassembles by their own intuitive, antisocial comprehension. Its language is a layer preceding actual language (as text) before it becomes social. Using GAN-like contamination and imagitation, Joyce’s play goes deeper than cultural knowledge and the narrative forms of poetry and epic, approaching the preverbal world. Finnegans Wake practices a primordial, childlike operation, where one knows some rules of the game and explores the rest. One does not know every word there is to know—but one can imagitate.
Joyce taps into the blurry representation of our lives (our epics) not through explicit language—“Think about red apple”—but rather by bypassing explicit language. He provides a channel to connect one latent space with another directly, for the minds to touch, without externalizing language. Rather, he offers an internal layer of language’s inner states. His words need to be explained (see McHugh’s Annotations to Finnegans Wake, 2016) ● 7 precisely because they are not the final layer.
To be sure, Joyce’s narration is antisocial in the same way reading a novel has been established as an intimate, individualistic act, a writer-to-reader pipeline for a community of readers. In contemporary Ireland, though, the novel plays a significant social role: its reading is conducted as a social event in Sweny’s Pharmacy in Dublin, in house, on social media, and via streaming services (Sweny’s, n.d.; see Figure 5).
Figure 5: Finnegans Wake reading in Sweny’s Pharmacy in Dublin.
Similarly, reading AI-generated texts or listening to AI-generated speech is conducted as yet another solitary extreme: an output created for the user in a form of algorithmic standardization. A collective reading is an act of care against the singular sense that algorithms provide.
A deeper antisociality is at stake in Finnegans Wake and, for that matter, in AI: in the novel and in language models, the construction of meaning does not occur through a social, syntagmatic link between the words. The audial and textual patterns of words do not correspond to the socially established concepts in language. They are idiomatic only in a technical sense—in language itself—but disregard its milieu, its sociotechnical environment. With language at the limits of the speakable, Finnegans Wake embodies Stiegler’s idiotext (2010b). The reader of Finnegans Wake needs to employ a multilingual approach—the novel contains over fifty languages (McHugh 2016, xxviii–xxx)—to a localized narrative, perhaps as a means of transcending its locale, stitching together discrepant cultures, and destabilizing language as a complete system. This meaning-making process relates to speech play, by which (not just) children test the boundaries of language in the social world, potentially continuing and challenging what is effectively requested—namely, imitation.
Imitation is a property of generalizing and extrapolating underlying principles, and imagitation pushes those regularities into novel combinations and innovations.
As Jasmin B. Frelih writes in his forthcoming essay “About Life,” Finnegans Wake is not sampling our human experience and reality but rather
"the sensory apparatus itself . . . attempting to bridge language and meaning in a more organic way, but in discarding the social dimension of language (oblivious of Wittgenstein’s Sprachspiel) it samples reality perhaps in a more perfect, yet to us, steeped in social not aural language, less intelligible way. It is sampling the reality of language itself, not of reality through the medium of language." (Frelih 2026)
Finnegans Wake overwhelms the reader, yet trains their attention. Frelih continues with the idea that a perfect correspondence between Finnegans Wake and its reader is possible: “To train your sensory apparatus to hear the meaning of Finnegans Wake would perhaps unlock a superpower: you would then be able to read the meaning of things inscribed in nature.” What about a machinic reader?
Language can be generated in myriad ways—a language model’s next-token prediction, Joyce’s reality-of-language streams, a fiwGAN’s imagitation—without guaranteeing the presence of a conscious self behind it. Yet the illusion of meaning in all cases can be quite strong. Is it the recipient who unlocks the meaning or is it the language itself? Or is Joyce, rather, inadvertently laying out a version of latent space here by ostentatiously interweaving words for which no meaning is established with words for which there is?
Literature, more broadly, has tested the boundary between language and nonlanguage, probing the limits of the speakable. In poetry, the whole is greater than the sum of its parts, rendered through the condensed form. Poetry is the foundational form of exploring meaning in an unknown structure, not as the highest form of language but as its most powerful expression of basic form. It seems not entirely coincidental that many contemporaneous literary experiments similar to Joyce’s happened in poetry, a space more licensed for experimentation and for pushing the boundaries of the speakable.
“One great part of every human existence is passed in a state which cannot be rendered sensible by the use of wideawake language, cutanddry grammar and goahead plot,” Joyce wrote in a 1926 letter to his publisher. Joyce remained faithful to a form of writing that is vocal, differing from grammatical or autocorrected text rendered by machine use. Joyce did not use a typewriter but rather pen and paper, by which his work seemed future-proofed against systems that would predict passenger for passencore, themselves for themselse, venison for venissoon, and so on. In this sense, Finnegans Wake seems to pre-resist the normative push of next-token prediction. In the short evolution of machine writing, we have moved from typesetting our words on paper (in an edition of one), to a word-processing machine in which the same writing could always be changed, then to a computational system that predicts what we are about to write, and finally to a machine that does the writing for us, all in less than two hundred years.
Joyce’s spelling resists compression. Compression is an essential characteristic of artificial neural network models, while literature is narratively and poetically incompressible. Blaise Agüera y Arcas writes in his book What Is Life? (2025) that this particular book, as a file, can be zipped (i.e., compressed into a zip file) to 35 percent of its original size, which is rather typical of English texts. Compression exploits statistical redundancy in the text, not aesthetic novelty. Agüera compares this number to Joyce’s Ulysses, which is less compressible and therefore yields a higher number when zipped: 40 percent. Ulysses lands at lower compressibility simply because Joyce is a very innovative writer (Agüera y Arcas 2025, 128). The compression percentage for Finnegans Wake is 46 percent (641,901 bytes out of 1,394,805 bytes), a result of Joyce’s multilingual play of highly compressible patterns and poorly compressible, near-random character sequences (the nonce words and phrases). The result is noticeably less compressible than Ulysses, but still more compressible than pure random data, because recurring motifs, echoes, and portmanteaux give a compression algorithm something to latch onto. Close to half of Finnegans Wake is estimated to be composed of nonce words, considered broadly as invented puns, hapax legomena—lacking most semantic context since they are used only once—and words drawn from languages other than English (Sandulescu et al. 2015). In “Quantifying Joyce’s Finnegans Wake,” Sandulescu et al. (2015, 70) conclude that “clearly even extraordinary texts, where the writer tries to deviate from the standard, follow some subconscious laws,” which the authors call “language laws” and equate with physical laws. Finnegans Wake is held together by laws of narration that are at the same time violated.
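The zip experiment is easy to reproduce in miniature with Python's standard library. A sketch using zlib as a stand-in compressor (Agüera y Arcas's exact tool and settings are not specified, so the ratios below only illustrate the principle, not his numbers): redundant text yields a low ratio, near-random nonce strings a high one.

```python
import random
import zlib

def compression_ratio(text: str) -> float:
    """Compressed size as a fraction of original size (lower = more redundant)."""
    data = text.encode("utf-8")
    return len(zlib.compress(data, 9)) / len(data)

# Repetition compresses extremely well...
redundant = "riverrun past Eve and Adam's " * 200

# ...while near-random nonce strings barely compress at all.
random.seed(0)
nonce = " ".join(
    "".join(random.choice("abcdefghijklmnopqrstuvwxyz") for _ in range(9))
    for _ in range(600)
)

print(round(compression_ratio(redundant), 2))  # small: repetition is cheap
print(round(compression_ratio(nonce), 2))      # large: little to latch onto
```

Finnegans Wake sits between these two extremes, exactly as its 46 percent figure suggests: a mixture of echoing refrains and one-off coinages.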
Like GANs, Finnegans Wake confronts the listener with the limits of narration.
Inevitably, scholars have asked whether Joyce was influenced by the cutting-edge science of his day. Among them was Clive Hart, a Joycean thinker who—coincidentally for this paper—also authored The Prehistory of Flight (1985). Causal links between art and science are, for the most part, notoriously hard to prove. For example, we cannot prove that the hummingbird influenced flying fish or the Airbus A380, but they all try to engage in various kinds of flight. Bearing with this difficulty, in Structure and Motif in Finnegans Wake, Clive Hart (1962, 65) engages in a pattern-level analysis of Joyce’s book, arguing that “a certain random element of unpredictability was necessary to Finnegans Wake if it was adequately to reflect the new world of physics of which Joyce was trying to build up a faithful verbal analog.” Though any clear answer about Joyce’s engagement with science seems hard to give, it would matter, as we’re trying to be more precise about the inadvertent aspect of a latent space showing up in Finnegans Wake. Was it really inadvertent? Did Joyce write entirely on artistic intuition, or was there some direct form of influence from science? Such a question, speculative as it is, could belong in a larger inquiry about the relationship between art and science in the 1920s and 1930s, a period in which the onset of quantum mechanics disrupted the world of classical physics. At the same time, the avant-garde literature of the period unsettled the rock-solid principles by which descriptions of reality were to be drawn up. Both art and physics claimed new, sometimes foundational abstractions, steering away from the demand that we must stick to what makes sense according to our everyday eyes and ears or what seems plausible in terms of linguistic cohesion.
Figure 6: A fresco of flying fish by Maja Dolores Šubic, 2025. The fresco’s porous tufa base is reminiscent of latent spaces. Frescoes serve as an allegory for our argument: an intermediary between the external art of the final layer and the internal, porous architecture that supports it.
If novelists and poets, in so doing, sought out the outer edges of the speakable by inventing new words and uprooting the technologies of meaning-making, both developments collapsed when quantum mechanics “landed” in everyday language, especially when physicists got worried about this. In Speakable and Unspeakable in Quantum Mechanics, theoretical physicist John Stewart Bell writes:
"The physicists who first came upon such [quantum] phenomena found them so bizarre that they despaired of describing them in terms of ordinary concepts like space and time, position and velocity. The founding fathers of quantum theory decided that no concepts could possibly be found which could permit direct description of the quantum world." (Bell [1987] 2004, 170)
So great was the shock, writes Bell, that it was “as if our friends could not find words to tell us about the very strange places where they went on holiday” ([1987] 2004, 171).
Philosopher of physics David Z. Albert continues this thinking. In the “Physics and Narrative” chapter of After Physics (2015, 109), he engages with the “narratability” of Newtonian versus quantum-mechanical worlds. More specifically, Albert proposes to “call a world narratable if the entirety of what there is to say about it can be presented as a single story, if the entirety of what there is to say about it can be presented as a single temporal sequence of instantaneous global physical situations.” “Physics and Narrative” argues that, upon close analysis, some quantum-mechanical “worlds” defy narratability thus defined. In other words, the physical realities of these non-Newtonian worlds aren’t convertible into a story form. Following Albert, Metahaven speaks of “non-narratable worlds” (2026, particularly in the chapters “Poets and Particles” and “Non-Narratable Worlds”). Attempts by literature and poetry to change what a story is, what its components are, how it makes sense, and what it sounds like seem to matter even more now.
Indeed, this exploration of nonnarratability can be extrapolated to latent spaces. While science is all about making foundational observations, knowledge, ideas, and principles narratable, latent spaces are radically invisible and unimaginable to humans; at a minimum, they challenge the notion of narratability. What Joyce’s literary experiment, the opacity of quantum description, and the unruly output of GANs have in common is not merely their resistance to narration but also the fact that each relies on a hidden structural logic, an interior armature, a hidden architecture that must exist for the visible work to stand at all. When exploring that interior, the outside seems to matter less and less. The “outside” of microscopic quantum mechanics is the macroscopic world of everyday experience. The “exterior” of disorderly human language is how we order pizza, lead a panel, give a speech, or write an essay. The “outside” of generative AI is a creative writing essay authored by ChatGPT. Latent spaces appear as an omnipotent but nonnarratable interior, with language as the final layer that attempts to externalize itself but often fails to do so due to its inherent limitations. What matters, though, is the interior. Making the interior navigable and thus narratable. Making it architecture.
In attempting to make the latent space more legible, and thus narratable, than it currently is, what direction should we choose? Latent spaces are bound up with aesthetic discomfort. The GAN’s raspy voice utters noisy and nonsense words; Joyce parades barely understandable material in his Finnegans latent space printout; a babbling child, endearing as she is, leaves us best-guessing for meaning and pragmatic results. This element of discomfort must be maintained in attempts to make the latent space legible, as it contains not merely what is unobserved but in some sense also what is unwanted. A latent space visualized as too smooth or classically beautiful would not do justice to its inherent challenges to the aesthetic norms, which are, on closer inspection, merely challenges to what is frequently seen or heard.
The discomfort of the latent space can be discussed in architectural terms as a tension between outside (exterior) and inside (interior). The exterior is the public-facing side of a (built) structure, whereas its interior is both what is contained in that outside and in part what upholds it. We can, similarly, propose that the “exterior” of a language, read as a list of words, contains only those words that are valid and known in a conversation between speakers of that language, whereas its “interior,” or latent space, would include expressions that are not recognized yet, not listed in a dictionary, not carrying any meaning yet. A fiwGAN’s capacity to build new words out of a small set of initial data effortlessly constructs a latent space. Is there a comparable architecture for this that would allow us to think of visualization?
For literary critic Kojin Karatani, “architecture” can be applied in a wide range of contexts, as both a “metaphor” and a “system where various formalizations take place” (Isozaki 1995, vii). In the past decades, the proliferation of architecture as a way of thinking has been linked to a perceived “architectural crisis,” as architect Arata Isozaki (1995) elaborates in his introduction to Karatani’s study Architecture as Metaphor. For our study, architecture matters less as the “will to construct” and more as the intent to locate and then explain, prioritizing bottom-up readings of the latent space as an interior.
A 1992 essay by American architect Greg Lynn provides an exceptional philosophical and architectural reading of the interior. It is about the Statue of Liberty. A neoclassical sculpture gifted to the United States by the people of France in 1886, the Statue of Liberty’s exterior was sculpted by Frédéric Auguste Bartholdi, while its interior framework was designed by Gustave Eiffel, allowing us to perceive marked differences between inside and outside. Lynn, in quite a liberating move, was not concerned with the outside of the statue at all. Under the influence of French poststructural philosophy and the Roman architect Vitruvius, he scanned the interior armature of the statue, built to uphold it and also to never be seen, to never be public-facing. Lynn notes how Vitruvius “expressed remorse that, like a living body, the breast of architecture could never be opened to reveal the secrets of its interior” (1998, 35; see also Lynn 1992, 32–49). In a weird way, that’s similar to how we experience AI today. We see results, but no interior process.
Figure 7: Interior armature of the Statue of Liberty, after renovation in the 1980s. Source: National Institute of Standards and Technology (NIST) / Library of Congress.
To uphold the Statue of Liberty as a building but also as a “meaning-making” device, a hidden armature supports it (Figure 7). It is, in comparison to the outside, almost unreadable. As an inside armature, it was not supposed to be seen, let alone read. By reading it nonetheless, Lynn forced the armature to make architectural sense. He found it consonant not only with Bataille, Derrida, and Deleuze and Guattari, but also with the grids that underpinned Palladio’s Renaissance villas. In a similar way, we can interpret the unobserved space of the GAN as an architecture, as a scaffolding where the abstract reflects the physical. It is a way to renegotiate, albeit marginally, the narratability of the inside through a latent spacecraft. A latent spacecraft walks, meanders, and traces the latent vector, scuba diving its first few dimensions.
Right, we inhabit a locked hole, but can we use it?
—Whisper transcription of FinneGAN’s output (2025)
Figure 8: Latent space for FinneGAN’s output: Right, we inhabit a locked hole, but can we use it? The figure shows the 100 values that generate this specific utterance.
While many of Joyce’s words are not intelligible, they are computable as terms. Both words and nonce words are valid in the latent space of this novel as well as in the latent space of speaking GANs.
We provided a new opportunity for the novel to speak in a machinic mode through the imagitation of GANs. We took the world of the novel—its locale, protagonists, styles, traditions, operations, and historical and theoretical discourses—and leveled it into a lexical spectrum used to probe its latent space. We trained a fiwGAN on Finnegans Wake and called it FinneGAN. FinneGAN was trained on audio generated with ElevenLabs’ text-to-speech tool (eleven_flash_v2_5) in four-second chunks, covering the entirety of the novel. The fiwGAN model contained 16 code variables in the latent space, a learning rate of 10⁻⁵, six convolutional layers, and 65,536 outputs, which correspond to approximately 4 s of audio at a 16 kHz sampling rate (Figure 8). ● 8 The model is not general but mimetic of the novel, its outputs produced by imagitation rather than mere mimicry.
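The 4 s figure follows directly from the sampling rate: 65,536 samples ÷ 16,000 samples per second = 4.096 seconds. A hypothetical sketch of how such a latent input might be assembled (the split of 16 binary code variables plus 84 noise dimensions into 100 values, matching the count shown in Figure 8, is our assumption; the actual fiwGAN implementation differs in detail):

```python
import numpy as np

SAMPLE_RATE = 16_000   # Hz
N_OUTPUTS   = 65_536   # audio samples per generated output
N_CODES     = 16       # binary code variables in the latent space

print(N_OUTPUTS / SAMPLE_RATE)   # 4.096 seconds of audio per output

def sample_latent(n_noise: int = 84,
                  rng: np.random.Generator = np.random.default_rng(0)) -> np.ndarray:
    """Hypothetical fiwGAN-style latent input: binary code bits
    concatenated with uniform noise (here totaling 100 values)."""
    code = rng.integers(0, 2, size=N_CODES).astype(float)  # 16 binary codes
    noise = rng.uniform(-1, 1, size=n_noise)               # continuous noise
    return np.concatenate([code, noise])

z = sample_latent()
print(z.shape)   # (100,)
```

The binary code variables are what lets the model encode discrete, word-like distinctions, while the continuous noise supplies the variation that imagitation draws on.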
Where does the noise of language become a signal, and the signal an expression? A sound structure, a physical pattern, signifies an abstraction in order to transfer the information and meaning of the text to the reader—although Joyce excludes the reader from complete, satisfactory communication.
Finnegans Wake charts a cognitive or emotional “latent space,” weaving associative leaps that language ordinarily keeps tidy. While Joyce seemingly slides into nonsense, his nonsensical words—such as the famous riverrun, Avelaval, and mememormee—are not disruptive to the extent that cohesion cannot be maintained. Extending the limits of language, there appears to be a kind of high-level legibility to the text that dissolves when looking at it word by word. Compression and abstraction, as in latent spaces, work together toward building the architectural space of the text.
Language holds Finnegans Wake together with a seeming lack of cohesion, but there is enough coherence for the text to stand narratively. This is contrary to the imitation (imagitation?) of generated narration, which Hannes Bajohr diagnoses as surface narration. Bajohr (2024, 2) notes that for current AI-generated narration, “the hope that coherence will arise naturally through cohesion runs parallel to the idea that semantics can be conjured up solely through syntax.” So, cohesion exists without deep coherence, at least for now.
With language now developing in silicon and gaining substance outside the human subject, we attach a re-rendering of Stanisław Lem’s pseudoscientific evaluation of linguistic evolution, in which Lem had a system called “GOLEM” represent a superintelligent computer taking over the development of language.
Figure 9: This poorly known graph presents the evolutionary development of language from Stanisław Lem’s collection Imaginary Magnitude. In this collection, the graphic is located in a pseudoacademic dictionary, preceding the more famous narrative of supercomputer GOLEM XIV. Source: Lem (1981).
In some strange analog to this diagram, recent findings suggest that GANs are encoding novel information into their silences in a manner that we do not understand. ● 9 In other words, researchers noticed informative clues encoded in the silences between words that were not a part of the data and were thus invented by the GANs’ inner networks to exchange information more effectively. We have yet to learn what it means.
Like other GANs, FinneGAN is confronting you with noise.
FinneGAN is a GAN model trained on Finnegans Wake. When you enter the interface, the model is initiated and generates new sentences from a binary code. Every initiation results in a unique generation. The generation is illustrated from the most abstract layer to the final pre-linguistic output. The interface stages this process as an ascent and descent through a vertical stack of volumetric clouds, each mapped to a specific internal convolutional layer of the GAN. The stack is organized by density: as you move through the lower, denser clouds, the generation remains abstract and dominated by noise. As you move upward, cloud density decreases and the sound becomes increasingly speech-like. At the top of the stack, corresponding to the final layer of the GAN, the output reaches its least noisy state and is transcribed using OpenAI’s Whisper Tiny speech-to-text model. Each time you begin a new descent from the top of the stack, a new generation is initiated.
Latent spaces, thought of as unobserved, and potentially nonnarratable, lend themselves to being regarded as an interior of sorts, an interior of which maps can be made. In an architectural sense, they are “hidden” from view by the exterior facades: the stuff of our everyday experience, or, in language, the valid expressions of our everyday verbal communications. They come to matter more once exteriors—outcomes of generative AI—dominate public discourse.
To enact Finnegans’ and fiwGAN’s transition between the preverbal, prenarrated latent space and the final layer, which is language, we train a fiwGAN model on excerpts covering the entire Finnegans Wake novel. The technique for exploring the latent space developed in Beguš and Zhou (2022b) allows us to traverse from the fully abstract noise that acquires meaning during training to the increasingly physical speech output. Yet the final language stage is never fully achieved, neither in FinneGAN nor in Finnegans.
The technique that allows us to explore the latent space both scientifically and aesthetically applies a simple averaging operation in the time domain across all intermediate outputs (called feature maps) in each convolutional layer. Averaging artificial neural activity across feature maps is conceptually similar to brain imaging, where the captured electric activity often represents summed or averaged physical activity of neurons. The first vector, the compositional latent code, represents the most abstract, symbolic-like layer. It is sampled randomly, pointing to the noise origins of neural activity that leads to language. Each subsequent layer approximates speech more closely.
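The averaging operation admits a very small sketch: in plain Python, with a toy layer of three feature maps (channels) over four time samples. Real feature maps would come from a trained convolutional layer; the numbers here are placeholders.

```python
def average_feature_maps(feature_maps):
    """Average a layer's feature maps point by point in the time domain.

    `feature_maps` is a list of channels, each a list of time samples;
    the result is a single time series summarizing the layer's activity,
    analogous to summed or averaged neural activity in brain imaging.
    """
    n_channels = len(feature_maps)
    n_samples = len(feature_maps[0])
    return [sum(fm[t] for fm in feature_maps) / n_channels
            for t in range(n_samples)]

# toy layer: three channels, four time samples each
layer = [[1.0, 2.0, 3.0, 4.0],
         [3.0, 2.0, 1.0, 0.0],
         [2.0, 2.0, 2.0, 2.0]]
print(average_feature_maps(layer))  # [2.0, 2.0, 2.0, 2.0]
```

Applied layer by layer, this yields one readout per convolutional layer, which is what the interface renders as the stack of clouds between abstract code and speech-like output.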
In Finnegans Wake, Joyce inadvertently created a narration of the latent space. By inventing his own idiom, composed of a series of nonce (and nonsense) words, Joyce formulated language as speech in a preformed state. An inadvertently narrated latent space, like Finnegans Wake, resembles the readout of an unobserved interior, or the final layers of the latent space. Here, it is as if the “reader” of that architecture, instead of trying to make sense of what it represents outwardly, traces this inner space stone by stone, wall by wall, dimension by dimension, and tells the listener what it is like, stumbling on the many unintelligible elements it stores that are absent from the normative space of valid terms.
Arguably, whilst Finnegans Wake seems to show language in a preformed, protoconceptual state, Joyce knew perfectly well what “formed” language looked like; he diverted from it intentionally. The way in which he does so feels related to his rejection of the typewriter—a machine for mechanized writing in which each character is represented by a discrete, operable key—and his embrace of handwriting. The language he embraces is pre-discrete.
We are used to considering language in its deterministic state, where every word is a discrete unit and the room for interpretation is scarce. In a more nondeterministic conception of language—in GANs, in small children, in Finnegans Wake—not every word is discrete or understandable. Just as a child develops from a stage of uttering single words to a stage of endless sentences, so does FinneGAN learn to generate imagitated outputs based on its training on Finnegans Wake.
In FinneGAN, the information of the original work is diffused into a noisier, less specific version of Finnegans Wake, which suggests that the actual novel is lower in entropy than the GAN’s output. Below is a single FinneGAN speech output transcribed into text with two different OpenAI transcribers, Whisper and GPT-4o Transcribe:
Power of Motsunoshi Station Lettuce Wait a ti-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i-i
—Whisper transcription of FinneGAN’s output (2025)
Sheoladh muicseanaithe saighsean leisins, maide talún orthu.
[A group of pig farmers were sent to the fields, with a stick of earth on them.]
—GPT-4o Transcribe transcription of the same FinneGAN output (2025), followed by a translation from Irish into English
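The entropy comparison can be made concrete with character-level Shannon entropy. A toy illustration in plain Python; the two strings are stand-ins for a repetitive, lower-entropy text and a more varied, higher-entropy one, not actual model outputs:

```python
import math
from collections import Counter

def shannon_entropy(text):
    """Character-level Shannon entropy, in bits per symbol."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

low = "riverrun " * 8                                   # few symbols, skewed distribution
high = "bababadalgharaghtakamminarronnkonnbronntonner"  # more varied symbol distribution
print(shannon_entropy(low) < shannon_entropy(high))     # True
```

On this measure, a text whose symbol distribution is more uniform carries more bits per symbol; in that sense FinneGAN’s noisier outputs sit above the novel they were trained on.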
Whisper, OpenAI’s machine learning model for speech recognition and transcription, serves as the transcriber of FinneGAN. Whisper appears as the meaning-maker, the adult deciphering the language in its preformed stage into what it should or could be. In many ways, Whisper makes FinneGAN’s utterances intelligible, conforming to the novel as a social event. This sociality becomes evident when Whisper translates them into modern-speak, enabled by the digital sphere. Whisper orders the entropy. FinneGAN, on the other hand, steps further into the novel’s latent space.
We are mostly seeing end products, final outcomes, exteriors, in that sense. We are—deliberately—not seeing what is going on inside. Interpreting the latent space as a cartography, or as an architecture, changes that. This interpretation shifts attention from polished interfaces and fluent outputs to the hidden structures that make those outputs possible and structured in the first place. In doing so, it links what are often treated as separate domains—artificial networks, brains, and novels—through a common concern with how an interior can be explored, modeled, and partially made legible without ever being fully visible or even comprehensible.
Thinking of these interiors as architectures rather than as black boxes matters for at least two reasons. First, it foregrounds the materiality of latent space that can be probed, perturbed, and partially mapped. Exploration of the latent space suggests that the abstract is in fact physical, and that the purely symbolic level does not exist. Second, it reframes interpretability and criticism as spatial practices, in which latent spacecraft conceptually navigates a space that cannot be seen directly, but that can be indexed through extreme interpolation, concatenated outputs, nonce words, internal layers, and the grain of machinic and literary noise.
This perspective also reshapes how we think about AI cultures. The analogies we trace across brains, GANs, and Finnegans Wake demonstrate that the design, inhabitation, and contestation of these interiors will shape what kinds of speech and narrative can emerge from them. The humanities are co-cartographers of AI’s interior, bringing literary, philosophical, and architectural tools into spaces that engineering alone cannot fully describe or design.
Finally, latent spaces complicate narratability itself. Quantum worlds, avant-garde novels, and GANs’ production of speech expose realities that resist being told as a coherent story. Latent spaces are structurally nonnarratable but remain expandable, particularly at the point where language is in the process of becoming.
●Anderson, Sophia C., and Graeme D. Ruxton. 2020. “The Evolution of Flight in Bats: A Novel Hypothesis.” Mammal Review 50: 426–439. https://doi.org/10.1111/mam.12211.
●Agüera y Arcas, Blaise. 2025. What Is Life? Evolution as Computation. MIT Press.
●Agüera y Arcas, Blaise, and James Manyika. 2025. “AI Is Evolving—And Changing Our Understanding of Intelligence.” Noema Magazine, April 8. https://www.noemamag.com/ai-is-evolving-and-changing-our-understanding-of-intelligence/.
●Albert, David Z. 2015. “Physics and Narrative.” In After Physics. Harvard University Press.
●Arjovsky, Martin, Soumith Chintala, and Léon Bottou. 2017. “Wasserstein Generative Adversarial Networks.” In Proceedings of the 34th International Conference on Machine Learning, edited by D. Precup and Y. W. Teh, 214–223. http://proceedings.mlr.press/v70/arjovsky17a.html.
●Bajohr, Hannes. 2024. “Cohesion Without Coherence: Artificial Intelligence and Narrative Form.” TRANSIT 14 (2): 1–8. https://doi.org/10.5070/T714264658.
●Beguš, Gašper. 2020. “Generative Adversarial Phonology: Modeling Unsupervised Phonetic and Phonological Learning with Neural Networks.” Frontiers in Artificial Intelligence 3: 44. https://doi.org/10.3389/frai.2020.00044.
●Beguš, Gašper. 2021. “CiwGAN and fiwGAN: Encoding Information in Acoustic Data to Model Lexical Learning with Generative Adversarial Networks.” Neural Networks 139: 305–25. https://doi.org/10.1016/j.neunet.2021.03.017.
●Beguš, Gašper, Thomas Lu, and Zili Wang. 2023. “Basic Syntax from Speech: Spontaneous Concatenation in Unsupervised Deep Neural Networks.” Preprint, arXiv, last modified September 14, 2023. https://doi.org/10.48550/arXiv.2305.01626.
●Beguš, Gašper, Thomas Lu, Alan Zhou, Peter Wu, and Gopala K. Anumanchipalli. 2024. “CiwaGAN: Articulatory Information Exchange.” Preprint, arXiv, February 2024. https://doi.org/10.48550/arXiv.2309.07861.
●Beguš, Gašper, Andrej Leban, and Shane Gero. 2023. “Approaching an Unknown Communication System by Latent Space Exploration and Causal Inference.” Preprint, arXiv, last modified February 6, 2024. https://doi.org/10.48550/arXiv.2303.10931.
●Beguš, Gašper, and Alan Zhou. 2022a. “Modeling Speech Recognition and Synthesis Simultaneously: Encoding and Decoding Lexical and Sublexical Semantic Information into Speech with No Direct Access to Speech Data.” In Proceedings of Interspeech 2022. International Speech Communication Association. https://doi.org/10.21437/Interspeech.2022-11219.
●Beguš, Gašper, and Alan Zhou. 2022b. “Interpreting Intermediate Convolutional Layers of Generative CNNs Trained on Waveforms.” IEEE/ACM Transactions on Audio, Speech, and Language Processing 30: 3214–3229.
●Beguš, Gašper, Alan Zhou, Peter Wu, and Gopala K. Anumanchipalli. 2023. “Articulation GAN: Unsupervised Modeling of Articulatory Learning.” In ICASSP 2023—IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE. https://doi.org/10.1109/ICASSP49357.2023.10096800.
●Beguš, Gašper, Alan Zhou, and T. Christina Zhao. 2023. “Encoding of Speech in Convolutional Layers and the Brain Stem Based on Language Experience.” Scientific Reports 13: 6480. https://doi.org/10.1038/s41598-023-33384-9.
●Beguš, Nina. 2025. Artificial Humanities: A Fictional Perspective on Language in AI. University of Michigan Press. https://doi.org/10.3998/mpub.12778936.
●Bell, John S. (1987) 2004. Speakable and Unspeakable in Quantum Mechanics. 2nd ed. Cambridge University Press.
●Borges, Jorge Luis. 1999. “On Exactitude in Science.” In Collected Fictions. Translated by Andrew Hurley. Penguin.
●Chen, Xi, Yan Duan, Rein Houthooft, et al. 2016. “InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets.” In Advances in Neural Information Processing Systems (NIPS) 29. https://doi.org/10.48550/arXiv.1606.03657.
●Danyluk, Adam. 2022. “Canada Is Home to the World’s Most ‘Recursive’ Island: Yathkyed Lake, Nunavut.” CurioCity. https://curiocity.com/canada-most-recursive-island-yathkyed-lake/.
●Donahue, Chris, Julian J. McAuley, and Miller S. Puckette. 2019. “Adversarial Audio Synthesis.” In 7th International Conference on Learning Representations, 1–16. OpenReview.net. https://openreview.net/forum?id=ByMVTsR5KQ.
●Frelih, Jasmin B. 2026. “About Life.” In First Encounters with AI: Writers on Writing, edited by Nina Beguš. University of Michigan Press. https://doi.org/10.3998/mpub.13316989
●Fuller, R. Buckminster. (1975) 1997. Synergetics: Explorations in the Geometry of Thinking. Digitized facsimile. Internet Archive, uploaded February 8, 2021. Estate of R. Buckminster Fuller. https://archive.org/details/buckminster-fuller-synergetics-explorations-in-the-geometry-of-thinking.
●Garofolo, John S., Lori F. Lamel, William M. Fisher, et al. 1993. TIMIT Acoustic-Phonetic Continuous Speech Corpus (LDC93S1). Linguistic Data Consortium. https://doi.org/10.35111/17gk-bn40.
●Georgopoulos, Apostolos P., Andrew B. Schwartz, and Ronald E. Kettner. 1986. “Neuronal Population Coding of Movement Direction.” Science 233 (4771): 1416–19. https://doi.org/10.1126/science.3749885.
●Giles, Martin. 2018. “The GANfather: The Man Who’s Given Machines the Gift of Imagination.” MIT Technology Review, February 21. https://www.technologyreview.com/2018/02/21/145289/the-ganfather-the-man-whos-given-machines-the-gift-of-imagination/.
●Goodfellow, Ian, Jean Pouget-Abadie, Mehdi Mirza, et al. 2014. “Generative Adversarial Nets.” In Proceedings of the 27th International Conference on Neural Information Processing Systems (NIPS) 2014. https://doi.org/10.5555/2969033.2969125.
●Gulrajani, Ishaan, Faruk Ahmed, Martin Arjovsky, Vincent Dumoulin, and Aaron Courville. 2017. “Improved Training of Wasserstein GANs.” In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS) 2017. https://doi.org/10.5555/3295222.3295327.
●Hart, Clive. 1962. Structure and Motif in Finnegans Wake. Northwestern University Press.
●Isozaki, Arata. 1995. “Introduction.” In Kojin Karatani, Architecture as Metaphor: Language, Number, Money. MIT Press.
●Joyce, James. 1926. “Letter to Harriet Shaw Weaver, 24 November 1926.” In Selected Letters of James Joyce, edited by Richard Ellmann. Faber and Faber.
●Joyce, James. (1939) 1999. Finnegans Wake. Penguin Classics.
●Joyce, James. (1939) 2018. Finnegans Wake. Faded Page eBook #20180126. First posted January 15, 2018; last updated June 4, 2024. https://www.fadedpage.com/showbook.php?pid=20180126.
●Lem, Stanisław. 1981. Imaginary Magnitude. Translated by Marc E. Heine. Harvest Books.
●Lynn, Greg. 1992. “Multiplicitous and Inorganic Bodies.” Assemblage 19: 32–49. https://doi.org/10.2307/3171175.
●Lynn, Greg. 1998. Folds, Bodies & Blobs: Collected Essays. La Lettre Volée.
●Manovich, Lev, and Emanuele Arielli. 2024. Generative AI, Art, and Visual Media. Online manuscript. https://manovich.net/index.php/projects/artificial-aesthetics.
●McHugh, Roland. 2016. Annotations to Finnegans Wake. Johns Hopkins University Press.
●Metahaven. 2026. The Hard Question of Art: Cognitive Futures. Verso.
●Miyagawa, Shigeru, Rob DeSalle, Vitor Augusto Nóbrega, et al. 2025. “Linguistic capacity was present in the Homo sapiens population 135 thousand years ago.” Frontiers in Psychology 16: 1503900. https://doi.org/10.3389/fpsyg.2025.1503900.
●Moser, Benjamin. 2009. “The Third Experience.” In Why This World: A Biography of Clarice Lispector. Haus Publishing.
●Offert, Fabian. 2021. “Latent Deep Space: Generative Adversarial Networks (GANs) in the Sciences.” Media+Environment 3 (2): 29905. https://doi.org/10.1525/001c.29905.
●Oganian, Yulia, and Edward F. Chang. 2019. “A Speech Envelope Landmark for Syllable Encoding in Human Superior Temporal Gyrus.” Science Advances 5 (11): 6279. https://doi.org/10.1126/sciadv.aay6279.
●Sandulescu, C. George, Lidia Vianu, Ioan-Iovitz Popescu, et al. 2015. “Quantifying Finnegans Wake.” Glottometrics 30: 45–72. https://glottometrics.iqla.org/wp-content/uploads/2021/06/g30zeit.pdf.
●Šegedin, Bruno Ferenc, and Gašper Beguš. 2025. “Exploring the Encoding of Linguistic Representations in the Fully-Connected Layer of Generative CNNs for Speech.” Preprint, arXiv, January 13. https://doi.org/10.48550/arXiv.2501.07726.
●Sherzer, Joel, and Anthony K. Webster. 2015. “Speech Play, Verbal Art, and Linguistic Anthropology.” In Oxford Handbook Topics in Linguistics. Oxford University Press. https://doi.org/10.1093/oxfordhb/9780199935345.013.33.
●Somaini, Antonio. 2024. “Le visible et l’énonçable. L’IA et les nouveaux liens algorithmiques entre images et mots.” Nouvelle Revue d’esthétique 33 (1): 47–58. https://doi.org/10.3917/nre.033.0047.
●Stiegler, Bernard. 1998. Technics and Time, 1: The Fault of Epimetheus. Translated by Richard Beardsworth and George Collins. Stanford University Press.
●Stiegler, Bernard. 2009. Acting Out. Translated by David Barison. Stanford University Press.
●Stiegler, Bernard. 2010a. For a New Critique of Political Economy. Translated by Daniel Ross. Polity Press.
●Stiegler, Bernard. 2010b. “Le concept d’‘idiotexte’: esquisses.” Intellectica 53–54: 51–66. https://doi.org/10.3406/intel.2010.1178.
●Sweny’s. n.d. “Reading Groups.” Accessed October 30, 2025. https://www.sweny.ie/reading-groups.
●Yee-King, Matthew. 2022. “Latent Spaces: A Creative Approach.” In The Language of Creative AI: Practices, Aesthetics, and Structures, edited by Craig Vear and Fabrizio Poltronieri. Springer. https://doi.org/10.1007/978-3-031-10960-7_8.
●Zhao, Christina, and Patricia Kuhl. 2018. “Linguistic Effect on Speech Perception Observed at the Brainstem.” Proceedings of the National Academy of Sciences 115: 8716–8721. https://doi.org/10.1073/pnas.1800186115.
We would like to thank Thomas Lu for his help with training the models, Christina Zhao for permission to use her data, Aliona Ciobanu for collaboration in the early design phase, and Maja Šubic for permission to use her fresco.