Kittydar: Detecting Edges in the World

From the Series: Digital Ontology

Photo by Gerd Altmann.

How do we see ontological transformations? Take the case of kittydar, a small demonstration of machine-learning techniques in the area of computer vision. According to Heather Arthur, the project’s developer, “Kittydar is short for kitty radar. Kittydar processes an image (canvas) and calculates the locations of all the cats in the image.” This playful piece of software demonstrates the detection of edges in the mundane, not to say banal, cosmos of cat photos on the Internet. Given that cats matter in this cosmos, kittydar generates ontological statements. Arthur explains:

Kittydar first chops the image up into many “windows” to test for the presence of a cat head. For each window, kittydar first extracts more tractable data from the image’s data. Namely, it computes the Histogram of Orient Gradients descriptor of the image . . . This data describes the directions of the edges in the image (where the image changes from light to dark and vice versa) and what strength they are. This data is a vector of numbers that is then fed into a neural network which gives a number from 0 to 1 on how likely the histogram data represents a cat.

The neural network (the JSON of which is located in this repo) has been pretrained with thousands of photos of cat heads and their histograms, as well as thousands of noncats. See the repo for the node training scripts.

This toy example of machine learning finds cat heads in photographs, but we can easily locate similar techniques in use in self-driving cars, border security systems, military robots, gesture recognition systems, online advertising, supply chain logistics, as well as truly legion scientific applications. In these places, road signs, immigrants, the enemy, players, customers, and transactions function analogously to cats. Their popularization is striking and perhaps attests to an ontological gradient, a curvature that shifts and reshapes beings. Kittydar relies on a neural network, a typical machine learning technique that has been heavily developed by researchers at Google, themselves working on images of cats, among other things taken from YouTube videos (BBC 2012).

Kittydar test image.

Several vectors orient the propagation of such techniques, whose results can be seen above:

  1. Imitation. Kittydar is alluring because it imitates something people do in the visual field. In this case, the visual form of cats, massively abundant and proliferating in contemporary visual culture, are already ritualized attachment memes which kittydar reiterates and orders in its own ways.
  2. Edging. Kittydar and its ilk can only really group beings according to visible edges. As the description provided by Heather Arthur suggests, the software finds cats by chopping up the images into smaller windows. For each window, it measures a set of gradients, running from light and dark in order, to find edges, and then compares the direction of these edges to the edges found in known cat images (the training set). The intuition here is that edges are associated with the determination of beings.
  3. Generalization. Kittydar runs in JavaScript in a web browser. This banal execution attests to a flat and wide dispersal of techniques that were once the preserve of major research projects. Techniques developed in science—cognitive science, in the case of neural networks (Rumelhart, Hinton, and Williams 1986)—have spread into industry, media, health, and commerce. What we see in kittydar is no eccentric plaything, but a symptom of a generalizing process.

Imitation, edges, and generalization together configure a system of transformations that may well have some ontological significance. The imitative vector follows existing patterns whose proliferation and profusion has created problems of scale: what to do with all the cat images on the Internet? One answer is to build machines that classify them so that they can be ordered, searched, or retrieved from amidst all the other images. The edge and shape-locating classification reconfigure the visible as a field of gradients whose variations can be traversed and marked as differences in the world. The errors attending this cosmological operation (for instance, kittydar identifies some human faces as cats) might amuse us, but they also engender renewed efforts to optimize the classificatory performance. Error is no obstacle, but like delinquency in Michel Foucault’s (1977) penal institutions, presents something absorbingly generative in its synthetic possibilities. Efforts to optimize errors also tend to generalize the techniques themselves. Each time a limit is addressed, each time a more complicated set of edges can be detected (cats side-on, for instance), the algorithm also lends itself to further generalization.

Such devices perhaps also diagram an ontological gradient, a surface whose variable curvature reshapes things—objects, subjects, fields, institutions—moving across it, by reframing the production of categorical statements. The ontological gradient of the probabilistic, whether in the neural net or support vector machine, densely weaves elements drawn from infrastructures, sciences, institutional practices (such as learning and training), and the plural transients of popular culture and media. Amidst very powerful vectors of normalization (the identification of cat faces does not, in itself, suggest any radical origination and may depotentialize alternative futures), what chance is there that different coherences, configurations of beings, and new edges might be articulated and become visible in the world? Could we learn from the difficulties that kittydar has in organizing its cosmos in order to negotiate that ontological gradient? To do so, we would need to imitate what kittydar does, finding new edges in the world and generalizing them in thought.

References

BBC. 2012. “Google Computer Works Out How to Spot Cats.” June 26.

Foucault, Michel. 1977. Discipline and Punish: The Birth of the Prison. Translated by Alan Sheridan. New York: Vintage.

Rumelhart, David E., Geoffrey E. Hinton, and Ronald J. Williams. 1986. “Learning Representations by Back-Propagating Errors.” Nature 323, no. 6088: 533–36.