Photography Must Be Curated! | Katrina Sluis | Thursday, 23.01.2020

Survival of the Fittest Image

“Anyone can become a photographer… But it’s harder to actually make sense of that: you can’t manually screen, curate and tag millions of images”. Florian Meissner, co-founder of EyeEm
If the contemporary task of the photography curator has been to rescue the photographic image from photographic reproduction, then the task of the computer scientist has been to rescue the photograph from semantic oblivion. Or, as the scholar David Weinberger observes: “When you have ten, twenty, or thirty thousand photos on your computer, storing a photo of Aunt Sally labelled ‘DSC00165.jpg’ is functionally the same as throwing it out, because you’ll never find it again.” [1: David Weinberger, Everything Is Miscellaneous: The Power of the New Digital Disorder (London: St Martin’s Press, 2007), 1.] Curating images – caring for them – has become a significant concern for technologists who seek to classify, analyse, interpret, aesthetically evaluate and steward the circulation of image content. The question of what might constitute a successful, good and aesthetically brilliant photograph underpins a growing body of scientific research that proposes a form of ‘algorithmic connoisseurship’ to address the visual glut and secure the relevance and marketability of networked images. Whilst early work in image processing was concerned with narrowing the ‘semantic gap’ between human and machine interpretation of images, a considerable body of work now seeks to address an ‘aesthetics gap’ [2: Ritendra Datta, Jia Li and James Z. Wang, “Algorithmic Inferencing of Aesthetics and Emotion in Natural Images: An Exposition”, in: 15th IEEE International Conference on Image Processing (San Diego, CA, 2008), 105–108. Available: https://www.ri.cmu.edu/pub_files/pub4/datta_ritendra_2008_2/datta_ritendra_2008_2.pdf] which prevents the machinic classification of the subjective or emotional qualities of photography.
My introduction to the field of computational aesthetics was over a decade ago, when I first encountered the Aesthetic Quality Inference Engine (ACQUINE), developed by computer scientists at Penn State University. Its creators, Ritendra Datta and James Z. Wang, boldly described it as “the first publicly available tool for automatically determining the aesthetic value of an image”. [3: Ritendra Datta and James Z. Wang, “ACQUINE: Aesthetic Quality Inference Engine – Real-Time Automatic Rating of Photo Aesthetics”, in: Proceedings of the ACM International Conference on Multimedia Information Retrieval (Philadelphia, PA, 2010), 421–424. Available: http://infolab.stanford.edu/~wangz/project/imsearch/Aesthetics/MIR10/] Algorithmic systems of image evaluation had already been brought into public consciousness when Flickr launched its ‘Interestingness’ algorithm in 2005, which Yahoo later patented. Flickr’s algorithm evaluated likes, comments and other metadata to ensure the most ‘interesting’ images were prioritised in its web interface. ACQUINE, however, offered a different service: visitors could upload their photos, which were then analysed and given an aesthetic score out of 100. In the first six months after its launch, over 140,000 photos were uploaded; the top ten users uploaded over 400 images each. Since then, the field of aesthetic computing has continued to advance, facilitated by developments in deep learning, alongside greater access to collections of photography which can be repurposed as training data for algorithmic classification.
ACQUINE website, c. 2009. All screenshots taken by the author
What is the purpose of automating ‘aesthetic judgement’? A quick overview of the literature suggests a range of ways to beautify and optimise photographic culture: smarter cameras, image search, personal album curation, photo ranking and creative recommendation, visual storytelling, automated photo enhancement and image management. These applications reflect a desire to free consumers from the burden of photography, from the moment of capture (which filter should I use?) to the act of sharing (which image should I upload?). However, as a leading scholar in this field recently admitted to me: beyond these industrial outcomes, this area of work is thrilling precisely because it tackles one of the most challenging and unanswerable problems in the field. What makes a photograph good? Is there a canonical or universal basis on which beauty can be measured? Or, in the face of the infinite complexity of human subjectivity, is the answer to be found in an individualised, personal aesthetics?
In their paper introducing ACQUINE, its creators lament that “there is no unanimously agreed standard for measuring aesthetic value” in existing cultural scholarship which could be modelled in the machine learning process. [4: ibid.] Instead, Datta and Wang turned to Photo.net, an online photo community website started in 1997 and used by over 400,000 professional and amateur photographers. Photo.net’s community enthusiastically upload and rate each other’s work, awarding each image a score from 1 to 7 for its ‘aesthetics’ and ‘originality’. All that was therefore required was to scrape this information into a dataset and produce a model against which future, unknown images could be evaluated and rated. Crucially, aesthetic value could then be quantified and decomposed into mathematical elements based on formal tropes such as the rule of thirds, saturation, hue and texture. Through this process, ACQUINE became trained in the systems of judgement, preferred subjects and technical approaches of camera club photography, which traditionally seeks legitimation not as art or as social practice, but through the generation of its own creative boundaries sanctioned by its community. Nonetheless, ACQUINE’s creators celebrated Photo.net as a source of “peer reviewed” image evaluations from “a relatively diverse group… averaged out over the entire spectrum of amateurs to serious professionals.” [5: ibid.] Given its origins, it is perhaps not surprising that ACQUINE caused consternation amongst users by giving more favourable ratings to landscape photography than to photos depicting people.
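The feature-based pipeline described above can be sketched in a few lines of Python. This is an illustrative toy, not ACQUINE’s actual implementation: the features (mean saturation, mean brightness, a crude rule-of-thirds proxy) and the weights – which in a real system would be fitted to the scraped community ratings – are invented here for demonstration.

```python
# A minimal sketch of feature-based aesthetic scoring. Hypothetical
# features and weights; not ACQUINE's actual model.
import colorsys

def extract_features(pixels):
    """pixels: a list of rows, each row a list of (r, g, b) tuples in 0-1."""
    h, w = len(pixels), len(pixels[0])
    sat_total = val_total = thirds_mass = total_mass = 0.0
    for y, row in enumerate(pixels):
        for x, (r, g, b) in enumerate(row):
            _, s, v = colorsys.rgb_to_hsv(r, g, b)
            sat_total += s
            val_total += v
            total_mass += v
            # crude rule-of-thirds proxy: how much brightness sits near
            # the one-third gridlines of the frame
            if (abs(x / w - 1 / 3) < 0.1 or abs(x / w - 2 / 3) < 0.1 or
                    abs(y / h - 1 / 3) < 0.1 or abs(y / h - 2 / 3) < 0.1):
                thirds_mass += v
    n = w * h
    return {
        "saturation": sat_total / n,
        "brightness": val_total / n,
        "thirds": thirds_mass / total_mass if total_mass else 0.0,
    }

def aesthetic_score(features, weights):
    """A linear model over the features; in practice the weights would
    be learned from the scraped 1-7 community ratings."""
    raw = sum(weights[k] * features[k] for k in weights)
    return max(0.0, min(100.0, 100.0 * raw))

# toy 6x6 'image': a saturated red patch on a grey background
img = [[(0.5, 0.5, 0.5)] * 6 for _ in range(6)]
img[2][2] = img[2][3] = (1.0, 0.0, 0.0)

feats = extract_features(img)
score = aesthetic_score(feats, {"saturation": 0.4, "brightness": 0.4, "thirds": 0.2})
```

The point of the sketch is how little of ‘aesthetics’ survives the translation: everything subjective is reduced to a handful of global statistics and a weighted sum.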
Photo.net website
The Labour of Curation

As the example of ACQUINE demonstrates, aesthetic computing relies heavily on the curation of large collections of photography which have in turn been annotated by humans. Photo.net sits alongside a growing canon of datasets sourced from photographic websites including DPChallenge, Flickr, Behance and GuruShots. Because the production of datasets for machine learning is notoriously laborious, undervalued and expensive, there has historically been little motivation for researchers trying to scale the hierarchy of academia to invest time in making them. ImageNet, the most significant machine vision dataset, contains 14 million images scraped from the web, processed and labelled by over 25,000 workers on Amazon’s Mechanical Turk. [6: For a discussion of ImageNet, see Nicolas Malevé, “An Introduction to Image Datasets”, in: Unthinking Photography, November 2019. Available: https://unthinking.photography/articles/an-introduction-to-image-datasets; and Nicolas Malevé, “‘The cat sits on the bed’: Pedagogies of Vision in Machine and Human Learning”, in: Unthinking Photography, September 2016. Available: https://unthinking.photography/articles/the-cat-sits-on-the-bed-pedagogies-of-vision-in-human-and-machine-learning] In the field of aesthetic computing, it is not Turkers but photographic communities who have become unacknowledged sources of otherwise expensive ‘ground truth’ data which help train classification models. Their comments, ratings and likes offer technologists an inexpensive way to harvest aesthetic judgements from a community who can be described as ‘vested critics’ and prolific consumers of photography.
DPChallenge website
But beyond the ethics of datamining photographic playbour, the reliance on these datasets allows researchers to sidestep many troubling concerns: the brute aesthetic-numerical scale and what values it maps onto (or not); the restricted cultural and demographic profiles of those undertaking the ratings; the problematic relationship of art and aesthetics to photography; and the meaning of categories such as ‘amateur’ and ‘professional’ photography (to name a few). There is a belief that the perplexing messiness of human subjectivity might instead be overcome by scale: with larger datasets containing more complex labels, greater accuracy might be achieved. For this reason, Photo.net was quickly superseded by the DPChallenge dataset in 2009, before being usurped by the Aesthetic Visual Analysis (AVA) dataset in 2012. AVA remains the gold standard benchmark of aesthetic evaluation; it contains 250,000 images drawn from 963 challenges sourced and extended from DPChallenge, and includes extensive labels relating to semantics, aesthetic scores and photographic ‘styles’ (e.g. motion blur, long exposure, macro, duotones).
Deep Connoisseurship

When the creators of DPChallenge.com founded their website to “teach ourselves to be better photographers by giving each other a ‘challenge’ for the week”, little did they know that they – and their community – were also training machines in photographic connoisseurship. Faced with this feedback loop, the question inevitably arises: who trains the trainers? As a keen photographer himself, ACQUINE’s James Z. Wang has publicly expressed his desire that the technology will one day move beyond the binary classifications of ‘good’ and ‘bad’ in order to help others improve their photography. Rising to this challenge, researchers at Academia Sinica have released the Photo Critique Captioning Dataset (PCCD), which combines reviews and ratings from a ‘professional critique website’ which they reveal in their footnotes is GuruShots.com. [7: Kuang-Yu Chang, Kung-Hung Lu and Chu-Song Chen, “Aesthetic Critiques Generation for Photos”, in: IEEE International Conference on Computer Vision (ICCV) (Venice, 2017), 3534–3543. Available: https://ieeexplore.ieee.org/document/8237642] In reality, GuruShots is a platform launched by an Israeli tech startup in 2014 whose mission is to gamify photography. Resembling a camera club on steroids, it offers users the ability to take part in ‘epic challenges’, earning badges, points and privileges to become a photography ‘guru’.
There is therefore a strange situation emerging in which photographers are valued both as a source of aesthetic knowledge and, paradoxically, as a community in need of aesthetic improvement. In their work to classify visual aesthetics in photographic portraiture, Shehroz Khan and Daniel Vogel lament that “casual photographers are not always good at assessing the aesthetic quality of their photographs”. They propose an on-camera algorithmic system offering an “instant critique of the photograph” which might motivate the photographer “to re-take the photo to improve its aesthetics, or reconsider whether it is a good candidate for sharing.” [8: Shehroz S. Khan and Daniel Vogel, “Evaluating Visual Aesthetics in Photographic Portraiture”, in: Proceedings of the Eighth Annual Symposium on Computational Aesthetics in Graphics, Visualization, and Imaging (June 2012), 55–62. Available: https://dl.acm.org/doi/10.5555/2328888.2328898] In this way, whilst aesthetic datasets emerge from a process in which photographers commit to self-improvement and self-education, the algorithmic systems they help produce are increasingly mobilised in turn to discipline photographers and shape photographic production.
It has recently been argued by scholars such as Tarleton Gillespie that machine learning is fundamentally conservative; its reliance on training data means it is a tool which reinforces what it already knows, using correlation and pattern recognition in order to generate new knowledge. With respect to corporate machine learning aesthetics, Leo Impett has described this in terms of a “visual-algorithmic hegemony”, in which only those images which best match a distant labelled dataset of ‘good’ images become prioritised, shared and incorporated into new web interfaces. This, he notes, generates a self-perpetuating system which reproduces the statistical bias of low-level features identified in the machine learning process as having aesthetic value. This positive feedback loop or algorithmic ‘way of seeing’ is reinforced by search engines, photography apps and camera software, which now inform dominant visual hegemonies whilst simultaneously learning from – and thereby fortifying – them.
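This self-perpetuating loop can be illustrated with a toy simulation – a sketch of the argument, not of any real platform’s system. Each image is reduced to a single feature value; the model’s notion of a ‘good’ image is simply the mean of its training set; each round, only the images closest to that notion are shared, and the model retrains on what was shared:

```python
# Toy simulation of a visual-algorithmic feedback loop: the diversity of
# circulating images collapses as the model retrains on its own picks.
import random
import statistics

random.seed(1)

# a visually diverse starting pool: feature values spread across 0-1
pool = [random.random() for _ in range(1000)]
model = statistics.mean(pool)  # the model's initial notion of 'good'

spread_history = []
for _ in range(10):
    # rank by closeness to the model's ideal; only the top 20% get shared
    shared = sorted(pool, key=lambda x: abs(x - model))[: len(pool) // 5]
    model = statistics.mean(shared)   # retrain on shared images only
    pool = shared * 5                 # shared images dominate the next pool
    spread_history.append(statistics.pstdev(shared))
```

Run it and the spread of circulating images shrinks round after round: the model never learns anything it did not already prefer, which is the conservatism Gillespie and Impett describe.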
Demonstration of photo critique captioning by K. Chang, K. Lu and C. Chen in their paper on the PCCD dataset
Deep Curation

The present desire to secure and consecrate photography’s aesthetic status is not limited to cultural institutions or computer scientists, but extends to a range of tech entrepreneurs seeking to disrupt the photo marketplace. In TED talks, TechCrunch articles and opinion pieces published on Medium, curation is positioned as a ‘cure’ for information overload and marketed as a unique selling point. The creators of photo sharing app EyeEm have stated their founding mission was to “carry over the quality of photography in the age of cameras everywhere” in a period where “a massive flood of images… had completely destroyed the aesthetics of the web”. Whilst Instagram utilises human curators bound by non-disclosure agreements to refine the flow of content to its interface, EyeEm claims their EyeVision algorithm “can do just what photo curators do, but within milliseconds”, with the additional promise of “on-demand curation”. In parallel, mobile app VSCO markets its own AI called Ava, who can “look at art like a human” and match users with images which share the same ‘mood’ or ‘emotions’ on their own timeline. Having been trained on four years of curation data, Ava is apparently unmoved by the fake success metrics of followers and likes, and is able to algorithmically identify raw talent and pluck photographers from obscurity. In contrast, the stock photography platform Unsplash rejects the machinic touch, emphasising that a “human curation team” ensures that “a person sees every photo that is coming through.”
Gatekeeping and quality concerns therefore remain crucial for companies whose value is related to the data they stockpile under platform capitalism, even as they celebrate and proclaim their authentic community credentials. In this respect, EyeEm is a particularly fascinating example of how discourses of democratisation and community, education and empowerment, curation and expertise have escaped the museum and become operationalised in computational culture. Founded in 2010, EyeEm built their user base from their Berlin HQ through festivals, workshops, exhibitions, parties, photo contests and partnerships with Getty, Alamy and Apple. They strategically positioned themselves in opposition to the manicured visual trash of rival Instagram, championing photographers and supporting them to improve their skills and secure opportunities. This was followed by a move to effectively monetise the platform in 2014, when EyeEm began to license user content to global brands. A year later, they bought the machine vision startup sight.io and launched EyeEm Vision, a computer vision framework built on deep learning and able to rank image aesthetics and recognise concepts. Today EyeVision automatically captions and keywords each photo uploaded by a user, but can also rank them, giving each a score from 0 to 100. Drawing on the example of automotive photography, co-founder Lorenz Aschoff proclaims: “We do not want all the car photos in the world. We want the best. So we are really obsessed with ranking images. We are unique.”
In this way, the white cube of the photo museum and the black box of the algorithm increasingly cannibalise the values of one another, even as both spheres retreat to the comforting rhetoric of expertise and quality. The language of aesthetic modernism now seeps into computation even at the level of the dataset: EyeEm celebrates its deployment of real photo curators – expert photographers – to curate its database, which in turn empowers its users and brands to “sculpt with data”. As Appu Shaji, EyeEm’s Head of R&D, poetically opines: “A well-curated dataset captures the form and features, and the algorithms are the chisels and hammers that aid in carving out the fine details. The personalized component of our algorithm allows photographers to gain access to the necessary tools to discover, define and share his or her artistic identity, or even taste, with a larger audience.” [9: Emphasis added by the author.]
Technologists such as Shaji recognise that aesthetic computing needs to please the majority without provoking, and yet must not end up in the drudgery of the statistically average or generic. A turn towards ‘personalised aesthetics’ reflects a desire for classification systems to adapt to the individual ‘taste’ of the user, to counter the loss of subjectivity in reductive and universalising photographic measures of ‘beauty’. Curation here is mobilised not to serve the public realm but to reinforce hyperindividualism and offer a frictionless consumer experience in a culture of hypertargeted image consumption. In this context, aesthetics now becomes discursively conflated with style or taste rather than beauty or quality; photographic curation becomes a matter of pattern recognition in the service of (self-)brand(ing) identity.
EyeEm EyeVision promotional images
This new moment of photo sharing feels a world away from the early 2000s, when sites like Flickr seemed to offer a tantalising glimpse of a web in which new collectivities and economies might form around the photographic image, and the “aesthetic brilliance” [10: Olga Goriunova, Art Platforms and Cultural Production on the Internet (London: Routledge, 2012).] of the misfit user might be privileged. Even the humble folksonomy has lost its innocence: tagging, once valorised for its subversive utopian promise, now gives way to machinic tags which linguistically conform to the search terms of brands in need of #lifestyle content which is both #tranquil yet #urban, showing #oneperson who is also a #realperson in #casualclothing. In parallel, the empowered citizen curator who joyously edits cultural metadata is increasingly overshadowed by what Jara Rocha has termed “the subaltern curator”: [11: A term she proposed in the roundtable “Artificial ‘Artificial Imaginations’: Some Provocations in the Curating of Photography from the 21st Century” at Foto Colectania, http://fotocolectania.org/web/app_dev.php/en/activity/239/quot-imaginaciones-artificiales-quot-artificiales-algunas-provocaciones-en-el-comisariado-de-la-fotografia-del-siglo-xxi] an anonymous Amazon Mechanical Turker who spends their days in a digital sweatshop tagging and evaluating photographs for AI.
What then for the photo museum whose purpose is to educate, celebrate and promote an understanding of photographic culture? In my first post I observed how museum education programmes have largely responded to the digital onslaught by re-asserting the significance of visual literacy skills, in which slow looking and an almost monastic devotion to semiotic analysis might discipline us to see beyond the swipe. Stare hard at its interface, and EyeEm reveals itself to be a photo sharing app, a marketplace, a way to celebrate and learn about photography. And yet it is also a corporate machine learning system which can track community responses to stimuli served up by the algorithm in order to further optimise image output. What we need is much more than careful looking at images which demand to be looked at; we need to find ways to repeatedly conjure into view the socio-technical systems which cannibalise and extract value from the photographic. In my final post, I hope to return to these issues.
I have been struggling to write this from Canberra, Australia, which has been clouded by toxic bushfire smoke for several weeks, whilst an environmental apocalypse beyond comprehension unfolds. Wanting to offer a photographic token of this moment to EyeEm, I turn to my camera roll. EyeEm shortlists the best images in my roll, including a view of Black Mountain obscured by smoke in the first days of the haze, which I’d taken from Lake Burley Griffin with a sense of impending doom. I dutifully upload it, noting that EyeEm’s suggested tags include “fog”, “beauty in nature”, “tranquil scene”, “lifestyles” and “real people”, with further suggestions of “travel destinations” and “vacations”. While the world is in crisis, the machine remains blind to the terror and the aesthetics of the Anthropocene.