After trying out CLIP (https://openai.com/blog/clip/) on an AV archive, I found that it has certain quirks. For instance, it is quite sensitive to text: querying for "a school" returns this shot of a school sign (4 seconds into https://openbeelden.nl/media/631278). https://twitter.com/nannevn/status/1361348647326261252
Because early AV material contains intertitles, this sometimes skews the results. A query for "wife" returns this frame (04:14 in https://openbeelden.nl/media/685469).
Yet "husband" matches most closely with this frame (04:13 in https://archive.org/details/oi3948). Obviously, what is returned depends on both the CLIP representation and what is available in the archive. So are the results due to archive bias or model bias?
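To make the "model or archive?" question concrete, here is a toy sketch of how CLIP-style text-to-image retrieval usually ranks results: embed the query text and every frame, then return the frame with the highest cosine similarity. This is an assumption about the setup, not the exact pipeline used here, and the 3-d vectors and frame names below are hypothetical stand-ins for real CLIP embeddings.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_match(query_vec, frames):
    """Return the id of the frame whose embedding is most similar to the query."""
    return max(frames, key=lambda frame_id: cosine(query_vec, frames[frame_id]))

# Hypothetical 3-d toy embeddings; real CLIP embeddings have hundreds of dimensions.
query = [1.0, 0.2, 0.0]  # stands in for the text embedding of "husband"
frames = {
    "intertitle_husband": [0.9, 0.3, 0.1],  # hypothetical frame with the word on screen
    "couple_at_table": [0.6, 0.6, 0.4],     # hypothetical frame actually showing a couple
}

print(top_match(query, frames))  # intertitle_husband

# Same "model" (same vectors) and same query, different archive: drop the intertitle frame.
smaller_archive = {k: v for k, v in frames.items() if k != "intertitle_husband"}
print(top_match(query, smaller_archive))  # couple_at_table
```

The top hit is an argmax over whatever the archive happens to contain, so changing either the embeddings (the model) or the candidate set (the archive) can change the answer. That is why a single result cannot, on its own, tell you which of the two is responsible.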
To pull these biases apart, I tried queries with keywords related to ethnicity. Results for "A European woman" (02:36 in https://openbeelden.nl/media/79929) and "An American woman" (01:08 in https://openbeelden.nl/media/7732): the left one is correct, while the right one appears to be a Dutch woman.
It can be hard to judge correctness from the image alone, but the query "An Asian woman" returns this portrait of Joanna of Castile (01:12 in https://openbeelden.nl/media/8139), perhaps hinting at certain biases in the model?
Querying for "A Southeast Asian woman" returns a picture (01:21 in https://openbeelden.nl/media/16627) of a "Dutch East Indian garden party" near The Hague in 1927. Given Dutch colonial history, it is perhaps not surprising that this turns up in a Dutch AV archive.
But the query "An African woman" (00:46 in https://openbeelden.nl/media/27849) returns a Van Gogh sketch, "Head of a woman" (https://www.vangoghmuseum.nl/en/collection/d0362V1962), which depicts a woman from the Nuenen (NL) area. For me, this confirms that in addition to the archive's bias, CLIP also has strong biases.
Computer vision is often promoted as a way to open up archives and make them explorable (I'm guilty of this too), but it is important to investigate and be aware of the biases CV models might add on top of existing archival biases.
You can follow @nannevn.