week 4 | JIZIWEI-at-ITP

Runway ML
Models & Concepts

Images tell stories. Indecisive-moment photographs place people in the middle of carefully designed clues. As spectators look at the frozen frame, uncanny storylines automatically unfold in their minds. Where do humans find plots in an image? What if we show the image to a machine? What narration would it create?

- In an auditorium, a group of blonde girls in red dresses was standing around a poor girl to cut off her hair. Others had their backs to the victim. The atmosphere was subdued and tense.

How do people read the image?

School Play, Julia Fullerton-Batten

How do machines read the image?

Sentence Description

It seems that machines tend to ignore the interactions and relationships between human figures and focus on describing their outfits and gestures. Machines also give a detailed inventory of the scene, whereas people read emotion and atmosphere from the setting.

Will the machine produce a narrative image from text?

I fed both the human description and the machine description of the photograph into a text-to-image model. The outcomes are below. Interestingly, the image generated from the machine-made description is less similar to the original, perhaps because the text does not give a whole picture of the scene (for example, how many people are there). The image generated from the human description shows a color tone and composition similar to the source image, but the action is not clearly presented: the viewer may notice the crowd of girls but has no idea what they are doing.

The curtain is red, the people are in the room. a door is open. A woman is wearing a red dress. A chair is in the middle of the room.

In an auditorium, a group of blonde girls in red dresses was standing around a poor girl to cut off her hair. Others had their backs to the victim. The atmosphere was subdued and tense.

Text to Image Generation with AttnGAN

Do different models show common knowledge?

After converting the text to images, I fed the images back into the sentence description model. "A chair is in the middle" is regenerated quite well. The image generated from "a woman is wearing a red dress" is interpreted as "a woman is wearing a white shirt". The machine's vision and its cognition do not seem to agree with each other. For the image that represents the "subdued and tense atmosphere", the machine even notices a "smiling woman".
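
The comparison above was made by eye. As a rough way to put a number on it, here is a minimal sketch that measures word overlap (Jaccard similarity) between a prompt and the caption that came back. The function names `tokens` and `jaccard` are made up for this illustration and are not part of Runway ML or any of the models used here.

```python
import re

def tokens(sentence):
    """Lowercase the sentence and return its set of distinct words."""
    return set(re.findall(r"[a-z']+", sentence.lower()))

def jaccard(a, b):
    """Share of distinct words that two sentences have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 1.0  # two empty sentences count as identical
    return len(ta & tb) / len(ta | tb)

# Prompt given to the text-to-image model, and the caption read back from its image
prompt = "A woman is wearing a red dress"
recaption = "A woman is wearing a white shirt"

print(f"overlap: {jaccard(prompt, recaption):.2f}")
print("lost words:", sorted(tokens(prompt) - tokens(recaption)))
```

Word overlap is a crude measure: half of the words survive the round trip in this example, yet the words that are lost ("red", "dress") are exactly the ones that carry the visual detail.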

How will a machine develop a story?

I used the human interpretation of the image as a seed and asked the machine to continue the story in the style of Little Women. After the seed sentence that describes the atmosphere, the machine went on with another sentence about the atmosphere. This gave me an idea: if I delete the atmosphere sentence from the seed, will the machine still write about the atmosphere? It did not. Instead, it started with the group, which is also the subject of the seed's last sentence. I tried three times, removing one more line from the seed each time. Two of the outputs keep the same mood; one introduces a boy and some unrelated lines. I also generated images from the Little Women paragraphs.
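
The seed reduction can be sketched in a few lines of Python. This is only an illustration of how the three seeds relate to each other: the seeds may just as well be edited by hand, the helper names `split_sentences` and `reduced_seeds` are hypothetical, and the text generation step itself is not shown.

```python
import re

DESCRIPTION = (
    "In an auditorium, a group of blonde girls in red dresses was standing "
    "around a poor girl to cut off her hair. Others had their backs to the "
    "victim. The atmosphere was subdued and tense."
)

def split_sentences(text):
    """Naive sentence split on whitespace that follows '.', '!' or '?'."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def reduced_seeds(text):
    """Return the full seed, then copies with the last sentence dropped one at a time."""
    sentences = split_sentences(text)
    return [" ".join(sentences[:n]) for n in range(len(sentences), 0, -1)]

# Three seeds: all three sentences, the first two, and the first one only
for seed in reduced_seeds(DESCRIPTION):
    print(repr(seed))
```

Each seed would then be given to the text generation model separately, so that the continuations can be compared side by side.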

GPT-2-Little-Woman

In an auditorium, a group of blonde girls in red dresses was standing around a poor girl to cut off her hair. Others had their backs to the victim. The atmosphere was subdued and tense. There was an undercurrent of complaint and unrest in the air, and a perpetual stream of youthful eyes, which looked straight into the soul of the matron as she cut short the glossy locks that fell limp along her forehead one summer evening. "Oh, young ladies," said Nancy, "do remember that these parts are always tart and rough in May, and do remember to be cheerful, and lend a helping hand, as we've been through, and can bear it."

"We can bear it," was the resounding chorus.

"More than we ought," was the resounding chorus.

In an auditorium, a group of blonde girls in red dresses was standing around a poor girl to cut off her hair. Others had their backs to the victim. They had scarcely turned the corner when the tall, lanky boy, with his enormous frame and pallid face, swept the whole audience into his arms, saying, in his beseeching tones, "I beg pardon, people! It's I, Meg, and I want you to help me!" The girl in the red dress had scarcely clapped when the tall man stretched out his hand, and saying, in a monosyllable breath, as he gave his worldly things, "I'm sorry, I hope?" into her trembling hand. "No, I'm glad!" she whispered, and

In an auditorium, a group of blonde girls in red dresses was standing around a poor girl to cut off her hair. She had found some old crayon as her only makeup, and was smiling ear to ear, as she worked away, with tearless countenance, at a small, dirty brush, in one eye. The colored head turned semi-awake as the eye went on, and stood upon a shelf, holding a gray paper and softly saying, with queer, pondering words, "Why are you crying? I can't help it, it is awful. Are you married? Yes, I'm forty. Can I go to the movies with you?" The head straightened
