Microsoft Research working on an AI that can tell the story of your holiday snaps
AI that can describe pictures is already at work on the internet, adding captions to pictures on Facebook, for example. Now Microsoft Research is taking the technology to the next level, asking its AI to tell the story of a series of pictures.
For instance, while an image captioning program might take five images and say, “This is a picture of a family; this is a picture of a cake; this is a picture of a dog; this is a picture of a beach,” the storytelling program might take those same images and say, “The family got together for a cookout; they had a lot of delicious food; the dog was happy to be there; they had a great time on the beach; they even had a swim in the water.”
In the future, computerized storytelling could help people automatically generate tales for slideshows of images they upload to social media, said Margaret Mitchell, a computer scientist at Microsoft Research and senior author of the study. “You’d help people share their experiences while reducing nitty-gritty work that some people find quite tedious,” she said. Computerized storytelling “can also help people who are visually impaired, to open up images for people who can’t see them.”
“The goal is to help give AIs more human-like intelligence, to help it understand things on a more abstract level: what it means to be fun or creepy or weird or interesting,” Mitchell said. “People have passed down stories for eons, using them to convey our morals and strategies and wisdom. With our focus on storytelling, we hope to help AIs understand human concepts in a way that is very safe and beneficial for mankind, rather than teaching it how to beat mankind.”
If AI ever learns to tell stories based on sequences of images, “that’s a stepping stone toward doing the same for video,” Mitchell said. “That could help provide interesting applications. For instance, for security cameras, you might just want a summary of anything noteworthy, or you could automatically live tweet events,” she said.
The researchers will present their findings in San Diego at the annual meeting of the North American Chapter of the Association for Computational Linguistics later this month.