Visual AI using your collector data

Hi Everyone,

I wanted to share with you an example of what the AI research community can do with the data that you’ve been contributing. Collectors like you have submitted over 400K videos of 37 simple activities that people perform in public places. These are activities like sitting, opening a door, or using a laptop that can be performed in a few seconds by one or more people. The Visual AI research community uses these videos to teach an AI system to recognize when a person performs one of these activities, even if the system has never seen that person before. We use videos from hundreds of different people in over fifty countries to teach an AI system how people from around the world perform these activities, so that we can accurately identify them in new videos.

Here is an example. Take a look at this video:

This is the output of our Visual AI system on a video that was collected by one of our team members. The video camera was set up on a shelf in a home office and the subject was asked to perform simple activities. Our Visual AI system finds the people (and lazy dogs) and draws boxes around them. When the subject performs one of the 37 activities our system knows about, the system outputs a caption over the box along with a confidence score. To identify these activities in new videos automatically, the system was trained for about two weeks on 8 GPUs using the videos that you have been submitting. We call this task “activity detection”, and we’re pleased with the performance here.
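To make the idea concrete, here is a minimal sketch of what a per-box detector output might look like and how a caption could be gated on the confidence score. The `Detection` record, the `"person"` label convention, and the 0.5 threshold are all hypothetical illustrations, not the actual format our system uses.

```python
from dataclasses import dataclass

# Hypothetical per-box detector output; the real system's format differs.
@dataclass
class Detection:
    box: tuple          # (x1, y1, x2, y2) in pixels
    label: str          # one of the 37 known activities, or just "person"
    confidence: float   # model score in [0, 1]

def caption(det: Detection, threshold: float = 0.5):
    """Return a caption to draw over the box, or None if the score is too low."""
    if det.label != "person" and det.confidence >= threshold:
        return f"{det.label} ({det.confidence:.0%})"
    return None

d = Detection(box=(120, 40, 260, 310), label="person uses laptop", confidence=0.87)
print(caption(d))  # → "person uses laptop (87%)"
```

A plain box with no activity caption would correspond to the `None` case: a person was found but no known activity scored above the threshold.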

However, our job is not done yet. Contrast that with this video.

This video again shows decent activity detection at the beginning. However, at the end (>1:04) we show that a significant challenge still remains. Visual AI can easily get confused when faced with activities that look very similar to the ones it knows about. For example:

  1. “Scratching head” is confidently misclassified as “person talks on phone”.
  2. “Rubbing knuckles” is misclassified as “person texts on phone”.
  3. “Squatting” is confidently misclassified as “person sits down”.
  4. “Tapping ground” is confidently misclassified as “person puts down object”.
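One simple way to surface confusions like the ones above is to tally (true activity, predicted activity) pairs over a held-out set and rank the most frequent ones. The error list below is a hypothetical stand-in mirroring the examples above; real numbers would come from evaluation videos.

```python
from collections import Counter

# Hypothetical (true activity, predicted activity) error pairs; in practice
# these would be collected by scoring the model on held-out videos.
errors = [
    ("scratching head", "person talks on phone"),
    ("rubbing knuckles", "person texts on phone"),
    ("squatting", "person sits down"),
    ("tapping ground", "person puts down object"),
    ("squatting", "person sits down"),
]

# Rank the most frequent confusion pairs; the top entries show where
# harder examples or new, closely related activity classes are needed.
confusions = Counter(errors)
for (truth, pred), n in confusions.most_common(3):
    print(f"{truth!r} -> {pred!r}: {n}")
```

The top-ranked pairs are exactly the places where collecting look-alike activities would force the model to learn finer distinctions.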

This suggests that we still need more data. We need to collect more difficult examples of the activities the system already knows about, and we need to add new activities that are closely related to them. This will challenge the AI system to learn a more robust representation of activities, which will reduce these mistakes. This will be the primary goal of the next collection sprint.

All Visual AI systems are built on the foundation of high-quality data. Your contributions are making an impact, and our research collaborators are all super excited to receive more data from you. I look forward to continuing the mission of ethical and privacy-preserving data collection for large-scale visual AI, and to sharing our progress along the way.



Hi Jeff, this is definitely a good topic to share with us. I believe the accuracy of the AI system is high, but there is a lack of training data for certain activities, as you described. A number of activities need to be improved, or new ones added later on, to differentiate them from one another, such as texting on a phone vs. rubbing knuckles. I am positive that this issue can be solved once we collect much more difficult data. Looking forward to the next collection.



All, here is a sample of the video data that you have been submitting to us. See if you can find yourself in this montage! You’ll notice that the boxes shown here are tighter than what you collect. We use your boxes as “weak labels” to guide a machine learning system to find the person within your box to make the boxes more precise.
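The weak-label refinement described above can be sketched as follows: match each loose collector box against a detector's tighter proposals by intersection-over-union (IoU), and keep the best-overlapping proposal. The IoU threshold and the box format are illustrative assumptions, not the actual pipeline.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def tighten(weak_box, detections, min_iou=0.3):
    """Replace a loose collector box with the best-overlapping detector box."""
    best = max(detections, key=lambda d: iou(weak_box, d), default=None)
    if best is not None and iou(weak_box, best) >= min_iou:
        return best
    return weak_box  # no good match: fall back to the original annotation

weak = (50, 50, 400, 400)                          # loose collector box
dets = [(120, 80, 300, 380), (500, 10, 560, 90)]   # detector proposals
print(tighten(weak, dets))  # → (120, 80, 300, 380)
```

Falling back to the original box when no detection overlaps well keeps the weak label intact rather than discarding the annotation.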

A final note that we’re hard at work to restart collections. We will have an update on the schedule in the next day or two.


Awesome. Ready to start the collection.

Amazing improvements. Looks good.


Awesome results! :clap:


Incredible! Great job! :v:t4:



Hello. I just noticed that my partner is in the montage. Thank you. I am not there though. Thanks all the same.