HACS Temporal Action Localization Challenge 2021

We hosted the HACS Temporal Action Localization Challenge 2021 at the CVPR'21 International Challenge on Activity Recognition Workshop.

The goal of this challenge is to temporally localize actions in untrimmed videos. This year, we continue the classical fully-supervised learning track and introduce a MODIFIED weakly-supervised learning track. Because high-quality action segment labels are expensive to obtain, while weak labels such as video tags are usually easy to access, weakly-supervised learning is an attractive solution. Participants in the weakly-supervised learning track may use only the action class labels for model training, not the start and end times of the action segments. The two tracks are ranked separately.

For your reference, results of last year's HACS Challenge can be found at HACS Challenge 2020.

Challenge 1: Supervised Learning Track

For this track, participants will use HACS Segments, a video dataset carefully annotated with a complete set of temporal action segments for the temporal action localization task. Each video can contain multiple action segments. The task is to localize these segments by predicting the start time, end time, and action label of each one. Participants may leverage multiple modalities (e.g. audio and video). Common external datasets may be used for pre-training, but their use must be clearly documented. Training and testing will be performed on the following dataset:

HACS Segments

  • Temporal annotations on action segment type, start time, end time.
  • 200 action classes, nearly 140K action segments annotated in nearly 50K videos.
  • 37.6K training videos, 6K validation videos, 6K testing videos.
  • The HACS Clips dataset is NOT permitted in this track.

Winners

In this challenge, we received 53 submissions from 22 teams. The winning teams' results and reports are listed below.

Rank | Team | mAP
First Place | Huazhong University of Science and Technology & Alibaba Group [report] | 44.29
Runner Up | SenseTime Research & SIAT-SenseTime Joint Lab & Shanghai AI Laboratory [report] | 39.91
2020 Challenge First Place | Huazhong University of Science and Technology & DAMO Academy, Alibaba Group [report] | 40.53
2020 Challenge Runner Up | VIS, Baidu Inc. & Shanghai Jiao Tong University [report] | 39.33
Baseline | SSN (re-implemented in the HACS paper) | 16.10

Challenge 2: Weakly-supervised Learning Track

For this track, participants may ONLY use the action type labels of the HACS Segments dataset for training, NOT the start and end time annotations of the action segments. Participants are encouraged to explore weakly-supervised training procedures to learn action localization models. Testing will still be performed on the test set of HACS Segments.

HACS Segments with Action Type ONLY

  • Action type annotations on the videos.
  • 200 action classes, 140K action type annotations on nearly 50K videos.
  • 37.6K training videos, 6K validation videos, 6K testing videos.
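One way to prepare the weak labels for this track is to collapse the segment annotations into per-video action tags and discard the times. A minimal sketch, assuming the annotations have already been loaded into a dict mapping video IDs to ActivityNet-style entries with a "subset" field and an "annotations" list of {"label", "segment"} items; this layout, the subset names, and the helper name `video_level_labels` are assumptions, not an official loader.

```python
from typing import Dict, List


def video_level_labels(database: Dict[str, dict],
                       subset: str = "training") -> Dict[str, List[str]]:
    """Reduce segment annotations to per-video action-type tags.

    Assumed layout (hypothetical): database[video_id] = {
        "subset": "training",
        "annotations": [{"label": str, "segment": [start, end]}, ...]}
    The "segment" times are deliberately ignored.
    """
    tags: Dict[str, List[str]] = {}
    for video_id, entry in database.items():
        if entry.get("subset") != subset:
            continue
        # A set merges repeated segments of the same class into one tag.
        tags[video_id] = sorted({ann["label"] for ann in entry.get("annotations", [])})
    return tags
```

This sketch keeps only the set of classes per video; any further use of the annotations should be checked against the track rules above.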

Winners

In this challenge, we received 17 submissions from 13 teams. The winning teams' results and reports are listed below.

Rank | Team | mAP
First Place | Huazhong University of Science and Technology & Alibaba Group [report] | 22.45
First Place with extra training data | SenseTime Research & SIAT-SenseTime Joint Lab & Shanghai AI Laboratory [report] | 29.78
Runner Up | Xi'an Jiaotong University & State University of New York at Buffalo [report] | 21.68

Data Download

Please follow the instructions on THIS PAGE to download the HACS Segments dataset.

You may also use our pretrained I3D features (extracted at 2 FPS) on HACS Segments directly for the challenge. The I3D-50 model is pretrained on Kinetics-400; it takes 16-frame clips as input and outputs a 2048-D feature.
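Because the features are sampled at 2 FPS, a model that predicts boundaries on the feature sequence has to map feature indices back to seconds before writing a submission. A minimal sketch, assuming feature i is aligned to time i / 2 s; the exact temporal placement of each 16-frame window is not stated here, and the helper name `indices_to_segment` is hypothetical.

```python
from typing import Tuple

FEATURE_FPS = 2.0  # I3D features are extracted at 2 FPS


def indices_to_segment(start_idx: int, end_idx: int,
                       fps: float = FEATURE_FPS) -> Tuple[float, float]:
    """Convert a half-open feature index range [start_idx, end_idx) to seconds.

    Assumption: feature i corresponds to time i / fps. If the features are
    centered on their 16-frame windows instead, expect an offset of up to
    about 1 / fps seconds.
    """
    if end_idx <= start_idx:
        raise ValueError("end_idx must be greater than start_idx")
    return start_idx / fps, end_idx / fps
```

For example, `indices_to_segment(11, 24)` gives `(5.5, 12.0)`.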


Evaluation Metric

We use mAP as the evaluation metric, identical to the ActivityNet temporal localization metric.

Interpolated Average Precision (AP) is used to evaluate the results for each activity category, and the AP is then averaged over all categories to obtain the mAP. A detection counts as a true positive if its temporal intersection over union (tIoU) with a ground-truth segment is greater than or equal to a given threshold (e.g. tIoU ≥ 0.5). The official metric for this task is the average mAP, defined as the mean of the mAP values computed at tIoU thresholds from 0.5 to 0.95 (inclusive) in steps of 0.05, i.e. 10 thresholds.
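The computation above can be sketched in plain Python. This is a simplified re-implementation modeled on the ActivityNet detection evaluation, not the official scoring code: it assumes each ground-truth segment can be matched at most once, that predictions are processed in descending score order, and that AP uses all-point interpolation. The function names and data layout are hypothetical; use the official evaluation for reported numbers.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float]
Prediction = Tuple[str, float, Segment]  # (video_id, score, segment)

# 0.50, 0.55, ..., 0.95 (10 thresholds, inclusive)
THRESHOLDS = [round(0.5 + 0.05 * i, 2) for i in range(10)]


def tiou(a: Segment, b: Segment) -> float:
    """Temporal intersection over union of two [start, end] segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def class_ap(preds: List[Prediction], gts: Dict[str, List[Segment]], thr: float) -> float:
    """Interpolated AP for one class at one tIoU threshold."""
    n_gt = sum(len(segs) for segs in gts.values())
    if n_gt == 0:
        return 0.0
    used = {vid: [False] * len(segs) for vid, segs in gts.items()}
    tp = []
    # Visit predictions from highest to lowest confidence.
    for vid, _, seg in sorted(preds, key=lambda p: p[1], reverse=True):
        hit = 0
        # Try this video's ground truths in order of decreasing tIoU;
        # each ground truth may be matched at most once.
        cands = sorted(((tiou(seg, g), j) for j, g in enumerate(gts.get(vid, []))), reverse=True)
        for iou, j in cands:
            if iou < thr:
                break
            if not used[vid][j]:
                used[vid][j] = True
                hit = 1
                break
        tp.append(hit)
    prec, rec, cum = [], [], 0
    for k, t in enumerate(tp, start=1):
        cum += t
        prec.append(cum / k)
        rec.append(cum / n_gt)
    # Make precision non-increasing in recall, then sum the area under the curve.
    mprec = [0.0] + prec + [0.0]
    mrec = [0.0] + rec + [1.0]
    for i in range(len(mprec) - 2, -1, -1):
        mprec[i] = max(mprec[i], mprec[i + 1])
    return sum((mrec[i] - mrec[i - 1]) * mprec[i] for i in range(1, len(mrec)))


def average_map(preds_by_class: Dict[str, List[Prediction]],
                gts_by_class: Dict[str, Dict[str, List[Segment]]]) -> float:
    """Mean over THRESHOLDS of the mAP across all ground-truth classes."""
    per_thr = []
    for thr in THRESHOLDS:
        aps = [class_ap(preds_by_class.get(c, []), gts_by_class[c], thr) for c in gts_by_class]
        per_thr.append(sum(aps) / len(aps))
    return sum(per_thr) / len(per_thr)
```

As a sanity check, a single prediction [0, 5] against a single ground truth [0, 10] has tIoU 0.5, so it is a true positive only at the 0.5 threshold, and the average mAP is 1/10 = 0.1.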


Submission

BOTH tracks are evaluated on the test set of HACS Segments. Submit a JSON file in the following format, compressed into a .zip archive, where each video ID maps to a list of predicted action segments. In addition, send a short report on your method (no page requirement) to HangZhao AT csail.mit.edu.

{
  "results": {
    "--0edUL8zmA": [
      {
        "label": "Dodgeball",
        "score": 0.84,
        "segment": [5.40, 11.60]
      },
      {
        "label": "Dodgeball",
        "score": 0.71,
        "segment": [12.60, 88.16]
      }
    ]
  }
}
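A minimal sketch of packaging predictions into this format, assuming they are already held in a Python dict keyed by video ID and that segment times are in seconds as in the example. The helper name `write_submission` and the default file names are hypothetical; the required archive and file names are not specified here.

```python
import json
import os
import zipfile
from typing import Dict, List


def write_submission(results: Dict[str, List[dict]],
                     json_path: str = "results.json",
                     zip_path: str = "submission.zip") -> str:
    """Write predictions in the format above and compress the JSON into a .zip.

    results maps each video ID to a list of {"label", "score", "segment"} dicts.
    File names are placeholders, not names required by the evaluation server.
    """
    for vid, preds in results.items():
        for p in preds:
            start, end = p["segment"]
            # Basic sanity check: a segment must end after it starts.
            if not start < end:
                raise ValueError(f"{vid}: invalid segment {p['segment']}")
    with open(json_path, "w") as f:
        json.dump({"results": results}, f)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Store the JSON at the archive root rather than under its full path.
        zf.write(json_path, arcname=os.path.basename(json_path))
    return zip_path
```

Note that `json.dump` cannot serialize NumPy scalars, so convert scores and times with `float(...)` first if they come from a NumPy array.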

Important Dates

  • April 5, 2021: Challenge is announced, Train/Val/Test sets are made available.
  • May 5, 2021: Evaluation server opened. (DELAYED)
  • June 12, 2021: Evaluation server closed.
  • June 14, 2021: Deadline for submitting the report.
  • June 19, 2021: Challenge workshop at CVPR 2021.

Please contact ZhaoHang0124 AT gmail.com with any further questions.


Sponsorship

We still have a few sponsorship openings. All funds will be used to reward participants based on the results of their submissions. If you are interested, please check the HACS Sponsor Package and contact Hang Zhao to become a sponsor!