CVSI 2017



ICDAR2017 Competition on Video Script Identification

More Website Templates @ Templates.com!

Important dates:

  • Feb 1, 2017 : Website online
  • Feb 1, 2017 : Registration Open
  • Feb 20, 2017: Sample dataset available
  • March 15, March 24, April 14
    April 21, 2017: Training dataset available
  • March 25, April 3, April 21
    April 28 2017: Validation dataset available
  • March 31, April 15, May 1
    June 30, 2017: Registration closes
  • April 5, April 18, May 5,
    June 30, 2017: Submission of systems
  • April 7, April 20, May 10,
    May 20, 2017: Test dataset available
  • April 10, April 24, May 17,
    June 30, 2017: Submission of results

Latest News

  • Feb 1, 2017 : Website online, registration open
  • Feb 20, 2017 : Sample Dataset available
  • March 15, 2017 : Deadline extension
  • March 31, 2017 : Following the extension of the ICDAR2017 full paper submission deadline, the training dataset will now be available on 14th April 2017 to the registered participants
  • April 21, 2017 : The training dataset is now available to the registered participants
  • April 30, 2017 : The validation dataset is now available to the registered participants
  • May 29, 2017 : The test dataset is available after system submission
  • May 29, 2017 : Competition deadline extend

Dataset

Sample Dataset

A dataset comprising video words for each of the fifteen scripts will be provided. The dataset contains at least 2000 words (Colour) for each script extracted from various sources (news video, sports video, etc.). The word images will be in JPG format and the corresponding ground truth information will be in XML file. There will be an XML file corresponding to a word image file having the same file name. XML files will contain the script information and image properties. Sample word images for each of the fifteen scripts are shown in the figure below.

Word Samples

Figure: Samples of video word images. 1st row: Arabic, Bengali, English, Gujarati, Hindi; 2nd row: Kannada, Oriya, Punjabi, Tamil; 3rd Row: Telugu, Malayalam, Chinese, Thai; 4th row: Japanese and Korean.


Word Samples

The dataset is divided into three parts namely, Training set (60%), Validation set (10%) and Test Set (30%). The dataset will be made publicly available after the competition for research purpose via the competition website, CVC dataset site and IAPR TC-11.

Dataset Usage:

The CVSI 2017 dataset contains video words of 15 different scripts/languages, namely, English, Hindi(Devnagari), Bengali(Bangla), Oriya, Gujrathi, Punjabi, Kannada, Tamil, Telegu, and Arabic.

The dataset is made publicly available ONLY for RESEACH PURPOSES. Use of dataset for commercial purposes/applications is not allowed.




Reference:

[1] Nabin Sharma, Ranju Mandal, Rabi Sharma, Umapada Pal, and Michael Blumenstein, "ICDAR2015 Competition on Video Script Identification (CVSI 2015)", In Proc. 13th International Conference of Document Analysis and Recognition (ICDAR 2015), pp. 1196-1200, 2015, [pdf].