CVSI2017

Important dates:

Feb 1, 2017 : Website online
Feb 1, 2017 : Registration Open
Feb 20, 2017: Sample dataset available
~~March 15, March 24, April 14~~
April 21, 2017: Training dataset available
~~March 25, April 3, April 21~~
April 28 2017: Validation dataset available
~~March 31, April 15, May 1~~
June 30, 2017: Registration closes
~~April 5, April 18, May 5~~,
June 30, 2017: Submission of systems
~~April 7, April 20, May 10~~,
May 20, 2017: Test dataset available
~~April 10, April 24, May 17~~,
June 30, 2017: Submission of results

Latest News

Feb 1, 2017 : Website online, registration open
Feb 20, 2017 : Sample Dataset available
March 15, 2017 : Deadline extension
March 31, 2017 : Following the extension of the ICDAR2017 full paper submission deadline, the training dataset will now be available on 14th April 2017 to the registered participants
April 21, 2017 : The training dataset is now available to the registered participants
April 30, 2017 : The validation dataset is now available to the registered participants
May 29, 2017 : The test dataset is available after system submission
May 29, 2017 : Competition deadline extend

Dataset

A dataset comprising video words for each of the fifteen scripts will be provided. The dataset contains at least 2000 words (Colour) for each script extracted from various sources (news video, sports video, etc.). The word images will be in JPG format and the corresponding ground truth information will be in XML file. There will be an XML file corresponding to a word image file having the same file name. XML files will contain the script information and image properties. Sample word images for each of the fifteen scripts are shown in the figure below.

Figure: Samples of video word images. 1st row: Arabic, Bengali, English, Gujarati, Hindi; 2nd row: Kannada, Oriya, Punjabi, Tamil; 3rd Row: Telugu, Malayalam, Chinese, Thai; 4th row: Japanese and Korean.

The dataset is divided into three parts namely, Training set (60%), Validation set (10%) and Test Set (30%). The dataset will be made publicly available after the competition for research purpose via the competition website, CVC dataset site and IAPR TC-11.

Dataset Usage:

The CVSI 2017 dataset contains video words of 15 different scripts/languages, namely, English, Hindi(Devnagari), Bengali(Bangla), Oriya, Gujrathi, Punjabi, Kannada, Tamil, Telegu, and Arabic.

The dataset is made publicly available ONLY for RESEACH PURPOSES. Use of dataset for commercial purposes/applications is not allowed.

Reference:

[1] Nabin Sharma, Ranju Mandal, Rabi Sharma, Umapada Pal, and Michael Blumenstein, "ICDAR2015 Competition on Video Script Identification (CVSI 2015)", In Proc. 13th International Conference of Document Analysis and Recognition (ICDAR 2015), pp. 1196-1200, 2015, [pdf].