The RCSB Protein Data Bank (PDB)
http://www.rcsb.org/pdb/
Archive of experimentally determined 3-D structures of biological macromolecules, originally established at Brookhaven National Laboratory.
|
Dataset generator
http://www.datgen.com/
Datgen, formerly SCDS, is a computer program that generates data to systematically test programs that consume data. These synthetic datasets ...
|
DELVE - Data for Evaluating Learning in Valid Experiments
http://www.cs.utoronto.ca/~delve/
Data for Evaluating Learning in Valid Experiments: a standardized environment designed to evaluate the performance of methods that learn relationships based ...
|
UCI Machine Learning Repository
http://www.ics.uci.edu/~mlearn/MLRepository.html
A repository of databases, domain theories and data generators that are used by the machine learning community for the empirical ...
|
TREC Data
http://trec.nist.gov/data.html
Text datasets used in information retrieval and learning in text domains.
|
National Space Science Data Center
http://nssdc.gsfc.nasa.gov/
Provides access to a wide variety of astrophysics, space physics, solar physics, lunar and planetary data from NASA space flight ...
|
The StatLib Datasets Archive
http://lib.stat.cmu.edu/datasets/
A repository of datasets used in statistics and machine learning.
|
NIST Special Database 4
http://www.nist.gov/srd/nistsd4.htm
This NIST database contains 2,000 pairs of 8-bit grayscale fingerprint images.
|
Face recognition dataset
http://www.cs.cmu.edu/afs/cs.cmu.edu/user/avrim/www/ML94/face_homework.html
A dataset of face images for face recognition algorithms.
|
Time Series Data Library
http://www-personal.buseco.monash.edu.au/~hyndman/TSDL/
A collection of over 500 time series, maintained by Rob Hyndman. Time series are organized by subject.
|
Penn Treebank Project
http://www.cis.upenn.edu/~treebank/
A corpus of parsed sentences, used by many researchers as training data for data-driven parsing algorithms.
|
HS3D - Homo Sapiens Splice Sites Dataset
http://www.sci.unisannio.it/docenti/rampone/
HS3D (Homo Sapiens Splice Sites Dataset) is a database of Homo sapiens exon, intron and splice regions extracted from GenBank ...
|
Learning Relational Concepts from Sensor Data of a Mobile Robot
http://www-ai.cs.uni-dortmund.de/FORSCHUNG/PROJEKTE/BLEARN2/data-sets.html
A collection of datasets, each represented in first-order logic. Maintained at the University of ...
|
Web->KB dataset
http://www.cs.cmu.edu/afs/cs.cmu.edu/project/theo-11/www/wwkb/
Web pages partitioned into classes, with hyperlink data. The dataset has been used for text categorization and learning to extract ...
|
WordSimilarity-353 Test Collection
http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/wordsim353.html
Contains 353 English word pairs along with human-assigned similarity judgements.
|
RISE: Repository of Information Sources Used in Information Extraction Tasks
http://www.isi.edu/info-agents/RISE/
Repository of online information sources: test domains for information extraction and wrapper generation tools that learn extraction rules (extraction patterns).
|
Reuters-21578 Text Categorization Corpus
http://www.daviddlewis.com/resources/testcollections/reuters21578/
A classic benchmark for text categorization algorithms.
|