
To become proficient in data science, you need to practice on a variety of projects. Data science is best learned by doing. To practice, build models and visualize data, the first thing you need is datasets. In fact, without data, there is no data science. With that in mind, we have scoured the internet and compiled a list of thirty open data source websites that offer data on almost every subject.

The projects you create while learning may prove useful in the near future when you start your job hunt. Make sure you keep every project you implement. These are your portfolio projects, and you can present them to potential employers for consideration.

Go through the list of websites below to find datasets you may be interested in.

1) KAGGLE

Kaggle is probably the largest online data science community. The Google-owned platform lets users find and publish datasets, build models and work with other data scientists in a web-based environment. Apart from the thousands of datasets available for practice, Kaggle also hosts data science challenges and competitions that can enhance your learning experience. There are also data science tutorials for beginners and Kaggle Kernels, a cloud-based workbench that lets you share your projects in Python and R.

If you have mastered some data science concepts and would like to get a job, the Kaggle jobs board is there to help. If you are interested in learning data science, make Kaggle your friend and you will never get lost.

Kaggle Website

2) UCI MACHINE LEARNING REPOSITORY

This is a great site for finding datasets for your machine learning projects. It is widely used by students, researchers and educators all over the world. The data is neatly categorized by data type, attributes and subject area. There is plenty of data in areas such as the sciences, business and games. All you have to do is search for the datasets you are interested in.

UCI Website

3) WORLD BANK

The World Bank publishes a huge amount of data on different countries and regions. There is data on censuses, demographics, health, agriculture, income, GDP, etc. It is a great platform where you can search for almost any kind of data you are interested in.

World Bank Website

4) IMF

The International Monetary Fund has most of the financial data you need. Data on IMF lending, exchange rates and other economic and financial indicators for all member countries is available. If you are working on financial modelling and analysis projects, you should check out this website.

IMF Website

5) AMAZON REVIEWS

Amazon is the world's largest marketplace, with millions of visitors every month. As a result, a colossal amount of data is generated daily. This dataset consists of close to thirty-five million consumer reviews, including product ratings and user information, spanning a period of 18 years up to 2013. You can go through the various categories and practice while learning.

Amazon Review Website

6) CLIMATE DATA ONLINE

Climate Data Online (CDO) provides free access to NCDC’s archive of global historical weather and climate data in addition to station history information. These data include quality controlled daily, monthly, seasonal, and yearly measurements of temperature, precipitation, wind, and degree days as well as radar data and 30-year Climate Normals. Customers can also order most of these data as certified hard copies for legal use.

CDO Website

7) US CENSUS DATA

The United States Census Bureau provides data about US citizens and their economy, population, housing and workforce, along with other facts and figures. You can obtain these and more datasets from the link below.

US Census Website

8) DATA.GOV

Managed and hosted by the U.S. General Services Administration, Technology Transformation Service, this is another huge source of open data available for research, data manipulation and visualization. Data on climate, agriculture, local government, maritime topics, etc. is available. You can search by keyword for the data you are interested in. Some datasets are downloadable, while others are links to websites or apps that help you access or use the data.

Data.gov Website

9) BUREAU OF ECONOMIC ANALYSIS

The BEA is an agency of the US Department of Commerce. Data on US gross domestic product (GDP), foreign trade and investment, and industry is available for research and analysis on this website.

Bureau of Economic Analysis

10) UK DATA SERVICE

The UK Data Service was created to meet the data needs of researchers, students and people from all sectors, including academia, central and local government, charities and foundations, independent research centers, business consultants and the commercial sector. It offers UK government-sponsored surveys, UK census data, business data and qualitative data. The data is available for anyone to use, provided you register.

UK Data Service Website

11) BUREAU OF LABOR STATISTICS

The Bureau of Labor Statistics has data on labor market activity, working conditions, price changes, inflation, pay and benefits, and productivity in the US economy.

Bureau of Labor Statistics Website

12) ENRON EMAIL DATASET

This dataset was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. You can check the website below for more details.

Enron Email Website

13) FEDERAL RESERVE

You can access over 500,000 financial and economic data series from more than 85 public and proprietary sources based in the United States. Data on currency, interest rates, inflation etc. is also available.

Federal Reserve Website

14) OPEN DATA FOR AFRICA

If you are looking for datasets specific to Africa, there are plenty on this website. Data on energy, infrastructure, monetary statistics, governance, the environment, etc. can be found here. You can browse the data by country or search for what you are interested in.

Open Data Website

15) GROUPLENS

GroupLens Research has made rating datasets from the MovieLens website available. At least 25 million movie ratings are available on this site. If you are interested in movie analysis, check out this site.

GroupLens Website
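If you want a first exercise with ratings data like this, averaging the ratings per movie is a good place to start. The snippet below is only a minimal sketch: it assumes the ratings come as a CSV file with `userId`, `movieId`, `rating` and `timestamp` columns, and it uses a handful of made-up rows instead of a real download. The function name `mean_rating_per_movie` is ours, not part of any library.

```python
import csv
import io
from collections import defaultdict

# Toy stand-in for a ratings file. The column names are an assumption about
# the file layout, and the rows are made up for illustration.
SAMPLE_RATINGS_CSV = """userId,movieId,rating,timestamp
1,10,4.0,964982703
1,20,3.0,964981247
2,10,5.0,964982224
3,20,2.0,964983815
3,30,4.5,964982931
"""


def mean_rating_per_movie(csv_file):
    """Return {movieId: mean rating} for a file-like object of CSV ratings."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(csv_file):
        movie_id = int(row["movieId"])
        totals[movie_id] += float(row["rating"])
        counts[movie_id] += 1
    return {movie_id: totals[movie_id] / counts[movie_id] for movie_id in totals}


means = mean_rating_per_movie(io.StringIO(SAMPLE_RATINGS_CSV))
for movie_id, mean in sorted(means.items()):
    print(movie_id, round(mean, 2))
```

To try it on a real file, you could pass `open("ratings.csv", newline="")` in place of the `io.StringIO` object, provided the file you downloaded uses the same column names.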

16) QUANDL

Quandl has a huge amount of financial and economic data. If your projects are centred around financial analysis, this is where you can find data that may help you in your analysis.

Quandl Website

17) YELP OPEN DATASETS

The Yelp dataset is a subset of Yelp's businesses, reviews, and user data for personal, educational, and academic use. It is available as JSON files, and you can use it to teach students about databases, to learn NLP, or as sample production data while you learn how to make mobile apps. It contains millions of reviews.

Yelp Website
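Since the Yelp data comes as JSON files, it helps to know how to read such files without loading everything into memory at once. The sketch below assumes one JSON object per line, which is an assumption about the file layout rather than something stated above. The records and the field names (`business_id`, `stars`, `text`) are made up for illustration.

```python
import io
import json
from collections import Counter

# Made-up records in a one-JSON-object-per-line layout. The field names are
# hypothetical and only serve the example.
SAMPLE_REVIEWS = """{"business_id": "b1", "stars": 5, "text": "Great coffee"}
{"business_id": "b2", "stars": 2, "text": "Slow service"}
{"business_id": "b1", "stars": 4, "text": "Nice staff"}
"""


def read_json_lines(lines):
    """Yield one dict per non-empty line, without loading the whole file."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


# Count how many reviews each business received.
reviews_per_business = Counter(
    record["business_id"] for record in read_json_lines(io.StringIO(SAMPLE_REVIEWS))
)
print(reviews_per_business.most_common())
```

Because `read_json_lines` is a generator, the same loop should also work on a large file opened with `open(path, encoding="utf-8")`, one record at a time.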

18) MOVIE REVIEW DATASETS

This is a dataset for binary sentiment classification. There are 25,000 movie reviews and you can use them in your projects.

Movie Review Data
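If binary sentiment classification is new to you, here is a rough idea of what it looks like in code. This is only a toy sketch of a Naive Bayes classifier with add-one smoothing, written with the standard library. The four training reviews are hand-written for the example and do not come from the dataset, and the names `train_naive_bayes` and `predict` are ours.

```python
import math
from collections import Counter

# Hand-written toy reviews, not taken from any real dataset.
TRAIN = [
    ("a wonderful moving film", "pos"),
    ("great acting and a great story", "pos"),
    ("boring plot and terrible acting", "neg"),
    ("a dull terrible film", "neg"),
]


def train_naive_bayes(examples):
    """Count words per label and documents per label."""
    word_counts = {"pos": Counter(), "neg": Counter()}
    doc_counts = Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts["pos"]) | set(word_counts["neg"])
    return word_counts, doc_counts, vocab


def predict(text, model):
    """Return the label with the higher log-probability (add-one smoothing)."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for word in text.split():
            if word in vocab:  # skip words never seen in training
                count = word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(TRAIN)
print(predict("a great film", model))
print(predict("terrible and boring", model))
```

With the real reviews you would replace `TRAIN` with the labeled training reviews and check accuracy on reviews the model has not seen. In practice, most people would reach for a library such as scikit-learn rather than hand-rolling the counts.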

19) MICROSOFT COCO

COCO has about 330,000 images, most of them labeled. You can download the datasets and explore them.

Coco Website

20) TWITTER SENTIMENTS

If you are working on sentiment analysis projects, be sure to check out this website, which has a great many resources that can add value to your work.

Twitter Sentiments Website

21) AIRBNB

The data behind the Inside Airbnb site is sourced from publicly available information from the Airbnb site.

The data has been analyzed, cleansed and aggregated where appropriate to facilitate public discussion. The dataset on this website is available under a Creative Commons license.

Airbnb Data

22) NIST

The National Institute of Standards and Technology has some datasets you can explore.

NIST Website

23) REDDIT

A Reddit community where you can find, share and discuss datasets. You can join the community and post a request for the datasets you are looking for, and other users will help you.

Reddit Website

24) IMAGENET

This is an image database with more than 14 million images available to researchers, educators and students. If you are working on an image classification project, you can check out this site.

http://image-net.org/

25) GOOGLE

Google's Open Images dataset has approximately 9 million URLs to images that have been annotated with labels spanning over 6,000 categories.

Google Datasets

26) BELGIAN TRAFFIC SIGNS

The dataset here is related to traffic sign recognition.

Traffic Signs Data

27) STANFORD DOGS

The Stanford Dogs dataset contains images of 120 breeds of dogs from around the world. You can download them and use them in your projects.

Stanford Data

28) BERKELEY DEEPDRIVE

This is probably the largest driving video dataset, with 100,000 videos and 10 tasks to evaluate. You can download the data, create models and train algorithms.

Berkeley Data

29) UCSD LISA

The Laboratory for Intelligent and Safe Automobiles has a large number of datasets on traffic signals, vehicle detection, etc.

UCSD Data

30) INDOOR SCENE RECOGNITION

This is a database containing 67 indoor categories and a total of 15,620 images. The number of images varies across categories, but there are at least 100 images per category.

Indoor Scene Recognition Data

