If you are an early-career grad student wondering whether you can ever settle on a research question and find data that can answer it, I feel you. This is a constant exercise, and we get better at it with experience. Let me share some thoughts from my own experience. This post assumes that you want to find a question first and then fit data to that question.
How to find data for your first project?
Finding a research question that motivates you, and finding data that has information in that context, is one of the toughest tasks for applied micro students. I am laying out the things I learnt along the way while "failing" to do so. These are probably the steps, in sequence:
First, the research question: ideas come from field classes, discussions with your mentor and other professors, and reading articles, newspapers, and blogs. The blogs I follow for new ideas are listed under "Links" on my website. And my dissertation adviser (the #Great Prof. Amy Ando) agreed to read some trillion research proposals in five years!
Second, you know the question, at least broadly. Now, how do you get data to answer it? Write the question in two or three different ways. Ask your adviser and field course professors to read it and give you feedback.
Third, now that you know your research question broadly and have feedback from some specialists, read relevant papers that ask similar questions. Find out which data they used: country, context, time period. Make a list and look at those datasets (you should get a sense of the available variables from there).
Fourth, talk to people who work in that area. They may know of a dataset that fits your context, or of a relevant context or policy in some country. For my second-year paper, Prof. Arun Agarwal told me that Nepal has some new policies regarding the question I was interested in. Read the books from my first post to see how to write cold emails, but send those emails either way.
Fifth, keep two options open. Having a backup plan always helps, and I admit that I am not good at having a plan B.
Sixth, data will be desperately unclean, so learn how to clean data. Learn how to spend your whole time with data. My first paper's data was not geocoded; I geocoded it all by myself using the names of the villages. It takes time.
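To give a flavor of what geocoding by village name can look like, here is a minimal sketch in Python. It is not the procedure I used for my paper; it only illustrates one common approach, which is fuzzy-matching the names in your survey against a gazetteer that already has coordinates. The gazetteer, the village names, the coordinates, and the 0.7 similarity cutoff are all made up for illustration.

```python
import difflib

# Hypothetical gazetteer: village name -> (latitude, longitude).
# Names and coordinates are made up for illustration only.
GAZETTEER = {
    "rampur": (27.10, 84.20),
    "lakshmipur": (26.90, 85.00),
    "bhimnagar": (26.70, 86.90),
}


def normalize(name):
    """Lowercase the name and collapse repeated or trailing whitespace."""
    return " ".join(name.lower().split())


def geocode_village(name, gazetteer=GAZETTEER, cutoff=0.7):
    """Return (matched_name, coordinates), or (None, None) if nothing is close.

    Tries an exact match on the normalized name first, then falls back to
    the closest gazetteer entry whose similarity is at least `cutoff`.
    The cutoff value is an assumption and needs tuning on real data.
    """
    key = normalize(name)
    if key in gazetteer:
        return key, gazetteer[key]
    candidates = difflib.get_close_matches(key, list(gazetteer), n=1, cutoff=cutoff)
    if candidates:
        return candidates[0], gazetteer[candidates[0]]
    return None, None


# Hypothetical village names as they might appear in raw survey data.
survey_names = ["Rampur ", "Laxmipur", "Bhim Nagar", "Unknownville"]
results = {raw: geocode_village(raw) for raw in survey_names}

# Names that could not be matched still have to be checked by hand.
unmatched = [raw for raw, (match, _) in results.items() if match is None]

for raw, (match, coords) in results.items():
    print(repr(raw), "->", match, coords)
```

Even with something like this, expect to review the fuzzy matches by hand: two different villages can have nearly identical names, and a wrong match is worse than a missing one.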
Here are some links and sources that might be helpful. If you have a research question on developing countries:
1) LSMS is probably the first place I will always go. You get repeated cross-sectional data for many developing countries. Sometimes they even have panel data. If you are interested in agricultural questions, look at LSMS-ISA.
For most of the countries, it is free. And the World Bank is really great at assisting with data-related questions.
2) ICRISAT also provides information on agriculture and poverty for some countries.
3) IFPRI has data on food and agriculture.
4) Demographic and Health Surveys: DHS is a great source of information for demography and environmental health questions. It's geocoded. It's freely available.
5) India NSSO data: You have to pay as you go by selecting variables and years.
6) Indonesia: Indonesian PODES data is a great source for economic and environmental questions. Village-level panel and annual data. You need to pay as you go.
For environmental questions, there is an increasing demand for satellite data and geocoded information.
Satellite and Spatial Data
1) Forest cover: Global forest cover data is publicly available from 2000 onward (here). Use the gfcanalysis package in R to clean/extract the data.
2) Nightlight: Nighttime light data is also widely used in development economics (here). I have some concerns about using it for village-level economic analysis in developing countries. Use the Rnightlights package in R to clean/extract the data.
3) Admin boundary: You can get admin boundary data and shapefiles from GADM for every country (here).
4) Fire: Near real-time fire data is available from FIRMS (here).
5) Marine data: Global fish watching data is available here.
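A recurring task with these sources is combining point data (say, fire detections) with admin boundaries to get a count per district or village. The sketch below shows the idea in plain Python with a standard ray-casting point-in-polygon test. The district names, the square "boundaries", and the fire points are all invented for illustration; with real shapefiles you would normally let a GIS package do this step, and real boundaries are far messier than two squares.

```python
def point_in_polygon(lon, lat, polygon):
    """Ray-casting test: is the point (lon, lat) inside the polygon?

    `polygon` is a list of (lon, lat) vertices in order. Points that fall
    exactly on an edge are not handled specially in this sketch.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that horizontal line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def count_points_by_unit(points, units):
    """Count how many points fall inside each admin unit."""
    counts = {name: 0 for name in units}
    for lon, lat in points:
        for name, polygon in units.items():
            if point_in_polygon(lon, lat, polygon):
                counts[name] += 1
                break  # assume units do not overlap
    return counts


# Hypothetical admin units as simple squares, vertices given as (lon, lat).
districts = {
    "district_a": [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)],
    "district_b": [(2.0, 0.0), (4.0, 0.0), (4.0, 2.0), (2.0, 2.0)],
}

# Hypothetical fire detections as (lon, lat); the last one is outside both.
fire_points = [(0.5, 0.5), (1.5, 1.0), (3.0, 1.0), (5.0, 5.0)]

fire_counts = count_points_by_unit(fire_points, districts)
print(fire_counts)
```

The point that falls outside every unit is silently dropped here; with real data it is worth counting those too, since many unassigned points usually mean a projection or boundary problem.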
If there is one thing I have learnt, and am still learning, it is this: don't give up on a research question just because you do not have data right now. If your intuition says it could be something, chase it. It is painful, but it is worth it!