Mining Dark Data


UC ANR Research and Extension Centers (RECs) are the source of a great deal of valuable research. Records of this work, covering past agricultural, ecological, and climate conditions, are held at the centers as paper archives, images, biological specimens, and digital data. Unfortunately, these sources are hard to access and are therefore in danger of becoming inaccessible altogether in the near future. In other words, ANR research data is in danger of going ‘dark’.

Using Optical Character Recognition (OCR) and Natural Language Processing (NLP), I extracted information of interest from the documents. This gives other researchers a starting point for discovering, and potentially benefiting from, existing research findings.
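
To make the idea concrete, here is a minimal sketch of the text-mining step. It assumes the OCR stage has already turned a scanned page into plain text, and it uses only simple regular expressions and word counts in place of a full NLP toolkit. The function name `extract_info`, the stop-word list, and the sample text are all hypothetical illustrations, not the project's actual pipeline or data.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "of", "and", "in", "a", "to", "at", "on", "for", "was", "were", "from"}


def extract_info(ocr_text: str, top_n: int = 5) -> dict:
    """Pull years and frequent terms out of text already produced by OCR."""
    # Rejoin words hyphenated across line breaks, then collapse whitespace.
    # Both problems are common in OCR output.
    text = re.sub(r"-\s*\n\s*", "", ocr_text)
    text = re.sub(r"\s+", " ", text).strip()

    # Four-digit years from 1800 to 2099, deduplicated and sorted.
    years = sorted({int(y) for y in re.findall(r"\b(1[89]\d{2}|20\d{2})\b", text)})

    # Lowercased alphabetic tokens, minus stop words and very short tokens.
    tokens = [
        t for t in re.findall(r"[a-z]+", text.lower())
        if t not in STOP_WORDS and len(t) > 2
    ]
    keywords = [word for word, _ in Counter(tokens).most_common(top_n)]

    return {"years": years, "keywords": keywords}


# Made-up sample standing in for the OCR output of an archived report.
sample = """Irrigation trial, 1957. Alfalfa yield was meas-
ured at the field station. Alfalfa plots were irrigated in 1958."""

result = extract_info(sample)
print(result)
```

Even this crude summary (which years a document mentions and which terms dominate it) is enough to build a searchable index, so that a researcher can find a relevant archive box without reading every page.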

I worked on this project as a graduate student researcher in the Kelly Lab, under Professor Maggi Kelly, at UC Berkeley.
