American Coatings Show 2018

12.6 Robots Reading Recipes: A Semantic Framework for Coatings Science (Room 234-236)

10 Apr 18
5:00 PM - 5:30 PM

Tracks: Session 12: Measuring & Testing II, Session 12: Measuring, Testing & Automation

Natural language processing tools are data workflows built from software and machine learning models. They extract structured, semantic information from English text and have been used to build online chatbots and to quickly assess sentiment about a product or brand on social media platforms. Natural language processing and text mining tools have not been fully leveraged in the chemical sciences, and certainly not in coatings science and engineering. Here, we propose a rules-based, template-assisted framework for the semantic recording and delivery of coatings formulation information. This framework may be combined with text mining and natural language processing tools to extract chemical entity information, as well as the formulation relationships between those entities. The resulting information is automatically ‘tidy’ and ready for data mining and modeling, without the need for arduous spreadsheet data entry and manipulation. The proposed framework allows for easy dissemination of formulation information, be it in trade journals, technical data sheets, or peer-reviewed publications, or within or between industrial organizations and laboratories. The objective is to transform the ubiquitous formulation sheet, the format of which varies from organization to organization, into a simple, prose-based format that is both information dense and reader friendly for human and robot readers alike. The proposed framework has the capability to turn ‘information archeology’ into an automated process, eventually leading to machine-led analysis and interpretation of scientific results related to coating formulation and performance. Adopting a standard and widespread approach to formulation information fosters scientific reproducibility, open innovation, and standardization of reporting, as well as easy, reliable, and long-lasting access to data.
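
To make the idea of a rules-based, template-assisted prose format concrete, the following minimal sketch shows how a templated recipe sentence might be parsed into tidy rows. The abstract does not specify the framework's actual grammar, so the sentence template ("<verb> <amount> <unit> of <material> as <role>."), the names TEMPLATE and parse_formulation, and the sample recipe are all hypothetical and chosen purely for illustration.

```python
import re

# Hypothetical sentence template (an assumption, not the authors' grammar):
#   "<verb> <amount> <unit> of <material> as <role>."
TEMPLATE = re.compile(
    r"(?P<amount>\d+(?:\.\d+)?)\s*(?P<unit>kg|g|mL|L)\s+of\s+"
    r"(?P<material>.+?)\s+as\s+(?P<role>[\w\s-]+?)\."
)


def parse_formulation(text):
    """Return one tidy row (a dict) per sentence that matches TEMPLATE."""
    rows = []
    for match in TEMPLATE.finditer(text):
        rows.append({
            "material": match.group("material"),
            "role": match.group("role"),
            "amount": float(match.group("amount")),
            "unit": match.group("unit"),
        })
    return rows


# Illustrative recipe text written to fit the hypothetical template.
recipe = (
    "Charge 40.0 g of acrylic emulsion as binder. "
    "Add 25.0 g of titanium dioxide as pigment. "
    "Add 0.5 g of silicone defoamer as additive."
)

rows = parse_formulation(recipe)
for row in rows:
    print(row)
```

Each row holds one ingredient with one value per field, which is the 'tidy' shape that data mining tools expect. A real implementation would likely need a richer grammar and chemical entity recognition rather than a single regular expression.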