2019 I/ITSEC

Persistent Machine Learning for Government Applications (Room 320A)

Machine Learning (ML) offers benefits such as adaptive systems that are more performant initially and continuously improve over time. However, we often encounter difficulties operationalizing commercial ML breakthroughs in the government sector due to a lack of available training data. While data collection is a priority for commercial entities such as Google and Facebook, which monetize the data, it is often neglected outside the commercial sector, where that incentive is lacking. Yet it is often the data, not the model architecture, that defines many of the breakthroughs being commercialized. Google, for instance, freely shares many of its models and tools through publications and open source repositories, but not the training data used, which prevents others from replicating its results. Collecting datasets is critical for effective ML given the large datasets that modern Deep Learning approaches require. Even when datasets are available, models are too often trained once and never updated, so new data is not used effectively as it is collected. As a result, a system is often no more performant after hundreds of hours of use than it was initially. This paper presents LEARN, an architecture developed to support long-lifespan ML within the government space. LEARN provides three key components for effective Machine Learning. First, it supports continuous acquisition and curation of new training data. Second, it provides computational resources to support Machine Learning exploration of data. Third, it provides automatic continuous ML, allowing models to update in response to new data observations. We describe several sample domains, such as speech recognition, that impact military training needs. Finally, we discuss how this architecture addresses the ML needs that currently prevent operationalization of modern methods within the government space, as well as outstanding challenges.