datamaster summit 2020

Bringing Speed, Agility and Insights to Geospatial Data

 

Todd Bacastow & Todd Broadhurst

Todd Bacastow, Senior Director of Strategic Growth @ Maxar
Todd Broadhurst, Solution Director @ Tamr
Agencies and departments have a huge problem in their isolated data structures, including their geospatial data silos. Each group wants its own repository with different access points, schema categorization, and classification levels. Join this webinar to learn:
  • Strategies to provide a foundation for near real-time geospatial information clarity for your mission
  • Proven tactics to improve the time-to-insights of location patterns and activities to ensure better decision-making
  • How to utilize machine learning coupled with human expertise to save time and improve mission effectiveness.

Transcript

Howard Wahlberg:
Thank you for joining today’s edition of the SIGNAL Media webinar series. I’m Howard Wahlberg, correspondent for AFCEA International SIGNAL Media. Data sharing among the entire GEOINT community has become a renewed focus for the National System for Geospatial Intelligence, yet agencies and departments have a huge problem in their isolated data structures, including their geospatial data silos. How do we ensure all that data is cleaned and ready for consumption? How do we ascertain all applicable information on a given point or area in an accelerated timeframe when the amount of information being collected only keeps growing?

Howard Wahlberg:
Today’s webinar was brought to us through the generous sponsorship of Tamr, whose data unification platform catalogs, connects, and curates internal and external data sources through a combination of machine learning algorithms and human expertise. It will illuminate ways to consolidate, catalog, and curate geodata from a variety of sources; it will lay out strategies to provide a foundation for near real-time geospatial information clarity for your mission; it will offer proven tactics to improve the time to insights of location patterns and activities to ensure better decision-making; and it will discuss how to utilize machine learning coupled with human expertise to save time and improve mission effectiveness. Joining me today to take on these challenges will be Todd Broadhurst, Director of Solutions for Tamr Government, and Todd M. Bacastow, Senior Director for the Strategic Growth Team at Maxar Technologies, devoted to advancing space infrastructure and earth intelligence capabilities.

Howard Wahlberg:
Today, our two Todds are going to speak with us about how Maxar and Tamr work together in the modern geospatial data fusion workflow. Their capabilities can help you tackle your massive data challenges at scale. As always, please feel free to submit questions at any time during this discussion by using the ask-a-question box on your webinar console. We’re going to have plenty of time for your questions at the conclusion of the discussion with our two Todds, Mr. Broadhurst and Mr. Bacastow. Please also know that we have uploaded some resources for you to download at your convenience. You can find those in the resources tab on your webinar console. With that, I’ll turn it over to Todd and Todd to get us started.

Todd Bacastow:
Great. Thanks, Howard. Appreciate your introduction. Good afternoon, I’m Todd Bacastow, Senior Director of Strategic Growth at Maxar Technologies. My team is focused on working with customers to better serve their mission needs by harnessing Maxar capabilities, our partnerships, as well as emerging technologies. It’s certainly an exciting time to be a part of the geospatial community. The industry has come a long way in the last decade that I’ve been a part of it. Geospatial intelligence is instrumental to keeping our nation and allies safe, and it’s fueling many emerging technologies ranging from autonomous vehicles to drone delivery and 5G wireless. More and more sources of imagery are coming online and geospatial data is becoming ubiquitous across many technologies.

Todd Bacastow:
Go ahead to the next slide please. You’ve probably seen Maxar imagery in the news, in mobile mapping applications, or in online mapping. What you might not realize is just how much data Maxar creates every day and stores as part of our image library, and how this data is enabling advanced analytics across a variety of missions. Actually, can you go back to the previous slide, please? Thanks. To give you a sense of the volume of this imagery and the big data challenge that it poses: for the amount of imagery that we collect in one day, which is about 80 terabytes, it would take a human approximately 85 years to extract one feature, such as building footprints or roads, from that imagery.

Howard Wahlberg:
Wow.

Todd Bacastow:
Next slide. Over the last 20 years, Maxar has built up an image library of approximately 110 petabytes of data. All of this serves as a digital time machine that allows us to understand how earth is changing and perform a variety of advanced analytics. To give you a sense of Maxar’s business, we’re organized around two primary areas. This includes earth intelligence, which is focused upon providing geospatial intelligence with the satellite imagery that we collect from the constellation that we own and operate, as well as a variety of automated analytics technologies and the human expertise that we provide in this area. In addition, the other main part of our business is space infrastructure. This includes design and manufacturing of satellites and spacecraft for both exploration and communications, as well as robotics. In fact, this includes the robotic arms that have been on all five Mars rovers.

Howard Wahlberg:
Wow.

Todd Bacastow:
We have the privilege of serving customers in government and business across a variety of application areas. To give you a sense, this ranges from defense to public safety; we serve more than 50 governments around the world, as well as some of the world’s most innovative companies. As a pioneer and trusted innovator in earth intelligence, Maxar provides capabilities combining multi-source data, satellite imagery, and machine learning to power products that derive analytics and insights about our changing planet. This enables commercial businesses and government organizations to monitor, to understand, and to navigate the world at a global scale. This unique approach combines decades of understanding with proven, foundational commercial technology that we leverage every day.

Todd Bacastow:
To give you a sense of some of the application areas, this includes applications such as mapping remote areas. There are many parts of the world where maps are just not up to date. We’re able to create new map layers derived from our satellite imagery. We’re also able to provide insight and analysis around urban growth centers and understand how this impacts and enables new connectivity such as 5G wireless. We’re able to monitor various parts of the world to assess security issues, as well as to provide insights that allow for long-term sustainability. Let me talk a little bit more about Maxar’s portfolio of earth intelligence products and services. This starts with our data collection, which comes from the satellite constellation that we own and operate. I’ll talk more about that in a minute.

Todd Bacastow:
We extract a variety of information layers from our satellite imagery and other sources, and Todd from Tamr will talk about how this data can be fused from multiple sources of geospatial data, and we perform advanced analytics through our technologies as well as our human expertise. All of this is provided to our customers through simplified access in analytic platforms that meet the customers where they are to enable timely insights for decision-making. Maxar’s constellation currently consists of four very high-resolution, highly spatially accurate earth observation satellites that are in orbit. We collect more than three million square kilometers of earth imagery every day, and that adds to our image library that dates back 20 years.

Todd Bacastow:
2021 is an exciting year for Maxar as our WorldView Legion satellites will come online to add to our constellation. We’ll be adding six WorldView Legion satellites, which, when combined with our current satellites, will enable up to 15 times a day revisit over certain parts of the world and over five million square kilometers of earth imagery collected every day. As we think about the big data challenge that I alluded to earlier, this is key for fueling advanced analytics, but the growing volume of data coming online also poses new opportunities and challenges to make sense of and derive insights from this data. To give you a sense of the imagery that Maxar collects, our satellites generally collect imagery at either 50 or 30 centimeters native resolution. These are represented by the two images that are rightmost on your screen.

Todd Bacastow:
The resolution of the data matters as you get into more and more advanced analytic applications. You’re able to observe features that might not otherwise be observable, and really start to be able to characterize those features. Now, add the temporal aspect of having constant high-frequency revisit as we expand our constellation with our WorldView Legion satellites, and we’ll be able to derive patterns and understand activities that wouldn’t otherwise be possible. To give you a sense of our machine learning and crowdsourcing workflow, we tap into our satellite library to develop features and training data for our algorithms. This is key to fueling our advanced analytics. That process starts by identifying areas of interest, and then we identify examples of features that we want to derive to then extract at broader scales.

Todd Bacastow:
We get into the annotation phase of this by labeling these features, some of them by hand ourselves, or, when speed and broader scale are necessary in order to label many features, we leverage our crowdsourcing community called GeoHIVE. This allows us to quickly spin up a vetted crowd of users that are able to rapidly identify these features. In the case that we have an algorithm that can do this, because it’s already been trained, then the algorithm can also be used to identify features, and that can then be validated further by the crowd and our crowd-rank algorithm that looks for consensus amongst these annotations. Once we have the validated set of features, we chip the data and we provision it in a way that simplifies the data logistics to be able to deliver that to our customers.
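To make the consensus step concrete, here is a minimal sketch of majority-vote agreement across crowd annotations. It assumes each task simply returns a class label from several independent annotators; the function name and the 0.7 agreement threshold are illustrative, not Maxar’s actual GeoHIVE or crowd-rank implementation.

```python
from collections import Counter

def consensus_label(annotations, min_agreement=0.7):
    """Return the majority label for one feature if enough annotators agree.

    annotations: labels (e.g. ["building", "building", "road"]) from
    independent crowd workers looking at the same image chip.
    Returns (label, agreement), with label None when consensus is too weak.
    """
    counts = Counter(annotations)
    label, votes = counts.most_common(1)[0]
    agreement = votes / len(annotations)
    return (label if agreement >= min_agreement else None, agreement)

# Example: three of four annotators marked the chip as a building.
print(consensus_label(["building", "building", "road", "building"]))
# ('building', 0.75)
```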

Todd Bacastow:
Now I’m going to talk about an example of this type of use case. This is a computer vision example where we’re looking at the US-Mexico border. What you’ll observe is we’re actually applying this algorithm to identify features from the imagery, in this case cars. Go ahead to the next slide. I’ll talk a little bit more about this example. Often this will start by establishing a baseline. In this case, we have a computer vision algorithm that we run, the term for this is inference, to identify the feature that the algorithm has been trained to detect; the model’s been trained. In this case, it’s identifying the cars. We then are able to perform other analytics, such as counting the cars, and then fuse that with other information. The key is then being able to observe this change over time.
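A minimal sketch of that count-and-compare workflow follows. It assumes a hypothetical detect_cars() placeholder wrapping an already-trained detection model that returns one bounding box per car; the function names, data structures, and the uptick threshold are illustrative, not Maxar’s production pipeline.

```python
def detect_cars(image):
    """Placeholder for a trained computer-vision detector (hypothetical).

    Would run inference on one image and return a list of bounding boxes,
    one per detected car.
    """
    raise NotImplementedError("swap in your trained detector here")

def count_cars_over_time(images_by_date):
    """Count detections per acquisition date so changes can be compared."""
    return {d: len(detect_cars(img)) for d, img in sorted(images_by_date.items())}

def flag_upticks(counts_by_date, threshold=1.5):
    """Flag dates where the count jumps by more than `threshold` times the prior date."""
    dates = sorted(counts_by_date)
    return [
        d for prev, d in zip(dates, dates[1:])
        if counts_by_date[prev] and counts_by_date[d] / counts_by_date[prev] > threshold
    ]
```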

Todd Bacastow:
If you go to the next slide, we can see that on this particular date there was an uptick in cars. If you multiply this out across many observations over many days, months, or years, as well as many different locations, we’re able to start to analyze patterns and discover insights that might not otherwise be humanly possible by just counting an area. Now, in this example, a human could perhaps count these by hand, but when we get into multiplying this out over many areas and over a longer period of time, the problem becomes much more challenging and there’s a need for automation. As I alluded to earlier, providing access to our data is key to being able to meet our customers’ needs. We provide access through a variety of platforms.

Todd Bacastow:
Our government customers have access to Maxar imagery through a platform called Global EGD, or Global Enhanced GEOINT Delivery, and a similar commercial equivalent of that is called SecureWatch, which many of our international government customers or commercial customers use to access this type of data. In addition, we are able to process our imagery in a way that allows us to derive new products as well. Last year, we completed the acquisition of a company called Vricon, which was a joint venture and is now fully part of Maxar. This is a key capability for being able to derive full 3D models from the Maxar image library. What’s also important about this capability is that it allows us to fuse various geospatial sources of data, whether it’s imagery, full motion video feeds, or other types of data, in a way that’s highly spatially accurate.

Todd Bacastow:
Todd from Tamr will talk more about using data in his presentation. But this capability is also key to being able to perform these sorts of multi-source advanced analytics. Let me touch for a minute on some of the strategies and ways in which you can utilize machine learning with geospatial data. I think a lot of this process should start with really considering the problem. Before we jump into technologies or before we become immersed in large data sets, it’s really important to take a step back and ask the question of whether automation, versus traditional approaches, really helps to solve this problem in a way that wouldn’t otherwise be possible. Generally, we’re looking at criteria such as speed, scale, or complexity that need to be addressed. Is there a way that AI gives us an edge?

Todd Bacastow:
For example, in a geospatial context, this might be: is the problem local? Is it regional? Is it global? To the extent that you get into these global problems, it becomes much harder to address them through manual and traditional processes. Once we’ve asked these questions and decided that it’s a problem that’s suitable for automation or machine learning, then it’s key to be able to identify the data. The first question that one should ask is: is there sufficient labeled training data to actually train the algorithm? In a geospatial context, that might be satellite imagery and a feature such as building footprints, or, in the previous example I used, cars. Are there enough examples of these, or enough labeled examples, to train an algorithm? If not, then the question is, can you go produce that? We often get the question of, well, how many training examples do you need?

Todd Bacastow:
The true answer to that is going to really vary by challenge, by problem. It’s going to vary based on the area and the type of feature you’re going to extract. But a good rule of thumb is it’s certainly in the high thousands. It could be hundreds of thousands of examples. It also depends on the algorithm, whether you’re starting from scratch or whether you have a pre-trained algorithm that has already had those training samples as part of it. Then, also from a data perspective, it’s important to have a corpus of data, or imagery in a geospatial context, where you can run that. Once you train an algorithm, when you go to deploy it you need to have imagery on which you can actually detect this; this is the real data to run your algorithm on.

Todd Bacastow:
When you’re developing the model, it’s also important to consider how performance should be measured. When you’re looking at existing machine learning models, there’s precision and recall; there’s a variety of metrics that are relevant for computer vision. Do those metrics apply to your case, or do you need to develop something that is based on those prior standards but perhaps takes some other factors into consideration? For example, if you’re extracting a road network, you might want to assess the integrity of the network, not just the amount of overlap in the lines, because if there’s a break in the road network, then it may not be as useful for a routing application. Sorry, one more point on the last slide. The last point is around deploying the model. This is about, one, do you have the corpus of data? But also, does the model that you’re going to deploy feed another analytic process?
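For the precision and recall measurement just mentioned, here is a minimal sketch that scores detections against hand-labeled ground truth by bounding-box overlap (IoU). The box format and the 0.5 IoU threshold are illustrative assumptions, and a road-network use case would still need the topology-aware checks described above rather than simple overlap.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (xmin, ymin, xmax, ymax)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def precision_recall(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching: each ground-truth box may satisfy one prediction."""
    unmatched = list(ground_truth)
    true_positives = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: iou(pred, gt), default=None)
        if best is not None and iou(pred, best) >= iou_threshold:
            true_positives += 1
            unmatched.remove(best)
    precision = true_positives / len(predictions) if predictions else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```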

Todd Bacastow:
Oftentimes, if you detect the cars in the example I gave, well, that may be one indicator, but it may not be the answer to a more complex analytic question. Then, can this data be co-registered with other data sources? That gets into a little bit of what I described with the 3D models and our capability around co-registration of data. Go ahead to the next slide. We talked a bit about data collection. I talked about the information that we extract from the geospatial data that we collect and how that can be turned into actionable insights with advanced analysis. It’s certainly an exciting time to be part of the geospatial community, and it’s a privilege to serve our customers’ important missions. Now, Todd from Tamr will talk more about geospatial data mastering, which combines a variety of sources to enable groundbreaking analytics and insights for decision makers. Thank you.

Todd Broadhurst:
Thanks, Todd. Good afternoon, everybody. As Todd Bacastow described, there is a ton of awesome information and data being generated and delivered by Maxar. But it’s just one provider with multiple sources. How do you deal with a variety of data providers and hundreds of data sources using different descriptions and nomenclature? Today I’ll be speaking with you about the importance of mastering many data sources, specifically geospatial data sources, and how machine learning coupled with human expertise can really fit in your DataOps ecosystem and is essential to saving time and resources, ensuring overall mission effectiveness. Next slide, please.

Todd Broadhurst:
All large organizations, across both commercial industry and the US government, have a need to master their data and use it as an advantage in driving mission outcomes. Throughout the decades, across the US government and private industry, there has been a feeling that there needed to be firewalls between agencies, departments, or organizations. But those feelings are changing, and the need to combine many data sets of all types of descriptions, categorizations, data sources, et cetera, is clear. With all that data out there and the data silos housing it, it can be difficult to clean the data and gain the most insight. Doing that work manually is long and tedious.

Todd Broadhurst:
According to Gartner, as much as 80% of all data analytics projects fail. Geospatial projects certainly fall into that group. These projects fail due to the volume, the velocity, and the variety of the data. Furthermore, by the time the project might have the ability to connect those disparate data sources, the data is often stale or unusable. Human-guided machine learning greatly reduces the failure rate of these projects, as well as the manual workflows and time to effectiveness. Modern data mastering solutions offer an alternative to traditional master data management tools, using machine learning to do all the heavy lifting and perform data mastering functions while empowering your internal subject matter experts to ensure that the results meet the mission needs every step of the way.

Todd Broadhurst:
These internal subject matter experts aren’t the data scientists; they’re the personnel that really understand the data, in this case geospatial data: the analysts. Just as, for expertise on the parts in your car engine, you would look to a mechanic. Chances are good these personnel are already on staff and capable of providing this type of SME support. These systems need to be designed to provide the modern DataOps ecosystem with essential functionality, such as schema mapping, enrichment, data mastering, and data publishing, all leveraging modern cloud infrastructures, but also being able to integrate with existing data tooling and flows. The outcome needs to be well-curated data sets, ready for analytics, accelerating the ability of agencies to leverage data as an asset.

Todd Broadhurst:
Next slide, please. The radical departure in data mastering is using human-guided machine learning to do, once again, that heavy lifting: to cleanse, combine, consolidate, and categorize data from disparate data sources representing a wide variety of entity types, such as points, subjects, suppliers, and parts. Your internal experts remain in charge of the decision-making on the data remediation through simple data-oriented interfaces that enable them to specify the correct results. The system learns from these interactions to improve the automated results. A seamless data publishing capability makes this data available for consumption in downstream analytic and operational use cases. This approach drives higher accuracy and accelerates time to insights so that your teams can focus on the objectives at hand with complete, accurate, and up-to-date data.

Todd Broadhurst:
Next slide, please. Traditional MDM solutions require thousands of personnel hours to generate, manage, and tune complex rule sets for mastering data. That’s the old way of doing things. Your solution should include human-guided machine learning centered on providing examples and giving feedback on the machine-generated recommendations, thus reducing manual curation workloads. The result is much faster time to value, delivering use-case-ready data in days to weeks, instead of the months to years seen with traditional MDM solutions and their rule-based, or if-then, approaches. Your data experts and SMEs can focus more on utilizing the outcomes as opposed to writing if-then statements for a rule-based model. Plus, you have a system that works 24 hours a day, every day of the year, regardless of the number and types of data sets sent to it. Next slide, please. Machine learning can be used to build associations between different data points across dozens, maybe hundreds, of data sources.

Todd Broadhurst:
This is a huge departure from traditional ingest methods in areas like geospatial, where a maximum of maybe three data sets could be handled. Traditionally, combining more than two or three data sets would cause overlap or repetition problems, but not with a human-guided machine learning approach. These associations evolve via input from the data SMEs, and pairs are created to help train the model and speed data correlation. Next slide, please. SMEs, once again, are the analyst personnel that understand, in this case, shape comparisons, line segments, polygon associations. They should be able to answer yes-no questions to train the iterative algorithm, in this case determining whether a line is the same as a polygon on the map. But it may be merging textual data sources with geospatial data sources, and multiple subject matter experts can collaborate on the end-to-end categorization workflow, both to divide up the work as well as to provide multiple-

Speaker 5:
[crosstalk 00:26:23] business.

Todd Broadhurst:
To provide multiple points to review options for the curators to validate the results prior to publication. Next slide, please. Now we’ll take a look at some particular features of the best way to approach your geospatial MDM platform. We’ll talk about entity mastering and then categorization. Next slide, please. Entity mastering involves identifying references to key entities, then assembling information from all of those references into what we call golden records. This capability is designed to work with any entity type, in this case geospatial types. The process starts with one or many datasets that may differ in size, structure, and semantics. You bring these together in a workbench where your in-house subject matter experts provide examples of records that may look dissimilar but refer to the same entity, and of records that may look similar but refer to distinct entities.
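A minimal sketch of how SME yes/no judgments on candidate record pairs might train a match classifier, along the lines described above. The feature set (name similarity and centroid distance) and the use of scikit-learn’s LogisticRegression are illustrative assumptions, not Tamr’s actual model.

```python
from difflib import SequenceMatcher
from math import hypot

from sklearn.linear_model import LogisticRegression

def pair_features(rec_a, rec_b):
    """Turn one candidate pair of records into numeric features for the model."""
    name_sim = SequenceMatcher(None, rec_a["name"].lower(), rec_b["name"].lower()).ratio()
    dist = hypot(rec_a["x"] - rec_b["x"], rec_a["y"] - rec_b["y"])
    return [name_sim, dist]

def train_matcher(labeled_pairs):
    """labeled_pairs: list of (record_a, record_b, is_match) answered yes/no by SMEs."""
    X = [pair_features(a, b) for a, b, _ in labeled_pairs]
    y = [int(is_match) for _, _, is_match in labeled_pairs]
    return LogisticRegression().fit(X, y)

# After training, a new candidate pair can be scored with:
# matcher.predict_proba([pair_features(rec1, rec2)])[0, 1]
```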

Todd Broadhurst:
The system uses an innovative process to learn from these examples how to gather together, or cluster, the data about each entity. Once all the data has been clustered into entities, the best data in each cluster can be gathered into a golden record for each entity. This highly curated, consolidated data set is then published into your data lake or your data warehouse, where it can be used for high-impact analytics, for reliable operations, and to improve the data in the source systems. I know this is a little bit hard to see, but these screenshots show an example of how human-trained machine learning algorithms can integrate 21 input datasets, producing about 87 attributes about just one entity via one golden record, in a few seconds. Now, can you imagine how long it would take an actual person to do that? What if there were 100-plus datasets with hundreds of attributes?
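A minimal sketch of the cluster-then-consolidate step described here, assuming a pairwise matcher (such as the one sketched earlier) has already decided which record pairs refer to the same entity. The union-find clustering and the "most complete value wins" golden-record rule are simplifying assumptions, not the product’s actual logic.

```python
from collections import defaultdict

def cluster_records(records, matched_pairs):
    """Group record indices into entity clusters using union-find over matched pairs."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in matched_pairs:
        parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(records)):
        clusters[find(i)].append(records[i])
    return list(clusters.values())

def golden_record(cluster):
    """Pick, per attribute, the longest non-empty value seen in the cluster."""
    merged = {}
    for rec in cluster:
        for key, value in rec.items():
            if value and len(str(value)) > len(str(merged.get(key, ""))):
                merged[key] = value
    return merged
```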

Todd Broadhurst:
It could literally take man-years to do those types of calculations, and that’s where human-guided machine learning is so important in the mastering. Next slide, please. The second feature of an ideal geospatial DataOps strategy is data categorization. Categorization involves analyzing the contents of the records themselves to correctly identify the category in the taxonomy that best describes each record. Similar to the entity mastering approach, the process starts with the ingestion of one to many data sets that may differ in size, structure, and semantics. You want to bring these together in a workbench where your in-house subject matter experts can provide examples of records that match the different categories in the taxonomy. The system uses an interactive process to learn from these examples to align similar records to the taxonomy, automatically dealing with conflicts and ambiguity.
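A minimal sketch of the categorization step, assuming the SME-provided examples are text descriptions of records mapped to taxonomy categories. The TF-IDF plus logistic-regression pipeline and the tiny example data are illustrative stand-ins for the learning approach described, not the product’s actual algorithm.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# SME-labeled examples: record description -> taxonomy category (illustrative data).
examples = [
    ("two-lane paved road, asphalt surface", "transportation/road"),
    ("single family residential structure", "structures/building"),
    ("reservoir with earthen dam", "hydrography/reservoir"),
    ("gravel track through forest", "transportation/road"),
]

texts, categories = zip(*examples)
classifier = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
classifier.fit(texts, categories)

# New records are then aligned to the taxonomy automatically; low-confidence
# predictions can be routed back to the SMEs for review.
print(classifier.predict(["dirt road near the river"]))
```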

Todd Broadhurst:
Once all the data has been aligned with the categories, the categorized data set and corresponding taxonomy are published into your data lake or warehouse, where they can be used for high-impact analytics and to support roll-up and drill-down across data from parts of the enterprise that may have never been known. Next slide, please. Loading information from geospatial data sources such as GeoJSON, Shapefile, and OpenStreetMap is essential, and the properties and geometry should fit cleanly into the data model. You should also add the ability to integrate geospatial data sources with textual data sources, regardless of their origin, type, or purpose, both public and private. Using entity mastering, machine learning can take advantage of such characteristics as distance, overlap, diameter, et cetera, when learning how to identify these entities.
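A minimal sketch of computing the geometric characteristics just mentioned (distance, overlap, diameter) once GeoJSON or Shapefile sources are loaded. It assumes the geopandas and shapely libraries, uses the bounding-box diagonal as a simple stand-in for diameter, and the file names in the comments are placeholders.

```python
from math import hypot

import geopandas as gpd  # reads GeoJSON and Shapefile into one dataframe model

def diameter(geom):
    """Approximate a geometry's diameter by the diagonal of its bounding box."""
    minx, miny, maxx, maxy = geom.bounds
    return hypot(maxx - minx, maxy - miny)

def geometry_features(geom_a, geom_b):
    """Pairwise characteristics a matcher could learn from."""
    return {
        "distance": geom_a.distance(geom_b),            # 0.0 if they touch or overlap
        "overlap_area": geom_a.intersection(geom_b).area,
        "diameter_ratio": diameter(geom_a) / diameter(geom_b) if diameter(geom_b) else 0.0,
    }

# Usage (placeholder file names):
# roads = gpd.read_file("roads.geojson")
# buildings = gpd.read_file("buildings.shp")
# geometry_features(roads.geometry.iloc[0], buildings.geometry.iloc[0])
```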

Todd Broadhurst:
The user interface should also be able to integrate with tile servers to display geometries on maps, to aid subject matter experts in making decisions when training or hand-curating these data sets. With the addition of foreign keys and durable identifiers back to the corresponding entity in the source system, you can ensure provenance, governance, and internal tracking are being completed by the system, allowing users to track and review change over time so that you can ensure you’re getting the most value out of your data. It is essential to have a powerful MDM tool for integrating geospatial information across multiple sources to create highly curated data sets to power analytics with roll-up and drill-down. Next slide, please. Now I want to talk to you about a specific enhancement to the entity mastering capability. It’s called golden record tear lines.

Todd Broadhurst:
Next slide, please. Golden record tear lines give you the ability to prepare high-quality entity mastering models, then deliver those results to the analysts that require a restricted view of the data. The system should allow for the sensitivity of information to be specified down to the dataset, column, or row level, so the data included in the output data set can be restricted automatically based on the sensitivity of the information. This is the premise of the golden record tear lines that can be applied across all data sets. This is ideal when information is needed across classification, security level, or agency. The golden record tear line feature can enable the sharing of data sets across silos while limiting what can be accessed on those data sets. This plays exactly into the commitment of the GEOINT community and the NSG as a whole, and what they’re driving towards: the sense of strength through community.

Todd Broadhurst:
Tamr and Maxar are also firmly committed to this goal. These focused cooperation areas, coupled with the increasing use of secure cloud environments and access controls, are allowing us to experience a real renaissance of data unification and cross-operational partnerships. Next slide, please. Building the models used in golden record tear lines uses the same entity mastering workflow and is done with visibility into the full data stack. This enables curators to verify the accuracy of these models prior to publication. The resulting models contain no information from the source data sets; they are expressed entirely in terms of similarity thresholds. Therefore, these models can be published into projects with different data sensitivity requirements and used to construct a result from a restricted view of that data set.
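A minimal sketch of the tear-line idea, assuming each attribute of a golden record carries a sensitivity label and the output is built for a stated target sensitivity. The level ordering, field names, and example data are illustrative; as described in what follows, a real deployment could defer the release decision to an external policy service rather than a static table.

```python
# Higher number = more sensitive (illustrative ordering, not a real classification scheme).
LEVELS = {"public": 0, "internal": 1, "restricted": 2}

def tear_line_view(golden_record, attribute_sensitivity, target="internal"):
    """Return only the attributes releasable at the target sensitivity level."""
    limit = LEVELS[target]
    return {
        key: value
        for key, value in golden_record.items()
        # Unknown attributes default to the most restrictive level.
        if LEVELS.get(attribute_sensitivity.get(key, "restricted"), 2) <= limit
    }

record = {"name": "Port facility A", "capacity": 1200, "source_report": "rpt-77"}
sensitivity = {"name": "public", "capacity": "internal", "source_report": "restricted"}

print(tear_line_view(record, sensitivity, target="public"))
# {'name': 'Port facility A'}
```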

Todd Broadhurst:
Data can be masked according to the sensitivity of the data at the attribute or row level, and an external service can be invoked to determine whether data with a sensitivity label should be included at a specified target sensitivity. This ensures that a resulting data set of golden records can be built using the highest quality model while also ensuring that it contains only the information appropriate for the target sensitivity, thereby unblocking analysts and aiding in the dissemination of this needed data. Next slide, please. Although companies like Maxar are making great strides at it, locating things on the surface of the earth is a known hard problem. Matching things that are described in different ways and different terms is a really hard thing to do. Tamr’s ability to do this fuzzy matching based on many attributes is a radical departure from traditional geospatial conflation.

Todd Broadhurst:
Next slide, please. Tamr can understand and load the different ways items are categorized or named in geospatial or other data sets. Our machine learning uses a variety of algorithms for record matching, including Hausdorff distance, metadata comparisons, and geometric attributes. We do all of this without changing any GIS or other data source attributes. Next slide, please. Along with the Tamr core master data management platform and our various ways of record matching, as I mentioned, Hausdorff distance, metadata comparisons, geometric attributes, we also use binning algorithms to identify objects that share or don’t share a bin. This includes identifying objects through relative binning, which looks for objects close in shape and location relative to their size, or overlap binning, which looks for objects close in location and shape over an overlapping area.
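A minimal sketch of geometry-based record matching, using shapely’s Hausdorff distance and a simple centroid grid as a spatial binning scheme to limit which candidate pairs get compared. The grid size and distance threshold are illustrative assumptions, not Tamr’s actual binning algorithms, and centroid binning can miss pairs that straddle a cell boundary.

```python
from collections import defaultdict

from shapely.geometry import Polygon

def bin_key(geom, cell_size=0.01):
    """Assign a geometry to a grid cell by its centroid (simple spatial binning)."""
    c = geom.centroid
    return (int(c.x // cell_size), int(c.y // cell_size))

def candidate_pairs(geoms_a, geoms_b, cell_size=0.01):
    """Only compare geometries that land in the same bin."""
    bins = defaultdict(list)
    for j, g in enumerate(geoms_b):
        bins[bin_key(g, cell_size)].append(j)
    for i, g in enumerate(geoms_a):
        for j in bins.get(bin_key(g, cell_size), []):
            yield i, j

def hausdorff_matches(geoms_a, geoms_b, max_distance=0.001):
    """Keep pairs whose shapes are close everywhere, not just at one point."""
    return [
        (i, j) for i, j in candidate_pairs(geoms_a, geoms_b)
        if geoms_a[i].hausdorff_distance(geoms_b[j]) <= max_distance
    ]

# Tiny self-check: an identical square should match itself.
square = Polygon([(0, 0), (0, 1), (1, 1), (1, 0)])
print(hausdorff_matches([square], [square]))  # [(0, 0)]
```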

Todd Broadhurst:
Next slide, please. There are significant advantages to being able to combine hundreds, perhaps thousands, of data sources, both geospatial and other data, both public and private, and to clean and collate the data provided by a myriad of sources. Crisis mapping, for example, was the motivating use case presented to Tamr for the development of our geospatial mastering capability. In particular, it was the mapping of Port-au-Prince, Haiti after the 2010 earthquake devastated much of that city. The challenge was to use available data sets such as OpenStreetMap, the Geographic Names Information System (GNIS), and the Multinational Geospatial Co-production Program, in conjunction with information such as geotagged text and images from social media, to determine how things had shifted and what infrastructure and facilities were still available.

Todd Broadhurst:
Because Tamr’s matching capability can take advantage of location, shape, size, name, and other descriptive metadata, it’s able to find matches in this type of challenging context. But there is an untapped variety of applications for clean geospatial data and the information it can provide, whether you’re looking at fleet management with predictive maintenance, route selection, or things like military movements; tracking census or geographic studies; patient mastering, especially with the latest concerns about vaccine tracking and inoculation; or safe harbor mapping, which we’ve done before in disaster responses. Tamr can drastically reduce your failure rate and improve the time to mission effectiveness of your geospatial projects. Together with Maxar, you can be assured of having the cleanest, most pertinent data for your geospatial programs. Howard, that concludes my part of the presentation, if you want to open it up for question and answer.

Howard Wahlberg:
Absolutely, Todd. First of all, thank you both, Mr. Broadhurst and Mr. Bacastow, our two Todds, for an absolutely riveting look at a very, very complex landscape. I was particularly enraptured by the discussion about Haiti as we got towards the end. We do have a few minutes for some Q&A while we’re lining those up. I just want to remind everybody that in order to submit a question, don’t forget to use the ask-a-question tab on your webinar console. It looks like we have a couple coming in already. First question is for Tamr. What are some of the best practices for adopting machine learning technology in the public sector?

Todd Broadhurst:
Well, you want to make sure you have your objectives fairly clear and laid out. You don’t want to go into it as a sandbox without clear objectives. You want to be able to know from this perspective where your data sets are and how you can effectively tap those data sets.

Howard Wahlberg:
Great. All right, looks like we’ve got another one stacked up. Actually, a few coming in. This one is for Maxar. Where do you see earth intelligence technology going next?

Todd Bacastow:
Great. Thanks, Howard. Yeah, I think we touched a little bit on this topic in the presentation, but certainly we’re seeing a need for and demand around persistence and a need for better understanding around complex challenges, right? Our complex situations or threats. What we’re seeing, certainly, as more and more earth observation sensors come online, is that the volume of data is increasing. In addition, there’s more variety of data. We talked a little bit about multi-source and how geospatial attributes or characteristics of the data are something that’s becoming more prevalent across a variety of technologies. I also think, in addition to persistence, we’re seeing a trend where the velocity is increasing. Much shorter timelines to be able to task and receive data, as well as to then be able to harness various AI and machine learning capabilities to derive insights and then perform sense-making on that data to assist human analysts in that process.

Howard Wahlberg:
Fascinating, especially the change in the speed in the last short amount of time. It’s astonishing how quickly this can all be done. All right, going back to Tamr. What has been one of the largest geospatial data sets that Tamr has mastered? Follow up question, what was a key element to that project’s success?

Todd Broadhurst:
That’s a great question. On the geospatial side, we’ve done dozens of data sets and integrated those, and these data sets can run into dozens and dozens of terabytes. That’s one of those things: it’s getting ahold of those data sets and integrating what you need and how you have access to that. The key element of that particular project’s success was such a close working relationship with the customer and the customer’s access to those data sets. They had a good idea of the geospatial and GeoJSON data sets they wanted to incorporate. They also knew exactly the public information they wanted to integrate along with some other agency data. That was a key element to the project’s success: they had senior-level buy-in, but we also had mid-level buy-in from the people that had control of those data silos.

Howard Wahlberg:
All right. I’m going to throw this out to both of you; just identify yourself as one of the two Todds. What are the key things that you see propelling federal agencies’ shift towards automated data management? Who wants to take that one first?

Todd Broadhurst:
I’ll go first since I [inaudible 00:44:29]. I think that more and more senior management, your SESes and your senior federal managers, want to see actionable real-time data and data analytics. There’s no question that geospatial is one of those data areas where that is essential. If you are going to be making split-second choices based on actionable intelligence from geospatial data, you need the most information possible. It’s not just all of the great stuff that Maxar is collecting, but do you integrate, do you have pictures from ships? Do you have pictures from airplanes? Do you have all of this data that can be correlated together to get the most accurate picture?

Todd Broadhurst:
But Howard, I mean, for the question itself, I think the biggest thing is just the explosion in the amount of data that we’ve seen across the federal government and in commercial. I think I read a great line that the world has made more data every two years than it made from the dawn of time to 2003. If you think about how much data that is, and geospatial is no different, I mean, we’re talking about these huge data files that track small changes. I’ll turn it over to Todd to get his feeling on it, but that’s where I’m at.

Todd Bacastow:
Yeah. I think this goes very much along the lines of the trends that we were speaking of related to, I think, being able to empower the analysts. When you look at the amount of data and the timelines and the complexity of the challenges that analysts and decision makers are faced with today, you have to have automation. You cannot humanly keep up with these types of challenges. There have been some interesting anecdotes and reports on how many millions of analysts you would need in order to keep up with these trends, and it’s just not possible. We believe that solutions that both empower humans and complement analysts with algorithms are actually the way forward. The analyst is very much a part of that process. It’s not allowing an algorithm or machine to take over by any means.

Todd Bacastow:
I think the other major trend is that in the era of great power competition, our adversaries are adopting these capabilities. In order to keep pace and keep ahead of the adversaries, or, in a business context, to keep an advantage over your competitors, you have to have these capabilities or else you’re going to fall behind. I think the real trick is coming up with the right way and the right places to implement this capability so that it truly does result in an advantage. We need to be really careful to measure that advantage and also look at the ethics around this too, to make sure that we’re staying true to who we are and that we’re able to better serve the nations that we’re dedicated to serve.

Howard Wahlberg:
Awesome. Great. Okay. Here we have someone who identifies himself as a geospatial analyst with eight years of experience. How can AI and ML ease my workload and enhance my productivity?

Todd Broadhurst:
I’ll go. Todd and I like to talk about garbage in and garbage out. I talked earlier about the fact that 80% of these analytic projects fail. They fail because of bad data. Maxar ensures, as part of their subscription, that the data has already gone through analytics; [inaudible 00:48:36] it’s the most compelling need for that customer. We also clean that data so that you can spend more time on analyzing the data and producing outcomes, not worrying about the data sources, or whether they are similar or can talk to each other. Pushing it towards the front end, being able to have people solve problems, not set up the backend.

Todd Bacastow:
Yeah, [inaudible 00:49:12] that I totally agree. I think the only thing I would add is that a trend we’re seeing is really a democratization of AI and machine learning technologies. In the development of these technologies there have been multiple waves: it started with very much rules-based AI, it has gotten more into statistical AI, and then perhaps the next wave is more around being able to perform sense-making. But, as these technologies are democratized, I think we’re going to see more and more that they’re part of our daily lives and our daily workflow. I know there are certainly AI skeptics out there. But I think, if you take a step back and look at our daily life, right? There are spell checkers that use natural language processing.

Todd Bacastow:
There are smart speakers where we can use voice commands to request our favorite music or ask questions, right? Or there are smart playlists that adapt to our tastes and preferences around music. There are doorbell cameras that identify when a package has been dropped at my front door and then also picked up. I mean, these technologies are very real and they’re having an impact on our daily lives. As we think about how they’re being applied in a professional analytic context, they’re very much having an impact today as well. I think I mentioned, for example, SecureWatch. One of the capabilities that was actually just … I saw a LinkedIn post about it yesterday, where there are services to detect objects at scale that are being published out.

Todd Bacastow:
You may be the recipient of some of those alerts or those tips and cues. I think considering how that might feed into your analytic workflow, how it might make life easier, or allow you to do a better job on some of the analytic functions that you’re performing, or perhaps take some of the tedious counting off your plate, those are some of the strategies and capabilities that I think are very much available today, and we’re going to see them increasingly available in a variety of workflows.

Howard Wahlberg:
Great. Thank you both. Okay. Here’s one of our attendees asking the following, “Geospatial data enrichment through third parties has become increasingly common and important in the effort to truly gain a 360 degree view of entities from numerous hierarchies to millions of data points coming in from earth intelligence companies like Maxar, as well as the various file formats that these data packets arrive in. What challenges does the mix of internal and external databases create and how can an agency overcome these challenges?” I’ll throw it out to both of you.

Todd Broadhurst:
I’ll take it first again. As I mentioned, the challenge is that these data sources and databases all talk a different language. They all have different semantics. They may be talking about the same thing, but one may spell Howard a different way, or miss that information, or just put it as an H. How do we combine that together? How do we glean the most pertinent information about an entity across all of these sources that talk differently? That’s a huge place where we can definitely help. But also, it’s deciding what type of information and publicly available enrichment that customer wants to access. We had talked earlier about going in our agency, et cetera. Obviously, there are going to be a lot of physical security and information security requirements in there.

Todd Broadhurst:
I mentioned the golden record tear lines and being able to let only certain people access or view certain information based on their clearance level or their sensitivity level. Those are all challenges I see with integrating these internal and external databases. You can definitely overcome them with things like golden record tear lines, by making sure you have an audit trail, that you know exactly where all the touch points are, and that you’re really locking those systems down, but also by making sure that there’s not bi-directional communication with some of these public access points as well. I’ll turn it over to you, Todd.

Todd Bacastow:
Yeah, I think this gets to the point about data provenance. There’s certainly the permissioning and security aspect of this question: being able to understand where data comes from and allow the right users to access the right data. The other part of it relates to the point that I touched on around conflation and being able to co-register different sources of data. I think this certainly is a topic that is of high interest to many organizations, because they’re both consumers of geospatial data as well as producers of the data. In many applications, we’re seeing organizations wanting to combine both, right? Provided they’re able to do this in a way that tracks the provenance and understands what the source of the data is and what the level of accuracy of that data might be.

Todd Bacastow:
There’s a lot of great open data that’s out there and available. But being able to master that in a way that creates that gold standard, so that you can rely on and trust that data, is a really key and important aspect of that. In the geospatial community, this often comes up with open mapping data and data that’s being sourced from the open. What is the reliability of this data compared to data that you’re going out and producing yourself or through traditional mapping means? If you were to ignore all that data, you’d be missing really valuable information. The trick is being able to perform that conflation in a way that takes advantage of the best that each of these types of sources has to offer, while tracking the provenance and the process.

Howard Wahlberg:
Not to mention the changes over time that we see take place, which we see all the time, particularly with countries that are no longer the same country that they were 30 years ago, where the street names have changed and things like that. All right, looks like we have one more question. It might be the last one. I’m not sure how much time we’ll have. My team runs on-prem, but we are looking to migrate to the cloud. How can you support us?

Todd Broadhurst:
Tamr can run on a customer’s premises or as cloud-native across all the federally approved platforms. We can obviously take in and correlate all of your data sets. That can obviously help cut down on the amount of human interaction that’s required. I feel like I need to … My senior SE is actually on the call, Mr. Ed Custer. Ed, did you want to speak to this? I feel like I need to call you out since you have attended.

Ed Custer:
Well, this [inaudible 00:57:04] a couple of your questions earlier: we can help you support anything from on-prem to cloud, and help you take your data from on-prem to cloud, or run in a hybrid environment. Some of the earlier questions were about data governance and why machine learning. I mean, what we’ve seen in the marketplace is that the human being, a single human analyst, is quickly overwhelmed. There’s just too much data coming in from too many places. Imagine one person trying to sift through multiple terabytes just to really find the 50 to 100 records they need for the particular part of the world they’re looking at or working with. Human-guided machine learning can help you do that, and do it very quickly, so that you spend your time getting the answers you need to your boss and helping them be successful.

Todd Bacastow:
From Maxar’s perspective, we operate both in the cloud and on-premises. The SecureWatch solution that I spoke of earlier runs on AWS in the cloud. But we also recognize a need to be entirely flexible to support customer missions. We see a trend towards users operating on the edge. We can operate on-cloud, on-premises, online, offline, depending on what those needs are. All of the capabilities that we develop are built with that flexibility in mind to meet user missions in a variety of contexts and settings.

Howard Wahlberg:
Outstanding. Well, we are very nearly at the top of the hour. That concludes the time we have for this webinar and the question and answer. First of all, on behalf of our attendees and AFCEA SIGNAL Media, Todd, Todd, and Ed, Mr. Custer, I want to express our thanks for an absolutely spot-on discussion and really great answers to those questions, and a very big thank you to all of our attendees who submitted those questions. If any of your questions went unanswered, the two Todds and their teams at Tamr and Maxar are going to follow up with you directly offline. You can revisit this webinar on demand anytime you like, as well as view the recorded versions of previous SIGNAL Media webinars on AFCEA’s SIGNAL Media website at www.afcea.org/signal/webinar. That concludes this SIGNAL Media webinar. Thanks again for your participation, and have a good rest of your day.