The 1st of May marked my 16th work anniversary at SAS and also my 10th work anniversary with SAS Middle East. Though I don't often mention my employer, this post is dedicated to the company I consider my second home and the source of so much of my learning and experience.
SAS Institute is a privately held software company headquartered in Cary, North Carolina, founded in 1976 by Dr. Jim Goodnight and colleagues from North Carolina State University. SAS Institute initially focused on developing software for statistical analysis and data management. Its first product, SAS (Statistical Analysis System), quickly gained popularity among researchers and statisticians - especially in biomedical research - establishing the company as a leading provider of statistical software in the 1980s. In the 1990s, SAS shifted its focus to commercial applications and expanded its offerings to include business intelligence software. The company introduced SAS Enterprise Miner, a data mining tool that became one of its flagship products and helped businesses uncover patterns and relationships in their data.
Throughout the 2000s, SAS continued to expand its product offerings and established itself as a leader in the analytics industry. In 2000, SAS acquired DataFlux to expand its footprint in the master data management, data quality, and data integration landscape. In 2005, SAS released SAS 9 - a major upgrade to its platform and product suite. JMP, the company's own tool for data visualization and exploration, continued to evolve alongside the main platform, and in 2011 SAS introduced SAS Visual Analytics, a platform for data visualization and exploration. In 2016, the company launched SAS Viya, a cloud-enabled platform for advanced analytics and machine learning.
My journey started as a Data Integration / Business Intelligence consultant, where my primary tasks revolved around data management. I passed my SAS Base Programming certification and used the SAS programming language to read, transform, enrich, and shape data for consumption by our customers' analysts or to derive actual insights.
SAS is unmatched when it comes to reading data - at least in my experience!

For SAS, this is the basics.
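To give a flavor of just how basic the basics are, here is a minimal sketch - the file name and layout are made up for illustration - of pulling a delimited file into a SAS dataset:

```sas
/* Hypothetical example: read a CSV into a SAS dataset in one step.
   The path and column layout are illustrative. */
proc import datafile="/data/customers.csv"
            out=work.customers
            dbms=csv
            replace;
  guessingrows=max;   /* scan all rows when guessing column types */
run;

/* quick sanity check on the first five rows */
proc print data=work.customers(obs=5);
run;
```

The same `PROC IMPORT` pattern covers Excel workbooks, tab-delimited files, and more by switching the `dbms=` option.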
But maybe you want to read data from the internet - say, an actual HTML file fetched directly from a URL on a web server. Or maybe you need to dig a bit deeper and construct your own HTTP request. What if you want to read the URL, parse the data, cut out only the email addresses, and store them in a final Excel file? And what if you also want to pick out other links from the same file and crawl those further down? Or maybe you first want to collate a list of the log files in a particular directory, and then read and parse them one by one? All of this is again a no-brainer for SAS.
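As a sketch of the URL-and-parse scenario - the URL, regex pattern, and output path are all made up for illustration - this could look roughly like:

```sas
/* Sketch: read an HTML page over HTTP via the URL access method,
   keep anything that looks like an email address, and export the
   result to Excel. URL, pattern, and paths are illustrative. */
filename src url "https://www.example.com/contacts.html";

data work.emails;
  infile src length=reclen truncover;
  input line $varying32767. reclen;
  /* Perl-style regex for email-like tokens */
  if prxmatch('/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/', line) then do;
    email = prxchange('s/.*?([A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}).*/$1/', 1, line);
    output;
  end;
  keep email;
run;

proc export data=work.emails
            outfile="/data/emails.xlsx"
            dbms=xlsx replace;
run;
```

The directory-of-logs case is similar: an `infile` statement accepts a wildcard (e.g. `infile "/logs/*.log" filename=current;`), so a single data step can walk and parse every log file in a folder.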
OK, I will stop here, as I hope I have made it clear that SAS can extract and read practically any data you can think of. Note that I have completely skipped over the performance and optimization options available when performing the above tasks in SAS.
After some time, I moved from pure data management tasks to more extensive use of the wider SAS tools and capabilities, where data management was only one piece of the puzzle. In this phase, I used a whole suite of tools to deliver data-driven solutions. These tools included:
Again, the solutions and tools described above are only a fraction of what SAS provided to consultants and customers to fulfill their use cases.
Yet the above should give you a hint: whether you wanted to consume and analyze data as a resident data steward, or to deploy an enterprise-wide data management platform that seamlessly transitioned into a web-based (thin-client) distribution channel for diverse consumers, all supported by an underlying suite of IT and governance tools - we had it all.
Let me give you a practical example of using SAS tools to solve an actual POC use case within the fraud domain. Our task was to identify cycles (cyclic transactions) among corporate customers who intended to build a good credit history, resulting in application fraud.
The first part that could raise an eyebrow is the fuzzy matching of names - but not for SAS, as we could use one of the many fuzzy matching techniques available within the platform:
In the final solution, we used SAS DQ matchcodes, primarily because they made accuracy adjustments easy.
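To make this concrete, here is a sketch on made-up data showing two of the options: `COMPGED` (a generalized edit distance available in Base SAS) and, as a commented alternative, matchcode comparison via `dqMatch`, which requires SAS Data Quality Server and a loaded locale:

```sas
/* Illustrative company-name pairs - the data is invented */
data work.pairs;
  infile datalines dlm=',';
  length name_a name_b $60;
  input name_a $ name_b $;
  datalines;
ACME Trading LLC,ACME Traiding L.L.C.
Global Metals FZE,Global Metal FZE
;
run;

data work.scored;
  set work.pairs;
  /* Base SAS: generalized edit distance - lower means more similar */
  ged = compged(upcase(strip(name_a)), upcase(strip(name_b)));

  /* With SAS Data Quality Server and a locale loaded, matchcodes
     can be compared instead (left as a comment since it needs the
     DQ setup; sensitivity 85 and locale ENUSA are illustrative):
       mc_a = dqMatch(name_a, 'Name', 85, 'ENUSA');
       mc_b = dqMatch(name_b, 'Name', 85, 'ENUSA');
       match = (mc_a = mc_b);                                   */
  match = (ged <= 200);  /* the threshold is use-case specific */
run;
```

The appeal of matchcodes in the real solution was exactly that last knob: tightening or loosening the sensitivity changes match accuracy without rewriting any logic.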
The second tricky part could be the actual identification of the cycles. We identified more than one way of doing this - in fact, we tried and implemented three alternative solutions, as we wanted to deliver the one with the best performance. The three variants were:
1) Finding cycles using SAS Social Network Analysis tool
+ specifically designed for network analytics and application of network algorithm
- required post-processing, as time and amount constraints could not be filtered within the algorithm
2) Finding cycles using SAS programming language (datastep + macros)
+ custom-designed and built with all filtering conditions considered (name matching, amount filtering, etc.)
+ required very little storage space
- required a lot of CPU resources, and execution time grew exponentially
3) Finding cycles using SAS SQL [selected solution]
+ applied the filtering conditions and allowed easy manipulation and alteration of code
+ was by far the fastest of all algorithms
- required relatively high storage for execution
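A minimal sketch of the SQL approach (variant 3) follows - the table and column names, the fixed cycle length of three, the 30-day window, and the 10% amount tolerance are all illustrative assumptions, not the actual solution:

```sas
/* Sketch: detect 3-party cycles A->B->C->A with PROC SQL self-joins.
   Table/column names, time window, and amount tolerance are made up. */
proc sql;
  create table work.cycles3 as
  select a.from_id, b.from_id as via1, c.from_id as via2,
         a.amount as amt1, b.amount as amt2, c.amount as amt3
  from work.tx as a,
       work.tx as b,
       work.tx as c
  where a.to_id   = b.from_id
    and b.to_id   = c.from_id
    and c.to_id   = a.from_id                /* closes the cycle      */
    and a.from_id < b.from_id                /* canonical rotation to */
    and a.from_id < c.from_id                /* avoid duplicate hits  */
    and a.tx_date <= b.tx_date               /* time-ordered hops     */
    and b.tx_date <= c.tx_date
    and intck('day', a.tx_date, c.tx_date) <= 30     /* time constraint  */
    and abs(b.amount - a.amount) <= 0.1 * a.amount   /* similar amounts  */
    and abs(c.amount - b.amount) <= 0.1 * b.amount;
quit;
```

Note how the time and amount constraints sit directly in the `where` clause - this is exactly the property that made the SQL variant easy to manipulate and alter, at the cost of the intermediate-storage overhead of the self-joins.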
In the pictures above, you can see that the final solution assessed approx. 1.5M transactions, among which numerous cycles were identified. Though the algorithm's speed isn't captured in the report, the execution for the above scenario took less than a minute.
My SAS - the future
The above was just a single example of the capabilities of SAS, but thousands of other use cases can be solved in the same seamless fashion. The platform on which SAS solutions are built contains functions, procedures, and programming language constructs that have been built and optimized through decades of development and fine-tuning.
And though I don't want to boast about it - SAS is ranked as a leader by independent research companies like Gartner, Forrester, and Aite in categories like Data Science, AML, Fraud, Case Management, Data Integration, and Data Quality.
So looking into the future, SAS seems well-prepared for what lies ahead thanks to decades of market experience, gathered assets and intellectual property, tools, and our platform.
Still, there is one extra thing that matters most of all, and here I will cite the founder of SAS, Jim Goodnight, who said: "I often say that 95% of my assets drive out of the front gate every night, and it's my job to make sure they come back the next day."
SAS employees = my colleagues!
Knowing so many amazing people at SAS globally is why I believe that SAS matters - maybe even more than ever!
Thank you - Jim Goodnight - for SAS! Thank you - SAS - for 16 years of Curiosity!
If you've made it this far, you might feel something is still missing - what about analytics? Where are AI and machine learning?
Two points here:
1) In this blog, I chose to focus on the data part, as it was, still is, and in the near future probably will remain the most labor-intensive part of a data scientist's job (you remember that 80-20 rule, right?)
2) Tapping into analytics would further extend an already long blog post, so I will park it and address it in a separate article in the future.