Abstract

The Bureau of Customs (BOC) handles the accounting, valuation, and duty collection for all goods entering the country. Over the years, the BOC has processed millions of import transactions, and its website makes the import reports from 2012 onward available to the public. This study analyzes these reports and uses unsupervised machine learning models to detect potential outliers among these millions of transactions.

The study then measures the extent of outlier occurrence for each goods type and characterizes the outliers by the key features that differentiate them from the inliers. Isolation Forest, an ensemble model, is used for outlier detection. The data and models are segregated by goods type, and the contamination-level hyperparameter is optimized separately for each type. The resulting models are retroactively applied to subsets of the baseline data, covering Sep 2015 to Aug 2016, to identify and characterize the outliers.
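The per-goods-type detection step can be illustrated with a minimal, from-scratch sketch of the Isolation Forest idea. Everything below is an assumption made for illustration rather than the study's actual pipeline: the goods types, the two numeric features, the synthetic data, and the fixed contamination value (which the study optimizes per goods type) are all hypothetical, and a library implementation would normally be used instead.

```python
import math
import random

def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + 0.5772156649) - 2.0 * (n - 1) / n

def build_tree(rows, depth, max_depth, rng):
    """Recursively split rows on a random feature at a random threshold."""
    if depth >= max_depth or len(rows) <= 1:
        return ("leaf", len(rows))
    feature = rng.randrange(len(rows[0]))
    lo = min(r[feature] for r in rows)
    hi = max(r[feature] for r in rows)
    if lo == hi:
        return ("leaf", len(rows))
    split = rng.uniform(lo, hi)
    left = [r for r in rows if r[feature] < split]
    right = [r for r in rows if r[feature] >= split]
    return ("node", feature, split,
            build_tree(left, depth + 1, max_depth, rng),
            build_tree(right, depth + 1, max_depth, rng))

def path_length(row, tree, depth=0):
    """Depth at which row lands, plus an adjustment for unsplit leaves."""
    if tree[0] == "leaf":
        return depth + c_factor(tree[1])
    _, feature, split, left, right = tree
    child = left if row[feature] < split else right
    return path_length(row, child, depth + 1)

def anomaly_scores(rows, n_trees=100, sample_size=256, seed=0):
    """Score each row in (0, 1); shorter average paths give higher scores."""
    rng = random.Random(seed)
    size = min(sample_size, len(rows))
    max_depth = math.ceil(math.log2(max(size, 2)))
    trees = [build_tree(rng.sample(rows, size), 0, max_depth, rng)
             for _ in range(n_trees)]
    norm = c_factor(size) or 1.0  # guard against a zero normalizer
    return [2.0 ** (-sum(path_length(r, t) for t in trees) / (n_trees * norm))
            for r in rows]

def flag_outliers(rows, contamination, seed=0):
    """Return indices of the top `contamination` share of rows by anomaly score."""
    scores = anomaly_scores(rows, seed=seed)
    k = max(1, round(contamination * len(rows)))
    ranked = sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k])

# Hypothetical data: two goods types, two made-up numeric features per shipment.
data_rng = random.Random(42)
shipments = {}
for goods_type in ("rice", "steel"):
    rows = [(data_rng.gauss(100.0, 5.0), data_rng.gauss(10.0, 1.0))
            for _ in range(99)]
    rows.append((100.0, 500.0))  # injected extreme record at index 99
    shipments[goods_type] = rows

# One model per goods type; the contamination value here is assumed, not tuned.
flagged = {g: flag_outliers(rows, contamination=0.02)
           for g, rows in shipments.items()}
```

Here `flagged` maps each goods type to the indices of its highest-scoring records. Fitting one forest per goods type keeps a record from being judged against the value ranges of unrelated goods, and the contamination level sets how large a share of each group is labeled an outlier.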

The models are also applied to the subsequent months, Sep 2016 to Aug 2017, to project the incidence of outliers. Finally, the study outlines possible applications of the outlier detection models for imported goods shipments and ways to improve them further.