Abstract: Real-world big data are largely unstructured, interconnected, and dynamic, often in the form of natural language text. Modern computers, on the other hand, have demonstrated tremendous power in searching and reasoning over structured data. A key challenge in enabling machine intelligence is therefore to transform massive unstructured big data into structured knowledge. Many researchers rely on costly manual labeling and curation to extract structures and knowledge from unstructured data. Such an approach, however, does not scale, especially since many text corpora are highly dynamic and domain-specific. We argue that massive text data itself may disclose a large body of hidden patterns, structures, and knowledge. Equipped with domain-independent and domain-dependent knowledge bases, we should explore the power of massive data to turn unstructured data into structures. Moreover, by organizing massive text documents into multidimensional text cubes, we show that structured knowledge can be extracted and used effectively.
In this talk, we introduce a set of methods developed recently in our group for such exploration, including mining quality phrases, entity recognition and typing, multi-faceted taxonomy construction, and the construction and exploration of multidimensional structured cubes and networks. We show that a data-driven approach can be a promising direction for transforming massive text data into structured knowledge.
Bio: Jiawei Han is the Michael Aiken Chair Professor in the Department of Computer Science, University of Illinois at Urbana-Champaign. His research covers data mining, text mining, machine learning, information networks, and database systems, with over 900 journal and conference publications. He has chaired or served on the program committees of most major international data mining and database conferences. He also served as the founding Editor-in-Chief of ACM Transactions on Knowledge Discovery from Data, Director of the Information Network Academic Research Center supported by the U.S. Army Research Lab (2009-2016), and co-Director of KnowEnG, an NIH-funded Center of Excellence in Big Data Computing (2014-2019). He is a Fellow of the ACM and a Fellow of the IEEE, and received the 2004 ACM SIGKDD Innovations Award, the 2005 IEEE Computer Society Technical Achievement Award, the 2009 M. Wallace McDowell Award from the IEEE Computer Society, and Japan's 2018 Funai Achievement Award. His co-authored book "Data Mining: Concepts and Techniques" has been widely adopted as a textbook worldwide.
Abstract: My group at IBM has spent the last 8 years putting machine learning in the hands of our customers. Today's machine learning requires expertise that most organizations do not possess in large quantities. For machine learning to really take off and reach a larger set of users, significant strides need to be made in its usability. How should humans work collaboratively with machines to initially train them and subsequently improve them? What changes are needed in the way we currently think about machine learning algorithms and toolkits? In this talk I will outline the full end-to-end journey of one of our machine learning systems, Watson Assistant. I will highlight the human-computer interaction challenges that occur over its lifecycle, discuss the distinct algorithms it uses at different parts of the journey, and advocate for a different set of machine learning toolkits that better support typical machine learning lifecycles.
Bio: Robert Yates is a Distinguished Engineer at IBM, focusing on IBM Watson Assistant, IBM’s virtual assistant technology. Rob has spent the past 7 years working on Watson, primarily focused on conversational AI. As a lead architect, he oversees the engineering approaches for AI and works closely with product management to define the product roadmap. Prior to working on Watson, Rob worked on collaboration software and has deep expertise at the intersection of AI and user interaction.
Abstract: Differential privacy, and other random noise-based privacy approaches,
are becoming a standard for balancing public use of data with privacy
concerns. Unfortunately, even simple cases for differential privacy,
such as counts, become challenging when faced with real-world issues
such as missing data imputation, multiple uses of data, and complex,
multi-stage surveys. This talk will introduce differential privacy
and present solutions to two such challenges: uncoordinated access
to data, and imputation of missing data values. The talk will show
how established but little-used aspects of differential privacy theory
can have significant real-world impact.
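The counting case mentioned above can be made concrete. As a sketch (textbook differential privacy, not a detail of this particular talk): a counting query has sensitivity 1, since adding or removing one individual changes the count by at most 1, so adding noise drawn from a Laplace distribution with scale 1/ε yields an ε-differentially private count. The function name below is illustrative:

```python
import math
import random

def laplace_count(true_count, epsilon):
    """Release a count with epsilon-differential privacy via the Laplace mechanism.

    A counting query has sensitivity 1, so noise drawn from
    Laplace(0, 1/epsilon) suffices for epsilon-differential privacy.
    """
    scale = 1.0 / epsilon
    # Sample Laplace(0, scale) by inverse transform on a uniform draw.
    u = random.random() - 0.5          # uniform on [-0.5, 0.5)
    sign = 1.0 if u >= 0 else -1.0
    noise = -scale * sign * math.log(1.0 - 2.0 * abs(u))
    return true_count + noise
```

Note that the released value is a real number and may even be negative; practical deployments often post-process it (rounding, clipping at zero), which does not weaken the privacy guarantee.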
This work was supported by the U.S. Census Bureau under CRADA CB16ADR0160002. The views and opinions expressed in this talk are those of the speaker and not the U.S. Census Bureau.
Bio: Dr. Clifton works on data privacy, particularly with respect to analysis of private data. This includes privacy-preserving data mining, data de-identification and anonymization, and limits on identifying individuals from data mining models. He also works more broadly in data mining, including fairness issues, data mining of text, and data mining techniques applied to interoperation of heterogeneous information sources. Fundamental data mining challenges posed by these applications include skew and bias in learning, extracting knowledge from noisy data, identifying knowledge in highly skewed data (few examples of "interesting" behavior), and limits on learning. He also works on database support for widely distributed and autonomously controlled information, particularly issues related to data privacy.
Prior to joining Purdue in 2001, Dr. Clifton was a principal scientist in the Information Technology Division at the MITRE Corporation. Before joining MITRE in 1995, he was an assistant professor of computer science at Northwestern University. He has a Ph.D. (1991) and M.A. (1988) from Princeton University, and Bachelor's and Master's degrees (1986) from the Massachusetts Institute of Technology. From 2013 to 2015 he served as a rotating program director in the Division of Information and Intelligent Systems and the Secure and Trustworthy Cyberspace Program at the National Science Foundation.
Abstract: We are living in a time when the digital transformation of media has contributed to the ubiquitous dissemination of fake facts and a credibility crisis for news media. News is generated differently today, with machine intelligence increasingly automating the process, and it is generated by different actors, as social media enables everyone, including malicious actors, to become a source of news. These developments have fuelled an inflation of fake news.
We will first review some of the key developments affecting the news media space today. Then we turn our attention to the question of how we can address some of these problems using technology, in particular artificial intelligence. The first problem we consider is news bias. We show that bias exists in various forms and outline methods to fight it. The second problem we consider is fake news, in particular in the science domain. We will introduce our approach and platform for evaluating the quality of scientific news articles using automated methods. In our view, these efforts are critical in helping to reestablish trusted news channels and to give citizens a chance to obtain truthful and unbiased information in the future.
Bio: Karl Aberer is a professor in the School of Computer and Communication Sciences at EPFL. He received his PhD in Mathematics from ETH Zürich in 1991. From 1991 to 1992 he was a postdoctoral fellow at the International Computer Science Institute (ICSI) at the University of California, Berkeley. In 1992 he joined the Integrated Publication and Information Systems Institute (IPSI) of GMD in Germany, where he led the research division Open Adaptive Information Management Systems. In 2000 he joined EPFL as a full professor.
His research interests include foundations, algorithms, and infrastructures for distributed information management, covering semantic interoperability, information retrieval, social networks, trust management, and applications to scientific and sensor data management. He has produced more than 400 scientific publications, including more than 70 peer-reviewed journal articles and 300 peer-reviewed conference papers.
He was director of the Swiss National Centre for Mobile Information and Communication Systems (NCCR MICS) from 2005 to 2012 and Vice-President for Information Systems of EPFL from 2012 to 2016, and he advised the Swiss Government as a member of the Swiss Research and Technology Council from 2004 to 2011. He is currently a member of the expert group on cyber-defense, advising the Department of Defense on questions related to cyber-security. He is a co-founder of LinkAlong, a startup established in 2017 that provides data analytics for open-source documents, based on technologies for knowledge extraction developed in his research.
Abstract: Autonomous cars are on the rise. Powered by advances in AI, these transportation robots are expected to solve one of the most challenging problems in computing, made especially hard by its real-time and dynamic conditions. There is another dimension to it: trust. Can we trust an autonomous vehicle to share the road, lane, sidewalk, or the community where we live? Can we trust such a vehicle not to become a pawn in the hands of hackers? What are the underlying challenges, approaches, and open problems in cybersecurity and privacy on the way to developing holistic trust in such robots? In this talk, we will discuss some of these pillars of trust in autonomous vehicles.
Bio: Dr. Ashish Kundu is an ACM Distinguished Member, an ACM Distinguished Speaker, and a worldwide leader in security, privacy, and compliance. He is currently Head of Cybersecurity at Nuro.AI. Previously, Dr. Kundu was a Master Inventor and Research Staff Member in Security Research at the IBM T. J. Watson Research Center, Yorktown Heights, New York. He has been working in the areas of cybersecurity, privacy, and compliance for more than 15 years. Dr. Kundu received his Ph.D. in Computer Science from Purdue University. His work has led to more than 130 patents filed, with more than 100 granted, and more than 40 research papers. He was recognized with the prestigious CERIAS Diamond Award for his outstanding contributions to cybersecurity research during his Ph.D.