Automated, online detection of crowd anomalies in surveillance CCTV footage is a demanding task in both research and practice. This work proposes a technique for detecting crowd abnormalities by analyzing the spatial and temporal features of the input video. The integrated solution defines an image descriptor (the spatio-temporal feature descriptor, STFD) that reflects the global motion pattern of a crowd over time. A purpose-built convolutional neural network (CNN) is then used to classify dominant, large-scale abnormal crowd behaviors. The work focuses on four steps: (1) detecting moving objects in an online (near real-time) manner through spatio-temporal segmentation of crowds, using the similarity of group trajectory structures in the temporal domain and foreground blocks from a Gaussian mixture model in the spatial domain; (2) partitioning the segmented regions into multiple groups by spectral clustering, treating the image pixels of those regions as dynamic particles; (3) creating STFD instances by calculating attributes such as collectiveness, stability, conflict, and crowd density for the individuals (particles) in each group; and (4) feeding the generated STFD instances into the CNN to detect suspicious crowd behaviors. The PETS database was used as the primary experimental dataset for testing and evaluating the models and techniques. Comparisons with benchmark models and systems show promising improvements in both accuracy and efficiency for crowd anomaly detection.
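As an illustration of step (3), the following minimal sketch computes two per-group attributes from particles that have already been assigned to groups. It is not the formulation used in this work: the function name `group_attributes` is hypothetical, the velocity polarization (the norm of the mean unit velocity) is only a simple stand-in for collectiveness, and density is approximated as particles per unit area of the axis-aligned bounding box. Stability and conflict are omitted.

```python
import numpy as np


def group_attributes(positions, velocities, labels):
    """Return {group_id: (polarization, density)} for labelled particles.

    Hypothetical helper for illustration only; both attributes are
    simplified proxies, not the definitions used for the STFD.
    """
    positions = np.asarray(positions, dtype=float)
    velocities = np.asarray(velocities, dtype=float)
    labels = np.asarray(labels)

    features = {}
    for g in np.unique(labels):
        mask = labels == g
        pos, vel = positions[mask], velocities[mask]

        # Unit direction of each particle; stationary particles contribute zero.
        speed = np.linalg.norm(vel, axis=1, keepdims=True)
        unit = np.divide(vel, speed, out=np.zeros_like(vel), where=speed > 0)

        # Polarization in [0, 1]: 1 when all particles move the same way,
        # near 0 when directions cancel out (assumed proxy for collectiveness).
        polarization = float(np.linalg.norm(unit.mean(axis=0)))

        # Particles per unit area of the bounding box (assumed density proxy);
        # the area is clamped to avoid division by zero for degenerate groups.
        extent = pos.max(axis=0) - pos.min(axis=0)
        area = max(float(np.prod(extent)), 1e-6)
        density = float(mask.sum() / area)

        features[int(g)] = (polarization, density)
    return features


# Toy example: group 0 moves coherently, group 1 moves in opposing directions.
positions = np.array([[0, 0], [2, 0], [0, 2], [2, 2],
                      [10, 0], [14, 0], [10, 2], [14, 2]], dtype=float)
velocities = np.array([[1, 0], [1, 0], [1, 0], [1, 0],
                       [1, 0], [-1, 0], [0, 1], [0, -1]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
features = group_attributes(positions, velocities, labels)
```

In this toy example the coherent group has polarization 1.0 and the incoherent group has polarization 0.0, which is the kind of contrast a descriptor channel would need to expose to a downstream classifier.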