Bielefeld Center for Data Science

© Uni­ver­sität Biele­feld
This is a collection of resources (tools, software, literature, etc.) that members of BiCDaS have found useful in their daily work. We have also included a list of organizations that Data Science enthusiasts may find appealing.

This list is by no means comprehensive and will be extended continually.

Some non-profit organizations related to Data Science

EuADS: The European Association for Data Science was founded only recently and aims to foster cooperation and communication among Data Scientists in Europe.

GfKl: The German Gesellschaft für Klassifikation (roughly, Society for Classification) celebrates its 40th anniversary in 2017. It has about 300 members and aims to promote data classification and signal processing. Data Science Society has recently been added as a second name.

DHd: The German platform Digital Humanities im deutschsprachigen Raum (roughly, Digital Humanities in German-speaking countries) aims to represent the interests of Digital Humanities researchers. Founded in 2013, it has about 400 members (as of 2021).

The Jupyter Notebook is a useful tool for data exploration. You can write code, plot the results and insert formatted text in any sequence, so that code, results and research notes are all interleaved and at your fingertips. The Jupyter project originated in the Python community (it grew out of IPython) but can be configured for a wide array of languages. It is developed continually as an open-source project coordinated at the Berkeley Institute for Data Science (BIDS).

Python is a general-purpose programming language that has gained tremendous popularity in the Data Science community in the last few years. It offers high code readability, high expressiveness and a high-level command set. Its "batteries included" philosophy, together with a large ecosystem of open-source libraries, has added substantially to its popularity and utility.
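As a small illustration of the "batteries included" idea, the following sketch parses CSV text and computes per-group means using only the standard library (`csv`, `io`, `statistics`). The sample data, the column names and the helper `mean_by_city` are invented for this illustration and are not taken from any of the resources listed here.

```python
# Standard library only: no third-party packages are needed.
import csv
import io
import statistics

# Hypothetical sample data, invented for this illustration.
RAW_CSV = """city,temperature_c
Bielefeld,11.5
Bielefeld,13.0
Berlin,12.0
Berlin,15.0
"""


def mean_by_city(csv_text):
    """Return the mean temperature per city as a dict."""
    readings = {}
    # DictReader uses the first line as column names.
    for row in csv.DictReader(io.StringIO(csv_text)):
        readings.setdefault(row["city"], []).append(float(row["temperature_c"]))
    return {city: statistics.mean(values) for city, values in readings.items()}


means = mean_by_city(RAW_CSV)
print(means)
```

For larger or messier data sets, the third-party libraries mentioned on this page are usually the more convenient choice.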

Some modules (libraries) of particular use for data scientists are NumPy, SciPy, scikit-learn, pandas and scikit-image.

  • NumPy
  • SciPy
  • scikit-learn
  • pandas
  • scikit-image
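To give a flavour of two of these libraries, here is a minimal sketch that standardizes a small array with NumPy and aggregates it per group with pandas. It assumes both packages are installed; the numbers and the column names `group` and `value` are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical measurements, invented for this illustration.
values = np.array([2.0, 4.0, 6.0, 8.0])

# NumPy: vectorized arithmetic on the whole array, no explicit loop.
zscores = (values - values.mean()) / values.std()

# pandas: label the same values and aggregate them per group.
df = pd.DataFrame({"group": ["a", "a", "b", "b"], "value": values})
group_means = df.groupby("group")["value"].mean()

print(zscores)
print(group_means)
```

The same pattern (load, transform with array operations, then group and summarize) covers a large share of everyday data exploration.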

Apache Flink is a framework for distributed data processing. While it can process finite data sets (batch mode), it really shines in the processing of continuous data streams. It is a standalone project of the Apache Software Foundation, not part of the Apache web server, and offers advantages such as cluster mode (many hosts involved in processing) and fault tolerance.
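The difference between batch and stream processing can be sketched in plain Python. This is only a conceptual illustration and does not use the Flink API: `batch_mean` and `running_mean` are hypothetical helper names, and a real streaming engine adds distribution across hosts, state management and fault tolerance on top of this idea.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch mode: the data set is finite, so one result is computed at the end."""
    return sum(values) / len(values)


def running_mean(stream: Iterable[float]) -> Iterator[float]:
    """Stream mode: emit an updated result after every incoming element."""
    count = 0
    total = 0.0
    for value in stream:
        count += 1
        total += value
        yield total / count


# Hypothetical sensor readings, invented for this illustration.
readings = [10.0, 20.0, 30.0]

batch_result = batch_mean(readings)
stream_results = list(running_mean(iter(readings)))

print(batch_result)
print(stream_results)
```

Because `running_mean` is a generator, it never needs the complete input and would work just as well on an unbounded source.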

Data Wrangling with Python

© O'Reilly Media, Inc.

by Jacqueline Kazil, Katharine Jarmul

An excellent introduction to Data Science in Python that is accessible to Python newbies without being an actual Python textbook. The authors focus just as much on methods of data acquisition, selection, preparation and storytelling as on the language and its different modules (libraries). For their examples they use real-world data from public databases, e.g., the WHO data repository, which makes learning a lot more fun. The authors have made the data and code accessible via Git.

ISBN-13: 978-1491948811

Data Science from Scratch

© O'Reilly Media, Inc.

by Joel Grus

For those already familiar with Python and those preferring a more method-centred approach, this book might be the better alternative. The author covers typical topics for data science beginners, such as correlation, regression and machine learning, and their implementation in Python. Throughout the book he uses the example of the fictitious company DataSciencester, which gives the book a nice common thread.

ISBN-13: 978-1491901427

Not So Standard Deviations

Roger Peng (Johns Hopkins Bloomberg School of Public Health), Hilary Parker (Stitch Fix) and occasional guests talk about Data Science, mixed with some real-life talk and the never-ceasing Python vs. R discussion. This podcast, with its accompanying website, is entertainment and education at its finest.

