
Biele­feld Cen­ter for Data Sci­ence



More on Data Sci­ence...

This is a collection of resources (tools, software, literature, etc.) that members of BiCDaS have found useful in their daily work. We have also included a list of organizations that Data Science enthusiasts may find appealing.

This list is by no means com­pre­hen­sive and will be ex­tended con­tin­u­ally.

Some non-​profit or­ga­ni­za­tions re­lated to Data Sci­ence

EuADS The European Association for Data Science was founded only recently and aims to foster cooperation and communication among data scientists in Europe.

GfKl The German Gesellschaft für Klassifikation (roughly: Society for Classification) celebrated its 40th anniversary in 2017. It has about 300 members and aims to promote data classification and signal processing. Recently, Data Science Society has been added as a second name.

DHd The German platform Digital Humanities im deutschsprachigen Raum (roughly: Digital Humanities in German-speaking countries) claims to represent the interests of Digital Humanities researchers. Founded in 2013, it has about 400 members (as of 2021).

The Jupyter Notebook is a useful tool for data exploration. You can write code, plot the results and insert formatted text in any sequence, and thereby keep your code, results and research notes together at your fingertips. The Jupyter project originated in the Python community (as IPython) but can be configured for a wide array of languages. It is continually developed as an open source project coordinated at the Berkeley Institute for Data Science (BIDS).

Python is a general-purpose programming language which has gained tremendous popularity in the Data Science community in the last few years. It offers high code readability, high expressiveness and a high-level command set. Its "batteries included" philosophy, together with a large ecosystem of open-source libraries, has added substantially to its popularity and utility.
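To make the "batteries included" idea concrete, here is a minimal sketch that parses a small CSV table and summarizes it using only the standard library. The sample data and the function name `summarize` are made up for this illustration.

```python
import csv
import io
import statistics
from collections import Counter

# Hypothetical sample data, invented for this illustration.
RAW = """city,temperature
Bielefeld,12.5
Berlin,14.0
Bielefeld,11.5
Hamburg,10.0
"""


def summarize(raw_csv):
    """Parse CSV text and return (mean temperature, number of rows per city)."""
    rows = list(csv.DictReader(io.StringIO(raw_csv)))
    temperatures = [float(row["temperature"]) for row in rows]
    rows_per_city = Counter(row["city"] for row in rows)
    return statistics.mean(temperatures), rows_per_city


mean_temperature, rows_per_city = summarize(RAW)
print(mean_temperature)
print(rows_per_city.most_common(1))
```

No third-party package is needed here; `csv`, `statistics` and `collections` all ship with the interpreter.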

Some modules (libraries) of particular use for data scientists are NumPy, SciPy, scikit-learn, pandas and scikit-image.

  • NumPy
  • SciPy
  • scikit-learn
  • pandas
  • scikit-image

Apache Flink is an open-source framework for distributed data processing. While it can process finite data sets (batch mode), it really shines in the processing of continuous data streams. It is a project of the Apache Software Foundation, not a plugin for the Apache web server, and offers advantages such as cluster mode (many hosts involved in processing) and fault tolerance.
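The difference between batch mode and stream processing can be sketched in plain Python. This is only a conceptual illustration and does not use Flink's API; the function names and the `sensor_readings` source are hypothetical.

```python
from typing import Iterable, Iterator, Sequence


def batch_mean(values: Sequence[float]) -> float:
    """Batch mode: the whole finite data set is available up front."""
    return sum(values) / len(values)


def streaming_mean(values: Iterable[float]) -> Iterator[float]:
    """Stream mode: emit an updated mean after every incoming element."""
    count = 0
    total = 0.0
    for value in values:
        count += 1
        total += value
        yield total / count


def sensor_readings() -> Iterator[float]:
    """Hypothetical source; a real stream would usually never end."""
    yield from [2.0, 4.0, 6.0, 8.0]


print(batch_mean([2.0, 4.0, 6.0, 8.0]))
print(list(streaming_mean(sensor_readings())))
```

The streaming version keeps only a running count and total, so it never needs the complete data set in memory. A stream processor applies the same idea across many hosts.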

Data Wran­gling with Python

© O'Reilly Media, Inc.

by Jacque­line Kazil, Katharine Jar­mul

An excellent introduction to Data Science in Python that is accessible to Python newcomers without being an actual Python textbook. The authors focus just as much on methods of data acquisition, selection, preparation and storytelling as on the language and its different modules (libraries). Their examples use real-world data from public databases, e.g., the WHO data repository, which makes learning a lot more fun. Data and code are available via git.

ISBN-​13: 978-​1491948811

Data Sci­ence from Scratch

© O'Reilly Media, Inc.

by Joel Grus

For those already familiar with Python and those preferring a more method-centred approach, this book might be the best alternative. The author covers typical topics for data science beginners, such as correlation, regression and machine learning, and their implementation in Python. Throughout the book he uses the example of the fictional company DataSciencester, which gives the book a nice common thread.
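To give a flavour of the from-scratch approach, here is a minimal sketch of correlation and simple linear regression in plain Python. It is not taken from the book; the function names and the sample data are assumptions made for this illustration.

```python
import math


def mean(xs):
    """Arithmetic mean of a non-empty sequence."""
    return sum(xs) / len(xs)


def correlation(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(xs), mean(ys)
    covariance = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread_x = sum((x - mx) ** 2 for x in xs)
    spread_y = sum((y - my) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


def least_squares_fit(xs, ys):
    """Return (intercept, slope) of the line minimizing the squared error."""
    mx, my = mean(xs), mean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    intercept = my - slope * mx
    return intercept, slope


# Hypothetical data lying exactly on the line score = 2 * hours + 1.
hours = [1, 2, 3, 4, 5]
score = [3, 5, 7, 9, 11]

print(correlation(hours, score))
print(least_squares_fit(hours, score))
```

Writing these functions by hand once makes it easier to understand what the corresponding library routines compute.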

ISBN-​13: 978-​1491901427

Not So Stan­dard De­vi­a­tions

Roger Peng (Johns Hopkins Bloomberg School of Public Health), Hilary Parker (Stitch Fix) and occasional guests talk about Data Science, mixed with some real-life talk and the never-ceasing Python vs. R discussion. This blog/website/podcast is entertainment and education at its finest.
