Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences

Article

Weizhong Li; Adam Godzik · Bioinformatics · 2006

Author / responsible party

Weizhong Li; Adam Godzik

Journal

Bioinformatics

Year

2006

Language

English

Abstract

MOTIVATION: In 2001 and 2002, we published two papers (Bioinformatics, 17, 282-283; Bioinformatics, 18, 77-82) describing an ultrafast protein sequence clustering program called cd-hit. This program can efficiently cluster a huge protein database with millions of sequences. However, the applications of the underlying algorithm are not limited to protein sequence clustering; here we present several new programs that use the same algorithm: cd-hit-2d, cd-hit-est and cd-hit-est-2d. Cd-hit-2d compares two protein datasets and reports similar matches between them; cd-hit-est clusters a DNA/RNA sequence database; and cd-hit-est-2d compares two nucleotide datasets. All of these programs can handle huge datasets with millions of sequences and can be hundreds of times faster than methods based on popular sequence comparison and database search tools such as BLAST.

How to cite

APA 7

Li, W., & Godzik, A. (2006). Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. https://doi.org/10.1093/bioinformatics/btl158

MLA

Li, Weizhong, and Adam Godzik. "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences." Bioinformatics, 2006, https://doi.org/10.1093/bioinformatics/btl158.

Chicago

Li, Weizhong, and Adam Godzik. 2006. "Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences." Bioinformatics. https://doi.org/10.1093/bioinformatics/btl158.

Harvard

Li, W. and Godzik, A. 2006, Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences, Bioinformatics, available at: https://doi.org/10.1093/bioinformatics/btl158 [Accessed 28 Jun. 2026].

Subjects

Cluster analysis; Computer science; Sequence (biology); Protein sequencing; Sequence database; Sequence alignment; Data mining; Computational biology; Database; Bioinformatics; Peptide sequence; Biology